Reflection #14 – [04/17] – [Hamza Manzoor]

[1] Lelkes, Y., Sood, G., & Iyengar, S. (2017). The hostile audience: The Effect of Access to Broadband Internet on Partisan Affect. American Journal of Political Science, 61(1), 5-20.

In this paper, Lelkes et al. use exogenous variation in access to broadband internet stemming from differences in right-of-way laws. These laws affect the cost of building internet infrastructure, and thus the price and availability of broadband access, which in turn significantly increases access to content. The authors use this variation to identify the impact of broadband internet access on partisan polarization. The data was collected from various sources. The data on right-of-way laws come from an index of these laws. The data on broadband access is from the Federal Communications Commission (FCC). For data on partisan affect, they use the 2004 and 2008 National Annenberg Election Studies (NAES). For media data, they use comScore, and they use the Economic Research Service’s terrain topology to classify the terrain into 21 categories. The authors find that access to broadband internet increases partisan hostility and boosts partisans’ consumption of partisan media. Their results suggest that if all the states had adopted the least restrictive right-of-way regulations, partisan animus would have been roughly 2 percentage points higher.

I really liked reading the paper, and the analysis was very thorough. While reading, I had various doubts about the analysis, but the authors kept clearing them up as I read on. For example, when they compared news media consumption between broadband and dial-up users, they did not discuss how many of the 50,000 panelists fell into each group, but they later use CEM, which reduces the imbalance on a set of covariates. CEM (Coarsened Exact Matching) seems like a good technique for handling imbalance in the dataset. CEM is a Monotonic Imbalance Bounding (MIB) matching method, which means that the balance between the treated and control groups is chosen by the user in advance rather than discovered through the usual laborious process of checking after the fact and repeatedly re-estimating.
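
To make the CEM idea concrete, here is a minimal sketch in plain Python. It is not the authors' procedure: the panelists, the covariates (`age`, `income`), and the bin edges are all made up for illustration. The point it shows is that the coarsening is fixed up front, each unit is placed in a stratum by its coarsened covariates, and only strata containing both broadband ("treated") and dial-up ("control") users are kept.

```python
from collections import defaultdict


def coarsen(value, cutpoints):
    """Return the index of the bin that `value` falls into."""
    return sum(value >= c for c in cutpoints)


def cem_match(units, cutpoints):
    """Stratify units by coarsened covariates and keep only strata
    that contain at least one treated and one control unit."""
    strata = defaultdict(list)
    for u in units:
        key = tuple(coarsen(u[c], cutpoints[c]) for c in sorted(cutpoints))
        strata[key].append(u)

    matched = {}
    for key, members in strata.items():
        has_treated = any(m["treated"] for m in members)
        has_control = any(not m["treated"] for m in members)
        if has_treated and has_control:
            matched[key] = members
    return matched


# Hypothetical panelists: treated=True means broadband, False means dial-up.
panel = [
    {"id": 1, "treated": True, "age": 23, "income": 30},
    {"id": 2, "treated": False, "age": 27, "income": 35},
    {"id": 3, "treated": True, "age": 61, "income": 90},
    {"id": 4, "treated": False, "age": 45, "income": 38},
]

# Bin edges chosen before matching (assumed values, in years and $1000s).
bins = {"age": [30, 50], "income": [40, 80]}

matched = cem_match(panel, bins)
matched_ids = sorted(m["id"] for members in matched.values() for m in members)
print(matched_ids)
```

Panelists 3 and 4 are dropped because nobody from the other group shares their stratum. Coarser bins would keep more units at the cost of more imbalance within each stratum, which is the trade-off the user fixes in advance.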

That said, the authors still make many assumptions in the paper. People visiting a left-leaning website are considered left-leaning, and vice versa. Can we naively assume that? I have been on Fox News a few times, and if I had been part of the study at that time, I would have been considered a conservative (which I am not). Secondly, the 50,000 panelists knew that they were under study and that their browsing history was being recorded. This might have induced bias, because people who know that their activity is monitored won’t browse as they normally do. Another interesting thing in the paper was that only 3% of Democrats say that they regularly encounter liberal media, whereas 8% of Republicans say the same. For Republicans, it is more than double. What could be the reason? Even with dial-up, the difference between the groups is significant. Does that mean that Republicans are more likely to polarize? This difference shrinks in the case of broadband (19% and 20%).

Another interesting analysis was performed on internet propagation with respect to the terrain. Though the authors did not claim any such thing, they separately say three things:

  1. There are fewer broadband providers where the terrain is steeper
  2. More broadband providers increase the use of broadband internet
  3. People with broadband access consume a lot more partisan media especially from sources congenial to their partisanship

Considering these three claims, it can be inferred that the places with less steep terrain have more polarization. Can we claim that? San Francisco, Seattle, and Pittsburgh are considered the hilliest places in the US. Does that mean that people in New York are more polarized than the people in these three places? Also, do more people in New York City have access to broadband than the people in Seattle?

Also, people with more money will have access to better internet. Therefore, it can also be inferred that rich people are more liable to polarize than the poor, which is not generally true (I assume). Education is directly proportional to income in most cases, and hence more educated people will have access to better internet because they can afford it. I can indirectly infer that education results in polarization just because educated people can afford better internet. This claim seems funny, and hence, I think, so does their study. I think that the assumption that “anyone who visits the left-leaning websites is left-leaning” is not the correct way to go about this study.


Reflection #14 – [04-17] – [Jiameng Pu]

  1. Lelkes, Y., Sood, G., & Iyengar, S. (2017). The hostile audience: The Effect of Access to Broadband Internet on Partisan Affect. American Journal of Political Science, 61(1), 5-20


Over the past fifty years, partisan animus has been increasing, and the reach of partisan information sources has been expanding in the meantime. With polarization increasing on the Internet, this paper identifies the impact of access to broadband Internet on affective polarization by exploiting differences in broadband availability brought about by variation in state right-of-way regulations (ROW). Lelkes et al. measure the increase in partisan hostility by collecting data from multiple sources, e.g., the Federal Communications Commission (the data on broadband access) and the National Annenberg Election Studies (the data on partisan affect). The authors conclude that access to broadband Internet increases partisan hostility and that this effect is stable across levels of political interest.


The topic itself is intriguing, and I have a strong feeling that people are becoming more hostile not only in their partisan affect but on diverse topics in the community. People are passionate about educating those who have different opinions and values. One interesting discussion in this paper explores the relationship between the number of subscribers and the number of providers at the census tract level or at the zip code level. I liked the process of finding a convincing proxy for the measure needed in the research, which could potentially lead to more interesting findings. Although many county-level factors and indicators are examined, e.g., the unemployment rate, median age, the male-to-female ratio, and percent black, another question I came up with is whether the new media apps/websites people mainly use in their daily lives have a significant effect on partisan polarization. It is intuitive that people’s viewpoints are easily influenced by the specific news feeds they subscribe to, and people also tend to choose news media that share their viewpoints.

Besides partisan affect, this paper reminds me of people’s hostility in different communities and fields, because I feel this is the trend in today’s virtual social space regardless of topic and platform. As I said before, people nowadays are passionate about educating those whose opinions and values differ from their own, which induces unprecedented malicious discussion on the Internet and makes access to broadband internet by itself a relatively narrow lens for exploring the issue.


Reflection #14 – [04-17] – [Patrick Sullivan]

The Hostile Audience: The Effect of Access to Broadband Internet on Partisan Affect by Lelkes, Sood, and Iyengar.

The data behind this research covers 2004 to 2008. But changes in government and ISP regulation could be coming due to the recent attacks on Net Neutrality. These changes could quickly overshadow or render useless many of the findings and conclusions made in this paper. Measuring polarization due to Internet access becomes much more convoluted when considering that Internet traffic can be manipulated and treated so differently depending on the household as to constitute censorship. And there would be no way to defend the claim that ISPs can be unbiased providers of news, information, and communications.

These kinds of issues make me conflicted over whether governments, laws, and societies should change rapidly or slowly. A ‘fast’ government would have such a short-term outlook on solving issues that progress would interfere with ongoing research while also making previous research inapplicable. But a ‘slow’ government would make applying research findings so laborious that it could not be reactive or effective. Perhaps it is research methodology that should change instead, allowing studies that are faster and less thorough to be admissible as strong conclusive evidence. This could change the public outlook on reported scientific evidence as a quicker and more useful source of information, while losing a (hopefully) small amount of trustworthiness.

I also doubt that the cost of building Internet infrastructure is the single factor that determines broadband accessibility. There are many other factors that can interfere with this assumption. There are many known instances where broadband access is only provided by a single ISP in an area, allowing it complete control over Internet pricing. This means households at either extreme of financial stability may be misrepresented or wrongly categorized in studies that assume Internet infrastructure costs are the driving factor of Internet access. I am glad the authors address this in later sections of the publication, talking about the other factors behind broadband demand and referencing Larcinese and Miner’s previous work when determining a proxy for broadband uptake.


Reflection #14 – [4/17] – Aparna Gupta

1. Lelkes, Y., Sood, G., & Iyengar, S. (2017). The hostile audience: The Effect of Access to Broadband Internet on Partisan Affect. American Journal of Political Science, 61(1), 5-20.

This article examines the impact of broadband access on polarization by exploiting the differences in broadband availability brought about by variation in state right-of-way (ROW) regulations. To measure an increase in partisan hostility, the authors merged state-level regulation data with county-level broadband penetration data and a large-N sample of survey data from 2004 to 2008. The data is drawn from multiple datasets: the data on right-of-way laws come from previous research by Beyer and Kende, the data on broadband access is from the Federal Communications Commission (FCC), the data on partisan affect is collected from the National Annenberg Election Studies (2004 and 2008), and the media data is collected from comScore.

The study was interesting; however, I have a few concerns. Why have the authors considered only broadband penetration as a measure to analyze partisan affect or polarization, in an era where people always have an internet connection on their mobile phones or tablets? And have they considered the geographic locations where getting a high-speed internet connection is still an issue, as stated in this article? Does this mean people in these areas are less polarized?

The authors also claim that access to broadband Internet boosts partisans’ consumption of partisan media. Isn’t this quite obvious, since a person will tend to consume the news he/she is more inclined towards? There is a plethora of free information available on the Internet, ready to be consumed by people of any age group, any political inclination, and any region. Can there be a scenario where being exposed to the fire hose of information (through broadband) influences people to change their polarization?






Reflection #14 – [04/17] – [Vartan Kesiz-Abnousi]

[1] Lelkes, Y., Sood, G., & Iyengar, S. (2017). The hostile audience: The effect of access to broadband internet on partisan affect. American Journal of Political Science, 61(1), 5-20.

1. Summary

The main purpose of this paper is to identify the causal impact of broadband access on affective polarization by exploiting differences in broadband availability brought about by variation in state right-of-way regulations (ROW), which significantly affect the cost of building Internet infrastructure and thus the price and availability of broadband access. The data on right-of-way laws come from an index of these laws. The data on broadband access are from the Federal Communications Commission (FCC). For data on partisan affect, they use the 2004 and 2008 National Annenberg Election Studies (NAES). For media data, they use comScore. Their results suggest that had all states adopted the least restrictive right-of-way regulations observed in the data, partisan animus would have been roughly 2 percentage points higher. Finally, they demonstrate that an alternative set of instruments for broadband availability (surface topography) yields very similar results.

2. Reflections

The authors start the paper by trying to convince readers that polarization and partisanship are increasing. For instance, they mention that partisan prejudice exceeds implicit racial prejudice [2] (Iyengar and Westwood 2014). I skimmed that paper; the data is publicly available on Dataverse, and it’s cross-sectional. There is no actual proof of that statement, and it was not necessary. As the authors stress: “In contemporary America, the strength of these norms has made virtually any discussion of racial differences a taboo subject to the point that citizens suppress their true feelings” [2].

My point is, this entire “increase in polarization” depends on:

  1. When do we start the “counter” (the clock)?
  2. Establishing a causal relationship between X factor(s) and “polarization” is not easy, especially when we are trying to isolate one factor: broadband internet.

In addition, the phrase “media consumption is strongly elastic, increasing sharply with better access”, should have probably stressed that it is “elastic with respect to internet speed”, which is what the authors meant. In general, people tend to associate elasticity with price, and I doubt media consumption is not “strongly elastic” with respect to price.

2.1. Assumptions

“[..] access to broadband primarily increases the size of the pie, without having much impact on the ratio of the individual slices. Assuming patterns of consumption remain roughly the same, any increase in consumption necessarily means greater exposure to imbalanced political information.”

This is a bold assumption. I am not sure how we can assume something like that, and there are no citations whatsoever to back this up. To put things into perspective, this means “your grandfather’s generation media consumption pattern is roughly the same as yours, Millennials”.

Another assumption is the following:

“Right-of-way regulations (ROW), which significantly affect the cost of building Internet infrastructure and thus the price and availability of broadband access.”

While this makes “sense”, there are huge theoretical leaps here. Here is why ROW (A) implies cost of infrastructure (B) but does not imply price and availability (C): cross-subsidization. Corporations can “afford” to reduce prices in specific areas to increase their market share even if the initial costs are high. This is quite common in telecommunication and broadband services.

2.2. Technical issues

While the authors test the “strength” of the instrument in a somewhat crude way, they do not test anything else, which is problematic. Specifically, I didn’t see any test anywhere regarding the validity of the instrument. There is a citation provided for a dissertation that also has no test. For future reference, this is a framework for how to proceed when you decide to use IV/2SLS methodologies, based on the Godfrey-Hutton Procedure:

  1. Weak Instrument Test (or strength of instrument)
    1. One way is to implement Godfrey’s two step method and get “Shea’s Partial R square”.
  2. Over-identification Test (validity test/instrument exogeneity)
    1. Sargan’s Overidentification Test (sometimes called J test)
  3. If the instrument passes steps 1 and 2, proceed with a Hausman Test
    1. This confirms the existence of endogeneity.
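
As a rough illustration of step 1 and of why the IV estimate matters, here is a minimal sketch on simulated data (not the paper's data) with a single instrument and a single endogenous regressor. The variable names, the coefficients, and the simulated confounder are all assumptions made for the example. The first-stage F statistic is the usual check of instrument strength (a common rule of thumb asks for F above 10), and with one instrument the 2SLS estimate reduces to cov(z, y) / cov(z, x). The Sargan and Hausman steps are not covered here; Sargan in particular needs more instruments than endogenous regressors.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def first_stage_f(z, x):
    """F statistic for H0: instrument z has no effect on regressor x.
    With a single instrument this equals the squared t statistic."""
    n = len(z)
    slope = cov(z, x) / cov(z, z)
    intercept = mean(x) - slope * mean(z)
    rss = sum((xi - intercept - slope * zi) ** 2 for zi, xi in zip(z, x))
    slope_var = (rss / (n - 2)) / (cov(z, z) * (n - 1))
    return slope ** 2 / slope_var


def ols_slope(x, y):
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Just-identified IV estimate (identical to 2SLS with one instrument)."""
    return cov(z, y) / cov(z, x)


rng = random.Random(0)
n = 5000
true_effect = 0.5

z = [rng.gauss(0, 1) for _ in range(n)]  # instrument (think: ROW index)
u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
x = [0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]  # endogenous regressor
y = [true_effect * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]  # outcome

f_stat = first_stage_f(z, x)
beta_ols = ols_slope(x, y)
beta_iv = iv_slope(z, x, y)

print(f"first-stage F = {f_stat:.1f}")
print(f"OLS slope = {beta_ols:.3f}, IV slope = {beta_iv:.3f}, truth = {true_effect}")
```

Because the confounder `u` drives both `x` and `y`, OLS overstates the effect, while the IV estimate lands close to the true value. This only holds if the instrument is actually valid, which is exactly what the over-identification test in step 2 is meant to probe.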


  1. Are we now more polarized as a country than during the Vietnam War? If the answer is no, what does this mean? I doubt there was broadband internet back then.
  2. Two assumptions that I mention in the main text.
    1. Patterns of consumption remain the same?
    2. The increase in costs does not necessarily affect the price of broadband internet.
  3. Technical issues, lack of a validity test which might yield biased results.


Reflection #14 – [04/17] – [Nuo Ma]

1. Lelkes, Y., Sood, G., & Iyengar, S. (2017). The hostile audience: The effect of access to broadband internet on partisan affect. American Journal of Political Science, 61(1), 5-20.

In this article, the authors identify the impact of access to broadband Internet on affective polarization. They exploit differences in broadband availability brought about by variation in state right-of-way regulations (ROW), which significantly boost access to content. The authors conclude that access to broadband Internet increases partisan hostility and that this effect is stable across levels of political interest. The authors also find that access to broadband Internet boosts partisans’ consumption of partisan media.

The authors identify the impact of broadband access on affective polarization by exploiting differences in broadband availability brought about by variation in state right-of-way regulations (ROW), which significantly affect the cost of building Internet infrastructure and thus the price and availability of broadband access. Assuming broadband availability is consistent with the number of service providers, a regression model was used to show that this is related to the ROW score. Other major factors like terrain and weather were also briefly discussed. I liked the methodology in this part, and I was also surprised to see the number of dial-up connections shown in the study. Future work could also consider cellular data consumption, which might be measured by the average sale price of smartphones in an area. And for your group project, I can’t exactly remember: how did you acquire internet speed data?

This also reminded me of the similar topic of net neutrality, where you pay to get faster access to certain websites. Since internet speed really has an impact on what kind of website you visit or app you use, it’s painful to think that we can be so easily manipulated. Or what if companies like Facebook pay ISPs to ensure users get higher traffic priority to their website, while our personal data is sold in some way to pay for this?



Reflection #13 – [04/10] – [Meghendra Singh]

  1. Hu, Nan, Ling Liu, and Jie Jennifer Zhang. “Do online reviews affect product sales? The role of reviewer characteristics and temporal effects.” Information Technology and Management 9.3 (2008): 201-214.

The paper discusses an interesting study that tries to link customer reviews with product sales on Amazon, while taking into account the temporal nature of customer reviews. The authors present evidence that the market understands the difference between favorable and unfavorable news (reviews); therefore, bad news (reviews) leads to lower sales and vice versa. The authors also find that consumers pay attention to reviewer characteristics such as reputation and exposure. Hence the hypothesis that higher quality reviewers will drive the sales of a product on e-commerce platforms. The authors use the Wilcoxon Z-statistic and multiple regression to support their findings.

First, it would be interesting to see if a review coming from a reviewer who has actually purchased the product (a “verified purchase”) has more impact on product sales than one coming from someone who hasn’t purchased the product. This is one of the things I look for when using Amazon. Another aspect is that products like books, DVDs, and videos are neither consumables nor necessities. What I mean is that people can have very particular tastes when it comes to what they read and watch. For example, Alice might be a big fan of Sci-Fi movies while Bob might like Drama more. In my opinion, the sales of these products would depend more on what the distribution of these “genre-preferences” is like in the market (i.e., among people who have the time and money to relish these products). It would be interesting to re-do this study with a greater variety of product categories. I feel that reviews would play a much bigger role when it comes to the sales of products like consumables (e.g., groceries, cosmetics, food) and necessities (e.g., bags, umbrellas, electronics), because almost everyone “needs” these products and trusted online reviews would act as a signal for their quality. It would be interesting to verify this intuition.

Second, given that the authors mention bounded rationality and opportunism, I feel that when buying a product on Amazon, it is very unlikely that I would look at the “reputation/quality” of the reviewer before making a purchase. Again, what would matter the most for me is whether the review is coming from an actual customer (a verified purchase). Additionally, the amount of time I would spend researching a product and digging into its reviews is directly proportional to the cost of the product. Also, the availability of discounts, free shipping, etc. can greatly bias my purchase decisions. I am not sure whether classifying reviewers as high/low quality and products as high/low-coverage based on the median of the sample is a good idea. What happens to those who lie on the median? Why choose the median? Why not the mean? Moreover, it would be interesting to see the distribution of sales-rank over the three product categories. Do these distributions follow a power law?
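
The median-versus-mean question can be made concrete with a small sketch. The scores below are made up (they are not from the paper), and sending ties to the "low" group is just one possible convention that I am assuming here. The example shows two things: several reviewers can sit exactly on the median, so the tie rule decides their label, and a single heavy-tailed value drags the mean far enough that a mean split labels almost everyone "low".

```python
from statistics import mean, median


def split_by_threshold(scores, threshold):
    """Label a score 'high' if strictly above the threshold, else 'low'."""
    return ["high" if s > threshold else "low" for s in scores]


# Hypothetical reviewer-quality scores with one heavy-tailed outlier.
scores = [1, 2, 2, 3, 3, 3, 4, 5, 40]

med = median(scores)
avg = mean(scores)

by_median = split_by_threshold(scores, med)
by_mean = split_by_threshold(scores, avg)
on_median = sum(s == med for s in scores)

print("median:", med, "mean:", avg)
print("high by median split:", by_median.count("high"))
print("high by mean split:", by_mean.count("high"))
print("reviewers exactly on the median:", on_median)
```

If sales ranks or reviewer scores really do follow a power law, the median split is at least robust to the tail, but the handling of ties still has to be stated explicitly.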

In summary, I enjoyed reading the paper and feel that it was very novel for 2008, and as the authors mention, the work can be extended in various ways now that we have more data, computing power, and analysis techniques.


Reflection #13 – [04-10] – [Jiameng Pu]

[1] Pryzant, Reid, Young-joo Chung and Dan Jurafsky. “Predicting Sales from the Language of Product Descriptions.” (2017).
[2] Hu, N., Liu, L. & Zhang, J.J. “Do online reviews affect product sales? The role of reviewer characteristics and temporal effects”. Inf Technol Manage (2008) 9: 201.


The first paper posits that textual product descriptions are also important determinants of consumer choice. The authors mine more than 90,000 product descriptions on the Japanese e-commerce marketplace Rakuten and propose a novel neural network architecture that leverages an adversarial objective to control for confounding factors, and attentional scores over its input to automatically elicit textual features as a domain-specific lexicon. They show how textual features and word narratives can predict the sales of each product. The second paper, in contrast, focuses on online product reviews provided by consumers, examining factors such as reviewer quality, reviewer exposure, product coverage, and temporal effects.


I really enjoyed the first paper, since it’s based on a neural network architecture and I’m a neural-nets person: I’d like to try many research topics with neural nets, and I feel that neural nets are like a black box but also like Legos, in that researchers are free both to invent creative components and to build parts together according to the needs of their tasks. Plus, product descriptions + neural nets is an interesting direction. One thing I did not expect is that they combined the feature selection task with the prediction task inside the proposed neural net, which I think is great because people usually use neural nets to do a lot of similar things. In the experiment, some classmates mentioned that health products do not have any brand loyalty, which I don’t think is an issue. If you are an experienced patient, you would know brand loyalty exists in every category… I would suggest two more things for this paper: 1. give more intuition for the design of the neural network architecture, given its black-box nature; 2. I’m curious about whether and how the technique can be applied to languages other than Japanese.

The two papers are closely related, because most people can empirically feel that the two most important factors influencing their purchases are product descriptions and online reviews. Thus the second paper dives into how online reviews are associated with sales. I found the second paper more difficult to read, with its five hypotheses and tons of tables, which are pretty old school, but I’m still impressed by its simplicity and practicality. It seems the paper mainly uses data from Amazon’s Web Service (AWS), so I’m a little curious about whether the dataset can significantly influence the analysis, because I feel different e-commerce websites have distinct styles of online review. For example, online reviews on the Chinese website Taobao are more vivid and customer-engaged, e.g., with tons of pictures and customer conversations in the review section. In that case, I guess researchers probably need to reconsider the features of online reviews involved in the analysis. Personally, I’m not sure which recommendation system is better, i.e., yes/no or a 1-star to 5-star scale, because sometimes I find it difficult to decide whether to recommend an item with both merits and demerits; that’s where a 5-star scale helps people who struggle to make choices.


Reflection #13 – [04-10] – [Patrick Sullivan]

“Predicting Sales from the Language of Product Descriptions”
by Reid Pryzant, Young-joo Chung, and Dan Jurafsky

“Do Online Reviews Affect Product Sales? The Role of Reviewer Characteristics and Temporal Effects”
by Nan Hu, Ling Liu, and Jie Jennifer Zhang

Both papers here are focused on how corporations are increasingly using social science to better connect with customers and conduct business. This trend has been gaining a large amount of traction recently, but has come under controversy since the privacy questions aimed at Facebook and Cambridge Analytica. These topics should be scrutinized closely in order to ensure positive societal growth while avoiding manipulation and malfeasance against the public.

Pryzant et al. are trying to estimate buying behavior based on the textual features present in product descriptions. But was there any preliminary analysis to determine if this complex and novel RNN+GF model was necessary? Could a simpler model be just as effective and have less computational cost? It would still have novelty just by approaching the textual analysis of product descriptions rather than the basic summary statistics that were in the previous studies.

Pryzant’s research focuses on just the chocolate and health categories. In particular, ‘health’ must have covered an incredibly broad range of products, from fitness tools, weight loss foods, and vitamins to books and medicine. I feel that sticking to just these two categories would bias the results of both categories towards phrases such as ‘healthy’ or ‘low-fat’.

I don’t see much information on what Pryzant et al. did to verify their training data. Some items listed may have been severely misrepresented in the description. I also believe that the presence of pictures on product sites has a large impact on buying behavior. They immediately convey much information that may not be covered in the descriptions. There is surely a reason so many company marketing agendas focus on graphics and visuals.

Does Pryzant’s research translate well to other cultures? Politeness is fairly well known as a very prominent characteristic of the majority of people in Japan. The research here identifies politeness as an influential word group that increases buying behavior in this Japanese market. Could the same work be applied to the USA or other cultures that may not place the same social importance on politeness?

I am glad Hu et al. preferred the simple yes/no recommendation review system. The popular 5-star review system is more subjective and can be unhelpful. In a 5-star review system, a 4-star review is sometimes considered harmful to a product or service. This is partially due to the subjectivity of those who are doing the rating, who can become biased or manipulated quite easily. So a review system that is built to give a finer degree of resolution in its reviews can actually lead to more bias and noise in the dataset. Perhaps it would be best that most review systems adopt a simple recommend / do not recommend system of review, as there is almost no question as to where the author stands in this straightforward setup.

I am not sure about Hu’s conclusion that the impact of online reviews on sales diminishes over time. Another explanation would be that interest in a product naturally decays over time, leading to lower sales. It would be difficult to effectively measure this effect and also compare it with how the impact of online reviews declines over time.


Reflection #13 – [04/10] – [Jamal A. Khan]

  1. “Predicting Sales from the Language of Product Descriptions”

Since I’ll be presenting one of the papers for tomorrow’s (4/10) class, I’ll be writing a reflection for the other paper only.

I thoroughly enjoyed reading this paper for two reasons:

  • The idea of using a neural network for feature selection.
  • Coverage of different aspects of the results.

To start off, I’ll go against my first reason for liking the paper because I think the comment needs to be made. Neural networks (NN) by design are more or less feature generators, not feature selectors. Hence, using one for feature selection seems pretty counterintuitive. For example, Yann LeCun’s Convolutional Neural Networks have been so wildly successful because they’re able to automatically learn filters that detect features like vertical edges, shapes, or contours without being told what to extract. Thinking along these lines, the paper seems pretty ill-motivated, because one should be able to use the same gradient reversal technique and attention mechanism in a sequence or convolutional model to abstract out the implicit effects of confounds like brand loyalty and pricing strategies. So why didn’t they do it?

The answer, or at least one of the reasons, is very straightforward: interpretability. While there’s a good chance that the NN will generate features that are better than any of the handmade ones, they won’t make much sense to us. This is why I like the authors’ idea of leveraging the NN’s power to select features instead of having it engineer them.

Coming to the model, I believe that the authors could’ve done a better job of explaining the architecture. It’s a convention to state the input and output shapes that the layers take in and spit out; a very good example of this is one of the most popular architectures, Inception v3. It took me a bit to figure out the dimensionality of the attention layer.

Also, the authors do little to comment on the extensibility of the study to different types of products and languages. So how applicable is the same or a similar model to, let’s say, English, which has a very different grammatical structure? Also, since the topic is feature selection, can a similar technique be used to rank other features, i.e., something not textual? As a side thought, I think the application is limited to text.

While it’s all well and good that the authors want to remove the effects of confounds, the only thing that the paper has illustrated is the model’s ability to select good tokens. I think the authors themselves have underestimated the model. Models having LSTM layers followed by attention layers to generate summary encodings are able to perform language translation (which is a very difficult learning task for machines); hence, by intuition, I would say that the model would’ve been able to detect what sort of writing style attracts the most customers. So my question is: when the whole idea is to mine textual features that help sell a product better, why was language style completely ignored?

Just as food for thought for people who might be into deep learning (read with a pinch of salt, though): I think the model is overkill, and the k-nearby-words method of training skip-gram embeddings (the method used for Word2Vec generation) would’ve been able to do the same, and perhaps more efficiently. The only thing that would need to be modified would be the loss function, where instead of trying to find vector representations that capture only the similarity of words, we would introduce the notion of log(sales). This way the model would capture both words that are similar in meaning and their selling power. Random ideas like the one I’ve proposed need to be tested though, so you can probably ignore it.
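
One possible reading of that modified loss, written out as a minimal sketch. Everything here is an assumption for illustration: the negative-sampling form of the skip-gram loss, the linear read-out `sales_weights` that predicts log(sales) from a word vector, the weighting factor `lam`, and the toy vectors. It is untested as a modeling idea and is not the architecture from the paper.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def skipgram_loss(center, context, negatives):
    """Negative-sampling skip-gram loss for one (center, context) pair."""
    loss = -math.log(sigmoid(dot(center, context)))
    for neg in negatives:
        loss -= math.log(sigmoid(-dot(center, neg)))
    return loss


def sales_penalty(center, sales_weights, log_sales):
    """Squared error of a linear read-out predicting log(sales)
    from the center word's embedding (hypothetical extra term)."""
    return (dot(center, sales_weights) - log_sales) ** 2


def combined_loss(center, context, negatives, sales_weights, log_sales, lam=0.1):
    """Skip-gram loss plus a weighted log(sales) term."""
    return (skipgram_loss(center, context, negatives)
            + lam * sales_penalty(center, sales_weights, log_sales))


# Toy 2-d vectors, purely illustrative.
center = [0.5, -0.2]
context = [0.4, 0.1]
negatives = [[-0.3, 0.2]]
sales_weights = [1.0, 1.0]
log_sales = math.log(100)  # the product containing this word sold 100 units

base = skipgram_loss(center, context, negatives)
total = combined_loss(center, context, negatives, sales_weights, log_sales)
print(f"skip-gram only: {base:.3f}, with sales term: {total:.3f}")
```

Setting `lam` to zero recovers the plain skip-gram objective, so the weight controls how much the embedding is pulled towards predicting sales rather than only towards co-occurrence.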

Finally, sections like Neural Network layer reviews add nothing and break the flow. Perhaps these have been included to increase the length because the actual work done could be concisely delivered in 6 pages. I agree with John’s comment that this seems more like a workshop paper (a good one though).



One last thing that I forgot to put into the reflection (and am too lazy to restructure now) is that this line of work isn’t actually novel either. Interested readers should check out the following paper from Google. Be warned though, it’s a tough read, but if you’re good with feed-forward NN math, you should be fine.

