Tuesday, September 30, 2014

Six Provocations for Big Data by Danah Boyd and Kate Crawford due Oct. 4

Introduction: Big Data is notable to these authors not because of its size, but because of its relationality to other data. It is fundamentally networked, and its value comes from the patterns that can be derived by making connections between pieces of data: about an individual, about groups of people, or simply about the structure of information itself. Big Data is important because it refers to an analytic phenomenon playing out in academia and industry. It is also the kind of data that encourages the practice of apophenia, which is seeing patterns where none actually exist, simply because massive quantities of data can offer connections that radiate in all directions. Because of this, the authors raise questions about the assumptions, methodological frameworks, and biases in the Big Data phenomenon, as well as about what the data means, who has access to it, how it is deployed, and to what ends.
Different Sections:
1. Automating Research Changes the Definition of Knowledge: This section talks about how Big Data refers to a computational turn in thought and research. From this turn emerged a system of knowledge that is changing the objects of knowledge while also informing how we understand human networks and community. Big Data reframes key questions about ethics, the process of research, the constitution of knowledge, and reality. Boyd and Crawford respond to Anderson's piece, which I wrote about last week. They read his piece as revealing an arrogant undercurrent in the debate, where other forms of analysis are sidelined and Big Data is privileged as having a direct line to knowledge. But Big Data has limitations and restrictions. One is time: the data is about right now, without historical context. This limits research because of the sheer difficulty or impossibility of accessing older data. Rather than asking, as Anderson did, "What can science learn from Google?", these authors believe that we should ask how Google and similar companies might change the meaning of learning, and what limitations and possibilities come with that change.
2. Claims to Objectivity and Accuracy are Misleading: This section is about Big Data's issues with objectivity and accuracy. Big Data is supposed to offer the humanistic disciplines a new way to claim the status of quantitative science and objective method. However, Big Data is still subjective, and what it quantifies doesn't necessarily have a closer claim on objective truth. Furthermore, large sets of data from Internet sources are often unreliable. You have to understand the properties and limits of a data set regardless of its size, since it may not be random or representative. We need to know where data is coming from, and to know and account for the weaknesses in that data, including bias.
3. Bigger Data are not Always Better Data: This section is about how quantity doesn't equal quality, which is something that supporters of Big Data aren't taking into account. It is also increasingly important to recognize the value of small data. Research insights can be found at any level, including modest scales. The size of the data being sampled should fit the research question being asked. Sometimes smaller is better.
4. Not all Data are Equivalent: Some researchers assume that analysis done with small data can be done better with Big Data; however, this presumes that data is interchangeable. Taken out of context, data loses meaning and value. The example the authors give is with types of networks. When sociologists and anthropologists were the primary scholars interested in social networks, data about people’s relationships was collected through surveys, interviews, observations, and experiments. Using this data, social scientists focused on describing one’s ‘personal networks’, the set of relationships that individuals develop and maintain. These connections were evaluated based on a series of measures developed over time to identify personal connections. Big Data introduces two new popular types of social networks derived from data traces: ‘articulated networks’ and ‘behavioral networks.’ Articulated networks are those that result from people specifying their contacts through a mediating technology. These articulated networks take the form of email or cell phone contacts and buddy/friend/followers lists on social media. Behavioral networks are derived from communication patterns, cell coordinates, and social media interactions. Both behavioral and articulated networks have great value to researchers, but they are not equivalent to personal networks. For example, although often contested, the concept of ‘tie strength’ is understood to indicate the importance of individual relationships. When a person chooses to list someone as their ‘Top Friend’ on MySpace, this may or may not be their closest friend; there are all sorts of social reasons to not list one’s most intimate connections first.
5. Just Because It's Accessible Doesn't Mean It's Ethical: This section starts off with an ethical research problem in which the privacy of unwitting Facebook users was compromised when their data was collected. So what is the status of "public data" on social media sites? Can it simply be used without permission? Research on human subjects always brings up privacy issues, yet little is understood about the ethical implications of the research being done. In order to act in an ethical manner, scholars need to reflect on the importance of accountability and protect the rights and well-being of human participants. Big Data researchers often don't acknowledge the difference between being in public and being public.
6. Limited Access to Big Data Creates New Digital Divides: This section questions the assumption that Big Data is easy to access. Really, only social media companies have access to large social data, especially transactional data. Some companies restrict access, others sell the privilege of access for high fees, and so on. Basically, whoever has money can get access, whether that is a huge university or a wealthy company. Skill is another aspect to consider. You need to have a computational background and be able to read numbers. More of the researchers who have these skills are male, and they end up determining which questions are asked.
Reaction: I found reading this after the Anderson piece to be very helpful in understanding how this new system works. It was a very convincing counterargument. What I got from this piece is that when doing research you shouldn't always go to Big Data: use the data that suits your question, be ethical in your use of Big Data, and recognize that sometimes other methods would be better. I am very concerned about the points made in number six. This digital divide is a huge problem. Not everyone has the money to get access to Big Data. Computational skills can be learned, but the money and gender issues are a problem to me. It seems to me that society is flying forward with the concept of Big Data. We should embrace it and try to use it effectively and ethically. Other data collecting methods shouldn't be thrown out the window, though. Those methods still have value. We also need to address this divide, because this information needs to be more accessible for it to benefit research in a big way.
