By Michael Salvo and Adam Strantz
Whenever we read or hear the phrase “Big Data,” we have two simultaneous competing thoughts. The first is a bit cynical, and we cannot help but riffle through the half dozen buzzwords and trends, more or less related, which have fallen short of predictions for disrupting patterns of work. The second is more egocentric. Technical communicators want to know what Big Data can do for us. After all, we each produce the raw materials, the bits of data that become terabytes and exabytes: the bits and bytes that, in aggregate, really make up Big Data.
We won’t claim that people are Big Data, but we are certainly its cause and its source. IBM states that 90% of all data that exists has been produced in the past two years (IBM Big Data Success Stories, ftp://ftp.software.ibm.com/software/data/sw-library/big-data/ibm-big-data-success.pdf). It seems a ludicrous claim until one narrows the definition of data to match: IBM is talking about digital data. In addition, all the data that we think of as disposable—search engine queries, GPS directions, Web reading histories—is not only recorded but also duplicated and stored. It becomes warehoused data. Once-disposable digital data does not seem professionally compelling to a technical communicator. But if there were ways to access user-centered analysis data, designed to show us things we care about in the data—things that might improve our lives—well then, that is an entirely different thing. What individuals might find compelling are Big Data services that respect our roles as contributors to the oceans of data by delivering significant results.
As more people generate content through such platforms as social media, mobile devices, and personalized applications, our digital leavings coalesce into substantial amounts of personalized data. While companies collect much of this data, mostly behind the scenes, much is simply left floating amid the flotsam generated online—a signal lost amid a sea of digital noise. Beyond the number of individuals we are connected to online, the amount of information generated by and available to our devices is virtually endless—it is this network of individuals and information that forms the backdrop to Big Data research practices. Being able to study our own digital histories, then, allows us to contextualize our data within the networked structures that create opportunities for digital work.
The pervasiveness of mobile technologies and social media sites highlights the always-on, always-networked nature of the individual writer and researcher. As technologies embed themselves into the ways we interact with, see, and approach our work, new technologies encourage behavioral changes that play out through users. Futurists used to dream of an embedded chip jacking our brains directly into the network, but it turns out it’s easier to just have our connection in our pockets. Current technology can be represented as equally invasive as those nightmarish neural sensors, but the challenge remains in making the significance of data streams clear, making data traces visible to users, and enabling users to interact meaningfully with data of their own making. A forward-thinking view of the technical communicator’s place in this discussion would align with previous efforts to make technology available to users, whether in supporting mundane activities, reproducing the microcomputer revolution with data analysis, or identifying roles for users in utilizing technology, generating data, and accessing that data with a purpose.
The key to making sense of these messy, complex amounts of data (see Graham et al.) is the ability to meaningfully interact with the accrued data. Amid dystopian concerns for nefarious uses of hidden data collected by social media sites, the potential benefits of the growing amount of publicly available data require scrutiny as well. For instance, groups on social media like Facebook are creating digital traces that are irretrievable and therefore unusable. That history, and the tangles that form institutions, professional groups, and social circles, are visible as endlessly scrolling lists, but are unsearchable; they have no memories. While everything seems to be available, nothing is truly accessible. And so, once again, disruptive technology presents opportunities to create user-centered interfaces. Big Data requires tools to allow users to manipulate and investigate that data. This is what the field is currently calling “wayfinding”: we need to help users confidently find their way through the landscapes generated by this data. Whether by designing interfaces that aid users in exploring their own data or by signposting and organizing data in ways others can follow, these navigational metaphors support users tracing their own digital paths. In this way, we can better create user-focused tools that invite, rather than repel, interaction and present navigation as an integral aspect of not just studying but living with Big Data.
Whether provided by the sites themselves or developed through third-party use of application programming interfaces (APIs), these tools offer ways to manage the data we generate in our digital lives. Some tools can be used to investigate our own patterns as we work. For example, users with Android phones are by default opted in to Google Maps tracking, which records their movements with Google. While the data can be represented as a breach of privacy, the ability of users to access maps of their own movements opens possibilities for reflection by putting information and tools in users’ hands. The trade-off between tracking motion through space and reflecting on one’s own data is key to contextualizing the impact of these technologies: if algorithms adapt and learn from our behavior, so can we. But we need to translate the data into interfaces that make sense to users and can be put to work in real time, available at the moment of decision-making.
Open source tools and APIs for sites such as Facebook and Twitter offer further opportunities for users to interact with data collected from personal and public online sources. While this data is usually freely accessible on the sites themselves, social media (and the vast amounts of data generated) open opportunities for developers to offer analytic tools that support users’ investigations of themselves. Those investigations, in turn, let users see themselves in relationship to the communities in which they participate, reflecting both globally popular and niche or local trends.
Aside from the tools available for tracing individuals, analytic tools are increasingly focused on collecting and visualizing the interactions happening between users. Reddit Insight, an analytic tool for the popular link-sharing website Reddit, can visualize trends and aggregate views and posts by a single user or a community. Everything from a single user’s most popular post to the ranking of a community (or subreddit) against the most popular communities on Reddit can be tracked. Similarly, Twitter analytics allow a user or group account to visualize their posts over time as well as member engagement: conversations, retweets, favorites, even the number of people who clicked on shared links are compiled into a quick snapshot. Analytics take the vague questions of what our work is doing, or whether it is having any effect, and attempt to quantify an answer. Can we prove effectiveness or impact? Maybe not, but taken together these analytic tools offer a way to view user data as something more: Big Data, or networked data that exists not in a vacuum but in the interactions between communities and their competing digital signatures.
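To make the idea of a “quick snapshot” concrete, here is a minimal sketch in Python of how engagement counts might be rolled up by user and by community. It is an illustration only, not how Reddit Insight or Twitter analytics actually work: the Post record, its field names, the engagement formula (a simple sum), and the sample posts are all assumptions invented for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Post:
    """A hypothetical post record; field names are assumptions for this sketch."""
    author: str
    community: str
    retweets: int
    favorites: int
    link_clicks: int


def engagement(post: Post) -> int:
    """Assumed engagement measure: a plain sum of the three counts."""
    return post.retweets + post.favorites + post.link_clicks


def snapshot(posts: list[Post]) -> dict:
    """Aggregate engagement per author and per community, and find the top post."""
    by_author: Counter = Counter()
    by_community: Counter = Counter()
    for post in posts:
        score = engagement(post)
        by_author[post.author] += score
        by_community[post.community] += score
    top_post = max(posts, key=engagement) if posts else None
    return {
        "by_author": dict(by_author),
        "by_community": dict(by_community),
        "top_post": top_post,
    }


# Invented sample data, used only to exercise the sketch.
SAMPLE_POSTS = [
    Post("ana", "techcomm", retweets=3, favorites=5, link_clicks=10),
    Post("ben", "techcomm", retweets=1, favorites=0, link_clicks=2),
    Post("ana", "usability", retweets=0, favorites=2, link_clicks=1),
]

if __name__ == "__main__":
    print(snapshot(SAMPLE_POSTS))
```

Even a toy roll-up like this shows the shift the analytic tools make: the unit of interest stops being the single post and becomes the relationship between a user's activity and the communities in which that activity lands.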
While it may be gratifying to view our own social media, the ability to monitor the reach of participation (see Verzosa Hurley & Kimme Hea) and connections helps to contextualize the internetworked nature of participation in these digital spaces. Not only do online communities of users generate huge amounts of data (oftentimes thousands of comments and pageviews), but these spaces also exist in conversation with other entities, producing a stream of individual data points as well as larger data trends tracing community interaction. Concepts such as the reach of a post, popularity (through likes, upvotes, favorites), and visibility in the network have become important information for tracking online communities. Number of views doesn’t cut it anymore, and the complexity of these social media sites and the content their users generate becomes apparent once this interconnectedness is made visible. Instead of guessing at the impact of an individual’s work in these spaces, emerging analytic tools can provide a way to articulate data and visualize the conversations happening in the network. While Liza Potts’s Social Media in Disaster Response (2014) gestures toward a plastic future of useful, flexible tools that allow users to shape their digital media experience in real time, Twitter searches are but the first step in creating an interface for analysis.
There is a rather hopeless, perennial post demanding that Facebook users get paid for their participation, their contributions, to the Zuckerberg ocean of data. The way Facebook makes money off that body of data floats flotillas of businesses. We would have monetized our expertise long ago if we had insight into how all that works. Any kind of literal payment just seems improbable if not simply naive. Instead, what we envision is increased access both to the findings and to the analytical tools developed to sort through the data. We want user-centered interfaces that present results of Big Data we can use to improve our lives, a signal amid all the noise. Bill Hart-Davidson and Jeffrey Grabill articulate the discussion as one of ambient data rather than Big Data, and Hart-Davidson’s recent work looks at individual health data (Fitbits and the quantifiable health data people are collecting about their own exercise and diets) and makes larger data streams individually significant. Similarly, the Faciloscope app announced by Writing in Digital Environments (WIDE) makes huge data sets individually accessible and coherent. It makes discrete parts of the larger data stream intelligible (www.cal.msu.edu/faciloscope).
While businesses wring efficiencies and articulate new markets through analysis of terabytes of untapped data, technical communicators can participate. Big Data presents opportunities consistent with our core competencies and traditions of work rather than renegotiating what it is we do. Ultimately, the inquiry becomes something other than suggesting that technical communicators become data scientists. We didn’t become engineers, scientists, or programmers in response to earlier opportunities. While becoming technologically literate and relevant, we resisted being driven by the technology. Instead, our strength and value continue to lie in the humanization of technology. The passing age of personal computing opened many opportunities for technical communicators and the field of technical communication to participate in the development and documentation of the hardware and software of desktop computing, then in the design and deployment of the Internet, and then in its redevelopment into a participatory medium in Web 2.0 (one of the buzzwords now thankfully fading in frequency of use). Usability, user advocacy, interface design, and visualization: these competencies are repeatedly underdeveloped and undervalued, but they are vitally important to bringing value to Big Data, as they have proven to be in the wake of each so-called disruptive technology.
As a field, our aim should be to develop usable applications and meaningful artifacts out of the data with a human-accessible face. That is, technical communication’s role in Big Data can and must continue to be the role we have played in data mining, knowledge and information management, and all the previous constructions of similar realms of application: representing people and their needs, as well as participating in the emergence of useful and usable artifacts.
In October 2014, Brent Faber offered a powerful workshop at the Association for Computing Machinery’s Special Interest Group on the Design of Communication (ACM SIGDOC) focused on Big Data. The lab Faber runs at Worcester Polytechnic Institute has produced a stunning list of early successes. While the room was full of interested, smart people reflecting on the workshop and discussion (there was over 200 years of experience in technical, professional, and digital communication and usability in the room), it became clear that Big Data and Ambient Data are not yet yielding visible results for people in their everyday lives.
At the moment, Big Data is most useful to data scientists, most valuable to large organizations, utilized for huge business gains in the global economy, and best approached by specialists. See especially IBM’s business cases and consultancy websites (see References below), which describe opportunities for individuals but benefits for businesses. There are many sites offering Big Data analysis tools, and sites tout the advantages for businesses as well as the opportunities available to specialists who can articulate such findings. As with computing before the personal computing revolution, there seems little opportunity for individual-level access to the fruits of Big Data analysis. What’s in it for us, where by “us” we mean the individual, the person, rather than the multinational organization making money out of once-disposable terabytes?
We are in the UNIVAC age of Big Data. In computing from the 1940s through the 1960s, there was potential, and specialists were making business cases and early gains by articulating their programming, hardware, and computational skills. But there was dubious value offered to the public, and little, if any, to individual users. It was a true “trickle-down” ideology: life, we were promised, would get (vaguely) better through digital computing. Or perhaps we are moving into the hobbyist stage of the 1970s and 1980s, as Fitbits and personal health data are the first realizations of a coming wave of products and services utilizing Ambient Data.
Big Data remains specialized, obfuscated, and elite. There is not yet a clear way forward. There is lots of potential evident in technology and analytic methods, but no clear, direct benefit for end-users. Even the term end-user is a purposely chosen, historically situated term: while dated, it brings back an earlier age of concern for and design for an audience. A user-centered revolution in Big Data is necessary. The user experience of data is in need of redesign. Architectures and interfaces of data analysis need to be reconceived for the user.
Popular representations tend toward the dystopian: elimination of privacy, hacking, and dehumanization all seem hallmarks of popular representations of Big Data application. There is opportunity to harness Big Data for the use and application of individuals. With articulated benefits to end-users, Big Data application may be seen as less of a burden and a threat, and become more of a service and a personal benefit for which trading some privacy for access is a mutually agreed-upon exchange.
There is as yet no “killer app” for Big Data, no spreadsheet or word processor. There is no desktop publishing, which was so important to establishing technical communication in the 1980s. And there is certainly no Internet of Big Data. Is that what’s necessary? An Internet of Big Data? Certainly, there are opportunities for rearticulating Big Data analysis as a benefit for those producing the data itself, if we can just for a moment speculate more broadly.
Graham, S. Scott, Sang-Yeon Kim, Danielle M. DeVasto, and William Keith. “Statistical Genre Analysis: Toward Big Data Methodologies in Technical Communication.” Technical Communication Quarterly 24.1 (2015): 70–104.
Hart-Davidson, William, and Jeffrey T. Grabill. “The Value of Computing, Ambient Data, Ubiquitous Connectivity for Changing the Work of Communication Designers.” Communication Design Quarterly Review. Retrieved 25 February 2015 from ACM Digital Library, http://dl.acm.org/citation.cfm?id=2448921.
IBM. “Big Data and Analytics Hub,” www.ibmbigdatahub.com.
IBM. IBM Big Data Success Stories, ftp://ftp.software.ibm.com/software/data/sw-library/big-data/ibm-big-data-success.pdf.
IBM. “What is Big Data?” www-01.ibm.com/software/data/bigdata/what-is-big-data.html.
Potts, Liza. Social Media in Disaster Response: How Experience Architects Can Build for Participation. Routledge, 2013.
Reddit. Reddit Metrics, http://redditmetrics.com/.
Verzosa Hurley, Elise, and Amy C. Kimme Hea. “The Rhetoric of Reach: Preparing Students for Technical Communication in the Age of Social Media.” Technical Communication Quarterly 23.1 (2014): 55–68.
Writing in Digital Environments (WIDE). “The Faciloscope’s Goal: Everything in Moderation.” www.cal.msu.edu/faciloscope/.
Michael J. Salvo is associate professor and director of professional writing at Purdue University. He is currently researching innovation in Midwest manufacturing as a site of future technical communication practice, as well as rhetoric and experience architecture with Liza Potts.
Adam Strantz is a PhD candidate at Purdue University. He is currently working on his dissertation entitled Wayfinding Localized Research Practices through Mobile Technology.