
Implications of Desnoyers’ Taxonomy for Standardization of Data Visualization: A Study of Students’ Choice and Knowledge

By Rachel Rayl

Abstract

Purpose: Current research on data visuals focuses on their creation and use; however, there have been few attempts at standardizing data visuals to facilitate better interdisciplinary communication. Can Desnoyers’ taxonomy facilitate better interdisciplinary communication in the STEM (Science, Technology, Engineering, and Mathematics) fields by helping practitioners choose more efficient data visuals? In addition, would adopting Desnoyers’ taxonomy bypass the current discrepancies between academic and journal data visuals?

Methods: To test Desnoyers’ taxonomy’s impact on efficient use of data visuals, I conducted an exploratory, pretest/posttest survey of 101 STEM students and their choices of data visuals before and after exposure to Desnoyers’ taxonomy.

Results: Students chose more complex and more efficient data visuals on the posttest, after exposure to Desnoyers’ taxonomy. However, level in school did not change the effect of exposure.

Conclusion: Students’ reported use of data visuals supports prior research about discrepancies between academic and journal data visuals. Additionally, students might benefit from more exposure to, and training in, efficient data visuals. Further control-group studies are needed to show whether Desnoyers’ taxonomy itself can increase students’ comprehension and use of efficient data visuals as compared to pure explanation of data visuals. If further studies demonstrate this, then researchers and creators in the field of data visualization could confidently adopt Desnoyers’ taxonomy as a way to teach and reference data visuals consistently.

Keywords: data visuals, graphs and charts, interdisciplinary communication, taxonomy, visualization

Practitioner’s Takeaway:

  • Interdisciplinary visual communication, especially in the STEM fields, is hindered by a lack of standardization.
  • Desnoyers’ taxonomy of data visualization may help teach students how to use data visuals in an efficient manner.
  • This study shows that Desnoyers’ taxonomy of data visualization offers a potential way to standardize the usage of data visuals, independent of language-specific sorting schemes.

Introduction

Data visuals influence how we view explanations of data, but so far, we have yet to consistently sort the many different kinds of visuals into a cohesive whole. We live in an increasingly visual and digital culture, and researchers have accordingly been studying the field of data visualization for several decades. Edward Tufte’s seminal book, The Visual Display of Quantitative Information (1983), defined how to create data visuals. However, even with data visuals extensively defined and their usage researched, researchers cannot agree upon one way to sort them. Current proposed sorting schemes (Keim, 2002; Ménard & Dorey, 2014; Shedroff, 2000) depend on words that require translation, making it difficult to create a globally consistent sorting scheme. Luc Desnoyers argued, “A consensual glossary of visuals would facilitate the development of harmonized guidelines and therefore help students, scientists, and writers in the selection of appropriate graphs” (Desnoyers, 2011, p. 121). Such a glossary would address the problem that Richard Emanuel and Siu Challons-Lipton noted: most undergraduate curricula do not teach students to be visually literate (Emanuel & Challons-Lipton, 2013, p. 12). This gap causes problems in data-heavy fields, such as STEM (Science, Technology, Engineering, and Mathematics), where students and professionals are expected to communicate highly technical data to an increasingly visual culture. In 2011, Desnoyers proposed building a taxonomy of data visuals sorted in a manner similar to scientific biological taxonomies, that is, “categories going from the more general to the more specific level, according to precise rules” (Desnoyers, 2011, p. 123), with the denominations of the taxonomy based on Greek and Latin terms.

Because of the lack of a unified data visual lexicon, academics and practitioners can and do miscommunicate when using visuals (Rybarczyk, 2011). As such, I explore how using Desnoyers’ taxonomy of data visuals might help unify visuals by sorting them into a cohesive whole based on efficiency. The question “would exposing students to this taxonomy and its denominations change which data visuals they prefer to present information?” laid the foundation for this exploratory research project. Because of the lack of published follow-up research to Desnoyers’ original article, I chose to do an exploratory study to test whether exposing STEM students to Desnoyers’ taxonomy and proposed denominations, along with definitions of data visuals, changed which data visuals students prefer for presenting information. My research question weaves in both the idea of training future STEM professionals, that is, STEM students, and the idea that certain visuals correlate with specific data sets. Because Desnoyers designed his taxonomy with STEM data visuals in mind, I chose to focus on the STEM application and used students from relevant STEM fields as my participant group. However, because the subject of data visualization is a universal one, technical communicators of all specialties (inside and outside of STEM) can benefit from this research.

Literature Review

Throughout the past decades, there has been much discussion about what constitutes an “efficient” data visual, but researchers generally agree that efficient visuals allow for accurate identification of data as well as drawing attention to the data (Desnoyers, 2011; Dragga & Voss, 2001; Kostelnick, 2008). Additionally, they agree that the principles of efficient visuals affect all professionals who create data visuals, including STEM professionals. To make efficient visuals, professionals should follow all principles of aesthetic design while emphasizing ethics, proportionality, clear labeling, and context, all with the purpose of improving comprehension of the data (Dragga & Voss, 2001, p. 266; Heer, Bostock, & Ogievetsky, 2010, p. 59; Tufte, 1983, pp. 56, 74). Technical communicators typically have many resources at their disposal to figure out how to apply these principles when making data visuals. STEM professionals, however, typically have inadequate resources on efficient ways to present data: their fields mostly emphasize presenting data in raw form, mainly ungainly chunks of data in spreadsheets, or processing data (Finson & Pederson, 2011; Gorodov & Gubarev, 2013; Rybarczyk, 2011), and place little emphasis on differentiating efficient from non-efficient ways of presenting data in a polished manner. Helping STEM professionals adopt principles of efficient visuals would allow for better presentation of STEM ideas in a highly technical world.

Efficiency Defined for STEM Data Visuals

STEM professionals can include data visuals to support the logical appeals of any given science, and efficient visuals present information in ways that improve logical comprehension of the data. However, Frankel and DePace (2012) pointed out that “a visual representation of a scientific concept is a re-presentation, and not the thing itself” (p. 3). Technically, one could take any data set and represent it using most data visuals; however, only a select few would represent the data in a logical manner consistent with its intended use. For example, people could decide that they want to compare the number of endangered iguanas to the total number of endangered animals in the world. To represent that comparison, they could use any number of data visuals, and Figures 1a-d show a small sample of the possibilities.

The sheer magnitude of the difference makes it almost impossible to see the data points for endangered iguanas in Figures 1a, 1b, and 1c. Those figures are merely proportional and clearly labeled; because they do not show the actual numbers, they limit comprehension of the data, thereby creating unethical visuals (Dragga & Voss, 2001). On the other hand, Figure 1d is ethical and clear, improves comprehension, and allows people to accurately identify the information; therefore, it more efficiently represents the given data. As shown, STEM data visuals need to follow general efficiency rules as well as present data comparisons that logically support scientific arguments. However, which data visual people choose changes how much it supports their arguments, showing that not all STEM data visuals are equally efficient.

What STEM Professionals and Students Are Taught

To properly choose efficient visuals, STEM professionals must learn how to visualize data. Dan Lipsa et al., a group of visualization researchers, recommended that data visual creators work closely with physical scientists by “reviewing recent visualization papers [in those fields]” (2012, p. 2338). By doing so, data visual creators can keep up with the needs of the STEM community. Given that STEM students are professionals in training, something connecting academia and “real-world” visuals would help students develop, now, the visualization skills they will use in the future.

To get students to focus on what visual communication can offer them, professors can pull in real-world examples of visual communication, but that approach has its own drawbacks. In 2011, Brian Rybarczyk compared scientific visuals in textbooks with scientific visuals in journals. His results suggested that “there is a mismatch between the types of scientific visualization in textbooks compared with how science is documented in [journals]” (Rybarczyk, 2011, p. 111). Specifically, textbook visuals used in academia are oversimplified and lack variety, whereas journal visuals present more of the complexities involved in interpreting real data. These mismatches between textbooks and journals result from a disconnection between academia and the professions. Practitioners working in the field have data that do not fit nicely into any one given data visual, as demonstrated by the wider variety of data visuals used to convey very singular results. However, as long as academics continue to use textbooks, with their oversimplified and limited visuals, as their standard for exposing students to and teaching them about data visuals, the development of visualization skills will be hindered. STEM students need to learn visualization skills that will serve them well in future work. Standardizing visuals so that people understand their efficiency in particular situations can overcome this disconnect. If both academics and practitioners referred to the same definitions of data visuals, then they would create and use consistent data visuals that capture both the complexities of real data and the simplicities of “textbook example” visuals.

[Figure 1. Four sample data visuals (a-d) representing the comparison of endangered iguanas to all endangered animals in the world.]

How to Sort (and Standardize) Data Visuals

Consistency within use of data visuals would require consistency in description and sorting of the data visuals. In 1977, Michael MacDonald-Ross reviewed empirical studies of ways to display quantitative data. He began by establishing a consistent lexicon of terms, vital when comparing papers written at different times by people in different fields. For example:

  • Bar chart: bars of constant width and variable length that may have more than one dependent variable (p. 364)
  • Cartesian grid: a coordinate grid bearing arithmetic scales (p. 364)
  • Cartogram: a map displaying quantitative data (p. 364)

However, his need to create this lexicon demonstrates a problem: while people have a general idea of what certain data visuals look like, they do not always speak of them in the same terms. As Desnoyers (2011) points out, “the terminology used by different authors varies” (p. 121). A simple example of this would be the difference in definition of “cartogram”: MacDonald-Ross (1977) defines it as “a map displaying quantitative data” (p. 364) whereas Heer et al. (2010) define it as a map that “distorts the shape of geographic regions” based on data (p. 63). Different terminology leads to a lack of consistency and thereby creates barriers to understanding which type of data visual efficiently presents which kinds of data.

Making a consistent lexicon for data visuals requires figuring out the correlation between efficient data visuals and different kinds of data sets. This goes back to the heart of defining data visuals. As Daniel Keim, an influential author in the field of data visualization, summarized: “The basic idea of visual data exploration is to present the data in some visual form, allowing the human to get insight into the data, draw conclusions, and directly interact with the data” (Keim, 2002, p. 1). Commonly, researchers standardize data visuals’ terminology by sorting them (Cairo, 2012; Meirelles, 2013). Standardized terminology allows for consistent reference both when creating data visuals and when dissecting them to understand their use and creation. However, the “what,” “how,” and “why” of sorting still depend on the person proposing the given sorting scheme.

General Sorting Schemes. Most sorting schemes’ dependence on language causes a key problem. As Edward Tufte (1983) says, “the design of statistical graphics is a universal matter—like mathematics—and is not tied to the unique feature of a particular language” (p. 12). Yet some proposed sorting schemes depend heavily on language-specific factors.

In Information Design, Nathan Shedroff (2000) differentiates seven ways to sort data: by alphabet, location, time, continuum, number, or category, or randomly. Additionally, Daniel Keim (2002) proposes three different ways to sort data: by type of data (numerical vs. textual), by actual output design, and by how humans will interact with the data visual to extract meaning from it. However, sorting data based on type, output, or interactivity can lead to vastly different sorting schemes depending on the criteria used, which leads back to the problem of inconsistent lexicons or inconsistent sorting overall. In 2014, Ménard and Dorey published a taxonomy for sorting visuals because “[the purpose of taxonomies] includes domain simplification, description and charting for reliable and speedy navigation” (Ménard & Dorey, 2014, p. 114). But their taxonomy relies on consistent translation between its two languages and does not incorporate other languages. Desnoyers’ taxonomy, proposed in 2011, bypasses these translation problems by using Greek and Latin (scientific standards that require no translation).

Desnoyers’ Taxonomy

Luc Desnoyers proposed his taxonomy in 2011 as a way to bypass the language barrier facing other proposed sorting schemes. As he states, “A consensual glossary of visuals would facilitate the development of harmonized guidelines…” (Desnoyers, 2011, p. 121). To develop a standardized (or consensual) glossary, Desnoyers abandoned the idea of using Standard English or another living language and instead turned to a taxonomy with denominations based on Latin and Greek vocabulary (Figure 2).

[Figure 2. Desnoyers’ taxonomy of data visuals, showing its three major classes (Cosmograms, Typograms, and Analograms) and their branching denominations.]

By turning to “dead” languages, he bypassed conventional problems with language dependency, and he added a level of familiarity for STEM professionals, as most STEM fields use Greek or Latin words to describe ideas or things. Using three major classes (Cosmograms, Typograms, and Analograms), Desnoyers’ taxonomy covers all static STEM data visuals, including photographs. He chose to leave out interactive visuals and compound visuals; however, he left open the idea that those types of data visuals could themselves form other branches of his proposed taxonomy. Desnoyers’ taxonomy sorts data visuals by component similarities into three branching denominations: classes, orders, and families. In doing so, he followed the same developmental process that Carolus Linnaeus followed to form the biological taxonomy used for plants and animals. Desnoyers developed this sorting scheme based on over 30 years of training graduate students in science communication. However, Desnoyers admits that he has yet to complete the taxonomy, and the proposed denominations remain untested for how well they help people utilize data visuals (Desnoyers, 2011, p. 131).

As Desnoyers said, “students frequently rely on their self-acquired mastery of software like Microsoft Excel, which offers indiscriminate use of different types of awkwardly named visuals for any type of data” (2011, p. 121).

As such, my first sub-question asked whether students originally gravitated toward reigrams, cellulograms, puncti-curvigrams, and absolute histograms, which they can generate from simple software programs, or whether they used a wider variety of data visuals (Figure 3). Second, I wanted to see if students gravitated toward more efficient visuals after exposure to Desnoyers’ taxonomy.

Methods

I used a standard exploratory, pretest-posttest design, anonymized according to the IRB approval that I received, with exposure to Desnoyers’ taxonomy as the treatment between the pretest and posttest. The overall structure follows the standard format for exploratory, pretest-posttest designs (Creswell, 2009, p. 160; Greeno, 2002, p. 73). By using a quantitative study, as opposed to a qualitative one, I could compare the statistical differences in efficient answers before and after exposure to Desnoyers’ taxonomy and its denominations to see if students increased their understanding of data visuals after exposure. I structured the test itself as follows:

  • Pretest: which I used to evaluate students’ initial choice of data visuals (see Appendix A for the full pretest). This included demographic questions along with asking students to list the data visuals they had used most often in the past twelve months and how many they had used.
  • Educational treatment (exposure): where I used a five-minute presentation to expose students to Desnoyers’ taxonomy along with definitions of different data visuals within the denominations.
  • Posttest: which I used (in conjunction with the pretest) to evaluate how students’ choices of data visuals changed after exposure.

Exposure to Desnoyers’ Taxonomy

Once the students had all finished their pretest surveys, I gave a five-minute presentation about Desnoyers’ taxonomy and its denominations. There were various reasons for the relatively short duration of the exposure: the exactness and consistency of the wording across classes, the attention span of students (for an extra-class activity), and the class time that the professors designated for the entire research session. The presentation itself was simple: first a general overview of the taxonomy classes (refer back to Figure 2), followed by an explanation of each denomination within them. Each class and its respective branching denominations appeared on its own slide, accompanied by an oral explanation based on a prewritten script, to keep the explanation consistent for all participants. I did not permit participants to ask questions about the taxonomy until after the posttest, again for purposes of consistency. Finally, I pulled the explanations and example data visual types from Desnoyers’ article so that the information would remain true to its source material.

Procedure

For this study, I avoided using a structured “test” setting so that the students could relax and respond as they normally would. Additionally, all of the classrooms had built-in projectors, so I could present the educational treatment part of the study without bringing additional equipment. In accordance with IRB regulations, students could stop participating at any time, and I explained to them that the test would not affect their grades. I piloted the study with three recently graduated alumni of the New Mexico Institute of Mining and Technology (NMT) because I expected that they would represent current NMT students’ understanding.

[Figure 3. Examples of the data visuals that students can generate from simple software programs: reigrams, cellulograms, puncti-curvigrams, and absolute histograms.]

I began with a 16-question pretest, and after all students had indicated that they finished the pretest, I began the five-minute presentation about Desnoyers’ taxonomy. Finally, I handed out a posttest that contained the ten data sets from the pretest, but in a different order, which the students had about five minutes to finish. As students finished their posttests, I had them staple their pretest and posttest together and place the packet in a box, to keep the appropriate surveys together while retaining anonymity. After all students had deposited their packets into the box, I opened the floor to questions about Desnoyers’ taxonomy or data visuals in general, so that I could informally gauge whether the students had learned anything from the exposure or had merely participated because everyone else did. However, evaluating students’ post-survey questions was outside the scope of this study, and as such I did not formally gather that data.

Participants

I recruited 101 volunteer participants from writing classes at NMT and from one student club because they belonged to the target demographic, that is, STEM students, and because they were a convenient sample. My goal was 100-120 participants so that I could generalize to a larger audience (Creswell, 2009). Working with core-requirement writing classes and a club whose members came from different degrees and levels in school allowed me to get a cross-sampling of degrees that I would not have gotten with, for example, a chemistry class (see Table 1).

[Table 1. Participants’ degrees and levels in school.]

I visited each class and club only once, and only students who showed up that day could participate. This allowed me to avoid having people participate more than once, which would have skewed my data. For this study, I collected demographic data about degree and year in school. I limited the scope of demographic data to those two questions because I only wanted to see the overall effects of exposure to Desnoyers’ taxonomy. Bartell, Schultz, and Spyridakis (2006) used a similar limitation of demographics when they tested how differences in text signals changed comprehension between online and print documents. I got a range of data that mostly reflects the normal distribution of degrees and years in school at NMT, specifically more engineering-type degrees and more undergraduates. However, because two of the classes were limited to mechanical engineers and one class had mostly technical communication students, I have a slightly higher than normal percentage of those two populations. Unfortunately, my data do not fully reflect those higher percentages because roughly one third of the total participants did not write their degree (called Major(s) on the pretest, see Appendix A) on their copy of the survey.

Pretest and Posttest Measures

The first six questions on the pretest (see Appendix A) were general demographic questions, including ones about majors and level in school, whereas the final ten questions were data sets that the students had to pair with data visuals. The posttest had the same data sets as the pretest, but in a different order, to make the comparisons valid while preventing students from answering the posttest questions in “autopilot mode.”

My dependent variable was the students’ answers for the data sets on the pretest and posttest, which I then evaluated in light of overall change and change within each major and level in school. To capture these changes, students only had to fill out the pretest and posttest surveys and listen to the presentation; no other tasks were required. For purposes of calculation, because I could not account for people putting both efficient and non-efficient answers on a single data set (see Appendix A for the list of which answers were considered efficient for each data set), I coded an answer as efficient so long as at least one of the chosen options was efficient for the given data set, as sketched below.
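To make this coding rule concrete, here is a minimal sketch in Python. The answer keys come from Appendix A; the function and variable names are hypothetical illustrations, not the study’s actual analysis scripts.

```python
# Efficient answer keys per data set, as listed in Appendix A.
EFFICIENT_ANSWERS = {
    1: {"E", "F"}, 2: {"L", "M"}, 3: {"J", "K"}, 4: {"F", "I", "L"},
    5: {"A", "L"}, 6: {"F", "N", "O"}, 7: {"D", "J", "L"}, 8: {"C"},
    9: {"F", "H"}, 10: {"F", "N", "O"},
}

def is_efficient(data_set: int, chosen: set[str]) -> bool:
    """Code a response as efficient if ANY chosen letter is efficient."""
    return bool(chosen & EFFICIENT_ANSWERS[data_set])

def aggregate_score(responses: dict[int, set[str]]) -> int:
    """Aggregate score: the number of the 10 data sets coded efficient."""
    return sum(is_efficient(ds, ans) for ds, ans in responses.items())

# Example: a participant who picked both an efficient (F) and a
# non-efficient (G) visual for data set 4 is still coded efficient.
assert is_efficient(4, {"F", "G"})
```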

For example, data set 4 of the pretest read, “As part of your senior project, you need to compare the durability of five types of wood resin composites” (see Appendix A). The data set included three specifics that the students had to notice: the words “compare,” “durability,” and “five.” “Compare” told the students what they needed to do with the data. “Durability” told the students which quantitative quality to focus on, and “five” told the students that they had to present only a small set of data. Given these three specifics, the efficient answers were F, I, and L. Choice F (a cellulogram) would allow the students to compare exact numbers for multiple measures of durability. Choice I (a punctigram) would allow the students to compare two measures of durability by using a point to represent each of the five wood resin composites. Finally, choice L (an absolute histogram) would allow the students to compare the composites based on a single measure of durability.

Data Analysis

Of the 16 questions on the pretest, 10 were data sets. I analyzed the primary results with a paired Student’s t-test by calculating an aggregate score for the pretest and a second aggregate score for the posttest. I then used a one-way ANOVA test to compare the performance of upperclassmen to underclassmen because of the large difference in group sizes (76 compared to 25). However, I could use neither a t-test nor an ANOVA test to compare performance across degrees because one third of the students chose not to provide that demographic information.

If the t-test results showed a statistically significant increase in the number of efficient data visuals chosen after exposure to Desnoyers’ taxonomy, then I would need to calculate the Cohen’s d value based on means and standard deviations (Ravid, 2011, p. 150). The Cohen’s d value indicates the effect size of the difference found. If Cohen’s d is 0.8 or larger, it would indicate that the results have a large effect and that exposure to Desnoyers’ taxonomy could significantly help students use more efficient data visuals. A Cohen’s d value of 0.5 would indicate a medium effect, and a Cohen’s d of 0.2 or smaller would indicate that the difference, although statistically significant, is small. A sketch of these calculations follows.
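For readers who want to reproduce this kind of analysis, here is a minimal sketch in Python using SciPy. The scores are synthetic stand-ins (the study’s raw data are not reproduced here), and the paired-samples formulation of Cohen’s d shown is one common choice; the study’s actual calculations followed Ravid (2011) and may differ in detail.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 101

# Synthetic stand-in scores (count of efficient answers, 0-10, per
# participant); these are NOT the study's real data.
pretest = np.clip(rng.normal(7.0, 1.6, size=n).round(), 0, 10)
posttest = np.clip(pretest + rng.normal(0.4, 1.7, size=n).round(), 0, 10)

# Paired Student's t-test on the aggregate pretest vs. posttest scores.
t_stat, p_two_sided = stats.ttest_rel(posttest, pretest)

# Cohen's d for paired samples: mean difference divided by the standard
# deviation of the difference scores (equivalently, d = t / sqrt(n)).
diff = posttest - pretest
cohens_d = diff.mean() / diff.std(ddof=1)

# One-way ANOVA comparing gains of upperclassmen (76) vs. underclassmen (25).
is_upper = rng.permutation(np.repeat([True, False], [76, 25]))
f_stat, p_anova = stats.f_oneway(diff[is_upper], diff[~is_upper])

print(f"t({n - 1}) = {t_stat:.2f}, p = {p_two_sided:.3f}, d = {cohens_d:.2f}")
print(f"F(1, {n - 2}) = {f_stat:.3f}, p = {p_anova:.3f}")
```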

Results

In this section, I first present the overall change in efficient answers after exposure to Desnoyers’ taxonomy. Next, I present the results of the sub-questions, specifically the results of their reported use of data visuals and how the demographics of the participants affected their choice of efficient data visuals. Finally, I report how the exposure to Desnoyers’ taxonomy changed students’ willingness to use more complex visuals for the data sets.

Change in Efficient Answers after Exposure

Comparing overall pretest results (M = 7.02, SD = 1.59) to posttest results (M = 7.38, SD = 1.66), I found a statistically significant difference between the two, t(100) = 2.11, p < 0.025, as shown in Table 2.

[Table 2. Paired t-test comparison of overall pretest and posttest scores.]

As noted, the mean of the overall answers increased by only 0.36 points out of 10 possible points, a mere 5% relative increase. The positive t value (2.11) shows that the students chose more efficient answers on the posttest, indicating that students would choose more efficient data visuals after exposure to discussion of the taxonomy and the data visuals’ purposes. However, the small Cohen’s effect size (d = 0.21) limits how much impact this increase has on students’ reported usage of efficient data visuals. I explore a few possible reasons for this limited increase, along with the associated implications for education, in my Discussion section.
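As a consistency check, a common paired-samples formulation of Cohen’s d recovers the reported effect size directly from the reported t value and sample size (assuming this formulation; Ravid, 2011, expresses d in terms of means and standard deviations):

$$d = \frac{t}{\sqrt{n}} = \frac{2.11}{\sqrt{101}} \approx 0.21$$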

Students’ Reported Use of Data Visuals

In question 5 of the pretest, I asked the participants to list the data visuals they had used most often in the past 12 months (see the last page of Appendix A for potential data visuals). I hypothesized that students would use cellulograms, reigrams, puncti-curvigrams, and absolute histograms. In Figure 4, I list the top seven most used data visuals as reported by the participants. In my results, I discovered that many students could not decide on just three data visuals (question 5 of the pretest, see Appendix A). Figure 4 shows that students reported cellulograms and puncti-curvigrams among the top four most used, whereas they reported reigrams and absolute histograms as the sixth and seventh most used.

The data visuals that students selected reflect the STEM nature of the university, because field conventions require those types of data visuals in NMT student lab reports, but the selections also show that there is not much variation in how students present data.

In question 3 of the pretest, I asked students to mark how many data visuals they had used in the previous twelve months. Table 3 summarizes my results, which show that roughly 75% of my participants created 21 or more data visuals for classes and projects in the past twelve months.

[Figure 4. The seven data visuals participants reported using most often in the past twelve months.]

[Table 3. Number of data visuals participants reported creating in the past twelve months.]

In the Discussion section, I explore the implications of students creating that many data visuals and a possible ramification for the types of data visuals created.

Differences between Degrees and Levels of Participants

As seen in Tables 4 and 5, the degrees and the different levels in school show slight differences in how many efficient data visuals students chose before and after exposure to the taxonomy. Taken together, these two tables show the spread of participants over my target demographic, that is, STEM university students.

[Table 4. Pretest and posttest efficient-answer scores by degree field.]

Overall, students increased the mean of their efficient answers by only 0.36 points out of ten, and the standard deviation increased. Mechanical engineers and other applied sciences increased their means by less than that amount, and their standard deviations also increased. However, natural sciences increased by 2.16 points out of ten while also decreasing their standard deviation. I did not gather enough data on the split of degrees to do more than surface comparisons because, as noted, one third of my participants chose not to write in their degrees (question 1 on the pretest, see Appendix A).

[Table 5. Pretest and posttest efficient-answer scores by level in school.]

Note: F(1,99) = .000, n.s.

Comparing the different levels in school to the overall results, I found an interesting result: the underclassmen and upperclassmen performed virtually identically on the test. In the Discussion section, I explore one possible reason why.

Effects of Desnoyers’ Taxonomy on Posttest Data Visuals

I saw an increase in more complex data visuals (A, B, C, D, H, N, O, and P, as listed in the Appendix) on the posttest. I consider these data visuals “complex” because basic spreadsheet software does not include them, requiring students to make them by hand; therefore, students might not normally take the time to make them. Ten percent or less of the participants reported using these complex visuals within the past twelve months, with the exception of C. This trend caught my attention because participants were willing to use more complex data visuals after exposure to Desnoyers’ taxonomy even without formalized training.

Discussion

The results from this exploratory study seem very promising, as they indicate that Desnoyers’ taxonomy may help students choose more efficient data visuals; however, the results highlighted more problems than solutions. First, students’ reported use of data visuals supports the idea of discrepancies between academia and journals. Second, while students create many data visuals per year, they lack quality instruction on how to create efficient visuals. Third, instruction needs to include complex visuals and not just the simple ones generated by software.

Discrepancies in Visual Communication

For my first research question, I proposed that students would use only four specific types of data visuals before exposure to Desnoyers’ taxonomy. I accurately predicted that students would report often using cellulograms and puncti-curvigrams. However, the next most used data visuals fall under punctigrams or curvigrams, which I did not anticipate. This usage makes sense because I conducted the study at a STEM university that requires all students to take laboratory sciences. For example, students have to write weekly lab reports for Physics I & II (required of all students), and in these lab reports they must use line graphs to show trends and raw data. I did not anticipate students using punctigrams or curvigrams (the third and fourth most used data visuals) because students normally have to report trends and raw data on the same data visual, which is what puncti-curvigrams efficiently do. However, students’ use of these other two forms of line graphs shows that they do not always have to report trends and raw data simultaneously. This lack of variation in visuals indicates that students either present very similar data sets, simply do not use a variety of efficient data visuals, or choose to follow field conventions regardless of efficiency.

Additionally, students’ reported use of data visuals supports prior research about discrepancies between academic textbooks and journal data visuals: in STEM journals, professionals typically do not use cellulograms or cosmograms (Rybarczyk, 2011, p. 110). As such, students are learning and using textbook data visuals that they might use for laboratory research, but perhaps not for publication. If students in the STEM fields are maintaining the discrepancy, then it stands to reason that we should ask whether the field of technical communication has similar discrepancies and whether our students are being trained to maintain them.

Teaching Students Efficient Data Visualization

NMT students have experience creating data visuals even without much formal training. The majority of STEM students participating in this study create over 20 data visuals yearly, which means they can create either many effective data visuals or many ineffective ones. I anticipated that the majority of students would create this many visuals specifically because of the minimum 26 credit hours of laboratory sciences that all students at NMT must take. Professors routinely give students instruction on how to create line graphs, as each degree field has different requirements for which variables go on which axes. However, instruction in how to create other efficient data visuals is absent.

I noticed that students gravitated toward both more efficient answers and more complex data visuals after exposure. Again, as defined in my Results, I consider data visuals A, B, C, D, H, N, O, and P (page three of Appendix A) “complex” because basic spreadsheet software does not include them, so students must make them by hand. Possibly, students chose more complex visuals because of the exposure: they might have chosen the complex visuals because they finally knew what to use them for, whereas before they might not have known how to use them. The fact that 10% (or less) of the participating students reported using those data visuals, together with the increases in standard deviations for most of the participant groups after exposure, supports this idea. Understanding the complexity and efficiency of data visuals might not tie directly to Desnoyers’ taxonomy, but it could help students understand the kinds of visuals used in journals and other STEM fields.

Perhaps STEM students are unusual in the number of data visuals that they create, thereby marking them as a distinct subset of students whom technical communicators should study as part of data visualization research. This supports the research of David Hutto (2007), who found that working STEM professionals “record information in graphic form and [use] graphics during design work” (p. 88), except in this case students, not STEM professionals, record information in visual form. Without proper instruction on how to create data visuals, though, students waste time and possibly miss data by recording it in improper formats as they attempt to figure out how to create visuals on their own. This idea generalizes to all users of data visuals; even technical communication professionals could waste time and miss data if they lack proper instruction on how to create and choose data visuals in accordance with the data and the purpose of the intended visual.

Desnoyers’ Taxonomy’s Potential Use

We can see from the results that upperclassmen and underclassmen had almost identical results in this exploratory study. Admittedly, the exposure to Desnoyers’ taxonomy was brief (only five minutes), so perhaps a more substantial and interactive exposure would reveal how prior knowledge might create differences between the two groups.

So how does this result influence our choice of whether to continue pursuing Desnoyers’ taxonomy as a useful way of sorting data visuals? First, Desnoyers’ taxonomy requires knowledge of how to differentiate between kinds of data (spatial, temporal, and intrinsic properties, for example). Second, Desnoyers’ taxonomy requires knowledge of which features in a given data set one wants to highlight, if any. The lack of a difference between underclassmen and upperclassmen means that their knowledge (or lack thereof) of these two factors is very similar. This in turn implies that upperclassmen do not receive further instruction in their upper-level courses about how to create data visuals or why and when to use them. Given this, perhaps students would benefit from exposure to and training in efficient data visuals starting in their freshman year and increasing in complexity through their senior year. By starting simply, we can help students grasp the core concepts of data visual creation (Dragga & Voss, 2001; Heer, Bostock, & Ogievetsky, 2010; Tufte, 1983). Core concepts allow students to better differentiate between kinds of data while helping them understand what information they can highlight from the data. Then, by increasing the complexity of data visuals as students advance through college, we can steer students toward efficient visuals regardless of the methods needed to create them. Desnoyers’ taxonomy, due to its inherent organization, offers a logical framework for presenting visuals, and educators could easily adapt it to focus on either simple or complex visuals and how they relate to other visuals.

Limitations to the Study

I proposed, and confirmed, that students would gravitate toward more efficient data visuals after exposure to Desnoyers’ taxonomy. However, for several reasons, we cannot be sure that exposure to Desnoyers’ taxonomy was the only factor of influence in this study.

Possibly, the descriptions of different types of data visuals during the exposure introduced students to new data visuals. For example, after the survey, students asked me many questions about morphograms (i.e., spider charts). The descriptions of how to properly use the different kinds of data visuals might have helped the students understand how to properly use data visuals more than the exposure to the taxonomy itself did.

Secondly, the increase in efficient data visuals may also be attributable to learning from thinking about the visualization twice. Remember, the data sets on the pretest and posttest were identical, just in different orders, so the students had to think about how to present the data twice. This leads to a takeaway specifically for university and college faculty: perhaps students need more exposure to a wide variety of data visuals and explanations of how to properly create and use them, regardless of whether that exposure comes from Desnoyers’ taxonomy. Every faculty member has a different understanding of data visuals, so with multiple explanations from different perspectives, students can think about visualization in different ways, resulting in the creation and use of more efficient data visuals.

Perhaps the largest limitation to this study is the lack of a control group with alternative instruction. While I chose an exploratory study where control groups are not required, the results highlight the need for such a comprehensive study, leaving open the door for future research.

Future Research

This study supports other calls for research about data visual sorting schemes and opens the door to several specific studies. First, technical communicators should see if STEM students are unusual in the number of data visuals they create. If they are, then that difference, as compared to humanities students, can open an entirely new area of data visualization research. Second, a comprehensive study that compares Desnoyers’ taxonomy to “just” explanations of data visuals will answer the question of whether it can fill the void of a consistent lexicon. Third, additional testing should examine the differences in data visual efficiency between degree fields, so as to highlight where we can focus further standardization efforts.

Finally, the results of this study suggest both that students need more exposure to data visuals and that Desnoyers’ taxonomy can help prompt students to ask the needed questions when matching data sets to data visuals. As such, it might be worthwhile to study how we could adopt this taxonomy as a teaching tool, for example, by drawing on teaching methods from the life science fields, which use the biological taxonomy developed by Carolus Linnaeus as a framework to teach the classification of living things.

Conclusion

My study does not confirm that Desnoyers’ taxonomy is the perfect way to sort data visuals, but it does show that the taxonomy has potential. It supports the idea that a consistent sorting scheme will help scientists and STEM academia choose efficient data visuals to present their data. Again, aesthetic efficiency for data visuals has been defined, but efficiency of usage has not. Such a sorting scheme would help everyone focus on creating efficient data visuals rather than on defining efficiency for scattered types of visuals. Efficient data visuals will support recent advances in science and will make those advances more easily understood by those outside the respective scientific field. Additionally, technical communication research in data visualization would benefit from a consistent way of referring to data visuals: a consistent lexicon would allow researchers to focus on the visuals themselves and not on the terminology. Ultimately, if further studies show that it can increase awareness of data visuals’ efficiency, then researchers could confidently adopt Desnoyers’ taxonomy as a way to teach and reference data visuals consistently.

Acknowledgments

Thanks be to God, who gave me the idea for this research topic. Many thanks to the three professors at NMT who let me conduct the study in their classes. Additionally, thank you to Elisabeth Kramer-Simpson, who assisted me with setting up the initial research, and Mark Samuels, who reviewed my calculations for accuracy.

Appendix: Visualizing STEM Data Pretest

[Note: Question numbers added during data analysis to make it easier to reference in the write up. Also, the letters in italics are the efficient answers for the given data sets.]

1) What is/are your major(s)?

2) Check which applies

    ___ Freshman/Sophomore  ___ Graduate

    ___ Junior/Senior       ___ Other

3) How many Data Visuals (such as tables, charts, graphs, figures, maps, diagrams, etc.) did you create for school in the last 12 months? (See page 3 for examples) 

___ 0-10     ___ 11-20     ___ 21-30     ___ 31-40     ___ 41+

4) How many research projects and/or papers have you participated in/completed in the past 12 months?

5) Which 3 Data Visuals from the list on page 3 did you use the most often in the past 12 months? (List the letters in the space below.)

6) How much do you care about presenting data in the most effective manner possible? 

 1----------2------------3-----------4-----------5----------6----------7

For the next 10 data sets, mark the letter of the Data Visuals listed on page 3 that you would most likely use to present the data (you can re-use letters).

7) Data Set 1: You are writing a memo about a new widget in the lab and you need to include all of the widget’s functions. 

    Letter of the Data Visual _____ (E, F)

8) Data Set 2: Your team discovered a new species of iguana and want to show in a report how many iguanas in the world are part of that species. 

    Letter of the Data Visual _____ (L, M)

9) Data Set 3: In a lab report, you want to show the change in the angular momentum of a wheel over time. 

    Letter of the Data Visual _____ (J, K)

10) Data Set 4: As part of your senior project, you need to compare the durability of five types of wood resin composites. 

    Letter of the Data Visual _____ (F, I, L)

11) Data Set 5: Over the course of a year, you measured trees on campus and noticed that some are growing faster than others are. You want to figure out where trees grow faster based on location. 

    Letter of the Data Visual _____ (A, L)

12) Data Set 6: You need to compare the parts of widget green to widget blue. You decide to compare them based on materials used to build the parts. 

    Letter of the Data Visual _____ (F, N, O)

13) Data Set 7: You noticed a patch of lichen growing outside the Chemistry building and decided to measure the lichen’s diameter over the course of year, with bi-monthly measurements. After gathering the data, you decided to show the relative change in diameter as it grew. 

    Letter of the Data Visual _____ (D, J, L)

14) Data Set 8: You need to illustrate the difference in assembly between a concrete bridge and a metal bridge. 

    Letter of the Data Visual _____ (C)

15) Data Set 9: After cleaning up your dorm room, you decide to do the very techie thing and compare three different cleaning compounds to see which chemicals are similar and which ones are different. 

    Letter of the Data Visual _____ (F, H)

16) Data Set 10: For your junior project, your team decides to compare the horsepower, torque, and price of three engines to see which one would be the best for your purple widget. 

    Letter of the Data Visual _____ (F, N, O)

[Page 3 of the Appendix: the lettered list (A-P) of example data visuals referenced in the survey questions.]

References

Bartell, A. L., Schultz, L. D., & Spyridakis, J. H. (2006). The effect of heading frequency on comprehension of print versus online information. Technical Communication, 53(4), 416-426.

Cairo, A. (2012). The functional art: An introduction to information graphics and visualization. Berkeley, California: New Riders.

Creswell, J. W. (2009). Research design: Qualitative, quantitative, and mixed methods approaches. Los Angeles, California: Sage Publications, Inc.

Desnoyers, L. (2011). Toward a taxonomy of visuals in science communication. Technical Communication, 58(2), 119-134.

Dragga, S., & Voss, D. (2001). Cruel pies: The inhumanity of technical illustrations. Technical Communication, 48(3), 265-274.

Emanuel, R., & Challons-Lipton, S. (2013). Visual literacy and the digital native: Another look. Journal of Visual Literacy, 32(1), 7-26.

Frankel, F. C., & DePace, A. H. (2012). Visual strategies: A practical guide to graphics for scientists and engineers. New Haven, Connecticut: Yale University Press.

Finson, K., & Pederson, J. (2011). What are visual data and what utility do they have in science education? Journal of Visual Literacy, 30(1), 66-85.

Gorodov, E. Y., & Gubarev, V. V. (2013). Analytical review of data visualization methods in application to big data. Journal of Electrical and Computer Engineering, 1-7.

Greeno, C. G. (2002). Major alternatives to the classic experimental design. Family Process, 41(4), 733-736.

Heer, J., Bostock, M., & Ogievetsky, V. (2010). A tour through the visualization zoo: A survey of powerful visualization techniques from the obvious to the obscure. Communications of the ACM, 53(6), 59-67.

Hutto, D. (2007). Graphics and invention in engineering writing. Technical Communication, 54(1), 88-98.

Ioannidis, J. P. A. (2005). Why most published research findings are false. PLOS Medicine, 2(8), DOI: 10.1371/journal.pmed.0020124.

Keim, D. A. (2002). Information visualization and visual data mining. IEEE Transactions on Visualization and Computer Graphics, 8(1), 1-8.

Kostelnick, C. (2008). The visual rhetoric of data displays: The conundrum of clarity. IEEE Transactions on Professional Communication, 51(1), 116-130.

Lipsa, D. R., Laramee, R. S., Cox, S. J., Roberts, J. C., Walker, R., Borkin, M. A., & Pfister, H. (2012). Visualization for the physical sciences. Computer Graphics Forum, 31(6), 2317-2347.

MacDonald-Ross, M. (1977). How numbers are shown: A review of research on the presentation of quantitative data in texts. AV Communication Review, Winter, 359-409.

Meirelles, I. (2013). Design for information: An introduction to the histories, theories, and best practices behind effective information visualizations. Beverly, Massachusetts: Rockport Publishers.

Ménard, E., & Dorey, J. (2014). TIIARA: A new bilingual taxonomy for image indexing. Knowledge Organization, 41(2), 113-122.

Ravid, R. (2011). Practical statistics for educators. Lanham, Maryland: Rowman & Littlefield Publishers, Inc.

Rybarczyk, B. (2011). Visual literacy in biology: A comparison of visual representations in textbooks and journal articles. Journal of College Science Teaching, 41(1), 106-114.

Shedroff, N. (2000). Information interaction design: A unified theory of design. In R. Jacobson (Ed.), Information design (pp. 267-292). Cambridge, Massachusetts: MIT Press.

Tufte, E. R. (1983). The visual display of quantitative information. Cheshire, Connecticut: Graphics Press.

About the Author

Rachel Rayl is an undergraduate student in the CLASS department at the New Mexico Institute of Mining and Technology (NMT). Last year, she co-authored another article about data visualization in the European Scientific Journal. During her time at NMT, she has worked with academic, nonprofit, and industry clients designing data visualizations along with providing other technical communication support. Contact: rrayl316@gmail.com

Manuscript received 11 June 2015; revised 25 August 2015; accepted 25 August 2015.