69.1 February 2022

Signaling Context in Topic-Based Writing

doi: https://doi.org/10.55177/tc812725

By Jason Swarts


In topic-based writing delivered as web help or interactive PDF, readers are able to access topics non-linearly, reading only those topics they feel a need to read. Consequently, readers can easily lose a sense of a topic’s broader context of related topics and concepts, which is knowledge presumed of a “qualified reader.”

Purpose: This paper investigates how relative “that” and “which” clauses are used to signal context in writing that is intended to be free of obligatory contextual connections to other topics in a documentation set.

Method: This analysis relies on a computer-assisted, descriptive analysis of relative pronoun use in a corpus of published, topic-based documentation. The analysis focuses on “that” and “which,” typically used in English to refer to and add information (e.g., a context) about an antecedent noun.

Results: Relative “that” and “which” clauses are shown to be used in a variety of ways in topic-based writing to signal associations between topics, making it easier for readers who need context to find it.

Conclusions: The author offers implications for writing practice that include deliberate, strategic use of “that” and “which” and complementary documentation design that enables readers to locate contextual information signaled by those pronouns.

KEYWORDS: topics, context, documentation, user experience, navigation

Practitioner’s Takeaway

  • Writers should choose, deliberately, when to use “that” clauses in their documentation. Use “that” to signal important contextual information because readers may be attuned to expect important information to follow use of “that.”
  • Writers should follow the use of “that” and “which” with concrete, task-oriented words that are distinct enough to be found in the titles and headings of other topics.
  • Students should be taught to pay attention to “that” and “which” as syntactic cues that may assist readers by signaling context. This signaling function may be one, among others, that writers should be trained to recognize.
  • The use of “that” as a contextual signal may indicate the need for usability research on interpretive and navigational ambiguities that are part of the user experience of topic-based writing online. How vital are syntactic cues in shaping user experience?


In the professional technical communication context of structured authoring, content is challenging to create because its topics are not necessarily coherent or unified in the way they are in static documentation. Topics are:

Freed from the confines of static documents . . . [and] conform to rules defined by standards and schemas, which ensure that the topics are consistently structured and can be assembled into different information products that are rendered in different outputs for different delivery channels. (Andersen, 2013, p. 116)

Conventional strategies for creating coherence frustrate the style of writing Andersen describes. For one, making text stick together in large sections (e.g., chapters) can resist recombination into new outputs. The writing Andersen describes is topic-based, and it has been around for decades, perhaps most visibly following the increased adoption of component content management systems (Batova, 2014) and information models like the Darwin Information Typing Architecture (DITA), which made it possible to create content for multiple document outputs (Priestley, Hargis, & Carpenter, 2001; Rockley, 2001).

Topic-based writing, especially that which is intended for delivery as web help or interactive PDF, requires content that is modular, highly specific and “bite-sized” (Andersen, 2013, p. 126). These topics must be “highly adaptable and portable” and “not limited to one purpose, technology or output” (2013, p. 131). Topics that are both highly portable and capable of supporting multiple uses must not be bound to any one, larger context of interpretation that runs across multiple topics. Instead, topics should be understandable outside of any particular context while still providing readers guidance toward a broader context into which the content potentially fits (Baker, 2013, p. 113).

Technical communicators have been doing topic-based writing for at least 20 years. It is nearly 30 years if we trace the origins to John Carroll (1990) and minimalism as Carlos Evia (2019) does. It is 50 years if we trace to work that Robert Horn and his colleagues were doing on information mapping and block-style information development (Horn, Nicol, Kleinman, & Grace, 1969). Regardless, writers have had lots of practice, and this experience suggests that writers will have developed practices for building contextual coherence without fixing a topic to any particular context of meaning. Yet, much of the advice that appears about topic-based writing focuses on what topics avoid doing rather than on what they do. Topics:

  • have no interpretive obligation to other content (Baker, 2013, p. 76; Rockley, Manning, & Cooper, 2009, p. 24); and
  • are devoid of referential material and relational language (Bellamy, Carey, & Schlotfeldt, 2012, p. 196).

Topics are more complicated than this. If Eble (2003) is correct, topics do still need to signal a range of contexts into which they potentially fit and become coherent.

The purpose of this research is to examine topic-based writing, delivered as web help or interactive PDF, to identify how those topics build a coherent sense of the whole by using the relative pronouns “that” and “which” to signal contexts to which topics potentially connect. My approach is to work through a computer-assisted corpus analysis of topic-based writing. Analyzing a corpus of topics allows us to get around a problem of scale. We can analyze broad language patterns in a corpus that then guide a more traditional, close analysis of texts selected as representative of the broader language pattern. In this case, analysis will focus on a language marker associated with context formation: the relative pronouns “that” and “which” (Fabb, 1990). There are other ways to approach the same analysis of context signaling, however. For example, words that refer (this, that, the, a, an) or words that join or link ideas (and, so, consequently, or) direct reader attention to related content (see Halliday & Matthiessen, 2004, pp. 536–537). My more limited approach should be taken only as an indication of one syntactic choice that writers should be aware of.

Leading up to the corpus analysis, I review the literature on topic-based writing to better articulate what topics should do to signal context. Particularly, I focus on the assumption of the “qualified reader” (Baker, 2013, pp. 156–157), an audience type that writers should plan for and support. I then review literature on “that” and “which,” used as relative pronouns, to support using them as an indicator of context signaling in and across topics. Qualified readers, I argue, become qualified when writers signal a context and signpost where the contextual information can be found. I then walk through a corpus analysis of topic-based writing, illustrate the findings with some examples, and finally address some implications for practice, teaching, and research.

Topic-Based Writing

Authors who write about topics are consistent in their descriptions of them. According to Rockley, Manning, and Cooper (2009), who identify one common set of characteristics, a topic (in part):

  • Is “designed to stand on [its] own with cross-references to other topics” (p. 4)
  • Is “a discrete piece of content that is about a specific subject, has an identifiable purpose, and can stand alone” (p. 24)
  • “Answers a single question” (p. 46)
  • “Consist[s] of only one subject” (p. 46)
  • Must “be read without the need to read any preceding or following information to understand it” (p. 47)
  • Is “not specific to any one usage” (p. 47)

Topics do this work because they have “wording that is independent of any other input” (Evia, 2019, p. 53). They are self-contained and complete with their own purpose statements and context (Baker, 2013, p. 86). Their language makes no “assumptions about where [readers] have come from and enforces no prescription about where they should go next” (Baker, 2011, para. 24).

Outside of including “see also” links, topics should avoid making reference to other topics or to a single context that the reader must understand to use the topic they have accessed (Bellamy et al., 2012, p. 196). And if topic-based writing derives from John Carroll’s work on minimalism, it should also rely less on “control” language that we put in topics to order and sequence information in the manner we want readers to receive it (Carroll, 1990, p. 9; Ament, 2003, p. 5), with the intent to “reduce the interference in a user understanding content” (Gillespie, 2017, p. 2).

We also have contradictory explanations about topic independence. Baker (2013) notes a simultaneous need for topics to be both self-contained (i.e., meaning is not dependent on context) and contextual, in that a topic is placed within a broader context (pp. 118, 155; see also Eble, 2003). As Evia notes, “trying to define the concept of topic in rhetoric is not that easy” (2019, p. 53), but doing so is important to the adoption of topic-based writing (Andersen & Batova, 2015; Flanagan, 2015).

Baker (2013) describes the audience for topics as the “qualified reader,” which he defines as “a reader who knows everything needed to perform the specific and limited purpose of the topic except the specifics of the case that the topic covers” (2013, p. 127). For example, a topic about using collections in a citation manager may not talk about how one adds items to a citation library, but the topic likely assumes that readers either already know how to add items or can recognize that they should learn. However, we cannot assume that readers come to our topics as fully qualified readers. One additional purpose of topic-based writing is to support development of qualified readers, as needed. “Readers who are not fully qualified can read other topics to get the information they need” (Baker, 2013, p. 78), except that they need to know what other topics to read. That is, they need awareness of the context that they are assumed to know.

To be clearer, consider the challenge of moving content from a static book-based writing style to a topic-based one. Following the stylistic guidelines described in the bullet list above would not lead to very successful topics. “Simply chunking your chapter or narrative-based content into separate topics is not an effective way to make the transition to topic-based authoring. Think about how much connective meaning occurs when you write a more narratively structured chapter” (Samuels, 2013, para. 6). The reason, as Samuels points out, is that writers rely on a great deal of “connective” meanings to build a sense of a coherent whole: how one topic fits with others. Qualified readers are those who either know the topical context ahead of time or can inform themselves by reading the topics needed to round out that sense of context. The implication is that topics that effectively address qualified readers have some element of connective language that signals the broader contexts into which a topic fits (Eble, 2003, p. 345; see also Rockley, 2001, p. 192). The writer’s goal is to maximize rather than to minimize the potential for combining topics with other topics, all while not letting topic contexts collide or restrict one another by creating topic dependencies that curtail topic reuse.

Others writing about topics agree that some broader coherence is required for readers to use topic-based content effectively. Kantner, Shroyer, and Rosenbaum (2002) say that effectively written content must support readers becoming oriented to the content and how that content relates to other content (p. 340). And Wachter-Boettcher (2012) argues that chunks of content still need to signal their relationship to the whole (p. 47). But creating this coherence without locking readers in to one way of reading the content across topics is the challenge.

Regarding coherence, Williams (1997) writes that the problem is that coherence is not “directly ‘in’ what you write. Like all other characteristics of prose, it is created by your readers out of what you put before them” (p. 101). What do writers put before their readers that kindles the coherence that supports qualified readers? It must be subtle but strong enough that readers “will infer logical connections that you do not state” (p. 101). Readers “will impose a shape on parts that you do not explicitly relate” (p. 101). But to do this, “readers need some cues, some signals of the coherence in your own mind” (p. 101).

Language for Creating Coherence

When studying language, we can pay attention to words that carry the content, and through that approach derive an understanding of what a text is about (Scott, 1997). Another approach is to examine words that do not carry explicit meaning but instead link other words and concepts together. These words perform a function, such as pointing, relating, connecting. Function words are those “that connect, shape, and organize content words” (Pennebaker, 2011, p. 22). This function language is subtle enough that readers may not notice, but it shapes how we read and pay attention. Furthermore, function words are a good point of focus because it is through them that we can see subtle choices writers make to optimize comprehension (e.g., Kohl, 1999, p. 149) and build coherence.

Coherence provides a sense of focus (Williams, 1997, p. 106). When looking for coherence, readers will look for ways that fragments of information can be connected (Brown & Yule, 1983, p. 224). We achieve coherence by relying on recognizable genre forms or other conventional structures, like conversational turns, that help readers see how fragments of conversation and text are connected (see Stubbs, 1996, Chapter 8; Geisler & Swarts, 2019, Chapter 3). Readers also rely on explicit cognitive markers (e.g., words like “because” and “therefore”), which help them process text and intuit relationships between the content one is reading and the content one has yet to read. Such words create a framework for readers to “integrate new information with information already stated previously” (Sanders, Land, & Mulder, 2007, p. 220). But these markers can also be confining when specifying too exact a connection.

Writers also signal coherence through the use of structural features like headings, previews, and logical connectors (see, for example, Spyridakis, 1989; Horn et al., 1969), particularly in long and complex documents. When these markers are in place, readers are less confused and better able to recall content (Sanders & Noordman, 2000). The stronger the stated relationship between pieces of content, the better the recall and relationship building (p. 52).

Yet for these strategies to build coherence, they need to be visible to the readers but not so visible that they become obstructive uses of control language that detract from the value of having highly flexible and modular content. An example comes from the Motorola documentation for the mg7550 router:

To Change the Network Name and Password

For the 5 GHz band:

  1. Select and delete the old Network Name, then type in the new Network Name.
  2. Click the Save button.
  3. You can click the Show Key box to check your typing for Password.
  4. Select and delete the old Password, then type in the new Password.
  5. Click the Save button.

For the 2.4 GHz band:

  6. Select and delete the old Network Name, then type in the new Network Name.
  7. Click the Save button.
  8. You can click the Show Key box to check your typing for Password.
  9. Select and delete the old Password, then type in the new Password.
  10. Click the Save button (Motorola, 2020, p. 33).

The documentation references the context of changing wireless settings, and the continuous numbering across steps for the 5 GHz band and 2.4 GHz band indicates that these steps are part of a single task. The assumption, apparent in the title of the section, is that the reader’s intent is to rename both bands and change their password. However, if one is accessing the content (via search) to determine how to manipulate the bands by splitting and renaming them, as some internet-capable appliances (e.g., outlets, water heaters) require, the steps make too many assumptions that are not appropriate to the task. Furthermore, instructing the reader to “type in the new Password” suggests that the bands remain combined, accessible by a single password.

Function words might also serve a purpose that structural coherence markers cannot. Depending on the kind of function words used, they might achieve both subtlety in visual presence and emphasis in terms of their perceived importance for understanding what is written. In English, much has been written, particularly, about the use of “which” and “that” as relative pronouns that introduce restrictive and non-restrictive modifying clauses. Traditionally, “that” has been thought of as a “restrictive” (essential) modifier that defines, and “which” has been thought of as a “non-restrictive” (non-essential) modifier that does not necessarily define (Fowler & Crystal, 2009). However, this common-sense (to many) look at language use is defied by the examination of language in use, which shows both “that” and “which” used in restrictive and non-restrictive contexts (Bache & Jakobsen, 1980, p. 253). They both signal the addition of information for the nouns being modified.

A more reliable way of differentiating between restrictive and non-restrictive uses of “that” and “which” concerns their “presentation” effect. According to Bache and Jakobsen (1980), the use of a restrictive modifier is called for in situations where the receiver of the spoken or written sentence is thought to need some assistance in understanding what the addresser is talking about, versus adding supplemental information to what the receiver already likely understands. Authors might choose “that” in situations where they anticipate ambiguity or anticipate that readers will experience a lack of information (Temperley, 2003, p. 467):

A: “You need to press the button that is flashing.”

B: “You need to press the button, which is on your left.”

In both examples, “that” and “which” introduce more information and context for understanding. It is the button that is flashing (A). It is the button on your left (B). The difference is that A uses “that” to signal an essential modifier. Presumably, the information following “that” cannot be omitted from the sentence without altering its meaning. In B, “which” introduces a non-essential modifier, which presumably can be removed, although the clause arguably does disambiguate which button is meant (i.e., not the one on the right).

“That” and “which” work as pronouns whose function is to replace, but also to connect, antecedent nouns to clauses that modify them, thereby adding information that may signal a context or contexts to which the antecedent noun is connected. “That” implies that the added information is essential and defining. “That,” plus its modifying clause, adds information about a noun to distinguish it in some way from similar nouns. This function has been called “co-indexing” (Fabb, 1990, p. 58) in that it both points to the antecedent as a necessary relationship and points to a thing in the world that the restrictive modifier must also index (p. 76). “Which” may achieve the same effect, the only difference being a conditional expectation that “which” traditionally introduces non-essential information.

Hinrichs, Szmrecsanyi, and Bohmann (2015) complicate the picture of “that” and “which” by showing, via corpus analysis, how the use of “that” has increased over time, relative to the use of “which,” and that the increase is not attributable only to the increasing complexity of writing or to the intrusive influence of automated grammar checkers. Instead, their analysis of the increase in “that” attributes influence to norms of writing imposed by editors and editorial guides, like style guides (Hinrichs, Szmrecsanyi, & Bohmann, 2015, pp. 825, 828). For example: “Use ‘that,’ without a comma, to introduce a restrictive clause. Use ‘which,’ preceded by a comma, to introduce a nonrestrictive clause” (IBM, 2014). Similarly, style guides recommend using “that” and “which” to clarify content that will be translated from English to other languages (see Akis et al., 2003; Clark, 2009; Kohl, 1999). Although the association of “that” with restrictive clauses and “which” with non-restrictive clauses may not be so categorical in natural language, editorially shaped writing like technical documentation likely shows a tendency in this direction.

Given that topic-based writing is modular in its development and presentation and that it relies on strategies of minimalism, we might expect “that” and “which” to play a significant role in signaling context in topic-based writing. Further, given some editorial preference for “that” (compared to “which”) for signaling a stronger relationship to context, we might also expect to find “that” being used more often and differently than “which” for building coherence. This expectation forms the basis of the research question pursued in this paper: How does topic-based writing, delivered as web help or interactive PDF, use relative “that” and “which” clauses for context signaling? This question requires examination of a large body of topic-based writing.


Questions about writing style require a large enough body of data to see large-scale language patterns. Corpus analysis utilizes computational support from corpus linguistic software to search for language phenomena across a large corpus and to test the degree to which a language phenomenon is present in and distributed throughout that corpus. Language phenomena that are both present and well-distributed in a corpus of similar texts are arguably part of that writing style. The software then allows one to quantitatively describe the degree to which the language phenomena are present and distributed (see Brezina, 2018).

Data Collection

The question driving this study required the creation of a corpus of documentation written using methodologies of topic-based authoring. To build up this corpus, I solicited input from technical communicators who work at organizations where documentation is produced as topics.

To collect samples of documentation for the topic-based writing corpus, I reached out to practicing technical communicators through chapters of the Society for Technical Communication (21 chapters) and to directors of academic programs in technical communication (18 directors) who could pass the survey request to alumni working in the profession. I cannot know the total number of people who ultimately received the survey invitation.

I asked those who received the request which of the following described the approach that they take to writing at their places of work:

  • “I produce ‘topic-based writing’ which consists of standalone topics (i.e., content chunks) that can be reused in different contexts.”
  • “I produce ‘book-oriented writing’ (or document-oriented writing) which consists of content designed for a singular use and context of delivery (e.g., a user manual).”

Of 35 responses, 49% (17) responded that they produce “topic-based writing” (TBW), 34% (12) said they produce “book-based writing” (BBW), and 17% (6) replied that they produce both kinds. The respondents represent a range of professional sectors:

  • IT (including software/hardware design and testing): 20
  • Business and Financial Services: 4
  • Education and Training: 2
  • Community and Social Services: 2
  • Medical and Healthcare: 1
  • Legal: 1
  • Architecture and Engineering: 1
  • Other: 12

The survey allowed participants to select more than one option.

As I requested, all respondents directed me to at least one published example of documentation that they produce. All of the samples of documentation were publicly available on company websites. Although the resulting corpus essentially reflects a convenience sample of documentation, and although I cannot account for differences in the respondents’ experience as writers, all of the samples included in the corpus represent topic-based writing that was of a quality high enough to pass editorial review at their companies. Furthermore, because topic-based documentation is often written in teams, the relative experience or lack of experience for any one respondent is unlikely to skew the analysis of language features.

All samples of documentation were downloaded and arranged into a folder holding the corpus. To reduce the chance of a text sampling bias (see Brezina, 2018, p. 16), I selected whole documentation sets (i.e., all topics included in an outputted set of documentation) into the TBW corpus. In this way, analysis of the corpus is equally likely to draw from the beginning, middle, or end of a documentation set and equally likely to draw from topics/sections concerning different kinds of user interactions (e.g., installation, account setup).

One hundred and twenty-three full documentation sets are included in the corpus. Although the documentation sets come from 15 different companies, it is possible that some of the topics overlap, meaning that some exact phrasing choices may be duplicated in the full corpus. I did not attempt to control for that possibility, so it may be considered a limitation of the corpus. The resulting TBW corpus consists of 1,342 files; 6,519,854 tokens; and 134,121 distinct word types.

Data Processing and Analysis

The files were analyzed using the free corpus analysis software, Lancsbox, distributed by Lancaster University (Brezina, Weill-Tessier, & McEnery, 2020). Lancsbox supports most basic visual and statistical analyses of corpora. I utilized the part-of-speech (POS) tagging in Lancsbox, which marked all tokens in the corpus files with a part of speech tag using the Penn Treebank POS markers.

Lancsbox supports comparative searching based on POS tag. That search yielded a list of determiners, including “that” and “which.” The majority of “that” and “which” instances were tagged as “WDT”: wh-determiners. However, subsequent analysis shows that most of these WDT uses of “that” and “which” are as relative pronouns, which is what the analysis in this paper focuses on. By default, in Lancsbox, the list of positive search results also showed a context. I set the context parameters to show 30 words to the left and right of “that” and “which,” which is enough to establish the immediate context for the word.
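For readers unfamiliar with concordance views, the following Python sketch illustrates the general idea of a keyword-in-context (KWIC) listing with a 30-word window on either side of the search word. It is only an approximation under stated assumptions: the function name kwic and the simple regular-expression tokenizer are hypothetical, the sketch does no part-of-speech filtering, and it is not how Lancsbox itself is implemented.

```python
import re

def kwic(text, targets=("that", "which"), window=30):
    """Return (left context, node word, right context) for each target word.

    Hypothetical helper for illustration; tokenization is a rough
    approximation and no POS tags are consulted.
    """
    # Treat runs of letters, digits, apostrophes, and hyphens as tokens.
    tokens = re.findall(r"[A-Za-z0-9'’-]+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() in targets:
            left = tokens[max(0, i - window):i]       # up to `window` words before
            right = tokens[i + 1:i + 1 + window]      # up to `window` words after
            lines.append((" ".join(left), tok, " ".join(right)))
    return lines

# Example sentence taken from the corpus excerpts quoted in this article.
sample = "Specify the hosts or networks for which access is denied."
for left, node, right in kwic(sample, window=5):
    print(f"{left} [{node}] {right}")
```

A listing like this surfaces every occurrence of the word form, so separating relative-pronoun uses from other uses of “that” would still require POS tags or manual review.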

The main research question asks how relative “that” and “which” clauses are used in the TBW corpus. Beyond the quantitative measures of rate of use and dispersion, the question asks whether the “that” and “which” clauses are used differently in the TBW corpus.

To facilitate the qualitative analysis, I drew a random 5% sample of all relative “that” and “which” clauses in the TBW corpus. After isolating relative “that” and “which” clauses, I highlighted the head noun (i.e., the noun being modified) as well as the modifying clause. For each, I applied one of the following codes:

  • Topic: code a modifying clause as “topic” if it clarifies, specifies, or renames the head noun, often by reference to content immediately preceding the modified noun or noun phrase.
  • Context: code a modifying clause as “context” if it adds information to the head noun by connecting it with new topics. The modifier points readers to additional actors (e.g., functions, systems, hardware) or actions (e.g., validate, verify) that are not described or derived from content to the left of the head noun.

These codes arose out of consideration of the kind of information that might be added to a topic, and so I focused on the work of “that” and “which,” used as relative pronouns. “Topic” information would be confined to the topic at hand and would leave the topic cohesive without pointing readers elsewhere in the documentation set. “Context” information would acknowledge that there are additional topics that a qualified reader should either know or reference. The codes were verified with a second coder, resulting in 88.6% simple reliability, corrected with Cohen’s Kappa to 0.77, indicating substantial agreement (Landis & Koch, 1977).
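To make the relationship between simple agreement and Cohen’s Kappa concrete, the sketch below computes both for two coders who each assign “topic” or “context” to the same items. The function name and the ten-item label lists are hypothetical illustrations, not the study’s coding data; the calculation follows the standard definition of kappa as observed agreement corrected for the agreement expected by chance.

```python
from collections import Counter

def agreement_and_kappa(coder_a, coder_b):
    """Return (simple agreement, Cohen's kappa) for two equal-length label lists."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of items.")
    n = len(coder_a)
    # Observed agreement: share of items given the same code by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal proportions, summed over codes.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1:
        return observed, 1.0  # both coders used one identical code throughout
    return observed, (observed - expected) / (1 - expected)

# Hypothetical labels for ten clauses (not data from the study).
coder_a = ["topic"] * 5 + ["context"] * 5
coder_b = ["topic"] * 4 + ["context"] * 5 + ["topic"]

observed, kappa = agreement_and_kappa(coder_a, coder_b)
print(f"simple agreement = {observed:.2f}, kappa = {kappa:.2f}")
```

Because kappa discounts agreement that two coders would reach by chance, it is always at or below the simple agreement rate, which is why the 88.6% figure corresponds to a lower kappa value.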


Quantitative analysis of the corpus shows that writers of topics use “that” more frequently than “which.” Writers used “that” 25,788 times (0.4% of the corpus) and “which” only 9,439 (0.1% of the corpus). The use of both was somewhat dispersed throughout the corpus with “that” appearing in 993 of 1,342 files (73.9%) and “which” appearing in 791 of 1,342 files (58.9%). These frequencies and dispersions are difficult to explain without additional research, but they appear to show that writers were choosing essential modification more commonly than non-essential modification. The framework pursued in this study sought to understand a potential role that “that” might play in providing a sense of rhetorical context that could help readers of topics become “qualified readers” (Baker, 2013). The coding of modifying clauses following “that” and “which” reveals some patterns of difference.
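The two measures reported here, relative frequency and dispersion across files, are simple ratios. The following Python sketch recomputes them from the counts given in the text; the function and variable names are hypothetical, and the per-million normalization is a common corpus-linguistic convention added for illustration rather than a figure reported in the study.

```python
# Counts as reported in the text; all names below are hypothetical.
TOTAL_TOKENS = 6_519_854
TOTAL_FILES = 1_342
REPORTED = {
    "that": {"hits": 25_788, "files": 993},
    "which": {"hits": 9_439, "files": 791},
}

def relative_frequency(hits, total_tokens, per=100):
    """Occurrences per `per` tokens (per=100 yields a percentage)."""
    return per * hits / total_tokens

def range_dispersion(files_with_hit, total_files):
    """Percentage of corpus files that contain at least one occurrence."""
    return 100 * files_with_hit / total_files

for word, counts in REPORTED.items():
    pct = relative_frequency(counts["hits"], TOTAL_TOKENS)
    per_million = relative_frequency(counts["hits"], TOTAL_TOKENS, per=1_000_000)
    spread = range_dispersion(counts["files"], TOTAL_FILES)
    print(f"{word}: {pct:.2f}% of tokens, {per_million:.0f} per million, "
          f"in {spread:.1f}% of files")
```

Recomputed values can differ from published figures in the last digit, depending on whether a value was rounded or truncated.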

The results of coding a random sample of 300 relative “that” and 300 “which” clauses from the TBW corpus show that when topic-based writers used “that,” the information following was slightly more likely to be “context” information (157 instances, 52.3%) than “topic” information (143 instances, 47.7%). When the writers used “which,” the difference was more pronounced, and in the other direction, with “context” information being referenced 106 times (35.3%) and topic information being referenced 194 times (64.7%). The finding suggests that uses of “which” are more strongly associated with additional information that is found within the topic whereas uses of “that” are more likely to point to information that is not within the topic. These “context” references pointed to actors and functions found elsewhere in the documentation.
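The percentages above follow directly from the coded counts. As a minimal sketch, the Python below tabulates the share of each code per pronoun and the gap between the two “context” shares; the dictionary layout and function name are hypothetical, and the sketch performs no significance testing.

```python
# Coded counts as reported in the text; names are hypothetical.
CODED = {
    "that": {"context": 157, "topic": 143},
    "which": {"context": 106, "topic": 194},
}

def code_shares(coded):
    """Convert raw code counts into percentages within each pronoun's sample."""
    shares = {}
    for pronoun, counts in coded.items():
        total = sum(counts.values())
        shares[pronoun] = {code: 100 * n / total for code, n in counts.items()}
    return shares

shares = code_shares(CODED)
for pronoun, by_code in shares.items():
    print(pronoun, {code: round(pct, 1) for code, pct in by_code.items()})

# Difference between the two pronouns in how often the clause adds "context".
gap = shares["that"]["context"] - shares["which"]["context"]
print(f"context share gap: {gap:.1f} percentage points")
```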

Overall, the qualitative coding of modifying clauses associated with “that” suggests that writers of topic-based documentation are using the modifying clauses to introduce or reference contextual and topic information that is germane to the noun being modified.


The coding of “that” and “which” suggests that the information writers deemed to be important, but potentially unknown, to readers is contextual in scope slightly more often than it is topical in scope. When writers conventionally signal a nonessential modifier using “which,” the additional information is more likely to be in the topic. This finding suggests that writers may use “that” for building coherence across topics, by pointing to content that is outside of, but important context for, the topic. The discussion that follows will focus on elaborating some examples of both kinds of modification to show what kinds of context are revealed and how.

Topic-Referring Modifiers

The first function for relative “that” and “which” clauses can be described as “topic-referring” modification. These are modifying references that complete topics by referring readers to content that has already been discussed within the topic.

On one hand are modifying clauses that merely add what appears to be non-essential information to the topic. These are clauses that are commonly, but not exclusively, introduced by “which”:

  • “Deny: Specify the hosts or networks for which access is denied” (Hitachi, 2019b).
  • “All other marks not owned by us that appear herein” (Vernier, 2019).
  • “The IP address of the client application that made the request” (NetApp, 2019b).
  • “ … savings to filter out opportunities that have less than that amount of savings. Select one or more departments by which to filter opportunities” (Strata Decision Technology, 2019).

In the above examples, and in others like them, the modifying clauses add small amounts of information to the head nouns or noun phrases that are modified. In some cases, the modifying information is inferable from the content that immediately preceded it. For example, “to filter out opportunities that have less than that amount of savings” directly follows the heading “Identified Savings Threshold,” which establishes that the section is about savings thresholds. Other examples add distinctly nonessential information, such as indicating that a dialogue box will appear.

The expectation, apparently, is that what readers need to know about the noun being modified is already known. The modifying information appears to offer minimal improvement to coherence across topics, perhaps in some cases just to differentiate similar pieces of information that might be visible on the screen. An example is a tip that “You can filter the list of connections by displaying only connections that are selected (Checked) or by the status of the connection” (Tanium, 2019a, p. 33).

The more interesting topic-referring modifiers are those that include references to other actors, functions, subsystems, settings, peripherals, and the like that are important for understanding the topics where they appear and are explained within the same topic. Often, these references point to content that is adjacent to the modifying clause. Examples include:

  • “Click the DR [disaster recovery] plan associated with the virtual machine. The DR plan details page opens. 6. In the VMs tab, select the virtual machine(s) for which you want to enable the disaster recovery service” (Druva, 2019c).
  • “ … for File Services Manager to monitor quotas. The initial setting is no quota monitoring. If you omit this option, the current setting information applies monitoring-time [, monitoring-time…] Specify the times at which File Services Manager monitors quotas” (Hitachi, 2019c).
  • “Create a Profile which has only Cloud Apps enabled and settings configured in it” (Druva, 2019a).
  • “[Product] certifies backup and restore of databases that are created and managed using SQL Server 2017” (Druva, 2019b).

The modifying clauses in these examples create coherence by creating connections among user actions, system actions, and other actors and functions that are already mentioned in the topic. In the example above, from Hitachi, the clause “which File Services Manager monitors quotas” connects the times to the actor “File Services Manager” and to the function “monitors,” both of which were pieces of information established earlier in the same topic. The information a qualified reader needs is within the limited contextual scope of the topic. As self-contained topics, these can be more readily re-used in other output formats.

Here is another example to illustrate:


“Shows a table of the fields used in the report. Drilldown reports (field level) Shows a table of the reports in the solution set that are associated with the fields in the report” (Hitachi, 2019a).

“That” is being used to introduce a clause that adds information about the “table of reports,” specifying that they are the ones that “are associated with the fields in the report.” The “report” mentioned in this modifying clause is the subject of the section where this information appears. Even readers who have accessed this topic without reading through related topics will have enough background knowledge from reading the topic to know which report is being referenced.

The types of topic-referring clauses discussed above account for the majority of topic-modifying clauses found in the sample drawn from the TBW corpus. In two-thirds of samples analyzed, the topic-referring clauses were preceded by the use of “which.” In topic-referring modifying clauses, readers are not pointed very far away from the head noun or noun phrase to find the information that the author considered important for understanding the topic at hand.

Context-Referring Modifiers

A second function for relative “that” and “which” clauses is “context-referring.” These are uses that modify content in a topic with references to actors, actions, functions, settings, peripherals, and the like that are not present in the topic being read but are instead located elsewhere in the documentation and make up the broader context that makes a given topic coherent. The modifiers are presented as if the content is both 1) not assumed to be well enough known to go unsaid (Bache & Jakobsen, 1980), and 2) essential to understanding the broader set of topics that the current topic connects with. Some examples:

  • “When you upgrade a Citrix Virtual Apps and Desktops deployment:
      o If you upgrade from a version that did not support CEIP, you are asked if you want to participate” (Citrix, 2019, p. 1030).

  • “Endpoint Count Select the maximum number of endpoints expected, including endpoints that connect to the Zone Server(s)” (Tanium, 2019b, p. 8).
  • “If you make schema changes to the APIs that were created ground up by you in TIBCO Business Studio for BusinessWorks, the Swagger for such APIs automatically gets updated by the TIBCO Business Studio for BusinessWorks” (TIBCO, 2019, p. 37).

These examples show sometimes complex references to user actions, system actions, and actors and functions that are discussed in other topics but are connected to the head noun or noun phrase being modified. For example, the last item in the list above draws the reader’s attention to APIs that may have previously been created by the TIBCO Business Studio and asks readers to consider whether those APIs have been changed. Qualified readers (some at least) will be required to know something about those APIs. Similarly, the first two bulleted examples specify actors that either clarify the Citrix Virtual Apps and Desktops version to be upgraded (item #1) or are required by any Zone Server or Servers (item #2). Each of these references (i.e., CEIP, Zone Servers, APIs) is a topic in its own right. Often, the content referenced was in entirely different topic files, requiring readers to navigate to that content if they need it, to make themselves the qualified readers the writing presumed them to be.

A qualified reader or one who is seeking to become qualified, as presupposed by the following passage, can recognize the implicit reference to additional information and store it away or pursue it:

“You must configure a number of settings before the Archive Node can communicate with an external archival storage system that connects to the StorageGRID system through the S3 API” (NetApp, 2019a).

The “that” clause modifies “external archival storage system,” identifying it as the one that “connects to the StorageGRID system through the S3 API.” The essential modifier indicates that readers might find additional information about connections with the StorageGRID and connections via S3 API.

Key ideas following “that” are important clues about navigation and linking topics. The clues point out of the current topic to other topics, and readers who need additional clarification about the Archive Node can look for “StorageGRID system” and “S3 API” but also, importantly, the term “connect,” which appears in the topic title “Configuring connection settings for S3 API.” Another example:

“To enable user mapping using LDAP, create a schema file that defines attributes and object classes recognized by the LDAP server configured by using OpenLDAP” (Hitachi, 2019e, p. 3).

The modifying clause does the same thing as in previous examples. It is pointing to additional information, such as “attributes and object classes” and to the “LDAP server” and “OpenLDAP” by implied/reduced relative modification (i.e., [that are] recognized by the LDAP server [that is] configured using OpenLDAP). These “attributes and object classes” represent a direct reference to a different topic. The “LDAP server” points not just to different topics, but to different product documentation entirely (i.e., OpenLDAP) and then to the topic of object classes and attributes recognized by the system. Although this example includes reduced “that” clauses (e.g., “by the LDAP server [that is] configured by …”), such reductions are more common in informal writing (Carter & McCarthy, 2006, p. 387). In formal technical communication, including “that” is recommended (see Kohl, 1999, p. 151):

“sourceKey: Specifies the Listener source secret key that identifies the Listener source feed to which SAS Event Stream Processing sends data” (SAS, 2019, p. 171).

This example shows a “that” clause which modifies a noun or actor in a topic and connects that topic with others to round out the fuller context to create coherence among topics. In this case, the topic is “sourceKey” that identifies a “Listener source feed,” which is another topic that readers are presumed to understand already. Qualified readers might come to this topic knowing what the “Listener source feed” is and only need specification about what role the “sourceKey” plays in the process. Or they know they will have to consult a topic on “Listener source feeds” to gain the knowledge they are presumed to have.

Unlike the previous examples, the information following this modifier is not helpful in guiding readers to the fuller context because they are directed to look for a “Listener source feed,” which is not specifically referenced in the navigation. It may, though, be locatable through search or an index. Such words could support readers with what Pirolli and Card (1999) called “information foraging,” searching for information or, failing at that, searching for the “information scent” or signs that are distinct and clear enough to lead readers to find the information they need.

When people read topic-based writing, they can seek out context non-linearly and as needed. For this reason, navigational assistance is important for readers, to help them find the additional topics that contribute to a broad contextual coherence. I will return to this point about navigation in the implications that follow in the conclusion.

To illustrate this point about how relative “that” clauses signal important context, consider the words immediately following, within two words of the relative pronoun. #LancsBox supports investigation of the network of words around “that” via use of graph collocation (i.e., a graph of words that are co-located with the words of interest). To generate these graph collocation networks showing the most common words clustering around “that,” I first filtered the list of clauses containing the word “that” by excluding uses of “that” with the part-of-speech tag “DT” (determiner) as well as those with the part-of-speech tag “IN/that” (“that” as subordinator), leaving only uses of “that” with the WDT (wh-determiner) part-of-speech tag, which are the uses of “that” analyzed throughout this paper (see the Penn Treebank tagset, https://www.sketchengine.eu/penn-treebank-tagset/). I then specified a word span that looked two words to the right of each use of “that.” To filter out most of the words that might appear only a few times, I set a threshold value of at least 50 collocations. To choose which collocated words to include in the graph collocation below (Figure 1), I used LogRatio to determine how much more likely a word would be to connect with “that” compared to other words in the corpus. LogRatio shows collocations that are more prevalent than chance (see Hardie, 2014). To simplify the display of data, I set the LogRatio value to 5.0 or higher (i.e., at least 32 times more likely to occur than chance). Figure 1 shows a collocation graph for “that” in the TBW corpus.
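The filtering steps described above can be sketched in Python. This is a rough illustration under stated assumptions, not the procedure #LancsBox itself runs: it assumes the corpus is available as a list of hypothetical (word, tag) pairs using Penn Treebank tags, it treats LogRatio as the binary log of a word's relative frequency inside the two-word window to the right of relative “that” divided by its relative frequency elsewhere, and it substitutes 0.5 for a zero count outside the window so the ratio stays defined. All function names are invented for the example.

```python
import math
from collections import Counter


def log_ratio(freq_in, size_in, freq_out, size_out):
    """Binary log of the ratio of two relative frequencies."""
    return math.log2((freq_in / size_in) / (freq_out / size_out))


def right_window_counts(tagged_tokens, node="that", node_tag="WDT", span=2):
    """Count words up to `span` positions right of each relative node word."""
    positions = set()
    for i, (word, tag) in enumerate(tagged_tokens):
        # Keep only "that" tagged WDT; DT and IN uses are skipped.
        if word.lower() == node and tag == node_tag:
            positions.update(range(i + 1, min(i + 1 + span, len(tagged_tokens))))
    counts = Counter(tagged_tokens[j][0].lower() for j in positions)
    return counts, len(positions)


def score_collocates(tagged_tokens, min_freq=50, min_log_ratio=5.0):
    """Return {word: LogRatio} for collocates that pass both thresholds."""
    window, window_size = right_window_counts(tagged_tokens)
    outside_size = len(tagged_tokens) - window_size
    if window_size == 0 or outside_size == 0:
        return {}
    totals = Counter(word.lower() for word, _ in tagged_tokens)
    scores = {}
    for word, freq in window.items():
        if freq < min_freq:
            continue
        # Assumption: replace a zero outside-window count with 0.5.
        outside = (totals[word] - freq) or 0.5
        score = log_ratio(freq, window_size, outside, outside_size)
        if score >= min_log_ratio:
            scores[word] = score
    return scores


# Toy data (hypothetical); thresholds are lowered so something passes.
toy = [("file", "NN"), ("that", "WDT"), ("contains", "VBZ"),
       ("data", "NNS"), ("that", "DT"), ("file", "NN")]
toy_scores = score_collocates(toy, min_freq=1, min_log_ratio=0.0)
```

Because the score is a base-2 logarithm, each additional point doubles the ratio, which is why a cutoff of 5.0 corresponds to a ratio of 2^5 = 32.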

We see strong collocations with verbs indicating generic actions (e.g., contains, makes, appear). However, we also see active verbs that users can carry out, words that readers may recognize as describing their intended actions or as concrete system actions and then look for those topics in the navigation. Verbs such as “correspond(s),” “defines,” “matches,” “stores,” and “runs” point to task-oriented information that one might readily use to organize a set of topics. These words represent the context to which unqualified readers might be directed. The more concrete that information is, the easier it may be for readers to get to those topics. From these examples of documentation, we can see some implications for writers of topic-based documentation.


Although this analysis focuses on the contribution of “that” and “which” to a topic-based writing style, these relative pronouns are not likely to be the only language choices that contribute to our understanding of how writers of topics help their readers become qualified readers. An analysis with this focus is, however, intended as a step toward that broader awareness of how language choices may help readers utilize topic-based documentation. The findings reported in this paper suggest that there may be rhetorical decisions that writers should be aware of and consider more deliberately when attempting to signal important pieces of context for their readers.

The first rhetorical decision is to choose more deliberately what kinds of pronominal references to use in documentation. For users of topic-based documentation, who may access topics non-linearly, via search, writers should recognize that relative “that” and “which” clauses are perhaps helpfully used as context signals, directing reader attention across and into other topics without being obtrusive about the references in a way that would limit the reusability of the topic content.

The second rhetorical decision is related to the first. Writers should pay attention to the content following uses of “that” and “which.” Nouns that signal important context can also support navigation. When readers recognize their lagging qualifications for understanding presumed context, they may seek out signals about what that context might be and look for navigational aids to find related content. Likewise, choosing vivid action verbs that point to distinct, searchable user and system actions might also aid in navigating to topics in the broader context. We may be advised, given the data here, to attend more deliberately to our word choice in modifying “that” and “which” clauses.

The third rhetorical decision builds on the previous two and suggests that in TBW, where every topic must stand alone, writers need to see an obligation to their readers to help them envision the larger context. Subtle signals about related concepts and tasks might be helpful, but readers would likely benefit from more guidance to pick up on the information scent. Writers might be well advised to choose words that are not only concrete but also specifically reflected in the navigation structures and metadata, allowing readers to navigate their way through topic-based documentation.

The potential importance of topic and navigation labeling, as a complement to context signaling, suggests that usability research may be in order. We do not know much about the user experience of topic-based writing, but the potential exists for readers to see topics as more isolated and acontextual than they are. Signals, such as those introduced by the relative pronouns “that” and “which,” when used deliberately, may have a cumulatively positive effect on how readers understand what is expected of them and where to find information that they might need.


References

Akis, J. W., Brucker, S., Chapman, V., Ethington, L., Kuhns, B., & Schemenaur, P. (2003). Authoring translation-ready documents: Is software the answer? Proceedings of the 21st Annual International Conference on Documentation, 39–44. https://doi.org/10.1145/944868.944878

Ament, K. (2002). Single sourcing: Building modular documentation. William Andrew.

Andersen, R. (2013). Rhetorical work in the age of content management: Implications for the field of technical communication. Journal of Business and Technical Communication, 28(2), 115–157. 1050651913513904

Andersen, R., & Batova, T. (2015). The current state of component content management: An integrative literature review. IEEE Transactions on Professional Communication, 58(3), 247–270.

Bache, C., & Jakobsen, L. K. (1980). On the distinction between restrictive and non-restrictive relative clauses in modern English. Lingua, 52(3), 243–267. https://doi.org/10.1016/0024-3841(80)90036-4

Baker, M. (2011, June 9). What is a topic? What does standalone mean? Every Page Is Page One. https://everypageispageone.com/2011/06/08/what-is-a-topic-what-does-standalone-mean/

Baker, M. (2013). Every page is page one: Topic-based writing for technical communication and the web. XML Press.

Batova, T. (2014). Component content management and quality of information products for global audiences: An integrative literature review. IEEE Transactions on Professional Communication, 57(4), 325–339.

Bellamy, L., Carey, M., & Schlotfeldt, J. (2012). DITA best practices: A roadmap for writing, editing, and architecting in DITA. IBM Press.

Brezina, V. (2018). Statistics in corpus linguistics: A practical guide. Cambridge University Press.

Brezina, V., Weill-Tessier, P., & McEnery, A. (2020). #LancsBox v. 5.x. http://corpora.lancs.ac.uk/lancsbox

Brown, G., & Yule, G. (1983). Discourse Analysis. Cambridge University Press.

Carroll, J. M. (1990). The Nurnberg funnel: Designing minimalist instruction for practical computer skill. The MIT Press.

Carter, R., & McCarthy, M. (2006). Cambridge grammar of English: A comprehensive guide; spoken and written English grammar and usage. Cambridge UP.

Citrix. (2019). Citrix virtual apps and desktops. Citrix.

Clark, K. (2009). Elements of style for machine translation. Multilingual writing for translation, getting started: Guide, October/November 2009.

Druva. (2019a). About using SCIM for user management in Druva InSync. InSync User Documentation. Druva.

Druva. (2019b). Archived release notes and fixed issues. Phoenix User Documentation. Druva.

Druva. (2019c). Manage disaster recovery plan. Phoenix User Documentation. Druva.

Eble, M. F. (2003). Content vs. product: The effects of single sourcing on the teaching of technical communication. Technical Communication, 50(3), 344.

Evia, C. (2019). Creating intelligent content with lightweight DITA. Routledge.

Fabb, N. (1990). The difference between English restrictive and nonrestrictive relative clauses. Journal of Linguistics, 26(1), 57–77. https://doi.org/10.1017/S0022226700014420

Flanagan, S. (2015). Intelligent Content Editing: A Prototype Theory for Managing Digital Content. International Journal of Sociotechnology and Knowledge Development (IJSKD), 7(4), 53–57. https://doi.org/10.4018/IJSKD.2015100104

Fowler, H. W., & Crystal, D. (2009). A dictionary of modern English usage. Oxford University Press.

Geisler, C., & Swarts, J. (2019). Coding streams of language: Techniques for the systematic coding of text, talk, and other verbal data. University Press of Colorado. https://wac.colostate.edu/books/practice/codingstreams/

Gillespie, R. (2017). DITA and topic-based writing: Flip sides of the same coin? CIDM. https://www.linkedin.com/pulse/dita-topic-based-writing-flip-sides-same-coin-rob-gillespie/

Halliday, M. A. K., & Matthiessen, C. M. I. M. (2004). An introduction to functional grammar (3rd ed.). Arnold Publishers.

Hardie, A. (2014). Log Ratio: An informal introduction. ESRC Centre for Corpus Approaches to Social Science (CASS). http://cass.lancs.ac.uk/log-ratio-an-informal-introduction/

Hinrichs, L., Szmrecsanyi, B., & Bohmann, A. (2015). Which-hunting and the Standard English relative clause. Language, 91(4), 806–836. https://doi.org/10.1353/lan.2015.0062

Hitachi. (2019a). Hitachi command suite tuning manager 8.6. Hitachi Group.

Hitachi. (2019b). Hitachi data ingestor 6.4.5-01. Hitachi Group.

Hitachi. (2019c). Hitachi data ingestor cli administrators guide. Hitachi Group.

Hitachi. (2019d). Hitachi data ingestor cluster administrator’s guide 6.4.5-01. Hitachi Group.

Hitachi. (2019e). Hitachi data ingestor installation guide 6.4.6-02. Hitachi Group.

Horn, R. E., Nicol, E. H., Kleinman, J. C., & Grace, M. G. (1969). Information mapping for learning and reference. Information Resources Inc.

IBM. (2014, May 31). DeveloperWorks editorial style guide. http://www.ibm.com/developerworks/library/styleguidelines/index.html

Kantner, L., Shroyer, R., & Rosenbaum, S. (2002). Structured heuristic evaluation of online documentation. Proceedings IEEE International Professional Communication Conference, 331–342.

Kohl, J. R. (1999). Improving translatability and readability with syntactic cues. Technical Communication, 46(2), 149–166.

Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.

Motorola. (2020). Motorola mg7550 router documentation. Motorola.

NetApp. (2019a). Performing system administration. NetApp.

NetApp. (2019b). Understanding audit messages. NetApp.

Pennebaker, J. W. (2011). The secret life of pronouns: What our words say about us. Bloomsbury Press.

Pirolli, P., & Card, S. (1999). Information foraging. Psychological Review, 106(4), 643–675. https://doi.org/10.1037/0033-295X.106.4.643

Priestley, M., Hargis, G., & Carpenter, S. (2001). DITA: An XML-based technical documentation authoring and publishing architecture. Technical Communication, 48(3), 352–367.

Rockley, A. (2001). The impact of single sourcing and technology. Technical Communication, 48(2), 189–193.

Rockley, A., Manning, S., & Cooper, C. (2009). DITA 101: Fundamentals of DITA for authors and managers. The Rockley Group.

Samuels, J. (2013). Getting started with topic-based writing. TechWhirl. https://techwhirl.com/getting-started-with-topic-based-writing/

Sanders, T. J., & Noordman, L. G. (2000). The role of coherence relations and their linguistic markers in text processing. Discourse Processes, 29(1), 37–60.

Sanders, T., Land, J., & Mulder, G. (2007). Linguistics markers of coherence improve text comprehension in functional contexts. Information Design Journal, 15(3), 219–235.

SAS. (2019). SAS® event stream processing 6.1: Connectors and adapters. SAS Institute.

Scott, M. (1997). PC analysis of key words—And key key words. System, 25(2), 233–245.

Spyridakis, J. H. (1989). Signaling effects: Increased content retention and new answers—part II. Journal of Technical Writing and Communication, 19(4), 395–415. https://doi.org/10.2190/493Q-703B-JBVD-E0T9

Strata Decision Technology. (2019). StrataJazz managing staffing and pay practices opportunities. Strata Decision Technology.

Stubbs, M. (1996). Text and corpus analysis: Computer-assisted studies of language and culture. Blackwell Publishing.

Tanium. (2019a). Tanium Connect User Guide 4.11.1. Tanium.

Tanium. (2019b). Tanium™ IaaS cloud solution deployment guide for Microsoft Azure. Tanium.

Temperley, D. (2003). Ambiguity avoidance in English relative clauses. Language, 79(3), 464–484.

TIBCO. (2019). TIBCO REST Implementation. TIBCO.

Vernier. (2019). Centripetal Force Apparatus. Vernier.

Wachter-Boettcher, S. (2012). Content everywhere: Strategy and structure for future-ready content. Rosenfeld Media.

Williams, J. M. (1997). Style: Ten lessons in clarity and grace (5th ed.). Addison Wesley.


Jason Swarts is a professor of technical communication in the English department at North Carolina State University. His research focuses on technological mediation of writing practices, the rhetoric of technology, workplace communication, and emerging genres of technical communication. His work has appeared in Technical Communication Quarterly, the Journal of Technical Writing and Communication, and Technical Communication. Inquiries can be sent to him at jason_swarts@ncsu.edu.