Maintain Your Wiki Garden: A Cost-Free Method for Archiving Wiki Pages in Confluence

By Richard Rabil Jr. | STC Member

I work at a company that loves wiki technology a little too much.

I realize that sounds strange. Isn’t having an active wiki user base a good problem to have?

Certainly, it’s wonderful that so many people in our company embrace our instance of Atlassian Confluence. Our users actively create and collaborate on pages, contribute valuable knowledge, read the structured guides we wrote, post comments, and answer each other’s questions.

At the same time, our wiki has grown so large that it has been called the Tower of Babel. There are thousands of pages, and if you don’t know exactly what terms to use, running a search quickly feels overwhelming. Many pages are redundant or outdated (often both), leading to diluted search results. Not only does this cause our users frustration, it also prompts them to start new silos of information, which of course only compounds the problem.

The primary way we addressed this problem was by defining a method for page archiving. Below, I review the challenges associated with page archiving, followed by the automated solutions we explored. I then focus on how you can implement a reliable archiving method using Confluence’s out-of-the-box functionality—a method that won’t require you to invest money in any third-party plugins. Note that I use terms and concepts (such as “spaces,” “macros,” and “plugins”) that assume a foundational understanding of the features of Confluence.

Challenges with Archiving

It seems that archiving pages in Confluence should be simple. Why can’t you just query a space, find all the old pages, and either delete them in bulk or move them to another space that no one uses?

Confluence does provide some easy solutions for this. For example, you can create and designate an archive space, and generate lists of wiki pages and sort them by timestamp. But, as I discuss below, these solutions only go so far in helping you identify exactly which pages can be archived. The fact is that some wiki pages are very old, yet still receive frequent pageviews.

The other challenge is time. Assessing the quality of a wiki page is laborious and subjective. Arguably the best person for the job is the page author, but what if that person has left the company? Even if the author is still around, he or she may have created hundreds of pages. And few people have the time and patience to go back through that much information to determine what to deprecate and what to keep.

Automated Methods: Pros and Cons

Ideally, you could automatically archive old wiki pages based on a combination of factors such as a page’s age (when it was last updated) and relevance (how often it has been viewed). The best solution that my team found for this is a third-party tool called the Archiving Plugin for Confluence. Developed by Midori Global Consulting, this plugin provides a range of benefits, a few of which are:

  • Running reports that show you the quality of wiki pages in any given space. Quality is determined through a combination of the page’s age and how often it has been viewed within a configurable date range.
  • Sending bulk email notifications to authors whose pages have “expired” in terms of quality. This gives the authors a chance to update, delete, or archive the page.
  • Bulk archiving pages based on configurable criteria and sending automatic email notifications to the authors in case they need to restore the pages.
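
To make the idea of combining age and relevance concrete, here is a minimal Python sketch of such a rule. It is not the plugin's actual algorithm; the `PageStats` record, the thresholds, and the sample pages are all hypothetical, and the view counts are assumed to come from whatever analytics you have available.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PageStats:
    # Hypothetical record; Confluence does not hand you this structure directly.
    title: str
    last_updated: date
    views_in_window: int  # page views within the reporting window


def is_archive_candidate(page, today, max_age_days=365, min_views=5):
    """Flag a page only when it is both old and rarely viewed."""
    age_days = (today - page.last_updated).days
    return age_days > max_age_days and page.views_in_window < min_views


pages = [
    PageStats("Release checklist", date(2015, 1, 10), 240),  # old but popular
    PageStats("Offsite agenda", date(2015, 3, 2), 1),        # old and unused
    PageStats("Onboarding guide", date(2017, 5, 20), 3),     # recently updated
]
today = date(2017, 6, 1)

candidates = [p.title for p in pages if is_archive_candidate(p, today)]
print(candidates)
```

Requiring both conditions keeps old pages that still receive frequent pageviews out of the candidate list.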

However, the plugin is fairly costly. For a company of 500 or more users, a license will cost you about $3,000 a year. Moreover, there is a learning curve and an opportunity cost: you could end up spending substantial time as an administrator of the plugin and lose your focus on other valuable writing tasks.

I would argue that the return on investment outweighs these costs, but if your company is anything like mine, approvals for third-party software take time, and the budget is tight. In fact, your company may already be paying for multiple third-party plugins, making it all the more difficult to decide which ones justify the investment.

A Reliable Manual Method

The good news is that you can still develop a reliable manual method for archiving wiki pages that doesn’t cost you extra money. What I describe below is certainly not a perfect solution, but it is systematic and intuitive, and my team has implemented it with great success.

Step 1: Create an archive space and update the global space header.

In Confluence, you can easily create a special archive space and move outdated pages into it. To designate the space as an archive, go into the space settings and change the space’s status to Archived. As a result, pages within the space no longer appear in search results and activity feeds. This reduces the noise in the wiki while still giving you the ability to reference or restore the archived pages if necessary.

Additionally, you can display a prominent message in the global header of the space to cue readers about the space’s purpose. For example, the header could say: “This is an archived page. The content is outdated and should not be trusted.” You can add this message in the space’s settings as well, and you can even style it using standard wiki markup so that it displays more effectively. For guidance on this step (and the one before it), consult the Confluence online help.

Step 2: Create a single level of subdirectories within the archive.

After creating an archive space, you may find that it soon becomes filled with thousands of pages in no particular order. In our case, the archive space got so large that there was a major performance lag whenever someone tried to visit the space. A related problem emerged whenever a user needed to return to the space and restore a page, but couldn’t remember the page’s name. That meant we had to browse for it, which was all the more difficult due to the performance issue and the haphazard page structure.

To alleviate these difficulties, I recommend creating a single set of subdirectories within the space based on the major organizations in your company. When my team did this, the result was an alphabetically sorted list of sub-archives that looked like the following:

  • Archive Home
    • Analytics Archive
    • Client Success Archive
    • Engineering Archive
    • Human Resources Archive
    • Marketing Archive
    • Product Management Archive
    • Regulatory Archive
    • Sales Archive

With this structure in place, users could move an archived page into a more meaningful subsection and have a somewhat easier time restoring it if they needed to. For example, if you worked on the human resources team, you would move your wiki page under the Human Resources Archive subdirectory, and could start by looking there if you ever needed to resurrect it.
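
If you script any part of this, the first layer can be treated as a fixed lookup. The following Python sketch is only an illustration: the function names are hypothetical, and the fallback to Archive Home for an unlisted organization is an assumption, not something our process prescribed.

```python
# Organizations taken from the list above; the order here is deliberately unsorted.
ORGANIZATIONS = [
    "Sales", "Engineering", "Human Resources", "Analytics",
    "Marketing", "Regulatory", "Client Success", "Product Management",
]


def sub_archive_titles(orgs):
    """Return the first-layer archive page titles, sorted alphabetically."""
    return sorted(f"{org} Archive" for org in orgs)


def archive_parent_for(org, orgs):
    """Pick the parent page for an archived page (hypothetical helper)."""
    title = f"{org} Archive"
    # Assumption: pages from an unlisted organization go under the archive home page.
    return title if title in sub_archive_titles(orgs) else "Archive Home"


print(sub_archive_titles(ORGANIZATIONS))
print(archive_parent_for("Human Resources", ORGANIZATIONS))
```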

If you go this route, make sure you commit to it. We allowed users to create whatever structure they wanted under the second layer, but we required the first layer to remain intact and had to monitor it to keep it clean. We did not create any additional layers, knowing that too much structure would complicate matters.

Again, this is not a perfect solution, but arguably it’s better than nothing if your archive space expands so much that it starts having performance issues. And it’s prudent to consider building this layer ahead of time, since organizing pages ad hoc is time-consuming and error-prone.

Step 3: Identify pages to archive.

One of the more complicated aspects of the archiving process is identifying exactly which pages can be archived. In some cases, it’s easy. You will be familiar with your own pages, or the pages created by your teammates, and will thus have a good sense of what can be removed. But inevitably you will come across many other pages of dubious quality. You will also feel the urge to archive sets of outdated pages in bulk. What can you do in these cases?

Share the page with the page author. If you’re uncertain as to a page’s relevance, share it with the page author and ask if it can be archived. If the author is not around, ask someone on his or her team. You might be surprised at how quickly they respond.

Use Confluence’s content macros. You can use the Content by Labels and Content Report Table macros (both come packaged with Confluence) to generate page lists and sort them by their last updated date. The downside of these macros is that they require users to have meticulously labeled their pages, which is by no means guaranteed. Moreover, neither of these macros offers insight into how often the pages are viewed.
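
If your instance exposes the Confluence REST search endpoint and the Confluence Query Language (CQL), a similar "oldest pages first" list can be requested without relying on labels. This is a different route from the macros described above, shown here only as a sketch: the helper names, the space key, and the base URL are hypothetical, and you should check the CQL syntax against the documentation for your Confluence version. The code only builds the query and URL; it does not send a request.

```python
from urllib.parse import urlencode


def stale_pages_cql(space_key, label=None, older_than="1y"):
    """Build a CQL query for pages not modified within the given period."""
    clauses = [
        f'space = "{space_key}"',
        "type = page",
        f'lastmodified < now("-{older_than}")',  # assumed CQL relative-date form
    ]
    if label:
        clauses.append(f'label = "{label}"')
    return " AND ".join(clauses) + " ORDER BY lastmodified ASC"


def search_url(base_url, cql, limit=50):
    """Build the search URL (assumes the /rest/api/content/search endpoint)."""
    return f"{base_url}/rest/api/content/search?" + urlencode({"cql": cql, "limit": limit})


cql = stale_pages_cql("ENG")  # "ENG" is a hypothetical space key
print(cql)
print(search_url("https://wiki.example.com", cql))
```

Like the macros, this says nothing about how often a page is viewed, so the results are still only a starting point for review.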

If neither of these methods gets you anywhere, you’ll have to use your judgment. But remember that you’re not deleting the page, so even if you make a hasty or inaccurate judgment, you can still restore the page if necessary.

Step 4: Prepare your wiki pages for archiving.

Not all wiki pages are created equal. By that I mean you can’t always move a page to the archive space without some unintended consequences, the chief one being that any incoming links become misleading. The links will still work, but they will direct users to pages that have been deemed no longer relevant or useful.

Fortunately, Confluence makes it easy to identify incoming links. This information is shown in the Page Information menu of an individual page. You can then follow each incoming link and edit the source page to redirect or remove the link. This is a time-consuming step, but an important one—especially if the page you’re archiving has been replaced with a new one that you want people to use.
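
The bookkeeping behind this step can be sketched in a few lines of Python. The sketch works on a hypothetical in-memory map of each page to the pages it links to; it does not call Confluence, where you would read the same information from Page Information and edit the source pages by hand.

```python
def incoming_links(target, outgoing):
    """Return the pages that link to `target`, sorted by title."""
    return sorted(
        source for source, links in outgoing.items()
        if target in links and source != target
    )


def retarget_links(outgoing, old, new=None):
    """Return a copy of the link map with links to `old` redirected to `new`,
    or removed when no replacement page is given."""
    result = {}
    for source, links in outgoing.items():
        replaced = [new if link == old else link for link in links]
        result[source] = [link for link in replaced if link is not None]
    return result


# Hypothetical link map: page title -> titles of the pages it links to.
outgoing = {
    "Team Home": ["Deploy Guide (old)", "Runbook"],
    "Runbook": ["Deploy Guide (old)"],
    "Deploy Guide (old)": ["Runbook"],
}

print(incoming_links("Deploy Guide (old)", outgoing))
updated = retarget_links(outgoing, "Deploy Guide (old)", "Deploy Guide")
print(incoming_links("Deploy Guide (old)", updated))
```

Once the second call returns an empty list, nothing points at the old page any longer, and it can be moved without misleading readers.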

Step 5: Move the page to the archive space and (optionally) document your decision in the page itself.

After all incoming links have been deleted or redirected, you can safely move the page to the relevant subdirectory in the archive space. By default, any child pages will be moved along with the parent page, but you have the option to leave the child pages behind if you like.

But what if multiple people are watching the page, and they don’t understand why the page has been archived? This could be a problem if the page is frequently used (a relatively common scenario in our company).

In such cases, consider adding a note at the top of the page to explain your rationale. To facilitate this step, my team created boilerplate explanations to cover the most common archiving scenarios, and encouraged users to modify the explanations as needed. Of course, this is another manual step, so unless the page is high-profile, consider it optional.

Archive Scenario | Boilerplate Explanation
Page was outdated and irrelevant | This page was archived because it was deemed outdated or irrelevant. See <TICKET NUMBER> for background information.
Page contents were combined with another page | This page was archived because its contents were merged into another page: <PAGE NAME>. See <TICKET NUMBER> for background information.
Page was replaced | This page was archived because it was replaced by a new page: <PAGE NAME>. See <TICKET NUMBER> for background information.

Figure 1. Boilerplate explanations for common archiving scenarios.

Step 6: Write instructions for the process and evangelize it.

The final step is to create a user-friendly set of instructions on how to execute the page archiving process and then publish it in a prominent place on the wiki. Here are the subtasks that I recommend you include in the page:

  • Identify pages to archive
  • Prepare the pages for archiving
  • Move the pages to the archive space
  • Document your rationale (optional)
  • Restore an archived page
  • Create a new archive subdirectory
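
For the optional "Document your rationale" subtask, the boilerplate explanations in Figure 1 can be kept as fill-in templates so that nobody has to retype them. The Python sketch below is one possible way to do that; the scenario keys, the function name, and the ticket number are hypothetical.

```python
# Templates adapted from Figure 1; {page} and {ticket} stand in for
# <PAGE NAME> and <TICKET NUMBER>.
BOILERPLATE = {
    "outdated": (
        "This page was archived because it was deemed outdated or irrelevant. "
        "See {ticket} for background information."
    ),
    "merged": (
        "This page was archived because its contents were merged into another "
        "page: {page}. See {ticket} for background information."
    ),
    "replaced": (
        "This page was archived because it was replaced by a new page: {page}. "
        "See {ticket} for background information."
    ),
}


def archive_note(scenario, ticket, page=None):
    """Fill in the boilerplate explanation for one archiving scenario."""
    template = BOILERPLATE[scenario]
    if "{page}" in template and not page:
        raise ValueError(f"scenario {scenario!r} needs a page name")
    return template.format(ticket=ticket, page=page)


print(archive_note("replaced", "DOC-123", page="New Deploy Guide"))
```

Users can still edit the resulting note by hand before pasting it at the top of the archived page.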

Conclusion

It is true that the above method of page archiving relies on end users to proactively identify and move pages from one space to another. A more automated approach would be preferable (especially one that determines page quality based on age and page views). Still, Confluence equips you with valuable tools to move you toward a reliable, albeit non-automated, solution.

Once you have developed your process, share it regularly with users in your organization and include it in any relevant trainings. Archiving pages should be something that anyone in the organization can do. If your wiki is large and active, and you’re limited to out-of-the-box features, then getting as many people educated as possible is your best long-term strategy. Indeed, that kind of collaborative crowdsourcing is what wiki gardening is all about.

RICHARD RABIL JR. is a principal technical writer at Oracle. He has over 10 years of technical communication experience and holds a master’s degree in technical communication and rhetoric from Texas Tech University. You can follow him on Twitter at @rrabil or check out his blog at richard.rabil.com.