Features

Product Data Sustainment With Semantic Web Technologies

By Karl Darr

In 2001, Scientific American published the seminal article “The Semantic Web” by Tim Berners-Lee, James Hendler, and Ora Lassila. (The article was preceded by a Berners-Lee proposal submitted to the World Wide Web Consortium (W3C), a standards-setting body, on 4 February 2000.) Together, the proposal and the article set in motion a great deal of research, along with hopes for future technologies that would “bring structure to the meaningful content of web pages,” with “software agents roaming from page to page” that “can readily carry out sophisticated tasks for users.” The article went on to state, “Like the Internet, the Semantic Web will be as decentralized as possible.” Hence, much has been invested in creating and extending various technologies (SWeLL, PINS, XML, SHOE, RDF, URI, GRDDL, POWDER, RIF, SPARQL, OWL, DAML, etc.), either to create metadata and ontologies for data that is already published or to tag new data and information so as to make it more accessible to computers.
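The core idea behind these technologies can be illustrated without any of them: RDF-style metadata is simply a set of (subject, predicate, object) statements that software agents can query. The following minimal sketch uses plain Python tuples rather than a full RDF toolkit; all identifiers and predicate names are illustrative, not from any real vocabulary file.

```python
# RDF-style metadata as (subject, predicate, object) triples.
# All URNs and predicate names below are hypothetical examples.
triples = [
    ("urn:doc:maint-schedule-42", "dc:title", "Bus Fleet Maintenance Schedule"),
    ("urn:doc:maint-schedule-42", "dc:creator", "Engineering"),
    ("urn:doc:maint-schedule-42", "ex:appliesTo", "urn:product:bus-9000"),
]

def query(data, subject=None, predicate=None, obj=None):
    """Return triples matching the given pattern (None acts as a wildcard)."""
    return [t for t in data
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# A software "agent" can now ask which documents apply to a given product:
docs = [s for s, p, o in query(triples, predicate="ex:appliesTo",
                               obj="urn:product:bus-9000")]
```

A real implementation would use RDF serializations and SPARQL queries, but the pattern-matching idea is the same.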

Decentralized Semantic Web vs. Centralized Semantic Web

In the years since the Scientific American article was published, the decentralized semantic Web, which applies artificial intelligence and data-mining technologies to published, unstructured, and marginally structured data, has come a long way. (Structured data, conversely, includes things like tech manuals, training documents, and maintenance schedules.) However, the decentralized semantic Web has not delivered as hoped, especially in the case of unstructured data, like email, marketing materials, conversations, news items, and blogs. Even so, as more standards are implemented, and with the advent of intelligent mobile devices (i.e., smartphones and tablets with open application programming interfaces), there is a rapidly growing body of interconnectedness and computing capability, and rising interest among the masses in making more information actionable.

The decentralized semantic Web is in large part an “outside-in” undertaking, in the sense that intelligence is created by gathering data from potentially non-predetermined sources that are “out there” and analyzing that data on behalf of the individual. In general, the decentralized semantic Web is constrained by the limited amount of available metadata and by the accuracy of the inference and analytics engines used to gather and analyze the data necessary to produce intelligence. Thus, decentralized semantic Webs have been more difficult to implement than many people had hoped. Conversely, a centralized semantic Web combines data from strictly predetermined and controlled (known) sources and formats, along with potentially highly refined metadata, to produce intelligence in a sort of “inside-out” manner.

Today, more often than not, semantic Web applications are unidirectional. While results from decentralized semantic Webs for unstructured data are wanting, fantastic results are being delivered commercially with centralized semantic Web technologies for structured data, and these can be multidirectional and multidimensional. Unfortunately, structured data accounts for only 5% of the data that is created.

Two Flavors of Centralized Semantic Webs

There are essentially two categories of centralized semantic Webs: those for infinite data streams (continuous flows of information and data) and Finite Centralized Semantic Webs (FCSWs), or semantic Webs for bounded data.

Continuous-flow centralized semantic Webs are generally aimed at population-trending analysis, or at harnessing the wisdom of crowds:

  • Walmart currently handles more than 1 million customer transactions per hour, feeding databases estimated at more than 2.5 petabytes, the equivalent of 167 times the information held in all of the books in the Library of Congress. By analyzing this data in real time, Walmart enables its suppliers to see the exact number of products on every shelf of every store at that exact moment. The system gives suppliers a complete overview of when and how their products are selling, and with what other products in the shopping cart. This is a big contributor to the perception that Walmart has one of the best supply chain management systems in the world.
  • Google is applying recursive learning algorithms to the free “data exhaust,” the trail of clicks that Internet users leave behind and from which value can be extracted to produce new products and services. For example, Google produced arguably the world’s best spell-checker in almost every language by observing all of the misspellings that users type into a search window and then “correct” by clicking on the right result.

As these examples indicate, machine analysis is the only practical way to handle virtually endless data streams for population-trending analysis.
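The spell-checker approach described above can be sketched in highly simplified form: count which “corrected” result users click for each misspelling, then suggest the most frequently clicked correction. The click log below is entirely hypothetical; Google’s actual system is far more sophisticated.

```python
from collections import Counter, defaultdict

# Hypothetical click log: (what the user typed, the result they clicked).
click_log = [
    ("recieve", "receive"), ("recieve", "receive"), ("recieve", "recipe"),
    ("teh", "the"), ("teh", "the"),
]

# Tally, for each misspelling, how often each correction was chosen.
corrections = defaultdict(Counter)
for typed, clicked in click_log:
    corrections[typed][clicked] += 1

def suggest(typed):
    """Return the most frequently clicked correction, or the input unchanged."""
    if typed in corrections:
        return corrections[typed].most_common(1)[0][0]
    return typed
```

At scale, the sheer volume of such click data is what makes the resulting spell-checker accurate, with no dictionary required.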

Conversely, Finite Centralized Semantic Webs are appropriate for bodies of structured data, and these generally require more human engagement earlier in the lifecycle.

Metadata enables computers to elevate data to information, to knowledge, and ultimately to wisdom. Metadata includes things like an ontology, which is a document or file that formally defines relations among terms. The most typical kind of ontology has a classification, or taxonomy, and a set of inference rules. The taxonomy defines classes of objects and the relationships among them. With appropriate human analysis and design prior to data generation, structures can be engineered to store the data in sufficiently granular and interconnected units, making the information more programmatically retrievable and navigable for what are, today, unforeseeable future uses.
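The two ingredients named above, a taxonomy plus an inference rule, can be sketched in a few lines. Here the taxonomy is a simple subclass map and the inference rule is transitivity of subclassing; all class names are illustrative, not drawn from any real ontology.

```python
# A tiny taxonomy: each class maps to its parent class.
# Class names are hypothetical examples.
subclass_of = {
    "CityBus": "Bus",
    "Bus": "Vehicle",
    "ServiceManual": "TechnicalDocument",
}

def is_a(cls, ancestor):
    """Inference rule: subclassing is transitive, so membership in an
    ancestor class can be inferred by walking the taxonomy upward."""
    while cls in subclass_of:
        cls = subclass_of[cls]
        if cls == ancestor:
            return True
    return cls == ancestor
```

Languages like RDFS and OWL express the same idea declaratively, so that a generic reasoner, rather than hand-written code, performs the inference.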

Core PLM Requirements

Indeed, making information available for unforeseeable future uses is one of the six core Product Lifecycle Management (PLM) characteristics, or attributes, called “cued availability,” as described by Michael Grieves in his seminal book, Product Lifecycle Management (2006). “Cued availability,” Grieves writes, “requires that the information is presented to us when we may not be searching for it, but need it nonetheless.” In other words, a true PLM system delivers knowledge and wisdom.

Another of the core PLM characteristics is what Grieves calls “singularity of data,” which is similar in concept to an authoritative data source, but in a manufactured product environment it simply means, as Grieves writes, “when we have two or more unique data representations, one of the representations is the one we all agree is the correct one—the representation that everyone will work with.”

FCSW + PLM = PILM

When a single-data-source FCSW is integrated with a PLM environment, a Product Information Lifecycle Management (PILM) solution is possible. Properly implemented, the PILM automatically captures all of the decisions that are made as the product moves through its lifecycle. So, while decentralized semantic Web applications are unidirectional, or one-dimensional, and can provide content, a PILM implementation is omnidirectional, or multidimensional, and can capture and provide content in context. A PILM can also “roundtrip enable” data: information delivered from the PILM is branded with its own metadata, which allows that specific information to be tracked back to its original source. In this manner, the way service information is used, for example, can automatically flow back to engineering to adjust prognostics and diagnostics information, with the multidimensional nature of the PILM providing contextual clarity.
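“Branding” delivered information with provenance metadata can be sketched as follows. This is a minimal illustration of the concept, not any vendor’s implementation; the URN and field names are assumptions for the example.

```python
import hashlib

def brand(content, source_uri):
    """Attach provenance metadata so delivered information can be
    traced back to its original source (the 'roundtrip' enabler)."""
    return {
        "content": content,
        "meta": {
            # Identifier of the authoritative source record (hypothetical URN).
            "source": source_uri,
            # Content digest, so downstream feedback can prove which
            # exact version of the information it refers to.
            "digest": hashlib.sha256(content.encode("utf-8")).hexdigest(),
        },
    }

delivered = brand("Replace brake pads every 30,000 km.",
                  "urn:pilm:service-bulletin-7")
# Downstream systems cite delivered["meta"]["source"] when feeding
# usage data back, giving engineering the context of the feedback.
```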

For example, a bus manufacturer can use a municipality’s Request for Proposal (RFP) for a new bus fleet to create the initial PILM structure for developing a response to the RFP. This initial PILM structure then becomes the global, virtual collaboration room for all of the downstream information contributors and users in satisfying this customer opportunity. Building the manufacturer’s proposal in the PILM ensures that all of the municipality’s interests are addressed with responses paired to issues or specifications in the RFP.

When the bus manufacturer is awarded the contract, the specifications are converted to requirements for the design team to work against, and when data posts in the CAD system, it automatically updates the PILM. As the design is finalized, the project migrates to engineering. Once information is posted to the Engineering Data Management (EDM), Product Data Management (PDM), and other systems, the PILM is again automatically updated. As quality assurance gets involved, data from inspection intervals is automatically posted to the PILM. Prognostics and diagnostics information is provided to the service department in “roundtrip-able” form, which automatically flows feedback and updated information back to engineering.
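The traceability chain running through this scenario, from RFP clause to requirement to design artifact to inspection record, can be sketched as a set of “satisfies” links that any downstream artifact can follow back upstream. All identifiers below are invented for illustration.

```python
# Hypothetical 'satisfies' links: each downstream artifact records
# the upstream item it was derived from.
links = {
    "REQ-101": "RFP-3.2",          # requirement derived from an RFP clause
    "CAD-wheel-assy": "REQ-101",   # design artifact satisfies the requirement
    "QA-insp-17": "CAD-wheel-assy" # inspection record covers the design
}

def trace_to_rfp(artifact):
    """Follow 'satisfies' links upward until reaching the originating
    RFP clause (the item with no upstream link)."""
    while artifact in links:
        artifact = links[artifact]
    return artifact
```

Because every contribution records its upstream link as it is posted, the PILM can answer, at any point in the lifecycle, which customer specification a given piece of work traces back to.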

As the bus manufacturer’s project is promoted to production, the PILM now receives automatic updates (part numbers, etc.) from the Enterprise Resource Planning (ERP) system. If task information needs to be authored, the technical publications and training departments can access information from all of the different master data sources as it arrives in the PILM and begin writing their work steps or training materials, as appropriate, earlier in the process than is usually possible.

Going Forward

Consider the following facts:

  • Microprocessor price/performance doubles every 18 months.
  • Memory price/performance doubles every 12 months.
  • The payroll tax legislation passed by Congress in February 2012 authorizes the sale of unused broadcast spectrum to bolster wireless Internet connectivity.
  • Apple’s A4 microprocessor, released in 2010, delivers performance that in 1993 would have ranked among the world’s 30 fastest supercomputers, and Apple sold over 112,000,000 A4-equipped devices in the last quarter of 2011.

This growing computing capability at the periphery of the Internet is powering consumer demand for faster, more tailored, and more immersive experiences. Today, the right semantic Web technologies enable manufacturers to deliver serial-number-specific product information to their end-user customers, providing that richer experience, which in turn lets manufacturers strengthen their relationships with consumers while enlisting consumers’ aid in sustaining product information throughout the product’s lifecycle.

Karl Darr (karl.darr@star-group.net) is a Silicon Valley–based consultant for STAR Group.

References

All too much—Monstrous amounts of data. The Economist (27 February 2010), 3–15. www.economist.com/node/15557421

Berners-Lee, T., J. Hendler, and O. Lassila. The Semantic Web. Scientific American (17 May 2001). www.ryerson.ca/~dgrimsha/courses/cps720_02/resources/Scientific%20American%20The%20Semantic%20Web.htm

Borek, K. J., and R. F. Wilson. An Analysis of S1000D and Its Adoption by the Department of Defense, Report LG802T2. LMI Government Consulting, 2008.

Ciarlone, L., K. Kaddie, and M. Laplante. Multilingual Product Content: Transforming Traditional Practices into Global Content Value Chains. The Gilbane Group (June 2009).

Darr, K. Managing motorcycle documentation at BMW. Multilingual Magazine (July/August 2006).

Data, data everywhere. The Economist (27 February 2010), 3–15. www.economist.com/node/15557443

Grieves, M. Product Lifecycle Management. New York, NY: McGraw-Hill, 2006.
