J. Andrew Magpantay

Truth in Packaging in the Digital World

Library of Congress Network Advisory Committee
Network Planning Paper Number 30

Andrew Magpantay
Office of Information and Technology Policy
American Library Association


1. Introduction

How can users trust the efficacy, provenance, authority, etc. of digital documents? For my talk today I would like to address three areas:

  • The problems with provenance and authentication of digital documents.
  • Technologies to address problems of provenance.
  • Other necessary infrastructure components.

Provenance concerns information about the ownership and transmission of a document.

For this presentation, I will use the term digital documents to refer to any type of information package in digital format that we can transmit over open public networks. This of course includes multimedia as well as text documents.

Some of the technologies used to address issues of a digital document’s provenance include time stamping, digital signatures, and techniques for including authentication information in digital documents. To be successful, these technologies must be supported by institutional certification authorities such as electronic notaries public or key holder certifiers. Other information infrastructure components, such as the availability of reviews and critiques, can also help users evaluate the authenticity of digital documents.

2. A Story

To help illustrate the problem of authentication in a digital environment, I would like to tell a story.[1] On April 5, 1994, a message containing racist jokes and epithets was posted to Usenet, the bulletin board system on the Internet, under the name of a student at the University of Michigan. Usenet is accessed by millions of Internet users. Over the next month thousands of people sent e-mail to the University of Michigan student, and a student group even marched on his dorm room.

There was one problem, however. This student did not send the message. A computer hacker had used a software program to retrieve the student’s password and sent out the racist message from the student’s e-mail account. But of course, there was no way for anyone to know that this student’s user account had been compromised. Furthermore, though the victimized student tried to clear his name, his denials did not circulate over the Internet until days after the original message had already reached thousands of users.

3. Problems

This story points out one of the problems we deal with when we talk about issues related to provenance and authentication in the digital library — the “Who?” question: Is the purported author of a digital document truly the creator of the document? How can users be sure that the authorship of the digital document they are viewing is indeed genuine?

I would like to digress for a moment to address a point that was raised earlier by one of the previous speakers regarding privacy and the right to anonymity in cyberspace. There need not be a conflict between allowing for anonymity and providing ways for users to know with whom they are dealing. As long as it is apparent to users of cyberspace that they are dealing with an anonymous or pseudonymous entity, the principle alluded to in the title of my talk, “Truth in Packaging,” can still be preserved while maintaining other users’ privacy rights.

But in addition to verifying authorship, there are other concerns as well, such as what is in the digital document. Have its contents been tampered with intentionally or accidentally? How will the user know when a document has been changed and what has been changed? [2]

Keep in mind the very dynamic nature of the digital universe. WWW documents are changing all the time for legitimate reasons. In the paper world we have the concept of editions, which helps identify variations in publications that deal with essentially the same intellectual work. In the paper library, we generally preserve and catalog those editions so that those who may need to refer to one edition or another can do so.

In the digital world, a web page can change, and there may be no previous edition of it that users can access. For scholars this can be quite important, but it may be important for the lay person as well. For example, there may be legal reasons to refer back to a particular digital document that was in some state X at some time Y. In fact, the American Bar Association’s Information Security Committee has just issued draft guidelines for digital signatures that address some of these concerns. [3]

Another question that comes to mind is “When and where?” When was the digital document created? Where does it reside, and how can users have faith in the time and location stamp of information?

Bear in mind that currently a URL is really nothing more than the representation of a domain name, which itself is an overlay for an actual IP address. If my name were Smith, there is not much to keep me from setting up my own version of a “Smith-sonian” Web site (or a URL that could be easily confused with the Smithsonian Institution), and no way for an unsophisticated user to verify where this Web site resides, or even if its location really is a legitimate Smithsonian location. Knowing where institutions have their digital documents located and even when they are adding new documents can help users authenticate the legitimacy of a digital document.
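The look-alike problem described above can be made concrete with a small sketch. It assumes the user already knows, from some trusted source, the domain an institution actually uses; the domain `si.edu` and the look-alike addresses below are used purely for illustration, and the function names are hypothetical. The check only compares names. It says nothing about who operates the machine behind the address.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lower-cased host name part of a URL, or '' if there is none."""
    return (urlsplit(url).hostname or "").lower()


def belongs_to(url: str, known_domain: str) -> bool:
    """True if the URL's host is the known domain or a subdomain of it.

    known_domain is assumed to come from a trusted directory; this sketch
    does not verify that assumption.
    """
    host = host_of(url)
    known = known_domain.lower()
    return host == known or host.endswith("." + known)


if __name__ == "__main__":
    # Illustrative addresses only; the look-alikes are invented.
    for url in ("http://www.si.edu/exhibits",
                "http://www.smith-sonian.example/",
                "http://si.edu.example.com/"):
        print(url, belongs_to(url, "si.edu"))
```

A comparison like this catches names that merely resemble or contain the expected one, but it depends entirely on having a trustworthy list of where institutions keep their documents, which is the infrastructure question raised in the paragraph above.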

So far, the qualities I have talked about bear on the intrinsic quality of the digital document. In other words, is the digital document really what it purports to be? You might think of these intrinsic elements, respectively, as dealing with the creator, contents, and origin of digital documents. One important fact to keep in mind however, is that these three elements themselves are also in digital format as part of the overall document. This means that they are all susceptible to manipulation by someone who has the appropriate knowledge and tools. The technologies I will be discussing in a few minutes are designed to prevent such tampering.

But before that, there is one other issue I would like to raise, if only to complete this list, and that concerns the quality of information in the document. How will users know how reliable, how accurate, how good the information is in a given digital document? We have Web sites that contain information on everything from Medicine to Mechanics. [4] How does the user assess the quality of that information?

In the paper library environment, we have many tools to help users make comparisons and critical evaluations of the intellectual work. We have classification and cataloging to help users find and compare similar works. There exists in our world today an infrastructure of writers, reviewers, and publishers who critique all manner of intellectual works, and in our libraries we provide guides and access to these secondary sources of critiques and reviews. These tools of collocation and secondary source information help us and our patrons authenticate and evaluate the quality of information they are finding; they need to be replicated in some form in the digital library.

4. Technologies

Given the problems I have described above, what technologies do we have to deal with them? I would like to briefly talk about three technologies:

  • Digital Signatures
  • Digital Time Stamping
  • Steganography

Digital signatures can be used to deal with authentication issues. Digital signatures use cryptography to generate a unique code based on the digital document. A private key is then applied to that code to create a specific signature that is then associated with the digital document.

The recipient of the digital document would then use a similar cryptographic algorithm and the corresponding public key to verify both the digital document and the signature. If the digital document has been altered or the digital signature created by some other private key, the verification process produces an invalid result.

This public key/private key technology allows the creator of a digital document to create a unique identifier that can be verified only with the corresponding public key, which can be published or otherwise made available to potential users of the digital document. Because this unique identifier is also based on a code derived from the digital document itself, it also serves as a means of verifying the digital document’s integrity. [5]
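The sign-then-verify flow just described can be sketched in a few lines. This is a toy illustration under stated assumptions, not a usable implementation: it uses textbook RSA with a deliberately small key built from two well-known primes, it uses SHA-256 as the "unique code" (the talk does not name an algorithm), and the function names are hypothetical. Real systems use much larger random keys and padding schemes.

```python
import hashlib

# Toy key pair built from two well-known Mersenne primes. These values are
# illustrative assumptions; real keys are far larger and randomly generated.
P = 2**61 - 1
Q = 2**31 - 1
N = P * Q                                   # public modulus
PUBLIC_EXPONENT = 65537                     # published with N
PRIVATE_EXPONENT = pow(PUBLIC_EXPONENT, -1, (P - 1) * (Q - 1))  # kept secret (Python 3.8+)


def document_code(document: bytes) -> int:
    """Derive a numeric code from the document (SHA-256, reduced below N)."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N


def sign(document: bytes) -> int:
    """Apply the private key to the document's code to produce a signature."""
    return pow(document_code(document), PRIVATE_EXPONENT, N)


def verify(document: bytes, signature: int) -> bool:
    """Apply the public key to the signature and compare with the document's code."""
    return pow(signature, PUBLIC_EXPONENT, N) == document_code(document)


if __name__ == "__main__":
    original = b"Truth in Packaging in the Digital World"
    signature = sign(original)
    print(verify(original, signature))                # genuine document
    print(verify(original + b" (edited)", signature)) # altered document
```

Because the signature is computed from a code derived from the document, changing either the document or the signature makes verification fail, which is the integrity property noted above.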

Digital time stamping can be used to verify when a digital document was created. As with digital signatures, an algorithm is used to create a unique code from the document. That code is then sent to a certification server, which combines the incoming code with another code to provide certification of time and date. Should the document be altered, the certification code would no longer be valid.

Surety Technologies introduced a Digital Notary System in early 1995. [6] This summer it was announced that the U.S. Postal Service was working with a company called Premenos on an electronic commerce pilot project. [7] The project was to allow government and commercial users to authenticate Internet-based messages. The Postal Service agreed to be the repository for public key certificates, which contain data to identify digital document authors. Recipients would be able to retrieve the certificates to authenticate digital signatures. The service would provide a date and time stamp for each transaction or document.
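As a rough sketch of the time-stamping idea, the example below has a stand-in "certification server" combine the document's code with a date and time and seal the pair. The sealing method (a keyed hash, HMAC, under a server secret) is an assumption made for brevity and is not the method of any service named in this talk; all names and the secret are hypothetical. In this simplified form only the server can check a seal, whereas a real service would let anyone verify it.

```python
import hashlib
import hmac
from datetime import datetime, timezone

# Hypothetical secret held only by the certification server.
SERVER_SECRET = b"hypothetical-server-secret"


def document_code(document: bytes) -> str:
    """Client side: derive a code from the document (SHA-256 is an assumption)."""
    return hashlib.sha256(document).hexdigest()


def _seal(code: str, stamp: str) -> str:
    """Server side: combine the incoming code with the time stamp and seal both."""
    message = f"{code}|{stamp}".encode()
    return hmac.new(SERVER_SECRET, message, hashlib.sha256).hexdigest()


def issue_certificate(code: str, when: datetime) -> dict:
    """Server side: certify that this code was presented at the given time."""
    stamp = when.isoformat()
    return {"time": stamp, "seal": _seal(code, stamp)}


def check_certificate(document: bytes, certificate: dict) -> bool:
    """Recompute the seal from the document as it is now and the claimed time."""
    expected = _seal(document_code(document), certificate["time"])
    return hmac.compare_digest(expected, certificate["seal"])


if __name__ == "__main__":
    text = b"Draft of 1 December 1995"
    cert = issue_certificate(document_code(text), datetime.now(timezone.utc))
    print(check_certificate(text, cert))                # unchanged document
    print(check_certificate(text + b", revised", cert)) # altered document
```

Note that only the code, not the document itself, has to travel to the server, so the contents can remain private while still being certified.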

Steganography[8] is a technology that embeds information in unused portions of the information package. For example, the hidden watermarks and codes embedded in U.S. currency can be thought of as employing steganographic techniques. In the digital environment, unused bits in a transmission packet might be used to hide other information about the contents of the packet.

Steganographic technologies can be used to embed digital signature information into a document in a non-intrusive way that also makes the embedded information more difficult to tamper with.

For example, this past summer, a company called Digimarc announced a new technology for embedding electronic signatures or serial numbers within digital images, sound recordings, and video clips. [9]
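One simple way to picture hiding information in barely used bits is least-significant-bit embedding, sketched below. This is a generic textbook technique chosen for illustration, not a description of any commercial product; the carrier here is just a sequence of byte values standing in for image or sound samples, and the function names are hypothetical. On its own it offers no tamper resistance; that would come from what is embedded, such as a signature.

```python
def embed(carrier: bytes, payload: bytes) -> bytes:
    """Hide payload in the lowest bit of successive carrier bytes."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(carrier):
        raise ValueError("carrier too small for payload")
    hidden = bytearray(carrier)
    for index, bit in enumerate(bits):
        hidden[index] = (hidden[index] & 0xFE) | bit  # overwrite only the lowest bit
    return bytes(hidden)


def extract(carrier: bytes, payload_length: int) -> bytes:
    """Read payload_length bytes back out of the lowest bits of the carrier."""
    result = bytearray()
    for start in range(0, payload_length * 8, 8):
        value = 0
        for sample in carrier[start:start + 8]:
            value = (value << 1) | (sample & 1)
        result.append(value)
    return bytes(result)


if __name__ == "__main__":
    samples = bytes(range(64))          # stand-in for image or audio samples
    marked = embed(samples, b"ID42")    # hypothetical serial number
    print(extract(marked, 4))
```

Each carrier value changes by at most one, which is why the embedded information is non-intrusive: the marked samples are nearly indistinguishable from the originals.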

5. Other Necessary Infrastructure

Other infrastructure is required to support the technologies just described and to provide a total authentication system for the creators and users of digital documents.

In the digital world, one still has need for certification authorities such as digital notaries public and key holders: places that can issue and certify private/public key combinations. This role may be taken on by a private company or a public agency such as the post office.

Librarians and libraries also have a role to play in the authentication system. Specifically I would like to suggest three roles:

  • As collocator of information
  • As contributors to the review process
  • As educators

Librarians should continue to use their skills and knowledge to construct links amongst and between similar intellectual works and between those works and their reviews and critiques. As noted above, this is a useful service to the cyberspace community, allowing users to review and make their own judgements concerning the quality and authenticity of digital documents.

Librarians should continue, and in fact are continuing, their role as reviewers. There is a project at the Kansas City Public Library called Infofilter[10] which involves librarians and others in a formal process of reviewing web resources. [11]

Librarians should also continue their role as educators about this new digital information resource. We have a user population that is still largely unfamiliar with this technology and its nuances. An Audit Bureau of Circulations survey last year indicated that slightly more than half the population had not even heard of the information superhighway. [12] Libraries are the natural place for user education to occur. We are beginning to see more and more states like Maryland that have over 80% of their public library systems connected to the Internet, and I believe this trend will continue. With only about 30% of U.S. homes owning computers[13] and less than 15% owning modems[14], libraries, I would like to suggest, are places where people can access the information superhighway as well as learn, from knowledgeable professionals, about being smart consumers in this emerging digital information market.

Notes:

  1. Michell Levander, “Slurs Posted on Internet Raise Censorship Questions.” San Jose Mercury News (San Jose, CA), May 1, 1994. Page 1D.
  2. For an interesting discussion of these issues, see Peter S. Graham’s report, Intellectual Preservation: Electronic Preservation of the Third Kind, published by The Commission on Preservation and Access, March 1994.
  3. See Digital Signature Guidelines: Legal Infrastructure for Certification Authorities and Electronic Commerce at http://www.intermarket.com/ecl/digsgleg.html {no longer available at this URL}. Version cited was the WordPerfect 6.1 document, dated October 5, 1995. No endorsement, implied or expressed, is made regarding the web sites cited in this paper.
  4. See, for example, the Gravitational Fluid Mechanics Laboratory web site at the Department of Aerospace Engineering Sciences, University of Colorado at Boulder, http://iml.colorado.edu/ or the Harvard Medical School web site at http://www.med.harvard.edu/.
  5. For a further discussion of public key/private key management see RSA’s web document at http://www.rsa.com/rsalabs/faq/faq_km.html [Web resource no longer available].
  6. John W. Verity. “Bits & Bytes.” Business Week n3409 (Jan 30, 1995), page 80B.
  7. Brad Bass. “USPS, Premenos Team Up for Public Key Effort.” Federal Computer Week v9, n17 (July 3, 1995), Page 14.
  8. For additional discussions about this technology, including a bibliography, see http://www.thur.de/ulf/stegano/.
  9. “Holographic signatures for digital images; authentication, verification, and autometering for copyright holders.” Seybold Report on Desktop Publishing, vol. 9, no. 12 (August 14, 1995), Page 23.
  10. See http://192.135.229.51/infofilter/keyword.htm [Web resource no longer available].
  11. James R. Rettig discusses the need for reviewers in his paper, Putting the Squeeze on the Information Firehose: The Need for ‘Neteditors and ‘Netreviewers, which was based on his November 3, 1995 presentation at the 15th Annual Charleston Conference on library acquisitions and related issues, held at the College of Charleston, South Carolina. Rettig’s paper could be found at http://www.swem.wm.edu/firehose.html as of December 20, 1995.
  12. “Still A Mystery To Most,” in Editor and Publisher, (New York, NY), November 19, 1994, page 9.
  13. Bart Ziegler. “Hard Drive PC Makers’ Big Push Into the Home Market Comes at Risky Time.” Wall Street Journal. (New York, NY), Eastern Edition. November 1, 1995. Page A1.
  14. Victoria Shannon. “Inquiring Minds Want to Know the Secrets of Your On-Line Life.” Washington Post, Washington Business section. (Washington, DC), December 4, 1995.

Comments: lcweb@loc.gov (01/23/98)