Sanitization (classified information)

From Wikipedia, the free encyclopedia


Sanitization is the process of removing sensitive information from a document or other medium so that it may be distributed to a broader audience. When dealing with classified information, sanitization attempts to reduce the document's classification level, possibly yielding an unclassified document. Originally, the term sanitization was applied to printed documents; it has since been extended to apply to computer media and the problem of data remanence as well.

Printed matter

A page of a classified document that has been sanitized for public release. This is page 13 of a U.S. National Security Agency report [1] on the USS Liberty incident, which was declassified and released to the public in July 2003. Classified information has been blocked out so that only the unclassified information is visible. Notations with leader lines at top and bottom cite statutory authority for not declassifying certain sections.

A printed document that contains classified or sensitive information will frequently also contain a great deal of information that is less sensitive. There may be a need to release the less sensitive portions to uncleared personnel. The printed document will thus be sanitized to obscure or remove the sensitive information. The term redaction is also used to describe this process, though that term is more often used in literary contexts.

In some cases, sanitizing a classified document removes enough information to reduce the classification from a higher level to a lower one. For example, raw intelligence reports may contain highly classified information, like the identities of spies, that is removed before the reports are distributed outside the intelligence agency: the initial report may be classified as Top Secret while the sanitized report may be classified as Secret.

In other cases, like the U.S. National Security Agency's report on the USS Liberty incident (see the caption above), the report may be sanitized to remove all sensitive data, so that the report may be released to the general public.

As seen in the USS Liberty report, paper documents are generally sanitized by covering the classified and sensitive portions and then photocopying the document, which produces a sanitized copy suitable for distribution.

Computer media and files

See also: Data remanence

Computer (also called electronic or digital) documents are more difficult to sanitize. In many cases, when information in an information system is modified or erased, some or all of the data remains in storage. This may be an accident of design, where the underlying storage mechanism (disk, RAM, etc.) still allows information to be read despite its nominal erasure. The general term for this problem is data remanence. In some contexts (notably the US NSA, DoD, and related organizations), sanitization typically refers to countering the data remanence problem; redaction is used in the sense of this article.
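
The gap between nominal erasure and actual removal can be illustrated with a toy model. The Python sketch below is only an illustration: the ToyDisk class, its methods, and the sample data are hypothetical and do not correspond to any real file system or storage API. It assumes a "disk" that is simply a byte array plus a table of file names.

```python
class ToyDisk:
    """Hypothetical in-memory 'disk': raw bytes plus a file table."""

    def __init__(self, size: int) -> None:
        self.blocks = bytearray(size)  # raw storage, initially all zero
        self.table = {}                # file name -> (offset, length)
        self.next_free = 0

    def write(self, name: str, data: bytes) -> None:
        if self.next_free + len(data) > len(self.blocks):
            raise ValueError("disk full")
        start = self.next_free
        self.blocks[start:start + len(data)] = data
        self.table[name] = (start, len(data))
        self.next_free += len(data)

    def delete(self, name: str) -> None:
        # Nominal erasure: only the table entry is dropped.
        del self.table[name]

    def sanitize_delete(self, name: str) -> None:
        # Overwrite the stored bytes with zeros, then drop the entry.
        start, length = self.table.pop(name)
        self.blocks[start:start + length] = bytes(length)

    def raw_scan(self, needle: bytes) -> bool:
        # Read the storage directly, bypassing the file table.
        return needle in self.blocks


# Made-up sample data, used only for the demonstration.
plain = ToyDisk(64)
plain.write("report.txt", b"source: ALICE")
plain.delete("report.txt")
leaked_after_delete = plain.raw_scan(b"ALICE")      # True

wiped = ToyDisk(64)
wiped.write("report.txt", b"source: ALICE")
wiped.sanitize_delete("report.txt")
leaked_after_sanitize = wiped.raw_scan(b"ALICE")    # False
```

After an ordinary delete the file no longer appears in the table, yet a direct scan of the storage still finds its contents. Real media are more complicated than this model, so the sketch shows only the principle, not a sanitization procedure.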

However, the retention may be a deliberate feature, in the form of an undo buffer, revision history, "trash can", backups, or the like. For example, word processing programs like Microsoft Word will sometimes be used to edit out the sensitive information. Unfortunately, these products do not always show the user all of the information stored in a file, so it is possible that a file may still contain sensitive information. In other cases, inexperienced users will use ineffective methods which fail to sanitize the document.

In May 2005, the US military published a report on the death of Nicola Calipari, an Italian secret agent, at a US military checkpoint in Iraq. The report was published in Adobe PDF format and had apparently been sanitized using commercial word processing tools. Shortly thereafter, readers discovered that the blocked-out portions could be retrieved using simple cut and paste operations on the posted document.[1]

Similarly, on May 24, 2006, lawyers for the communications service provider AT&T filed a legal brief[2] regarding their cooperation with domestic wiretapping by the NSA. Text on pages 12 through 14 of the Adobe PDF document was blacked out in an attempt to sanitize the document, but the hidden text could be retrieved using cut and paste.[3]
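
Failures of this kind are easier to see with a simplified model of a page. The Python sketch below is a toy under stated assumptions, not a PDF parser: the Text and BlackBox types, the helper functions, and the sample strings are all hypothetical. It assumes a page is a list of drawing operations and that copy and paste collects the text operations while ignoring drawn shapes.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Text:
    x: int
    y: int
    content: str


@dataclass
class BlackBox:
    x: int
    y: int
    width: int
    height: int


Op = Union[Text, BlackBox]


def extract_text(page: List[Op]) -> str:
    """Mimic copy and paste: collect text, ignore drawn shapes."""
    return " ".join(op.content for op in page if isinstance(op, Text))


def covered(text: Text, box: BlackBox) -> bool:
    """True if the text's anchor point lies inside the box."""
    return (box.x <= text.x < box.x + box.width
            and box.y <= text.y < box.y + box.height)


def cover_only(page: List[Op], box: BlackBox) -> List[Op]:
    """Ineffective: draw a box on top, leave the text in the page."""
    return page + [box]


def redact(page: List[Op], box: BlackBox) -> List[Op]:
    """Effective in this model: delete the covered text, then draw the box."""
    kept = [op for op in page
            if not (isinstance(op, Text) and covered(op, box))]
    return kept + [box]


# Made-up sample page, used only for the demonstration.
page = [Text(0, 0, "Checkpoint staffed by"), Text(0, 10, "SGT EXAMPLE")]
box = BlackBox(0, 10, 100, 10)

copied_after_cover = extract_text(cover_only(page, box))
# "Checkpoint staffed by SGT EXAMPLE"
copied_after_redact = extract_text(redact(page, box))
# "Checkpoint staffed by"
```

In this model, covering the text changes only what is displayed. The underlying text is still part of the page and is returned by extraction unless it is deleted.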

At the end of 2005, the NSA released a report giving recommendations on how to safely sanitize a Word document.[4]

Issues such as these make it difficult to reliably implement multilevel security systems, in which computer users of differing security clearances may share documents. The Challenge of Multilevel Security gives an example of a sanitization failure caused by unexpected behavior in Microsoft Word's change tracking feature.[5]
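
How retained revisions can survive in a file can be sketched in Python. The example assumes the ZIP-based .docx format used by later versions of Word, in which tracked deletions are stored in w:del elements that contain w:delText; it does not model the older binary format. The helper names and the sample text are hypothetical, and the sample file is a minimal stand-in rather than a complete Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_sample_docx() -> bytes:
    """Build a tiny stand-in for a .docx: a ZIP holding word/document.xml."""
    xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
        '<w:r><w:t>The source was </w:t></w:r>'
        '<w:del w:id="1" w:author="editor">'
        '<w:r><w:delText>agent ALICE</w:delText></w:r>'
        '</w:del>'
        '<w:r><w:t>[removed].</w:t></w:r>'
        '</w:p></w:body></w:document>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    return buf.getvalue()


def _texts(docx_bytes: bytes, tag: str) -> List[str]:
    """Return the text of every element with the given w: tag name."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [el.text or "" for el in root.iter(f"{{{W_NS}}}{tag}")]


def visible_text(docx_bytes: bytes) -> str:
    """Text in ordinary runs (w:t elements)."""
    return "".join(_texts(docx_bytes, "t"))


def tracked_deletions(docx_bytes: bytes) -> List[str]:
    """Text that was deleted with change tracking on but is still stored."""
    return _texts(docx_bytes, "delText")


sample = make_sample_docx()
shown = visible_text(sample)          # "The source was [removed]."
hidden = tracked_deletions(sample)    # ["agent ALICE"]
```

A check along these lines can flag one kind of retained content before release. It does not cover comments, metadata, embedded objects, or other places where a file may hold information that is not shown to the user.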

References

  1. ^ BBC Report. "Readers 'declassify' US document", BBC, May 2, 2005.
  2. ^ http://www.politechbot.com/docs/att.not.redacted.brief.052606.pdf
  3. ^ Declan McCullagh. "AT&T leaks sensitive info in NSA suit", CNet News, May 26, 2006.
  4. ^ NSA SNAC (December 13, 2005). "Redacting with Confidence: How to Safely Publish Sanitized Reports Converted From Word to PDF". Report# I333-015R-2005. Information Assurance Directorate, National Security Agency. Retrieved on 2006-05-29.
  5. ^ Rick Smith (2003). "The Challenge of Multilevel Security". Black Hat Federal Conference.