Redacted portions of a PDF transcript from a court hearing on Facebook's settlement with ConnectU were revealed.

Thomas Claburn, Editor at Large, Enterprise Mobility

February 12, 2009


Facebook has become the latest company to be bitten by bad PDF redaction.

The company's confidential settlement of a lawsuit brought by ConnectU came to light Wednesday, when Associated Press writer Michael Liedtke reported that redacted portions of a PDF transcript could be easily read. The transcript covered a court hearing at which details of the settlement were discussed.

"Large portions of that hearing are redacted in a transcript of the June hearing, but The Associated Press was able to read the blacked-out portions by copying from an electronic version of the document and pasting the results into another document," Liedtke wrote in his article.

The improperly redacted document revealed that ConnectU received somewhere between $31 million and $65 million to settle its lawsuit, and that Facebook's internal valuation was about $3.7 billion.

"At some point in the document's workflow, it appears that someone added a white rectangle over white text in order to cover it," said David Stromfeld, a senior product manager for Adobe Acrobat. "And that's what they thought was sufficient to make that content undiscoverable."

That's not the right way to redact content.
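Why a cover-up rectangle fails can be shown with a deliberately simplified model. The sketch below is not a real PDF parser and does not reflect how Acrobat works internally; the names `Rect`, `TextRun`, `cover`, `redact`, and `extract_text` are hypothetical, and the page is assumed to be nothing more than a list of drawing objects. It illustrates only the general point that text extraction reads the text objects and ignores whatever shapes are drawn on top of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float

    def overlaps(self, other: "Rect") -> bool:
        # True when the two axis-aligned boxes share any area.
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


@dataclass(frozen=True)
class TextRun:
    text: str
    box: Rect


def extract_text(page: list) -> str:
    # Copy-and-paste style extraction: reads every text run, ignores shapes.
    return " ".join(obj.text for obj in page if isinstance(obj, TextRun))


def cover(page: list, area: Rect) -> list:
    # Flawed approach: draw a rectangle on top and leave the text in place.
    return page + [area]


def redact(page: list, area: Rect) -> list:
    # Safer approach: drop every text run the area touches, then draw the box.
    kept = [obj for obj in page
            if not (isinstance(obj, TextRun) and obj.box.overlaps(area))]
    return kept + [area]


# Hypothetical page content; the placeholder stands in for a sensitive figure.
page = [
    TextRun("The settlement amount is", Rect(0, 0, 200, 12)),
    TextRun("CONFIDENTIAL-FIGURE", Rect(205, 0, 330, 12)),
]
secret_area = Rect(200, 0, 340, 12)

covered_text = extract_text(cover(page, secret_area))
redacted_text = extract_text(redact(page, secret_area))
print(covered_text)
print(redacted_text)
```

In this toy model the covered page still yields the placeholder when its text is extracted, while the redacted page does not, because the underlying text object was removed rather than hidden.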

Such mistakes have bedeviled would-be censors for years, in PDF files and Microsoft Word files, too.

A document on proper redaction technique, published by the National Security Agency in December 2005, describes the problem thus: "Both the Microsoft Word document format (MS Word) and Adobe Portable Document (PDF) are complex, sophisticated computer data formats. They can contain many kinds of information such as text, graphics, tables, images, meta-data, and more all mixed together. The complexity makes them potential vehicles for exposing information unintentionally, especially when downgrading or sanitizing classified materials."

Earlier that year, the redacted text in a PDF of a U.S. military report containing classified information was revealed because the creator of the PDF reportedly placed black rectangles over the text rather than deleting it. The document described the investigation into the death of Nicola Calipari, an Italian citizen, at a checkpoint in Iraq on March 4, 2005.

A similar situation occurred in 2000, when The New York Times published on its Web site PDF files of a previously secret CIA report, "Clandestine Service History, Overthrow of Premier Mossadeq of Iran, November 1952-August 1953." The Times electronically blacked out certain names in the scanned report to protect those named. But New York architect John Young, who maintains the sensitive document archive Cryptome.org, discovered that the black overlay used by The Times loaded slowly on an underpowered computer, allowing the covered text to be read.

"It's important for users to understand that when you want to remove sensitive content for an electronic document, you want to be using tools that are specifically designed for that," said Stromfeld. "People think that by covering content, out of sight, out of mind."

Since the release of Acrobat 8 in November 2006, Adobe has provided two tools for redacting content and related information effectively. Redaction completely removes visible information from a document so that it cannot be recovered, Stromfeld explained. Examine Document detects and removes information that might not be readily apparent, such as document metadata and comments.

Adobe Acrobat 9 added a variety of redaction enhancements, like the ability to redact using patterns, which is useful for finding Social Security numbers in legal documents, for example. Other enhancements include redaction word lists, page-based redaction, batch redaction, and the ability to automatically rename files to reflect redaction status.
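The idea behind pattern-based redaction can be sketched in a few lines. This is not Acrobat's implementation: it assumes the text has already been extracted into a plain string, the helper `redact_pattern` and the sample sentence are hypothetical, and the pattern matches only the common NNN-NN-NNNN layout for Social Security numbers.

```python
import re

# Matches the common NNN-NN-NNNN layout only; real documents may need more patterns.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_pattern(text: str, pattern: re.Pattern = SSN_PATTERN,
                   mask: str = "[REDACTED]") -> tuple:
    # Replace every match in the text itself and report how many were replaced.
    return pattern.subn(mask, text)


# Hypothetical sample text with made-up numbers.
sample = "Plaintiff SSN 123-45-6789; case no. 2008-1234; witness SSN 987-65-4321."
clean, count = redact_pattern(sample)
print(clean)
print(count)
```

The matched digits are replaced in the underlying text rather than covered, so they are no longer present to be copied out. The case number is left alone because it does not fit the pattern.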

People appear to be learning about proper redaction procedures, but slowly. "When we go to legal seminars now, we're seeing more and more awareness that there are tools available and there are right ways to do this," said Stromfeld.


About the Author(s)

Thomas Claburn has been writing about business and technology since 1996, for publications such as New Architect, PC Computing, InformationWeek, Salon, Wired, and Ziff Davis Smart Business. Before that, he worked in film and television, having earned a not particularly useful master's degree in film production. He wrote the original treatment for 3DO's Killing Time, a short story that appeared in On Spec, and the screenplay for an independent film called The Hanged Man, which he would later direct. He's the author of a science fiction novel, Reflecting Fires, and a sadly neglected blog, Lot 49. His iPhone game, Blocfall, is available through the iTunes App Store. His wife is a talented jazz singer; he does not sing, which is for the best.
