Data-Scraping Lawsuit Sheds Light On Risk To Databases

Former Mitsubishi database and Web development vendor Snap-On Business Solutions battles O'Neil Associates over data-scraping on behalf of Mitsubishi
A recent lawsuit brought by a former database and Web development vendor for Mitsubishi against the industrial giant's current vendor is bringing to light the pitfalls of a common practice: data scraping from Web-facing databases. The case could give pause to organizations considering this practice.

And for those who administer and build valuable Web-facing databases, it serves as another example of why it is important to deploy both technical and legal controls to mitigate the business risk posed by scraping.

The lawsuit, filed by Snap-On Business Solutions against O'Neil Associates in the U.S. District Court for the Northern District of Ohio, survived summary judgment just two weeks ago, which means Judge James Gwin believes Snap-On has a case against O'Neil.

The long and short of the suit is that Snap-On worked for Mitsubishi for years, creating an online parts-ordering site powered by a database that was built from scratch from paper-based parts catalogs provided by Mitsubishi. When Mitsubishi decided to use O'Neil to create and manage a new site, it chose a different route instead of paying Snap-On for the data in the database Snap-On had created.

"In 2007, one of Snap-On's clients, Mitsubishi, began considering whether to move its online parts catalog from Snap-on to Defendant O'Neil," wrote Judge Gwin in his summary judgment opinion. "When Mitsubishi and Snap-On disagreed about Mitsubishi's rights to the information in the Snap-On database, however, Mitsubishi directed Defendant O'Neil to run a data retrieval program to recover data and images on Snap-On's servers."

Mitsubishi gave O'Neil numerous login credentials so that O'Neil could collect the contents of the database surreptitiously using a data-scraping program that automatically retrieves data -- data that would take a Web user many hundreds of hours to retrieve using manual point-and-click methods. The credentials included those used by a variety of Mitsubishi dealers so that O'Neil could avoid detection while performing the scrape. Snap-On discovered the data retrieval after a spike in traffic caused by the automated program crashed the parts site.

"This is an area that I'm seeing more activity in," says Eric Goldman, associate professor of law at Santa Clara University School of Law, about data-scraping lawsuits. "The customer has these credentials, and when the customer decides that it wants to basically take its marbles and go home, it realizes it doesn't have the stuff it needs, but it can get it by logging into this private access point and basically grabbing data files from there. There's two principle issues with that. Principle issue No. 1 is the misuse of credentials, and issue No. 2 is the misuse of the servers that are protected by those credentials."

Goldman is unequivocal about whether what O'Neil did was legally copacetic.

"The law is generally pretty clear on this: You can't do that. It's fairly rare when you get a nice, simple unambiguous statement from a law professor," he says, "but you just can't do that. That's about as clear of a statement as I'm ever going to give you."

Goldman encourages businesses, such as Snap-On, that want to protect their intellectual property against scraping to create solid legal contracts that dictate how the IP is used and also license the use of the company's physical property -- in this case, the database servers.

"What I used to do when I was drafting those types of contracts in practice is that I would actually say the password and credentials were the company's trade secret," he says.

He also believes it is important to take a "double-barreled" technical approach that includes ample monitoring of databases to detect misuse of data.

"What some of the clients that I work with do is two things," he says. "One, they set up a system that says, 'OK, tell me who is the biggest user of our system in the past 24 hours.' And second is they'll usually put in some kind of limit and say, 'OK, any particular IP address can pull down no more than X amount of megabytes or gigabytes of data in any particular period of time," Goldman says.
