It wasn't a big part of his presentation. But when David Cancel -- CTO of clickstream analysis service Compete -- mentioned last week at the Open Data 2007 conference in New York that his company licenses clickstream data from ISPs, he stirred up a swarm of bees that are still buzzing on the Web.
Privacy proponents this week are registering concerns that ISPs are selling "anonymized" user clickstream data -- concerns that were spurred partly by reports on Cancel's presentation.
"There is no way that this [data] is sufficiently anonymized," said blogger Adam Fields in one response. "It is readily obvious from reading my clickstream who I am. URLS for many online services contain usernames... All it takes is one of those usernames to be tied to a real name, and your entire clickstream becomes un-anonymized, irreversibly and forever."
The clickstream privacy buzz is further fueled by research following AOL's blunder last year, in which the ISP released "anonymized" search data from about 650,000 subscribers. Researchers found that it was easy to trace the search data back to individual subscribers, exposing personal information and embarrassing Web surfing habits. (See Users Outraged by AOL Gaffe.) An AOL spokesman yesterday said his company is not among the ISPs that sell anonymized clickstream data.
Clickstream analysis experts say this week's controversy is largely unwarranted. They point out that ISPs have been licensing clickstream data for years, making it available in a format that contains no usernames or personally identifiable information.
"We contractually require all our data partners to make sure they never send us PII," Cancel said in an interview today. "Each record is identified by a random integer ID. We do not want any IP addresses, user/agents, etc., transmitted to us." The ISPs are responsible for anonymizing the data before it's sent to Compete, which aggregates the data to show trends in user browsing habits on any Website.
Compete is not the only clickstream analysis vendor to use information from ISPs. Hitwise, a Compete competitor, collects much of its data via software that reports clickstream data from ISP customers who "opt in" to the analysis. A detailed audit report from PricewaterhouseCoopers on the Hitwise Website assures users that Hitwise's data contains no PII and is collected only from users who know they are being tracked.
Other clickstream analysis services, such as Alexa and Nielsen/NetRatings, get most of their data from toolbars and user contributors who must consciously add software to their PCs in order to be monitored. But such services' results may not be as accurate as results from Compete, which collects data from ISPs, application service providers, and panels of willing users, Cancel says.
Still, many consumers -- and their service providers -- are becoming more aware of the privacy issues surrounding Web browsing analysis. Google last week said in a blog post that it plans to revamp its information collection process, scrubbing personal information from cookies and removing some parts of IP addresses after the data has been stored for 18 to 24 months.
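One common way to remove part of an IP address is to zero its low-order bits. The Python sketch below shows that approach for IPv4. It is an assumption for illustration only: the article does not say which parts of an address Google plans to remove, and `truncate_ip` and its `keep_bits` parameter are hypothetical names.

```python
import ipaddress

def truncate_ip(ip, keep_bits=24):
    """Zero the low-order bits of an IPv4 address, keeping the first keep_bits."""
    network = ipaddress.ip_network(f"{ip}/{keep_bits}", strict=False)
    return str(network.network_address)

print(truncate_ip("192.0.2.77"))  # 192.0.2.0
```

With the default of 24 bits, up to 256 addresses map to the same stored value, which makes a log entry harder to tie to one connection.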
Tim Wilson, Site Editor, Dark Reading