The paper, published on Monday in The Proceedings of the National Academy of Sciences, details the "unexpected privacy consequences" that arise when disparate data sources can be correlated.
The authors of the study, Alessandro Acquisti, an associate professor of information technology and public policy at CMU's Heinz College, and Ralph Gross, a postdoctoral researcher, demonstrate that Social Security numbers can be predicted using basic demographic data gleaned from government data sources, commercial databases, voter registration lists, or online social networks.
Knowing a person's Social Security number (SSN), name, and date of birth is typically enough to allow an identity thief to impersonate that person for the purpose of various kinds of fraud. Thus, being able to easily guess a person's SSN presents a significant security risk.
Acquisti and Gross estimate that 10 million American residents publish their birthdays in online profiles, or provide enough information for their birthdays to be inferred.
The share of SSNs that can be predicted within 100 attempts varies with the availability of online data and with the subject's date and place of birth, ranging from 0.08% to over 10% in some states.
Such odds may not seem particularly dangerous, but an attacker could use a computer program to guess repeatedly. An SSN that can be guessed within 1,000 attempts is as easy to crack as a 3-digit PIN, which has only 1,000 possible values. Among those born recently in small states, the researchers were able to predict SSNs with 60% accuracy after 1,000 attempts.
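To put the PIN comparison in numbers, here is a minimal Python sketch. It assumes an attacker who guesses blindly and never repeats a guess, so the chance of success is simply the number of guesses divided by the number of possible values. The function name `blind_guess_success` is illustrative and does not come from the paper.

```python
from fractions import Fraction

def blind_guess_success(attempts: int, digits: int) -> Fraction:
    """Chance of hitting one secret value with distinct blind guesses.

    Assumes every value with `digits` decimal digits is equally likely,
    which is an idealization (not every 9-digit number is a valid SSN).
    """
    space = 10 ** digits
    return Fraction(min(attempts, space), space)

pin = blind_guess_success(1_000, 3)  # 3-digit PIN: 1,000 possible values
ssn = blind_guess_success(1_000, 9)  # 9-digit number, no prior knowledge

print(f"3-digit PIN, 1,000 guesses: {float(pin):.0%}")
print(f"9-digit SSN, 1,000 blind guesses: {float(ssn):.4%}")
```

Under these assumptions, 1,000 blind guesses at a 9-digit number succeed about one time in a million, compared with the 60% the researchers report for recent births in small states. That gap is what makes the comparison to a short PIN apt.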
In their paper, Acquisti and Gross pose a hypothetical scenario in which an attacker rents a 10,000-machine botnet to apply for credit cards in the names of 18-year-old residents of West Virginia using public data. Based on various assumptions, such as the number of incorrect SSN submissions (three) allowed before a credit card issuer blacklists a submitting IP address, they estimate that an identity thief could obtain credit card accounts at a rate of up to 47 per minute, or 4,000 before every machine in the botnet was blocked.
Based on an estimated street price that ranges from $1 to $40 per stolen identity, identity thieves could in theory make anywhere from $2,820 to $112,800 per hour.
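The figures in this scenario can be checked with simple arithmetic. The sketch below uses only the numbers quoted above (10,000 machines, three incorrect submissions per IP address, 47 accounts per minute, 4,000 accounts, $1 to $40 per identity) and assumes the peak rate is sustained for a full hour. The variable names are illustrative.

```python
MACHINES = 10_000              # botnet size in the scenario
WRONG_GUESSES_PER_IP = 3       # incorrect submissions before an IP is blacklisted
ACCOUNTS_PER_MINUTE = 47       # peak rate quoted from the paper
TOTAL_ACCOUNTS = 4_000         # accounts obtained before the botnet is blocked
PRICE_LOW, PRICE_HIGH = 1, 40  # estimated street price per identity, in dollars

# Total incorrect submissions the botnet can absorb before every IP is blocked.
wrong_guess_budget = MACHINES * WRONG_GUESSES_PER_IP

# Assumption: the peak rate holds for a full hour.
accounts_per_hour = ACCOUNTS_PER_MINUTE * 60
revenue_low = accounts_per_hour * PRICE_LOW
revenue_high = accounts_per_hour * PRICE_HIGH

# Time needed to reach the 4,000-account ceiling at the peak rate.
minutes_to_ceiling = TOTAL_ACCOUNTS / ACCOUNTS_PER_MINUTE

print(f"Incorrect-submission budget: {wrong_guess_budget:,}")
print(f"Accounts per hour at peak rate: {accounts_per_hour:,}")
print(f"Hourly revenue: ${revenue_low:,} to ${revenue_high:,}")
print(f"Minutes to reach ceiling: {minutes_to_ceiling:.0f}")
```

With these inputs the hourly range works out to $2,820 at $1 per identity and $112,800 at $40. At the peak rate the 4,000-account ceiling would be reached in roughly 85 minutes, so the hourly figures are best read as a short burst, not a steady income.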
As a temporary defensive strategy, the authors recommend that the Social Security Administration fully randomize the assignment of new SSNs, instead of randomizing only the first three digits, as the agency recently proposed. But, they note, such measures would not protect existing SSNs.
They also suggest that legislative defenses, such as SSN redaction requirements, won't work either.
"Industry and policy makers may need, instead, to finally reassess our perilous reliance on SSNs for authentication, and on consumers' impossible duty to protect them," the paper concludes.