Dark Reading is part of the Informa Tech Division of Informa PLC

Data Bias in Machine Learning: Implications for Social Justice

Take historically biased data, then add AI and ML to compound and exacerbate the problem.

Machine learning and artificial intelligence have taken organizations to new heights of innovation, growth, and profits thanks to their ability to analyze data efficiently and with extreme accuracy. However, some algorithms, black-box models in particular, have at times proven to be unfair and opaque, which multiplies bias and harms minorities.

Black-box models present several key issues, and they work together to further bias data. The most prominent is that models are fed data that is historically biased to begin with, and fed by humans who are biased by nature. In addition, data analysts can see only the inputs and outputs, not the internal workings that determine the results, even as machine learning constantly aggregates this data, including personal data. The process lacks transparency on how the data is being used and why. As a result, data analysts have no clear view of how inputs become outputs, and algorithms are making analyses and predictions about our work performance, economic situation, health, preferences, and more without providing insight into how they reached their conclusions.

In the infosec realm, this matters because more security platforms and services rely on ML and AI for automation and superior performance. But if the underlying software and algorithms for these products and services reflect biases, they'll simply perpetuate the prejudices and errant conclusions associated with race, gender, religion, physical abilities, appearance, and other characteristics. This has implications for both information and physical security, as well as for personal privacy.

One of the most prominent examples of bias arising from these issues is the use of risk scores in the justice system. In law enforcement, risk scores are used to predict the likelihood that a crime will be committed by a group of people, by a person, or in a certain location. When police departments ask "What locations have higher crime rates?" in order to concentrate officers in crime-prone areas, they look at the risk score for each location. But dispatching more police officers to a location leads to more arrests there, and more reported arrests of any kind in that area raise its risk score, which sends still more officers to the location. It's a never-ending cycle.
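
The cycle described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a model of any real policing system: the two districts, the starting arrest counts, the 70/30 dispatch rule, and the function name `simulate_patrols` are all hypothetical. Both districts are given the same arrests-per-officer rate, so any widening gap comes from the dispatch rule alone.

```python
def simulate_patrols(rounds=10, officers=100, arrests_per_officer=0.1):
    """Toy feedback loop: risk score -> patrols -> recorded arrests -> risk score."""
    # Hypothetical starting point: district A has slightly more recorded arrests,
    # although both districts share the SAME arrests_per_officer rate.
    recorded = {"A": 12.0, "B": 10.0}
    history = []
    for _ in range(rounds):
        total = sum(recorded.values())
        # Assumed risk score: each district's share of all recorded arrests.
        scores = {d: recorded[d] / total for d in recorded}
        history.append(scores)
        # Assumed dispatch rule: the highest-scoring district gets 70% of officers.
        hot = max(scores, key=scores.get)
        for d in recorded:
            patrols = officers * (0.7 if d == hot else 0.3)
            # More officers present means more arrests get recorded.
            recorded[d] += patrols * arrests_per_officer
    return recorded, history


final, history = simulate_patrols()
print("risk score of A, first round:", round(history[0]["A"], 3))
print("risk score of A, last round: ", round(history[-1]["A"], 3))
print("recorded arrests at the end: ", final)
```

In this sketch the district that starts slightly ahead keeps drawing most of the patrols, so its share of recorded arrests keeps rising even though nothing about the underlying rate differs.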

A study of risk scores conducted by ProPublica found that Black defendants were 77% more likely to be pegged as at "higher risk of committing a future violent crime" and 45% "more likely to be predicted to commit a future crime of any kind." The study also found that the risk score formula was "particularly likely to falsely flag Black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants" (emphasis added).
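
A "falsely flagged" comparison of this kind is commonly measured as a false positive rate per group: of the people who did not go on to reoffend, what fraction were labeled high risk? The sketch below shows that arithmetic. The records are made up for illustration and are not data from the ProPublica study; the group names and the function `false_positive_rates` are hypothetical.

```python
from collections import defaultdict


def false_positive_rates(records):
    """records: iterable of (group, flagged_high_risk, reoffended) tuples."""
    false_positives = defaultdict(int)  # flagged high risk, did not reoffend
    negatives = defaultdict(int)        # everyone who did not reoffend
    for group, flagged, reoffended in records:
        if not reoffended:
            negatives[group] += 1
            if flagged:
                false_positives[group] += 1
    return {g: false_positives[g] / negatives[g] for g in negatives}


# Made-up illustrative records, NOT real defendants or real study data.
records = (
    [("group_1", True, False)] * 4     # wrongly flagged
    + [("group_1", False, False)] * 6  # correctly not flagged
    + [("group_1", True, True)] * 5    # correctly flagged
    + [("group_2", True, False)] * 2
    + [("group_2", False, False)] * 8
    + [("group_2", True, True)] * 5
)

rates = false_positive_rates(records)
for group, rate in sorted(rates.items()):
    print(group, "false positive rate:", rate)
```

Note that both made-up groups have the same number of correctly flagged people; the disparity shows up only when the wrongly flagged are counted separately, which is why overall accuracy alone can hide it.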

Recently, Boston Celtics players published an opinion piece in The Boston Globe calling out the bias implications of facial recognition technology for minority communities. Facial recognition technology, which also uses black-box models, has a history of misidentifying Black people and other people of color. A test run by the ACLU, comparing congressional headshots to mugshots, showed that 40% of those who were misidentified were people of color. Just last year, facial recognition technology used by the Detroit Police Department misidentified Robert Julian-Borchak Williams as a shoplifting suspect.

In healthcare, black-box models are typically used to help professionals make better recommendations on care and treatments based on patient demographics such as age, gender, and income. This is great, until we realize that the data may favor a single treatment, and one generic treatment will not work for everyone. For example, if my colleague and I had the same diagnosis and were recommended the same treatment, it could work for one of us and not the other because of our genetic makeup, which the algorithm does not account for.

In the end, data in itself is neither good nor bad. But without transparency into how black-box models produce results, they present skewed information that is difficult to reevaluate or fix without insight into the actual algorithm being used. As data professionals, we are responsible for ensuring that the information we gather and the results we project are fair to the best of our knowledge and, most importantly, do no harm, especially to vulnerable and underprivileged communities. It's time we go back to the basics: relying on interpretable models such as regressions and decision trees, and understanding the "why" of certain data points before analyzing or extracting the data, even if that means, at times, sacrificing accuracy for fairness.
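
To make "interpretable" concrete, here is a minimal sketch of a logistic regression fitted with plain gradient descent. It is an illustration under stated assumptions, not a recommended production method: the feature names, the tiny dataset, and the function `fit_logistic` are hypothetical. The point is that the fitted weights can be read directly, so an analyst can see which input drives the score and ask why.

```python
import math


def fit_logistic(rows, labels, steps=2000, lr=0.1):
    """Fit a logistic regression by gradient descent on log loss.

    Returns (weights, bias); each weight belongs to one named input feature.
    """
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability
            err = p - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


# Hypothetical data: the label depends only on feature_a (1 when feature_a >= 2);
# feature_b takes the values -1 and +1 equally often for every value of feature_a.
feature_names = ["feature_a", "feature_b"]
rows = [(0, -1), (0, 1), (1, -1), (1, 1), (2, -1), (2, 1), (3, -1), (3, 1)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = fit_logistic(rows, labels)
for name, weight in zip(feature_names, weights):
    print(name, "weight:", round(weight, 3))
print("bias:", round(bias, 3))
```

With a black-box model, the equivalent question ("which input moved this score, and by how much?") has no direct answer; here it is a matter of reading two numbers.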

Christelle Kamaliza, Market Research Specialist, IAPP

Christelle Kamaliza is a Market Research Specialist at the International Association of Privacy Professionals (IAPP). She is in charge of the market and customer insights and supports the IAPP Research team on data ...
