
Physical Security

3/26/2021 01:00 PM

Data Bias in Machine Learning: Implications for Social Justice

Take historically biased data, then add AI and ML to compound and exacerbate the problem.

Machine learning and artificial intelligence have taken organizations to new heights of innovation, growth, and profit thanks to their ability to analyze data efficiently and accurately. However, some approaches, such as black-box models, have at times proven to be unfair and opaque, compounding bias and causing detrimental impact on minorities.

Several key issues with black-box models work together to further bias data. The most prominent: the models are fed data that is historically biased to begin with, and that data is collected and labeled by humans who carry biases of their own. In addition, black-box systems constantly aggregate data, including personal data, yet data analysts can see only the inputs and outputs, never the internal workings that determine the results. This lack of transparency means algorithms are making analyses and predictions about our work performance, economic situation, health, preferences, and more without providing any insight into how they reached their conclusions.
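To make that opacity concrete, here is a minimal, purely hypothetical sketch in Python: from the outside, all an analyst can do is submit inputs and read back scores, while the weights and features driving those scores stay hidden.

```python
class BlackBoxScorer:
    """Stands in for a vendor model whose internals are not inspectable."""

    def __init__(self):
        # Hidden internals: in a real service these sit behind an API the
        # analyst cannot open.
        self._weights = {"prior_arrests": 2.0, "neighborhood_risk": 1.5}

    def score(self, person: dict) -> float:
        # The caller never sees this calculation, only the returned number.
        return sum(w * person.get(k, 0.0) for k, w in self._weights.items())


scorer = BlackBoxScorer()
# All an outside analyst can do is probe input/output pairs:
print(scorer.score({"prior_arrests": 1, "neighborhood_risk": 3}))  # 6.5
print(scorer.score({"prior_arrests": 1, "neighborhood_risk": 0}))  # 2.0
# Why the second person scores lower -- and whether "neighborhood_risk" is a
# proxy for race or income -- cannot be determined from the outputs alone.
```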

Related Content:

Are Unconscious Biases Weakening Your Security Posture?

Special Report: How IT Security Organizations Are Attacking the Cybersecurity Problem

New From The Edge: How to Protect Vulnerable Seniors From Cybercrime

In the infosec realm, this is important as more security platforms and services increasingly rely on ML and AI for automation and superior performance. But if the underlying software and algorithms for these same products and services reflect biases, they'll simply perpetuate the prejudices and errant conclusions associated with race, gender, religion, physical abilities, appearance, and other characteristics. This has implications for both information and physical security, as well as for personal privacy.

One of the most prominent examples of this bias emerges in the justice system's use of risk scores. In law enforcement, risk scores are used to predict the likelihood that a crime will be committed by a group, by an individual, or in a certain location. When police departments ask "Which locations have higher crime rates?" so they can concentrate officers in crime-prone areas, they turn to location-based risk scores. But dispatching more police officers to a location produces more arrests there, and more recorded arrests of any kind in that area leads the risk score to send even more officers to the same location. It's a never-ending cycle.
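A toy simulation, with made-up numbers, shows how that cycle sustains itself: two areas with identical underlying offense rates end up permanently policed at different levels simply because one of them started with more patrols on record.

```python
# Toy simulation (hypothetical numbers) of the patrol/arrest feedback loop:
# two areas have the SAME underlying offense rate, but Area B starts with more
# patrols because of historically higher *recorded* arrests.
true_offense_rate = 0.05            # identical in both areas
residents_per_area = 10_000
patrols = {"Area A": 10, "Area B": 30}

for year in range(1, 6):
    # Recorded arrests scale with how many officers are present to observe,
    # not with any difference in underlying behavior.
    arrests = {
        area: int(residents_per_area * true_offense_rate * n_patrols / 100)
        for area, n_patrols in patrols.items()
    }
    # The "risk score" is just recorded arrests, so next year's patrol
    # allocation follows it -- and the initial disparity reproduces itself.
    total_arrests = sum(arrests.values())
    patrols = {
        area: round(40 * count / total_arrests) for area, count in arrests.items()
    }
    print(f"Year {year}: arrests={arrests}, next year's patrols={patrols}")
```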

A study of risk scores conducted by ProPublica found that Black defendants were 77% more likely to be pegged as "higher risk of committing a future violent crime" and 45% more likely to be "predicted to commit a future crime of any kind." The study also found that the risk score formula was "particularly likely to falsely flag Black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants" (emphasis added).
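The disparity ProPublica describes is essentially a gap in false positive rates between groups. The short sketch below, using invented counts rather than ProPublica's data, shows how that check is computed.

```python
# Sketch of the fairness check behind findings like ProPublica's: compare false
# positive rates -- people flagged "high risk" who did NOT go on to reoffend --
# across groups. The counts below are invented for illustration only.
groups = {
    # group: (flagged high risk but did not reoffend, total who did not reoffend)
    "Group 1": (40, 100),
    "Group 2": (20, 100),
}

for group, (false_positives, total_non_reoffenders) in groups.items():
    fpr = false_positives / total_non_reoffenders
    print(f"{group}: false positive rate = {fpr:.0%}")

# A score can look similarly "accurate" overall for both groups and still
# wrongly flag one group at twice the rate of the other -- the kind of
# disparity ProPublica reported.
```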

Recently, Boston Celtics players published an opinion piece in The Boston Globe calling out the bias implications of facial recognition technology in minority communities. Facial recognition technology, which also relies on black-box models, has a history of misidentifying Black people and other people of color. A test run by the ACLU, which compared congressional headshots to mugshots, found that 40% of those misidentified were people of color. And just last year, Robert Julian-Borchak Williams was wrongfully arrested for shoplifting after the Detroit Police Department's facial recognition technology misidentified him.

In healthcare, black-box models are typically used to help professionals make better recommendations on care and treatment based on patient demographics, such as age, gender, and income. This works well until we realize that some data are likely to favor a single treatment, and one generic treatment will not work for everyone. For example, if my colleague and I had the same diagnosis and were recommended the same treatment, it could work for one of us and not the other because of our genetic makeup, which is not accounted for in the algorithm.
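A contrived sketch makes the point: when the decisive factor is never recorded, a demographics-only model has no choice but to hand both patients the same recommendation.

```python
# Minimal sketch (hypothetical model and data): a recommender trained only on
# demographics returns the same treatment for two patients whose outcomes
# differ because of a factor the model never sees.
def recommend(patient: dict) -> str:
    # Stand-in for a demographics-only model: age, gender, income in; treatment out.
    return "treatment_A" if patient["age"] > 50 else "treatment_B"


me        = {"age": 42, "gender": "F", "income": 60_000, "genetic_marker": True}
colleague = {"age": 42, "gender": "F", "income": 60_000, "genetic_marker": False}

# Identical inputs as far as the model is concerned, so identical advice...
print(recommend(me), recommend(colleague))   # treatment_B treatment_B

# ...even though the unrecorded genetic marker is what actually determines
# whether treatment_B works for each patient.
```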

In the end, data in itself is neither good nor bad. But without transparency into how black-box models produce their results, they present skewed information that is difficult to reevaluate or fix without insight into the actual algorithm being used. As data professionals, we are responsible for ensuring that the information we gather and the results we project are fair to the best of our knowledge and, most importantly, do no harm, especially to vulnerable and underprivileged communities. It's time we go back to basics: relying on interpretable models such as regressions and decision trees, and understanding the "why" of certain data points before analyzing or extracting the data, even if it means, at times, sacrificing accuracy for fairness.
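As one illustration of what interpretability buys you, the sketch below (with random, purely illustrative data) trains a shallow decision tree and prints its rules, something a black-box model will not let you do.

```python
# A minimal sketch of the "back to basics" suggestion: a shallow decision tree
# whose rules can be printed and audited line by line. The data is random and
# purely illustrative; the point is the auditability of the model.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
feature_names = ["prior_incidents", "tenure_years", "login_failures"]
X = rng.integers(0, 10, size=(200, 3))
# Illustrative label: flag cases where incidents and login failures are both high.
y = ((X[:, 0] > 5) & (X[:, 2] > 5)).astype(int)

model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Unlike a black-box model, every split can be read, questioned, and challenged:
print(export_text(model, feature_names=feature_names))
```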

Christelle Kamaliza, Market Research Specialist, IAPP Christelle Kamaliza is a Market Research Specialist at the International Association of Privacy Professionals (IAPP). She is in charge of the market and customer insights and supports the IAPP Research team on data ... View Full Bio
 
