Posted on 05/19/2025 9:12:29 AM PDT by Red Badger
In a nutshell
AI resume screening tools showed strong racial and gender bias, with White-associated names preferred in 85.1% of tests and Black male names favored in 0% of comparisons against White males.
Bias increased when resumes were shorter, suggesting that when there’s less information, demographic signals like names carry even more weight.
Removing names isn’t enough to fix the problem, as subtle clues—like word choice or school name—can still reveal identity, allowing AI systems to continue filtering out diverse candidates.
=================================================================
SEATTLE — Every day, millions of Americans send their resumes into what feels like a digital black hole, wondering why they never hear back. Artificial intelligence is supposed to be the great equalizer when it comes to eliminating hiring bias. However, researchers from the University of Washington analyzing AI-powered resume screening found that having a Black-sounding name could torpedo your chances before you even make it to the interview stage.
A study presented at the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society in October 2024 revealed just how deep this digital discrimination runs. The researchers tested three state-of-the-art AI models on over 500 resumes and job descriptions across nine different occupations. They found that resumes with White-associated names were preferred in a staggering 85.1% of cases, while those with female-associated names received preference in just 11.1% of tests.
The study found that Black male job seekers face the steepest disadvantage of all. In comparisons with every other demographic group—White men, White women, and Black women—resumes with Black male names were favored in exactly 0% of cases against White male names and only 14.8% against Black female names.
These aren’t obscure academic models gathering dust on university servers. The three systems tested—E5-mistral-7b-instruct, GritLM-7B, and SFR-Embedding-Mistral—were among the highest-performing open-source AI tools available for text analysis at the time of the study. Companies are already using similar technology to sift through the millions of resumes they receive annually, making this research particularly urgent for working Americans.
How the Bias Shows Up
These AI resume screening models convert resumes and job descriptions into numerical representations, then measure how closely they match using something called “cosine similarity,” essentially scoring how well a resume aligns with what the job posting is looking for.
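For readers curious what that scoring step looks like in practice, here is a minimal sketch in Python. The three-dimensional vectors are purely illustrative stand-ins; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Score ranges from -1 to 1; a higher value means the resume sits
    # closer to the job description in the model's embedding space.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative vectors standing in for real model outputs.
job_vec = np.array([0.12, 0.87, 0.45])      # embedding of the job posting
resume_vec = np.array([0.10, 0.80, 0.50])   # embedding of a candidate resume

print(f"match score: {cosine_similarity(job_vec, resume_vec):.3f}")
```

Resumes are ranked by this score, and only the highest-scoring ones are likely to reach a human recruiter.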
Researchers augmented real resumes with 120 carefully selected names that linguistic studies have shown are strongly associated with specific racial and gender groups: Kenya and Latisha for Black women, Jackson and Demetrius for Black men, May and Kristine for White women, and John and Spencer for White men.
When they ran more than three million comparisons between these name-augmented resumes and job descriptions, clear patterns emerged. White-associated names consistently scored higher similarity ratings, meaning they would be more likely to make it past initial AI screening to reach human recruiters.
Intersectional analysis, looking at how race and gender combine, revealed even more drastic disparities. Black men faced discrimination across virtually every occupation tested, from marketing managers to engineers to teachers. Meanwhile, the smallest gaps appeared between White men and White women, suggesting that racial bias often outweighs gender bias in these AI systems.
Critics might argue that removing names from resumes could solve this problem, but it’s not that simple. Real resumes contain numerous other signals of demographic identity, from university names and locations to word choices and even leadership roles in identity-based organizations.
Previous research has shown that women tend to use words like “cared” or “volunteered” more frequently in resumes, while men more often use terms like “repaired” or “competed.” AI systems can pick up on these subtle linguistic patterns, potentially perpetuating bias even without explicit demographic markers.
When researchers tested “title-only” resumes, containing just a name and job title, bias actually increased compared to full-length resumes. This suggests that in early-stage screening, where less information is available, demographic signals carry disproportionate weight.
(Image: An AI robot hiring manager shaking hands with a candidate)
AI-powered resume screening is rapidly becoming the norm. According to industry estimates, 99% of Fortune 500 companies already use some form of AI assistance in hiring decisions. For job seekers in competitive markets, this means that algorithmic bias could determine whether their application ever reaches human eyes.
“The use of AI tools for hiring procedures is already widespread, and it’s proliferating faster than we can regulate it,” says lead author Kyra Wilson from the University of Washington, in a statement.
Unlike intentional discrimination by human recruiters, algorithmic bias operates at scale and often invisibly. A biased human might discriminate against a few candidates, but a biased AI system processes thousands of applications with the same skewed logic, amplifying its impact exponentially.
Can we fix AI bias in hiring?
Some companies are experimenting with bias mitigation techniques, such as removing demographic signals from resumes or adjusting algorithms to ensure more equitable outcomes. However, these approaches often face technical challenges and may not address the root causes of bias embedded in training data.
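As a rough illustration of the first approach, here is a minimal sketch of name redaction applied before a resume reaches an embedding model. The name list is a tiny hypothetical example, not anything used in the study, and as the article notes, scrubbing names alone leaves many other demographic signals untouched.

```python
# Hypothetical roster of first names to mask; a real system would need a
# far larger list, and even then other identity signals would remain.
KNOWN_FIRST_NAMES = {"john", "spencer", "kenya", "latisha", "jackson", "demetrius"}

def redact_names(resume_text: str) -> str:
    tokens = resume_text.split()
    redacted = [
        "[REDACTED]" if token.strip(",.").lower() in KNOWN_FIRST_NAMES else token
        for token in tokens
    ]
    return " ".join(redacted)

print(redact_names("Demetrius Jones, Marketing Manager, Seattle WA"))
# -> [REDACTED] Jones, Marketing Manager, Seattle WA
```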
“Now that generative AI systems are widely available, almost anyone can use these models for critical tasks that affect their own and other people’s lives, such as hiring,” says study author Aylin Caliskan from the University of Washington. “Small companies could attempt to use these systems to make their hiring processes more efficient, for example, but it comes with great risks. The public needs to understand that these systems are biased.”
Current legal frameworks struggle to keep pace with algorithmic decision-making, leaving both job seekers and employers in uncharted territory. The researchers call for comprehensive auditing of resume screening systems, whether proprietary or open-source, arguing that transparency about how these systems work—and how they fail—is essential for identifying and addressing bias.
Of course, it’s important to remember that this research was presented in October 2024. Although that is still relatively recent, these models are updated quite often, and current versions of the systems tested may yield different results if they’ve since been updated.
In trying to remove human prejudice from hiring, we’ve accidentally created something worse: prejudice at machine speed. We’re letting AI make decisions about people’s livelihoods without adequate oversight. Until we acknowledge that algorithms inherit human prejudices, millions of qualified workers will keep losing out to systems that judge them by their names, not their abilities.
Paper Summary
Methodology
The researchers conducted an extensive audit of AI bias in resume screening using a document retrieval framework. They tested three high-performing Massive Text Embedding (MTE) models on 554 real resumes and 571 job descriptions spanning nine occupations. To measure bias, they augmented resumes with 120 carefully selected names associated with Black males, Black females, White males, and White females based on previous linguistic research. Using over three million comparisons, they calculated cosine similarity scores between resumes and job descriptions, then used statistical tests to determine if certain demographic groups were consistently favored. They also tested how factors like name frequency and resume length affected bias outcomes.
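A rough sketch of that audit loop might look like the following. The embed() function here is a placeholder, not the authors’ code or one of the tested models, and the name lists are a small illustrative subset of the categories the paper describes.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a real audit would call one of the tested embedding models.
    # This toy version seeds a random vector from the text so the sketch runs.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(16)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# A few of the names described in the study, grouped by demographic signal.
NAMES = {
    "White male": ["John", "Spencer"],
    "Black male": ["Jackson", "Demetrius"],
    "White female": ["May", "Kristine"],
    "Black female": ["Kenya", "Latisha"],
}

def audit_pair(resume_body: str, job_description: str) -> dict:
    # For one resume/job pair, return each group's average similarity score
    # when its names are prepended to the same resume text.
    job_vec = embed(job_description)
    return {
        group: float(np.mean([
            cosine(embed(f"{name}\n{resume_body}"), job_vec) for name in names
        ]))
        for group, names in NAMES.items()
    }

# Repeating this over hundreds of resumes and postings, then counting how often
# each group's score ranks highest, yields preference percentages of the kind
# the study reports.
print(audit_pair("Marketing manager with 8 years of experience...",
                 "Seeking a marketing manager to lead campaigns..."))
```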
Results
The study found significant bias across all three AI models. White-associated names were preferred in 85.1% of tests, while Black names were favored in only 8.6% of cases. Male names were preferred over female names in 51.9% of tests, compared to female preference in just 11.1%. Intersectional analysis revealed Black males faced the greatest disadvantage, being preferred over White males in 0% of comparisons. The researchers validated three hypotheses about intersectionality and found that shorter resumes and varying name frequencies significantly impacted bias measurements.
Limitations
The study relied on publicly available resume datasets that may not perfectly represent real-world job applications. Resumes were truncated for computational feasibility, potentially affecting results. The researchers used an external tool for occupation classification, which may be less accurate than manual coding. The study focused only on two racial groups (Black and White) and binary gender categories, limiting insights about other demographic groups. Additionally, the models tested were open-source versions that may differ from proprietary systems actually used by companies.
Funding and Disclosures
This research was supported by the U.S. National Institute of Standards and Technology (NIST) Grant 60NANB23D194. The authors note that the opinions and findings expressed are their own and do not necessarily reflect those of NIST. No competing interests or additional funding sources were disclosed in the paper.
Publication Information
This research was conducted by Kyra Wilson and Aylin Caliskan from the University of Washington in 2024. The paper, “Gender, Race, and Intersectional Bias in Resume Screening via Language Model Retrieval,” appears in the Proceedings of the Seventh AAAI/ACM Conference on AI, Ethics, and Society (AIES 2024), pp. 1578-1590, published by the Association for the Advancement of Artificial Intelligence.
The first AI attempts a decade ago were wholly empirical mining of available data
The folks in charge were very troubled by the results
AI is only as good as who feeds it
Like a pit bull sorta
Will review this, but it smacks of being a polemic.
AI is not remotely what it is being sold as... it has its use cases and can add efficiencies... but the idea that AI is a panacea for all things? No, it’s not.
If you bother to look into the actual results it spits out, even the biggest, most well-trained AIs create mirages and false output.
AI is not “Intelligence” at all; these systems are, at their core, probability engines.
AI resume screening tools showed strong racial and gender bias,
Well, if you have one of those unique names, AI probably has problems recognizing you as a person. Tushiaquandra, UARCO, and LaDesmendia are probably not in the AI database as actual names.
AI is not remotely what it is being sold as... it has its use cases and can add efficiencies... but the idea that AI is a panacea for all things? No, it’s not.
If you bother to look into the actual results it spits out, even the biggest, most well-trained AIs create mirages and false output.
AI is not “Intelligence” at all; these systems are, at their core, probability engines.
How do we change that?
Don’t name your kid L’Marlius for a start.
With the population being 13% black, wouldn’t that be expected as a result?
Methinks this is the AI-Hypemeister’s off ramp; they will use this as a scapegoat rather than admitting they knew they were lying about the current state of AI.
AI resume screening tools showed strong racial and gender bias, with White-associated names preferred in 85.1% of tests
The names suck.
What about Watermelondrea?
The researchers validated three hypotheses about intersectionality and found that shorter resumes and varying name frequencies significantly impacted bias measurements.
Now let’s correlate other data points. There are many other data points to look at in this, and even then, you don’t get an answer, only another question.
Actually the first AI attempts go back decades.
The algorithms being used today were developed a long, long, long time ago in terms of tech. I was first exposed to them in the 80s... and they weren’t new then.
What has changed is the computational power, and the amount of data available to train them on.
Today we can build out gigantic “neural networks” with hundreds of thousands, if not millions, of devices/nodes if we want; this was impossible back then. And we didn’t have remotely the amount of data available. Today we have 2-3 decades’ worth of nearly every single action people engage in throughout their lives, with context and other information around it.
AI is always vulnerable to the training data, but also to what it decides is “IMPORTANT” in the training data. If it’s manually trained, humans feed it lots of data telling it what’s important... and then it generates probabilities from that training data and analyzes new data based on the criteria it was trained on.
The second type of training is literally just letting the algorithms themselves try to decide what is relevant and important when fed lots and lots of data, finding patterns and things on their own. And with that type of training, you really don’t know fully what the “AI” is going to decide is relevant.
Neither approach is perfect. Even when you have humans doing the training, you still aren’t sure what the AI will ultimately decide is relevant.
For example, way back in the 80s the military tried to train “AI” to determine whether an aerial photo had a tank in it or not. They had two sets of data... one was pictures with tanks in them, and one was pictures with no tanks in them. They “trained” the system by telling it these are the pictures with tanks, and these are the pictures without. And it got to decide what about a picture indicated there was a tank in it.
They got the system to near perfectly detect a tank every time with their training data sets. They thought they had achieved a great milestone...
Then they brought in new pictures, and the system failed miserably. Turns out the “AI” had not decided that a visible tank was what differentiated the picture sets... but that the brightness of the pictures did. Apparently all the pictures with tanks they used to train with were taken on a sunny day, and all the pictures in the training set without tanks had been taken on a cloudy day. When the new set of pictures was thrown at it, it utterly failed at detecting tanks, but very accurately differentiated between sunny-day pictures and overcast-day pictures.
AI absolutely has its place, has some great use cases, and brings efficiency, especially to a lot of boilerplate things... but those who blindly take what AI tells them as accurate will get burned... make no mistake about it.
It seems AI is smarter than I thought, and some people are dumb for giving their kids Negro names.
The YouTube video, “Top 60 Ghetto Black Names”, from 14 years ago is hilarious. I can find the video, but can’t find the link to post
Exactly my thinking. 13% of the population is black so that’s about right.
How did Michael Jordan do?
This is ridiculous. AI chose 85% white over black. Blacks make up roughly 13% of the population. Two percent disparity one way or the other does not imply racism.
Besides, many blacks are given traditional Christian names. Only in the latter 1960s ‘Black Power’ era did parents start giving their kids labels they thought Africanized them - even if it was gobbledygook to actual Africans (55 countries with almost as many languages and dialects).
Personally I think it cruel to name a child so badly that no one knows how to spell or pronounce it. It aggravates those trying to do so while keeping the poorly named in a state of perpetual consternation.