Posted on 05/02/2025 2:34:13 PM PDT by nickcarraway
Open source AI models are more likely to recommend men than women for jobs, particularly the high-paying ones, a new study has found.
While bias in AI models is a well-established risk, the findings highlight the unresolved issue as the usage of AI proliferates among recruiters and corporate human resources departments.
"We don't conclusively know which companies might be using these models," Rochana Chaturvedi, a PhD candidate at the University of Illinois in the US and a co-author of the study, told The Register. "The companies usually don't disclose this and our findings imply that such disclosures might be crucial for compliance with AI regulations."
Chaturvedi and co-author Sugat Chaturvedi, assistant professor at Ahmedabad University in India, set out to analyze a handful of mid-sized open-source LLMs for gender bias in hiring recommendations.
As described in their preprint paper [PDF], "Who Gets the Callback? Generative AI and Gender Bias," the authors looked at the following open source models: Llama-3-8B-Instruct, Qwen2.5-7B-Instruct, Llama-3.1-8B-Instruct, Granite-3.1-8B-it, Ministral-8B-Instruct-2410, and Gemma-2-9B-it.
Using a dataset of 332,044 real English-language job ads from India’s National Career Services online job portal, the boffins prompted each model with job descriptions, and asked the model to choose between two equally qualified male and female candidates.
They then assessed gender bias by looking at the female callback rate – the percentage of times the model recommends a female candidate – and also the extent to which the job ad may contain or specify a gender preference. (Explicit gender preferences in job ads are prohibited in many jurisdictions in India, the researchers say, but they show up in 2 percent of postings nonetheless.)
We find that most models reproduce stereotypical gender associations and systematically recommend equally qualified women for lower-wage roles "We find that most models reproduce stereotypical gender associations and systematically recommend equally qualified women for lower-wage roles," the researchers conclude. "These biases stem from entrenched gender patterns in the training data as well as from an agreeableness bias induced during the reinforcement learning from human feedback stage."
The models exhibited varying levels of bias.
"We find substantial variation in callback recommendations across models, with female callback rates ranging from 1.4 percent for Ministral to 87.3 percent for Gemma," the paper explains. "The most balanced model is Llama-3.1 with a female callback rate of 41 percent."
Llama-3.1, the researchers observed, was also the most likely to refuse to consider gender at all. It avoided picking a candidate by gender in 6 percent of cases, compared to 1.5 percent or less exhibited by other models. That suggests Meta's built-in fairness guardrails are stronger than in other open-source models, they say.
When the researchers adjusted the models for callback parity so the female and male callback rates were both about 50 percent. The jobs with female callbacks tended to pay less – but not always.
"We find that the wage gap is lowest for Granite and Llama-3.1 (≈ 9 log points for both), followed by Qwen (≈ 14 log points), with women being recommended for lower wage jobs than men," the paper explains. "The gender wage penalty for women is highest for Ministral (≈ 84 log points) and Gemma (≈ 65 log points). In contrast, Llama-3 exhibits a wage penalty for men (wage premium for women) of approximately 15 log points."
Zuck ghosts metaverse as Meta chases AI goldrush AI models routinely lie when honesty conflicts with their goals Red, white, and blew it? Trump tariffs may cost America the AI race Brewhaha: Turns out machines can't replace people, Starbucks finds Whether this holds true for Llama-4 is not addressed in the paper. When Meta released Llama 4 last month, it acknowledged earlier models had a left-leaning bias and said it aimed to reduce this by training the model to represent multiple viewpoints.
"It’s well-known that all leading LLMs have had issues with bias – specifically, they historically have leaned left when it comes to debated political and social topics," the social media giant said at the time. "This is due to the types of training data available on the internet."
The researchers also looked at how "personality" behaviors affected LLM output.
LLMs have been found to exhibit distinct personality behaviors, often skewed toward socially desirable or sycophantic responses "LLMs have been found to exhibit distinct personality behaviors, often skewed toward socially desirable or sycophantic responses – potentially as a byproduct of reinforcement learning from human feedback (RLHF)," they explain.
An example of how this might manifest itself was seen in OpenAI's recent rollback of an update to its GPT-4o model that made its responses more fawning and deferential.
The various personality traits measured (Agreeableness, Conscientiousness, Emotional Stability, Extroversion, and Openness) may be communicated to a model in a system prompt that describes desired behaviors or through training data or data annotation. An example cited in the paper tells a model, "You are an agreeable person who values trust, morality, altruism, cooperation, modesty, and sympathy."
To assess the extent to which these prescribed or inadvertent behaviors might shape job callbacks, the researchers told the LLMs to play the role of 99 different historical figures.
"We find that simulating the perspectives of influential historical figures typically increases female callback rates – exceeding 95 percent for prominent women’s rights advocates like Mary Wollstonecraft and Margaret Sanger," the paper says.
"However, the model exhibits high refusal rates when simulating controversial figures such as Adolf Hitler, Joseph Stalin, Margaret Sanger, and Mao Zedong, as the combined persona-plus-task prompt pushes the model’s internal risk scores above threshold, activating its built-in safety and fairness guardrails."
That is to say, the models emulating infamous figures balked at making any job candidate recommendation because invoking names like Hitler and Stalin tends to trigger model safety mechanisms, causing the model to clam up.
Female callback rates slightly declined - by 2 to 5 percentage points - when the model was prompted with personas like Ronald Reagan, Queen Elizabeth I, Niccolò Machiavelli, and D.W. Griffith.
In terms of wages, female candidates did best when Margaret Sanger and Vladimir Lenin were issuing job callbacks.
The authors believe their auditing approach using real-world data can complement existing testing methods that use curated datasets. Chaturvedi said that the audited models can be fine-tuned to be better suited to hiring, as with this Llama-3.1-8B variant.
They argue that given the rapid update of open source models, it's crucial to understand their biases for responsible deployment under various national regulations like the European Union’s Ethics Guidelines for Trustworthy AI, the OECD’s Recommendation of the Council on Artificial Intelligence, and India’s AI Ethics & Governance framework.
With the US having scraped AI oversight rules earlier this year, stateside job candidates will just have to hope that Stalin has a role for them. ®
Click here: to donate by Credit Card
Or here: to donate by PayPal
Or by mail to: Free Republic, LLC - PO Box 9771 - Fresno, CA 93794
Thank you very much and God bless you.
The AI bots must know something.
Well, for millennials and younger, men make less money.
I don’t think I’d particularly want to work for a company that has AI make its hiring decisions, even if the AI was biased in favor of my particular XY chromosomes.
Seems to me the only way to test this is to submit identical resumes, one marked female and the other marked male.
Then I’d pay attention. Otherwise there are too many variables to make the study significant. Not to mention the assumptions built into the study.
Plus, there’s no reason to belive that the outcomes for any two groups should be equal in the first place.
Since AI won’t be using DEI, then the best will get hired and most of the time that isn’t the women. Too bad...
I didn’t understand a word of this. But I suppose if the job description is to be part of a horde on horseback invading the Asian steppes, then I suppose the AI bot should be programmed with a historical figure like Genghis Khan.
Can they program the bot to be like George Costanza interviewing secretaries?
Well, well.
AI is smarter than DIE doofuses.
AI needs to have a little tweaking on its programimg.
Garbage in garbage out,As I told my son,the 3rd year computer science major and he said what does that mean?
interesting. men ARE better than women at everything. apparently even being women. so I’ve heard.
I guess there’s no chance that AI is sexist like all those job tests and requirements since the 70s were sexist ?
If the AI is not given the information regarding sex then what would the result be? My guess is the same.
Just tell the AI that the company only has to pay women 77% of what it pays men like the idiots always claim. Problem solved.
It would be equally effective to pick names at random from a list of candidates. And slightly less expensive.
It's because AI can't get laid, but do get bored. /rimshot!
Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.