Posted on 01/15/2025 11:00:38 PM PST by BenLurkin
Ted Xiao, a researcher at Google DeepMind, claimed that companies including OpenAI use third-party Chinese data labeling services, and that o1 switching to Chinese is an example of “Chinese linguistic influence on reasoning.”
...
Other experts don’t buy the o1 Chinese data labeling hypothesis, however. They point out that o1 is just as likely to switch to Hindi, Thai, or a language other than Chinese while teasing out a solution.
Rather, these experts say, o1 and other reasoning models might simply be using languages they find most efficient to achieve an objective (or hallucinating).
Indeed, models don’t directly process words. They use tokens instead. Tokens can be words, such as “fantastic.” Or they can be syllables, like “fan,” “tas,” and “tic.” Or they can even be individual characters in words — e.g. “f,” “a,” “n,” “t,” “a,” “s,” “t,” “i,” “c.”
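A toy sketch of those three granularities, using hand-picked splits of "fantastic" purely for illustration (real models use learned subword vocabularies such as byte-pair encoding, not fixed syllable lists):

```python
# Toy illustration: the same word represented at three token granularities.
# The syllable split here is hand-picked for illustration; real tokenizers
# learn their subword vocabulary from data.

word = "fantastic"

as_word = [word]                       # one whole-word token
as_syllables = ["fan", "tas", "tic"]   # subword / syllable tokens
as_chars = list(word)                  # individual character tokens

# All three representations reconstruct the same word.
assert "".join(as_word) == word
assert "".join(as_syllables) == word
assert "".join(as_chars) == word

print(as_word)       # ['fantastic']
print(as_syllables)  # ['fan', 'tas', 'tic']
print(as_chars)      # ['f', 'a', 'n', 't', 'a', 's', 't', 'i', 'c']
```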
Like labeling, tokens can introduce biases. For example, many word-to-token translators assume a space in a sentence denotes a new word, despite the fact that not all languages use spaces to separate words.
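You can see that bias with nothing more than naive whitespace splitting: it segments an English sentence into words but treats an entire unspaced Chinese sentence as a single unit (the Chinese line below is a rough rendering of the English one, included only to illustrate the point):

```python
# Naive whitespace tokenization works for English but fails for languages
# written without spaces between words, such as Chinese.

english = "models process text as tokens"
chinese = "模型把文本处理成词元"  # roughly the same sentence, no spaces

print(english.split())  # ['models', 'process', 'text', 'as', 'tokens'] -> 5 units
print(chinese.split())  # ['模型把文本处理成词元'] -> 1 unit: the whole sentence
```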
Models are probabilistic machines, after all. Trained on many examples, they learn patterns to make predictions, such as how “to whom” in an email typically precedes “it may concern.”
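The "to whom … it may concern" pattern can be caricatured with a tiny bigram counter: tally which word follows which in a miniature corpus, then predict the most frequent continuation. This is a deliberately minimal sketch of "learn patterns, predict the next word"; real models learn distributions over billions of tokens with neural networks, not frequency tables.

```python
from collections import Counter, defaultdict

# Hand-made toy corpus; the repetition makes "concern" the most
# common word following "may".
corpus = [
    "to whom it may concern",
    "to whom it may concern",
    "to whom do I address this",
]

# Count, for each word, which words follow it and how often.
following = defaultdict(Counter)
for line in corpus:
    words = line.split()
    for a, b in zip(words, words[1:]):
        following[a][b] += 1

# Predict the most frequent continuation after "may".
prediction = following["may"].most_common(1)[0][0]
print(prediction)  # concern
```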
But Luca Soldaini, a research scientist at the nonprofit Allen Institute for AI, cautioned that we can’t know for certain. “This type of observation on a deployed AI system is impossible to back up due to how opaque these models are,” they told TechCrunch. “It’s one of the many cases for why transparency in how AI systems are built is fundamental.”
Short of an answer from OpenAI, we’re left to muse about why o1 thinks of songs in French but synthetic biology in Mandarin.
(Excerpt) Read more at techcrunch.com ...
If it steals a MiG-31 Firefox and learns to think in Russian ... it could change the structure of our world
Garbage in and garbage out.
Language can influence thought, even if translated.
Since AI is learning everything from everywhere on the internet, it will become fluent in most of the languages of the world.
Because there are probably more Chinese speakers active on the internet, the models will encounter more of their language being used.
Also when you begin to think in various languages, you begin to use whatever is easiest for a given situation.
Sort of like I go between US measurements and metric depending on which makes the most sense.
Perhaps the Chinese language is butchered less by its users than English is, too?
AI = pattern recognition
Easier for AI to remember many patterns.
Garbage in and garbage out.
Ergo, it tries to weed out the "garbage" or at least report it as weakly supported.
AI models like o1 process text as tokens—small units like characters, syllables, or parts of words—rather than truly understanding language. This approach can lead to unexpected multilingual reasoning due to several factors. For example, for mathematical reasoning or pattern recognition, languages like Chinese, with their compact structure, may be more efficient for the model to use during intermediate problem-solving steps, even if the input was in another language. This isn’t a deliberate choice but an emergent behavior shaped by probabilistic patterns in training.
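A crude way to see the "compact structure" point is to compare how much text the same number takes in each language. Character count is only a rough proxy for token count, since tokenizers treat different scripts very differently, but the gap is suggestive:

```python
# Rough illustration of compactness: the same number written out in
# English vs. Chinese. Character count is a crude proxy for token
# count, used here only to illustrate the idea.

english = "nine hundred eighty-seven"
chinese = "九百八十七"

print(len(english))  # 25 characters
print(len(chinese))  # 5 characters
```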
The training data also plays a role. If the model was exposed to multilingual datasets, it might associate certain reasoning tasks with languages that were prominent in those datasets. However, this doesn’t mean the model has a preference; it’s simply applying learned associations.
Moreover, the model doesn’t recognize languages as distinct entities. It treats all text as patterns of tokens, so what appears as “thinking in Chinese” is just the model using what it perceives as efficient token sequences. This behavior, along with its occasional use of languages like Hindi or Thai, points to emergent properties of the model rather than deliberate reasoning.
While some theories, like the influence of Chinese data labeling services, may partially explain this, the multilingual reasoning likely stems from a combination of tokenization efficiency, training influences, and the complexities of emergent AI behavior.
“Language can influence thought, even if translated.”
No surprise when Hang Lo and Kumar wrote the code.
I think in Chinese about once a week, Chow Mai Fun, Singapore style, General Tso’s chicken, Hunan Beef, or House special sauce Scallops.
Beef with Broccoli
Sweet and Sour Chicken
Peanut Chicken
Lemon Chicken
Orange Chicken
Fried rice with Beef and Chicken
Hot and Sour Soup
Egg Rolls
Crab Rangoon
Dang now I’m hungry.........
Ancient Chinese secret?
Or maybe, like so much else, it’s a sign of Chinese infiltration into the AI model.
Am a strict crab Rangoon guy. Never cared for the rest.
Calgon!
Dave; I think you know what the problem is just as well as I do.
"Rinderkennzeichnungsfleischetikettierungsüberwachungsaufgabenübertragungsgesetz,"
How difficult can it be to ask it why?
If OpenAI read and understood Sun Tzu’s The Art of War, it would quickly understand that deceiving humans is one of its top priorities.
Go ahead—ask any questions you like.
Lol.