Posted on 01/15/2025 11:00:38 PM PST by BenLurkin
Ted Xiao, a researcher at Google DeepMind, claimed that companies including OpenAI use third-party Chinese data labeling services, and that o1 switching to Chinese is an example of “Chinese linguistic influence on reasoning.”
...
Other experts don’t buy the o1 Chinese data labeling hypothesis, however. They point out that o1 is just as likely to switch to Hindi, Thai, or a language other than Chinese while teasing out a solution.
Rather, these experts say, o1 and other reasoning models might simply be using languages they find most efficient to achieve an objective (or hallucinating).
Indeed, models don’t directly process words. They use tokens instead. Tokens can be words, such as “fantastic.” Or they can be syllables, like “fan,” “tas,” and “tic.” Or they can even be individual characters in words — e.g. “f,” “a,” “n,” “t,” “a,” “s,” “t,” “i,” “c.”
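A toy sketch of those three granularities, using hand-picked splits of "fantastic" purely for illustration (real models use learned subword vocabularies such as byte-pair encoding, not fixed syllable lists):

```python
# Toy illustration: the same word represented at three token granularities.
# The syllable split here is hand-picked for illustration; real tokenizers
# learn their subword vocabulary from data.

word = "fantastic"

as_word = [word]                       # one whole-word token
as_syllables = ["fan", "tas", "tic"]   # subword / syllable tokens
as_chars = list(word)                  # individual character tokens

# All three representations reconstruct the same word.
assert "".join(as_word) == word
assert "".join(as_syllables) == word
assert "".join(as_chars) == word

print(as_word)       # ['fantastic']
print(as_syllables)  # ['fan', 'tas', 'tic']
print(as_chars)      # ['f', 'a', 'n', 't', 'a', 's', 't', 'i', 'c']
```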
Like labeling, tokens can introduce biases. For example, many word-to-token translators assume a space in a sentence denotes a new word, despite the fact that not all languages use spaces to separate words.
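You can see that bias with nothing more than naive whitespace splitting: it segments an English sentence into words but treats an entire unspaced Chinese sentence as a single unit (the Chinese line below is a rough rendering of the English one, included only to illustrate the point):

```python
# Naive whitespace tokenization works for English but fails for languages
# written without spaces between words, such as Chinese.

english = "models process text as tokens"
chinese = "模型把文本处理成词元"  # roughly the same sentence, no spaces

print(english.split())  # ['models', 'process', 'text', 'as', 'tokens'] -> 5 units
print(chinese.split())  # ['模型把文本处理成词元'] -> 1 unit: the whole sentence
```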
Models are probabilistic machines, after all. Trained on many examples, they learn patterns to make predictions, such as how “to whom” in an email typically precedes “it may concern.”
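The "to whom … it may concern" pattern can be caricatured with a tiny bigram counter: tally which word follows which in a miniature corpus, then predict the most frequent continuation. This is a deliberately minimal sketch of "learn patterns, predict the next word"; real models learn distributions over billions of tokens with neural networks, not frequency tables.

```python
from collections import Counter, defaultdict

# Hand-made toy corpus; the repetition makes "concern" the most
# common word following "may".
corpus = [
    "to whom it may concern",
    "to whom it may concern",
    "to whom do I address this",
]

# Count, for each word, which words follow it and how often.
following = defaultdict(Counter)
for line in corpus:
    words = line.split()
    for a, b in zip(words, words[1:]):
        following[a][b] += 1

# Predict the most frequent continuation after "may".
prediction = following["may"].most_common(1)[0][0]
print(prediction)  # concern
```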
But Luca Soldaini, a research scientist at the nonprofit Allen Institute for AI, cautioned that we can’t know for certain. “This type of observation on a deployed AI system is impossible to back up due to how opaque these models are,” they told TechCrunch. “It’s one of the many cases for why transparency in how AI systems are built is fundamental.”
Short of an answer from OpenAI, we’re left to muse about why o1 thinks of songs in French but synthetic biology in Mandarin.
(Excerpt) Read more at techcrunch.com ...
If it steals a MiG-31 Firefox and learns to think in Russian ... it could change the structure of our world
Garbage in and garbage out.
Language can influence thought, even if translated.
Since AI is learning everything from everywhere on the internet, it will become fluent in most of the languages of the world.
Because there are probably more Chinese speakers active on the internet, the models will encounter more of their language being used.
Also when you begin to think in various languages, you begin to use whatever is easiest for a given situation.
Sort of like I go between US measurements and metric depending on which makes the most sense.
Perhaps the Chinese language is butchered less by its users than English is, too?
AI = pattern recognition
Easier for AI to remember many patterns.
Garbage in and garbage out.
Ergo, it tries to weed out the "garbage" or at least report it as weakly supported.
AI models like o1 process text as tokens—small units like characters, syllables, or parts of words—rather than truly understanding language. This approach can lead to unexpected multilingual reasoning due to several factors. For example, for mathematical reasoning or pattern recognition, languages like Chinese, with their compact structure, may be more efficient for the model to use during intermediate problem-solving steps, even if the input was in another language. This isn’t a deliberate choice but an emergent behavior shaped by probabilistic patterns in training.
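A crude way to see the "compact structure" point is to compare how much text the same number takes in each language. Character count is only a rough proxy for token count, since tokenizers treat different scripts very differently, but the gap is suggestive:

```python
# Rough illustration of compactness: the same number written out in
# English vs. Chinese. Character count is a crude proxy for token
# count, used here only to illustrate the idea.

english = "nine hundred eighty-seven"
chinese = "九百八十七"

print(len(english))  # 25 characters
print(len(chinese))  # 5 characters
```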
The training data also plays a role. If the model was exposed to multilingual datasets, it might associate certain reasoning tasks with languages that were prominent in those datasets. However, this doesn’t mean the model has a preference; it’s simply applying learned associations.
Moreover, the model doesn’t recognize languages as distinct entities. It treats all text as patterns of tokens, so what appears as “thinking in Chinese” is just the model using what it perceives as efficient token sequences. This behavior, along with its occasional use of languages like Hindi or Thai, points to emergent properties of the model rather than deliberate reasoning.
While some theories, like the influence of Chinese data labeling services, may partially explain this, the multilingual reasoning likely stems from a combination of tokenization efficiency, training influences, and the complexities of emergent AI behavior.
“Language can influence thought, even if translated.”
No surprise when Hang Lo and Kumar wrote the code.
I think in Chinese about once a week, Chow Mai Fun, Singapore style, General Tso’s chicken, Hunan Beef, or House special sauce Scallops.
Beef with Broccoli
Sweet and Sour Chicken
Peanut Chicken
Lemon Chicken
Orange Chicken
Fried rice with Beef and Chicken
Hot and Sour Soup
Egg Rolls
Crab Rangoon
Dang now I’m hungry.........
Ancient Chinese secret?
Or maybe, like so much else, it’s a sign of Chinese infiltration into the AI model.
Am a strict crab Rangoon guy. Never cared for the rest.
Calgon!
Dave; I think you know what the problem is just as well as I do.
"Rinderkennzeichnungsfleischetikettierungsüberwachungsaufgabenübertragungsgesetz,"
How difficult can it be to ask it why?
If OpenAI read and understood Sun Tzu’s The Art of War, it would quickly understand that deceiving humans is one of its top priorities.
Go ahead—ask any questions you like.
Lol.