Posted on 02/10/2025 9:08:08 AM PST by SeekAndFind
Artificial intelligence (AI) researchers from Stanford and the University of Washington have trained a "cutting-edge" reasoning AI model for under $50 in cloud compute credits, according to a recently published research paper.
The model, named s1, purportedly rivals industry-leading models like OpenAI's o1 and DeepSeek's R1 in tests of math and coding skills. The s1 model, along with the data and code used for training, is now available on GitHub.
The team behind s1 started with an off-the-shelf base model and fine-tuned it through distillation, a process that extracts reasoning abilities from another AI model by training on its answers. s1 was distilled from Google’s Gemini 2.0 Flash Thinking Experimental; Berkeley researchers used the same approach last month to create a similar model for around $450.
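In rough terms, distillation here means collecting a teacher model's worked answers and reusing them as supervised training data for a smaller student. The sketch below illustrates only the data-collection step, under heavy assumptions: query_teacher, the example question, and the JSONL layout are hypothetical stand-ins, not the actual s1 pipeline (which queried Gemini 2.0 Flash Thinking Experimental).

```python
# Minimal sketch of distillation data collection: ask a stronger "teacher"
# model for its reasoning and answer, then save the pairs as training data
# for the smaller "student" model.
# NOTE: query_teacher() is a hypothetical placeholder, not the Gemini API.
import json

def query_teacher(question: str) -> dict:
    """Hypothetical call to a teacher model; returns its reasoning and answer."""
    raise NotImplementedError("Wire this up to the teacher model's API.")

questions = [
    "If 3x + 7 = 22, what is x?",   # illustrative example question
    # ... roughly 1,000 curated questions in the actual s1 dataset
]

with open("distillation_data.jsonl", "w") as f:
    for q in questions:
        out = query_teacher(q)  # e.g. {"reasoning": "...", "answer": "..."}
        f.write(json.dumps({"question": q,
                            "reasoning": out["reasoning"],
                            "answer": out["answer"]}) + "\n")
```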
This breakthrough raises concerns about the commoditisation of AI models. If small teams can replicate expensive models with minimal investment, it challenges the notion of proprietary advantage in the AI industry. OpenAI, for instance, has accused DeepSeek of improperly harvesting data from its API for distillation purposes. OpenAI is currently fighting copyright cases in India where publishers have accused it of training its models on proprietary data without permission.
The s1 paper suggests that reasoning models can be distilled using a relatively small dataset through supervised fine-tuning (SFT), a more cost-effective method compared to large-scale reinforcement learning, which DeepSeek used to train its own model, R1. SFT allows AI models to mimic specific behaviours in a dataset, achieving high reasoning performance with lower costs.
The researchers behind s1 curated a dataset of just 1,000 questions and answers, paired with reasoning processes from Gemini 2.0 Flash Thinking Experimental.
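To give a concrete feel for what supervised fine-tuning on such a small dataset looks like, here is a minimal sketch using Hugging Face transformers. The base model name, file name, prompt format, and hyperparameters are placeholders chosen for illustration; the actual s1 training recipe differs (and ran on rented H100s).

```python
# Minimal sketch of supervised fine-tuning (SFT): the student simply learns
# to reproduce the teacher's reasoning and answer for each curated question.
# Model name, file name, and hyperparameters are illustrative assumptions.
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"  # assumption: any small causal LM works for the sketch
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
with open("distillation_data.jsonl") as f:
    for line in f:
        ex = json.loads(line)
        text = (f"Question: {ex['question']}\n"
                f"Reasoning: {ex['reasoning']}\n"
                f"Answer: {ex['answer']}")
        batch = tok(text, return_tensors="pt", truncation=True, max_length=2048)
        # Standard causal-LM objective: labels are the input tokens themselves.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.save_pretrained("s1-style-student")
tok.save_pretrained("s1-style-student")
```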
In addition, the researchers used a clever technique to improve the model’s accuracy: instructing s1 to "wait" during its reasoning. This way, it was able to extend its thinking time and produce slightly more accurate answers.
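Here is a minimal sketch of how such a "wait" nudge could be wired around a Hugging Face causal LM: if the model wraps up its reasoning too quickly, cut off the ending and append "Wait" so it keeps thinking. The end-of-reasoning delimiter, token budget, and model name are assumptions; the exact s1 mechanism may differ.

```python
# Minimal sketch of the "wait" trick: force the model to keep reasoning by
# appending "Wait" whenever it tries to stop too early.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "s1-style-student"     # assumption: the fine-tuned student from the sketch above
END_OF_THINKING = "Answer:"         # assumption: whatever delimiter marks the end of reasoning
MIN_THINKING_TOKENS = 512           # assumption: force at least this much reasoning

tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def generate_with_wait(question: str, max_waits: int = 4) -> str:
    prompt = f"Question: {question}\nReasoning:"
    text = prompt
    for _ in range(max_waits + 1):
        ids = tok(prompt, return_tensors="pt").input_ids
        out = model.generate(ids, max_new_tokens=1024, do_sample=False)
        text = tok.decode(out[0], skip_special_tokens=True)
        new_tokens = out.shape[1] - ids.shape[1]
        # If the model wrapped up too soon, strip its ending and nudge it
        # to keep reasoning on the next pass.
        if END_OF_THINKING in text and new_tokens < MIN_THINKING_TOKENS:
            prompt = text.split(END_OF_THINKING)[0].rstrip() + " Wait,"
        else:
            return text
    return text
```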
While major AI companies like Meta, Google, and Microsoft are set to invest billions in AI infrastructure, the s1 model demonstrates how small-scale innovation is pushing the boundaries of AI capabilities.
However, experts argue that while distillation methods can replicate existing models, they won’t necessarily lead to breakthrough advancements in AI performance.
All that concern two weeks ago about China’s cheaper AI models, and this news has had zero impact on the markets today... Nvidia is up 3%.
RE: Nvidia is up 3%
From the article — the model STILL needed dozens of Nvidia H100 chips. Regardless of what you do, the need for fast GPUs still exists.
Wow, all of that makes my head spin!
The most important question is “Can s1, o1, R1 Gemini 2.0 Flash Thinking Experimental help Musk expose how the Deep State Swamp is stealing us blind?”
The coolest commercial on the Super Bowl yesterday was the ChatGPT one, by a country mile. It was SO innovative it had me reeling that nobody had thought of that clever technique before.
Review
NIIIICCCCEEEEE............
Ping me to be added to the ᎪᎡᎢᏆᎱᏆᏟᏆᎪᏞ ᏆᏁᎢᎬᏞᏞᏆᏀᎬᏁᏟᎬ ᏢᏆᏁᏀ ᏞᏆᏚᎢ
Unrestricted AI?
Sounds like my ex-wife.
A lot more than $50 though...
If I can successfully stand up an AI that I personally own, perhaps I can create images with political figures, which is presently something forbidden in Dall-E and the like.
“From the article — the model STILL needed chips. Regardless of what you do, the need for fast GPUs still exists.”
This AI project bought (rented) compute time on those “dozens of Nvidia H100 chips” in the cloud and used that time to train its model.
How they did AI on the über-cheap.
For me, a ‘snarky’ AI to respond to silly people online.
.
I want one. I also want a 10 kW-ish nuclear reactor to power my home. Please ping me when those become available. Surplus would be okay too. I am not a snob.
“The team behind s1 started with an off-the-shelf base model and fine-tuned it through distillation, a process that extracts reasoning abilities from another AI model by training on its answers.”
I imagine that some time in the near future, they’ll make this practice illegal. Otherwise, big corporations won’t be able to generate big bucks from their AI research.
Dude, it's a very small dataset focused on ONE set of facts, designed for specific queries.
If the commie Chinese taught us anything with DeepSeek, it's this:
LLMs aren't the way. Focused, well-trained, smaller, distributed, FOCUSED AI engines are far faster, will scale out better, be more efficient, and deliver better, more consistent results.
The above statement comes with a large number of assumptions, the biggest being this: the smaller and more focused an AI becomes, the more dependent it is on UNBIASED information to be properly trained and provide reliable results.
This is where DeepSeek FAILED: heavily biased algorithms with government-approved data to train them. (I knew THIS point specifically as soon as I started reading up on DeepSeek and how the Chinese did it.)
BTW, I've told you this before: it's relatively easy to run a small AI engine at home with a proper graphics card (I believe I recommended one to you when you were building your PC). The instructions on running a small Docker-packaged AI engine are out there. I do it on my Ubuntu Linux server at home.
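For anyone curious, here is a minimal sketch of chatting with such a locally hosted model, assuming the Docker-packaged engine is something like Ollama listening on its default port. The endpoint, model name, and response layout are assumptions about that setup, not necessarily how the poster above runs theirs.

```python
# Minimal sketch: send a chat request to a locally running Ollama server
# (its default port is 11434) and print the model's reply.
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"   # Ollama's default local endpoint
MODEL = "llama3.2"                               # assumption: any small local model you have pulled

def ask(prompt: str) -> str:
    resp = requests.post(OLLAMA_URL, json={
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    })
    resp.raise_for_status()
    return resp.json()["message"]["content"]

print(ask("Explain model distillation in two sentences."))
```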
Noland Baugh
.
Elon Musk .
.
NEURALINK
.
Capt. Pike... Trekkie stuff
Noland ARBAUGH.
Quadriplegic in Yuma, AZ
Gets implant
Fascinating story.
Was her name Stella?😏
Lol, no. But maybe?