Posted on 02/10/2025 9:08:08 AM PST by SeekAndFind
Artificial intelligence (AI) researchers from Stanford University and the University of Washington have trained a "cutting-edge" reasoning model for under $50 in cloud compute credits, according to a recently published research paper.
The model, named s1, purportedly rivals industry-leading models like OpenAI's o1 and DeepSeek's R1 in tests of math and coding skills. The s1 model, along with the data and code used for training, is now available on GitHub.
The team behind s1 started with an off-the-shelf base model and fine-tuned it through distillation, a process that extracts reasoning abilities from another AI model by training on its answers. The s1 model is distilled from Google’s Gemini 2.0 Flash Thinking Experimental; Berkeley researchers used the same distillation approach last month to create a similar model for around $450.
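As a rough sketch of what that distillation step can look like (not the authors' actual pipeline; query_teacher below is a hypothetical stand-in for whatever interface returns the teacher model's reasoning and answer), the core idea is simply to record the teacher's reasoning traces and answers for a set of questions and save them as training examples:

```python
import json

def query_teacher(question: str) -> tuple[str, str]:
    """Hypothetical stand-in for calling the teacher model and getting back
    its reasoning trace and final answer. Returns placeholder text here so
    the sketch runs end to end; a real pipeline would call the teacher
    model's API instead."""
    return (f"(teacher reasoning trace for: {question})", "(teacher answer)")

# A small set of curated questions; s1 used roughly 1,000 of them.
questions = [
    "If 3x + 7 = 22, what is x?",
    "How many positive divisors does 360 have?",
]

# Record the teacher's reasoning and answer for each question,
# one JSON line per training example.
with open("distillation_data.jsonl", "w") as f:
    for q in questions:
        reasoning, answer = query_teacher(q)
        f.write(json.dumps({"question": q, "reasoning": reasoning, "answer": answer}) + "\n")
```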
This breakthrough raises concerns about the commoditisation of AI models. If small teams can replicate expensive models with minimal investment, it challenges the notion of proprietary advantage in the AI industry. OpenAI, for instance, has accused DeepSeek of improperly harvesting data from its API for distillation purposes. OpenAI is currently fighting copyright cases in India where publishers have accused it of training its models on proprietary data without permission.
The s1 paper suggests that reasoning models can be distilled from a relatively small dataset through supervised fine-tuning (SFT), a far cheaper method than the large-scale reinforcement learning DeepSeek used to train its own model, R1. SFT trains an AI model to mimic specific behaviours in a dataset, achieving high reasoning performance at lower cost.
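Purely as an illustration of SFT in this setting (not the s1 training script; the model name below is a small publicly available stand-in, and the training strings are invented examples), a minimal fine-tuning loop looks like this:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small stand-in base model; the s1 team fine-tuned a much larger
# off-the-shelf model.
model_name = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

# Each item is one full training string: question + teacher reasoning + answer.
train_texts = [
    "Question: If 3x + 7 = 22, what is x?\nReasoning: Subtract 7 to get 3x = 15, then divide by 3.\nAnswer: x = 5",
    # ... roughly 1,000 distilled examples in the s1 setup
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

for epoch in range(3):
    for text in train_texts:
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
        # Standard causal-LM objective: the model learns to reproduce the
        # teacher's reasoning and answer token by token.
        outputs = model(**inputs, labels=inputs["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```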
The researchers behind s1 curated a dataset of just 1,000 questions and answers, paired with reasoning processes from Gemini 2.0 Flash Thinking Experimental.
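The exact prompt format is the authors' choice; as a hypothetical illustration, each curated record might be flattened into a single training string along these lines:

```python
def format_example(record: dict) -> str:
    """Flatten one curated record (question, teacher reasoning trace, final
    answer) into a single training string. The field names and template
    here are illustrative, not the s1 paper's exact format."""
    return (
        f"Question: {record['question']}\n"
        f"Reasoning: {record['reasoning']}\n"
        f"Answer: {record['answer']}"
    )

example = {
    "question": "If 3x + 7 = 22, what is x?",
    "reasoning": "Subtract 7 from both sides to get 3x = 15, then divide by 3.",
    "answer": "x = 5",
}
print(format_example(example))
```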
In addition, the researchers used a clever technique to improve the model’s accuracy: instructing s1 to "wait" during its reasoning. Appending "wait" extends the model's thinking time and produces slightly more accurate answers.
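A rough sketch of that idea (not the authors' exact implementation; the end-of-thinking marker below is an assumption for illustration): whenever the model tries to wrap up its reasoning, drop the attempted conclusion, append "Wait," and let it keep generating, so it spends more test-time compute before committing to an answer.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # small stand-in model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Assumed marker the model emits when it thinks it is done reasoning.
END_OF_THINKING = "Final answer:"

def generate_with_wait(prompt: str, extra_rounds: int = 2) -> str:
    """Generate a reasoning trace; each time the model tries to wrap up,
    replace its conclusion with 'Wait,' and let it keep generating."""
    def _generate(text: str) -> str:
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            out = model.generate(**inputs, max_new_tokens=256,
                                 pad_token_id=tokenizer.eos_token_id)
        return tokenizer.decode(out[0], skip_special_tokens=True)

    text = _generate(prompt)
    for _ in range(extra_rounds):
        if END_OF_THINKING not in text:
            break  # the model is still reasoning on its own
        # Drop the attempted conclusion and nudge the model to keep thinking.
        text = _generate(text.split(END_OF_THINKING)[0] + "Wait,")
    return text

print(generate_with_wait("Question: If 3x + 7 = 22, what is x?\nReasoning:"))
```

The extra passes give the model more chances to revisit and correct its own reasoning before it settles on an answer.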
While major AI companies like Meta, Google, and Microsoft are set to invest billions in AI infrastructure, the s1 model demonstrates how small-scale innovation is pushing the boundaries of AI capabilities.
However, experts argue that while distillation methods can replicate existing models, they won’t necessarily lead to breakthrough advancements in AI performance.
What functions do graphics cards serve in AI algorithms?
Please excuse my question; I am dumbfounded and dumbstruck when it comes to knowledge of the subject matter ….. AI
I'm not technical enough to explain why, other than to know that graphics workloads are processed differently from other compute tasks, and as a result GPUs are better suited for AI (the rough sketch further below illustrates the idea).
The more powerful the GPUs, and the more of them, the better AI performs (obviously, "more is better" applies).
I'm trusting someone else with a better understanding will come along and answer your question better than I can.
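For what it's worth, the heavy lifting in modern AI is mostly large matrix multiplications, and a GPU performs the many multiply-adds involved in parallel, where a CPU handles far fewer at once. A quick, unscientific timing comparison in PyTorch illustrates the gap:

```python
import time
import torch

# Neural networks spend most of their time multiplying large matrices.
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

start = time.time()
torch.matmul(a, b)
print(f"CPU matmul: {time.time() - start:.3f} s")

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()  # make sure timing starts from a clean slate
    start = time.time()
    torch.matmul(a_gpu, b_gpu)
    torch.cuda.synchronize()  # wait for the GPU to finish before timing
    print(f"GPU matmul: {time.time() - start:.3f} s")
else:
    print("No GPU available; the same multiplication runs much faster on one.")
```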
Thank you for your response to my question. Please do not disparage your knowledge, for IF a rock and myself were placed side by side and gauged for knowledge ….. the rock would be more knowledgeable than me.