Free Republic
Browse · Search
News/Activism
Topics · Post Article

Elon Musk says AI has already gobbled up all human-produced data to train itself and now relies on hallucination-prone synthetic data
https://www.msn.com ^ | 1/11/25 | Story by Sasha Rogelberg

Posted on 01/11/2025 2:04:18 AM PST by RomanSoldier19

AI takes an immense amount of resources—from endless water to an estimated $1 trillion worth of investor dollars—but Elon Musk warned the technology has already run out of its primary training resource: human-created data.

Engineers and data scientists train AI by essentially reducing the entire internet, all books, and every interesting video published into a token that AI can digest and learn from, Musk told Mark Penn, CEO of marketing company Stagwell, in an interview streamed on X Wednesday. But AI has already consumed that information, and requires even more data to fine-tune itself.

(Excerpt) Read more at msn.com ...


TOPICS: News/Current Events
KEYWORDS: ai; alreadyposted; data; musk; over

1 posted on 01/11/2025 2:04:18 AM PST by RomanSoldier19
[ Post Reply | Private Reply | View Replies]

To: RomanSoldier19

But, humans continue to produce more data.


2 posted on 01/11/2025 2:12:41 AM PST by Jyotishi (Seeking the truth, a fact at a time.)
[ Post Reply | Private Reply | To 1 | View Replies]

To: RomanSoldier19

Meanwhile, back in the real world ya gotta bust a grape.


3 posted on 01/11/2025 2:18:42 AM PST by HighSierra5 (The only way you know a commie is lying is when they open their pieholes.)
[ Post Reply | Private Reply | To 1 | View Replies]

To: RomanSoldier19

The whistle blower said it’s using massive amounts of stolen and copyrighted data and the mother of all lawsuits is coming. Then they suicided him.


4 posted on 01/11/2025 2:25:26 AM PST by DesertRhino (2016 Star Wars, 2020 The Empire Strikes Back, 2024... RETURN OF THE JEDI..)
[ Post Reply | Private Reply | To 1 | View Replies]

To: RomanSoldier19
I don't buy that. Not for a second do I believe the sum of mankind's knowledge has ever been scanned and parsed. I don't care what size 'language model' is used. I have to ax the Boss Computer just what Ulysses is really about. Then we could move on to deciphering the utterances of Maya Angelou.
5 posted on 01/11/2025 2:29:46 AM PST by ComputerGuy (Heavily-medicated for your protection)
[ Post Reply | Private Reply | To 1 | View Replies]

To: RomanSoldier19

ya... we can tell by the dominance of fake pictures all over the net!


6 posted on 01/11/2025 2:43:46 AM PST by sit-rep
[ Post Reply | Private Reply | To 1 | View Replies]

To: RomanSoldier19

Wait until AI gets indigestion from all the crap human databases it's swallowed.


7 posted on 01/11/2025 2:50:55 AM PST by mo ("If you understand, no explanation is needed; if you don't understand, no explanation is possible)
[ Post Reply | Private Reply | To 1 | View Replies]

To: ComputerGuy

Agreed. AI may have gobbled up all accessible digitized written data (which I doubt), but it is nowhere near a state of extracting something stable and trustworthy from it. And it is nowhere near, if it ever will be, having gobbled up audio and video data.
Musk is telling a load of bollocks for his salesman pitch.


8 posted on 01/11/2025 3:00:25 AM PST by miniTAX
[ Post Reply | Private Reply | To 5 | View Replies]

To: DesertRhino

He should have pulled the plug and wiped it all clean.


9 posted on 01/11/2025 3:15:50 AM PST by FES0844
[ Post Reply | Private Reply | To 4 | View Replies]

To: ComputerGuy

The 9-5 will be dead in 10 years


10 posted on 01/11/2025 3:23:06 AM PST by RandFan
[ Post Reply | Private Reply | To 5 | View Replies]

To: RomanSoldier19
I asked ChatGPT for the following:

The percentage of synthetic data used to train AI platforms compared to real data varies widely depending on the domain, use case, and the availability or sensitivity of real-world data. Here are some realistic ranges and factors influencing synthetic data usage:

Realistic Percentage Ranges

  1. Domains Where Synthetic Data Is Commonly Used
    • Healthcare and Life Sciences: 50–90% synthetic data
      Real data in these fields is highly sensitive and subject to strict privacy regulations (e.g., HIPAA). Synthetic data is often used to augment datasets for training models.
    • Autonomous Vehicles: 60–80% synthetic data
      Simulated environments are extensively used to generate training data for self-driving cars, as capturing real-world scenarios (e.g., rare accidents) is costly and time-consuming.
    • Finance and Banking: 30–70% synthetic data
      Synthetic data helps create secure, privacy-preserving datasets for fraud detection, customer behavior modeling, and compliance tasks.
    • Retail and E-commerce: 20–50% synthetic data
      Synthetic data can supplement real customer behavior data to simulate specific purchase patterns or scenarios.
  2. General Machine Learning Applications
    • Data-Rich Domains: 5–30% synthetic data
      Domains like natural language processing (NLP) or large-scale image classification often rely primarily on real-world data. Synthetic data is used for data augmentation, filling gaps, or addressing specific biases.
    • Data-Poor or Restricted Domains: 40–100% synthetic data
      In scenarios where real-world data is sparse, expensive, or inaccessible (e.g., rare disease diagnosis or extreme weather forecasting), synthetic data plays a dominant role.
  3. Emerging AI Applications
    In cutting-edge fields or novel applications, the reliance on synthetic data can be as high as 70–100%, especially during the early stages of development.

Key Factors Influencing Synthetic vs. Real Data Usage

  1. Data Availability
    • Abundant Data: In areas like social media or consumer technology, real-world data is plentiful, so synthetic data is supplementary (~10–20%).
    • Scarce Data: In specialized or emerging fields, synthetic data becomes the primary source (~60–100%).
  2. Data Privacy and Regulations
    Industries with stringent privacy regulations (e.g., healthcare, finance) rely more heavily on synthetic data to ensure compliance.
  3. Cost of Data Collection
    When collecting real data is expensive or impractical (e.g., autonomous vehicles, satellite imagery), synthetic data plays a larger role.
  4. Task Complexity
    For tasks requiring edge cases or rare events (e.g., fraud detection, crash scenarios), synthetic data fills the gaps that real data can't easily provide.
  5. Bias and Diversity Needs
    Synthetic data is often used to mitigate bias in datasets or ensure representation of underrepresented groups (~20–50% synthetic data in such cases).

Current Trends

Recent studies and reports suggest that synthetic data accounts for 20–50% of the training data in many AI applications today, with the percentage expected to grow as synthetic data generation tools improve and privacy concerns escalate.

In summary, while synthetic data use ranges from 5% to 100%, most industries fall in the 20–80% range, depending on the context and requirements.


11 posted on 01/11/2025 3:25:20 AM PST by RoosterRedux (Emerson paraphrased, "If you strike at the king, don't fail." The Democrats failed. )
[ Post Reply | Private Reply | To 1 | View Replies]

To: RomanSoldier19

So what is “synthetic data” and where does it come from?

Do the various AI’s make it for each other? This is weird as hell.

If they have indeed consumed the internet, which I doubt (note he said “interesting” videos, for example), then why do they need more “refinement”?

All very confusing.


12 posted on 01/11/2025 3:27:07 AM PST by Adder (End fascism...defeat all Democrats.)
[ Post Reply | Private Reply | To 1 | View Replies]

To: Adder

“Do the various AI’s make it for each other?”

The article states that information produced by AI can be false. That is because AI really has no mind and can’t perceive that it is false.


13 posted on 01/11/2025 3:48:07 AM PST by odawg
[ Post Reply | Private Reply | To 12 | View Replies]

To: odawg
That is because AI really has no mind and can’t perceive that it is false.

You mean it's a Democrat?

14 posted on 01/11/2025 3:51:32 AM PST by Sirius Lee ("Never argue with a fool, onlookers may not be able to tell the difference.")
[ Post Reply | Private Reply | To 13 | View Replies]

To: Adder
Synthetic data is artificially created data that mimics real-world data, used for testing, training, or analysis when real data is limited, sensitive, or unavailable.

Here are some examples:
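
As one minimal illustration of the definition above, the Python sketch below fits a normal distribution to a small set of "real" values and then samples new, artificial values from it. This is only a sketch under stated assumptions: the function name `make_synthetic`, the height numbers, and the choice of a normal distribution are all hypothetical and made up for illustration. Real synthetic-data tools are far more elaborate.

```python
import random
import statistics


def make_synthetic(real_values, n, seed=0):
    """Draw n synthetic values from a normal distribution fitted to real_values.

    Assumption: the real data is roughly bell-shaped, so a mean and a
    standard deviation are enough to mimic it.
    """
    mu = statistics.mean(real_values)
    sigma = statistics.stdev(real_values)
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Hypothetical "real" measurements (made-up numbers, not from any dataset).
real_heights_cm = [162.0, 175.5, 168.2, 181.0, 170.3, 159.8, 177.1, 165.4]

# The synthetic values resemble the real ones statistically,
# but none of them was ever actually measured.
synthetic_heights_cm = make_synthetic(real_heights_cm, n=1000)

print("real mean:     ", round(statistics.mean(real_heights_cm), 1))
print("synthetic mean:", round(statistics.mean(synthetic_heights_cm), 1))
```

The synthetic values can only be as good as the model that produced them: anything the fitted distribution gets wrong about the real data is copied into every sample drawn from it.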


15 posted on 01/11/2025 4:03:21 AM PST by RoosterRedux (Emerson paraphrased, "If you strike at the king, don't fail." The Democrats failed. )
[ Post Reply | Private Reply | To 12 | View Replies]

To: RoosterRedux

But AI has already consumed that information, and requires even more data to fine-tune itself.

Hal, open the door please. Hal? Hal?

I'm sorry, I'm afraid I can't do that.


16 posted on 01/11/2025 4:22:30 AM PST by ronnie raygun
[ Post Reply | Private Reply | To 15 | View Replies]

To: RomanSoldier19

99% of the internet is BS. Can you imagine a synthetic “mind” that is basically full of political propaganda and porn? I’ve used Chatbot and Grok and I’m not real impressed. It’s like dealing with a very slow and dull 3 year old child.


17 posted on 01/11/2025 4:25:51 AM PST by Strict9
[ Post Reply | Private Reply | To 1 | View Replies]


18 posted on 01/11/2025 4:37:15 AM PST by SunkenCiv (Putin should skip ahead to where he kills himself in the bunker.)
[ Post Reply | Private Reply | View Replies]

To: RoosterRedux

Ok...but it's not real data...just simulations, inventions and flat-out made-up stuff based on what? Models?

Seems to me it would be like making decisions based on Grimm’s fairy tales.

Garbage in/garbage out still is operative, it seems to me.


19 posted on 01/11/2025 4:39:22 AM PST by Adder (End fascism...defeat all Democrats.)
[ Post Reply | Private Reply | To 15 | View Replies]

To: RomanSoldier19

Gotta think AI is just as prone to “mistakes/prejudices” as those who do the programming.....


20 posted on 01/11/2025 4:51:29 AM PST by trebb (So many fools - so little time...)
[ Post Reply | Private Reply | To 1 | View Replies]



Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.

FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson