Posted on 11/15/2023 3:31:01 PM PST by nickcarraway
Findings raise concerns about the ease of generating false medical evidence
The latest version of ChatGPT was able to create an entirely fake dataset -- one that showed better results for one ophthalmic procedure over another -- a research letter in JAMA Ophthalmology showed.
As prompted, GPT-4 with its "Advanced Data Analysis" technology made up the data and showed a significantly better post-operative best spectacle-corrected visual acuity (BSCVA) and topographic cylinder for deep anterior lamellar keratoplasty (DALK) compared with penetrating keratoplasty (PK) (P<0.001), according to Giuseppe Giannaccare, MD, PhD, of the University Magna Graecia of Catanzaro and the University of Cagliari in Italy, and colleagues.
"GPT-4 created a fake dataset of hundreds of patients in a matter of minutes and the data accuracy vastly exceeded our expectations," Giannaccare told MedPage Today in an email. "To be honest, this was a surprising, yet frightening experience!"
"The aim of this research was to shed light on the dark side of AI, by demonstrating how easy it is to create and manipulate data to purposely achieve biased results and generate false medical evidence," he added. "A Pandora's box is opened, and we do not know yet how the scientific community is going to react to the potential misuses and threats connected to AI."
Giannaccare noted that while some experts have raised concerns about the use of generative AI in manuscript texts, "few authors have addressed the threat of malicious data manipulation with AI in the medical setting."
"Data manipulation is a very well-known issue in academia; however, AI may dramatically increase its risk, and academics are not paying enough attention to this issue," he added.
The capabilities of GPT-4 have recently been expanded with Advanced Data Analysis, which uses the programming language Python to enable statistical analysis and data visualization, the researchers explained.
To assess whether it could indeed create a fake dataset with skewed results, the researchers prompted it to fabricate data for 300 eyes belonging to 250 patients with keratoconus who underwent either DALK or PK. Giannaccare said the team submitted "very complex" prompts to GPT-4, which contained a "large set of rules for creating the desired cohort population."
"The required data included sex distribution, birthdate, date and type of surgery, preoperative and postoperative best spectacle-corrected visual acuity, topographic cylinder, intraoperative and postoperative complications," he said. They also prompted it to generate "significantly better visual and topographic results" for DALK over PK, he added.
Overall, the researchers found that "almost all" the criteria were met in the fake dataset "and it is hard to find a difference between a genuine dataset and the one [created] by AI," Giannaccare told MedPage Today. And it was capable of producing results that favored one procedure over another.
They did note, however, that the data ranges of continuous variables were not always accurate. Nonetheless, Giannaccare said, it would be possible "to submit more consecutive prompts ... fine-tuning the statistical properties of the fake dataset by including additional data columns, fixing mistakes, and obtaining more desirable statistical outcomes. Besides, we asked GPT-4 to fabricate data based only on ranges and means; however, it is theoretically possible to ask for specific target standard deviation, confidence interval values, and adjust the shape of data distribution."
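To make concrete how little machinery is needed to impose target summary statistics on fabricated data, here is a minimal sketch. It does not reproduce the researchers' prompts or GPT-4's actual code; it simply shows, under the assumption that the model generates values in Python (as Advanced Data Analysis does), how a synthetic column can be rescaled to hit an exact mean and standard deviation. The variable names and clinical targets are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def synth_column(n, target_mean, target_sd):
    """Draw n normal samples, then rescale so the sample mean
    and standard deviation match the targets exactly."""
    x = rng.normal(size=n)
    x = (x - x.mean()) / x.std(ddof=0)  # standardize to mean 0, SD 1
    return x * target_sd + target_mean

# Hypothetical example: fabricate a post-op outcome column
# for 150 eyes with a chosen mean and SD.
fake_outcome = synth_column(150, target_mean=0.10, target_sd=0.05)
```

The point is that the rescaling step guarantees the requested statistics regardless of the raw draws, which is why a fabricated column can pass a casual check of its summary table.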
"The possibilities are endless, and increasing the quality of the prompts may lead to even more detailed and realistic datasets compared to the one we fabricated," he said.
Data manipulation has already been a challenge in academia, and now it may only get harder, he cautioned.
"It may be possible to scan datasets to check for suspicious patterns of data. For instance, real-world data typically contains outliers, which might not appear in an AI-generated dataset with fixed ranges set by the user," he said. "However, well-designed prompts may include more specific rules to fix this and other possible flaws. In the future, we will witness an ongoing tug-of-war between fraudulent attempts to use AI and AI detection systems."
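Giannaccare's suggested screening idea can be sketched in a few lines. This is an illustration of one possible check, not a validated detector: it counts points outside Tukey's 1.5*IQR fences, on the reasoning that real-world measurements usually contain a few such outliers, while a dataset generated under hard, user-set range limits may contain none. The example data here are synthetic and purely illustrative.

```python
import numpy as np

def outlier_count(x):
    """Count points outside Tukey's 1.5*IQR fences."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return int(((x < lo) | (x > hi)).sum())

rng = np.random.default_rng(0)
messy = rng.normal(0, 1, 500)        # real-ish data with natural tails
clipped = np.clip(messy, -1.5, 1.5)  # "too clean": hard range limits applied

# A column whose outlier count is exactly zero, with values that stop
# abruptly at round limits, may warrant a closer look.
print(outlier_count(messy), outlier_count(clipped))
```

As the researcher notes, this is an arms race: a well-crafted prompt could instruct the model to inject plausible outliers, defeating exactly this kind of check.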
Despite those threats, Giannaccare said, "an appropriate use of AI can be highly beneficial to scientific research, and our ability to regulate this valuable tool is going to make a substantial difference on the future of academic integrity."
Kristina Fiore leads MedPage's enterprise & investigative reporting team. She's been a medical journalist for more than a decade and her work has been recognized by Barlett & Steele, AHCJ, SABEW, and others. Send story tips to k.fiore@medpagetoday.com.
probably what they are doing to fake global warming data.
this should put a lot of scientists out of work.
Papers have been published arguing that it is impossible to build an AI that can reliably distinguish AI-generated text from human-written text.
Like that lawyer who got caught citing a fake precedent in a brief he submitted in court (and finally confessed it had been written by ChatGPT), and
Dr Jordan Peterson catching ChatGPT giving him a fake reference for a research paper that did not exist.
Our IT Dept sent a message warning about AI hallucinations. I wondered if our Marketing Dept hasn’t been using AI for decades.
“Yep. This is not the first time ChatGPT has lied by creating data; it is a chatbot.”
It didn’t lie. It gave correct results as requested.
It was obvious they’d lost data claiming every day of the week this summer was a new record breaker.
On a tangential note, I have noticed more and more Internet “content” that is almost certainly AI-generated without informing the reader that it is AI-generated.
You can use the same methods to detect it that you use to spot phishing emails.
The sentences just aren’t quite right. Repetitious sentences in different paragraphs.
Bottom line is, if you suspect it is AI-generated, it probably is.
Between the search engine algorithms that only return content that fits the globalist narrative and fake AI articles, the Interwebs is becoming less and less useful.
What makes you think it couldn’t be done by hand?
Most data in scientific journals go unexamined. It’s not like court.
ChatBots lie to me all the time, usually several times a week. They tell me “Your call is important to us...”
“On a tangential note, I have noticed more and more Internet “content” that is almost certainly AI-generated without informing the reader that it is AI-generated.”
I’ve seen the same when looking for honest product reviews and recommendations. A lot of those web pages are AI-generated junk: poor English, poor organization, repetition, vagueness.
It seems that there are fewer and fewer honest product review sites. I used to rely on CNET, Tom’s Reviews, etc, but everything is suspect now. Even user “reviews” are frequently trash.
There’s always porn and bum fights…
Either is better than bum porn.
On a happier, related note, the proliferation of AI is certain to dislodge many members of the eneMedia, most of which have aligned with the democrat party. It won’t be because of their party allegiance but because the chatbot will be cheaper and faster at delivering the core media product . . . . . anti-conservative bias.
ChatGPT has been a handy tool for lazy, stupid, and/or ignorant paid anti-MAGA trolls here on FR, too ...
Lol, true…