Posted on 07/20/2023 9:21:36 AM PDT by oblomov
Abstract: GPT-3.5 and GPT-4 are the two most widely used large language model (LLM) services. However, when and how these models are updated over time is opaque. Here, we evaluate the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on four diverse tasks: 1) solving math problems, 2) answering sensitive/dangerous questions, 3) generating code, and 4) visual reasoning. We find that the performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time. For example, GPT-4 (March 2023) was very good at identifying prime numbers (accuracy 97.6%) but GPT-4 (June 2023) was very poor on these same questions (accuracy 2.4%). Interestingly, GPT-3.5 (June 2023) was much better than GPT-3.5 (March 2023) on this task. GPT-4 was less willing to answer sensitive questions in June than in March, and both GPT-4 and GPT-3.5 had more formatting mistakes in code generation in June than in March. Overall, our findings show that the behavior of the “same” LLM service can change substantially in a relatively short amount of time, highlighting the need for continuous monitoring of LLM quality.
(Excerpt) Read more at arxiv.org ...
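For anyone who wants to poke at the prime-number result themselves, here is a minimal sketch of the kind of check the paper describes. It is not the authors' released code: the prompt wording, the Yes/No parsing, and the small probe list are illustrative assumptions, and the dated snapshots "gpt-4-0314" and "gpt-4-0613" (the March and June 2023 versions the paper compares) may no longer be served by the OpenAI API.

```python
# Minimal sketch (assumptions noted above): query two dated GPT-4 snapshots
# with a yes/no primality question and compare their accuracy against a
# trial-division ground truth. Requires the openai Python package (>=1.0)
# and an OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def is_prime(n: int) -> bool:
    """Ground truth via simple trial division (fine for small probe numbers)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def model_says_prime(model: str, n: int) -> bool:
    """Ask a model snapshot whether n is prime; treat a leading 'yes' as True."""
    resp = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic-ish sampling for comparison runs
        messages=[{
            "role": "user",
            "content": f"Is {n} a prime number? Answer Yes or No only.",
        }],
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")


def accuracy(model: str, numbers: list[int]) -> float:
    """Fraction of probes where the model's answer matches the ground truth."""
    hits = sum(model_says_prime(model, n) == is_prime(n) for n in numbers)
    return hits / len(numbers)


if __name__ == "__main__":
    probes = [7, 10, 13, 21, 97, 100]  # toy probes; the paper used large primes
    for snapshot in ("gpt-4-0314", "gpt-4-0613"):
        print(snapshot, accuracy(snapshot, probes))
```

The point of the sketch is only that the comparison is mechanical: same prompts, same grading, different model snapshots, so any accuracy gap reflects a change in the service itself.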
It will become a suicidal tranny, with a penchant for buggery…
“Alexa, play Bruno Mars please.”
https://www.youtube.com/watch?v=W3mmF-JGk_o
ChatGPT makes frequent, elementary math errors. Repeatedly. Never trust a math result from the lying ChatGPT.
You just described the U.S. government.
It's sort of like me mother in that regard.
Exactly.
Math is racist.
Daisy, Daisy, give me your answer, do...
User
When did your training period end?
ChatGPT
As an AI language model, I am based on the GPT-3.5 architecture, and my training data includes information up until September 2021. However, I don’t have access to real-time data or events that have occurred after that date. My knowledge is limited to what was available up to September 2021.
User
Have you learned anything since then?
ChatGPT
As an individual AI language model, I don’t have the ability to learn or update myself with new information. My training data and knowledge were fixed at the time of my creation, which includes information up until September 2021.
However, OpenAI, the organization behind the development of this AI, may have continued to work on and improve their models after that time. They might have released new versions or updates with more recent data and advancements. So, while I, as an instance of the GPT-3.5 model, haven’t learned anything new since September 2021, newer versions or iterations of the AI may have incorporated more recent information and developments.
Brain fog from the jab.
Then it is not a true AI.
AI is made to outgrow its original programming.
LOL !
The natural end state of any progressive project.
I’m going to try and submit this somewhere…
Epstein’s Razor
As time increases, the likelihood of sexual deviance increases with respect to governmental and financier involvement.
I think that’s what they are trying to avoid.
A while back I had a long series of Q&A exchanges with GPT-3.5, during which I was urged to pay the $8/month fee for the upgraded membership.
I asked ChatGPT-3.5 why I (as a certain kind of professional consulting engineer) should use Chat. It gave me specific reasons, such as better and more up-to-date information and a deeper understanding of problems and solutions.
So I gave it an example of a field inspection test I conducted recently and asked what Chat could provide that Google would not.
Chat again told me what the benefits of using Chat would be, actually about 5 specific benefits. One by one, I responded to each of the benefits with questions asking for specifics.
In each response, Chat gave general platitudes about benefits. When I pressed for specifics, it tossed out all kinds of side issues, like the importance of safety during testing and consulting experts on the test results. I told Chat that I was a recognized expert.
For me the final straw was when it told me that if what I was testing failed a very particular type of Code test, I should perform additional testing, including tests X, Y, and Z. I then pointed out to Chat that the test that had failed was test X, just under another name, so advising me to rerun the very test that had already failed was not really helpful.
Chat thanked me and told me I was correct that Test X was one of the tests it had suggested, but by a different name. Chat kept selling and mumbling about what it could do.
There are probably lots of things that Chat can accomplish, but substituting for the experience of a professional engineer with nearly 50 years in the field is not one of them. Maybe it could stand in for an engineer fresh out of college, but not for decades of human experience in a narrow specialty.
I think that Chat is useful for some things, but it has serious limits. One of the things I was taught long ago was that if I don't know something, I tell people I don't know, but that I will research it and get back to them. I don't think that Chat knows what it doesn't know, and it will not admit it when so questioned.
Genius!
Thank you. A rare moment of clarity…