I'm assuming you DID check that the studies the LLM found/"found" and used for its answers actually exist, and that the numbers in its model actually appear in them?
My biggest practical problem with them (amazing as they are at *manipulating symbols/words within their universe*) is their disconnect from reality; i.e. they not only present data or sources that don't exist, but have no idea that they're doing it. For me it happens ALL THE TIME: hallucinated quotes, papers, and whole books, especially when nothing in the world would actually satisfy my query; but oddly, also when there's no need at all.
So a tool that's uncannily, amazingly good at, for example, unpacking my stream of consciousness in a pseudo-therapeutic way (though it tends to fall into an infuriatingly "validating" mode unless corrected frequently), or at drafting a rough outline of a piece of writing from ideas I "dictate" and supplementing it with some extra content, REPEATEDLY produces fake quotes from a text that exists in various versions in the public domain, or invents one out of five sources. And when I present it with a thesis, it will almost never argue against or challenge it, often hallucinating information in support.
I'm not even saying they aren't edging towards AGI. I'm saying they have a massive "knowledge" problem. Developing an ability to reason (or its functional simulacrum) is impressive, but useful reasoning is truth-conditional: valid arguments are pretty, but we need sound ones. Models that confidently produce valid arguments built on false premises are a big problem. It's possible the problem could be rectified quite easily by feeding the model true data directly, of course.
o3 hallucinates significantly less often than non-reasoning models or o1. I did check the numbers provided, and they map to actual figures in the papers. You can see for yourself that it genuinely can locate exactly where I was in Guatemala. They make mistakes, but at this point they make them less often than human researchers do.
They also make different sorts of mistakes from humans. Some things we find trivial they cannot do, and some things they find trivial we cannot do.
I agree entirely that they're getting better all the time. That's why I'm repeatedly surprised by hallucinated books and papers :)
The Village was stunning for me to see. I need something like this for my own version of forecasting research. I know you have written about aggregation of "expert" opinions/forecasts. I have been following the research comparing human experts with actuarial/algorithmic approaches for decades, and have read all the Good Judgment/Tetlock clan stuff. Interesting how some LLMs use MoE (Mixture of Experts).

I am trying to do something in this area that nobody has done, and agents like this will help. I plan to clone manifold.markets as part of my project. Part of the project will test whether people's predictions can be improved in a real-world setting. That's where the altruism comes in: saving people money by teaching them how bad the "experts" are and showing them how to do better.

I'm a retired Psychologist, and I want to publish something about this. If you were interested, I'd appreciate your collaboration; let me know how to contact you.
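For concreteness, here is a minimal sketch (Python/numpy) of two standard pooling rules from that aggregation literature: a plain average of the forecasters' probabilities (linear opinion pool) and an extremized log-odds average in the spirit of the Good Judgment work. The alpha constant and the toy probabilities are purely illustrative, not taken from any particular study.

```python
import numpy as np

def aggregate_forecasts(probs, extremize_alpha=2.5):
    """Aggregate several forecasters' probabilities for one binary event.

    probs: list of probabilities in (0, 1), one per forecaster.
    extremize_alpha: illustrative exponent pushing the pooled forecast
        away from 0.5, compensating for forecasters sharing information.
    """
    probs = np.clip(np.asarray(probs, dtype=float), 1e-6, 1 - 1e-6)

    # Simple average of probabilities (linear opinion pool).
    linear_pool = probs.mean()

    # Average in log-odds space, then extremize before mapping back
    # to a probability with the logistic function.
    log_odds = np.log(probs / (1 - probs))
    pooled_log_odds = extremize_alpha * log_odds.mean()
    extremized = 1 / (1 + np.exp(-pooled_log_odds))

    return linear_pool, extremized

if __name__ == "__main__":
    experts = [0.6, 0.7, 0.65, 0.8]          # toy panel of forecasts
    print(aggregate_forecasts(experts))       # ~0.69 plain, ~0.88 extremized
```

The point of the extremized version is just that a crowd of partially overlapping forecasters tends to be underconfident when averaged naively; how much to extremize (the alpha here) is an empirical question for the kind of study you describe.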
Sure, get in touch with samueljnglover@gmail.com if you wanna talk - always open to new ideas!
LLMs are the worst they’ll ever be today. In 90 days they’ll be better. In a year even better.
Maybe they won’t make it to “real” AGI, but they are terrific and useful even if they don’t. And maybe they’ll help us make progress with a different path to AGI.
So much complaining about them, but this is an exciting time to be alive. We are likely to experience real AGI in our lifetimes!
I wish things were slightly less exciting!