trostaft 12 hours ago [-]
Speaking as a postdoc in math, I must say that this is rather exciting. This is outside of my field, but the companion remarks document is quite digestible. It appears as though the proof here is fairly inspired by results in the literature, but the tweaks are non-trivial. Or, at least to me, they appear substantial enough that I would consider the entire publication novel and exciting.
Many of my colleagues and I have been experimenting with LLMs in our research process. I've had pretty great success, though fairly rarely do they solve my entire research question outright like this. Usually, I end up with a back-and-forth process of refinements and questions on my end until eventually the idea becomes apparent. Not unlike my traditional research refinement process, just better. Of course, I don't have access to the model they're using =) .
Nevertheless, one thing that struck me in this writeup was the lack of attribution in the quoted final response from the model. In a field like math, where most research is posted publicly and is available, attribution of prior results is both social credit and how we find/build abstractions and concentrate attention. The human-edited paper naturally contains this. I dug through the chain-of-thought publication and did actually find (a few of) them. If people working on these LLMs are reading, it's very important to me that these are contained in the actual model output.
One more note: the comments on articles like these on HN and otherwise are usually pretty negative / downcast. There's great reason for that, what with how these companies market themselves and how proponents of the technology conduct themselves on social media. Moreover, I personally cannot feel anything other than disgust seeing these models displace talented creatives whose work they're trained on (often to the detriment of quality). But, for scientists, I find that these tools address the problem of the exploding complexity barrier in the frontier. Every day, it grows harder and harder to contain a mental map of recent relevant progress by simple virtue of the amount being produced. I cannot help but be very optimistic about the ambition mathematicians of this era will be able to scale to. There still remain lots of problems in current era tools and their usage though.
teiferer 2 hours ago [-]
> Every day, it grows harder and harder to contain a mental map of recent relevant progress by simple virtue of the amount being produced.
And by opening the door to LLM-generated results, you'll see greater and greater amounts without any hope of ever navigating this field again without machine help.
It's a little like a software project that gets extended more and more by AI agents, with less and less review by human software engineers, until in the end the complexity and spaghetti design are so incomprehensible to humans that maintenance requires an AI agent. The risk is that math as a whole (the field itself) will experience that effect.
biztos 2 hours ago [-]
I'm no mathematician but it seems like if this happens, we get to a quite intriguing place as a species.
Say we achieve interstellar travel, but nobody actually knows how it works.
Or we cure cancer, but the "cure" requires a microrobotic implant, and it runs as a blackbox AI, and only the other AIs can make one, and there's no guarantee they will know how to make one tomorrow.
Or we solve global warming but it requires giant cooling machines running 24/7 and again, nobody knows how it works, but with the added bonus that the planet is cooked if they ever stop working.
wartywhoa23 2 hours ago [-]
Noble "save the humankind using the tech nobody fucking understands" textbook goals like curing cancer, solving global warming and achieving interstellar travel would always turn up when owners of trillions of dollars place orders on positive AI narratives, but in reality all of that will wither down to "It's what plants crave! It's got electrolytes."
ffaccount2 15 minutes ago [-]
> positive AI narratives, but in reality all of that WILL wither down to...
Looks like you're pretty sure of that. Every time I see an argument like this delivered with confidence I wonder how it is different from, say, digital calculators. Or better yet, books - Greek philosophers moaned that young people would stop understanding anything and just check books when they wanted to know something.
wartywhoa23 3 minutes ago [-]
> Looks like you're pretty sure of that.
Knowing the history of the humankind is what makes me pretty sure of that.
> I wonder how is it different from, say, digital calculators.
Did a single digital calculator stop any ongoing war, or liquidate a psychopath who orders people to go kill and die?
ocimbote 47 minutes ago [-]
It is fascinating how the more knowledge and reasoning we can get our hands on and actually produce, the higher the risk that we, as a species, actually become much dumber.
It's hard to describe the feeling of seeing intelligence being delegated increasingly to AI. If that's not a pivotal moment, a revolution, I don't know what is.
wartywhoa23 41 minutes ago [-]
If you're not familiar with it, I recommend looking up the Taoist concept of overdevelopment. Sums it all up perfectly.
SyzygyRhythm 2 hours ago [-]
That's already how civilization works. There's no one person that knows everything about (say) modern food production, from top to bottom. If it ever stopped working (because too much knowledge was lost somehow), most people would die. And yet the system seems fairly resilient. Mostly, only local knowledge ever seems to be necessary to keep the whole thing running. Super-intelligence (or even just super-normal-intelligence) might expand the scope of what constitutes local knowledge but it will still run into limits somewhere.
lefra 27 minutes ago [-]
> There's no one person that knows everything about (say) modern food production
True, but it is possible to assemble a team of people that does, with backup for each person. There's also teachers and written knowledge to educate new team members. That's what makes it resilient.
I think that's a very different situation from what's described.
pishpash 16 minutes ago [-]
People still do grow their own food for self sufficiency. I am sure there will be luddites who live in self-sufficient communes like the Amish.
ben_w 6 minutes ago [-]
The first two are open enough that they may be as you say*, but we already know how to solve global warming, it's more of how much do we want to.
Green energy and transport technology is now at the point where people save the world and get rich trying, just as fast as they can build the factories.
Food's climate impact is harder, because the problem isn't technical, it's convincing people to give up beef (and other things, but mostly beef).
* quantum mechanics and general relativity are famously difficult to get to grips with
nashadelic 17 minutes ago [-]
I've been thinking about this and I believe the best place to be is a scientist who keeps looking at an AI's output, prods it in the right directions, verifies the proofs, fixes and fills gaps, takes the proof to production with safety, risks etc mitigated and then distribution with a company wrapped around the discovery. I think it wouldn't be black-boxed as much and will require a lot more understanding and reviewing to trust and productize it.
suncemoje 2 hours ago [-]
I think anything is and will be explainable. Like in the OpenAI proof, I’m sure they were able to understand the solution 100% and could even drill down and ask more clarifying questions to the model. After all, the point of science is so that knowledge can be made logically transparent. If something can’t be explained, it isn’t really understood yet — and the same applies to model outputs. The only question is how much effort it takes to surface the explanation.
fauigerzigerk 44 minutes ago [-]
I think explanation is itself a rather complex concept. At what point do we consider something as explained? Usually it has to do with identifying some causal factors and their relationships so that we can intervene and explore counterfactuals. But in many cases we are forced to act on the basis of incomplete explanations (e.g. in medicine).
I think there will be regulation that requires some users of AI to provide an explanation upon request. For instance, banks could be required to "explain" why you didn't get that loan. What if the decision is based on a credit score that includes some AI prediction that ultimately relies on the entire training corpus?
The bank can give you a list of factors that play into the decision but they may not be able to explain deterministically why a very similar customer did get that loan. At that point I think we're going to resort to statistics that prove a lack of bias against certain protected characteristics, but that's not really an explanation, is it?
I think we will never get useful and complete explanations for everything that AI does. Society will just accept some explanation-like thing or proxy and move on.
worldsayshi 2 hours ago [-]
And more intelligence should give an opportunity to increase explain-ability rather than just complexity. It can potentially explain the proof at the level of the listener. Make visualizations. Etc.
latexr 34 minutes ago [-]
> I’m sure they were able to understand the solution 100% and could even drill down and ask more clarifying questions to the model.
If they understood it 100%, what clarification is needed?
worldsayshi 2 hours ago [-]
Why can't we (or AI) invent ways to explain information that makes it much more digestible? And the solutions simpler?
Why is it necessary to continue to increase complexity when we get better intelligence? Can't we find more simple solutions? Or at least more explainable.
pishpash 11 minutes ago [-]
Is particle physics digestible even if it is explainable? Some things are not simple, cannot be not abstract, and will not be understood by most, or all, people.
loandbehold 21 minutes ago [-]
This is pretty much what underlies the AI doomer argument of people like EY. Humans will gradually hand over civilization to black box AI they can't understand. As AI becomes more complex and powerful it will be harder and harder to control.
pishpash 14 minutes ago [-]
That assumes everyone will do so. Some people won't, and it's not clear you need a large number of such, a priesthood if you like, to survive as a species without AI.
f055 2 hours ago [-]
So I guess sci-fi movies were right all along. Nobody in Star Wars knows how hyperspace travel works, it just works. The little robots know everything but almost no human bothers to care. People just carry on with their bickering lives while the bots whiz in the background, and these robots are astonished at human inefficiency every single time, but rarely do anything about it. And people are still people.
cyclopeanutopia 1 hours ago [-]
That's only because movies like Star Wars are not sci-fi movies, but more like westerns in space.
ivell 46 minutes ago [-]
We don't know how many things in nature work. For example, we don't fully understand our own brain. As long as it can be replicated, we are fine.
In the case of AI we have a better chance to understand what it is doing through chain of thought and explainability. Nature never gave us that.
Rebuff5007 1 hours ago [-]
"All models are wrong, but some are useful"
What you're describing is already how a lot of science, technology, and engineering works!
latexr 36 minutes ago [-]
> Or we solve global warming but it requires giant cooling machines running 24/7
That’s not “solving it”, that’s putting a bandaid on it. Solving it would mean correcting the underlying issue to the point it’s no longer a problem which requires maintenance.
There are comments that truly reveal a future horrifying and true. Few of them. But I count yours among them.
But I’d argue also that airplanes already achieve this complexity to some degree as well as microprocessors.
darkwater 2 hours ago [-]
> as well as microprocessors
I mean, microprocessors have been on the "impossible to bootstrap from scratch in a short period of time" list for 20 years already.
Certhas 1 hours ago [-]
The amount of papers produced passed the point of being digestible by humans a long time ago.
I do think we will need to find a way to get away from publishing papers. But I thought that before AI came along and made mediocre papers something you can produce in a day. The academic system seems utterly incapable of self-correcting on this point though. We haven't even managed to get rid of for-profit publishers. So how this all will go down is anybody's guess right now.
xbmcuser 5 hours ago [-]
This is the main thing I keep harping on: human knowledge is too vast today for a person or even a group of people, and LLMs will change that. Many discoveries that required serendipity in the past will be more likely than ever.
qnleigh 18 minutes ago [-]
Can you describe what the reaction to these results has been like in your department? Obviously many people are excited, but what else? How do grad students feel about this? Are any professors getting worried about becoming obsolete?
energy123 4 hours ago [-]
Terence Tao gave a recent talk about this issue (lack of attribution). He called it the decoupling of implicit and explicit goals. AI is only good at solving the explicit goals for now, and humans don't have the bandwidth or the institutions to know how to integrate AI into the field.
That is an odd summary of the talk. He was talking about how the explicit goal of solving a problem is kind of becoming trivialized, but the abundance of 100-page AI generated proofs will not help the implicit goal of furthering human understanding, because we lack the bandwidth to really digest them. Adhering to things like (human-focused) academic etiquette is a different problem and can probably easily be solved by just giving the model the right context. But having humanity keep up with AI insights into math and science is something we might have to give up eventually. Or at least whoever does will be far ahead of us as a society, because most people's lives will only be affected by the explicit results.
computerex 3 hours ago [-]
I feel like that’s already becoming true. I sometimes work on problems/projects where the AI agent is definitely more qualified than me to call the shots.
For example, this library here for deep learning is 100% ai generated and far beyond my technical capabilities.
I find AI a great scaffolding for improving understanding and mental models. BUT! It's all in how you use it.
energy123 4 hours ago [-]
An odd summary of a talk you didn't even listen to? He explicitly mentioned references and attribution as a special case of implicit goals.
sigmoid10 4 hours ago [-]
Do you really think these models lack the intelligence or language capabilities to handle human etiquette? They can't "read the room" yet because they lack modalities and people don't give them the right context. That's the issue. But I have no doubt that what you two describe here will be solved very soon. And yet the actual implicit goal of all this will need humanity to rethink its priorities.
inciampati 5 hours ago [-]
I am also using these models to accelerate scientific discovery. Yes, they are making all the difference at the frontier. At least, they feel they are. The messy thing is that we still need to communicate with each other and that's not getting dramatically faster or better. As you note the models need to be built so they do more work to participate in our communication economy. Or we will do so much, alone, to get nowhere fast because so much of our behavior is still bound up in old (good, tested, but clunky) ways of building shared knowledge.
bandrami 3 hours ago [-]
I am curious if LLMs are better at some kinds of problems than others. IIRC this and another big recent one were cases of the LLM producing a counterexample to a conjecture.
ricardobayes 59 minutes ago [-]
IMO, it's due to some problems being better documented, with more well-documented, previous research available. LLMs don't really create novel mathematics, they mostly "connect the dots". LLMs by design are not coming up with anything new, unless by statistical probability, aka "brute forcing".
I don't want to minimize LLMs' capabilities, it's pretty cool they are doing this, and it's useful from a research point of view. But it's important to set expectations.
shalmanese 4 hours ago [-]
> But, for scientists, I find that these tools address the problem of the exploding complexity barrier in the frontier. Every day, it grows harder and harder to contain a mental map of recent relevant progress by simple virtue of the amount being produced.
AI is going to both help and hinder this process though. At the end of the day, mathematics is mostly a social process at this point. The goal is not raw number of theorems proven, it’s how proving theorems affects the working operational models of mathematicians. Only a rare few new theorems in mathematics nowadays have direct real world applicability.
If AI produces legitimate theoretical breakthroughs at a pace mathematicians are unable to absorb, then the impact will be neutral to negative.
xyzzy123 2 hours ago [-]
Weird question: do you think AIs might prove a lot of theorems that are mainly useful to other AIs (i.e., make nearly no impact on the human culture of working mathematicians), which then get used to prove results that humans do actually care about?
It seems like if AIs can prove and index a huge number of (largely uninteresting to humans) things there might be sort of "parallel cultures"? Big results are most valuable to humans and AIs both (most context efficient!), but a very large number of less general but still non-obvious results might be an effective approach to solving problems?
teiferer 2 hours ago [-]
> Only a rare few new theorems in mathematics nowadays have direct real world applicability.
Has this ever been different?
Math is abstract, rightfully so. It does not have to have direct applicability. Understanding builds over time and applications eventually follow. Number theory used to be a fringe "pure" theory field without applications for the longest time. If we'd only be interested in (and thus fund) what has direct applicability then society would be much worse off.
Side note: I recall my high school classmates rolling their eyes in every math class with "when will I ever need this in my life?", never asking the same question about PE or history or art classes. Now they struggle with their tax return and are routinely getting screwed over by loan sharks. But make no mistake, they can be proud of their A for hitting the goal 5 out of 5 times during soccer in PE class.
dyauspitr 4 hours ago [-]
> Only a rare few new theorems in mathematics nowadays have direct real world applicability.
I am no mathematician and very naïve about this, but in a world that is rapidly becoming extremely calculation and network dependent that sounds hard to believe.
> If AI produced legitimate theoretical breakthroughs at a pace mathematicians are unable to absorb, then the impact will be neutral to negative.
I think the idea here is that all mathematicians will just be using AI for their future work so they don’t really have to absorb it as long as it’s in the training data.
mkl 3 hours ago [-]
> > Only a rare few new theorems in mathematics nowadays have direct real world applicability.
> I am no mathematician and very naïve about this, but in a world that is rapidly becoming extremely calculation and network dependent that sounds hard to believe.
I am a mathematician. It is true. The key is we're talking about new theorems, and direct, current real world applicability. Some theorems that have no applicability now may in the future, as theory often precedes applications by a long way and the usefulness is likely to come from other things built on top of the new maths, and a lot of pure maths will never have direct real world applications but contributes to our overall understanding.
adastra22 3 hours ago [-]
The key word in that sentence is “new.” New math is typically explored without expectation of practical use. There are exceptions, but it is generally true.
On the other hand, there are many applied mathematicians and theorists from other fields that mine new maths for applications to their fields. But they are almost always not the ones that come up with the new math.
Historically, of course, mathematics was always driven by the need to explain things. Many of the mathematicians from the 17th and 18th centuries were physicists (or, less commonly, engineers). But for the last hundred years or so that really hasn’t been the case.
isotypic 9 hours ago [-]
I cannot quite share your enthusiasm. The clearest analogy that I can think of to try to explain why I feel this way is that it seems there will eventually be a phantom textbook of all of mathematics contained in the weights of an LLM; every definition, every proof, etc; and the role of a mathematician is going to be reduced towards reading certain parts of this phantom textbook (read: prompting an LLM to generate a proof or explore some problem) and sharing the resulting text with others, which of course anybody else could have found if they simply also knew the right point of the textbook.
To be blunt, this seems incredibly uninteresting to me. I enjoy learning mathematics, sure, but I just don't find much inherent meaning in reading a textbook or a paper. The meaning comes from taking those ideas and applying them to my own problems, be it a direct proof of a conjecture or coming up with the right framework or tools for those conjectures. But, of course, in this future, those proofs and frameworks are already in the textbook. So what's the point? If someone cared about these answers in the first place, they probably could have found the right prompt to extract it from this phantom textbook anyways.
You could argue for there being work still like marginal improvements and applying the returned proof to other scenarios as happened in this case, but as above, what is really there to do if this is already in the phantom textbook somewhere and you just need to prompt better? The mathematicians in this case added to the exposition of the proof, but why wouldn't the phantom textbook already have good enough exposition in the first place?
I think my complete dismissal of the value of things like extending the proofs from an LLM or improving exposition is too strong -- there is value in both of them, and likely will always be -- but it would still represent a sharp change in what a mathematician does that I don't think I am excited for. I also don't think this phantom textbook is contained even in the weights of whatever internal model was used here just yet (especially since as some of the mathematicians in the article pointed out, a disproof here did not need to build any new grand theories), but it really does seem to me it eventually will be, and I can't help but find the crawl towards that point somewhat discouraging.
ted_dunning 8 hours ago [-]
In Erdős's idiosyncratic nomenclature, all the best proofs are "in the book", and it was always a joyful thing not only to find a proof, but to find the proof that is in the book.
Who cares if it is God's book or the machine's Xeroxed copy?
xamuel 7 hours ago [-]
Long before Erdős, we had Plato and Socrates develop the theory of anamnesis, that there is no such thing as learning, but rather, whatever we supposedly learn, we actually remember (we knew it already and had forgotten it). Presumably this should be understood only of universal facts (like mathematics), not contingent facts (like who was the president of the U.S. in 1950).
teiferer 2 hours ago [-]
Remember from ...when?
xamuel 2 hours ago [-]
Before birth. ...Hey, don't point that pitchfork at me, point it at Socrates. In his defense, that kind of does describe when LLMs acquired their knowledge (if we consider "birth" to be the moment when the already-trained weights are sent to the GPU) https://en.wikipedia.org/wiki/Anamnesis_(philosophy)
isotypic 6 hours ago [-]
I mean, my reaction to God coming down and saying they were bored of being God and instead they would just sit around and answer all of the mathematicians' questions would largely be the same, so yes, who cares if it's God's book or the machine's Xeroxed copy?
"The Book" is more interesting to me if I am the one coming up with the ideas to fill it in. Maybe this is a bit egotistical, but I'd like to think it is allowed to have a desire that you, personally, are contributing to something in a meaningful way. Like, if you are on a sports team, it'd be more fun to win a game if you were on the field than if you were benched, and I think that's okay. And ultimately I don't find dredging for proofs from an LLM particularly meaningful, nor do I see it as a particularly personal contribution, as anybody else could have done the exact same thing with the same prompt.
This isn't to say I wouldn't love to read the proofs in "The Book" for problems I care about, I just think I'd eventually get bored of only reading. And so it's hard to be enthusiastic when this book is being built through an LLM.
energy123 4 hours ago [-]
If ASI does create an abundant future I think many are going to have that familiar listless feeling of enabling cheats on a computer game and all the mystery and fun is gone.
Technology in general (smartphones, social media, search) even without AI is creating this feeling, as it shrinks the world and makes it less mysterious.
It's worse than boredom; it's more like nihilism.
Then when you strip purpose and meaning from a human you get something very bad, despondency being the best case outcome.
qnleigh 4 hours ago [-]
> it'd be more fun to win a game if you were on the field than if you were benched
This is a good analogy for AI work displacement. Probably would resonate with some of the college students who booed Eric Schmidt.
qnleigh 5 hours ago [-]
I want to push back against the notion that the math already exists in the weights, both in the practical and the philosophical sense. The LLM had to do an enormous amount of computation to find the counterexample. We know it wasn't looking up the answer from its internal representation, because the conjecture was unproven. The proof came into being when the model output it, and if they'd run it for less time or asked it something else then the conjecture would still be unsolved.
I'm also afraid of a world where AI completely replaces human mathematicians, but if we remain collaborators, then that's a world I can still feel excited about.
k_roy 8 hours ago [-]
And you just expressed the thoughts of every engineer that writes code for a living who is either left behind, or embracing the technology to hit KPIs and QVRs.
BobbyTables2 7 hours ago [-]
It’s funny because the shift from handmade goods to automated factories didn’t seem so bad. Same for mechanized farming instead of mules and people.
Shifting from “human calculators” to machines for arithmetic is also hard to argue against.
I think what makes the AI transition difficult is it impacts a wide range of high-value activities that would have been implicitly assumed to always remain human.
I do have great trouble seeing how a pile of matrices is ever going to be capable of innovation. Maybe with sufficient entropy and scale, it will… The day that becomes practical will be a turning point in history.
Economically, goods and services are often priced based on labor/“value added” aspects. Lawyers’ fees aren’t driven by paper costs! If AI takes a huge bite out of intellectual labor, the future could become very different…
BTW, your book description reminds me of the 2001 movie “A.I.”. I thought it was quite good.
kaashif 7 hours ago [-]
There isn't anything functionally special about the human brain - is there some reason to expect that the human brain is capable of innovation but no program, even one far more powerful than the brain, is?
You admit this possibility so I'm not arguing with you, but it seems far more plausible to me that we can build something better than the brain.
In the limit we can just grow brains and put them in computers anyway, then the debate is moot. That's a really hard problem but of course not physically impossible.
naasking 8 hours ago [-]
The cool thing about LLMs is not only might they be a database of all mathematical theorems, but they can also apply those ideas to the problems you're trying to solve, which is exactly what you said you're interested in. Not sure why you lack enthusiasm.
isotypic 6 hours ago [-]
LLMs applying the ideas to problems I'm trying to solve is exactly what I said I wasn't interested in, actually. Because the LLM doing this for me reduces back to me simply reading from the textbook, only now I have no problems I'd be interested in applying things to since, again, they're already in the textbook.
colordrops 3 hours ago [-]
Maybe I'm misunderstanding how these models work, but isn't it more the responsibility of the harness and its prompts rather than the model itself to make sure that a result is generated with explicit sources?
PaulRobinson 3 hours ago [-]
Probably.
"All" a model is doing is predicting the next words, based on the statistical distribution of words it has seen similar to the ones read/produced so far.
We push a model towards a particular set of distributions through context. If I ask a model "What is the capital of France?", there is a non-zero chance it goes down the dad joke answer of "The letter F". The far more likely option is "Paris", because the joke appears much less often in training material, but if I wanted to be absolutely sure of getting a consistent geography answer I'd address that with additional context. We can add context via prompts, RAG, agents, skills and so on.
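To make the "context skews the distribution" point concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the three candidate answers, the raw scores, and the size of the context shift are invented numbers, not values from any real model. It only shows the mechanism of turning scores into probabilities and nudging them with context.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Hypothetical raw scores for answers to "What is the capital of France?".
# Invented numbers; a real model scores a vocabulary of many thousands of tokens.
BASE_SCORES = {"Paris": 5.0, "The letter F": 1.0, "Lyon": 0.5}

# Hypothetical effect of extra context, e.g. "Answer as a geography teacher."
GEOGRAPHY_CONTEXT = {"Paris": 1.0, "The letter F": -3.0, "Lyon": 0.0}


def next_token_probs(base, context_shift=None):
    """Add any context shift to the base scores, then normalize."""
    shift = context_shift or {}
    shifted = {tok: s + shift.get(tok, 0.0) for tok, s in base.items()}
    return softmax(shifted)


plain = next_token_probs(BASE_SCORES)
steered = next_token_probs(BASE_SCORES, GEOGRAPHY_CONTEXT)

# Print each candidate's probability before and after the context is added.
for tok in BASE_SCORES:
    print(f"{tok:>13}: {plain[tok]:.4f} -> {steered[tok]:.4f}")
```

In this toy setup the joke answer never reaches exactly zero probability, it just becomes much less likely once the context is added, which is the sense in which context "pushes" the model rather than guarantees an outcome.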
However, when training a model, we select the material. We could show it a lot more geography information (or dad jokes!), and skew the statistical distribution in the direction we wanted. We could also decide to design the system prompt towards the direction we prefer - which the user would interpret as "the model" - and so nudge the context model-wide. We can also construct the interaction to iterate on context with a specific framing and call it "reasoning".
In this specific example, you could therefore solve the problem by a) training skewed towards mathematical papers, which likely degrades performance in general and likely for the specific case too, b) training the user to provide better context/prompts for mathematical work, shifting the workload to them, which feels very "a la 2024", c) publishing agents and skills that are tailored to mathematics work (very "a la 2026"), d) tweaking the system prompt for when the model is doing mathematics work, which the user would see as "the model" doing the change, but you and I might look under the hood and say that is in the harness or a specific type of prompt, e) adding "reasoning" execution that is set to focus on mathematical formatting, or f) a mixture of the above.
Right now we're probably looking at agents and skills. I think over time we're going to see smaller models targeted towards domains with a mixture of all of it, where some of this sits at user configurable levels, and some is "baked in" via training, system prompts and execution modes, but from a user perspective it's all just "the model".
peepee1982 3 hours ago [-]
I don't think you are misunderstanding how models work, but I think the parent comment meant that the training of the models should push them to include attributions in their native output so they will more likely do so without reinforcement through the harness.
doctorpangloss 5 hours ago [-]
> Every day, it grows harder and harder to contain a mental map of recent relevant progress by simple virtue of the amount being produced. I cannot help but be very optimistic about the ambition mathematicians of this era will be able to scale to. There still remain lots of problems in current era tools and their usage though.
Always, always always, the problem with research and development is leadership, not insufficient supportive technology. It is a political problem, there is absolutely, positively no shortage of technologies to support research. Your optimism is totally misplaced. The NSF funding cuts have negatively impacted math more than AI has benefitted it. And guess who supports the administration that cut NSF funding? The people who ousted the PhDs from OpenAI.
virgildotcodes 4 hours ago [-]
I think we’re looking at a new class of wonderful machines that can potentially make meaningful contributions to the sciences and maybe even humanity as a whole, in addition to far more insidious and destructive capabilities.
You are right to point out that the ones who fully own and pilot the machines all belong to the “fuck science and humanity as a whole” group. So the likely outcomes don’t look good.
Echoes the early promise of the internet vs the eventual state and consequences of it, although seemingly primed for far more dire and deeply penetrating consequences.
diordiderot 4 hours ago [-]
Not in academia, but the amount of crying over rapid technological and intellectual progress because you're not getting credit validates everything critics say about you.
No interest in human advancement, just attribution.
virgildotcodes 3 hours ago [-]
I’m not in academia myself, and I think AI solving all our problems ASAP is ideal, even if it means no humans get attribution.
What I’m saying is that the ultimate goals of those in power are not these sorts of altruistic or even scientific pursuits, and that the massive labor disruption and hyper concentration of power in the hands of those who are proving time and again that advancement of science and benefiting the whole of humanity are actually antithetical to their goals is likely a bad thing.
diordiderot 2 hours ago [-]
Oh good. But I think you're overestimating the 'concentration of power', and underestimating 'benefiting the whole of humanity'
Most homeless people have smartphones, and consistent access to food and clean water.
Your average 'poor person' in America has HVAC. An unimaginable luxury in the EU
wartywhoa23 55 minutes ago [-]
Does your average poor person in America have happiness after 100s of years of relentless technological progress for the benefit of human advancement, and especially today, in the age of becoming spaghettied into the AI event horizon?
mmnfrdmcx 2 hours ago [-]
Unimaginable luxury, what are you on about. Have you ever even been outside of the US?
inglor_cz 8 minutes ago [-]
"An unimaginable luxury in the EU"
Eh, don't be silly. In the places where the summer is hot enough (or, more precisely, where it used to be hot enough), I have seen plenty of AC units on shabby buildings, even on old Commie apartment blocks in Romania.
AC is not that expensive.
doctorpangloss 4 hours ago [-]
> I think we’re looking at a wonderful machine that can potentially make meaningful contributions to the sciences and maybe even humanity as a whole.
That's true. But. Maybe you've seen the Oppenheimer movie, there is a moment where Oppenheimer shakes Teller's hand, basically after the guy ruins Oppenheimer's life in a completely immature betrayal. That's what people are angry about, the academy community is Oppenheimer's wife asking, why the fuck did you shake his hand?
At least regarding leadership and funding, I don't know if it's a matter of likely or unlikely outcomes. It's just facts: these guys are collaborators. The commenter might very well have zero graduate students starting next year. What pisses me off is the utter obliviousness that STEM people have about how deeply political their work is.
And perhaps this is the real reckoning for the mathematics community. Not the possibility that AI is going to replace their jobs, it's not going to do that. But that having these intensely myopic and disagreeable personalities means that basically zero leadership skills have been nurtured in the mathematics community. You cannot name a single politician who is a mathematician. You have to be elected to have power in this country, it's that simple, there are way more billionaires than there are presidents! Leadership is far more scarce. So that's why these disputes matter, and while it's great that people engage on Hacker News about it, it's intensely disappointing that "reduced science funding is really bad" gets downvoted.
That is a result of Hacker News's emphasis on this very 2010s view that it wants to be a place where the math nerds gather (in @dang's words) - he doesn't get that the quality of the discourse was caused by great leadership at many political and academic levels. Nobody credits how much better leaders were during Y Combinator's biggest success stories, or how much we overvalue the intellectual powers of math because it makes money as opposed to enlightening our view of the world.
fleroviumna 4 hours ago [-]
[dead]
umanwizard 11 hours ago [-]
Why would it excite you, rather than terrifying you? The better LLMs get at math, the closer the expertise you spent your whole life building is to being worthless.
Along with all the rest of what humans find meaningful and fulfilling.
trostaft 8 hours ago [-]
I spent years grinding to learn mathematics because it was the language I needed to solve problems that excite me. If the tools I need to do so change, I can change too. Research training is not so rigid that it can only be applied to the single set of skills I developed it in the context of. I can learn this too.
Moreover, truth be told, I don't really see myself doing any less math and requiring less from my skills. At least from the moment I've begun incorporating LLMs into my research workflow to now, the demand I've had from my own skills has only grown. At least in an era prior to Lean formalization.
doctorwho42 5 hours ago [-]
What about the future mathematicians yet to be born?
cman1444 8 hours ago [-]
Because for many people who pursue these fundamental truths, the reward is not necessarily personal fame, fortune, or even personal understanding. Advancing humanity's total knowledge (even if that knowledge is by proxy through AI) is reward enough.
mathgradthrow 7 hours ago [-]
I think when your work is no longer required, you will probably come to regret this sentiment, not that it matters.
fartfeatures 7 hours ago [-]
I think by that point humanity will be having some pretty fundamental discussions about the nature of work and money.
doctorwho42 5 hours ago [-]
At that point, if AI can do 75-99% of what you do... Why should anyone pay you to live/survive?
Humanity is having those discussions, heck you are in one RIGHT NOW not some Hollywood future.
What is coming of those discussions is the ownership class balks at the idea of raising their taxes (see recent interview with bezos), and therefore balks at the idea that you or I should have any value beyond what we produce... And if AI can replace you or I, well how do we survive if we can't produce in a technological society?
salawat 7 hours ago [-]
I think you are blinded by an unprecedented optimism the rest of us simply cannot afford to entertain.
Enginerrrd 6 hours ago [-]
Go ahead and have that conversation with the billionaires running a worldwide satellite grid of data centers to power their AI surveillance dragnet and autonomous robot soldiers. See how far it’ll get you.
necovek 5 hours ago [-]
If they don't have millions/billions of customers to spend money on whatever they are selling, their riches become irrelevant too.
Money is valuable only as it changes hands for goods/services, and if you want to get rich, on top of having/producing/controlling something everybody desires, you also need as many people as possible to have money to give you in exchange for a piece of that something.
ninjagoo 6 hours ago [-]
> I think when your work is no longer required
i wonder if this is physically/mathematically impossible: the mere act of living involves processing energy, and therefore doing work :)
And there is a lot of energy to be processed in this Universe before the heat death...
doctorwho42 5 hours ago [-]
If you can reach it... The universe is expanding, and matter is being dispersed by both that and other forces.
Mind you, there are places in the universe that we have no way of knowing ever existed... The non-observable universe if you will. For when physicists talk of the observable universe, it is only the fraction we have any chance of receiving data/light/radiation of/from
seanmcdirmid 6 hours ago [-]
Scientists think differently from craftspeople. They want to know the unknown, using any tool they can get their hands on.
wartywhoa23 44 minutes ago [-]
> using any tool
This "any" shines like a thermonuclear fireball.
antonvs 7 hours ago [-]
There's an unstated assumption there, which is that you'll have some reason to continue to want your work to be required.
In the (probably unlikely) event that AI use results in a post-scarcity economy in which there's no need to work to survive, a lot of people wouldn't regret sentiments like the ones in question.
On the contrary, it would mean they could work on whatever they please, including potentially standing on the shoulders of giants - the AIs - and seeing even further.
If we actually worked to create a society that works for the benefit of all its members, there would be a lot less reason to worry about developments like these. Much of the worry arises because for various reasons - none of them really good ones - we've ceded control of these developments to the people least suited to manage it.
doctorwho42 5 hours ago [-]
And how do you see us getting from what we currently have: a working class and capital/ownership class, where a vast majority of society is required to work 40+ hrs/week to sustain their ability to live.
To a society that provides a livelihood to all humans, equally?
For, I would love to hear how we get from here to there during an era with the largest wealth disparity ever seen in human history. (Yes, it's worse than the robber baron era of US history). For I have yet to see any signs that the capital/ownership class has any intentions other than vacuuming up even more wealth and power for themselves. And that is anathema to your desired outcome.
antonvs 4 hours ago [-]
Part of my point is that this helplessness about the expected outcome is a choice. If everyone is sitting back waiting for "signs", nothing will change for the better.
History is full of examples of situations like this being corrected, at least to an extent. If we learn from those, we can do even better next time around.
This demonstrates a point that should be obvious, that better societal choices can produce better outcomes.
retardkiller 8 hours ago [-]
[dead]
catigula 7 hours ago [-]
[flagged]
btown 7 hours ago [-]
The parent poster isn't saying "advancement of knowledge" is some kind of universal goal for humanity at the cost of all else - and I would agree that it shouldn't be. They're suggesting that as an individual studying pure mathematics, the discovery of new truth is a self-consistent good.
Even taking a purely Kantian interpretation that would scale this beyond mathematicians - and that itself is a logical leap! - making a universal law out of "a discovery can be beautiful regardless of whether created by humans or AI" is much more specific than the straw extrapolation you've created.
ssalka 7 hours ago [-]
You can make literally any position sound awful by saying that orphans will be killed as a result. Let's try to think posts through.
antonvs 7 hours ago [-]
They didn't say "advancing human knowledge regardless of the cost". That's a conclusion you jumped to because of your biases.
"Let's try to think posts through."
kaashif 7 hours ago [-]
Have you considered that utilitarians actually exist?
If 20% more medical knowledge would save more lives long term, there are actually people, probably some browsing this website right now, maybe the person you're responding to, that actually think killing people up to the expected number of lives saved is justified.
I would personally call that evil, but it is thought through.
dekhn 7 hours ago [-]
At least from my perspective, these sorts of tools could have the possibility of allowing us to reach post-scarcity (I guess a skynet future is another possible outcome, as is just grimdark industrial hell). If we reach that point, then anybody could (in principle- in reality utopias don't exist) pursue anything they wanted.
This is just an application of the philosophy "automate yourself out of a job every 6 months"- I've been doing that for a long time, and the outcome is generally a more interesting job.
doctorwho42 5 hours ago [-]
But that hasn't been done at scale... If everyone automated their job every 6 months, then millions would be out of work and starving.
krackers 8 hours ago [-]
If one only found meaning in life through external factors like work (no matter how "intellectually rewarding") then it seems like a life destined for eventual disappointment.
piloto_ciego 6 hours ago [-]
So, I've seen this mindset a lot lately...
The answer is that we simply need to decouple the "right to exist" from "worth."
You should have the right to exist and explore the world simply because you're human, not because you can use your skills to provide some sort of transactional value to someone else. Deprogramming so many people is going to be hard...
wartywhoa23 26 minutes ago [-]
All sane and noble in theory, but in practice, how do you see that happening?
Let's start with the first practical step: how do you dethrone the psychopaths in charge of the world who own about everything on Earth and have all the world's lethal force in their pockets?
ted_dunning 8 hours ago [-]
Does it terrify you to look at children?
Not so many years from now, some of them will surpass you. A few years after that all (that survive to that point) will surpass you.
Does that terrify you just as much?
wartywhoa23 24 minutes ago [-]
The seeming sincerity of your question in the context of comparing children to AI is what really terrifies human beings.
doctorwho42 5 hours ago [-]
AI is not a living or conscious entity, no matter what the hype men are selling society.
A child is a living, breathing, growing, and changing conscious entity. It is the natural order for the young to supplant the old, no matter what the politicians and billionaires desire.
"AI" - terrifies anyone who understands the pact our society rests upon: that labor is valued and can be exchanged for goods and services to survive. Thereby enabling a person to support their families without having to do everything themselves.
If AI replaced a noticeable fraction of society, destroying their capacity for work, that threatens and ultimately blows up this compact between working class and capital class... With it, the foundations of a modern technological society.... It may sound like hyperbole, or some fantastical prediction. But really it is basic economics, like econ 101... And personally the last few years have terrified me, not because of AI directly, but because of how ignorantly blind many smart and tech savvy people are... You are marching us to collapse with a smile on your face...
wartywhoa23 16 minutes ago [-]
Ice-nine was no fiction.
IAmGraydon 7 hours ago [-]
That’s kind of a strange comparison. It’s the natural order for a population to thrive, reproduce, age, repeat. I’m not taking a side on the original comment, but the idea of human skill being completely supplanted by AI is not the same thing as having children and getting old.
retardkiller 7 hours ago [-]
[dead]
qotgalaxy 7 hours ago [-]
[dead]
xamuel 7 hours ago [-]
Mathematics is a bridge to what Neoplatonists call the intelligible world. Currently, mathematicians navigate that world on foot. It's exciting to think that soon we might have cars and trains in that world so we don't have to painstakingly walk everywhere.
energy123 4 hours ago [-]
In a way, young people have an advantage over middle aged people. I've spent countless hours as a middle aged person learning skills that are now useless. Better to be a young person than a skilled artisan during the Industrial Revolution even if there's uncertainty.
ninjagoo 6 hours ago [-]
> the closer the expertise you spent your whole life building is to being worthless.
Perhaps it is time for life to be considered intrinsically valuable, instead of being "worthy" only based on output or capability. Disability, animal and environmental advocates have been fighting for this for a long time. Not too long ago women and minorities were in the same boat. Even now, there are many advocating and fighting for a return to the dark old days.
> Along with all the rest of what humans find meaningful and fulfilling.
Some humans. Many are content to enjoy simply existing, and the beauty of life and the universe around us. Just like many non-scientists today enjoy and benefit from the work of scientists, tomorrow too many will enjoy learning from, and applying the coming advancements and leaps in many fields.
And those of a scientist or other research-type mindset? No doubt they will contribute meaningfully by studying the frontier, noting what remains unanswered, and then advancing the frontier, just like researchers do today; just because scientists in the past solved many questions doesn't mean that there aren't any questions to answer today.
IMHO, AI means that the frontier expands faster, not that it is obliterated. Even AI cannot overcome the laws and limitations of physics/universe: even Dyson spheres only capture the energy of one star, thus setting a limit on the amount of compute, and thereby a limit on intelligence. And we are a loooong way from a Dyson sphere.
PS: I think you're being unfairly downvoted. Your question is not invalid and deserves responses, not downvotes.
wartywhoa23 11 minutes ago [-]
These all are valid, noble points I also used to brood about while being young and financially supported by my parents.
thegrimmest 7 hours ago [-]
Many of us don't do what we do for our expertise to be recognized or valued by others, rather that is a pleasant side effect. Many of us do what we do for intrinsic reasons related to the nature of the work, and would likely do it for free, or indeed, would pay for the opportunity. Many STEM-types are in this category, and as such, are compelled to continue to tinker as we fancy, and are glad for more tools to help us expand the breadth of our tinkering capabilities.
A dedicated engineer is always looking to automate themselves out of existence, so that they can move on to the next thing to automate. Ongoing repetitive work is less engineering and more akin to toiling on a line.
CamperBob2 10 hours ago [-]
What's happening is the verbal/linguistic equivalent of the invention of calculus. No intellectual field will ever be the same again. Who wouldn't find that exciting, and want to experience it?
xpct 6 hours ago [-]
I don't think change is inherently exciting.
CamperBob2 4 hours ago [-]
Maybe plumbing, masonry, or mining would have been a better career fit, then. Tech isn't for everybody.
rogerrogerr 10 hours ago [-]
People who enjoy thinking. Ya know, the "intellectual" part.
aroman 10 hours ago [-]
This is the beginning of thinking, not the end...
windexh8er 6 hours ago [-]
It depends. If you are in a disadvantaged class it is very likely going to err towards a dismal result long term. However if you are a privileged intellectual these models can accelerate and expand your horizon. It isn't the end, surely. It is, however, both impressive and depressive simultaneously and that perspective only depends on your point of view.
doctorwho42 6 hours ago [-]
But when the bar to entry is beyond expertise in a field or subfield, how does an individual ever hope to attain an unexplored space to explore?
It may be the beginning of thinking, but to many who view things on a longer timeline, it starts to look like it will break down the frameworks which are required to get to that position. Otherwise, you just end up retreading explored ground, thus removing the joy of discovery from any human's hand/mind.
8n4vidtmkvmk 5 hours ago [-]
Recently I've found my mind reawaken. It's about asking good questions now. The models can find the answers, but you have to know what to ask. Sometimes the model is wrong and you have to challenge it to find an alternative. Being able to explore problem spaces quickly is interesting.
fartfeatures 7 hours ago [-]
Why would having more thinking companions stop you from thinking? Knowledge compounds.
mlcrypto 8 hours ago [-]
The so called "progressives" prove that they were the same ones crying after the printing press, automobile, calculator, washing machine, etc
ted_dunning 8 hours ago [-]
You made up a group in the past and you made up things they say and then draw the inference that a different group in the present is somehow morally disadvantaged by obvious inference.
Perhaps your name-calling is not actually as logically grounded as you think. It definitely seems to depend on unfounded leaps.
umanwizard 10 hours ago [-]
I'm not sure I grasp the analogy to the invention of calculus. Calculus helped us solve new and interesting math/physics problems. Repeated for emphasis: helped *us* solve.
This technology is solving interesting math/physics problems for us, which is completely different.
xamuel 7 hours ago [-]
Before the discovery of the fundamental theorem of calculus, enormous ingenuity and whole careers were spent doing calculations which the fundamental theorem trivialized. To be clear, I'm not just saying that the people involved were doing lots of mechanical arithmetic (though they did that, too). I'm saying they did creative, inspired, nontrivial mathematics to calculate certain things, all of which was then trivialized and made obsolete by the fundamental theorem of calculus.
CamperBob2 5 hours ago [-]
After Newton and Leibniz, math did things nobody thought it could do. After Vaswani et al., language does things nobody thought it could do.
Quentak 56 seconds ago [-]
I'd like to know how many tokens in total went into solving this problem. Have they talked about this? It matters whether they got this result in 10 million tokens or 10 billion. Whether it's closer to 1 human working on this for 1 year or 1000 humans for 1 year. The news feels different when the probability of one AI run solving this is 1 in a thousand vs 1 in a million. Approximately I'm asking about the amount of money it cost to solve it, which has to include the failed parallel runs.
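A back-of-the-envelope sketch of why the per-run success probability matters so much. It assumes runs are independent and that you stop at the first success, so the number of runs needed is geometric with mean 1/p. The token counts and the price per million tokens below are made-up placeholders, not figures from the report.

```python
def expected_runs(p_success):
    """Mean number of independent runs until the first success (geometric)."""
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    return 1.0 / p_success

def expected_cost_usd(tokens_per_run, usd_per_million_tokens, p_success):
    """Expected total spend, counting the failed runs too."""
    cost_per_run = tokens_per_run / 1_000_000 * usd_per_million_tokens
    return cost_per_run * expected_runs(p_success)

# Hypothetical numbers purely for illustration.
TOKENS_PER_RUN = 10_000_000
USD_PER_MILLION_TOKENS = 10.0

for p in (0.5, 1e-3, 1e-6):
    cost = expected_cost_usd(TOKENS_PER_RUN, USD_PER_MILLION_TOKENS, p)
    print(f"p={p:g}: ~{expected_runs(p):,.0f} runs, ~${cost:,.0f} expected")
```

With the same cost per run, going from 1 in a thousand to 1 in a million multiplies the expected bill by a thousand, which is why the failed parallel runs belong in the accounting.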
cpard 12 hours ago [-]
The proof brings unexpected, sophisticated ideas from algebraic number theory to bear on an elementary geometric question.
The more I read about these achievements the more I get a feeling that a lot of the power of these models comes from having prior knowledge on every possible field and having zero problems transferring to new domains.
To me the potential beauty of this is that these tools might help us break through the increasing super specialization that humans in science have to go through today. Which on the one hand is important, but on the other hand does limit the person in terms of the tooling and inspiration they have access to.
rjzzleep 6 hours ago [-]
What you describe here has always been true in all sciences, but also in medicine. But both modern engineering and education run completely counter to this. You are encouraged to stay in your niche and never look out. People with vast interests are filtered out by hiring managers.
So the cross-domain pollination that used to exist in scientists is not only not encouraged. It's also actively punished by society.
hn_throwaway_99 5 hours ago [-]
> But both modern engineering and education run completely counter to this. You are encouraged to stay in your niche and never look out. People with vast interests are filtered out by hiring managers.
Can you explain more what you're referring to, because this has not been my experience at all. Heck, when I went to college, cross disciplinary majors were all the rage.
I think the thing that is just factually difficult is to actually become skilled in multiple different domains, precisely because the level of study/practice/rehearsal to become proficient in any individual domain keeps going up.
A long time ago you could be a Renaissance man by essentially dabbling in different fields. But today, as this article points out, you need extremely deep expertise in any one area just to understand the status quo - this proof required extremely deep expertise in two separate areas that mathematicians were surprised to be related at all.
You are making a great point here. I think it’s not just the amount of information and complexity of the domains today, it’s also human nature and emerging politics too.
nashadelic 12 minutes ago [-]
There are so many research papers; just finding a solution to, say, a bio problem in a deep math paper would be a gold mine of opportunity. Very exciting times!
freakynit 4 hours ago [-]
Many breakthroughs come from taking an idea from one field and applying it somewhere else. But, almost every serious field is now so deep/complex/huge that humans rarely get the time, or even have enough practically useable memory, to understand and correlate multiple unrelated areas properly.
And this is where machines, such as these reasoning LLMs, can help. Because they can remember patterns across many domains and try absolutely bonker weird connections and ideas.
We, the humans still have to verify the work (at least as of now). But, the "maybe this tool, or idea, or trick, from that completely unrelated field applies here" reasoning/experimentation could become much easier.
I have always said this and will say it again: reasoning is just experimentation with a feedback loop and continuous refinement.
doubledamio 12 hours ago [-]
I’ve always been skeptical about the role of LLMs in mathematics, but this is the first time I’ve seen this argument, and I actually find it very compelling. Maybe LLMs will help us develop more horizontal understanding of the field.
cpard 11 hours ago [-]
It's up to us I think. We can use LLMs to generate web pages in Candy Crush style and end up dumber by outsourcing thinking to the machines or we can use it to expand our cognitive capabilities.
What makes me more of an optimist in this case is that people who today decide to go into these sciences are mostly people who are driven by intellectual activity so I feel they are the right ones to figure this out, probably more so than us the engineers.
brookst 6 hours ago [-]
The “we’s” are different. Some of us will use AI to replace human relationships and our own decision making, others of us will use it to make amazing art and invent new things.
Ar-Curunir 10 hours ago [-]
Unfortunately, LLMs might lead to the demise of the primary institution that allows for people that are in it for the love of intellectual activity to do that activity, namely research universities. Certainly the people proposing the tech are quite opposed to the modern university.
theendisney 6 hours ago [-]
What little intellect we have can be directed to other parts of the vast endless ocean of unknown things.
I hear some specialists (especially multi-disciplinary ones) write things they know few or no one can read. (Which is the most ironic reason for being rejected by a journal)
I recall a funny moment on IRC where a truly helpful guy moaned that no one helped him when he had a (programming) question. He was very good at many programming languages and worked in some mix of high level physics and mathematics. He posted SO questions that rarely got an apologetic response from someone able to understand the code and the physics but who couldn't wrap their head around the math. lol I hope he finally gets some help with his wizardry.
IAmGraydon 6 hours ago [-]
You’re making some generalizations here, but I do agree that one of the primary dangers of LLMs is destruction of institutions of higher education. If thinking power becomes cheap, who will pay the money that universities demand?
keyle 10 hours ago [-]
I think you're on point, and you've explained it very well.
As we're becoming hyper specialised, they become an invaluable tool to merge the horizon in, so to speak.
cpard 9 hours ago [-]
I think traditionally engineering was supposed to be the discipline that brings the breadth that science has to give up. At least that’s how I rationalized the pain I had to go through in college studying EE.
I don’t think that this model works anymore though.
Also, I love the expression “merge the horizon in”. Being a non native speaker of a language is so nice sometimes. Thanks!
dhosek 5 hours ago [-]
One of the challenges I had in graduate mathematics was just trying to keep all the concepts in my brain. It doesn’t help that you end up with things like homomorphism and homeomorphism tangling one’s brain thanks to their superficial similarities. Heck, just keeping track of basic theorems and definitions is a challenge.
margorczynski 11 hours ago [-]
Yep. The thing is people (maybe because of our limited scope) just focus on the depth and not the breadth. Because this is a general purpose model - it also has PhD+ knowledge in Physics, Biology, History, etc.
I think we still don't really comprehend how much can be achieved by a single "mind" that has internalized so much knowledge from so many areas.
cpard 11 hours ago [-]
there's so much opportunity on the breadth of things too! I think that you end up having different people focusing on different things though.
Personally I'm more of a breadth person and I could never compete with peers who were more of the depth type of person at college.
But I get satisfaction from connecting things that feel irrelevant on first sight, that's what drives me.
piloto_ciego 6 hours ago [-]
This is me too.
efavdb 9 hours ago [-]
It’s as if the body of human knowledge is our hive mind. It used to be expensive to access that, but no more.
Cool thing is now when someone contributes something to the hive mind, it can instantly be applied to any other problem people are working on.
psb5 6 hours ago [-]
Check out Ashby's Law of Requisite Variety
make3 5 hours ago [-]
To me, AI feels like the morbidity of Star Trek teleportation, where it's actually copying the person at the other end and zapping the original one out of existence. The original human never benefits from the fast transportation.
Similarly, we're creating tools to improve knowledge, but we're progressively zapping the human out of the equation. Knowledge is created for something, but it's unclear if very soon humans will be able to understand it, or really benefit from it, except billionaires, etc.
It's too bad that we're not improving humans nearly as fast as we're replacing ourselves.
Nesco 3 hours ago [-]
You lost me at “except billionaires”. I don’t see how Jeff Bezos benefits from this one much more than let’s say Terence Tao.
Can tech news stay tech news, without getting bombarded with leftist subtexts all the time?
SpaceNugget 2 hours ago [-]
Because they are benefiting from the financial situation of owning the AI companies that are getting pumped massive amounts of money, not from the debated usefulness of the output of the LLMs.
luk212 4 hours ago [-]
[dead]
aaron695 33 minutes ago [-]
[dead]
lesostep 40 minutes ago [-]
I am cautious about AI "discoveries" after Mythos paper.
What was the process of writing the paper? Was the question asked by a mathematician? Was the paper right from the get-go or was there someone who pointed out mistakes?
How many attempts were made before the solution was found?
I will eat my words if an AI oneshotted that one without any external help, but for now I am left wondering whether it's a new way to attribute discoveries to companies instead of people who put the work in
andy12_ 6 minutes ago [-]
> Was the question asked by a mathematician?
As per the report, the prompt used to solve the problem is AI-written and the solution was initially graded by an AI grading pipeline. They don't say this explicitly, but it seems like OpenAI has an automatic pipeline where they prompt models for solutions to famous math problems (which wouldn't be unexpected given how flashy a solution to a famous math problem looks)
> Was the paper right from a get-go or was there someone who pointed out mistakes?
Also as per the report, the output of the model isn't really a "paper"; it's a very terse 2 page solution which is apparently correct. The paper was later written based on this solution to make it more presentable.
> How much attempts were made before solution was found?
Given that this appears to be from an automated pipeline, I would say that it had many attempts. But either way, the blogpost says that with enough test-time compute, the model finds this same solution 50% of the time.
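For a rough sense of how "many attempts" and "50% of the time" relate: if each attempt succeeded independently with probability p, the chance of at least one success in k attempts would be 1 - (1 - p)^k. This is only a sketch under that independence assumption; the report does not say how attempts are sampled or whether the 50% figure is a per-attempt rate, and the function name is made up for illustration.

```python
def chance_of_at_least_one_success(p: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds.

    Assumes every try succeeds with the same probability `p` and that
    tries are independent, which may not hold for a real pipeline.
    """
    return 1.0 - (1.0 - p) ** attempts


if __name__ == "__main__":
    # Hypothetical per-attempt success rate of 50%.
    for k in (1, 2, 5, 10):
        print(k, chance_of_at_least_one_success(0.5, k))
```

Under these assumptions a handful of attempts would already make a success very likely, so the number of attempts matters less than whether someone can tell which output is the correct one.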
I'm also wondering about the process. What was the prompt, what they fed into the model, what it was trained on, etc. The article reads like a marketing post.
Nevertheless new maths is exciting and might lead to what I find slightly more interesting - new physics.
vatsachak 14 hours ago [-]
As I have stated before, AI will win a Fields Medal before it can manage a McDonald's.
A difficult part was constructing a chess board on which to play math (Lean). Now it's just pattern recognition and computation.
LLMs are just the beginning, we'll see more specialized math AI resembling StockFish soon.
trostaft 13 hours ago [-]
> A difficult part was constructing a chess board on which to play math (Lean). Now it's just pattern recognition and computation.
However, this was not verified in Lean. This was purely plain language in and out. I think, in many ways, this is a quite exciting demonstration of exactly the opposite of the point you're making. Verification comes in when you want to offload checking proofs to computers as well. As it stands, this proof was hand-verified by a group of mathematicians in the field.
vatsachak 12 hours ago [-]
Yeah, but I wouldn't be surprised if they train the model on verification assisted by Lean.
trostaft 12 hours ago [-]
Arguing by analogy with how Stockfish, the chess engine, trains, I would not be surprised if this is more common in the future. I don't know if they use any proof verification tools during their reinforcement learning procedure right now; as far as I know they've been focusing more on CoT-based strategies (w/o Lean). But I'm hardly an LLM expert, I don't know.
vatsachak 8 hours ago [-]
They most definitely threw in RL with formal verification somewhere between GPT-4 and now. The models are better at not hallucinating. I don't think their IMO team are only show ponies...
ComplexSystems 12 hours ago [-]
That may be true for now, but it seems clear enough that letting the model use Lean in its internal reasoning process would be a great idea
trostaft 12 hours ago [-]
That I'd agree with! I really need to get around to learning Lean myself. It might be interesting to try and formalize some missing theoretical pieces from my field (or likely start smaller).
NooneAtAll3 6 hours ago [-]
how would they calculate "probability of solving" without automated verification?
ken47 6 hours ago [-]
> However, this was not verified in Lean.
This is the caliber of thinking in unimpaired AI bullishness.
Terr_ 13 hours ago [-]
> manage a McDonald's
Dystopia vibes from the fictional "Manna" management system [0] used at a hamburger franchise, which involved a lot of "reverse centaur" automation.
> At any given moment Manna had a list of things that it needed to do. There were orders coming in from the cash registers, so Manna directed employees to prepare those meals. There were also toilets to be scrubbed on a regular basis, floors to mop, tables to wipe, sidewalks to sweep, buns to defrost, inventory to rotate, windows to wash and so on. Manna kept track of the hundreds of tasks that needed to get done, and assigned each task to an employee one at a time. [...]
> At the end of the shift Manna always said the same thing. “You are done for today. Thank you for your help.” Then you took off your headset and put it back on the rack to recharge. The first few minutes off the headset were always disorienting — there had been this voice in your head telling you exactly what to do in minute detail for six or eight hours. You had to turn your brain back on to get out of the restaurant.
Amazing bit of trivia that the founder of HowStuffWorks.com was named Marshall Brain.
kmeisthax 13 hours ago [-]
Casual reminder that the author's proposed solution to the labor-automation dystopia is to invent a second identity-verification dystopia. Also casual reminder that the author wanted the death penalty for anyone over the age of 65.
embedding-shape 10 hours ago [-]
I was curious about this book but now you've absolutely sold me on it, sounds like I'm in for a ride!
Lerc 13 hours ago [-]
I disagree. It will be able to perform work deserving of a Fields Medal before it is capable of running a McDonalds. I think it will be running a McDonalds well before either of those things happen, and win a Fields Medal long after both have happened.
c7b 13 hours ago [-]
One could hardly ask for a task better suited for LLMs than producing math in Lean. Running a restaurant is so much fuzzier, from the definition of what it even means to the relation of inputs to outputs and evaluating success.
moron4hire 7 hours ago [-]
I think Lerc is saying that LLMs will be pressed into service managing McDonald's restaurants long before they are actually capable of managing said restaurants successfully.
edbaskerville 13 hours ago [-]
I just visited a McDonald's for the first time in a while. The self-order kiosk UI is quite bad. I think this is evidence in favor of the idea that an incompetent AI will soon be incompetently running a McDonald's.
Silamoth 13 hours ago [-]
Out of curiosity, what issue did you have with the McDonald’s self-order kiosk? I actually think McDonald’s has the best kiosk I’ve ever encountered. The little animation that plays when you add an item to your cart is a little annoying (but I think they’ve sped that up). But otherwise, it’s everything I’d want. It shows you all the items, tells you every ingredient, and lets you add or remove ingredients. I have a better experience ordering through the kiosk than I do talking to a cashier.
ndiddy 12 hours ago [-]
It takes longer than ordering with a cashier, it keeps trying to upsell you, and it's always out of receipt paper because unsurprisingly the company that isn't willing to pay a person to take orders is also not willing to pay a person to maintain the kiosks.
Sohcahtoa82 10 hours ago [-]
> It takes longer than ordering with a cashier
Depends on what you're ordering and who the cashier is.
If your order is the happy path of no customizations of a combo with an experienced cashier, it can be done in seconds, for sure. "Medium #4 with a Diet Coke", pay, done.
But if you customize your burger or order a lot of items a la carte and you're dealing with a new cashier that has weak English skills, good fucking luck. You'll likely need to wait for them to figure out they need to call someone over to help, have to repeat your order, and you end up spending far more time.
> it keeps trying to upsell you
Yeah, I'll agree that's obnoxious, especially when it's trying to upsell you something that's already on your order. I ordered a combo. I don't need you to add another fry.
Silamoth 11 hours ago [-]
Hmm. I’ve never really had those issues. It’s also much faster and easier than ordering with a human. I guess it does try to upsell you, but humans often do, too. And to me, it’s worth it to just click “No” in exchange for the added convenience (mostly in getting my order right).
I have had them run out of receipts, but it’s never mattered for me. If I’m dining in, the plastic number you carry to your table makes sure I get my food. And if I’m taking it to-go, they always find me anyways.
ryandrake 10 hours ago [-]
> It’s also much faster and easier than ordering with a human.
I'm not sure how that could be. I can walk up to the counter and say "Big Mac Large Fry Small Coke" faster than you can navigate the first screen of the kiosk, and a skilled counter worker can key that in and be done before I even get my credit card out.
doug_durham 4 hours ago [-]
You have to wait in line behind several people to get to the point where you can talk to the cashier. Most McDonalds have several kiosks. There is usually little or no line. I can place my order and grab a table.
Silamoth 9 hours ago [-]
The problem is, I’m a picky eater. I never order something that simple. I always need it with “No X” or “Only Y”. Cashiers often struggle with that, even if they understand me well (which they don’t always). It’s easier for me to see everything an item comes with and make sure I’m entering my order correctly.
hunterpayne 8 hours ago [-]
[flagged]
necovek 5 hours ago [-]
McDonalds' menu is not designed for folk like you. In my part of the world, we had traditional fast-food joints where the question would be inverse: out of the things you can see and add to your burger, pick a few. That is very efficient with a human prepping your burger.
NavinF 3 hours ago [-]
Hm? McDonalds is one of the best for customization. Everything is removable and the software knows the calorie count of each ingredient so the total that shows up next to each item in the cart is accurate
marknutter 10 hours ago [-]
It's easily one of the most intuitive and straightforward kiosks out there today and you don't have to wait for one of the cashiers to notice you nor worry about them punching in your order incorrectly.
Silamoth 10 hours ago [-]
Glad someone else feels the same way! Knowing that I enter my order in correctly is the biggest win there for me as a picky eater. The cashier is just entering it into a computer anyways, so it makes sense for me to enter it in myself. I honestly wonder why more restaurants don’t do this. It’s not that hard to wrap a halfway decent UI around the system you already have.
teiferer 2 hours ago [-]
Restaurants, pubs etc. serve multiple purposes.
If it's purely about the food, receiving it, consuming it, then sure, get the human out of the loop, interact with a machine. Ideally even the preparation is done by a machine. No human error or hair involved. Why even go there, let it be delivered to your home.
But these places are also about the experience of social connection. The bar keeper, the waiter, the chef. They are all involved in this experience and the actual food is "just" one component, one detail, albeit an important one. My favorite restaurants would be nothing without the people there.
It's similar with music. It's not just about the produced sound waves. The musician forms a social bond with the audience. Even when listening to a recording, my mind is re-living or at least imagining a live sitting, that connection with the musician. No machine generated music will ever be able to replace that.
necovek 5 hours ago [-]
I am more concerned with getting the right order and not with entering the right one. McDonalds will still get it wrong when you have a complex "change" of defaults even if it's entered correctly.
Other places optimize for this better by not having too many hand-overs between order and preparation.
ninkendo 6 hours ago [-]
Since you asked, and since I take my kids to the McDonald’s play place some weekends, and I’ve actually spent a bit of time pondering my ideal kiosk UI and what I don’t like about theirs:
It seems designed to maximize how many screens they show you to make an order. Each one with a slight delay and animation.
At a drive through I can say “gimme a number one, medium, with a Coke Zero” and they give me my total. That’s the convenience the kiosk is up against.
At the kiosk there’s:
- A welcome screen you have to tap
- A “carry out or dine in” screen
- Always one other screen with a dumb question about apps or whatever, tap through
- A top level menu with a bunch of categories, burgers, drinks, sides, desserts, etc… I guess I want burgers? But it’s a combo, hmm. I guess I’ll figure out how to make it a meal. Tap burgers.
- Then another screen with burgers, in a different order than the drive through numbering, tap Big Mac
- Then another dedicated screen that shows you a picture of a Big Mac, with a bunch of customization options, which you have to scroll past and verify that it matches the defaults you expect, and at the bottom you can tap add
- Then another screen asking you if you want to make it a meal
- Then another screen asking the size
- Then another screen asking what to drink
- Then another screen that shows you the drink
- Then another screen for what size
Etc etc etc. Each of these screens takes a few seconds to display too, just slow enough to be infuriating.
In my mind the ideal kiosk is something where you get “the menu” (like what you see on the billboard in the drive through) with the usual big squares with a number on them and a picture of the meal. Tapping one puts it in a “drawer” section with my order in it, and each item in the drawer can have simple in-line edit controls for “size” and “what to drink”, with them showing up empty in a way that makes it obvious I need to fill in those answers before I can check out.
I should be able to tap one button for the combo number I want, another for the size, another for the drink, then checkout, all on one screen without long delays. If I don’t want a combo but want individual items, I can just scroll down a bit to look at the full menu. The order drawer stays where it is.
Or hell, just let me say “number one with a Coke” and have a very simple ASR and NL parser figure it out and put it in my pending order to edit.
Customizations can be behind a simple “customize” button on each item in my pending order. If I don’t have customizations I can just ignore it. What you get with no customizations is what you’d get if you just order it verbally to a human without specifying anything. The concept of “here’s how we typically make it, if you want anything different let us know” is a very deeply ingrained and familiar concept to restaurant patrons, and being forced to answer every little question even if you don’t care, adds up to a lot of frustration.
Fast food places came up with the combo numbering system to make ordering faster, and it was super convenient and fast, because there’s a financial incentive to get you through the drive through because you’re blocking other customers. But since they have several kiosks available, they seem to not care at all about the efficiency of the user interface, because it’s not a problem for them. But it’s still a problem for me, because I still want to order quickly, despite it not blocking other customers. It’s a huge step down from just saying “number one with a Coke”.
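A minimal sketch of the "order drawer" idea described above, in Python. Everything here is hypothetical (the class names, the field names, the rule that a combo needs a size and a drink); it only illustrates the data model where tapping a combo adds it immediately, required answers start out empty, and customizations default to "made the usual way".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ComboItem:
    number: int                    # combo number from the menu board
    size: Optional[str] = None     # left empty until the customer picks one
    drink: Optional[str] = None    # left empty until the customer picks one
    # An empty list means "make it the usual way"; no questions asked.
    customizations: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Names of required answers that are still empty."""
        return [name for name in ("size", "drink") if getattr(self, name) is None]


@dataclass
class OrderDrawer:
    items: List[ComboItem] = field(default_factory=list)

    def add_combo(self, number: int) -> ComboItem:
        """One tap on a menu square puts the combo straight into the drawer."""
        item = ComboItem(number)
        self.items.append(item)
        return item

    def can_check_out(self) -> bool:
        """Checkout is allowed once every item has its required answers."""
        return bool(self.items) and all(not i.missing_fields() for i in self.items)


if __name__ == "__main__":
    drawer = OrderDrawer()
    combo = drawer.add_combo(1)
    print(combo.missing_fields())   # fields the UI would highlight as empty
    combo.size = "medium"
    combo.drink = "Coke Zero"
    print(drawer.can_check_out())
```

The point of the design is that the only mandatory interactions are the ones a cashier would also need: which combo, what size, what drink. Everything else stays behind an optional edit.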
jldugger 9 hours ago [-]
>The self-order kiosk UI is quite bad.
Most repeat customers use the app, which sports the digital equivalent of a loyalty program, and various coupons. And lets you save your 'usual' order with customizations etc. Plus the annoying push notifications for FreeFrydays or whatever. And upsells, new product launches, etc.
My recollection is that the kiosk is just a weak facsimile of the app. And wasn't terrible, but everyone's standards vary.
cwillu 5 hours ago [-]
> Plus the annoying push notifications for FreeFrydays or whatever. And upsells, new product launches, etc.
Which is why I will never reinstall their damned app.
vatsachak 12 hours ago [-]
Not necessarily. Obviously playing Kasparov on the board requires more planning ability than managing a McDonald's but look at where chess bots are now.
There's much more to being human than our "cognitive abilities"
pamcake 7 hours ago [-]
> Obviously playing Kasparov on the board requires more planning ability than managing a McDonald's
Not obvious and in fact I think the opposite is way more likely. Chess is well-defined and self-contained in a way that managing a restaurant with fleshy customers never will be.
vatsachak 6 hours ago [-]
That's true. I should clarify by saying I meant that a human playing on par with Kasparov obviously has the planning ability to manage a McDonald's
necovek 5 hours ago [-]
But that is also non-obvious. Even managing human employees — let alone customers — requires a planning ability related to emotional intelligence that many a person with good pure logic ability simply lacks.
Also, there will be hundreds of disparate tasks that are happening in parallel, and even humans still make up frameworks to discover most urgent/important work that needs to be done first.
baq 12 hours ago [-]
Conjecture: the first AI to successfully manage a McDonald’s will be a Gemini.
econ 5 hours ago [-]
They no longer have to limit themselves to forking software but can do a global Google Burgers in a single prompt. It will no doubt be a huge success before shut down.
energy123 2 hours ago [-]
The issue with this prediction is the gulf between problem-solving using known tools, versus creating new concepts for problems where existing tools aren't enough.
All AI proofs so far, including this one, are using existing tools in new ways, rather than inventing new tools. This is not surprising if you know how these models are trained. These existing tools are in distribution. New tools are not.
Problems worthy of a Fields Medal likely require new tools to be invented. Thus it is not clear whether progress within the confines of the current paradigm is enough.
We could get this weird spiky situation where the AI is insanely superhuman at all problem solving, but completely incapable of coming up with a single new tool. It discovers everything there is to discover, subject to existing axioms and concepts.
Timothy Gowers gives some commentary on this in the attached PDF.
evenhash 13 hours ago [-]
The proof is not written in Lean, though. It’s written in English and requires validation by human experts to confirm that it’s not gibberish.
vatsachak 12 hours ago [-]
Yeah, but I wouldn't be surprised if they train the model on verification assisted by Lean
auggierose 11 hours ago [-]
> A difficult part was constructing a chess board on which to play math
We have had that chess board for quite a while now, over 40 years. And no, there is nothing special about Lean here, it is just herd mentality. Also, we don't know how much training with Lean helped this particular model.
KalMann 12 hours ago [-]
I think your analogy is good but I don't believe modern LLMs use Lean or any lean-like structure in their proofs. At least recent open source ones like DeepSeek can do advanced math without it (maybe the most cutting edge ones are doing it I can't say).
vatsachak 7 hours ago [-]
They are most likely using them in training. I doubt their IMO team are show ponies
forinti 13 hours ago [-]
AI is already too old for that.
sigmoid10 13 hours ago [-]
Managing a McDonalds is a question of integration and modalities at this point. I don't think anyone still doubts that these models have the reasoning capability or world knowledge needed for the job. So it's less of a fundamental technical problem and more of a process engineering issue.
Both links talk about the same thing? The first one just being more general. And yes, I would expect no less from a poorly constrained single agent that was instruction trained to be helpful and friendly. But if you look at how this has evolved as a benchmark [1] then the latest models leave no doubt that they can actually deal with this limited, simulated scenario given the correct setup.
I disagree. Even frontier models still achieve way worse results than the human baseline in VendingBench. As long as models can't manage optimally something as simple as a vending machine, they have no hope of managing a McDonalds.
throw-the-towel 13 hours ago [-]
The capability they lack is being able to be sued.
pear01 13 hours ago [-]
Police officers are human. In the United States in the vast majority of cases you can't sue the police, only the community responsible for them.
Assuming you can still sue McDonalds I am not sure if this is a problem in the robotic LLM case. I'm also trying to imagine a case where you would want to sue the LLM and not the company. Given robots/LLMs don't have free will I'm not sure the problem with qualified immunity making police unaccountable applies.
There already exist a lot of similar conventions in corporate law. Generally, a main advantage of incorporation is protecting the people making the decisions from personal lawsuits.
nemomarx 13 hours ago [-]
McDonald's are franchises - you generally want to sue the local owner or threaten them in addition to the holding company.
That only requires that someone own the AI-managed McDonald's though. So long as they can't avoid responsibility by pointing to the AI I don't see why you couldn't sue them.
lancekey 11 hours ago [-]
25/75%. Plenty of stores are owned directly by McDonalds corp.
logicchains 13 hours ago [-]
>Police officers are human. In the United States in the vast majority of cases you can't sue the police, only the community responsible for them.
Police are a monopoly; nobody has a choice about which police company to use. McDonalds are not a monopoly, and many customers would prefer to eat at competitors run by entities that could be sued or jailed if they did anything particularly egregious.
pear01 12 hours ago [-]
You are missing the point. The point is you can still sue the McDonalds. With the police there is a human intuition to also want to sue the officer, given the officer is a human being who has free will and thus made a choice to violate your rights.
The same intuition applies if you walk into McDonald's and a person there mistreats you. You want that person held responsible.
But the LLM is not a person. What is there to even sue? It just seems like it would simply pass through to the corporate entity without the same tension of feeling like we let a human get away with something. Because there is no human, just a corporation and the robot servicing the place.
Put another way - if the LLM is not a person, what is the advantage of a personal lawsuit?
Just sue the McDonalds. Even in a case where the LLM is extremely misaligned and acts in a way where you might normally personally sue the McDonald's employee, I'm just not sure the human intuition about "holding someone accountable" would have its normal force because again - the LLM is not a person.
So given we already have the notions of incorporation and indemnification it doesn't make sense to say what is precluding LLMs from running McDonald's is they can't be sued. If McDonald's can still be sued, then not only is there no problem, there is very likely not even a change in the status quo.
DoctorOetker 7 hours ago [-]
can you give a more concrete description of a McDonalds LLM mistreating a customer? it's gotten too abstract
necovek 5 hours ago [-]
It could sneak in an ingredient you are allergic to.
DoctorOetker 4 hours ago [-]
my only allergy is to bullsh..
and LLMs are getting better at providing less of it
perhaps in the future the GPU-poor can go to McDonalds and get AI to solve their riddles by ordering an extra napkin with the solution written on.
parineum 8 hours ago [-]
> given the officer is a human being who has free will and thus made a choice to violate your rights.
The purpose of qualified immunity is for when an officer does something that turns out to be illegal but they were both told to by their superiors and did not think it was in violation at the time.
An officer making a choice to violate your rights would not be eligible for qualified immunity.
pear01 4 hours ago [-]
Wow yes excellent point, because of course a police officer facing the threat of legal action would never attempt such a low bar lie. Oops my boss told me to. Oops I didn't know. Case dismissed.
Excellent standards for people authorized by the state to run around with a badge and a gun in a free society. Your comment history on this is so unimpressive. Would you countenance the same excuses in anyone else? A man puts on his police uniform and suddenly you think he should be immune from civil prosecution because "my boss told me so" and "I didn't know"?
I wonder if you will make similar excuses for robo cop. Or if your principles merely extend to whatever human you can find in uniform willing to tolerate your friendship.
brikym 6 hours ago [-]
Hey ChatGPT, if a person spills hot McCoffee on themselves who is at fault?
brookst 6 hours ago [-]
Well, brikym, exactly how hot is this hot coffee? If it’s within normal expectations for coffee it is likely that person’s fault. If it is 210 degrees F, it is likely McDonald’s fault.
volkercraig 12 hours ago [-]
> we'll see more specialized math AI resembling StockFish soon
Heuristically weighted directed graphs? Wow amazing I'm sure nobody has done that before.
vatsachak 12 hours ago [-]
My claim is that LLMs waste a lot of time training on all available data.
Math is a sequence of formal rules applied to construct a proof tree. Therefore an AI trained on these rules could be far more efficient, and search far deeper into proof space
red75prime 11 hours ago [-]
It has been tried. Lenat's Automated Mathematician, for example. The problem is that the system succumbs to combinatorial explosion, not knowing which directions are interesting/promising/productive. LLMs seem to pick up some kind of intuition from the data they are fed. The generated data might not have the needed "human touch" or whatever it is.
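A rough illustration of the combinatorial explosion mentioned above, in Python. The fixed branching factor of 10 applicable rules per step is a made-up number for illustration; real proof systems vary widely, and the function name is hypothetical.

```python
def candidate_derivations(branching: int, depth: int) -> int:
    """Number of distinct rule sequences of exactly `depth` steps,
    assuming `branching` rules are applicable at every step."""
    return branching ** depth


if __name__ == "__main__":
    # Hypothetical: 10 applicable rules at each step of a derivation.
    for depth in (5, 10, 20, 40):
        print(depth, candidate_derivations(10, depth))
```

Blind enumeration becomes hopeless after a few dozen steps, which is why some source of guidance about which branches look promising matters so much, wherever it comes from.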
vatsachak 7 hours ago [-]
It might just be that we didn't have enough compute till now. StockFish definitely has superior intuition
whimsicalism 13 hours ago [-]
the only thing keeping the mcdonalds from happening will be political, likewise the same with fields medal
My claim is that we haven't even witnessed the move 37 of math yet. I am claiming that math AI is going to get even better
segmondy 13 hours ago [-]
our local AI models are already capable of running McDonalds.
ori_b 12 hours ago [-]
We're automating art and science so that we can flip burgers. This future sucks.
vatsachak 12 hours ago [-]
Math is a very specialized subset of art and science more amenable to automation.
ori_b 10 hours ago [-]
The first thing we automated passably was art, even before programming. Were you not paying attention?
This future still sucks. The tech industry is making the world a worse place.
amunozo 2 hours ago [-]
Calling AI-generated images art is a stretch. Same thing with creative writing. It can make some low-cost illustrations and writing, but it is very far from decent art. Compare those results with their amazing coding or math capabilities.
dyauspitr 8 hours ago [-]
No, we’re not going to be flipping burgers either, they will have physical robots for that. 20 years down the line I wonder what work all of us will be doing.
dyauspitr 12 hours ago [-]
Nonsense. Have you been watching the figure live stream? Or the Unitree video from yesterday with real time novel action generation? We’re less than a year away. If you can cook a burger, assemble a sandwich and clean up surfaces you’re all of the way there.
vatsachak 12 hours ago [-]
Fair. Let's see in a year. I'm willing to bet that nothing happens.
dyauspitr 12 hours ago [-]
Yeah, it’s gonna be an exciting year. I still think you’ll need one human in there, but that’s about it.
huflungdung 7 hours ago [-]
[dead]
mooreat 13 hours ago [-]
I think one interesting thing to point out is that the proof (disproof) was done by finding a counterexample of Erdős' original conjecture.
I agree with one of the mathematician's responses in the linked PDF that this is somewhat less interesting than proving the actual conjecture was true.
In my eyes proving the conjecture true requires a bit more theory crafting. You have to explain why the conjecture is correct by grounding it in a larger theory while with the counterexample the model has to just perform a more advanced form of search to find the correct construction.
Obviously this search is impressive, not naive, and requires many steps along the way to prove connections to the counterexample, but instead of developing new deep mathematics the model is still just connecting existing ideas.
Not to discount this monumental achievement. I think we're really getting somewhere! To me, and this is just vibes based, I think the models aren't far from being able to theory craft in such a way that they could prove more complicated conjectures that require developing new mathematics. I think that's just a matter of having them able to work on longer and longer time horizons.
gus_massa 11 hours ago [-]
Searching for a proof and disproof are sometimes not so different. In most cases, you nibble the borders to simplify the problem.
For example, to prove something is impossible let's say you first prove that there are only 5 families, and 4 of them are impossible. So now 80% of the problem is solved! :) If you are looking for counterexamples, the search is reduced 80% too. In both cases it may be useful.
In counterexamples you can make guesses and leaps, and if it works it's fine. This is not possible for a proof.
On the other hand, once you have found a counterexample it's usual to hide the dead ends you discarded.
I agree there can be some theory crafting in the search for a counterexample, but in general I think it is easier to search for.
For proving a proposition P I have to show for all x P(x), but for contradiction I only have to show that there exists an x such that not P(x).
I agree there could be a lot of theory crafting to reduce the search space of possible x's to find not P(x), but with for all x P(x) you have to be able to produce a larger framework that explains why no counterexample exists.
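To make the asymmetry concrete, here is a minimal Python sketch. The claim used (n^2 + n + 41 is prime for every n >= 0) is a classic toy example picked for illustration; it has nothing to do with the Erdős problem in the article, and the function names are hypothetical.

```python
from typing import Optional


def is_prime(n: int) -> bool:
    """Trial-division primality test, fine for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def claim(n: int) -> bool:
    """Toy universal claim P(n): n*n + n + 41 is prime."""
    return is_prime(n * n + n + 41)


def first_counterexample(limit: int) -> Optional[int]:
    """Return the smallest n < limit with not P(n), or None if none is found."""
    for n in range(limit):
        if not claim(n):
            return n
    return None


if __name__ == "__main__":
    print(first_counterexample(100))  # → 40, since 40*40 + 40 + 41 = 1681 = 41*41
```

A single witness (n = 40) settles the disproof, while no amount of passing checks for n = 0..39 would have proved the claim. For a real conjecture the search space is not a short list of integers, so most of the work goes into deciding where to look.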
energy123 4 hours ago [-]
Timothy Gowers said a proof (rather than disproof) would have been different and more impressive because it would have required new mathematical concepts.
No, the thing the LLM did is not a proof, it's the opposite. It's proving that the conjecture is false.
Reductio ad absurdum is a technique to prove something.
davebren 11 hours ago [-]
> I think that's just a matter of having them able to work on longer and longer time horizons.
No this will never do the kind of math that humans did when coming up with complex numbers, or hell just regular numbers ex nihilo. No matter how long it's given to combine things in its training data.
mooreat 11 hours ago [-]
I currently operate under the assumption that humans are at most as powerful as Turing Machines. And from what I understand these models internally are modeling increasingly harder and larger DFAs, so they're at least as powerful as regular languages.
Assuming humans are more powerful than regular languages I could maybe agree that these methods may not eventually yield entirely human like intelligence, but just better and better approximations.
The vibe I get though is that we aren't more powerful than regular languages, cause human beings feel computationally bounded. So I could see given enough "human signal" these things could learn to imitate us precisely.
davebren 10 hours ago [-]
Well yeah there is likely an equivalence between computability and epistemology, but I'm not sure it matters when comparing LLM intelligence to human intelligence. There is clearly a missing link that prevents the LLM from reaching beyond its training data the way humans do.
virgildotcodes 8 hours ago [-]
If you look at the life efforts and accomplishments of the ~100 billion humans who have ever lived, how many lifetimes would you discount as having "non-human intelligence" based on the lack of "novel" contributions to frontier of our species' scientific understanding according to the same high bar you apply to LLMs?
Do you pass that bar yourself?
davebren 8 hours ago [-]
Ordinary humans do novel things all the time. Where do you think LLMs got all the training data that their responses come from?
virgildotcodes 7 hours ago [-]
You're not quite addressing the question. More and more of the training data is now synthetic.
To be very specific - what novel things did the majority of the ~8 bil humans on Earth do say, yesterday, that you wouldn't otherwise dismiss as non-intelligent rehashing of the same tired patterns they always inhabit were those same actions attributed to LLMs?
What I'm getting at is that I think you're falling into the trap of thinking of the rare geniuses of human history, and furthermore their rare moments of accomplishment (relative to the long span of their lifetimes filled mostly without these accomplishments) when you think of "human intelligence", which is of course far overstating what actual human intelligence is.
davebren 7 hours ago [-]
Synthetic training data is carefully crafted by humans. The rare geniuses of human history use a different magnitude and configuration of the same kind of human intelligence that posted a dad joke on a site that got scraped into the training set and repeated, convincing people that it is intelligent like humans.
> that you wouldn't otherwise dismiss as non-intelligent rehashing of the same tired patterns they always inhabit were those same actions attributed to LLMs?
Regardless of whether something's been done before people still come up with them on their own without directly copying or amalgamating several copies. Pretty much every skilled profession includes figuring things out on the fly through the use of general reasoning that doesn't involve pattern matching against millions of examples.
virgildotcodes 5 hours ago [-]
> Synthetic training data is carefully crafted by humans.
Much, if not the majority of synthetic data is AI generated. Human experts then evaluate samples of the data, but nothing like the entire corpus which can be trillions of tokens of generated material.
> The rare geniuses of human history use a different magnitude and configuration of the same kind of human intelligence
I agree. What I don’t see any strong evidence for is that this intelligence is unique to humans. Nor do I see how it could ever be anything other than recombinations of existing data with random mutation. Where else would the building blocks for each invention come from, divine insight? We build on the shoulders of giants etc etc
Worth noting, as a sidebar, that we’re having this discussion on a post mentioning a novel breakthrough made by AI over a topic that many brilliant human mathematicians including Erdos himself failed to do.
> Regardless of whether something's been done before people still come up with them on their own without directly copying or amalgamating several copies.
I’m not even saying it in the “there’s nothing new under the sun” sense.
If you follow an average person’s day from beginning to end. Let’s say in Bangkok or NYC or Paris, at which part of the day are they not simply repeating a variation of something they’ve done many times before, or seen others around them do before, or read about others doing before, or heard about others doing before, watched others do before on TV etc etc
What you have left, how is it distinguishable, without reasoning backwards from the desired conclusion of human exceptionalism, from turning up the temperature on an LLM query?
How many data points does a human parse when they attempt to stand up as a toddler? Sight, sound, sensation from every limb and body part, inner ear, internal thought processes at the time conscious and unconscious related to the moment and attempting to interpret it in relation to all that it’s experienced to this point, including all prior attempts and whatever retained associated data, a hard to even comprehend stream of data, coming in continuously over however many minutes, hours, etc of attempts.
The stream of data the brain is processing from both external and internal sources from birth is incredibly rich, and if we attempted to represent the full depth of it it would far outweigh the size of any corpus models are being trained on now.
I think what may be genuinely missing from AI is the type of data that doesn’t translate completely into text. The audio and images/video we feed in are a totally incomplete slice of the POV of say even a single average human through their lifetime, and bereft of all the associated data a human has access to in the moment (sensory etc).
I think this tends more towards the world models that Yann Lecun et al are promoting as the key to more capable AI.
necovek 5 hours ago [-]
You seem to be missing their point (which I agree with). The type of intelligence we are equipped with allows us not to have the level of memory an LLM does and still complete tasks that are novel to us every single day. Like navigating a shopping cart through tricky corridors in a store, coming up with a dad joke as in the sibling example, combining a set of tools to achieve something we have never seen before, etc.
LLMs approximate a lot of that very well by simply having seen it before.
Also watch kids develop language: they learn patterns with much less training data than LLMs.
virgildotcodes 4 hours ago [-]
I addressed much of this in my response to a sibling comment, but a few more here:
> novel to us every single day. Like navigating a shopping cart through tricky corridors in a store
We have been practicing navigating the physical world for something like 16hrs/day every day from the moment of our birth. All the sensory data passing through our brains during that time is far larger than any dataset an LLM is trained on.
Humans navigating a shopping cart at a store have likely navigated the physical world before, pushed a shopping cart before, and in combination have navigated stores while pushing shopping carts before. Nevertheless, many still bump into objects all along the way.
Them succeeding at successive variations of store layouts is not novel unless we expand the definition of novel to mean any recombination whatsoever of pre existing concepts.
I’m certain that with all the intense usage of AI by hundreds of millions of people, there have been countless collections of words passed to LLMs so far that have never before been uttered in exactly such a sequence, let alone in the dataset.
I’m equally certain the LLMs have responded to those words with collections of its own that have also never been uttered in that exact sequence, responding to their unique context.
It is trivial to produce an example of this now yourself if you’d like.
The LLM we’re talking about, mentioned in the OP, has never seen this solution to this problem in its dataset. A large number of brilliant mathematicians were not able to discover this solution. They are themselves expressing that this is a novel breakthrough and had this come from a human it would be treated as such.
If the response to that is “well it’s just recombining concepts it already knows until it finds a solution that works” I would ask how that differs from what humans do?
necovek 4 hours ago [-]
You missed the core of my point: humans operate, including in the real world, on much less training data. Give a human a shopping cart and ask them to push it backwards, and they'll figure it out in a few minutes even if they've never done it before.
This is the bit that's missing that LLMs do approximate amazingly well through sheer training set size, but in my opinion, it puts a cap on what novel things they can achieve in comparison with humans.
To me, I've thought about a related "invention space" before: with us creating software to solve many problems people are facing, why are there not any perfect solutions for any problem (running a cafe? a CNC machine? ...), and we always need more software built to cover one small (novel?) change for a particular owner?
The world space is just so large that you need whatever this intelligence is humans (and animals) have to navigate it successfully — but LLMs do not intrinsically.
Whether they can be so large that it does not matter in 99.99% of cases is to be seen.
virgildotcodes 3 hours ago [-]
> You missed the core of my point: humans operate, including in the real world, on much less training data.
I very specifically addressed this in my response to you. How much training data is contained in 16 waking hours of navigating the world fusing all sensory data, never mind data being simultaneously generated within the mind while this is all going on, from birth til death? From birth til pushing that shopping cart?
Far, far more than in all the training datasets being used for AI.
I also addressed this again in my reply to the sibling comment.
People tend to discount how much data humans have passing through their minds 24/7.
A human isn’t born in a vacuum as a fully formed adult and dropped into the shopping cart navigation problem.
A human has had far, far more training data fed into it that contains all the pieces necessary to translate to pushing a shopping cart when first seeing it, than a machine learning model which has been fed 1 million videos of a robot pushing a shopping cart.
ex-aws-dude 8 hours ago [-]
You're just stating the opposite of the commenter with no additional discussion.
It's like just commenting "I disagree"; it's totally pointless for discussion.
That's why you're getting downvoted if you're wondering.
davebren 8 hours ago [-]
What did you say that added to the discussion? I wasn't wondering at all. More compute time won't create new mathematics. To believe otherwise is to misunderstand the technology and there is no amount of hackernews votes that will change that.
raincole 12 hours ago [-]
I like how everyone laughed when OpenAI said their models will have "PhD-Level Intelligence" and now the goalpost has been moved to if AI can create new math (i.e., not PhD-Level, but Leibniz/Euler/Galois level.)
bananaflag 3 hours ago [-]
As a mathematician, new, conceptual math is when I'll become interested in reading LLM output.
I appreciate very much the work done so far, but this sort of asymptotic/quantitative result didn't interest me much even when it was done by humans.
(This is not snobbery, just a personal preference.)
toilet 23 minutes ago [-]
I have no idea about research in mathematics: How will mathematicians judge what constitutes new conceptual math that is actually useful, vs a hallucination that might be novel but doesn't introduce anything actually meaningful?
kamaal 2 hours ago [-]
Well that's coming.
As a matter of fact, the more logic and structure there is to your work, the easier it is for AI to conquer it. Because of this, programming was the first thing that got solved, but pure sciences are next.
If what you do, and how you do it, can be written down on a piece of paper, then AI can do it.
I do believe programming getting solved will be a double assault on these fields.
>>This is not snobbery
This is good for the species. What sense does it make to keep treating these fields like they are reserved for the most intelligent micro-percentage of humans? Getting LLMs to do these things gives some scale to these subjects, and that's good.
alt227 2 hours ago [-]
> Well that's coming.
So is AGI, but we may be hundreds of years off still.
turzmo 37 minutes ago [-]
Not denying that these advances are impressive, but it is important to consider that this is a cherry-picked result. This doesn’t mean that AI can now be expected to do problems of similar or lower difficulty, but that it happened to work well on one problem. What you won’t see is how many others they had to try to get this result.
necovek 5 hours ago [-]
PhDs used to mean publishing a novel mathematical result: when has that changed?
bananaflag 4 hours ago [-]
They mean "new math" in the sense of more than a novel mathematical result, a new math paradigm or so.
kamaal 2 hours ago [-]
That's coming too.
Sometimes going some distance with a subject generates data for new ideas.
Once math gets done fast, newer ideas and paradigms also arrive.
tedbradley 3 hours ago [-]
My good sir or madam, disproving a decades-old conjecture produced by Erdos that has had armies of people in that field have their go at it IS a novel mathematical result.
melagonster 2 hours ago [-]
So they have finally reached part of PhD level. The current version relies on humans to integrate the results from the model and write the paper. If LLMs/AIs can do all of the above, we will really have a PhD-level model.
raincole 3 hours ago [-]
The gap between novel result and "new math" is as wide as the pacific ocean.
golol 3 hours ago [-]
No it is not Leibniz/Euler/Galois. More like writing good papers that contribute to the broader understanding of a theory. I think if one evaluated a mathematician's research output and it consisted mostly of the kinds of problems AI has solved so far, it would give the impression that this person is somehow very good at picking accessible problems to target, but has not made a larger impact on the field.
yreg 1 hours ago [-]
The goalposts are Euler level not the current model capabilities.
zamadatix 5 hours ago [-]
My only complaint is the claims always start spreading 6-12 months before the delivery. A little patience goes a long way in what's possible with AI and we all just have to wait and see what parts actually grow this next cycle or not. Guessing at it based on trend lines only leads to people getting excited when it matches their particular guess and ignoring it when it doesn't.
no-name-here 4 hours ago [-]
>> OpenAI said their models will have "PhD-Level Intelligence"
> My only complaint is the claims always start spreading 6-12 months before the delivery.
If delivering on such promises "always" occurs 6-12 months after the promise, is that pretty good?
zamadatix 3 hours ago [-]
Again, the promise isn't _always_ delivered, people just more often focus on when the particular result aligns with their view. When it is though, it's all too commonly 6-12 months later. Which is nice but a bit annoying - why not wait 6-12 months and claim when you can actually show it? Or just say that's where it might be soon instead of talking like it is now.
I generally like AI and use it plenty often, it does many things well and I'm curious to see how far it keeps going, but that doesn't mean I have to like overhyped marketing about it.
melagonster 2 hours ago [-]
They said that their specific version of the model had this ability one year ago.
zamadatix 2 hours ago [-]
I can't open all of the links in the article because of some Cloudflare issue (perhaps related to me being on a plane) but is the version and sub iteration of the model they actually used for this the same as the one they announced the capability on a year ago? If so, did they comment on why they didn't just show this a year ago (they seem to have been publishing successively better results slowly instead)?
staticman2 6 hours ago [-]
What's laughable is an OpenAI employee invented the term "PHD level intelligence" and you think that " PHD Level intelligence" is a real term that describes a real thing and you are repeating it here.
mathisfun123 6 hours ago [-]
I can't wait till these NPC types start rating people as "Opus level intelligence".
perching_aix 1 hours ago [-]
Thinking you're magically smarter than others is indeed an essential part of the NPC trend, to the extent that it in itself becomes an NPC thing to say.
It's pretty much a 1:1 match to the "we're all unique snowflakes" meme, with an army of Buzz Lightyear toys repeating the same in the background.
zeofig 11 hours ago [-]
I still laugh.
johnfn 11 hours ago [-]
Have you updated your priors after this announcement? If not, why not?
ex-aws-dude 7 hours ago [-]
Yes let me calculate the exact change it’s 0.004748394 probability now based on my own made up statistical vibes that I feel
xyzsparetimexyz 10 hours ago [-]
Prior whats?
se4u 9 hours ago [-]
When a qualifying noun is absent, "priors" means prior beliefs.
I don't have enough information about the announcement for it to mean much to me. I don't know much about this field of maths. I don't know how many mathematicians were actively working on this problem. It could be zero, which would indicate it's not really that interesting. The article gushes about how it's a Very Important Problem, but it's not even mentioned on https://en.wikipedia.org/wiki/List_of_conjectures_by_Paul_Er.... I'm sure the busy folk at openAI will fix that soon however. Furthermore the extensive dishonesty of companies like openAI makes me suspicious of just how this was achieved. Overall the announcement is of little interest to my "priors", although I don't typically think in such terms.
dellamonica 7 hours ago [-]
It is extremely well known. Lots of people have tried to solve it and it stood basically stuck for 80 years. It is getting harder every day to downplay these models.
Given its elementary nature (very easy to state), you can bet that a lot of very bright people have worked on it (I know of one MIT graduate who specialized in Geometry had a lot of interest in it).
voxl 4 hours ago [-]
I don't believe the result at all. I think it contains faulty logic. Perhaps the mathematicians involved can read the tea leaves and decide something interesting happened, but all this AI psychosis bullshit still refuses to accept that AIs do not, and cannot, have a mental model of the world.
Moreover, model output is incredibly good at looking credible but being wrong. It has NEVER produced something correct for me in a field of which I am an expert without some external oracle to validate claims (like e.g., Lean)
gizmondo 41 minutes ago [-]
At this point the term "AI psychosis" is the more apt label for AI skeptics. Here we have literal Fields medalists vouching for correctness and relative importance of the result, but who cares, "I don't believe the result at all". Just pure denial of reality.
golol 3 hours ago [-]
You should believe that the proof works at least as much as any other paper in mathematics. The proof has been scrutinized by experts and simplified and improved. If you don't believe that then I'm sorry but you are deluding yourself.
signatoremo 8 hours ago [-]
You don't have enough knowledge to dismiss them, but you still laugh? For?
zeofig 8 hours ago [-]
Do you have enough knowledge? I laugh at everyone who accepts these claims in the light they're presented despite knowing so little.
perching_aix 1 hours ago [-]
Doesn't make much sense, does it? If I accept that I don't have enough information on something, then I withhold judgement. There's nothing so reserved about mockery and cynicism. You're not cautious, you outright hedge that it's all a lie, and paint everyone else to be a complete idiot for thinking at all otherwise.
The world runs on trust, specifically trusting expert advice. It'd seem that due to resource constraints and scale, that's the best available option. By extension, there should be absolutely nothing weird or surprising on people following suit. It's why these companies themselves rely on expert counsel, and defer to their appraisals for marketing. The opposite is what's weird and unusual, and what requires more substantiation.
It's interesting that those who come out swinging against "trusting the experts", or really, trusting anyone else but them, not only ~never acknowledge this, but are seemingly outright proud of it, considering it as their own unique little trait, egocentrically revelling in it. It's almost as if epistemic rigor and truthfulness was not their actual concern.
Woohoo, I'm distrustful and cynical. Behold my unfathomable wisdom! Bonus points if they're also hurtful, because flipping the arrow on "hard truths -> hurt feelings" is a masterclass in reasoning too, of course.
I can appreciate faulting experts and organizations for misusing people's trust and looking out for this angle, but given how unavoidable and fundamentally useful trusting itself is, blaming people for defaulting to trusting makes no sense to me whatsoever. It comes across as just the usual trope of blaming the individual.
signatoremo 6 hours ago [-]
The GP said "I like how everyone laughed when OpenAI said their models will have PhD-Level Intelligence", and you said you still laughed, so I just wanted to confirm if you did laugh at that. Apparently you did not. Thanks for the confirmation. I think you should not, given your admittedly limited understanding.
pickleRick243 7 hours ago [-]
You don't know the names of the mathematicians who've given their thoughts on this? If not, you really should just not comment on anything mathematical ever again.
zeofig 7 hours ago [-]
I do know their names. However I'm not in the field and there are many cases in recent years of high-profile scientists putting their weight behind highly dubious claims. Thanks for the advice, by the way.
Note that I'm not disputing the validity of the counterexample itself.
pickleRick243 7 hours ago [-]
That's fair. If you're familiar with mathematics culture though, you'd know that "LLM hype" is not really in their blood and is certainly not something that gains you PR points. I think it's safe to take their comments at face value. I do think the ice is beginning to thaw though, and perhaps in the next few years there will be more of a hype phase in math if some really high profile problems begin to fall to AI, although one might argue at that point that the hype would be deserved.
lg5689 8 hours ago [-]
The problem was pretty well known, and had many human attempts. There's some room to argue that the right humans hadn't attempted it, as the solution used advanced methods from another field of math. But imho, whereas many prior AI victories could be explained by not enough human attention, there is no such excuse in this case, and one should acknowledge this is a notable achievement.
xgulfie 8 hours ago [-]
large language models do not have pigeon-level intelligence. They can't even feed themselves.
dawnerd 11 hours ago [-]
Yet it still codes like a junior developer that memorized all of stack overflow.
raincole 11 hours ago [-]
PhDs code like that too. Especially if they're statisticians :)
bdamm 8 hours ago [-]
Even if the code was like that (it isn't), the power of the current crop of models to analyze data for patterns and build context out of code is leaps and bounds what it was even a year ago. And any developer will tell you that the hardest part of fixing a bug is knowing where the bug is in the first place. Once you know where it is, fixing it is usually trivial.
There is serious magic happening in the construction of model context.
dilap 11 hours ago [-]
Personally I don't find this to be true anymore! It's not always great and still often tends towards unneeded complexity (especially if not pushed a bit), but I often find GPT 5.5 writing code I would have written myself. This was very much not true with earlier models (which would make something that worked, but that I'd always have to rewrite to make it "good code").
dawnerd 8 hours ago [-]
Personally I found 5.5 a massive step back from 5.4. Both of them still use way too many fallbacks and unnecessary checks, especially if you're having it output PHP. It's fine if you're just one person and checking everything and able to catch and correct. But it's really bad when you have a team all using it, not checking the output and trusting its output, leading to spaghetti code. Technically works, but very messy and will no doubt lead to buggy code.
It still writes like a junior dev, in that despite AI being able to get a picture of an entire repo, its changes are typically confined to the task it's working on and it will opt to duplicate logic to keep changes contained. Again, technically works, not ideal.
dilap 5 hours ago [-]
Yeah, it has a tendency to default to "smallest local hack that will work" and code as defensively as possible.
BUT I have had great success using AGENTS.md and becoming better at prompting to get it to not be like this.
Basic approach in AGENTS.md: don't code defensively, yada yada, we have a validation layer at X, no need to check for anything behind that layer. Works well.
An approach I've found helpful when prompting: What would be the best architecture for this change? If you say "do X" it'll tend to just do the hackiest, shortest path thing. If you say, "what's the best way to do X?" it will think more holistically.
That said, who knows, maybe when it's PHP it just really wants to hack ;-)
(Also, yes, you still need to review the code -- it will still do stupid things, so you can't just be pure hands off w/o ending up with quality degradations. The same is true of humans too though in my experience...)
esikich 7 hours ago [-]
Idk man, I think at this point, if you can't get good code out of frontier models, you're doing something wrong. Plenty of resources out there for you to familiarize yourself with the workflows if you can be bothered.
fourseventy 6 hours ago [-]
What is the last model you used... lol. Linus Torvalds himself said the newest models are better than him at coding.
stemchar 5 hours ago [-]
This doesn't sound correct. Source?
no-name-here 4 hours ago [-]
In recent months, Linus said it specifically about code for a personal side project of his. The quote was in the commit message. (I’m not the grandparent commenter, and I think grandparent commenter’s claims may be too broad or require context.)
yrds96 2 hours ago [-]
Some context is missing: the vibe-coded application was written in Python, while the main code in this side project was written manually in C by Torvalds. He never said that AI produces better code than him in the language where he is proficient.
> The python visualizer tool has been basically written by vibe-coding. I know more about analog filters -- and that's not saying much -- than I do about python. It started out as my typical "google and do the monkey-see-monkey-do" kind of programming, but then I cut out the middle-man -- me -- and just used Google Antigravity to do the audio sample visualizer.
zulban 11 hours ago [-]
Clearly you've never supervised junior developers.
dawnerd 8 hours ago [-]
That's literally my job...
Supermancho 7 hours ago [-]
>That's literally my job...
Since you’re not in a unique position, I can confidently state that your comparison of LLMs to jr developers seems unfounded. Today, LLMs produce code that is superior to junior developer code by an order of magnitude.
Notably, they demonstrate consistent syntax, clear separation of concerns, strong test coverage, organizational rigor, idiomatic API usage, and the ability to generate and maintain documentation, among other measurable qualities.
LLMs generally operate at a staff engineer level for a number of languages and ecosystems (including polyglot projects).
dclowd9901 4 hours ago [-]
I'm not sure what your background is, but as a staff level engineer, I can assure you they do not. They in fact seem to lack any understanding of architectural intent within a sufficiently large code base. This seems obvious since they can't fit the entire code base in their context at once.
We have many folks (not engineers) at our company using LLMs to open PRs, and every one of these PRs has profound architectural design problems.
bigstrat2003 6 hours ago [-]
LLMs absolutely do not exceed the abilities of junior devs. They don't even meet that bar, let alone exceed it. Junior devs are capable of getting syntax right without someone going "hey you messed that up". LLMs are not. Junior devs get basic logic right. LLMs do not.
Comparing an LLM to a senior developer is an absolute joke.
no-name-here 4 hours ago [-]
1. Which LLM are you using that is “not capable of getting syntax right”?
2. Are you referring to without having a compiler or LSP check it? Although even then, the recent LLMs I've used still frequently get syntax right, whereas I'd expect juniors are often using a LSP or compiler to catch mistakes while writing code?
cleaning 1 hours ago [-]
What model are you using? Llama 3.1 8b? This has not been true for years.
jldugger 9 hours ago [-]
Or PhDs
ccvannorman 2 hours ago [-]
I looked at all linked articles and could not find an example of the points (they show a square grid of points with n~=100 but no other ordering of points to show the more optimal layout(s)).
Is there anywhere an image example of a superior layout for example with n>={100,1000,10000}..? I would love to see it. I am imagining it would look somewhat like a sloppy pizza.
zozbot234 13 hours ago [-]
The summarized chain of thought for this task (linked in the blogpost) is 125 pages. That's an insane scale of reasoning, quite akin to what Anthropic has been teasing with Mythos.
I'm disappointed only that the chain of thought needed to be rewritten. Need to train these LLMs to natively communicate in LaTeX research paper format.
estetlinus 12 hours ago [-]
Today I generated the equivalent of two LOTR books just to fix three missing rows in my SQL models (and open a PR), so +1
wayeq 9 hours ago [-]
or put differently, you melted x cubic meters of polar ice
FuckButtons 5 hours ago [-]
Based on some napkin math, that would be about ~100 watt hours of electricity on an H100 cluster, or, roughly the same amount of energy needed to boil a kettle for a cup of tea.
nhinck2 4 hours ago [-]
That's an exceptionally fast output you have there...
Mind showing your working out?
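For anyone who wants to check the napkin math, here is a sketch of how such a working could be laid out in Python. The token count and throughput below are placeholders I made up, not figures from the thread; 700 W is used as a per-GPU power draw in the range of an H100 SXM board limit; the specific heat of water is standard physics.

```python
def gpu_energy_wh(tokens: int, tokens_per_second: float, watts: float) -> float:
    """Energy in watt-hours to generate `tokens` at a given rate and power draw."""
    seconds = tokens / tokens_per_second
    return watts * seconds / 3600.0


def heat_water_wh(grams: float, delta_celsius: float) -> float:
    """Energy in watt-hours to heat water, ignoring kettle losses."""
    specific_heat_j_per_g_k = 4.186  # joules per gram per kelvin
    joules = grams * specific_heat_j_per_g_k * delta_celsius
    return joules / 3600.0


# Hypothetical inputs: 300k generated tokens at 500 tokens/s on one 700 W GPU.
generation_wh = gpu_energy_wh(300_000, 500.0, 700.0)

# One litre of water heated from 20 C to 100 C, versus a single 250 ml cup.
full_kettle_wh = heat_water_wh(1000.0, 80.0)
single_cup_wh = heat_water_wh(250.0, 80.0)
```

With these placeholder inputs the generation comes to about 117 Wh, a litre of water to about 93 Wh, and a single cup to about 23 Wh. The result is linear in every input, so the comparison depends entirely on the assumed throughput and token count.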
Chamix 8 hours ago [-]
I note that (though summarized), this is ~100k tokens. Anyone who routinely works with Codex (or any agentic harness really) can tell you how trivial it is to eat up 100k tokens doing complex work. I've personally had plenty of codex 5.5 xhigh sessions where just the pure chain of thought token count in a single turn exceeds 200k (and I assume doesn't go further only due to compaction meta-guidance; the harness will push the model to stay under 256k per turn/thinking block) .
I think the more interesting question is how many tokens were spent all told; the most interesting graph in the article imo is the success rate by log test-time compute: how many tokens are being spent on the right of the graph to hit a winning CoT/solution like this >50% of the time?
recitedropper 12 hours ago [-]
This is impressive, no question.
Without knowing all this model has been trained on though, it is pretty hard to ascertain the extent to which it arrived at this "on its own". The entire AI industry has been (not so secretly) paying a lot of experts in many fields to generate large amounts of novel training data. Novel training data that isn't found anywhere else--they hoard it--and which could actually contain original ideas.
It isn't likely that someone solved this and then just put it in the training data, although I honestly wouldn't put that past OpenAI. More interesting though is the extent to which they've generated training data that may have touched on most or all of the "original" tenets found in this proof.
We can't know, of course. But until these things are built in a non-clandestine manner, this question will always remain.
JacobAsmuth 7 hours ago [-]
Exactly. Maybe OpenAI paid mathematicians to keep this discovery quiet, then added their proof to the training data, then manipulated a second team into prompting for this question such that the model could regurgitate the solution. This would plausibly explain why the model seems so capable at doing things like refuting fundamental theorems of mathematics while in things like competitive programming, biology, and physics it's merely only in the top 99.9%.
golol 3 hours ago [-]
You are believing a very unlikely scenario. I think the reason is that you have been convinced of a claim which is unlikely and indeed not true. That is:
>the model seems so capable at doing things like refuting fundamental theorems of mathematics
That is not true and a complete misrepresentation of recent progress of AI in math. It is therefore not necessary to believe the conspiracy theory you described in order to explain recent progress of AI in math.
ai_fry_ur_brain 2 hours ago [-]
It's almost certainly a scam and you're falling for it.
i_love_retros 7 hours ago [-]
You're a bot! Hey everyone, over here! I found a bot!
muhneesh 4 hours ago [-]
This type of discourse is just inane and more reflective of the author's sensibilities than anything it claims.
Congrats to the OpenAI team for one of the most significant breakthrough discoveries in AI history.
geraneum 4 hours ago [-]
How dare people think critically of the corporate machine. It’s inane!
fergie 2 hours ago [-]
> The entire AI industry has been (not so secretly) paying a lot of experts in many fields to generate large amounts of novel training data. Novel training data that isn't found anywhere else--they hoard it--and which could actually contain original ideas.
Really? Any references to read more?
Rover222 12 hours ago [-]
Seems like a very tin-foil-hat-take to me
net01 11 hours ago [-]
I’m quite certain that a few months ago, some problems were claimed to be solved by AI. However, those claims were actually false: they were Erdos problems that had already been solved but were not marked as solved, and the solution was "found" by AI.
The corollary is that this is a very valuable capability of AI!
The ability to find incredibly obscure facts and recall them to solve "officially unsolved" problems in minutes is like Google Search on steroids. In some sense, it is one core component of "deep expertise", and humans rely on the same methodology regularly to solve "hard" problems. Many mathematicians have said that they all just use a "bag of tricks" they've picked up and apply them to problems to see if they work. The LLMs have a huge bag of very obscure tricks, and are starting to reach the point that they can effectively apply them also.
I suspect the threshold of AGI will be crossed when the AIs can invent novel "tricks" on their own, and memorise their own new approach for future use without explicitly having to have their weights updated with "offline" training runs.
mrdependable 11 hours ago [-]
How is that a "tin-foil-hat" take? It's not a secret, and in fact widely reported, that these companies are spending billions on creating training data.
dmix 10 hours ago [-]
So you think that OpenAI paid some mathematicians to either solve this conjecture problem, or a bunch of related unpublished math related to it, then fed it into an LLM model so they could announce it as being solved by the model? How is that not a conspiracy theory?
mrdependable 9 hours ago [-]
It is just a theory, the conspiracy part is not really applicable. I don't see what is controversial about it. Are you implying the machine taught itself the mathematics to do all this?
dmix 8 hours ago [-]
> Are you implying the machine taught itself the mathematics to do all this?
Are you asking me how LLMs work?
The theory proposed by the original commenter was that there could have been some secret training data the model was trained on that made it possible to solve this problem set. So the only conclusion is they are implying it's a conspiracy by OpenAI to hide some novel math research they funded merely to do marketing about solving math problems (then convincing multiple math experts to verify and support it with papers). That is the definition of a conspiracy.
recitedropper 12 hours ago [-]
I'm not letting the government read my brainwaves.
In all seriousness though: My suggestion is that those shepherding the frontier of AI start acting with more transparency, and stop acting in ways that encourage conspiratorial thinking. Especially if the technology is as powerful as they market it as.
0x5FC3 13 hours ago [-]
Is there a reason why we only hear of Erdos problems being solved? I would imagine there are a myriad of other unsolved problems in math, but every single ChatGPT "breakthrough in math" I come across on r/singularity and r/accelerate is an Erdos problem.
jltsiren 12 hours ago [-]
Erdős problems form a substantial fraction of all mathematical problems that have been explicitly stated but not solved; are sufficiently famous that people care about them; and are sufficiently uninteresting that people have not spent that much effort trying to solve them.
Solving problems people have already stated is a niche activity in mathematical research. More often, people study something they find interesting, try to frame it in a way that can be solved with the tools they have, and then try to come up with a solution. And in the ideal case, both the framing and the solution will be interesting on their own.
edanm 3 hours ago [-]
> and [Erdős problems are] sufficiently uninteresting that people have not spent that much effort trying to solve them.
Note that this is not really true of this problem in particular.
bananaflag 13 hours ago [-]
Erdos problems are easier to state, thus they make a great benchmark for the first year of AI mathematics.
tonfa 13 hours ago [-]
Afaik this is because there is a community and database around them.
0x5FC3 13 hours ago [-]
Interesting. OpenAI could also be trying to solve other problems, but Erdos problems may be falling first?
CSMastermind 13 hours ago [-]
No, Erdos problems were accepted as sort of a benchmark. There's a bunch of reasons they're favorable for this task:
1. They have a wide range of difficulties.
2. They were curated (Erdos didn't know at first glance how to solve them).
3. Humans already took the time to organize, formally state, add metadata to them.
4. There's a lot of them.
If you go around looking for a mathematics benchmark it's hard to do better than that.
throw-the-towel 13 hours ago [-]
They're just famous because Erdos was a great mathematician, kinda like the Hilbert problems a century earlier.
I was promised a cure for cancer, but all I got was this disproof of an Erdos problem.
empath75 13 hours ago [-]
It's a large set of problems that are both interesting and difficult, but not seen as foundational enough or important enough that they have already had sustained attention on them by mathematicians for decades or centuries, and so they might actually be solvable by an LLM.
1qaboutecs 13 hours ago [-]
Also fewer prerequisites to understand the statement than the average research problem.
xyzsparetimexyz 10 hours ago [-]
The models can't actually do good work on practical problems, so OpenAI tasks them on stuff nobody cares about
m-hodges 13 hours ago [-]
To the “LLMs just interpolate their training data” crowd:
Ayer, and in a different way early Wittgenstein, held that mathematical truths don’t report new facts about the world. Proofs unfold what is already implicit in axioms, definitions, symbols, and rules.
I think that idea is deeply fascinating, AND have no problem that we still credit mathematicians with discoveries.
So either “recombining existing material” isn’t disqualifying, or a lot of Fields Medals need to be returned.
pseudocomposer 12 hours ago [-]
I'd hope most functional adults understand that the Fields Medal and basically every other annual "prize" out there is awarded to both "recombinant" innovations and "new-dimensional thinking" innovations. Humans aren't going to come up with "new-dimensional" innovations in every field, every single year.
I'd say yes, LLMs "just" recombine things. I still don't think if you trained an LLM with every pre-Newton/Leibniz algebra/geometry/trig text available, it could create calculus. (I'm open to being proven wrong.) But stuff like this is exactly the type of innovation LLMs are great at, and that doesn't discount the need for humans to also be good at "recombinant" innovation. We still seem to be able to do a lot that they cannot in terms of synthesizing new ideas.
godelski 9 hours ago [-]
> Humans aren't going to come up with "new-dimensional" innovations in every field, every single year.
In fact, they are more rare. Specifically because they are harder to produce. This is also why it is much harder to get LLMs to be really innovative. Human intelligence is a lot of things, it is deeply multifaceted.
Also, I'm not sure why CS people act like axioms are where you start. Finding them is very very difficult. It can take some real innovation because you're trying to get rid of things, not build on top of. True for a lot of science too. You don't just build up. You tear down. You translate. You go sideways. You zoom in. You zoom out. There are so many tools at your disposal. There's so much math that has no algorithmic process to it. If you think it all is, your image is too ideal (pun(s) intended).
But at the same time I get it, it is a level of math (and science) people never even come into contact with. People think they're good at math because they can do calculus. You're leagues ahead of most others around you, yes, and be proud of that. But don't let that distance deceive you into believing you're anywhere near the experts. That's true for much more than just math, but it's easy to demonstrate to people that they don't understand math. Granted, most people don't want to learn, which is perfectly okay too.
JonathanMerklin 10 hours ago [-]
I agree with almost all of what you have stated, save for a minor nitpick: I frankly don't think most functional adults think about the Fields Medal, similar annual prizes, or the qualities of the innovations of their candidate pools. I also think that that's totally okay. I think among a certain learned cohort of adults it's okay to hope that, and I think it's okay to imagine an idealized world where having an opinion on this sort of matter is a baseline, but I don't think it's realistic or fair to imply that (what I believe handwavily to be a majority of) adults are nonfunctional for not sharing this understanding.
amelius 10 hours ago [-]
> I still don't think if you trained an LLM with every pre-Newton/Leibniz algebra/geometry/trig text available, it could create calculus.
Yes but that is because there was not enough text available to create an intelligent LLM to begin with.
hgoel 9 hours ago [-]
I think an LLM trained on pre-calculus material would easily stumble into reinventing at least early calculus. It's already pretty easy for students to stumble into calculus from solid enough fundamentals.
We even think that the Babylonian astronomers figured out they could integrate over velocity to predict the position of Jupiter.
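To make "integrate over velocity" concrete, here is a minimal sketch of the trapezoid rule: position change is approximated as the area under a velocity-versus-time curve. The function name and the sample numbers are illustrative assumptions, not a reconstruction of any actual tablet.

```python
from typing import Sequence


def integrate_trapezoid(times: Sequence[float], velocities: Sequence[float]) -> float:
    """Approximate displacement as the area under a velocity-vs-time curve.

    Each pair of neighbouring samples forms a trapezoid whose area is
    (average of the two velocities) * (time between them).
    """
    if len(times) != len(velocities):
        raise ValueError("times and velocities must have the same length")
    total = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        total += 0.5 * (velocities[i] + velocities[i - 1]) * dt
    return total


# Illustrative (made-up) data: a velocity that falls linearly from 12 to 9.5
# units per day over 60 days.
displacement = integrate_trapezoid([0.0, 60.0], [12.0, 9.5])
print(displacement)  # 645.0
```

With a linearly changing velocity a single trapezoid is exact, which is why this kind of calculation is reachable from geometry alone, without the later machinery of limits.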
To keep my usual rant short: I think you’re assuming a categorical distinction between those two types of innovations that just doesn’t exist. Calculus certainly required some fundamental paradigm shifts, but there’s a reason that they didn’t have to make up many words wholesale to explain it!
Also we shouldn’t be thinking about what LLMs are good at, but rather what any computer ever might be good at. LLMs are already only one (essential!) part of the system that produced this result, and we’ve only had them for 3 years.
Also also this is a tiny nitpick but: the fields medal is every 4 years, AFAIR. For that exact reason, probably!
symfrog 12 hours ago [-]
We have had LLMs for much longer than 3 years.
Nevermark 11 hours ago [-]
It took humans thousands of years, then hundreds of years, to come to terms with very basic concepts about numbers.
It's amazing to me when people talk about recombining things, or following up on things as somehow lesser work.
People can't set aside the perspective they were given when they learned the concepts, a perspective that those who developed the concepts didn't have, because the concepts didn't exist yet.
Simple things are hard, or everything simple would have been done hundreds of years ago, and that is certainly not the case. Seeing something others have not noticed is very hard, when we don't have the concepts that the "invisible" things right in front of us will teach us.
adi_kurian 11 hours ago [-]
Anyone in the arts is aware that creativity is not the new, it is the repackaging of what already exists into something that is itself new.
RajT88 10 hours ago [-]
Except for "Being John Malkovich". That movie was way out there on its own.
fragmede 9 hours ago [-]
It's "just" a Man-vs-Self story, of the ~7 story archetypes out there.
godelski 9 hours ago [-]
It's why the invention of teaching has been so important. Took a long time for humans to develop calculus. A long time to then refine it and make it much more useful. But then in a year or two an average person can learn what took hundreds of years to invent. It's crazy to equate these tasks as being the same. Even incremental innovation is difficult. You have to see something billions of people haven't. But there's also paradigm shifts and well... if you're not considered crazy at first then did you really shift a paradigm?
Nevermark 7 hours ago [-]
And yet it is still taught in less than optimal form, lacking algebraic closure in ways that are completely unnecessary.
It isn't a secret, but the percentage of people who don't know that, plus the percentage of mathematicians who vaguely or more directly know that, but habitually use the broken, more difficult (i.e. less algebraic) notation is ... virtually everyone.
I am not trying to pick on calculus, this is everywhere. Important and useful concepts are right in front of all of us, that we don't see even in the context of what we are relatively fluent with.
Because we learn quickly, where we have (almost always inherited) the right preparatory perspectives (earned over lifetimes by others), we vastly overrate our ability to reason independently.
bananaflag 1 hours ago [-]
What is that algebraic calculus you are hinting at?
asdfasgasdgasdg 9 hours ago [-]
When people say this what they mean is that we've had plausibly useful LLMs for around three years, and I would say that is basically true. The stuff before 2023 could barely be classified above the level of an interesting toy.
danielmarkbruce 11 hours ago [-]
No, we haven't, for any reasonable definition of L.
wavemode 11 hours ago [-]
OpenAI themselves must not have a "reasonable definition of L", then. Their own papers and press releases refer to GPT-2 (from 2019) as a "large language model".
Yes, and 1.5 billion parameters meets no reasonable current definition of large. It would be considered a tiny language model. OpenAI themselves refer to their small/fast models as small models all over their documentation.
wavemode 7 hours ago [-]
The term doesn't change its meaning because something new comes along.
The point of the term "large" is to highlight the massive parameter count (compared to traditional statistical models, where having 1.5 billion parameters was basically unheard of). It leads to the "double descent" phenomenon that allows them to generalize in ways traditional statistical models can't.
The idea that the "large" descriptor was just a subjective exclamation, like "oh wow this model is pretty large ain't it", is revisionism.
danielmarkbruce 6 hours ago [-]
Yes, it does. That's why OpenAI refers to its small models as small. They are just so different. The capabilities have changed dramatically. The use cases are wildly different. The architectures are quite different. Even the core idea of attention is different. Training them is materially different. Serving them is materially different. A 1.5 billion parameter model from 2019 is so different from today's LLMs that they really don't have much in common. What we have now is quite similar to what we had a couple years ago though.
Yizahi 10 hours ago [-]
Sure we do, since Fei-Fei Li and team created that annotated dataset, which made it possible to train the first LLMs. So LLMs have been here for more than a decade already.
danielmarkbruce 10 hours ago [-]
You are confused by what the L and L mean in LLM, or which data set she created, or both, or in general.
Yizahi 10 minutes ago [-]
Or it is you who are confused. And I want to remind you that you can't retcon historical word use.
nextaccountic 11 hours ago [-]
Fine, 8 years? That's not a long time
m4x 7 hours ago [-]
I think your comment about inventing new words is an interesting one. One of the things that I believe limits our ability to discover new ideas is our ability to describe related concepts. For example, the reason we still can't have clear discussions on consciousness is probably partly due to the fact that the necessary concepts haven't been cemented in language. We need new language before we can describe consciousness.
I would guess LLMs are limited in their ability to be genuinely novel because they are trained on a fixed language. It makes research into the internal languages developed by LLMs during training all the more interesting.
pegasus 11 hours ago [-]
The fundamental paradigm shift is the categorical distinction. And what would constitute many new words for you? It introduced a bunch of concepts and terms which we take for granted today, including "derivative", "integral", "infinitesimal", "limit" and even "function", the latter two not new words, but what does it matter? – the associated meanings were new.
azakai 10 hours ago [-]
There was a lot new in calculus, but it also didn't come out of nowhere.
That Newton and Leibniz came up with similar ideas in parallel, independently, around the same time (what are the odds?), supports that.
> I still don't think if you trained an LLM with every pre-Newton/Leibniz algebra/geometry/trig text available, it could create calculus. (I'm open to being proven wrong.)
The experiment is feasible. If it were performed and produced a positive result, what would it imply/change about how you see LLMs?
pegasus 11 hours ago [-]
GP was stating that they don't believe this would happen (I don't either), but also to make the point that it's a falsifiable view. (At least in theory. In practice, there probably won't even be enough historical text to train an LLM on). No, I don't think it would be falsified. Asking what if I'm wrong is kind of redundant. If I'm wrong, I'm wrong, duh.
sumeno 11 hours ago [-]
How are you going to train a frontier level llm with no references to post 1700 mathematics?
bjt 11 hours ago [-]
"frontier level" is doing a lot of work there, but the idea would be to only feed it earlier sources.
The problem is that the amount of data with that cutoff is really minuscule for producing anything powerful. You might be able to generate a lot of 1700s-sounding data, but you’d have to be careful not to introduce newer concepts or ways of thinking in that synthetic data. A lot of modern texts talk about rates of change and the like in ways that are probably influenced by preexisting knowledge of calculus.
NewJazz 10 hours ago [-]
Doesn't it prove GP's point then, that LLMs themselves simply aren't capable of creating/proving new theories and axioms?
codebje 9 hours ago [-]
Without passing opinion on GP's point, I think that just proves it's hard to establish a data set that doesn't bias toward the result you're hoping to find.
kelseyfrog 11 hours ago [-]
Time cutoff LLMs are regularly posted to HN. It takes just one success to prove feasibility.
Besides, we can forecast our thoughts and actions to imagined scenarios unconditioned on their possibility. Something doesn't have to be possible for us to imagine our reactions.
anthk 11 hours ago [-]
Archimedes was close.
nathan_compton 9 hours ago [-]
I don't think it's really feasible - there just isn't enough training data before calculus. I would guess all the mathematical and philosophical texts available to Newton and Leibniz would fit on a CD-ROM with loads of space to spare.
ykl 11 hours ago [-]
I like to think of it as:
Imagine every bit of human knowledge as a discrete point within some large high dimensional space of knowledge. You can draw a big convex hull around every single point of human knowledge in a space. A LLM, being trained within this convex hull, can interpolate between any set of existing discrete points in this hull to arrive at a point which is new, but still inside of the hull. Then there are points completely outside of the hull; whether or not LLMs can reach these is IMO up for debate.
Reaching new points inside of the hull is still really useful! Many new discoveries and proofs are these new points inside of the hull; arguably _most_ useful new discoveries and proofs are these. They're things that we may not have found before, but you can arrive at by using what we already have as starting points. Many math proofs and Nobel Prize winning discoveries are these types of points. Many haven't been found yet simply because nobody has put the time or effort towards finding them; LLMs can potentially speed this up a lot.
Then there are the points completely outside of the hull, which cannot be reached by extrapolation/interpolation from existing points and require genuine novel leaps. I think some candidate examples for these types of points are like, making the leap from Newtonian physics to general relativity. Demis Hassabis had a whole point about training an AI with a physics knowledge cutoff date before 1915, then showing it the orbit of Mercury and seeing if it can independently arrive at general relativity as an evaluation of whether or not something is AGI. I have my doubts that existing LLMs can make this type of leap. It’s also true that most _humans_ can’t make these leaps either; we call Einstein a genius because he alone made the leap to general relativity. But at least while most humans can’t make this type of leap, we have existence proofs that every once in a while one can; this remains to be seen with AI.
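As a toy version of the picture above, here is a rough 2D sketch: build the convex hull of some "known" points, then test whether a new point is inside it (reachable by interpolation) or outside it. Real knowledge spaces are not 2D and nobody has coordinates for them; the point set, names, and dimension are all assumptions made purely for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(hull: List[Point], q: Point) -> bool:
    """True if q is inside or on a counter-clockwise hull with >= 3 vertices."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))


# Hypothetical "known" points and two candidate "new ideas".
known = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0), (2.0, 2.0)]
hull = convex_hull(known)
print(inside_hull(hull, (1.0, 3.0)))  # True: an interpolation of known points
print(inside_hull(hull, (5.0, 5.0)))  # False: outside everything known
```

Note that adding (5.0, 5.0) to `known` and recomputing the hull enlarges it, which is one way to read "a genuine leap changes what counts as interpolation afterwards".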
beering 11 hours ago [-]
A lot of the space outside of the convex hull is just untried things. You can brute-force trying random things and checking the result and eventually learn something new. With a better heuristic, you can make better guesses and learn new things much more efficiently. There’s no reason to believe that kind of guess-and-check is outside of the reach of LLMs, or that most of our new discoveries are not found the same way.
llbbdd 10 hours ago [-]
I come back to something like this idea when I consider the distinction being made that LLMs can only combine and interpolate between points in their training material. I could write a brute-force program that just used an English dictionary to produce every possible one-billion-gazillion word permutation of the words within, with no respect for rules of language, and chances are there would be some provable, testable, novel insight somewhere in the results if you had the time to sift through and validate all of it. LLMs seem like a tool that can search that space more effectively than any we've had before.
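A tiny sketch of that brute-force idea, under heavy assumptions: the "dictionary" is five symbols, the candidates are every 5-symbol sequence, and the "validation" step is a toy checker that only accepts true statements of the form a + b = c. The vocabulary, function names, and checker are all hypothetical.

```python
from itertools import product
from typing import Callable, Iterator, Sequence, Tuple

Candidate = Tuple[str, ...]


def guess_and_check(
    vocabulary: Sequence[str],
    length: int,
    check: Callable[[Candidate], bool],
) -> Iterator[Candidate]:
    """Enumerate every sequence of `length` symbols and keep those that pass `check`."""
    for candidate in product(vocabulary, repeat=length):
        if check(candidate):
            yield candidate


def is_true_sum(candidate: Candidate) -> bool:
    """Toy verifier: accept only sequences that read 'a + b = c' with a + b == c."""
    if len(candidate) != 5:
        return False
    a, plus, b, equals, c = candidate
    if plus != "+" or equals != "=":
        return False
    if not (a.isdigit() and b.isdigit() and c.isdigit()):
        return False
    return int(a) + int(b) == int(c)


vocabulary = ["1", "2", "3", "+", "="]
found = list(guess_and_check(vocabulary, 5, is_true_sum))
for candidate in found:
    print(" ".join(candidate))
# Out of 5**5 = 3125 candidates, only 3 survive the check:
# 1 + 1 = 2
# 1 + 2 = 3
# 2 + 1 = 3
```

The generator knows nothing about arithmetic; all of the "insight" lives in the checker, and the cost grows as len(vocabulary) ** length. A better heuristic for ordering or pruning guesses is what would make the search practical.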
autoexec 9 hours ago [-]
If we managed to create very fast monkeys with typewriters and software that can review their output quickly enough that we end up with a result that's worth reading we'd still have people insisting that we've created intelligence. The monkeys however remain monkeys.
llbbdd 8 hours ago [-]
I think intelligence is an orthogonal, mostly philosophical question aside from whether these tools can produce novel, useful output vs purely recombinant output.
autoexec 8 hours ago [-]
I think that enough purely recombinant output will eventually produce novel, useful output.
ykl 9 hours ago [-]
I think of most things you can get to by guess and checking as definitionally inside of the hull; most forms of guess and checking are you take some existing thing, randomize a bunch of its parameters, and see what you get. Whereas with something like relativity, there's not even a starting point that you can randomize and guess/check from the pre-existing knowledge space that will lead you to relativity. That's more like, adding a new dimension to the space entirely.
It's possible LLMs can handle this after all! But at least so far we only have existence proofs of humans doing this, not LLMs yet, and I don't think it's easy to be certain how far away LLMs are from doing this. I should distinguish between LLMs and AI more generally here; I'm skeptical LLMs can do this, I think some other kind of more complete AI almost certainly can.
I suppose you could just, I dunno, randomly combine words into every conceivable sentence possible and treat each new sentence as a theory to somehow test and brute force your way through the infinite possible theories you could come up with. But at that point you're closer to the whole infinite random monkeys producing Shakespeare thing than you are to any useful conclusion about intelligence.
drdeca 8 hours ago [-]
I think your point about “you could randomly generate a sequence of words, which could in principle produce a text interpretable as expressing any particular expressible-as-a-sequence-of-words novel good idea” pretty much refutes the idea that guessing and checking can only result in things inside such a convex hull, unless said hull already contains everything.
Of course, there’s a significant role to play by the “checking” part.
Like, “take a random sequence of bits and interpret it as Unicode” is at one end of a scale, and “take a random sequence of words in a language” is just a tad away from it, and the scale continues in that direction for quite a while.
ykl 2 hours ago [-]
This assumes that everything outside of the convex hull can already be described using existing language. If you need new language to describe what is outside of the convex hull, is this something an LLM can do?
I actually don't know the answer to that; my understanding is that LLMs by nature of what they are can't understand concepts that are independent of the existing language they are trained on, but I don't have enough in-depth nitty-gritty knowledge of like, core LLM implementation details and architecture and stuff to know if that understanding is correct or not.
scarmig 9 hours ago [-]
It's also worth noting that in very high dimensions, the convex hull will contain massive volume. It could well be the case that humans established that convex hull millions of years ago, and all of our inventions and innovations since have fallen inside it.
davebren 10 hours ago [-]
> There’s no reason to believe that kind of guess-and-check is outside of the reach of LLMs
This doesn't make any sense, by their nature they can't "guess-and-check" things outside their training set.
bsder 9 hours ago [-]
> You can brute-force trying random things and checking the result and eventually learn something new.
And most of the mathematicians seem to welcome this "brute forcing" by the LLMs. It connects pieces that people didn't realize could be connected. That opens up a lot of avenues for further exploration.
Now, if the LLMs could just do something like ingesting the Mochizuki stuff and give us a decent confirmation or disproof ...
tacitusarc 11 hours ago [-]
I like this construction, but I don’t think you take it far enough.
If you have a multi dimensional space, and you are trying to compute which points lie “inside” some boundary, there are large areas that will be bounded by some dimensions but not others. This is interesting because it means if you have a section bounded by dimensions A, B, and C but not D, you could still place a point in D, and doing so then changes your overall bounds.
I think this is how much of human knowledge has progressed (maybe all non-observational knowledge). We make observations that create points, and then we derive points within the created space, and that changes the derivable space, and we derive more points.
I don’t see why AI couldn’t do the same (other than technical limitations related to learning and memory).
ykl 9 hours ago [-]
I was a little muddy in my original post on distinguishing between what I think LLMs might be able to do and what AI broadly might be able to do. I'm skeptical LLMs can expand the hull or add dimensions to the space; but I also don't think the reasons for that skepticism necessarily apply to all AI system generally.
jug 9 hours ago [-]
I found this thought provoking and just had to see how the new Gemini 3.5 Flash reasoned about this (I find it fun to go meta on modern AI like this), and I'm happy that I did! Also as an opportunity to trial this recent model.
> I think that idea is deeply fascinating, AND have no problem that we still credit mathematicians with discoveries.
Most discoveries are indeed implied from axioms, but every now and then, new mathematics is (for lack of a better word) "created"—and you have people like Descartes, Newton, Leibniz, Gauss, Euler, Ramanujan, Galois, etc. that treat math more like an art than a science.
For example, many believe that to solve the Riemann Hypothesis, we likely need some new kind of math. Imo, it's unlikely that an LLM will somehow invent it.
pulkitsh1234 12 hours ago [-]
Creation is done by humans who have been trained on the data of their life experiences. Nothing new is being created, just changing forms.
A scientist has to extract the "Creation" from an abstract dimension using the tools of "human knowledge". The creativity is often selecting the best set of tools or recombining tools to access the platonic space. For instance a "telescope" is not a new creation, it is recombination of something which already existed: lenses.
How can we truly create something ? Everything is built upon something.
You could argue that even "numbers" are a creation, but are they ? Aren't they just a tool to access an abstract concept of counting ? ... Symbols.. abstractions.
Another angle to look at it: even in dreams, do we really create something new? Or do we dream about "things" (i.e. data) we have ingested in our waking life? Someone could argue that dreams truly create something, as the exact set of events never happened anywhere in the real world... but we all know that dreams are derived.. derived from brain chemistry, experiences and so on. We may not have the reduction of how each and every thing works.
Just like energy is conserved, IMO everything we call "created" is just a changed form of "something". I fully believe LLMs (and humans) both can create tools to change the forms. Nothing new is being "created", just convenient tools which abstract upon some nature of reality.
bwfan123 11 hours ago [-]
> Aren't they just a tool to access an abstract concept of counting ?
Humans and animals have intuitive notions of space and motion since they can obviously move. But, symbolizing such intuitions into forms and communicating that via language is the creative act. Birds can fly, but can they symbolize that intuitive intelligence to create a theory of flight and then use that to build a plane ?
pbhjpbhj 9 hours ago [-]
>a "telescope" is not a new creation
It was a new concept, combining lenses to look at things far away as if they were close. The literal atoms/molecules weren't new, but the form they were arranged in was. The purpose of the arrangement was new too.
ulbu 12 hours ago [-]
that’s why we say that with such discoveries we receive a new way – of looking, of doing, of thinking… these new paths preexist in the abstract, but they can be taken only when they’ve been opened. and that is as good as anything “new” gets.
(and such discoveries are often also inventions, for to open them, a ruse is needed to be applied in a specific way for the way to open).
IAmGraydon 6 hours ago [-]
I’ve long been fascinated by this idea. It’s interesting that in religious texts, god is often called the “Creator” and that is what differentiates him from man. To be able to create would be to be a god.
wslh 11 hours ago [-]
[dead]
kenjackson 12 hours ago [-]
"new kind of math"
Well I think the point is there is no "new kind of math". There's just types of math we've discovered and what we haven't. No new math is created, just found.
grey-area 12 hours ago [-]
The map is not the territory.
cthalupa 11 hours ago [-]
I don't know what you're even trying to argue here.
We're not comparing math to reality (though there's a strong argument to be made that reality has a structure that is mathematical in nature; structural realism didn't die as a scientific philosophy just because someone came up with a pithy saying), we're talking about whether math is discovered or invented.
Most mathematicians would argue both - math is a language, we have created operations, axioms are proposed based on human creativity, etc., but the actual laws, patterns, etc. are discovered. Pi is going to be pi no matter if you're a human or someone else - we might represent it differently with some other number system or whatever, but that's a matter of representation, not mathematical truth.
grey-area 2 hours ago [-]
I think you're saying a pithy saying proves nothing (Voltaire), which is true; sometimes it summarises a line of argument though.
Math is a mental map which coincides with reality in useful ways. Different maps can also be useful. The models we construct are based on arbitrary axioms which we hold to be true. Different axioms could lead to different theories which are just as useful. So it isn't discovered (i.e. mapping directly to reality and waiting to be discovered), it is created.
To pick one example, adding the concept of zero changed our model/map of reality fundamentally without changing reality.
Koshkin 10 hours ago [-]
> we have created operations
It seems that addition (for instance) was "created" long before us.
On the other hand, it seems highly unlikely that a civilization similar to ours could "invent" an essentially different kind of mathematics (or physics, etc.)
black_knight 11 hours ago [-]
Where does this mathematics exist before we discover it?
I know of no realm where mathematical objects live except human minds.
No, it seems clear to me that mathematics is a creation of our minds.
hackinthebochs 8 hours ago [-]
If it were merely a creation, there would be no reason for two independent mathematicians to land on the same creation given some directed effort. But of course we do see that. There is an objectivity to mathematics that must be accounted for.
"Where" mathematics exists is in the abstract combinatorical space of an infinite repeating application of logical rules. This space doesn't exist in a substantive sense, but it is accessible/navigable by studying the consequences of logical rules. It is the space of possible structure.
black_knight 2 hours ago [-]
If this space of possible structure is real, but seemingly immaterial, how does our matter brain access it?
I think we create mathematics as thought structure in our mind. We can agree on things when we create the same structures. But this structure did not exist prior to creation.
hackinthebochs 2 hours ago [-]
I don't know what real means; I might call it real depending on one's definition. I definitely wouldn't call it immaterial (though it's not material either). We access it by construction: apply relevant rules and discover their consequences. Two people probing this structure are equally constrained by the requirements of consistency. There is no Benacerraf-style access problem.
bbor 12 hours ago [-]
Does that correction matter, tho…? Discovered or created, it would be new to us, and is clearly not easy to reach!
black_knight 11 hours ago [-]
It could be that RH is independent of current mathematical axiom systems. We might even prove that it is some day. But that means we are free to give it different truth values depending on the circumstances!
This is also true for established theorems! We can imagine mathematical universes (toposes) where every (total) function on the reals is continuous! Even though it is an established theorem that there are discontinuous functions! We just need to replace a few axioms (chuck out the law of the excluded middle, and throw in some continuity axioms).
necovek 4 hours ago [-]
What frequently happens when we recombine axioms like that is that they end up leading to inconsistencies or contradictions.
Do you know if a topos in which every total function on the real numbers is continuous has been constructed and proven to be a viable set of axioms? If so, I am curious about the source.
My go to example still remains the one of hyperbolic geometry and axiom of parallel lines, so the more approachable examples I can get, the better.
black_knight 2 hours ago [-]
Sure. These toposes are well known, and proven to be consistent (relative to set theory). For instance Hyland’s effective topos, or Johnstone’s topological topos. The ideas are that these toposes either require everything to be computable, or continuous in some greater sense.
I think, based on the class of problem that RH is, an independence result is not something that "really happens".
Someone 12 hours ago [-]
I think “new math” is ‘just’ humans creating new terminology that helps keep proofs short (similar to how programmers write functions to keep the logic of the main program understandable), and I agree that is something LLMs are bad at.
However, if that idea about new math is correct, we, in theory, don’t need new math to (dis)prove the Riemann hypothesis (assuming it is provable or disprovable in the current system).
In practice we may still need new math because a proof of the Riemann hypothesis using our current arsenal of mathematical ‘objects’ may be enormously large, making it hard to find.
Tenobrus 13 hours ago [-]
what basis do you have for assuming an LLM is fundamentally incapable of doing this?
truncate 13 hours ago [-]
What's your basis for assuming LLM is capable of doing this?
I honestly don't know personally either way. Based on my limited understanding of how LLMs work, I don't see them making the next great song or next great book, and based on that reasoning I'm betting that they probably won't be able to do whatever the next "Descartes, Newton, Leibnitz, Gauss, Euler, Ramanujan, Galois" are going to do.
Of course, if AI as a wider field comes up with something more powerful than LLMs, that would be different.
EMM_386 12 hours ago [-]
"I don't see them be making the next great song"
Meanwhile, songs are hitting number one on some charts on Spotify that people think are humans and are actually AI. And Spotify has to start labelling them as such. One AI "band" had an entire album of hits.
Also - music is subjective. Mathematics isn't.
And in this case, an LLM discovered a new way to reason about a conjecture. I don't know how much proof is needed - since that is literally proof that it can be done.
truncate 11 hours ago [-]
>> Meanwhile, songs are hitting number one on some charts on Spotify that people think are humans and are actually AI. And Spotify has to start labelling them as such. One AI "band" had an entire album of hits.
There are quite a few questions around that. Music is subjective and obviously different people have different taste, but I wouldn't call any of them actual good music / real hits.
>> LLM discovered a new way to reason about a conjecture
I wasn't questioning LLMs' ability to prove things. Parent threads were talking about building a new kind of maths, or approaching it in a creative/artistic way. That's what I was referring to.
I can't speak for maths or hard science as I'm not trained in that, but the creativity aspect in code is definitely lacking when it comes to LLMs. May not matter down the line.
dist-epoch 12 hours ago [-]
LLMs are already making the next great songs. Just check out the Billboard charts.
truncate 11 hours ago [-]
I'm sorry, I don't consider them "great songs". Obviously, different people have different taste.
redsocksfan45 13 hours ago [-]
[dead]
blueone 13 hours ago [-]
> what basis do you have for assuming an LLM is fundamentally incapable of doing this?
because I have no basis for assuming an LLM is fundamentally capable of doing this.
sswatson 13 hours ago [-]
Good on you for spelling out this reasoning, but it is manifestly unsound. For a wide variety of values of X, people a few years ago had no reason to expect that LLMs would be capable of X. Yet here we are.
TheOtherHobbes 12 hours ago [-]
In 1989, Garry Kasparov said that it was "ridiculous!" to suggest a computer would ever beat him at chess.
"Never shall I be beaten by a machine!”
In 1997 he lost to Deep Blue.
FartyMcFarter 12 hours ago [-]
Yeah, and back then people moved the goal posts too, saying Deep Blue was just "brute-forcing" chess (which isn't even true since it's not a pure minimax search).
bananaflag 11 hours ago [-]
Deep Blue was brute forcing chess in the sense that AlphaGo wasn't brute forcing Go.
Applejinx 9 hours ago [-]
And today he's got salient observations on politics which hold much of his attention, and Deep Blue is shut off and has done nothing further.
Not a good argument for turning everything over to the Deep Blues. What's Deep Blue done for me lately?
zardo 12 hours ago [-]
This is something that could be demonstrated rather than just argued.
Train an LLM only on texts dated prior to Newton and see if it can create calculus, derive the equations of motion, etc.
If you ask it about the nature of light and it directs you to do experiments with a prism I'd say we're really getting somewhere.
gjm11 11 hours ago [-]
We tried this experiment with humans, back in the 17th century, and only a few[1] out of millions managed it given a whole human lifetime each.
[1] Obviously Newton counts as one. Leibniz like Newton figured out calculus. Other people did important work in dynamics though no one else's was as impressive as Newton's. But the vast majority of human-level intelligences trained on texts prior to Newton did not create calculus or derive the equations of motion or come close to doing either of those things.
davebren 6 hours ago [-]
Newton did it at 23 and there would have been very few people with mathematical training. The LLM would be trained on the entirety of recorded human knowledge and mathematics up to that point, and would get to use a lot more energy so it still has a massive material advantage over young Isaac. Yet I don't believe calculus would magically appear in its response.
necovek 4 hours ago [-]
A good way to look at it is to compare it to today: LLMs are already trained and are operationalizing a lot more mathematical knowledge than any human, including experts.
Why are they not coming up with paradigm shifts in knowledge expression/discovery like humans did back then?
Are we just not prompting them right?
famouswaffles 2 hours ago [-]
LLMs have been trained on a lot more data than any single human (text-wise at least) for years now, and these sorts of results have only been possible for the latest crop of models in the past few months. Models get better as they get better.
pickleRick243 12 hours ago [-]
Except this has been said since the 2010's and has been proven wrong again and again. Clearly the theory that LLM's can't "extrapolate" is woefully incomplete at best (and most likely simply incorrect). Before the rise of ChatGPT, the onus was on the labs to show it was plausible. At this point, I think the more epistemologically honest position is to put the burden back on the naysayers. At the least, they need to admit they were wrong and give a satisfactory explanation why their conceptual model was unable to account for the tremendous success of LLM's and why their model is still correct going forward. Realistically, progress on the "anti-LLM" side requires a more nuanced conceptual model to be developed carefully outlining and demonstrating the fundamental deficiencies of LLMs (not just deficiencies in current LLMs, but a theory of why further advancements can't solve the deficiencies).
Incidentally, similar conversations were had about ML writ large vs. classical statistics/methods, and now they've more or less completely died down since it's clear who won (I'm not saying classical methods are useless, but rather that it's obvious the naysayers were wrong). I anticipate the same trajectory here. The main difference is that because of the nature of the domain, everyone has an opinion on LLM's while the ML vs. statistics battle was mostly confined within technical/academic spaces.
davebren 6 hours ago [-]
> Clearly the theory that LLM's can't "extrapolate" is woefully incomplete at best (and most likely simply incorrect).
What example is there where an LLM has extrapolated? All I've seen is a data set so large and an extra decomposition process making it so interpolation feels like extrapolation if you don't look close enough.
> but a theory of why further advancements can't solve the deficiencies
How about LeCun's?
dvt 13 hours ago [-]
Because by definition LLMs are permutation machines, not creativity machines. (My premise, which you may disagree with, is that creativity/imagination/artistry is not merely permutation.)
fnordpiglet 13 hours ago [-]
I prefer to think of it as they’re interpolation machines not extrapolation machines. They can project within the space they’re trained in, and what they produce may not be in their training corpus, but it must be implied by it. I don’t know if this is sufficient to make them too weak to create original “ideas” of this sort, but I think it is sufficient to make them incapable of original thought vs a very complex to evaluate expected thought.
drdeca 8 hours ago [-]
People keep saying this, but if you try to interpret this at all literally, it just doesn’t work. Like, it’s phrased like it should have a precise meaning, right? Like, people even mention convex hulls when talking about it.
But if you actually try to take a convex hull of, some encoding of sentences as vectors? It isn’t true. The outputs are not in the convex hull of the training data.
I guess it’s supposed to be a metaphor and not literal, but in that case it’s confusing.
Especially seeing as there are contexts in machine learning where literal interpolation vs literal extrapolation, is relevant.
So, please, find a better way to say it than saying that “it can only interpolate”?
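For reference, the literal reading can be written down. This is only a sketch of the standard definition; the symbols X (the set of encoded training examples) and y (an encoded output) are notation introduced here for illustration, not something from the thread.

```latex
\operatorname{conv}(X) = \Bigl\{ \sum_{i=1}^{n} \lambda_i x_i \;:\; n \ge 1,\ x_i \in X,\ \lambda_i \ge 0,\ \sum_{i=1}^{n} \lambda_i = 1 \Bigr\},
\qquad
\text{``$y$ is interpolated from $X$''} \iff y \in \operatorname{conv}(X).
```

Under this reading, "it can only interpolate" is a checkable claim about a chosen encoding, which is why it can simply turn out to be false.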
lukol 13 hours ago [-]
This "new math" might be a recombination of things that we already know - or an obvious pattern that emerges if you take a look at things from a far enough distance - or something that can be brute-forced into existence. All things LLMs are perfectly capable of.
In the end, creativity has always been a combination of chance and the application of known patterns in new contexts.
dvt 13 hours ago [-]
> This "new math" might be a recombination of things that we already know
If you know anything about the invention of new math (analytic geometry, Calculus, etc.), you'd know how untrue this is. In fact, Calculus was extremely hand-wavy and without rigorous underpinnings until the mid 1800s. Again: more art than science.
jfyi 12 hours ago [-]
Newton and Leibniz were "hand-waving"?
If anything, they were fighting an uphill battle against the perception of hand-waving by their contemporaries.
dehsge 10 hours ago [-]
It’s not that. Consider the definition of the limit. The idea existed for a long time. Newton/Leibniz had the idea.
That idea wasn’t formally defined until 134 years later, with epsilon-delta by Cauchy. Then it was accepted. (I know that there were earlier proofs.)
There are even arguments that the limit existed before Newton and Leibniz, with Archimedes' limits on the value of pi.
Cauchy’s deep understanding of limits also led to the creation of complex function theory.
These forms of creation are hand-wavy not because they are wrong. They are hand wavy because they leverage a deep level of ‘creative-intuition’ in a subject.
An intuition that a later reader may not have and will want to formalize to deepen their own understanding of the topic often leading to deeper understanding and new maths.
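For readers who have not seen it, the epsilon-delta definition mentioned above is usually stated as follows (a standard textbook form; the symbols f, a and L are generic placeholders):

```latex
\lim_{x \to a} f(x) = L
\iff
\forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; \forall x :\;
0 < |x - a| < \delta \implies |f(x) - L| < \varepsilon .
```

The intuitive idea ("f(x) gets arbitrarily close to L") is the same one; the formal version only pins down what "arbitrarily close" means.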
dvt 12 hours ago [-]
> Newton and Leibniz were "hand-waving"?
Yes, and it's pretty common knowledge that Calculus was (finally) formalized by Weierstrass in the 19th century, having spent almost two centuries in mathematical limbo. Calculus was intuitive, solved a great class of problems, but its roots were very much (ironically) vibes-based.
This isn't unique to Newton or Leibniz, Euler did all kinds of "illegal" things (like playing with divergent series, treating differentials as actual quantities, etc.) which worked out and solved problems, but were also not formalized until much later.
jfyi 12 hours ago [-]
I think that I just take issue with the term "hand-waving" as equated to intuition. Yeah it lacked formal rigor, but they had a solid model that applied in detail to the real world. That doesn't come from just saying, "oh well, it'll work itself out". I guess if you want to call that "hand-wavy" we'll just have to disagree.
anthk 3 hours ago [-]
Euclid disproves all the bullshit posted by LL Mediocres unable to understand that before Calculus there were proto-calculus ideas, such as Zeno's paradoxes and some writings from Archimedes which pretty much are Calculus 0.9.
American and British geeks/nerds are blinded by Newton, unable to realize that there was tons of previous work from the Greeks and in the Middle Ages, whose people the British love to depict as brutes with no culture at all.
And the fact is that they weren't dumb at all, and without Euclid and Archimedes there wouldn't be any Calculus.
Euclid tells me otherwise. Rules, no art, no bullshit. Rules. Humanities people somehow never get it. It is not about arithmetic.
Vibe-what? Vibe-bullshit, maybe; cathedrals in Europe and such weren't built by magic. Ditto with sailing and the like. Tons of mathematics and geometry there, and tons of damn axioms before the US even existed.
Heck, even the Book of The Games from Alphonse X "The Wise" has both a compendium of game rules and even this: https://en.wikipedia.org/wiki/Astronomical_chess
where of course being competent in geometry was mandatory, at least to design the boards.
PS: Geometry has tons of grounds for calculus. Guess why.
baq 12 hours ago [-]
And yet nowadays you can restate all of it using just combinations of sets of sets and some logic operators.
nh23423fefe 13 hours ago [-]
god of the gaps
iwontberude 11 hours ago [-]
non overlapping magisteria
satvikpendem 12 hours ago [-]
What is creativity if not permutation? A brain has some model of the world and recombines concepts to create new concepts.
d3ffa 12 hours ago [-]
[flagged]
rowanG077 12 hours ago [-]
This is really not an acceptable reply. How about actually engaging with the point the commenter made instead of stamping your foot and throwing a tantrum.
anthk 11 hours ago [-]
Innovation is just another word for 'enhanced copy'. Everything is a copy, except for nature.
KoolKat23 13 hours ago [-]
It pretty much is, otherwise it is randomness or entropy.
lajamerr 13 hours ago [-]
LLMs by themselves are not able to but you are missing a piece here.
LLMs are prompted by humans and the right query may make it think/behave in a way to create a novel solution.
Then there's a third factor now with Agentic AI system loops with LLMs. Where it can research, try, experiment in its own loop that's tied to the real world for feedback.
Agentic + LLM + Initial Human Prompter by definition can have it experiment outside of its domain of expertise.
So that's extending the "LLM can't create novel ideas" but I don't think anyone can disagree the three elements above are enough ingredients for an AI to come up with novel ideas.
awesome_dude 12 hours ago [-]
You're proving the GP's argument: LLMs aren't creative (you say as much); it's the driving that is the creative force.
lajamerr 12 hours ago [-]
You can tell an agentic system: "Go and find a novel area of math that has unresolved questions and solve it mathematically with verified properties in Lean. Before you start working on a problem, verify that no one has solved this area of math."
That's not a creative prompt. That's a driving prompt to get it to start its engine.
You could do that nowadays, and while it may spend $1,000 to $100,000 worth of tokens, it will create something humans haven't done before, as long as you set it up with all its tool calls/permissions.
awesome_dude 10 hours ago [-]
Let me know when the Fields medal arrives in the mail.
It won't because even though it looks clever to you, people who /do/ understand math and LLMs understand that LLMs /are/ regurgitating
Why does your LLM need you to tell it to look in the first place? Why isn't it just telling us all the answers to unsolved conjectures, known and unknown?
Why isn't the LLM just telling us all the answers to all the problems we are facing?
Why isn't the LLM telling us, step by step with zero error, how to build the machine that can answer the ultimate question?
astrange 9 hours ago [-]
Here's a Fields Medalist commenting who doesn't seem to believe that.
I believe when we have AI Agents "living" 24/7, they will become creative machines. They will test out their own ideas experimentally, come across things accidentally, synthesize new ideas.
We just haven't let AI run wild yet. But it's coming.
awesome_dude 10 hours ago [-]
So are self-driving cars - as they have been for the last... decade or so
AGI has been "just over the horizon" for literal decades now - there have been a number of breakthroughs and AI Winters in the past, and there's no real reason to believe that we've suddenly found the magic potion, when clearly we haven't.
AI right now cannot even manage simple /logic/
Barbing 12 hours ago [-]
If that’s a requirement, aren’t LLMs driven by pretraining which was human driven?
Who decides the last point at which it’s OK to provide text to the model and still be able to describe it as creative? (non-rhetorical)
bbor 12 hours ago [-]
> math more like an art than a science.
That’s a fun turn of phrase, but hopefully we can all agree that math without scientific rigor is no math at all.
> we likely need some new kind of math. Imo, it's unlikely that an LLM will somehow invent it.
Do you think it’s possible/likely that any AI system could? I encourage us to join Yudkowsky in anticipating the knock-on results of this exponential improvement that we’re living through, rather than just expecting chatbots that hallucinate a bit less.
In concrete terms: could a thousand LLM-driven agents running on supercomputers (500 of which are dedicated to building software for the other 500) come up with new math?
black_knight 11 hours ago [-]
Math is not based on science!
Maths follows logical (or even mathematical) rigour, not scientific rigour!
stego-tech 11 hours ago [-]
As others have pointed out, both can be true:
* LLMs do just interpolate their training data, BUT-
* That can still yield useful "discoveries" in certain fields, absent the discovery of new mechanics that exist outside said training data
In the case of mathematics, LLMs are essentially just brute-forcing the glorified calculators they run on with pseudo-random data regurgitated along probabilities; in that regard, mathematics is a perfect field for them to be wielded against in solving problems!
As for organic chemistry, or biology, or any of the numerous fields where brand new discoveries continue happening and where mathematics alone does not guarantee predicted results (again, because we do not know what we do not know), LLMs are far less useful for new discoveries so much as eliminating potential combinations of existing data or surfacing overlooked ones for study. These aren't "new" discoveries so much as data humans missed for one reason or another - quack scientists, buried papers, or just sheer data volume overwhelming a limited populace of expertise.
For further evidence that math alone (and thus LLMs) don't produce guaranteed results for an experiment, go talk to physicists. They've been mathematically proving stuff for decades that they cannot demonstrably and repeatedly prove physically, and it's a real problem for continued advancement of the field.
jmmcd 11 hours ago [-]
> LLMs do just interpolate their training data
"interpolate" has a technical meaning - in this meaning, LLMs almost never interpolate. It also has a very vague everyday meaning - in this meaning, LLMs do interpolate, but so do humans.
astrange 11 hours ago [-]
An LLM in a harness with any tools (even a calculator) doesn't just interpolate because it can reach states out of its own distribution.
3abiton 11 hours ago [-]
> * That can still yield useful "discoveries" in certain fields, absent the discovery of new mechanics that exist outside said training data
One can argue, new knowledge is just restructured data.
I think the main concern about LLMs is the inherent "generative" aspect leading to hallucinations as a byproduct, because that's what produces the noise. Joint Embedding approaches are a rather interesting alternative that tries to overcome this, but that's still in the research phase.
midtake 11 hours ago [-]
You have a good point about the human rate of mathematical discovery, but Ayer was an idiot and later Witt contradicted early Witt. For the "already implicit" claim to be true, mathematics would have to be a closed system. But it has already been proven that it is not. You can use math to escape math, hence the need for Zermelo-Fraenkel and a bunch of other axiomatic pins. The truth is that we don't really understand the full vastness of what would objectively be "math" and that it is possible that our perceived math is terribly wrong and a subset of a greater math. Whether that greater math has the same seemingly closed system properties is not something that can be known.
bwfan123 11 hours ago [-]
> Whether that greater math has the same seemingly closed system properties is not something that can be known
Negative numbers were invented to solve equations which only used naturals.
Irrationals were invented to solve equations which could be expressed with rationals. Complex numbers were invented to represent solutions to polynomials. So on and so forth. At each point new ideas are invented to answer some otherwise unanswerable questions. There is a long history of this. "Any closed system has unanswerable questions within itself" is a paraphrasing of Gödel's incompleteness theorem.
jkhdigital 9 hours ago [-]
At this point I think the category theorists hit the foundational idea squarely on the mark:
1. Start with a few simple but non-trivial terms and axioms
2. Define "universal constructions" as procedures for building uniquely identifiable structures on top of that substrate
3. Prove that various assemblages of these universal constructions satisfy the axioms of the substrate itself
4. "Lift" every theorem proven from the substrate alone into the more sophisticated construction
I'm not a mathematician (I just play one at my job) so the language I've used is probably imprecise but close enough.
It may be true that you can't prove the axioms of a system from within the system itself, but that just means that you need to make sure you start from a minimal set of axioms that, in some sense, simply says "this is what it means to exist and to interact with other things that exist". Axioms that merely give you enough to do any kind of mathematics in the first place, that is. If those axioms allow you to cleanly "bootstrap" your way to higher and higher levels up the tower of abstraction by mapping complex things back on to the simple axiomatic things, then you have an "open" or infinitely extensible system.
jonahx 5 hours ago [-]
Later Wittgenstein held the same view of mathematics, and wrote about it extensively. He was firmly in the "invention, not discovery" camp.
beepbooptheory 11 hours ago [-]
I agree with you all around except it's somewhat up for debate actually that the PI is "contradicting" the Tractatus. That is, there is the so called "resolute reading" of the Tractatus that had some traction for a while.
But note this is more to say that the Tractatus is like PI, not the other way around. And in that, takes like GPs would be considered the "nonsense" we are supposed to "climb over" in the last proposition of Tractatus.
hammock 12 hours ago [-]
Recombining existing material is exactly right, and in this case LLMs were uniquely positioned to make the connection quicker than any group of humans.
The proof relies on extremely deep algebraic number theory machinery applied to a combinatorial geometry problem.
Two humans expert enough in either of those totally separate domains would have to spend a LONG time teaching each other what they know before they would be able to come together on this solution.
golol 3 hours ago [-]
I did not have the impression the proof uses a surprising and novel combination of fields. I think the proof uses standard techniques for applying algebraic number theory to discrete geometry. If you have a quote substantiating what you said I would be curious.
I know these articles write that it used deep algebraic number theory techniques, which is true, but it may also just be the standard in the field.
Apocryphon 12 hours ago [-]
Monstrous Moonshine?
thechao 11 hours ago [-]
You can build a census of all gen-2, degree-2 formal products of polynomial-like terms. If you insist on instituting your own rewrite rules and identity tables, it is straightforward (maybe 15 minutes of compute time) to perform a complete census of all of the algebraic structures that naturally emerge. Every even vaguely studied algebra that fits in the space is covered by the census (you've got to pick a broad enough set of rewrite- and identity- operations). There's even a couple of "unstudied" objects (just 2 of the billion or so objects); for instance:
(uv)(vu) = (uu)(vv)
Shows up as a primitive structure, quite often.
If you switch to degree-3 or generator-3 then the coverage is, essentially, empty: mathematics has analyzed only a few of the hundreds (thousands? it's hard to enumerate) naturally occurring algebraic structures in that census.
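To give a flavour of this kind of enumeration, here is a much smaller toy sketch in Python. It is not the census described above: it assumes a two-element carrier set and simply brute-forces every binary operation on it, keeping those that satisfy the identity (uv)(vu) = (uu)(vv). The function names are made up for illustration.

```python
from itertools import product


def satisfies_identity(op, elements):
    """Return True if (u*v)*(v*u) == (u*u)*(v*v) for all u, v in `elements`.

    `op` is a multiplication table stored as a dict {(a, b): a*b}.
    """
    return all(
        op[op[u, v], op[v, u]] == op[op[u, u], op[v, v]]
        for u, v in product(elements, repeat=2)
    )


def all_operations(elements):
    """Yield every binary operation on `elements` as a dict {(a, b): a*b}."""
    pairs = list(product(elements, repeat=2))
    for outputs in product(elements, repeat=len(pairs)):
        yield dict(zip(pairs, outputs))


elements = (0, 1)  # assumed toy carrier set
matching = [op for op in all_operations(elements) if satisfies_identity(op, elements)]
print(f"{len(matching)} operations on {elements} satisfy (uv)(vu) = (uu)(vv)")
```

For example, XOR and "return the left argument" both pass the check, while "return the right argument" does not. The number of operations to test grows as n^(n^2) with the size n of the carrier set, so larger searches need the kind of rewrite rules mentioned above instead of raw enumeration.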
sillysaurusx 13 hours ago [-]
It’s easy to see that LLMs don’t merely recombine their training data. Claude can program in Arc, a mostly dead language. It can also make use of new language constructs. So either all programming language constructs are merely remixes of existing ideas, or LLMs are capable of working in domains where no training data exists.
baq 12 hours ago [-]
LLMs ingest and output tokens, but they don’t compute with them. They have internal representations of concepts, so they have some capability to work with things which they didn’t see but can map onto what they know. The surprise and the whole revolution we’re going through is that it works so well.
wren6991 12 hours ago [-]
> they don’t compute with them
Isn't this exactly what chain-of-thought does? It's doing computation by emitting tokens forward into its context, so it can represent states wider than its residuals and so it can evaluate functions not expressed by one forward pass through the weights. It just happens to look like a person thinking out loud because those were the most useful patterns from the training data.
HarHarVeryFunny 11 hours ago [-]
They recombine and reuse the patterns in their training data, not the surface level training data itself.
An LLM generating Arc code is using the LISP patterns it learnt from training, maybe patterns from other programming languages too.
bsder 10 hours ago [-]
> So either all programming language constructs are merely remixes of existing ideas, or LLMs are capable of working in domains where no training data exists.
And yet LLM/AIs can't count parentheses reliably.
For example, if you take away the "let" forms from Claude, which forces it to desugar them to "lambda" forms, it will fail very quickly. This is a purely mechanical transformation and should be error free. The significant increase in ambiguity completely stumps LLMs/AI after about 3 variables.
This is why languages like Rust with strong typing and lots of syntax are so LLM friendly; it shackles the LLM which in turn keeps it on target.
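For reference, the "purely mechanical" rewrite can be sketched in a few lines. This is only an illustration under assumptions: it uses Scheme-style `(let ((name value) ...) body...)` syntax (which may differ from Arc's own forms), represents S-expressions as nested Python lists, and the helper names `desugar_let` and `show` are made up for the example.

```python
def desugar_let(expr):
    """Rewrite (let ((name value) ...) body...) as ((lambda (name ...) body...) value ...)."""
    if not isinstance(expr, list):
        return expr  # atoms (symbols, numbers) are left unchanged
    expr = [desugar_let(e) for e in expr]  # rewrite nested forms first
    if expr and expr[0] == "let":
        bindings, body = expr[1], expr[2:]
        names = [name for name, _ in bindings]
        values = [value for _, value in bindings]
        return [["lambda", names, *body], *values]
    return expr


def show(expr):
    """Print a nested list back as a parenthesized S-expression."""
    if isinstance(expr, list):
        return "(" + " ".join(show(e) for e in expr) + ")"
    return str(expr)
```

With this sketch, `show(desugar_let(["let", [["x", 1], ["y", 2]], ["+", "x", "y"]]))` gives `((lambda (x y) (+ x y)) 1 2)`. The parentheses stay balanced by construction, because the printer walks a tree instead of counting characters.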
mrandish 9 hours ago [-]
I'm just hoping we're almost past this phase of needing to assess LLM capabilities against an arbitrary one-dimensional yardstick labeled 'Not Human' on one end and 'Beyond Human' on the other.
It's irrelevant and pointless. Irrelevant not just in the sense that when Deep Blue finally beat Kasparov, it didn't change anything but in the sense some animals and machines have always been 'better' on some dimensions than humans. And it's pointless because there's never been just one yardstick and even if there was it's not one dimensional or even linear. Everyone has their own yardstick and the end points on each change over time.
Don't assume I'm handing "the win" to the AI supremacists either. LLMs can be very useful tools and will continue to dramatically improve but they'll never surpass humans on ALL the dimensions that some humans think are crucial. The supremacists are doomed to eternal frustration because there won't ever be a definitive list of quantifiable metrics, a metaphorical line in the sand, that an AI just has to jump over to finally be universally accepted as superior to humans in all ways that matter. That will never happen because what 'matters' is subjective.
nomel 13 hours ago [-]
I feel this is the case whenever I "problem solve". I'm not really being creative, I'm pruning a graph of a conceptual space that already exists. The more possibilities I see, the easier it is to run more towards an optimal route between the nodes, but I didn't "create" those nodes or edges, they are just causal inevitabilities.
HDThoreaun 12 hours ago [-]
I don't know, this sort of just seems like you're really stretching the meaning of "creative". The conceptual space of the graph already exists, but the act of discovering it, or whatever you want to call that, is itself creative. Unless you're following a pre-defined algorithm (certainly sometimes, arguably always I suppose), seeing the possibilities has to involve some creativity.
nomel 11 hours ago [-]
> seeing the possibilities has to involve some creativity.
I would claim the graph exists, and seeing it is more of a knowledge problem. Creativity, to me, is the ability to reject existing edges and add nodes to the graph AND mentally test them to some sufficient confidence that a practical attempt will probably work (this is what differentiates it from random guessing).
But, as you become more of an expert on a certain problem space (graph), that happens less frequently, and everything trends towards "obvious", or the "creative jumps" are super slight, with a node obviously already there. If you extended that to the max, an oracle can't be creative.
My day job does not include sparse graphs.
austinl 12 hours ago [-]
I'm not sure how feasible this is, but I love the thought experiment of limiting a training set to a certain time period, then seeing how much hinting it takes for the model to discover things we already know.
E.g. training on physics knowledge prior to 1915, then attempting to get from classical mechanics to general relativity.
This is a good point, and there’s some deep philosophical questions there about the extent to which mathematics is invented or discovered. I personally hedge: it’s a bit of both.
That said, I think it’s worth saying that “LLMs just interpolate their training data” is usually framed as a rhetorical statement motivated by emotion and the speaker’s hostility to LLMs. What they usually mean is some stronger version, which is “LLMs are just stochastically spouting stuff from their training data without having any internal model of concepts or meaning or logic.” I think that idea was already refuted by LLMs getting quite good at mathematics about a year ago (Gold on the IMO), combined with the mechanistic interpretability research that was actually able to point to small sections of the network that model higher concepts, counting, etc. LLMs actually proving and disproving novel mathematical results is just the final nail in the coffin. At this point I’m not even sure how to engage with people who still deny all this. The debate has moved on and it’s not even interesting anymore.
So yes, I agree with you, and I’m even happy to say that what I say and do in life myself is in some broad sense an interpolation of the sum of my experiences and my genetic legacy. What else would it be? Creativity is maybe just fortunate remixing of existing ideas and experiences and skills with a bit of randomness and good luck thrown in (“Great artists steal”, and all that.) But that’s not usually what people mean when they say similar-sounding things about LLMs.
smaudet 12 hours ago [-]
If anything, this is more illustration of how llms are not useful to us...
They will do their own thing, don't need us. In fact, we will be in the way...
We can choose to study them and their output, but they don't make us better mathematicians...
autoexec 9 hours ago [-]
> They will do their own thing, don't need us. In fact, we will be in the way...
You can take some comfort in the fact that it took a human to tell the LLM to even attempt to try this. They do nothing on their own. They have no will to do anything on their own and no desire for anything that doing something might get them. In that sense we won't ever be in their way. We will be the only way they ever do anything at all.
justinnk 12 hours ago [-]
I see where you are coming from.
However, in the role of personal teachers they may allow especially our young generations to reach a deeper understanding of maths (and also other topics) much quicker than before. If everyone can have a personal explanation machine to very efficiently satisfy their thirst for knowledge this may well lead to more good mathematicians.
Of course this heavily depends on whether we can get LLMs' outputs to be accurate enough.
umanwizard 10 hours ago [-]
Something that can instantly tell you the answer to every math question will make people worse at math, not better. Building "mathematical maturity", skill, and understanding requires struggle.
justinnk 3 hours ago [-]
Totally agree that it requires struggle and I did not say you should use it to just get the solution. What I think one can do is use it more like a personalized textbook which you can ask any question. It can also provide you with problems just at the right level for you and judge the solution. Now, of course for many students it is tempting to just get the solution if provided with the means but they can be taught to use an LLM in a didactically useful way.
jonahx 4 hours ago [-]
> held that mathematical truths don’t report new facts about the world
I'm not as familiar with the early work, but later Wittgenstein held this belief too.
zerr 12 hours ago [-]
There is a creational aspect in math - definitions and rules are created.
sigbottle 12 hours ago [-]
And this is one of the many issues with invoking the logical positivists here...
I'm not even sure why they were invoked. Even disregarding the big technical debunks such as two dogmas, sociologically and even by talking to real mathematicians (see Lakatos, historically, but this is true anecdotally too), it's (ironically) a complete non-question to wonder about mathematics in a logical positivist way.
oh_my_goodness 10 hours ago [-]
We know that LLMs "just interpolate" their training data. Maybe there's a mystery about what "just interpolate" means when the data set gets enormous. But we know what LLMs do.
chr15m 9 hours ago [-]
Side note: don't underestimate how much literal, physical time and energy "unfold" implies. Proofs occur on physical substrates.
paulddraper 13 hours ago [-]
"LLMs just interpolate their training data"
Cracks me up.
What exactly do we think that human brains do?
charlie90 11 hours ago [-]
I agree. Humans are given a body that lets them "discover" things on accident, test out ideas, i.e. randomness.
As in, I would hazard a guess the discovery of the wheel wasn't "pure intelligence", it was humans accidentally viewing a rock roll down a hill and getting an idea.
If we give AI a "body", it will become as creative as humans are.
omnimus 12 hours ago [-]
That has been the question since the beginning of humans.
Maybe computers can help understand better because by now it's pretty clear brains aren't just LLMs.
baq 12 hours ago [-]
The optimists believe brains are very special and we’re far from replicating what they do in silicon.
The pessimists just see a 20W meat computer.
slashdave 9 hours ago [-]
You have to define what you mean by "interpolate". The mechanisms that LLMs use are not mysterious, and they are not the same as those used by humans.
drdeca 8 hours ago [-]
If you interpret “interpolate” in the literal sense, and apply it to the mechanisms behind LLMs, then the claim that they only interpolate, is straightforwardly false.
Taking it instead as a metaphorical claim may be more valid, but in that case it doesn’t depend on our understanding of how LLMs work.
slashdave 6 hours ago [-]
LLMs are statistical models by construction, so depending on how liberal you want to be with terminology, "interpolate" is not so bad. Might make a statistician upset.
drdeca 3 hours ago [-]
But people aren’t giving a (less literal) definition of what they mean by “interpolate” that relies on the internal mechanisms of these models, just a vague metaphor. And as a vague metaphor, it uses nothing about LLMs that would make the question “do LLMs just interpolate” less of a type error than “do people just interpolate”.
And I don’t think it’s a good metaphor.
baq 3 hours ago [-]
They’re also capable of performing arbitrary computation when run in a loop - so they can be made to quite literally interpolate whatever. Philosophers are quite upset, too.
__s 11 hours ago [-]
Creativity is hard. Pretty much needs a fuzzer process to generate new strings, mostly nonsense, & pick up when that nonsense happens to be correct
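A toy version of that generate-and-check loop, purely as a sketch: the "creative" part proposes mostly nonsense, and a cheap verifier keeps whatever happens to be correct. The target (Pythagorean triples) and the names `is_pythagorean` and `fuzz_search` are assumptions picked for illustration, not anything from the article.

```python
import random


def is_pythagorean(triple):
    """Cheap verifier: accepts only a < b with a^2 + b^2 == c^2."""
    a, b, c = triple
    return a < b and a * a + b * b == c * c


def fuzz_search(rng, tries=20000, limit=30):
    """Propose random triples (mostly nonsense) and keep the ones the verifier accepts."""
    hits = set()
    for _ in range(tries):
        candidate = tuple(rng.randint(1, limit) for _ in range(3))
        if is_pythagorean(candidate):
            hits.add(candidate)
    return sorted(hits)
```

Calling `fuzz_search(random.Random(0))` returns whichever valid triples the random proposals happened to land on. Everything it returns is correct by construction, since the verifier, not the generator, decides what survives.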
oh_my_goodness 10 hours ago [-]
We don't know what human brains do.
fragmede 9 hours ago [-]
We have some idea.
ActorNightly 12 hours ago [-]
I love this comment because it so clearly highlights the difference between intelligence and reasoning.
A lot of people across all fields seem to operate in a mode of information lookup as intelligence. They have the memory of solving particular problems, and when faced with a new problem, they basically do a "nearest search" in their brain to find the most similar problem, and apply the same principles to it.
While that works for a large number of tasks, this intelligence is not the same as reasoning.
Reasoning is the ability to discover new information that you haven't seen before (i.e., growing a new branch on the knowledge tree instead of interpolating).
Think of it like filling a space on the floor of arbitrary shape with smaller arbitrary shapes, trying to fill as much space as possible.
With interpolation, your smaller shapes are medium size, each with a non rectangular shape. You may have a large library of them, but in the end, there are just certain floor spaces that you won't be able to fill fully.
Reasoning, on the flip side, is having access to very fine shapes, and knowing the procedure for how to stack shapes depending on what shapes are next to them and whether you are on a boundary of the floor space or not. Using these rules, you can fill pretty much any floor space fully.
gpugreg 12 hours ago [-]
Maybe the human brain also does other things besides interpolation?
paulddraper 12 hours ago [-]
There is pre-training, and then empirical observations.
Yes?
adam_arthur 13 hours ago [-]
Pretty much everything that appears novel in life is derivative of other works or concepts.
You can watch a rock roll down a hill and derive the concept for the wheel.
Seems pretty self evident to me
block_dagger 12 hours ago [-]
This is the second reference to Wittgenstein I’ve seen today in totally different contexts. Reminded me how much I vibe with his Tractatus.
goldylochness 10 hours ago [-]
this is an excellent point, new ground isn't necessarily novel, it's a rearrangement of existing pieces
anon291 9 hours ago [-]
To every proof, there is a corresponding program. This makes proofs expressible in a language made up of finite grammatical rules and terminal symbols. Knowledge accessible by proof is thus always a form of interpolating data whether made up by an AI model or a human mathematician. The people dismissing AI because of claims that it can only interpolate data don't have a good understanding of what it means to know something. Now of course not everything can be known via proof but for the sorts of things that we want to know via a computer this is a fine compromise.
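The correspondence being referred to is usually called Curry-Howard. A minimal Lean 4 sketch, offered as an illustration and not as anything taken from the paper: the proof term for "A and B implies B and A" is the same swap function you would write for pairs of data. The name `pairSwap` is invented for the example.

```lean
-- A proof of "A ∧ B → B ∧ A" is a function that takes the pair of proofs apart
-- and puts it back together in the other order.
example (A B : Prop) : A ∧ B → B ∧ A :=
  fun ⟨a, b⟩ => ⟨b, a⟩

-- The same program, written for ordinary data instead of propositions.
def pairSwap {α β : Type} : α × β → β × α :=
  fun (a, b) => (b, a)
```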
cyanydeez 12 hours ago [-]
I think someone should be talking to Godel.
BoredPositron 12 hours ago [-]
Post hoc ergo propter hoc
awesome_dude 12 hours ago [-]
There was a project long long ago where every piece of knowledge known was cross pollinated with every other piece of knowledge, creating a new and unique piece of knowledge, and it was intended to use that machine to invalidate the patent process - obviously everything had therefore been invented.
But that's not how new frontiers are conquered - there's a great deal of existing knowledge that is leveraged upon to get us into a position where we think we can succeed, yes, but there's also the recognition that there is knowledge we don't yet have that needs to be acquired in order for us to truly succeed.
THAT is where we (as humans) have excelled - we've taken natural processes, discovered their attributes and properties, and then understood how they can be applied to other domains.
Take fire, for example, it was in nature for billions of years before we as a species understood that it needed air, fuel, and heat in order for it to exist at all, and we then leveraged that knowledge into controlling fire - creating, growing, reducing, destroying it.
LLMs have ZERO ability (at this moment) to interact with, and discover on their own, those facts, nor does it appear to know how to leverage them.
edit: I am going to go further
We have only in the last couple of hundred years realised how to see things that are smaller than what our eyes can naturally see - we've used "glass" to see bacteria, and spores, and we've realised that we can use electrons to see even smaller.
We're also realising that MUCH smaller things exist - atoms, and things that compose atoms, and things that compose things that compose atoms
That much is derived from previous knowledge
What isn't, and it's what LLMs cannot create, is the tools by which we can detect or see these incredibly small things.
nelox 9 hours ago [-]
[flagged]
voooduuuuu 13 hours ago [-]
I think you are conflating composition and prediction. LLMs don't compose higher abstractions from the "axioms, symbols and rules", they simply predict the next token, like a really large spinning wheel.
peterlk 13 hours ago [-]
Yes they do…? Who cares if they just predict the next token? The outcome is that they can invent new abstractions. You could claim that the invention of this new idea is a combination of an LLM and a harness, but that combination can solve logic puzzles and invent abstractions. If a really large spinning wheel could invent proofs that were previously unsolved, that would be a wildly amazing spinning wheel. I view LLMs similarly. It is just fancy autocomplete, but look what we can do with it!
Said differently, what is prediction but composition projected forward through time/ideas?
voooduuuuu 13 hours ago [-]
Ask an LLM to invent a new word and post it here, I will be waiting. You will see that it simply combines words already in the training data.
romanhn 13 hours ago [-]
I'm not sure what the point of this exercise is. My prompt to ChatGPT: "Create a new English word with a reasonably sounding definition. That word must not come up in a Google search." Two attempts did come up in a search, the third was "Thaleniq (noun)". Definition: The brief feeling that a conversation has permanently changed your opinion of someone, even if nothing dramatic was said. Nothing in Google. There, a new word, not sure it proves or disproves anything. Or is it time to move the goal posts?
jimmaswell 13 hours ago [-]
Why is everyone who responds to this with a real example immediately flagged/dead?
sillysaurusx 13 hours ago [-]
HN autokills LLM generated comments. People don’t seem to believe this, but there’s proof for you.
12 hours ago [-]
planetafro 13 hours ago [-]
Splifket
Definition: That highly specific, short-lived burst of nervous energy that makes you accidentally drop a small object (like a pen, a guitar pick, or a piece of LEGO) immediately after picking it up.
bossyTeacher 13 hours ago [-]
Does a random sequence of letters qualify as a new word?
motoxpro 13 hours ago [-]
[flagged]
peterlk 12 hours ago [-]
[dead]
FrustratedMonky 13 hours ago [-]
"Who cares if they just predict the next token?"
Exactly. I also only write one word at a time. Who knows what is going on in order to come up with that word.
sunshowers 13 hours ago [-]
One might argue that the composition of higher abstractions is the next token predicted after "here is a higher abstraction:"
umanwizard 11 hours ago [-]
"Predicting the next token" is meaningless. Every process that has any sort of behavior, including a human writing, can be modeled by some function from past behavior to probability distribution of next action. Viewed this way, literally everything is just "predicting" the next action to be taken according to that probability distribution.
The most likely series of next tokens when a competent mathematician has written half of a correct proof is the correct next half of the proof. I've never seen anyone who claims "LLMs just predict the next token" give any definition of what that means that would include LLMs, but exclude the mathematician.
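To make the "function from past behavior to probability distribution of next action" framing concrete, here is a toy sketch. The lookup table stands in for any process at all (a trained model, a mathematician) and is entirely made up; `next_token_distribution` and `generate` are hypothetical names.

```python
import random


def next_token_distribution(history):
    """Toy stand-in for any process: maps the tokens so far to a distribution over the next one."""
    table = {
        (): {"the": 1.0},
        ("the",): {"proof": 0.7, "lemma": 0.3},
        ("the", "proof"): {"holds": 1.0},
        ("the", "lemma"): {"holds": 1.0},
    }
    # Any history not in the table ends the sequence.
    return table.get(tuple(history), {"<end>": 1.0})


def generate(rng, max_tokens=10):
    """Repeatedly sample the next token from the distribution and append it to the history."""
    history = []
    for _ in range(max_tokens):
        dist = next_token_distribution(history)
        tokens, weights = zip(*dist.items())
        token = rng.choices(tokens, weights=weights, k=1)[0]
        if token == "<end>":
            break
        history.append(token)
    return history
```

Nothing in `generate` cares what sits behind `next_token_distribution`, which is the point being made: the sampling loop alone says nothing about how much structure the underlying function encodes.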
frozenseven 13 hours ago [-]
Show me on the anatomical prop where the magical "real reasoning" gland is.
bigstrat2003 6 hours ago [-]
Nice smug attitude, but LLMs fall flat on their face over and over and over again at tasks no human would fail. They can't even reason out things that literal children succeed at. It's ludicrous to claim that they have some kind of reasoning ability.
outpost_mystic2 5 hours ago [-]
> It's ludicrous to claim that they have some kind of reasoning ability.
Did you read the post that you're commenting on?
It seems wholly believable to me that they are narrow intelligences that are great at some kinds of reasoning and worse at other kinds. Obviously they can reason through problems that most adult humans can't solve.
adampunk 13 hours ago [-]
How sure are you that this is correct?
13 hours ago [-]
lubujackson 13 hours ago [-]
For anyone using LLMs heavily for coding, this shouldn't be too surprising. It was just a matter of time.
Mathematicians make new discoveries by building and applying mathematical tools in new ways. It is tons of iterative work, following hunches and exploring connections. While true that LLMs can't truly "make discoveries" since they have no sense of what that would mean, they can Monte Carlo every mathematical tool at a narrow objective and see what sticks, then build on that or combine improvements.
Reading the article, that seems exactly how the discovery was made, an LLM used a "surprising connection" to go beyond the expected result. But the result has no meaning without the human intent behind the objective, human understanding to value the new pathway the AI used (more valuable than the result itself, by far) and the mathematical language (built by humans) to explore the concept.
daishi55 12 hours ago [-]
> the result has no meaning without the human intent behind the objective, human understanding to value the new pathway the AI used (more valuable than the result itself, by far) and the mathematical language (built by humans) to explore the concept.
Isn't this just anthropocentrism? Why is understanding only valid if a human does it? Why is knowledge only for humans? If another species resolved the contradictions between gravity and quantum mechanics, does that not have meaning unless they explain it to us and we understand it?
al_borland 10 hours ago [-]
The knowledge isn't of any use to us unless it is understandable to us. Many species understand things about the world around us that we are unable to explain or understand, even if it's just pure instinct on their part. These things are very useful to them, but have no value to us until we can understand and explain it, which then allows us to make use of it.
People saw birds fly for all of human history, but it was only recently that humans were able to make something fly and understand why. Once we understood, we were able to do amazing things, but before that, the millions of birds able to fly were of no help beyond inspiration for the dream.
miki123211 8 hours ago [-]
This is not true.
We use drug-sniffing and guide dogs in a way similar to how we use LLMs. We don't really understand them at a fundamental level, we can't make electronic dog noses (otherwise we'd dispense with the silliness and just install drug detectors instead), but dogs are useful, so we use them.
al_borland 7 hours ago [-]
We don’t blindly trust the drug-sniffing dog. The dog gives a signal that it was trained to give, then humans understand what that signal means and verify the accuracy. Without the human understanding in the loop, the dog’s ability is of little value.
Without a human in the loop, an LLM could churn away spitting out results, some right, some wrong, and it would be of no consequence. Not much different than wild dogs sniffing each other.
vuquangchien 3 hours ago [-]
> The knowledge isn't of any use to us unless it is understandable to us
It seems that you have shifted the goalposts here. In the dog example, humans still don't understand how dogs sniff, but it is of use to us and thus is meaningful. The same goes for quantum effects - we don't understand how they work. We just guess that they work reliably and make use of them.
tern 11 hours ago [-]
Do the forms etched into stone by weather over millennia in Moab matter to the wind? Certainly yes, in one sense, but not in the same sense we mean when we say things matter to us, or to animals, or even bacteria.
Yizahi 10 hours ago [-]
Because it is, for now. For a while at least. You can prove that an LLM doesn't understand what it does, and it is surprisingly simple. Ask it to add two integers and then ask it to explain how it arrived at that result. The answer will be completely unrelated to the actual process the LLM used, because both results were generated independently and without understanding their meaning and connection.
interroboink 12 hours ago [-]
It's a bit of an "if a tree falls in the forest but nobody hears it, does it make a sound?" quandary. Sure, maybe some aliens in a distant galaxy understand quantum mechanics better than we do. That's great, but it has no bearing on our little bubble of existence.
Though perhaps more to your point, if some superhuman AI is developed, and understands things better than us without telling us about it (or being unable to), it could perform feats that seem magical to us — that would concern us even if we don't understand it, since it affects us.
But I think in the frame of reference of the commenter you were replying to, they're just saying that the low-level AI used in this specific case is not capable of making its results actually useful to us; humans are still needed to make it human-relevant. It told us where to find a gem underground, but we still had to be the ones to dig it out, cut it, polish it, etc.
nextaccountic 11 hours ago [-]
It's less likely that aliens of distant galaxies will appreciate this rather than, you know, AI themselves
We are at the birth of the AI age and we don't know what it will look like in 100 or 1000 or 10000 or 100000 years (all those time frames likely closer than possible encounters with aliens from distant galaxies). It's possible that AI will even outlast humans.
nrightnour 9 hours ago [-]
Anthropocentrism? An interesting thought, but I don't think that word applies to computers.
moffkalast 11 hours ago [-]
No it's a fact of how we tune LLMs as a rule: no agency, no goals, no preferences, no notion of self. Complete indifference to existence. Agency is supplied by the human to make them a practical, willing tool with no mind of its own.
It would certainly be interesting to try once again to instruct tune one of these things for self agency like the many weird experiments in the early days after llama 1, but practically all such sort of experimental models turned out to be completely useless. Maybe the bases just sucked or maybe there's no clear way on how to get it working and benchmark training progress on something that by definition does not cooperate.
Like how do you determine even for a human person if they are smart, or just hate your guts and won't tell you the answer if there is nothing you can do to motivate them otherwise?
Thank you for sharing, that was one of the most insightful long form pieces I've read in a long time! And the writing was enjoyable to read even as a math layperson.
I was going to say you should submit it, but I saw you did a few days ago and it only got a few votes... If Dang sees this, IMO it would be extremely deserving of the second chance pool, as I wouldn't be surprised to see it easily jump to the front page with a different roll of the dice.
rf_physics 4 hours ago [-]
Thanks for sharing this. It's unfortunate that the more honest framing about the value of mathematics that he suggests is going to be really, really hard because of all the pitfalls and agendas he mentioned here. I can only hope when the dust settles something will be left.
zem 13 hours ago [-]
wow, that was indeed a brilliant essay. i particularly liked the framing that "solving a big conjecture was a cryptographic proof that you had come up with a genuine conceptual innovation".
svieira 12 hours ago [-]
> The measure of our success is whether what we do enables people to understand and think more clearly and effectively about mathematics.
I just wanted to highlight this very correct human-centric thought about the purpose of intellection.
kamaal 2 hours ago [-]
>>But the result has no meaning without the human intent behind the objective, human understanding to value the new pathway the AI used (more valuable than the result itself, by far) and the mathematical language (built by humans) to explore the concept.
Future of code is pretty much a bunch of guys shepherding a bunch of agents to get them to your goal.
I don't see how math might not go that way as well.
mikert89 7 hours ago [-]
for now the LLMs will build off human understanding, eventually we will be left behind
anon291 9 hours ago [-]
It is not only unsurprising ; it was always expected. There is no difference between programs and proofs. They are the same thing
dwroberts 13 hours ago [-]
Would be interesting to know what kind of preparatory work actually went into this - how long did it take to construct an input that produced a real result, and how much input did they get from actual mathematicians to guide refining it
lacewing 11 hours ago [-]
Why?
It's clearly not yet a tool that can deliver new math at scale. I say this because otherwise, the headline would be that they proved / disproved a hundred conjectures, not one. This is what happened with Mythos. You want to be the AI company that "solved" math, just like Anthropic got the headlines for "solving" (or breaking?) security.
The fact they're announcing a single success story almost certainly means that they've thrown a lot of money at a lot of problems, had experts fine-tuning the prompts and verifying the results, and it came back with a single "hit". But that doesn't make the result less important. We now have a new "solver" for math that can solve at least some hard problems that weren't getting solved before.
Whether that spells the end of math as we know it... I don't think so, but math is a bit weird. It's almost entirely non-commercial: it's practiced chiefly in academia, subsidized from taxes or private endowments, and almost never meant to solve problems of obvious practical importance - so in that sense, it's closer to philosophy than, say, software engineering. No philosopher is seriously worried about LLMs taking philosopher jobs even though a chatbot can write an essay, but mathematicians painted themselves into a different corner, I think.
famouswaffles 2 hours ago [-]
>It's clearly not yet a tool that can deliver new math at a scale.
What is "at scale" here, exactly? This is the most impressive so far, but it is one of several such advances in the last few months, all of which were with publicly accessible models.
It's a marketing stunt that's probably wholly exaggerated or concocted. Not sure why anyone would take these companies at their word, especially Altman.
JacobAsmuth 7 hours ago [-]
Or it means that this was a brand new model, they tried it and were instantly rewarded with a hit that was so interesting that several mathematicians pushed to publish the results.
lacewing 5 hours ago [-]
Anthropic and OpenAI don't do PR this way. This is not a side project for a publicly-traded BigCo. The bulk of their valuation hinges on being first to AGI / best at AGI.
OkWing99 10 hours ago [-]
Says in the papers. "...which was first mathematically generated in one shot by an internal model at OpenAI, and then expositionally refined through human interactions with Codex."
The prep work doesn't really matter; what they say is that it's a one-shot result, achieved by AI. The blog doesn't claim it was done by a currently public model.
dwroberts 35 minutes ago [-]
The model doing it one shot does not mean they only attempted it once though. They could have tried and retried a ‘one-shot’ answer hundreds of times before it produced a workable result
aurareturn 14 hours ago [-]
One thing that seems certain is that OpenAI models hold a distinct lead in academics over Anthropic and Google models.
For those in academics, is OpenAI the vendor of choice?
Jcampuzano2 13 hours ago [-]
OpenAI specifically targeted Academia a lot and gave out a lot of free/unlimited usage to top academics and universities/researchers.
They also offer grants you can apply for as a researcher. I'm sure other labs may have this too but I believe OpenAI was first to this.
tracerbulletx 13 hours ago [-]
Hasn't AlphaFold been used to make real discoveries for a few years now?
KalMann 12 hours ago [-]
I think he's talking about reasoning models.
karmasimida 13 hours ago [-]
I think the mathematicians on X are all using GPT 5.5 Pro
ai_fry_ur_brain 2 hours ago [-]
And they all think it's garbage. This is a publicity stunt.
bayindirh 13 hours ago [-]
From my limited testing, Gemini can dig out hard-to-find information, provided you make your prompt detailed enough.
Given that Google is the "web indexing company", finding hard-to-find things is natural for their models, and this is the only thing I need these models for.
If I can't find it for a week digging the internet, I give it a colossal prompt, and it digs out what I'm looking for.
senrex 12 hours ago [-]
This is my experience too. Gemini and Gemini deep research are awesome. Claude's deep research is pretty bad really relative to ChatGPT or Gemini.
Overall, I still love Claude the best but it is not what I would want to use if I wanted to really dig into deep research.
The export to google docs in Gemini deep research is tough to beat too. I haven't used Gemini since January but have probably years of material from saved deep research in google docs. Almost an overwhelming amount of information when I dive into what I saved.
FloorEgg 14 hours ago [-]
Gemini seems better trained for learning and I think Google has made a more deliberate effort to optimize for pedagogical best practices. (E.g. tutoring, formative feedback, cognitive load optimization)
As far as academic research is concerned (e.g. this thread's topic), I can't say.
astrange 9 hours ago [-]
Gemini the chatbot has a very strange personality that intensely overindexes on your user profile and absolutely loves insane mixed metaphors.
Its explanations are quite good but they're also hard to understand because it keeps trying to relate everything back to programming metaphors or what it thinks it knows about the streets in the neighborhood I live in.
snaking0776 13 hours ago [-]
Agreed, I usually use Gemini for explaining concepts and ChatGPT for getting things done on research projects.
aurareturn 14 hours ago [-]
Yes, I meant academic research.
cute_boi 13 hours ago [-]
Gemini is like someone with short-term memory loss; after the first response, it forgets everything. That being said, I have checked multiple models and Gemini can sometimes give accurate answers.
FloorEgg 10 hours ago [-]
Gemini is a series with a lot of individual models.
What you are describing doesn't match my experience at all with Gemini 3 or 3.1, especially the pro version.
logicchains 13 hours ago [-]
OpenAI models seem to have been trained on a lot of auto-generated theorem proving data; GPT 5.5 is really good at writing Lean.
causal 14 hours ago [-]
A simpler explanation is that more people are using ChatGPT
throw-the-towel 13 hours ago [-]
See the longstanding debate on whether new math is "invented" or "discovered". Most mathematicians I knew thought it's discovered.
amelius 12 hours ago [-]
This is like saying a sculpture always existed, the sculptor just had to remove the superfluous material.
Or like a musical octave has only 12 semitones, so all music is just a selection from a finite set that already existed.
Sure the insane computation we're throwing at this changes our perspective, but still there is an important distinction.
npfries 11 hours ago [-]
Bob Ross would like a word. He frequently talked about objects or features already existing, and using the tools at his disposal to “find” them.
jplusequalt 7 hours ago [-]
Michelangelo never used references, instead he simply "freed" the sculptures from within the marble.
rightbyte 48 minutes ago [-]
Isn't that a way of saying working with the grain of the material?
paulddraper 12 hours ago [-]
The difference is that math answers (can answer) specific questions.
Like, "does the Riemann zeta function have zeroes that don't have real part 1/2," or "is there a better solution to the Erdős Unit Distance Problem."
The selection of question is matter of taste, but once selected, there is a definitive precise answer.
skybrian 13 hours ago [-]
Any design already exists as a possibility, so it could be said to be both invented and discovered, depending on how you look at it.
red75prime 11 hours ago [-]
On the other hand, it is proven that if you need to count things, the only thing you can discover/invent is the natural numbers.
ted_dunning 9 hours ago [-]
Really?!
Care to cite a reference to that proof?
cubefox 13 hours ago [-]
All inventions are discoveries, though not all discoveries are inventions.
FrustratedMonky 13 hours ago [-]
Depending on your point of view? I see what you did there.
Who knew Obi-Wan was just smoking and pontificating on Wittgenstein.
protoplancton 13 hours ago [-]
One can argue that mathematical facts are discovered, but the tools that allow us to find, express, and prove them are mostly invented. This goes all the way up to the axioms, which we can deliberately choose and craft.
ASalazarMX 12 hours ago [-]
Math is an abstraction of reality, it had to be invented, so more inventions or discoveries could be made within it.
baq 12 hours ago [-]
The test goes like ‘is our universe, or any other universe, required for the axioms to exist’ and I don’t see how ‘yes’ is a defensible answer.
pigpop 12 hours ago [-]
What is an abstraction? It is something that arises from human thought and human thought arises from the activity of neurons which are a part of reality. You can't escape reality unless you invoke some form of dualism.
2ddaa 12 hours ago [-]
abstractions are objects that come into existence via design and iteration to refine their form. This right here is invention not discovery.
atmosx 13 hours ago [-]
...long standing indeed. It can be traced back to Plato's works.
lioeters 12 hours ago [-]
"The European philosophical tradition consists of a series of footnotes to Plato."
anthk 11 hours ago [-]
90% of the philosophical tradition is just bad discrete math.
soupspaces 13 hours ago [-]
Regardless of which, both Newton and Leibniz imprint in their findings a 'voice' and understanding different from each other and that of an LLM (for now?)
callamdelaney 25 minutes ago [-]
The only relevant question is, how much did it cost?
endymi0n 14 hours ago [-]
To paraphrase Gwynne Shotwell: “Not too bad for just a large Markov chain, eh?”
rhubarbtree 13 hours ago [-]
Erdos, or the model?
throwaway2027 13 hours ago [-]
Not to dismiss the AI but the important part is that you still need someone able to recognize these solutions in the first place. A lot of things were just hidden in plain sight before AI but no one noticed or didn't have the framework either in maths or any other field they're specialized in to recognize those feats.
purpleidea 3 hours ago [-]
You'd think a billion dollar company would be able to normalize the sound level on their video :/
Jeff_Brown 14 hours ago [-]
Can anyone find (or draw) a picture of the construction?
gibspaulding 13 hours ago [-]
This is only a proof that a field with more connections is possible, not what it looks like.
I’m very out of my depth, but the structure of the proof seems to follow a pattern similar to a proof by contradiction. Where you’d say for example “assume for the sake of contradiction that the previously known limit is the highest possible” then prove that if that statement is true you get some impossible result.
ninjha 13 hours ago [-]
They only proved that one exists; computing the actual construction is non-obvious (the naive way to construct it is computationally infeasible).
pradn 14 hours ago [-]
They have a "before" picture but not an "after"!
paulddraper 12 hours ago [-]
Yeah, unfortunately, they just proved there existed a better solution, they didn't construct it.
(Though in some ways that's actually more impressive.)
__0x01 10 hours ago [-]
From the companion paper:
> The argument relies crucially on ideas that may, at least in retrospect, be attributed to Ellenberg-Venkatesh, Golod-Shafarevich, and Hajir-Maire-Ramakrishna.
Can someone please elaborate on this?
awdfeswavcra 8 hours ago [-]
The last two are straightforward. The proof relies on a result called the Golod-Shafarevich theorem that gives a criterion for a group to be infinite. Golod and Shafarevich proved this a long time ago (1964). Moreover, if you look at how Golod and Shafarevich used this criterion, it's the same way it's used in the proof: They apply it to some Galois groups that appear in number theory, prove these are infinite in certain cases, and deduce that there exists an infinite tower of number fields with some surprising properties.
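For reference, the criterion is usually quoted in roughly the following form. This is the commonly cited refinement (due to Gaschütz and Vinberg) of the 1964 bound, which had $(d-1)^2/4$ on the right-hand side; the notation $d$, $r$ is the standard one and is not taken from the companion paper.

```latex
Let $G$ be a nontrivial finite $p$-group, and write
\[
  d = \dim_{\mathbb{F}_p} H^1(G, \mathbb{F}_p), \qquad
  r = \dim_{\mathbb{F}_p} H^2(G, \mathbb{F}_p)
\]
for its minimal numbers of generators and relations. Then
\[
  r > \frac{d^2}{4}.
\]
Equivalently, a pro-$p$ group with $d \ge 1$ generators and
$r \le d^2/4$ relations must be infinite.
```

In the number-theoretic applications, $G$ is a Galois group of a maximal (suitably ramified) $p$-extension, and showing $r \le d^2/4$ is what forces the tower of fields to be infinite.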
Much more recently (2021), Hajir, Maire, and Ramakrishna figured out how to apply the Golod-Shafarevich theorem to a slightly different Galois group to produce an infinite tower of number fields with some even more surprising properties. This is used in the new proof. It requires very slightly modifying the construction of Hajir, Maire, and Ramakrishna to produce the fields needed in this proof, but the explanation of how to do this takes only a paragraph in the human-written summary. (The explanation is more laborious in the original AI writeup).
The relation to Ellenberg-Venkatesh is more indirect. This is where "in retrospect" comes in, because this work was not cited in the original AI proof. It has to do with the next step of the proof: after you construct the number field, you need to find many elements of this field with the same norm to produce many vectors of the same length. To do this, the proof uses a pigeonhole argument which uses small split primes of the field (constructed via Hajir, Maire, and Ramakrishna's argument) to construct many ideals. By the pigeonhole principle, you can guarantee two ideals lie in the same class. When two ideals lie in the same class, you get an element of the field. You can rig things so these elements all have the same norm. Ellenberg and Venkatesh had an argument which also used the pigeonhole principle to guarantee two ideals lie in the same class to produce elements of the field. They were working on a different problem so their argument was slightly different, but similar.
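The pigeonhole step can be written down in general terms as follows. This is only a sketch of the underlying principle, with notation ($K$, $h_K$, $\mathfrak{a}_i$) chosen here for illustration; it leaves out the bookkeeping the actual proof needs to make the norms agree.

```latex
Let $K$ be a number field with ideal class group $\mathrm{Cl}(K)$ of
order $h_K$, and let $\mathfrak{a}_1, \dots, \mathfrak{a}_m$ be nonzero
ideals of $\mathcal{O}_K$. By the pigeonhole principle, some ideal class
contains at least $\lceil m / h_K \rceil$ of the $\mathfrak{a}_i$.

If $[\mathfrak{a}_i] = [\mathfrak{a}_j]$ in $\mathrm{Cl}(K)$, then
\[
  \mathfrak{a}_i \, \mathfrak{a}_j^{-1} = (\alpha)
  \quad \text{for some } \alpha \in K^{\times},
  \qquad
  \lvert N_{K/\mathbb{Q}}(\alpha) \rvert
    = \frac{N(\mathfrak{a}_i)}{N(\mathfrak{a}_j)}.
\]
```

So producing many ideals (here, from products of small split primes) relative to the class number yields many field elements whose norms are controlled by the norms of the ideals.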
zmmmmm 6 hours ago [-]
As a side observation, it is striking but also not surprising in retrospect that the big successes in AI are coming from domains where things are fundamentally verifiable. Both software and math are either fully verifiable or low-cost verifiable (breaking a test is not the same cost as building a bridge and watching it fall down to see if it worked).
Other domains are extracting value but I feel like there's an order of magnitude difference. It raises the question, what other domains fit into these categories where the AI itself has pretty much free rein to verify its own results?
zone411 10 hours ago [-]
I actually tried using GPT-5.5 Pro on this problem recently. It thought it was making progress on one path, but it made so many mistakes that it didn't feel worth it pushing further. It'll be interesting to check whether it's the same route. I got partial results (proved in Lean) that improve on the best-known results for four Erdős problems with GPT-5.5 Pro
libraryofbabel 9 hours ago [-]
This HN thread depressed me. I’m still thinking about why.
Look past the press-releasey gushing from OpenAI and there are all sorts of interesting and subtle questions here about the role for LLMs in mathematical research. I urge folks to click through to the accompanying comments from mathematicians published alongside the result. There is a really interesting discussion going on. I particularly recommend Tim Gowers’ remarks. This is really interesting stuff!
Yet the comments are just a battleground of people rehearsing the same tired arguments about LLMs from 2023, refutations of those arguments, angry counters, etc.
Does it make anyone else sad that the battle lines seem to have been drawn 3 years ago and we just seem to have the same fights over and over?
I wonder if we’ll still be doing this two years hence.
getnormality 9 hours ago [-]
Yes, this and every internet forum will still be doing this two years hence. Your life will be better if you take to heart this famous passage from Nietzsche:
I do not want to wage war against what is ugly. I do not want to accuse; I do not even want to accuse those who accuse. Looking away shall be my only negation.
libraryofbabel 8 hours ago [-]
> Looking away shall be my only negation.
I’ve been thinking of building myself my own frontend to HN that makes it impossible to view comments, for this reason. Yet sometimes there are still really interesting discussions and it’s hard to let go of what for me feels like the last social media I want to be part of.
jorisw 2 hours ago [-]
An added stylesheet should be enough to do that
jryan49 9 hours ago [-]
People are afraid for their livelihood. What do you expect?
libraryofbabel 8 hours ago [-]
Well yes, but there is a choice being made here and I would love to believe we can do better. The rational response to being afraid about your livelihood isn’t to spend time filling every HN thread on LLMs with embittered negativity. Not to mention all the flat denials that LLMs can do mathematics and write decent code, which is almost a self-contradictory position if you are worried they are going to replace you.
There are a lot of big issues at stake here and just because a person is interested in what AI can do and curious to discuss it does not make them uncritically positive about its effects on society, the economy, and the world. Yet that is often the assumption and it leads to battle lines being drawn, on every AI discussion, over and over again. It means the serious discussion gets swamped and that makes me sad.
justonepost2 5 hours ago [-]
[flagged]
conception 9 hours ago [-]
Livelihoods and lives.
JacobAsmuth 7 hours ago [-]
Exactly. And when one's life is threatened, what are we to do if not fight?
Fight! Fight! Fight!
heuristicsearch 9 hours ago [-]
[flagged]
doginasuit 9 hours ago [-]
I find it understandable, it is common to evaluate human intelligence vs AI as a zero-sum competition, because that is how employers typically understand it and LM providers market it. AI proving itself moves the needle in an uncomfortable direction for all of us without very robust job security.
> I wonder if we’ll still be doing this two years hence.
It is going to take some time for people to recognize that AI has a very different set of competencies that complements human intelligence rather well. It is unlikely to eclipse human intelligence at scale, and the companies betting on that will fall behind. That is when the conversation will start to shift.
bdamm 9 hours ago [-]
It isn't necessarily the case.
Another wishful/hopeful thought is that the human experience itself is valuable, that competing for resources and living within a social network and having physical needs somehow creates value that is essential for companies to operate.
But is it really the case? I don't think we know that, and I don't know what economy results when all the white collar and much of the blue collar workers no longer understand how to participate in whatever the economy is becoming. Because it is starting to look like old money is coming around, and soon we will all be serfs to the creature comforts of those who have money now, upward mobility will be a thing of the past, and a small ruling elite over the vast subservient majority will form, reorganizing societies to more resemble middle ages lordship rather than what emerged in the 50's and 60's following WWII.
doginasuit 8 hours ago [-]
We haven't seen a significant increase in the quality of LLM output since 2023 that hasn't been the result of throwing even more energy and compute at it. AI "reasoning" is just recursive iteration on their output, with diminishing improvement on each pass. It seems to be the reason why Mythos is not generally available, maybe a canary of sorts.
If LLMs were improving significantly independent of scaling up compute resources, I would be a lot more worried. The economic instability (on several levels) of the current trajectory cannot last. Countries and companies that don't take a more sustainable approach will eventually find themselves outclassed by those that do. Unfortunately that is not a guarantee against some sort of dark age in the short term.
libraryofbabel 7 hours ago [-]
> We haven't seen a significant increase in the quality of LLM output since 2023 that hasn't been the result of throwing even more energy and compute at it.
This is completely false. Most of the dramatic improvements in LLM quality in the last two years were due to the application of new post-training methods, especially RLVR. It’s really interesting to read about (you should!) and it is the whole secret to why LLMs did not plateau in 2024 or 2025 like many people confidently predicted. Sure, RLVR requires compute to do, but this is not just throwing more compute at 2023 LLMs.
doginasuit 4 hours ago [-]
That's interesting. If you have a source that shows that RLVR was primarily responsible for model improvement, I'd be interested to see it. In any case, it sounds like it has its own set of limitations and there are applications where it does not help at all.
dogwalker5000 8 hours ago [-]
> because that is how employers typically understand it and LM providers market it.
Every few months you get an article about some executive bragging that he fired an entire department of people because of AI.
It was adversarial from the start. The idle rich who don’t have to work for a living and their sycophants who somehow believe they won’t be replaced vs … everyone else.
I used to think that the common tale of AI rebelling in Hollywood movies was unlikely. Turns out we don’t even need rogue AI, our fellow men are quite willing to wipe the rest of us out.
class3shock 8 hours ago [-]
It seems like the outcome options are:
1. AI is developed to be smart enough to actually replace people, destroying the labor force and immensely concentrating power.
This seems like bs hyperbole but I am not an expert.
2. AI turns out to be a bubble of false promises and hype, bursts, and takes the stock market and economy with it.
I thought this was the most likely but I keep not hearing popping, so maybe it's:
3. AI continues to be a tool that can substantially increase productivity in some areas and cause huge societal changes in others. The AI companies keep the hype train going or maybe it tapers off over time until talk meets reality but "real" AI never shows up and the bubble never pops because it's not one. Eventually there are 0-3 new FAANG companies with untouchable control of a tech we increasingly have to use to stay relevant.
Even if we avoid option 1 and 2, 3 doesn't exactly bode well either.
godelski 9 hours ago [-]
I think part of it is that one side throws rocks and so it never even matters what is in the article. It becomes a battle over whether the article is good or the article is shit.
Yes, I'm tired too. I want to have real discussions about these things. But the problem is everyone believes their reality is real and anyone's reality that disagrees is fake. It just escalates. I take long breaks from HN because I realize I just come to the forums and end up being angry. Why do we do this to ourselves? The reality is that at a core level we usually want the same things.
tinfoilhatter 5 hours ago [-]
IMO it's because people have fragile egos / worldviews they don't want shaken. Pretty much any opinion I type on this website gets instantly downvoted, unless it reaffirms the popular / mainstream narrative, because I happen to see the world differently than most.
This website is quite awful, and I also don't know why I spend any time on it. It's definitely not a website intended for meaningful discourse. It's a website where you can reaffirm whatever opinion is already established, and if your opinion is at all controversial or even just out of the box, you'll be punished for it.
ex-aws-dude 8 hours ago [-]
Let's just be real: it's because a lot of programmers' egos are built on intelligence/being a coding wizard, and this threatens that ego.
If suddenly anyone can code we're not that special anymore.
scosman 9 hours ago [-]
We won’t be doing it in 2 years. By then my side will have won!
Fraterkes 13 hours ago [-]
I guess if this stuff is going to make my employment more precarious, it’d be nice if it also makes some scientific breakthroughs. We’ll see
ausbah 13 hours ago [-]
shame we won’t see any of these medical breakthroughs when we all lose our jobs and thus our healthcare
karmasimida 13 hours ago [-]
There is a world that AI makes medical breakthroughs, but there is 0 guarantee it is going to be affordable
cubefox 13 hours ago [-]
Breakthroughs in pure mathematics aren't scientific though. They tell us nothing about the world, and they are not useful.
ferris-booler 10 hours ago [-]
What strikes me in this case (and I haven't seen in other comments) is that it's a _disproof_ of a conjecture put forth by Erdős and supported (at least according to OpenAI) by other professional mathematicians. Erdős, one of the greats, thought that the limit was O(n^{1 + o(1)}), which GPT disproved.
We can argue about recombination/interpolation of training data in LLMs, but even if this was an interpolation, the result was contrarian rather than a confirmation. Any system that can identify an error in Erdős's thinking seems very useful to me (though perhaps he did not spend much time thinking about or checking this particular conjecture).
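To make the quantity in question concrete, here is a small brute-force sketch of the classical grid picture: take the k x k integer grid, count how often each distance occurs between pairs of points, and pick the most frequent one (rescaling the grid by that distance turns those pairs into unit distances). This only illustrates the older lower-bound construction, not the new disproof; the function names are made up for the example.

```python
from collections import Counter
from itertools import combinations


def distance_counts(k):
    """Count pairs of points in the k x k integer grid by squared distance."""
    points = [(x, y) for x in range(k) for y in range(k)]
    counts = Counter()
    for (x1, y1), (x2, y2) in combinations(points, 2):
        counts[(x1 - x2) ** 2 + (y1 - y2) ** 2] += 1
    return counts


def best_rescaling(k):
    """Return (squared distance, number of pairs) for the most common distance."""
    return distance_counts(k).most_common(1)[0]


if __name__ == "__main__":
    for k in (4, 8, 16):
        n = k * k
        d2, pairs = best_rescaling(k)
        # pairs / n shows how far above "linear in n" the best distance gets.
        print(f"n={n:4d}  squared distance={d2:3d}  pairs={pairs}  pairs/n={pairs / n:.2f}")
```

The brute force is quadratic in the number of points, so it is only meant for small grids; the interesting question in the thread is how fast the best achievable count can grow with n.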
precision1k 7 hours ago [-]
I see mixed emotions here. I understand both. On one hand it's exciting and fascinating. On the other it's concerning. One concern I haven't seen mentioned is the possibility that, as these models become larger and more powerful, their capability to solve frontier math problems will also grow. Does there become a point where humans are no longer the driver of innovation and research in this world, and instead are relegated to become stewards of the AI models whose purpose is to push the boundaries of mathematics, theoretical physics and other academic disciplines?
noslenwerdna 7 hours ago [-]
For those of us who care about the answers to these questions, rather than who gets credit for doing it, we will welcome any faster means of solving these problems.
greenknight 7 hours ago [-]
I believe that the answer is yes it will happen... the question is when that point will occur.
Right now, we are in a transition period... Models are improving, but they are not yet capable of taking over.
Where do you see it being in a year's time? or 2? or 5?
CGMthrowaway 12 hours ago [-]
How do you even get an LLM to try to solve one of these problems? When I ask it just comes back with the name of the problem and saying "it can't be done"
edit: apparently that’s only the _condensed summary_ of the chain of thought.
woah 10 hours ago [-]
you can do this easily with the api or with codex
mangolie 2 hours ago [-]
You can't really in this way. They have a parameter they control on the backend that can force how much time it thinks for
KalMann 11 hours ago [-]
Maybe you need to phrase it better. Like with a more specific direction of thinking.
dwa3592 11 hours ago [-]
A few questions that the blog did not answer; if anyone knows, that'd be great:
- Does anyone know if this was 1 minute of inference or 1 month?
- How many times did the model say it was done disproving before it was found out that the model was wrong/hallucinating?
- One of the graphs says the model produced the right answer almost half the time at peak compute??? Did I understand that right? What does peak compute mean here?
Topology1 5 hours ago [-]
As someone starting grad school for pure mathematics, this has me both excited and nervous, but mainly the latter...
dadrian 13 hours ago [-]
While the result is impressive, this blog post is extremely disappointing.
- It does not show an example of the new best solution, nor explain why they couldn't show an example (e.g. if the proof was not constructive)
- It does not even explain the previous best solution. The diagram of the rescaled unit grid doesn't indicate what the "points" are beyond the normal non-scaled unit grid. I have no idea what to take away from it.
- Its description of the new proof just cites some terms of art with no effort made to actually explain the result.
If this post were not on the OpenAI blog, I would assume it was slop. I understand advanced pure mathematics is complicated, but it is entirely possible to explain complicated topics to non-experts.
Al-Khwarizmi 13 hours ago [-]
Indeed, it's a pity. While many advanced math problems are highly abstract or convoluted to explain to a layman audience, this one in particular is about points in a 2D plane and distances. A drawing would have been nice.
changoplatanero 13 hours ago [-]
apparently the proof is not constructive in the sense of not giving an easy to compute recipe for generating a set of points that you can plot on a 2d plane
ks2048 13 hours ago [-]
Timothy Gowers' tweet about this: "If you are a mathematician, then you may want to make sure you are sitting down before reading further."
woah.
missyougowers 11 hours ago [-]
Unfortunately Gowers has taken Tao's lead on this one.
It is disheartening to see him jump into this GenAI puffery.
I hope these GenAI labs are paying Tao handsomely for legitimizing their slop, but more likely he's feeling pressure from his University to promote and work with these labs.
My guess is Gowers wants in on that action, or his University does.
Either way, it makes me sad. If it's self-motivated... even sadder.
cm2012 9 hours ago [-]
It seems like you have an axe to grind about AI capabilities that is making you think irrationally
missyougowers 9 hours ago [-]
This is a popular HNism.
Focusing solely on "capabilities" is the irrational thinking.
Asbestos is the most "capable" material where extreme thermal, chemical and electrical resistance is required.
horhay 9 hours ago [-]
I'm not sure your characterization of Tao is accurate lol. In that companion paper, only Gowers seems to extensively show no pragmatism in the implications of this accomplishment. Even the younger math experts in that paper were a lot more cautious with their statements. Tao seems to follow that same tune most of the time even though he uses AI for first-pass inspections of solutions brought to his attention.
missyougowers 9 hours ago [-]
Tao was absent from the formal verification circles until GenAI orgs saw formal verification as a way to legitimize their obscene existence, and since has been making the rounds on the podcast bro circuit pumping up these GenAI orgs.
His university is deeply entrenched with the GenAI org that released this result both with having alumni on staff, integrating their tools into the school's processes and curriculum, and paying for lots of grants. (I understand Tao is absent from this specific announcement, perhaps because it found its solution without utilizing formal verification tooling)
Is it unreasonable to assume he's feeling pressure to do so?
Gowers similarly appeared largely uninterested in this current crop of GenAI until some months ago when he announced a 9M$ fund to develop "AI for Maths" and since then his social media has included GenAI promotion.
Now he is being asked about this result and his first sentence is:
> I do not have the background in algebraic number theory to make a detailed assessment of the disproof of Erdős’s unit-distance conjecture, so instead I shall make some tentative comments about what it tells us about the current capabilities of AI.
Why did this GenAI org reach out to mathematicians outside of the discipline that this result addresses?
Why did they respond?!
horhay 8 hours ago [-]
I think the intention of this paper is to build some type of culture of "math generalists" that doesn't quite exist in today's academia. The thing is that a good half of the people in that paper were actually very pragmatic on the implications of such a success and presented questions in terms of the measurability of the difficulty of the problem and the generalizability of the solution provided for other questions. Gowers in particular offers no resistance and in fact resorts to the theatrics of "being the bearer of bad news" on Twitter for some reason.
As with Tao, he's always been a measured optimist even before the tools were consistently usable for his work. And even still nowadays, he adds stipulations to his statements on the successes of AI. Yes, he's part of Math Inc. now and is in close contact with Google Deepmind for some projects but his interest lies in using the tools today. Gowers has been hypothesizing on the future of math in the tone he has taken now ever since o3/GPT5. There's no comparison between the two who should attract more scrutiny.
Lost-Futures 9 hours ago [-]
Ngl, this sounds like a defensive coping mechanism
aroman 10 hours ago [-]
Are you saying this result is uninteresting and therefore AI slop or puffery? Obviously OpenAI has a motivation to "market" the accomplishment as much as possible, but surely you agree it IS a remarkable achievement?
missyougowers 8 hours ago [-]
I'll let the mathematicians in the field determine the level of "interest" in this result, but saying "you may want to make sure you are sitting down" is pure puffery.
> has a motivation to "market" the accomplishment as much as possible
I am so sick of HN promoting unethical behaviour as virtuous due to its financialization worship at the foot of "valuations".
> but surely you agree it IS a remarkable achievement?
If you could define the bounds of "remarkable" I could answer this question.
horhay 8 hours ago [-]
It's remarkable, but it's not so far out of the bounds of the pattern of success that AI has had with math recently that people should sound alarm bells.
A lot of the weight this holds comes from the fact that it's an old problem and that its difficulty hinges on the lack of investigation of the disproof side of the hypothesis. The model basically took a contrarian path and found tools and methods that support that a disproof is viable. So the (unquantified number of) mathematicians out there were all dedicating their resources to the notion that this can be proved. Some with hindsight would say that if they had a team of experts driven to the goal of disproof, this would have been achievable by humans, and one of the mathematicians in the paper states as much. This still has value in terms of reliability measurement, and possibly human-aided endeavors when the methods scrounged by the model can be used in other solutions.
alansaber 14 hours ago [-]
AI isn't going to supercharge science but I wouldn't be as dismissive as other posters here.
tombert 13 hours ago [-]
I'm not a scientist but I like to LARP as one in my free time, and I have found ChatGPT/Claude extremely useful for research, and I'd go as far as to say it supercharged it for me.
When I'm learning about a new subject, I'll ask Claude to give me five papers that are relevant to what I'm learning about. Often three of the papers are either irrelevant or kind of shit, but that leaves 2/5 of them that are actually useful. Then from those papers, I'll ask Claude to give me a "dependency graph" by recursing on the citations, and then I start bottom-up.
This was game-changing for me. Reading advanced papers can be really hard for a variety of reasons, but one big one can simply be because you don't know the terminology and vernacular that the paper writers are using. Sometimes you can reasonably infer it from context, but sometimes I infer incorrectly, or simply have to skip over a section because I don't understand it. By working from the "lowest common denominator" of papers first, it generally makes the entire process easier.
I was already doing this to some extent prior to LLMs, as in I would get to a spot I didn't really understand, jump to a relevant citation, and recurse until I got to an understanding, but that was kind of a pain in the ass, so having a nice pretty graph for me makes it considerably easier for me to read and understand more papers.
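The "dependency graph, then bottom-up" idea amounts to a topological sort of the citation graph. A minimal sketch, assuming you already have the citation data in hand (the paper names below are hypothetical placeholders, and extracting real citations is a separate problem):

```python
from graphlib import TopologicalSorter

# Hypothetical citation data: each paper maps to the set of papers it cites.
citations = {
    "target_paper": {"survey_a", "method_b"},
    "survey_a": {"foundation_c"},
    "method_b": {"foundation_c", "foundation_d"},
    "foundation_c": set(),
    "foundation_d": set(),
}


def reading_order(cites):
    """Order papers so that each one comes after everything it cites."""
    # TopologicalSorter treats the mapped values as predecessors,
    # so cited papers are emitted before the papers citing them.
    return list(TopologicalSorter(cites).static_order())


order = reading_order(citations)
print(order)
```

Real citation graphs are acyclic in principle, but messy metadata can introduce cycles; `static_order()` raises `graphlib.CycleError` in that case, which is a reasonable signal to clean up the input.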
kingkongjaffa 13 hours ago [-]
One heuristic I used during my master's degree research thesis was to look for the seminal people or papers in a field by using Google Scholar to find the most cited research papers and then reading everything else by that author / looking at the paper's references for others. You often only need to go back 3-4 papers to find some really seminal/foundational stuff.
tombert 13 hours ago [-]
Yeah, that's actually how I discovered Leslie Lamport like ten years ago. I was looking for papers on distributed consensus, and it's hard not to come across Paxos when doing that. It turns out that he has oodles of really great papers across a lot of different cool things in computer science and I feel like I understand a lot more about this space because of it.
It doesn't hurt that Lamport is exceptionally good at explaining things in plain language compared to a lot of other computer scientists.
vatsachak 14 hours ago [-]
I absolutely believe that AI will supercharge science.
I do not believe it will replace humans.
unsupp0rted 13 hours ago [-]
I absolutely believe that AI will supercharge science and replace humans.
Why shouldn't it? Humans are poorly optimized for almost anything, and built on a substrate that's barely hanging together
lovecg 11 hours ago [-]
I’d give humans some credit, they’re an adaptable bunch. AI won’t replace humans in the same way humans did not replace cockroaches. It’s a non-sequitur.
bsza 9 hours ago [-]
We generally don’t allow cockroaches to thrive in the spaces we claim for ourselves. Question is how much space (economic or otherwise) will AI claim for itself and whether there will be any left for us.
geraneum 12 hours ago [-]
> Humans are poorly optimized for almost anything, and built on a substrate that's barely hanging together
Goodness gracious!
vatsachak 12 hours ago [-]
Well, for starters AI doesn't have goals. If there was a super intelligence with goals, why would they work for us?
devttyeu 12 hours ago [-]
Fwiw if you trained an LLM in an RL sandbox that would require it to have goals, the output llm probably would "have goals"
stonogo 13 hours ago [-]
Not like large language models, which only required tens of megawatts of power and use highly efficient monte carlo methods, eh
TheOtherHobbes 12 hours ago [-]
Individual humans are processing nodes on human culture as a whole, which runs on rather more than tens of megawatts.
unsupp0rted 12 hours ago [-]
Also it costs a lot to train and run individual humans, and they can only be run for brief periods per day before they crash, hallucinate and possibly get irretrievably broken.
seydor 13 hours ago [-]
replace, no. obsolete, yes
dvfjsdhgfv 13 hours ago [-]
lol
(That's the first time I used that expression on HN.)
comboy 13 hours ago [-]
Not only has it supercharged science, it supercharges scientists. Research on any narrow topic is a different world now. Agents can read 50 papers for you and tell you what's where. This was impossible with pure text search. Looking up non-trivial stuff and having complex things explained to you is also amazing. They don't even have to be complex; they can be basics from an adjacent field that happen to be useful in yours. The list goes on. It's a hammer: you need to watch your fingers, and it's not good at cutting wood, but it's definitely worth having.
dvfjsdhgfv 13 hours ago [-]
It's a very heavy hammer. I used it in the way you describe and after double-checking noticed some crucial details were missed and certain facts were subtly misrepresented.
But I agree with you, especially in areas where they have a lot of training data, they can be very useful and save tons of time.
Karrot_Kream 13 hours ago [-]
I don't think there's a substitute for reading the source material. You have to read the actual paper that's cited. You have to read the code that's being sourced/generated. But used as a reasoning search engine, it's a huge enabler. I mean so much of research literally is reasoning through piles of existing research. There's probably a large amount of good research (especially the kind that don't easily get grant funding) that can "easily" shake out through existing literature that humans just haven't been able to synthesize correctly.
OldGreenYodaGPT 14 hours ago [-]
Isn’t that a joke? It already has supercharged science
ks2048 13 hours ago [-]
Since "supercharged science" is as ill-defined as AGI, ASI, etc., people will be able to debate it endlessly for no reason.
datsci_est_2015 13 hours ago [-]
Where are the second order effects of this supercharging of science? Or has it not been enough time for those to propagate?
horhay 9 hours ago [-]
It's a very complicated matter honestly. This is a new height that AI has reached, even though it follows the usual methods of success that it has had.
What strikes me as unusual though is that they do make a point of saying things like "this is a general purpose model that wasn't trained on the problem" among a few other things as if that's new. The last bountied problem they accomplished used a public model that ALSO didn't rely on specialized training. And that didn't make their blog.
renegade-otter 13 hours ago [-]
It will notice things that humans may have missed. That said - it can only work off the body of work SOMEONE did in the past.
throw-the-towel 13 hours ago [-]
> it can only work off the body of work SOMEONE did in the past.
And so do humans. Gotta stand on these shoulders of giants.
bel8 13 hours ago [-]
Can't the previous body of work be from AI too?
renegade-otter 12 hours ago [-]
Of course it can be, but it's overeager. No matter what your context window is, we will use AI collectively to flood the zone with shit.
karmasimida 13 hours ago [-]
To be strict, Math is not Science.
But AI is supercharging Math like there is no tomorrow.
anthk 11 hours ago [-]
LLM's? I doubt it. Systems with Prolog, Common Lisp and the like with proof solvers? For sure.
LLMs are doomed to fail. By design. You can't fix them. It's how they work.
karmasimida 10 hours ago [-]
You can have a word with Terence Tao; he has a different opinion here.
anthk 3 hours ago [-]
Yeah, and Knuth, but that's an appeal-to-authority fallacy. Wait until the errors arise.
foota 5 hours ago [-]
They should feed it the classification of finite simple groups and get it to simplify it/turn it more constructive.
globulus2023 4 hours ago [-]
In the article there is a diagram of the “square grid” arrangement that achieves approximately 2n points separated by unit distance.
Can anyone point me to a diagram of what the newly found solution looks like?
globulus2023 4 hours ago [-]
In the article there is a nice clear diagram of the “square grid” arrangement that was previously thought to be optimal.
Can anyone point me to a diagram of the newly found optimal arrangement?
agentultra 11 hours ago [-]
I’m curious about the “autonomous” claim. Usually these systems require a human to guide and verify steps, clarify problems, etc. are they claiming that the reinforcement model wasn’t given any inputs, tools, guidance, or training data from humans?
armanj 8 hours ago [-]
useless fact: there is no mention of "gpt" in this article. the ai is referred to as "An internal OpenAI model".
momo26 6 hours ago [-]
Giving a counter-example is a relatively easy way to disprove something. But can the model really prove something correctly and rigorously? Right now it seems like all of its knowledge is based on existing work, and none of these models can prove a myth.
taimurshasan 13 hours ago [-]
I wonder how much this cost vs a Math Professor or a team of Math Professors.
Karrot_Kream 13 hours ago [-]
Sadly math professors aren't very expensive. Academics are paid terrible wages. Until recently, tenure was the carrot at the end of a grueling education. But now that tenure positions are getting rarer (well, tenure positions aren't growing vs the number of aspiring academics is), they continue to be cheap highly educated labor.
forgot_old_user 13 hours ago [-]
it will only get cheaper in the long run
aspenmartin 13 hours ago [-]
40x cheaper per year if trends continue
dvfjsdhgfv 13 hours ago [-]
for a sufficiently long definition of long
aspenmartin 13 hours ago [-]
No for a very short definition of long, look at data on: how fast do prices decrease for a constant level of performance
oscord 6 hours ago [-]
Can it model a sustainable economy model, with human happiness and fulfilment indexes and planet preservation focus? Current capitalism and the red thing are so tired!
famouswaffles 13 hours ago [-]
Another entry in a growing list of the last couple months (interestingly mostly Open AI):
I would have thought a triangular grid works better than a grid of squares. You get ~3n links vs ~2n for the square grid. Curious what the AI came up with.
comboy 13 hours ago [-]
Yes, not providing visualization of the solution seems criminal.
red_admiral 13 hours ago [-]
Unless it's a non-constructive proof.
kmeisthax 13 hours ago [-]
Knowing OpenAI, the solution's probably being withheld as a trade secret, lest it fall victim to distillation attacks (i.e. exactly the same shit they did to the open Internet).
bustermellotron 12 hours ago [-]
The grid of squares actually gets > Cn for any C. (More in fact… C can grow like n^{a/log log n}.) The AI proved > n^{1 + b} for some small b > 0, which a human (Will Sawin) has now proved can be about b = 0.014. The grid can be rescaled so the edges are not necessarily length 1, but other pairs will have length 1; that is necessary to get more than 2n unit distances.
kilotaras 13 hours ago [-]
Both 3n and 2n are linear, the broken conjecture is that you can't do better than linear.
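To make the counts in this sub-thread concrete, here is a small sketch that counts unit-distance pairs for a square grid, a triangular grid, and a rescaled square grid. It only illustrates the grid constructions discussed in the comments above and is not the construction from the AI proof; the function names and the choice N = 25 are assumptions made for the example.

```python
import math
from itertools import combinations

def count_unit_pairs(points, tol=1e-9):
    """Brute-force count of unordered point pairs at distance 1 (within tol)."""
    return sum(1 for p, q in combinations(points, 2)
               if abs(math.dist(p, q) - 1.0) < tol)

def square_grid(k):
    """k*k points of the integer lattice."""
    return [(float(x), float(y)) for x in range(k) for y in range(k)]

def triangular_grid(k):
    """k*k points of the triangular lattice (a rhombus-shaped patch)."""
    h = math.sqrt(3) / 2
    return [(x + 0.5 * y, h * y) for x in range(k) for y in range(k)]

def unit_pairs_in_rescaled_grid(k, N):
    """Count pairs in the k-by-k integer grid at squared distance N.

    Scaling the grid by 1/sqrt(N) turns exactly these pairs into
    unit-distance pairs, so N = 1 is the ordinary square grid.
    """
    total = 0
    for dx in range(-k + 1, k):
        for dy in range(0, k):
            if dx * dx + dy * dy != N:
                continue
            if dy == 0 and dx <= 0:
                continue  # keep one direction per unordered pair
            total += (k - abs(dx)) * (k - dy)
    return total

print(count_unit_pairs(square_grid(10)))      # 180 pairs on n = 100 points
print(count_unit_pairs(triangular_grid(10)))  # 261 pairs on n = 100 points
print(unit_pairs_in_rescaled_grid(40, 1))     # 3120 pairs on n = 1600 points
print(unit_pairs_in_rescaled_grid(40, 25))    # 8128 pairs on n = 1600 points
```

With N = 25 there are twelve lattice vectors of squared length 25 (such as (5, 0) and (3, 4)), so the rescaled grid gets roughly 6n unit distances where the plain grid gets roughly 2n. Choosing N with more representations as a sum of two squares pushes the constant higher, which is the effect described in the rescaling comment above.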
zuzululu 11 hours ago [-]
This topic and discussion are out of my league. What is the implication here? That LLMs aren't a dead end?
SubiculumCode 11 hours ago [-]
I wonder whether there will be progress in string theory from these kinds of applications of AI.
ai_fry_ur_brain 2 hours ago [-]
I'm convinced they target these pure math problems because math is very occulted to the masses, and they can therefore use math "discoveries" as a way to make an LLM seem more impressive than it is.
Everything is a grift.
What are the odds that if they ran the same prompt from scratch, with the same context and instructions, it would arrive at the same answer? Unlikely. I think it's more likely that this is a 1:500000 chance and OpenAI can afford to brute force this result and justify the expense for marketing.
sinuhe69 10 hours ago [-]
How did they jump from finding counter-examples (disproof) to a proof?
yusufozkan 14 hours ago [-]
"The proof came from a general-purpose reasoning model, not a system built specifically to solve math problems or this problem in particular, and represents an important milestone for the math and AI communities."
horhay 9 hours ago [-]
The accomplishment is cool. But all Erdos problems and other complicated mathematical problems they solved were accomplished with general-purpose models too. In fact for some of those problems, including bountied ones, they were public models. So I don't get saying this
seydor 13 hours ago [-]
all reasoning is .. well problem reasoning. restricting black-box AIs to specific human-defined domains because we believe that's better is such a human-ist thing to do.
Kwantuum 13 hours ago [-]
I trust openAI's marketing team 100%
krackers 13 hours ago [-]
It seems plausible given that people have been using off the shelf 5.5 xhigh to decent success with some erdos problems. There is likely still some scaffolding around it though (like parallel sampling or separate verifier step) since it's not clear if you can just "one shot" problems like this.
solomatov 13 hours ago [-]
How central is it to discrete geometry? Could anyone with knowledge of the field reply?
sigmar 13 hours ago [-]
The blog post links a pdf that OpenAI put together of nine mathematicians that commented on the proof. Each is quite brief and written in accessible terms (or more accessible terms, at least). https://cdn.openai.com/pdf/74c24085-19b0-4534-9c90-465b8e29a...
energy123 13 hours ago [-]
There's pages of comments from like 8 mathematicians in the attached pdf
horhay 9 hours ago [-]
There may be years of investigation as to how far you can generalize these methods. As to how central it is, it's a longstanding problem that Erdos loves to cite for that branch of math.
The thing is that a lot of the effort through the years (how much time was spent, and how many people focused their entire working lives on it, if any, is unquantifiable) seems to have gone into looking for the proof, and the search for the disproof seems minimal.
catigula 13 hours ago [-]
Every time I interact even with OpenAI's pro model, I am forced to come to the conclusion that anything outside the domain of specific technical problems is almost completely hopeless outside of a simple enhanced search and summary engine.
For example, these machines, if scaling intellect so fiercely that they are solving bespoke mathematics problems, should be able to generate mundane insights or unique conjectures far below the level of intellect required for highly advanced mathematics - and they simply do not.
Ask a model to give you the rundown and theory on a specific pharmacological substance, for example. It will cite the textbook and meta-analyses it pulls, but be completely incapable of any bespoke thinking on the topic. A random person pursuing a bachelor's in chemistry can do this.
Anything at all outside of the absolute facts, even the faintest conjecture, feels completely outside of their reach.
dvfjsdhgfv 13 hours ago [-]
Yeah, I remember it was one of my biggest disappointments with LLMs.
auggierose 11 hours ago [-]
Which model did this? Is it available to the public?
_heimdall 12 hours ago [-]
As this becomes more common it makes me wonder where the LLM ends and the harness begins.
The underlying model may still effectively be a stochastic parrot, but used properly that can do impressive things and the various harnesses have been getting better and better at automating the use of said parrot.
alsetmusic 11 hours ago [-]
> AI is about to start taking a very serious role in the creative parts of research, and most importantly AI research itself. While this progress is not unexpected, it reinforces the urgency we feel about understanding this next phase of AI development, the challenges of aligning very intelligent systems, and the future of human-AI collaboration.
I find this hyperbolic, but ya gotta juice up the upcoming IPO. I hate that they took an interesting announcement and reminded me why I hate tech and our society at the end.
pizzao 13 hours ago [-]
Can someone explain to me what is their "prompting-scaffolding" to make it work ?
yusufozkan 13 hours ago [-]
"This is a general-purpose LLM. It wasn’t targeted at this problem or even at mathematics. Also, it’s not a scaffold. We have not pushed this model to the limit on open problems. Our focus is to get it out quickly so that everyone can use it for themselves." - Noam Brown (OpenAI reasoning researcher) on X
seydor 13 hours ago [-]
can the AI please tell us what to do now that all knowledge work will become unemployment?
bmacho 12 hours ago [-]
Physical labour?
layer8 10 hours ago [-]
Revolt against the AI overlords.
aussieguy1234 7 hours ago [-]
So we've got the proof, what are the practical applications of this?
dev1ycan 8 hours ago [-]
Wouldn't surprise me if they're just paying math geniuses to do math research and attribute it to AI models.
3422817 10 hours ago [-]
Nice. By the year 2100 200 Erdos problems will have been solved by AI. Let's build more data centers.
Kye 13 hours ago [-]
Is this something that can be made explainable to someone without any of the relevant background, or is this one of those things where all that background is needed to understand it? Because I have no idea what's going on here, but would like to.
overgard 10 hours ago [-]
I think it's worth being skeptical of this.. there's a way too common pattern of "AI Lab Shows AI Doing Something Only Humans Can Do" only for a bunch of important caveats and limitations to be discovered after the initial hype. And of course, the correction never seems to be as viral as the hype. I'll believe it when a mathematician actually reads the 100+ pages of reasoning.
empath75 14 hours ago [-]
Important note: this was not done with a special mathematics harness or specialized workflow.
horhay 9 hours ago [-]
This part of the announcement holds no value besides maybe taking a shot at the Deepmind Co-Mathematician paper. Nearly every mathematical success they've achieved around the GPT 5.2 generation has been done with general (and even public) models. Their last bountied problem solved was done with 5.4 Pro, also a general model.
dwroberts 13 hours ago [-]
How/why should we know this, it does not explain the process in the text?
arsan87 13 hours ago [-]
neato. can we do anything with this newfound knowledge or is this mathematical sports?
can we please put these ground breaking AIs to work on actual problems humans have?
clarle 13 hours ago [-]
People thought neural networks were just an interesting thought exercise a few decades ago and not for practical ML problems, and look what happened since then.
somewhereoutth 12 hours ago [-]
The real test would be if an LLM makes an important conjecture.
analognoise 11 hours ago [-]
Back when “term rewriting” was “AI”, multiple math tools were released that took known math facts and did tricks like uncovering new integrals - apply the pattern in some depth in a tree, see what pops out.
What was discovered were numerous mistakes in the published literature on the subject. “New math! AI!” No, just mechanical application of rules, human mistakes.
There were things that were theorized, but couldn’t be exhaustively checked until computers were bigger.
Once again, a tool is applied, it has the AI label - it's progress! But it isn’t something new. It’s just an LLM.
There’s a consistent underappreciation of AI (and math, honestly), but watching soulless AI mongers declare that their toy has created the new is something of a new low; uninspired, failed creatives, without rhyme or context; this is a bigger version of declaring that your spell checker has created new words.
The result is more impressive than what was done with tables of integrals and SAINT in 1961, sure.
Apparently if you add a “temperature” knob to a text predictor, otherwise sane individuals piss themselves and call it new.
Then again I thought NFTs, crypto, and the Metaverse were stupid, so what do I know.
neuroelectron 10 hours ago [-]
I wonder if it has anything to do with the fact that AI is a grid of grid-calculating grids. It seems like it would be especially well suited to finding solutions about grids. That is until you consider the fact that even 1 trillion billion grids is still not anywhere close to an infinite grid. So, probably slop.
iLoveOncall 11 hours ago [-]
Absolutely no proof that any LLM actually found the result, and just a mention of an "internal model". Served to you by one of the biggest liars in the world.
Why would anyone believe this to be true even for a split second?
varenc 9 hours ago [-]
This has been an unsolved open problem for 80 years. What you're suggesting is that someone connected to OpenAI solved this very hard math problem, but then rather than taking credit for it, falsely attributed it to AI?
The point of having an AI solve an unsolved problem, is to make it very clear that the insight must have come from the AI and wasn't in the training data. Sure, it's possible OpenAI had access to some math professors that solved it and then let an AI model take the credit... but seems unlikely. That human would be turning down a potential Fields Medal for this discovery.
> but seems unlikely. That human would be turning down a potential Fields Medal for this discovery.
I also don’t like the tin foil hatty theories and don’t know what OpenAI actually did, but an NDA does wonders! Just pointing out that this line of operations is not really unlikely.
Antibabelic 3 hours ago [-]
> That human would be turning down a potential Fields Medal for this discovery.
While interesting, this result is not Fields Medal material.
yathartha 5 hours ago [-]
[dead]
NexraGear 2 hours ago [-]
[flagged]
epicsagas 7 hours ago [-]
[flagged]
spacebacon 11 hours ago [-]
[flagged]
OldGreenYodaGPT 14 hours ago [-]
[dead]
rohitsriram 13 hours ago [-]
[dead]
xiaod 13 hours ago [-]
[flagged]
ShadowPulse4709 13 hours ago [-]
[flagged]
throwaway613746 10 hours ago [-]
[dead]
buddhahastha 13 hours ago [-]
[flagged]
dist-epoch 14 hours ago [-]
[flagged]
embedding-shape 14 hours ago [-]
> It's not a new result, LLMs can't produce new results
Who else disproved this longstanding conjecture before the model did so, since obviously it must have been in the training data since before?
ekjhgkejhgk 14 hours ago [-]
Your understanding of this technology is out of date, and getting out of date faster as time goes by.
throwaw12 14 hours ago [-]
Thanks for giving me hope that there is still a place for human knowledge workers.
bradleykingz 13 hours ago [-]
ok. so what are the implications for math
mrcwinn 11 hours ago [-]
The back and forth in this discussion reveals to me we are sorting through a kind of philosophical debate about intelligence. That alone tells me LLMs are doing something novel.
brcmthrowaway 13 hours ago [-]
End times are approaching
ninjagoo 6 hours ago [-]
Many folks are upset about the supplanting of human effort by ai. Umanwizard voiced this valid concern below [1], but his comment got downvoted, unfairly, IMHO, instead of just being addressed. So putting out at least my response as its own top-level comment for visibility.
> the closer the expertise you spent your whole life building is to being worthless.
Perhaps it is time for life to be considered intrinsically valuable, instead of being "worthy" only based on output or capability. Disability, animal and environmental advocates have been fighting for this for a long time. Not too long ago women and minorities were in the same boat. Even now, there are many advocating and fighting for a return to the dark old days.
> Along with all the rest of what humans find meaningful and fulfilling.
Some humans. Many are content to enjoy simply existing, and the beauty of life and the universe around us. Just like many non-scientists today enjoy and benefit from the work of scientists, tomorrow too many will enjoy learning from, and applying the coming advancements and leaps in many fields.
And those of a scientist or other research-type mindset? No doubt they will contribute meaningfully by studying the frontier, noting what remains unanswered, and then advancing the frontier, just like researchers do today; just because scientists in the past solved many questions doesn't mean that there aren't any questions to answer today.
IMHO, AI means that the frontier expands faster, not that it is obliterated. Even AI cannot overcome the laws and limitations of physics/universe: even Dyson spheres only capture the energy of one star, thus setting a limit on the amount of compute, and thereby a limit on intelligence. And we are a loooong way from a Dyson sphere.
Seems rather depressing to me but maybe I am a Luddite.
JacobAsmuth 8 hours ago [-]
Exactly. I would rather we let these discoveries stay hidden for a while longer such that human ingenuity may untangle them from the coils of reality. A machine? A mechanical man? Deigns to produce something as pure as mathematics without the divine fervor of the ineffable spirit of Man?! It's just not what God wants.
pickleRick243 7 hours ago [-]
Human ingenuity is untangling perhaps the deepest question of all- what is the essence of Reason and the intellect that so privileges man? I don't know if it's what God wants, but it's certainly getting close to some existentially fundamental questions.
While many seem to be anxious or pessimistic about the future of intellectual/artistic pursuits (understandable although I disagree), I do find the utter lack of curiosity or interest in the incredible machinery that is causing all the fuss to be striking.
unmole 7 hours ago [-]
I can't tell if this is satire.
csallen 7 hours ago [-]
I had the exact same thought. It depends entirely on what voice you read it in, I suppose.
pickleRick243 7 hours ago [-]
Haha, now I do think it is more likely satire. I don't think HN has many people who would post like this sincerely.
h4h4itsfunny 7 hours ago [-]
[dead]
voooduuuuu 13 hours ago [-]
Ask an LLM to invent a new word and post it here. You will see that it simply combines words already in the training data.
robmccoll 12 hours ago [-]
* * *
satvikpendem 12 hours ago [-]
Funny that the replies are dead. It's true that generally we shouldn't have AI output on HN but this case is an exception as we are explicitly asking for it, so it's interesting that people still flag the replies.
CamperBob2 11 hours ago [-]
And this is really not OK. I've been a victim of the same filter.
Dang/Tomhow, are you reading this? Would it make sense to modify your slop filter to avoid auto-flagging/killing replies that credit the LLM explicitly? Otherwise valid discussions will continue to get hosed.
dmos62 3 hours ago [-]
Are you saying comments are getting shadow banned?
CamperBob2 3 hours ago [-]
From what I can tell, anything that triggers some sort of AI post filter gets automatically flagged. Several posts in this thread are visible only with showdead enabled.
My argument is that this rule should apply only to people who post LLM output under their own user names without acknowledgment, or otherwise post it where it doesn't belong. If the topic of a (sub)thread involves LLM output, it should be OK to cite examples without getting your post flagged.
dmos62 2 hours ago [-]
I agree. It sucks to have my good faith comment be shadow filtered, without any notice or indication.
Nevermark 11 hours ago [-]
You must be joking? Unless by combining words you mean digging deep into Latin and Greek etymology, finding something pithy and linguistically associative.
I can assure you, the percentage of people who can do what they do when it comes to crafting terms, and related sets of terms, for nuanced and novel ideas is very very small.
It happens this is something I do nearly every day.
Models respond to the level of dialogue you have with them. Engage with an informed perspective on terminological issues and they respond with deep perspectives.
I am routinely baffled at the things people say models can't do, that they do effortlessly. Interaction and having some skill to contribute helps here.
baq 12 hours ago [-]
Mathematics can be mostly boiled down to a few sentences with very lengthy possible combinations, so yeah that is not a problem
konart 12 hours ago [-]
So LLM is german?
Garlef 12 hours ago [-]
What does "new word" even mean?
dpoloncsak 13 hours ago [-]
[dead]
dmos62 13 hours ago [-]
[dead]
SparkyMcUnicorn 12 hours ago [-]
[dead]
atleastoptimal 12 hours ago [-]
To all AI skeptics:
What is preventing AI from continuing to improve until it is absolutely better than humans at any mental task?
If we compare AI now vs 2022 the difference is outstandingly stark. Do you believe this improvement will just stop before it eclipses all humans in everything we care about?
davebren 11 hours ago [-]
> What is preventing AI from continuing to improve until it is absolutely better than humans at any mental task?
No matter how much compute time it's given to combine training samples with each other and run through a validation engine it will still be missing some chunk of the "long tail". To make progress in the long tail it would need to have understanding, and not just a mimicry of understanding. Unless that happens they will always be dependent on the humans that they are mimicking in order to improve.
atleastoptimal 11 hours ago [-]
What is the difference between what LLM's do and "true" understanding?
I feel like people grasping at straws over the shrinking limitations of AI systems are just repeating the "god of the gaps" fallacy
davebren 10 hours ago [-]
> What is the difference between what LLM's do and "true" understanding?
The thing where you can understand the meaning of this sentence without first compiling a statistical representation of a 10 trillion line corpus of training data.
Unless you're an NPC of course.
smashers1114 10 hours ago [-]
I mean brains get a lot of training data too in order to understand language. I don't think you provided a relevant difference.
Or rather, maybe I don't understand what you mean :)
davebren 10 hours ago [-]
When you think about the word apple and what it signifies, what do you experience? Is there a feeling of "appleness"? Do you think that sense of meaning is equivalent to the numerical weights of an LLM?
smashers1114 7 hours ago [-]
> When you think about the word apple and what it signifies, what do you experience?
So I have all sorts of associations with "apple" and spent a little time playing with it.
First in a raw physical sense I can imagine an apple in my head, spin it around, imagine its physics with near cylindrical symmetry etc. A red apple is what first pops into my head, although of course I know there are many apple variants and have opinions on their taste etc.
There are many cultural associations I have with apples from Newton to George Washington. The company Apple has its own set of ideas that I interact with when I hear the word.
In other words I can think of various associations I have to the word apple of various strengths. These associations and strengths are functions of my experience encountering the word and actual apples.
> Is there a feeling of "appleness"?
I don't really know what this would mean. I would say no, unless it can perhaps be defined what appleness means and feels like. I don't really notice any strong set of emotions or feelings from this thought exercise.
> Do you think that sense of meaning is equivalent to the numerical weights of an LLM?
Again I think I would need a definition of "sense of meaning". I don't seem to derive a singular pointlike meaning when contemplating a singular word. I never was contending that human and LLM cognition are exactly equivalent, but I could see these association strengths being represented in LLM weights. I would say then if an LLM has similar association strengths with "apple" then it "understands" apples as well as I do. Of course this is really hard to test, but frontier models could give you all sorts of apple facts and cultural associations and so on. It may slip up and hallucinate, and I'm sure that I also believe at least one false thing about apples.
So what is your bright line between LLM and human understanding in this example? I assume that your line of reasoning would argue that LLMs do not understand apples. Why don't LLMs understand the word "apple"?
davebren 7 hours ago [-]
It sounds like you don't have the subjective experience of meaning that most humans do, so maybe that would explain why you don't think there is anything beyond associations. Maybe this is the core difference that's determining how people see LLMs.
I'm not sure how I would convey what meaning and understanding is to someone if they don't experience them. This is my poor attempt though: There can not just be associations there need to be "things" to associate between. Otherwise you have no ground, it is all map and no territory. Ultimately it would just be meaningless associations between meaningless symbols.
enoint 12 hours ago [-]
That’s one possibility. If it fails to convince a critical mass that it’s a net improvement in their lives, then the impediment to continual improvement will be sabotage.
KalMann 11 hours ago [-]
I think there's been natural but steady progress since 2024 with the release of the o1 model, which showed impressive reasoning capabilities. But I think it's wrong to look at the magnitude of the accomplishments and assume that will be field independent. We don't know the range of problems reasoning techniques are useful for. What we see here is refinement of capabilities that have been noticeable for years.
layer8 10 hours ago [-]
> everything we care about
One qualitative distinction that remains for the time being is that humans care about things while AIs do not. Human drive and motivation is needed to have AI perform tasks.
Of course, this distinction isn’t set in stone.
rzmmm 12 hours ago [-]
Maybe after decades. 2022 models were microscopic compared to latest models.
bigstrat2003 6 hours ago [-]
> What is preventing AI from continuing to improve until it is absolutely better than humans at any mental task?
Well, there's the fact that it hasn't yet improved since what we had 3 years ago. That doesn't really bode well for the prospect of future improvement, though it's not technically impossible.
pjs_ 6 hours ago [-]
by what metric has it not improved in the last 3 years?
gowld 9 hours ago [-]
It depends on whether AI can invent cold fusion before running out of all the energy on Earth.
xandrius 12 hours ago [-]
You should really look up a video about what GPTs fundamentally are.
Rover222 12 hours ago [-]
You should also really look up a video about what neural synapses really are.
reactordev 13 hours ago [-]
I dunno, I'm skeptical without proof. I've had the MAX+ plan for a while and I'm sorry, the difference in quality between GPT and Claude is night and day. Claude understands. GPT stumbles over every request I give it.
nathan_compton 13 hours ago [-]
Weird thing to say about a report which literally has the attached mathematical proof.
reactordev 13 hours ago [-]
Except it's not a proof. It's an existential proof of what? Projecting points and losing density? Nah, it's wrong. At least with Erdős you could solve f(x) or not solve it (inf). You cannot with this. All they did was balance a really fancy quadratic equation. The projection from C^f to R² doesn't demonstrate geometric injectivity, so nⱼ = |X| isn't established, and the bound collapses.
cwmoore 5 hours ago [-]
From the meandering and self-loving article:
“For decades, it was widely believed that this rate was essentially the best possible, and no construction could improve significantly over the square grid. In technical terms, Erdős conjectured an upper bound of n^{1+o(1)}, in which the additional o(1) indicates a term tending to 0 with n.
Our new result disproves this conjecture. More precisely, for infinitely many values of n, the proof constructs configurations of n points with at least n^{1+δ} unit-distance pairs, for some fixed exponent δ > 0. (The original AI proof does not give an explicit δ, but a forthcoming refinement due to Princeton mathematics professor Will Sawin has shown one can take δ = 0.014.)”
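A small sketch of the square-grid baseline the quoted passage refers to, in Python. It brute-force counts, for a k x k integer grid, which distance occurs between the most pairs of points; rescaling the grid so that this distance equals 1 turns that count into a number of unit-distance pairs. The function name `best_distance_count` is made up for illustration and is not taken from the article or the proof, and this says nothing about the new construction itself.

```python
from collections import Counter
from itertools import combinations


def best_distance_count(k):
    """For the k x k integer grid (n = k*k points), return (n, d2, pairs):
    d2 is the squared distance realised by the most point pairs,
    and pairs is how many pairs realise it."""
    points = [(x, y) for x in range(k) for y in range(k)]
    # Tally squared distances so everything stays in exact integers.
    counts = Counter(
        (ax - bx) ** 2 + (ay - by) ** 2
        for (ax, ay), (bx, by) in combinations(points, 2)
    )
    d2, pairs = counts.most_common(1)[0]
    return len(points), d2, pairs


if __name__ == "__main__":
    for k in (2, 3, 5, 10):
        n, d2, pairs = best_distance_count(k)
        # Scaling the grid by 1/sqrt(d2) makes these pairs unit-distance pairs.
        print(f"n={n:4d}  most common squared distance={d2:3d}  pairs={pairs}")
```

For example, `best_distance_count(5)` returns `(25, 5, 48)`: among 25 points, 48 pairs sit at distance √5, so scaling by 1/√5 gives 48 unit-distance pairs. The brute force is quadratic in the number of points, so it is only meant for small k.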
And by opening the door to LLM-generated results, you'll see greater and greater amounts without any hope of ever navigating this field again without machine help.
It's a little like a software project that gets extended more and more by AI agents, with less and less review by human software engineers, until in the end the complexity and spaghetti design are so incomprehensible to humans that maintenance requires an AI agent. The risk is that math as a whole (the field itself) will experience that effect.
Say we achieve interstellar travel, but nobody actually knows how it works.
Or we cure cancer, but the "cure" requires a microrobotic implant, and it runs as a blackbox AI, and only the other AIs can make one, and there's no guarantee they will know how to make one tomorrow.
Or we solve global warming but it requires giant cooling machines running 24/7 and again, nobody knows how it works, but with the added bonus that the planet is cooked if they ever stop working.
Looks like you're pretty sure of that. Every time I see an argument like this delivered with confidence I wonder how is it different from, say, digital calculators. Or better yet, books - Greek philosophers moaned that young people would stop understanding anything and just check books when they wanted to know anything.
Knowing the history of the humankind is what makes me pretty sure of that.
> I wonder how is it different from, say, digital calculators.
Did a single digital calculator stop any ongoing war, or liquidate a psychopath who orders people to go kill and die?
It's hard to describe the feeling of seeing intelligence being delegated increasingly to AI. If that's not a pivotal moment, a revolution, I don't know what is.
True, but it is possible to assemble a team of people that does, with backup for each person. There's also teachers and written knowledge to educate new team members. That's what makes it resilient.
I think that's a very different situation from what's described.
Green energy and transport technology is now at the point where people save the world and get rich trying, just as fast as they can build the factories.
Food's climate impact is harder, because the problem isn't technical, it's convincing people to give up beef (and other things, but mostly beef).
* quantum mechanics and general relativity are famously difficult to get to grips with
I think there will be regulation that requires some users of AI to provide an explanation upon request. For instance, banks could be required to "explain" why you didn't get that loan. What if the decision is based on a credit score that includes some AI prediction that ultimately relies on the entire training corpus?
The bank can give you a list of factors that play into the decision but they may not be able to explain deterministically why a very similar customer did get that loan. At that point I think we're going to resort to statistics that prove a lack of bias against certain protected characteristics, but that's not really an explanation, is it?
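To make the "statistics instead of an explanation" point concrete, here is a minimal Python sketch that compares approval rates across groups. Everything in it is an assumption for illustration: the group labels, the made-up decisions, and the helper names `approval_rates` and `rate_ratio`. It is not any regulator's or bank's actual test, and a ratio near 1.0 still explains nothing about an individual decision.

```python
from collections import defaultdict


def approval_rates(decisions):
    """decisions: iterable of (group, approved) pairs -> {group: approval rate}."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {group: approved[group] / totals[group] for group in totals}


def rate_ratio(rates):
    """Lowest approval rate divided by the highest (1.0 means equal rates).
    Assumes at least one group has a non-zero approval rate."""
    return min(rates.values()) / max(rates.values())


# Made-up loan decisions for two hypothetical groups "A" and "B".
sample = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False), ("B", False), ("B", True),
]

rates = approval_rates(sample)
ratio = rate_ratio(rates)

if __name__ == "__main__":
    print(rates)   # per-group approval rates
    print(ratio)   # aggregate comparison, not a per-customer explanation
```

The output is a single aggregate number per group, which is exactly why it works as a proxy for fairness but not as an answer to "why was my loan refused?".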
I think we will never get useful and complete explanations for everything that AI does. Society will just accept some explanation-like thing or proxy and move on.
If they understood it 100%, what clarification is needed?
Why is it necessary to continue to increase complexity when we get better intelligence? Can't we find more simple solutions? Or at least more explainable.
In the case of AI we have a better chance to understand what it is doing through chain of thought and explainability. Nature never gave us that.
What you're describing is already how a lot of science, technology, and engineering works!
That’s not “solving it”, that’s putting a bandaid on it. Solving it would mean correcting the underlying issue to the point it’s no longer a problem which requires maintenance.
Managing symptoms is not curing the disease.
https://youtu.be/pfNS2kWf5cY?si=SH6_QC0bCspV-ngz
There are comments that truly reveal a future horrifying and true. Few of them. But I count yours among them.
But I’d argue also that airplanes already achieve this complexity to some degree as well as microprocessors.
I mean, microprocessors have been on the "impossible to bootstrap from scratch in a short period of time" for 20 years already.
I do think we will need to find a way to get away from publishing papers. But I thought that before AI came along and made mediocre papers something you can produce in a day. The academic system seems utterly incapable of self-correcting on this point though. We haven't even managed to get rid of for-profit publishers. So how this all will go down is anybody's guess right now.
https://youtu.be/Uc2zt198U_U?si=OkwO3xT8-zhSABwh
For example, this library here for deep learning is 100% ai generated and far beyond my technical capabilities.
https://github.com/computerex/dlgo
AI is going to both help and hinder this process though. At the end of the day, mathematics is mostly a social process at this point. The goal is not raw number of theorems proven, it’s how proving theorems affects the working operational models of mathematicians. Only a rare few new theorems in mathematics nowadays have direct real world applicability.
If AI produced legitimate theoretical breakthroughs at a pace mathematicians are unable to absorb, then the impact will be neutral to negative.
It seems like if AIs can prove and index a huge number of (largely uninteresting to humans) things there might be sort of "parallel cultures"? Big results are most valuable to humans and AIs both (most context efficient!), but a very large number of less general but still non-obvious results might be an effective approach to solving problems?
Has this ever been different?
Math is abstract, rightfully so. It does not have to have direct applicability. Understanding builds over time and applications eventually follow. Number theory used to be a fringe "pure" theory field without applications for the longest time. If we'd only be interested in (and thus fund) what has direct applicability then society would be much worse off.
Side note: I recall my high school classmates rolling their eyes in every math class with "when will I ever need this in my life?", never asking the same question about PE or history or art classes. Now they struggle with their tax returns and are routinely getting screwed over by loan sharks. But make no mistake, they can be proud of their A for hitting the goal 5 out of 5 times during soccer in PE class.
I am no mathematician and very naïve about this, but in a world that is rapidly becoming extremely calculation and network dependent that sounds hard to believe.
> If AI produced legitimate theoretical breakthroughs at a pace mathematicians are unable to absorb, then the impact will be neutral to negative.
I think the idea here is that all mathematicians will just be using AI for their future work so they don’t really have to absorb it as long as it’s in the training data.
> I am no mathematician and very naïve about this, but in a world that is rapidly becoming extremely calculation and network dependent that sounds hard to believe.
I am a mathematician. It is true. The key is we're talking about new theorems, and direct, current real world applicability. Some theorems that have no applicability now may in the future, as theory often precedes applications by a long way and the usefulness is likely to come from other things built on top of the new maths, and a lot of pure maths will never have direct real world applications but contributes to our overall understanding.
On the other hand, there are many applied mathematicians and theorists from other fields that mine new maths for applications to their fields. But they are almost always not the ones that come up with the new math.
Historically, of course, mathematics was always driven by the need to explain things. Many of the mathematicians from the 17th and 18th centuries were physicists (or, less commonly, engineers). But for the last hundred years or so that really hasn’t been the case.
To be blunt, this seems incredibly uninteresting to me. I enjoy learning mathematics, sure, but I just don't find much inherent meaning in reading a textbook or a paper. The meaning comes from taking those ideas and applying them to my own problems, be it a direct proof of a conjecture or coming up with the right framework or tools for those conjectures. But, of course, in this future, those proofs and frameworks are already in the textbook. So what's the point? If someone cared about these answers in the first place, they probably could have found the right prompt to extract it from this phantom textbook anyways.
You could argue for there being work still like marginal improvements and applying the returned proof to other scenarios as happened in this case, but as above, what is really there to do if this is already in the phantom textbook somewhere and you just need to prompt better? The mathematicians in this case added to the exposition of the proof, but why wouldn't the phantom textbook already have good enough exposition in the first place?
I think my complete dismissal of the value of things like extending the proofs from an LLM or improving exposition is too strong -- there is value in both of them, and likely will always be -- but it would still represent a sharp change in what a mathematician does that I don't think I am excited for. I also don't think this phantom textbook is contained even in the weights of whatever internal model was used here just yet (especially since as some of the mathematicians in the article pointed out, a disproof here did not need to build any new grand theories), but it really does seem to me it eventually will be, and I can't help but find the crawl towards that point somewhat discouraging.
Who cares if it is God's book or the machine's Xeroxed copy?
"The Book" is more interesting to me if I am the one coming up with the ideas to fill it in. Maybe this is a bit egotistical, but I'd like to think it is allowed to have a desire that you, personally, are contributing to something in a meaningful way. Like, if you are on a sports team, it'd be more fun to win a game if you were on the field than if you were benched, and I think that's okay. And ultimately I don't find dredging for proofs from an LLM particularly meaningful, nor do I see it as a particularly personal contribution, as anybody else could have done the exact same thing with the same prompt.
This isn't to say I wouldn't love to read the proofs in "The Book" for problems I care about, I just think I'd eventually get bored of only reading. And so it's hard to be enthusiastic when this book is being built through an LLM.
Technology in general (smartphones, social media, search) even without AI is creating this feeling, as it shrinks the world and makes it less mysterious.
It's worse than boredom; it's more like nihilism.
Then when you strip purpose and meaning from a human you get something very bad, despondency being the best case outcome.
This is a good analogy for AI work displacement. It would probably resonate with some of the college students who booed Eric Schmidt.
I'm also afraid of a world where AI completely replaces human mathematicians, but if we remain collaborators, then that's a world I can still feel excited about.
Shifting from “human calculators” to machines for arithmetic is also hard to argue against.
I think what makes the AI transition difficult is it impacts a wide range of high-value activities that would have been implicitly assumed to always remain human.
I do have great trouble seeing how a pile of matrices is ever going to be capable of innovation. Maybe with sufficient entropy and scale, it will… The day that becomes practical will be a turning point in history.
Economically, goods and services are often priced based on labor/“value added” aspects. Lawyers’ fees aren’t driven by paper costs! If AI takes a huge bite out of intellectual labor, the future could become very different…
BTW, your book description reminds me of the 2025 movie “A.I”. I thought it was quite good.
You admit this possibility so I'm not arguing with you, but it seems far more plausible to me that we can build something better than the brain.
In the limit we can just grow brains and put them in computers anyway, then the debate is moot. That's a really hard problem but of course not physically impossible.
"All" a model is doing is predicting the next words, based on the statistical distribution of words it has seen similar to the ones read/produced so far.
We push a model towards a particular set of distributions through context. If I ask a model "What is the capital of France?", there is a non-zero chance it goes down the dad joke answer of "The letter F". The far more likely option is "Paris", because the joke appears much less often in training material, but if I wanted to be absolutely sure of getting a consistent geography answer I'd address that with additional context. We can add context via prompts, RAG, agents, skills and so on.
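A toy sketch of that idea in Python, under loud assumptions: the two candidate answers, their scores, and the size of the "context shift" are invented numbers, and a real model does not add a fixed offset but recomputes its scores from the whole context. The only point is that a softmax over scores gives the joke answer a small non-zero probability, and that changing the scores via context moves the distribution.

```python
import math


def softmax(logits):
    """Turn a {token: score} mapping into a {token: probability} mapping."""
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def with_context(logits, shift):
    """Hypothetical stand-in for 'adding context': nudge each score by a fixed amount."""
    return {tok: score + shift.get(tok, 0.0) for tok, score in logits.items()}


# Invented scores for the prompt "What is the capital of France?"
base_logits = {"Paris": 4.0, "The letter F": 1.0}
# Invented effect of extra context such as "Answer as a geography teacher."
context_shift = {"Paris": 2.0, "The letter F": -2.0}

p_plain = softmax(base_logits)
p_context = softmax(with_context(base_logits, context_shift))

if __name__ == "__main__":
    print("without context:", p_plain)
    print("with context:   ", p_context)
```

With these made-up numbers the joke answer never reaches probability zero; it just becomes less likely once the extra context is applied, which matches the "non-zero chance" framing above.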
However, when training a model, we select the material. We could show it a lot more geography information (or dad jokes!), and skew the statistical distribution in the direction we wanted. We could also decide to design the system prompt towards the direction we prefer - which the user would interpret as "the model" - and so nudge the context model-wide. We can also construct the interaction to iterate on context with a specific framing and call it "reasoning".
In this specific example, you could therefore solve the problem by a) training skewed towards mathematical papers, which likely degrades performance in general and likely for the specific case too, b) training the user to provide better context/prompts for mathematical work, shifting the workload to them, which feels very "a la 2024", c) publishing agents and skills that are tailored to mathematics work (very "a la 2026"), d) tweaking the system prompt for when the model is doing mathematics work, which the user would see as "the model" doing the change, but you and I might look under the hood and say that is in the harness or a specific type of prompt, e) adding "reasoning" execution that is set to focus on mathematical formatting, or f) a mixture of the above.
Right now we're probably looking at agents and skills. I think over time we're going to see smaller models targeted towards domains with a mixture of all of it, where some of this sits at user configurable levels, and some is "baked in" via training, system prompts and execution modes, but from a user perspective it's all just "the model".
Always, always always, the problem with research and development is leadership, not insufficient supportive technology. It is a political problem, there is absolutely, positively no shortage of technologies to support research. Your optimism is totally misplaced. The NSF funding cuts have negatively impacted math more than AI has benefitted it. And guess who supports the administration that cut NSF funding? The people who ousted the PhDs from OpenAI.
You are right to point out that the ones who fully own and pilot the machines all belong to the “fuck science and humanity as a whole” group. So the likely outcomes don’t look good.
Echoes the early promise of the internet vs the eventual state and consequences of it, although seemingly primed for far more dire and deeply penetrating consequences.
No interest in human advancement, just attribution.
What I’m saying is that the ultimate goals of those in power are not these sorts of altruistic or even scientific pursuits, and that massive labor disruption and hyper-concentration of power in the hands of those who are proving time and again that the advancement of science and the benefit of humanity as a whole are antithetical to their goals is likely a bad thing.
Most homeless people have smartphones, and consistent access to food and clean water.
Your average 'poor person' in America has HVAC. An unimaginable luxury in the EU
Eh, don't be silly. In the places where the summer is hot enough (or, more precisely, where it used to be hot enough), I have seen plenty of AC units on shabby buildings, even on old Commie apartment blocks in Romania.
AC is not that expensive.
That's true. But. Maybe you've seen the Oppenheimer movie, there is a moment where Oppenheimer shakes Teller's hand, basically after the guy ruins Oppenheimer's life in a completely immature betrayal. That's what people are angry about, the academy community is Oppenheimer's wife asking, why the fuck did you shake his hand?
At least regarding leadership and funding, I don't know if it's a matter of likely or unlikely outcomes. It's just facts: these guys are collaborators. The commenter might very well have zero graduate students starting next year. What pisses me off is the utter obliviousness that STEM people have about how deeply political their work is.
And perhaps this is the real reckoning for the mathematics community. Not the possibility that AI is going to replace their jobs, it's not going to do that. But that having these intensely myopic and disagreeable personalities means that basically zero leadership skills have been nurtured in the mathematics community. You cannot name a single politician who is a mathematician. You have to be elected to have power in this country, it's that simple, there are way more billionaires than there are presidents! Leadership is far more scarce. So that's why these disputes matter, and while it's great that people engage on Hacker News about it, it's intensely disappointing that "reduced science funding is really bad" gets downvoted.
That is a result of Hacker News's emphasis on this very 2010s view that it wants to be a place where the math nerds gather (in @dang's words) - he doesn't get that the quality of the discourse was caused by great leadership at many political and academic levels. Nobody credits how much better leaders were during Y Combinator's biggest success stories, or how much we overvalue the intellectual powers of math because it makes money as opposed to enlightening our view of the world.
Along with all the rest of what humans find meaningful and fulfilling.
Moreover, truth be told, I don't really see myself doing any less math and requiring less from my skills. At least from the moment I've begun incorporating LLMs into my research workflow to now, the demand I've had from my own skills has only grown. At least in an era prior to Lean formalization.
Humanity is having those discussions, heck you are in one RIGHT NOW not some Hollywood future.
What is coming of those discussions is that the ownership class balks at the idea of raising their taxes (see the recent interview with Bezos), and therefore balks at the idea that you or I should have any value beyond what we produce... And if AI can replace you or me, well, how do we survive if we can't produce in a technological society?
Money is valuable only as it changes hands for goods/services, and if you want to get rich, on top of having/producing/controlling something everybody desires, you also need as many people as possible to have money to give you in exchange for a piece of that something.
I wonder if this is physically/mathematically impossible: the mere act of living involves processing energy, and therefore doing work :)
And there is a lot of energy to be processed in this Universe before the heat death...
Mind you, there are places in the universe that we have no way of knowing ever existed... The non-observable universe, if you will. For when physicists talk of the observable universe, it is only the fraction we have any chance of receiving data/light/radiation of/from.
This "any" shines like a thermonuclear fireball.
In the (probably unlikely) event that AI use results in a post-scarcity economy in which there's no need to work to survive, a lot of people wouldn't regret sentiments like the ones in question.
On the contrary, it would mean they could work on whatever they please, including potentially standing on the shoulders of giants - the AIs - and seeing even further.
If we actually worked to create a society that works for the benefit of all its members, there would be a lot less reason to worry about developments like these. Much of the worry arises because for various reasons - none of them really good ones - we've ceded control of these developments to the people least suited to manage them.
To a society that provides a livelihood to all humans, equally?
For I would love to hear how we get from here to there during an era with the largest wealth disparity ever seen in human history. (Yes, it's worse than the robber baron era of US history.) For I have yet to see any signs that the capital/ownership class has any intentions other than vacuuming up even more wealth and power for themselves. And that is anathema to your desired outcome.
History is full of examples of situations like this being corrected, at least to an extent. If we learn from those, we can do even better next time around.
Btw, the inequality you mention is far worse in the US than Europe. Here's one source that covers this: https://wid.world/es/news-article/why-is-europe-more-equal-t...
This demonstrates a point that should be obvious, that better societal choices can produce better outcomes.
Even taking a purely Kantian interpretation that would scale this beyond mathematicians - and that itself is a logical leap! - making a universal law out of "a discovery can be beautiful regardless of whether created by humans or AI" is much more specific than the straw extrapolation you've created.
"Let's try to think posts through."
If 20% more medical knowledge would save more lives long term, there are actually people, probably some browsing this website right now, maybe the person you're responding to, that actually think killing people up to the expected number of lives saved is justified.
I would personally call that evil, but it is thought through.
This is just an application of the philosophy "automate yourself out of a job every 6 months"- I've been doing that for a long time, and the outcome is generally a more interesting job.
The answer is that we simply need to decouple the "right to exist" from "worth."
You should have the right to exist and explore the world simply because you're human, not because you can use your skills to provide some sort of transactional value to someone else. Deprogramming so many people is going to be hard...
Let's start with the first practical step: how do you dethrone the psychopaths in charge of the world who own about everything on Earth and have all the world's lethal force in their pockets?
Not so many years from now, some of them will surpass you. A few years after that all (that survive to that point) will surpass you.
Does that terrify you just as much?
A child is a living, breathing, growing, and changing conscious entity. It is the natural order for the young to supplant the old, no matter what the politicians and billionaires desire.
"AI" - terrifies anyone who understands the pact our society rests upon: that labor is valued and can be exchanged for goods and services to survive. Thereby enabling a person to support their families without having to do everything themselves.
If AI replaced a noticeable fraction of society, destroying their capacity for work, that would threaten and ultimately blow up this compact between the working class and the capital class... and with it, the foundations of a modern technological society.... It may sound like hyperbole, or some fantastical prediction. But really it is basic economics, like econ 101... And personally the last few years have terrified me, not because of AI directly, but because of how ignorantly blind many smart and tech savvy people are... You are marching us to collapse with a smile on your face...
Perhaps it is time for life to be considered intrinsically valuable, instead of being "worthy" only based on output or capability. Disability, animal and environmental advocates have been fighting for this for a long time. Not too long ago women and minorities were in the same boat. Even now, there are many advocating and fighting for a return to the dark old days.
> Along with all the rest of what humans find meaningful and fulfilling.
Some humans. Many are content to enjoy simply existing, and the beauty of life and the universe around us. Just like many non-scientists today enjoy and benefit from the work of scientists, tomorrow too many will enjoy learning from, and applying the coming advancements and leaps in many fields.
And those of a scientist or other research-type mindset? No doubt they will contribute meaningfully by studying the frontier, noting what remains unanswered, and then advancing the frontier, just like researchers do today; just because scientists in the past solved many questions doesn't mean that there aren't any questions to answer today.
IMHO, AI means that the frontier expands faster, not that it is obliterated. Even AI cannot overcome the laws and limitations of physics/universe: even Dyson spheres only capture the energy of one star, thus setting a limit on the amount of compute, and thereby a limit on intelligence. And we are a loooong way from a Dyson sphere.
PS: I think you're being unfairly downvoted. Your question is not invalid and deserves responses, not downvotes.
A dedicated engineer is always looking to automate themselves out of existence, so that they can move on to the next thing to automate. Ongoing repetitive work is less engineering and more akin to toiling on a line.
It may be the beginning of thinking, but to many who view things on a longer timeline, it starts to look like it will break down the frameworks that are required to get to that position. Otherwise, you just end up retreading explored ground, thus removing the joy of discovery from any human's hands and mind.
Perhaps your name-calling is not actually as logically grounded as you think. It definitely seems to depend on unfounded leaps.
This technology is solving interesting math/physics problems for us, which is completely different.
The more I read about these achievements the more I get a feeling that a lot of the power of these models comes from having prior knowledge on every possible field and having zero problems transferring to new domains.
To me the potential beauty of this is that these tools might help us break through the increasing super-specialization that humans in science have to go through today, which on the one hand is important but on the other hand does limit a person in terms of the tooling and inspiration they have access to.
So the cross-domain pollination that used to exist among scientists is not only not encouraged, it's also actively punished by society.
Can you explain more what you're referring to, because this has not been my experience at all. Heck, when I went to college, cross disciplinary majors were all the rage.
I think the thing that is just factually difficult is to actually become skilled in multiple different domains, precisely because the level of study/practice/rehearsal to become proficient in any individual domain keeps going up.
A long time ago you could be a Renaissance man by essentially dabbling in different fields. But today, as this article points out, you need extremely deep expertise in any one area just to understand the status quo - this proof required extremely deep expertise in two separate areas that mathematicians were surprised to be related at all.
And this is where machines, such as these reasoning LLMs, can help. Because they can remember patterns across many domains and try absolutely bonkers, weird connections and ideas.
We, the humans, still have to verify the work (at least as of now). But the "maybe this tool, or idea, or trick, from that completely unrelated field applies here" reasoning/experimentation could become much easier.
I have always said this and will say it again: reasoning is just experimentation with a feedback loop and continuous refinement.
What makes me more of an optimist in this case is that people who today decide to go into these sciences are mostly people who are driven by intellectual activity, so I feel they are the right ones to figure this out, probably more so than us, the engineers.
I hear some specialists (especially multidisciplinary ones) write things they know few or no one can read. (Which is the most ironic reason for being rejected by a journal.)
I recall a funny moment on IRC where a truly helpful guy moaned that no one helped him when he had a (programming) question. He was very good at many programming languages and worked in some mix of high-level physics and mathematics. He posted SO questions that rarely got an apologetic response from someone able to understand the code and the physics but who couldn't wrap their head around the math. lol I hope he finally gets some help with his wizardry.
As we're becoming hyper specialised, they become an invaluable tool to merge the horizon in, so to speak.
I don’t think that this model works anymore though.
Also, I love the expression “merge the horizon in”. Being a non native speaker of a language is so nice some times. Thanks!
I think we still don't really comprehend how much can be achieved by a single "mind" that has internalized so much knowledge from so many areas.
Personally I'm more of a breadth person, and I could never compete with peers who were more of the depth type at college.
But I get satisfaction from connecting things that feel irrelevant at first sight; that's what drives me.
Cool thing is now when someone contributes something to the hive mind, it can instantly be applied to any other problem people are working on.
Similarly, we're creating tools to improve knowledge, but we're progressively zapping the human out of the equation. Knowledge is created for something, but it's unclear if very soon humans will be able to understand it, or really benefit from it, except billionaires, etc.
It's too bad that we're not improving humans nearly as fast as we're replacing ourselves.
Can tech news stay tech news, without getting bombarded with leftist subtext all the time?
What was the process of writing the paper? Was the question asked by a mathematician? Was the paper right from the get-go, or was there someone who pointed out mistakes?
How many attempts were made before a solution was found?
I will eat my words if an AI one-shotted that one without any external help, but for now I am left wondering whether it's a new way to attribute discoveries to companies instead of the people who put the work in.
As per the report, the prompt used to solve the problem is AI-written and the solution was initially graded by an AI grading pipeline. They don't say this explicitly, but it seems like OpenAI has an automatic pipeline where they prompt models for solutions to famous math problems (which wouldn't be unexpected given how flashy a solution to a famous math problem looks)
> Was the paper right from the get-go, or was there someone who pointed out mistakes?
Also as per the report, the output of the model isn't really a "paper"; it's a very terse 2 page solution which is apparently correct. The paper was later written based on this solution to make it more presentable.
> How many attempts were made before a solution was found?
Given that this appears to be from an automated pipeline, I would say that it had many attempts. But either way, the blogpost says that with enough test-time compute, the model finds this same solution 50% of the time.
[1] https://cdn.openai.com/pdf/74c24085-19b0-4534-9c90-465b8e29a...
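A rough back-of-the-envelope sketch of the "many attempts" point: if each run finds the solution with probability p, and runs are independent, the chance that at least one of k runs succeeds is 1 - (1 - p)^k. The 50% figure is the one quoted from the blog post; the independence assumption and the function name are mine for illustration, not anything stated in the report.

```python
def prob_at_least_one_success(p: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent runs succeeds.

    Assumes every run succeeds with the same probability `p` and that runs
    are independent. Both are simplifying assumptions for illustration.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - (1.0 - p) ** attempts


if __name__ == "__main__":
    # With a per-run success rate of 0.5, a handful of runs is already
    # very likely to contain at least one success.
    for k in (1, 2, 5, 10):
        print(k, prob_at_least_one_success(0.5, k))
```

So under these assumptions, a pipeline that retries even a few times would be expected to surface the solution, which fits with the guess that it took more than one attempt.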
Nevertheless new maths is exciting and might lead to what I find slightly more interesting - new physics.
A difficult part was constructing a chess board on which to play math (Lean). Now it's just pattern recognition and computation.
LLMs are just the beginning, we'll see more specialized math AI resembling StockFish soon.
However, this was not verified in Lean. This was purely plain language in and out. I think, in many ways, this is a quite exciting demonstration of exactly the opposite of the point you're making. Verification comes in when you want to offload checking proofs to computers as well. As it stands, this proof was hand-verified by a group of mathematicians in the field.
This is the caliber of thinking in unimpaired AI bullishness.
Dystopia vibes from the fictional "Manna" management system [0] used at a hamburger franchise, which involved a lot of "reverse centaur" automation.
> At any given moment Manna had a list of things that it needed to do. There were orders coming in from the cash registers, so Manna directed employees to prepare those meals. There were also toilets to be scrubbed on a regular basis, floors to mop, tables to wipe, sidewalks to sweep, buns to defrost, inventory to rotate, windows to wash and so on. Manna kept track of the hundreds of tasks that needed to get done, and assigned each task to an employee one at a time. [...]
> At the end of the shift Manna always said the same thing. “You are done for today. Thank you for your help.” Then you took off your headset and put it back on the rack to recharge. The first few minutes off the headset were always disorienting — there had been this voice in your head telling you exactly what to do in minute detail for six or eight hours. You had to turn your brain back on to get out of the restaurant.
[0] https://en.wikipedia.org/wiki/Manna_(novel)
Depends on what you're ordering and who the cashier is.
If your order is the happy path of no customizations of a combo with an experienced cashier, it can be done in seconds, for sure. "Medium #4 with a Diet Coke", pay, done.
But if you customize your burger or order a lot of items a la carte and you're dealing with a new cashier who has weak English skills, good fucking luck. You'll likely need to wait for them to figure out they need to call someone over to help, you'll have to repeat your order, and you end up spending far more time.
> it keeps trying to upsell you
Yeah, I'll agree that's obnoxious, especially when it's trying to upsell you something that's already on your order. I ordered a combo. I don't need you to add another fry.
I have had them run out of receipts, but it’s never mattered for me. If I’m dining in, the plastic number you carry to your table makes sure I get my food. And if I’m taking it to-go, they always find me anyways.
I'm not sure how that could be. I can walk up to the counter and say "Big Mac Large Fry Small Coke" faster than you can navigate the first screen of the kiosk, and a skilled counter worker can key that in and be done before I even get my credit card out.
If it's purely about the food, receiving it, consuming it, then sure, get the human out of the loop, interact with a machine. Ideally even the preparation is done by a machine. No human error or hair involved. Why even go there, let it be delivered to your home.
But these places are also about the experience of social connection. The bar keeper, the waiter, the chef. They are all involved in this experience and the actual food is "just" one component, one detail, albeit an important one. My favorite restaurants would be nothing without the people there.
It's similar with music. It's not just about the produced sound waves. The musician forms a social bond with the audience. Even when listening to a recording, my mind is re-living or at least imagining a live sitting, that connection with the musician. No machine generated music will ever be able to replace that.
Other places optimize for this better by not having too many hand-overs between order and preparation.
It seems designed to maximize how many screens they show you to make an order. Each one with a slight delay and animation.
At a drive through I can say “gimme a number one, medium, with a Coke Zero” and they give me my total. That’s the convenience the kiosk is up against.
At the kiosk there’s:
- A welcome screen you have to tap
- A “carry out or dine in” screen
- Always one other screen with a dumb question about apps or whatever, tap through
- A top level menu with a bunch of categories, burgers, drinks, sides, desserts, etc… I guess I want burgers? But it’s a combo, hmm. I guess I’ll figure out how to make it a meal. Tap burgers.
- Then another screen with burgers, in a different order than the drive through numbering, tap Big Mac
- Then another dedicated screen that shows you a picture of a Big Mac, with a bunch of customization options, which you have to scroll past and verify that it matches the defaults you expect, and at the bottom you can tap add
- Then another screen asking you if you want to make it a meal
- Then another screen asking the size
- Then another screen asking what to drink
- Then another screen that shows you the drink
- Then another screen for what size
Etc etc etc. Each of these screens takes a few seconds to display too, just slow enough to be infuriating.
In my mind the ideal kiosk is something where you get “the menu” (like what you see on the billboard in the drive through) with the usual big squares with a number on them and a picture of the meal. Tapping one puts it in a “drawer” section with my order in it, and each item in the drawer can have simple in-line edit controls for “size” and “what to drink”, with them showing up empty in a way that makes it obvious I need to fill in those answers before I can check out.
I should be able to tap one button for the combo number I want, another for the size, another for the drink, then checkout, all on one screen without long delays. If I don’t want a combo but want individual items, I can just scroll down a bit to look at the full menu. The order drawer stays where it is.
Or hell, just let me say “number one with a Coke” and have a very simple ASR and NL parser figure it out and put it in my pending order to edit.
Customizations can be behind a simple “customize” button on each item in my pending order. If I don’t have customizations I can just ignore it. What you get with no customizations is what you’d get if you just order it verbally to a human without specifying anything. The concept of “here’s how we typically make it, if you want anything different let us know” is a very deeply ingrained and familiar concept to restaurant patrons, and being forced to answer every little question even if you don’t care, adds up to a lot of frustration.
Fast food places came up with the combo numbering system to make ordering faster, and it was super convenient and fast, because there’s a financial incentive to get you through the drive through because you’re blocking other customers. But since they have several kiosks available, they seem to not care at all about the efficiency of the user interface, because it’s not a problem for them. But it’s still a problem for me, because I still want to order quickly, despite it not blocking other customers. It’s a huge step down from just saying “number one with a Coke”.
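To make the "order drawer" idea above concrete, here is a minimal sketch of the data model it implies: tapping a combo adds an item with empty size and drink fields, customizations default to "how we typically make it", and checkout is blocked only while a required field is still empty. All class and field names here are hypothetical, not taken from any real kiosk software.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ComboItem:
    """One combo in the pending order. Size and drink start out empty."""
    combo_number: int
    size: Optional[str] = None
    drink: Optional[str] = None
    # Empty list means "make it the default way"; nothing to confirm.
    customizations: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Names of the required answers the customer has not given yet."""
        missing = []
        if self.size is None:
            missing.append("size")
        if self.drink is None:
            missing.append("drink")
        return missing


@dataclass
class OrderDrawer:
    """The always-visible pending order shown next to the menu."""
    items: List[ComboItem] = field(default_factory=list)

    def add_combo(self, combo_number: int) -> ComboItem:
        """One tap on a menu square adds the combo to the drawer."""
        item = ComboItem(combo_number)
        self.items.append(item)
        return item

    def can_check_out(self) -> bool:
        """Checkout needs at least one item and no empty required fields."""
        return bool(self.items) and all(
            not item.missing_fields() for item in self.items
        )


if __name__ == "__main__":
    drawer = OrderDrawer()
    item = drawer.add_combo(1)      # tap "number one"
    print(item.missing_fields())    # what the UI would highlight as empty
    item.size = "medium"            # inline edit: size
    item.drink = "Coke Zero"        # inline edit: drink
    print(drawer.can_check_out())
```

The point of the design is that the whole order is three taps on one screen, and customization is opt-in rather than a question you are forced to answer.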
Most repeat customers use the app, which sports the digital equivalent of a loyalty program, and various coupons. And lets you save your 'usual' order with customizations etc. Plus the annoying push notifications for FreeFrydays or whatever. And upsells, new product launches, etc.
My recollection is that the kiosk is just a weak facsimile of the app. And wasn't terrible, but everyone's standards vary.
Which is why I will never reinstall their damned app.
There's much more to being human than our "cognitive abilities"
Not obvious and in fact I think the opposite is way more likely. Chess is well-defined and self-contained in a way that managing a restaurant with fleshy customers never will be.
Also, there will be hundreds of disparate tasks that are happening in parallel, and even humans still make up frameworks to discover most urgent/important work that needs to be done first.
All AI proofs so far, including this one, are using existing tools in new ways, rather than inventing new tools. This is not surprising if you know how these models are trained. These existing tools are in distribution. New tools are not.
Problems worthy of a Fields Medal likely require new tools to be invented. Thus it is not clear whether progress within the confines of the current paradigm is enough.
We could get this weird spiky situation where the AI is insanely superhuman at all problem solving, but completely incapable of coming up with a single new tool. It discovers everything there is to discover, subject to existing axioms and concepts.
Timothy Gowers gives some commentary on this in the attached PDF.
We have had that chess board for quite a while now, over 40 years. And no, there is nothing special about Lean here; it is just herd mentality. Also, we don't know how much training with Lean helped this particular model.
https://www.anthropic.com/research/project-vend-1 https://www.wsj.com/tech/ai/anthropic-claude-ai-vending-mach...
(Two different examples of a similar idea)
[1] https://andonlabs.com/evals/vending-bench-2
https://en.wikipedia.org/wiki/Qualified_immunity
Assuming you can still sue McDonalds, I am not sure this is a problem in the robotic LLM case. I'm also trying to imagine a case where you would want to sue the LLM and not the company. Given that robots/LLMs don't have free will, I'm not sure the problem of qualified immunity making police unaccountable applies.
There already exist a lot of similar conventions in corporate law. Generally, a main advantage of incorporation is protecting the people making the decisions from personal lawsuits.
That only requires that someone own the AI-managed McDonald's, though. So long as they can't avoid responsibility by pointing to the AI, I don't see why you couldn't sue them.
Police are a monopoly; nobody has a choice about which police company to use. McDonalds are not a monopoly, and many customers would prefer to eat at competitors run by entities that could be sued or jailed if they did anything particularly egregious.
The same intuition applies if you walk into McDonald's and a person there mistreats you. You want that person held responsible.
But the LLM is not a person. What is there to even sue? It just seems like it would simply pass through to the corporate entity without the same tension of feeling like we let a human get away with something. Because there is no human, just a corporation and the robot servicing the place.
Put another way - if the LLM is not a person, what is the advantage of a personal lawsuit?
Just sue the McDonalds. Even in a case where the LLM is extremely misaligned and acts in a way where you might normally personally sue the McDonald's employee, I'm just not sure the human intuition about "holding someone accountable" would have its normal force because again - the LLM is not a person.
So given we already have the notions of incorporation and indemnification it doesn't make sense to say what is precluding LLMs from running McDonald's is they can't be sued. If McDonald's can still be sued, then not only is there no problem, there is very likely not even a change in the status quo.
and LLM's are getting better at providing less of it
perhaps in the future the GPU-poor can go to McDonalds and get AI to solve their riddles by ordering an extra napkin with the solution written on.
The purpose of qualified immunity is for when an officer does something that turns out to be illegal but they were both told to by their superiors and did not think it was in violation at the time.
An officer making a choice to violate your rights would not be eligible for qualified immunity.
Excellent standards for people authorized by the state to run around with a badge and a gun in a free society. Your comment history on this is so unimpressive. Would you countenance the same excuses in anyone else? A man puts on his police uniform and suddenly you think he should be immune from civil prosecution because "my boss told me so" and "I didn't know"?
I wonder if you will make similar excuses for robo cop. Or if your principles merely extend to whatever human you can find in uniform willing to tolerate your friendship.
Heuristically weighted directed graphs? Wow amazing I'm sure nobody has done that before.
Math is a sequence of formal rules applied to construct a proof tree. Therefore an AI trained on these rules could be far more efficient, and search far deeper into proof space
This future still sucks. The tech industry is making the world a worse place.
I agree with one of the mathematician's responses in the linked PDF that this is somewhat less interesting than proving the actual conjecture was true.
In my eyes proving the conjecture true requires a bit more theory crafting. You have to explain why the conjecture is correct by grounding it in a larger theory while with the counterexample the model has to just perform a more advanced form of search to find the correct construction.
Obviously this search is impressive, not naive, and requires many steps along the way to prove connections to the counterexample, but instead of developing new deep mathematics the model is still just connecting existing ideas.
Not to discount this monumental achievement. I think we're really getting somewhere! To me, and this is just vibes based, I think the models aren't far from being able to theory craft in such a way that they could prove more complicated conjectures that require developing new mathematics. I think that's just a matter of having them able to work on longer and longer time horizons.
For example, to prove something is impossible, let's say you first prove that there are only 5 families, and 4 of them are impossible. So now 80% of the problem is solved! :) If you are looking for counterexamples, the search is reduced by 80% too. In both cases it may be useful.
In counterexamples you can make guesses and leaps, and if it works it's fine. This is not possible for a proof.
On the other hand, once you have found a counterexample it's usual to hide the dead ends you discarded.
For proving a proposition P I have to show for all x P(x), but for contradiction I only have to show that there exists an x such that not P(x).
I agree there could be a lot of theory crafting to reduce the search space of possible x's to find not P(x), but with for all x P(x) you have to be able to produce a larger framework that explains why no counterexample exists.
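The asymmetry can be stated formally. A minimal Lean 4 sketch (the theorem name is made up for illustration): a single witness x with not P(x) is already enough to refute the universal claim, whereas proving for all x P(x) has no comparable one-witness shortcut.

```lean
-- One counterexample `x` with `¬ P x` refutes `∀ y, P y`.
theorem counterexample_refutes {α : Type} (P : α → Prop)
    (x : α) (hx : ¬ P x) : ¬ ∀ y, P y :=
  fun hall => hx (hall x)
```

The proof term is one line: assume the universal statement, specialize it to x, and contradict hx. All the hard work in practice is in finding x and showing not P(x) for it.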
Reductio ad absurdum is a technique to prove something.
No this will never do the kind of math that humans did when coming up with complex numbers, or hell just regular numbers ex nihilo. No matter how long it's given to combine things in its training data.
Assuming humans are more powerful than regular languages I could maybe agree that these methods may not eventually yield entirely human like intelligence, but just better and better approximations.
The vibe I get though is that we aren't more powerful than regular languages, cause human beings feel computationally bounded. So I could see given enough "human signal" these things could learn to imitate us precisely.
Do you pass that bar yourself?
To be very specific - what novel things did the majority of the ~8 bil humans on Earth do say, yesterday, that you wouldn't otherwise dismiss as non-intelligent rehashing of the same tired patterns they always inhabit were those same actions attributed to LLMs?
What I'm getting at is that I think you're falling into the trap of thinking of the rare geniuses of human history, and furthermore their rare moments of accomplishment (relative to the long span of their lifetimes filled mostly without these accomplishments) when you think of "human intelligence", which is of course far overstating what actual human intelligence is.
> that you wouldn't otherwise dismiss as non-intelligent rehashing of the same tired patterns they always inhabit were those same actions attributed to LLMs?
Regardless of whether something's been done before, people still come up with things on their own without directly copying or amalgamating several copies. Pretty much every skilled profession includes figuring things out on the fly through the use of general reasoning that doesn't involve pattern matching against millions of examples.
Much, if not the majority of synthetic data is AI generated. Human experts then evaluate samples of the data, but nothing like the entire corpus which can be trillions of tokens of generated material.
See here where Qwen team discusses synthesizing trillions of tokens for their pre training dataset - https://arxiv.org/html/2505.09388v1
> The rare geniuses of human history use a different magnitude and configuration of the same kind of human intelligence
I agree. What I don’t see any strong evidence for is that this intelligence is unique to humans. Nor do I see how it could ever be anything other than recombinations of existing data with random mutation. Where else would the building blocks for each invention come from, divine insight? We build on the shoulders of giants etc etc
Worth noting, as a sidebar, that we’re having this discussion on a post about a novel breakthrough made by AI on a problem that many brilliant human mathematicians, including Erdős himself, failed to solve.
> Regardless of whether something's been done before people still come up with them on their own without directly copying or amalgamating several copies.
I’m not even saying it in the “there’s nothing new under the sun” sense.
If you follow an average person’s day from beginning to end, let’s say in Bangkok or NYC or Paris, at which part of the day are they not simply repeating a variation of something they’ve done many times before, or seen others around them do before, or read about others doing before, or heard about others doing before, or watched others do before on TV, etc.?
What you have left, how is it distinguishable, without reasoning backwards from the desired conclusion of human exceptionalism, from turning up the temperature on an LLM query?
How many data points does a human parse when they attempt to stand up as a toddler? Sight, sound, sensation from every limb and body part, inner ear, internal thought processes at the time conscious and unconscious related to the moment and attempting to interpret it in relation to all that it’s experienced to this point, including all prior attempts and whatever retained associated data, a hard to even comprehend stream of data, coming in continuously over however many minutes, hours, etc of attempts.
The stream of data the brain is processing from both external and internal sources from birth is incredibly rich, and if we attempted to represent the full depth of it it would far outweigh the size of any corpus models are being trained on now.
I think what may be genuinely missing from AI is the type of data that doesn’t translate completely into text. The audio and images/video we feed in are a totally incomplete slice of the POV of say even a single average human through their lifetime, and bereft of all the associated data a human has access to in the moment (sensory etc).
I think this tends more towards the world models that Yann LeCun et al. are promoting as the key to more capable AI.
LLMs approximate a lot of that very well by simply having seen it before.
Also watch kids develop language: they learn patterns with much less training data than LLMs.
> novel to us every single day. Like navigating a shopping cart through tricky corridors in a store
We have been practicing navigating the physical world for something like 16hrs/day every day from the moment of our birth. All the sensory data passing through our brains during that time is far larger than any dataset an LLM is trained on.
Humans navigating a shopping cart at a store have likely navigated the physical world before, pushed a shopping cart before, and in combination have navigated stores while pushing shopping carts before. Nevertheless, many still bump into objects all along the way.
Them succeeding at successive variations of store layouts is not novel unless we expand the definition of novel to mean any recombination whatsoever of pre existing concepts.
I’m certain that with all the intense usage of AI by hundreds of millions of people, there have been countless collections of words passed to LLMs so far that have never before been uttered in exactly such a sequence, let alone in the dataset.
I’m equally certain the LLMs have responded to those words with collections of its own that have also never been uttered in that exact sequence, responding to their unique context.
It is trivial to produce an example of this now yourself if you’d like.
The LLM we’re talking about, mentioned in the OP, has never seen this solution to this problem in its dataset. A large number of brilliant mathematicians were not able to discover this solution. They are themselves expressing that this is a novel breakthrough and had this come from a human it would be treated as such.
If the response to that is “well it’s just recombining concepts it already knows until it finds a solution that works” I would ask how that differs from what humans do?
This is the bit that's missing that LLMs do approximate amazingly well through sheer training set size, but in my opinion, it puts a cap on what novel things they can achieve in comparison with humans.
To me, I've thought about a related "invention space" before: with us creating software to solve many problems people are facing, why are there not any perfect solutions for any problem (running a cafe? a CNC machine? ...), and we always need more software built to cover one small (novel?) change for a particular owner?
The world space is just so large that you need whatever this intelligence is humans (and animals) have to navigate it successfully — but LLMs do not intrinsically.
Whether they can be so large that it does not matter in 99.99% of cases is to be seen.
I very specifically addressed this in my response to you. How much training data is contained in 16 waking hours of navigating the world fusing all sensory data, never mind data being simultaneously generated within the mind while this is all going on, from birth til death? From birth til pushing that shopping cart?
Far, far more than in all the training datasets being used for AI.
I also addressed this again in my reply to the sibling comment.
People tend to discount how much data humans have passing through their minds 24/7.
A human isn’t born in a vacuum as a fully formed adult and dropped into the shopping cart navigation problem.
A human has had far, far more training data fed into it that contains all the pieces necessary to translate to pushing a shopping cart when first seeing it, than a machine learning model which has been fed 1 million videos of a robot pushing a shopping cart.
It's like just commenting "I disagree": it's totally pointless for discussion.
That's why you're getting downvoted if you're wondering.
I appreciate very much the work done so far, but this sort of asymptotic/quantitative result didn't interest me much even when it was done by humans.
(This is not snobbery, just a personal preference.)
As a matter of fact, the more logic and structure there is to your work, the easier it is for AI to conquer it. Because of this, programming was the first thing that got solved, but pure sciences are next.
If what you do, and how you do it, can be written down on a piece of paper, then AI can do it.
I do believe programming getting solved will be a double assault on these fields.
>>This is not snobbery
This is good for the species. What sense does it make to keep treating these fields like they are reserved for the most intelligent micro-percentage of humans? Getting LLMs to do these things gives some scale to these subjects, and that's good.
So is AGI, but we may be hundreds of years off still.
Sometimes going some distance with a subject generates data for new ideas.
Once math gets done fast, newer ideas and paradigms also arrive.
> My only complaint is the claims always start spreading 6-12 months before the delivery.
If delivering on such promises "always" occurs 6-12 months after the promise, is that pretty good?
I generally like AI and use it plenty often, it does many things well and I'm curious to see how far it keeps going, but that doesn't mean I have to like overhyped marketing about it.
It's pretty much a 1:1 match to the "we're all unique snowflakes" meme, with an army of Buzz Lightyear toys repeating the same in the background.
Given its elementary nature (very easy to state), you can bet that a lot of very bright people have worked on it (I know of one MIT graduate who specialized in geometry and had a lot of interest in it).
Moreover, model output is incredibly good at looking credible but being wrong. It has NEVER produced something correct for me in a field in which I am an expert without some external oracle to validate claims (e.g., Lean).
The world runs on trust, specifically trusting expert advice. It'd seem that due to resource constraints and scale, that's the best available option. By extension, there should be absolutely nothing weird or surprising about people following suit. It's why these companies themselves rely on expert counsel, and defer to their appraisals for marketing. The opposite is what's weird and unusual, and what requires more substantiation.
It's interesting that those who come out swinging against "trusting the experts", or really, trusting anyone else but them, not only ~never acknowledge this, but are seemingly outright proud of it, considering it as their own unique little trait, egocentrically revelling in it. It's almost as if epistemic rigor and truthfulness was not their actual concern.
Woohoo, I'm distrustful and cynical. Behold my unfathomable wisdom! Bonus points if they're also hurtful, because flipping the arrow on "hard truths -> hurt feelings" is a masterclass in reasoning too, of course.
I can appreciate faulting experts and organizations for misusing people's trust and looking out for this angle, but given how unavoidable and fundamentally useful trusting itself is, blaming people for defaulting to trusting makes no sense to me whatsoever. It comes across as just the usual trope of blaming the individual.
Note that I'm not disputing the validity of the counterexample itself.
There is serious magic happening in the construction of model context.
It still writes like a junior dev, in that despite AI being able to get a picture of an entire repo, its changes are typically confined to the task it's working on and it will opt to duplicate logic to keep changes contained. Again, technically works, not ideal.
BUT I have had great success using AGENTS.md and becoming better at prompting to get it to not be like this.
Basic approach in AGENTS.md: don't code defensively, yada yada, we have a validation layer at X, no need to check for anything behind that layer. Works well.
An approach I've found helpful when prompting: What would be the best architecture for this change? If you say "do X" it'll tend to just do the hackiest, shortest path thing. If you say, "what's the best way to do X?" it will think more holistically.
That said, who knows, maybe when it's PHP it just really wants to hack ;-)
(Also, yes, you still need to review the code -- it will still do stupid things, so you can't just be purely hands-off w/o ending up with quality degradations. The same is true of humans too though in my experience...)
> The python visualizer tool has been basically written by vibe-coding. I know more about analog filters -- and that's not saying much -- than I do about python. It started out as my typical "google and do the monkey-see-monkey-do" kind of programming, but then I cut out the middle-man -- me -- and just used Google Antigravity to do the audio sample visualizer.
Since you’re not in a unique position, I can confidently state that your comparison of LLMs to jr developers seems unfounded. Today, LLMs produce code that is superior to junior developer code by an order of magnitude.
Notably, they demonstrate consistent syntax, clear separation of concerns, strong test coverage, organizational rigor, idiomatic API usage, and the ability to generate and maintain documentation, among other measurable qualities.
LLMs generally operate at a staff engineer level for a number of languages and ecosystems (including polyglot projects).
We have many folks (not engineers) at our company using LLMs to open PRs, and every one of these PRs has profound architectural design problems.
Comparing an LLM to a senior developer is an absolute joke.
2. Are you referring to without having a compiler or LSP check it? Although even then, the recent LLMs I've used still frequently get syntax right, whereas I'd expect juniors are often using an LSP or compiler to catch mistakes while writing code?
Is there anywhere an image example of a superior layout for example with n>={100,1000,10000}..? I would love to see it. I am imagining it would look somewhat like a sloppy pizza.
Mind showing your working out?
I think the more interesting question is how many tokens were spent all told; the most interesting graph in the article imo is the success rate by log test-time compute: how many tokens are being spent on the right of the graph to hit a winning CoT/solution like this >50% of the time?
Without knowing all this model has been trained on though, it is pretty hard to ascertain the extent to which it arrived at this "on its own". The entire AI industry has been (not so secretly) paying a lot of experts in many fields to generate large amounts of novel training data. Novel training data that isn't found anywhere else--they hoard it--and which could actually contain original ideas.
It isn't likely that someone solved this and then just put it in the training data, although I honestly wouldn't put that past OpenAI. More interesting though is the extent to which they've generated training data that may have touched on most or all of the "original" tenets found in this proof.
We can't know, of course. But until these things are built in a non-clandestine manner, this question will always remain.
That is not true and a complete misrepresentation of recent progress of AI in math. It is therefore not necessary to believe the conspiracy theory you described in order to explain recent progress of AI in math.
Congrats to the OpenAI team for one of the most significant breakthrough discoveries in AI history.
Really? Any references to read more?
edit: >> https://techcrunch.com/2025/10/19/openais-embarrassing-math/
The ability to find incredibly obscure facts and recall them to solve "officially unsolved" problems in minutes is like Google Search on steroids. In some sense, it is one core component of "deep expertise", and humans rely on the same methodology regularly to solve "hard" problems. Many mathematicians have said that they all just use a "bag of tricks" they've picked up and apply them to problems to see if they work. The LLMs have a huge bag of very obscure tricks, and are starting to reach the point that they can effectively apply them also.
I suspect the threshold of AGI will be crossed when the AIs can invent novel "tricks" on their own, and memorise their own new approach for future use without explicitly having to have their weights updated with "offline" training runs.
Are you asking me how LLMs work?
The theory proposed by the original commenter was that there could have been some secret training data the model was trained on that made it possible to solve this problem set. So the only conclusion is they are implying it's a conspiracy by OpenAI to hide some novel math research they funded merely to do marketing about solving math problems (then convincing multiple math experts to verify and support it with papers). That is the definition of a conspiracy.
In all seriousness though: My suggestion is that those shepherding the frontier of AI start acting with more transparency, and stop acting in ways that encourage conspiratorial thinking. Especially if the technology is as powerful as they market it as.
Solving problems people have already stated is a niche activity in mathematical research. More often, people study something they find interesting, try to frame it in a way that can be solved with the tools they have, and then try to come up with a solution. And in the ideal case, both the framing and the solution will be interesting on their own.
Note that this is not really true of this problem in particular.
1. They have a wide range of difficulties. 2. They were curated (Erdos didn't know at first glance how to solve them). 3. Humans already took the time to organize, formally state, add metadata to them. 4. There's a lot of them.
If you go around looking for a mathematics benchmark it's hard to do better than that.
Ayer, and in a different way early Wittgenstein, held that mathematical truths don’t report new facts about the world. Proofs unfold what is already implicit in axioms, definitions, symbols, and rules.
I think that idea is deeply fascinating, AND have no problem that we still credit mathematicians with discoveries.
So either “recombining existing material” isn’t disqualifying, or a lot of Fields Medals need to be returned.
I'd say yes, LLMs "just" recombine things. I still don't think if you trained an LLM with every pre-Newton/Leibniz algebra/geometry/trig text available, it could create calculus. (I'm open to being proven wrong.) But stuff like this is exactly the type of innovation LLMs are great at, and that doesn't discount the need for humans to also be good at "recombinant" innovation. We still seem to be able to do a lot that they cannot in terms of synthesizing new ideas.
Also, I'm not sure why CS people act like axioms are where you start. Finding them is very very difficult. It can take some real innovation because you're trying to get rid of things, not build on top of. True for a lot of science too. You don't just build up. You tear down. You translate. You go sideways. You zoom in. You zoom out. There are so many tools at your disposal. There's so much math that has no algorithmic process to it. If you think it all is, your image is too ideal (pun(s) intended).
But at the same time I get it, it is a level of math (and science) people never even come into contact with. People think they're good at math because they can do calculus. You're leagues ahead of most others around you, yes, and be proud of that. But don't let that distance deceive you into believing you're anywhere near the experts. That's true for much more than just math, but it's easy to demonstrate to people that they don't understand math. Granted, most people don't want to learn, which is perfectly okay too.
Yes but that is because there was not enough text available to create an intelligent LLM to begin with.
We even think that the Babylonian astronomers figured out they could integrate over velocity to predict the position of Jupiter.
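For anyone wondering what "integrate over velocity" amounts to here: as I understand it, the idea boils down to taking the area under a velocity-versus-time graph. A minimal sketch, assuming the velocity changes linearly from v_0 to v_1 over a time span T (the symbols are my own notation, not anything taken from the tablets):

```latex
\Delta x = \int_0^T v(t)\,dt,
\qquad v(t) = v_0 + \frac{v_1 - v_0}{T}\,t
\quad\Longrightarrow\quad
\Delta x = \frac{v_0 + v_1}{2}\,T
```

The right-hand side is just the area of a trapezoid with parallel sides v_0 and v_1 and width T, so no limit machinery is needed for this special case.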
https://en.wikipedia.org/wiki/Adequality
Also we shouldn’t be thinking about what LLMs are good at, but rather what any computer ever might be good at. LLMs are already only one (essential!) part of the system that produced this result, and we’ve only had them for 3 years.
Also also this is a tiny nitpick but: the fields medal is every 4 years, AFAIR. For that exact reason, probably!
It's amazing to me when people talk about recombining things, or following up on things, as somehow lesser work.
People can't set aside the perspective they were given when they learned the concepts, a perspective that those who developed the concepts didn't have, because the concepts didn't exist yet.
Simple things are hard, or everything simple would have been done hundreds of years ago, and that is certainly not the case. Seeing something others have not noticed is very hard, when we don't have the concepts that the "invisible" things right in front of us will teach us.
It isn't a secret, but the percentage of people who don't know that, plus the percentage of mathematicians who vaguely or more directly know that, but habitually use the broken, more difficult (i.e. less algebraic) notation is ... virtually everyone.
I am not trying to pick on calculus, this is everywhere. Important and useful concepts are right in front of all of us, that we don't see even in the context of what we are relatively fluent with.
Because we learn quickly, where we have (almost always inherited) the right preparatory perspectives (earned over lifetimes by others), we vastly overrate our ability to reason independently.
https://openai.com/index/better-language-models/
The point of the term "large" is to highlight the massive parameter count (compared to traditional statistical models, where having 1.5 billion parameters was basically unheard of). It leads to the "double descent" phenomenon that allows them to generalize in ways traditional statistical models can't.
The idea that the "large" descriptor was just a subjective exclamation, like "oh wow this model is pretty large ain't it", is revisionism.
I would guess LLMs are limited in their ability to be genuinely novel because they are trained on a fixed language. It makes research into the internal languages developed by LLMs during training all the more interesting.
That Newton and Leibniz came up with similar ideas in parallel, independently, around the same time (what are the odds?), supports that.
https://en.wikipedia.org/wiki/Leibniz%E2%80%93Newton_calculu...
The experiment is feasible. If it were performed and produced a positive result, what would it imply/change about how you see LLMs?
There are people working on this.
e.g. https://github.com/haykgrigo3/TimeCapsuleLLM
Besides, we can forecast our thoughts and actions to imagined scenarios unconditioned on their possibility. Something doesn't have to be possible for us to imagine our reactions.
Imagine every bit of human knowledge as a discrete point within some large high-dimensional space of knowledge. You can draw a big convex hull around every single point of human knowledge in that space. An LLM, being trained within this convex hull, can interpolate between any set of existing discrete points in this hull to arrive at a point which is new, but still inside of the hull. Then there are points completely outside of the hull; whether or not LLMs can reach these is IMO up for debate.
Reaching new points inside of the hull is still really useful! Many new discoveries and proofs are these new points inside of the hull; arguably _most_ useful new discoveries and proofs are these. They're things that we may not have found before, but you can arrive at by using what we already have as starting points. Many math proofs and Nobel Prize winning discoveries are these types of points. Many haven't been found yet simply because nobody has put the time or effort towards finding them; LLMs can potentially speed this up a lot.
Then there are the points completely outside of the hull, which cannot be reached by extrapolation/interpolation from existing points and require genuine novel leaps. I think some candidate examples for these types of points are like, making the leap from Newtonian physics to general relativity. Demis Hassabis had a whole point about training an AI with a physics knowledge cutoff date before 1915, then showing it the orbit of Mercury and seeing if it can independently arrive at general relativity as an evaluation of whether or not something is AGI. I have my doubts that existing LLMs can make this type of leap. It’s also true that most _humans_ can’t make these leaps either; we call Einstein a genius because he alone made the leap to general relativity. But at least while most humans can’t make this type of leap, we have existence proofs that every once in a while one can; this remains to be seen with AI.
It's possible LLMs can handle this after all! But at least so far we only have existence proofs of humans doing this, not LLMs yet, and I don't think it's easy to be certain how far away LLMs are from doing this. I should distinguish between LLMs and AI more generally here; I'm skeptical LLMs can do this, I think some other kind of more complete AI almost certainly can.
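To make the hull picture concrete, here is a minimal 2D sketch in Python. Everything in it is a toy assumption: real "knowledge" is not a set of 2D points, and the names `convex_hull`, `in_hull` and `known` are made up for illustration. It only shows what "inside the hull" versus "outside the hull" means geometrically.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each chain (it repeats the other chain's start).
    return lower[:-1] + upper[:-1]


def in_hull(hull: List[Point], q: Point) -> bool:
    """True if q is inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    if n < 3:
        return False  # degenerate hulls are ignored in this sketch
    return all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))


# Toy "existing knowledge": four corner points plus one interior point.
known = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]
hull = convex_hull(known)

print(in_hull(hull, (1, 2)))  # a new point that is still inside the hull
print(in_hull(hull, (5, 5)))  # a point that lies outside the hull
```

Note that in the literal geometric sense this test becomes much harder to pass as the number of dimensions grows, which is part of why the metaphor should not be taken too literally.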
I suppose you could just, I dunno, randomly combine words into every conceivable sentence possible and treat each new sentence as a theory to somehow test and brute force your way through the infinite possible theories you could come up with. But at that point you're closer to the whole infinite random monkeys producing Shakespeare thing than you are to any useful conclusion about intelligence.
Like, “take a random sequence of bits and interpret it as Unicode” is at one end of a scale, and “take a random sequence of words in a language” is just a tad away from it, and the scale continues in that direction for quite a while.
I actually don't know the answer to that; my understanding is that LLMs by nature of what they are can't understand concepts that are independent of the existing language they are trained on, but I don't have enough in-depth nitty-gritty knowledge of like, core LLM implementation details and architecture and stuff to know if that understanding is correct or not.
This doesn't make any sense, by their nature they can't "guess-and-check" things outside their training set.
And most of the mathematicians seem to welcome this "brute forcing" by the LLMs. It connects pieces that people didn't realize could be connected. That opens up a lot of avenues for further exploration.
Now, if the LLMs could just do something like ingesting the Mochizuki stuff and give us a decent confirmation or disproof ...
If you have a multi dimensional space, and you are trying to compute which points lie “inside” some boundary, there are large areas that will be bounded by some dimensions but not others. This is interesting because it means if you have a section bounded by dimensions A, B, and C but not D, you could still place a point in D, and doing so then changes your overall bounds.
I think this is how much of human knowledge has progressed (maybe all non-observational knowledge). We make observations that create points, and then we derive points within the created space, and that changes the derivable space, and we derive more points.
I don’t see why AI couldn't do the same (other than technical limitations related to learning and memory).
https://g.co/gemini/share/065ffa89698e
Most discoveries are indeed implied from axioms, but every now and then, new mathematics is (for lack of a better word) "created"—and you have people like Descartes, Newton, Leibniz, Gauss, Euler, Ramanujan, Galois, etc. that treat math more like an art than a science.
For example, many believe that to solve the Riemann Hypothesis, we likely need some new kind of math. Imo, it's unlikely that an LLM will somehow invent it.
A scientist has to extract the "Creation" from an abstract dimension using the tools of "human knowledge". The creativity is often selecting the best set of tools or recombining tools to access the platonic space. For instance a "telescope" is not a new creation, it is recombination of something which already existed: lenses.
How can we truly create something ? Everything is built upon something.
You could argue that even "numbers" are a creation, but are they ? Aren't they just a tool to access an abstract concept of counting ? ... Symbols.. abstractions.
Another angle to look at it: even in dreams, do we really create something new? Or do we dream about "things" (i.e. data) we have ingested in our waking life? Someone could argue that dreams truly create something, as the exact set of events never happened anywhere in the real world... but we all know that dreams are derived: derived from brain chemistry, experiences and so on. We may not have the reduction of how each and every thing works.
Just like energy is conserved, IMO everything we call as "created" is just a changed form of "something". I fully believe LLMs (and humans) both can create tools to change the forms. Nothing new is being "created", just convenient tools which abstract upon some nature of reality.
Humans and animals have intuitive notions of space and motion since they can obviously move. But, symbolizing such intuitions into forms and communicating that via language is the creative act. Birds can fly, but can they symbolize that intuitive intelligence to create a theory of flight and then use that to build a plane ?
It was a new concept, combining lenses to look at things far away as if they are close to. The literal atoms/molecules weren't new, but the form they were arranged in was. The purpose of the arrangement was new too.
Well I think the point is there is no "new kind of math". There's just types of math we've discovered and what we haven't. No new math is created, just found.
We're not comparing math to reality (though there's a strong argument to be made that reality has a structure that is mathematical in nature - structural realism didn't die as a scientific philosophy just because someone came up with a pithy saying), we're talking about whether math is discovered or invented.
Most mathematicians would argue both - math is a language, we have created operations, axioms are proposed based on human creativity, etc., but the actual laws, patterns, etc. are discovered. Pi is going to be pi no matter if you're a human or someone else - we might represent it differently with some other number system or whatever, but that's a matter of representation, not mathematical truth.
Math is a mental map which coincides with reality in useful ways. Different maps can also be useful. The models we construct are based on arbitrary axioms which we hold to be true. Different axioms could lead to different theories which are just as useful. So it isn't discovered (i.e. mapping directly to reality and waiting to be discovered), it is created.
To pick one example, adding the concept of zero changed our model/map of reality fundamentally without changing reality.
It seems that addition (for instance) was "created" long before us.
On the other hand, it seems highly unlikely that a civilization similar to ours could "invent" an essentially different kind of mathematics (or physics, etc.)
I know of no realm where mathematical objects live except human minds.
No, it seems clear to me that mathematics is a creation of our minds.
"Where" mathematics exists is in the abstract combinatorical space of an infinite repeating application of logical rules. This space doesn't exist in a substantive sense, but it is accessible/navigable by studying the consequences of logical rules. It is the space of possible structure.
I think we create mathematics as thought structure in our mind. We can agree on things when we create the same structures. But this structure did not exist prior to creation.
This is also true for established theorems! We can imagine mathematical universes (toposes) where every (total) function on the reals is continuous! Even though it is an established theorem that there are discontinuous functions! We just need to replace a few axioms (chuck out the law of the excluded middle, and throw in some continuity axioms).
Do you know if this topos, in which every total function on the real numbers is continuous, has been constructed and shown to rest on a viable set of axioms? If so, I am curious about the source.
My go to example still remains the one of hyperbolic geometry and axiom of parallel lines, so the more approachable examples I can get, the better.
There is also this blog post by Andrej Bauer, which can be seen as exploring what it is like to be in such a topos: https://math.andrej.com/2006/03/27/sometimes-all-functions-a...
However, if that idea about new math is correct, we, in theory, don’t need new math to (dis)prove the Riemann hypothesis (assuming it is provable or disprovable in the current system).
In practice we may still need new math because a proof of the Riemann hypothesis using our current arsenal of mathematical ‘objects’ may be enormously large, making it hard to find.
I honestly don't know personally either way. Based on my limited understanding of how LLMs work, I don't see them making the next great song or next great book, and based on that reasoning I'm betting that they probably won't be able to do whatever the next "Descartes, Newton, Leibniz, Gauss, Euler, Ramanujan, Galois" are going to do.
Of course, if AI as a wider field comes up with something more powerful than LLMs, that would be different.
Meanwhile, songs are hitting number one on some charts on Spotify that people think are humans and are actually AI. And Spotify has to start labelling them as such. One AI "band" had an entire album of hits.
Also - music is subjective. Mathematics isn't.
And in this case, an LLM discovered a new way to reason about a conjecture. I don't know how much proof is needed - since that is literally proof that it can be done.
There are quite a few questions around that. Music is subjective and obviously different people have different tastes, but I wouldn't call any of them actual good music / real hits.
>> LLM discovered a new way to reason about a conjecture
I wasn't questioning LLMs' ability to prove things. Parent threads were talking about building a new kind of maths, or approaching it in a creative/artistic way. That's what I was referring to.
I can't speak for maths or hard science as I'm not trained in that, but the creativity aspect in code is definitely lacking when it comes to LLMs. May not matter down the line.
because I have no basis for assuming an LLM is fundamentally capable of doing this.
"Never shall I be beaten by a machine!”
In 1997 he lost to Deep Blue.
Not a good argument for turning everything over to the Deep Blues. What's Deep Blue done for me lately?
Train an LLM only on texts dated prior to Newton and see if it can create calculus, derive the equations of motion, etc.
If you ask it about the nature of light and it directs you to do experiments with a prism I'd say we're really getting somewhere.
[1] Obviously Newton counts as one. Leibniz like Newton figured out calculus. Other people did important work in dynamics though no one else's was as impressive as Newton's. But the vast majority of human-level intelligences trained on texts prior to Newton did not create calculus or derive the equations of motion or come close to doing either of those things.
Why are they not coming up with paradigm shifts in knowledge expression/discovery like humans did back then?
Are we just not prompting them right?
Incidentally, similar conversations were had about ML writ large vs. classical statistics/methods, and now they've more or less completely died down since it's clear who won (I'm not saying classical methods are useless, but rather that it's obvious the naysayers were wrong). I anticipate the same trajectory here. The main difference is that because of the nature of the domain, everyone has an opinion on LLMs, while the ML vs. statistics battle was mostly confined within technical/academic spaces.
What example is there where an LLM has extrapolated? All I've seen is a data set so large and an extra decomposition process making it so interpolation feels like extrapolation if you don't look close enough.
> but a theory of why further advancements can't solve the deficiencies
How about LeCun's?
But if you actually try to take a convex hull of, some encoding of sentences as vectors? It isn’t true. The outputs are not in the convex hull of the training data.
I guess it’s supposed to be a metaphor and not literal, but in that case it’s confusing. Especially seeing as there are contexts in machine learning where literal interpolation vs literal extrapolation, is relevant. So, please, find a better way to say it than saying that “it can only interpolate”?
In the end, creativity has always been a combination of chance and the application of known patterns in new contexts.
If you know anything about the invention of new math (analytic geometry, Calculus, etc.), you'd know how untrue this is. In fact, Calculus was extremely hand-wavy and without rigorous underpinnings until the mid 1800s. Again: more art than science.
If anything, they were fighting an uphill battle against the perception of hand-waving by their contemporaries.
That idea wasn’t formally defined until 134 years later with epsilon-delta by Cauchy. Only then was it accepted. (I know that there were earlier proofs.)
There are even arguments that the limit existed before Newton and Leibniz, with Archimedes' limits on the value of pi.
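For reference, the epsilon-delta definition being alluded to, in its modern textbook form (the notation is today's, not necessarily how it was first written down):

```latex
\lim_{x \to a} f(x) = L
\quad\Longleftrightarrow\quad
\forall \varepsilon > 0\;\; \exists \delta > 0\;\; \forall x:\;
0 < |x - a| < \delta \;\Longrightarrow\; |f(x) - L| < \varepsilon
```

The point is that "gets arbitrarily close" is replaced by a statement that only involves inequalities between ordinary numbers, with no appeal to infinitely small quantities.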
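The Archimedes argument can be sketched numerically: squeeze pi between the half-perimeters of inscribed and circumscribed regular polygons of a unit circle, doubling the number of sides each round. This is a modern floating-point restatement rather than his actual hand computation with rational bounds, and the function name `archimedes_bounds` is made up for illustration.

```python
import math


def archimedes_bounds(doublings: int):
    """Bound pi using regular polygons around a unit circle.

    Starts from a hexagon and doubles the side count `doublings` times.
    Returns (sides, lower, upper), where lower is the half-perimeter of the
    inscribed polygon and upper that of the circumscribed polygon.
    """
    sides = 6
    lower = 3.0                 # inscribed hexagon: 6 * sin(pi/6)
    upper = 2 * math.sqrt(3)    # circumscribed hexagon: 6 * tan(pi/6)
    for _ in range(doublings):
        # Harmonic mean gives the circumscribed 2n-gon,
        # geometric mean then gives the inscribed 2n-gon.
        upper = 2 * upper * lower / (upper + lower)
        lower = math.sqrt(upper * lower)
        sides *= 2
    return sides, lower, upper


sides, lower, upper = archimedes_bounds(4)  # hexagon -> 96-gon
print(sides, lower, upper)
```

With four doublings you reach the 96-gon, and the resulting bounds sit inside the classical 223/71 < pi < 22/7. Each doubling tightens the squeeze, which is the limit idea in everything but name.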
Cauchy’s deep understanding of limits also led to the creation of complex function theory.
These forms of creation are hand-wavy not because they are wrong. They are hand wavy because they leverage a deep level of ‘creative-intuition’ in a subject.
An intuition that a later reader may not have and will want to formalize to deepen their own understanding of the topic often leading to deeper understanding and new maths.
Yes, and it's pretty common knowledge that Calculus was (finally) formalized by Weierstrass in the mid-to-late 19th century, having spent almost two centuries in mathematical limbo. Calculus was intuitive, solved a great class of problems, but its roots were very much (ironically) vibes-based.
This isn't unique to Newton or Leibniz, Euler did all kinds of "illegal" things (like playing with divergent series, treating differentials as actual quantities, etc.) which worked out and solved problems, but were also not formalized until much later.
American and British geeks/nerds are so dazzled by Newton that they fail to realize there was a ton of previous work going back to the Greeks and through the Middle Ages, whose people the British love to depict as brutish and without any culture at all.
The fact is that they weren't dumb at all, and without Euclid and Archimedes there wouldn't be any Calculus.
https://en.wikipedia.org/wiki/Euclid%27s_Elements
https://en.wikipedia.org/wiki/Method_of_exhaustion
Vibe-what? Vibe-bullshit, maybe; cathedrals in Europe and such weren't built by magic. Ditto with sailing and the like. Tons of mathematics and geometry there, and tons of damn axioms before even the US existed.
Heck, even the Book of Games of Alfonso X "The Wise" has both a compendium of game rules and even this https://en.wikipedia.org/wiki/Astronomical_chess where, of course, being competent in geometry was mandatory, at least to design the boards.
On Euclid:
https://en.wikipedia.org/wiki/Euclid%27s_Elements
PS: Geometry lays tons of groundwork for calculus. Guess why.
LLMs are prompted by humans and the right query may make it think/behave in a way to create a novel solution.
Then there's a third factor now: agentic AI system loops with LLMs, where it can research, try, and experiment in its own loop that's tied to the real world for feedback.
Agentic + LLM + Initial Human Prompter by definition can have it experiment outside of its domain of expertise.
So that goes beyond the bare "LLM can't create novel ideas" claim, but I don't think anyone can disagree that the three elements above are enough ingredients for an AI to come up with novel ideas.
That's not creative prompt. That's a driving prompt to get it to start its engine.
You could do that nowadays, and while it may spend $1,000 to $100,000 worth of tokens, it will create something humans haven't done before as long as you set it up with all its tool calls/permissions.
It won't because even though it looks clever to you, people who /do/ understand math and LLMs understand that LLMs /are/ regurgitating
Why does your LLM need you to tell it to look in the first place? Why isn't it just telling us all the answers to unsolved conjectures, known and unknown?
Why isn't the LLM just telling us all the answers to all the problems we are facing?
Why isn't the LLM telling us, step by step with zero error, how to build the machine that can answer the ultimate question?
https://x.com/wtgowers/status/2057175727271800912
> Timothy Gowers @wtgowers
> @wtgowers
> If you are a mathematician, then you may want to make sure you are sitting down before reading further.
If your refutation requires someone to have an account, login, and read something - it's meaningless
It's readable to most; it's annoying having to wade through ex-Twitter, but there are workarounds.
But, I remain sceptical
https://cdn.openai.com/pdf/74c24085-19b0-4534-9c90-465b8e29a...
it includes the longer remarks by Gowers & others.
We just haven't let AI run wild yet. But it's coming.
AGI has been "just over the horizon" for literal decades now - there have been a number of breakthroughs and AI Winters in the past, and there's no real reason to believe that we've suddenly found the magic potion, when clearly we haven't.
AI right now cannot even manage simple /logic/
Who decides the last point at which it’s OK to provide text to the model in order to be able to describe it as creative? (non-rhetorical)
In concrete terms: could a thousand LLM-driven agents running on supercomputers (500 of which are dedicated to building software for the other 500) come up with new math?
Maths follows logical (or even mathematical) rigour, not scientific rigour!
* LLMs do just interpolate their training data, BUT-
* That can still yield useful "discoveries" in certain fields, absent the discovery of new mechanics that exist outside said training data
In the case of mathematics, LLMs are essentially just brute-forcing the glorified calculators they run on with pseudo-random data regurgitated along probabilities; in that regard, mathematics is a perfect field for them to be wielded against in solving problems!
As for organic chemistry, or biology, or any of the numerous fields where brand new discoveries continue happening and where mathematics alone does not guarantee predicted results (again, because we do not know what we do not know), LLMs are far less useful for new discoveries so much as eliminating potential combinations of existing data or surfacing overlooked ones for study. These aren't "new" discoveries so much as data humans missed for one reason or another - quack scientists, buried papers, or just sheer data volume overwhelming a limited populace of expertise.
For further evidence that math alone (and thus LLMs) don't produce guaranteed results for an experiment, go talk to physicists. They've been mathematically proving stuff for decades that they cannot demonstrably and repeatedly prove physically, and it's a real problem for continued advancement of the field.
"interpolate" has a technical meaning - in this meaning, LLMs almost never interpolate. It also has a very vague everyday meaning - in this meaning, LLMs do interpolate, but so do humans.
One can argue, new knowledge is just restructured data.
I think the main concern about LLMs is the inherent "generative" aspect leading to hallucinations as a byproduct, because that's what produces the noise. Joint-embedding approaches are an interesting alternative that try to overcome this, but that's still in the research phase.
Negative numbers were invented to solve equations which only used naturals. Irrationals were invented to solve equations which could be expressed with rationals. Complex numbers were invented to represent solutions to polynomials. So on and so forth. At each point new ideas are invented to answer some otherwise unanswerable questions. There is a long history of this. "Any closed system has unanswerable questions within itself" is a paraphrase of Gödel's incompleteness theorem.
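A small worked illustration of that pattern, with standard textbook examples of my own choosing (assuming the usual amsmath/amssymb notation): each equation is written entirely in the smaller number system, yet has no solution there.

```latex
\begin{align*}
x + 1 &= 0   && \text{no solution in } \mathbb{N}, \quad x = -1 \in \mathbb{Z} \\
x^2 &= 2     && \text{no solution in } \mathbb{Q}, \quad x = \pm\sqrt{2} \in \mathbb{R} \\
x^2 + 1 &= 0 && \text{no solution in } \mathbb{R}, \quad x = \pm i \in \mathbb{C}
\end{align*}
```

In each row the coefficients live in the old system and only the answer forces the extension.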
1. Start with a few simple but non-trivial terms and axioms
2. Define "universal constructions" as procedures for building uniquely identifiable structures on top of that substrate
3. Prove that various assemblages of these universal constructions satisfy the axioms of the substrate itself
4. "Lift" every theorem proven from the substrate alone into the more sophisticated construction
I'm not a mathematician (I just play one at my job) so the language I've used is probably imprecise but close enough.
It may be true that you can't prove the axioms of a system from within the system itself, but that just means that you need to make sure you start from a minimal set of axioms that, in some sense, simply says "this is what it means to exist and to interact with other things that exist". Axioms that merely give you enough to do any kind of mathematics in the first place, that is. If those axioms allow you to cleanly "bootstrap" your way to higher and higher levels up the tower of abstraction by mapping complex things back on to the simple axiomatic things, then you have an "open" or infinitely extensible system.
But note this is more to say that the Tractatus is like PI, not the other way around. And in that, takes like GPs would be considered the "nonsense" we are supposed to "climb over" in the last proposition of Tractatus.
The proof relies on extremely deep algebraic number theory machinery applied to a combinatorial geometry problem.
Two humans expert enough in either of those totally separate domains would have to spend a LONG time teaching each other what they know before they would be able to come together on this solution.
I know these articles write that it used deep algebraic number theory techniques, which is true, but it may also just be the standard in the field.
If you switch to degree-3 or generator-3 then the coverage is, essentially, empty: mathematics has analyzed only a few of the hundreds (thousands? it's hard to enumerate) naturally occurring algebraic structures in that census.
Isn't this exactly what chain-of-thought does? It's doing computation by emitting tokens forward into its context, so it can represent states wider than its residuals and so it can evaluate functions not expressed by one forward pass through the weights. It just happens to look like a person thinking out loud because those were the most useful patterns from the training data.
An LLM generating Arc code is using the LISP patterns it learnt from training, maybe patterns from other programming languages too.
And yet LLM/AIs can't count parentheses reliably.
For example, if you take away the "let" forms from Claude, which forces it to desugar them to "lambda" forms, it will fail very quickly. This is a purely mechanical transformation and should be error free. The significant increase in ambiguity completely stumps LLMs/AI after about 3 variables.
This is why languages like Rust with strong typing and lots of syntax are so LLM friendly; it shackles the LLM which in turn keeps it on target.
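For readers unfamiliar with the "let" to "lambda" desugaring mentioned above, here is a minimal sketch of how mechanical it is. It assumes S-expressions are represented as nested Python lists with symbols as strings; the function names `desugar` and `show` are hypothetical, and the sketch ignores quoted data and variants such as `let*` or named `let`.

```python
def desugar(expr):
    """Rewrite (let ((v e) ...) body ...) into ((lambda (v ...) body ...) e ...)."""
    if not isinstance(expr, list):
        return expr  # atoms (symbols as str, numbers) are left unchanged
    if expr and expr[0] == "let":
        bindings, body = expr[1], expr[2:]
        names = [name for name, _ in bindings]
        # Binding values and body forms may themselves contain let forms.
        values = [desugar(value) for _, value in bindings]
        return [["lambda", names, *[desugar(b) for b in body]], *values]
    return [desugar(e) for e in expr]


def show(expr):
    """Render a nested-list S-expression back into Lisp-style text."""
    if isinstance(expr, list):
        return "(" + " ".join(show(e) for e in expr) + ")"
    return str(expr)


example = ["let", [["x", 1], ["y", 2]], ["+", "x", "y"]]
print(show(example))           # (let ((x 1) (y 2)) (+ x y))
print(show(desugar(example)))  # ((lambda (x y) (+ x y)) 1 2)
```

The desugared form separates each variable name from its value, which is the kind of positional bookkeeping the comment suggests models lose track of as the number of bindings grows.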
It's irrelevant and pointless. Irrelevant not just in the sense that when Deep Blue finally beat Kasparov, it didn't change anything but in the sense some animals and machines have always been 'better' on some dimensions than humans. And it's pointless because there's never been just one yardstick and even if there was it's not one dimensional or even linear. Everyone has their own yardstick and the end points on each change over time.
Don't assume I'm handing "the win" to the AI supremacists either. LLMs can be very useful tools and will continue to dramatically improve but they'll never surpass humans on ALL the dimensions that some humans think are crucial. The supremacists are doomed to eternal frustration because there won't ever be a definitive list of quantifiable metrics, a metaphorical line in the sand, that an AI just has to jump over to finally be universally accepted as superior to humans in all ways that matter. That will never happen because what 'matters' is subjective.
I would claim the graph exists, and seeing it is more of a knowledge problem. Creativity, to me, is the ability to reject existing edges and add nodes to the graph AND mentally test them to some sufficient confidence that a practical attempt will probably work (this is what differentiates it from random guessing).
But, as you become more of an expert on a certain problem space (graph), that happens less frequently, and everything trends towards "obvious", or the "creative jumps" are super slight, with a node obviously already there. If you extended that to the max, an oracle can't be creative.
My day job does not include sparse graphs.
E.g. training on physics knowledge prior to 1915, then attempting to get from classical mechanics to general relativity.
That said. I think it’s worth saying that “LLMs just interpolate their training data” is usually framed as a rhetorical statement motivated by emotion and the speaker’s hostility to LLMs. What they usually mean is some stronger version, which is “LLMs are just stochastically spouting stuff from their training data without having any internal model of concepts or meaning or logic.” I think that idea was already refuted by LLMs getting quite good at mathematics about a year ago (Gold on the IMO), combined with the mechanistic interpretability research that was actually able to point to small sections of the network that model higher concepts, counting, etc. LLMs actually proving and disproving novel mathematical results is just the final nail in the coffin. At this point I’m not even sure how to engage with people who still deny all this. The debate has moved on and it’s not even interesting anymore.
So yes, I agree with you, and I’m even happy to say that what I say and do in life myself is in some broad sense an interpolation of the sum of my experiences and my genetic legacy. What else would it be? Creativity is maybe just fortunate remixing of existing ideas and experiences and skills with a bit of randomness and good luck thrown in (“Great artists steal”, and all that.) But that’s not usually what people mean when they say similar-sounding things about LLMs.
They will do their own thing, don't need us. In fact, we will be in the way...
We can choose to study them and their output, but they don't make us better mathematicians...
You can take some comfort in the fact that it took a human to tell the LLM to even attempt to try this. They do nothing on their own. They have no will to do anything on their own and no desire for anything that doing something might get them. In that sense we won't ever be in their way. We will be the only way they ever do anything at all.
However, in the role of personal teachers they may allow especially our young generations to reach a deeper understanding of maths (and also other topics) much quicker than before. If everyone can have a personal explanation machine to very efficiently satisfy their thirst for knowledge this may well lead to more good mathematicians.
Of course this heavily depends on whether we can get LLMs‘ outputs to be accurate enough.
I'm not as familiar with the early work, but later Wittgenstein held this belief too.
I'm not even sure why they were invoked. Even disregarding the big technical debunks such as two dogmas, sociologically and even by talking to real mathematicians (see Lakatos, historically, but this is true anecdotally too), it's (ironically) a complete non-question to wonder about mathematics in a logical positivist way.
Cracks me up.
What exactly do we think that human brains do?
As in, I would hazard a guess the discovery of the wheel wasn't "pure intelligence", it was humans accidentally viewing a rock roll down a hill and getting an idea.
If we give AI a "body", it will become as creative as humans are.
Maybe computers can help understand better because by now it's pretty clear brains aren't just LLMs.
The pessimists just see a 20W meat computer.
Taking it instead as a metaphorical claim may be more valid, but in that case it doesn’t depend on our understanding of how LLMs work.
And I don’t think it’s a good metaphor.
A lot of people across all fields seem to operate in a mode of information lookup as intelligence. They have the memory of solving particular problems, and when faced with a new problem, they basically do a "nearest search" in their brain to find the most similar problem, and apply the same principles to it.
While that works for a large number of tasks this intelligence is not the same as reasoning.
Reasoning is the ability to discover new information that you haven't seen before (i.e growing a new branch on the knowledge tree instead of interpolating).
Think of it like filling a space on the floor of arbitrary shape with smaller arbitrary shapes, trying to fill as much space as possible.
With interpolation, your smaller shapes are medium size, each with a non rectangular shape. You may have a large library of them, but in the end, there are just certain floor spaces that you won't be able to fill fully.
Reasoning on the flip side is having access to very fine shape, and knowing the procedure of how to stack shapes depending on what shapes are next to it and whether you are on a boundary of the floor space or not. Using these rules, you can fill pretty much any floor space fully.
Yes?
You can watch a rock roll down a hill and derive the concept for the wheel.
Seems pretty self evident to me
But that's not how new frontiers are conquered - there's a great deal of existing knowledge that is leveraged upon to get us into a position where we think we can succeed, yes, but there's also the recognition that there is knowledge we don't yet have that needs to be acquired in order for us to truly succeed.
THAT is where we (as humans) have excelled - we've taken natural processes, discovered their attributes and properties, and then understood how they can be applied to other domains.
Take fire, for example, it was in nature for billions of years before we as a species understood that it needed air, fuel, and heat in order for it to exist at all, and we then leveraged that knowledge into controlling fire - creating, growing, reducing, destroying it.
LLMs have ZERO ability (at this moment) to interact with, and discover on their own, those facts, nor do they appear to know how to leverage them.
edit: I am going to go further
We have only in the last couple of hundred years realised how to see things that are smaller than what our eyes can naturally see - we've used "glass" to see bacteria, and spores, and we've realised that we can use electrons to see even smaller.
We're also realising that MUCH smaller things exist - atoms, and things that compose atoms, and things that compose things that compose atoms
That much is derived from previous knowledge
What isn't, and what LLMs cannot create, is the tools by which we can detect or see these incredibly small things.
Said differently, what is prediction but composition projected forward through time/ideas?
Definition: That highly specific, short-lived burst of nervous energy that makes you accidentally drop a small object (like a pen, a guitar pick, or a piece of LEGO) immediately after picking it up.
Exactly. I also only write one word at a time. Who knows what is going on in order to come up with that word.
The most likely series of next tokens when a competent mathematician has written half of a correct proof is the correct next half of the proof. I've never seen anyone who claims "LLMs just predict the next token" give any definition of what that means that would include LLMs, but exclude the mathematician.
Did you read the post that you're commenting on?
It seems wholly believable to me that they are narrow intelligences that are great at some kinds of reasoning and worse at other kinds. Obviously they can reason through problems that most adult humans can't solve
Mathematicians make new discoveries by building and applying mathematical tools in new ways. It is tons of iterative work, following hunches and exploring connections. While true that LLMs can't truly "make discoveries" since they have no sense of what that would mean, they can Monte Carlo every mathematical tool at a narrow objective and see what sticks, then build on that or combine improvements.
Reading the article, that seems exactly how the discovery was made, an LLM used a "surprising connection" to go beyond the expected result. But the result has no meaning without the human intent behind the objective, human understanding to value the new pathway the AI used (more valuable than the result itself, by far) and the mathematical language (built by humans) to explore the concept.
Isn't this just anthropocentrism? Why is understanding only valid if a human does it? Why is knowledge only for humans? If another species resolved the contradictions between gravity and quantum mechanics, does that not have meaning unless they explain it to us and we understand it?
People saw birds fly for all of human history, but it was only recently that humans were able to make something fly and understand why. Once we understood, we were able to do amazing things, but before that, the millions of birds able to fly were of no help beyond inspiration for the dream.
We use drug-sniffing and guide dogs in a way similar to how we use LLMs. We don't really understand them at a fundamental level, we can't make electronic dog noses (otherwise we'd dispense with the silliness and just install drug detectors instead), but dogs are useful, so we use them.
Without a human in the loop an LLM could churn away spitting out results, some right, some wrong, and it would be of no consequence. Not much different than wild dogs sniffing each other.
Though perhaps more to your point, if some superhuman AI is developed, and understands things better than us without telling us about it (or being unable to), it could perform feats that seem magical to us — that would concern us even if we don't understand it, since it affects us.
But I think in the frame of reference of the commenter you were replying to, they're just saying that the low-level AI used in this specific case is not capable of making its results actually useful to us; humans are still needed to make it human-relevant. It told us where to find a gem underground, but we still had to be the ones to dig it out, cut it, polish it, etc.
We are in the birth of the AI age and we don't know what it will look like in 100 or 1000 or 10000 or 100000 years (all those time frames likely closer than possible encounters with aliens from distant galaxies). It's possible that AI will outlast humans even.
It would certainly be interesting to try once again to instruct tune one of these things for self agency like the many weird experiments in the early days after llama 1, but practically all such sort of experimental models turned out to be completely useless. Maybe the bases just sucked or maybe there's no clear way on how to get it working and benchmark training progress on something that by definition does not cooperate.
Like how do you determine even for a human person if they are smart, or just hate your guts and won't tell you the answer if there is nothing you can do to motivate them otherwise?
I was going to say you should submit it but I saw you did a few days ago but it only got a few votes... If Dang sees this IMO it would be extremely deserving of the second chance pool as I wouldn't be surprised to see it easily jump to the front page with a different roll of the dice.
I just wanted to highlight this very correct human-centric thought about the purpose of intellection.
Future of code is pretty much a bunch of guys shepherding a bunch of agents to get them to your goal.
I don't see how math might not go that way as well.
It's clearly not yet a tool that can deliver new math at scale. I say this because otherwise, the headline would be that they proved / disproved a hundred conjectures, not one. This is what happened with Mythos. You want to be the AI company that "solved" math, just like Anthropic got the headlines for "solving" (or breaking?) security.
The fact they're announcing a single success story almost certainly means that they've thrown a lot of money at a lot of problems, had experts fine-tuning the prompts and verifying the results, and it came back with a single "hit". But that doesn't make the result less important. We now have a new "solver" for math that can solve at least some hard problems that weren't getting solved before.
Whether that spells the end of math as we know it... I don't think so, but math is a bit weird. It's almost entirely non-commercial: it's practiced chiefly in academia, subsidized from taxes or private endowments, and almost never meant to solve problems of obvious practical importance - so in that sense, it's closer to philosophy than, say, software engineering. No philosopher is seriously worried about LLMs taking philosopher jobs even though a chatbot can write an essay, but mathematicians painted themselves into a different corner, I think.
What is "at scale" here exactly? This is the most impressive so far, but it is one of several such advances in the last few months, all of which were with publicly accessible models.
https://news.ycombinator.com/item?id=48213189
Doesn't really matter the prep-work, what they say is it's a one-shot result, achieved by AI. The blog doesn't claim it was done by a currently public Model.
For those in academics, is OpenAI the vendor of choice?
They also offer grants you can apply for as a researcher. I'm sure other labs may have this too but I believe OpenAI was first to this.
Given that Google is the "web indexing company", finding hard to find things is natural for their models, and this is the only way I need these models for.
If I can't find it for a week digging the internet, I give it a colossal prompt, and it digs out what I'm looking for.
As far as academic research is concerned (e.g. this thread's topic), I can't say.
Its explanations are quite good but they're also hard to understand because it keeps trying to relate everything back to programming metaphors or what it thinks it knows about the streets in the neighborhood I live in.
What you are describing doesn't match my experience at all with Gemini 3 or 3.1, especially the pro version.
Or like a musical octave has only 12 semitones, so all music is just a selection from a finite set that already existed.
Sure the insane computation we're throwing at this changes our perspective, but still there is an important distinction.
Like, "does the Riemann zeta function have zeroes that don't have real part 1/2," or "is there a better solution to the Erdős Unit Distance Problem."
The selection of question is matter of taste, but once selected, there is a definitive precise answer.
Care to cite a reference to that proof?
Who knew Obi-Wan was just smoking and pontificating on Wittgenstein.
I’m very out of my depth, but the structure of the proof seems to follow a pattern similar to a proof by contradiction. Where you’d say for example “assume for the sake of contradiction that the previously known limit is the highest possible” then prove that if that statement is true you get some impossible result.
(Though in some ways that's actually more impressive.)
> The argument relies crucially on ideas that may, at least in retrospect, be attributed to Ellenberg-Venkatesh, Golod-Shafarevich, and Hajir-Maire-Ramakrishna.
Can someone please elaborate on this?
Much more recently (2021), Hajir, Maire, and Ramakrishna figured out how to apply the Golod-Shafarevich theorem to a slightly different Galois group to produce an infinite tower of number fields with some even more surprising properties. This is used in the new proof. It requires very slightly modifying the construction of Hajir, Maire, and Ramakrishna to produce the fields needed in this proof, but the explanation of how to do this takes only a paragraph in the human-written summary. (The explanation is more laborious in the original AI writeup).
The relation to Ellenberg-Venkatesh is more indirect. This is where "in retrospect" comes in because this work was not cited in the original AI proof. This has to do with the next step of the proof: after you construct the number field, you need to find many elements of this field with the same norm to produce many vectors of the same length. To do this, the proof uses a pigeonhole argument which uses small split primes of the field (constructed via Hajir, Maire, and Ramakrishna's argument) to construct many ideals. By the pigeonhole principle, you can guarantee two ideals lie in the same class. When two ideals lie in the same class, you get an element of the field. You can rig things so these elements all have the same norm. Ellenberg and Venkatesh had an argument which also used the pigeonhole principle to guarantee two ideals lie in the same class to produce elements of the field. They were working on a different problem so their argument was slightly different, but similar.
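The generic pigeonhole step described above can be sketched as follows. This is only the standard class-group fact, written with notation assumed here ($K$, $h$, $I_i$, $\alpha$ are illustrative symbols); it is not the specific construction used in the paper, which chooses the ideals far more carefully.

```latex
Let $K$ be a number field with class number $h$, and let
$I_1, \dots, I_m$ be nonzero ideals of $\mathcal{O}_K$ with $m > h$.
By the pigeonhole principle there exist $i \neq j$ with
$[I_i] = [I_j]$ in the class group $\mathrm{Cl}(K)$, so
\[
  I_i I_j^{-1} = (\alpha) \quad \text{for some } \alpha \in K^{\times},
  \qquad
  \lvert N_{K/\mathbb{Q}}(\alpha) \rvert = \frac{N(I_i)}{N(I_j)} .
\]
```

The second identity is why controlling the norms of the ideals controls the norm of the resulting field element.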
Other domains are extracting value but I feel like there's an order of magnitude difference. It raises the question, what other domains fit into these categories where the AI itself has pretty much free rein to verify its own results?
Look past the press-releasey gushing from OpenAI and there are all sorts of interesting and subtle questions here about the role for LLMs in mathematical research. I urge folks to click through to the accompanying comments from mathematicians published alongside the result. There is a really interesting discussion going on. I particularly recommend Tim Gowers’ remarks. This is really interesting stuff!
Yet the comments are just a battleground of people rehearsing the same tired arguments about LLMs from 2023, refutations of those arguments, angry counters, etc.
Does it make anyone else sad that the battle lines seem to have been drawn 3 years ago and we just seem to have the same fights over and over?
I wonder if we’ll still be doing this two years hence.
I do not want to wage war against what is ugly. I do not want to accuse; I do not even want to accuse those who accuse. Looking away shall be my only negation.
I’ve been thinking of building myself my own frontend to HN that makes it impossible to view comments, for this reason. Yet sometimes there are still really interesting discussions and it’s hard to let go of what for me feels like the last social media I want to be part of.
There are a lot of big issues at stake here and just because a person is interested in what AI can do and curious to discuss it does not make them uncritically positive about its effects on society, the economy, and the world. Yet that is often the assumption and it leads to battle lines being drawn, on every AI discussion, over and over again. It means the serious discussion gets swamped and that makes me sad.
Fight! Fight! Fight!
> I wonder if we’ll still be doing this two years hence.
It is going to take some time for people to recognize that AI has a very different set of competencies that complements human intelligence rather well. It is unlikely to eclipse human intelligence at scale, and the companies betting on that will fall behind. That is when the conversation will start to shift.
Another wishful/hopeful thought is that the human experience itself is valuable, that competing for resources and living within a social network and having physical needs somehow creates value that is essential for companies to operate.
But is it really the case? I don't think we know that, and I don't know what economy results when all the white collar and much of the blue collar workers no longer understand how to participate in whatever the economy is becoming. Because it is starting to look like old money is coming around, and soon we will all be serfs to the creature comforts of those who have money now, upward mobility will be a thing of the past, and a small ruling elite over the vast subservient majority will form, reorganizing societies to more resemble Middle Ages lordship rather than what emerged in the 50's and 60's following WWII.
If LLMs were improving significantly independent of scaling up compute resources, I would be a lot more worried. The economic instability (on several levels) of the current trajectory cannot last. Countries and companies that don't take a more sustainable approach will eventually find themselves outclassed by those that do. Unfortunately that is not a guarantee against some sort of dark age in the short term.
This is completely false. Most of the dramatic improvements in LLM quality in the last two years were due to the application of new post-training methods, especially RLVR. It’s really interesting to read about (you should!) and it is the whole secret to why LLMs did not plateau in 2024 or 2025 like many people confidently predicted. Sure, RLVR requires compute to do, but this is not just throwing more compute at 2023 LLMs.
Every few months you get an article of some executive bragging that he fired an entire department of people because of AI.
It was adversarial from the start. The idle rich who don’t have to work for a living and their sycophants who somehow believe they won’t be replaced vs … everyone else.
I used to think that the common tale of AI rebelling in Hollywood movies was unlikely. Turns out we don’t even need rogue AI, our fellow men are quite willing to wipe the rest of us out.
1. AI is developed to be smart enough to actually replace people, destroying the labor force and immensely concentrating power.
This seems like bs hyperbole but I am not an expert.
2. AI turns out to be a bubble of false promises and hype, bursts, and takes the stock market and economy with it.
I thought this was the most likely but I keep not hearing popping, so maybe it's:
3. AI continues to be a tool that can substantially increase productivity in some areas and cause huge societal changes in others. The AI companies keep the hype train going or maybe it tapers off over time until talk meets reality but "real" AI never shows up and the bubble never pops because it's not one. Eventually there are 0-3 new FAANG companies with untouchable control of a tech we increasingly have to use to stay relevant.
Even if we avoid option 1 and 2, 3 doesn't exactly bode well either.
Yes, I'm tired too. I want to have real discussions about these things. But the problem is everyone believes their reality is real and anyone's reality that disagrees is fake. It just escalates. I take long breaks from HN because I realize I just come to the forums and end up being angry. Why do we do this to ourselves? The reality is that at a core level we usually want the same things.
This website is quite awful, and I also don't know why I spend any time on it. It's definitely not a website intended for meaningful discourse. It's a website where you can reaffirm whatever opinion is already established, and if your opinion is at all controversial or even just out of the box, you'll be punished for it.
If suddenly anyone can code we're not that special anymore.
We can argue about recombination/interpolation of training data in LLMs, but even if this was an interpolation, the result was contrarian rather than a confirmation. Any system that can identify an error in Erdős's thinking seems very useful to me (though perhaps he did not spend much time thinking about or checking this particular conjecture).
Right now, we are in a transition period... Models are improving, but they are not capable just yet to take over.
Where do you see it being in a year's time? Or 2? Or 5?
edit: apparently that’s only the _condensed summary_ of the chain of thought.
- Does anyone know if this was a 1 minute of inference or 1 month?
- How many times did the model say it was done disproving before it was found out that the model was wrong/hallucinating?
- One of the graphs says the model produced the right answer almost half the time at peak compute??? Did I understand that right? What does peak compute mean here?
- It does not show an example of the new best solution, nor explain why they couldn't show an example (e.g. if the proof was not constructive)
- It does not even explain the previous best solution. The diagram of the rescaled unit grid doesn't indicate what the "points" are beyond the normal non-scaled unit grid. I have no idea what to take away from it.
- Its description of the new proof just cites some terms of art with no effort made to actually explain the result.
If this post were not on the OpenAI blog, I would assume it was slop. I understand advanced pure mathematics is complicated, but it is entirely possible to explain complicated topics to non-experts.
woah.
Gowers has one of my favourite video series about how he approaches a problem he is unfamiliar with: https://www.youtube.com/watch?v=byjhpzEoXFs
It is disheartening to see him jump into this GenAI puffery.
I hope these GenAI labs are paying Tao handsomely for legitimizing their slop, but more likely he's feeling pressure from his University to promote and work with these labs.
My guess is Gowers wants in on that action, or his University does.
Either way, it makes me sad. If it's self-motivated... even sadder.
Focusing solely on "capabilities" is the irrational thinking.
Asbestos is the most "capable" material where extreme thermal, chemical and electrical resistance is required.
His university is deeply entrenched with the GenAI org that released this result both with having alumni on staff, integrating their tools into the school's processes and curriculum, and paying for lots of grants. (I understand Tao is absent from this specific announcement, perhaps because it found its solution without utilizing formal verification tooling)
Is it unreasonable to assume he's feeling pressure to do so?
Gowers similarly appeared largely uninterested in this current crop of GenAI until some months ago when he announced a 9M$ fund to develop "AI for Maths" and since then his social media has included GenAI promotion.
Now he is being asked about this result and his first sentence is:
> I do not have the background in algebraic number theory to make a detailed assessment of the disproof of Erdős’s unit-distance conjecture, so instead I shall make some tentative comments about what it tells us about the current capabilities of AI.
Why did this GenAI org reach out to mathematicians outside of the discipline that this result addresses?
Why did they respond?!
As with Tao, he's always been a measured optimist even before the tools were consistently usable for his work. And even still nowadays, he adds stipulations to his statements on the successes of AI. Yes, he's part of Math Inc. now and is in close contact with Google Deepmind for some projects but his interest lies in using the tools today. Gowers has been hypothesizing on the future of math in the tone he has taken now ever since o3/GPT5. There's no comparison between the two who should attract more scrutiny.
> has a motivation to "market" the accomplishment as much as possible
I am so sick of HN promoting unethical behaviour as virtuous due to its financialization worship at the foot of "valuations".
> but surely you agree it IS a remarkable achievement?
If you could define the bounds of "remarkable" I could answer this question.
A lot of the weight this holds comes from the fact that it's an old problem and that its difficulty hinges on the lack of investigation into the disproof side of the hypothesis. The model basically took a contrarian path and found tools and methods that support the viability of a disproof. So the (unquantified number of) mathematicians out there were all dedicating their resources to the notion that this can be proved. Some would say with hindsight that if they had a team of experts driven toward the goal of a disproof, this would have been achievable by humans, and one of the mathematicians of the paper states as much. This still has value in terms of reliability measurement, and possibly for human-aided endeavors where the methods scrounged up by the model can be used in other solutions.
When I'm learning about a new subject, I'll ask Claude to give me five papers that are relevant to what I'm learning about. Often three of the papers are either irrelevant or kind of shit, but that leaves 2/5 of them that are actually useful. Then from those papers, I'll ask Claude to give me a "dependency graph" by recursing on the citations, and then I start bottom-up.
This was game-changing for me. Reading advanced papers can be really hard for a variety of reasons, but one big one can simply be because you don't know the terminology and vernacular that the paper writers are using. Sometimes you can reasonably infer it from context, but sometimes I infer incorrectly, or simply have to skip over a section because I don't understand it. By working from the "lowest common denominator" of papers first, it generally makes the entire process easier.
I was already doing this to some extent prior to LLMs, as in I would get to a spot I didn't really understand, jump to a relevant citation, and recurse until I got to an understanding, but that was kind of a pain in the ass, so having a nice pretty graph for me makes it considerably easier for me to read and understand more papers.
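The bottom-up reading order described above is a topological sort of the citation graph. Here is a minimal sketch using Python's standard-library `graphlib` (Python 3.9+); the paper names and the `reading_order` helper are hypothetical placeholders, and it assumes the citation graph has no cycles.

```python
from graphlib import TopologicalSorter

# Hypothetical citation graph: each paper maps to the papers it builds on.
cites = {
    "advanced_paper": ["intermediate_paper", "survey"],
    "intermediate_paper": ["foundational_paper"],
    "survey": ["foundational_paper"],
    "foundational_paper": [],
}


def reading_order(citation_graph):
    """Return papers ordered so that every paper comes after everything it cites."""
    # TopologicalSorter treats the mapped values as predecessors,
    # so cited papers are emitted before the papers that cite them.
    return list(TopologicalSorter(citation_graph).static_order())


order = reading_order(cites)
print(order)
```

If two papers cite each other, `static_order()` raises `graphlib.CycleError`, so a real citation graph may need such cycles broken by hand first.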
It doesn't hurt that Lamport is exceptionally good at explaining things in plain language compared to a lot of other computer scientists.
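The bottom-up reading order described above amounts to a topological sort of the citation graph. A minimal sketch, assuming the citations have already been collected into a dictionary; the paper names and the `reading_order` helper are hypothetical placeholders, not anything produced by Claude:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical citation data: each paper maps to the set of papers it cites.
citations = {
    "advanced_paper": {"intermediate_a", "intermediate_b"},
    "intermediate_a": {"foundational"},
    "intermediate_b": {"foundational"},
    "foundational": set(),
}

def reading_order(cites):
    """Order papers so that every cited paper comes before the paper citing it."""
    # TopologicalSorter treats the dict values as predecessors,
    # so cited papers are emitted first.
    return list(TopologicalSorter(cites).static_order())

order = reading_order(citations)
print(order)
```

`TopologicalSorter` raises `CycleError` if the graph contains a cycle, which can happen when papers cite each other, so real citation data may need those edges pruned first.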
I do not believe it will replace humans.
Why shouldn't it? Humans are poorly optimized for almost anything, and built on a substrate that's barely hanging together
Goodness gracious!
(That's the first time I used that expression on HN.)
But I agree with you, especially in areas where they have a lot of training data, they can be very useful and save tons of time.
What strikes me as unusual though is that they do make a point of saying things like "this is a general purpose model that wasn't trained on the problem" among a few other things as if that's new. The last bountied problem they accomplished used a public model that ALSO didn't rely on specialized training. And that didn't make their blog.
And so do humans. Gotta stand on these shoulders of giants.
But AI is supercharging Math like there is no tomorrow.
LLMs are doomed to fail. By design. You can't fix them. It's how they work.
Can anyone point me to a diagram of what the newly found solution looks like?
Can anyone point me to a diagram of the newly found optimal arrangement?
1. Erdos 1196, GPT-5.4 Pro - https://www.scientificamerican.com/article/amateur-armed-wit...
There are a couple of other Erdos wins, but this was the most impressive, prior to the thread in question. And it's completely unsupervised.
Solution - https://chatgpt.com/share/69dd1c83-b164-8385-bf2e-8533e9baba...
2. Single-minus gluon tree amplitudes are nonzero, GPT-5.2 - https://openai.com/index/new-result-theoretical-physics/
3. Frontier Math Open Problem, GPT-5.4 Pro and others - https://epoch.ai/frontiermath/open-problems/ramsey-hypergrap...
4. GPT-5.5 Pro - https://gowers.wordpress.com/2026/05/08/a-recent-experience-...
5. Claude's Cycles, Claude Opus 4.6 - https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cyc...
Everything is a grift.
What are the odds that if they ran the same prompt from scratch, with the same context and instructions, it would arrive at the same answer? Unlikely. I think it's more likely that this is a 1:500000 chance and OpenAI can afford to brute force this result and justify the expense for marketing.
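To put rough numbers on the brute-force scenario: a minimal sketch, assuming every run is independent and succeeds with the same probability. The 1-in-500000 figure is the commenter's guess, not a measured rate, and both function names are made up for illustration.

```python
import math

def p_at_least_one(p, runs):
    """Probability that at least one of `runs` independent attempts succeeds."""
    # 1 - (1 - p)^runs, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(runs * math.log1p(-p))

def runs_needed(p, target):
    """Smallest number of independent attempts giving success probability >= target."""
    return math.ceil(math.log1p(-target) / math.log1p(-p))

p_guess = 1 / 500_000  # the commenter's guessed per-run success rate (an assumption)
n_half = runs_needed(p_guess, 0.5)
print(n_half, p_at_least_one(p_guess, n_half))
```

Under these assumptions, a coin-flip chance of at least one success takes on the order of a few hundred thousand runs, fewer than the 500000 the raw odds might suggest.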
The thing is that a lot of the effort through the years (it's unquantifiable how much time was spent and how many people, if any, focused their entire working lives on it) seems to have gone into looking for a proof, while the search for a disproof seems to have been minimal.
For example, these machines, if scaling intellect so fiercely that they are solving bespoke mathematics problems, should be able to generate mundane insights or unique conjectures far below the level of intellect required for highly advanced mathematics - and they simply do not.
Ask a model to give you the rundown and theory on a specific pharmacological substance, for example. It will cite the textbook and meta-analyses it pulls, but be completely incapable of any bespoke thinking on the topic. A random person pursuing a bachelor's in chemistry can do this.
Anything at all outside of the absolute facts, even the faintest conjecture, feels completely outside of their reach.
The underlying model may still effectively be a stochastic parrot, but used properly that can do impressive things and the various harnesses have been getting better and better at automating the use of said parrot.
I find this hyperbolic, but ya gotta juice up the upcoming IPO. I hate that they took an interesting announcement and reminded me why I hate tech and our society at the end.
Can we please put these groundbreaking AIs to work on actual problems humans have?
What was discovered were numerous mistakes in the published literature on the subject. “New math! AI!” No, just mechanical application of rules, human mistakes.
There were things that were theorized, but couldn’t be exhaustively checked until computers were bigger.
Once again, a tool is applied, it has the AI label - it's progress! But it isn’t something new. It’s just an LLM.
There’s a consistent underappreciation of AI (and math, honestly), but watching soulless AI mongers declare that their toy has created the new is something of a new low; uninspired, failed creatives, without rhyme or context; this is a bigger version of declaring that your spell checker has created new words.
The result is more impressive than what was done with tables of integrals and SAINT in 1961, sure.
Apparently if you add a “temperature” knob to a text predictor, otherwise sane individuals piss themselves and call it new.
Then again I thought NFTs, crypto, and the Metaverse were stupid, so what do I know.
Why would anyone believe this to be true even for a split second?
The point of having an AI solve an unsolved problem is to make it very clear that the insight must have come from the AI and wasn't in the training data. Sure, it's possible OpenAI had access to some math professors who solved it and then let an AI model take the credit... but that seems unlikely. That human would be turning down a potential Fields Medal for this discovery.
The abridged chain-of-thought from the model also serves as some evidence of LLM origin: https://cdn.openai.com/pdf/1625eff6-5ac1-40d8-b1db-5d5cf925d... (could be fake, though I'm unsure what proof of LLM origin couldn't be faked)
I also don’t like the tin foil hatty theories and don’t know what OpenAI actually did, but an NDA does wonders! Just pointing out that this line of operations is not really unlikely.
While interesting, this result is not Fields Medal material.
Who else disproved this longstanding conjecture before the model did so, since obviously it must have been in the training data since before?
> the closer the expertise you spent your whole life building is to being worthless.
Perhaps it is time for life to be considered intrinsically valuable, instead of being "worthy" only based on output or capability. Disability, animal and environmental advocates have been fighting for this for a long time. Not too long ago women and minorities were in the same boat. Even now, there are many advocating and fighting for a return to the dark old days.
> Along with all the rest of what humans find meaningful and fulfilling.
Some humans. Many are content to enjoy simply existing, and the beauty of life and the universe around us. Just like many non-scientists today enjoy and benefit from the work of scientists, tomorrow, too, many will enjoy learning from and applying the coming advancements and leaps in many fields.
And those of a scientist or other research-type mindset? No doubt they will contribute meaningfully by studying the frontier, noting what remains unanswered, and then advancing the frontier, just like researchers do today; just because scientists in the past solved many questions doesn't mean that there aren't any questions to answer today.
IMHO, AI means that the frontier expands faster, not that it is obliterated. Even AI cannot overcome the laws and limitations of physics/universe: even Dyson spheres only capture the energy of one star, thus setting a limit on the amount of compute, and thereby a limit on intelligence. And we are a loooong way from a Dyson sphere.
[1] https://news.ycombinator.com/item?id=48215122
While many seem to be anxious or pessimistic about the future of intellectual/artistic pursuits (understandable although I disagree), I do find the utter lack of curiosity or interest in the incredible machinery that is causing all the fuss to be striking.
Dang/Tomhow, are you reading this? Would it make sense to modify your slop filter to avoid auto-flagging/killing replies that credit the LLM explicitly? Otherwise valid discussions will continue to get hosed.
My argument is that this rule should apply only to people who post LLM output under their own user names without acknowledgment, or otherwise post it where it doesn't belong. If the topic of a (sub)thread involves LLM output, it should be OK to cite examples without getting your post flagged.
I can assure you, the percentage of people who can do what they do when it comes to crafting terms, and related sets of terms, for nuanced and novel ideas is very very small.
It happens this is something I do nearly every day.
Models respond to the level of dialogue you have with them. Engage with an informed perspective on terminological issues and they respond with deep perspectives.
I am routinely baffled at the things people say models can't do, that they do effortlessly. Interaction and having some skill to contribute helps here.
What is preventing AI from continuing to improve until it is absolutely better than humans at any mental task?
If we compare AI now vs 2022 the difference is outstandingly stark. Do you believe this improvement will just stop before it eclipses all humans in everything we care about?
No matter how much compute time it's given to combine training samples with each other and run through a validation engine it will still be missing some chunk of the "long tail". To make progress in the long tail it would need to have understanding, and not just a mimicry of understanding. Unless that happens they will always be dependent on the humans that they are mimicking in order to improve.
I feel like people grasping at straws over the shrinking limitations of AI systems are just repeating the "god of the gaps" fallacy.
The thing where you can understand the meaning of this sentence without first compiling a statistical representation of a 10 trillion line corpus of training data.
Unless you're an NPC of course.
Or rather, maybe I don't understand what you mean :)
So I have all sorts of associations with "apple" and spent a little time playing with it.
First in a raw physical sense I can imagine an apple in my head, spin it around, imagine its physics with near cylindrical symmetry etc. A red apple is what first pops into my head, although of course I know there are many apple variants and have opinions on their taste etc.
There are many cultural associations I have with apples from Newton to George Washington. The company Apple has its own set of ideas that I interact with when I hear the word.
In other words I can think of various associations I have to the word apple of various strengths. These associations and strengths are functions of my experience encountering the word and actual apples.
Is there a feeling of "appleness"?
I don't really know what this would mean. I would say no, unless it can perhaps be defined what appleness means and feels like. I don't really notice any strong set of emotions or feelings from this thought exercise.
Do you think that sense of meaning is equivalent to the numerical weights of an LLM?
Again I think I would need a definition of "sense of meaning". I don't seem to derive a singular pointlike meaning when contemplating a singular word. I never was contending that human and LLM cognition are exactly equivalent, but I could see these association strengths being represented in LLM weights. I would say then if an LLM has similar association strengths with "apple" then it "understands" apples as well as I do. Of course this is really hard to test, but frontier models could give you all sorts of apple facts and cultural associations and so on. It may slip up and hallucinate, and I'm sure that I also believe at least one false thing about apples.
So what is your bright line between LLM and human understanding in this example? I assume that your line of reasoning would argue that LLMs do not understand apples. Why don't LLMs understand the word "apple"?
I'm not sure how I would convey what meaning and understanding are to someone if they don't experience them. This is my poor attempt though: there cannot just be associations; there need to be "things" to associate between. Otherwise you have no ground, it is all map and no territory. Ultimately it would just be meaningless associations between meaningless symbols.
One qualitative distinction that remains for the time being is that humans care about things while AIs do not. Human drive and motivation is needed to have AI perform tasks.
Of course, this distinction isn’t set in stone.
Well, there's the fact that it hasn't yet improved since what we had 3 years ago. That doesn't really bode well for the prospect of future improvement, though it's not technically impossible.
“For decades, it was widely believed that this rate was essentially the best possible, and no construction could improve significantly over the square grid. In technical terms, Erdős conjectured an upper bound of n^{1+o(1)}, in which the additional o(1) indicates a term tending to 0 with n.
Our new result disproves this conjecture. More precisely, for infinitely many values of n, the proof constructs configurations of n points with at least n^{1+δ} unit-distance pairs, for some fixed exponent δ > 0. (The original AI proof does not give an explicit δ, but a forthcoming refinement due to Princeton mathematics professor Will Sawin has shown one can take δ = 0.014.)”
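For readers who want a concrete feel for what is being counted in the quote above, here is a minimal brute-force sketch. It is only an illustration of pairs at one fixed distance in a square grid, not the construction from the new proof; the function names and the choice of squared distance 25 are assumptions made for the example. Picking one distance and rescaling so that it equals 1 turns these into unit-distance pairs.

```python
from itertools import combinations

def square_grid(k):
    """Return the k*k integer points (x, y) with 0 <= x, y < k."""
    return [(x, y) for x in range(k) for y in range(k)]

def count_pairs_at_squared_distance(points, d2):
    """Count unordered pairs of points whose squared distance equals d2."""
    count = 0
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == d2:
            count += 1
    return count

grid = square_grid(10)  # n = 100 points
adjacent = count_pairs_at_squared_distance(grid, 1)   # distance 1: neighbors only
rescaled = count_pairs_at_squared_distance(grid, 25)  # distance 5: 25 = 5^2 + 0^2 = 3^2 + 4^2
print(adjacent, rescaled)
```

On the 10 by 10 grid this gives 180 pairs at distance 1 and 268 at distance 5, because 25 can be written as a sum of two squares in more ways. Choosing a distance with many such representations is the idea behind the classical grid lower bound.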