throwaw12 2 days ago [-]
> Composer 2.5 is built on the same open-source checkpoint as Composer 2, Moonshot's Kimi K2.5.
Really nice to see they're giving credit to the company and I am optimistic Kimi K open models soon will outperform Opus models
vessenes 2 days ago [-]
Sounds like it's the last Kimi-line model at Cursor? As expected they say they'll be training a larger model on the SpaceX infrastructure, or have already started most likely.
I'm very curious to read about the Composer 3 architecture when it comes out. More frontier coding models are a good thing, especially if they diversify into different strengths/weaknesses.
bfeynman 2 days ago [-]
That only seems plausible if whatever corpse of xAI is around is giving them engineering time. I don't know if they hired a bunch of ex-frontier-lab staff, but it's unlikely they have the technical capability to train their own frontier models, especially the pretraining. The thing is, if it's not competitive with Claude/Codex, it will be panned.
vessenes 2 days ago [-]
Hmm, I read the situation a little differently. Grok is not a slouchy model. It’s not the best, but it’s not the worst. X currently has one source of proprietary data, Twitter, and grok is by far the best at all the things you might imagine there - today’s zeitgeist, who’s saying what, current news, etc.
Cursor adds in a large corpus of proprietary coding data — I think this is actually fairly hard to acquire right now, because claude and codex are so good.
I bet there’s enough talent at the Grok team to work with the cursor team and data to get something good out the door.
That said, I don’t track Grok’s engineering leads — I’m not sure who’s currently around, and who is not.
ccimmergreen 2 days ago [-]
Unlikely, given that large swathes of talent have already left xAI, ostensibly due to poor leadership management. Simply throwing money in to build the biggest datacenters in the world doesn't do much good without bright minds to back it up.
https://www.fastcompany.com/91531084/inside-the-xai-exodus
vessenes 1 days ago [-]
Be careful taking the headlines at face value - that list of people leaving was mostly product and redundant senior execs to my eyes, post spacex merger. You’d expect those folks to be asked to leave as part of a re-org in any event. I don’t think it’s dispositive one way or the other on the tech org.
ccimmergreen 11 hours ago [-]
You are wrong, they were not redundant execs.
They were world-class senior developers and AI engineers most renowned in the AI research communities
(e.g. Jimmy Ba the legend, Christian Szegedy, Igor Babuschkin, Greg Yang),
poached from other companies to join xAI and they were getting very high salaries.
The mass exodus has been happening way before spacex merger though.
vessenes 2 hours ago [-]
Interesting. Agreed that’s a significant list.
Post model 3 launch, Tesla had a number of senior folks leave almost immediately. My read at that time was they had hit or exceeded pareto-optimal on the suffering:wealth scale: Tesla was clearly going to make it, and they had already vested 90% of the value they’d receive from Tesla ownership. Why go suffer through the massive build out?
And in fact, in that era, Tesla did bring in a bunch of auto industry types to help scale, who as it happens also certainly did very well, but order of magnitude less well than the early peeps.
There might be some similar economics here: change of control will often fully vest early founders. Combined with incoming SX IPO, these guys are done financially — as in, already multibillionaires pre-IPO. You’d have to want to stay and the company would have to really want you to stay as well before it made economic sense to re-up.
People say a lot of things about working for Elon; things like “hardest work I ever did,” and “he made me extremely rich”, but you don’t read “that was easy” very often.
I have no idea if there’s enough talent right now at xAI to go build a foundation model, but in the immortal words of Carl Icahn: “don’t bet against Elon”
zxspectrum1982 1 days ago [-]
There's also been a lot of good talent joining xAI lately.
scosman 2 days ago [-]
> I am optimistic Kimi K open models soon will outperform Opus models
Hard to outperform the model you distill...
nl 2 days ago [-]
Most of the performance on coding comes from RL, not distillation.
Distillation helps with world knowledge and things like that.
Bolwin 2 days ago [-]
They're not distilled. Stop spreading Anthropic's misuse of the term.
They do use it for synthetic data/judging though, so yes, hard to outperform.
Not that they need to, if they can basically match it for a fifth of the price.
intrasight 2 days ago [-]
Is that true? If the distillation is not lossy and the model runs much faster due to less resource consumption, then it may outperform.
mwigdahl 2 days ago [-]
One of those conditionals is a pretty huge assumption.
intrasight 2 days ago [-]
It's an assumption and it can be tested
howdareme9 2 days ago [-]
Only because last time they tried to hide it lol
trymas 2 days ago [-]
Yes and if I remember the drama correctly - Kimi's license or terms of use says that for commercial use cases (or was it user count?) - you must declare credit to Moonshot and Kimi.
Lennie 2 days ago [-]
It's important to mention: they were compliant, because they trained the model at an AI hosting provider that had a partnership with Moonshot AI, but Moonshot didn't know Cursor was a customer.
Aurornis 2 days ago [-]
This was misinformed Twitter and Reddit drama.
They had properly licensed it and were complying with the terms of the license.
davidatbu 2 days ago [-]
Note that something that helped the misinformation was that, on Twitter, there were Kimi employees expressing their surprise that the base model was Kimi K2.5, and their indignation that Cursor didn't credit Kimi. They later deleted their tweets (what I infer from that is that some employees were not aware of some pre-existing agreement or understanding between Cursor and Kimi until the drama happened).
maxdo 2 days ago [-]
How can distilled Opus become better than the original? There are a number of reports, including from Anthropic, that the Kimi team was participating in fraudulent activities.
throwa356262 2 days ago [-]
Do we know the "fraudulent" requests really came from Moonshot engineers and were not a QA team running a ton of benchmarks against other models?
I feel distilling something as big as Opus would require many, many more samples, but I don't really know much about this subject.
maxdo 2 days ago [-]
sure, sounds like QA lol
Scale: Over 3.4 million exchanges
The operation targeted:
- Agentic reasoning and tool use
- Coding and data analysis
- Computer-use agent development
- Computer vision
Moonshot (Kimi models) employed hundreds of fraudulent accounts spanning multiple access pathways. Varied account types made the campaign harder to detect as a coordinated operation. We attributed the campaign through request metadata, which matched the public profiles of senior Moonshot staff. In a later phase, Moonshot used a more targeted approach, attempting to extract and reconstruct Claude’s reasoning traces.
ta20240528 2 days ago [-]
And when you hear unsubstantiated rumours* that say Anthropic has been sending exchanges to, say, Alibaba's Qwen, will you also conclude the same about the entire US AI industry?
I doubt it.
* publish the logs.
ifwinterco 2 days ago [-]
Even if it's true, it's not like US AI companies can complain, given their entire business is based on ripping off text without attribution
maxdo 1 days ago [-]
Chinese AI is not doing the same? Or they don't parse?
They do, except they also send thousands of sex-spies to do espionage of this kind at scale.
ifwinterco 1 days ago [-]
Of course they’re also doing this, my point is this is a grubby business where ethics went out of the window a long time ago.
If you’re playing this game in 2026 you know the rules - anything goes
goyozi 2 days ago [-]
I kind of want to try it, to see if and how far they can take an open model and improve it but I really don’t miss the Cursor user experience. Constant UI changes, half-baked features, smaller and smaller limits, useless AI change attribution; I think I’ll wait for others to report if it’s any good.
whywhywhywhy 2 days ago [-]
Noticed recently they keep opening their “Agents” window when the project was last opened in the VSCode fork window in the hopes I’ll just continue working in that when the UI is totally different and missing things I need.
For a professional tool it’s getting egregious how little respect they have for my workflows and flow state, the way they keep moving things, changing iconography and flipping switches in the UI.
It’s clearly being run by someone who comes from a social app or sales app growth hacking background.
dmix 2 days ago [-]
I’ve personally never experienced that issue with Cursor. I never use the agents window and it always shows me the editor.
whywhywhywhy 2 days ago [-]
You're not in the A/B test. I've never opened the agents window consensually.
znpy 1 days ago [-]
> It’s clearly being run by someone who comes from a social app or sales app growth hacking background.
I fixed that by using cursor the agent but not the UI.
I'm just running cursor in GNU Emacs via agent-shell (https://github.com/xenodium/agent-shell). Their cli client (aptly named "agent") supports ACP (agent client protocol) so the UI can be skipped altogether.
I know this sounds like a meme ("use x in emacs") but at this point at the very least I can keep my workflows and my UI all the same and focus on my work rather than "where did $company put $feature this month".
SebastianKra 2 days ago [-]
It seems obvious that they plan to eventually drop VSCode.
I'd be willing to take them up on that offer. Their agent window is genuinely better as a starting point.
What annoys me is how little they want to integrate with ...anything. Wanna open a link in your default browser? Use our built-in chromium fork, we insist. Wanna open a location in Zed? No, please use our half-baked editor re-implementation. Wanna open a location in Cursor's own vscode-based editor? You can't. Managed to work around that somehow? We changed your files to "Worktree TS", disabling all your language servers. It's like programming on an iPhone.
rubyn00bie 2 days ago [-]
Damn do I feel the UI changes being a pain point.
It’s a near constant regression in my workflows. “Multiple agents” got destroyed recently, and the new interface for it, some sort of command, isn’t as good or reliable. Then you’ve got modals everywhere[1] and truncated bits (like long branch names) that make it insanely frustrating to use.
They’re constantly changing the UI without actually improving it at all. I’ll likely cancel it and use opencode for personal stuff with Deepseek and only use it at work because I have to. There was a time when I appreciated the harness but it’s becoming less useful, or at least noticeable, over time… all the while the actual UI becomes substantially more painful and awkward to use (like @ in the “agents” window being completely unable to find a file because it’s some sort of “global” scope).
One thing that surprises me about this whole segment is that JetBrains haven’t eaten these folks’ lunch. Their IDEs are leagues better than VSCode but their AI integration is awful by comparison (and the bar is low). I can’t even see how much of the context window I have left.
[1] it’s insane I have to answer questions in a tiny input box I cannot resize or adjust the size of. Let alone the fact the text area I input prompts into cannot be resized. Truly feels like the UI/UX is done by people without any experience.
animuchan 2 days ago [-]
> Truly feels like the UI/UX is done by people
To me it feels like it's done entirely by an LLM, starting from the product vision.
I gave up, canceled my plan, and went back to boring old VSCode. It feels so much more stable, and my Mac no longer runs out of memory. With cursor I had to reboot my macbook several times a week and had to always be plugged in.
smnscu 2 days ago [-]
That's me with Google Antigravity. Switching back to vscode was such a breath of fresh air. Porting over my (extensive) settings/extensions/keyboard shortcuts was extremely easy too (just ask the agent to do it), and now I can use both Copilot models and Claude Code easily. More to your point though, the speed and stability is incomparable. I can't remember having many issues with Cursor last year when I used it at my last job, but still, vscode has been surprisingly pleasant for agentic use.
tomasz-tomczyk 2 days ago [-]
Yeah I have a soft spot for Cursor because it was my first tool that unlocked huge productivity with AI, but I avoid doing anything there now.
Should try their CLI!
Aurornis 2 days ago [-]
I try it from time to time and feel the same way. Some people I know really like it but I can’t tell if that’s because it’s good or just because it’s what they’ve become familiar with and they don’t like to change tools. Cursor had a good head start and a lot of early PR.
epolanski 2 days ago [-]
Good point.
One of the things I've come to appreciate about the CLI tools like Codex or Claude is that the interface is so limited that every feature they release is still constrained to the same UX limitations, whereas those "funkier" IDEs change from month to month, giving me further fatigue.
fjdjshsh 2 days ago [-]
I've had good experiences with Cursor so far and it's my main IDE.
I've noticed some UI changes, but I've switched fast and they didn't bug me
indiantinker 2 days ago [-]
I agree. I quit Cursor and replaced it with Conductor and a mix of Claude Code / Codex / Copilot, and I don't miss it as such. Maybe one day I will come back.
ttouch 2 days ago [-]
You can use the Cursor CLI and/or the Zed editor with Cursor as the underlying provider via ACP (Agent Client Protocol).
presentation 2 days ago [-]
Tried that, it just seemed way dumber this way unfortunately. And the Zed UI provided 0 visibility whenever it was doing tool calls, and it kept running sleep 30 calls because it couldn’t figure out how to see the results of its own tool calls for some reason.
jstummbillig 2 days ago [-]
Isn't there a cli version of cursor by now?
yourboirusty 2 days ago [-]
It's a bit better than the VSCode fork, but still much worse than competition:
- lags constantly,
- if you type while it's generating you'll get missed inputs,
- 'plan mode' doesn't clear context before starting work,
- you can't directly edit the plan, you can only ask the bot to do it,
- you can't immediately whitelist commands, only accept once or allow all.
The model is (like Composer 2) based on Kimi K2.5 and they claim SOTA performance for 1/10th of the cost. The tweet also mentions that they've started a new model from scratch on Colossus 2 (xAI/SpaceX Cluster). Really impressive how they've made this jump from being called the vscode fork with no moat just a couple of months ago.
onlyrealcuzzo 3 days ago [-]
> Really impressive how they've made this jump from being called the vscode fork with no moat just a couple of months ago.
Impressive, yes. But they still don't have a moat...
infecto 3 days ago [-]
I am not sure we should dismiss what they have today. Nobody has yet come close with a full-package IDE that works well for coding. Is that not a moat? It is easy for me to discount it in my head, thinking that I could build something myself, but between autocomplete and their workflow for agent use, it feels like they have some tangible moat emerging.
virgilp 2 days ago [-]
If we ignore cost (which is kinda hard to ignore), I feel Codex kinda' does it for me. Sure it's not really an editor but I find I don't need that _that much_ and it's easy to launch an external editor (they actually have the feature).
The ironic thing is that half a year ago, after trying factory.ai I thought chat-first interface was a stupid idea that will never work.
chillfox 2 days ago [-]
Have you tried Zed?
I haven’t tried Cursor, so don’t know how they compare, but I like Zed a lot.
Anyway, would love to see a comparison from someone who has used a recent version of each.
turastory 2 days ago [-]
A few years ago I tried Zed when it was still pretty early, but eventually settled on Cursor. I gave Zed another shot a few days ago because Cursor’s worktree support still feels pretty weak.
In my setup I use multiple agents like Claude Code and Codex, and Zed’s ACP support makes it pretty nice to manage them all as “threads” in one place. Worktree switching also feels much smoother.
Overall the experience was pretty good, but the way the agent and editor are integrated still feels a bit lacking, and tab completion is the big one for me. Cursor’s tab completion is still the best I’ve used.
So now I’m using both. For work that needs a lot of focus and careful iteration, I use Cursor. For things that are easy to split into worktrees and hand off to agents, I use Zed with Claude/Codex.
chillfox 2 days ago [-]
Interesting, is it that the tab completion is giving better results, or how it works is better?
ramses0 2 days ago [-]
The tab completion is "faster than vim" from a long-time vimmer. It's at the point where a lot of times i'll lead with the comment instead of the code:
# now take the list and sort by x.lastName
<tab>
...and it'll "do the thing" (w/ type hints, its own comments, etc). Obviously in this very simple, understandable, completely contrived example, it's "trivial" (but 3 years ago would have seemed like magic), but it'll also pick up on "continuation / more of the same" type edits. A comment like `# use random_utility to call the api and only accept matches which supplement addresses that have already been found` will (usually) autocomplete all the gobbledy-gook w.r.t. tokens, URL's, function names, etc. so it's effectively an "automatic omni-complete with simplistic post-processing"
Example #2: I was just fixing some vibe-coded slop, where it was taking `click.echo( some_api.whatever_endpoint() )` and the "slop" portion was literally emitting: `str('{ "A": 1, "B": 2 }')` and that function call was emitting it directly.
On the command line, I was doing `blah whatever-endpoint --something | jq '.'` and got tired of the JQ thing, so I'm like: "I'll just use `json.dumps(...,indent=2)`", but lo and behold, I'm getting a dumb JSON string literal, not a pretty printed object shape.
I start typing `json.loads(` to move from "str()" to "dict()" ... and it autocompletes the whole scenario (on that line), then I move to `def some_other_endpoint` and it basically has that same edit queued up. (ie: it "knows" what i'm about to do).
...so overall, "faster than vim", even with high skill bar for repetition, motion, macros, sed-style edits, etc. You can't beat: "<tab>", especially when it's lightly intelligent (ie: knows when/what/str/int, adapts to different function calls, etc).
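For anyone following along, here is a minimal Python sketch of the `json.loads` / `json.dumps` fix from Example #2. The helper name `pretty_json` and the sample payload are illustrative assumptions, not the actual code from that project.

```python
import json


def pretty_json(raw: str) -> str:
    """Parse a JSON string into Python objects, then re-serialize it indented.

    Calling json.dumps directly on `raw` would only quote the string again,
    which is the "dumb JSON string literal" symptom described above.
    """
    parsed = json.loads(raw)             # str -> dict/list
    return json.dumps(parsed, indent=2)  # dict/list -> pretty-printed str


if __name__ == "__main__":
    raw = '{ "A": 1, "B": 2 }'  # hypothetical payload, as in the example above
    print(json.dumps(raw, indent=2))  # still a single quoted string literal
    print(pretty_json(raw))           # an indented JSON object
```

The point is only that `indent=2` has nothing to indent until the string has been parsed back into a dict.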
nl 2 days ago [-]
I've tried Zed and really didn't like it.
I like VS Code with the Claude Plugin, and sometimes with the Codex Plugin
infecto 2 days ago [-]
Tried it and it’s fine but the AI integration is not tight enough for me.
jmcqk6 2 days ago [-]
I've been using cursor for over a year for my personal projects. At work, I use Claude Code, and so I've been wondering if I'm missing something in the other agents.
Over the last week, I tried out two other agents on my personal projects: dirac and forgecode, after seeing impressive results from both of them on terminal bench.
After a good amount of testing, and over $100 in open router spend, I'm back to cursor.
I really liked forgecode the best, and it feels better than claude code, but cursor definitely feels best to me. Composer 2.5 is fast and effective, and it makes a huge difference. I was running `forge` with Opus, and it was taking dozens of minutes to do things, and the feedback loop was so slow.
The previous version of composer was also much faster, and it makes a difference. Maybe people like context switching, but I prefer to stay focussed on the task in front of me, and I'm reviewing the code carefully.
I think that's a pretty good moat. I was ready to end my subscription a week ago, and now I'm back after learning the grass is not necessarily greener on the other side of the fence.
alach11 3 days ago [-]
Isn't a large user base and the data collected from those users a moat of sorts?
onlyrealcuzzo 3 days ago [-]
A moat is when you have something others can't easily get.
Every MAG 7 / FAANG company already has more users and more data...
That's not a moat.
That's traction.
LinXitoW 2 days ago [-]
They don't have the same quality and kind of data. For example, Claude Code might have general conversation flow data for implementing feature X, but Cursor has users' individual editing actions AND the chat flow. Which line did the user manually edit after the agent did its thing? What's the commit message (if done manually)? Stuff like that is worth its weight in gold.
wilg 2 days ago [-]
That's not X.
That's Y.
DonHopkins 2 days ago [-]
I fear the day that large parts of perfectly valid English language and punctuation are off limits for humans to use because LLMs use them too (having learned them from humans), and somebody will always whine and post low effort "slop" comments that are much more annoying and less useful than the slop itself, or even incorrectly whine about human written text that happens to match your hyper-sensitive slop detector.
Plus you are always running the risk of being rude and insulting when incorrectly labeling text actually written by humans as slop — making a jackass of yourself — and opening yourself up to being trolled by humans purposefully inserting em-dashes and catch phrases just to trigger you. That's not clever. That's gullible.
How much cognitive and physical effort and time do you put into trying to figure out if everything you read is slop, then complaining about it? If that's your job or calling in life, you could be easily replaced with AI. Find something more creative to do with your time.
If you really object to low effort slop, and not just relish it as an opportunity to whine, then how about instead of posting low effort whines about slop, you put in the actual effort to do something about it, and rewrite the slop in a way that won't trigger your slop detector, then post that instead, to train AI not to write slop.
Is your problem that it's slop, or that it's AI generated? Because your whining about low effort AI generated slop without contributing to the conversation or addressing the point of the comment you're replying to is just low effort human generated slop.
Please don't post slop while complaining about slop.
uxcolumbo 2 days ago [-]
Been a bit out of the loop.
What's wrong with using very short sentences like 'That's not X. That's Y.'?
arcanemachiner 2 days ago [-]
Commonly used phrase by LLMs. Gives people slop vibes these days.
Kiro 2 days ago [-]
"It's not X, it's Y" is a good way to illustrate a point. Same goes for many other common LLM phrases. It's used because it's effective.
monsieurbanana 2 days ago [-]
Huh. I associate it with LinkedIn slop, which is probably 100% ai nowadays but they certainly didn't wait for llms.
AussieWog93 3 days ago [-]
Honestly the data itself is probably worth heaps even if the company itself collapses. Early attention engineering when humans were still in the loop!!!
NitpickLawyer 2 days ago [-]
> Early attention engineering when humans were still in the loop
Exactly. Cursor was the first product used by tons of devs on real codebases. Just the signal "acceptance rate" is huge and can't be easily captured w/ synthetic data.
kkukshtel 3 days ago [-]
And it's still just a vscode fork
icemelt8 2 days ago [-]
Cursor 3 is a complete rewrite, it's no longer a fork.
gkbrk 2 days ago [-]
It's still a VSCode fork. Even Cursor's own About window tells you it's VSCode.
Cursor
Version: 3.4.20
VSCode Version: 1.105.1
muhfournik 2 days ago [-]
I believe the agent view is a complete rewrite, and maybe the other parts but not the editor itself
antirez 2 days ago [-]
How much the RL they are doing really improves Kimi K2.5 remains to be seen. So, right now, the ground truth is that they combined what they had with a strong open-weights model. The RL improvement may be marginal (since many folks report strong results with vanilla K2.6) and may mostly bias the model towards coding tasks: when a model like this is trained to be a generalist, there is a tension between being good at one thing and the other, in terms of SFT and RL. You can see this in the DeepSeek v4 Flash training report, for instance, but it is a known fact. So if you have the GPUs and a decent RL pipeline that does not ruin the model, you can indeed specialize it a bit more for a given task at the expense of tasks people will not do inside Cursor. But, so far, the measurable reality is that Cursor uses an open-weight model like most could do, and the RL story could be partially a marketing move to call it Composer 2.5 more than a real strong gain, given that there is no way to verify and K2.5 was already strong. And we also know that they had to partner to do the training, which is also not good news.
Lionga 3 days ago [-]
They are still a vscode fork with no moat? Like they lost about 70% of users in half a year which goes to show how there is not even the tiniest of moat.
GenerWork 3 days ago [-]
I feel like they've been targeting enterprise pretty hard. I know my company uses them, and the companies that hire us also use Cursor.
Squarex 2 days ago [-]
All enterprises I know use GitHub Copilot as they already have Office, Teams, … I wonder how that will change with the recent pricing changes.
pjmlp 2 days ago [-]
I can tell my company wants nothing to do with them.
kvetching 3 days ago [-]
Cursor will definitely win the enterprise for coding. Enterprises aren't going to trust a TUI
esafak 2 days ago [-]
Why not? That makes no sense to me.
kilroy123 2 days ago [-]
I think it's going to be brutal for them to compete with OpenAI and Anthropic.
I switched to claude code because of usage. For $200 a month, I would run out of usage halfway through the month. Then be forced to use their composer model or whatever slow, dumb model they served up in their "auto" mode.
For that same $200 a month, I could use claude code and basically never hit usage limits.
I don't understand what people are doing who run into the limits on that max x20 plan. I NEVER have.
liuliu 3 days ago [-]
Since the frontier is only 8 months ahead of DeepSeek, it is hard to see how model training can be a moat, as all the tricks are available from open labs in China. You really just need <100m to bootstrap at this point.
wg0 2 days ago [-]
This was the only way forward.
the_duke 2 days ago [-]
In my opinion cursor actually has one of the best harnesses again at the moment.
make3 2 days ago [-]
why is that part impressive specifically? they got purchased by SpaceX, they have access to infinite compute and cash now.
& now they're still losing all of their users to Claude Code and Codex.
DeathArrow 2 days ago [-]
>& now they're still losing all of their users to Claude Code and Codex.
Why pay for Cursor when I can use GLM 5.1, Kimi K2.6, MiniMax M2.7, Xiaomi MiMo V2.5 Pro and Deepseek v4 for cheap and use whatever harness I want, including Claude Code?
It's not like Cursor's harness is the best out there.
And even if I want to edit the code, I don't need to run the agent harness in an IDE.
wmichelin 1 days ago [-]
Not a cursor shill by any means, I do use it at work but that's because it's what they pay for.
But Cursor has a CLI harness.
make3 2 days ago [-]
these are in the trillion parameters range, not sure it's actually that cheap to have at a reasonable speed without quality degradation & without like.. your own DGX B200
DeathArrow 2 days ago [-]
I didn't say to run them at home. There are some cheap coding plans that get you plenty of usage for the Chinese models.
DeathArrow 2 days ago [-]
>Really impressive how they've made this jump from being called the vscode fork with no moat just a couple of months ago.
With so much money and computing from SpaceX, it is not so impressive.
farco12 2 days ago [-]
One would hope the vscode fork with a $50B valuation and no moat, would wisely spend the money they raised to build a moat.
whywhywhywhy 3 days ago [-]
It's still a VsCode fork just now with a Kimi fine tune and still no moat...
I won't debate that it turns out none of this mattered when it came to being a successful company though, and it kinda makes anyone who tried to roll their own instead of forking look a little silly.
hkleppe 2 days ago [-]
"No moat", well...
How I see this is that it's so important to bundle the model with the right tooling.
Like a racecar, having the best engine doesn't help if the rest of the car lacks other winning properties (reliability, aerodynamics etc).
So Cursor, IMO, has put itself in a strong position by having both a solid IDE __and__ a solid, cost-efficient model. Those two working great in combination for the task they are designed to solve (coding) is more important than benchmarks.
aurareturn 3 days ago [-]
I doubt it's a brand new model. It's likely just Kimi K2.5 further trained on coding.
enraged_camel 3 days ago [-]
They didn't say it's a new model... in fact they said exactly what you just said.
memoryleakgame 3 days ago [-]
If these benches from their site hold up (they likely won't)
Wouldn't this compress ai revenue like 15x quickly?
If they really have a 4.7 Opus high equivalent at 1/16 the cost, wouldn't this significantly affect all the current capex and planning?
Maybe they are getting Elon to cover the cost
vessenes 2 days ago [-]
It's worth being specific:
"Will this decrease Revenue?" -- only if demand for high quality tokens is inelastic. If demand is instead elastic (grows with cheaper pricing) then revenue will likely increase.
"Will this lower earnings?" -- they have a current inference margin for their old models, and with the Elon deal in place, they have a new inference margin. It might be better or worse than their old one. If it's worse, then they'd need to see a concomitant increase in usage. If they don't, then yes it might lower earnings.
"Will this lower corporate value?" -- no - not least because this company is going to be owned by SpaceX approximately 90 days after IPO -- so all the new owner will care about is being benchmark competitive with Anthropic and oAI for the first n quarters. If they can do that, it will massively increase the corporate value of SX; it's hard to build a frontier lab.
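To make the elastic vs. inelastic distinction concrete, here is a rough Python sketch. It assumes a constant-elasticity demand curve, which is a textbook simplification and not anything Cursor has published; the elasticity values are hypothetical, and only the 1/16 price ratio comes from upthread.

```python
def revenue_multiplier(price_ratio: float, elasticity: float) -> float:
    """Revenue change factor under constant-elasticity demand.

    Assumes quantity scales as Q = Q0 * price_ratio ** (-elasticity),
    so revenue P * Q scales as price_ratio ** (1 - elasticity).
    """
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    return price_ratio ** (1.0 - elasticity)


if __name__ == "__main__":
    price_ratio = 1 / 16  # the "1/16 the cost" figure quoted upthread
    for elasticity in (0.5, 1.0, 1.5):  # hypothetical elasticities
        factor = revenue_multiplier(price_ratio, elasticity)
        print(f"elasticity={elasticity}: revenue x{factor:.2f}")
```

Under this toy model, revenue shrinks when elasticity is below 1, stays flat at exactly 1, and grows above 1, which is the whole argument in one line of arithmetic.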
infecto 3 days ago [-]
The way I have read their benchmark results is that they trained a model to work insanely well in their coding workflow. It’s not a general purpose model.
One of the surprisingly hard problems to solve is getting a model to use the tools you give it access to.
romanovcode 2 days ago [-]
The problem with this is that we do not know the actual cost. For all we know they might be pulling an Anthropic: subsidizing costs to get users, then increasing them later on.
yorwba 2 days ago [-]
They're offering a model based on Kimi K2.5 for $0.50/M input and $2.50/M output while the cheapest third-party provider on OpenRouter charges $0.40/M input and $1.90/M output (https://openrouter.ai/moonshotai/kimi-k2.5). Those third-party providers have little incentive to subsidize their customers, so Cursor probably has a margin >20% on their inference cost.
The real money furnace is the training, not just of models that get released, but also experimental training runs that fail to move benchmarks and are quietly thrown away. E.g. Cursor claim that 85% of the compute for Composer 2.5 comes from additional training on top of Kimi K2.5, where I'm not sure how they determined that, but it can't have been cheap. Then they say "Together with SpaceXAI, we're training a significantly larger model from scratch, using 10x more total compute."
So yes, they're probably attempting to replicate the Anthropic playbook of paying a large upfront cost for a very good model, and then rapidly acquiring paying customers, hoping that the inference margin will be enough to cover the training cost.
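The margin estimate above can be checked with a few lines of arithmetic. This is only a sketch: it assumes Cursor's serving cost is no higher than the cheapest third-party price quoted in the comment, which is an assumption and not a known figure. The names `cursor`, `cheapest_third_party` and `gross_margin` are made up for illustration.

```python
# Prices in USD per million tokens, as quoted in the comment above.
cursor = {"input": 0.50, "output": 2.50}
# Assumption: Cursor's real serving cost is at most this third-party price.
cheapest_third_party = {"input": 0.40, "output": 1.90}


def gross_margin(price: float, cost: float) -> float:
    """Fraction of the sale price left over after paying the assumed cost."""
    return (price - cost) / price


margins = {
    kind: gross_margin(cursor[kind], cheapest_third_party[kind])
    for kind in cursor
}

for kind, margin in margins.items():
    print(f"{kind}: {margin:.0%}")
# prints "input: 20%" then "output: 24%"
```

Under that assumption the margin comes out at about 20% on input tokens and 24% on output tokens, so the blended figure depends on the input/output mix of real traffic.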
zackify 3 days ago [-]
this thing is so awesome on fast mode, so far i am impressed, some of its observations feel similar to opus.
i use gpt 5.5 and opus 4.7 a lot every day, if i can get good results at this speed, hopefully the usage level holds up on my team plan haha
2001zhaozhao 3 days ago [-]
> compress ai revenue like 15x
that roughly just puts it on par with OpenAI and Anthropic subscriptions in terms of pricing per token
smallnamespace 2 days ago [-]
AI revenue has been going up while the cost per token has been rapidly falling. The Jevons paradox applies here. The cheaper software is, the more software is written. There is not a finite demand for software.
rafaelmn 2 days ago [-]
> AI revenue has been going up while the cost per token has been rapidly falling
Every model release now has been straight price increases since what GPT 4 ? When was the last time a new flagship model decreased prices compared to the previous one ?
jstummbillig 2 days ago [-]
1. GPT 4 has gotten 6x cheaper over its evolution (from initial release to Turbo to 4o). Maybe you meant "Only since 4o and only since its final release". Alas.
2. We are not interested in how different model naming schemes relate to prices, we are interested in the capabilities. So if you want to learn something about price development you need comparative levels of capabilities, and then look at the prices. 4o is not comparable to 5.5 in the first regard. It is (according to the benchmarks) maybe more comparable to current 5 nano - which is 98% cheaper.
dktp 2 days ago [-]
Opus 4.5 became significantly cheaper directly per token
rafaelmn 2 days ago [-]
You are right I forgot about that ! I think my point still stands - price per token is not decreasing for frontier capabilities, in fact it's increasing.
radu_floricica 2 days ago [-]
This only means the frontier is growing faster than the price is decreasing. It's just the sum of two separate tendencies, and has little predictive value. TBH, I'm ok with this tradeoff - higher capability at slightly higher cost is perfectly fine.
baq 2 days ago [-]
token efficiency
chillfox 2 days ago [-]
Not seeing that either, tried really using Opus 4.7 today, and it ended up at $50 for the same kind of thing that came out to $25 last week with Opus 4.6.
baq 2 days ago [-]
each model is different and nothing should be taken for granted, run your evals for your use cases. I'm not using Opus 4.7 for almost anything. I've seen very good improvements in GPTs since 5.2 and Opus 4.5 to 4.6 was quite an upgrade.
wesammikhail 2 days ago [-]
Models consume more tokens than ever for the same tasks.
vb-8448 2 days ago [-]
I, and I guess basically everyone here, don't have access to OAI or Anthropic books, and it's really difficult to disprove your statements but:
- AI revenue going up & cost/token are not related metrics, at least not in the way you are assuming
- basically all players (except OAI for the moment) are struggling with capacity and/or reducing or dropping subscription-based solutions in favour of pay-per-use. If cost/token was falling, we would see quite the opposite.
lompad 2 days ago [-]
This is conjecture. There is a reason both openai and anthropic refuse to comment on inference costs. If it were falling so much, they would use it to brag.
I really don't understand why so many people keep repeating it without any actual data for the frontier models.
Apart from that, I'm not sure if focusing on tokens is even a good idea, because they are so different from model to model. I'd almost consider them a red herring now.
We could look at tasks instead.
Is there anything even remotely suggesting that your typical task you give an LLM now costs less in inference than before?
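To make the "look at tasks instead" point concrete, here is a small sketch of how a lower per-token price can still produce a higher per-task bill. Every number below is hypothetical and chosen only for illustration; none of them are real prices or token counts for any model, and the `Run` class is invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One task executed on one model (all values hypothetical)."""
    input_tokens: int
    output_tokens: int
    input_price_per_m: float   # USD per million input tokens
    output_price_per_m: float  # USD per million output tokens

    def cost(self) -> float:
        """Total USD cost of the task: tokens times price, per million."""
        total = (
            self.input_tokens * self.input_price_per_m
            + self.output_tokens * self.output_price_per_m
        )
        return total / 1_000_000


# Hypothetical older model: pricier per token, but frugal with tokens.
old_run = Run(2_000_000, 200_000, 5.0, 25.0)
# Hypothetical newer model: 20% cheaper per token, but uses 2.5x the tokens.
new_run = Run(5_000_000, 500_000, 4.0, 20.0)

print(f"old: ${old_run.cost():.2f}")  # prints "old: $15.00"
print(f"new: ${new_run.cost():.2f}")  # prints "new: $30.00"
```

In this made-up case the price per token falls by 20% while the cost of the task doubles, which is why per-task cost is the more useful comparison.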
epolanski 2 days ago [-]
I'm not sure that's the case, it seems like bringing capabilities up and costs down merely serves to induce more demand.
rcleveng 2 days ago [-]
I have to say the new model is quite good at the basics, I've been handing over more and more tasks from Linear straight to it instead of the copy-paste into Claude dance lately.
At this point, more of my complaints are on the harness side, which is odd since originally they were by far the best harness out there.
Support - This is pretty much non-existent, it's community support or sales support.
Interacting with GitHub - this should work and be awesome, Claude code does this well (responding to lint errors and comments). Cursor you have to poke the agent to look at the comments or lint errors, and even then it's about 10% good. Even GitHub Copilot is better here.
Bugbot - I have it set up to trigger manually, but it still seems to wake up and burn 80-120k tokens just to notice it's configured to be manually invoked. When it does run, it tells me there are no issues (but claude or copilot both find real things)
App - When you have both agent window and the ide windows, it's hard to open up the code in the right directory. A simple "cursor ." from the terminal used to do it, now it'll often open the agent window, you have to try a few times for it to work.
I love that they are running super fast, it's just hard when many of the basics break or don't work.
khazhoux 2 days ago [-]
> I've been handing over more and more tasks from Linear straight to it instead of the copy-paste into Claude dance lately
Tangent: we've been using Linear at work and I still don't understand why it claims to be "task tracking for agents". Is there anything at all that lends itself better to agentic workflows compared to JIRA or gitlab/github issues or whatever else?
Seems like Linear just hopped on the buzzword hype train at the exact right moment...
dbalatero 2 days ago [-]
> Seems like Linear just hopped on the buzzword hype train at the exact right moment...
I think you nailed it. Provided an agent can connect and ingest the information in the ticket, that's basically what's needed. I guess it's nice to be able to nudge ticket status and post back to it, but all of those seem like wiring up existing APIs to an MCP and calling it good. I don't see why JIRA couldn't execute on that, despite being Atlassian.
rcleveng 2 days ago [-]
Yup, honestly a google spreadsheet could probably do it as well.
I like the "copy prompt" feature, it's super simple but makes it just a few seconds to go from issue -> claude session.
Also assigning directly to cursor or codex, that's how I handle the easier tasks.
We also have scheduled tasks that elaborate existing tickets with information where needed, again that's just MCP but it works well enough
brunooliv 2 days ago [-]
Any reason why they indexed on Kimi K2.5 model? I have tried many open-source ones in Opencode, and, in my experience (standard backend development, Java, Python, Spring, etc) Qwen3.6 is SO MUCH BETTER that's shocking. Kimi can't even get most tool calling arguments right.
CuriouslyC 2 days ago [-]
There's a lead time on models, and there's some tuning gotchas they probably already figured out with Kimi, so they weren't ready to just drop everything and switch. I'm sure they will switch models eventually.
roflcopter69 2 days ago [-]
I recommend reading the entire article
Together with SpaceXAI, we're training a significantly larger model from scratch, using 10x more total compute.
With Colossus 2's million H100-equivalents and our combined data and training techniques, we expect this to be a major leap in model capability.
grim_io 2 days ago [-]
I guess this will largely decide if xai is going to pay 60 or 10 billion, depending on the success of the new coding model.
KaoruAoiShiho 2 days ago [-]
Kimi 2.5 has the best long context. For raw coding benchmark scores you can just post train on top of it with more specialized data. 2.5 is kinda old, 2.6 is the current release which is exactly just that and catches up to the frontier in most aspects.
Bombthecat 2 days ago [-]
Cheaper to run?
steviedotboston 2 days ago [-]
It's very confusing that they use the same name as the very well known PHP package manager, composer
I don't know what it is with product names these days. Antigravity, Antimatter, Composer, Clay, Ramp, Bolt, etc.
You'd think the founders would Google for naming conflict before choosing a name.
varun_ch 2 days ago [-]
I genuinely wonder if consulting LLMs for naming advice could be an explanation.
They certainly wouldn’t be great at coming up with new words for a product name.
dewey 2 days ago [-]
Naming issues are as old as time. Apple Computer vs. Apple Records comes to mind as a popular example.
PUSH_AX 3 days ago [-]
They set themselves up for flak when they use whatever these evals are… they did the same for composer 2 which was evaled in close competition with frontier models, spoiler alert, it wasn’t even close in practice.
So now 2.5 is supposed to compete with opus 4.7? Sure…
jmcqk6 2 days ago [-]
That does not match my experience. Composer 2 was fantastic for my uses, and I hit Composer 2.5 with some very difficult things last night, which it handled fast and effectively. I don't really care about benchmarks. I care about practice, and in practice, it's been very very good for me.
tuo-lei 3 days ago [-]
they say it themselves in the post - behavior dimensions "not well captured by existing benchmarks". that was the exact problem with composer 2. not dumber on individual tasks, just bad at session-level decisions like when to stop editing, how much context to carry forward, when to re-read a file vs assume. you don't catch any of that in an isolated eval.
infecto 3 days ago [-]
As I have said before in prior composer threads. The proof is in the usage. I am inclined to somewhat believe the results as I use composer and also take the results for the given context. It’s not a general purpose sota model. It’s a model that runs inexpensively in their coding workflow that is creating results similar to opus or gpt.
criemen 3 days ago [-]
Well is that a statement about the quality of Opus 4.7 or about Composer 2.5? :P
jtwaleson 3 days ago [-]
Ok this might be weird but I've moved everyone in my 4 person team to our team plan and costs seem to have sky rocketed compared to the individual plans. Where before most people spent 20-100 USD, now the total bill is more like 1k USD. I haven't gone into the details but it feels like I'm being scammed.
mohsen1 2 days ago [-]
We moved off Cursor and onto Codex + Claude Code. Cost went from multiple thousand per engineer per month to about $500
zackify 2 days ago [-]
Best deal currently:
Cursor team
Codex team
Claude team
Swap between the models when limited.
I am saving our company a lot of money vs Claude enterprise usage cost
skeptic_ai 2 days ago [-]
I did some monitoring.
15 accounts, 300 million tokens input, 200k output: the 5h quota went to 0 in 7 hours. 4 parallel tasks.
I think 300 million is too low. For reference before I could do more than 1 billion on same conditions.
DedlySnek 2 days ago [-]
My company is shifting us from Cursor to Claude due to increased costs.
danbrooks 3 days ago [-]
Check which model you're using.
The fast version of composer is the default now (which costs ~x3 as much).
infecto 3 days ago [-]
Keep in mind I believe there is a larger buffer given to personal plans. If they have 50% extra with the personal plan you now only get 25%.
PUSH_AX 3 days ago [-]
My cursor costs sky rocketed recently too
chemex 2 days ago [-]
I've been using Claude Code as my daily driver on a React Native + iOS codebase for the last few months. The thing that surprised me wasn't quality differences on individual edits — those are pretty close once you control for harness wiring — but how differently I'd ended up structuring my workflow around each style of tool.
Tab completion + chat-in-sidebar feels like an extension of my editing. An agentic harness feels more like delegating a 20-minute task and coming back to review. Different cognitive load, different bug profile. The "which is better" framing tends to skip over the fact that they reward different working styles.
Two things I'd watch on Composer 2.5 specifically:
1. How it handles long-running multi-file refactors that touch 10+ files. My experience with smaller models in that slot is they lose track of which files they've already edited around 30% of the way through. Frontier models keep the plan coherent for longer.
2. How it deals with non-obvious file boundaries. The thing that takes me out of "let it work" mode is the model deciding it needs to edit a config file I didn't think of. Usually that's right, but occasionally it's spelunking somewhere I don't want it to be.
The Kimi K2.5 base is interesting on its own. Open weights below frontier closed models is the thing worth watching from the harness side. If anyone's set up to fine-tune for a specific harness, this is the moment.
1/ CursorBench is so opaque [1] that it makes it hard to trust. Not to mention the v3.1 eval is a newer iteration and there's no insight into the tasks or if the model was just tuned to max it out. Composer 2 previously scored between 60-65% on the previous benchmark eval [2] but scores between 50-55% on CB v3.1[3].
2/ I've experienced Composer 2's performance and it leaves much to be desired as a daily driver for a knowledge worker. but KWs are obviously not the target users and I can see how it's cost-efficient for executing on clearly-defined, discrete coding tasks. Obviously that's their value proposition and they're figuring out how to communicate it well to the target customer. It just doesn't feel like CursorBench is that.
Surprised this got pushed off the front page so quickly! It’s exciting to see what the Cursor team has been able to do with significantly fewer resources than the frontier labs.
I do wish they weren’t joining xAI. Something tells me there will be a contingent of researchers that departs Cursor if that merger is consummated.
dang 2 days ago [-]
It set off the flamewar detector, a.k.a. the overheated discussion detector. We'll turn that off.
granzymes 2 days ago [-]
Thanks, dang! The blog post[1] might be a better source than the twitter thread. Also I regret my typo above (lab -> labs) but too late now!
As for the typo, s's are cheap and I've added one :)
DeathArrow 2 days ago [-]
I think anybody will be much better by acquiring a coding plan from Kimi.com and using Kimi K2.6, with whatever harness they like, including Claude Code, instead of paying more for Cursor's version of Kimi K2.5.
m_mueller 2 days ago [-]
It's a bit confusing to me why they'd make this 'fast' version the default, as it appears to be much more expensive than Composer 2. Wasn't it supposed to be a very cheap alternative to SOTA models?
mrklol 2 days ago [-]
Isn’t it a really cheap alternative to sota models (according to benchmarks)?
The cost claim is the easy part to sell. The real test is whether it stays useful in ugly codebases, long files, and repos with a bunch of half-broken conventions. That’s where these assistants usually fall apart, even when the benchmark numbers look great.
machiaweliczny 2 days ago [-]
Tested and it's good. Fast version is bad though. I like that the planning model in Cursor works more like a human-written design doc instead of a too-detailed AI plan. Seems like this is more responsible for the results than the model, but still, on fast it failed while on normal it got good results.
sofumel 1 days ago [-]
I'm currently using Claude Code, but should I cancel it at the next renewal and switch to Composer 2.5?
luodaint 2 days ago [-]
Benchmarks measure turn-level capabilities: you feed a task into the system and then grade the result. Capability for production-level usage concerns session-level decision making: does the agent know when to stop editing, retain the right amount of context, or go back and reread the file if the state has changed?
This is not a property of the model, but a property of the discipline; it can be operationalized by what you have documented before the session begins. Without "stop editing where you can no longer follow your changes to the spec" and "go back and read the migration file before changing the schema," there is nothing to halt the process until it fails integration.
Those teams who get consistent results independent of the model being used typically do so because they have operationalized their discipline first. Those switching out models monthly tend to expect the model to supply them.
0fes911 2 days ago [-]
I found composer 2 pretty good as a subagent delegating tasks like auditing for bugs after finishing implementation, but hopefully composer 2.5 will be more reliable so it can be used to implement and execute long running tasks.
WhitneyLand 2 days ago [-]
Say what you want about Cursor but they don’t lack for ambition.
Forking VS Code, going big on bleeding edge features like cloud agents, and now they’ve thrown down the gauntlet directly challenging frontier labs by training their own model (“much larger” than Kimi 2.5’s 1T parameters) from scratch.
They’ve been highly successful so far. Raised $50B, $2B in revenue, forecast to end 2026 above $6B. But even at these heights, they’re just not in the same league as OpenAI/Anthropic/Google.
And if building a state of the art multitrillion parameter model is not challenging enough, it’s a mountain you don’t climb just once. Every few months you need to push it farther with a new release. Fall off for a couple cycles and like Facebook you may never catch up again.
Not for the faint of heart.
pdq 2 days ago [-]
Why is this comment upvoted?
It is most likely AI generated with a nice "Raised $50B" hallucination and filled with cliches ("thrown down the gauntlet", "mountain you don’t climb just once", "not for the faint of heart").
Aurornis 2 days ago [-]
Good catch. I didn’t even notice it at first, but the hallucinations on top of cliches gives it away.
The account doesn’t have a history of other comments that have too much of an AI vibe, but this one does. Even if it wasn’t AI, it’s misinformation.
WhitneyLand 2 days ago [-]
Please see reply to your other comment on this thread.
WhitneyLand 2 days ago [-]
I wrote this 100% off the top of my head on my phone while eating a sandwich.
Ffs.
edit: removed cursing you out. Sorry but this is frustrating. I don’t leave AI generated comments here (or anywhere else).
Aurornis 2 days ago [-]
EDIT: As others have pointed out, the comment above contains hallucinations (Like the $50 billion number) and a lot of AI tells. The account doesn’t have a history of AI-like comments but the hallucinations and structure in this one are suspicious. If anything, don’t trust the numbers it cites because they’re made up.
Cursor is a team that I want to see succeed. They have stacked their company with very smart people and they’re going hard at a highly competitive market. We all win when there is more competition and more innovation.
My problem is that every few months I look at Cursor’s product offerings and maybe retry it, but it never feels like something I want to use. Part is personal preference, the other part is the fact that my combination of other tools and services just does a better job. Their biggest advantage felt like first-mover advantage when they came out early and captured market share, but at in person meetups I hear stories about companies switching away from Cursor or trying to convince their management to let them switch away. They need to come up with a compelling advantage fast, which is a hard thing to do against the other companies with their virtually unlimited budgets by comparison.
WhitneyLand 2 days ago [-]
So, you’re wrong on two counts.
1. Evidently you’re no longer able to distinguish AI from people as the whole comment was written by a human off the cuff.
2. The numbers are not hallucinations. It’s word on the street reporting, so yes it’s speculative, but a model did not make it up unless that’s where TechCrunch got it which is not on me.
> They’ve been highly successful so far. Raised $50B,
They have not raised $50B. The article you linked says they're raising $2B, not $50B.
The valuation is not the amount raised.
WhitneyLand 1 days ago [-]
So I made a mistake reading the article? So what?
The point is you made two brigade style comments about my posts sounding suspiciously like an LLM and having hallucinations.
Neither turned out to be true and I think a better response would concede the point.
It may be more helpful for us to stick together as humans since we can’t always recognize each other so easily anymore.
Survey8430 1 days ago [-]
What do you mean neither turned out to be true?
Your comment DOES sound like an LLM and it DOES have hallucinations!
Please make your humanness more recognizable next time, don't waste readers' time with posh fanboying and lazy fact checking.
adamkeys 2 days ago [-]
Same, I kick the tires on Cursor every several weeks wanting to find they've finally crossed some chasm I can't quite explain. But every time, I bounce off the ground-truth that they're forked off vscode, which just isn't for me. I think moving agents to the center of their experience and developing a model that focuses on speed/efficiency over maximum depth is a promising step away from being a spicy vscode fork.
whs 2 days ago [-]
My company is heavy on Cursor and I still ask them to provide me GitHub Copilot, for the sole reason that Cursor is probably the reason Microsoft had to implement technical enforcement of their TOS on proprietary plugins. Previously, you could use PyLance on VSCodium but now those plugins do not work outside VSCode anymore.
If Cursor (and every other commercial VSCode fork) didn't use the MS extension store in the beginning and violate the TOS, this might not have happened.
chrisrickard 2 days ago [-]
Cursor 3 is a full rewrite. No VS Code
causal 2 days ago [-]
Yeah I want them to do well. I find Cursor to be a much better tool for actually working with the code the agent writes than whatever the big vendors provide.
highfrequency 2 days ago [-]
> now they’ve thrown down the gauntlet directly challenging frontier labs by training their own model (“much larger” than Kimi 2.5’s 1T parameters) from scratch.
To clarify, the model Composer 2.5 announced in this post is not that; it uses Kimi 2.5 as a strong starting point. This is not to discount Cursor's work or future ambitions, but one of the most striking things about the last 6 months is that multiple open-source models/labs are now within striking distance of the frontier closed-sourced labs.
They have no choice but to train their own model to try and survive. They're paying API pricing for the top tier models but competing against subsidized subscriptions.
worldsavior 2 days ago [-]
Them raising this much money doesn't mean they're successful, it only means they know how to fool the investors well. A project that is basically an extension to VSCode only adding a chat interface, isn't really worth this much money. Obviously, it's the users, but people think it's something genius and revolutionary, but no.
infecto 2 days ago [-]
This is rsync all over again. Go create it yourself if you think it’s just a simple extension.
worldsavior 1 days ago [-]
You're right, I regret I didn't have the sense to do the same as them at the time.
infecto 1 days ago [-]
Nope you are blowing hot air. Take it elsewhere.
worldsavior 20 hours ago [-]
You can take yourself elsewhere. Good luck.
infecto 18 hours ago [-]
Less hot air and more substance please. It’s easy to deconstruct a company as an arm chair quarterback. It’s much harder to build a viable one. Until you have something constructive, kick rocks. Hot air is boring.
I realize you’re a troll account but at least be a fun troll.
worldsavior 1 hours ago [-]
I think that the product is easy to build, that's what I think because in my gathered experience it's easy. What more do you want?
This is the last time I'm responding. Good luck on whatever journey you're on. I'm sure it's an interesting journey since you've realizations over troll accounts, very interesting.
dtagames 2 days ago [-]
As a heavy user, I don't think the model is their product. Cursor is primarily a harness and lately, a specialized agent dashboard.
Composer, their in house model, is dispatched by other models like Claude Opus for individual items on a task list. No one is suggesting you write your main prompt to Composer 2.
benmusch 2 days ago [-]
they aren't "throwing down the gauntlet", they're trying to find ways to eke margin out of their product by owning a commodity-level coding model. it's an impressive engineering task but it's not particularly ambitious.
Survey8430 2 days ago [-]
AI comment... BOO!
jorl17 2 days ago [-]
I want to like composer, but I just can't.
- Its communication style is completely opposite to Anthropic models. It's not as bad as OpenAI's models, which are obsessed with "shapes", "wrinkles", hyphenated-words, and other cryptic formulations that make you feel like you're not on planet earth after a while talking to them. But it is nonetheless markedly "rude", "dry", "cold", gives off this "entitled I'm right, you're wrong" attitude. I once had composer2-fast accidentally run `rm -rf $HOME` (no harm done) as part of a bug in an install script it wrote and all it could say once it realized it was: "Running script with proper hardening". Qwen's models have clearly been distilled from Anthropic models because they have a much closer communication style and that's why I hope cursor will one day release a new family of composer models derived from that. A damn joy to use.
- It's just dumb. I don't know what they're doing with benchmarks, but for my work (python, bash, docker, whatever), cursor is just incredibly dumb. Always does in 10 lines what could be done in one. Doesn't know loads of internals of things that other models know. Never places things in the right files, constantly makes terrible edits (inline imports, edits without testing). Everything is so complicated when done by composer2, it's just a joke to me at this point. It clearly needs more handholding than Opus 4.x or GPT-5.x. I tried 2.5-fast and it seemed more of the same. And this would sort of be acceptable if it owned up to its incompetence, but it is so confidently incompetent that it's revolting.
I know that for many people the "tone" of the models is not relevant, or maybe they even prefer models like these. I simply cannot work like that.
Ever since Gemini started blowing benchmarks out of the water while being a clearly inferior model incapable of producing anything (and pretty much just doing tool calls without any feedback to the user), I gave up on benchmarks. Composer has been more of the same in that regard.
As a GPT model would say:
"Small wrinkle: the production-ready benchmark results were tainted by real-world data points. I've assimilated the inconsistencies and added guardrails so that v2 has the right shape for future evaluations."
sergiotapia 3 days ago [-]
Congratulations on the launch! I'm interested in trying Cursor but it's very confusing what I should buy. What does the Pro $20 plan get me in usage if I only use Composer 2.5? How fast is the model?
darkwi11ow 3 days ago [-]
I have used the $20 plan on a daily basis for more than a year now, and have yet to exhaust that limit. The plan includes $20 in api costs for non-Cursor premium models and $20 for Composer and Auto models provided by Cursor themselves.
That said, I am a pretty old-fashioned coder and use LLMs mostly to overcome the blank page problem, which means I review and often rewrite LLM output by hand and avoid prompt loops for a single task.
People who are aiming to not read code any more might find this $20 plan lacking for their needs, however for my needs it fits perfectly.
kaizoku156 3 days ago [-]
The limits are probably even higher than that, i seem to get about 100$+ of usage on composer and about 45-50 usd on non composer models
uf00lme 2 days ago [-]
I wonder why they didn’t train off Kimi 2.6, I hope it is because they already had a good base and not that they messed up that relationship.
NitpickLawyer 2 days ago [-]
> and not that they messed up that relationship.
There's nothing to mess up. The license is MIT w/ attribution, and the attribution clause can be easily sidestepped w/o any legal repercussions. The "drama" was simply content creators going nuts over some misunderstandings and poor comms from some kimi related devs.
re-thc 2 days ago [-]
That's 3.0
bingud 2 days ago [-]
Seems like a promising and useful model but it's probably scary how much customer data they fed into it to reach this performance
Armonsrer 2 days ago [-]
It looks like a massive update from cursor and i like their platform
Let's hope it's good
vanuatu 3 days ago [-]
It's always great that more companies are throwing their hat in the ring, especially focusing on value (latency + intelligence + cost)
I_am_tiberius 2 days ago [-]
I hope people soon wake up to the fact that they use user data for model fine tuning.
polski-g 3 days ago [-]
I don't know why their model isn't on Openrouter yet. They must not have enough capacity to offer it.
try-working 2 days ago [-]
A lot of people saying Cursor have no moat. Sure. Neither do OpenAI or Anthropic.
svantana 2 days ago [-]
You could say they have a sort of anti-moat (drawbridge?) since you can use their product to create a competitor. But that's true of most dev tools, in a sense.
neevans 2 days ago [-]
[dead]
big-chungus4 2 days ago [-]
Can you please train Qwen 3.5 like 0.8B to 9B using the same training techniques
jdlyga 3 days ago [-]
It's a bit odd that they're not comparing it against Sonnet
jjice 3 days ago [-]
I don't think so. They're comparing it to the highest tier available models from Anthropic and OpenAI. Generally speaking, Opus is better than Sonnet in almost every way, so why have the redundancy?
3836293648 2 days ago [-]
Price to performance?
jjice 2 days ago [-]
I think their comparison to how their benchmarks compare to Opus are a great way to show "look at similar benchmarks for a fraction of the cost". If it has Opus benchmarks (I don't actually take benchmarks seriously, but for their comparison purposes) and Sonnet is still more than half the price of Opus, I figure it's close enough where it doesn't matter.
CodingJeebus 3 days ago [-]
The tweet specifies that the new model is geared towards long-running tasks, which is what you'd use a model like Opus for anyway.
lukebrichey 3 days ago [-]
this feels super bullish on cursor/spacexai's ability to train a frontier level model. could be truly SOTA on coding given that their RL data is this powerful
svclaws 3 days ago [-]
Their previous Composer was already marketed as a cheap model capable of competing with SOTA on most tasks. The evals they shared back then backed this up but in my day-to-day usage it fell short across the board. Canceled my cursor subscription and switched to Claude Code a few weeks ago. It has its own shortcomings but in terms of model capability and UX quality Cursor will have a hard time competing in the long term. Elon Musk will be a very good way out for them.
Glohrischi 2 days ago [-]
Hahah wtf? They are training on colossus 2? Their own model?
Dude what the hell happened to Musk's Grok? How incapable are they that they give away training compute to Cursor like this?
Weird that the genius Musk doesn't need his own compute, after all shouldn't Macrohard (no joke) already be building the world's software from scratch?
mgambati 2 days ago [-]
Words on the street is that xAI will buy cursor.
Glohrischi 2 days ago [-]
Yeah for 10-60 BILLION. which again makes this even stupider.
For this amount of money you can rebuild cursor and everything else on the market, and with the rest of 9-59 Billion, you just hire experts in coding and let them code real high quality code examples.
And then you just use your existing grok pipeline and just add this functionality.
This xAI stuff has to be run by idiots
radu_floricica 2 days ago [-]
Buy "Cursor", not "Cursor's IP". This means brand, users, and a shitton of data.
And if you combine a shitton of data with a lot of compute, large userbase and good engineers, you have a pretty good chance of doing something interesting.
Glohrischi 2 days ago [-]
Yeah, you know how much 10-60 billion is?
You could literally just give your compute away for free for a year to pull people in.
Make an API endpoint for free with the caveat that they are allowed to use the data for training, which is what everyone else does too.
mgambati 2 days ago [-]
And you still don’t get the quality of data that Cursor has, which is the best because it was collected pre-vibe-coding.
timmmmmmay 2 days ago [-]
it seems like they were trying that last year, it didn't work, so he flipped out and fired everyone and now plan B is to buy Cursor and run a quick rename of "Composer 3" to "Grok 5"
enraged_camel 2 days ago [-]
I tested it yesterday. It is pretty bad. Just like with Composer 2, it's fast, but quality is nowhere near what Cursor claims with their benchmarks. It is not even at Opus 4.5 level.
I gave it a mix of refactoring tasks and new feature tasks. For each one, I had it write a plan, then I had Codex review it. Codex found major issues with every plan: patterns that don't match the rest of the code base, hallucinated variable/function names, and even outright bugs in the way the plan was written. I fed the feedback to Composer 2. After it made the changes and implemented the revised plan, I had Codex and Opus 4.7 do code reviews, and once again both of them found major bugs.
Overall it was a very frustrating experience. I feel like I wasted a whole day. Which is sad, as I have been looking for an excuse to come back to Cursor. But as things stand, Codex + CC combo cannot be beat, not just in terms of price but also quality.
They were world-class senior developers and AI engineers most renowned in the AI research communities (e.g. Jimmy Ba the legend, Christian Szegedy, Igor Babuschkin, Greg Yang), poached from other companies to join xAI and they were getting very high salaries.
The mass exodus has been happening way before spacex merger though.
Post model 3 launch, Tesla had a number of senior folks leave almost immediately. My read at that time was they had hit or exceeded pareto-optimal on the suffering:wealth scale —- Tesla was clearly going to make it, and they had already vested 90% of the value they’d receive from Tesla ownership: why go suffer through the massive build out?
And in fact, in that era, Tesla did bring in a bunch of auto industry types to help scale, who as it happens also certainly did very well, but order of magnitude less well than the early peeps.
There might be some similar economics here: change of control will often fully vest early founders. Combined with incoming SX IPO, these guys are done financially — as in, already multibillionaires pre-IPO. You’d have to want to stay and the company would have to really want you to stay as well before it made economic sense to re-up.
People say a lot of things about working for Elon; things like “hardest work I ever did,” and “he made me extremely rich”, but you don’t read “that was easy” very often.
I have no idea if there’s enough talent right now at xAI to go build a foundation model, but in the immortal words of Carl Icahn: “don’t bet against Elon”
Hard to outperform the model you distill...
Distillation helps with world knowledge and things like that.
They do use it for synthetic data/judging though, so yes, hard to outperform.
Not that they need to. If they can basically match it for a fifth of the price.
They had properly licensed it and were complying with the terms of the license.
I feel distilling something as big as Opus would require many, many more samples, but I don't really know much about this subject
Scale: Over 3.4 million exchanges
The operation targeted:
- Agentic reasoning and tool use
- Coding and data analysis
- Computer-use agent development
- Computer vision
Moonshot (Kimi models) employed hundreds of fraudulent accounts spanning multiple access pathways. Varied account types made the campaign harder to detect as a coordinated operation. We attributed the campaign through request metadata, which matched the public profiles of senior Moonshot staff. In a later phase, Moonshot used a more targeted approach, attempting to extract and reconstruct Claude’s reasoning traces.
I doubt it.
* publish the logs.
they do except they also send thousands of sex-spies to do espionage of this kind on the scale.
If you’re playing this game in 2026 you know the rules - anything goes
For a professional tool it’s getting egregious how little respect they have for my workflows and flow state, the way they keep moving things, changing iconography and flipping switches in the UI.
It’s clearly being run by someone who comes from a social app or sales app growth hacking background.
I fixed that by using cursor the agent but not the UI.
I'm just running cursor in GNU Emacs via agent-shell (https://github.com/xenodium/agent-shell). Their cli client (aptly named "agent") supports ACP (agent client protocol) so the UI can be skipped altogether.
I know this sounds like a meme ("use x in emacs") but at this point at the very least i can keep my workflows and my UI all the same and focus on my work rather than "where did $company put $feature this month".
What annoys me is how little they want to integrate with ...anything. Wanna open a link in your default browser? Use our built-in chromium fork, we insist. Wanna open a location in Zed? No, please use our half-baked editor re-implementation. Wanna open a location in Cursors own vscode-based editor? You can't. Managed to work around that somehow? We changed your files to "Worktree TS", disabling all your language servers. It's like programming on an iPhone.
It’s a near constant regression in my workflows. “Multiple agents” got destroyed recently, and the new interface for it, some sort of command, isn’t as good or reliable. Then you’ve got modals everywhere[1] and truncated bits (like long branch names) that make it insanely frustrating to use.
They’re constantly changing the UI without actually improving it at all. I’ll likely cancel it and use opencode for personal stuff with Deepseek and only use it at work because I have to. There was a time when I appreciated the harness but it’s becoming less useful, or at least noticeable, over time… all the while the actual UI becomes substantially more painful and awkward to use (like @ in the “agents” window being completely unable to find a file because it’s some sort of “global” scope).
One thing that surprises me about this whole segment is that JetBrains haven’t eaten these folks’ lunch. Their IDEs are leagues better than VSCode but their AI integration is awful by comparison (and the bar is low). I can’t even see how much of the context window I have left.
[1] it’s insane I have to answer questions in a tiny input box I cannot resize or adjust the size of. Let alone the fact the text area I input prompts into cannot be resized. Truly feels like the UI/UX is done by people without any experience.
To me it feels like it's done entirely by an LLM, starting from the product vision.
https://cursor.com/docs/cli/installation
https://github.com/xenodium/agent-shell
I gave up, canceled my plan, and went back to boring old VSCode. It feels so much more stable, and my Mac no longer runs out of memory. With cursor I had to reboot my macbook several times a week and had to always be plugged in.
Should try their CLI!
One of the things I've come to appreciate about the cli tools like Codex or Claude is that the interface is so limited that every feature they release is still limited and constrained to the same UX limitations, whereas those "funkier" IDEs change from month to month giving me further fatigue.
- lags constantly,
- if you type while it's generating you'll get missed inputs,
- 'plan mode' doesn't clear context before starting work,
- you can't directly edit the plan, you can only ask the bot to do it,
- you can't immediately whitelist commands, only accept once or allow all.
https://cursor.com/cli
Impressive, yes. But they still don't have a moat...
The ironic thing is that half a year ago, after trying factory.ai I thought chat-first interface was a stupid idea that will never work.
I haven’t tried Cursor, so don’t know how they compare, but I like Zed a lot.
Anyway, would love to see a comparison from someone who has used a recent version of each.
In my setup I use multiple agents like Claude Code and Codex, and Zed’s ACP support makes it pretty nice to manage them all as “threads” in one place. Worktree switching also feels much smoother.
Overall the experience was pretty good, but the way the agent and editor are integrated still feels a bit lacking, and tab completion is the big one for me. Cursor’s tab completion is still the best I’ve used.
So now I’m using both. For work that needs a lot of focus and careful iteration, I use Cursor. For things that are easy to split into worktrees and hand off to agents, I use Zed with Claude/Codex.
Example #2: I was just fixing some vibe-coded slop, where it was taking `click.echo( some_api.whatever_endpoint() )` and the "slop" portion was literally emitting: `str('{ "A": 1, "B": 2 }')` and that function call was emitting it directly.
On the command line, I was doing `blah whatever-endpoint --something | jq '.'` and got tired of the JQ thing, so I'm like: "I'll just use `json.dumps(...,indent=2)`", but lo and behold, I'm getting a dumb JSON string literal, not a pretty printed object shape.
I start typing `json.loads(` to move from "str()" to "dict()" ... and it autocompletes the whole scenario (on that line), then I move to `def some_other_endpoint` and it basically has that same edit queued up. (ie: it "knows" what i'm about to do).
...so overall, "faster than vim", even with high skill bar for repetition, motion, macros, sed-style edits, etc. You can't beat: "<tab>", especially when it's lightly intelligent (ie: knows when/what/str/int, adapts to different function calls, etc).
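For anyone who hasn't hit this before, here is a minimal sketch of the double-encoding problem described above. `whatever_endpoint` is a hypothetical stand-in for the API call (an assumption for illustration, not the real function), and it returns JSON that has already been serialized to a string:

```python
import json


def whatever_endpoint() -> str:
    # Hypothetical stand-in for the API call: the payload is already a JSON string.
    return '{ "A": 1, "B": 2 }'


raw = whatever_endpoint()

# Dumping the string again only JSON-encodes the string itself,
# so the result is one quoted string literal, not an indented object.
double_encoded = json.dumps(raw, indent=2)

# Parsing first turns the string into a dict, so indent=2 can pretty-print it.
pretty = json.dumps(json.loads(raw), indent=2)

print(double_encoded)
print(pretty)
```

The `json.loads(...)` wrapper is the edit the tab completion kept proposing at each call site.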
I like VS Code with the Claude Plugin, and sometimes with the Codex Plugin
Over the last week, I tried out two other agents on my personal projects: dirac and forgecode, after seeing impressive results from both of them on terminal bench.
After a good amount of testing, and over $100 in open router spend, I'm back to cursor.
I really liked forgecode the best, and it feels better than claude code, but cursor definitely feels best to me. Composer 2.5 is fast and effective, and it makes a huge difference. I was running `forge` with Opus, and it was taking dozens of minutes to do things, and the feedback loop was so slow.
The previous version of composer was also much faster, and it makes a difference. Maybe people like context switching, but I prefer to stay focussed on the task in front of me, and I'm reviewing the code carefully.
I think that's a pretty good moat. I was ready to end my subscription a week ago, and now I'm back after learning the grass is not necessarily greener on the other side of the fence.
Every MAG 7 / FAANG company already has more users and more data...
That's not a moat.
That's traction.
That's Y.
Plus you are always running the risk of being rude and insulting when incorrectly labeling text actually written by humans as slop — making a jackass of yourself — and opening yourself up to being trolled by humans purposefully inserting em-dashes and catch phrases just to trigger you. That's not clever. That's gullible.
How much cognitive and physical effort and time do you put into trying to figure out if everything you read is slop, then complaining about it? If that's your job or calling in life, you could be easily replaced with AI. Find something more creative to do with your time.
If you really object to low effort slop, and not just relish it as an opportunity to whine, then how about instead of posting low effort whines about slop, you put in the actual effort to do something about it, and rewrite the slop in a way that won't trigger your slop detector, then post that instead, to train AI not to write slop.
Is your problem that it's slop, or that it's AI generated? Because your whining about low effort AI generated slop without contributing to the conversation or addressing the point of the comment you're replying to is just low effort human generated slop.
Please don't post slop while complaining about slop.
What's wrong with using very short sentences like 'That's not X. That's Y.'?
Exactly. Cursor was the first product used by tons of devs on real codebases. Just the signal "acceptance rate" is huge and can't be easily captured w/ synthetic data.
I switched to claude code because of usage. For $200 a month, I would run out of usage halfway through the month. Then be forced to use their composer model or whatever slow, dumb model they served up in their "auto" mode.
For that same $200 a month, I could use claude code and basically never hit usage limits.
I don't understand what people are doing who run into the limits on that max x20 plan. I NEVER have.
& now they're still losing all of their users to Claude Code and Codex.
Why pay for Cursor when I can use GLM 5.1, Kimi K2.6, MiniMax M2.7, Xiaomi MiMo V2.5 Pro and Deepseek v4 for cheap and use whatever harness I want, including Claude Code?
It's not like Cursor harness is the best out there.
And even if I want to edit the code, I don't need to run the agent harness in an IDE.
But Cursor has a CLI harness.
With so much money and compute from SpaceX, it's not so impressive.
I won't debate that it turns out none of this mattered when it came to being a successful company though, and it kinda makes anyone who tried to roll their own instead of forking look a little silly.
The way I see it, it's really important to bundle the model with the right tooling.
Like a racecar, having the best engine doesn't help if the rest of the car lacks other winning properties (reliability, aerodynamics etc).
So IMO Cursor has put itself in a strong position by having both a solid IDE __and__ a solid+cost efficient model. Those two working great in combination for the task they are designed to solve (coding) is more important than benchmarks.
Wouldn't this compress ai revenue like 15x quickly
If they really have a 4.7 opus high equivalent at 1/16 the cost wouldn't this significantly affect all the current capex and planning?
Maybe they are getting elon to cover cost
"Will this decrease Revenue?" -- only if demand for high quality tokens is inelastic. If demand is instead elastic (grows with cheaper pricing) then revenue will likely increase.
"Will this lower earnings?" -- they have a current inference margin for their old models, and with the Elon deal in place, they have a new inference margin. It might be better or worse than their old one. If it's worse, then they'd need to see a concomitant increase in usage. If they don't, then yes it might lower earnings.
"Will this lower corporate value?" -- no - not least because this company is going to be owned by SpaceX approximately 90 days after IPO -- so all the new owner will care about is being benchmark competitive with Anthropic and oAI for the first n quarters. If they can do that, it will massively increase the corporate value of SX; it's hard to build a frontier lab.
One of the surprisingly hardest problems to solve is to get a model to use the tools you give it access to.
The real money furnace is the training, not just of models that get released, but also experimental training runs that fail to move benchmarks and are quietly thrown away. E.g. Cursor claim that 85% of the compute for Composer 2.5 comes from additional training on top of Kimi K2.5, where I'm not sure how they determined that, but it can't have been cheap. Then they say "Together with SpaceXAI, we're training a significantly larger model from scratch, using 10x more total compute."
So yes, they're probably attempting to replicate the Anthropic playbook of paying a large upfront cost for a very good model, and then rapidly acquiring paying customers, hoping that the inference margin will be enough to cover the training cost.
i use gpt 5.5 and opus 4.7 a lot every day, if i can get good results at this speed, hopefully the usage level holds up on my team plan haha
that roughly just puts it on par with OpenAI and Anthropic subscriptions in terms of pricing per token
Every model release has been a straight price increase since, what, GPT-4? When was the last time a new flagship model decreased prices compared to the previous one?
2. We are not interested in how different model naming schemes relate to prices, we are interested in the capabilities. So if you want to learn something about price development you need comparative levels of capabilities, and then look at the prices. 4o is not comparable to 5.5 in the first regard. It is (according to the benchmarks) maybe more comparable to current 5 nano - which is 98% cheaper.
- AI revenue going up & cost/token are not related metrics, at least not in the way you are assuming - basically all players (except OAI for the moment) are struggling with capacity and/or reducing or dismissing subscription-based solutions in favour of pay-per-use. If cost/token was falling, we would see quite the opposite.
Apart from that, I'm not sure if focusing on tokens is even a good idea, because they are so different from model to model. I'd almost consider them a red herring now.
We could look at tasks instead. Is there anything even remotely suggesting that your typical task you give an LLM now costs less in inference than before?
At this point, more of my complaints are on the harness side, which is odd since originally they were by far the best harness out there.
Support - This is pretty much non-existent, it's community support or sales support.
Interacting with GitHub - this should work and be awesome, Claude code does this well (responding to lint errors and comments). Cursor you have to poke the agent to look at the comments or lint errors, and even then it's about 10% good. Even GitHub Copilot is better here.
Bugbot - I have it set up to trigger manually, but it still seems to wake up and burn 80-120k tokens just to notice it's configured to be manually invoked. When it does run, it tells me there are no issues (but claude or copilot both find real things)
App - When you have both agent window and the ide windows, it's hard to open up the code in the right directory. A simple "cursor ." from the terminal used to do it, now it'll often open the agent window, you have to try a few times for it to work.
I love that they are running super fast, it's just hard when many of the basics break or don't work.
Tangent: we've been using Linear at work and I still don't understand why it claims to be "task tracking for agents". Is there anything at all that lends itself better to agentic workflows compared to JIRA or gitlab/github issues or whatever else?
Seems like Linear just hopped on the buzzword hype train at the exact right moment...
I think you nailed it. Provided an agent can connect and ingest the information in the ticket, that's basically what's needed. I guess it's nice to be able to nudge ticket status and post back to it, but all of those seem like wiring up existing APIs to an MCP and calling it good. I don't see why JIRA couldn't execute on that, despite being Atlassian.
I like the "copy prompt" feature, it's super simple but makes it just a few seconds to go from issue -> claude session.
Also assigning directly to cursor or codex, that's how I handle the easier tasks.
We also have scheduled tasks that elaborate existing tickets with information where needed, again that's just MCP but it works well enough
https://getcomposer.org/
You'd think the founders would Google for naming conflict before choosing a name.
They certainly wouldn’t be great at coming up with new words for a product name.
So now 2.5 is supposed to compete with opus 4.7? Sure…
Cursor team Codex team Claude team
Swap between the models when limited.
I am saving our company a lot of money vs Claude enterprise usage cost
I think 300 million is too low. For reference before I could do more than 1 billion on same conditions.
The fast version of composer is the default now (which costs ~x3 as much).
Tab completion + chat-in-sidebar feels like an extension of my editing. An agentic harness feels more like delegating a 20-minute task and coming back to review. Different cognitive load, different bug profile. The "which is better" framing tends to skip over the fact that they reward different working styles.
Two things I'd watch on Composer 2.5 specifically:
1. How it handles long-running multi-file refactors that touch 10+ files. My experience with smaller models in that slot is they lose track of which files they've already edited around 30% of the way through. Frontier models keep the plan coherent for longer.
2. How it deals with non-obvious file boundaries. The thing that takes me out of "let it work" mode is the model deciding it needs to edit a config file I didn't think of. Usually that's right, but occasionally it's spelunking somewhere I don't want it to be.
The Kimi K2.5 base is interesting on its own. Open weights below frontier closed models is the thing worth watching from the harness side. If anyone's set up to fine-tune for a specific harness, this is the moment.
1/ CursorBench is so opaque [1] that it makes it hard to trust. Not to mention the v3.1 eval is a newer iteration and there's no insight into the tasks or if the model was just tuned to max it out. Composer 2 previously scored between 60-65% on the previous benchmark eval [2] but scores between 50-55% on CB v3.1[3].
2/ I've experienced Composer 2's performance and it leaves much to be desired as a daily driver for a knowledge worker. but KWs are obviously not the target users and I can see how it's cost-efficient for executing on clearly-defined, discrete coding tasks. Obviously that's their value proposition and they're figuring out how to communicate it well to the target customer. It just doesn't feel like CursorBench is that.
[1] https://cursor.com/blog/cursorbench#building-cursorbench
[2] https://cursor.com/blog/composer-2-technical-report#performa...
[3] https://cursor.com/blog/composer-2-5
I do wish they weren’t joining xAI. Something tells me there will be a contingent of researchers that departs Cursor if that merger is consummated.
[1] https://cursor.com/blog/composer-2-5
As for the typo, s's are cheap and I've added one :)
This is not a property of the model, but a property of the discipline; it can be operationalized by what you have documented before the session begins. Without "stop editing where you can no longer follow your changes to the spec" and "go back and read the migration file before changing the schema," there is nothing to halt the process until it fails integration.
Those teams who get consistent results independent of the model being used typically do so because they have operationalized their discipline first. Those switching out models monthly tend to expect the model to supply them.
Forking VS Code, going big on bleeding edge features like cloud agents, and now they’ve thrown down the gauntlet directly challenging frontier labs by training their own model (“much larger” than Kimi 2.5’s 1T parameters) from scratch.
They’ve been highly successful so far. Raised $50B, $2B in revenue, forecast to end 2026 above $6B. But even at these heights, they’re just not in the same league as OpenAI/Anthropic/Google.
And if building a state of the art multitrillion parameter model is not challenging enough, it’s a mountain you don’t climb just once. Every few months you need to push it farther with a new release. Fall off for a couple cycles and like Facebook you may never catch up again.
Not for the faint of heart.
It is most likely AI generated with a nice "Raised $50B" hallucination and filled with cliches ("thrown down the gauntlet", "mountain you don’t climb just once", "not for the faint of heart").
The account doesn’t have a history of other comments that have too much of an AI vibe, but this one does. Even if it wasn’t AI, it’s misinformation.
Ffs.
edit: removed cursing you out. Sorry but this is frustrating. I don’t leave AI generated comments here (or anywhere else).
Cursor is a team that I want to see succeed. They have stacked their company with very smart people and they’re going hard at a highly competitive market. We all win when there is more competition and more innovation.
My problem is that every few months I look at Cursor’s product offerings and maybe retry it, but it never feels like something I want to use. Part is personal preference, the other part is the fact that my combination of other tools and services just does a better job. Their biggest advantage felt like first-mover advantage when they came out early and captured market share, but at in person meetups I hear stories about companies switching away from Cursor or trying to convince their management to let them switch away. They need to come up with a compelling advantage fast, which is a hard thing to do against the other companies with their virtually unlimited budgets by comparison.
1. Evidently you’re no longer able to distinguish AI from people as the whole comment was written by a human off the cuff.
2. The numbers are not hallucinations. It’s word on the street reporting, so yes it’s speculative, but a model did not make it up unless that’s where TechCrunch got it, which is not on me.
https://techcrunch.com/2026/04/17/sources-cursor-in-talks-to...
> They’ve been highly successful so far. Raised $50B,
They have not raised $50B. The article you linked says they're raising $2B, not $50B.
The valuation is not the amount raised.
The point is you made two brigade style comments about my posts sounding suspiciously like an LLM and having hallucinations.
Neither turned out to be true and I think a better response would concede the point.
It may be more helpful for us to stick together as humans since we can’t always recognize each other so easily anymore.
Your comment DOES sound like an LLM and it DOES have hallucinations!
Please make your humanness more recognizable next time, don't waste readers' time with posh fanboying and lazy fact checking.
If Cursor (and every other commercial VSCode fork) didn't use MS extension store in the beginning and violate the TOS these might not have happened.
To clarify, the model Composer 2.5 announced in this post is not that; it uses Kimi 2.5 as a strong starting point. This is not to discount Cursor's work or future ambitions, but one of the most striking things about the last 6 months is that multiple open-source models/labs are now within striking distance of the frontier closed-sourced labs.
See eg Kimi 2.6 benchmarks: https://www.kimi.com/blog/kimi-k2-6
I realize you’re a troll account but at least be a fun troll.
This is the last time I'm responding. Good luck on whatever journey you're on. I'm sure it's an interesting journey since you're having realizations about troll accounts, very interesting.
Composer, their in house model, is dispatched by other models like Claude Opus for individual items on a task list. No one is suggesting you write your main prompt to Composer 2.
- Its communication style is completely opposite to Anthropic models. It's not as bad as OpenAI's models, which are obsessed with "shapes", "wrinkles", hyphenated-words, and other cryptic formulations that make you feel like you're not on planet earth after a while talking to them. But it is nonetheless markedly "rude", "dry", "cold", gives off this "entitled I'm right, you're wrong" attitude. I once had composer2-fast accidentally run `rm -rf $HOME` (no harm done) as part of a bug in an install script it wrote and all it could say once it realized it was: "Running script with proper hardening". Qwen's models have clearly been distilled from Anthropic models because they have a much closer communication style and that's why I hope cursor will one day release a new family of composer models derived from that. A damn joy to use.
- It's just dumb. I don't know what they're doing with benchmarks, but for my work (python, bash, docker, whatever), cursor is just incredibly dumb. Always does in 10 lines what could be done in one. Doesn't know loads of internals of things that other models know. Never places things in the right files, constantly makes terrible edits (inline imports, edits without testing). Everything is so complicated when done by composer2, it's just a joke to me at this point. It clearly needs more handholding than Opus 4.x or GPT-5.x. I tried 2.5-fast and it seemed more of the same. And this would sort of be acceptable if it owned up to its incompetence, but it is so confidently incompetent that it's revolting.
I know that for many people the "tone" of the models is not relevant, or maybe they even prefer models like these. I simply cannot work like that.
Ever since Gemini started blowing benchmarks out of the water while being a clearly inferior model incapable of producing anything (and pretty much just doing tool calls without any feedback to the user), I gave up on benchmarks. Composer has been more of the same in that regard.
As a GPT model would say:
That said, I am a pretty old-fashioned coder and use LLMs mostly to overcome the blank page problem, which means I review and often rewrite LLM output by hand and avoid prompt loops for a single task.
People who are aiming to not read code any more might find this $20 plan lacking for their needs, however for my needs it fits perfectly.
There's nothing to mess up. The license is MIT w/ attribution, and the attribution clause can be easily sidestepped w/o any legal repercussions. The "drama" was simply content creators going nuts over some misunderstandings and poor comms from some kimi related devs.