I use Cursor daily - here's how I avoid the garbage parts

250 points by striat 7 months ago

walthamstow 7 months ago

Eng leadership at my place are pushing Cursor pretty hard. It's great for banging out small tickets and improving the product incrementally kaizen-style, but it falls down with anything heavy.

I think it's weakening junior engineers' reasoning and coding abilities as they become reliant on it without having lived for long, or at all, in the before times. I think may be doing the same to me too.

Personally, and quietly, I have a major concern about the conflict of interest of Cursor deciding which files to add to context then charging you for the size of the context.

As with so many products, it's cheap to start with, you become dependent on it, then one day it's not cheap and you're fucked.

rco8786 7 months ago

I’ve been a paying cursor user for 4-5 months now and feeling the same. A lot more mistakes leaking into my PRs. I feel a lot faster but there’s been a noticeable decrease in the quality of my work.
Obviously I could just better review my own code, but that’s proving easier said than done to the point where I’m considering going back to vanilla Code.
- DanHulton 7 months ago
  
  There's this concept in aviation of "ahead of or behind the plane". When you're ahead of the plane, you understand completely what it's doing and why, and you're literally thinking in front of it, like "in 30 minutes we have to switch to this channel, confirm new heading with ATC" and so forth. When you're behind the plane, it has done something expected and you are literally thinking behind it, like "why did it make that noise back there, and what does that mean for us?"
  I think about coding assistants like this as well. When I'm "ahead of the code," I know what I intend to write, why I'm writing it that way, etc. I have an intimate knowledge of both the problem space and the solution space I'm working in. But when I use a coding assistant, I feel like I'm "behind the code" - the same feeling I get when I'm reviewing a PR. I may understand the problem space pretty well, but I have to basically pick up the pieced of the solution presented to me, turn them over a bunch, try to identify why the solution is shaped this way, if it actually solves the problem, if it has any issues large or small, etc.
  It's an entirely different way of thinking, and one where I'm a lot less confident of the actual output. It's definitely less engaging, and so I feel like I'm way less "in tune" with the solution, and so less certain that the problem is solved, completely, and without issues. And because it's less engaging, it takes more effort to work like this, and I get tired quicker, and get tempted to just give up and accept the suggestions without proper review.
  I feel like these tools were built without any sort of analysis if they _were_ actually an improvement on the software development process as a whole. It was just assumed they must be, since they seemed to make the coding part much quicker.
  - ygra 7 months ago
    
    That's a great analogy. For me it is a very similar feeling that I get ripped out of "problem solving mode" into "code review mode" which is often a lot more taxing for me.
    It also doesn't help reviewing such code that sometimes surprisingly complex problems are solved correctly, while there's surprisingly easy parts that can be subtly (or very) wrong.
    
    ryandrake 7 months ago
    
    Yes great analogy!
    A hard pill to swallow is that a lot of software developers have spent most of their careers "behind the code" instead of out ahead of it. They're stuck for years in an endless "Junior Engineer" cycle of: try, compile, run, fix, try, compile, run, fix--over and over with no real understanding, no deliberate and intentional coding, no intimacy, no vision of what's going on in the silicon. AI coding is just going to keep us locked into this inferior cycle.
    All it seems to help with is letting us produce A Lot Of Code very quickly. But producing code is 10% of building a wonderful software product....
  - Breza 7 months ago
    
    This is such a great analogy! Exactly how I feel when using AI tools. I have had some incredibly productive conversations about high-level design where I explain my goals and the approaches I'm considering. But then the actual code will have subtle bugs that are hard to find.
  - speedbird 7 months ago
    
    Also very much in the spirit of "children of the magenta line" https://www.computer.org/csdl/magazine/sp/2015/05/msp2015050...
  - matwood 7 months ago
    
    Unlike an airplane you can stop using the assistant at any time and catch up. Those who learn to leverage AI will have an advantage.
- ljm 7 months ago
  
  Same result - I tried it for a while out of curiosity but the improvements were a false economy: time saved in one PR is time lost to unplanned work afterwards. And it is hard to spot the mistakes because they can be quite subtle, especially if you've got it generating boilerplate or mocks in your tests.
  Makes you look more efficient but it doesn't make you more effective. At best you're just taking extra time to verify the LLM didn't make shit up, often by... well, looking at the docs or the source.. which is what you'd do writing hand-crafted code lol.
  I'm switching back to emacs and looking at other ways I can integrate AI capabilities without losing my mental acuity.
  - _rwo 7 months ago
    
    > And it is hard to spot the mistakes because they can be quite subtle
    aw yeah; recently I spent half a day pulling my hair debugging some cursor-generated frontend code just to find out the issue was buried in some... obscure experimental CSS properties which broke a default button behavior across all major browsers (not even making this up).
    Velocity goes up because you produce _so much code so quickly_, most of which seems to be working; managers are happy, developers are happy, people picking up the slack - not so much.
    I obviously use LLMs to some extent during daily work, but going full-on blind mode on autopilot gotta crash the ship at some point.
  - geoduck14 7 months ago
    
    Can you elaborate on the mistakes you see? What languages are you working with?
    
    ljm 7 months ago
    
    Just your run-of-the-mill hallucinations, e.g. mocking something in pytest but only realising afterwards that the mock was hallucinated, the test was based on the mock, and so the real behaviour was never covered.
    I mean, I generally avoid using mocks in tests for that exact reason, but if you expect your AI completions to always be wrong you wouldn't use them in the first place.
    Beyond that, the tab completion is sometimes too eager and gets in the way of actually editing, and is particularly painful when writing up a README where it will keep suggesting completely irrelevant things. It's not for me.
    
    rco8786 7 months ago
    
    > the tab completion is sometimes too eager and gets in the way of actually editing
    Yea, this is super annoying. The tab button was already overloaded between built-in intellisense stuff and actually wanting to insert tabs/spaces, now there are 3 things competing for it.
    I'll often just want to insert a tab, and end up with some random hallucination getting inserted somewhere else in the file.
    
    crucialfelix 7 months ago
    
    Seriously, give us our tab key back! I changed accept suggestion to shift TAB.
    But still there is too much noise now. I don't look at the screen while I'm typing so that I'm not bombarded by this eager AI trying to distract me with guesses. It's like a little kid interrupting all the time.
    
    walthamstow 7 months ago
    
    I just turned tab off. If I'm writing myself, if I'm in the flow, I don't need any help. If I want the tool to write for me, I'll ask it to.
    
    j45 7 months ago
    
    Can tell it to check it before and after if it doesn't do something and it can improve.
    Also telling it not to code, or not to jump to solutions is important. If there's a file outlining how you like to approach different kinds of things, it can take it into consideration more intuitively. Takes some practice to pay attention to your internal dialogue.
- Aeolun 7 months ago
  
  I feel like this is also related to cursor getting worse, not better, over the past few months.
Torkel 7 months ago

I saw this post from a professor this morning:
https://x.com/lxeagle17/status/1899979401460371758
Students are not asking questions anymore.
Small assignments work well!
But then the big test comes and scores are at an all time low.
- HenryBemis 7 months ago
  
  Yes it is exactly what we think is causing it. Students use LLMs to get 10/10 in assignments so they don't learn 'the thing' and they tank on the big-closed-books-non-LLM-exams.
  BUT (apologies for the caps), back in the day we didn't have calculators and now we do. And perhaps the next phase in academia is "solve this problem with the LLM of your choice - you can only use the free versions of LLAMA vX.Y, ChatGTP vA.B, etc. - no paid subscriptions allowed" (in the same spirit that for some exams you can use the simple calculator and not the scientific one). Because if they don't do it, they (academia/universities) will lose/bleed out even more credibility/students/participation.
  The world is changing. Some 'parts' are lagging. Academia is 5 years behind (companies paying them for projects help though), Politicians are 10-15 years behind (because the 'donors' (aka bribers) prefer a wild-wild-west for a few years before rights are protected. (Case and point writers/actors applying a lot of pressure when they realized that politicians won't do anything until cornered)
  - meheleventyone 7 months ago
    
    Calculators replaced books of look up tables and slide rules. As tools they're not really replacing thinking. They help calculate the result but not make good decisions about what to calculate.
    LLMs are replacing thinking and for students the need to even know the basics. From the perspective of an academic program if they're stopping the students learning the materials they're actively harmful.
    If you're saying that LLMs obviate the need to understand the basics I think that's dangerously wrong. You still need a human in the loop capable of understanding whether the output is good or bad.
  - vasachi 7 months ago
    
    When you need to calculate a square root of 12.345 during a physics exam, professor doesn't care that you use a calculator, because the exam doesn't test your calculating ability. But it does test your knowledge of physics. What is the point of allowing LLM use during such a test?
dr_kiszonka 7 months ago

> I think it's weakening junior engineers' reasoning and coding abilities as they become reliant on it without having lived for long, or at all, in the before times.
I have a feeling there will be a serious shortage of strong "old-school" seniors in a few years. If students and juniors are reliant on AI, and we recruit seniors from our juniors, who will companies turn to when AI gets stuck?
- mbil 7 months ago
  
  I wrote about this recently. I agree, though I think it might take a little longer. There will be a deskilling where the easy problems are solved by AI, so new grad workers won’t have gradual experience to grow into seasoned experts.
- mwgalloway 7 months ago
  
  I’m looking forward to when the job market recovers, but I’m not looking forward to the prospect of a significant amount of future demand being in the realm of having to scale and maintain the AI slop code that’s being generated now.
  - ryandrake 7 months ago
    
    It sounds like a pretty lucrative retirement plan, no matter how boring and frustrating the actual work would be.
    Sadly, I don't think companies are going to hire graybeards to maintain AI slop code. They're just going to release low-quality garbage, and the bar for what counts as good software will get lower and lower.
- HenryBemis 7 months ago
  
  I feel that (I can feel it in my own skin/life) that those 'oldies' with "ok" SME skills and "great" business acumen will be the ones harnessing the hordes of 'prompt engineers'.
  It is like they will be the 'new nerds' and we will have the 'street-smarts'.
arkh 7 months ago

> I think it's weakening junior engineers' reasoning and coding abilities as they become reliant on it without having lived for long, or at all, in the before times.
Same as StackOverflow, same as Google, same as Wikipedia for students.
The problem is not using the tools, it's what you do with the result. There will always be lazy people who will just use this result and not think anymore about it. And then there will always be people who will use those results as a springboard to what to look for in the documentation of whatever tool / language they just discovered thanks to Cursor.
You want to hire from the second category of people.
- m_fayer 7 months ago
  
  Quantity has a quality all it’s own.
  That is to say, as we strive for better and better tools along a single axis, at some point the social dynamics shift markedly, even though it’s just “more of the same”. Digital distribution was just better distribution, but it changed the very nature of journalism, music, and others. Writing on computers changed writing. And the list goes on.
  “This is just the next thing in a long line of things” is how we technologists escape the disquieting notion that we are running more and more wild social experiments.
- addicted 7 months ago
  
  > Same as StackOverflow, same as Google, same as Wikipedia for students.
  Is it though?
  If I used stack overflow, for example, I still needed to understand the code well enough to translate it to my specific codebase, changing variable names at the very least.
- breakfastduck 7 months ago
  
  This is not the same. Because SO, google etc require actual research, introspection, prompting AI does not.
  - tomrod 7 months ago
    
    You're describing superficial behavior -- akin to taking the first match on Google or SO and saying that people are only doing that regarding AI prompts.
    Good prompting _does_ require engagement and, for most cases, some research, just like SO or Google.
    Sure, you can throw out idle or lazy queries. The results will be worse generally.
KronisLV 7 months ago

> Personally, and quietly, I have a major concern about the conflict of interest of Cursor deciding which files to add to context then charging you for the size of the context.
> As with so many products, it's cheap to start with, you become dependent on it, then one day it's not cheap and you're fucked.
If it gets too expensive, then I guess the alternative becomes using something like Continue.dev or Cline with one of the providers like Scaleway that you can rent GPUs from or that have managed inference… either that, or having a pair of L4 cards in a closet somewhere (or a fancy Mac, or anything else with a decent amount of memory).
Whereas if there are no well priced options anywhere (e.g. the upfront investment for a company to buy their own GPUs to run with Ollama or something else), then that just means that running LLM based systems nowadays is economically infeasible for many.
lolinder 7 months ago

> Personally, and quietly, I have a major concern about the conflict of interest of Cursor deciding which files to add to context then charging you for the size of the context.
Can you elaborate on what you're referring to? I don't used Cursor extensively, but I do pay for it, and it was a flat fee annual subscription for unlimited requests, with the "fast" queue being capped at a set number of requests per month with no reference to their size.
Claude Code does work the way you say, since you provide it your Anthropic API key. But I have not seen Cursor charging for context or completion tokens, which is actually the reason why I'm paying for it instead of using something like Aider.
https://www.cursor.com/pricing
evo_9 7 months ago

Has anyone tried limiting Cursor or Cline, etc, to a higher level role such as analysis and outlining proposed changes, and then coding those changes yourself with minimal LLM interaction? Aka, ask to define / outline only a high level set of changes, but do no actual changes to any file; then proceed through the outlined work, limiting Cursor to roughing out the work and hand writing the actual critical bits? That’s the approach I’ve been taking, a sort of best of both worlds that greatly speeds me up without taking the hands 100% off the wheel.
- vineyardmike 7 months ago
  
  This seems like the worst of both worlds. The human still has to do the "boring" work of writing out all the boiler plate stuff, but now there is a machine telling the human what to do. Oh, and the machine is famously not great at big question type stuff while being being much more performant at churning out boilerplate.
- tgdude 7 months ago
  
  I find the opposite. I tend to think through the problem myself, give cursor/claude my understanding, guide it through a few mistakes it makes, have it leave files at 80% good enough as it codes and gets stuck, and then spend the next 20 min or so cleaning up the changes and fixing the few wire up spots it got wrong.
  Often I will decompose the problem into smaller subproblems and feed those to cursor one by one slowly building up the solution. That works for big tickets.
  For me the time saving and force multiplier isn't necessarily in the problem solving, I can do that faster and better in most cases, but the raw act of writing code? It does that way faster than me.
  - mock-possum 7 months ago
    
    Yeah that’s been my approach as well - and honestly I’m not even sure that it’s necessarily faster, it’s just different. Sometimes I feel like getting my hands dirty and writing the code myself - LLMs can be good for getting yourself unstuck when you’re facing an obstacle too. But other times, I’d rather just sit back and dictate requirements and approaches, and let the robot dream up to implementation itself.
- real0mar 7 months ago
  
  Yeah. Reasoning models like r1 tend to be good for architecting changes and less optimal for actually writing code. So this allows the best of both worlds.
j45 7 months ago

Heavier architecture / systems design stuff is sometimes easier to do with a differently configured cursor, or claude.
The key I find is experience doing that kind of stuff to begin with, vs domain experience as well, vs little to none and wanting to learn the ropes.
firecall 7 months ago

> As with so many products, it's cheap to start with, you become dependent on it, then one day it's not cheap and you're fucked.
Exactly!
As you learn a codebase, you and it become better together.
It can take a while, but the investment in understanding the stack you are working with does pay off.
Trying to shortcut junior devs to productivity is the plot to a cyberpunk horror movie!
belter 7 months ago

It seems these days every other post on HN is about Cursor.
Is Cursor ultimately not using GPT-4, GPT-4o, Claude 3.5/3.7 Sonnet and so on? If some of the auto completion and agent features might be nice, is this not too much push for what is essentially just another UI plugin?
- smokel 7 months ago
  
  Don't underestimate how much effort it takes to apply a technique (LLMs) to a certain domain.
  Creating a good user interface, processing the code into appropriate embeddings for useful matching, and marketing are just three things that take a lot of effort to get right.
  The fragmented landscape of alternatives is not attractive at all.
  One thing I'm quite worried about, though, is that Cursor is a fork of VS Code. This will never be maintainable in the long run, in particular if Microsoft does not want them to continue.
- rafram 7 months ago
  
  They have their own next edit prediction model, and for me personally that’s the killer feature. I don’t like the VS Code platform very much, but it’s just magical to be able to press tab-tab-tab and watch Cursor jump between lines in the file and make the edits that I was clearly about to make.
brushfoot 7 months ago

> Cursor deciding which files to add to context then charging you for the size of the context
Copilot is unlimited for a flat fee [1]. I've been happy with it.
They also have a free tier now, but it's limited to 50 chats a month, which I'd burn through pretty quickly.
1: https://docs.github.com/en/copilot/about-github-copilot/subs...
malux85 7 months ago

Then only use it for the small tasks? There's one button you have to click to turn it off.
siva7 7 months ago

Has anyone else noticed how sharp the quality of composer and chat output has dropped since their newest release? I had a good opinion about them but they seem to have dropped the ball
theshrike79 7 months ago

What do you consider "heavy"? Is it optimising an algorithm or "rewrite this whole codebase in <a different language>"?
- walthamstow 7 months ago
  
  Refactoring a critical function that is very large and complex. Yeah, maybe it shouldn't be so large and so complex, but it is and that's the codebase we have. I'm sure many other companies do too.
  - kamaal 7 months ago
    
    Thats not how apex productivity folks have used any IDE productivity leap including this one.
    You dont outsource your thinking to the tool, You do the thinking and let the tool type it for you.
    
    walthamstow 7 months ago
    
    You're missing the step where I have to articulate (and type) the prompt in natural language well enough for the tool to understand and execute. Then if I fail, I have to write more natural language.
    You said just the same in another of your posts:
    > if you can begin to describe the function well
    So I have to learn how to describe code rather than just writing it as I've done for years?
    Like I said, it's good for the small stuff, but bad for the big stuff, for now at least.
    
    ModernMech 7 months ago
    
    > So I have to learn how to describe code rather than just writing it as I've done for years?
    If we keep going down this path, we might end up inventing artificial languages for the purpose of precisely and unambiguously describing a program to a computer.
    
    kamaal 7 months ago
    
    You mean logic programming? https://en.wikipedia.org/wiki/Logic_programming
    Its been around for decades. In fact this was the first approach to doing AI.
    In logic programming you basically write concrete set of test cases and the compiler generates the code for which the test cases hold 'true'.
    In other words you get a language to 'precisely and unambiguously describe a program', as you said. Compiler writes the code for you.
    
    sexyman48 7 months ago
    
    Sad that GP's wit was lost on you.
    
    splatzone 7 months ago
    
    And yet I learned something!
    
    kamaal 7 months ago
    
    Exactly my point, you don't ask the LLM to give you the whole function. That would be too much English work, because that means you need to write down the contract of the function in concrete and/or list(list of lists).
    You ask it to give you one block at a time.
    iterate over the above list and remove all strings matching 'apple'
    open file and write the above list etc etc kind of stuff.
    Notice how the English here can be interpreted only way, but the LLM is now a good intelligent coding assistant.
    >>I think in code, I'd rather just write in code.
    Continue to think, just make the LLM type out the outcome of your ideas.
    
    swatcoder 7 months ago
    
    I'm sure you have a great point to make, but you're making it very poorly here.
    Experienced developers develop fluency in their tools, such that writing such narrow natural language directives like you suggest is grossly regressive. Often, the tasks don't even go through our head in English like that, they simply flow from the fingers onto the screen in the language of our code, libraries, and architecture. Are you familiar with that experience?
    What you suggest is precisely like a fluently bilingual person, who can already speak and think in beautiful, articulate, and idiomatic French, opting to converse to their Parisian friend through an English-French translation app instead of speaking to them directly.
    When applied carefully, that technique might help someone who wants to learn French get more and deeper exposure than without a translation app, as they pay earnest attention to the input and output going through their app.
    And that technique surely helps someone who never expects to learn French navigate their way through the conversations they need to have in a sufficient way, perhaps even opening new and eventful doors for them.
    But it's an absolutely absurd technique for people whose fluency is such that there's no "translating" happening in the first place.
    You can see that right?
    
    kamaal 7 months ago
    
    >>Experienced developers develop fluency in their tools
    >>You can see that right?
    I get it, but this as big a paradigm shift as much as Google and Internet was to people in the 90s. Some times how you do things changes, and that paradigm becomes too big of a phenomenon to neglect. Thats where we are now.
    You have to understand sometimes a trend or a phenomenon is so large that fighting it is pointless and some what resembles luddite behaviour. Not moving on with time is how you age out and get fired. Im not talking about a normal layoff, but more like becoming totally irrelevant to whatever that is happening in the industry at large.
    
    swatcoder 7 months ago
    
    That could totally be something that we encounter in the future, and perhaps we'll eventually be able to see some through line from here to there. Absolutely.
    But in this thread, it sounds like you're trying to suggest we're already there or close to it, but when you get into the details, you (inadvertently?) admitted that we're still a long ways off.
    The narrow-if-common examples you cited slow experienced people down rather than speed them up. They surely make some simple tasks more accessible to inexperienced people, just like in the translation app example, and there's value in that, but it represents a curious flux at the edges of the industry -- akin VBA or early PHP -- rather than a revolutionary one at the center.
    It's impactful, but still quite far from a paradigm shift.
    
    cess11 7 months ago
    
    From actual software output it seems to me like the big SaaS LLM:s compete with Wordpress and to some extent with old school code generation. It does not look like a paradigm shift to me. Maybe you can explain why you're convinced otherwise.
    Some quite large organisations have had all-hands meetings and told their developers that they must use LLM support and 'produce more', we'll see what comes of it. Unlike you I consider it to be a bad thing when management and owners undermine workers through technology and control (or discipline, or whatever we ought to call the current fashion), i.e. the luddites were right and it's not a bad thing to be a luddite.
    
    kasey_junk 7 months ago
    
    I very much want to include more ai in my workflows. But every time I try it slows me way down. Then when people give examples like yours it feels like we are just doing different tasks all together. I can write the code you mention above much faster than the English to describe it in half a dozen languages. And it will be more precise.
    Perhaps the written word just doesn’t describe the phenomenon well. Do you have any goto videos that show no -toy examples of pairing with an ai that you think illustrate your point well?
    
    skydhash 7 months ago
    
    > can write the code you mention above much faster than the English to describe it in half a dozen languages
    Especially if you’re fluent in an editor like Vim, Emacs, Sublime, have setup intellisense and snippets, know the language really well and are very familiar with the codebase.
    
    kasey_junk 7 months ago
    
    Which were all tools I had to learn and slowed me down while I was learning them. So I’m extremely sympathetic to the idea that ai can be a productivity enhancer.
    But those had obvious benefits that made the learning cost sensible. “I can describe in precise English, a for loop and have the computer write it” flat sounds backwards.
    
    zdragnar 7 months ago
    
    > iterate over the above list and remove all strings matching 'apple' > open file and write the above list etc etc kind of stuff.
    Honestly, I can write the code faster to do those things than I can write the natural language equivalent into a prompt (in my favored language). I doubt I could have gotten there without actually learning by doing, though.
    This feels an awful lot like back in the day when we were required to get graphing calculators for math class (calculus, iirc), but weren't allowed to get the TI-92 line that also had equation solvers. If you had access to those, you'd cripple your own ability to actually do it by hand, and you'd never learn.
    Then again, I also started programming with notepad++ for a few years and didn't really get into using "proper" editors until after I'd built up a decent facility for translating mind-to-code, which at the time was already becoming a road less travelled.
    
    walthamstow 7 months ago
    
    So you do in fact agree that the tool can only be trusted to do small chunks of basic action and can't be used to do anything heavy.
    
    kamaal 7 months ago
    
    It writes blocks of 50 - 100 lines fairly fast in one go, and I work in such block chunks one at a time. And I keep going as I already have a good enough idea of how a program could look like. I can write some thing like a 10,000(100 such iterations) line Perl script in no time. Like in one/two working day(s).
    Thats pretty fast.
    If you are telling me I ask it to write the entire 20,000 line script in one go, that's not how I think. Or how I go about approaching anything in my life, let alone code.
    To go a far distance, I go in cycles of small distances, and I go a lot of them.
    
    bluefirebrand 7 months ago
    
    > It writes blocks of 50 - 100 lines fairly fast in one go, and I work in such block chunks one at a time.
    How many lines of natural language do you have to write in order to get it to generate these 50-100 lines correctly?
    I find that by the time I have written the right prompt to get a decently accurate 100 lines of code from these tools, which I then hage to carefully review, I could have easily written the 100 lines of code myself
    That doesn't make the tool very useful, especially because reviewing the code it generates is much slower than writing it myself
    Not to mention the fact that even if it generates perfect bug free code (which I want to emphasize: it never ever seems to do) if I want to extend it I still have to read it thoroughly, understand it, and build my own mental model of the code structure
    
    skydhash 7 months ago
    
    And there’s the value of snippets and old code. Especially for automation and infrastructure tasks. If your old project is modular enough, you can copy paste a lot of the boilerplate.
    
    arkh 7 months ago
    
    > You ask it to give you one block at a time.
    I think many people need to do some maintenance work on really old projects.
    "One block at a time", I'd like to see the LLM which will find for me the CRON script doing some database work hidden in a forgotten VM.
    
    skydhash 7 months ago
    
    Or just maintenance really. Where you do only 2 hours of coding for a week work.
    
    cess11 7 months ago
    
    I've spent a lot of time trying to make Aider and Continue do something useful, mainly with Qwen coder models. In my experience they suck. Maybe they can produce some scaffolding and boilerplate but I already have deterministic tools for that.
    Recently I've tried to make them figure out an algorithm that can chug through a list of strings and collect certain ones, grouping lines with one pattern under the last one with another pattern in a new list. They consistently fail, and not in ways that are obvious at a glance. Fixing the code manually takes longer than just writing the code.
    Usually it compiles and runs, but does the wrong thing. Sometimes they screw up recursion and only collect one string. Sometimes they add code for collecting the strings that are supposed to be grouped but don't use it, and as of yet it's consistently wrong.
    They also insist on generating obtuse and wrong regex, sometimes mixing PCRE with some other scheme in the same expression, unless I make threats. I'm not sure how other people manage to synthesise code they feel good about, maybe that's something only the remote models can do that I for sure won't send my commercial projects to.
onion2k 7 months ago

I think it's weakening junior engineers' reasoning and coding abilities as they become reliant on it without having lived for long, or at all, in the before times.
As someone old enough to have built websites in Notepad.exe it's totally reasonable that I ask my teams to turn off syntax highlighting, brace matching, and refactoring tools in VSCode. I didn't have them when I started, so they shouldn't use them today. Modern IDE features are just making them lazy.
/s
- moron4hire 7 months ago
  
  Not all change is progress.
  Change comes with pros and cons. The pros need to outweigh the cons (and probably significantly so) for change to be considered progress.
  Syntax highlighting has the pro of making code faster to visually parse for most people at the expense of some CPU cycles and a 10 second setting change for people for whom color variations are problematic. It doesn't take anything away. It's purely additive.
  AI code generation tools provide a dubious boost to short term productivity at the expense of extra work in the medium term and skill atrophy in the long term.
  My junior developers think I don't know they are using AI coding tools. I discovered it about 2 months into them doing it, and I've been tracking their productivity both before and after. In one case, one might be committing to the repository slightly more frequently. But in all cases, they still aren't completing assignments on time. Or at all. Even basic things have to be rewritten because they aren't suitable for purpose. And in our pair programming sessions, I see them frozen up now, where they weren't before they started using the tools. I can even see them habitually attempt to use the AI, but then remember I'm sitting with them, and halt.
  I tried to use AI code generation once to fill in some ASP.NET Core boilerplate for setting up authentication. Should be basic stuff. Should be 3 or 4 lines of code. I've done it before, but I forgot the exact lines and had been told AI was good for this kind of lazy recall of common tasks. It gave me a stub that had a comment inside, "implement authentication here". Tried to coax the AI into doing what I wanted and easily spent 10x more time than it would have taken to look up the documentation. And it still wasn't done. I haven't touched AI code gen since.
  So IDK. I'm very skeptical of the claims that AI is writing significant amounts of working code for people, or that it at all rivals even a moderately smart junior developer (say nothing of actually experienced senior). I think what's really happening is that people are spending a lot of time spinning the roulette wheel, always betting on 00, and then crowing they're a genius when it finally lands.
  - kamaal 7 months ago
    
    >>In one case, one might be committing to the repository slightly more frequently. But in all cases, they still aren't completing assignments on time.
    Most people are using it to finish work soon, rather than use it to do more work. As a senior engineer your job must not be to stop the use of LLMs, but create opportunities to build newer and bigger products.
    >>I can even see them habitually attempt to use the AI, but then remember I'm sitting with them, and halt.
    I understand you and I grew up in a different era. But life getting easier for the young isnt exactly something we must resent. Things are only getting easier with time and have been like this for a few centuries. None of this is wrong.
    >>Tried to coax the AI into doing what I wanted and easily spent 10x more time than it would have taken to look up the documentation.
    Honestly this largely reads like how my dad would describe technology from the 2000s. It was always that he was better off without it. Whether that was true or false is up for debate, but the world was moving on.
    
    swatcoder 7 months ago
    
    > As a senior engineer your job must not be to stop the use of LLMs, but create opportunities to build newer and bigger products.
    I think you just hit the core point that splits people in these discussions.
    For many senior engineers, we see our jobs are to build better and more lasting products. Correctness, robustness, maintainability, consistency, clarity, efficiency, extensibility, adaptability. We're trying to build things that best serve our users, outperform our competition, enable effective maintenance, and include the design foresight that lets our efforts turn on a dime when conditions change while maintaining all these other benefits.
    I have never considered myself striving towards "newer and bigger" projects and I don't think any of the people I choose to work with would be able to say that either. What kind of goal is that? At best, that sounds like the prototyping effort of a confused startup that's desperately looking to catch a wave it might ride, and at worst it sounds like spam.
    I assure you, little of the software that you appreciate in your life has been built by senior engineers with that vision. It might have had some people involved at some stage who pushed for it, because that sort of vision can effectively kick a struggling project out of a local minimum (albeit sometimes to worse places), but it's unlikely to have been a seasoned "senior engineer" being the one making that push and (if they were) they surely weren't wearing that particular hat in doing so.
    
    kamaal 7 months ago
    
    I don't get this idea that to build a stable product you must make your life hard as much as possible.
    One can use ai AND build stable products at the same time. These are not exactly opposing goals, and even above that assuming that ai will always generate bad code itself is wrong.
    Very likely people will build both stable and large products using ai than ever before.
    I understand and empathise with you, moving on is hard, especially when these kind of huge paradigm changing events arrive, especially when you are no longer in the upswing of life. But the arguments you are making are very similar to those made by boomers about desktops, internet and even mobile phones. People have argued endlessly how the old way was better, but things only get better with newer technology that automates more things than ever before.
    
    swatcoder 7 months ago
    
    I don't feel like you read my comment in context. It was quite specifically responding to the GP's point of pursuing "bigger and better" software, which just isn't something more senior engineers would claim to pursue.
    I completely agree with you that "one can use ai AND build stable products at the same time", even in the context of the conversation we're having in the other reply chain.
    But I think we greatly disagree about having encountered a "paradigm changing event" yet. As you can see throughout the comments here, many senior engineers recognize the tools we've seen so far for what they are, they've explored their capabilities, and they've come to understand where they fit into the work they do. And they're simply not compelling for many of us yet. They don't work for the problems we'd need them to work for yet, and are often found to be clumsy and anti-productive for the problems they can address.
    It's cute and dramatic to talk about "moving on is hard" and "luddism" and some emotional reaction to a big scary immanent threat, but you're mostly talking to exceedingly practical and often very-lazy people who are always looking for tools to make their work more effective. Broadly, we're open to and even excited about tools that could be revolutionary and paradigm changing and many of us even spend our days trying to discover build those tools. A more accurate read of what they're saying in these conversations is that we're disappointed with these and in many cases and just find that they don't nearly deliver on their promise yet.
    
    moron4hire 7 months ago
    
    > life getting easier for the young isnt exactly something we must resent.
    I don't see how AI is making life easier for my developers. You seem to have missed the point that I have seen no external sign of them being more productive. I don't care if they feel more productive. The end result is they aren't. But it does seem to be making life harder for them because they can't seem to think for themselves anymore.
    > a senior engineer your job must not be to stop the use of LLMs, but create opportunities to build newer and bigger products
    Well then, we're in agreement. I should reveal to my juniors that I know they are using AI and that they should stop immediately.
    
    ModernMech 7 months ago
    
    > I understand you and I grew up in a different era. But life getting easier for the young isnt exactly something we must resent.
    Of course not. But, eventually these young people are going to take over the systems that were built and understood by their ancestors. History shows what damage can be caused to a system when people who don't fully understand and appreciate how it was built take it over. We have to prepare them with the necessary knowledge to take over the future, which includes all the warts and shit piles.
    I mean, we've spent a lot of our careers trying to dig ourselves out of these shit piles because they suck so bad, but we never got rid of them, we just hid them behind some polish. But it's all still there, and vibe coders aren't going to be equipped to deal with it.
    Maybe the hope is that AI will reach god-like status and jut fix all of this for us magically one day, but that sounds like fixing social policy by waiting for the rapture, so we have to do something else to assure the future.
  - onion2k 7 months ago
    
    AI code generation tools provide a dubious boost to short term productivity at the expense of extra work in the medium term and skill atrophy in the long term.
    At the moment, sure. They've only been available for about 5 minutes in the grand scheme of dev tools. If you believe that AI assistants are going to be put back in the box then you are just flat out wrong. They'll improve significantly.
    I'm very skeptical of the claims that AI is writing significant amounts of working code for people
    You may be right, but people write far too much code as it is. Software development should be about thinking more than typing. Maybe AI's most useful feature will be writing something that's slightly wrong in order to get devs to think about a good solution to their problem and then they can just fix the AI's code. If that results in better software then it's a huge win worth billions of dollars.
    The belief that AI is worthless unless it's writing the code that a good dev would write is a trap that you should avoid.
    
    moron4hire 7 months ago
    
    > At the moment, sure. They've only been available for about 5 minutes in the grand scheme of dev tools. If you believe that AI assistants are going to be put back in the box then you are just flat out wrong. They'll improve significantly.
    I am extremely skeptical that LLM-based generative AI running in silicon-based digital computers will improve to a significant degree over what we have today. Ever.
    GPT-2 to GPT-3 was a sea change improvement, but every since then, new models are really only incrementally improving, despite taking exponentially more power and compute to train. Coupled with the fact that processor are only getting wider, not faster, or less energy consuming, then without an extreme change in computing technology, we aren't getting there with LLMs.
    Either the processors or the underlying AI tech need to change, and there is no evidence this is the case.
    > The belief that AI is worthless unless it's writing the code that a good dev would write is a trap that you should avoid.
    I have no idea what you're even trying to say but this. Is this some kind of technoreligion that thinks AGI is worth the endeavor regardless of the harm that comes to people along the way?
    
    ModernMech 7 months ago
    
    Same. I've heard that this area is improving exponentially for years now, but I can't really say the results I'm getting are any better than I originally experienced with ChatGTP.
    The dev tooling has gotten better; I use the integrated copilot every day and it saves me from writing a lot of boilerplate.
    But it's not really replacing me as a coder. Yeah I can go further faster. I can fill in gaps in knowledge. Mostly, I don't have to spend hours on forums and stack overflow anymore trying to work out an issue. But it's not replacing me because I still have to make fine-grained decisions and corrections along the way.
    To use an analogy, it's a car but not a self-driving one -- it augments my natural ability to super-human levels; but it's not autonomous, I still have to steer it quite a lot or else it'll run into oncoming traffic. So like a Tesla.
    And like you I don't see how to get there from where we are. I think we're at a local maxima here.
    
    onion2k 7 months ago
    
    To use an analogy, it's a car but not a self-driving one -- it augments my natural ability to super-human levels; but it's not autonomous, I still have to steer it quite a lot or else it'll run into oncoming traffic. So like a Tesla.
    And like you I don't see how to get there from where we are. I think we're at a local maxima here.
    To continue the car analogy - are you really suggesting we're at 'peak car'? You don't believe that cars in 20 years time are going to be significantly better than the cars we have today? That's very pessimistic.
    
    ModernMech 7 months ago
    
    There was a meme from back when - "if cars advanced like computers we'd be getting 1000 miles per gallon by now".
    Thinking back to the car I had 20 years ago, it's not all that different from the car I have now.
    Yes, the car I have now has a HUD, Carplay, Wireless iPhone charging, an iPhone app, adaptive cruise control, and can act as a wifi hotspot. But fundamentally it still does the same thing in the same way as 20 years ago. Even if we allow for EVs and Hybrid cars, it's still all mostly the same. Prius came out in 2000.
    And now we've reached the point where computers advance like cars. We're writing code in the same languages, the same OS, the same instruction set, for the same chips as we did 20 years ago. Yes, we have new advancements like Rust, and new OSes like Android and iOS, and chipsets like ARM are big now. But iPhone/iPad/iMac, c/C++/Swift, OSX/MacOS/iOS, PowerPC/Intel/ARM.... fundamentally it's all homeomorphic - the same thing in different shapes. You take a programmer from the 70s and they will not be so out of place today. I feel like I'm channeling Bret Victor here: https://www.youtube.com/watch?v=gbHZNRda08o
    And that's not for lack of advancements in languages, Os, instruction sets, and hardware architectures, it's for a lack of investment and commercialization. I can get infinite money right now to make another bullshit AI app, but no one wants to invest in an OS play. You'll hear 10000 excuses about how MS this and Linux that and it's not practical and impossible and there's no money in it, so on and so forth. The network effects are too high, and the in-group dynamic of keeping things the way they are is too strong.
    But AGI? Now that's something investors find totally rational and logical and right around the corner. "I will make a fleet of robot taxis that can drive themselves with a camera" will get you access to unlimited wallets filled with endless cash. "I will advance operating systems past where they have been languishing for 40 years" is tumbleweeds.
    
    moron4hire 7 months ago
    
    The cars we have today are not so far off from the cars of 100 years ago. So yes, I highly doubt another 20 years of development, after all the low hanging fruit has already been picked, will see much change at all.
    
    ModernMech 7 months ago
    
    "after all the low hanging fruit has already been picked"
    Another great analogy. LLMs allow us to pick low hanging fruit faster. If we want to pick the higher fruit, we'll need fundamentally different equipment, not automated ways to pick low hanging fruit.
    
    bluefirebrand 7 months ago
    
    For what its worth I agree quite strongly with you and moron4hire in this thread
    I wanted to make an observation that what you two are describing seems to me like it maps onto the Pareto principle quite neatly
    LLMs seem like they have rapidly (maybe exponentially) approached the 80% effectiveness threshold, but the last 20% is going to be a much higher bar to clear
    I think a lot of the disagreement around how useful these tools are is based around this. You can tell which people are happy with 80% accuracy versus those with higher standards
    
    skydhash 7 months ago
    
    > Mostly, I don't have to spend hours on forums and stack overflow anymore trying to work out an issue
    Who does that? If I can’t find something within 15 minutes on the web, it’s back to reading specs, docs, and code. Or bringing out the debugger.
    
    ModernMech 7 months ago
    
    I think a lot of people use forums and stack overflow to find information about a problem they are facing. But you're right, specs, docs, and code are other sources that I hadn't mentioned. I find that forums and stack overflow have utility beyond the documentation, because the docs don't contain information about issues people have faced in the past and accounts about how they have been solved.
    But the same idea applies to specs/docs -- instead of reading the specs or the docs, I'd rather be talking to a LLM trained on the specs and docs.
    
    Winsaucerer 7 months ago
    
    There was a point where I started to dive more into specs, docs, and code, and less relying on finding the answer I'm after.
    But in defence of those spending hours on forums, sometimes a project is not well documented, and the code isn't easy to read. In those situations though, I'll be browsing their Github issues or contacting them directly.
    Often though I find it highly valuable to go read the docs, even if an LLM has given me a working example. Sometimes, I find better ways, warnings, or even information on unrelated things I want to do.
    
    satvikpendem 7 months ago
    
    *ChatGPT, GPT stands for generative pre-trained transformer.
    
    onion2k 7 months ago
    
    I have no idea what you're even trying to say but this. Is this some kind of technoreligion that thinks AGI is worth the endeavor regardless of the harm that comes to people along the way?
    I'm saying there is a lot of value in tools that are merely 'better than the status quo'. An AI assistant doesn't need to be as good as a dev in order to be useful.
    
    bluefirebrand 7 months ago
    
    These tools don't behave like assistants though, despite being advertised as being assistants
    They turn you into the assistant instead of the programmer in the driver seat
    Whenever you use them, you have to shift "code review mode", which is not the role of the primary programmer on a task, it is the role of a secondary programmer doing a PR
    It's "a lot of value" if you like being the assistant to a very inconsistent junior programmer
- hansvm 7 months ago
  
  I interview people every day who don't recognize glaring flaws despite the syntax highlighting and other linting in their favorite environment telling them something is definitely very wrong. Much like with LLMs, I'm firmly in the camp of (1) master working without it, (2) master working with it, (3) integrate it into your workflow in the way that suites you best.
  If your team can't recognize bullshit without syntax highlighting, they're going to struggle even when it's turned on. Once they've mastered the underlying principles, syntax highlighting will make them much more effective.
- timothygold 7 months ago
  
  I think your mistaking programmer productivity with A.I that generates all the code for you allowing you to switch off your brain completely. prompt engineering code is not the same skill as programming and being good at it does not mean you actually understand how code or software works.
  - voidfunc 7 months ago
    
    Why do you need to understand how the code or software works if the thing the AI generates satisfies the requirements? That's where we are headed.
    
    satvikpendem 7 months ago
    
    Because it eventually fails and to fix it, one must understand the code, something that is lacking when "vibe coding."
    
    bluefirebrand 7 months ago
    
    No one with this mentality has any business working with code ever
    Even the writers of Star Trek, a fictional fantasy show with near-magic super AI, understood that the Engineers would eventually have to fix something the AI couldn't, without the AI helping them
    
    mwgalloway 7 months ago
    
    Because if you’re working on anything of note, you will need to sustain, scale, and operate that software while hopefully designing for it at a high level. LLMs are not suited for those problems, at least not anytime soon
    
    matwood 7 months ago
    
    How do you know it satisfies the requirements?
- hakunin 7 months ago
  
  Yes. It is reasonable to ask them to turn off these things temporarily as an exercise, if you think it will help junior engineers write better code.
  All teaching works this way. Would it be totally reasonable to have junior pilots only practice with autopilot?
- walthamstow 7 months ago
  
  Is this supposed to be funny?
  - brookst 7 months ago
    
    Funny / sad. GP is just highlighting the all too common attitude of people who grew up using new tech (graphing calculators, Wikipedia, etc) who reach a certain age and suddenly new tech is ruining the youth of today.
    It’s just human nature, you can decide if it’s funny or sad or whatever
    
    walthamstow 7 months ago
    
    Neither of you have comprehended the part of my post where I talk about myself and my own skills.
    Hiding behind the sarcasm tag to take the piss out of people younger than you, I don't think that's very funny. The magnetised needle and a steady hand gag from xkcd, now that is actually funny.
  - onion2k 7 months ago
    
    Yes.
bitwize 7 months ago

In a few years, if you mention you just straight up code with an editor or IDE, there's going to be a hackernews who's going to ask you about the size of your project at work (LOC, dev team headcount) because you can't really be working on something big and important without AI.
kamaal 7 months ago

>>but it falls down with anything heavy.
If you are using LLMs to write anything more than a if(else)/for block at a time you are doing it wrong.
>>I think it's weakening junior engineers' reasoning and coding abilities as they become reliant on it without having lived for long, or at all, in the before times.
When I first started work, my employer didn't provide internet access to employees, their argument would always be how would you code if there was no internet connection, out there in the real world? , As it turns out they were not only worried about the wrong problem, but the got the whole paradigm about this new world wrong.
In short it was not worth building anything at all in a world internet doesn't exist.
>>then one day it's not cheap ...
Again you are worried about the wrong thing, your worry should not be what happens when its no longer cheaper, but when it, as a matter of fact gets cheaper. Which it will.
- bluefirebrand 7 months ago
  
  > If you are using LLMs to write anything more than a if(else)/for block at a time you are doing it wrong
  Then what value are they actually adding?
  If this is all they are capable of, surely you could just write this code yourself much faster than trying to describe what you want to the LLM in natural language?
  I cannot imagine any decently competent programmer gaining productivity from these tools if this is how limited they still are
  Why are people so bullish on them?
  - galbar 7 months ago
    
    This is how I feel. I mentioned this to a couple of friends over a beer and their answer was that there are many not "decently competent programmer"s in the industry currently and they benefit immensely from this technology, at the expense of the stability and maintainability of the system they are working on.
  - kamaal 7 months ago
    
    English to Code translation.
    Albeit they are fairly context aware as to what you are asking. So they can save a lot of RTFM and code/test cycles. At times they can look at the functions that are already built, and write new ones for you, if you can begin to describe the function well.
    But if you want to write a good function, like written to fit tightly to specifications. Its too much English. You need to describe in steps what is to be done, plus exceptions. And at some point you are just doing logic programming(https://en.wikipedia.org/wiki/Logic_programming) In the sense that whole english text looks like a list of and/or situations + exceptions.
    So you have to go one atomic step(a decision statement and a loop) at a time. But thats a big productivity boost too. Reason being able to put lots of text in place without you having to manually type it out.
    >>you could just write this code yourself much faster than trying to describe what you want to the LLM in natural language?
    Honestly speaking most of coding is manually laborious if you don't know touch typing. And even if you did know its a chore.
    I remember when I started using co-pilot with react it was doing a lot of otherwise typing work I'd have to do.
    >>I cannot imagine any decently competent programmer gaining productivity from these tools if this is how limited they still are
    IMO opinion, my brain atleast over the years has seen so many code patterns, debugging situations and what to anticipate and assemble as I go, that having some intelligent typing assistant is a major productivity boost.
    >>Why are people so bullish on them?
    Eventually newer programming languages will come along and people will build larger things.
  - aprilthird2021 7 months ago
    
    I'll give a serious answer. The AI code completions are way faster and way easier to read for issues than me typing code out by hand. That's already a good 10-20% productivity boost right there for me.
    
    skydhash 7 months ago
    
    Which I highly doubt for anything other than greenfield project. When you look at any mature open source project, it’s mostly small commits, but a large context is required in order to make these. If you’re in boilerplate land, becoming fluent in your editor and using snippets can bring a more advantageous boost.
    
    aprilthird2021 7 months ago
    
    No I work in a huge company and many of our systems are 10+ years old. I did have a lot of snippets but AI is good at handling new things I haven't seen before and wouldn't have had a snippet for yet. It's also great for extending configurations / large switch statements / etc. the kind of stuff you see in big codebases.
    It's terrible when you get to complex code, but I'd rather spend most of my time there anyways
    
    player1234 7 months ago
    
    Supercool bro'! Show these luddites! See non-believers this one works at a huge company solving hard tasks. So again please teach us! How did you get the 10-20%? How did you measure it? What methodology? Control group? Is it 10-20 more projetcs done? Or profits!!??? I bet its profits!! This guy is living the dream!!!
    
    player1234 7 months ago
    
    10-20% nice! How did you measure it?
  - ModernMech 7 months ago
    
    Honestly, a lot of the problems people have with programming that they use AI to solve can be solved with better language design and dev tools.
    For example, I like LLMs because they take care of a lot of the boilerplate I have to write.
    But I only have to write that boilerplate because it's part of the language design. Advances in syntax and programming systems can yield similar speedups in programming ability. I've seen a 100x boost in productivity that came down to switching to a DSL versus C++.
    Maybe we need more DSLs, better programming systems, better debugging tools, and we don't really need LLMs the way LLM makers are telling us? LLMs only seem so great because our computer architecture, languages and dev tooling and hardware are stuck in the past.
    Instead of being happy with the Von Neumann architecture, we should be exploring highly parallel computer architectures.
    Instead of being happy with imperative languages, we should be investing heavily in studying other programming systems and new paradigms.
    Instead of being happy coding in a 1D text buffer, we should be investing more in completely imaginative ways of building programs in AR, VR, 3D, 2D.
    LLMs are going to play a part here, but I think really they are a band-aid to a larger problem, which is that we've climbed too high in one particular direction (von-neuman/imperative/text) and we are at a local maxima. We've been there since 2019 maybe.
    There are many other promising peaks to climb, avenues of research that were discovered in the 60s/70s/80s/90s have been left to atrophy the past 30 years as the people who were investigating those paths refocused or are now gone.
    I think all these billions invested in AI are going to vaporize, and maybe then investors will focus back on the fundamentals.
    LLMs are like the antenna at the top of the Empire State Building. Yes, you can keep going up if you climb up there, but it's unstable and eventually there really is a hard limit.
    If we want to go higher that that, we need to build a wider and deeper foundation first.

laborcontract 7 months ago

Cursor's current business model produces a fundamental conflict between the well-being of the user and the financial well-being of the company. We're starting to see these cracks form as LLM providers are relying on scaling through inference-time compute.

Cursor has been trying to do things to reduce the costs of inference, especially through context pruning. For instance, if you "attach" files to a conversation, Cursor no longer stuffs the code from those files into the prompt. Instead, it'll run function calls to open those files and read bits and pieces of the code until the model feels it has enough information. This seems like a perfectly reasonable strategy until you realize you cannot do the same thing with reasoning models, if you're limiting the reasoning to just the initial prompt.

If you prune out context from the initial prompt, instead of reasoning on richer context, the llm reasons only on the prompt itself (w/ no access to the attached files). After the thinking process, Cursor runs function calls to retrieve more context, which entirely defeats the point of "thinking" and induces the model to create incoherent plans and speculative edits in its thinking process, thus explaining Claude's bizarre over-editing behavior. I suspect this is why so many Cursor users are complaining about Claude 3.7.

On top of this, Cursor has every incentive to keep the thinking effort for both o3-mini and Claude 3.7 to the very minimum so as to reduce server load.

Cursor is being hailed as one of the greatest SAAS growth stories but their $20/mo all-you-can-eat business model puts them in such a bad place.

rafaelmn 7 months ago

>Cursor has been trying to do things to reduce the costs of inference, especially through context pruning. For instance, if you "attach" files to a conversation, Cursor no longer stuffs the code from those files into the prompt. Instead, it'll run function calls to open those files and read bits and pieces of the code until the model feels it has enough information. While that seems like a perfectly reasonable strategy, it starts to fall apart when integrating reasoning models.
In general I feel like this was always the reason automatic context detection could not be good in fixed fee subscription models - providers need to constrain the context to stay profitable. I also saw that things like Claude Code happily chew through your codebase, and bank account, since they are charging by token - so they have the opposite incentive.
NitpickLawyer 7 months ago

> This seems like a perfectly reasonable strategy until you realize you cannot do the same thing with reasoning models, if you're limiting the reasoning to just the initial prompt.
Keep in mind that what we call "reasoning" models today are the first iteration. There's no fundamental reason why you can't do what you stated. It's not done now, but it can be done.
There's nothing stoping you from running "tinking" in "chunks" of 1-2 paragraphs, doing some search, and adding more context (maybe from pre-reasoned cache) and continuing the reasoning from there.
There's also work being done on think - summarise - think - summarise - etc. And on various "RAG"-like thinking.
Roritharr 7 months ago

This is only surface-level deep. Cursor already has Quotas for their paid plans and Usage-based Pricing for their larger models, which I run into and fall over to their usage based model every month.
Imo most of their incentive on context-pruning comes not just from reducing the token amount, but from the perception that you only have to find "the right way"tm to build that context window automatically, to get to coding panacea. They just aren't there yet.
- laborcontract 7 months ago
  
  If you’re going to pay on the margin, why not use those incremental dollars running the same requests on cline? I’m assuming cost is the deciding factor here because, quality-wise, plugging directly into provider apis with cline always does a much better job for me.
  - Roritharr 7 months ago
    
    Good callout, will try! I haven't considered switching tools, it's mostly convenience of just continuing, instead of stopping mid-way through and switch out the tools. But also I only code intermittently, a couple of days a week at most these days, because it's only part of what I do, so I can get to experiment with new tooling much less than i'd like.
    
    laborcontract 7 months ago
    
    Cheers, give it a shot. Cline runs as an extension within Cursor, so you can use it to augment your existing workflows with almost zero disruption.
IanCal 7 months ago

> Instead, it'll run function calls to open those files and read bits and pieces of the code until the model feels it has enough information. This seems like a perfectly reasonable strategy until you realize you cannot do the same thing with reasoning models, if you're limiting the reasoning to just the initial prompt.
There's nothing about this that conflicts with reasoning models, I'm not sure what you mean here.
- laborcontract 7 months ago
  
  what i mean is that their implementation (thinking only on the first response) renders zero benefit because it doesn’t see the code itself. They run multiple function calls to analyze your codebase in increments. If they ran the thinking model on the output of those function calls, then performance would be great but, so far, this is not what they are doing (yet). It also dramatically increases the cost of running the same operation.
  - IanCal 7 months ago
    
    But the way those models work is to run everything once the function calls come in. Are you saying cursor is not using the model you selected on function calls responses?
  - throwaway314155 7 months ago
    
    This sounds like a Cursor issue, not something that effects reasoning models in general.
    edit: Ah, I see what you mean now.
    
    laborcontract 7 months ago
    
    That's my point. Cursor, by offering unlimited requests (500 fast requests + unlimited slow requests) to people paying a fixed $20/mo, they've put themselves into a ruthless marginal cost optimization game where one of their biggest levers for success is reducing context sizes and discouraging thinking after every function call.
    Software like Claude Code and Cline do not face those constraints, as the cost burden is on the user.
MrBuddyCasino 7 months ago

> Cursor has been trying to do things to reduce the costs of inference, especially through context pruning.
You can also use cline with gemini-2.0-flash, which supports a huge context window. Cline will send it the full context and not prune via RAG, which helps.
- laborcontract 7 months ago
  
  I've just tried gemini-2.0-flash, this is an incredible model that's great for making edits. I haven't tried any heavy lifting with it yet but It's replaced Claude for a lot of my edits. It's also great at agentic stuff too!
- laborcontract 7 months ago
  
  I love cline but i’ve never tried the gemini models with it. I’ll give it a shot tonight, thanks for the tip!
- greyman 7 months ago
  
  Or you can also use Gemini Code Assist extension for VS Code, which is basically free, but so far, the code it wrote almost never worked for me. So far I use only Claude 3.7 or Grok in chat mode. Almost no model, as of today, is good at coding.
  - MrBuddyCasino 7 months ago
    
    Did Grok 3 finally get an API?
sandbach 7 months ago

I think you're right, but what company's business model doesn't produce a conflict between the user's well-being and the company's finances?
namaria 7 months ago

Reflecting on your comment I realized that using a huge amount of GPUs is akin to an Turing machine approaching infinite speed. So I think the promise of LLMs writing code is basically saying: if we add a huge number of reading/writing heads with unbounded number of rules, we can solve decideability. Because what is the ability to generate arbitrarily complex code if not solving the halting problem? Maybe there's a more elegant or logical way to postulate this, or maybe I'm just confused or plain wrong, but it seems to me that it is impossible to generate a program that is guaranteed to terminate unless you can solve decideability. And throwing GPUs at a huge tape is just saying that the tape approaches infinite size and the Turing machine approaches infinite speed...
Or put another way, isn't the promise of software that is capable to generate any software given a natural language description in finite time basically assuming P=NP? Because unless the time can be guaranteed to be finite, throwing GPU farms and memory at this most general problem (isn't the promise of using software to generating arbitrary software the same as the promise that any possible problem can be solved in polynomial time?) is not guaranteed to solve it in finite time.

cyprx 7 months ago

I had been using Cursor for a month until a day when my house got no internet, then i realized that i started forgetting how to write code properly

risyachka 7 months ago

I had the exact same experience, pretty sure this happens in most cases, people just don’t realize it
ant6n 7 months ago

Just get a Mac Studio with 512GB RAM and run a local model when the internet is down.
- jjude 7 months ago
  
  Which local model would you recommend that comes close to cursor in response quality? I have tried deepseek, mistral, and few others. None comes close to quality of cursor. I keep coming back to it.
  - ant6n 7 months ago
    
    Possibly useful comment on local models, perhaps also fitting on machines with less ram:
    https://news.ycombinator.com/item?id=43340989
  - itsabackupplan 7 months ago
    
    [flagged]
    
    walthamstow 7 months ago
    
    A $10k backup plan? That makes sense. No wonder you used a throwaway.
    
    itsabackupplan 7 months ago
    
    [flagged]
    
    walthamstow 7 months ago
    
    The hardware isn't free. Someone asks a question and your answer is who cares about $10k of hardware hanging around as a subpar backup?
    
    itsabackupplan 7 months ago
    
    See, the thing is, I never wanted to comment on the cost or feasibility of the hardware at all. What I was commenting was that any backup plan is expected to be subpar by very nature, and if not, shouldnbe instantly promoted. If you'll notice that was 100% of what I said. I was adding to the pile of "this plan is stupid". Cursor has an actual value proposition.
    Of course then you disrespected me with a rude ad hominem and got a rude response back. Ignoring the point and attacking the persin is a concession. M
    For the record, I and many others use throwaways wvery single thread. This isn't and shouldn't be reddit.
    
    walthamstow 7 months ago
    
    You're right, I shouldn't have said the throwaway bit, sorry. However, you're ignoring the context of the conversation, which is a $10k piece of hardware. I don't know what you expected to add to the conversation by saying "who cares?" when someone asks for advice, in context or even in isolation.
    
    grimgrin 7 months ago
    
    wrong. where the user is asking for recommended models (for offline use), they’re not saying “yes in fact I will burn $10000 on a computer”, not at all lol
- _puk 7 months ago
  
  Back up your $20 a month subscription with a $2000 Mac Studio for those days your internet is down.
  Peak HN.
  - automatic6131 7 months ago
    
    Lol he suggested a $10k Mac Studio
    But you can at least resell that $10k Mac Studio, theoretically.
    
    mettamage 7 months ago
    
    Trying to do that with an M1 laptop of 32 GB and it's hard to get even 1000 euro's for it in the Netherlands whereas the refurbished price is at double of that.
  - yohannesk 7 months ago
    
    Even more absurd is that Mac Studio with 512GB RAM costs around $9.5K
    
    ant6n 7 months ago
    
    > Peak HN.
    But, alas, not a single upvote.
  - timothygold 7 months ago
    
    Maybe this "backup" solution.. developed into commodity hardware as an affordable open source solution that keeps the model and code locally and private at all times is the actual solution we need.
    Lets say a cluster of raspberry pi's / low powered devices producing results as good as claude 3.7-sonnet. Would it be completely infeasible to create a custom model that is trained on your own code base and might not be a fully fledged LLM but provides similar features to cursor?
    Have we all gone bonkers sending our code to third parties? The code is the thing you want to keep secret unless your working on an open source project.
  - rullopat 7 months ago
    
    2000$? You wish!
    
    _puk 7 months ago
    
    Lol, not sure where I got the 2k from. Brain fart, but I'll let it stand :D
- pknerd 7 months ago
  
  Can one run cursor with local LLMs only?
- SCdF 7 months ago
  
  ... to make *completely* sure that they forgot how to program?
- eadmund 7 months ago
  
  But then I’d be using a Mac, and that would slow my development down and be generally miserable.
  - throwaway314155 7 months ago
    
    lol
vrnvu 7 months ago

Me too. I completely forgot the standard library and basic syntax of my daily language. wow. I went back to VSCode and use Cursor for the AI model to ask questions.

jillesvangurp 7 months ago

The UX of tools like these is largely constrained by how good they are with constructing a complete context of what you are trying to do. Micromanaging context can be frustrating.

I played with aider a few days ago. Pretty frustrating experience. It kept telling me to "add files" that are in the damn directory that I opened it in. "Add them yourself" was my response. Didn't work; it couldn't do it somehow. Probably once you dial that in, it starts working better. But I had a rough time with it creating commits with broken code, not picking up manual file changes, etc. It all felt a bit flaky and brittle. Half the problem seems to be simple cache coherence issues and me having to tell it things that it should be figuring out by itself.

The model quality seems less important than the plumbing to get the full context to the AI. And since large context windows are expensive, a lot of these tools are cutting corners all the time.

I think that's a short term problem. Not cutting those corners is valuable enough that a logical end state is tools that don't do that that cost a bit more. Just load the whole project. Yes it will make every question cost 2-3$ or something like that. That's expensive now but if it drops by 20x we won't care.

Basically large models that support huge context windows of millions/tens of millions of tokens cost something like the price of a small car and use a lot of energy. That's OK. Lots of people own small cars. Because they are kind of useful. AIs that have a complete, detailed context of all your code, requirements, intentions, etc. will be able to do a much better job that one that has to guess all of that from a few lines of text. That would be useful. And valuable to a lot of people.

Nvidia is rich because they have insane margins on their GPUs. They cost a fraction of what they sell them for. That means that price will crash over time. So, I'm optimistic that a lot of this stuff will improve rapidly.

mbeex 7 months ago

> aider [...] It kept telling me to "add files" that are in the damn directory that I opened it in.
That's intentional, and I like it. It limits the context dynamically to what is necessary (of course it makes mistakes). You can also add files with placeholders and in a number of other ways. but most of the time I let Aider decide. It has a repomap (https://aider.chat/docs/repomap.html), gradually building up knowledge and makes proposals based on this and other information it gathered also with token costs and out-of-context-window in mind.
As for manual changes: aider is opinionated regarding the role of Git in your workflow. At first glance, this repels some people and some stick to this opinion. For others, it is exactly one of the advantages, especially in combination with the shell-like nature of the tool. But the standard Git handling can still be overridden. For me personally, the default behavior becomes more and more smooth and second nature. And the whole thing is scriptable, I only begin to use the possibilities.
In general: Tools have to be learned, impatient one-shot attempts are simply not enough anymore.
jampekka 7 months ago

> Nvidia is rich because they have insane margins on their GPUs. They cost a fraction of what they sell them for. That means that price will crash over time. So, I'm optimistic that a lot of this stuff will improve rapidly.
OTOH currently the LLM companies are probably taking a financial loss with each token. Wouldn't be surprised if the price doesn't even cover the electricity used in some cases.
Also e.g. Gemini already runs on Google's custom hardware, skipping the Nvidia tax.
_heimdall 7 months ago

> Nvidia is rich because they have insane margins on their GPUs. They cost a fraction of what they sell them for. That means that price will crash over time. So, I'm optimistic that a lot of this stuff will improve rapidly.
That still leaves us with an ungodly amount of resources used both to build the GPUs and to run them for a few years before having to replace them with even more GPUs.
Its pretty amazing to me how quickly the big tech companies pivoted from making promises to "go green" to buying as many GPUs as possible to burn through entire powerplants worth of electricity.
_1tem 7 months ago

Try Claude Code. It figures out context by itself. I’m having a lot of success with it for a few days now, whereas I never caught on with Cursor due to the context problem.
- _--__--__ 7 months ago
  
  I have not tried Claude Code, but besides the model lock-in the number one complaint I have heard is that it consistently over provides context leading to high token usage.
  - _1tem 7 months ago
    
    I don't care about model lock-in as long it actually gets the job done. Claude Code is the only AI solution I've tried that can actually make deep, meaningful changes across frontend and backend on a mature enterprise codebase.
    
    danielbln 7 months ago
    
    I would put Cline in the same level of tools, with the added benefit of deep VSCode integration.

2sk21 7 months ago

I read this point in the article with bafflement:

"Learn when a problem is best solved manually."

Sure, but how? This is like the vacuous advice for investors: buy low and sell high

dkersten 7 months ago

By trying things and seeing what it’s good and bad at. For example, I no longer let it make data modelling decisions (both for client local data and database schemas), because it had a habit of coding itself into holes it had trouble getting back out of, eg duplicating data that it then has difficulty keeping in sync, where a better model from the start might have been a more normalised structure.
But I came to this conclusion by first letting it try to do everything and observing where it fell down.

Amekedl 7 months ago

Compounding the opinions of other commentors, I feel that using Cursor is a bad idea. It's a closed source SaaS, and with these components involved, service quality can do wild swings on a daily basis, not something I'm particularly keen of.

Aurornis 7 months ago

AI tools aren’t all or nothing. You use them when it makes sense and you go back to regular coding when they don’t (or when they’re unavailable).
You’re also not limited to a single tool. You can switch to different tools and even have multiple editors open at the same time.
turnsout 7 months ago

There's always Aider with local models!
rco8786 7 months ago

This is true of every single service provider outside of fully OSS solutions, which are a teeny tiny fraction of the world's service providers.

blainm 7 months ago

I've found tools like Cursor useful for prototyping and MVP development. However, as the codebase grows, they struggle. It's likely due to larger files or an increased number of them filling up the context window, leading to coherence issues. What once gave you a speed boost now starts to work against you. In such cases, manually selecting relevant files or snippets from them yields better results, but at that point it's not much different from using the web interface to something like Claude.

Semaphor 7 months ago

I had that same experience with Claude Code. I tried to do a 95% "Idle Development RPG" approach to developing a music release organization software. At the beginning, I was really impressed, but with more and more complexity, it becomes increasingly incoherent, forgetting about approaches and patterns used elsewhere and reinventing the wheel, often badly.
turnsout 7 months ago

Agreed. One useful tip is to have Cursor break up large files into smaller files. For some reason, the model doesn't do this naturally. I've had several Cursor experiments grow into 3000+ line files because it just keeps adding.
Once the codebase is reasonably structured, it's much better at picking which files it needs to read in.
blitzar 7 months ago

Or the context not being large enough for all the obscure functions and files to go into the context. I am too basic to have dug deep enough, but a simple (automatic) documentation context for the entire project would certainly improve things for me.

kevingadd 7 months ago

> Like mine will keep forgetting about nullish coallescing (??) in JS, and even after I fix it up it will revert my change in its future changes. So of course I put that rule in and it won't happen again.

I'm surprised that this sort of pattern - you fix a bug and the AI undoes your fix - is common enough for the author to call it out. I would have assumed the model wouldn't be aggressively editing existing working code like that.

worldsayshi 7 months ago

Yeah I have seen this a bunch of times as well. Especially with deprecated function calls. It generates a bunch of code. I get deprecation warnings. I fix them. Copilot fixes them back. I have to explicitly point out that I made the change for it to not reintroduce the deprecations.
I guess that while code that compiles is easier to train for but code with warnings less so?
I remember there are other examples of changes that I have to tell the AI I made to not have it change it back again, but can't remember any specific examples.
Aeolun 7 months ago

It’s due to a problem with Cursor not updating the state of the files that have been manually edited since the last time they were used in the chat, so it’ll thing the fix is not there and blindly output code that doesn’t have it. The ‘apply’ model is dumb, so it just overwrites the corrected version with the wrong one.
I think the changelog said they fixed it in 0.46, but that’s clearly not the case.
- oefrha 7 months ago
  
  Yep I asked about this exact problem the other day: https://news.ycombinator.com/item?id=43308153 Having something like “always read the current version of the file before suggesting edits” in Cursor rules doesn’t help, the current file is only read by the agent sometimes. Guess no one has a reliable solution yet.
siquick 7 months ago

Cursor in agent mode + Sonnet 3.7 love nothing better than rewriting half your codebase to fix one small bug in a component.
I've stopped using agent unless its for a POC where I just want to test an assumption. Applying each step takes a bit more time but means less rogue behaviour and better long term results IME.
- krets 7 months ago
  
  Sounds like a human colleague of mine
- worldsayshi 7 months ago
  
  > love nothing better than rewriting half your codebase to fix one small bug in a component
  Relatable though.
  - MarcelOlsz 7 months ago
    
    Reminds me of my old co-worker who rewrote our code to be 10x faster but 100x more unreadable. AI agent code is often the worst of both of those worlds. I'm going to give [0] this guy's strategy a shot.
    [0] https://www.youtube.com/watch?v=jEhvwYkI-og
- WA 7 months ago
  
  If you stopped using agent mode, why use Cursor at all and not a simple plugin for VSCode? Or is there something else that Cursor can do, but a VSCode plugin can't?

DeathArrow 7 months ago

Apart from the fact that it chews fast requests like there's no tomorrow I dislike how it does changes I didn't ask to. And if I ask to undo what it did without being asked, it goes on and beaks more code.

In my test application I had a service which checked the cache, then asked the repository if no data is in cache, then uses external APIs to fetch some data, combine it and update the DB and the cache.

I asked Cursor to change using DateTime type to using Unix timestamp. It did the changes but it also removed cache checks and calling external APIs, so my web app relied just on the data in DB. When asked to add back what it removed, it broke functionality in other parts of the application.

And that is with a small simple app.

torginus 7 months ago

I have been a religious Cursor + Sonnet user for like past half a year, and maybe I'm an idiot, but I don't like this agentic workflow at all.

What worked for me is having it generate functions, classes, ranging from tens of lines of code to low hundreds. That way I could quickly interate on its output and check if its actually what I wanted.

It created a prompt-check-prompt iterative workflow where I could make progress quite fast and be reasonably certain of getting what I wanted. Sometimes it required fiddling with manually including files in the context, but that was a sacrifice I was willing to make and if I messed up, I could quickly try again.

With these agentic workflows, and thinking models I'm at a loss.

To take advantage of them, you need very long and detailed prompts, they take a long time to generate and drop huge chunks of code on your head. What it generates is usually wrong due to the combination of sloppy or ambiguous requirements by me, model weaknesses, and agent issues. So I need to take a good chunk of time to actually understand what it made, and fix it.

The iteration time is longer, I have less control over what it's doing, which means I spend many minutes of crafting elaborate prompts, reading the convoluted and large output, figuring out what's wrong with it, either fixing it by hand, or modifying my prompt, rinse and repeat.

TLDR: Agents and reasoning models generate 10x as much code, that you have to spend 10x time reviewing and 10x as much time crafting a good prompt.

In theory it would come out as a wash, in practice, it's worse since the super-productive tight AI iteration cycle is gone.

Overall I haven't found these thinking models to be that good for coding, other than the initial project setup and scaffolding.

timrichard 7 months ago

I think you’re absolutely right and I’ve come to the same conclusion and workflow.
I work on one file at a time in Ask mode, not Composer/Agent. Review every change, and insist on revisions for anything that seems off. Stay in control of the process, and write manually whenever it would be quicker. I won’t accept code I don’t understand, so when exploring new domains I’ll go back with as many questions as necessary to get into the details.
I think Cursor started off this way as a productivity tool for developers, but a lot of Composer/Agent features were added along the way as it became very popular with Vibe Coders. There are inherent risks with non-coders copypasting a load of code they don’t understand, so I see this use case as okay for disposable software, or perhaps UI concept prototypes. But for things that matter and need to be maintained, I think your approach is spot on.
- _heimdall 7 months ago
  
  Have you found that this still saves you time overall? Or do you spent a similar amount of time acting as a code reviewer rather than coding it yourself?
  - timrichard 7 months ago
    
    Yes, I think so. Often it doesn’t take much more than a glance for simpler edits.
theshrike79 7 months ago

Do you have any Cursor rules defined? Those tend to control its habit of trying to go off the rails and solve 42 problems at once instead of just the one.

yard2010 7 months ago

How can I stop Cursor from sending .env files with secrets as plain text? Nothing I tried from the docs works.

M4v3R 7 months ago

This is a huge issue that was already raised on their forums and it's very surprising they didn't address it yet.
[0] https://forum.cursor.com/t/environment-secrets-and-code-secu...
- timrichard 7 months ago
  
  I have been adding .env files to .cursorignore so far.
  I can see from that thread that the approach hasn’t been perfect, but it seems that the last two releases have tried to address that :
  “0.46.x : .cursorignore now blocks files from being added in chat or sent up for tab completions, in addition to ignoring them from indexing.”
- mexicocitinluez 7 months ago
  
  lol "move fast and break stuff....like really, really break stuff. i mean, break it so bad you'll probably cause people to lose their jobs and livelihoods"

rhodescolossus 7 months ago

I've tried Cursor a couple of times but my complain is always the same: why forking VS Code when all this functionality could just be an extension, same as Copilot does?

Some VSCode extensions don't work, you need to redo all your configuration, add all your workspaces... and the gain vs Copilot is not that high

andrewl-hn 7 months ago

> why forking VS Code when all this functionality could just be an extension, same as Copilot does?
Have you programmed extensions for VSCode before? While it seems like a fairy extensible system overall, the editor component in particular is very restrictive. You can add text (that's what extensions like ErrorLens and GitLens are doing), inlay hints, and on-hover popup overlays (those can only trigger on words, and not on punctuation). What Cursor does: the automatic diff-like views of AI suggestions with graphic outlines, floating buttons, and whatnot right on top of the text editing view - is not possible in vanilla VSCode.
This was originally driven by necessity of tighter control over editor performance. In its early days VSCode was competing with Atom - another extensible JS-powered editor from GitHub, and while Atom had an early lead due to larger extensions catalog VSCode ultimately won the race because they manged to maintain lower latency of their text editor component. Nowadays they still don't want to introduce extra extension points to it, because newer faster editors pop out all the time, too.
- satvikpendem 7 months ago
  
  Now the Atom (and Electron, hence the name) creators are working Zed, which is even faster than VSCode, albeit not as extensible.
frereubu 7 months ago

> and the gain vs Copilot is not that high
I think that's (at least part of) your answer. More friction to move back from an entirely separate app rather than disabling an extension.

DaveMcMartin 7 months ago

For those of you who, like me, use Neovim, you can achieve "cursor at home" by using a plugin like Avante.nvim or CodeCompanion. You can configure it to suit your preferences.

Just sharing this because I think some might find it useful.

timothygold 7 months ago

I made a quick prototype to demonstrate what I think A.I code assistance should be..

https://github.com/hibernatus-hacker/ai-hedgehog

This is a simple code assistant that doesn't get in your way and makes sure you are coding (not losing your ability to program).

You configure a replicate API token from replicate... install the tool and point it at your code base.

When you save a file it asks the LLM for advise and feedback on the file as a "senior developer".

Run this along side your favorite editor to get feedback from an LLM as your working on (open source code nothing you don't want third parties to see).

You are still programming and using your brain but you have some feedback when you save files.

The feedback is less computationally expensive or fraught with difficulty than actually getting code from LLM's so it should work with much less powerful models.

It would be nice if there was a search built in so it could search for useful documentation for you.

9dev 7 months ago

It sounds very interesting, I would suggest adding some kind of demo or video on the readme; I think i know how it works, but it would be nice to see Hedgehog in action.
I’m going to try it tomorrow!

gregwebs 7 months ago

AI blows me away when asked to write greenfield code. It can get a complex task using hundreds of lines of code right on the first try or perhaps it needs a second try on the prompt and an additional tweak of the output code.

As things move from prototype to production ready the productivity starts to become a wash for me.

AI doesn’t do a good job organizing the code and keeping it DRY. Then it’s not easy for it to make those refactorings later. AI is good at writing code that isn’t inherently defective but if there is complexity in the code it will introduce bugs in its changes.

I use Continue for small additions and tab completions and Claude for large changes. The tab completions are a small productivity boost.

Nice to see these tips- I will start experimenting with prompts to produce better code.

mrlowlevel 7 months ago

Do any of these tools use the rich information from the AST to pull in context? Coupled with semantic search for entry points into the AST, it feels like you could do a lot…

zarathustreal 7 months ago

Don’t they all do this? Surely they’re not just doing naive text, n-gram, regex, embeddings, etc, right?
- marxplank 7 months ago
  
  they are just a language model generating text which happens to be code, ASTs are not used afaik

stared 7 months ago

Other useful things I've discovered:

- Push for DRY principles ("make code concise," "ensure good design").

- Swap models strategically; sometimes it's beneficial to design with one model and implement with another. For example, use DeepSeek R1 for planning and Claude 3.5 (or 3.7) for execution. GPT-4.5 excels at solving complex problems that other models struggle with, but it's expensive. - Insist on proper typing; clear, well-typed code improves autocompletion and static analysis.

- Certain models, particularly Claude 3.7, overly favor nested conditionals and defensive programming. They frequently introduce nullable arguments or union types unnecessarily. To mitigate this, keep function signatures as simple and clean as possible, and validate inputs once at the entry point rather than repeatedly in deeper layers.

- Emphasize proper exception handling. Some models (again, notably Claude 3.7) have a habit of wrapping everything in extensive try/catch blocks, resulting in nested and hard-to-debug code reminiscent of legacy JavaScript, where undefined values silently pass through multiple abstraction layers. Allowing code to fail explicitly is a blessing for debugging purposes; masking errors is like replacing a fuse with a nail.

stared 7 months ago

Some additional thoughts on GPT-4.5: it provides BFK-9k experience - eats e̶n̶e̶r̶g̶y̶ ̶c̶e̶l̶l̶s̶ budget ($2 per call!) like there is no tomorrow, but removes bugs with a blast.
In my experience, the gap between Claude 3.7 and GPT-4.5 is substantial. Claude 3.7 behaves like an overzealous intern on stimulants. It delivers results but often includes unwanted code changes, resulting in spaghetti code with deeply nested conditionals and redundant null checks. Although initial results might appear functional, the resulting technical debt makes subsequent modifications increasingly difficult, often leaving the codebase in disarray. GPT-4.5 behaves more like a mid-level developer, thoughtfully applying good programming patterns.
Unfortunately, the cost difference is significant. For practical purposes, I typically combine models. GPT-4.5 is generally reserved for planning, complex bug fixes, and code refinement or refactoring.
In my experience, GPT-4.5 consistently outperforms thinking models like o1. Occasionally, I'll use o3-mini or DeepSeek R1, but GPT-4.5 tends to be noticeably superior (at least, on average). Of course, effectiveness depends heavily on prompts and specific problems. GPT-4.5 often possesses direct knowledge about particular libraries (even without web searching), whereas o3-mini frequently struggles without additional context.
- kikimora 7 months ago
  
  Wouldn’t it be easier instead of juggling with models and their quirks to just write the code the old way?
  - stared 7 months ago
    
    Depends.
    Sometimes I could solve in 15 mins, a bug I had been chasing for days. In other cases, it is simpler to write codes by hand - as AI either does not solve a problem (even a simple one), or does, but at a cost of tech debt - or it takes longer than doing things manually.
    AI is just one more tool in our arsenal. It is up to us to decide when to use them. Just because we have a hammer does not mean we need to use it for screws.
    > Wouldn’t it be easier instead of juggling with [something] and their quirks to just write the code the old way?
    This phrase, when taken religiously, would keep us writing purely in assembly - as there is always "why this new language", "why this framework", "why LLMs".
    
    kikimora 7 months ago
    
    Most tools we have been using are not probabilistic. I use syntax highlighting, autocompletion, refactoring because they work close to 100% of the time. So my question really is if using a tool that works some % of the time is worth it.

mattwad 7 months ago

Everyone posting should be considerate of your language/stack. It's pretty likely that Cursor doesn't work equally for every language. I'm working on a Next.js/Typescript/Solidity monorepo with multiple apps and packages and it handles pretty much anything I throw at it. I know I can squeeze more out because I have only been really using it heavily for the past month or so.

atoav 7 months ago

I have been using AI coding for a while now and the issues I keep having is the following:

- LLM keeps forgetting/omitting parts of the code

- LLM keeps changing unrelated parts of the code

- LLM does not output correctly typed code (with Rust this can feel like throwing mud at a wall and see what sticks, in the end you're faster on your own)

- LLM flip-flops back and forth between two equally wrong answers when asked about a particularly (from the perspective of the LLM) to answer problem

In the end the main thing any AI coding tool will have to solve, is how to get the human in front of the LLM to trust that the output does what it does without breaking other things.

But of course LLMs are already crazy good at whst they do. I just wonder how people who have no idea what they are doing will be able to handle that power.

fifticon 7 months ago

I've had colleagues like that since the 1990's :-) :-( But yes, we would like AIs to be better than Kevin..

ookblah 7 months ago

parts of the article are spot on. after the magic has worn off i find it's best to literally treat it like another person. would you blindly merge code from someone else or huge swaths of features? no. i have to review every single piece of code, because later on when there's a bug or new feature you have to have that understanding.

another huge thing for me has been to scaffold a complex feature just to see what it would do. just start out with literal garbage and an idea and as long as it works you can start to see if something is going to pan out or not. then tear it down and do it again with those new assumptions you learned. keep doing it until you have a clear direction.

or sometimes my brain just needs to take a break and i'll work on boilerplate stuff that i've been meaning to do or small refactors.

divan 7 months ago

How does the current state of Cursor agentic workflow compare to Windsurf Editor?

I've been using Windsurf since it was released, and back then, it was so ahead of Cursor it's not even funny. Windsurf feels like it's trained on good programming practices (check usage of the function in other parts of the project for consistency, double checking for errors after changes made, etc). It's also surprisingly fast (it can "search" the 5k files codebase in, like, 2 seconds. It even asked me once to copy and paste output from Chrome DevTools because it suspected that my interpretation of the result was not accurate (and it was right).

The only thing I truly wish is to have the same experience with locally running models. Perhaps Mac Studio 512GB will deliver :)

jjude 7 months ago

I too liked windsurf better than cursor until ...
I asked it to refactor an authenticatedfetch block of code. It went on a loop exhausting 15 credits (https://bsky.app/profile/jjude.com/post/3ljuhrxs3442k).
- divan 7 months ago
  
  To be honest, I switched from Cursor to Windsurf precisely because of how much less of a credits it uses. Even using daily, I couldn't even remotely hit the limits of the credits in Windsurf. Well, initially they didn't even show how many credits I'm using :), now it's more visible, but still for 10$ per month I still can't hit the limits and I'm not restricting myself (not abusing either).
- throwaway314155 7 months ago
  
  How much does a credit cost in USD?
  - divan 7 months ago
    
    It's a fixed price of 10$ per month with 500/1500 credits for "premium models" (claude/etc), and unlimited for their own base model.

austin-cheney 7 months ago

With so many caveats, so many exceptions, so many rules, so much manual validation, and so little trust why bother? It sounds like an employee pending termination.

hakaneskici 7 months ago

What is your mental model when coding with Cursor?

When I code with with AI assistance; I "think" differently and noticed that I have more memory bandwidth to think about the big picture rather than the details.

With AI assistance, I can keep the entire program logic in my head; otherwise I have to do expensive context switching between the main components of the program/system.

How are you "thinking" when typing prompts vs typing actual code?

trash_cat 7 months ago

> And then at the top of the file, just write some text about what the project is about. If you have a particular file structure and way of organising code that is great to put in as well.

By asking the AI to generate a context.md file, you get an automatically structured overview of the project, including its purpose, file organization, and key components. This makes it easier to onboard new contributors, including other LLMs.

factsaresacred 7 months ago

Too bad they removed the ability to use Chat (rebranded as Ask) with your own API keys in version 0.47. Now every feature requires a subscription.

Natural for Cursor to nudge users towards their paid plans, but why provide the ability to use your own API keys in the first place if you're going to make them useless later?

siva7 7 months ago

Oh no didn't realise that. I had a high opinion of this team but it starts changing now...

HugoDias 7 months ago

I saw this post on the first page a few minutes ago (published 5 hours ago), but it quickly dropped to the 5th page. Given its comments and points, that seems odd. I had to search to find it again. Any idea why?

flippyhead 7 months ago

Note that the latest update (0.47.x) made this useful change:

Rules: Allow nested .cursor/rules directories and improved UX to make it clearer when rules are being applied.

This has made things a lot easier in my monorepos.

ThinkBeat 7 months ago

What programming languages do you primarily use ? I feel that knowing what programming languages a llm is best at is valuable but often not directly apparent.

TheAnkurTyagi 7 months ago

nice also you can use project-specific structure and markdown files to ensure the AI organizes content correctly for your use case. we are using it on 800k lines of golang and it works well. https://getstream.io/blog/cursor-ai-large-projects/

Jayakumark 7 months ago

Will you be able to share those ai_*.md files ?

pestkranker 7 months ago

Is there an equivalent to cursorrules and copilot-instructions for the Jetbrains IDEs (Rider) + GitHub Copilot extension?

askonomm 7 months ago

So, if I liked being a manager more than a developer, I'd use Cursor, and lean in entirely on AI?

lonelyasacloud 7 months ago

Yes; it can be used in agentic mode and along with the joys it also has a few of the frustrations that would be familiar if have managed human devs.
TingPing 7 months ago

If you don’t understand what it outputs then it’s just random garbage.

ZeroTalent 7 months ago

Cursor is not SOTA. It's just popular. Look into alternatives. CLine, Augment Code, etc.

eleumik 7 months ago

How much is the price of a kilogram of code in the us ?

flipgimble 7 months ago

Cursor overwrites the “code” command line shortcut/alias that’s normally set by VS Code. It does this on every update with no setting to disable this behavior. There are numbers of forum threads asking about manual solutions. This seems like a deliberately anti-user feature meant to get their usage numbers up at all costs. This small thing makes me not trust the decision making process at Cursor won’t sell me out as a user.

throwaway314155 7 months ago

This is the primary reason I uninstalled Cursor and subsequently realized that, hey, VS Code has most of these features now.
What in the hell were they thinking?!

dimgl 7 months ago

The new Cursor update (0.47) is cursed. They got rid of codebase searching (WTF?) and the agent is noticeably worse, even when using Sonnet 3.5.

I'm really shocked, actually. This might push me to look at competitors.

timothygold 7 months ago

I tried cursor for a day or two and then asked for a refund... here's why:

* It has terrible support for Elixir (my fav language) because the models are only really trained on python.

* Terrible clunky interface... it would be nice if you didn't have to click around, do modifier ctrl + Y stuff ALL the time.

* The code generated is still riddled with errors or naff (apart from boiler plate)... so I am still * prompt engineering * the crap out of it.. which I'm good at but I can prompt engineer using phind.com...

* The fact that the code is largely broken first time and they still haven't really fixed the context window problem means you have to copy paste error codes back into it.. defeating the purpose of an in integrated IDE imo.

* The free demo mode stops working after generating one function... if I had been given more time to evaluate it fully I would never have signed up. I signed up to see if it was any good.. which it isn't.

quotz 7 months ago

Cline is much better

cheema33 7 months ago

Why not list the reasons why you like it over Cursor?
- quotz 7 months ago
  
  I was in a hurry and hoped others would continue the thread. Cline is open source, pay as you go, using your own Openrouter API keys. You can use any model you like. Its more expensive than Cursor of course, depending on usage and models.
  - omnizone 7 months ago
    
    What are you looking to build with cline?

kobe_bryant 7 months ago

I was trying to figure out what he does and his website proudly states at the very top “ No templates, no no-code, no AI slop - just great sites built to grow.”. interesting!

DiabloD3 7 months ago

I'm sorry, but isn't Cursor just an editor? Maybe an editor shouldn't actually have garbage parts to avoid?

Why not just use an editor that is focused on coding, and then just not use an LLM at all? Less fighting the tooling, more getting your job done with less long term landmines.

There are a lot of editors, and many of them even have native or semi-native LLM support now. Pick one.

Edit: Also, side note, why are so many people running their LLMs in the cloud? All the cutting edge models are open weight licensed, and run locally. You don't need to depend on some corporation that will inevitably rug-pull you.

Like, a 7900XTX runs you about $1000. You probably already own a GPU that cost more in your gaming rig.

krapht 7 months ago

> Edit: Also, side note, why are so many people running their LLMs in the cloud? All the cutting edge models are open weight licensed, and run locally. You don't need to depend on some corporation that will inevitably rug-pull you.
???
Deepseek R1 doesn't run locally unless you program on a dual socket server with 1 TB of RAM. Or enough cash to have a cabinet of GPUs. The trend for state-of-the-art LLMs is to get bigger over time, not smaller.
Look, I've played with llava and llama locally too, but the benchmarked performance is nowhere near what you can get from the larger cloud providers who can serve hundred-million+ parameter models without quantization.
- DiabloD3 7 months ago
  
  You wouldn't use full fledged R1 for coding. There are distilled models using R1 for coding that get you most of the way there. R1 also doesn't take 1TB of RAM, go use read Unsloth's writeup on how to reduce model size without reducing quality (they got it to fit into 131GB): https://unsloth.ai/blog/deepseekr1-dynamic tl;dr parameter count is where the statistical model lives or dies, not weight precision; you can't blindly shrink every weight, and tooling is learning how to not butcher models.
  Also, performance between cloud-ran models and models I've ran locally with llama.cpp seem to be actually pretty similar. Are you sure your model didn't fit into your VRAM, or something else may have been misconfigured? Not fitting into VRAM slows everything to a halt. All the coder models that are worth looking at fit into 24GB cards in their full sized variants with the right quantization.
  - satvikpendem 7 months ago
    
    Distilled "DeepSeek" models are not actually DeepSeek and should not be referred to as such.
    
    DiabloD3 7 months ago
    
    No one said they were. They're distilled using the original model and the same weights that match the ones in R1. Its ostensibly the original, but better.
    There are also fused models such as https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Code... that also seem to perform interestingly.
    
    satvikpendem 7 months ago
    
    They are not better, they are strictly worse in every way, and the performance characteristics show such degradation. There is a difference between distilling and quantizing, as while the latter does show some degradation too, it is not to the extent of distilled models and at least it's still the original model.
    
    DiabloD3 7 months ago
    
    Depends how you define "worse in every way". With my own personal testing, DeepSeek's distillations have been able to do the task when the original upstream model either couldn't, or was marginally worse yet.
    You're preaching to the anti-choir on this, though: I do not think LLMs are ready for use yet. Maybe another few years, maybe another few decades, we'll find out I guess, but what we have today sure as hell isn't it.
zild3d 7 months ago

> All the cutting edge models are open weight licensed, and run locally.
No? from https://lmarena.ai/ coding:
...
Rank* ... Model ... Score ... Org ... License
1 ... Grok-3-Preview-02-24 ... 1414 ... xAI ... Proprietary
1 ... GPT-4.5-Preview ... 1413 ... OpenAI ... Proprietary
3 ... Gemini-2.0-Pro-Exp-02-05 ... 1378 ... Google ... Proprietary
3 ... o3-mini-high ... 1369 ... OpenAI ... Proprietary
3 ... DeepSeek-R1 ... 1369 ... DeepSeek ... MIT
3 ... ChatGPT-4o-latest (2025-01-29) ... 1367 ... OpenAI ... Proprietary
3 ... Gemini-2.0-Flash-Thinking-Exp ... 1366 ... Google ... Proprietary
3 ... o1-2024-12-17 ... 1359 ... OpenAI ... Proprietary
3 ... o3-mini ... 1353 ... OpenAI ... Proprietary
4 ... o1-preview ... 1355 ... OpenAI ... Proprietary
4 ... Gemini-2.0-Flash-001 ... 1354 ... Google ... Proprietary
4 ... o1-mini ... 1353 ... OpenAI ... Proprietary
4 ... Claude 3.7 Sonnet ... 1350 ... Anthropic ... Proprietary
- DiabloD3 7 months ago
  
  Yes, I'm aware of various rankings. Try all of those models on something that isn't commonly used on a benchmark, and you'll notice that a lot of the proprietary models have trouble actually producing statistically relevant results.
  The only one that I've come across that makes me think LLMs will maybe be useful someday is Deepseek R1 and the redistillations based on it.
  I've seen HN's fascination with OpenAI's products, and I can't understand why. Even O1 and O3, they're always too little too late, somebody else already is doing something better and throwing it into a HF repo. Must be the Silicon Valley RDF at work.

hannah_creator 7 months ago

Very useful!!!

T3RMINATED 7 months ago

[dead]

r_singh 7 months ago

Just use Cline, it beats Cursor hollow — saves me like hours per day