Oh, don't get me started. LOL
SonarQube's quality gate (rightfully) won't pass two of my CI pipelines because we're blocking on Task.Wait, which is discouraged in async code. Fine. So I tell our in-house, sandboxed LLM to rewrite that code using await instead.
Sure, it knocked out the immediate function or method call, but what about all the async parent classes above it? Can you handle static, non-instantiated classes? What about the calling methods?
Brain freeze.
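For anyone who hasn't hit this: the trouble is that `await` is contagious. The sketch below uses Python's asyncio as a stand-in for the C#/.NET code (the names `load_value`, `get_value`, and `handler` are hypothetical), but the shape is the same. Once the leaf awaits instead of blocking, every caller up the stack has to become async too, until you reach one entry point that starts the event loop.

```python
import asyncio

# Hypothetical leaf coroutine standing in for the original async method.
async def load_value() -> int:
    await asyncio.sleep(0)  # placeholder for real I/O
    return 42

# Before: the caller blocks until the coroutine finishes, the rough
# analogue of Task.Wait in C#. Like Task.Wait, it misbehaves inside
# async code: asyncio.run raises RuntimeError if a loop is already running.
def get_value_blocking() -> int:
    return asyncio.run(load_value())

# After: the caller awaits instead, which forces it to be async too...
async def get_value() -> int:
    return await load_value()

# ...and so does its caller, all the way up the call stack.
async def handler() -> int:
    return await get_value() + 1

def main() -> int:
    # Only the outermost entry point starts the event loop.
    return asyncio.run(handler())

if __name__ == "__main__":
    print(get_value_blocking(), main())
```

That ripple through every caller is exactly what a one-method-at-a-time LLM prompt tends to miss.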
I'm not QUITE sure why the previous developers used async anyway; there's no concurrent code that needed to run. Perhaps they were showing off, or maybe it was a demonstration of the truism, 'If all you have is a hammer, everything looks like a nail.'
Anyway, yeah, our in-house LLM, which runs on GPT-4.1, could not do it. Not for the whole call stack, anyway, and when I tried to break it up, I got AI slop.
But curiously, this entity did a far better job than Codeium/Windsurf. THAT entity simply said, "Hey, yeah, that's a tough problem. Here's how you would go about solving it."
Wait, what? THAT'S YOUR JOB, you lazy sack of 1000-parameter multidimensional vectors!
I haven't tried Claude or Copilot yet, so I can't speak to those.
The only benefit Codeium/Windsurf brags about is that it can pull our repos into its context, and as opposed to our in-house LLM (which sports a 2,500-token context window), Codeium/Windsurf brings 20,000.
It doesn't use these features very well yet, but reportedly they're there.
PS: If your codegen quits when it runs out of token space, there's a magic command: "Continue", or "Continue from this code line: (insert code line)".
Lots of online documentation is no longer accurate, so AI does help in that it can usually generate working calling code or CSS blocks. It has been very helpful for that, since online Q&A sites like Stack Overflow are a joke these days.
I can't speak to Claude, never used it. Copilot, though, I have used, and it has its own limitations.
When you enable ChatGPT with Copilot, it becomes a self-affirming mess that leverages its own result sets to validate ... its own result sets.
Look at Grok. It can generate fairly decent code for moderate tasks, and it does a decent job of dissecting code and fixing it.
You're limited to copying and pasting code into its chat window to do that; however, it does a better job than any other AI I've found so far.
YMMV of course. You're deeper into app stuff than I've been in quite some time.