ZDNET’s key takeaways
- OpenAI’s new GPT-5 flagship failed half of my programming tests.
- Previous OpenAI releases have had nearly perfect results.
- Now that OpenAI has enabled fallbacks to other LLMs, there are options.
So GPT-5 happened. It’s out. It’s released. It’s the talk of the virtual town. And it’s got some problems. I’m not gonna bury the lede. GPT-5 has failed half of my programming tests. That’s the worst that OpenAI’s flagship LLM has ever done on my carefully designed tests.
Before I get into the details, let’s take a moment to discuss one other little feature that’s also a bit wonky. Check out the new Edit button at the top of the code dumps it generates.
Clicking the Edit button takes you into a nice little code editor. Here, I replaced the Author field, right in ChatGPT’s results.
That seemed nice, but it ultimately proved futile. When I closed the editor, it asked me if I wanted to save. I did. Then this unhelpful message showed up.
I never did get back to my original session. I had to submit my original prompt again, and let GPT-5 do its work a second time.
But wait. There’s more. Let’s dig into my test results…
1. Writing a WordPress plugin
This was my very first test of coding prowess for any AI. It’s what gave me that first “the world is about to change” feeling, and it was done using GPT-3.5.
Subsequent tests, using the same prompt but with different AI models, generated mixed results. Some AIs did great, some didn’t. Some AIs, like those from Microsoft and Google, improved over time.
ChatGPT’s model has been the gold standard for this test since the very beginning. That makes the results of GPT-5 all that much more curious.
So, look, the actual coding with GPT-5 was partially successful. GPT-5 generated a single block of code, which I pasted into a file and was able to run. It provided the requisite UI.
When I pasted in the test names, it dynamically updated the line count, although it described it as “Line to randomize” instead of “Lines to randomize.”
But then, when I clicked Randomize, it didn’t. Instead, it redirected me to tools.php. What?? ChatGPT has never had a problem with this test, whether GPT-3.5, GPT-4, or GPT-4o. You mean to tell me that OpenAI’s much-anticipated GPT-5 is failing right out of the gate? Ouch.
I then gave GPT-5 this prompt.
When I click randomize, I’m taken to http://testsite.local/wp-admin/tools.php. I do not get a list of randomized results. Can you fix?
The result was a single line to patch. I’m not thrilled with that approach because it requires the user to dig through code and to make no mistakes replacing a line.
So, I asked GPT-5 for a full plugin. It gave me the full text of the plugin to copy and paste. This time, it worked.
This time, it did randomize the lines. When it encountered duplicates, it separated them from each other, as it was instructed. Finally.
I’m sorry, OpenAI. I have to fail you on this test. You would have passed if the only error was not using the plural of “line” when appropriate. But the fact that it gave me back a non-working plugin on the first try is fail territory, even if the AI did eventually make it work on the second try.
No matter how you spin it, this is a step back.
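For readers curious about what the randomizing step in this test involves, here is a minimal Python sketch of one way to shuffle a list of lines while keeping identical entries apart. It is not the plugin’s code (which is not reproduced in this article), and the function name `shuffle_separating_duplicates` is hypothetical. The sketch uses a greedy approach: it always places the most frequent remaining line that differs from the one just placed, breaking ties at random. The result is random but not uniformly distributed over all valid orderings.

```python
import random
from collections import Counter


def shuffle_separating_duplicates(lines, rng=None):
    """Return the lines in random order, keeping identical entries apart when possible."""
    rng = rng or random.Random()
    remaining = Counter(lines)  # how many copies of each line are still unplaced
    result = []
    previous = None
    while remaining:
        # Candidates are every distinct line except the one just placed.
        candidates = [line for line in remaining if line != previous]
        if not candidates:
            # Only copies of the previous line are left, so adjacency is unavoidable.
            result.extend([previous] * remaining.pop(previous))
            break
        # Place the most frequent candidate; break ties at random.
        top = max(remaining[line] for line in candidates)
        choice = rng.choice([line for line in candidates if remaining[line] == top])
        result.append(choice)
        remaining[choice] -= 1
        if remaining[choice] == 0:
            del remaining[choice]
        previous = choice
    return result


if __name__ == "__main__":
    names = ["Ann", "Bob", "Ann", "Cat", "Ann", "Dan"]
    print(shuffle_separating_duplicates(names))
```

When one entry makes up more than half the list, no ordering can separate every copy, so this sketch falls back to placing the leftover copies together at the end.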
2. Rewriting a string function
This second test is designed to rewrite a string function to better check for dollars and cents. The original code that GPT-5 was asked to rewrite did not allow for cents (it only checked for integers).
GPT-5 did fine with this test. It did return a minimal result because it didn’t do any error checking. It didn’t check for non-string input, extra whitespace, thousands separators, or currency symbols.
But that’s not what I asked for. I told it to rewrite a function, which itself did not have any error checking. GPT-5 did exactly what I asked with no embellishment. I’m kind of glad of that because it doesn’t know whether or not code prior to this routine already did that work.
GPT-5 passed this test.
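To make the distinction between a minimal check and a defensive one concrete, here is a short Python sketch of a dollars-and-cents validator in the same minimal spirit. The function under test is not reproduced in this article, so the name `is_valid_amount` and the exact rule (whole dollars, optionally followed by a decimal point and one or two cent digits) are assumptions made for illustration.

```python
import re

# Assumed rule: one or more digits, optionally followed by "." and one or two digits.
_AMOUNT_PATTERN = re.compile(r"\d+(\.\d{1,2})?")


def is_valid_amount(text):
    """Return True if text looks like a dollar amount such as "10" or "10.50".

    Deliberately minimal: no whitespace trimming, no currency symbols,
    no thousands separators, and no guard against non-string input.
    """
    return _AMOUNT_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["10", "10.5", "10.50", "10.505", " 10", "$10", "1,000"]:
        print(repr(sample), is_valid_amount(sample))
```

With this rule, `is_valid_amount("10.50")` returns `True`, while `"$10"`, `" 10"`, and `"1,000"` are all rejected. Passing a non-string such as an integer raises a `TypeError`, which is the kind of case a caller would need to handle elsewhere if the surrounding code has not already done so.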
3. Finding an annoying bug
This test came about because I was struggling with a less-than-obvious bug in my code. Without going into the weeds about how the WordPress framework works, the obvious answer is not the right answer.
You need some fairly arcane knowledge about how WordPress filters pass their information. This test has been a stumbling block for more than a few AI LLMs.
GPT-5, however, like GPT-4 and GPT-4o before it, did understand the problem. It articulated a clear solution.
GPT-5 passed this test.
4. Writing a script
This test asks the AI to incorporate a fairly obscure Mac scripting tool called Keyboard Maestro, as well as Apple’s scripting language AppleScript, and Chrome scripting behavior.
It’s really a test of the reach of the AI in terms of knowledge, its understanding of how web pages are constructed, and the ability to write code across three interlinked environments.
Quite a few AIs have failed this test, but the failure point is usually a lack of knowledge about Keyboard Maestro. GPT-3.5 didn’t know about Keyboard Maestro. But ChatGPT has been passing this test since GPT-4. Until now.
Where should we start? Well, the good news is that GPT-5 handled the Keyboard Maestro part of the problem just fine. But it got the coding so wrong that it even doubled down on its lack of understanding of how case works in AppleScript.
It actually invented a property. This is one of those cases where an AI confidently presents an answer that is completely wrong.
AppleScript is natively case-insensitive. If you want AppleScript to pay attention to case, you need to use a “considering case” block. So, this happened.
The reason the error message referred to the title of one of my articles is that the article was open in the front window in Chrome. This function checks the front window and does stuff based on the title.
But misunderstanding how case works wasn’t the only AppleScript error GPT-5 generated. It also referenced a variable named searchTerm without defining it. That’s pretty much an error-creating practice in any programming language.
Fail, fail, fail, McFaildypants.
The internet hath spoken
OpenAI seemed to suffer from the same hubris that its AIs do. It confidently moved everyone to GPT-5 and burned the bridges back to GPT-4o. I’m paying $200 a month for a ChatGPT Pro account. On Friday, I couldn’t move back to GPT-4o for coding work. Neither could anyone else.
There was, however, just a tiny bit of user pushback on the whole bridge-burning thing. And by tiny, I mean the entire frickin’ internet. So, by Saturday, ChatGPT had a new option.
To get to this, go to your ChatGPT settings and turn on “Show legacy models.” Then, as it has always been, just drop down the model menu and choose the one you want. Note: this option is only available to those on paid tiers. If you’re using ChatGPT for free, you’ll take what you’re given, and you’ll love it.
Ever since the whole generative AI thing kicked off at the start of 2023, ChatGPT has been the gold standard of programming tools, at least according to my LLM testing.
Now? I’m really not sure. This is only a day or so after GPT-5 has been released, so its results will probably get better over time. But for now, I’m sticking with GPT-4o for coding, although I do like the deep reasoning capabilities in GPT-5.
What about you? Have you tried GPT-5 for programming tasks yet? Did it perform better or worse than previous versions like GPT-4o or GPT-3.5? Were you able to get working code on the first try, or did you have to guide it through fixes? Are you going to use GPT-5 for coding or stick with older models? Let us know in the comments below.
You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.
