I went hands-on with ChatGPT Codex and the vibe was not good - here's what happened

Aleksandra Konoplia/Getty Images

ZDNET’s key takeaways

ChatGPT Codex wrote code and saved me time.
It also created a serious bug, but it was able to recover.
Codex is still based on the GPT-4 LLM architecture.

Well, vibe coding this isn't. I found the experience to be slow, cumbersome, stressful, and incomplete. But it all worked out in the end.

ChatGPT Codex is ChatGPT's agentic tool dedicated to writing and modifying code. It can access your GitHub repository, make changes, and issue pull requests. You can then review the results and decide whether or not to incorporate them.

My primary development project is a PHP- and JavaScript-based WordPress plugin for site security. There's a main plugin available for free, plus some add-on plugins that enhance the capabilities of the core plugin. My private development repo contains all of this, as well as some maintenance plugins I rely on for user support.

This repo contains 431 files. It's the first time I've tried to get an AI to work across my entire ecosystem of plugins in a private repository. I previously used Jules to add a feature to the core plugin, but because it only had access to the core plugin's open-source repository, it couldn't take the entire ecosystem of products into account.

Early last week, I decided to give ChatGPT Codex a run at my code. Then this happened.

GPT-5 launched

On Thursday, GPT-5 slammed into the AI world like a freight train. Initially, OpenAI tried to force everyone to use the new model. Later, the company added legacy model support when many of its customers went ballistic.

I ran GPT-5 against my set of programming tests, and it failed half of them. So, I was particularly interested in whether Codex still supported the GPT-4 architecture or would force developers into GPT-5.

When I queried Codex five days after GPT-5 launched, the AI responded that it was still based on "OpenAI's GPT-4 architecture."

Screenshot by David Gewirtz/ZDNET

I took two things from that:

OpenAI isn't ready to move Codex to GPT-5 (which, recall, failed half my tests).
The results, conclusions, and screenshots from my Codex tests are still valid, since Codex is still based on GPT-4.

With that, here's the result of my still-very-much-not-GPT-5 look at ChatGPT Codex.

Getting started

My first step was asking ChatGPT Codex to examine the codebase. I used Codex's Ask mode, which performs analysis but doesn't actually change any code.

Screenshot by David Gewirtz/ZDNET

I hoped for something as deep and comprehensive as the analysis I got from ChatGPT Deep Research a few months ago, but instead, I got a much less complete one.

Screenshot by David Gewirtz/ZDNET

I found that a more effective approach was to ask Codex to do a quick security audit and let me know if there were any issues. Here's how I prompted it:

Identify any serious security concerns. Ignore plugins Anyone With Link, License Fixer, and Settings Nuker. Anyone With Link is in the very early stages of coding, and is not ready for code review. License Fixer and Settings Nuker are specialty plugins that do not need a security audit.

Codex identified three main areas for improvement.

Screenshot by David Gewirtz/ZDNET

All three areas were valid, although I'm not prepared to modify the serialization data structure at this time, because I'm saving that for a comprehensive preferences overhaul. The $_POST complaint is already handled, just with a different approach than the one Codex recognized.

The third area, the nonce and cross-site request forgery (CSRF) risk, was worth changing right away. While access to the plugin's user interface is assumed to be governed by login role, the plugins themselves don't explicitly check that the person submitting the plugin settings is actually allowed to do so.

That's what I decided to ask Codex to fix.
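For readers who haven't worked with nonces, here's a minimal Python sketch of the two checks this kind of fix is about: confirming the user has permission to change settings, and confirming the request carries a short-lived token (a nonce) that only the plugin's own settings form could have issued. Everything in it (the secret key, the 12-hour window, the role table, and the function names) is a hypothetical illustration loosely modeled on the general time-window idea behind WordPress nonces; it is not WordPress's implementation, and it is not the code Codex wrote.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical server-side secret; a real system would load this from secure storage.
SECRET_KEY = b"replace-with-a-real-secret"
# Assumed length of one nonce time window (12 hours).
TICK_SECONDS = 12 * 60 * 60

# Hypothetical role-to-capability table standing in for a real permission system.
CAPABILITIES = {
    "administrator": {"manage_options"},
    "subscriber": set(),
}


def _tick(now: float) -> int:
    """Return the index of the time window a timestamp falls into."""
    return int(now // TICK_SECONDS)


def create_nonce(user_id: int, action: str, now: Optional[float] = None) -> str:
    """Mint a token bound to one user, one action, and the current time window."""
    now = time.time() if now is None else now
    message = f"{_tick(now)}|{action}|{user_id}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()[:20]


def verify_nonce(nonce: str, user_id: int, action: str, now: Optional[float] = None) -> bool:
    """Accept a nonce minted in the current or the previous time window."""
    now = time.time() if now is None else now
    for windows_back in (0, 1):
        expected = create_nonce(user_id, action, now - windows_back * TICK_SECONDS)
        # Constant-time comparison so response timing doesn't leak partial matches.
        if hmac.compare_digest(nonce, expected):
            return True
    return False


def handle_settings_submission(user_id: int, role: str, form: dict) -> str:
    """Run both checks before touching any settings."""
    # Check 1: is this user allowed to change settings at all?
    if "manage_options" not in CAPABILITIES.get(role, set()):
        return "403 Forbidden: insufficient permissions"
    # Check 2: did this request come from a form issued to this user (CSRF defense)?
    if not verify_nonce(form.get("_nonce", ""), user_id, "save_settings"):
        return "403 Forbidden: invalid or expired nonce"
    return "200 OK: settings saved"


if __name__ == "__main__":
    token = create_nonce(42, "save_settings")
    print(handle_settings_submission(42, "administrator", {"_nonce": token}))     # 200 OK
    print(handle_settings_submission(42, "subscriber", {"_nonce": token}))        # 403: permissions
    print(handle_settings_submission(42, "administrator", {"_nonce": "forged"}))  # 403: nonce
```

In an actual WordPress plugin you'd lean on the platform's built-in helpers rather than rolling your own: `wp_nonce_field()` to embed the token in the settings form, `check_admin_referer()` to verify it when the form is submitted, and `current_user_can()` to check the user's capability.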

Fixing the code

Next up, I told Codex to make fixes in the code. I switched from Ask mode to Code mode so the AI would actually attempt changes. As with ChatGPT Agent, Codex spins up a virtual terminal to do some of its work.

Screenshot by David Gewirtz/ZDNET

When the process completed, Codex showed a diff (the difference between the original code and the proposed modifications).
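To make the idea of a diff concrete, here's a minimal sketch using Python's standard `difflib` module. The tiny PHP function and the `settings.php` file name are hypothetical, invented purely for illustration; they aren't taken from my plugin or from the changes Codex actually proposed.

```python
import difflib

# Hypothetical "before" and "after" versions of a tiny PHP settings handler.
before = [
    "function save_settings( $settings ) {\n",
    "    update_option( 'my_settings', $settings );\n",
    "}\n",
]
after = [
    "function save_settings( $settings ) {\n",
    "    check_admin_referer( 'my_settings_save' );\n",
    "    update_option( 'my_settings', $settings );\n",
    "}\n",
]

# unified_diff emits the unified diff format, the same basic format `git diff` produces:
# "-" lines are removed, "+" lines are added, and " " lines are unchanged context.
diff = list(
    difflib.unified_diff(before, after, fromfile="a/settings.php", tofile="b/settings.php")
)
print("".join(diff), end="")
```

Here the diff contains a single `+` line and no `-` lines, which is what a surgical, additive change looks like when you review it.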

Screenshot by David Gewirtz/ZDNET

I was heartened to see that the changes were quite surgical. Codex didn't try to rewrite large sections of the plugin; it just modified the small areas that needed improvement.

In a few areas, it dug in and changed a few more lines, but those changes were still quite specific to the original prompt.

At one point, I was curious why it added a new foreach loop to iterate over an array, so I asked.

Screenshot by David Gewirtz/ZDNET

As you can see above, I got back a fairly clear explanation of its reasoning. It made sense, so I moved on and continued reviewing Codex's proposed changes.

All told, Codex proposed changes to nine separate files. Once I was satisfied with the changes, I clicked Create PR. That creates a pull request, which is how any GitHub user proposes changes to a codebase. Once the PR is created, the project owner (me, in this case) has the option to approve those changes, which merges them into the actual code.

It's a good mechanism, and Codex does a clean job of working within GitHub's environment.

Screenshot by David Gewirtz/ZDNET

Once I was convinced the changes were good, I merged Codex's work back into the main codebase.

Screenshot by David Gewirtz/ZDNET

Houston, we have a problem

I pulled the changes down from GitHub to my test machine and tried to run the now-modified plugin. Wait for it…

Screenshot by David Gewirtz/ZDNET

Yeah. That's not what's supposed to happen. To be fair, I've generated my own share of error screens just like that, so I can't really get mad at the AI.

Instead, I took a screenshot of the error and passed it to Codex, along with a prompt telling Codex, "Selective Content plugin now fails after making changes you suggested. Here are the errors."

It took the AI three minutes to suggest a fix, which it presented to me in a new diff.

Screenshot by David Gewirtz/ZDNET

I merged that change into the codebase, once again pulled it down to my test server, and it worked. Crisis averted.

No vibe, no flow

When I'm not in a rush and I have the time, coding can provide a very pleasant state of mind. I get into a kind of flow with the language, the machine, and what feels like a connection between my fingers and the computer's CPU. Not only is it a lot of fun, but it can also be emotionally transcendent.

Working with ChatGPT Codex was not fun. It wasn't horrible. It just wasn't fun. It felt more like exchanging emails with a particularly recalcitrant contractor than having a meeting of the minds with a coding buddy.

Codex delivered its responses in about 10 or 15 minutes, whereas the same code would probably have taken me several hours.

Would I have created the same bug as Codex? Probably not. As part of thinking through that algorithm, I most likely would have avoided the mistake Codex made. But I certainly would have created a few other bugs from mistyping or syntax errors.

To be fair, had I introduced the same bug Codex did, it would have taken me considerably longer than three minutes to find and fix it. Add another hour or so, at least.

So Codex did the job, but I wasn't in flow. Usually, when I code and I'm inside a particular file or subsystem, I do a lot of work in that area. It's like cleaning day: if you're cleaning one part of the bathroom, you may as well clean all of it.

But Codex clearly works best with small, simple instructions. Give it one category of change, and work through that one change before introducing new elements. Like I said, it does work, and it's a useful tool. But using it definitely felt like more of a chore than programming usually does, even though it saved me a lot of time.

I don't have tangible test results, but after testing Google's Jules in May and ChatGPT's Codex now, I get the impression that Jules develops a deeper understanding of the code. At this point, I can't really support that claim with much data; it's just an impression.

I'll try running another project through Jules. It will also be interesting to see whether Codex changes much once OpenAI feels confident enough to incorporate GPT-5. Keep in mind that OpenAI eats its own dog food with Codex, meaning it uses Codex to build its own code. The company may have seen the same iffy results I found in my tests, and it may be waiting until GPT-5 has baked a bit longer.

Have you tried using AI coding tools like ChatGPT Codex or Google's Jules in your development workflow? What kinds of tasks did you throw at them? How well did they perform? Did you feel like the process helped you work more efficiently, or did it slow you down and take you out of your coding flow?

Do you prefer giving your tools small, surgical jobs, or are you looking for an agent that can handle big-picture architecture and reasoning? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.
