Why ChatGPT Won’t Replace Coders Just Yet

Clive Thompson on 2023-03-21

The “bullshit” problem turns up in code, too

Photo by Joshua Reddekopp on Unsplash

Lately I’ve been seeing people using ChatGPT to write code. One guy posted on Twitter about how he used it to make a simple version of Pong “in under 60 seconds”, while another had it write a Python script to rename files.

I wanted to get in on the action, so I asked ChatGPT — the March 14 version — to make a simple to-do list web app. I started off with this request …

I’d like to make a simple to-do web app. Show me the code that would do the following: Display the text “My To-Do List” at the top, with a field beneath where I can type in to-do items. When I type in a new item and hit “enter”, the new item would appear in a list below the input field. When I click on any list in the item, it disappears.

Presto: it cranked out some simple HTML and JavaScript, which did exactly what I asked for.

Over the next few minutes, I added a bunch more requests. I told it to add a button called “Save List” that, when clicked, would save the list locally in your browser. I named a couple of Google Fonts for it to use, and asked it for CSS styling to apply those fonts. Then I had it add a bit more styling: change the font size for smaller vs. bigger screens, add bullet points before each item on the list, and put a light gray dotted line around the input box. Each time I asked for a revision, ChatGPT would tell me where to cut and paste the new code, and what that code did.

The whole shebang took barely 15 minutes, and the final version is here, hosted on Glitch, so you can try it out. (You can also see the code here that ChatGPT generated.)

Yep, I paid my credit cards

It wasn’t quite perfect! ChatGPT’s first attempt to add bullet points didn’t work, and I had to ask again. Then, when I asked it to add font-size-changing styling, it seemed to forget about my earlier request to give the input box a gray outline … so that styling vanished from the code. It’s also not clear this would have worked so well for a layperson. I’m a hobbyist coder, so I knew how to phrase my requests precisely.

Overall, though, it did a good job, eh?

So, are coders screwed? Is ChatGPT the beginning of the Star Trek vision: We’ll just tell the computer what we want it to do?

The short answer is: Not right now, and probably not any time soon.

That’s because the types of coding problems at which ChatGPT seems to excel are common ones. If you ask it to do something that’s been done a ton of times before, then sure, it’ll do a very good job. To-do lists? Pong? These have been coded a bajillion times before, and they’re all online. OpenAI trained its models on all that existing code. And truthfully, there’s a lot of coding that is pretty repetitious. When coders are trying to write a function — take a list of addresses, format them all so each is on one line, then sort them by area code — they often start by googling to see if someone’s done this before. Since ChatGPT has hoovered up all that code online, it’s bound to be pretty good at quickly spitting out common answers to common problems.
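To make that address example concrete, here is a minimal Python sketch of the kind of boilerplate that has been written countless times. It assumes each record is a hypothetical `Contact` holding address lines plus a US phone number, and it reads “area code” as the first three digits of that number; both are illustrative assumptions, not anything ChatGPT produced.

```python
import re
from dataclasses import dataclass


@dataclass
class Contact:
    """Hypothetical record: a multi-line postal address plus a US phone number."""
    address_lines: list[str]
    phone: str


def area_code(phone: str) -> str:
    """Return the three-digit area code of a 10-digit US number, ignoring punctuation."""
    digits = re.sub(r"\D", "", phone)  # "(212) 555-0100" -> "2125550100"
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        raise ValueError(f"not a 10-digit US number: {phone!r}")
    return digits[:3]


def format_and_sort(contacts: list[Contact]) -> list[str]:
    """Put each address on one line, ordered by the contact's area code."""
    ordered = sorted(contacts, key=lambda c: area_code(c.phone))
    return [", ".join(line.strip() for line in c.address_lines) for c in ordered]


people = [
    Contact(["1 Main St", "Springfield, IL 62701"], "(217) 555-0199"),
    Contact(["350 Fifth Ave", "New York, NY 10118"], "+1 212-555-0100"),
]
print(format_and_sort(people))
# ['350 Fifth Ave, New York, NY 10118', '1 Main St, Springfield, IL 62701']
```

Nothing here is novel, which is the point: code like this is all over the web, so a model trained on that code has seen many variations of it.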

But the thing is, writing software isn’t just about writing the basic algorithms for munging the data in the way you want it to be munged. A ton of it is about fiddling around the edges. For example, one’s software very often needs to “talk” to some other service online: How does the API for that other service work? As the user danwee posted on Hacker News this week …

I want to see GPT-4 dealing with this situation:
- they: we need a new basic POST endpoint
- us: cool, what does the api contract look like? URL? Query params? Payload? Response? Status code?
- they: Not sure. Third-party company XXQ will let you know the details. They will be the ones calling this new endpoint. But in essence it should be very simple: just grab whatever they pass and save it in our db
- us: ok, cool. Let me get in contact with them
- … one week later…
- company XXQ: we got this contract here: <contract_json>
- us: thanks! We’ll work on this
- … 2 days later…
- us: umm, there’s something not specified in <contract_json>. What about this part here that says that…
- … 2 days later…
- company XXQ: ah sure, sorry we missed that part. It’s like this…
- …and so on…

This stuff is maddeningly complex, and right now ChatGPT doesn’t seem equipped to handle it at all.

Here’s a simple example from my own coding experiments. Yesterday, I asked ChatGPT to create a version of my “Weird Old Book Finder”. That’s an app I made last year which takes any query you type and sends it to the API for Google Books. It asks for only pre-1927 books (which are in the public domain) then picks one random book from the results and sends it to you. I wrote the original app in a few evenings; it’s quite simple, code-wise.
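For a sense of how small the core logic is, here is a rough Python sketch of a book finder along those lines. It is not the original app’s code: it assumes the public Google Books `volumes` search endpoint, filters client-side on each result’s `volumeInfo.publishedDate`, and uses hypothetical helper names (`search_url`, `published_year`, `pick_old_book`, `fetch`).

```python
import json
import random
import re
import urllib.parse
import urllib.request
from typing import Optional

# Public Google Books search endpoint; the helpers below are a hypothetical sketch.
BOOKS_ENDPOINT = "https://www.googleapis.com/books/v1/volumes"


def search_url(query: str) -> str:
    """Build a volumes-search URL; 40 is the largest page size the API accepts."""
    return BOOKS_ENDPOINT + "?" + urllib.parse.urlencode({"q": query, "maxResults": 40})


def published_year(volume: dict) -> Optional[int]:
    """Read the year from volumeInfo.publishedDate ("1899", "1899-05-01", ...), if present."""
    date = volume.get("volumeInfo", {}).get("publishedDate", "")
    match = re.match(r"\d{4}", date)
    return int(match.group(0)) if match else None


def pick_old_book(response: dict, before_year: int = 1927, rng=random) -> Optional[dict]:
    """Keep only volumes published before `before_year`, then return one at random (or None)."""
    old = [
        volume
        for volume in response.get("items", [])
        if (year := published_year(volume)) is not None and year < before_year
    ]
    return rng.choice(old) if old else None


def fetch(query: str) -> dict:
    """Run the search; urllib raises HTTPError for error responses such as a 403."""
    with urllib.request.urlopen(search_url(query)) as resp:
        return json.load(resp)


# Usage (requires network access):
# book = pick_old_book(fetch("whaling"))
# if book:
#     print(book["volumeInfo"]["title"])
```

Keeping the network call separate from the filtering lets the date logic be tested offline; the part that actually talks to the API is the small `fetch` function, which is exactly where the generated version ran into trouble.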

But when I asked ChatGPT to redesign it, the code it produced didn’t work. (You can see it here, on Glitch.) Each request got a 403 error from the Google Books API. Sure, ChatGPT could quickly make a web form for typing in a query, but working with the Google Books API flummoxed it.

Is it possible that OpenAI could train its models on every existing API, in some more comprehensive and subtle way? Maybe? With advances in software and AI, I never say “never”.

But even if it could do that, that’s only the tip of the iceberg in software engineering. Frankly, the biggest part of making a product isn’t the coding; it’s helping your client figure out what the heck they need. This takes endless hours of talk, talk, and more talk before a single line of code is written. (The same is true if you’re the client, i.e., if you’re creating a new app yourself.) Automating those conversations and that decision-making would be even harder than mastering the byzantine pathways of the world’s APIs.

This is why, if you read the whole thread at Hacker News about whether ChatGPT and AI will replace coders, the great majority of posters argue that a) it certainly won’t in the short and medium term, and b) more likely it’ll simply expand the output of existing software developers.

I think this is probably correct. It’ll be kind of like the effect that Microsoft Word had on the production of business documents. Back in the 1970s and early 1980s, word processing was an expensive affair that required highly trained operators. Creating new digital documents was slow and labor-intensive. But once cheap desktop computers made it possible for anyone to create a good-looking business document, the overall volume of business documents simply exploded.

Me, I’ve been using Copilot for over a year now, and this is more or less how I use it: to blast through boilerplate code so I can get to the tricky bits faster.

Photo by Portuguese Gravity on Unsplash

There’s a more subtle problem with ChatGPT’s code generation, which is that it suffers from ChatGPT’s general “bullshit” problem.

As I wrote back in December, ChatGPT is designed to answer a question, or complete a prompt, by producing strings of prose that sound plausible and that are statistically likely to fit the conversation. It doesn’t know whether what it’s saying is true or not. Indeed, it can’t know whether the stuff it’s saying is true, because “knowing” something (possessing knowledge) is not just a matter of grasping context and predicting a likely-sounding response. Human intelligence, at least, doesn’t seem to work that way. (We certainly use prediction in our cognition! But our cognition is not solely prediction.)
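Here is a toy illustration of “statistically likely” text, vastly simpler than the neural network behind ChatGPT: this Python sketch counts which word follows which in a tiny made-up corpus, then always appends the most frequent next word. Nothing in it checks truth, so when the training text says something false more often than something true, the fluent continuation is the false one.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict:
    """Count, for each word, which words followed it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def continue_text(follows: dict, start: str, length: int = 5) -> str:
    """Greedily append the most frequent follower; nothing here asks whether it is true."""
    out = [start]
    for _ in range(length):
        options = follows.get(out[-1])
        if not options:
            break  # never saw this word followed by anything
        out.append(options.most_common(1)[0][0])
    return " ".join(out)


corpus = "the moon is made of rock . the moon is made of cheese . the moon is made of cheese ."
model = train_bigrams(corpus)
print(continue_text(model, "the", 5))
# the moon is made of cheese
```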

So, as I wrote, ChatGPT is fluent at bullshitting. It doesn’t know anything. It’s just trying to write convincingly.

This is obviously a huge problem for relying on any facts you get from ChatGPT. It makes stuff up. (OpenAI calls it “hallucinating”, but “making things up” seems more accurate.) Just yesterday I asked it a factual question, and it gave me a citation to one scholarly paper that was a perfect answer — and two more that, near as I can tell, do not exist at all.

As it turns out, ChatGPT appears to bullshit when it’s writing code, too.

Recently, Tyler Glaiel asked ChatGPT to write code that tackled some interesting algorithmic problems, like plotting a route through an obstacle-laden grid, or figuring out collision detection for crescent-moon shapes …
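For reference, the standard textbook approach to the plain version of the first problem is breadth-first search. This Python sketch is that generic technique, not a reproduction of Glaiel’s specific prompt; the grid encoding (`#` for an obstacle, four-way movement) is an assumption for illustration.

```python
from collections import deque


def shortest_path(grid: list[str], start: tuple[int, int], goal: tuple[int, int]):
    """Breadth-first search on a grid where '#' is an obstacle; returns (row, col) steps or None."""
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in came_from):
                came_from[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # goal unreachable


maze = [
    "..#",
    ".##",
    "...",
]
print(shortest_path(maze, (0, 0), (2, 2)))
# [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
```

Because BFS explores cells in order of distance, the first time it reaches the goal is along a shortest path; that guarantee is what a plausible-looking but ad hoc answer can quietly lose.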

In each case, ChatGPT produced something that kind of worked, but not quite. The key thing is that it didn’t give up; it didn’t say “no, I can’t really do this.” It just confidently spouted code and said, sure, I got this. But it didn’t.

And as Glaiel notes, this is where problems begin:

I think ChatGPT is just kind of bullshitting at this point. It doesn’t have an answer, and cannot think of one, so it’s just making shit up at this point. And like… its really good at making shit up though! Like that algorithm… the cases where it fails are subtle! You could easily publish that algorithm in a book and confuse people into thinking they somehow messed up the implementation when there’s bugs in the collision detection, because it certainly *sounds* like the type of algorithm that would solve this.

Glaiel isn’t the only one to find this, mind you. Wagner James Au checked whether ChatGPT could write code in Second Life’s scripting language, and discovered that it produced stuff that looked plausible but didn’t work: it sometimes called functions that don’t exist in the language. Tanya Tsui tried to get ChatGPT to do some data work, and it created code that used the “gridify” attribute of the “geopandas” module … which doesn’t exist.

Now, the thing about bullshitting with code is that computers, unlike humans, quickly reject obvious bullshit. None of this bullshitted code actually runs!
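Python shows the “obvious bullshit” case clearly: referencing a nonexistent attribute raises `AttributeError` the moment that line runs. One can also check a generated script’s references up front. This hypothetical helper, `attribute_exists`, resolves a dotted name such as `math.sqrt` using `importlib.import_module` and `hasattr`. It is a rough sketch: it can report false negatives for subpackages that a package does not import automatically.

```python
import importlib


def attribute_exists(dotted_name: str) -> bool:
    """Return True if e.g. "os.path.join" resolves to a real attribute of an importable module."""
    module_name, _, attr_path = dotted_name.partition(".")
    try:
        obj = importlib.import_module(module_name)
    except ImportError:  # includes ModuleNotFoundError
        return False
    for part in (attr_path.split(".") if attr_path else []):
        if not hasattr(obj, part):
            return False  # the invented-function case
        obj = getattr(obj, part)
    return True


print(attribute_exists("math.sqrt"), attribute_exists("math.gridify"))
# True False
```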

Ah, but subtler bullshit? That’ll run! As I’ve found in using Copilot, it’s quite possible for the AI to make subtler errors: the code will run at first, but quickly collapse when it hits an edge case. So one needs to be pretty vigilant even when using AI merely as an assistant.
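Here is a hypothetical example of that subtler kind of error (invented for illustration, not actual Copilot output): a median function that passes a quick odd-length check, but returns the wrong answer for even-length input and crashes on an empty list. Comparing against Python’s own `statistics.median` on edge cases is one cheap way to stay vigilant.

```python
import statistics


def median_plausible(values: list[float]) -> float:
    """Looks right and passes a quick odd-length check, but is subtly wrong."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]  # wrong for even lengths; IndexError on []


def median(values: list[float]) -> float:
    """Corrected version: average the two middle values when the length is even."""
    if not values:
        raise ValueError("median of empty list")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# A quick happy-path check passes for both...
assert median_plausible([3, 1, 2]) == median([3, 1, 2]) == 2
# ...but an even-length input exposes the difference.
print(median_plausible([1, 2, 3, 4]), median([1, 2, 3, 4]), statistics.median([1, 2, 3, 4]))
# 3 2.5 2.5
```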

I’ll be interested to see how AI code-writing evolves in the months and years to come. It’s a damn interesting part of this new phase in artificial intelligence, and has a lot of true utility. But it shares some of the dangers that come with large language models that write in English.

(Enjoyed this one? Then find that “clap” button and start mashing; it can handle up to 50 clicks, per reader! Or you could ask ChatGPT to “write a Python script using Selenium that, when given a URL at Medium.com, navigates to that page and clicks the ‘clap’ button 50 times.” Could work!)

You might also enjoy my pay-what-you-want weekly newsletter “The Linkfest”, in which I curate the best stuff I’ve found online. “The opposite of doomscrolling.”

I’m a contributing writer for the New York Times Magazine, a columnist for Wired and Smithsonian magazines, and a regular contributor to Mother Jones. I’m also the author of Coders: The Making of a New Tribe and the Remaking of the World, and Smarter Than You Think: How Technology is Changing our Minds for the Better. I’m @pomeranian99 on Twitter and Instagram, and @clive@saturation.social on Mastodon.