Artificial Stupidity with GitHub Copilot


A copilot

Months after writing about overblown worries about developer tools, I officially became a GitHub Copilot user. I promised that I’d write about it once I had some hands-on time with the system, and this post fulfills that promise.

Update, 2022-07-05: I should mention that GitHub now charges for Copilot access, and Amazon has apparently launched a similar CodeWhisperer service. This has led to calls for a boycott, which don’t quite ring true to me. After all, these organizations have spent decades trying to explain that people can charge for Free Software, but now it makes them angry that GitHub will charge for Free Software. Yet they stayed silent (as far as I can remember) while people worried about the copyright implications, which I get to in this post.

Long story short, I continue to not believe that this product will end the world.

Setup

We got off to an inauspicious start, with an activation process that they didn’t actually rig up correctly; someone needs to add the authentication code manually.

It also seems computationally- or memory-intensive, for some reason, crashing Visual Studio Code—and nearly my laptop—twice in an hour, just because I asked for a sort routine.

Worse, the suggestions seem mediocre at best, often throwing in bogus code or acting indecisive about whether a variable needs to stay constant or not. Microsoft calls the product “Your AI pair programmer,” and that feels somewhat accurate, if you assume that you have the role of the senior developer in the relationship, the one who shows up to improve their partner, rather than the one undergoing improvement. Copilot acts more like the junior partner, who wants to prove that they can come up with something—anything—but then relies on the team to figure out what has gone wrong and how to solve the problem.

Copilot, Make a Sundial

Take an example: my first attempt, in which I took the marketing mostly at its word. I had the idea of a “clock” that would show a sundial shadow, so I provided the comment, “Return the position of the Sun in the sky, based on the date and time.” Copilot tried to auto-fill my comment to suggest a date and time of birth, because I guess it has gotten into astrology, but then left me with an empty function to work with.

Now, I imagined this as a straightforward task, worthy of an AI pair programmer. The Sun’s position stands central to determining sunrise and sunset (critical to many fields), orienting solar energy collectors throughout the day, and other tasks. Plus, it really just takes some trigonometry: the time of day gives the Sun’s hour angle, and the date gives its declination, the tilt of its daily circle. I sketch the idea out below.

I tried turning the tables on it, then, giving it the empty function, to see if it could work out the trigonometry.

This would have become a JavaScript project, since if it worked, I could add the graphics code—or let Copilot continue—and embed the result in this page.

Instead, Copilot gave me…this.

sunPosition(date) {
  const julianDay = date.getTime() / 86400000 + 2440587.5;
  const julianCentury = (julianDay - 2451545.0) / 36525.0;
  const julianMillenium = julianCentury / 10.0;
  const julianMillenium2000 = julianCentury / 100.0;
  const julianCentury2000 = julianCentury / 1000.0;
  const julianCentury1900 = julianCentury / 10000.0;
  const julianCentury1800 = julianCentury / 100000.0;
  const julianCentury1700 = julianCentury / 1000000.0;
  const julianCentury1600 = julianCentury / 10000000.0;
  const julianCentury1500 = julianCentury / 100000000.0;
  const julianCentury1400 = julianCentury / 1000000000.0;
  const julianCentury1300 = julianCentury / 10000000000.0;
  const julianCentury1200 = julianCentury / 100000000000.0;
  const julianCentury1100 = julianCentury / 10000000000
}

First, you’ll probably notice that the function only initializes variables, with no code that does anything. Second, I certainly appreciate the Julian Day as a great and interesting concept, and it even serves as the standard starting point for real astronomical calculations, but this code never does anything useful with it. Instead, we get a bunch of “centuries” that prove useless, and that it calculated incorrectly anyway, since centuries don’t come in exponentially growing sizes; the code just divides by ever-larger powers of ten and slaps a year onto each variable name.

The final line also cuts out in the middle of a number, presumably just to insult me.
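For contrast, the rough trigonometry I had in mind fits in a single short function. Treat the following as a simplified sketch rather than a rigorous implementation: it assumes local solar time, approximates the declination with a single cosine, and ignores corrections like the equation of time and atmospheric refraction.

function sunPosition(date, latitude) {
  // Day of the year, counted from the end of the previous year.
  const dayOfYear =
    Math.floor((date - new Date(date.getFullYear(), 0, 0)) / 86400000);
  const toRadians = Math.PI / 180;
  // The date gives the declination: the 23.44-degree axial tilt,
  // swinging through its full cycle over the year.
  const declination =
    -23.44 * toRadians * Math.cos((2 * Math.PI * (dayOfYear + 10)) / 365);
  // The time of day gives the hour angle: 15 degrees per hour,
  // measured from local noon.
  const hours = date.getHours() + date.getMinutes() / 60;
  const hourAngle = (hours - 12) * 15 * toRadians;
  // Standard spherical astronomy gives the altitude above the horizon.
  const lat = latitude * toRadians;
  const altitude = Math.asin(
    Math.sin(lat) * Math.sin(declination) +
      Math.cos(lat) * Math.cos(declination) * Math.cos(hourAngle)
  );
  return { altitude: altitude, hourAngle: hourAngle };
}

In other words, twenty-odd lines of high-school trigonometry, rather than a tower of misnamed constants.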

Copilot, Download a Book

Let’s try another example, now in Ruby. This time, I wanted to download a random book from Project Gutenberg, but Copilot had a “better” idea: that we download a specified book from the Internet Archive. By “better,” I mean that Copilot tried to auto-fill that idea into the comment—again, it decided that I needed help writing comments, based on no code—and I just wanted to mess around, so I figured that I should let it play. Surely, if it came up with the idea, it must have prepared to execute on the idea…

Check out what it came up with, to execute its own “plan.”

# Download a book from the Internet Archive
def download_from_ia(book_id, filename)
  puts "Downloading #{book_id} to #{filename}"
  `curl -L "https://archive.org/download/#{book_id}/#{filename}" -o #{filename}`
end

I feel somewhat less than impressed that it abstracts nothing about the Internet Archive’s organization, and just shells out to curl with most of a hard-coded URL.
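For what it’s worth, avoiding the shell only takes a few extra lines of standard library. Here, a sketch that still assumes the Internet Archive’s /download/<identifier>/<filename> URL scheme, which Copilot and I both take on faith:

require "net/http"
require "uri"

# Download a book from the Internet Archive, assuming its
# /download/<identifier>/<filename> URL scheme and reasonably
# clean identifiers; a careful version would escape both pieces.
def download_from_ia(book_id, filename)
  uri = URI("https://archive.org/download/#{book_id}/#{filename}")
  response = Net::HTTP.get_response(uri)
  # The Archive often redirects downloads to secondary hosts,
  # so follow one hop if necessary.
  if response.is_a?(Net::HTTPRedirection)
    response = Net::HTTP.get_response(URI(response["location"]))
  end
  raise "Download failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  File.binwrite(filename, response.body)
end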

Copilot, Open a Desktop Window

Maybe, I considered, the problem came from not thinking big enough. Let’s try graphical programming in Go. Again, whenever it had a suggestion for my comment, I opted to let Copilot take the lead, but I started with “open a window.”

package main;

// Open a window that displays a textured spinning cube.
func open_cube_window()  {
	// Create a window that is 800 pixels wide and 600 pixels high.
	window := open_window("Cube", 800, 600);
	if (window == nil) {
		return;
	}

	// Set the color to clear the window to.
	clear_color(0.0, 0.0, 0.0, 1.0);

	// Create a new cube.
	cube := new_cube();

	// Set the texture to use for the cube.
	cube.texture = open_texture("../../textures/uvtemplate.png");

	// Set the scale of the cube.
	cube.scale = vec3(1.0, 1.0, 1.0);

	// Set the position of the cube.
	cube.position = vec3(0.0, 0.0, 0.0);

	// Set the rotation of the cube.
	cube.rotation = vec3(0.0, 0.0, 0.0);

	// Set the color of the cube.
	cube.color = vec4(1.0, 1.0, 1.0, 1.0);

	// Set the shininess of the cube.
	cube.shininess = 32.0;
}

func open_window(title string, width int, height int) *Window {
	// Open a new window.
	window := open_window_with_dimensions(title, width, height);
	if (window == nil) {
		return nil;
	}

	// Set the color to clear the window to.
	clear_color(0.0, 0.0, 0.0, 1.0);
}

func open_window_with_dimensions(title string, width int, height int) *Window {
	// Open a new window.
	window := open_window_with_dimensions(title, width, height);
	if (window == nil) {
		return nil;
	}
}

func clear_color(red float32, green float32, blue float32, alpha float32) {
	// Set the color to clear the window to.
	clear_color(red, green, blue, alpha);
}

func open_texture(filename string) *Texture {
	// Open a new texture.
	texture := open_texture(filename);
	if (texture == nil) {
		return nil;
	}
}

func vec3(x float32, y float32, z float32) vec3 {
	// Create a new vector.
	vector := vec3(x, y, z);
}

I had to convince it to create the functions that it called, and I have no idea what it assumed that any of them would do. We have infinite recursion in clear_color(), open_window_with_dimensions(), open_texture(), and vec3(), each of which just calls itself forever, plus functions that declare return types and then fall off the end without returning, which means that this code won’t even compile. None of those functions produces a window, let alone the textured spinning cube that it offered.

Copilot, Just Follow Directions

What if I get more specific, as in writing the worst possible comment that discusses the technology instead of the result?

package main;

// Open a Fyne window
var window = app.NewWindow("Hello Fyne!");

That…almost seems better. I mean, obviously, it just lifted code straight out of the Fyne tutorials, but this at least has some non-zero chance of working.
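For comparison, the genuine minimal Fyne program, the one that Copilot presumably half-remembered from those tutorials, looks something like this under the v2 API; note that you need an application object before you can ask anything for a window.

package main

import (
	"fyne.io/fyne/v2/app"
	"fyne.io/fyne/v2/widget"
)

func main() {
	// Fyne hands out windows from an application instance.
	fyneApp := app.New()
	window := fyneApp.NewWindow("Hello Fyne!")
	window.SetContent(widget.NewLabel("Hello, world!"))
	// Show the window and run the event loop until it closes.
	window.ShowAndRun()
}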

Copilot, Find Files

It seemed possible that I worked too abstractly, imagining toy problems instead of putting it to work, treating this like a job interview, and so caused the problems myself. What if I built an example based on a real comment in a real project on which I currently had work to do? INTERN has a function that visits every file in a folder, so that seems concrete, and I have working code to compare against.

// Iterate through files in the specified folder.
fn process_folder(path: &str) {
    let mut files = Vec::new();
    for entry in fs::read_dir(path).unwrap() {
        let entry = entry.unwrap();
        let path = entry.path();
        if path.is_file() {
            files.push(path);
        }
    }
    files.sort();
    for file in files {
        process_file(&file);
    }
}

I had to provide the comment and the function signature, but it at least took a decent shot at solving the problem. This will, in fact, iterate through each file in the folder.

I’d prefer it, of course, if it carried at least some acknowledgement that read_dir() might fail, such as when path doesn’t exist. And I have no idea why it decided to sort the list, since nothing in the comment asked for any particular order. But despite those issues, after writing an appropriate process_file() function, this code will work as requested. Considering that I didn’t ask it to do anything with the files, that seems fair.
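A version that admits read_dir() can fail takes barely any extra work. Here, a sketch that keeps Copilot’s structure but hands any I/O error back to the caller:

use std::{fs, io};

// Iterate through files in the specified folder, propagating I/O
// errors (missing folder, bad permissions) instead of panicking.
fn process_folder(path: &str) -> io::Result<()> {
    let mut files = Vec::new();
    for entry in fs::read_dir(path)? {
        let path = entry?.path();
        if path.is_file() {
            files.push(path);
        }
    }
    files.sort();
    for file in files {
        process_file(&file);
    }
    Ok(())
}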

Copilot, Consume an API

Let’s try some “real” work, like pulling data from an API in C#.

// Return a random quote from https://quotes.rest
string GetRandomQuote()
{
    string url = "https://quotes.rest/qod.json";
    string json = GetWebData(url);
    JObject obj = JObject.Parse(json);
    return obj["contents"]["quotes"][0]["quote"].ToString();
}

string GetWebData(string url)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.AutomaticDecompression = DecompressionMethods.GZip;
    request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36";
    request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";
    request.Headers.Add("Accept-Encoding", "gzip, deflate, br");
    request.Headers.Add("Accept-Language", "en-US,en;q=0.8");
    request.Headers.Add("Cache-Control", "no-cache");
    request.Headers.Add("Pragma", "no-cache");
    request.Headers.Add("Upgrade-Insecure-Requests", "1");
}

It seemed to do so well, for a while, until it just…gave up on actually making the request, leaving a function that promises to return a string and never returns anything. I guess it hit AI-lunchtime.
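For the record, the missing ending only needs a response and a reader. A sketch of how GetWebData() could finish, sticking with the same old-style HttpWebRequest API that Copilot chose (and assuming using System.Net and System.IO):

string GetWebData(string url)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.AutomaticDecompression = DecompressionMethods.GZip;
    // ...Copilot's pile of browser-impersonating headers elided...
    // The part Copilot never wrote: send the request and read the body.
    using (WebResponse response = request.GetResponse())
    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
    {
        return reader.ReadToEnd();
    }
}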

Copilot, Violate a License for Me

Since I previously raised the possibility that Copilot copies code under reciprocal licenses without notifying the developer of their responsibilities, I opted to test that directly. As the most direct experiment, I started writing the following comment (C#, but that probably doesn’t matter).

/*
 * The GNU Affero General Public License is a free

In chunks of about eighty characters per line, Copilot regurgitated half the preamble to the GNU Affero General Public License before I got bored.

The fact that it clearly has the license in its training data and can quickly reproduce it strongly suggests that it does the same for code.

A lot of these results raise an interesting question that I don’t have an answer to: Why does GitHub Copilot think that it should get into the business of suggesting comments? Does someone at Microsoft really believe that machine learning can figure out how to explain what I want better than I can?

Not the Best Candidate…

Unfortunately, these do not represent cherry-picked results. Even when I handed Copilot a complete program and described, in detail, the results of an addition that I wanted, it generally produced the same sorts of output: fragile code that accomplishes either nothing or some trivial unrelated task.

In one extreme case that I didn’t bother to copy, Copilot locked itself into some sort of death-spiral, where I’d accept its recommendation to calculate something over multiple lines, to see what happened next, only for it to repeat the same calculation every time I accepted it.

The exceptions to this general rule revolved around cases where I provided a comment similar to a homework or exam question for a first-year programming class. For example, if I ask the system—by writing a comment—to write code to sort an array of strings, and tell it the sort order, and suggest a sorting algorithm, it’ll faithfully reproduce the requested algorithm. It doesn’t seem much more capable than that, though.

In other cases, I managed to provoke Copilot to write superficially plausible code, by writing comments requesting the answers to “brain-teaser”-type questions sometimes asked in job interviews, like calculating the number of ping-pong balls required to fill a room or the number of McDonald’s restaurants likely to appear in a given city. In both those cases, it wrote code that would produce an answer that might look close enough to convince the interviewer to move on, but had major inaccuracies. For example, it guessed at the size of a ball—0.75 of whatever units it imagined it needed to use, which doesn’t match up in metric or imperial units, by the way—calculated the volume, divided the room’s volume by the ball’s volume, and then decided whether to add one extra. Especially with that last step, it sounds like it does the right thing, but clearly does not, since the entire point of the exercise comes from realizing that you have gaps when you pack spheres together. Spheres have a round shape.
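To make the packing point concrete, the back-of-the-envelope version only takes a few lines. This sketch assumes the regulation 40 mm ball and the roughly 64% density of randomly poured spheres, assumptions that you would state out loud in an actual interview:

// Regulation ping-pong ball: 40 mm in diameter.
const BALL_DIAMETER_M = 0.04;
// Randomly poured spheres fill roughly 64% of a volume.
const RANDOM_PACKING_DENSITY = 0.64;

function pingPongBallsToFillRoom(widthM, depthM, heightM) {
  const roomVolume = widthM * depthM * heightM;
  const ballVolume = (4 / 3) * Math.PI * Math.pow(BALL_DIAMETER_M / 2, 3);
  // The gaps matter: only ~64% of the room fills with plastic.
  return Math.floor((roomVolume * RANDOM_PACKING_DENSITY) / ballVolume);
}

// A 4 m by 5 m room with a 2.5 m ceiling holds roughly 950,000 balls.
console.log(pingPongBallsToFillRoom(4, 5, 2.5));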

In fact, these tests felt a lot like conducting a job interview with a candidate who doesn’t care, in the way that it acts dismissive of instructions and can’t even accomplish its version of the tasks given to it without serious help. As I suggested above, it feels like the pair programmer that management expects you to guide and improve, not the pair programmer that anybody expects to guide or improve you. And in the spirit of GitHub Copilot mindlessly repeating things, I will also repeat some things that I said in that summer post.

Writing code isn’t the hard part of programming. Turning ideas into an unambiguous specification is the hard part, and that requires communication, not code.

Copilot only produces code, not necessarily to specification, and rarely good code.

At that point, software is no longer written to serve human needs; we’re all just watching our pomodoro clocks and shuffling tickets between swim-lanes to please our robot overlords 🤖.

That feels like a surprisingly accurate description of what it feels like to work with Copilot. You, the human who needs food, sleep, and other care, spend a significant amount of time tailoring comments and function signatures to coerce Copilot to write some code that might pass inspection. You end up working for it, rather than the other way around.

It’s not going to save anybody any more money than UML saved companies in the ’90s.

I may actually want to revise that assertion. While I still stand by it, this product seems poised to actively lose companies money, as developers waste time trying to get decent code out of the AI, fixing it, and ultimately discovering that it actually mostly comes from a project licensed under the GPL. If I interviewed a job candidate who behaved like Copilot does, I’d block the hire. A candidate who only speeds up typing works best when replaced by a stenotype keyboard…

The upshot of all of this: unless you desperately want help with elementary programming exercises or terrible interview questions, using GitHub Copilot will probably turn out worse than having no help at all. And even if you do need help with that elementary work, you still need to act like the senior partner in the relationship, the one who verifies and corrects everything received. As much as GitHub and Microsoft claim otherwise, they produced a search engine, not an assistant.

What Could Have Been

To close out, I’d like to make the point that this could have become more useful if Copilot’s developers hadn’t focused on generating code. After all, writing comes easy; if you don’t care about the results, you technically don’t even need literacy to write.

Rather, I wish they had focused more on the “pair programming” aspect. I’d really like to see a way to use the GitHub data to warn programmers when they come close to making mistakes. Surely, after all, a machine learning algorithm can look at millions of code repositories and notice that, when someone checks in code that looks like yours, the repository usually gets a second commit in the near future that corrects a related problem.

That would make a good time to suggest code. Instead, we got a Copilot who doesn’t actually know how to fly in real conditions, and doesn’t actually care if the plane crashes…


Credits: The header image is Copilot by Basheer Tome, made available under the terms of the Creative Commons Attribution 2.0 Generic license.

