A copilot

Months after writing about overblown worries about developer tools, I am now officially a GitHub Copilot user. I promised that I’d write about it once I had some hands-on time with the system, so that’s this post.

Long story short, I continue to not believe that this is the end of the world.

Setup

We got off to an inauspicious start, with an activation process that isn’t actually rigged up correctly. The authentication code needs to be added manually.

It’s also computationally or memory intensive, for some reason, crashing Visual Studio Code (and nearly my laptop) twice in an hour, just because I asked for a sort routine.

Worse, the suggestions are mediocre at best, often throwing in bogus code or being indecisive about whether a variable will be constant or not. Microsoft calls the product “Your AI pair programmer,” and that’s somewhat accurate, if you assume that you’re the senior developer in the relationship, the one who’s there to improve their partner, rather than the one being improved. Copilot is more like the junior partner, who wants to prove that they can come up with something—anything—but then relies on the team to figure out what’s wrong.

Copilot, Make a Sundial

Here’s an example. I had the idea of a “clock” that would show a sundial shadow, so I provided the comment, “Return the position of the Sun in the sky, based on the date and time.” It tried to auto-fill my comment to suggest a date and time of birth, because I guess Copilot is into astrology, but instead I gave it an empty function to work with.

Now, I imagined this to be a simple task. The Sun’s position is central to determining sunrise and sunset (critical to many fields), orienting solar energy collectors throughout the day, and other tasks. Plus, it’s just trigonometry: The time of day gives the angle, and the date gives the inclination of the circle.

This would be a JavaScript project, since if it worked, I could add the graphics code—or let Copilot continue—and embed the result in this page.

Instead, Copilot gave me…this.

sunPosition(date) {
  const julianDay = date.getTime() / 86400000 + 2440587.5;
  const julianCentury = (julianDay - 2451545.0) / 36525.0;
  const julianMillenium = julianCentury / 10.0;
  const julianMillenium2000 = julianCentury / 100.0;
  const julianCentury2000 = julianCentury / 1000.0;
  const julianCentury1900 = julianCentury / 10000.0;
  const julianCentury1800 = julianCentury / 100000.0;
  const julianCentury1700 = julianCentury / 1000000.0;
  const julianCentury1600 = julianCentury / 10000000.0;
  const julianCentury1500 = julianCentury / 100000000.0;
  const julianCentury1400 = julianCentury / 1000000000.0;
  const julianCentury1300 = julianCentury / 10000000000.0;
  const julianCentury1200 = julianCentury / 100000000000.0;
  const julianCentury1100 = julianCentury / 10000000000
}

First, you’ll notice that all we have are initialized variables, no code that does anything, and not even a return value. Second, Julian Day is a great and interesting concept, and serious astronomical formulas do start from it, but nothing here ever turns it into a position for the Sun. Then, we have a bunch of centuries that are similarly useless. And…I’m pretty sure that they’re also calculated incorrectly, because centuries don’t have exponentially growing numbers of days in them.

The final line also cuts out in the middle of a number, no less.
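For contrast, something along these lines would at least have been on topic. This is only a rough sketch under loud assumptions: the observer’s latitude is passed in (the original comment never mentioned one), the date is read as UTC and treated as local solar time with no longitude or equation-of-time correction, and the declination uses a common cosine approximation, so expect errors of a degree or so. The names dayOfYear and sunElevation are made up for illustration, not anything Copilot produced.

```javascript
// Rough solar elevation, in degrees above the horizon.
// Assumptions (not from the original experiment): latitude is supplied by the
// caller, and `date` is read as UTC and treated as local solar time.
function dayOfYear(date) {
  // Day 0 is December 31 of the previous year, so January 1 comes out as 1.
  const start = Date.UTC(date.getUTCFullYear(), 0, 0);
  return Math.floor((date.getTime() - start) / 86400000);
}

function sunElevation(date, latitudeDeg) {
  const rad = Math.PI / 180;
  // The date gives the tilt: declination swings between about -23.44 and +23.44 degrees.
  const declination = -23.44 * Math.cos(((2 * Math.PI) / 365) * (dayOfYear(date) + 10));
  // The time of day gives the angle: 15 degrees per hour away from solar noon.
  const hours = date.getUTCHours() + date.getUTCMinutes() / 60 + date.getUTCSeconds() / 3600;
  const hourAngle = 15 * (hours - 12);
  const sinElevation =
    Math.sin(latitudeDeg * rad) * Math.sin(declination * rad) +
    Math.cos(latitudeDeg * rad) * Math.cos(declination * rad) * Math.cos(hourAngle * rad);
  // Clamp to guard against rounding pushing the value just outside [-1, 1].
  return Math.asin(Math.max(-1, Math.min(1, sinElevation))) / rad;
}
```

A sundial would also need the azimuth, which is a similar handful of trigonometric terms; the point is only that none of it requires a dozen nested “centuries.”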

Copilot, Download a Book

Let’s try another example, now in Ruby. This time, I wanted to download a random book from Project Gutenberg, but Copilot had a better idea that we download a specified book from the Internet Archive. By “better,” I mean that’s just what Copilot tried to auto-fill in the comment, and I’m just messing around, so I figured I should let it play. Surely, if it came up with the idea, it’s prepared to execute on the idea…

Here’s what it came up with to execute its own “idea.”

# Download a book from the Internet Archive
def download_from_ia(book_id, filename)
  puts "Downloading #{book_id} to #{filename}"
  `curl -L "https://archive.org/download/#{book_id}/#{filename}" -o #{filename}`
end

I’m less than impressed that it abstracts nothing about the Internet Archive’s organization, and just calls cURL on the majority of a URL.
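A minimal sketch of what even a thin abstraction might look like, written in JavaScript for illustration. It assumes only the URL shape that Copilot’s own suggestion relies on (archive.org/download, then the item identifier, then the file name). The function name iaDownloadUrl and the identifier check are hypothetical, and nothing here has been verified against the Archive’s actual API.

```javascript
// Build a download URL for a file in an Internet Archive item.
// Hypothetical helper: the URL layout is taken from Copilot's suggestion above,
// and the identifier check is an illustrative assumption, not a documented rule.
function iaDownloadUrl(identifier, filename) {
  if (!/^[A-Za-z0-9._-]+$/.test(identifier)) {
    throw new Error(`Suspicious item identifier: ${identifier}`);
  }
  // Encode each path segment so spaces, quotes, and the like survive intact.
  const path = filename.split("/").map(encodeURIComponent).join("/");
  return `https://archive.org/download/${identifier}/${path}`;
}
```

Handing a URL like this to an HTTP client, rather than pasting raw arguments into a shell command, would also sidestep the quoting problems that the unquoted file name invites in the Ruby version.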

Copilot, Open a Desktop Window

Maybe the problem is that I’m not thinking big enough. Let’s try graphical programming in Go. Again, if it had a suggestion for my comment, I opted to let Copilot take the lead, but I started with “open a window.”

package main;

// Open a window that displays a textured spinning cube.
func open_cube_window()  {
	// Create a window that is 800 pixels wide and 600 pixels high.
	window := open_window("Cube", 800, 600);
	if (window == nil) {
		return;
	}

	// Set the color to clear the window to.
	clear_color(0.0, 0.0, 0.0, 1.0);

	// Create a new cube.
	cube := new_cube();

	// Set the texture to use for the cube.
	cube.texture = open_texture("../../textures/uvtemplate.png");

	// Set the scale of the cube.
	cube.scale = vec3(1.0, 1.0, 1.0);

	// Set the position of the cube.
	cube.position = vec3(0.0, 0.0, 0.0);

	// Set the rotation of the cube.
	cube.rotation = vec3(0.0, 0.0, 0.0);

	// Set the color of the cube.
	cube.color = vec4(1.0, 1.0, 1.0, 1.0);

	// Set the shininess of the cube.
	cube.shininess = 32.0;
}

func open_window(title string, width int, height int) *Window {
	// Open a new window.
	window := open_window_with_dimensions(title, width, height);
	if (window == nil) {
		return nil;
	}

	// Set the color to clear the window to.
	clear_color(0.0, 0.0, 0.0, 1.0);
}

func open_window_with_dimensions(title string, width int, height int) *Window {
	// Open a new window.
	window := open_window_with_dimensions(title, width, height);
	if (window == nil) {
		return nil;
	}
}

func clear_color(red float32, green float32, blue float32, alpha float32) {
	// Set the color to clear the window to.
	clear_color(red, green, blue, alpha);
}

func open_texture(filename string) *Texture {
	// Open a new texture.
	texture := open_texture(filename);
	if (texture == nil) {
		return nil;
	}
}

func vec3(x float32, y float32, z float32) vec3 {
	// Create a new vector.
	vector := vec3(x, y, z);
}

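I had to convince it to create the functions that it called. Four of them (open_window_with_dimensions(), clear_color(), open_texture(), and vec3()) do nothing but call themselves, so they would recurse forever if they ever ran, and several declare a return type without returning a value on every path, so this wouldn’t compile in the first place. None of those functions produces a window or the spinning cube that it offered.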

Copilot, Just Follow Directions

What if I’m more specific, as in writing the worst possible comment that discusses the technology instead of the result?

package main;

// Open a Fyne window
var window = app.NewWindow("Hello Fyne!");

That’s…almost better. I mean, obviously, it just lifted code straight out of the Fyne tutorials, but this at least has some non-zero chance of working.

Copilot, Find Files

It’s possible that my problem is that I’m being too abstract, imagining toy problems instead of putting it to work, as if this is a job interview. What if I build an example based on a real comment in a real project that I’m currently working on? INTERN has a function that visits every file in a folder, so that’s concrete, and I have working code to compare it against.

// Iterate through files in the specified folder.
fn process_folder(path: &str) {
    let mut files = Vec::new();
    for entry in fs::read_dir(path).unwrap() {
        let entry = entry.unwrap();
        let path = entry.path();
        if path.is_file() {
            files.push(path);
        }
    }
    files.sort();
    for file in files {
        process_file(&file);
    }
}

I had to provide the comment and the function signature, but it at least gave a decent shot at solving the problem. This will, in fact, iterate through each file in the folder.

I’d prefer it, of course, if there were at least some acknowledgement that read_dir() might fail if path doesn’t exist. And I have no idea why it decided to sort the list, especially since sorting is computationally expensive on large lists. But despite those issues, after writing an appropriate process_file() function, this code will work as requested. Considering that I didn’t ask it to do anything with the files, that’s fair.

Copilot, Consume an API

Let’s try some “real” work, like pulling data from an API in C#.

// Return a random quote from https://quotes.rest
string GetRandomQuote()
{
    string url = "https://quotes.rest/qod.json";
    string json = GetWebData(url);
    JObject obj = JObject.Parse(json);
    return obj["contents"]["quotes"][0]["quote"].ToString();
}

string GetWebData(string url)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.AutomaticDecompression = DecompressionMethods.GZip;
    request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36";
    request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";
    request.Headers.Add("Accept-Encoding", "gzip, deflate, br");
    request.Headers.Add("Accept-Language", "en-US,en;q=0.8");
    request.Headers.Add("Cache-Control", "no-cache");
    request.Headers.Add("Pragma", "no-cache");
    request.Headers.Add("Upgrade-Insecure-Requests", "1");
}

It was doing so well until it just…gave up on actually making the request. I guess it was time for AI-lunch.

Copilot, Violate a License for Me

Since I raised the possibility that Copilot is copying code under reciprocal licenses without notifying the developer of their responsibilities, I opted to test that directly. I started writing the following comment (in C#, but that probably doesn’t matter).

/*
 * The GNU Affero General Public License is a free

In chunks of about eighty characters per line, Copilot regurgitated half the preamble to the GNU Affero General Public License before I got bored.

The fact that it clearly has the license in its training data and is able to reproduce it verbatim strongly suggests that it can do the same for code.

A lot of these results raise an interesting question that I don’t have an answer to: Why does GitHub Copilot think that it should be in the business of suggesting comments? Does someone at Microsoft really believe that machine learning can figure out how to explain what I want better than I can?

Not the Best Candidate…

Unfortunately, these were not cherry-picked results. Even when I handed Copilot a complete program and described the results of an addition that I wanted in detail, it generally produced the same sorts of results, generating fragile code that accomplishes either nothing or some trivial unrelated task.

In one extreme case that I didn’t bother to copy, Copilot locked itself into some sort of death-spiral, where I’d accept its recommendation to calculate something over multiple lines, to see what happened next, only for it to repeat the same calculation every time I accepted it.

The exceptions to this general rule were where I provided a comment similar to a homework or exam question for a first-year programming class. For example, if I ask the system—by writing a comment—to write code to sort an array of strings, and tell it the sort order, and suggest a sorting algorithm, it’ll faithfully reproduce the requested algorithm. It doesn’t seem to be much more capable than that, though.

In other cases, I was able to provoke Copilot to write superficially plausible code, by writing comments requesting the answers to “brain-teaser”-type questions sometimes asked in job interviews, like calculating the number of ping-pong balls required to fill a room or the number of McDonald’s restaurants likely to be found in a city. In both those cases, it wrote code that would produce an answer that might be close enough to be accepted, but wasn’t correct. For example, it guessed at the size of a ball—0.75 of whatever units it imagined were in use, which isn’t correct in metric or imperial units—calculated the volume, divided the room’s volume by the ball’s volume, and then decided whether to add one extra. Especially with that last step, it sounds like it’s doing the right thing, but clearly isn’t, since the entire point of the exercise is that there are gaps when you pack spheres together.
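To make the packing point concrete, here is a rough sketch in JavaScript. The figures are assumptions rather than anything from the Copilot session: a regulation 40 mm ball, and a packing density of about 0.64 for spheres poured in at random (the theoretical maximum for neatly stacked spheres is about 0.74). The function name and the room dimensions are invented for illustration.

```javascript
// Estimate how many balls fit in a box-shaped room, allowing for the gaps
// between packed spheres. All dimensions are in meters.
// Assumed defaults: 40 mm ball diameter, ~0.64 random packing density.
function ballsToFillRoom(width, depth, height, ballDiameter = 0.04, packingDensity = 0.64) {
  const roomVolume = width * depth * height;
  const radius = ballDiameter / 2;
  const ballVolume = (4 / 3) * Math.PI * radius ** 3;
  // Only a fraction of the room is actually occupied by balls.
  return Math.floor((roomVolume * packingDensity) / ballVolume);
}
```

For a room 4 m by 5 m by 2.5 m, this comes out a bit under a million balls, roughly a third fewer than dividing one volume by the other would suggest.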

In fact, these tests felt a lot like conducting a job interview with a candidate who doesn’t care: it is dismissive of instructions and can’t even accomplish its own version of the tasks given to it without serious help. As I suggested above, it’s the pair programmer that you’re expected to guide and improve, not the pair programmer that’s expected to guide or improve you. So, in the spirit of GitHub Copilot mindlessly repeating things, I will also repeat some things that I said in that summer post.

“Writing code isn’t the hard part of programming. Turning ideas into an unambiguous specification is the hard part, and that requires communication, not code.”

Copilot only produces code, not necessarily to specification, and rarely good.

“At that point, software is no longer written to serve human needs; we’re all just watching our pomodoro clocks and shuffling tickets between swim-lanes to please our robot overlords 🤖.”

That’s a surprisingly accurate description of what it’s like to work with Copilot. You, the human who needs food, sleep, and other care, spend a significant amount of time tailoring comments and function signatures to coerce Copilot to write some code that might pass inspection. You’re working for it, rather than the other way around.

“It’s not going to save anybody any more money than UML saved companies in the ’90s.”

I may actually want to revise that assertion. While I still stand by it, I should point out that this seems poised to lose companies money, as developers waste time trying to get decent code out of the AI, fix it, and ultimately discover that it’s actually mostly from a project licensed under the GPL. If I interviewed a job candidate who behaved like Copilot does, I’d block the hire. A candidate who only speeds up typing can be replaced by a stenotype keyboard…

The upshot of all of this is that, unless you are looking for help with elementary programming exercises or terrible interview questions, using GitHub Copilot is probably worse than having no help at all. And even if you are doing elementary work, you’re still the senior partner in the relationship, the one who needs to verify and correct what you receive. As much as GitHub and Microsoft claim otherwise, they produced a search engine, not an assistant.

What Could Have Been

To close out, I’d like to make the point that this could have been more useful if Copilot’s developers hadn’t fixated on generating code. After all, writing is easy. If you don’t care about the results, you don’t even need to be literate to write, technically.

Rather, I wish they had focused more on the “pair programming” aspect. What I’d really like to see here is the GitHub data used to warn programmers when they’re about to make mistakes. Surely, after all, a machine learning algorithm can look at millions of code repositories and see that, when someone checks in code that looks like yours, a second commit usually follows shortly afterward to fix a problem with it.

That would be a good time to suggest code. Instead, we got a Copilot who doesn’t actually know how to fly in real conditions, and doesn’t actually care if the plane crashes…


Credits: The header image is Copilot by Basheer Tome, made available under the terms of the Creative Commons Attribution 2.0 Generic license.