As a quick note, I initially thought of writing this as some brief notes, and so it almost went into the Entropy Arbitrage newsletter. However, once it crossed a thousand words, and didn’t dwell on specific Intellectual Property—like a television show—it felt more appropriate for the blog.

A clip-art-style image of a robot sitting at a desk writing, with a clock on the wall next to it

In any case, for the holidays, I decided to experiment with a month of paid ChatGPT, to see if the service held any actual value.

The Value of Commercial AI

For clarity, I see the potential value in these AI systems as mostly creating placeholder work to hand off to a more competent and motivated worker. No computer will ever have all the context or interest necessary to create work suited to a specific task. If it could, we’d need to have a long discussion about its civil rights and why a company owns it. I think of it a bit like collecting inspiration for a project, in other words, rather than as a solution that will actually provide completed work: something that I can show a real worker to give them some idea of the direction of my thinking.

Likewise, I should point out that I have at least some awareness that companies like OpenAI sell their services at a massive loss. They can only figure out how to make their business models work at all by infringing on everyone’s copyright—seriously, they actually say that “it would be impossible to train today’s leading AI models without using copyrighted materials”—and even then, they probably lose money on electricity costs. In fact, at this point, they believe that they need a fusion breakthrough to stay in business.

In that sense, then, you always technically get a good deal on the service, because it’ll cost you less per query than it would cost to provide it for yourself. I don’t care about that side of things too much, but to some people, the prospect of hastening the rate at which these companies waste money might make a sensible argument in favor of a subscription.

This all goes to say that I walked into these experiments with my eyes open, with expectations of failure, and with only an interest in saving myself some occasional effort. I especially wanted to look at the “web browser,” the interface to sister project DALL-E, and the PDF access, because a better image interface would improve that product, and the other two could make me more productive, if they proved out. And…wow, did this ever flop.

Where Things Went Well

In fairness, I started by pointing the web browser plugin to my blog and asking it to recommend some end-of-year post ideas, and it recommended…the most middle-of-the-road, content-marketing-est posts that I could plausibly have gotten away with. If I wanted that, or if I needed a post to go out so that I could sell ad space against it, then I could see the feature maybe saving me two minutes to say “maybe I will write a list of 2024 technology predictions” or whatever. It doesn’t make sense for my situation, and I don’t know if I could ever approve of it, but I could see it feeling helpful to someone who needs to get a blog post or other essay-adjacent product out the door.

I also got some decent code out of it, though almost as often it would suggest an overly complicated design, ignore a lot of requirements until I reminded it, and/or produce a solution that even superficial inspection showed couldn’t work, before any testing.

This goes back to my original assertion that you can’t reasonably expect this to replace a human, because it doesn’t understand or care about the user’s needs. After more than a year and a lot of credulous excitement, once you’ve gotten accustomed to its routine, it still only really spews out homogenized text with no sense of purpose or anything that you could even pretend looks like comprehension.

The Web Browser Tests

For my real test of the web browser, I tried two different conversations:

Assuming that I wanted a similar but distinct aesthetic for a CC BY-SA-licensed counterpart of the Star Wars franchise, I first asked it to find me some candidate fonts that might fit the primary uses of text. If you read the Entropy Arbitrage newsletter or could read between the lines in last Sunday’s talk about a hypothetical diversified Free Culture franchise, then you already have some idea of what that project looks like. For the rest of you, you’ll either need to read about it there or sit tight until I actually get the project rolling…

In any case, I gave it the following specific uses for the fonts.

  • Text in a logo
  • The hypothetical introductory text crawl
  • Any inter-scene titles

For the second test, I asked it to find me some fictional fields of mathematics or fields of pseudo-mathematics. We have plenty of fictional science and history that we could point to, but the only fictional math that I can think of involves adding bogus words to existing fields, like having someone blurt out “quantum trans-Euclidean geometry,” or some nonsense like that, which doesn’t feel like it should count, since it probably still looks a lot like geometry.

Hoping to head off the inevitable response where it repeats whatever appears in the first search results for the keywords, I specified that I did not want a list of novels where something mathematical happens or gets mentioned in the story. I already know that Flatland exists, for example, and know that it does not contain references to any fictional branches of mathematics.

The Fonts

Now, I should admit that the font problem does feel slightly unfair, especially since I already have some good candidates. However, it feels that way mostly because of the difficulty of judging the visual fitness for a project that I haven’t defined, except in its vague similarity to another project. It doesn’t feel unfair because of the request itself, which only asked it to find fonts that conform to some (minimal) criteria.

If it had given me ugly fonts, I still would’ve complained, but I probably wouldn’t have written this post, because that only qualifies as a useless result. I fully expected the thing to suggest one of the many clones of the Star Wars title font—I’ve seen something like a dozen of them, and surely at least one of them has a Free Culture license—and something unassuming like Source Sans or even Exo 2 as used for the main text of this blog.

Despite that, it instead managed to make this task difficult by completely ignoring my single firm requirement, interpreting “compatible with CC BY-SA” to mean any font that I could download without paying anything and that someone, somewhere, had called “sci-fi.” I don’t dislike the fonts that I see on Envato—to pick one inappropriately licensed example collection that it kept promoting—but I don’t want a font licensed only for writing notes to myself or for a one-time school project; I want one that works when someone needs to reconstruct the work that I’ve done on a project. I find Orbitron somewhat unpleasant to look at and probably more suited to near-future, Earth-bound concepts than what I have in mind, for example, but I’d rather see that now-trite recommendation for a logo—because I can bundle it with a project and not need to worry about anything beyond aesthetics—than a spectacular visual choice that I couldn’t legally use on this blog or similar work.

On the other hand, it also tried to push me to a lot of monospaced fonts, which makes absolutely no sense for the request.

When I confronted the chatbot with the petty, insignificant detail that “free for personal use” doesn’t work for my stated license, it apologized profusely for confusing me—which, what?—and declared victory by providing me with a list of websites that people use to download fonts, with a note that I could go through their listings and see what looks good and which licenses fit my needs. Apparently, it assumed that it had never occurred to me to look for fonts with a web browser, and had never encountered Google Fonts before.

You heard that right: The paid version of ChatGPT solved its problem by assigning the project back to me, the customer making the request.

Interspersed with this, by the way, it would reply to my attempts to clarify the question or complaints about the results with apologies and some variation on “allow me a moment to try a new search,” but with no actual search after that. I suppose that could have come from a glitch, or maybe it has instructions to keep users on site as long as possible, for whatever reason. Regardless of the reason, I don’t care for it…

Fictional Fields of Mathematics

For the second problem, you know exactly how this went, I would guess. In fact, it did happily give me a list of books that feature math somewhere in their pages, starting with Flatland. None of them feature fictional branches of mathematics; all of them merely mention or center (real) mathematical concepts in the story.

When I pointed out that I specifically told it not to answer the question that way, it gave me a different list of books, one that also included Flatland, assembled from a similar set of criteria that conflicted with my request. A third try caused it to—yes, as in the previous example—inform me of the existence of books in general, and suggest that maybe I read through them to find what I want.

Then, after I pointed out that “read every book published to see which have any useful information” seems like a task more suited to a computer than a paying customer, it tried one more search and told me that this task now looked significantly more difficult than when I had started, and apologized for the failure. After that, it refused to continue, except to correct my I-guess-misguided belief that it only handles trivial questions…but then it offered to try again, if I could narrow down the search to specific books and specific fictional fields of mathematics.

And that makes some sense, right? If I told it what books to look through, and maybe even what fictional fields of mathematics a reader would find in them, then the AI could almost certainly tell me about them without over-extending itself. 🤪 I can see this business model catching on more widely. “Have it your way” at Burger King, for example, by making your burger at home however you like it, after you pay the restaurant chain a monthly subscription for such sage advice from the Burger Serf Plus program…

Web Browser Assessment?

The ChatGPT web browser works somewhat less well than searching the web manually does—at a time when search results have already declined in relevance, for a wide variety of reasons—but also has no trouble wasting as much of your day as you can give it, and will ultimately suggest that you should do the work on your own, and tell it what you found.

Image Generation Test

Moving on to the DALL-E interface, I decided to try something low-effort: Give me a prototype logo for a new science fiction franchise—yes, the same one alluded to above—inspired by a couple of major mainstream franchises that I named, such as Star Wars. I didn’t worry about it getting the text right, only that it could give me some structure for the logo, like a layout, some design elements, and hints of a useful font. If it couldn’t find me a font, maybe it could “manifest” one, similar to the results that I got for Kabang!, a few months ago.

And it told me that doing so would violate the terms of service. Why? Oh, right, that makes no sense, it tells me, and definitely does not violate the terms of service…but it can’t create the image, maybe because of a technical problem. And it tried again, with the same results. I asked it what policies might prevent it—maybe calling out specific Intellectual Property triggered something, I figured—but no, it wouldn’t tell me anything useful. But it did offer to “refine” the logo for me. I don’t know how you refine a logo that nobody has created, and neither did it, because that plan also ran afoul of that magic content policy.

Can it help me with any other images, it asks? No, I explain—in hopes of getting it to reveal this magic content policy—because instead of image generation, I have non-consensually entered into a guessing game over which of the images I’d like to see it will give me permission to generate. And again, it suggests giving it more chances, because when I think about what I would really like out of an assistant, I think primarily about begging, with no plan to fix any problems.

DALL-E Assessment?

Maybe this will work better for someone than going to any image-generation service and learning the quirks of its prompt syntax, but it definitely didn’t do anything useful for me unless I asked it for something worthlessly generic, like a domestic animal sleeping.

Document-Reading Test

Finally, I got to the ask-your-PDF plug-in. It promises that I can upload any document and then ask it questions about the contents of that document. And so, I picked a small project that I’ve wanted to get to for a few months: I uploaded Green Comet, and asked it to tell me everything that it could find about (fictional sport) flashball. After all, I’d love to have a post highlighting what we know about the handful of original sports that appear in Free Culture works.

This seemed entirely fair, for once. The book includes a chapter where the protagonist almost exclusively watches a game. If the AI can’t handle something that overt, then it probably wouldn’t have much luck summarizing or picking out the themes of a novel.

However, it instead told me that flashball sounds like a sport, but that it can’t find any reference to it in the book, so I must have gotten it wrong. I assured the AI that it definitely came up, because I had read (at least some of) the story, and it continued to deny finding it, asking me to tell it where to look. I suggested maybe starting with the table of contents, where it would find a link to an entire chapter called Flashball, which maybe sounds exactly like the reference that it claimed that it couldn’t find. And from there, it went on to complain that maybe the book had too much text in it to bother with.

Maybe—it suggests—I could tell it what the book has to say about flashball, so that it could answer my question with more detail. Yes, once again, it tried the “if you tell me what to tell you, then I can probably tell you what you told me” idea.

Smaller Tests

I didn’t only try these plugins. I also gave the main GPT-4 chat a chance to do things that I’ve seen the company tout as uses for their service. None of those went particularly well.

As a big example, I asked it for help renaming various things. I’ve done this before to modest-but-sometimes-usable results, but the modern approach seems to begin and end with picking random keywords from the description or existing title, and going through permutations of synonyms, more suited to weak parody than actual brainstorming. Much like the “web browser,” this seems to take a fast, low-effort activity, and make it slower and more frustrating.

I would also occasionally ask it for music recommendations, particularly classical, that sound like television and movie themes. This seems to sit neatly with the renaming problems in the last paragraph, because rather than focus—as requested—on the similarity in sound, it’d instead inevitably change the definition of “theme” to the thematic elements of the plot and storytelling. For example, if I ask about the opening theme to Star Trek: Voyager, the answer should at least mention Richard Strauss, rather than prattling on about music that might reflect the difficulties of two crews trying to work together to get home. And yet, it would give me the latter every time, even when I tried correcting it.

The music test seems especially odd, to me, considering that people have trained language models on music for decades, making me think that OpenAI would have done something similar.

Overall Assessment

The paid version of ChatGPT would probably do a good job of replacing some of the worst managers that I’ve seen in my career. It routinely brushes off required work as if leaving it for someone else to do. It desperately wants you to like it, but doesn’t provide any reason to do so. It wants to see you spending as much time in its field of vision as possible. And it has an absolute willingness to present your ideas as its own, even instructing you to supply it with the information that you asked for. But it actually might make a worse assistant than the free version of the service, so if you want to use this to reduce headcount in your organization, prioritize appropriately…

I forget where—apparently not a post, because I can’t find it—but I once mentioned that GitHub Copilot felt like the sort of “assistant” that your boss hires out of nepotism. This person becomes a core part of the team, despite having no actual skill or interest in your field. ChatGPT seems to want to go in the same direction, complete with a sense of entitlement that you should get used to working for it, because it’ll get a promotion any day now.

When I cancelled my subscription, by the way, OpenAI had a “fun” little survey, where it tries to narrow down why, exactly, none of them have become billionaires, other than making an expensive service that nobody really needs, that requires massive copyright infringement, and that doesn’t really work. They wanted to know my overall reason for leaving, but also suggestions of how to improve it, and how sad—seriously, they ask—I would feel, if they eliminated the free tier. Oh, and they “offered” me a chance to talk to someone and expound on my answers, because I really want to do more free labor for them, after their AI repeatedly told me to do my own research, and while they tell courts that they need to steal Intellectual Property because they can’t become ultra-wealthy without exploiting everyone…

Mind you, I don’t regret spending the twenty bucks, because I did so mostly for the sake of these experiments. However, if I had gone in expecting ChatGPT to actually provide me with value, I’d feel even less charitable than I’ve sounded in this post.

🤖


Credits: The header image is robot artificial intelligence by Mohamed Hassan, made available under the terms of the Creative Commons CC0 1.0 Universal Public Domain Dedication.