Writing a Twitter Bot

Hi! It looks like I have since continued, updated, or rethought this post in some ways, so you may want to look at this after you're done reading here.

Developer Journal, Jonathan Swift's Birthday from Nov 30, 2020, 6:55am

It has been a while since I’ve posted a tech tip, so my new Twitter bot @replybrary—the repository is available, too—seems like as good an excuse as any to write one, so I don’t need to look this up again. This particular bot replies to any message sent to it.

A bluebird

All things considered, the code is exactly what you’d expect it to be, plus or minus which library you choose to work with, so I probably won’t focus much on that side.

So, let’s get started. Please excuse the lack of images; I don’t want to bring any of Twitter’s non-Free design into this post, in case someone needs to re-use it.

Create an Account

Unless you’re planning to unleash automation on an existing account, you won’t get far if you don’t head over to Twitter, log out if you need to, and Sign up for a new account for your bot. Twitter has a nice feature that allows you to “claim” your accounts and have them all logged in simultaneously, if that’s useful to you.

Twitter doesn’t require much information, so if you’re concerned about your “data footprint,” this isn’t too bad. Complaints about Twitter tracking users is rare, as well, though I’m sure they do some of that.

Prepare Your App

Next up, jump over to the Twitter Developer Site. Sign in, if you need to, and click Create an app. Fill out the required fields. Twitter wants to know what your plans are. Be honest, because they’ll undoubtedly compare your plan to your actual behavior, if anybody reports your bot for abuse.

If the site doesn’t send you to an App details page, click the Details button.

Select the Permissions tab, to make sure that your bot has “Read and write” access permissions. Depending on what you need, you can trim that back to Read-only—for example, you might want to record a Twitter chat for your community, which wouldn’t require participating—or add Direct Messages. This bot replies to requests, though, so “read and write” is what we want.

Then, select the Keys and tokens tab. Here—surprise!—you can manage your keys and tokens. Generate one of each and copy all four pieces of information (API key, API secret key, Access token, and Access token secret) somewhere locally. If you’re planning to just use my Library-Twitterbot project linked above, a good place to copy that information is into the project’s config.json file, where the bot can pick it up later.

Obviously, don’t let anybody else get access to those details, especially the two that are explicitly marked secret. That’s why my code reads from the external JSON file: If you never commit that file, nobody can see your keys and tokens but you.

The Code

For the code itself, I decided to go with the twit library. It hasn’t been updated in two years and has over a hundred unresolved issues against it, so that might not be the best plan. However, NPM gives it high scores for popularity and quality, and it does the job. So, I’m willing to rewrite my small program if this library starts failing in the future.

You can read the bot’s code for yourself, if you’re interested in the details, but here are the interesting parts that took me some time to figure out.

Waiting for Messages

The first step is putting the bot in a position to receive the messages sent to it. In this case, we create a stream that “follows” our bot’s user.

const twit = require('twit');
const twitter = new twit(config);
const stream = twitter.stream(
  'statuses/filter',
  { follow: [ config.user_id ] }
);

There’s almost certainly a more elegant way to do this. There’s a user stream in the twit documentation, even. However, I wasn’t able to get that to work, whereas this approach worked on the first try.

This might be a sign that the library is already out of date, but like I said, I’ll worry about that some other time. The point is that the bot is now listening to config.user_id’s timeline.

Handle the Tweets

The stream we created in the last section is event driven. In our case, the event that interests us is tweet.

stream.on('tweet', (tweet) => {
  // ...do something
})

The meat of the project is filling in the “do something” part. To reiterate, I won’t waste time with the details, but the code in the repository takes the incoming tweet, parses out some likely keywords, and replies to the original tweet with URLs that relate to the keywords.

Skipping over the keyword/URL part, the important parts of the handler are as follows.

Track Your History

This isn’t strictly necessary, yet, but I’ll eventually want to look at the user’s history, in case there was a crash and we missed some incoming tweets.

ids.push(tweet.id_str);
fs.writeFileSync('processed_ids.json', ids);

By keeping a record of every tweet we see—in the ids variable and in the processed_ids.json file—we can make sure that we only reply to each message once.

Don’t Talk to Yourself

We also want to perform some similar tracking and stop processing, when the tweet was sent by the bot’s account.

if (tweet.user.screen_name === config.screen_name) {
  return;
}

If you don’t do something like this, then the bot will reply to the initial tweet, but then see itself mentioned in the reply and attempt to reply to that tweet. If the keywords are inserted into the reply, this creates an infinite loop that only ends when Twitter’s heuristics have realized that you’re about to flood the system and cut you off.

Don’t panic. If you forget, you’ll get an error when you post (see below) and just need to manually click through Twitter’s challenge. Ironically, it wants you to prove that you’re a human, even though the whole point of the project is to not have a human involved…

Post the Reply

This is fairly straightforward, once you’ve taken a look at the tweet event object.

twitter.post(
  'statuses/update',
  {
    in_reply_to_screen_name: tweet.user.screen_name,
    in_reply_to_status_id: tweet.id,
    in_reply_to_status_id_str: tweet.id_str,
    in_reply_to_user_id: tweet.user.id,
    in_reply_to_user_id_str: tweet.user.id_str,
    status: `@${tweet.user.screen_name} ${response}`,
  },
  (err, data, response) => {
    if (err) {
      console.log("Couldn't tweet:");
      console.log(JSON.stringify(err, ' ', 2));
    }
  });

Specifically, all we’re going to do, here, is post the reply. That said, the reply has a few parts. It’s possible that some of those fields are unnecessary, but to be on the safe side and ensure that the threading is correct, I’m making sure to set:

The text of the message (status), including the user’s @-name,
The user the bot is responding to, by @-name (in_reply_to_screen_name) and user ID (in_reply_to_user_id and in_reply_to_user_id_str), and
The message the bot is responding to, by ID (in_reply_to_status_id and in_reply_to_status_id_str).

This ensures that it shows up as a reply to the message and that the user sending the request sees it in their mentions.

Other

Like I said, this isn’t the whole story. The actual bot also needs to handle the error event, plus parsing the message and matching URLs to keywords. But this structure is most of what most people probably need.

For example, if you just want to collect tweets related to a hashtag, change the stream filter and don’t post() anything. Or, if you want to copy NPR and post The United States Declaration of Independence on our Independence Day, use the same structure, but only reply to the bot’s own posts and post the first tweet outside the handler code.

I’ll be doing more work on this bot before I set it free. And I want to do more research to find out if there’s an easy way that I can automate my Twitter Roundup posts on Fridays, because it would be nicer to not need to manually copy the URL for each tweet.

However, I do want to introduce everybody to my new best friend, the process to remove all duplicate values from an array, which I needed to make sure the returned URLs don’t waste any space.

return [...new Set(urls)].join(' ');

This:

Converts the array to a Set, which can’t have duplicates,
Uses the spread operator (...) to turn the Set into a list of parameters for a new array initializer ([]), and
Turns that array into a space-delimited string via join().

What I especially like about it is that there isn’t any custom code—I have previously used filter() to remove duplicates—and is a clear idiom of “give me the array of the set of the array.”

Happy automated Twittering…

Credits: The header image is untitled by an unnamed photographer, released under the terms of the Creative Commons CC0 1.0 Universal Public Domain Dedication.

By commenting, you agree to follow the blog's Code of Conduct and that your comment is released under the same license as the rest of the blog. Or do you not like comments sections? Continue the conversation in the #entropy-arbitrage chatroom on Matrix…

Get monthly ^* updates on Entropy Arbitrage posts, additional reading of interest, thoughts that are too short/personal/trivial for a full post, and previews of upcoming projects, delivered right to your inbox. I won’t share your information or use it for anything else. But you might get an occasional discount on upcoming services.
Or…	Mailchimp 🐒 seems less trustworthy every month, so you might prefer to head to my Buy Me a Coffee ☕ page and follow me there, which will get you the newsletter three days after Mailchimp, for now. Members receive previews, if you feel so inclined.
Email Address
First Name
Last Name
Email Format	html text


* Each issue of the newsletter is released on the Saturday of the Sunday-to-Saturday week including the last day of the month.
Can’t decide? You can read previous issues to see what you’ll get.