Experts insights

Create a Twitter bot in Python

As Twitter has become part of the digital and media landscape, we’ve discovered just how time-consuming and even restrictive this social media platform can be. It’s tempting to turn to some form of automation, and we would gladly send a bot to meander through Twitter on our behalf. That’s how apps like threadreaderapp and blockpartyapp came about. These solutions, which as you might imagine are necessary to pull the best from Twitter, all rely on the platform’s API. In this article, then, we’ll go over in detail all the steps required to create a Twitter bot in Python.

To best understand how this Twitter bot works, we’ll go step by step. We’ll first look at making requests to the Twitter API in order to pull information about a particular tweet. Then we’ll look at collecting this type of information on multiple tweets using a filter. And then we’ll automate actions to perform on these tweets, in particular sending them in an email or replying to them.

Some prerequisites

In order to make a program work automatically using the Twitter API, there are a few basic requirements:

  • A Twitter account, which will give you access to a developer account. You can sign up to use Twitter’s API on this page and you’ll receive a Bearer Token. Keep this token safe and sound; you’ll need it to make requests later
  • Familiarity with API-specific concepts, particularly the idea of an endpoint
  • Some level of competency with Python, particularly using libraries, which will be essential for us

First contact with the Twitter API

The goal of this first contact with the API will be to retrieve information about a particular tweet without opening Twitter. For that, we will indicate the tweet ID (if you’re wondering how to get this, open a tweet in your web browser and look at the URL: the numerical value after /status/ is the tweet ID), authenticate, and request different pieces of information from the API, each corresponding to a particular “endpoint.” The list of these endpoints is available in the API documentation.
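
If you’d rather extract the ID programmatically, here’s a minimal sketch (the username in the URL is a placeholder; only the part after /status/ matters):

# a tweet URL looks like https://twitter.com/<username>/status/<tweet id>
url = 'https://twitter.com/some_user/status/1588915242490560512'
# keep what follows /status/ and drop any query string
tweet_id = url.split('/status/')[1].split('?')[0]
print(tweet_id)  # 1588915242490560512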

By default, the API returns the ID and the text of the tweet. We’re going to use the requests library to make HTTP calls and pprint to display (“pretty-print,” as the documentation puts it) the resulting JSON:

import os
from pprint import pprint
import requests

params = {'ids': ['1588915242490560512']}
# The bearer token is stored in the environment variable BEARER_TOKEN
headers = {'Authorization': f"Bearer {os.getenv('BEARER_TOKEN')}"}
r = requests.get('https://api.twitter.com/2/tweets', params=params, headers=headers)
pprint(r.json())
# example of the response we'll get
{
    'data': [
        {
            'edit_history_tweet_ids': ['1588915242490560512'],
            'id': '1588915242490560512',
            'text': 'in what is becoming a tradition in odd point release...'
        }
    ]
}

You’ll notice that each object in the data section represents a single tweet; here we’ve requested just one. To display more information about this tweet, beyond its ID and text, we’ll need to use the additional request field tweet.fields. The same goes for other types of objects: if you want to get more information about a user, you’ll need to configure the field user.fields.

Let’s return to our example and also display the creation date of the tweet and the ID of its author.

import os
from pprint import pprint

import requests

params = {
    'ids': '1588915242490560512',
    # we add the extra fields we want
    'tweet.fields': 'created_at,author_id'
}
headers = {'Authorization': f"Bearer {os.getenv('BEARER_TOKEN')}"}
r = requests.get('https://api.twitter.com/2/tweets', params=params, headers=headers)
pprint(r.json(), indent=4)

# response example
{
    'data': [
        {
            'author_id': '1037022474762768384',
            'created_at': '2022-11-05T15:24:49.000Z',
            'edit_history_tweet_ids': ['1588915242490560512'],
            'id': '1588915242490560512',
            'text': 'in what is becoming a tradition in odd point release...',
        }
    ]
}
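
As mentioned above, the same pattern applies to other objects. As a hedged sketch, here is the same kind of request against the /2/users endpoint, using user.fields and the author_id returned above (the chosen fields are just an illustration):

import os
from pprint import pprint

import requests

params = {
    'ids': '1037022474762768384',
    'user.fields': 'created_at,description'
}
headers = {'Authorization': f"Bearer {os.getenv('BEARER_TOKEN')}"}
r = requests.get('https://api.twitter.com/2/users', params=params, headers=headers)
pprint(r.json(), indent=4)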

Follow a topic with our Twitter bot in Python

Now let’s really get into it—let’s look at how to observe specific events that take place on Twitter and execute some action in response. Let’s first search for recent tweets that meet our search criteria.

Look up recent tweets

To collect recent tweets that meet particular criteria, we’ll simultaneously carry out two operations:

1. Apply a search filter

This filter accepts a certain number of characters (512 if you’re just beginning and have “Essential” access) and lets you combine Boolean keywords with criteria like the “from” account (which sent the tweet), the “to” account (which the tweet was sent to, if applicable), information about the content such as images or links (“has”), and other contextual elements about the nature of the tweet or its author (“is:retweet,” “is:verified,” etc.). The complete list of filters shows you the possibilities of this tool.
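
To make these operators concrete, here are a few illustrative rule strings (the accounts and hashtags are placeholders, not recommendations):

example_rules = [
    'from:gandi_net has:images -is:retweet',   # tweets from an account that contain an image
    '#python (has:links OR is:verified)',      # a hashtag, with a link or from a verified account
    'to:gandi_net is:reply',                   # replies addressed to an account
]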

For our example, we propose to search for Gandi promos on domain names. We would use the following request:

from:gandi_net #promo has:links -is:retweet

From this you can intuitively understand what the request is doing, but to summarize, we’re searching for:

  • Tweets coming from the user gandi_net from:gandi_net
  • AND have the hashtag #promo
  • AND have a link (so we know where to go to do our shopping) has:links
  • AND that are not retweets -is:retweet. This last filter is very important and even recommended in the official documentation, because many tweets are retweets, which would add unhelpful noise to our results

2. Get historical data

Here, the route that interests us lets us search for tweets from the past 7 days. We could keep working with the requests library, but it would be tedious to analyze the results, manage pagination, etc., which is why we will use a specialized library for the Twitter API named tweepy. We’ll use the class tweepy.Client to manipulate Twitter’s API v2 endpoints. To better understand the relationship between this class’s methods and the API endpoints, please see this page. In our case, the method that we’re interested in is search_recent_tweets.

import os

import tweepy

client = tweepy.Client(os.getenv('BEARER_TOKEN'))

response = client.search_recent_tweets(
    'from:gandi_net #promo has:links -is:retweet',
    max_results=100,
    tweet_fields=['created_at']
)

if response.data is not None:
    for tweet in response.data:
        print(tweet.id, tweet.text)

# pagination information is listed here
print(response.meta)

A few points on this:

  • It’s easy to use. Here, we specified the filter, which is the only mandatory argument, then the number of elements that we wanted to display and the additional fields. In general, the method arguments are the same as those the API route accepts. To get to know all the possible arguments of this method, you can refer to the documentation
  • The response object returned is a namedtuple that contains four keys: data, includes, meta, and errors. This is the same information you would get by making the request yourself with requests, for example
  • In response.data, you have a list of the requested objects. In general, if you know the list of fields of the object in question, you can use them as properties, as in the example above. They are also all documented in the tweepy documentation

We’ll also mention here, without getting too far into it, that another way to use the recent tweets route is polling, which is sort of a real-time mode. The idea is still to search for all tweets matching a filter, but starting from a particular tweet rather than just looking into the past. This requires already having a (recent) tweet that meets our criteria and iterating from there.
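
As a hedged illustration of this polling idea (reusing the same Gandi promo filter and assuming a one-minute polling interval), you could write something like:

import os
import time

import tweepy

client = tweepy.Client(os.getenv('BEARER_TOKEN'))
query = 'from:gandi_net #promo has:links -is:retweet'

# start from the most recent tweet that already matches the filter (if any)
last_id = client.search_recent_tweets(query, max_results=10).meta.get('newest_id')

while True:
    time.sleep(60)  # poll every minute
    response = client.search_recent_tweets(query, since_id=last_id, max_results=100)
    if response.data is not None:
        for tweet in response.data:
            print(tweet.id, tweet.text)
        last_id = response.meta['newest_id']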

Getting a filtered stream

One possible method for searching tweets in real time is a filtered stream. The documentation for this can be found here. It includes 3 endpoints: one to add and delete filter rules, one to list the rules you’ve defined, and one to connect to the stream itself.

Unlike the previous endpoint, you can add several filter rules, the number depending on the type of access that you have:

  • With Essential access, you can have 5 rules with 512 characters each (this is probably your situation if you just started using the Twitter API)
  • With Elevated access, you can have 25 rules with 512 characters each
  • With Academic access, you can have 1000 rules with 1024 characters each

The idea here is simple: you add one or more filter rules and then listen for the tweets that match them. If a tweet matches at least one rule, it will be returned. To manipulate these endpoints with tweepy, we’ll use the class tweepy.StreamingClient.

To add, read, and remove filters, you can write:

import os

import tweepy

client = tweepy.StreamingClient(os.getenv('BEARER_TOKEN'))
rules = [
    # we're adding our rules here
    tweepy.StreamRule('from:gandi_net #promo has:links -is:retweet', tag='gandi promo'),
    tweepy.StreamRule('from:gandi_net #certificat -is:retweet', tag='gandi certificat'),
]
client.add_rules(rules)

# we're displaying our rules
response = client.get_rules()
for rule in response.data:
    print(rule)

# We're removing one or more rules by passing their IDs
client.delete_rules(['158939726852798054'])

Note: You can pass the argument dry_run=True to the add_rules and delete_rules methods to test the request and verify that it’s correct without actually executing it on the server side.
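
For instance, a quick sketch of a dry run (reusing the client and the rules list defined above):

# validate the rules server-side without actually creating them
response = client.add_rules(rules, dry_run=True)
# any syntax problem with a rule will show up in the errors part of the response
print(response.errors)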

Note: When defining multiple rules, it’s recommended to associate a tag to remind you what the filter does. A filter can be quite complex to read 🙂

Once we have created our rules, the only thing left to do is call the endpoint that lists the matching tweets. Normally the method to use is filter, except that if you call it as is, you won’t see anything, since by default StreamingClient doesn’t do anything with the tweets it collects. You need to inherit from the class and override some of its methods.

import os

import tweepy


class IDPrinter(tweepy.StreamingClient):

    # you can get a complete object response
    def on_response(self, response):
        # it has the structure StreamResponse(tweet, includes, errors, matching_rules)
        # so for each tweet, we get the full set of rules that matched
        print(response)

    # or you can just get the tweet
    def on_tweet(self, tweet):
        print(tweet.id, tweet.text)

    def on_errors(self, errors):
        print(errors)

    def on_connection_error(self):
        # must be done in case of network connection errors
        self.disconnect()

    def on_request_error(self, status_code):
        # must be done if the status code of the HTTP response is >= 400
        pass


printer = IDPrinter(os.getenv('BEARER_TOKEN'))
printer.filter()

Notes:

  • By overriding on_response, we receive a StreamResponse object as a parameter, which contains the tweet and all the rules that matched that tweet.
  • We can also pass the argument threaded=True to filter to avoid blocking the program, and get back the created thread in order to close it later (see the sketch after these notes)
  • There’s also an asynchronous version of this class of streaming AsyncStreamingClient which makes it possible to manipulate coroutines instead of threads. We won’t talk about that here since it’s a more advanced topic.
  • There is also another StreamingClient method, sample, which is related to this API route. It doesn’t take filters into account and returns 1% of new tweets posted on the platform. This can help you, using natural language processing algorithms for example, detect trends on Twitter like those that show up on the right of the web interface. Be careful, however, when using this route, since it can quickly use up your monthly tweet quota. You need a clear objective before using it and should only use it within a limited timeframe.
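
To illustrate the threaded mode mentioned above, here’s a short sketch reusing the IDPrinter class (the 60-second lifetime is arbitrary, just for the example):

import os
import time

printer = IDPrinter(os.getenv('BEARER_TOKEN'))

# with threaded=True, filter() returns the thread it created instead of blocking
thread = printer.filter(threaded=True)

# ...the rest of the program keeps running here...
time.sleep(60)

# close the stream and wait for the thread to finish
printer.disconnect()
thread.join()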

Automate tasks with your Twitter bot

Now that we know how to talk to the API and make specific requests of it, it’s time to put our bot to work and have it do specific, concrete tasks.

Send weekly search results by email

First, let’s take a look at how to send, every Monday morning at 9:00 AM, an email with an attachment containing the tweets found by our promo search over the previous week.

We’ll use the following Python libraries: tweepy to query the Twitter API, emails to send the email, and apscheduler to schedule the job.

Here is the resulting code, which we’ll comment on below:

import json
import os
import tempfile
from pathlib import Path

import emails
import tweepy
from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.triggers.cron import CronTrigger


def send_mail_with_tweets():
    # you should define the following environment variables:
    # SMTP_HOST: the smtp server from which you'll send the email
    # SMTP_PORT: the port used by this SMTP server
    # SMTP_TLS: "true" if you want to use TLS (SSL), "false" if not
    # SMTP_USER: optional, the user who will send the email
    # SMTP_PASSWORD: optional, the user's password
    smtp = {
        'host': os.getenv('SMTP_HOST'),
        'port': os.getenv('SMTP_PORT'),
    }
    if os.getenv('SMTP_TLS', '').lower() == 'true':
        smtp.update({
            'ssl': True,
            'user': os.getenv('SMTP_USER'),
            'password': os.getenv('SMTP_PASSWORD')
        })
    client = tweepy.Client(os.getenv('BEARER_TOKEN'))

    with tempfile.TemporaryDirectory() as tmp_dir:
        # you may want to replace the ".jl" extension with ".txt"
        # to avoid being blocked by certain email providers
        path = Path(tmp_dir) / 'tweets.jl'
        with path.open('w') as f:
            for tweet in tweepy.Paginator(
                    client.search_recent_tweets,
                    'from:gandi_net #promo has:links -is:retweet',
                    max_results=100
            ).flatten():
                data = {
                    'id': tweet.id,
                    'text': tweet.text,
                    'twitter_url': f'https://twitter.com/gandi_net/status/{tweet.id}'
                }
                f.write(f'{json.dumps(data)}\n')

        message = emails.Message(
            subject='Gandi promotions',
            text='Attached you will find a file with all the tweets related to promotions.',
            mail_from=('Twitter Bot', 'twitter@bot.com')
        )
        message.attach(filename=path.name, content_disposition='inline', data=open(path, 'rb'))
        response = message.send(to='foo@bar.com', smtp=smtp)

        # log the fact that an email could not be sent
        if response.status_code != 250:
            print("l'email n' a pas pu être envoyé")

# For a more industrial example, you'll need to configure a jobstore to keep
# the job information in the db, that way if a server crashes and needs
# to restart, your scheduler will pick up where it left off
scheduler = BlockingScheduler(timezone='utc')

# we'll use crontab notation to carry out our action
# we'll schedule it for every Monday at 9:00 AM
scheduler.add_job(send_mail_with_tweets, CronTrigger.from_crontab('0 9 * * 1'))

# the scheduler starts here; the call blocks and the program stays on this line
scheduler.start()

Notes:

  • There are a few environment variables to create for configuring email sending. The user and password are only necessary when using TLS, which will be the case most of the time
  • To test sending emails locally, you can use the service mailhog (see the sketch after these notes)
  • To store the tweets in a file, we used the JSON Lines format inside a temporary directory. It’s practical for saving lots of data without eating up all your memory
  • We then defined the information for the email (you can change this as needed) and the attachment, and we sent it. We print a message in the console if the email didn’t send properly, but you can set up logging instead. To actually debug the error, you’ll need to configure logging to display messages from the emails library at the debug level
  • The scheduler we use is the Blocking version, which suits our case, but depending on the style of program that you write, you can use the Threading or Asyncio versions instead. We’ll let you look through the documentation for more details
  • We add our job using a crontab expression as the trigger. If you want to verify the syntax of your crontab, you can use this website
  • Finally, calling start() launches the scheduler; the call blocks and keeps the program running continuously
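
For local testing and debugging as mentioned in the notes, here’s a small sketch (assuming MailHog is running locally on its default SMTP port 1025; the 'emails' logger name is an assumption based on the library name):

# environment for a local test run, e.g. set before launching the script:
#   SMTP_HOST=localhost  SMTP_PORT=1025  SMTP_TLS=false

import logging

# show everything, including what the emails library does when a send fails
logging.basicConfig(level=logging.DEBUG)
logging.getLogger('emails').setLevel(logging.DEBUG)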

Create a tweet with our Twitter bot

For the second application we promised at the beginning of this article, we’ll need to learn how to create a tweet using the API. Unfortunately, we can no longer use the bearer token as with the previous routes; we need an access token with specific rights. If we look at the documentation of the route in question, we will need the permissions tweet.read, tweet.write, and users.read. The workflow for getting a token is defined here. I will, however, show you how to proceed with tweepy. You’ll need a small API, or more precisely a redirect URL, to use the technical term. This URL will be used to receive a code, required to obtain the access token. Two pieces of information will be sent to this URL as request parameters:

  • state: which is a random security value. For more information on the technical oauth2 jargon, see this page
  • code: the authorization code generated by Twitter’s API, which will be exchanged for the access token

Here’s an example of a FastAPI app that you can implement to serve as the redirect URL.

import logging
from fastapi import FastAPI, Response

app = FastAPI()
logger = logging.getLogger(__name__)


@app.get('/')
def get_code(code: str, state: str):
    # Twitter redirects the browser here with ?code=...&state=... after authorization
    logger.info('code: %s, state: %s', code, state)
    return Response(status_code=200)

Once this is done, you’ll need to go back to the developer portal, to your project, and click on the section “User authentication settings.” You can skip the first “App permissions” part, which has to do with OAuth 1.0a, which we do not use. In the second part, “Type of App,” be sure to choose “Web App, Automated App or Bot.” In the “App info” part, enter the redirect URL that corresponds to your server. You’ll also need to enter a website URL (you can put whatever you want here since it just refers to you) and whatever other optional information you want to include. Once this is done, you’ll get a client_id and a client_secret, which are necessary for OAuth 2.0 authentication. Keep these in a safe place; in the scripts that follow, the environment variables CLIENT_ID and CLIENT_SECRET will need to hold these values. As for the workflow for getting an access token with tweepy, it works like this:

1. Use the class tweepy.OAuth2UserHandler, entering all the necessary information

import os
import tweepy

oauth2_user_handler = tweepy.OAuth2UserHandler(
    client_id=os.getenv('CLIENT_ID'),
    client_secret=os.getenv('CLIENT_SECRET'),
    redirect_uri='your redirect URL here',
    # permissions go here
    scope=['tweet.read', 'tweet.write', 'users.read', 'offline.access'],
)

# this will generate an authorization URL which we'll need to launch in our web browser
print(oauth2_user_handler.get_authorization_url())

You’ll notice in the scope argument that we added the permission offline.access. This allows us to refresh the access token without having to intervene manually (we’ll explain later). Once you have the authorization URL, copy it into the address bar of your favorite browser and go to it (by hitting Enter, for example). You will then be asked to authorize your bot to access your account. Once this is done, you will be redirected to the URL that you defined as the redirect URL.

2. Copy the redirect URL from your browser’s address bar (it should now contain the code that was assigned to you) and use an OAuth2UserHandler method to retrieve the access token.

access_token = oauth2_user_handler.fetch_token(
    # the full URL you were redirected to, including the code and state parameters
    'your redirect URL here'
)

3. Once that’s done, you can re-use the tweepy client as usual, except that instead of the bearer token, you’ll use the access token.

...
client = tweepy.Client(access_token['access_token'])

And then you can create a tweet like this:

client.create_tweet(text='hello from bot', user_auth=False)

The user_auth=False part is important; otherwise tweepy will try to authenticate with OAuth 1.0a and the request will fail. It’s a little strange as an API, but it’s a legacy of the old API. It’s important to know that an access token is valid for two hours. Some of you might wonder if you’ll have to manually repeat the browser operation every two hours… that’s not very automated, is it? Remember the permission offline.access that we used at the beginning to get the authorization URL? This is what will now help us refresh our token before it expires. In fact, at the same time that we retrieved our access token, this permission also let us retrieve a refresh token. This token is what’s used to do the refreshing, and it’s kept internally by tweepy. Here’s how to refresh the access token with tweepy:

import tweepy

token_info = oauth2_user_handler.refresh_token(
    'https://api.twitter.com/2/oauth2/token'
)

client = tweepy.Client(token_info['access_token'])

We need to pass the URL used to refresh a token. If you’re wondering where we found this URL, it’s available on this page under the section “Step 5…”. If you have a refresh_token that you obtained other than through tweepy, you can pass it using the argument refresh_token. In token_info, we have a set of information, including the new access token and the new refresh token (here again, saved by tweepy for later use). We can then instantiate a new client with the new access token without having to intervene manually. To automate all that, we can use apscheduler, which we already know so well. Here’s an example of code that you could write:

import os

import tweepy
from apscheduler.schedulers.blocking import BlockingScheduler

scheduler = BlockingScheduler()

oauth2_user_handler = tweepy.OAuth2UserHandler(
    client_id=os.getenv('CLIENT_ID'),
    client_secret=os.getenv('CLIENT_SECRET'),
    redirect_uri='your redirect URL here',
    scope=['tweet.read', 'tweet.write', 'users.read', 'offline.access'],
)


def refresh_token():
    token_data = oauth2_user_handler.refresh_token(
        'https://api.twitter.com/2/oauth2/token'
    )
    # save the token wherever you want
    os.environ['ACCESS_TOKEN'] = token_data['access_token']


# Consider adding this job the first time you get an access token
scheduler.add_job(refresh_token, 'interval', hours=1, minutes=55)
scheduler.start()

Here, we’re using the interval trigger to refresh the token every hour and 55 minutes, since a token lasts two hours.

(bonus) Sending a meme as a reply to each new Gandi promo

Now we’re getting to the fun part. We’ll take the last streaming example, which printed the tweets in the console, and instead reply to each new promo with the “shut up and take my money” meme.

import os

import tweepy

client = tweepy.Client(os.getenv('ACCESS_TOKEN'))


class MemePromoAnswer(tweepy.StreamingClient):

    def on_tweet(self, tweet):
        print(tweet.id, tweet.text)
        # reply with the meme; user_auth=False is needed, as explained above
        client.create_tweet(media_ids=['1590404643045216256'], in_reply_to_tweet_id=tweet.id, user_auth=False)

    def on_errors(self, errors):
        print(errors)

    def on_connection_error(self):
        self.disconnect()


meme = MemePromoAnswer(os.getenv('BEARER_TOKEN'))
meme.filter()

Notes:

  • You’ll need to have kept the Gandi promo filter for this to work
  • When using create_tweet, we pass a list containing a single value, which corresponds to the media ID of the “take my money” GIF. If you’re wondering how we got it, we created a tweet with this meme and retrieved the tweet’s info. Then, since we’re replying to a tweet, we use the argument in_reply_to_tweet_id and specify the tweet in question (see the sketch after these notes for another way to obtain a media ID)
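
If you’d rather obtain a media ID by uploading the image yourself, here’s a hedged sketch using the v1.1 media upload endpoint through tweepy. Note that this endpoint requires OAuth 1.0a user-context credentials (consumer key/secret and access token/secret), which is different from the OAuth 2.0 flow used in this article; the environment variable names and the file name below are hypothetical:

import os

import tweepy

# OAuth 1.0a user-context credentials from the developer portal
auth = tweepy.OAuth1UserHandler(
    os.getenv('CONSUMER_KEY'), os.getenv('CONSUMER_SECRET'),
    os.getenv('OAUTH1_ACCESS_TOKEN'), os.getenv('OAUTH1_ACCESS_TOKEN_SECRET'),
)
api = tweepy.API(auth)

# upload the GIF and print the media ID to pass to media_ids=[...]
media = api.media_upload('take_my_money.gif')
print(media.media_id)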

Mission accomplished. It was long, but admit it—it was worth it.