452 hexagons tile this area at this zoom level. Note, however, that some of them lie entirely over water, so not every cell will have data.

The ratio of the area of a regular hexagon to the area of its circumscribed circle is 3√3/(2π) ≈ 0.827. Dividing the area of each H3 cell by this ratio gives the area of a circle that approximately circumscribes the cell, and the radius follows from that area as r = √(A/π). (This is approximate because H3 cells are not regular, as they must tile a sphere. In fact, some of them are pentagons. However, the approximation works for these purposes.)

With this method, search queries can be automatically generated for each cell:
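A minimal sketch of that step, assuming each cell's center coordinates and area (in km²) are already known. The function names and the `lat,lng,radiuskm` string layout are illustrative assumptions modeled on Twitter's geocode search syntax, not code taken from the original notebook.

```python
import math

# Area of a regular hexagon divided by the area of its circumscribed circle:
# (3 * sqrt(3) / 2) / pi, which is roughly 0.827.
HEX_TO_CIRCLE_AREA_RATIO = 3 * math.sqrt(3) / (2 * math.pi)


def circumscribed_radius_km(cell_area_km2: float) -> float:
    """Radius of a circle that approximately circumscribes a hexagonal cell."""
    circle_area = cell_area_km2 / HEX_TO_CIRCLE_AREA_RATIO
    return math.sqrt(circle_area / math.pi)


def geocode_query(lat: float, lng: float, cell_area_km2: float) -> str:
    """Build a 'lat,lng,radiuskm' string (assumed geocode format)."""
    radius = circumscribed_radius_km(cell_area_km2)
    return f"{lat:.6f},{lng:.6f},{radius:.3f}km"
```

As a sanity check, a regular hexagon with side length 1 has area 3√3/2 and circumradius 1, so `circumscribed_radius_km(3 * math.sqrt(3) / 2)` should come out at 1.0 up to floating-point error.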

Next, these locations and radii can be used to download tweets using TWINT.

Now, tweets for each hexagon will be grouped by hour and counted. Data per day for each hexagon, and data per hexagon (center coordinates, radius, baseline tweet volume) will be constructed and saved to a CSV.
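The grouping step could look roughly like the sketch below. It uses only the standard library; the `(cell, created_at)` tuple layout, the function names, the CSV column names, and the sample rows are all made up for illustration and are not the notebook's actual data structures.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def hourly_counts(tweets):
    """Count tweets per (cell, hour); `tweets` yields (cell_id, datetime) pairs."""
    counts = Counter()
    for cell, created_at in tweets:
        # Truncate the timestamp to the start of its hour.
        hour = created_at.replace(minute=0, second=0, microsecond=0)
        counts[(cell, hour)] += 1
    return counts


def write_counts_csv(counts, fileobj):
    """Write one row per (cell, hour) bucket to an open text file object."""
    writer = csv.writer(fileobj)
    writer.writerow(["cell", "hour", "tweet_count"])
    for (cell, hour), n in sorted(counts.items()):
        writer.writerow([cell, hour.isoformat(), n])


# Made-up sample rows, for illustration only.
sample = [
    ("cell_a", datetime(2021, 5, 1, 10, 15)),
    ("cell_a", datetime(2021, 5, 1, 10, 50)),
    ("cell_a", datetime(2021, 5, 1, 11, 5)),
    ("cell_b", datetime(2021, 5, 1, 10, 30)),
]
counts = hourly_counts(sample)

buffer = io.StringIO()
write_counts_csv(counts, buffer)
csv_text = buffer.getvalue()
```

Writing to a file object rather than a path keeps the function easy to test; passing `open("counts.csv", "w", newline="")` instead of the in-memory buffer would save the same rows to disk.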

A brief check on the distribution of the number of Twitter users

It is important to take a look at how many individual users are sampled in the dataset. If each hex cell contained tweets from only a handful of users, the data would be much less reliable than if it contained a wider sample.
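One way to run such a check is sketched below, assuming each tweet is available as a `(cell, username)` pair. The function names and the `min_users` threshold are hypothetical; the notebook may use a different cutoff or none at all.

```python
from collections import defaultdict


def users_per_cell(tweets):
    """Number of distinct usernames seen in each cell."""
    users = defaultdict(set)
    for cell, username in tweets:
        users[cell].add(username)
    return {cell: len(names) for cell, names in users.items()}


def sparse_cells(user_counts, min_users):
    """Cells with fewer than `min_users` distinct users (hypothetical cutoff)."""
    return sorted(cell for cell, n in user_counts.items() if n < min_users)
```

For example, `sparse_cells(users_per_cell(tweets), min_users=5)` would list the cells whose tweets come from fewer than five distinct accounts.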

Compensating for missing data

Some tweets, namely those that match a geocoded search only because of the user's profile location, are included only if the query is run within seven days of the tweet. The following code estimates how many tweets might be missing from older data in the TWINT aggregates. Specifically, for each grid cell, it finds the ratio of users tweeting with a profile location or a place to users tweeting with a place, based on data available from multiple scraping sessions separated by a certain amount of time.
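A rough sketch of such a ratio is given here under strong assumptions: the same time window has been scraped twice, once within seven days (capturing profile-location and place matches) and once later (capturing place matches only), and each scrape has been reduced to a set of usernames per cell. The function and argument names are hypothetical.

```python
def missing_data_ratio(recent_users, late_users):
    """Per-cell ratio of users seen in the early scrape to users seen in the late one.

    Both arguments map a cell id to a set of usernames. Cells with no users
    in the late scrape are skipped, because the ratio is undefined there.
    """
    ratios = {}
    for cell, late in late_users.items():
        if not late:
            continue
        recent = recent_users.get(cell, set())
        ratios[cell] = len(recent) / len(late)
    return ratios
```

A ratio of 2.0 for a cell would suggest that a late scrape sees only about half of the users an early scrape would have found there.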

As can be seen above, this ratio is highly variable. While the highest-variance cells are likely to be those with very few tweets or users, which shouldn't be visualized anyway (and aren't in Bellingcat's visualization), this is still cause for concern. Consequently, tweet volume from more than seven days ago cannot be directly compared with tweet volume from the most recent seven days, even with this attempted compensation method. However, the relative frequency of specific terms within the dataset (for example, "oxygen") can be compared, though the variance will be larger for older tweets because of their smaller sample size.
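The relative frequency of a term can be computed as the share of tweets that mention it, which stays comparable even when the absolute volume does not. The helper below is an illustrative sketch with a hypothetical name, using a simple case-insensitive substring match rather than any particular tokenization.

```python
def term_share(tweet_texts, term):
    """Fraction of tweets whose text contains `term` (case-insensitive)."""
    if not tweet_texts:
        return 0.0
    needle = term.lower()
    hits = sum(1 for text in tweet_texts if needle in text.lower())
    return hits / len(tweet_texts)
```

Computing `term_share` separately for the older and the more recent tweets in a cell gives two proportions that can be compared directly, keeping in mind that the older one rests on fewer tweets.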