June 21, 2007

Twitter - number of tweets per day

Some people would like to know the answer to questions such as "how many users are there in the Twitter community", "how many Twitter messages have been published since the start", "how many Twitter messages are published every day" … Unfortunately, I have not yet found a source with correct and reliable answers to all these questions.

Two weeks ago I discovered the stats page of Twittertroll. Twittertroll describes itself as "the coolest real-time Twitter search engine!". This week I had another look at their stats page and was surprised to see how the number of Twitter messages (tweets) indexed by Twittertroll per day had evolved. This number was going down, and pretty hard.


As the Twittertroll stats page only shows the number of tweets for the last seven days, I looked for other sources to find the number of tweets for the previous days. Luckily I was able to find a previous version of the Twittertroll stats page in the Google cache and also a screenshot in a message on the Twittown blog. The graph below shows the number of Twitter messages indexed by Twittertroll per day since June 9, 2007.


My question was whether the Twittertroll data reflected the actual situation. Was there indeed a sharp decline in the number of Twitter messages? Was Twitter really in trouble?

There is another source of information regarding the number of Twitter messages published in the Twitter public timeline. Twitterment, the search engine from eBiquity, not only shows graphs of the total number of tweets it has indexed on its stats page, but also publishes that total as a figure. Comparing the number of indexed tweets on different dates gives an idea of the number of tweets per day. Again, I found several sources that kept a copy of the Twitterment stats page:
Twittown post - April 13, 2007 - 368103
Google cache - June 4, 2007 - 1633713
Twitterfacts post - June 8, 2007 - 1751613
Live.com cache - June 18, 2007 - 1988173
Own observation - June 19, 2007 around 12:00 PM (CET) - 2014643
Own observation - June 20, 2007 around 12:00 PM (CET) - 2042493


Subtracting the number of indexed tweets on an earlier date from the number on a later date, and dividing this difference by the number of days between the two dates, gives the average number of tweets per day over that period, as shown in the graph below.
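As a rough check of that calculation, here is a minimal Python sketch that applies it to the Twitterment counts listed above. It assumes each count was taken on the stated date; the time of day is only known for the last two observations, so the one-day intervals in particular are approximate. The names OBSERVATIONS and tweets_per_day are illustrative only.

```python
from datetime import date

# Twitterment total indexed-tweet counts as listed above: (date, count).
# Assumption: each count was taken on the stated date; only the last two
# have a known time of day (~12:00 CET), so short intervals are approximate.
OBSERVATIONS = [
    (date(2007, 4, 13), 368103),   # Twittown post
    (date(2007, 6, 4), 1633713),   # Google cache
    (date(2007, 6, 8), 1751613),   # Twitterfacts post
    (date(2007, 6, 18), 1988173),  # Live.com cache
    (date(2007, 6, 19), 2014643),  # own observation
    (date(2007, 6, 20), 2042493),  # own observation
]


def tweets_per_day(observations):
    """Average tweets per day between each pair of consecutive observations."""
    averages = []
    for (d0, n0), (d1, n1) in zip(observations, observations[1:]):
        days = (d1 - d0).days          # whole days between the two dates
        averages.append((d0, d1, (n1 - n0) / days))
    return averages


if __name__ == "__main__":
    for start, end, avg in tweets_per_day(OBSERVATIONS):
        print(f"{start} -> {end}: {avg:,.0f} tweets/day")
```

With these figures the averages per interval come out at roughly 24,339, 29,475, 23,656, 26,470 and 27,850 tweets per day, i.e. in the neighbourhood of 25,000.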


Conclusion: the number of tweets per day is not in sharp decline, as the Twittertroll stats page might suggest. It is clear from the graphs above that Twittertroll is indexing only about one third of the tweets indexed by Twitterment (about 8500 compared to 25000 per day). As there is no other source to check against, it is impossible to tell whether the number of tweets indexed by Twitterment is the correct number of tweets in the Twitter public timeline. Perhaps the people at Twitter Inc. can shed some light on global statistics.

4 comments:

Brad said...

TwitterTroll.com had a sharp decline in indexed posts because of a change I made to the indexing script to cache users' profile images. I've since learned that caching the profile images is pointless, as the image links change, so I adjusted the script and am now back on track, indexing about 20,000 tweets a day:

http://www.twittertroll.com/stats.asp

sheila said...

I think there is something faulty in trying to measure the number of tweets per day by consuming the public timeline, and I assume that's how these stats and user indexing sites build their knowledge. At least as far as I can push it, I find that Twitter only updates the public timeline every so often (every 4 minutes advertised, every 1.5 minutes in practice?), and Twitter only lets you get the most recent 20 tweets on any given call. So it seems that no matter how fast you loop to get public timeline tweets via the API, you're only SAMPLING them. I don't think one can know whether one is sampling 10% of all public tweets, 50%, or whatever.
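The sampling point can be made concrete with a little arithmetic. The sketch below assumes, as the comment describes, that the public timeline refreshes at most once per interval and that each call returns at most the 20 most recent tweets; under those assumptions a single poller can never collect more than 20 tweets per refresh. The function name max_tweets_per_day is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TWEETS_PER_CALL = 20  # most recent tweets returned per call, per the comment


def max_tweets_per_day(refresh_interval_seconds, per_call=TWEETS_PER_CALL):
    """Upper bound on distinct tweets a single poller can see per day,
    assuming the timeline changes at most once per refresh interval."""
    refreshes_per_day = SECONDS_PER_DAY // refresh_interval_seconds
    return refreshes_per_day * per_call


if __name__ == "__main__":
    for interval in (240, 90, 60):  # 4 minutes, 1.5 minutes, 60 seconds
        print(f"refresh every {interval} s: at most {max_tweets_per_day(interval):,} tweets/day")
```

A 4-minute refresh caps a poller at 7,200 tweets a day, 1.5 minutes at 19,200, and 60 seconds at 28,800. Whatever the true total is, anything above the cap is invisible to an indexer that relies only on polling.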

zzelp1 said...

I know I dredged this page up from the depths of time, but looking at the stats today, Twittertroll seemed to indicate a very large drop in posts since this blog post. I wonder if you monitor this blog; if so, would you comment?

Mendel said...

I'm working on a project to index all of Twitter. You can pull from the public timeline every 60 seconds, but you can't get much more than that without people's input. So a way to get around it is to store tweets and index through all of the usernames, their friends, and their direct messages '@xxxx'. When you have ALL backlogged tweets indexed, you can show historical data... which is what I'll hopefully have time to do on my site. Anyway, that's my two cents. The website is http://www.thesocialarchive.com/
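The crawl strategy in this comment amounts to a breadth-first traversal of the user graph. Below is a minimal sketch that assumes small, hypothetical in-memory dictionaries (FRIENDS, TWEETS) standing in for whatever data a real crawler would fetch. It is not the commenter's code and does not use any Twitter API. Starting from users seen in the public timeline, it follows friends and '@name' references and stores each tweet once, keyed by its id.

```python
from collections import deque

# Hypothetical stand-in data: username -> friends, username -> [(id, text)].
# A real crawler would fetch these; none of these names are a real API.
FRIENDS = {"alice": ["bob"], "bob": ["carol"], "carol": [], "dave": []}
TWEETS = {
    "alice": [(1, "hello"), (2, "@dave hi")],
    "bob": [(3, "lunch")],
    "carol": [(4, "@alice yes")],
    "dave": [(5, "ok")],
}


def mentioned_users(text):
    """Usernames referenced as @name (naive: does not strip punctuation)."""
    return [w[1:] for w in text.split() if w.startswith("@") and len(w) > 1]


def crawl(seed_users):
    """Breadth-first crawl from seed users, following friends and
    @-references; every tweet is stored once, keyed by its id."""
    seen_users = set(seed_users)
    queue = deque(seed_users)
    index = {}  # tweet id -> (user, text)
    while queue:
        user = queue.popleft()
        neighbours = list(FRIENDS.get(user, []))
        for tweet_id, text in TWEETS.get(user, []):
            index.setdefault(tweet_id, (user, text))
            neighbours.extend(mentioned_users(text))
        for other in neighbours:
            if other not in seen_users:
                seen_users.add(other)
                queue.append(other)
    return index


if __name__ == "__main__":
    print(sorted(crawl(["alice"])))
```

Keying the index by tweet id keeps re-visits harmless, and the seen_users set ensures each user is fetched only once, however many paths lead to them.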