October 31, 2007

Long tail of Twitter

The Long Tail is a concept used to describe certain business and economic models, such as those of Amazon.com or Netflix. A long tail distribution is characterized by a short head and a heavy tail, and in many cases the tail makes up the majority of the distribution. Businesses such as Amazon.com or Netflix can generate a significant part of their income by selling many otherwise hard-to-find items in small volumes each, rather than only a few popular items in large volumes. The concept was introduced by Chris Anderson, who also wrote a book on the subject: The Long Tail: Why the Future of Business is Selling Less of More.

I wanted to know whether the long tail distribution could also be observed in the Twitter world. I started by looking around for reliable quantitative information on Twitter users. There are several sites with top lists of Twitter users, but I discovered that these top lists are not 100% reliable (see this blog post). Furthermore, they only cover a top 100 (Twitdir) or a top 150 (Twittown). So I created my own top 500 list.

A long tail distribution follows a power law (or Pareto distribution). A power law is easy to spot on a log-log plot of the distribution (both axes on a logarithmic scale), where the data points should fall on a straight line. The graph below plots the distribution of the top 500 Twitter users by number of following (the number of Twitter users a Twitter user is following), by number of followers (the number of other Twitter users following a Twitter account) and by number of updates (the number of Twitter messages published).
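To see why a power law shows up as a straight line on a log-log plot, here is a short derivation. The symbols are my own notation and not taken from the plots: y is the count for a user (followers, say), r is that user's rank, and C and α are positive constants.

```latex
y(r) = C \, r^{-\alpha}
\quad\Longrightarrow\quad
\log y = \log C - \alpha \log r
```

With log r on the horizontal axis and log y on the vertical axis, this is a straight line with slope -α and intercept log C.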

The "normal" plots of the distributions show a very short head. The distributions on the log-log plots resemble a straight line, suggesting a long tail.

These graphs are based only on the distribution of a top 500 of Twitter users, which raises two problems. In the first place, I am not absolutely sure that the lists I compiled are the true top 500 lists for the three criteria. It is very likely that I missed several Twitter accounts. Furthermore, as there are currently over 500,000 Twitter users, a picture of a top 500 does not tell us anything about the rest of the distribution.

Instead of focussing on the complete Twitter community, I zoomed in on two subcommunities: the Twitter communities of Brazil in South America and Belgium in Europe. Looking at the location information in the Twitter profiles, I was able to identify 1700 Twitter users in Brazil and 1000 Twitter users in Belgium. European users were among the first to start using Twitter. The popularity of Twitter in South America started later.



For both countries the observed distributions show similar patterns: a very short head followed by a long tail. The distributions loosely resemble a power law. The R-squared values of the fits are between 0.76 (Brazil - number of updates) and 0.86 (Brazil - number of followers). The R-squared values for the top 500 distributions are even higher, between 0.95 and 0.98, suggesting a very strong power law relationship.
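For readers who want to reproduce this kind of figure, here is a minimal sketch of one way to get an R-squared value for a power law fit: rank the users by count, fit a straight line to log(count) against log(rank) by least squares, and compute R-squared for that line. The rank-based fit, the function name `loglog_fit` and the synthetic data are assumptions for illustration, not necessarily how the numbers above were produced.

```python
import math

def loglog_fit(values):
    """Least-squares fit of log10(value) = intercept + slope * log10(rank).

    `values` are per-user counts (e.g. followers). Zeros are dropped because
    log(0) is undefined. Returns (slope, intercept, r_squared).
    """
    ranked = sorted((v for v in values if v > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive values")
    xs = [math.log10(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log10(v) for v in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variance in log(value) explained by the line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("all values are equal; R-squared is undefined")
    return slope, intercept, 1.0 - ss_res / ss_tot

# Synthetic data, NOT Twitter data: an exact power law, value = 10000 / rank.
synthetic = [10000 / rank for rank in range(1, 501)]
slope, intercept, r2 = loglog_fit(synthetic)
print(f"slope={slope:.3f}  intercept={intercept:.3f}  R^2={r2:.3f}")
```

The steeper the slope, the shorter the head relative to the tail. A high R-squared on a log-log plot is only a rough indication of a power law, since other heavy-tailed distributions can also look nearly straight over a limited range.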

The long tail concept was introduced to describe how businesses could sell less of more. As the founders of Twitter haven't yet decided on their business model, it is not clear how money can be made on Twitter. How money can be made from the long tail of Twitter users remains an even bigger question.


Kevin Makice said...

I appreciate the work you do on this site, especially in the absence of "official" figures.

I wonder if the long tail for the full 500,000 would be at all informative. It seems to me that the power of Twitter is in being able to find a personal sweet spot of use. It would be much more interesting to be able to separate this data into Twitter strategies (i.e. broadcast = followers >> following, lurking = few/no updates w/ selected following, etc) and see if the distributions are still there.

www.lugo-3d.com said...

It can't succeed in fact, that is what I think.