Page view statistics for Wikimedia projects

What are the page view statistics files and what do they contain?

Each request for a page, whether for editing or reading, and whether for a "special page" such as a log of actions generated on the fly or for an article from Wikipedia or one of the other projects, reaches one of our squid caching hosts. The request is then sent via UDP to a filter, which discards requests from our internal hosts as well as requests for wikis that aren't among our general projects. This filter writes out the project name, the size of the page requested, and the title of the page requested.

Here are a few sample lines from one file:


      fr.b Special:Recherche/Achille_Baraguey_d%5C%27Hilliers 1 624
      fr.b Special:Recherche/Acteurs_et_actrices_N 1 739
      fr.b Special:Recherche/Agrippa_d/%27Aubign%C3%A9 1 743
      fr.b Special:Recherche/All_Mixed_Up 1 730
      fr.b Special:Recherche/Andr%C3%A9_Gazut.html 1 737
    

In the above, the first column "fr.b" is the project name. The following abbreviations are used:

Project names without a period and a following character are Wikipedia projects.

The second column is the title of the page retrieved, the third column is the number of requests, and the fourth column is the size of the content returned.
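The column layout described above can be read with a few lines of Python. This is only a minimal sketch: the names PageCount, parse_pagecount_line and is_wikipedia are made up for illustration, and it assumes that every line has exactly four space-separated fields and that titles contain no spaces, as in the sample lines. A line that does not fit that shape raises ValueError.

```python
from typing import NamedTuple
from urllib.parse import unquote


class PageCount(NamedTuple):
    """One line of a pagecount file (hypothetical helper type)."""
    project: str   # project name, e.g. "fr.b" or "en"
    title: str     # page title, percent-decoded
    requests: int  # number of requests during the hour
    size: int      # size of the content returned


def parse_pagecount_line(line: str) -> PageCount:
    # Assumption: exactly four whitespace-separated fields per line.
    project, title, requests, size = line.split()
    # Titles are percent-encoded in the samples, e.g. "%C3%A9" is "é".
    return PageCount(project, unquote(title), int(requests), int(size))


def is_wikipedia(project: str) -> bool:
    # A project name without a period is a Wikipedia project.
    return "." not in project


row = parse_pagecount_line("fr.b Special:Recherche/Andr%C3%A9_Gazut.html 1 737")
print(row.project, row.title, row.requests, row.size, is_wikipedia(row.project))
```

Decoding the title is optional; keep the raw field instead if you need to match titles exactly as they appear in the files.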

These are hourly statistics, so in the line

      en Main_Page 242332 4737756101
    
we see that the main page of the English-language Wikipedia was requested more than 240,000 times during that hour. These are not unique visits.

In some directories you will see files which have names starting with "projectcount". These are total views per hour per project, generated by summing up the entries in the pagecount files. The first entry in a line is the project name, the second is the number of non-unique views, and the third is the total number of bytes transferred.
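The summing described above might look roughly like the following. It is a sketch rather than the script actually used to generate the projectcount files: the function name project_totals is hypothetical, and it assumes that the fourth pagecount column can be added up directly to give the bytes total. Lines that do not have four fields are skipped.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def project_totals(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Sum views and bytes per project over pagecount lines (illustrative)."""
    views: Dict[str, int] = defaultdict(int)
    total_bytes: Dict[str, int] = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        project, _title, count, size = fields
        views[project] += int(count)
        total_bytes[project] += int(size)
    return {project: (views[project], total_bytes[project]) for project in views}


sample = [
    "fr.b Special:Recherche/All_Mixed_Up 1 730",
    "fr.b Special:Recherche/Andr%C3%A9_Gazut.html 1 737",
    "en Main_Page 242332 4737756101",
]
# Print one line per project: name, non-unique views, total bytes.
for project, (view_count, byte_count) in sorted(project_totals(sample).items()):
    print(project, view_count, byte_count)
```

To run this over a real hourly file, pass an open file object in place of the sample list; the function only iterates over lines.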

Who came up with this stuff anyways? (Alternatively, who can I nag about it?)

Domas Mituzas, a long-time volunteer db admin for WMF, started generating these statistics in 2007. Some of the older files (from 2010 through at least mid-2011) are also available at the Internet Archive thanks to Federico Leva.

