
Raw access to apache/squid logs would be nice
Closed, Resolved, Public

Description

Author: philip

Description:
Raw access to Apache/Squid HTTP access log files would be nice. This would allow
individual enthusiasts (including me) to run various statistics on those log files
for the purpose of boosting community participation.

For example, I am interested in knowing which articles in English are most accessed
from Bulgaria and which of them are missing so that I can spend some effort improving
them. To the best of my understanding, this information is not available in any of
the reports automatically generated by Wikipedia.

I also believe that many other legitimate uses of the raw log files would be found,
including academic ones, which could regard Wikipedia as a mini-Internet of sorts,
for which the full contents (the SQL article dump), the change history, and the
access logs are all known. None of this is available for the real Internet, which may make
Wikipedia a valuable playground for the evaluation of PageRank-like relevance metrics
and such.

Finally, I believe that downloading compressed logs should not place an undue burden on
Wikipedia's servers.

Thank you in advance for considering this suggestion and keep up the good work.


Version: unspecified
Severity: enhancement
OS: Linux
Platform: PC

Details

Reference
bz3029

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 21 2014, 8:46 PM
bzimport set Reference to bz3029.
bzimport added a subscriber: Unknown Object (MLST).

philip wrote:

*** This bug has been marked as a duplicate of 3028 ***

[mass-moving wikistats reports from Wikimedia→Statistics to Analytics→Wikistats to have stats issues under one Bugzilla product (see bug 42088) - sorry for the bugspam!]