
Provide metrics for WMF quarterly report on January-March 2015
Closed, Resolved (Public)

Description

Provide the numbers below for the quarterly report (publication in final form due May 15, cf. T94636), in the same format as for last quarter's scorecard. Compare the analogous ticket from last quarter: T89024

I am creating separate blocking tasks for each metric as I go along, and will link them in the below list. Per earlier discussion with various helpful people from the Analytics team, much of this should now be available on a self-service basis, in which case I will assign the corresponding ticket to myself.

In particular, the following ones should be available directly from http://reportcard.wmflabs.org/ but are currently blocked on the outstanding updates there:

  • Total active editors in March 2015
  • New editors in March 2015
  • Wikipedia Edits for March 2015
  • New articles during January-March 2015 (i.e. article increase from Dec 31 to March 31; like last time - T89283 - I might use Daniel's corresponding numbers instead, as recorded on Meta)

I created a single ticket for these four first, assuming that they all depend on the same missing March dump update: T97379. If it turns out we need to provide substitutes in each case like last quarter (where the dump data was still missing at the time of the report), I will file separate tasks as last time.

Also not yet updated on http://reportcard.wmflabs.org/ :

  • Unique visitors: need the legacy comScore data for March 2015, and find out whether we can already include the new number, or mark it again as "to come". (resolved with Dario's help; will use the legacy data once more and obtained the March number; the report card still needs to be updated though)

Metrics from other sources:

  • New user signups in Jan-Mar 2015 (cf. last time; this one may need a note about possible SUL finalization anomalies. Via Quarry? Looking into this with @kevinator)
  • Total pageviews (new definition) for Jan-Mar 2015 (Tilman, using Pentaho)
  • Median read/write latency (from Ori): T97378
  • Uptime for enwiki main page during Jan-Mar 2015, from Nimsoft reports(?)
  • Fundraising metrics (from FR team)

Now would also be a good time to resolve the (mounting) discrepancies in historical TAE numbers: T87738

The scorecard will still be marked as "beta" in the quarterly report, since this selection of metrics may be revisited in coming quarters.

Event Timeline

Tbayer raised the priority of this task from to Needs Triage.
Tbayer updated the task description.
Tbayer subscribed.
Tbayer set Security to None.
Tbayer added a subscriber: kevinator.

Obtained all the data (except the save latency from T97378) and calculated quarterly averages and trends:

  • Signups from updated data
  • New editors from report card data
  • Total active editors from report card data
  • Page views from Cube v0.5 data on Pentaho as of May 12, 2015 (no spiders, no automata). Note: Per Oliver, the definition differs from the one used in the Q2 report scorecard, e.g. by including some previously excluded requests (such as Wikidata requests)
  • comScore uniques (monthly average in Q3)
  • quarterly uptimes from monthly Nimsoft reports ( = 1 - #errors / #checks):
    • for Q3 2014/15: 1 - (13 / (46 466 + 41 741 + 45 295)) = 0.99990...
    • for Q3 2013/14: 1 - (4 / (47 309 + 42 507 + 46 659)) = 0.99997...
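
The two uptime figures can be re-derived with a short sketch. It assumes that the three numbers summed in each denominator are the monthly check counts from the Nimsoft reports and that the numerator is the error count for the whole quarter; the function name `quarterly_uptime` is illustrative and not part of any existing tool.

```python
def quarterly_uptime(errors, monthly_checks):
    """Uptime for a quarter as 1 - (#errors / #checks).

    `errors` is the error count for the whole quarter (assumed),
    `monthly_checks` the number of checks in each of its three months.
    """
    total_checks = sum(monthly_checks)
    return 1 - errors / total_checks


# Figures as quoted in the comment above.
uptime_q3_2014_15 = quarterly_uptime(13, [46466, 41741, 45295])
uptime_q3_2013_14 = quarterly_uptime(4, [47309, 42507, 46659])

print(f"Q3 2014/15: {uptime_q3_2014_15:.5f}")
print(f"Q3 2013/14: {uptime_q3_2013_14:.5f}")
```

Pooling errors and checks across the quarter weights each month by its number of checks, which is what the formulas above do; averaging three monthly percentages would give a slightly different result.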

Note: None of the above was normalized by number of days per quarter, which needs to be taken into account for some of the Q2 vs. Q3 comparisons (92 vs. 90 days). As always, year-over-year comparisons are more meaningful anyway because of seasonality.
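
A minimal sketch of that normalization, assuming Q2 runs from 1 October to 31 December 2014 and Q3 from 1 January to 31 March 2015 (the quarter boundaries used for the article counts in this task). The helper names are hypothetical.

```python
from datetime import date


def days_in_period(start, end):
    """Number of days from `start` (inclusive) to `end` (exclusive)."""
    return (end - start).days


def per_day(total, start, end):
    """Normalize a quarterly total to a per-day rate."""
    return total / days_in_period(start, end)


# Assumed quarter boundaries; `end` is the first day of the next quarter.
q2_days = days_in_period(date(2014, 10, 1), date(2015, 1, 1))
q3_days = days_in_period(date(2015, 1, 1), date(2015, 4, 1))

print(q2_days, q3_days)
```

Comparing `per_day` values instead of raw quarterly totals removes the bias from the quarters having different lengths.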

For the number of new (Wikipedia) articles, I continued to use the approach from last quarter, relying on Emausbot (see also T97476):
35025605 on 00:00, 1 May 2015.
34127177 on 00:00, 1 January 2015 (UTC)
33442581 on 00:01, 1 October 2014 (UTC)
31176373 on 00:00, 1 April 2014 (UTC)
30508150 on 09:51, 1 January 2014 (UTC)

That's (35 025 605 - 34 127 177) / 90 ~ 9982 articles per day in Q3,
and about 34.2% more than in Q2: (((35 025 605 - 34 127 177) / 90) / ((34 127 177 - 33 442 581) / 92)) - 1
and a 34.4% year-over-year rise: (((35 025 605 - 34 127 177) / 90) / ((31 176 373 - 30 508 150) / 90)) - 1

I noticed that http://reportcard.wmflabs.org/graphs/articles currently has a significantly higher number than the above two sources (Emausbot and Daniel's Wikistats).

PS: the last calculation on the number of new articles might be affected by recent corrections in page counts (i.e. some of the new articles might rather be "newly counted" articles), but according to this discussion the impact of these corrections on the overall number of Wikipedia articles was small.

Obtained all numbers in time and published them on May 15 as part of the report (slide 3).
Thanks again, everyone (in particular @ezachte, @ArielGlenn, @kevinator, @Halfak, @Milimetric, @DarTar, @coren, @ori, @mark, Tony and Lisa)!

Tbayer claimed this task.

As @Jdforrester-WMF and I independently discovered recently, the article numbers calculated above are wrong because I, for inexplicable reasons, used May 1 0:00 instead of April 1 0:00 as the end of the quarter :(
Here is the corrected calculation; I've updated the scorecard in the published versions of the report:

34843225 on 00:00, 1 April 2015 (UTC)
34127177 on 00:00, 1 January 2015 (UTC)
33442581 on 00:01, 1 October 2014 (UTC)
31176373 on 00:00, 1 April 2014 (UTC)
30508150 on 09:51, 1 January 2014 (UTC)

That's (34 843 225 - 34 127 177) / 90 ~ 7956 articles per day in Q3,
and about 6.9% more than in Q2: (((34 843 225 - 34 127 177) / 90) / ((34 127 177 - 33 442 581) / 92)) - 1
and a 7.2% year-over-year rise: (((34 843 225 - 34 127 177) / 90) / ((31 176 373 - 30 508 150) / 90)) - 1.
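
The corrected calculation can be checked with a small sketch that derives the day counts from the snapshot dates instead of typing them by hand, which guards against picking the wrong end date. The article counts are the Emausbot figures listed above; the snapshots taken at 00:01 and 09:51 are treated as midnight, and the names `counts` and `daily_rate` are illustrative.

```python
from datetime import date

# Article counts per snapshot date, as listed above (times of day ignored).
counts = {
    date(2014, 1, 1): 30508150,
    date(2014, 4, 1): 31176373,
    date(2014, 10, 1): 33442581,
    date(2015, 1, 1): 34127177,
    date(2015, 4, 1): 34843225,
}


def daily_rate(start, end):
    """Average number of new articles per day between two snapshots."""
    return (counts[end] - counts[start]) / (end - start).days


q3 = daily_rate(date(2015, 1, 1), date(2015, 4, 1))       # Jan-Mar 2015
q2 = daily_rate(date(2014, 10, 1), date(2015, 1, 1))      # Oct-Dec 2014
q3_prev = daily_rate(date(2014, 1, 1), date(2014, 4, 1))  # Jan-Mar 2014

qoq_change = q3 / q2 - 1
yoy_change = q3 / q3_prev - 1

print(f"Q3 articles per day: {q3:.0f}")
print(f"Change vs. Q2: {qoq_change:.1%}")
print(f"Year-over-year change: {yoy_change:.1%}")
```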