
Wikistats Bug: all but 2018 data missing?
Closed, Resolved · Public · 5 Story Points

Description

  • Go to Wikistats 2.0. Select English or French Wikipedia.
  • Select editors
  • Ask to see data for two years
  • Expected result: see two years of data
  • Actual results: looks like I'm seeing only results for 2018

https://stats.wikimedia.org/v2/#/en.wikipedia.org/contributing/editors

Event Timeline

Restricted Application added a subscriber: Aklapper. · Apr 23 2018, 7:41 PM
Nuria added a subscriber: Nuria. · Apr 23 2018, 7:59 PM

working on this.

Nuria added a comment. · Apr 23 2018, 8:34 PM

Disabling editing metrics.

Nuria assigned this task to JAllemandou.

Rerunning indexing for 2018-02 snapshot [0033301-180330093100664-oozie-oozi-W]

I re-ran the indexation for the 2018-02 snapshot, and the segment sizes in Druid are now what I would expect, about 2 GB per segment.

sudo -u hdfs oozie job --oozie $OOZIE_URL \
  -Drefinery_directory=hdfs://analytics-hadoop$(hdfs dfs -ls -d /wmf/refinery/2018* | tail -n 1 | awk '{print $NF}') \
  -Dqueue_name=production \
  -Doozie_launcher_queue_name=production \
  -Dstart_time=2018-02-01T00:00Z \
  -Dstop_time=2018-02-02T00:00Z \
  -config /srv/deployment/analytics/refinery/oozie/mediawiki/history/reduced/coordinator.properties \
  -run
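For readers unfamiliar with the command substitution used for -Drefinery_directory above, here is a minimal sketch of what the `tail -n 1 | awk '{print $NF}'` part does: it keeps the last line of the listing and prints its last whitespace-separated field, which is the path. The listing lines, the `2018-EXAMPLE-*` directory names, and the helper names `latest_path` and `sample_listing` are made up for illustration and stand in for the real output of `hdfs dfs -ls -d /wmf/refinery/2018*`, so this runs without a Hadoop client.

```shell
#!/bin/sh
# Sketch only: the listing below is hypothetical sample data, not real
# output from the cluster.

# Read a directory listing on stdin and print the last field (the path)
# of its last line.
latest_path() {
  tail -n 1 | awk '{print $NF}'
}

# Hypothetical stand-in for `hdfs dfs -ls -d /wmf/refinery/2018*`.
sample_listing() {
  cat <<'EOF'
drwxr-xr-x   - hdfs hadoop          0 2018-03-01 10:00 /wmf/refinery/2018-EXAMPLE-OLDER
drwxr-xr-x   - hdfs hadoop          0 2018-04-01 10:00 /wmf/refinery/2018-EXAMPLE-NEWER
EOF
}

# Same shape as the value passed to -Drefinery_directory in the command.
refinery_directory="hdfs://analytics-hadoop$(sample_listing | latest_path)"
echo "$refinery_directory"
```

Note that this picks the most recent directory only if the listing is ordered so that the newest entry comes last, which the pipeline itself does not enforce.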

Note that this indexation did not change the 2018-03 data; we need to take a second look at that snapshot and see if it is correct. cc @Milimetric

We also need to take a closer look at what happened here.

Nuria added a comment. · Apr 24 2018, 5:38 AM

Enabled editing metrics and deployed: https://gerrit.wikimedia.org/r/#/c/428559/

Nuria triaged this task as Unbreak Now! priority. · Apr 24 2018, 5:39 AM
Restricted Application added subscribers: Liuxinyu970226, TerraCodes. · Apr 24 2018, 5:39 AM
Nuria updated the task description.
Nuria edited projects, added Analytics-Kanban; removed cloud-services-team (Kanban).

@Nuria's actions fixed the problem for data up to 2018-02. I restarted a job ending in 2018-03, as the problem is not related to snapshots but to a wrong indexation while testing. Will follow up later today when I'm back from my day off.

Job finished, data is up to date. Thanks @Nuria and @Milimetric for spotting the problem and fixing it quickly!

Milimetric closed this task as Resolved. · Apr 24 2018, 12:52 PM

Indeed, confirmed all looks good. I'll put this in code review so we can remember to talk about what happened.

Milimetric reopened this task as Open. · Apr 24 2018, 12:52 PM

oops, closed by accident.

Nuria removed JAllemandou as the assignee of this task. · Apr 24 2018, 3:00 PM
Nuria set the point value for this task to 5.
Nuria moved this task from In Code Review to Done on the Analytics-Kanban board.
jmatazzoni added a comment. · Edited · Apr 24 2018, 3:36 PM

@JAllemandou I see the Editor data now. Thanks. But when I split by editor type, Anonymous users come in as zero. See screenshot. So it looks like anon editor data is still missing? Or am I doing something wrong—or is the fix just not on production yet?

Milimetric moved this task from Incoming to Wikistats Beta on the Analytics board. · Apr 24 2018, 4:51 PM
Nuria added a comment. · Apr 24 2018, 6:40 PM

Reindexing the 2018-02 data again; looking into the 2018-03 issue with anonymous editors.

oozie job -info 0034476-180330093100664-oozie-oozi-W

Reindexing done; we still need to delete the last month on the snapshot.

JAllemandou claimed this task.

Change 428922 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery/source@master] Correct mediawiki-history job bugs

https://gerrit.wikimedia.org/r/428922

Change 428922 merged by jenkins-bot:
[analytics/refinery/source@master] Correct mediawiki-history job bugs

https://gerrit.wikimedia.org/r/428922

Change 428977 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery/source@master] Add unittest for a Mediawiki-History function already fixed

https://gerrit.wikimedia.org/r/428977

Nuria added a comment. · Apr 25 2018, 7:10 PM

Data up to 2018-02 now comes from the 2018-02 snapshot; removed the last segment (2018-03).

Change 428977 merged by jenkins-bot:
[analytics/refinery/source@master] Add unittest for a Mediawiki-History function already fixed

https://gerrit.wikimedia.org/r/428977

Nuria closed this task as Resolved. · May 8 2018, 10:43 PM
Vvjjkkii renamed this task from Wikistats Bug: all but 2018 data missing? to meeaaaaaaa. · Jul 1 2018, 1:14 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii lowered the priority of this task from Unbreak Now! to High.
Vvjjkkii removed JAllemandou as the assignee of this task.
Vvjjkkii updated the task description.
Vvjjkkii removed the point value for this task.
Vvjjkkii removed subscribers: gerritbot, Aklapper.
AfroThundr3007730 renamed this task from meeaaaaaaa to Wikistats Bug: all but 2018 data missing?. · Jul 1 2018, 6:24 AM
AfroThundr3007730 closed this task as Resolved.
AfroThundr3007730 raised the priority of this task from High to Unbreak Now!.
AfroThundr3007730 assigned this task to JAllemandou.
AfroThundr3007730 updated the task description.
AfroThundr3007730 set the point value for this task to 5.
AfroThundr3007730 edited subscribers, added: GerritBot, Aklapper; removed: JAllemandou.