Fix '.*http.*' not being tagged as spiders in webrequest [5 pts] {hawk}
Closed, Resolved · Public

Description

Vital Signs reports a large spike in pageviews from 9th Nov to 19th Nov (and it is still continuing):
https://vital-signs.wmflabs.org/#projects=wikidatawiki/metrics=Pageviews

To me, this spike appears consistent with the spike in usage of the query service at query.wikidata.org.
The spike can be seen on this dashboard: http://discovery.wmflabs.org/wdqs/#wdqs_usage

query.wikidata.org should be excluded from the pageview definition / not counted as a wikidata view.
It would also be great if the data could then be regenerated for all historic days possible (if this was in fact the reason for the spike).

Event Timeline

Addshore updated the task description.
Addshore raised the priority of this task from to Needs Triage.
Addshore added a subscriber: Addshore.
Restricted Application added subscribers: StudiesWorld, Aklapper. · Nov 19 2015, 10:58 AM
JAllemandou renamed this task from Remove query.wikidata.org from pageview definition (for wikidata) to Investigate wikidata pageview sipke on 2015-11-14. · Nov 19 2015, 11:28 AM
JAllemandou set Security to None.
Addshore moved this task from incoming to monitoring on the Wikidata board. · Nov 19 2015, 11:36 AM
JAllemandou renamed this task from Investigate wikidata pageview sipke on 2015-11-14 to Fix '.*http.*' not being tagged as spiders in webrequest. · Nov 19 2015, 12:18 PM
JAllemandou triaged this task as Unbreak Now! priority.
JAllemandou claimed this task.
JAllemandou edited projects, added Analytics-Kanban; removed Analytics-Backlog.

I messed up a deploy about a month ago, which prevented the change merged here (https://gerrit.wikimedia.org/r/#/c/244465/) from actually being applied.
I will:

  • bump refinery-core and refinery-hive (> 0.0.19) and update the refine oozie job
  • deploy refinery with the new jars and the updated refine job
  • restart the refine process
  • document (wikitech webrequest, research pageview)
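For readers unfamiliar with the rule in the task title, here is a minimal sketch of tagging user agents that match '.*http.*' as spiders. It is written in Python for illustration only and is not the refinery implementation; the function name `tag_agent_type`, the "user" label, and the sample user agents are hypothetical.

```python
import re

# Pattern taken from the task title. Matching is case-sensitive here;
# whether the real job ignores case is not stated in this task.
HTTP_UA_PATTERN = re.compile(r".*http.*")


def tag_agent_type(user_agent):
    """Return 'spider' if the user agent matches the pattern, else 'user'.

    The 'user' label and this function name are illustrative assumptions.
    """
    if user_agent and HTTP_UA_PATTERN.fullmatch(user_agent):
        return "spider"
    return "user"


# Hypothetical sample user agents, not taken from real webrequest data.
examples = [
    "ExampleCrawler/1.0 (+http://example.org/bot)",
    "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/42.0",
]
for ua in examples:
    print(tag_agent_type(ua), "<-", ua)
```

The idea is that automated clients often embed a contact URL in their user agent, so the presence of "http" is a cheap signal that a request did not come from a regular browser.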

Change 254133 had a related patch set uploaded (by Joal):
Upgrade refine oozie job to jar v0.0.20

https://gerrit.wikimedia.org/r/254133

Change 254133 merged by Joal:
Upgrade refine oozie job to jar v0.0.20

https://gerrit.wikimedia.org/r/254133

JAllemandou renamed this task from Fix '.*http.*' not being tagged as spiders in webrequest to Fix '.*http.*' not being tagged as spiders in webrequest [5 pts] {hawk}. · Nov 19 2015, 12:44 PM
JAllemandou moved this task from In Progress to Ready to Deploy on the Analytics-Kanban board.

Per the drop on https://vital-signs.wmflabs.org/#projects=wikidatawiki/metrics=Pageviews it looks like this has worked :)

It would be great to fix the legacy data here! :)

@Addshore: Not feasible since original user_agent is not present in pageview_hourly.

ahh okay! :/

Is it possible to add a note to the spike on the graph displayed on vital-signs?

I see these "A" markers on the graph already but I have no idea what they are about.

@Addshore: The "A" markers are notes (a card appears if you place your mouse over one), and there is already a note at the deploy, where the drop occurs.
Is it necessary to add another? If you think so, notes are created on the wiki: https://meta.wikimedia.org/wiki/Dashiki:PageviewsAnnotations.

Ahh, I never left my mouse over the tag for long enough.
Is it not possible to do notes on a per-site basis?

It might be enough to slightly alter it to specify that there was a significant drop on wikidata this time! The drop on commons looks insignificant in comparison.

Wikidata dropped from a spike of 1.87 million to around 300k!

Notes are attached to the Dashiki page as a whole, but I think you can modify the existing ones if you wish :)

Nuria closed this task as Resolved. · Nov 27 2015, 8:49 PM