
Fix '.*http.*' not being tagged as spiders in webrequest [5 pts] {hawk}
Closed, Resolved, Public

Description

Vital Signs reports a large spike in page views from 9 Nov to 19 Nov (and it is still ongoing).
https://vital-signs.wmflabs.org/#projects=wikidatawiki/metrics=Pageviews

To me this spike appears to be consistent with the spike in usage of the query service at query.wikidata.org.
The spike can be seen on this dashboard: http://discovery.wmflabs.org/wdqs/#wdqs_usage

query.wikidata.org should be excluded from the pageview definition, i.e. not counted as a Wikidata page view.
It would also be great if the data could then be regenerated for as many historical days as possible (if this was in fact the reason for the spike).

Event Timeline

Addshore raised the priority of this task to Needs Triage.
Addshore updated the task description.
Addshore subscribed.
JAllemandou renamed this task from "Remove query.wikidata.org from pageview definition (for wikidata)" to "Investigate wikidata pageview spike on 2015-11-14" (Nov 19 2015, 11:28 AM).
JAllemandou set Security to None.
JAllemandou renamed this task from "Investigate wikidata pageview spike on 2015-11-14" to "Fix '.*http.*' not being tagged as spiders in webrequest" (Nov 19 2015, 12:18 PM).
JAllemandou claimed this task.
JAllemandou triaged this task as Unbreak Now! priority.
JAllemandou edited projects, added Analytics-Kanban; removed Analytics-Backlog.

I messed up a deploy about a month ago, which prevented the change merged in https://gerrit.wikimedia.org/r/#/c/244465/ from actually being applied.
I will:

  • bump refinery-core and refinery-hive (> 0.0.19) and update the refine Oozie job
  • deploy refinery with the new jars and the updated refine job
  • restart the refine process
  • document the change (Wikitech webrequest, Research pageview)
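The fix being redeployed is about tagging user agents that match '.*http.*' as spiders. A minimal sketch of that rule, assuming a per-request classifier that returns 'spider' or 'user' (the labels, the function name `agent_type` and the `ua_parser_says_spider` flag are hypothetical; the real refinery-core spider pattern is much longer and this only models the '.*http.*' case, matched literally and case-sensitively as written in the task title):

```python
import re

# Hypothetical, simplified rule: only the '.*http.*' case from this task.
# The actual refinery-core pattern covers many more bot signatures.
HTTP_SPIDER_PATTERN = re.compile(r".*http.*")


def agent_type(user_agent: str, ua_parser_says_spider: bool = False) -> str:
    """Classify one request as 'spider' or 'user' (assumed labels).

    A request is a spider if an upstream UA parser already flagged it,
    or if the whole user-agent string matches '.*http.*'.
    """
    if ua_parser_says_spider or HTTP_SPIDER_PATTERN.fullmatch(user_agent):
        return "spider"
    return "user"


if __name__ == "__main__":
    samples = [
        "SomeTool/1.0 (+http://example.org/bot)",  # contains 'http'
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/42.0",
    ]
    for ua in samples:
        print(agent_type(ua), "<-", ua)
```

Without the deployed jar, requests like the first sample would fall through to 'user' and be counted as page views, which is consistent with the spike described above.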

Change 254133 had a related patch set uploaded (by Joal):
Upgrade refine oozie job to jar v0.0.20

https://gerrit.wikimedia.org/r/254133

Change 254133 merged by Joal:
Upgrade refine oozie job to jar v0.0.20

https://gerrit.wikimedia.org/r/254133

JAllemandou renamed this task from "Fix '.*http.*' not being tagged as spiders in webrequest" to "Fix '.*http.*' not being tagged as spiders in webrequest [5 pts] {hawk}" (Nov 19 2015, 12:44 PM).
JAllemandou moved this task from In Progress to Ready to Deploy on the Analytics-Kanban board.

Judging by the drop on https://vital-signs.wmflabs.org/#projects=wikidatawiki/metrics=Pageviews, it looks like this has worked :)

It would be great to fix the legacy data here! :)

@Addshore: Not feasible, since the original user_agent is not present in pageview_hourly.
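Why the legacy data cannot be fixed can be illustrated with a toy aggregation. This is a minimal sketch under a hypothetical, simplified schema (raw rows of `(project, user_agent, agent_type)` rolled up to `(project, agent_type) -> view_count`); it is not the real pageview_hourly definition, only a model of the point made above that the raw user-agent string is not kept:

```python
from collections import Counter

# Hypothetical raw webrequest rows: (project, user_agent, agent_type).
# agent_type reflects the buggy refine, where '.*http.*' was not applied,
# so every row below was labelled 'user'.
raw_rows = [
    ("wikidata", "SomeTool/1.0 (+http://example.org)", "user"),
    ("wikidata", "SomeTool/1.0 (+http://example.org)", "user"),
    ("wikidata", "Mozilla/5.0 Firefox/42.0", "user"),
]


def aggregate_hourly(rows):
    """Roll rows up to (project, agent_type) -> view_count.

    Simplified stand-in for an hourly pageview aggregate: the raw
    user_agent is dropped and only the already-computed agent_type kept.
    """
    return Counter((project, agent) for project, _ua, agent in rows)


hourly = aggregate_hourly(raw_rows)
print(hourly)  # Counter({('wikidata', 'user'): 3})

# Nothing in `hourly` says which of the 3 views came from an 'http'
# user agent, so they cannot be re-tagged as spiders after the fact.
```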

ahh okay! :/

Is it possible to add a note about the spike on the graph displayed on Vital Signs?

I see these "A" markers on the graph already but I have no idea what they are about.

@Addshore: The "A" markers are notes (a card appears if you hover your mouse over one), and there is already a note at the deploy where the drop occurs.
Do we need to add another? If you think so, notes are created on the wiki: https://meta.wikimedia.org/wiki/Dashiki:PageviewsAnnotations.

Ahh, I never left my mouse over the tag for long enough.
Is it not possible to add notes on a per-site basis?

It might be enough to slightly alter it to specify that there was a significant drop on Wikidata this time! The drop on Commons looks insignificant in comparison.

Wikidata dropped from a spike of 1.87 million to around 300k!
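For scale, a quick worked calculation using the approximate figures quoted above (1.87 million at the spike, "around 300k" after the fix; since the second figure is rounded, the result is approximate too):

```python
# Approximate Wikidata page-view figures quoted in this thread.
spike_views = 1_870_000
after_fix_views = 300_000  # "around 300k", so results are approximate

drop_fraction = (spike_views - after_fix_views) / spike_views
ratio = spike_views / after_fix_views

print(f"drop ~ {drop_fraction:.0%}")  # drop ~ 84%
print(f"spike was ~ {ratio:.1f}x the post-fix level")  # spike was ~ 6.2x the post-fix level
```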

Notes apply to the whole Dashiki page, but I think you can modify the existing ones if you wish :)