Pageviews agent=bot is always 0
Closed, Resolved (Public)

Event Timeline

MusikAnimal subscribed.

It's just rare, I'm guessing. Pageviews Analysis is pulling this from the Pageviews-API, which returns zero: https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/bot/Michael_Jackson/daily/2015070100/2018061400
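For reference, the request above follows the per-article route of the Pageviews API. Below is a minimal Python sketch of how such a URL is assembled. The function and parameter names are hypothetical; only the URL layout is taken from the link above.

```python
from urllib.parse import quote

# Base of the per-article route, copied from the URL quoted above.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project, access, agent, article, granularity, start, end):
    """Build a per-article Pageviews API URL from its path segments."""
    # Titles use underscores instead of spaces; percent-encode everything else.
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([BASE, project, access, agent, title, granularity, start, end])


url = per_article_url(
    "en.wikipedia", "all-access", "bot", "Michael Jackson",
    "daily", "2015070100", "2018061400",
)
print(url)
```

Swapping the `agent` argument for `user` or `spider` gives the equivalent requests for the other agent types, which is a quick way to compare them against the zero result for `bot`.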

Maybe the bot agent just isn't used and could be removed? I can't say I've ever noticed any data, anywhere, for this agent. E.g. see all time results for enwiki's Main Page: https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=bot&range=all-time&pages=Main_Page

Ottomata triaged this task as Medium priority. Jun 21 2018, 4:36 PM
Ottomata moved this task from Incoming to Data Quality on the Analytics board.
Vvjjkkii renamed this task from Pageviews agent=bot is always 0 to ezaaaaaaaa. Jul 1 2018, 1:04 AM
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.

The database table underlying the API, pageview_hourly, only has two possible values for agent type according to the schema description:

agent_type              string                  Agent accessing the pages, can be spider or user

So it is strange that the API claims to offer "bot" as a third value, both in the documentation (source) and per the admissible API request values.

Indeed, that is a mistake that should be corrected in the docs.
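The mismatch described above can be checked mechanically. This is only a sketch: the two table values come from the schema description quoted above, while the API value list is an assumption based on the values the API accepted at the time, with `all-agents` treated as an aggregate rather than a stored value.

```python
# Values the pageview_hourly schema description lists for agent_type.
TABLE_AGENT_TYPES = {"spider", "user"}

# Agent values accepted by the API at the time (assumption, see lead-in).
API_AGENTS = {"all-agents", "user", "spider", "bot"}

# Assumption: "all-agents" is an aggregate over the stored values, not a stored value.
AGGREGATES = {"all-agents"}


def unbacked_agents(api_agents, table_values, aggregates):
    """Return API agent values that no table value can ever satisfy."""
    return sorted(api_agents - aggregates - table_values)


print(unbacked_agents(API_AGENTS, TABLE_AGENT_TYPES, AGGREGATES))  # ['bot']
```

Any value this reports can only ever return zero, which matches what the task describes for `bot`.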

I noticed https://github.com/MusikAnimal/pageviews/commit/34d62de36701246dbb1604ae23b2cbd30999b056 because of the many changes in the i18n files, and I have a little comment about it, and this is probably the best place for it.

I haven't gone over each line, but as far as I can see it's valid. However, there is probably an easier way to do it. Some things to consider for future patches of this kind:

  • Simple removals of messages ("bot": "Bot",) are only needed in en.json and qqq.json. Removing the lines from other languages is not harmful, but it isn't necessary, because they will be removed automatically by translatewiki sync scripts.
  • Changing translations in source code is not forbidden, but not recommended. It's better to let translators do it in translatewiki.
  • It's a good idea to avoid changing $1, $2, $3, etc. parameters, but sometimes it's necessary. If parameters have to change, there are several options:
    • Make the old parameters empty in the code and add new ones.
    • Create a new message. It's OK if the rest of the text is similar: the translators' effort is not wasted because they can use translation memory.

  • Simple removals of messages ("bot": "Bot",) are only needed in en.json and qqq.json. Removing the lines from other languages is not harmful, but it isn't necessary, because they will be removed automatically by translatewiki sync scripts.

I've got a Node script that removes them from all files, so as long as it's not hurting anything I'll continue using it.
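For illustration, a removal across every message file could look roughly like the Python sketch below. This is not the Node script mentioned above. The function name, the flat one-file-per-language directory layout, and the 4-space indentation are all assumptions.

```python
import json
from pathlib import Path


def remove_message(i18n_dir, key):
    """Delete `key` from every *.json file in i18n_dir.

    Returns the names of the files that were actually changed.
    """
    changed = []
    for path in sorted(Path(i18n_dir).glob("*.json")):
        messages = json.loads(path.read_text(encoding="utf-8"))
        if key in messages:
            del messages[key]
            # ensure_ascii=False keeps translated text readable in the file.
            path.write_text(
                json.dumps(messages, ensure_ascii=False, indent=4) + "\n",
                encoding="utf-8",
            )
            changed.append(path.name)
    return changed
```

Per the advice quoted above, running this against only en.json and qqq.json would also be enough, since the translatewiki sync scripts remove the key from the other languages.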

  • Changing translations in source code is not forbidden, but not recommended. It's better to let translators do it in translatewiki.
  • It's a good idea to avoid changing $1, $2, $3, etc. parameters, but sometimes it's necessary. If parameters have to change, there are several options:
    • Make the old parameters empty in the code and add new ones.
    • Create a new message. It's OK if the rest of the text is similar: the translators' effort is not wasted because they can use translation memory.

It's a bit complicated what happened here. Basically url-structure-agent was removed (from all files), and url-structure-agent-no-bots was renamed to url-structure-agent (across all files). So this is more of a key change, rather than changes to the message itself. I realize this wasn't truly necessary, I just felt url-structure-agent was the better key name.
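The key change described above amounts to dropping the message stored under one key and moving another message into its place. A minimal sketch on a single language's messages follows; the function name is hypothetical and the message texts are placeholders, not the real strings.

```python
def rename_over(messages, source_key, target_key):
    """Drop the message under target_key, then rename source_key to target_key.

    Returns a new dict and keeps the original key order where possible.
    """
    result = {}
    for key, text in messages.items():
        if key == target_key:
            continue  # the old message under the target key is removed
        result[target_key if key == source_key else key] = text
    return result


# Placeholder texts, for illustration only.
before = {
    "url-structure-agent": "old text",
    "url-structure-agent-no-bots": "new text",
    "other-key": "unrelated",
}
after = rename_over(before, "url-structure-agent-no-bots", "url-structure-agent")
print(after)  # {'url-structure-agent': 'new text', 'other-key': 'unrelated'}
```

In a language file that has no translation for the source key, this simply removes the old target message, which matches removing url-structure-agent from all files.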

Hopefully I didn't mess anything up!

MusikAnimal claimed this task.
MusikAnimal moved this task from In Development to Done on the Tool-Pageviews board.

I have removed "bot" as one of the valid agents.