
Provide basic page view metrics for individual tools on toollabs
Open, Lowest, Public

Description

Popularity metrics such as these are important, I think.

There should just be a place somewhere that provides daily total 2xx response counts.

Event Timeline

yuvipanda raised the priority of this task to Needs Triage.
yuvipanda updated the task description.
yuvipanda added a project: Toolforge.
Restricted Application added a subscriber: Aklapper. Jan 16 2015, 8:48 AM
Ironholds triaged this task as Lowest priority. Jan 16 2015, 4:29 PM

Oh, just whatever any UA classifier considers browser UAs rather than bot UAs.

Our pageviews definition is based, in large part, on MediaWiki's
structure. Applying it to the Tool Labs structure would require an
entirely new setup, one with variable accuracy depending on things as
idiosyncratic as how individual users decide to structure their tools.

Metrics are important; metrics are important for tools. I'm not
convinced metrics for tools are important enough to justify that kind
of effort (and the ongoing effort required to /keep/ it useful). Is
there a reason not to simply use a raw request count?

What I specifically have in mind is:

  1. Parse the nginx logs directly once an hour, and update the count publicly somewhere. This would be good enough for tools, since they don't get as much traffic as prod does.
  2. Use the same counting methodology that prod uses (not the same pipeline), so it is consistent.
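
A rough sketch of the parsing half of step 1. It assumes the proxy writes nginx's default "combined" log format and that each tool is served under tools.wmflabs.org/<toolname>/, so the first path segment names the tool. Both are assumptions, and the real dynamicproxy log format may differ. The names Hit and parse_line are hypothetical.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional

# Hypothetical: matches nginx's default "combined" log_format up to the status code.
LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"\S+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


class Hit(NamedTuple):
    tool: str
    day: date
    status: int


def parse_line(line: str) -> Optional[Hit]:
    """Turn one access log line into (tool, day, status), or None if unusable."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    # $time_local looks like "16/Jan/2015:08:48:00 +0000"; the day is taken in that offset.
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    # Assumption: the first path segment (query string removed) is the tool name.
    tool = m.group("path").split("?", 1)[0].lstrip("/").split("/", 1)[0]
    if not tool:
        return None
    return Hit(tool, ts.date(), int(m.group("status")))
```

Lines that do not match, such as requests for the bare root path, are skipped instead of being counted against a made-up tool name.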

I'm also not foisting this on analytics :D I would consider this a Tool-Labs feature and put time into building this out myself. I only want guidance from analytics on how the prod system works so I can replicate it here.

The counting methodology will /not/ be consistent. It's based on
things like MediaWiki directory names and specific hosts ;p.

Ok, so 'as consistent as possible'? Which I suppose boils down to just deciding which UAs to bucket as 'humans' and which as 'bots' and nothing more.

Or, say, MIME type filtering. But yes. So, you want ua-parser ;p

Cool. Can you tell me what exactly MIME filtering is used for?

Filtering out requests for JS/image/CSS assets.
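
A minimal sketch of that kind of filter. It uses the response Content-Type when one is available and otherwise falls back to the path's file extension. The prefix and extension lists, and the function name is_static_asset, are hypothetical and would need tuning against real tool traffic.

```python
import posixpath
from typing import Optional

# Hypothetical lists of what counts as a "JS/image/CSS asset".
ASSET_MIME_PREFIXES = ("image/", "text/css", "application/javascript", "text/javascript")
ASSET_EXTENSIONS = {".js", ".css", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico", ".woff", ".woff2"}


def is_static_asset(path: str, content_type: Optional[str] = None) -> bool:
    """Return True if a request looks like a JS/image/CSS asset rather than a page."""
    if content_type:
        # Drop parameters such as "; charset=utf-8" before comparing.
        mime = content_type.split(";", 1)[0].strip().lower()
        return mime.startswith(ASSET_MIME_PREFIXES)
    # No Content-Type recorded: fall back to the extension, ignoring any query string.
    ext = posixpath.splitext(path.split("?", 1)[0])[1].lower()
    return ext in ASSET_EXTENSIONS
```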

scfc moved this task from Triage to Backlog on the Toolforge board. Apr 6 2015, 10:36 AM
yuvipanda removed yuvipanda as the assignee of this task. Jun 7 2015, 4:53 PM
yuvipanda set Security to None.
Harej added a subscriber: Harej. Jan 24 2018, 11:30 PM
bd808 renamed this task from Provide page view metrics for individual tools on toollabs to Provide basic page view metrics for individual tools on toollabs. May 16 2018, 12:37 AM
bd808 removed a project: Cloud-Services.
bd808 updated the task description.
bd808 claimed this task. May 16 2018, 12:42 AM
bd808 added a subscriber: bd808.

I have been working on this a bit on weekends and evenings, and I think I have a viable basic process worked out. I'm dropping the original ideas of hourly data (moderately interesting, but it makes the data set 24 times larger) and bot/non-bot classification (also moderately interesting, but a big pain to keep up with parsing and categorizing User-Agent data). I am also defining the metric as "any 200 to 299 status code returned by the tool" rather than "page", because categorizing content type is very tool-specific. This metric will be useful for rough order-of-magnitude comparisons of tool usage, which is much better than our current state of having no usage data at all.
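
A minimal sketch of that metric. It assumes the access log has already been reduced to hypothetical (tool, day, status) tuples. The result holds one count per tool per day, the same granularity as the backing database's one row per tool per day. The function name count_2xx is hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple


def count_2xx(hits: Iterable[Tuple[str, date, int]]) -> Counter:
    """Count responses with a 200-299 status per (tool, day); other statuses are ignored."""
    counts: Counter = Counter()
    for tool, day, status in hits:
        if 200 <= status <= 299:
            counts[(tool, day)] += 1
    return counts
```

Redirects (3xx) and errors are left out on purpose, which matches the "any 200 to 299 status code" definition rather than any notion of a page.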

Change 482237 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] toolforge: process dynamicproxy access logs

https://gerrit.wikimedia.org/r/482237

Change 482238 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[labs/private@master] toolforge: profile::toolforge::toolviews::mysql_password

https://gerrit.wikimedia.org/r/482238

Change 482238 merged by Andrew Bogott:
[labs/private@master] toolforge: profile::toolforge::toolviews::mysql_password

https://gerrit.wikimedia.org/r/482238

Change 482237 merged by Andrew Bogott:
[operations/puppet@production] toolforge: process dynamicproxy access logs

https://gerrit.wikimedia.org/r/482237

Change 486822 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] toolforge: fix script naming for run-parts

https://gerrit.wikimedia.org/r/486822

Change 486822 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] toolforge: fix script naming for run-parts

https://gerrit.wikimedia.org/r/486822

@MusikAnimal Hey! This is the project I was talking to you about at the Prague Hackathon. There is currently a web interface at https://tools.wmflabs.org/toolviews/api/v1/day/2019-05-31 that returns a JSON dump of each day's traffic stats. The web service is a really simple Flask app with no consumers yet, so I can tweak the response any way you'd like to make it easier for you to put a pretty UI on it. It would be pretty awesome if we could generate Topviews- and Siteviews-style visualizations of this raw data. Let me know your thoughts about how we might accomplish that without me learning a whole lot about modern JavaScript UIs. :)

As a tool provider, it would (also) be nice to have the data transposed, so to speak, e.g. https://tools.wmflabs.org/toolviews/api/v1/tool/scholia, where the returned data spans multiple days.

I am interested! I can't remember the term from the xkcd about getting easily persuaded into building an app for something, but that applies here :)

The Topviews-style visualization makes the most sense with the format of the API response. I think this would be fairly easy to build. A Siteviews-style app (where you can selectively enter in specific tools, or "all") would be awesome, but ideally we'd have endpoints similar to https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/user/Cat/daily/2019051200/2019060100 , where it gives us per-day data for a given tool and date range (which I think is what Fnielsen is talking about). So let's start off with just the Topviews variant and go from there.
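
Roughly, the ranking behind a Topviews-style view comes down to this sketch. It assumes, hypothetically, that the day endpoint's JSON can be reduced to an object mapping tool names to hit counts. The actual response shape may differ, and top_tools is an illustrative name only.

```python
import json
from typing import Dict, List, Tuple


def top_tools(day_json: str, limit: int = 10) -> List[Tuple[str, int]]:
    """Rank tools by hits, highest first, breaking ties alphabetically."""
    # Assumed shape: {"toolname": hits, ...}; adapt to the real API response.
    hits: Dict[str, int] = json.loads(day_json)
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```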

Would it be possible for me to use the toolviews account? I see https://tools.wmflabs.org/toolviews/ says "coming soon", and frankly, "Toolviews" is the most fitting name :)

bd808 added a comment. Jun 3 2019, 4:22 AM

Yes, but... you would either need to do your work in the existing Python3 Flask webservice that is running there to provide the API, or I would need to move the API to another tool account. Either is possible, so let me know if the language+framework constraint is something you can work with or not.

bd808 added a comment. Jun 3 2019, 4:40 AM

As a tool provider it would (also) be nice to have the data transposed - so to speak: Eg. https://tools.wmflabs.org/toolviews/api/v1/tool/scholia where the returned data is across multiple days.

This is entirely possible, yes. I think we would want to do something similar to what @MusikAnimal mentioned in T87001#5229051 and actually give this per-tool endpoint some way to specify the desired date range instead of dumping out all data known for the specified tool. The backing database is only storing one row per tool per day, but even that will become an unwieldy result set over time.
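
Because only one row per tool per day is stored, a per-tool timeline for charting probably needs to fill in the days that have no row. This sketch does that. It assumes, without confirmation, that a missing row means zero 2xx responses for that day. The helper name daily_series is hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def daily_series(rows: Dict[date, int], start: date, end: date) -> List[Tuple[date, int]]:
    """Return one (day, hits) pair for every day from start to end inclusive.

    Assumption: a day with no stored row had no 2xx responses, so it is reported as 0.
    """
    if end < start:
        raise ValueError("end must not be before start")
    series = []
    for offset in range((end - start).days + 1):
        day = start + timedelta(days=offset)
        series.append((day, rows.get(day, 0)))
    return series
```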

Maybe something like /toolviews/api/v1/tool/<toolname>?start=<ISO 8601 date>&end=<ISO 8601 date> with both start and end defaulting to the prior day? We could mirror the pageviews URL format too, but these criteria really seem more like query string parameters than path components to me.
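
A sketch of how those query string parameters could be read, with both start and end defaulting to the prior day as proposed. It takes any mapping, so Flask's request.args could be passed in directly. The helper name parse_range and the choice to reject an inverted range are assumptions.

```python
from datetime import date, timedelta
from typing import Mapping, Optional, Tuple


def parse_range(args: Mapping[str, str], today: Optional[date] = None) -> Tuple[date, date]:
    """Read ISO 8601 ?start= and ?end= values, defaulting each to the prior day."""
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    # date.fromisoformat (Python 3.7+) raises ValueError on malformed input.
    start = date.fromisoformat(args["start"]) if "start" in args else yesterday
    end = date.fromisoformat(args["end"]) if "end" in args else yesterday
    if end < start:
        raise ValueError("end must not be before start")
    return start, end
```

In a Flask view, a ValueError from this helper could be turned into a 400 response.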

You can also use your ToolsDB credentials ($HOME/replica.my.cnf) to access the s53734__toolviews_p database directly until an API is available (NOTE: any data from before 2019-02-01 should be treated as a guess at best):

$ sql toolsdb
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 142786401
Server version: 10.1.38-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

(u3518@tools.db.svc.eqiad.wmflabs) [(none)]> use s53734__toolviews_p;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
(u3518@tools.db.svc.eqiad.wmflabs) [s53734__toolviews_p]> show tables;
+-------------------------------+
| Tables_in_s53734__toolviews_p |
+-------------------------------+
| daily_raw_views               |
+-------------------------------+
1 row in set (0.00 sec)

(u3518@tools.db.svc.eqiad.wmflabs) [s53734__toolviews_p]> describe daily_raw_views;
+-------------+------------------+------+-----+---------+-------+
| Field       | Type             | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------+-------+
| tool        | varchar(128)     | NO   | PRI | NULL    |       |
| request_day | date             | NO   | PRI | NULL    |       |
| hits        | int(11) unsigned | NO   |     | 0       |       |
+-------------+------------------+------+-----+---------+-------+
3 rows in set (0.01 sec)

(u3518@tools.db.svc.eqiad.wmflabs) [s53734__toolviews_p]> select sum(hits) from daily_raw_views where tool = 'scholia';
+-----------+
| sum(hits) |
+-----------+
|   5026749 |
+-----------+
1 row in set (0.01 sec)
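
For experimenting locally, here is a sketch that mirrors the daily_raw_views columns shown above in an in-memory SQLite table. It runs the date-range query a per-tool endpoint would likely need. The SQLite stand-in and the tool_series name are assumptions for illustration only. request_day is declared TEXT here so that ISO dates compare as strings, whereas the real table uses DATE.

```python
import sqlite3
from datetime import date
from typing import List

# SQLite stand-in for the daily_raw_views table described above.
SCHEMA = """
CREATE TABLE daily_raw_views (
    tool        VARCHAR(128) NOT NULL,
    request_day TEXT         NOT NULL,
    hits        INTEGER      NOT NULL DEFAULT 0,
    PRIMARY KEY (tool, request_day)
)
"""

# "?" is sqlite3's placeholder style; MySQL drivers such as PyMySQL use "%s" instead.
SERIES_SQL = """
SELECT request_day, hits
FROM daily_raw_views
WHERE tool = ? AND request_day BETWEEN ? AND ?
ORDER BY request_day
"""


def tool_series(conn, tool: str, start: date, end: date) -> List[tuple]:
    """Return (request_day, hits) rows for one tool within an inclusive date range."""
    cur = conn.cursor()
    cur.execute(SERIES_SQL, (tool, start.isoformat(), end.isoformat()))
    rows = cur.fetchall()
    cur.close()
    return rows
```

Against ToolsDB itself (untested here), PyMySQL's read_default_file argument can load credentials from $HOME/replica.my.cnf. The placeholders would need to change to %s, and request_day would come back as a datetime.date rather than a string.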

Would it be possible for me to use the toolviews account? I see https://tools.wmflabs.org/toolviews/ says "coming soon", and frankly, "Toolviews" is the most fitting name :)

Yes, but... you would either need to do your work in the existing Python3 Flask webservice that is running there to provide the API, or I would need to move the API to another tool account. Either is possible, so let me know if the language+framework constraint is something you can work with or not.

Unfortunately there is a PHP 7.2 dependency. This is only to use Krinkle's Intuition i18n framework. Everything else is just JS/CSS.

That said, https://tools.wmflabs.org, https://toolsadmin.wikimedia.org/, etc. don't appear to be localized (that's not a complaint), so maybe we don't need to localize Toolviews either? This way the frontend and the API could live on the same tool account, assuming it's trivial for the Python3 Flask webservice to serve static assets. Basically I could end up giving you three files: the HTML, JS and CSS.

bd808 added a comment. Jun 3 2019, 11:03 PM

I think we can live without i18n of the small number of UI strings for a bit. Maybe this would motivate me to figure out a Flask integration for Intuition. :)

@bd808 Could we extend the /toolviews/api/v1/day/YYYY-MM-DD endpoint to accept an end date too, as with /toolviews/api/v1/day/YYYY-MM-DD/YYYY-MM-DD? This way we can do a Topviews-style visualization but allow any arbitrary date range. I think this is different from T227120, which is about getting timeline data for a specific tool (Pageviews-style visualization).

Also, is there any easy way to tell if a tool has a webservice, other than checking the response of tools.wmflabs.org/toolname? It'd be neat to link to the tools in the interface, similar to how https://tools.wmflabs.org/admin/tools only links if there is a webservice.

bd808 added a comment. Mon, Jul 8, 12:03 AM

@bd808 Could we extend the /toolviews/api/v1/day/YYYY-MM-DD endpoint to accept an end date too, as with /toolviews/api/v1/day/YYYY-MM-DD/YYYY-MM-DD? This way we can do a Topviews-style visualization but allow any arbitrary date range. I think this is different from T227120, which is about getting timeline data for a specific tool (Pageviews-style visualization).

Also, is there any easy way to tell if a tool has a webservice, other than checking the response of tools.wmflabs.org/toolname? It'd be neat to link to the tools in the interface, similar to how https://tools.wmflabs.org/admin/tools only links if there is a webservice.

bd808 added a comment. Mon, Jul 8, 7:36 PM

@MusikAnimal I think I got all the things you asked for in the API implemented. I also fell into a gold-plating rabbit hole and ended up adding an OpenAPI spec and UI to the app: https://tools.wmflabs.org/toolviews/api/

Yay! This is awesome :) It will be much easier to put something together. I should have enough free time this week to make a prototype.