Popularity metrics such as these are important, I think.
Should just be a thing somewhere that provides daily total 2xx response counts.
Status | Assigned | Task
---|---|---
Resolved | bd808 | T166406 Program 10 Outcome 3: Outreach
Resolved | Quiddity | T176677 Promote Toolforge Tools and their maintainers within Wikimedia communities
Open | None | T178834 Provide any rough metrics for tool and project usage
Open | None | T129630 Collect and display basic metrics for all tools (service groups)
Open | None | T87001 Provide basic page view metrics for individual tools on toollabs
Resolved | valhallasw | T121233 Implement metrics for tool labs (under NDA?)
Resolved | None | T227120 Toolforge toolviews API could return views for all dates
Resolved | bd808 | T227163 Add CORS support to Toolforge toolviews API
Resolved | bd808 | T237080 Toolviews data loading from Toolforge front proxy access log stopped on 2019-10-28
Our pageviews definition is based, in large part, on MediaWiki's
structure. Applying it to the tool labs structure would require an
entirely new setup - one with variable accuracy depending on things as
idiosyncratic as how individual users decide to structure their tools.
Metrics are important; metrics are important for tools. I'm not
convinced metrics for tools are important enough to justify that kind
of effort (and the ongoing effort required to /keep/ it useful). Is
there a reason not to simply use a raw request count?
What I specifically have in mind is:
I'm also not foisting this on analytics :D I would consider this a Tool-Labs feature and put time into building this out myself. I only want guidance from analytics as to how the prod system works so I can replicate it here.
The counting methodology will /not/ be consistent. It's based on
things like MediaWiki directory names and specific hosts ;p.
Ok, so 'as consistent as possible'? Which I suppose boils down to just deciding which UAs to bucket as 'humans' and which as 'bots' and nothing more.
I have been working on this a bit weekends/evenings and I think I have a viable basic process worked out. I'm dropping the original ideas of hourly data (moderately interesting, but makes the data set 24 times larger) and bot/non-bot classification (also moderately interesting, but a big pain to keep up with parsing and categorizing User-Agent data). I am also defining the metric as "any 200 to 299 status code returned by the tool" rather than "page" as categorizing content type is very tool specific. This metric will be useful for rough orders of magnitude comparisons in tool usage, which is much better than having no usage data at all which is our current state.
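A minimal sketch of the "2xx responses per tool per day" metric described above. The log line layout, the regular expression, and the idea that the tool name is the first URL path segment are illustrative assumptions, not the real front proxy log format or the code in the patches that follow.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical log line layout (an assumption for illustration only):
#   [31/May/2019:12:00:01 +0000] "GET /scholia/foo HTTP/1.1" 200
LINE_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?:[A-Z]+) /(?P<tool>[^/ ?"]+)[^"]*" (?P<status>\d{3})'
)


def count_daily_hits(lines):
    """Count 2xx responses, keyed by (tool, request day)."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # unparseable line, or a request with no tool in the path
        status = int(match.group("status"))
        if not 200 <= status <= 299:
            continue  # only 2xx responses count as a "view"
        day = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z").date()
        hits[(match.group("tool"), day)] += 1
    return hits
```

Each `(tool, day)` key corresponds to one row of daily data, which is why dropping the hourly breakdown keeps the data set 24 times smaller.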
Change 482237 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] toolforge: process dynamicproxy access logs
Change 482238 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[labs/private@master] toolforge: profile::toolforge::toolviews::mysql_password
Change 482238 merged by Andrew Bogott:
[labs/private@master] toolforge: profile::toolforge::toolviews::mysql_password
Change 482237 merged by Andrew Bogott:
[operations/puppet@production] toolforge: process dynamicproxy access logs
Change 486822 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] toolforge: fix script naming for run-parts
Change 486822 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] toolforge: fix script naming for run-parts
@MusikAnimal Hey! This is the project I was talking to you about at the Prague Hackathon. There is currently a web interface at https://tools.wmflabs.org/toolviews/api/v1/day/2019-05-31 that returns a JSON dump of each day's traffic stats. The web service is a really simple Flask app with no consumers yet, so I can tweak the response any way you'd like to make it easier for you to put a pretty UI on it. It would be pretty awesome if we could generate topviews and siteviews style visualizations of this raw data. Let me know your thoughts about how we might accomplish that without me learning a whole lot about modern JavaScript UIs. :)
As a tool provider it would (also) be nice to have the data transposed - so to speak: Eg. https://tools.wmflabs.org/toolviews/api/v1/tool/scholia where the returned data is across multiple days.
I am interested! I can't remember the term from the xkcd about getting easily persuaded into building an app for something, but that applies here :)
The Topviews-style visualization makes the most sense with the format of the API response. I think this would be fairly easy to build. A Siteviews-style app (where you can selectively enter in specific tools, or "all") would be awesome, but ideally we'd have endpoints similar to https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/user/Cat/daily/2019051200/2019060100 , where it gives us per-day data for a given tool and date range (which I think is what Fnielsen is talking about). So let's start off with just the Topviews variant and go from there.
Would it be possible for me to use the toolviews account? I see https://tools.wmflabs.org/toolviews/ says "coming soon"; and frankly, "Toolviews" is the most fitting name :)
Yes, but... you would either need to do your work in the existing Python3 Flask webservice that is running there to provide the API, or I would need to move the API to another tool account. Either is possible, so let me know if the language+framework constraint is something you can work with or not.
This is entirely possible, yes. I think we would want to do something similar to what @MusikAnimal mentioned in T87001#5229051 and actually give this per-tool endpoint some way to specify the desired date range instead of dumping out all data known for the specified tool. The backing database is only storing one row per tool per day, but even that will become an unwieldy result set over time.
Maybe something like /toolviews/api/v1/tool/<toolname>?start=<ISO 8601 date>&end=<ISO 8601 date> with both start and end defaulting to the prior day? We could mirror the pageviews URL format too, but these criteria really seem more like query string parameters than path components to me.
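The defaulting rule proposed above could be sketched roughly as follows. The function name `resolve_range` and the `today` parameter (added only so the behaviour can be checked deterministically) are hypothetical and not part of the toolviews API.

```python
from datetime import date, timedelta


def resolve_range(start=None, end=None, today=None):
    """Turn optional ISO 8601 date strings into a (start, end) date pair.

    Both values default to the prior day, as suggested in the comment above.
    """
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    start_day = date.fromisoformat(start) if start else yesterday
    end_day = date.fromisoformat(end) if end else yesterday
    if start_day > end_day:
        raise ValueError("start must not be after end")
    return start_day, end_day
```

In a Flask view, the two arguments would presumably come from the `start` and `end` query string parameters, with a `ValueError` mapped to a 400 response.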
You can also use your ToolsDB credentials ($HOME/replica.my.cnf) to access the s53734__toolviews_p database directly until an API is available (NOTE: any data from before 2019-02-01 should be treated as a guess at best):
$ sql toolsdb
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 142786401
Server version: 10.1.38-MariaDB MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
(u3518@tools.db.svc.eqiad.wmflabs) [(none)]> use s53734__toolviews_p;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
(u3518@tools.db.svc.eqiad.wmflabs) [s53734__toolviews_p]> show tables;
+-------------------------------+
| Tables_in_s53734__toolviews_p |
+-------------------------------+
| daily_raw_views               |
+-------------------------------+
1 row in set (0.00 sec)
(u3518@tools.db.svc.eqiad.wmflabs) [s53734__toolviews_p]> describe daily_raw_views;
+-------------+------------------+------+-----+---------+-------+
| Field       | Type             | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------+-------+
| tool        | varchar(128)     | NO   | PRI | NULL    |       |
| request_day | date             | NO   | PRI | NULL    |       |
| hits        | int(11) unsigned | NO   |     | 0       |       |
+-------------+------------------+------+-----+---------+-------+
3 rows in set (0.01 sec)
(u3518@tools.db.svc.eqiad.wmflabs) [s53734__toolviews_p]> select sum(hits) from daily_raw_views where tool = 'scholia';
+-----------+
| sum(hits) |
+-----------+
|   5026749 |
+-----------+
1 row in set (0.01 sec)
Unfortunately there is a PHP 7.2 dependency. This is only to use Krinkle's Intuition i18n framework. Everything else is just JS/CSS.
That said, https://tools.wmflabs.org, https://toolsadmin.wikimedia.org/, etc. don't appear to be localized (that's not a complaint), so maybe we don't need to localize Toolviews either? This way the frontend and the API could live on the same tool account, assuming it's trivial for the Python3 Flask webservice to serve static assets. Basically I could end up giving you three files: the HTML, JS and CSS.
I think we can live without i18n of the small number of UI strings for a bit. Maybe this would motivate me to figure out a Flask integration for Intuition. :)
@bd808 Could we extend the /toolviews/api/v1/day/YYYY-MM-DD endpoint to accept an end date too, as with /toolviews/api/v1/day/YYYY-MM-DD/YYYY-MM-DD? This way we can do a Topviews-style visualization but allow any arbitrary date range. This I think is different than T227120, which is about getting timeline data for a specific tool (Pageviews-style visualization).
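For illustration, a date-range variant of the day endpoint boils down to summing hits per tool between two dates. The sketch below assumes the `daily_raw_views` columns (`tool`, `request_day`, `hits`) shown earlier in the thread, but runs against an in-memory SQLite stand-in with made-up rows rather than ToolsDB; `top_tools` and `demo_connection` are hypothetical names.

```python
import sqlite3


def top_tools(conn, start, end, limit=10):
    """Return (tool, total hits) for an inclusive date range, busiest first."""
    cursor = conn.execute(
        "SELECT tool, SUM(hits) AS total FROM daily_raw_views "
        "WHERE request_day BETWEEN ? AND ? "
        "GROUP BY tool ORDER BY total DESC, tool ASC LIMIT ?",
        (start, end, limit),
    )
    return cursor.fetchall()


def demo_connection():
    """Build an in-memory stand-in for the table, filled with invented data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_raw_views ("
        "tool TEXT NOT NULL, request_day TEXT NOT NULL, "
        "hits INTEGER NOT NULL DEFAULT 0, PRIMARY KEY (tool, request_day))"
    )
    conn.executemany(
        "INSERT INTO daily_raw_views VALUES (?, ?, ?)",
        [
            # Made-up numbers, not real traffic figures.
            ("scholia", "2019-05-29", 100),
            ("scholia", "2019-05-30", 10),
            ("scholia", "2019-05-31", 20),
            ("toolviews", "2019-05-31", 5),
        ],
    )
    return conn
```

With the invented rows above, `top_tools(demo_connection(), "2019-05-30", "2019-05-31")` ranks scholia (30 hits) ahead of toolviews (5 hits); the 2019-05-29 row falls outside the range and is ignored.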
Also, is there any easy way to tell if a tool has a webservice, other than checking the response of tools.wmflabs.org/toolname? It'd be neat to link to the tools in the interface, similar to how https://tools.wmflabs.org/admin/tools only links if there is a webservice.
@MusikAnimal I think I got all the things you asked for in the API implemented. I fell into a gold plating rabbit hole too and ended up adding an OpenAPI spec and UI to the app: https://tools.wmflabs.org/toolviews/api/
Yay! This is awesome :) It will be much easier to put something together. I should have enough free time this week to make a prototype.
Unlicking this cookie. https://tools.wmflabs.org/toolviews/api/ is working, but we still need a UI to draw pretty graphs of the data. @MusikAnimal may or may not be able to help get someone started in the right direction on doing that. I would be happy to add co-maintainers to toolviews as needed to make deploying that UI possible.