
wikitech.wikimedia.org missing from pageviews API
Closed, Resolved, Public

Description

https://wikimedia.org/api/rest_v1/metrics/pageviews/top/wikitech.wikimedia.org/all-access/2016/12/18

{"type":"https://restbase.org/errors/not_found","title":"Not found.","method":"get","detail":"The date(s) you used are valid, but we either do not have data for those date(s), or the project you asked for is not loaded yet.  Please check https://wikimedia.org/api/rest_v1/?doc for more information.","uri":"/analytics.wikimedia.org/v1/pageviews/top/wikitech.wikimedia.org/all-access/2016/12/18"}
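As a rough illustration of what a consuming tool runs into here, the sketch below builds a "top" URL in the same layout as the request above and recognises the `not_found` error body. The helper names `top_url` and `is_not_found` are hypothetical; only the URL layout and the `type` field are taken from the request and response quoted above.

```python
import json

# Base path of the public pageviews API, as used in the request above.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews"


def top_url(project: str, access: str, year: int, month: int, day: int) -> str:
    """Build a 'top articles' URL in the same layout as the request above."""
    return f"{BASE}/top/{project}/{access}/{year:04d}/{month:02d}/{day:02d}"


def is_not_found(body: str) -> bool:
    """Return True if a response body is a RESTBase not_found error."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    return str(payload.get("type", "")).endswith("/errors/not_found")


print(top_url("wikitech.wikimedia.org", "all-access", 2016, 12, 18))
```

A tool could use a check like `is_not_found` to tell "project not loaded" apart from a transport failure, though the error text itself admits it cannot distinguish missing dates from a missing project.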

Since the list of valid domain names is not discoverable (e.g. via https://wikimedia.org/api/rest_v1/metrics/pageviews/top/), consuming tools appear to use the SiteMatrix API instead, which includes wikitech.wikimedia.org, and therefore expect it to work.
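A minimal sketch of the discovery step described above, assuming an `action=sitematrix` response shaped as a `sitematrix` object whose numbered entries hold a `site` list and whose `specials` entry is a list of sites, each with a `url`. That shape is an assumption here, and the `served` set is purely hypothetical; the point is only that any domain SiteMatrix lists but the pageview API does not serve ends up in the difference.

```python
from urllib.parse import urlparse


def sitematrix_domains(response: dict) -> set:
    """Collect hostnames from an action=sitematrix style response (assumed shape)."""
    domains = set()
    for value in response.get("sitematrix", {}).values():
        if isinstance(value, dict):
            sites = value.get("site", [])  # a language group
        elif isinstance(value, list):
            sites = value  # e.g. the "specials" list
        else:
            continue  # e.g. the "count" integer
        for site in sites:
            host = urlparse(site.get("url", "")).hostname
            if host:
                domains.add(host)
    return domains


# Tiny illustrative response; a real one lists hundreds of wikis.
sample = {"sitematrix": {
    "count": 2,
    "0": {"site": [{"url": "https://en.wikipedia.org"}]},
    "specials": [{"url": "https://wikitech.wikimedia.org"}],
}}
# Hypothetical set of projects the pageview API actually serves.
served = {"en.wikipedia.org"}
print(sorted(sitematrix_domains(sample) - served))  # ['wikitech.wikimedia.org']
```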

http://tools.wmflabs.org/pageviews/?project=wikitech.wikimedia.org&platform=all-access&agent=user&range=latest-20&pages=Release_Engineering/SAL

The tool uses the SiteMatrix API to discover wikitech and Wikitech's Query API (PrefixSearch) to provide the "Release_Engineering/SAL" title, but then fails to provide actual pageview information.
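For the final step, a tool like this would presumably query the per-article endpoint. The sketch below assumes the route layout `per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}` with `YYYYMMDD` dates, as I understand the public API documentation; the helper name and its defaults are hypothetical. One detail worth showing is that a subpage title such as "Release_Engineering/SAL" must have its `/` percent-encoded, or it would be read as an extra path segment.

```python
from urllib.parse import quote

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews"


def per_article_url(project, title, start, end,
                    access="all-access", agent="user", granularity="daily"):
    """Build a per-article pageviews URL (route layout assumed, see above)."""
    # safe="" makes quote() encode "/" as %2F; spaces become underscores first.
    article = quote(title.replace(" ", "_"), safe="")
    return (f"{BASE}/per-article/{project}/{access}/{agent}/"
            f"{article}/{granularity}/{start}/{end}")


print(per_article_url("wikitech.wikimedia.org", "Release_Engineering/SAL",
                      "20161201", "20161218"))
```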

Considering that https://wikitech.wikimedia.org/api/rest_v1/ also returns 404 Not Found, I suspect this may be intentional.

However, given that the underlying data source does include data for wikitech, I imagine enabling the pageview API for wikitech should be fairly trivial, even without a public RESTBase view on wikitech.wikimedia.org itself.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Oh no, my fault: I got confused by the point about https://wikitech.wikimedia.org/api/rest_v1/ not working. That part is what we'd address in T119094. As for wikitech pageviews, those should be in the API. They're missing because wikitech is not included in this whitelist:

https://github.com/wikimedia/analytics-refinery-source/blob/bf6221db8623e705782cd2ab4abdbd23976c2589/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/PageviewDefinition.java#L51

So it's an easy change to add it, and I vote we should add it. Sorry about the accidental close.
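To show why adding it is a small change, here is a sketch of a host filter built from alternatives. The pattern is a hypothetical, heavily simplified stand-in for the project regex in PageviewDefinition.java, not a copy of it; extending it is a matter of appending one escaped hostname.

```python
import re

# Hypothetical, heavily simplified stand-in for the project regex in
# PageviewDefinition.java; it is not a copy of the real expression.
BASE_ALTERNATIVES = [
    r"[a-z0-9-]+\.(?:m\.)?(?:wikipedia|wiktionary|wikibooks)\.org",
    r"(?:commons|meta)\.(?:m\.)?wikimedia\.org",
]


def host_filter(extra_hosts=()):
    """Compile the project pattern, optionally adding literal hostnames."""
    alternatives = BASE_ALTERNATIVES + [re.escape(h) for h in extra_hosts]
    return re.compile("|".join(alternatives))


before = host_filter()
after = host_filter(["wikitech.wikimedia.org"])

print(bool(before.fullmatch("wikitech.wikimedia.org")))  # False
print(bool(after.fullmatch("wikitech.wikimedia.org")))   # True
```

Using `fullmatch` rather than `search` matters here: an unanchored search would also accept hosts that merely contain a listed name.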

Are they even being collected, given that wikitech is not behind varnish?

@Krenair if wikitech is not behind varnish, pageviews cannot be collected. Correct. Seems that we can close the ticket?

No, we should leave it open and blocked on wikitech being set up properly. We could of course collect pageviews via some other route and push them into Kafka, but that seems extreme. Are there plans to put wikitech behind varnish?

> Are there plans on putting wikitech behind varnish?

Indirectly, yes via T161859: Make Wikitech an SUL wiki. At the conclusion of T161859 wikitech will be a "normal" wiki in that it will not have OpenStackManager (T161553) or Semantic MediaWiki (T53642) installed. When we reach that point it will make good sense to move the labswiki database off of silver and serve wikitech via the general MediaWiki server pool. There is currently no official goal or timeline to complete this work, but unofficially I hope that we can have it done by the end of the 2017 calendar year.

bd808 lowered the priority of this task from High to Medium. (May 8 2017, 3:22 PM)

Lowering priority from high to normal. Having pageview data on wikitech would be nice, but I don't see that it is urgent in any sense nor likely to be a priority for the Cloud Services or Analytics team in the near term.

> @Krenair if wikitech is not behind varnish, pageviews cannot be collected. Correct. Seems that we can close the ticket?

Be that as it may, we do actually have data in the webrequest table for Wikitech. Using a somewhat simplistic pageview definition, here are the 100 most viewed pages for September 2018 (without spiders) according to that data. Looks quite plausible.

| desktopurl | views |
| --- | --- |
| https://wikitech.wikimedia.org/wiki/Main_Page | 30299 |
| https://wikitech.wikimedia.org/wiki/Portal:Wikimedia_Labs | 14249 |
| https://wikitech.wikimedia.org/wiki/ | 6453 |
| https://wikitech.wikimedia.org/wiki/Help:Cloud_Services_Introduction | 3554 |
| https://wikitech.wikimedia.org/wiki/Help:Toolforge | 3539 |
| https://wikitech.wikimedia.org/wiki/Puppet_coding | 3248 |
| https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_hourly | 3017 |
| https://wikitech.wikimedia.org/wiki/Special:RecentChanges | 2975 |
| https://wikitech.wikimedia.org/wiki/Switch_Datacenter | 2476 |
| https://wikitech.wikimedia.org/wiki/Deployments | 2317 |
| https://wikitech.wikimedia.org/wiki/Logstash | 2149 |
| https://wikitech.wikimedia.org/wiki/Portal:Toolforge | 1761 |
| https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 1703 |
| https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL | 1367 |
| https://wikitech.wikimedia.org/wiki/APT_repository | 1299 |
| https://wikitech.wikimedia.org/wiki/Prometheus | 1271 |
| https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS | 1177 |
| https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Webrequest | 1058 |
| https://wikitech.wikimedia.org/wiki/Talk:Portal:Cloud_VPS | 1037 |
| https://wikitech.wikimedia.org/wiki/PartMan | 1030 |
| https://wikitech.wikimedia.org/wiki/Maps | 1004 |
| https://wikitech.wikimedia.org/wiki/EventStreams/Powered_By | 970 |
| https://wikitech.wikimedia.org/wiki/Portal:Wikitech | 937 |
| https://wikitech.wikimedia.org/wiki/Portal:Data_Services | 937 |
| https://wikitech.wikimedia.org/wiki/Help:Getting_Started | 922 |
| https://wikitech.wikimedia.org/wiki/Kafka/Administration | 907 |
| https://wikitech.wikimedia.org/wiki/Etcd | 903 |
| https://wikitech.wikimedia.org/wiki/Infrastructure_naming_conventions | 883 |
| https://wikitech.wikimedia.org/wiki/Portal:Tool_Labs | 872 |
| https://wikitech.wikimedia.org/wiki/MegaCli | 681 |
| https://wikitech.wikimedia.org/wiki/EventStreams | 671 |
| https://wikitech.wikimedia.org/wiki/Building_a_Shiny_Dashboard | 654 |
| https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Projectview_hourly | 638 |
| https://wikitech.wikimedia.org/wiki/Help:Contents | 634 |
| https://wikitech.wikimedia.org/wiki/Special:CreateAccount | 593 |
| https://wikitech.wikimedia.org/wiki/Special:Search | 581 |
| https://wikitech.wikimedia.org/wiki/Reprepro | 578 |
| https://wikitech.wikimedia.org/wiki/Network_cheat_sheet | 553 |
| https://wikitech.wikimedia.org/wiki/Special:Watchlist | 510 |
| https://wikitech.wikimedia.org/wiki/Cloud_VPS_2018_Purge | 501 |
| https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews | 481 |
| https://wikitech.wikimedia.org/wiki/Wikitech:Labs_Terms_of_use | 474 |
| https://wikitech.wikimedia.org/wiki/Clusters | 453 |
| https://wikitech.wikimedia.org/wiki/Fundraising | 452 |
| https://wikitech.wikimedia.org/wiki/Help:Create_a_Wikimedia_developer_account | 422 |
| https://wikitech.wikimedia.org/wiki/Wikitech%3aCloud_Services_Terms_of_use | 386 |
| https://wikitech.wikimedia.org/wiki/Labs_Server_Admin_Log | 386 |
| https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Privacy | 385 |
| https://wikitech.wikimedia.org/wiki/Special:Preferences | 382 |
| https://wikitech.wikimedia.org/wiki/Puppet_Hiera | 365 |
| https://wikitech.wikimedia.org/wiki/Operations_requests | 351 |
| https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database | 349 |
| https://wikitech.wikimedia.org/wiki/How_to_deploy_code | 342 |
| https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Getting_started | 301 |
| https://wikitech.wikimedia.org/wiki/Talk:Analytics/Systems/Wikistats | 299 |
| https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use | 295 |
| https://wikitech.wikimedia.org/wiki/User:Luxo | 283 |
| https://wikitech.wikimedia.org/wiki/Server_Lifecycle | 278 |
| https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep | 276 |
| https://wikitech.wikimedia.org/wiki/PartMan/Auto | 267 |
| https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help | 265 |
| https://wikitech.wikimedia.org/wiki/Special:SpecialPages | 265 |
| https://wikitech.wikimedia.org/wiki/index.php | 262 |
| https://wikitech.wikimedia.org/wiki/Special:UserLogin | 259 |
| https://wikitech.wikimedia.org/wiki/Special:CreateAccount%26returnto%3DMain_Page | 257 |
| https://wikitech.wikimedia.org/wiki/Special:MobileMenu | 247 |
| https://wikitech.wikimedia.org/wiki/Performance/WebPageTest | 247 |
| https://wikitech.wikimedia.org/wiki/LibreNMS | 242 |
| https://wikitech.wikimedia.org/wiki/Yubikey-SSH | 236 |
| https://wikitech.wikimedia.org/wiki/Special:PasswordReset | 231 |
| https://wikitech.wikimedia.org/wiki/Graphite/Scaling | 229 |
| https://wikitech.wikimedia.org/wiki/Confd | 219 |
| https://wikitech.wikimedia.org/wiki/News/Cloud_VPS_2018_Purge | 217 |
| https://wikitech.wikimedia.org/wiki/Special:RecentChangesLinked | 212 |
| https://wikitech.wikimedia.org/wiki/Swift/How_To | 211 |
| https://wikitech.wikimedia.org/wiki/Talk:Main_Page | 211 |
| https://wikitech.wikimedia.org/wiki/Server_admin_log/Archive_17 | 208 |
| https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Rules | 206 |
| https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web | 204 |
| https://wikitech.wikimedia.org/wiki/Search | 204 |
| https://wikitech.wikimedia.org/wiki/Graphite | 202 |
| https://wikitech.wikimedia.org/wiki/Production_shell_access | 199 |
| https://wikitech.wikimedia.org/wiki/Server_admin_log | 194 |
| https://wikitech.wikimedia.org/wiki/IPsec | 194 |
| https://wikitech.wikimedia.org/wiki/Help:Access | 193 |
| https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Documentation | 191 |
| https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageviews | 188 |
| https://wikitech.wikimedia.org/wiki/Puppet | 184 |
| https://wikitech.wikimedia.org/wiki/Managing_multiple_SSH_agents | 183 |
| https://wikitech.wikimedia.org/wiki/SWAT_deploys | 182 |
| https://wikitech.wikimedia.org/wiki/Network_design | 181 |
| https://wikitech.wikimedia.org/wiki/File:What_is_Cloud_Services%3F_poster.pdf | 176 |
| https://wikitech.wikimedia.org/wiki/Analytics | 175 |
| https://wikitech.wikimedia.org/wiki/Incident_documentation/20120607-LastModifiedExtension | 174 |
| https://wikitech.wikimedia.org/wiki/Grafana.wikimedia.org | 174 |
| https://wikitech.wikimedia.org/wiki/Help:MySQL_queries | 171 |
| https://wikitech.wikimedia.org/wiki/Live-1.5 | 170 |
| https://wikitech.wikimedia.org/wiki/User:Henna | 168 |
| https://wikitech.wikimedia.org/wiki/Remove_a_message_from_mailing_list_archive | 166 |
| https://wikitech.wikimedia.org/wiki/Puppet_CA_replacement | 161 |

Data via the following query:

SELECT CONCAT('https://wikitech.wikimedia.org',uri_path) AS desktopurl, COUNT(*) AS views
FROM wmf.webrequest 
WHERE year = 2018 AND month = 9
AND uri_host = 'wikitech.wikimedia.org' AND uri_path LIKE '/wiki/%' 
AND agent_type = 'user'
GROUP BY uri_path
ORDER BY views DESC LIMIT 100;
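To make the "somewhat simplistic" definition explicit, here is the same filtering and counting expressed in Python over in-memory rows. It assumes the rows have already been restricted to the `year = 2018 AND month = 9` partitions; the `Request` type is a hypothetical stand-in for a webrequest row carrying only the three fields the query uses. Because every `/wiki/` hit counts, Special: pages and odd paths like `/wiki/index.php` show up, as the table above shows.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Request(NamedTuple):
    """Hypothetical stand-in for the webrequest fields the query uses."""
    uri_host: str
    uri_path: str
    agent_type: str


def top_pages(rows: Iterable[Request], limit: int = 100):
    """User traffic to /wiki/ paths on wikitech, counted per path, most viewed first."""
    counts = Counter(
        r.uri_path for r in rows
        if r.uri_host == "wikitech.wikimedia.org"
        and r.uri_path.startswith("/wiki/")  # LIKE '/wiki/%'
        and r.agent_type == "user"           # excludes spiders
    )
    return [("https://wikitech.wikimedia.org" + path, n)
            for path, n in counts.most_common(limit)]
```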

>> @Krenair if wikitech is not behind varnish, pageviews cannot be collected. Correct. Seems that we can close the ticket?

> Be that as it may, we do actually have data in the webrequest table for Wikitech. Using a somewhat simplistic pageview definition, here are the 100 most viewed pages for September 2018 (without spiders) according to that data. Looks quite plausible.

I think at some point since I wrote that, the setup was changed and wikitech went behind varnish.

wikitech is not among the projects accounted for in the PageviewDefinition code (https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/PageviewDefinition.java#L77). Easy to change if needed.

Why is this not documented on the wiki creation page? And why is it in a source file instead of a config file?

> Why is this not documented on the wiki creation page?

I don't understand what the 'wiki creation page' is, but I think the current documentation correctly states that wikitech is not counted in pageviews (https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters).

> And why is it in a source file instead of a config file?

Technical debt: it was implemented this way originally and has never been changed since.

>> Why is this not documented on the wiki creation page?

> I don't understand what the 'wiki creation page' is

https://wikitech.wikimedia.org/wiki/Add_a_wiki

Right, I get it now :)
We discussed with the team, and our plan is to change how we detect/filter pageviews from a domain perspective. We'll update the doc as needed when we're done refactoring.
In the meantime, we're going to add wikitech.wikimedia.org to the regex, which will provide pageviews for the wikitech site.

Adding @Harej and @srodlund as subscribers as I think they will be interested in the outcome here.

Clarifying:

  • wikitech pageviews can now be computed: the wikitech wiki is now behind varnish, and the webrequest table gets all its data (indirectly) from varnish.

This is all third-party users need to do. For all purposes here, wikitech is a "new wiki", so it should be included on this list (a whitelist exists because we do not want to surface pageviews for wikis that are private).

If there is any need to modify the PageviewDefinition in any way, the analytics team will take care of that.

The two steps needed to add this to the pageview definition are:

  1. What nuria mentioned, adding it to the whitelist (https://wikitech.wikimedia.org/wiki/Add_a_wiki#Analytics)
  2. Adding wikitech to this regex that includes it in the pageview definition: https://phabricator.wikimedia.org/diffusion/ANRS/browse/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/PageviewDefinition.java$77 (I've added this to the link above for future clarity)
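The two steps above act as two separate gates, and a wiki only gets public pageviews once both accept it. The sketch below models that under stated assumptions: `PAGEVIEW_HOSTS` is a hypothetical, simplified stand-in for the regex in PageviewDefinition.java, and `WHITELIST` is a hypothetical stand-in for the refinery whitelist, keyed by normalised project names such as `en.wikipedia` (that key format is also an assumption).

```python
import re

# Step 2 (hypothetical stand-in): hosts the pageview-definition regex accepts.
PAGEVIEW_HOSTS = re.compile(
    r"[a-z0-9-]+\.(?:m\.)?wikipedia\.org|wikitech\.wikimedia\.org"
)

# Step 1 (hypothetical stand-in): projects allowed to be published.
WHITELIST = {"en.wikipedia", "wikitech.wikimedia"}


def is_public_pageview(host: str) -> bool:
    """A request is published only if both gates accept its host."""
    if not PAGEVIEW_HOSTS.fullmatch(host):
        return False
    # Normalise e.g. "en.m.wikipedia.org" -> "en.wikipedia" for the lookup.
    project = host.removesuffix(".org").replace(".m.", ".")
    return project in WHITELIST


for host in ["en.m.wikipedia.org", "wikitech.wikimedia.org", "xx.wikipedia.org"]:
    print(host, is_public_pageview(host))
```

Removing wikitech from either `PAGEVIEW_HOSTS` or `WHITELIST` makes it fall through, which mirrors why both patches below were needed. (`str.removesuffix` requires Python 3.9 or later.)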

Change 481223 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[analytics/refinery@master] Add wikitech to whitelist

https://gerrit.wikimedia.org/r/481223

Change 481224 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[analytics/refinery/source@master] Add wikitech to pageview definition

https://gerrit.wikimedia.org/r/481224

Change 481224 merged by jenkins-bot:
[analytics/refinery/source@master] Add wikitech to pageview definition

https://gerrit.wikimedia.org/r/481224

Change 481223 merged by Joal:
[analytics/refinery@master] Add wikitech to whitelist

https://gerrit.wikimedia.org/r/481223

Thanks for the merge, @JAllemandou! What are the next steps to complete this task?
IIUC the 2 steps that Milimetric described above have now been completed, but pageviews are not yet visible for wikitech (example1, example2). Perhaps it just needs to wait for the deployment train to do a circuit? (There's no rush, I just want to make sure it's moving forward :)

This is exactly it, @Quiddity: a deploy on our side should unlock it.