I just wrote a nice little graph that can be easily pasted anywhere. It works fine on all the language Wikipedias, but it fails on mediawiki.org because it uses {{SERVERNAME}} by default, which resolves to www.mediawiki.org, whereas the Pageviews API expects the bare mediawiki.org string instead. Please allow the full server name for all wikis. Thanks!
Description
Details
Subject | Repo | Branch | Lines +/- |
---|---|---|---|
Strip out www. in front of project names | analytics/aqs | master | +19 -7 |
Event Timeline
@Milimetric - I think this is not exactly about the "www" prefix, but rather about always allowing canonical domains in addition to whatever shorter versions we have. When used from wiki markup, I can easily get the canonical domain, but it is hard to do string manipulations :)
@Yurik, the reason for changing the scope of the ticket is thus:
- This is how we clean the very dirty hostname we get into a "project" name [1]
- We use that project to load data into Cassandra, and this is not something we can change: there are terabytes of data loaded already
- So for better or worse, we have just "en.wikipedia" or "mediawiki" in Cassandra
- We currently remove ".org" if it's passed in, so that the value maps to the aforementioned convention
Therefore, the only sensible thing we can do is also remove "www." if the domain is prefixed with it. I guess we should do that generally, not just if mediawiki or wikidata are passed in. Hope that makes sense.
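To make the proposal concrete, here is a rough sketch of the normalization described above, written in Python for illustration only. It is not the code in the actual patch: the function name `normalize_project` is hypothetical, and the only rules it applies are the two mentioned in this thread (drop a leading "www." and a trailing ".org").

```python
import re


def normalize_project(project: str) -> str:
    """Hypothetical helper: map a host-like string to the stored project name.

    Assumes only the two rules discussed in this thread; the real AQS
    logic may differ.
    """
    # Drop a leading "www." so "www.mediawiki.org" matches "mediawiki.org".
    project = re.sub(r"^www\.", "", project)
    # Drop a trailing ".org" to match the names already stored in Cassandra.
    project = re.sub(r"\.org$", "", project)
    return project
```

With this, both the canonical domain and the short form resolve to the same key: `normalize_project("www.mediawiki.org")` and `normalize_project("mediawiki")` both give "mediawiki", while "en.wikipedia.org" still gives "en.wikipedia".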
We are currently looking into rewriting /api/rest_v1/ in RESTBase itself (related to T127370), and stripping the www prefix from the host / domain could be something we can do in the same vein. @Pchelolo has started to create a request filter / middleware concept along the lines of T127132, which should let us do such API-specific mangling without hacks in HyperSwitch.
mmm, but this would be in the {project} parameter passed to AQS, not the domain. This could apply to the AQS config that we'll put on each wiki later, but we still have to handle the problem in the AQS backend.
Change 275681 had a related patch set uploaded (by Milimetric):
Strip out www. in front of project names