
Techblog: Change URL permalink style to better measure pageviews
Closed, Declined · Public

Description

Just a suggestion.

We now have pageview hits in Matomo for our new and shiny techblog.

The way URLs are shaped, there are three levels of directories before you get to the actual content. Example:

https://techblog.wikimedia.org/2020/03/24/computational-knowledge-wikidata-wikidata-query-service-and-women-who-are-mayors/

A url structure like the following would be better suited for absolute ranking of pageviews per blogpost (dates removed):

https://techblog.wikimedia.org/computational-knowledge-wikidata-wikidata-query-service-and-women-who-are-mayors/

To be clear, with the date structure included it is still possible to compare blog posts, just not as easily or obviously, because pageviews will also be counted for the intermediate levels 2020, 2020/03, 2020/03/26 and so forth, and those are probably not that helpful.

Event Timeline

Nuria created this task. Mar 26 2020, 7:19 PM
Restricted Application added a subscriber: Aklapper. Mar 26 2020, 7:19 PM
bd808 added a subscriber: bd808. Mar 27 2020, 3:06 PM

The /%YEAR/%MONTH/%DAY/%SLUG permalink structure matches the historical techblog & blog layout. Right now we do not have legacy articles loaded into the blog, but T243407: Import blog posts from old https://blog.wikimedia.org/c/technology/ into archive of the new Tech blog hopes to fix that soon. If we change the permalink structure we won't be able to keep links to old content working.

Aklapper renamed this task from Techblog. Change url shape to better measure pageviews to Techblog: Change URL permalink style to better measure pageviews. Mar 27 2020, 4:58 PM

On the technical side, Settings 🡒 Permalinks is currently set to Day and name. This request asks to change it to Post name.
The question is if we want this, as per last comment...

Nuria added a comment. Mar 27 2020, 5:18 PM

If we change the permalink structure we won't be able to keep links to old content working.

can't we mod_rewrite it? as the change is an easy path rewrite

bd808 added a subscriber: Krinkle. Mar 27 2020, 7:56 PM

If we change the permalink structure we won't be able to keep links to old content working.

can't we mod_rewrite it? as the change is an easy path rewrite

I need to poke around a bit more in the WPVIP documentation to be sure of the level of control we have over the HTTP server layer. I know there is some facility for customization and link rewriting, but I'm not sure if it is only for simple /x -> /y static mappings or if regex, etc can be used.
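For reference, the path rewrite being discussed is small enough to sketch. This is only an illustration in Python of the mapping a redirect rule would need to perform, not a WPVIP or mod_rewrite configuration; the pattern and the function name `strip_date` are hypothetical.

```python
import re

# Hypothetical pattern for the current /%YEAR/%MONTH/%DAY/%SLUG permalinks.
DATED_PERMALINK = re.compile(r"^/(\d{4})/(\d{2})/(\d{2})/([^/]+)/?$")


def strip_date(path: str) -> str:
    """Map /YYYY/MM/DD/slug/ to /slug/; return any other path unchanged."""
    match = DATED_PERMALINK.match(path)
    if match is None:
        return path
    return f"/{match.group(4)}/"
```

A plain static /x -> /y mapping could not express this, because the date segments differ per post; it needs either a regex-capable rule or one generated entry per post.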

One potential downside of only using %SLUG for the canonical URL is that those slugs will have to be globally and temporally unique. So if say @Krinkle moves to publishing the monthly "Production Excellence" posts to techblog they will all need unique names (which is apparently something that was actually done retroactively to the Phame posts very recently).
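A rough way to check for that uniqueness problem before switching would be to group the existing dated permalinks by slug. The sketch below assumes a plain list of paths as input; the function name and the example paths are made up for illustration.

```python
from collections import defaultdict


def slug_collisions(paths):
    """Group dated permalinks by their final segment; return slugs used more than once."""
    by_slug = defaultdict(list)
    for path in paths:
        # The slug is the last non-empty path segment.
        slug = path.rstrip("/").rsplit("/", 1)[-1]
        by_slug[slug].append(path)
    return {slug: found for slug, found in by_slug.items() if len(found) > 1}


# Hypothetical example: two monthly posts sharing one slug.
example = [
    "/2020/03/01/production-excellence/",
    "/2020/04/01/production-excellence/",
    "/2020/03/24/some-other-post/",
]
print(slug_collisions(example))
```

Any slug reported here would need renaming before a %SLUG-only permalink structure could work.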

I don't have access to our Matomo instance, but I've used it for my own sites for a long time. As I understand it, Matomo uses the canonical URL to identify what a page or post is. Any slashes etc. within it are not significant. However, there are additional aggregations that in fact become much simpler if there are dates present in the URL. For example, one can easily see with Matomo's default page overview how the traffic breaks down between people reading new posts vs. older posts.

Matomo is quite powerful and I would expect such a popular package to have first-class support for permalinks with dates.

Can you elaborate on what the problem is, e.g. a screenshot or description of what we want and why it is hard or impossible currently? Are there any other ways that it could be solved?

Nuria added a comment. Mar 29 2020, 6:19 PM

Please do not spend much time on this, because it is not a big deal at all. My point was a minor one.

One potential downside of only using %SLUG for the canonical URL is that those slugs will have to be globally and temporally unique. So if say @Krinkle moves to publishing the monthly "Production Excellence" posts to techblog they will all need unique names

Indeed, ya, that is a downside.

Can you elaborate on what the problem is, e.g. a screenshot or description of what we want and why it is hard or impossible currently? Are there any other ways that it could be solved?

There is no issue with counting; rather, the default reports are run per directory, so reporting happens for directories like https://techblog.wikimedia.org/2020, https://techblog.wikimedia.org/2020/02 and https://techblog.wikimedia.org/2020/01/01, which are counted as "entry pages". They are aggregated separately on the dashboard even though they do not add much value. That being said, again, not a big deal.

I don't have access to our Matomo instance

You can take a look at https://piwik.wikimedia.org with your LDAP credentials; the user under which you can see reports is listed on stats1007 in /home/nuria/piwik

Thanks, I understand it better now. I haven't tried this myself yet, but the action_url_category_delimiter option might be of help here. A number of upstream issues describe this when other people had similar use cases that did not work the way they wanted to (by default).

The category delimiter for urls defaults to /, and for page titles it used to default to > or ::. As of a year ago or so, the title categorisation is now disabled by default, but url categorisation is still enabled by default.

If this is set to no value (empty string), it treats all pages as equal without grouping like it does now. Perhaps that's an option for the blog domain?
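To make the difference concrete, here is a simplified Python model of the two reporting modes. It is not Matomo's implementation, only an approximation of the grouping described above; `report_rows` and its arguments are hypothetical names.

```python
from collections import Counter


def report_rows(paths, delimiter="/"):
    """Count hits per report row; with a delimiter, every path prefix is also a row."""
    rows = Counter()
    for path in paths:
        trimmed = path.strip("/")
        if not delimiter:
            # Empty delimiter: each page is one flat row, with no grouping.
            rows[trimmed] += 1
            continue
        parts = trimmed.split(delimiter)
        for depth in range(1, len(parts) + 1):
            rows[delimiter.join(parts[:depth])] += 1
    return rows


hits = ["/2020/03/24/post-a/", "/2020/03/26/post-b/"]
print(report_rows(hits))                # includes rows for 2020, 2020/03, ...
print(report_rows(hits, delimiter=""))  # only the two posts
```

With the default delimiter the two hits produce six rows, four of which are date directories; with the empty delimiter only the two posts remain.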

Nuria added a comment. Mar 30 2020, 3:45 AM

it treats all pages as equal without grouping like it does now

Right, it is set for all sites at once.

fdans moved this task from Incoming to Radar on the Analytics board. Mar 30 2020, 4:12 PM

it treats all pages as equal without grouping like it does now

Right, it is set for all sites at once.

The software defaults (config/global.ini.php) and instance defaults (config/common.config.ini.php) are loaded for all monitored sites.

But, the optional local settings file is per-site, e.g. config/techblog.wikimedia.org.config.ini.php. If this doesn't exist, Matomo loads config/config.ini.php as local settings file instead. I noticed this is absent from the docs, which I've pinged upstream about (issue).
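The lookup order described above can be sketched as follows. This is a Python illustration of the description in this comment, not Matomo's own code; `config_files` and the `existing` set of file paths are assumptions made for the example.

```python
def config_files(host, existing):
    """Return the config files in load order, following the description above."""
    files = ["config/global.ini.php", "config/common.config.ini.php"]
    per_site = f"config/{host}.config.ini.php"
    # The per-site file wins if present; otherwise fall back to config.ini.php.
    files.append(per_site if per_site in existing else "config/config.ini.php")
    return files


print(config_files(
    "techblog.wikimedia.org",
    {"config/techblog.wikimedia.org.config.ini.php"},
))
```

Under this reading, a per-site file for the blog domain could set action_url_category_delimiter without affecting the other monitored sites.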

Nuria added a comment. Mar 31 2020, 4:06 PM

Hmm, let me see if this is accessible through the UI.

Nuria added a comment. Mar 31 2020, 5:14 PM

I do not see any way to access config settings that are in the global config through the UI, and per the issue you filed it looks like this will not work, so let's scrap my suggestion?

Krinkle closed this task as Declined. Apr 21 2020, 10:10 PM