
Massviews is creating URLs which cannot be used
Open, High · Public · Bug Report

Description

I found that Massviews creates URLs to queries that don't work. This is a big issue for me because it's the only tool I can use to provide metrics to the UN partner organisations I'm working with. E.g.:

https://pageviews.wmcloud.org/massviews/?platform=all-access&agent=user&source=search&range=latest-20&project=en.wikipedia.org&sort=views&direction=1&view=list&target=insource:/(fao.org%7Cpublisher=FAO)/

The link breaks even when it is simply posted here. Even if you copy and paste the link, you have to manually copy and paste insource:/(fao.org|publisher=FAO)/ into Massviews to get the correct result; otherwise it chops off the publisher it's looking for.

Is there any way Massviews could phrase the URLs differently to avoid them breaking? Would using HTML entity encoding help? (I have no idea, I'm completely out of my depth with this.)

Thanks very much

Event Timeline

@Aklapper sorry for not including a tag on this, I can't for the life of me find a suitable one.

John_Cummings renamed this task from Allow URLs from https://pageviews.wmcloud.org to be shortned with the Wikimedia URL shortner (the full URLs break when added as links on wiki pages) to Massviews is creating URLs which cannot be used. Feb 9 2024, 4:26 AM
John_Cummings updated the task description.

OK, I was kindly helped by @PrimeHunter

The workaround is to use a slightly different query that gives the same results and doesn't break the URL:

insource:fao insource:/(fao.org|publisher=FAO)/

https://pageviews.wmcloud.org/massviews/?platform=all-access&agent=user&source=search&range=latest-20&project=en.wikipedia.org&sort=views&direction=1&view=list&target=insource%3Afao+insource%3A%2F%28fao.org%7Cpublisher%3DFAO%29%2F

I'll make sure to include this in the documentation.

Steps to reproduce, using a simpler example with fewer search results:

  1. Go to https://pageviews.wmcloud.org/massviews
  2. Select "Search" under "Source"
  3. Enter foobar=baz in the next field and en.wikipedia.org in the one after it
  4. Click Submit

At this point the page correctly says foobar=baz and shows the correct results, currently 9 pages.
The url has automatically changed to
https://pageviews.wmcloud.org/massviews/?platform=all-access&agent=user&source=search&range=latest-20&project=en.wikipedia.org&sort=views&direction=1&view=list&target=foobar=baz

  5. Copy the url from the address bar and use it to try to redo the query

Now the page only says foobar and includes results without baz, currently 54 pages in total.

The problem is that foobar=baz in the url made by Massviews was not encoded as foobar%3Dbaz.
It gives the correct results if I manually make that encoding in the above url:
https://pageviews.wmcloud.org/massviews/?platform=all-access&agent=user&source=search&range=latest-20&project=en.wikipedia.org&sort=views&direction=1&view=list&target=foobar%3Dbaz

However, it only works once. The url has changed back to foobar=baz and gives wrong results if I use that url to try to repeat the query.
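
A minimal sketch of how this symptom could arise, assuming (hypothetically) that the query string is parsed by splitting each pair on every `=`. The `naiveParse` function is an illustrative stand-in and is not taken from the Massviews source; `encodeURIComponent` and `decodeURIComponent` are the standard JavaScript functions.

```javascript
// Hypothetical parser, NOT taken from the Massviews source: it splits each
// key/value pair on every "=" and keeps only the piece right after the key.
function naiveParse(query) {
  const params = {};
  for (const pair of query.split('&')) {
    const pieces = pair.split('=');
    params[pieces[0]] = decodeURIComponent(pieces[1] || '');
  }
  return params;
}

// The query as it appears in the address bar, with a raw "=" in the value.
const raw = 'view=list&target=foobar=baz';

// The same query with the value percent-encoded before it is appended.
const encoded = 'view=list&target=' + encodeURIComponent('foobar=baz');

console.log(naiveParse(raw).target);     // foobar
console.log(naiveParse(encoded).target); // foobar=baz
console.log(encoded);                    // view=list&target=foobar%3Dbaz
```

Under that assumption, the raw form loses everything after the second `=`, which would match the page showing only foobar, while the encoded form survives the round trip.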

I don't know whether there are other special characters which would need encoding in other examples.

It was tested in Firefox, Chrome and Edge on Windows 10 with the same result.

I'm having trouble setting up a dev environment on my local machine, but I'm fairly confident that the problem here is with https://github.com/MusikAnimal/pageviews/blob/1852ad6160e463113adca1e1a1333adbecd58384/javascripts/shared/pv.js#L1570. It's only running URL-encoding on a select subset of characters that doesn't include =, (, or /. I don't immediately see why the URI-encoding can't be unconditional (or better yet, use URLSearchParams rather than doing the encoding manually), but unless I can get a test environment up I can't make a PR.
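
For illustration, a rough sketch of what the `URLSearchParams` suggestion could look like. The parameter names are copied from the URLs earlier in this task; the variable names are made up for the example, and this is not the actual pv.js code.

```javascript
// Sketch only: build the query string with URLSearchParams instead of
// encoding selected characters by hand.
const target = 'insource:/(fao.org|publisher=FAO)/';

const params = new URLSearchParams({
  source: 'search',
  project: 'en.wikipedia.org',
  target: target,
});

// toString() percent-encodes "=", "(", ")", "/", ":" and "|" in the value.
const query = params.toString();
console.log(query);

// Parsing the string back yields the original search term unchanged.
const roundTripped = new URLSearchParams(query).get('target');
console.log(roundTripped === target); // true
```

One trade-off of this approach is that every reserved character is encoded, including ones such as `|` that a tool might prefer to leave readable in the address bar.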

MusikAnimal changed the subtype of this task from "Task" to "Bug Report". Feb 12 2024, 6:33 PM

I'm having trouble setting up a dev environment on my local machine, but I'm fairly confident that the problem here is with https://github.com/MusikAnimal/pageviews/blob/1852ad6160e463113adca1e1a1333adbecd58384/javascripts/shared/pv.js#L1570. It's only running URL-encoding on a select subset of characters that doesn't include =, (, or /.

Yup. You nailed it! I think the only one of concern here however is the =, at least as far as this bug goes.

I don't immediately see why the URI-encoding can't be unconditional (or better yet, use URLSearchParams rather than doing the encoding manually)

I'm going by memory, but I believe we didn't want to URL-encode some characters, like |, so that they keep looking "pretty". This may go against a spec or something, but is consistent with other WMF tools such as https://stats.wikimedia.org
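
A possible middle ground, sketched under assumptions: encode everything with `encodeURIComponent`, then restore only the characters that should stay readable. The helper name `encodePretty` and the choice to restore just `|` are hypothetical and do not come from the repo.

```javascript
// Hypothetical helper: percent-encode the whole value, then put back "|"
// so it stays readable. The "=" remains encoded as %3D.
function encodePretty(value) {
  return encodeURIComponent(value).replace(/%7C/g, '|');
}

const pretty = encodePretty('foo|bar=baz');
console.log(pretty);                     // foo|bar%3Dbaz
console.log(decodeURIComponent(pretty)); // foo|bar=baz
```

Whether a raw `|` is acceptable in every context where these URLs get pasted is a separate question that this sketch does not settle.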

URLSearchParams wasn't used because at the time, we still supported IE. There's plenty of cleanup to do in that regard across the repo.

… but unless I can get a test environment up I can't make a PR.

Sorry to hear that :( I don't have the Docker skills to make a container, but the setup is otherwise fairly simple with minimal dependencies. See CONTRIBUTING.md if you haven't already.


I can get this hot-fixed soon, but I'll note the repo in general is still being reworked and currently in a partially broken state (a PR for the above should cherry pick fine, though).