
return/returnto query not always hashed as intended
Closed, Declined (Public)

Description

tl;dr: the hashed returnto query parameter cannot reliably be used to correlate with other events.

To reproduce:

In the EventLogging data, you'll see events like this:

  • "query": "title=Special%3ACreateAccount&returnto=102b47c1303a182d384587bac8f350ea9812808da1262092106d185a5687d2b47b1d25a97ea1111f4abf077282dc9ca5ea6056789bf37d8ddb55b8ec3d7379df&returntoquery=action%3Dedit"
  • "query": "title=Special%3AWelcomeSurvey&returnto=102b47c1303a182d384587bac8f350ea9812808da1262092106d185a5687d2b47b1d25a97ea1111f4abf077282dc9ca5ea6056789bf37d8ddb55b8ec3d7379df&returntoquery=action%3Dedit&group=exp1_group1",

These first two events are generated when you create the account and are then redirected to the user survey. When you end up back on https://en.wikipedia.beta.wmflabs.org/w/index.php?title=Portal_talk:Featured_content&action=edit&redlink=1, the event looks like this:

"event": {
   "user_id": 23,
   "page_title": "Editing Talk:d518ef5aea7fd7fb161aa1bcf5809d8c7fc57b2fb4f87fc13891120f2746f9b2adf52e1c2aaa6c38a89e4d5c2926a4efe9033c9ab2b9c46cc2ea19e5506d6f32",
   "title": "d518ef5aea7fd7fb161aa1bcf5809d8c7fc57b2fb4f87fc13891120f2746f9b2adf52e1c2aaa6c38a89e4d5c2926a4efe9033c9ab2b9c46cc2ea19e5506d6f32",
   "permission_errors": "",
   "namespace": 1,
   "request_method": "GET",
   "is_mobile": false,
   "path": "/mediawiki/index.php",
   "action": "edit",
   "http_response_code": 200,
   "query": "title=Talk:f6de99c18b28cb6ad79732f03c767cead43cf97d0fb335dd142b495d733036899034f03f45969fb29bae10dee6e95d6e82d3d1021fa453c67c48928a2d0d7832&action=edit",
   "page_id": "3842ebe5ceb3e08d523250779443c8dd1d122fd754dab43373b7202abbceed8d65ed59b4ea7ecad5c0163c85f231aa6e4f399232c540cec26da9654c58efdf4f"
 },

As you can see, 102b47c1303a182d384587bac8f350ea9812808da1262092106d185a5687d2b47b1d25a97ea1111f4abf077282dc9ca5ea6056789bf37d8ddb55b8ec3d7379df is not found in the GET for the page you're editing, although it should be.

There are two issues. The first is that clicking "Create account" on https://en.wikipedia.beta.wmflabs.org/w/index.php?title=Portal_talk:Featured_content&action=edit&redlink=1 takes you to https://en.wikipedia.beta.wmflabs.org/w/index.php?title=Special:CreateAccount&campaign=anoneditwarning&returnto=Portal_talk%3AFeatured_content, so : is percent-encoded as %3A, and that raw value is what gets hashed rather than the value containing :. We could work around that, but there's still a second problem: even if we hashed returnto=Portal_talk:Featured_content, in the "View" event we are hashing "Featured_content" and not "Portal_talk:Featured_content", so it's not possible to correlate the hashed returnto parameter with subsequent events.
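A minimal sketch of why the two hashes cannot match. The actual hash function and any salt used by the pipeline are not stated in this task, so plain unsalted SHA-512 is assumed here only because it also yields 128 hex characters; hash_value and strip_namespace are hypothetical helpers, not part of the real code.

```python
import hashlib
from urllib.parse import unquote


def hash_value(value: str) -> str:
    """Stand-in for the pipeline's hash (assumption: unsalted SHA-512)."""
    return hashlib.sha512(value.encode("utf-8")).hexdigest()


def strip_namespace(prefixed_title: str) -> str:
    """Drop a leading 'Namespace:' prefix, if present (hypothetical helper)."""
    return prefixed_title.split(":", 1)[-1]


raw_returnto = "Portal_talk%3AFeatured_content"  # value as it appears in the URL
decoded_returnto = unquote(raw_returnto)         # "Portal_talk:Featured_content"

# Issue 1: hashing the percent-encoded value differs from hashing the decoded one.
assert hash_value(raw_returnto) != hash_value(decoded_returnto)

# Issue 2: the later event hashes the title without its namespace prefix,
# so even the decoded returnto value produces a different hash.
assert hash_value(decoded_returnto) != hash_value(strip_namespace(decoded_returnto))

# The hashes agree only if both sides hash the same normalized string.
assert hash_value(strip_namespace(decoded_returnto)) == hash_value("Featured_content")
```

Under these assumptions, correlation would require both events to decode and normalize the title identically before hashing.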

Maybe this isn't such a big deal, since we might just look at the stream of events rather than at what's in "returnto", but I wanted to note it for @nettrom_WMF and @MMiller_WMF to consider.

Event Timeline

I looked through the questions we're asking about new users, particularly around account creation, and this issue isn't a problem for us.
We're only concerned about the context in which the account was created (reading/editing, which is answered by the ServerSideAccountCreation schema) rather than more specifically what the users were reading prior to creating their account. It's still worthwhile to take note of this, so I've created T210434 to track the things I need to keep in mind as I'm working with the data.