EditorJourney title hashing issue
Closed, Resolved (Public)

Description

The following edit attempt on testwiki correctly hashes three of the places where the page title appears (page_id, page_title, and title) but fails to hash the title in the query string. I'm unsure of the reason. Note that the page is in the main namespace (namespace 0) but is a sub-page of a page whose title contains ":". Here is the event structure from the Data Lake for this event:

{"action":"edit","http_response_code":200,"is_mobile":false,"namespace":0,"page_id":"d8f76424b2894c33930aef9a1aa76d533656f0ae57cd57defbdb9cd134bfe0c4e2aef7593a4c97efe2deb0f712642848ad2365d39c2c676c7754aa64a879cbf8","page_title":"Editing 735fe8a89bf1780f9a30ffefda587f066e6fbc7748007452e7321b564a52c34c4c878d108f35c644915845e9134f31c5b6d380f58e72413794061ce87fc786ca","path":"/w/index.php","permission_errors":"","query":"title=Wikip%C3%A9dia:Administradores/Pedidos_de_aprova%C3%A7%C3%A3o/!SilentTest/8&action=edit","request_method":"GET","title":"3995795f0b25da2561b8c75c75ba2501f12f35ec2eb54d46b1bda629e3d2f8a3923b31cda8d5058831f4f90e9d7f22fe07d0203410425aac482c88a0a8739c31","user_id":41395}

Here's the query to grab the event out of the Data Lake in case more information (e.g. the event capsule) is helpful:

SELECT * FROM editorjourney
WHERE year = 2018 AND month = 11 AND day = 12 AND hour = 21
AND wiki = 'testwiki' AND event.user_id = 41395 AND dt = '2018-11-12T21:53:19Z';

Event Timeline

I'm guessing the str_replace() is failing on Wikip%C3%A9dia:Administradores/Pedidos_de_aprova%C3%A7%C3%A3 but can look at it more in depth tomorrow.
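
To illustrate the guess above, here is a minimal Python sketch of why a plain string replacement can miss a percent-encoded title, and how replacing the encoded form as well would catch it. This is not the extension's code: `hash_value`, its salt, the SHA-512 choice, and the set of characters left unescaped (`/`, `:`, `!`) are all assumptions made for the example.

```python
import hashlib
from urllib.parse import quote


def hash_value(value: str, salt: str = "hypothetical-salt") -> str:
    # Stand-in for the real hashing step; salt and algorithm are assumptions.
    return hashlib.sha512((salt + value).encode("utf-8")).hexdigest()


def redact_naive(query: str, title: str) -> str:
    # Only replaces the raw title, so a percent-encoded copy is left untouched.
    return query.replace(title, hash_value(title))


def redact_both(query: str, title: str) -> str:
    # Replace the percent-encoded form first, then the raw form.
    digest = hash_value(title)
    # Assumption: "/", ":" and "!" stay unescaped, as in the logged query.
    encoded = quote(title, safe="/:!")
    return query.replace(encoded, digest).replace(title, digest)


title = "Wikipédia:Administradores/Pedidos_de_aprovação/!SilentTest/8"
query = (
    "title=Wikip%C3%A9dia:Administradores/Pedidos_de_aprova"
    "%C3%A7%C3%A3o/!SilentTest/8&action=edit"
)

naive = redact_naive(query, title)    # unchanged: raw title never appears
redacted = redact_both(query, title)  # title replaced by its digest
```

With the query from the event above, the naive version returns the query unchanged because the raw title (with "é", "ç", "ã") never appears in it; only the encoded form does.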

Change 473562 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/extensions/WikimediaEvents@master] Ensure that urlencoded values are hashed

https://gerrit.wikimedia.org/r/473562

Change 473562 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Ensure that urlencoded values are hashed

https://gerrit.wikimedia.org/r/473562

Change 473812 had a related patch set uploaded (by Catrope; owner: Kosta Harlan):
[mediawiki/extensions/WikimediaEvents@wmf/1.33.0-wmf.4] Ensure that urlencoded values are hashed

https://gerrit.wikimedia.org/r/473812

Change 473812 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@wmf/1.33.0-wmf.4] Ensure that urlencoded values are hashed

https://gerrit.wikimedia.org/r/473812

"namespace":0 and "query":"title=Wikip%C3%A9dia:Administradores/Pedidos_de_aprova%C3%A7%C3%A3o/!SilentTest/8&action=edit" do look unusual.

Tested in kowiki betalabs: created a page with the same title as above (Wikipédia:Administradores/Pedidos de aprovação/!SilentTest/8); the page title was successfully hashed in event_query.

Note:
(1) Some events have event_query: token=redacted and some have an empty event_query; it is up to @Morten-Haan to decide whether such values make sense.
(2) There are a few examples in betalabs (from before the fix) where hashing was not happening, e.g.:
[log]> select * from EditorJourney_18504997 where id=1785;

              event_path: /w/index.php
             event_query: title=Saiba_Onde_Est%C3%A1_Errando_E_Tal_Como_Abater&action=edit
             event_title: ede5bf391469e983afed88d2adf0965987c1dbd5dedbab1c2adefa9c5c2637de8708415c3847c9fe369c38071c774ee60f2111acf910f44e99c87feb716784fe
         event_is_mobile: 0
           event_page_id: 0
        event_page_title:
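
Rows like the one above could be flagged automatically. The following is a rough sketch, assuming a hashed title always appears as 128 lowercase hex characters (as in the sample events here); `title_is_hashed` is a hypothetical helper, not part of the extension or the schema.

```python
import re
from urllib.parse import parse_qs

# Assumption: a hashed title is exactly 128 lowercase hex characters.
HEX_DIGEST = re.compile(r"[0-9a-f]{128}")


def title_is_hashed(event_query: str) -> bool:
    """Return True if the query has no title parameter, or every
    title value looks like a 128-character hex digest."""
    values = parse_qs(event_query, keep_blank_values=True).get("title", [])
    return all(HEX_DIGEST.fullmatch(v) is not None for v in values)


unhashed = "title=Saiba_Onde_Est%C3%A1_Errando_E_Tal_Como_Abater&action=edit"
flagged = not title_is_hashed(unhashed)  # this row would be flagged
```

Empty queries and queries such as token=redacted pass the check, since they carry no title parameter.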