
Empty reply for pages with special characters in French Wikipedia REST API
Open, Low, Public

Description

Hello folks at Wikimedia :)

Given the following requests to regular pages, the replies are empty. The common denominator is that all of the titles contain special characters.

https://fr.wikipedia.org/api/rest_v1/page/summary/Championnat_du_monde_de_snooker_%C3%A0_six_billes_rouges_2017?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/Masters_d%27Europe_de_snooker_2018?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/Masters_d%27Europe_de_snooker_2016?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/Open_d%27Irlande_du_Nord_de_snooker_2018?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/Masters_d%27Europe_de_snooker_2017?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%9C%AA?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%9C%AC?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%E2%B1%A9?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%99%A8?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%99%AA?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%99%AC?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%9D%A8?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%9E%90?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%9C%A2?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%9C%A4?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%9D%A2?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%9C%B2?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%9D%9C?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%9D%AC?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%9C%B6?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%9C%BA?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%9C%BC?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%9D%A0?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%9D%B9?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%9D%AA?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%9D%9A?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%E2%B1%B5?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%9D%BB?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%9E%82?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%9E%86?redirect=true
https://fr.wikipedia.org/api/rest_v1/page/summary/%EA%9E%84?redirect=true

Keep up the great work, Enrico

Event Timeline

daniel triaged this task as High priority. Jan 6 2020, 5:48 PM
daniel raised the priority of this task from High to Needs Triage. Jan 6 2020, 6:53 PM
daniel triaged this task as High priority.
daniel moved this task from Inbox to Backlog on the Platform Team Workboards (Clinic Duty Team) board.

Some of the links in the task description do provide the content.

Others generate a self-redirect with no content, and those are the problem. I'll pick '%EA%9E%86' as an example.
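For anyone following along, here is a minimal sketch of how to see which character a percent-encoded title from the task description stands for. The helper name `describe_title` is made up for illustration and is not part of any Wikimedia tooling.

```python
from urllib.parse import unquote
import unicodedata


def describe_title(encoded: str) -> str:
    """Decode a percent-encoded title and describe its first character.

    Hypothetical helper for illustration only.
    """
    title = unquote(encoded)  # percent-decoding assumes UTF-8 by default
    first = title[0]
    return f"{title} U+{ord(first):04X} {unicodedata.name(first, 'UNKNOWN')}"


if __name__ == "__main__":
    # The example title picked in the comment above.
    print(describe_title("%EA%9E%86"))
```

The decoded code point is U+A786, and its lowercase counterpart is U+A787, which is relevant to the analysis that follows.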

Fetching the Parsoid HTML for these pages returns the HTML of a redirect page that redirects to itself. That is what is stored in Cassandra for the page. The ETag of the page carries an unexpected revision number, '124002832'.

Requesting a rerender of the page with 'no-cache' generates a proper render with the proper revision number, '93335907'. However, it is not stored in the latest bucket, because we rely on the revision number always increasing in Cassandra.
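To illustrate why the correct render never replaces the bad one, here is a toy model of a "latest" store that only accepts non-decreasing revision numbers. This is a sketch under that single assumption; `LatestBucket` and `Render` are hypothetical names and do not reflect the actual RESTBase or Cassandra schema.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Render:
    revision: int
    html: str


class LatestBucket:
    """Toy stand-in for a 'latest render' store keyed by title (hypothetical)."""

    def __init__(self) -> None:
        self._latest: Dict[str, Render] = {}

    def put(self, title: str, render: Render) -> bool:
        """Store the render unless an entry with a higher revision exists."""
        current = self._latest.get(title)
        if current is not None and render.revision < current.revision:
            return False  # rejected: revision went backwards
        self._latest[title] = render
        return True

    def get(self, title: str) -> Optional[Render]:
        return self._latest.get(title)


if __name__ == "__main__":
    bucket = LatestBucket()
    # The wrong render, attributed to the other page's higher revision.
    bucket.put("\uA786", Render(124002832, "<self-redirect>"))
    # The correct rerender has a lower revision, so it is not stored.
    accepted = bucket.put("\uA786", Render(93335907, "<proper render>"))
    print(accepted, bucket.get("\uA786").revision)
```

Under this model, once the entry with revision 124002832 is stored for the title, a later render with revision 93335907 can never overwrite it.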

So, the bug is clearly in RESTBase. The revision '124002832' belongs to the page https://fr.wikipedia.org/w/index.php?title=ꞇ&redirect=no, which is the lowercase form of the letter Ꞇ.

So the core reason for the problem is, once again, a difference in behavior between strtoupper in PHP and .toUpperCase in JS.

It seems the core reason is the overrides introduced in T219279: the Php72ToUpper.php file in mediawiki-config overrides the upper-casing of certain characters in PHP for backwards compatibility.
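A rough sketch of how such an override can make two title normalizers disagree. Python's `str.upper()` stands in here for the JS `.toUpperCase()` call, and the override table is a hypothetical one-entry example; it assumes, as the comments above suggest, that 'ꞇ' (U+A787) is among the characters whose upper-casing is overridden. The real contents of Php72ToUpper.php are not reproduced here.

```python
from typing import Dict

# Hypothetical override table: maps a character to the "upper-case" form that
# the legacy behavior expects. Here the character is left unchanged.
LEGACY_UPPER_OVERRIDES: Dict[str, str] = {"\uA787": "\uA787"}


def ucfirst_plain(title: str) -> str:
    """Capitalize the first character using the current Unicode case mapping."""
    return title[:1].upper() + title[1:]


def ucfirst_with_overrides(title: str,
                           overrides: Dict[str, str] = LEGACY_UPPER_OVERRIDES) -> str:
    """Capitalize the first character, honoring legacy overrides first."""
    if not title:
        return title
    first = title[0]
    return overrides.get(first, first.upper()) + title[1:]


if __name__ == "__main__":
    lower = "\uA787"  # title of the page that owns revision 124002832
    # Without overrides the lowercase title collapses onto the uppercase one.
    print(ucfirst_plain(lower) == "\uA786")
    # With overrides it stays a separate title.
    print(ucfirst_with_overrides(lower) == lower)
```

If one side normalizes titles like `ucfirst_plain` and the other like `ucfirst_with_overrides`, the two pages end up treated as one on one side and as two on the other, which would match the mix-up of revisions described above.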

Pchelolo lowered the priority of this task from High to Low. Apr 15 2020, 4:33 PM

This isn't good, but we can't do much about it until we have more info. If this pops up again, we should bump the prio again.

Aklapper added a subscriber: AMooney.

@AMooney: Assuming that "Set projects" was accidentally used instead of "Add projects", hence restoring some previous project tags.