
The titles that are in non-normalized Unicode form aren't reported as non-normalized by API
Closed, Duplicate · Public

Description

This request contains two non-normalized titles: _AMISSING, which has a leading underscore, and the Greek letter iota with an accent (Ϊ́), written in a non-normalized Unicode form:

https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=_AMISSING|Ϊ́&rvprop=timestamp|user|comment|content&format=json&continue=&utf8

The result lists only _AMISSING as non-normalized. The Greek iota title does not appear in the "normalized" section, yet its byte representation in the "pages" section has been normalized:

{
 "batchcomplete": "",
 "query": {
  "normalized": [
   {
    "from": "_AMISSING",
    "to": "AMISSING"
   }
  ],
  "pages": {
   "-1": {
    "ns": 0,
    "title": "AMISSING",
    "missing": ""
   },
   "-2": {
    "ns": 0,
    "title": "Ϊ́",
    "missing": ""
   }
  }
 }
}

I guess the parser misses titles whose only non-normalized aspect is their Unicode form.
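A client can check whether a title is non-normalized only in its Unicode form by comparing it with its NFC normalization. This is a minimal Python sketch, assuming the server normalizes titles to Unicode NFC. The decomposed spelling used for the Greek title (capital iota followed by combining U+0308 and U+0301) is an assumption, because the exact bytes sent in the request above are not given.

```python
import unicodedata


def nfc_differs(title: str) -> bool:
    """Return True if NFC normalization changes the title's code points."""
    return unicodedata.normalize("NFC", title) != title


# Assumed decomposed input: GREEK CAPITAL LETTER IOTA + COMBINING DIAERESIS
# + COMBINING ACUTE ACCENT. NFC composes the first two into U+03AA.
decomposed = "\u0399\u0308\u0301"
composed = unicodedata.normalize("NFC", decomposed)

print(nfc_differs(decomposed))            # True
print(decomposed.encode("utf-8").hex())   # ce99cc88cc81
print(composed.encode("utf-8").hex())     # ceaacc81
print(nfc_differs("_AMISSING"))           # False: its problem is the underscore
```

The two strings render identically, which is why the difference only shows up at the byte level.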

Automated clients that parse the response will get confused, because they receive a result for a title they did not submit in the request.
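While such titles are missing from "normalized", a client can reconcile results itself. This is a minimal sketch, again assuming NFC normalization on the server. `map_titles_to_pages` is a hypothetical helper, not part of any MediaWiki client library. It assumes the response layout shown above, where "pages" is an object keyed by page ID, and the response literal uses assumed code points for the Greek title.

```python
import unicodedata


def map_titles_to_pages(requested, query):
    """Map each requested title to its entry in query["pages"], or None.

    Tries the title as submitted, then its NFC form. Each candidate is first
    passed through the server's reported "normalized" renames.
    """
    renames = {n["from"]: n["to"] for n in query.get("normalized", [])}
    by_title = {page["title"]: page for page in query.get("pages", {}).values()}
    result = {}
    for title in requested:
        page = None
        for candidate in (title, unicodedata.normalize("NFC", title)):
            target = renames.get(candidate, candidate)
            if target in by_title:
                page = by_title[target]
                break
        result[title] = page
    return result


# Response shaped like the one in this task (Greek code points assumed).
query = {
    "normalized": [{"from": "_AMISSING", "to": "AMISSING"}],
    "pages": {
        "-1": {"ns": 0, "title": "AMISSING", "missing": ""},
        "-2": {"ns": 0, "title": "\u03aa\u0301", "missing": ""},
    },
}
requested = ["_AMISSING", "\u0399\u0308\u0301"]
mapping = map_titles_to_pages(requested, query)
print(mapping["_AMISSING"]["title"])  # AMISSING
```

The NFC fallback only runs when the submitted title has no direct match, so titles the server does report in "normalized" are resolved exactly as before.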

Event Timeline

Yurivict raised the priority of this task from to Needs Triage.
Yurivict updated the task description.
Yurivict added a subscriber: Yurivict.
Yurivict renamed this task from "API doesn't report as non-normalized titles that are in non-normalized Unicode form" to "API doesn't report as non-normalized the titles that are in non-normalized Unicode form". Aug 4 2015, 2:41 AM
Yurivict set Security to None.
Yurivict renamed this task from "API doesn't report as non-normalized the titles that are in non-normalized Unicode form" to "The titles that are in non-normalized Unicode form aren't reported as non-normalized by API". Aug 4 2015, 4:50 AM