WDQS sitelinks are stored in a non-canonical form
Closed, Duplicate · Public

Description

MediaWiki's canonical URL form uses "_" in place of spaces. WDQS, however, appears to store sitelinks with spaces encoded as %20, so they do not match the canonical URLs and are harder to analyse.

select * where {
  ?sitelink schema:about wd:Q27 .
  ?sitelink schema:inLanguage "en" .
}
returns:   <https://en.wikipedia.org/wiki/Republic%20of%20Ireland>
expected:  <https://en.wikipedia.org/wiki/Republic_of_Ireland>
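Until the stored form changes, a client-side workaround is possible. The following is a minimal sketch, assuming the only difference between the two forms is %20 versus "_" in the URL path; the function name `to_underscore_form` is hypothetical, and other encoding differences, if any exist, are not handled.

```python
from urllib.parse import urlsplit, urlunsplit


def to_underscore_form(sitelink: str) -> str:
    """Replace %20 in the path of a sitelink URL with underscores.

    Assumption: spaces are the only mismatch with the MediaWiki form.
    Scheme, host, query and fragment are left untouched.
    """
    parts = urlsplit(sitelink)
    path = parts.path.replace("%20", "_")
    return urlunsplit(parts._replace(path=path))


if __name__ == "__main__":
    print(to_underscore_form(
        "https://en.wikipedia.org/wiki/Republic%20of%20Ireland"
    ))
```

Applied to the URL returned by the query above, this yields the expected underscore form. Restricting the replacement to the path avoids rewriting a %20 that might appear in a query string.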

Event Timeline

Yurik created this task. May 26 2017, 5:41 AM
Restricted Application added projects: Wikidata, Discovery. May 26 2017, 5:41 AM
Restricted Application added a subscriber: Aklapper.
Yurik updated the task description. May 26 2017, 5:42 AM