Add support for content-language meta tag to determine language
Open, Low, Public

Description

In this edit (among others), I used the automated reference interpreter to fill the citation template: https://nl.wikipedia.org/w/index.php?title=Miro_Cerar&diff=prev&oldid=49432423

Most of the included references in Slovenian did not have the language recognized, except this one:

<ref name=":1">{{Citeer web|url=http://www.student.si/preberi-si/aktualno/intervju-prof-dr-miro-cerar.html|titel=INTERVJU - prof. dr. Miro Cerar|bezochtdatum=2017-07-10|auteur=Študent|taal=si-SI}}</ref>

(where the parameter 'taal' is the language parameter). The language code used is si, which is Sinhalese: an unrelated language that even uses a different script. It seems the language code got mixed up somewhere. The code should have been sl (or sl-SI, if a country code has to be included).

Event Timeline

Restricted Application added a subscriber: Aklapper.
Deskana moved this task from To Triage to External and Administrivia on the VisualEditor board.

For what it's worth: The source (http://www.student.si/preberi-si/aktualno/intervju-prof-dr-miro-cerar.html) reports <meta http-equiv="Content-Language" content="sl-SI"/> and <html lang="si-SI" xml:lang="si-SI" xmlns="http://www.w3.org/1999/xhtml">. The two declarations contradict each other.

Yeah, we use the HTML lang attribute, and in the source it's incorrect. Not much we can do about that. We could add support for the meta tag though.
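
To make the meta tag idea concrete, here is a rough sketch in Python (for illustration only; this is not Citoid's code, and the names `LangDeclarations`, `declared_languages` and `primary_subtag` are made up for this example). It reads both the `lang` attribute on `<html>` and the `Content-Language` meta tag, so a caller can see when they disagree. Which of the two should win in that case is a policy decision the sketch deliberately leaves open.

```python
from html.parser import HTMLParser


class LangDeclarations(HTMLParser):
    """Collect the language declared on <html> and in the Content-Language meta tag."""

    def __init__(self):
        super().__init__()
        self.html_lang = None  # value of <html lang="..."> (or xml:lang)
        self.meta_lang = None  # value of <meta http-equiv="Content-Language" content="...">

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing tags
        # such as <meta ... /> also end up here.
        attrs = dict(attrs)
        if tag == "html" and self.html_lang is None:
            self.html_lang = attrs.get("lang") or attrs.get("xml:lang")
        elif tag == "meta" and self.meta_lang is None:
            if (attrs.get("http-equiv") or "").lower() == "content-language":
                self.meta_lang = attrs.get("content")


def declared_languages(html):
    """Return (html_lang, meta_lang); either may be None if not declared."""
    parser = LangDeclarations()
    parser.feed(html)
    return parser.html_lang, parser.meta_lang


def primary_subtag(tag):
    """Return the language part of a tag like 'sl-SI', lowercased."""
    return tag.split("-")[0].lower()


# Markup reduced to the two declarations quoted earlier in this task.
page = (
    '<html lang="si-SI" xml:lang="si-SI" xmlns="http://www.w3.org/1999/xhtml">'
    '<head><meta http-equiv="Content-Language" content="sl-SI"/></head>'
    "<body></body></html>"
)

html_lang, meta_lang = declared_languages(page)
conflict = (
    html_lang is not None
    and meta_lang is not None
    and primary_subtag(html_lang) != primary_subtag(meta_lang)
)
print(html_lang, meta_lang, conflict)
```

For the page in question this yields `si-SI` from the `<html>` element and `sl-SI` from the meta tag, so the conflict flag is set; the meta tag happens to be the correct one here, but that will not hold for every site.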

Mvolz lowered the priority of this task from Medium to Low. (Jul 11 2017, 7:26 PM)
Mvolz renamed this task from Citoid identifies webpage in Slovenian as Sinhalese to Add support for content-language meta tag to determine language. (Sep 5 2017, 9:30 AM)
Mvolz moved this task from Backlog to Zotero on the Citoid board.
Mvolz moved this task from Zotero to Service: Scraper & Validation on the Citoid board.