Add non-English wiki support to Parsoid
Closed, Resolved, Public

Description

Tasks:

  • try to pull config options (mainly namespace names) from the MW API when first accessing the wiki
  • our link trail regexp needs to be developed further and tested in other languages; it is a negative character match rather than one of the language-specific positive regexps, so it can potentially be made to work across languages.
  • language variants are further down on the list (see bug 41716).
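
To illustrate the difference between a positive and a negative link trail match, here is a minimal sketch in Python. It is not Parsoid's actual regexp: the names `ENGLISH_TRAIL`, `GENERIC_TRAIL` and `split_link_trail` are hypothetical, and the negative class shown (exclude whitespace, punctuation, digits and underscores) is only one plausible choice. Some languages define link trails that are not purely letter-based, so this would still need per-language testing.

```python
import re

# Positive, English-specific pattern: only ASCII lowercase letters extend a link.
ENGLISH_TRAIL = re.compile(r"[a-z]*")

# Negative pattern (an assumption, not Parsoid's real one): any character that
# is NOT whitespace, a non-word character (\W), a digit, or an underscore
# extends the link, so letters of any script are accepted.
GENERIC_TRAIL = re.compile(r"[^\s\W\d_]*")


def split_link_trail(text, pattern=GENERIC_TRAIL):
    """Split the text following a ]] into (link trail, remaining text)."""
    # Both patterns use *, so match() always succeeds, possibly with "".
    end = pattern.match(text).end()
    return text[:end], text[end:]
```

With the negative pattern, the text `és, etc.` after a link yields the trail `és`, whereas the English-only positive pattern yields an empty trail because `é` is outside `[a-z]`.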

Version: unspecified
Severity: normal

Details

Reference
bz43332

Event Timeline

bzimport raised the priority of this task from to Normal.
bzimport set Reference to bz43332.

I'll see what I can do about the config options, definitely. Are there any other options that we need besides namespace names?

The actual GET request URL (displayed in the sandbox at the bottom after executing the sample query):

https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&format=json&siprop=namespaces|namespacealiases|specialpagealiases
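
For reference, a request URL of this shape can be assembled programmatically. The following is a small sketch using only the Python standard library; the helper name `build_siteinfo_url` is hypothetical, and the query parameters are the ones visible in the URL above.

```python
from urllib.parse import urlencode


def build_siteinfo_url(api_base, siprops):
    """Return a siteinfo query URL for the given api.php endpoint."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "format": "json",
        # Multiple siprop values are joined with "|"; urlencode() will
        # percent-encode the separator as %7C.
        "siprop": "|".join(siprops),
    }
    return api_base + "?" + urlencode(params)


url = build_siteinfo_url(
    "https://fr.wikipedia.org/w/api.php",
    ["namespaces", "namespacealiases", "specialpagealiases"],
)
```

Pointing `api_base` at a different wiki's `api.php` is all that changes per language, which is what makes fetching the configuration on first access attractive.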

Oh, fun: my vague memory that image/file options and magic words are localized too turned out to be true.

Look for 'img_' in

https://fr.wikipedia.org/w/api.php?action=query&meta=siteinfo&format=json&siprop=general%7Cnamespaces%7Cnamespacealiases%7Cspecialpagealiases%7Cmagicwords%7Cinterwikimap%7Cdbrepllag%7Cstatistics%7Cusergroups%7Cextensions%7Cfileextensions%7Crightsinfo%7Clanguages%7Cskins%7Cextensiontags%7Cfunctionhooks%7Cshowhooks%7Cvariables

That output also has interesting information about registered extensions, parser functions, etc. It is probably a good idea to simply wrap the entire JSON object in a WikiConfig object that provides convenient accessor methods for the info we need. That way we would only need to update those accessors if the underlying JSON structure changes.
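
A rough sketch of what such a wrapper could look like, written in Python for brevity. This is not the actual Parsoid WikiConfig: the method names are hypothetical, the sample data is hand-written rather than real API output, and it assumes the legacy JSON shape in which a namespace's local name sits under the `"*"` key and each magic word entry carries `name` and `aliases`.

```python
class WikiConfig:
    """Wraps a siteinfo JSON response and exposes accessor methods."""

    def __init__(self, siteinfo):
        query = siteinfo.get("query", {})
        # Map every known namespace name (local, canonical, alias) to its id.
        self._ns_ids = {}
        for ns in query.get("namespaces", {}).values():
            for key in ("*", "canonical"):
                if key in ns:
                    self._ns_ids[self._norm(ns[key])] = ns["id"]
        for alias in query.get("namespacealiases", []):
            self._ns_ids[self._norm(alias["*"])] = alias["id"]
        # Map magic word names (e.g. "img_thumbnail") to their aliases.
        self._magic = {
            mw["name"]: mw.get("aliases", [])
            for mw in query.get("magicwords", [])
        }

    @staticmethod
    def _norm(name):
        # Treat underscores as spaces and ignore case when comparing names.
        return name.replace("_", " ").strip().lower()

    def namespace_id(self, name):
        """Return the namespace id for a name, or None if unknown."""
        return self._ns_ids.get(self._norm(name))

    def magic_word_aliases(self, name):
        """Return the localized aliases of a magic word."""
        return list(self._magic.get(name, []))

    def image_option(self, alias):
        """Return the img_* magic word that a localized alias maps to."""
        for name, aliases in self._magic.items():
            if name.startswith("img_") and alias in aliases:
                return name
        return None


# Hand-written sample in the assumed shape, not real API output.
SAMPLE_SITEINFO = {
    "query": {
        "namespaces": {
            "0": {"id": 0, "*": ""},
            "6": {"id": 6, "canonical": "File", "*": "Fichier"},
        },
        "namespacealiases": [{"id": 6, "*": "Image"}],
        "magicwords": [
            {"name": "img_thumbnail",
             "aliases": ["vignette", "thumb", "thumbnail"]},
        ],
    }
}

config = WikiConfig(SAMPLE_SITEINFO)
```

Callers only ever use the accessors, so if the JSON layout changes, only the constructor has to be adjusted.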

The patch is now updated (rebased) and addresses most of the issues raised in review.

More steps taken in https://gerrit.wikimedia.org/r/43972

The next step is magic words, which could take quite a bit of work.

https://gerrit.wikimedia.org/r/44353

That patch is a bit of work on using magic words from the remote wiki config. It's still WIP, but the feature is coming!

This is now pretty far along and used in production. Please report further issues separately.

[Parsoid component reorg by merging JS/General and General. See bug 50685 for more information. Filter bugmail on this comment. parsoidreorg20130704]