
Investigation: Multilingual support for Pageviews Analysis
Closed, Resolved | Public | 2 Estimated Story Points

Description

@MusikAnimal is looking for a way to support translations for the Pageviews Analysis tool:

https://tools.wmflabs.org/pageviews

The i18n framework for Tool Labs tools (T112307) works for Python tools, and Intuition works only with PHP tools.

Are there tools available for Ruby, or can we port Intuition to Ruby?

Event Timeline

The main thing to note is that most of the readable text lives in the markup, which is written in HAML and precompiled into HTML. This means we do have a Ruby interpreter at our disposal when building the application, but not at runtime, since the tool is entirely front-end. That being said, I have two ideas:

Translating at compile-time
We could use svenfuchs/i18n (or something similar) by creating "dictionaries" of translations for all the text, along with some interpolation helpers and so on. This is nice because we'd have a single view in our developer code base: just the .haml file. The problem is that once compiled, we'd end up with an HTML file for each language, which is a real mess for deployment and organization. We'd also need special routing to determine which HTML file to load, e.g. /pageviews/en-US or /pageviews/de-DE. Alternatively, we could use JavaScript to dynamically load the HTML file we want, driven by a URL parameter like &lang=en-US, but I think this dynamic loading would slow things down quite a bit.
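To make the trade-off concrete, here is a rough sketch of what a compile-time build step produces. It is written in JavaScript rather than Ruby/HAML purely for illustration; the `{{key}}` placeholder syntax, the `compileViews` name, the `index.<locale>.html` file names, and the sample strings are all assumptions, not part of svenfuchs/i18n or the pageviews build.

```javascript
// Hypothetical build step: render one HTML file per locale from a single template.
// The {{key}} placeholder syntax and output file names are assumptions for illustration.
function compileViews(template, dictionaries, fallbackLocale) {
  const fallback = dictionaries[fallbackLocale] || {};
  const output = {};
  for (const [locale, messages] of Object.entries(dictionaries)) {
    // Replace each {{key}} with the locale's message, falling back to the
    // fallback locale, then to the key itself so missing strings stay visible.
    output[`index.${locale}.html`] = template.replace(
      /\{\{([\w-]+)\}\}/g,
      (match, key) => messages[key] ?? fallback[key] ?? key
    );
  }
  return output;
}

// Illustrative dictionaries; the strings are made up for this example.
const files = compileViews(
  '<h1>{{title}}</h1><button>{{submit}}</button>',
  {
    'en-US': { title: 'Pageviews Analysis', submit: 'Submit' },
    'de-DE': { title: 'Seitenaufruf-Analyse' }, // "submit" not translated yet
  },
  'en-US'
);
console.log(Object.keys(files)); // one output file per locale
```

Every added language means one more generated file to deploy and route to, which is the organizational cost described above.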

Translating at run-time
Another option is fnando/i18n-js (or something similar), which is a similar dictionary-based solution, except for JavaScript. When the application runs, it checks navigator.language to see what language the user's browser is set to, and translates everything accordingly. I prefer this to using URLs to specify the language. For instance, I may contribute to the English and German Wikipedias, but my primary language is English, so if I click on a pageviews link on dewiki I would prefer to see the tool in English. The downside is that a dictionary JSON file covering every language could be quite large when all you're going to use is a single language. I was thinking we could use localStorage to help with this: the first time you load the page, the application checks navigator.language and makes an AJAX request for the corresponding dictionary file, then puts it in local storage. On subsequent visits the dictionary is already stored locally, and the other dictionaries never take a toll on download time.
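The caching idea could look roughly like the sketch below. It does not use the i18n-js API; `loadDictionary`, `translate`, the cache key, the `messages/<lang>.json` path, and the `$1` placeholder style are all hypothetical names chosen for illustration. The storage object and the fetch function are passed in so the same code runs in a browser (with `window.localStorage`) or under Node with stand-ins.

```javascript
// Minimal sketch of run-time translation with a localStorage-backed cache.
// `storage` is anything with getItem/setItem (window.localStorage in a browser);
// `fetchJson` is a hypothetical function that resolves to the parsed dictionary.
async function loadDictionary(lang, storage, fetchJson) {
  const cacheKey = `pageviews-i18n-${lang}`; // hypothetical key name
  const cached = storage.getItem(cacheKey);
  if (cached !== null) {
    return JSON.parse(cached); // repeat visit: no network request
  }
  const dictionary = await fetchJson(`messages/${lang}.json`); // hypothetical path
  storage.setItem(cacheKey, JSON.stringify(dictionary));
  return dictionary;
}

// Look up a key and fill in $1, $2, ... placeholders.
// Unknown keys are returned as-is so missing translations stay visible.
function translate(dictionary, key, ...params) {
  const message = Object.prototype.hasOwnProperty.call(dictionary, key)
    ? dictionary[key]
    : key;
  return message.replace(/\$(\d+)/g, (match, n) => {
    const index = Number(n) - 1;
    return index in params ? String(params[index]) : match;
  });
}

// In a browser this might be wired up as:
//   loadDictionary(navigator.language, localStorage,
//                  (url) => fetch(url).then((res) => res.json()));
```

As written, a cached dictionary is never refreshed, so a real implementation would probably want a version stamp in the cache key (or similar) to pick up updated translations.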

Both of these use this "dictionary" approach as opposed to some dynamic translator like Google Translate... I'm not really sure what the industry standard is. If dictionaries are the way to go, I don't think this will be hard to implement; it'll just take a lot of effort from volunteers to do the actual translating.

Finally, despite using HAML and SASS, I don't want to move in a direction that will have a hard dependency on Ruby – a language that seems to be a bit unpopular amongst the Wikimedia community. So all things considered, I think the run-time solution coupled with local storage might be best.

Hey @MusikAnimal, I played around with some of the various i18n libraries (none of which I had used before), and I think Intuition would work out really well. Even though it's a fairly big library with lots of features, it's extremely easy to integrate into pageviews. All you have to do is...

  1. Add a composer.json file which lists "Krinkle/intuition" as a requirement.
  2. Rename index.html to index.php.
  3. Add 3 lines of PHP to the top of the file.
  4. Create a messages directory with some JSON dictionaries in it, e.g. en.json, fr.json.
  5. Use <?=$I18N->msg( 'some-key' ); ?> wherever we want to output a translated message.

This will require us to abandon HAML, but that seems to be true for all the possible solutions.

A few of the really nice things about Intuition are:

  • Doesn't require modifying the URL. It uses the Accept-Language header that the browser sends. It's also super easy to let people switch the language, which it then stores in a cookie. The cookie is also shared with all the other Tool Labs tools that use Intuition, which is a nice bonus.
  • It supports the Banana Milkshake message file format and is easy to hook into TranslateWiki. You can even have TranslateWiki automatically push message translations to the GitHub repo.
  • It's all server-side, which means no delays or flashes, and also no need to store dictionaries on the client side.
  • It's PHP-based, so no Ruby dependencies.
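For readers unfamiliar with header-based language selection, the general technique can be sketched as follows. This is not Intuition's code (Intuition is PHP and runs server-side); `pickLanguage` and its parameters are hypothetical, and the sketch ignores the cookie override mentioned above.

```javascript
// Sketch of Accept-Language negotiation. This is NOT Intuition's actual code,
// only an illustration of the general technique.
// `available` is assumed to hold lowercase language codes.
function pickLanguage(acceptLanguage, available, fallback = 'en') {
  const ranked = acceptLanguage
    .split(',')
    .map((part) => {
      const [tag, ...params] = part.trim().split(';');
      const q = params.map((p) => p.trim()).find((p) => p.startsWith('q='));
      const weight = q === undefined ? 1 : parseFloat(q.slice(2));
      return {
        tag: tag.trim().toLowerCase(),
        weight: Number.isNaN(weight) ? 0 : weight,
      };
    })
    .filter((entry) => entry.tag !== '' && entry.weight > 0)
    .sort((a, b) => b.weight - a.weight); // highest preference first

  for (const { tag } of ranked) {
    if (available.includes(tag)) return tag; // exact match, e.g. "fr"
    const base = tag.split('-')[0]; // "fr-ca" -> "fr"
    if (available.includes(base)) return base;
  }
  return fallback;
}

console.log(pickLanguage('de-DE,de;q=0.9,en;q=0.8', ['en', 'fr'])); // en
console.log(pickLanguage('fr-CA,fr;q=0.9,en;q=0.8', ['en', 'fr'])); // fr
```

Because the choice comes from a request header, the URL stays the same for every language.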

If this sounds like a good solution, I would be happy to handle the implementation of it. Let me know what you think.

@kaldari Wow! I really like how it would seamlessly connect with other apps that use Intuition, which would include very popular tools like XTools. On my end I'd need to set up a PHP environment, but honestly I think that was bound to happen eventually :) I'm assuming this works without any extra overhead with Lighttpd, the default server on Tool Labs. I do a little devops maintenance for XTools, and that is what we're using there. As I said on IRC, leaving Ruby behind is a good idea. HAML is just syntactic sugar, nothing important.

A potential blocker: within the next few days I'm going to push a big update to the repo. This will include the start of the "Topviews Analysis" tool, which shows the top viewed pages within a timeframe. We don't need to worry about translating that yet; I just thought I'd make you aware given the refactoring involved -- though it's mostly JS. The branch is at https://github.com/MusikAnimal/pageviews/tree/topviews

Next, I'm not sure how much translation effort is involved, but again we can side-step things like the FAQ, disclaimer and URL structure pages if we want. The interface itself should have priority.

One last concern is the refactoring of the views, namely the header and footer. I assume with PHP we can do some similar rendering of "partials" like we do with HAML. If we aren't able to figure this out at first that's fine, we can revisit the issue later.

Many thanks!

@MusikAnimal: Thanks for the warning about the refactoring. Hopefully the file conflicts will be minimal.

> I'm assuming this works without any extra overhead with Lighttpd, the default server on Tool Labs.

Correct, it won't require any extra overhead on Tool Labs.

> One last concern is the refactoring of the views, namely the header and footer. I assume with PHP we can do some similar rendering of "partials" like we do with HAML. If we aren't able to figure this out at first that's fine, we can revisit the issue later.

This is actually what PHP was originally invented to handle. See https://en.wikibooks.org/wiki/PHP_Programming/headers_and_footers.