
Include more languages in the selector
Closed, Declined (Public)

Description

If languages other than English and French are available, include them in the language selection input. Otherwise, please invalidate this task.

It would be a courtesy to default the input field automatically, either to the language stored in a sticky cookie or to the best match for the user's browser locale. This can be split into another task.
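
As a rough illustration of the defaulting idea above, here is a minimal sketch in Python. It assumes the selector offers only `en` and `fr`, that a previous choice arrives as a cookie value, and that the browser locale arrives as an `Accept-Language` header. The names `SUPPORTED`, `parse_accept_language`, and `pick_language` are hypothetical and not taken from the app.

```python
# Hypothetical sketch: none of these names come from the app itself.
SUPPORTED = ("en", "fr")  # assumption: the languages the selector offers


def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a tag without an explicit q-value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        ranked.append((-quality, position, tag.strip().lower()))
    ranked.sort()  # highest quality first, header order breaks ties
    return [tag for neg_q, _, tag in ranked if neg_q < 0]  # drop q=0 entries


def pick_language(cookie_lang, accept_language, supported=SUPPORTED, default="en"):
    """Prefer the sticky cookie, then the browser locale, then the default."""
    if cookie_lang and cookie_lang.lower() in supported:
        return cookie_lang.lower()
    for tag in parse_accept_language(accept_language or ""):
        primary = tag.split("-")[0]  # "fr-CA" falls back to "fr"
        if primary in supported:
            return primary
    return default
```

For example, `pick_language(None, "fr-CA,fr;q=0.9,en;q=0.8")` selects `fr`, while a stored cookie value of `en` would win over the same header.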

Event Timeline

After reading data/region_groundtruth_2020_11_29_aggregated_enwiki.json.bz2, I'm confused. The filename and contents seem to only be enwiki. How can we support the French option?

Good question. A French ground truth is also available. Going forward, the model will accommodate several ground truths, which will be used alongside the language picker.

Just out of curiosity, I still don't understand how the app is serving French articles. Even on my local installation, where only the English data is loaded from the data directory, I can query a French article title http://127.0.0.1:5000/api/v1/get-summary?lang=fr&title=Saint-Maurice%20(Puy-de-D%C3%B4me)&threshold=0.5 and get seemingly correct results. Where is the data coming from?

Where is the data coming from?

The ground truth is currently based on articles from a specific language edition of Wikipedia, but the "vocabulary" of the app is Wikidata IDs. So "English data" really just means the subset of Wikidata items that have English sitelinks. In theory, that means the app works for any language even when it is based only on English. In practice, though, using only the English Wikipedia ground truth would limit its coverage and (I suspect) bias the results toward the US, UK, Canada, etc. Eventually, the ground truth will be based on any Wikidata item with at least one Wikipedia sitelink.
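
To make the Wikidata ID point concrete, here is a minimal sketch of how a title in any language edition can be resolved to its Wikidata item through the MediaWiki `pageprops` query. This is not necessarily how the app does the lookup. The helper names are hypothetical, and the sample response is hand-written with a placeholder ID rather than captured from the API.

```python
from urllib.parse import urlencode


def sitelink_query_url(lang, title):
    """Build a MediaWiki API URL that asks for the page's Wikidata item."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",  # page property that holds the Wikidata ID
        "titles": title,
        "redirects": 1,
        "format": "json",
        "formatversion": 2,  # version 2 returns "pages" as a list
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def extract_qid(response):
    """Return the Wikidata ID from a parsed API response, or None if absent."""
    for page in response.get("query", {}).get("pages", []):
        qid = page.get("pageprops", {}).get("wikibase_item")
        if qid:
            return qid
    return None


# Hand-written sample in the formatversion=2 shape; "Q123456" is a placeholder.
sample_response = {
    "query": {
        "pages": [
            {
                "title": "Saint-Maurice (Puy-de-Dôme)",
                "pageprops": {"wikibase_item": "Q123456"},
            }
        ]
    }
}

url = sitelink_query_url("fr", "Saint-Maurice (Puy-de-Dôme)")
qid = extract_qid(sample_response)
```

Once the title has been mapped to a Wikidata ID, the language of the original request no longer matters for the lookup, which is consistent with the French query working against data built from English sitelinks.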