Feature summary:
Generate data when a user switches language in an article (for example, opens an article in ukwiki, but then switches to the same article in ruwiki).
This would help to find badly written articles: if an article has a rate of language switches that is higher than the average rate, it suggests that the article needs to be improved. Usually, when a bilingual user opens an article and doesn't find the information they are looking for, they switch to the same article in another language they know.
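A minimal sketch of the "higher than average rate" check. It assumes two hypothetical inputs that are not part of any existing dataset: per-article view counts and per-article language-switch counts for a single wiki. The function name and the choice of "average rate" (total switches divided by total views) are assumptions made for illustration only.

```python
def flag_high_switch_articles(views, switches):
    """Return (article, rate) pairs whose switch rate exceeds the wiki-wide rate.

    views:    dict mapping article title -> number of views (hypothetical input)
    switches: dict mapping article title -> number of language switches (hypothetical input)
    """
    total_views = sum(views.values())
    if total_views == 0:
        return []

    # Assumption: "average rate" means total switches divided by total views.
    average_rate = sum(switches.get(a, 0) for a in views) / total_views

    flagged = []
    for article, n_views in views.items():
        if n_views == 0:
            continue  # no views, so no meaningful rate
        rate = switches.get(article, 0) / n_views
        if rate > average_rate:
            flagged.append((article, rate))

    # Highest switch rate first.
    return sorted(flagged, key=lambda item: -item[1])


# Made-up numbers, for illustration only.
views = {"Europe": 1000, "Asia": 1000, "Africa": 500}
switches = {"Europe": 75, "Asia": 20, "Africa": 5}
flagged = flag_high_switch_articles(views, switches)
```

With these made-up numbers only Europe is flagged, because its rate is above the wiki-wide rate of 100 switches per 2500 views.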
Current state:
Clickstream provides the following data:
other-internal <article_name> external <count>
Here, other-internal means that the user arrived from another Wikimedia project. However, this data is not enough, because it does not say which Wikimedia project the user came from.
Proposal:
It's easier to generate a new dataset than to change the existing Clickstream data format.
So, generate the following data for all wikis (especially for small and medium-sized wikis):
<lang> <article> <count_language_switched_to_lang>
For example, for enwiki we would have something like this for the article Europe:
fr Europe 40 <means users switched from en:Europe to fr:Europe 40 times>
de Europe 25
pl Europe 10
etc.
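A rough sketch of how the proposed rows could be aggregated. It assumes a hypothetical stream of switch events given as (source_lang, source_article, target_lang) tuples; no such event format exists in Clickstream today, and the function names are illustrative only. A real dataset would probably be tab-separated, since article titles can contain spaces.

```python
from collections import Counter


def count_language_switches(events, source_lang):
    """Count switches away from articles on `source_lang`, per target language.

    events: iterable of (source_lang, source_article, target_lang) tuples
            (hypothetical input format, assumed for this sketch)
    """
    counts = Counter()
    for src_lang, article, dst_lang in events:
        if src_lang == source_lang and dst_lang != source_lang:
            counts[(dst_lang, article)] += 1
    return counts


def format_rows(counts):
    """Render counts as '<lang> <article> <count>' rows, largest count first."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{lang} {article} {n}" for (lang, article), n in ordered]


# Made-up events, for illustration only.
events = (
    [("en", "Europe", "fr")] * 3
    + [("en", "Europe", "de")] * 2
    + [("en", "Europe", "pl")]
    + [("uk", "Europe", "ru")]  # ignored when building the enwiki rows
)
rows = format_rows(count_language_switches(events, "en"))
```

Keying the counter on (target language, source article) keeps the output in the same shape as the rows above, with one file of rows per source wiki.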