**Feature summary**:
Clickstream could be a very effective tool for identifying badly written articles if its format were slightly expanded.
Usually, when a bilingual user opens an article and doesn't find the information they are looking for, they switch to the article in another language they know.
Currently, an interlanguage click in Clickstream is recorded as:
other-internal <article_name> external <count>
Here, `other-internal` means that the user arrived from another Wikimedia project, but it is not clear which one.
My proposal is to include language-project codes in the format, e.g.
frwiki <article_name> langswitch <count>
which would mean that the given number of users switched the article language from French to English (in the case of the enwiki dataset).
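A minimal sketch of how a consumer might read rows in the proposed format. It assumes the fields are tab-separated, as in the published Clickstream dumps; the `langswitch` type exists only in this proposal, and the names `LangSwitch` and `parse_langswitch_row` are hypothetical.

```python
from typing import NamedTuple, Optional


class LangSwitch(NamedTuple):
    source_wiki: str  # e.g. "frwiki": the wiki the user switched from
    article: str      # article title in the dataset's own wiki
    count: int        # number of recorded language switches


def parse_langswitch_row(line: str) -> Optional[LangSwitch]:
    """Return a LangSwitch for a well-formed langswitch row, else None."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        return None
    prev, curr, link_type, n = fields
    # Skip every other link type and rows with a non-numeric count.
    if link_type != "langswitch" or not n.isdigit():
        return None
    return LangSwitch(prev, curr, int(n))
```

Rows of any other type (including the current `other-internal ... external` rows) are skipped, so the same reader could run over a mixed file.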
**Use Case**
Articles with a higher-than-average rate of language switches are likely to be badly written and in need of improvement.
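As a rough illustration of this use case, the sketch below flags articles whose switch rate is above the mean. It assumes a per-article view count is available as the denominator (this is not part of the proposal), and it takes "average" to mean the mean of per-article rates; the function name and both inputs are hypothetical.

```python
def flag_high_switch_rate(switches, views):
    """Return articles whose language-switch rate exceeds the mean rate.

    switches: dict mapping article -> number of language switches away from it
    views:    dict mapping article -> number of views (assumed input)
    """
    # Rate = switches per view; articles with no views are ignored.
    rates = {
        article: switches.get(article, 0) / n_views
        for article, n_views in views.items()
        if n_views > 0
    }
    if not rates:
        return []
    mean_rate = sum(rates.values()) / len(rates)
    return sorted(a for a, r in rates.items() if r > mean_rate)
```

In practice a minimum view threshold would probably be needed too, since rates for rarely viewed articles are noisy.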
**Alternative implementation:**
Since 1) the change above would break the existing Clickstream data format and 2) this data is mostly useful for small and medium-sized wikis, it would be better to create a new dataset (instead of changing Clickstream) and generate it for all wikis. The format would be as follows:
<lang> <article> <count_language_switched_to_lang>
For example, for enwiki we would have something like this for the article `Europe`:
`fr Europe 40`
`de Europe 25`
`pl Europe 10`
etc.
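A small sketch of how such a dataset could be aggregated from raw switch events. The input shape (one `(target_lang, article)` pair per switch) and the function name are assumptions made for illustration; the output lines are space-separated to match the example above, although a real dump would more likely be tab-separated like Clickstream.

```python
from collections import Counter


def aggregate_switches(events):
    """Aggregate (target_lang, article) events into '<lang> <article> <count>' lines."""
    counts = Counter(events)
    # Order by article, then by descending count, then by language code.
    ordered = sorted(
        counts.items(),
        key=lambda item: (item[0][1], -item[1], item[0][0]),
    )
    return [f"{lang} {article} {n}" for (lang, article), n in ordered]
```

Feeding it 40 `("fr", "Europe")` events, 25 `("de", "Europe")` events and 10 `("pl", "Europe")` events yields the three example lines shown above.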