**Feature summary**:
Clickstream could be a very effective tool for identifying badly written articles if its format were slightly expanded.
Usually, when a bilingual user opens an article and doesn't find the information they are looking for, they switch to the same article in another language they know.
Currently, an interlanguage click in Clickstream is specified as:
`other-internal <article_name> external <count>`
The request: generate data when a user switches language in an article (for example, opens an article in ukwiki, but then switches to the same article in ruwiki).
It will help to find badly written articles: if an article's rate of language switches is higher than the average rate, this indicates that the article needs to be improved.
**Current state:**
Clickstream provides the following data:
`other-internal <article_name> external <count>`
Here, `other-internal` means that the user got here from another Wikimedia project. However, this data is not enough, because it's not clear which Wikimedia project was used.
My proposal is to include language-project codes in the format, i.e.
`frwiki <article_name> langswitch <count>`
which would mean that a certain number of users changed the article language from French to English (if it's the enwiki dataset).
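As a rough illustration of how such rows could be consumed, the sketch below counts language switches per article and source wiki. It assumes the rows are tab-separated in the order source, article, type, count, and that the proposed type value is `langswitch`. The function name and the sample rows are hypothetical and made up for illustration.

```python
from collections import Counter


def switches_by_source_wiki(rows):
    """Sum counts of rows typed 'langswitch', keyed by (article, source wiki).

    Assumes each row is '<source>\t<article>\t<type>\t<count>'.
    """
    totals = Counter()
    for row in rows:
        source, article, link_type, count = row.rstrip("\n").split("\t")
        if link_type == "langswitch":
            totals[(article, source)] += int(count)
    return totals


# Made-up sample rows; only the 'langswitch' ones are counted.
rows = [
    "frwiki\tEurope\tlangswitch\t40",
    "other-search\tEurope\texternal\t900",
    "dewiki\tEurope\tlangswitch\t25",
]
result = switches_by_source_wiki(rows)
```

Rows of any other type are ignored, so existing consumers of the data would not need to change how they read the current rows.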
**Use Case**
Articles with a higher-than-average rate of language switches are likely to be badly written and in need of improvement.
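A minimal sketch of this check follows. It assumes that the "rate" is language switches divided by page views, and that the "average" is the pooled rate over all articles. Neither definition is fixed by this task, and the numbers are made up for illustration.

```python
def flag_high_switch_articles(stats):
    """Return articles whose switch rate exceeds the pooled average rate.

    stats maps article -> (language_switches, page_views).
    Articles with zero views are skipped. Result is sorted by rate, highest first.
    """
    usable = {a: (s, v) for a, (s, v) in stats.items() if v > 0}
    if not usable:
        return []
    rates = {a: s / v for a, (s, v) in usable.items()}
    total_switches = sum(s for s, _ in usable.values())
    total_views = sum(v for _, v in usable.values())
    average = total_switches / total_views
    flagged = [a for a, r in rates.items() if r > average]
    return sorted(flagged, key=rates.get, reverse=True)


# Hypothetical figures: (language switches, page views) per article.
stats = {"Europe": (75, 1000), "Asia": (10, 1000), "Africa": (20, 500)}
flagged = flag_high_switch_articles(stats)
```

With these figures the pooled average is 105 / 2500 = 0.042, so only `Europe` (0.075) is flagged. A real analysis would probably also want a minimum-views threshold so that rarely visited articles do not produce noisy rates.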
**Alternative implementation:**
Since 1) the proposal above means breaking the existing format of Clickstream data, and 2) this data is mostly useful for small and medium-sized wikis, it's better to create a new dataset (instead of changing Clickstream) and generate it for all wikis. It's also easier to generate a new dataset than to change the existing Clickstream data format.
So the format would be as follows, generated for all wikis (especially for small and medium-sized wikis):
`<lang> <article> <count_language_switched_to_lang>`
For example, for enwiki we would have something like this for the article `Europe`:
`fr Europe 40` (meaning users switched from en:Europe to fr:Europe 40 times)
`de Europe 25`
`pl Europe 10`
etc.
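Reading such a dataset could look roughly like the sketch below. It assumes whitespace-separated fields, with the language code first and the count last, and article titles written with underscores instead of spaces. The function name is hypothetical.

```python
from collections import defaultdict


def parse_switch_lines(lines):
    """Parse '<lang> <article> <count>' lines into {article: {lang: count}}."""
    switches = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # First field is the target language, last field is the count;
        # everything in between is treated as the article title.
        lang, rest = line.split(None, 1)
        article, count = rest.rsplit(None, 1)
        switches[article][lang] = int(count)
    return dict(switches)


sample = ["fr Europe 40", "de Europe 25", "pl Europe 10"]
parsed = parse_switch_lines(sample)
total_for_europe = sum(parsed["Europe"].values())
```

For the three sample lines above, `total_for_europe` is 75, which is the number of switches away from en:Europe across the listed languages.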