
Generate data to count langswitches for every article
Open, Needs Triage · Public · Feature

Description

Feature summary:

Generate data when a user switches language in an article (for example, opens an article in ukwiki, but then switches to the same article in ruwiki).

This would help find poorly written articles: if an article has a higher-than-average rate of language switches, that suggests the article needs to be improved. Typically, when a bilingual user opens an article and doesn't find the information they are looking for, they switch to the same article in another language they know.
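As a rough illustration of the "higher than average rate" idea, here is a minimal Python sketch. It assumes that per-article pageview and language-switch counts are already available; the function name, the input shape, and the sample numbers are all made up for illustration, and "average" is taken to mean the mean of the per-article rates, which is only one possible reading.

```python
from statistics import mean


def flag_high_switch_articles(stats):
    """Return titles whose switch rate is above the mean rate.

    `stats` is a hypothetical mapping:
        article title -> (pageviews, language_switches)
    """
    # Switch rate = language switches per pageview; skip articles with no views.
    rates = {
        title: switches / views
        for title, (views, switches) in stats.items()
        if views > 0
    }
    if not rates:
        return []
    average = mean(rates.values())
    # Highest rate first, so the most suspicious articles come on top.
    return sorted(
        (title for title, rate in rates.items() if rate > average),
        key=lambda title: rates[title],
        reverse=True,
    )


# Made-up sample numbers, not real data.
sample = {
    "Europe": (1000, 75),
    "Asia": (800, 16),
    "Africa": (500, 10),
}
flagged = flag_high_switch_articles(sample)
print(flagged)
```

A real analysis would probably need a minimum pageview threshold as well, since rates for rarely viewed articles are noisy.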

Current state:

Clickstream provides the following data:

other-internal <article_name> external <count>

Here, other-internal means that the user arrived from another Wikimedia project. However, this data is not enough, because it does not show which Wikimedia project the user came from.
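To make the limitation concrete, here is a small Python sketch that reads rows in the shape shown above. It assumes the four fields are tab-separated in that order; the function name and the sample rows are hypothetical.

```python
from collections import Counter


def other_internal_totals(lines):
    """Sum the other-internal counts per article.

    Assumes each line has four tab-separated fields:
    source, article, link type, count.
    """
    totals = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip malformed rows
        source, article, _link_type, count = fields
        if source == "other-internal" and count.isdigit():
            totals[article] += int(count)
    return totals


# Made-up sample rows, not real Clickstream data.
rows = [
    "other-internal\tEurope\texternal\t75",
    "other-search\tEurope\texternal\t300",
    "other-internal\tAsia\texternal\t16",
]
totals = other_internal_totals(rows)
print(dict(totals))
```

The result holds a single number per article, so there is nothing to split by originating project or language.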

Proposal

It's easier to generate a new dataset than to change the existing Clickstream data format.

So, generate the following data for all wikis (especially for small and medium-sized wikis):

<lang> <article> <count_language_switched_to_lang>

For example, for enwiki we would have something like this for the article Europe:
fr Europe 40 (meaning users switched from en:Europe to fr:Europe 40 times)
de Europe 25
pl Europe 10
etc.
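
The proposed rows could be produced by a simple aggregation. The sketch below is in Python and assumes a hypothetical stream of switch events shaped as (from_lang, to_lang, article), where article is the title on the source wiki; neither the event shape nor the function name comes from an existing Wikimedia pipeline.

```python
from collections import Counter


def count_language_switches(events, source_lang):
    """Build '<lang> <article> <count>' rows for one source wiki.

    `events` is a hypothetical iterable of
    (from_lang, to_lang, article) tuples.
    """
    counts = Counter(
        (to_lang, article)
        for from_lang, to_lang, article in events
        if from_lang == source_lang and to_lang != source_lang
    )
    # Order by article, then by descending count, then by language code.
    ordered = sorted(
        counts.items(),
        key=lambda item: (item[0][1], -item[1], item[0][0]),
    )
    return [f"{to_lang} {article} {n}" for (to_lang, article), n in ordered]


# Made-up sample events, not real data.
events = (
    [("en", "fr", "Europe")] * 3
    + [("en", "de", "Europe")] * 2
    + [("en", "pl", "Europe"), ("uk", "ru", "Kyiv")]
)
rows = count_language_switches(events, "en")
for row in rows:
    print(row)
```

Keying on the source-wiki title is a simplification: the same article usually has a different title in each language, so a real dataset might need a language-independent identifier instead.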

Event Timeline

Kanzat renamed this task from Specify source Wikimedia project in Clickstream data to Generate data to count langswitches for every article. · Jun 19 2022, 11:15 PM
Kanzat updated the task description.
Restricted Application added a subscriber: Base. · Jul 3 2022, 3:47 PM