
Evaluate the impact of de-orphanization of articles in terms of visibility
Closed, Resolved (Public)

Description

In T299092, we evaluated link recommendations for orphan articles, comparing link translation with other methods. We showed that recommendations from link translation have by far the highest macro-average score, indicating that this method works particularly well for the many small- and medium-sized wikis.

In this task, we are interested in the effect of de-orphanization on the visibility of these articles. We know, by construction, that each de-orphanized article received at least one new incoming link, which increases its visibility from other Wikipedia articles. We want to know whether this also leads to an increase in readers (or editors) visiting the de-orphanized articles. If so, this would add further motivation to suggest links for de-orphanizing articles.

Event Timeline

Update week 2022-08-22:

  • Discussing different setups for performing this analysis.
  • We define a treatment group as all articles that were de-orphanized in a given month.
  • As a first step, we can consider the increase in the number of clicks from the source-article (where the link was added) to the de-orphanized article. Before the treatment, the number of clicks was 0 since the target-article was an orphan, so the number of clicks after the treatment is the increase. We can easily extract this number from the clickstream dataset (though only for a selection of wikis).
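
As a rough illustration of the first step, the sketch below sums the clicks from one source-article to one target-article in a monthly clickstream file. It assumes the published tab-separated layout (prev, curr, type, n); the function name `clicks_to_target`, the page titles and the counts are made up for the example and are not taken from the analysis.

```python
import csv
import io


def clicks_to_target(clickstream_tsv, source, target):
    """Sum link clicks from `source` to `target` in one monthly clickstream file.

    Assumes tab-separated rows of the form: prev, curr, type, n.
    """
    total = 0
    reader = csv.reader(clickstream_tsv, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 4:
            continue  # skip malformed rows
        prev, curr, link_type, n = row
        # Only count navigation through an article link, not search or external traffic.
        if prev == source and curr == target and link_type == "link":
            total += int(n)
    return total


# Tiny in-memory stand-in for a monthly file (hypothetical titles and counts).
sample = io.StringIO(
    "Source_article\tFormer_orphan\tlink\t25\n"
    "other-search\tFormer_orphan\texternal\t120\n"
    "Source_article\tUnrelated_page\tlink\t15\n"
)

clicks_after = clicks_to_target(sample, "Source_article", "Former_orphan")
clicks_before = 0  # an orphan has no incoming links, hence no link clicks
increase = clicks_after - clicks_before
```

Because the baseline is 0 by construction, the post-treatment count is the whole increase; in practice the file object would come from opening the monthly dump for the wiki in question.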

weekly update:

  • refined analysis setup. the aim is to show that articles that are de-orphanized by editors show an increase in visibility
    • dataset: all articles that were de-orphanized in a given month
    • metric: we operationalize visibility of the target article (the orphan that gets de-orphanized) in three different ways: i) number of pageviews to the source-articles, i.e. all articles linking to the target-article; ii) number of clicks from the source-articles to the target-article; iii) number of previews from the source-articles to the target-article.
    • in each case the visibility before de-orphanization is X-=0 and after de-orphanization is X+>=0. therefore, we aim for the following two analyses: i) is there an increase in visibility, i.e. is X+ statistically significantly different from 0; ii) how big is the increase in visibility, i.e. how does the increase from X- to X+ compare to the average increase in visibility of other articles that received new inlinks.
  • next step: implement analysis
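
One way the first question (is X+ significantly different from 0) could be approached is a percentile bootstrap on the mean of X+. This is only a sketch under that assumption; the task does not say which test was used, and the helper `bootstrap_mean_ci` and the sample values are invented for illustration.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(values), lower, upper


# Hypothetical X+ values (e.g. monthly clicks after de-orphanization), one per article.
x_plus = [0, 3, 12, 45, 0, 7, 80, 1, 22, 130]

mean, lower, upper = bootstrap_mean_ci(x_plus)
# Since X- = 0 for every article, the increase is X+ itself;
# a lower bound above 0 suggests a real increase.
increase_is_significant = lower > 0
```

The second question needs a comparison group, so it cannot be answered from X+ alone.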

weekly update:

  • retrieving the timeseries of clicks to de-orphanized articles from clickstream before and after they were de-orphanized. Resolving technical issues around pages that were renamed or became redirects (since clickstream only contains page titles) in order to track the number of clicks from month to month.

weekly update:

  • completed first analysis of the number of clicks to newly de-orphanized articles from clickstream data.
  • found a statistically significant increase in the number of clicks in the month after de-orphanization: the average increases from 0 to 32; 17% of the de-orphanizing links received more than 10 clicks per month, which is much higher than for the average link (Dimitrov et al. write "on Wikipedia only around 4% of all existing links are clicked by visitors more frequently than 10 times within a month").
  • we will likely refine this analysis to include a more robust control group against which to compare the increase in visibility

weekly update:

  • setting up a more robust experiment in which we define a control group of articles that were not de-orphanized.

weekly update:

  • extracted treatment/control pairs for two complementary studies on the effect of de-orphanization on visibility
    • 1) treatment: article a deorphanized in month t in wiki w; control: article a remains orphan in wiki w' != w
    • 2) treatment: article a deorphanized in month t in wiki w; control: article a'!=a remains orphan in wiki w
  • next: extracting timeseries and calculating difference in differences to estimate effect of treatment
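
A minimal sketch of the difference-in-differences estimate mentioned above, assuming one pre-treatment and one post-treatment pageview count per article. The function name `diff_in_diff` and all numbers are made up; the actual analysis works on full timeseries and may differ in detail.

```python
from statistics import fmean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    treated_change = fmean(treated_post) - fmean(treated_pre)
    control_change = fmean(control_post) - fmean(control_pre)
    return treated_change - control_change


# Hypothetical monthly pageviews, one value per article.
treated_pre = [100, 40, 10]   # month before de-orphanization
treated_post = [140, 60, 10]  # month after de-orphanization
control_pre = [90, 50, 10]    # matched articles that remained orphans
control_post = [95, 45, 10]

effect = diff_in_diff(treated_pre, treated_post, control_pre, control_post)
```

Subtracting the change in the control group removes trends that affect both groups alike (for example seasonality), under the usual assumption that both groups would otherwise have moved in parallel.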

weekly update:

  • obtained first results for impact of treated articles (de-orphanized) vs control articles (same article in another language that remained orphan)
  • on average: there is a 40% increase in the number of pageviews for treated articles; we don't see an increase for the control articles
  • next: wiki-specific analysis via regression

weekly update:

  • Running regression models to quantify the impact in a statistically more rigorous way (this also makes it easier to include wiki-specific analysis)
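
The model specification is not stated in this task. One plausible form, given here purely as an assumption, is log(1 + pageviews) = b0 + b1·treated + b2·post + b3·treated·post, where b3 is the treatment effect on the log scale. For this saturated two-by-two design the least-squares coefficients coincide with differences of cell means, so the sketch computes them directly instead of calling a regression library; `did_log_coefficients` and the data are hypothetical, and wiki-specific terms are left out.

```python
import math
from statistics import fmean


def did_log_coefficients(rows):
    """rows: iterable of (treated, post, pageviews) with 0/1 flags.

    Returns (b0, b1, b2, b3) of the saturated model on log(1 + pageviews).
    """
    cells = {(t, p): [] for t in (0, 1) for p in (0, 1)}
    for treated, post, pageviews in rows:
        cells[(treated, post)].append(math.log1p(pageviews))
    m = {key: fmean(vals) for key, vals in cells.items()}
    b0 = m[(0, 0)]                                           # control, before
    b1 = m[(1, 0)] - m[(0, 0)]                               # baseline group gap
    b2 = m[(0, 1)] - m[(0, 0)]                               # common time trend
    b3 = (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])   # treatment effect
    return b0, b1, b2, b3


# Hypothetical observations: (treated, post, pageviews).
rows = [
    (1, 0, 99), (1, 1, 119),
    (0, 0, 99), (0, 1, 99),
]

b0, b1, b2, b3 = did_log_coefficients(rows)
relative_effect = math.expm1(b3)  # exp(b3) - 1, relative change in 1 + pageviews
```

Working on the log scale is what lets the coefficient be read as a percentage change; a regression library would additionally give standard errors, which this sketch omits.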

weekly update:

  • completed regression analysis for quantifying the effect of de-orphanization on the number of pageviews for all wikis.
  • results hold for most wikis and are statistically significant: overall, large effect sizes, with increases of 50% (or more) in the number of pageviews following treatment.
  • I consider the work for this task done, but will keep the task open until the results are documented on the respective meta-page