
Internship: Understanding Readers' engagement with Wikipedia through Traffic Logs
Closed, Resolved (Public)


During @Daniram3's 12-week internship, we are going to work on understanding how readers from all around the world engage with images in Wikipedia editions in different languages. Meta page coming soon.

Event Timeline

Updates from the last two weeks:

  • Internship started in the first week of May.
  • Scoping down the project for the internship period. Focus on specific topics (education topic) and 1 or 2 research questions, leave others for later:
    • How are people engaging with images?
    • How does this change across different countries/different segments of countries having different development index levels?
  • Focus on 13 Wikipedia language editions (most visited or spoken): Chinese, English, Hindi, Spanish, Russian, Arabic, Portuguese, Bengali, French, Indonesian, Japanese, German, Polish. Retrieved the topic classification for all articles on Wikipedias from here (using Wikidata topic models), as well as the presence of images in articles for these editions.
  • Analysis of image percentage distributions by language, page length, page age, and page topic across selected Wikipedia editions.
  • Worked on exploring the data and on understanding which external data and classifiers we need to complete the project (country characteristics, image classifiers).
  • Set up access to production servers.

Updates from the last two weeks:

  • Started exploring image view and page view statistics at page level from production servers. Computed average image click-through rate (imageviews/pageviews) per page.
  • Found a couple of issues:
    • image view and page view tables cannot be joined at page level on a unique key, such as page_id. Solved by joining on page_title/uri_path
    • there is more than one image per article, so we need to normalize somehow by number of images
  • First results show that, for English Wikipedia, the image click-through rate sits at around 10% on average across countries.
  • Planned to extend this analysis to the other Wikipedia language editions and to a longer period of time (around two weeks).
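
For anyone following along, here is a rough sketch of the per-page click-through computation described above: join on a normalized title instead of page_id, then divide imageviews by pageviews, and optionally by the number of images in the article. This is not the actual production query. The function names, the `/wiki/` prefix handling, the per-image normalization and the sample numbers are all assumptions made up for illustration.

```python
from collections import defaultdict


def normalize_title(title):
    """Map a page_title or uri_path to a common join key (assumed convention)."""
    # Assumption: page paths look like "/wiki/Some_Title" and titles may use spaces.
    if title.startswith("/wiki/"):
        title = title[len("/wiki/"):]
    return title.replace(" ", "_")


def click_through_rates(pageviews, imageviews, image_counts):
    """Return {title: (ctr, ctr_per_image)} for pages present in all inputs.

    pageviews    -- iterable of (page_title, view_count)
    imageviews   -- iterable of (uri_path, image_view_count)
    image_counts -- dict mapping normalized title to number of images on the page
    """
    # Aggregate both sides on the normalized title, since no shared page_id exists.
    page_totals = defaultdict(int)
    for title, views in pageviews:
        page_totals[normalize_title(title)] += views

    image_totals = defaultdict(int)
    for path, views in imageviews:
        image_totals[normalize_title(path)] += views

    rates = {}
    for title, views in page_totals.items():
        n_images = image_counts.get(title, 0)
        if views == 0 or n_images == 0:
            continue  # skip pages without views or without images
        ctr = image_totals.get(title, 0) / views
        # One possible normalization: average click-through rate per image.
        rates[title] = (ctr, ctr / n_images)
    return rates


# Made-up sample data, only to show the shape of the inputs.
sample_pageviews = [("Solar System", 200), ("Moon", 100)]
sample_imageviews = [("/wiki/Solar_System", 20), ("/wiki/Moon", 5)]
sample_image_counts = {"Solar_System": 4, "Moon": 1}

sample_rates = click_through_rates(
    sample_pageviews, sample_imageviews, sample_image_counts
)
```

Dividing by the image count is only one way to handle articles with several images; which normalization is appropriate is still an open question in the update above.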

@Daniram3 I moved this task to this quarter's column/lane. No extra action needed on your end. For your information only. :) (Thanks for your weekly updates. I find them really helpful.:)

I'll resolve this task as it's done. If you disagree, please reopen.