
Quantifying the importance of images in Wikipedia
Open, Needs Triage, Public


  • Qualitative: through a large-scale user study, understand the importance of images in learning through Wikipedia
  • Quantitative: measure readers' interactions with images across Wikipedias

Event Timeline

  • Qualitative: onboarded 2 students from UW who will help implement the user study on the importance of images in learning on the LabintheWild platform.
  • Quantitative: met with the team after WWW deadline. Scoped out ideas for future work.

End of quarter updates:

  • Qualitative: designed an MTurk task to expand the dataset of questions for reading comprehension. Generated a list of potentially "good" articles to be fed to this task: articles that are relatively long, have more than 1 image, and contain sections such as "description" or "characteristics". Annotated the articles with topic and popularity score to further filter out articles that are unlikely to be useful.
  • Quantitative: @Daniram3 started analyzing metrics such as dwell time and session length (in time). Early results show that dwell time is significantly longer for pages with images (or image clicks), and that session time is longer when readers browse pages with images (or when sessions include image clicks), even when controlling for page length and the number of pages visited in a session. This is somewhat related to T265772.
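
The article-selection criteria from the qualitative update above (relatively long, more than 1 image, a "description" or "characteristics" section) could be expressed as a simple filter. This is only a minimal sketch, not the pipeline actually used: the `Article` record, its field names, and the `MIN_WORDS` threshold are hypothetical, since the update does not say how "relatively long" was defined or how articles were represented.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical cutoff: the update only says "relatively long".
MIN_WORDS = 1000
# Section names mentioned in the update.
TARGET_SECTIONS = ("description", "characteristics")


@dataclass
class Article:
    """Hypothetical record holding the per-article features needed here."""
    title: str
    word_count: int
    image_count: int
    section_titles: List[str] = field(default_factory=list)


def is_candidate(article: Article, min_words: int = MIN_WORDS) -> bool:
    """Return True if the article meets all three selection criteria."""
    has_target_section = any(
        target in section.lower()
        for section in article.section_titles
        for target in TARGET_SECTIONS
    )
    return (
        article.word_count >= min_words   # relatively long
        and article.image_count > 1       # more than 1 image
        and has_target_section            # has a target section
    )


def select_candidates(articles: List[Article],
                      min_words: int = MIN_WORDS) -> List[Article]:
    """Keep only the articles that pass is_candidate."""
    return [a for a in articles if is_candidate(a, min_words)]
```

The topic and popularity annotations mentioned in the same update would be a further filtering step on top of this one; they are left out here because the update gives no details on how they were applied.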

Weekly updates:

  • Qualitative: We had a lot of issues when generating the data through MTurk: we were able to gather only ~60 valid questions. We are going to work with students to generate this data instead. We now have a well-defined set of good articles from which questions can be formulated.
  • Quantitative: @Daniram3, @tizianopiccardi and I are re-running the experiments from the rejected WWW paper on January data. I finished computing all features over the 4.5M images on English Wikipedia. We are now targeting ACM Multimedia, on April 3rd, as a potential deadline for resubmission.

Weekly updates:

  • Qualitative: None - spring break at UW
  • Quantitative: We have finished re-running experiments on January data. Most results are consistent with the August 2020 data, but the analysis of readers' engagement with biographies and faces needs more investigation.

Weekly updates:

  • Qualitative: The interface for the LabintheWild experiment is almost ready. We expect to start pilot testing as early as next week.
  • Quantitative: We have finished re-running all experiments on January data. Experiments include: overall CTR estimation on images for English Wikipedia, an analysis of the factors that make some images more interesting than others, an observational study of the role of faces in readers' engagement with images, an observational study of the role of images in readers' engagement with pages, and a large-scale unsupervised learning study to define groups of Wikipedia pages based on how readers interact with visual content.
  • Qualitative: none
  • Quantitative: we have started re-writing the paper for resubmission.
  • Qualitative: finalized the results page for the LabintheWild experiment. Ready to pilot test.
  • Quantitative: re-writing the paper for resubmission to EPJ Data Science
  • Qualitative: fine-tuned the last details of the back-end data collection for the LabintheWild experiment. Ready to pilot test.
  • Quantitative: re-writing the paper for resubmission to EPJ Data Science

Weekly updates:

  • Qualitative: no updates
  • Quantitative: finalizing the paper for resubmission to EPJ Data Science