Understanding curious and critical readers (Q2)
Closed, Resolved, Public

Assigned To

Authored By

	MGerlach
	Oct 11 2021, 7:34 PM

Description

We have sketched a first research proposal for this project in order to i) scope the problem and ii) reach out to potential collaborators (T288341).

In this quarter we want to

put project on meta
formalize collaboration
announce research project
~~onboard collaborators~~ (moved to next quarter)
generate datasets for first exploratory analysis

Related Objects

		Status	Subtype	Assigned	Task
		Open		MGerlach	T293036 [EPIC] Understanding curious and critical readers
		Resolved		MGerlach	T293037 Understanding curious and critical readers (Q2)

Event Timeline

MGerlach created this task. Oct 11 2021, 7:34 PM

Restricted Application added a subscriber: Aklapper. Oct 11 2021, 7:34 PM

Update week 2021-10-11:

created meta-page for project: https://meta.wikimedia.org/wiki/Research:Understanding_Curious_and_Critical_Readers

Update week 2021-10-25:

ongoing discussions to formalize collaboration with potential collaborators
starting to explore how to generate relevant datasets

Update week 2021-11-04:

following up with formalizing potential collaboration
generated datasets for knowledge networks that capture curiosity, based on reading sessions built from app requests
- spent some time pre-processing this data, since app requests are treated differently from desktop/mobile web requests. for example, namespace_id and page_id are not set; according to data engineering this is expected and not a bug. in any case, this requires extra steps such as resolving redirects in the pageview requests
- exploratory analysis of sessions from 1 month in enwiki yields roughly 2M reading sessions from distinct unique ids; these reading sessions are much longer (mean=28 pageviews) than those for desktop/mobile web, since they don't rely on approximate fingerprinting
generated datasets for critical reading sessions. as a first step, I identified sessions in which readers view the version-history of an article
- calls to the version-history are identified via action=history in the query_uri field
- I wanted to focus on readers, so some preprocessing was needed to filter out editors
- in a first small sample of data, roughly 1 out of 500 pageviews leads to a view of the version-history. while this seems low at first, it is of the same order of magnitude as the rate at which readers engage with citations (see https://arxiv.org/abs/2001.08614). this requires further validation, but it looks promising for capturing another important way in which readers engage critically with the content.
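
A minimal sketch of the action=history check and the resulting rate, written as plain Python rather than the actual query against the webrequest logs. The function names are hypothetical, and the sketch assumes query_uri holds the raw query string with an optional leading "?".

```python
from urllib.parse import parse_qs


def is_history_request(query_uri: str) -> bool:
    """Return True if the query string carries action=history."""
    # Assumption: query_uri looks like "?title=Marie_Curie&action=history"
    params = parse_qs(query_uri.lstrip("?"))
    return "history" in params.get("action", [])


def history_rate(query_uris) -> float:
    """Fraction of requests that are version-history views."""
    uris = list(query_uris)
    if not uris:
        return 0.0
    hits = sum(is_history_request(u) for u in uris)
    return hits / len(uris)
```

Parsing the query string instead of searching for the substring "action=history" avoids false matches on other parameters whose values happen to contain that text.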

Update week 2021-11-15:

spent some time figuring out how version-history views are captured in the webrequest logs.
for desktop, the signature has the form https://en.wikipedia.org/w/index.php?title=Marie_Curie&action=history
- these requests are captured as normal pageviews (is_pageview=True), i.e. we keep track of the page_title and page_id of the content page
- the corresponding parameter action=history is the same across different wikis
for mobile (web), calls are handled differently via https://en.m.wikipedia.org/wiki/Special:History/Marie_Curie
- one of the main limitations is that this signature differs across wikis, e.g. in German the call to the version-history for the same article is https://de.m.wikipedia.org/wiki/Spezial:Versionsgeschichte/Marie_Curie. this will make it much more cumbersome to capture these events systematically in different languages.
- however, the number of calls to version-history from mobile is much smaller (less than 10% of what we see on desktop), so as a first approximation we might focus on the signature from desktop.

with this, we can systematically capture the extent to which readers visit the version-history and, more importantly, whether there are specific articles that lead to more/fewer visits to the version-history.
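
The two signatures described above could be recognized roughly as follows. This is only a sketch: parse_history_url and MOBILE_HISTORY_PREFIXES are hypothetical names, the mapping lists only the two special-page names mentioned in this update, and the sketch assumes the index.php form comes from desktop.

```python
from urllib.parse import parse_qs, unquote, urlsplit

# Localized special-page prefixes for mobile web.
# Only the two examples from the notes are listed; other wikis would need their own entry.
MOBILE_HISTORY_PREFIXES = {
    "en": "Special:History/",
    "de": "Spezial:Versionsgeschichte/",
}


def parse_history_url(url):
    """Return (platform, wiki, title) for a version-history URL, else None."""
    parts = urlsplit(url)
    host = parts.netloc.split(".")
    wiki = host[0]

    # Desktop signature: /w/index.php?title=...&action=history
    if parts.path == "/w/index.php":
        params = parse_qs(parts.query)
        if "history" in params.get("action", []) and params.get("title"):
            return ("desktop", wiki, params["title"][0])
        return None

    # Mobile web signature: /wiki/<localized special page>/<title>
    if len(host) > 1 and host[1] == "m" and parts.path.startswith("/wiki/"):
        prefix = MOBILE_HISTORY_PREFIXES.get(wiki)
        page = unquote(parts.path[len("/wiki/"):])
        if prefix and page.startswith(prefix):
            return ("mobile", wiki, page[len(prefix):])

    return None
```

The desktop branch needs no per-wiki configuration, which is the reason given above for focusing on it first; the mobile branch only works for wikis with an entry in the mapping.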

Update week 2021-11-22:

finished the pipeline to get all version-history requests in enwiki for a full day, applying the following filters/processing steps:
- removing editors
- removing repeated requests for the version-history of the same article by the same reader
- removing requests to the version-history of articles outside the main namespace (this is challenging since page title and namespace are not properly recorded for these events in the webrequest table, so this required a lot of manual processing)
next step: how common is requesting the version-history? are there article properties (quality, reliability, degree of controversy, popularity) that lead to more/fewer requests for the version-history?
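
The three filters above can be sketched on in-memory records as shown below. This is an illustration, not the actual pipeline: the record keys (reader_id, page_title, namespace_id, is_editor) are hypothetical, and the sketch assumes the namespace has already been resolved for each request, which is the hard part noted above.

```python
MAIN_NAMESPACE = 0  # the article namespace in MediaWiki


def filter_history_requests(requests):
    """Keep one version-history request per (reader, article) for non-editors in the main namespace.

    Each request is assumed to be a dict with the hypothetical keys
    'reader_id', 'page_title', 'namespace_id', and 'is_editor'.
    """
    seen = set()
    kept = []
    for req in requests:
        # Filter 1: remove editors
        if req["is_editor"]:
            continue
        # Filter 3: remove articles outside the main namespace
        if req["namespace_id"] != MAIN_NAMESPACE:
            continue
        # Filter 2: remove repeated requests by the same reader for the same article
        key = (req["reader_id"], req["page_title"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(req)
    return kept
```

The first request of each reader for each article is the one that is kept, so the output preserves the input order.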

Miriam subscribed. Jan 10 2022, 4:27 PM

Update week 2022-01-10:

followed up with potential collaborators about signing an MOU/NDA to formalize the collaboration. received a positive response and am waiting for the necessary information to start the process with Legal.

MGerlach updated the task description. Jan 21 2022, 4:42 PM

Update week 2022-01-17:

preparing the MOUs/NDAs for formalizing the collaboration
generated a first dataset for exploratory analysis around critical readers. the dataset captures all interactions of readers with the version-history as well as the article talk page of each article in enwiki over one month. we distinguish whether the reader was logged in or not and whether they attempted to edit a page or not, in order to understand how users who only read the article make use of these features. in addition, I captured several article features to understand whether readers access version-history and talk pages in specific contexts: i) popularity of pages (total pageviews), ii) topic (language-agnostic topic model), iii) quality (language-agnostic quality prediction), iv) reliability (templates from wiki-reliability). the analysis will follow in the next tasks.

MGerlach updated the task description. Jan 21 2022, 4:55 PM

MGerlach mentioned this in T299786: Curiosity: exploratory analysis and onboarding collaborators (Q3). Jan 21 2022, 5:43 PM

leila awarded a token. Jan 21 2022, 5:45 PM

updated the meta and formal-collaborations pages.
announced on wiki-research-l: https://lists.wikimedia.org/hyperkitty/list/wiki-research-l@lists.wikimedia.org/thread/Y27NUTRKPUWHCD7XDDWIWVU3KNLXBSTR/
