Maniphest T332057

Find efficient ORES articlequality data source
Closed, Resolved, Public

Assigned To: awight

Authored By: awight, Mar 14 2023, 4:48 PM

Description

Ideally, we can load a dump file containing a snapshot articlequality score for each page. In the worst case, we could request a score for every page individually.

The goal is to compare current ORES scores with scores from 1-2 years later, and to see whether there are correlations (positive or negative) between score shifts and indicators related to our reference feature work.

Perhaps https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/ORES/Historified_scores ? Check the data source to understand how long the data is retained. Do we need to copy a subset of the data in order to preserve it? Is this allowed under our data retention policies? Are we confident that a comparable data source will still exist in 1-2 years? Can we be alerted before the data source is deprecated?

Results:

The Hive data source does not exist and never existed.
Articlequality scores are stored in MySQL in the ores_classification table and could be queried or exported in bulk.
These rows are written by the ORES extension. The data source is definitely "endangered", but it may be safe over our 1-2 year time frame because RecentChanges filtering still depends on this data.
The only way to stay ahead of future changes is to communicate with the WMF Machine Learning team.
There should be no issues with data retention. The table includes data dating back to its inception in 2018, and the contents of each row are purely numeric and internally generated.
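A bulk export from ores_classification could look roughly like the following. This is a minimal, hypothetical sketch rather than a tested production query: it uses an in-memory SQLite database as a stand-in for the MySQL tables, and the column names (oresc_rev, oresc_model, oresc_class, oresc_probability, oresc_is_predicted, and oresm_name, oresm_is_current in a companion ores_model table) reflect our understanding of the ORES extension schema, not anything verified as part of this task. The sample rows are invented purely for illustration.

```python
import sqlite3

# Hypothetical stand-in for the ORES extension tables. Column names reflect
# our understanding of the extension schema; verify against the live wiki DB.
SCHEMA = """
CREATE TABLE ores_model (
    oresm_id INTEGER PRIMARY KEY,
    oresm_name TEXT NOT NULL,
    oresm_version TEXT NOT NULL,
    oresm_is_current INTEGER NOT NULL
);
CREATE TABLE ores_classification (
    oresc_id INTEGER PRIMARY KEY,
    oresc_rev INTEGER NOT NULL,
    oresc_model INTEGER NOT NULL,
    oresc_class INTEGER NOT NULL,
    oresc_probability REAL NOT NULL,
    oresc_is_predicted INTEGER NOT NULL
);
"""

# One row per scored revision: the class predicted by the current
# articlequality model, plus its probability.
EXPORT_QUERY = """
SELECT c.oresc_rev, c.oresc_class, c.oresc_probability
FROM ores_classification AS c
JOIN ores_model AS m ON m.oresm_id = c.oresc_model
WHERE m.oresm_name = 'articlequality'
  AND m.oresm_is_current = 1
  AND c.oresc_is_predicted = 1
ORDER BY c.oresc_rev
"""


def export_predictions(conn: sqlite3.Connection) -> list[tuple[int, int, float]]:
    """Return (revision id, class index, probability) for each predicted row."""
    return conn.execute(EXPORT_QUERY).fetchall()


def demo() -> list[tuple[int, int, float]]:
    """Build a toy database with invented rows and run the export."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO ores_model VALUES (?, ?, ?, ?)",
        [(1, "articlequality", "v1", 1), (2, "damaging", "v1", 1)],
    )
    conn.executemany(
        "INSERT INTO ores_classification VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 101, 1, 2, 0.61, 1),  # predicted class for rev 101: kept
            (2, 101, 1, 1, 0.25, 0),  # non-predicted class: filtered out
            (3, 102, 1, 4, 0.70, 1),  # predicted class for rev 102: kept
            (4, 101, 2, 0, 0.95, 1),  # different model: filtered out
        ],
    )
    rows = export_predictions(conn)
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())  # [(101, 2, 0.61), (102, 4, 0.7)]
```

The SELECT itself uses only standard SQL, so it should carry over to the real MySQL replicas with little or no change. Note that oresc_class is an integer index; mapping it to quality labels depends on the wiki's articlequality model and is not covered by this sketch. Taking two such exports a year or more apart and joining them on page would give the score shifts the task description asks about.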

Related Objects

Status	Assigned	Task
Resolved	None	T345411 Scraper: destroy Cloud VPS runner instance
Resolved	None	T341751 Publish dump scraper reports
Resolved	None	T335411 Scraper: produce spreadsheet of scraped statistics for comparing wikis
Resolved	awight	T332032 Create baseline statistics for reference usage (2023)
Resolved	awight	T332057 Find efficient ORES articlequality data source

Event Timeline

awight created this task.Mar 14 2023, 4:48 PM

lilients_WMDE moved this task from Incoming to In progress on the WMDE-References-FocusArea board.Mar 15 2023, 2:15 PM

Sadly, I created the page about ORES in Hive many years ago, and as far as I can tell it has always been wrong. There is no such data store at the moment.

awight closed this task as Resolved.Mar 22 2023, 11:26 AM

awight claimed this task.

awight updated the task description.

awight moved this task from Sprint Backlog to Done on the WMDE-TechWish-Sprint-2023-03-14 board.

awight moved this task from In progress to Done on the WMDE-References-FocusArea board.Oct 23 2024, 7:07 AM
