The Growth team is planning to expand "add an image" to 10 new Wikipedias, as per T360059: Communication around scaling "add an image" to 10 more Wikipedias.
We already generate suggestions for those wikis:
```
isu = spark.read.table('analytics_platform_eng.image_suggestions_suggestions').where(
    'snapshot="2024-04-01" and wiki in ("ckbwiki", "frrwiki", "hywiki", "jvwiki", "kuwiki", "newiki", "pawiki", "simplewiki", "sqwiki", "skwiki")'
)
alis = isu.where('section_index is null')
slis = isu.where('section_index is not null')
alis.groupBy('wiki').count().show(truncate=False)
slis.groupBy('wiki').count().show(truncate=False)
```

Article-level suggestions:

```
+----------+-------+
|wiki      |count  |
+----------+-------+
|hywiki    |708904 |
|simplewiki|3124112|
|sqwiki    |4162703|
|frrwiki   |23313  |
|ckbwiki   |661384 |
|jvwiki    |1946997|
|newiki    |2731371|
|kuwiki    |532225 |
|pawiki    |2850709|
|skwiki    |5737200|
+----------+-------+
```

Section-level suggestions:

```
+----------+-----+
|wiki      |count|
+----------+-----+
|hywiki    |31074|
|simplewiki|34753|
|sqwiki    |12049|
|frrwiki   |281  |
|ckbwiki   |525  |
|jvwiki    |7144 |
|newiki    |3186 |
|kuwiki    |2399 |
|pawiki    |4732 |
|skwiki    |42531|
+----------+-----+
```
Tasks
- manually check a random sample of suggestions and assess general data quality
- run detect_html_tables.py against the target wikis
- eventually run check_bad_parsing.py against relevant wikis
- put the scripts' outputs on HDFS under analytics_platform_eng
- verify the production run