Cparle (Cormac Parle)
Staff Software Engineer, Structured Data team, Wikimedia Foundation

User Details

User Since
Aug 21 2017, 4:16 PM (291 w, 7 h)
Availability
Available
IRC Nick
cormacparle
LDAP User
Cparle
MediaWiki User
CParle (WMF)

Recent Activity

Thu, Mar 16

Cparle moved T330667: [M] Make sure DAGs are run in the correct order from Ready for Development to Doing on the Structured-Data-Backlog (Current Work) board.
Thu, Mar 16, 3:25 PM · Structured-Data-Backlog (Current Work), Data Pipelines, Section-Level-Image-Suggestions, Epic
Cparle claimed T330667: [M] Make sure DAGs are run in the correct order.
Thu, Mar 16, 3:24 PM · Structured-Data-Backlog (Current Work), Data Pipelines, Section-Level-Image-Suggestions, Epic
Cparle added a comment to T328670: Add section title column to image_suggestions.suggestions table schema.

We're adding a single (un-indexed) attribute named section_heading of type text, correct?

Correct - section_heading is the only thing clients need, so we're keeping it as simple as possible for now

Thu, Mar 16, 10:27 AM · Cassandra, Section-Level-Image-Suggestions, Structured-Data-Backlog

Tue, Mar 14

Cparle added a comment to T328670: Add section title column to image_suggestions.suggestions table schema.

Good catch @xcollazo - I edited the task description

Tue, Mar 14, 4:13 PM · Cassandra, Section-Level-Image-Suggestions, Structured-Data-Backlog
Cparle updated the task description for T328670: Add section title column to image_suggestions.suggestions table schema.
Tue, Mar 14, 4:12 PM · Cassandra, Section-Level-Image-Suggestions, Structured-Data-Backlog
Cparle closed T323505: [L] Exclude sections-tables from having section topics as Resolved.

Merged, closing. Follow-up work in T330848

Tue, Mar 14, 12:22 PM · Section-Topics, Structured-Data-Backlog (Current Work)
Cparle closed T323505: [L] Exclude sections-tables from having section topics, a subtask of T311745: [EPIC] Section topics data pipeline, as Resolved.
Tue, Mar 14, 12:22 PM · Data Pipelines, Research-Backlog, Structured-Data-Backlog (Current Work), Section-Topics, Epic

Mon, Mar 13

Cparle updated subscribers of T311825: [M] Create the section-level image suggestions Airflow DAG.

I propose that we just add this to the image-suggestions DAG rather than it having its own DAG. @mfossati @xcollazo @matthiasmullie what do you think?

Mon, Mar 13, 4:32 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
Cparle updated subscribers of T328670: Add section title column to image_suggestions.suggestions table schema.

OK, @mfossati and I agree that this is no longer blocked.

Mon, Mar 13, 2:39 PM · Cassandra, Section-Level-Image-Suggestions, Structured-Data-Backlog

Fri, Mar 10

Cparle closed T322669: [L] Do not recommend images that are already on the page, a subtask of T311814: [EPIC] Section-level image suggestions data pipeline, as Resolved.
Fri, Mar 10, 4:19 PM · Structured-Data-Backlog (Current Work), Data Pipelines, Section-Level-Image-Suggestions, Research-Backlog, Epic
Cparle closed T322669: [L] Do not recommend images that are already on the page as Resolved.
Fri, Mar 10, 4:19 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions

Mon, Mar 6

Cparle updated the task description for T331048: Filter out icons from section-level image suggestions.
Mon, Mar 6, 5:20 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions

Fri, Mar 3

Cparle moved T322669: [L] Do not recommend images that are already on the page from Doing to Code Review on the Structured-Data-Backlog (Current Work) board.
Fri, Mar 3, 5:46 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
Cparle claimed T322669: [L] Do not recommend images that are already on the page.
Fri, Mar 3, 12:15 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
Cparle moved T322669: [L] Do not recommend images that are already on the page from Ready for Development to Doing on the Structured-Data-Backlog (Current Work) board.
Fri, Mar 3, 12:15 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions

Feb 10 2023

Cparle updated subscribers of T328672: [M] Populate Hive tables that will feed Cassandra.

Update on the suggestions part: I have altered the image_suggestions_suggestions table in my own Hive db (the HQL query to do so is here) to hold section_heading, but I'm getting an error writing the altered table when I run the pipeline script on stat1007.

Feb 10 2023, 12:32 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions

Feb 8 2023

Cparle moved T322668: [S] Exclude certain articles from having section-level image suggestions from Doing to Code Review on the Structured-Data-Backlog (Current Work) board.
Feb 8 2023, 5:14 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
Cparle moved T311730: [L] Exclude certain sections from having generated image suggestions from Doing to Code Review on the Structured-Data-Backlog (Current Work) board.
Feb 8 2023, 5:13 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
Cparle updated the task description for T311730: [L] Exclude certain sections from having generated image suggestions.
Feb 8 2023, 5:08 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
Cparle created T329202: Exclude empty sections from section-image-suggestions.
Feb 8 2023, 5:05 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
Cparle closed T329197: Exclude sections that are lists from section-image-suggestions as Invalid.

Already covered by T323505 (and done!)

Feb 8 2023, 4:53 PM · Section-Level-Image-Suggestions
Cparle updated the task description for T329197: Exclude sections that are lists from section-image-suggestions.
Feb 8 2023, 4:47 PM · Section-Level-Image-Suggestions
Cparle created T329197: Exclude sections that are lists from section-image-suggestions.
Feb 8 2023, 4:47 PM · Section-Level-Image-Suggestions
Cparle added a comment to T326215: Exclude section that are lists from having image suggestions.

Reopening as we want to split up T311814

Feb 8 2023, 4:44 PM · Section-Level-Image-Suggestions, Structured-Data-Backlog
Cparle reopened T326215: Exclude section that are lists from having image suggestions, a subtask of T311814: [EPIC] Section-level image suggestions data pipeline, as Open.
Feb 8 2023, 4:44 PM · Structured-Data-Backlog (Current Work), Data Pipelines, Section-Level-Image-Suggestions, Research-Backlog, Epic
Cparle reopened T326215: Exclude section that are lists from having image suggestions as "Open".
Feb 8 2023, 4:44 PM · Section-Level-Image-Suggestions, Structured-Data-Backlog
Cparle added a comment to T322668: [S] Exclude certain articles from having section-level image suggestions.

Already implemented as part of T311829 (in progress)

Feb 8 2023, 3:57 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
Cparle moved T322668: [S] Exclude certain articles from having section-level image suggestions from Ready for Development to Doing on the Structured-Data-Backlog (Current Work) board.
Feb 8 2023, 3:55 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
Cparle moved T311730: [L] Exclude certain sections from having generated image suggestions from Ready for Development to Doing on the Structured-Data-Backlog (Current Work) board.
Feb 8 2023, 3:55 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
Cparle created T329191: Use existing section images to evaluate section-level image suggestions.
Feb 8 2023, 3:48 PM · Structured-Data-Backlog, Section-Level-Image-Suggestions
Cparle added a comment to T327933: We prioritise article level over section level image suggestions for unillustrated articles in notifications for ECs.

As we have no API, we (SD) have no control over how the data is read, only how it is written

Feb 8 2023, 2:48 PM · Structured-Data-Backlog, Section-Level-Image-Suggestions

Feb 7 2023

Cparle added a comment to T316149: [L] Create tool for manual evaluation of section-level image suggestions.

Rating summary by wiki on Tues Feb 7

Feb 7 2023, 4:42 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
Cparle added a comment to T316151: Manually evaluate section-level image suggestions.

@Ankan_WMF there should be more suggestions available for bnwiki now

Feb 7 2023, 11:15 AM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions

Feb 3 2023

Cparle created T328789: [M] Section image suggestions data pipeline monitoring.
Feb 3 2023, 4:42 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions

Jan 30 2023

Cparle added a comment to T320831: Section Level Image Suggestions - Data Persistence Request.

Did we decide definitively which fields need to be added to the data model? If not then we ought to asap ...

Jan 30 2023, 10:33 AM · Data-Engineering-Planning, Section-Level-Image-Suggestions, Cassandra, Image-Suggestions

Jan 27 2023

Cparle added a comment to T314149: Configure beta wikis and local development environments to use new image suggestions API.

The existing production workflow is probably not easily adapted to beta cluster (cc @Cparle -- does that sound correct?)

Jan 27 2023, 1:58 PM · Growth-Team, Image-Suggestion-API, GrowthExperiments

Jan 19 2023

Cparle added a comment to T318722: [L] Experiment with adding section information to schema.org metadata.

Got it, thanks. Can we trigger the explicit indexing today? I'm happy to take on running searches regularly to check.

Jan 19 2023, 2:41 PM · Structured-Data-Backlog (Current Work), Section-Topics

Jan 16 2023

Cparle added a comment to T326884: Update Structured Data Team-owned products that may be affected by IP Masking.

Let's begin by checking if this materially affects us, and if so create separate tickets for updating of our products that are affected

Jan 16 2023, 5:39 PM · Structured-Data-Backlog, IP Masking
Cparle added a comment to T321785: Exclude articles with infoboxes from image-suggestion-notification-for-experienced-users.

We're not inside MW though - we need to be able to do this from a Python script running on an Airflow machine

Jan 16 2023, 11:00 AM · Structured-Data-Backlog, Image-Suggestions

Jan 13 2023

Cparle added a comment to T321785: Exclude articles with infoboxes from image-suggestion-notification-for-experienced-users.

We can figure out which templates are infoboxes for a particular wiki by extracting the must_not queries on template.keyword in the response to https://<wiki>.wikipedia.org/w/index.php?title=Special:Search&cirrusDumpQuery=&ns0=1&search=hasrecommendation%3Aimage+-hastemplatecollection%3Ainfobox

Jan 13 2023, 3:08 PM · Structured-Data-Backlog, Image-Suggestions
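
A minimal sketch of the extraction described in the T321785 comment above, assuming the cirrusDumpQuery response has already been fetched and parsed into a Python dict. The exact layout of that response is not given in the comment, so the walker below makes no assumption about nesting depth: it looks for any "must_not" key and collects values stored under template.keyword inside it. The function names and the sample dict are hypothetical and purely illustrative.

```python
from typing import Any, Iterator, List


def iter_must_not_clauses(node: Any) -> Iterator[Any]:
    """Yield every clause stored under a "must_not" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "must_not":
                yield from (value if isinstance(value, list) else [value])
            yield from iter_must_not_clauses(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_must_not_clauses(item)


def _values(value: Any) -> List[str]:
    """Normalise the common query shapes: "x", ["x", "y"], {"query": "x"}, {"value": "x"}."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, list):
        return [v for v in value if isinstance(v, str)]
    if isinstance(value, dict):
        inner = value.get("query", value.get("value"))
        return [inner] if isinstance(inner, str) else []
    return []


def find_field_values(node: Any, field: str) -> List[str]:
    """Collect the values queried against `field` anywhere inside `node`."""
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == field:
                found.extend(_values(value))
            else:
                found.extend(find_field_values(value, field))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_field_values(item, field))
    return found


def infobox_templates(dump: Any, field: str = "template.keyword") -> List[str]:
    """Return the sorted, de-duplicated templates excluded via must_not clauses."""
    templates = set()
    for clause in iter_must_not_clauses(dump):
        templates.update(find_field_values(clause, field))
    return sorted(templates)


# Hypothetical sample, NOT a real response: it only mimics the general
# bool/must_not shape of an Elasticsearch query.
sample = {
    "query": {
        "bool": {
            "must": [{"match": {"template.keyword": {"query": "Template:Stub"}}}],
            "must_not": [
                {"bool": {"should": [
                    {"match": {"template.keyword": {"query": "Template:Infobox person"}}},
                    {"match": {"template.keyword": {"query": "Template:Infobox settlement"}}},
                ]}}
            ],
        }
    }
}

templates = infobox_templates(sample)
```

Only values under must_not are collected, so a template that appears in a positive clause (like the made-up "Template:Stub" above) is ignored.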
Cparle added a comment to T321779: Growth tasks API: provide a way to retrieve ElasticSearch query for TaskSetFilters.

No, I don't think so. Our problem is we don't know which templates are in the template collection ... actually though I see now (looking at searchDebugUrls) that we can figure that out for any wiki by picking out the must_not bits from a query like https://cs.wikipedia.org/w/index.php?title=Speci%C3%A1ln%C3%AD:Hled%C3%A1n%C3%AD&cirrusDumpQuery=&ns0=1&search=hasrecommendation%3Aimage+-hastemplatecollection%3Ainfobox

Jan 13 2023, 3:06 PM · GrowthExperiments, Growth-Team

Jan 12 2023

Cparle added a comment to T315976: [L] Build experimental dataset.

Added numbers for intersections to the table above (https://phabricator.wikimedia.org/T315976#8456730) so I think this can be closed now, @mfossati?

Jan 12 2023, 6:12 PM · Section-Level-Image-Suggestions, Structured-Data-Backlog (Current Work), Research-Backlog

Jan 10 2023

Cparle claimed T311829: [XL] Combine suggestions based on section topics with section alignment ones and convert notebook code into idiomatic data pipeline code.
Jan 10 2023, 2:30 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
Cparle moved T311829: [XL] Combine suggestions based on section topics with section alignment ones and convert notebook code into idiomatic data pipeline code from Ready for Development to Doing on the Structured-Data-Backlog (Current Work) board.
Jan 10 2023, 2:29 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions

Jan 9 2023

Cparle added a comment to T323614: [M] Reduce image_suggestion HDFS files footprint.

Should this be in "code review" instead of blocked?

Jan 9 2023, 5:21 PM · Data Pipelines, Structured-Data-Backlog (Current Work), Image-Suggestions

Dec 20 2022

Cparle added a comment to T323614: [M] Reduce image_suggestion HDFS files footprint.

No - I'm probably being over-cautious, we've never needed to go back and regenerate old data so far, and I can't think why we'd need to. Doing what everyone else is doing is fine with me

Dec 20 2022, 4:05 PM · Data Pipelines, Structured-Data-Backlog (Current Work), Image-Suggestions
Cparle updated the task description for T325629: Ignore main pages when gathering lead image data for image-suggestions.
Dec 20 2022, 1:44 PM · Structured-Data-Backlog, Section-Level-Image-Suggestions, Image-Suggestions
Cparle renamed T325629: Ignore main pages when gathering lead image data for image-suggestions from Consider ignoring frequently-modified pages when gathering lead image data for image-suggestions to Ignore main pages when gathering lead image data for image-suggestions.
Dec 20 2022, 1:43 PM · Structured-Data-Backlog, Section-Level-Image-Suggestions, Image-Suggestions
Cparle closed T317138: [XL] Bug in lead image data in image-suggestions data pipeline as Resolved.

Ok, so I don't think there's a bug after all

Dec 20 2022, 1:22 PM · Structured-Data-Backlog (Current Work), Image-Suggestions
Cparle created T325629: Ignore main pages when gathering lead image data for image-suggestions.
Dec 20 2022, 1:22 PM · Structured-Data-Backlog, Section-Level-Image-Suggestions, Image-Suggestions
Cparle claimed T317138: [XL] Bug in lead image data in image-suggestions data pipeline.
Dec 20 2022, 11:30 AM · Structured-Data-Backlog (Current Work), Image-Suggestions
Cparle moved T317138: [XL] Bug in lead image data in image-suggestions data pipeline from Ready for Development to Doing on the Structured-Data-Backlog (Current Work) board.
Dec 20 2022, 11:30 AM · Structured-Data-Backlog (Current Work), Image-Suggestions
Cparle added a comment to T323614: [M] Reduce image_suggestion HDFS files footprint.

It's probably worth keeping some of the data, just in case. The last 4 snapshots, perhaps? And maybe the first one from each month for the last 6 months - so a total of 10. If there's a need for other old data we can always regenerate it from the source data

Dec 20 2022, 10:04 AM · Data Pipelines, Structured-Data-Backlog (Current Work), Image-Suggestions
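
The retention rule floated in the T323614 comment above (keep the last 4 snapshots, plus the first snapshot of each month for the last 6 months) could be sketched roughly as follows. This is an illustration only: the function name, the use of `datetime.date` values for snapshots, and the weekly cadence in the example are assumptions, and "last 6 months" is read here as the six calendar months ending with the current one.

```python
from datetime import date, timedelta
from typing import Iterable, List


def snapshots_to_keep(
    snapshots: Iterable[date],
    today: date,
    recent: int = 4,
    months: int = 6,
) -> List[date]:
    """Return the snapshots to retain, oldest first (hypothetical helper)."""
    ordered = sorted(set(snapshots))

    # Rule 1: always keep the `recent` newest snapshots.
    keep = set(ordered[-recent:]) if recent > 0 else set()

    # Rule 2: keep the first snapshot of each of the last `months` calendar
    # months, counting the current month as one of them.
    newest_month = today.year * 12 + (today.month - 1)
    oldest_month = newest_month - months + 1
    seen_months = set()
    for snap in ordered:
        month = snap.year * 12 + (snap.month - 1)
        if oldest_month <= month <= newest_month and month not in seen_months:
            seen_months.add(month)
            keep.add(snap)

    return sorted(keep)


# Example with made-up weekly snapshots from 2022-06-06 to 2022-12-19.
weekly = [date(2022, 6, 6) + timedelta(weeks=i) for i in range(29)]
kept = snapshots_to_keep(weekly, today=date(2022, 12, 20))
```

Note that the two rules can overlap: when the first snapshot of the current month is also one of the four newest, fewer than 10 snapshots are kept.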

Dec 19 2022

Cparle added a comment to T316149: [L] Create tool for manual evaluation of section-level image suggestions.

Code for the tool here https://gitlab.wikimedia.org/toolforge-repos/section-image-suggestions-test

Dec 19 2022, 5:21 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
Cparle moved T316149: [L] Create tool for manual evaluation of section-level image suggestions from Doing to Code Review on the Structured-Data-Backlog (Current Work) board.
Dec 19 2022, 5:19 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions

Dec 14 2022

Cparle added a subtask for T311829: [XL] Combine suggestions based on section topics with section alignment ones and convert notebook code into idiomatic data pipeline code: T322668: [S] Exclude certain articles from having section-level image suggestions.
Dec 14 2022, 5:52 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
Cparle added a parent task for T322668: [S] Exclude certain articles from having section-level image suggestions: T311829: [XL] Combine suggestions based on section topics with section alignment ones and convert notebook code into idiomatic data pipeline code.
Dec 14 2022, 5:52 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
Cparle added a comment to T311829: [XL] Combine suggestions based on section topics with section alignment ones and convert notebook code into idiomatic data pipeline code.

Note that the above notebooks have been combined in https://gitlab.wikimedia.org/cparle/notebooks/-/blob/main/section_image_suggestions_data.ipynb ... still needs to be productionized though

Dec 14 2022, 5:47 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
Cparle added a comment to T322669: [L] Do not recommend images that are already on the page.

FYI research's code for section-alignment already generates a parquet with sections-with-images, so we can probably use this as an input

Dec 14 2022, 5:44 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
Cparle added a comment to T315976: [L] Build experimental dataset.

So ... can we count this as done?

Dec 14 2022, 4:21 PM · Section-Level-Image-Suggestions, Structured-Data-Backlog (Current Work), Research-Backlog
Cparle added a comment to T311730: [L] Exclude certain sections from having generated image suggestions.

Discussed with @AUgolnikova-WMF and we agreed that what we have already is adequate for this stage of the project, and we can revisit the community config after the MVP stage

Dec 14 2022, 3:07 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
Cparle added a comment to T323505: [L] Exclude sections-tables from having section topics.

Can we add lists to this too? Some sections are entirely enclosed with <ul></ul> tags

Dec 14 2022, 11:33 AM · Section-Topics, Structured-Data-Backlog (Current Work)
Cparle added a comment to T324588: Schedule the section-image-recommendations based on section-alignment code.

Yeah fair point, maybe we should bring @MunizaA 's code into our repo and call it from our DAG

Dec 14 2022, 10:52 AM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions

Dec 13 2022

Cparle added a comment to T324138: "Add an image" for established editors.

In October we sent ~18.5k notifications and 264 images were added as a result, so the work/impact ratio for notifications-for-experienced-users has so far been rather low. Are we sure about expending more effort on something that has made so little impact so far?

Dec 13 2022, 12:05 PM · Structured-Data-Backlog, Growth-Structured-Tasks, Image-Suggestions, Growth-Team
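
For scale, the figures quoted in the T324138 comment above work out as below. The notification count is given only as "~18.5k", so 18,500 is an approximation and the results are rough.

```python
# Figures from the comment; the notification count is approximate ("~18.5k").
notifications_sent = 18_500
images_added = 264

# Share of notifications that led to an added image.
conversion_rate = images_added / notifications_sent

# How many notifications were sent per image added.
notifications_per_image = notifications_sent / images_added

print(f"{conversion_rate:.2%}")           # prints 1.43%
print(f"{notifications_per_image:.0f}")   # prints 70
```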

Dec 12 2022

Cparle added a comment to T311818: Convert section-level image suggestions notebook code into idiomatic data pipeline code.

Notebook on which this might be based can be found here https://gitlab.wikimedia.org/cparle/notebooks/-/blob/main/section_image_suggestions_data.ipynb

Dec 12 2022, 4:50 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions

Dec 9 2022

Cparle claimed T316149: [L] Create tool for manual evaluation of section-level image suggestions.

Here's a sample of the data generated by T315976 (approx 2000 suggestions per wiki, 1000 generated via section topics and 1000 via section alignment)

Dec 9 2022, 6:10 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
Cparle moved T316149: [L] Create tool for manual evaluation of section-level image suggestions from Ready for Development to Doing on the Structured-Data-Backlog (Current Work) board.
Dec 9 2022, 6:08 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
Cparle reassigned T317364: [M] Stop unbounded image suggestions dataset growth and clean up legacy results from Cparle to Eevans.
Dec 9 2022, 4:28 PM · Growth-Team (Current Sprint), Structured Data Engineering, Cassandra
Cparle removed a project from T317364: [M] Stop unbounded image suggestions dataset growth and clean up legacy results: Structured-Data-Backlog (Current Work).
Dec 9 2022, 4:27 PM · Growth-Team (Current Sprint), Structured Data Engineering, Cassandra
Cparle moved T316149: [L] Create tool for manual evaluation of section-level image suggestions from Blocked to Ready for Development on the Structured-Data-Backlog (Current Work) board.
Dec 9 2022, 1:14 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
Cparle added a comment to T317364: [M] Stop unbounded image suggestions dataset growth and clean up legacy results.

@Eevans ... I won't close this if you're still working on it, but I might take it off our board if that's ok?

Dec 9 2022, 1:06 PM · Growth-Team (Current Sprint), Structured Data Engineering, Cassandra
Cparle moved T315976: [L] Build experimental dataset from Doing to Code Review on the Structured-Data-Backlog (Current Work) board.
Dec 9 2022, 1:05 PM · Section-Level-Image-Suggestions, Structured-Data-Backlog (Current Work), Research-Backlog
Cparle claimed T315976: [L] Build experimental dataset.
Dec 9 2022, 1:05 PM · Section-Level-Image-Suggestions, Structured-Data-Backlog (Current Work), Research-Backlog
Cparle updated the task description for T315976: [L] Build experimental dataset.
Dec 9 2022, 1:04 PM · Section-Level-Image-Suggestions, Structured-Data-Backlog (Current Work), Research-Backlog
Cparle added a comment to T315976: [L] Build experimental dataset.
wiki | section-alignment suggestions | section-topics-plus-p18 suggestions | intersection
enwiki 2480355033715114536
ptwiki* 148838147934584
idwiki 7561816773782103
ruwiki 267413118650987743
arwiki 9788632263472828
bnwiki 28796406662213
eswiki 2155931174791610621
cswiki 12483439013334644
frwiki 2596041644638110244
Dec 9 2022, 1:04 PM · Section-Level-Image-Suggestions, Structured-Data-Backlog (Current Work), Research-Backlog

Dec 8 2022

Cparle added a comment to T315976: [L] Build experimental dataset.

Sample dataset for enwiki

Dec 8 2022, 6:11 PM · Section-Level-Image-Suggestions, Structured-Data-Backlog (Current Work), Research-Backlog

Dec 6 2022

Cparle renamed T324590: Transfer responsibility for model used in section-alignment to the ML team from Transfer responsibility of model used in section-alignment to the ML team to Transfer responsibility for model used in section-alignment to the ML team.
Dec 6 2022, 4:27 PM · Research, Structured-Data-Backlog, Section-Level-Image-Suggestions
Cparle created T324590: Transfer responsibility for model used in section-alignment to the ML team.
Dec 6 2022, 4:27 PM · Research, Structured-Data-Backlog, Section-Level-Image-Suggestions
Cparle created T324588: Schedule the section-image-recommendations based on section-alignment code.
Dec 6 2022, 4:21 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions

Dec 5 2022

Cparle moved T323614: [M] Reduce image_suggestion HDFS files footprint from Incoming to Ready for Estimation on the Structured-Data-Backlog (Current Work) board.
Dec 5 2022, 5:43 PM · Data Pipelines, Structured-Data-Backlog (Current Work), Image-Suggestions
Cparle moved T324125: Figure out a good place for static HDFS helper files for the structured data team. from Incoming to Ready for Estimation on the Structured-Data-Backlog (Current Work) board.
Dec 5 2022, 5:43 PM · Structured-Data-Backlog, Section-Topics, Data Pipelines
Cparle edited projects for T323614: [M] Reduce image_suggestion HDFS files footprint, added: Structured-Data-Backlog (Current Work); removed Structured-Data-Backlog.
Dec 5 2022, 5:43 PM · Data Pipelines, Structured-Data-Backlog (Current Work), Image-Suggestions
Cparle edited projects for T324125: Figure out a good place for static HDFS helper files for the structured data team., added: Structured-Data-Backlog (Current Work); removed Structured-Data-Backlog.
Dec 5 2022, 5:42 PM · Structured-Data-Backlog, Section-Topics, Data Pipelines
Cparle added a comment to T323340: Stop producing and cleanup wbmi-mediasearch-entities_* spam on WANObjectCache grafana dashboard.

So ... can this be closed, @aaron?

Dec 5 2022, 5:37 PM · MW-1.40-notes (1.40.0-wmf.19; 2023-01-16), Structured-Data-Backlog (Current Work), Performance-Team, WikibaseMediaInfo
Cparle updated subscribers of T322015: [S] SearchPreview - Performance Review - Change the TOC link to improve CDN caching.

@matthiasmullie who signs off on this? is it @Krinkle ?

Dec 5 2022, 5:19 PM · Performance-Team (Radar), MW-1.40-notes (1.40.0-wmf.12; 2022-11-28), Structured-Data-Backlog (Current Work), SDAW-Search-Improvements (Milestone 2: QuickView MVP)

Dec 2 2022

Cparle created T324321: Add option to imagerec/recommendation.py to exclude sections that already have images.
Dec 2 2022, 11:27 AM · Research, Structured-Data-Backlog, Section-Level-Image-Suggestions

Nov 29 2022

Cparle added a comment to T318722: [L] Experiment with adding section information to schema.org metadata.

All pages have schema.org information, and the following pages have additional schema.org information about some of their sections

Nov 29 2022, 4:28 PM · Structured-Data-Backlog (Current Work), Section-Topics
Cparle moved T318722: [L] Experiment with adding section information to schema.org metadata from Doing to Code Review on the Structured-Data-Backlog (Current Work) board.
Nov 29 2022, 4:19 PM · Structured-Data-Backlog (Current Work), Section-Topics

Nov 28 2022

Cparle added a comment to T323561: Remove old data from instanceof_cache and title_cache in image_suggestions.suggestions in Cassandra.

Yeah we should be able to suss it out from Hive I think ... I don't think we can truncate because Growth are using the data all the time

Nov 28 2022, 12:21 PM · Structured-Data-Backlog, Image-Suggestions
Cparle updated the task description for T323561: Remove old data from instanceof_cache and title_cache in image_suggestions.suggestions in Cassandra.
Nov 28 2022, 12:20 PM · Structured-Data-Backlog, Image-Suggestions

Nov 25 2022

Cparle added a comment to T311730: [L] Exclude certain sections from having generated image suggestions.

Out of scope: Excluding suggestions based on custom community configuration (like excluding articles with certain categories or templates). Can be done on the frontend. It's unrelated to the API, specific to the Growth use case, and easy to implement within the frontend's search query construction logic.

Nov 25 2022, 5:12 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
Cparle updated the task description for T323504: Exclude additional section titles.
Nov 25 2022, 5:04 PM · Structured-Data-Backlog (Current Work), Data Pipelines, Section-Topics
Cparle added a comment to T311730: [L] Exclude certain sections from having generated image suggestions.

As a follow-on from @kostajh 's comment above ... I think this data is just what we need - if a section should be excluded from getting link recommendations it's probably a safe bet to assume it shouldn't have images either. Perhaps we could grab the data from there via an api call for each relevant wiki at the start of the data pipeline? It'd be easy to grab the json from a call like this, and parse it to get the sections we want to exclude

Nov 25 2022, 4:58 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions
fgiunchedi awarded T312235: [L] Image suggestions data pipeline monitoring a Like token.
Nov 25 2022, 11:17 AM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions
Cparle closed T322866: Calculate pollution rate of illustrated infoboxes in article image suggestions as Resolved.
Nov 25 2022, 10:50 AM · Structured-Data-Backlog (Current Work), Image-Suggestions
Cparle closed T312235: [L] Image suggestions data pipeline monitoring as Resolved.

OK all working now, hooray!

Nov 25 2022, 10:44 AM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions
Cparle added a comment to T317364: [M] Stop unbounded image suggestions dataset growth and clean up legacy results.

It's in /home/cparle on stat1008, also in hdfs:///user/cparle/all_page_with_suggestions_20221027.csv

Nov 25 2022, 10:43 AM · Growth-Team (Current Sprint), Structured Data Engineering, Cassandra

Nov 23 2022

Cparle added a comment to T317364: [M] Stop unbounded image suggestions dataset growth and clean up legacy results.

I assume, that whatever process loaded data into the suggestions table, also loaded into instanceof_cache and title_cache, is this assumption safe?

Nov 23 2022, 9:52 AM · Growth-Team (Current Sprint), Structured Data Engineering, Cassandra

Nov 22 2022

Cparle added a comment to T317364: [M] Stop unbounded image suggestions dataset growth and clean up legacy results.

New ticket for cleaning up the other tables T323561

Nov 22 2022, 11:09 AM · Growth-Team (Current Sprint), Structured Data Engineering, Cassandra
Cparle created T323561: Remove old data from instanceof_cache and title_cache in image_suggestions.suggestions in Cassandra.
Nov 22 2022, 11:08 AM · Structured-Data-Backlog, Image-Suggestions