Cparle (Cormac Parle)
Staff Software Engineer, Structured Data team, Wikimedia Foundation

User Details

User Since
Aug 21 2017, 4:16 PM (253 w, 5 d)
Availability
Available
IRC Nick
cormacparle
LDAP User
Cparle
MediaWiki User
CParle (WMF) [ Global Accounts ]

Recent Activity

Fri, Jul 1

Cparle moved T311840: Fewer than expected articles have the 'has suggestion' flag set in elasticsearch from Incoming to Doing on the Structured-Data-Backlog (Current Work) board.
Fri, Jul 1, 1:34 PM · Structured-Data-Backlog (Current Work), Image-Suggestions
Cparle edited projects for T311840: Fewer than expected articles have the 'has suggestion' flag set in elasticsearch, added: Structured-Data-Backlog (Current Work); removed Structured-Data-Backlog.
Fri, Jul 1, 1:34 PM · Structured-Data-Backlog (Current Work), Image-Suggestions
Cparle added projects to T311840: Fewer than expected articles have the 'has suggestion' flag set in elasticsearch: Image-Suggestions, Structured-Data-Backlog.
Fri, Jul 1, 1:33 PM · Structured-Data-Backlog (Current Work), Image-Suggestions
Cparle updated subscribers of T311840: Fewer than expected articles have the 'has suggestion' flag set in elasticsearch.

@EBernhardson re-imported the data, and now for ptwiki at least we have ~130k articles again. Still trying to see what went wrong

Fri, Jul 1, 1:32 PM · Structured-Data-Backlog (Current Work), Image-Suggestions
Cparle updated the task description for T311840: Fewer than expected articles have the 'has suggestion' flag set in elasticsearch.
Fri, Jul 1, 1:31 PM · Structured-Data-Backlog (Current Work), Image-Suggestions
Cparle created T311840: Fewer than expected articles have the 'has suggestion' flag set in elasticsearch.
Fri, Jul 1, 1:31 PM · Structured-Data-Backlog (Current Work), Image-Suggestions

Thu, Jun 30

Cparle updated subscribers of T311476: Unable to get list of more than 10k pages with recommendations.

Here's what's involved in doing a Cassandra-based solution

Thu, Jun 30, 2:39 PM · Patch-For-Review, Structured-Data-Backlog (Current Work), Image-Suggestions
Cparle added a comment to T311476: Unable to get list of more than 10k pages with recommendations.

One other option:

Thu, Jun 30, 12:50 PM · Patch-For-Review, Structured-Data-Backlog (Current Work), Image-Suggestions

Mon, Jun 27

Cparle closed T311220: title_cache endpoint for image suggestions api doesn't work as Resolved.
Mon, Jun 27, 1:53 PM · Image-Suggestions, Image-Suggestion-API
Cparle added a comment to T311220: title_cache endpoint for image suggestions api doesn't work.

LGTM too, thanks everyone

Mon, Jun 27, 1:53 PM · Image-Suggestions, Image-Suggestion-API
Cparle added a comment to T311220: title_cache endpoint for image suggestions api doesn't work.

I thought @hnowlan's patch meant that this was deployed, but it's still not working, so I guess not?

Mon, Jun 27, 10:03 AM · Image-Suggestions, Image-Suggestion-API

Thu, Jun 23

Cparle created T311220: title_cache endpoint for image suggestions api doesn't work.
Thu, Jun 23, 10:43 AM · Image-Suggestions, Image-Suggestion-API

Tue, Jun 7

Cparle added a comment to T304954: Import data from hdfs to commonswiki_file.

This can probably be closed now?

Tue, Jun 7, 10:24 AM · Discovery-Search (Current work), Image-Suggestions

May 30 2022

Cparle moved T277301: [L] Create script to add existing images on Commons from specific categories to the popular CAT queue from Doing to Code Review on the Structured-Data-Backlog (Current Work) board.
May 30 2022, 4:18 PM · MW-1.38-notes (1.38.0-wmf.6; 2021-10-26), Structured-Data-Backlog (Current Work)

May 25 2022

Cparle added a comment to T304954: Import data from hdfs to commonswiki_file.

Ok looks like the data has imported correctly, hooray!

May 25 2022, 11:20 AM · Discovery-Search (Current work), Image-Suggestions
Cparle added a comment to T277301: [L] Create script to add existing images on Commons from specific categories to the popular CAT queue.

Done, processed 58495 files

May 25 2022, 10:19 AM · MW-1.38-notes (1.38.0-wmf.6; 2021-10-26), Structured-Data-Backlog (Current Work)

May 24 2022

Cparle added a comment to T304954: Import data from hdfs to commonswiki_file.

Ideal except for we'd have to re-rewrite a bunch of code ...

May 24 2022, 2:32 PM · Discovery-Search (Current work), Image-Suggestions
Cparle added a comment to T304954: Import data from hdfs to commonswiki_file.

I have no idea why that is ... we're just using df.write.saveAsTable(). Is there any config we can do to improve this?

May 24 2022, 9:38 AM · Discovery-Search (Current work), Image-Suggestions

May 23 2022

Cparle added a comment to T307356: Image suggestions data pipeline onboarding request.

Waiting for the patch to be merged before closing this

May 23 2022, 4:24 PM · Structured-Data-Backlog (Current Work), Image-Suggestions, Generated Data Platform
Cparle added a comment to T304954: Import data from hdfs to commonswiki_file.

yeah, basically it's one dataset - we didn't think of it that way at the start, but it turns out the data is the same shape for both so it's all in the same table

May 23 2022, 3:00 PM · Discovery-Search (Current work), Image-Suggestions

May 20 2022

Cparle added a comment to T304954: Import data from hdfs to commonswiki_file.

Ok so now the data is being written to the tables image_suggestions_search_index_full and image_suggestions_search_index_delta in the hive db analytics_platform_eng. Partitioned by a snapshot column in the format yyyy-mm-dd

May 20 2022, 2:41 PM · Discovery-Search (Current work), Image-Suggestions
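
A minimal sketch of how a consumer might address one snapshot partition of the tables named in the comment above. The database, table and column names come from the comment; the helper names and the query shape are assumptions made for illustration, and "yyyy-mm-dd" is read here as an ISO calendar date.

```python
from datetime import date

# Names taken from the comment; everything else here is hypothetical.
HIVE_DB = "analytics_platform_eng"
FULL_TABLE = "image_suggestions_search_index_full"
DELTA_TABLE = "image_suggestions_search_index_delta"


def snapshot_value(day: date) -> str:
    """Format a date the way the snapshot partition column is described (yyyy-mm-dd)."""
    return day.strftime("%Y-%m-%d")


def partition_query(table: str, day: date) -> str:
    """Build a query that reads a single snapshot partition (assumed query shape)."""
    return (
        f"SELECT * FROM {HIVE_DB}.{table} "
        f"WHERE snapshot = '{snapshot_value(day)}'"
    )


print(partition_query(DELTA_TABLE, date(2022, 5, 16)))
```

Filtering on the partition column keeps a reader from scanning every snapshot when it only needs the most recent one.
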
Cparle added a comment to T305851: Import has-suggestions flags to search indices.

Data is being written to the tables image_suggestions_search_index_full and image_suggestions_search_index_delta in the hive db analytics_platform_eng. Partitioned by a snapshot column in the format yyyy-mm-dd

May 20 2022, 2:37 PM · Discovery-Search (Current work), Image-Suggestions

May 12 2022

Cparle added a comment to T293878: [L] Gather labeled data relevant to synonyms.

Still, my personal feeling is that we should target the overall Commons search system effectiveness for users, rather than focusing on eventual recall changes due to the activation of a feature.

May 12 2022, 11:13 AM · Structured-Data-Backlog

May 11 2022

Cparle added a comment to T293878: [L] Gather labeled data relevant to synonyms.

Marco's sampled the search terms from the logs based on a mixture of popularity and random, but just looking at the sampled search terms for French, for example, very few of them match up with wikidata labels and therefore won't have any synonyms ... and seeing that the point of this exercise is to capture the effect of the synonyms patch, we've probably been barking up the wrong tree.

May 11 2022, 5:24 PM · Structured-Data-Backlog
Cparle added a comment to T293878: [L] Gather labeled data relevant to synonyms.

Update on this ticket - looking at the data I'm not sure that what we've gathered is capturing the effect of the synonyms patch, and I think we might need to curate it more carefully.

May 11 2022, 10:14 AM · Structured-Data-Backlog

May 10 2022

Cparle added a comment to T304954: Import data from hdfs to commonswiki_file.

One limitation of the current import scripts is they expect everything to be sourced from partitioned hive tables. Typically we partition by a date col of the airflow execution date. Would it take much to arrange these into a partitioned table? Saving to hive partitions might also resolve the permissions issues by way of different defaults, although I'm not entirely sure.

May 10 2022, 8:23 AM · Discovery-Search (Current work), Image-Suggestions
Cparle created T307983: Write search index data for image suggestions into a hive table rather than local hdfs files.
May 10 2022, 8:22 AM · Structured-Data-Backlog (Current Work), Image-Suggestions

May 9 2022

Cparle added a comment to T283865: [XL] Estimate coverage of image suggestions at different confidence levels.

@Cparle which confidence level are we using in the current iteration of the data pipeline?

May 9 2022, 4:36 PM · Structured-Data-Backlog (Current Work), Image-Suggestions

May 4 2022

Cparle moved T283865: [XL] Estimate coverage of image suggestions at different confidence levels from Ready for Development to Doing on the Structured-Data-Backlog (Current Work) board.
May 4 2022, 10:13 AM · Structured-Data-Backlog (Current Work), Image-Suggestions
Cparle added a comment to T283865: [XL] Estimate coverage of image suggestions at different confidence levels.

Ok to resolve this @CBogen ?

May 4 2022, 10:12 AM · Structured-Data-Backlog (Current Work), Image-Suggestions
Cparle updated the task description for T283865: [XL] Estimate coverage of image suggestions at different confidence levels.
May 4 2022, 10:11 AM · Structured-Data-Backlog (Current Work), Image-Suggestions
Cparle added a comment to T283865: [XL] Estimate coverage of image suggestions at different confidence levels.

Confidence >= 90%

May 4 2022, 10:11 AM · Structured-Data-Backlog (Current Work), Image-Suggestions

May 3 2022

Cparle closed T307092: Prune suggestions from previous iterations from Cassandra, a subtask of T299885: [L] Push unillustrated articles with their suggestions, suggestion reasons and confidence scores to Cassandra, as Declined.
May 3 2022, 8:43 AM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions
Cparle closed T307092: Prune suggestions from previous iterations from Cassandra as Declined.

Having spoken to @Eevans about this, I'm going to close this ticket. Because the script runs only once a week, there's no way to completely prevent out-of-date data from being served to users, and keeping the has-suggestion flags up to date in the wiki search indices should prevent the particular problem we were trying to fix with this ticket from ever reaching users.

May 3 2022, 8:43 AM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions

Apr 28 2022

Cparle claimed T307092: Prune suggestions from previous iterations from Cassandra.
Apr 28 2022, 10:54 AM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions
Cparle created T307092: Prune suggestions from previous iterations from Cassandra.
Apr 28 2022, 10:50 AM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions

Apr 27 2022

Cparle closed T299885: [L] Push unillustrated articles with their suggestions, suggestion reasons and confidence scores to Cassandra as Resolved.

Merged, closing

Apr 27 2022, 2:44 PM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions
Cparle closed T299885: [L] Push unillustrated articles with their suggestions, suggestion reasons and confidence scores to Cassandra, a subtask of T296814: [EPIC] Create airflow job to gather image recommendations and push to various persistence layers, as Resolved.
Apr 27 2022, 2:44 PM · Image-Suggestions, Epic, Structured-Data-Backlog (Current Work)
Cparle closed T299885: [L] Push unillustrated articles with their suggestions, suggestion reasons and confidence scores to Cassandra, a subtask of T299781: [EPIC] Image suggestions backend, as Resolved.
Apr 27 2022, 2:44 PM · Image-Suggestion-API, Image-Suggestions, Epic, Structured-Data-Backlog (Current Work)

Apr 26 2022

Cparle added a comment to T299890: [M] Exclude previously rejected image suggestions when generating new suggestions.

Patch written https://gitlab.wikimedia.org/repos/generated-data-platform/datapipelines/-/tree/T299890-exclude-rejections but can't fully test it until the schema is deployed (see https://gitlab.wikimedia.org/repos/generated-data-platform/topics/image-suggestions-feedback/-/merge_requests/1)

Apr 26 2022, 2:23 PM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions
Cparle moved T299890: [M] Exclude previously rejected image suggestions when generating new suggestions from Ready for Development to Doing on the Structured-Data-Backlog (Current Work) board.
Apr 26 2022, 10:50 AM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions

Apr 25 2022

Cparle added a comment to T306349: Public-facing API for image suggestions data.

As an alternative to a public API, we could provide a flat file containing all image suggestions for any wiki quite easily (in .csv format or similar)

Apr 25 2022, 4:45 PM · Structured-Data-Backlog, Image-Suggestions, Foundational Technology Requests
Cparle moved T299884: Prepare has-recommendation data for import to wiki search indices from Doing to Code Review on the Structured-Data-Backlog (Current Work) board.
Apr 25 2022, 10:00 AM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions
Cparle added a comment to T299884: Prepare has-recommendation data for import to wiki search indices.

https://gitlab.wikimedia.org/repos/generated-data-platform/datapipelines/-/merge_requests/46

Apr 25 2022, 9:59 AM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions

Apr 21 2022

Cparle closed T299789: [XL] Store a list of unillustrated articles with suggested images in hdfs, a subtask of T299781: [EPIC] Image suggestions backend, as Resolved.
Apr 21 2022, 2:43 PM · Image-Suggestion-API, Image-Suggestions, Epic, Structured-Data-Backlog (Current Work)
Cparle closed T299789: [XL] Store a list of unillustrated articles with suggested images in hdfs, a subtask of T296814: [EPIC] Create airflow job to gather image recommendations and push to various persistence layers, as Resolved.
Apr 21 2022, 2:43 PM · Image-Suggestions, Epic, Structured-Data-Backlog (Current Work)
Cparle closed T299789: [XL] Store a list of unillustrated articles with suggested images in hdfs as Resolved.

... aaand merged. Closing the ticket

Apr 21 2022, 2:43 PM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions
Cparle added a comment to T299789: [XL] Store a list of unillustrated articles with suggested images in hdfs.

Ok ran the old IMA again but with the data-loss part fixed (I think!), and now from it we're getting suggestions for 82399 articles, 81177 of which are also suggested by the new pipeline

Apr 21 2022, 1:33 PM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions
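
The overlap between the two runs in the comment above can be worked out directly from the two figures it quotes. This is plain arithmetic on those numbers; the variable names are illustrative only.

```python
# Figures quoted in the comment.
old_total = 82399   # articles with suggestions from the old IMA run
shared = 81177      # of those, articles also suggested by the new pipeline

only_old = old_total - shared            # articles the new pipeline does not cover
overlap_pct = 100 * shared / old_total   # share of old-run articles also in the new run

print(only_old, round(overlap_pct, 1))
```
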

Apr 20 2022

Cparle closed T303816: [L] Refactor data pipeline queries, a subtask of T296814: [EPIC] Create airflow job to gather image recommendations and push to various persistence layers, as Resolved.
Apr 20 2022, 4:18 PM · Image-Suggestions, Epic, Structured-Data-Backlog (Current Work)
Cparle closed T303816: [L] Refactor data pipeline queries as Resolved.

I think all remaining refactoring work has been done, so closing

Apr 20 2022, 4:18 PM · Structured-Data-Backlog (Current Work), Image-Suggestions
Cparle updated the task description for T299884: Prepare has-recommendation data for import to wiki search indices.
Apr 20 2022, 9:29 AM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions

Apr 19 2022

Cparle added a comment to T299789: [XL] Store a list of unillustrated articles with suggested images in hdfs.

Ok this is pretty much done. We're writing to Hive instead of hdfs, to make it easier to export to Cassandra

Apr 19 2022, 3:17 PM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions

Apr 11 2022

Cparle edited projects for T298684: [M] Add 'custommatch' params to commons config for searching media files using wikidata ids, added: Structured-Data-Backlog; removed Structured-Data-Backlog (Current Work).
Apr 11 2022, 4:50 PM · Structured-Data-Backlog
Cparle added a comment to T298684: [M] Add 'custommatch' params to commons config for searching media files using wikidata ids.

We've changed our approach to calculating confidence scores, and are now estimating them before storing image suggestions. This ticket is therefore no longer necessary for image suggestions, as we don't have another use case for getting images for a particular Q-id

Apr 11 2022, 4:49 PM · Structured-Data-Backlog
Cparle renamed T299884: Prepare has-recommendation data for import to wiki search indices from Write has-recommendation flags to relevant wikis to Prepare has-recommendation data for import to wiki search indices.
Apr 11 2022, 2:45 PM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions
Cparle created T305851: Import has-suggestions flags to search indices.
Apr 11 2022, 2:13 PM · Discovery-Search (Current work), Image-Suggestions

Apr 8 2022

Cparle updated the task description for T299884: Prepare has-recommendation data for import to wiki search indices.
Apr 8 2022, 4:16 PM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions

Apr 7 2022

Cparle updated the task description for T299890: [M] Exclude previously rejected image suggestions when generating new suggestions.
Apr 7 2022, 2:32 PM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions
Cparle added a comment to T302925: [SPIKE] Investigate and Decide on Solution for Image Suggestions Feedback.

A potential issue with the 2-3 hour delay will be that image suggestions that have been rejected could be potentially resurfaced to users.

Apr 7 2022, 2:28 PM · Patch-For-Review, Image-Suggestions, Image-Suggestion-API, Spike, Generated Data Platform
Cparle updated the task description for T299890: [M] Exclude previously rejected image suggestions when generating new suggestions.
Apr 7 2022, 1:57 PM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions
Cparle updated the task description for T305615: Performance review of Extension:ImageSuggestions.
Apr 7 2022, 1:02 PM · Performance-Team (Radar), Image-Suggestions, Structured-Data-Backlog (Current Work)

Apr 6 2022

Cparle added a comment to T304954: Import data from hdfs to commonswiki_file.

We have example data files on hdfs

Apr 6 2022, 3:41 PM · Discovery-Search (Current work), Image-Suggestions
Cparle updated the task description for T304954: Import data from hdfs to commonswiki_file.
Apr 6 2022, 3:39 PM · Discovery-Search (Current work), Image-Suggestions
Cparle updated the task description for T304954: Import data from hdfs to commonswiki_file.
Apr 6 2022, 3:38 PM · Discovery-Search (Current work), Image-Suggestions

Apr 4 2022

Cparle moved T293878: [L] Gather labeled data relevant to synonyms from Blocked to Code Review on the Structured-Data-Backlog (Current Work) board.
Apr 4 2022, 3:50 PM · Structured-Data-Backlog

Mar 29 2022

Cparle updated the task description for T304954: Import data from hdfs to commonswiki_file.
Mar 29 2022, 2:31 PM · Discovery-Search (Current work), Image-Suggestions
Cparle created T304954: Import data from hdfs to commonswiki_file.
Mar 29 2022, 2:13 PM · Discovery-Search (Current work), Image-Suggestions
Cparle added a comment to T294468: [SPIKE] Decide on best approach for API access to Cassandra.

Also ... I don't think the way the data is being stored allows for that anyway. We store the user who has rejected an image, not the tool they were using at the time (see P21420). Perhaps this is what the comment field is intended for? Not sure.

Mar 29 2022, 1:53 PM · Spike, Generated Data Platform
Cparle added a comment to T294468: [SPIKE] Decide on best approach for API access to Cassandra.

It would mean that, yes

Mar 29 2022, 1:50 PM · Spike, Generated Data Platform
Cparle added a comment to T294468: [SPIKE] Decide on best approach for API access to Cassandra.

AIUI this plan sounds OK. To recap Growth team's existing use cases:

...

  1. Currently, when a user accepts or rejects an image suggestion, we enqueue a job with CirrusSearch to reset the weighted tag for hasrecommendation:image for that article. With the new API, we would also send an HTTP request to the feedback endpoint proposed in T294468#7748996. I assume that the search index updating code in T299884 (cc @Cparle) would take into account where an article has feedback before updating the weighted tag hasrecommendation:image for an article, or perhaps a new field like hasfeedback:image.rejected would be useful to someone.
Mar 29 2022, 9:23 AM · Spike, Generated Data Platform

Mar 21 2022

Cparle moved T283865: [XL] Estimate coverage of image suggestions at different confidence levels from Ready for Development to Blocked on the Structured-Data-Backlog (Current Work) board.
Mar 21 2022, 10:29 AM · Structured-Data-Backlog (Current Work), Image-Suggestions
Cparle added a comment to T283865: [XL] Estimate coverage of image suggestions at different confidence levels.

Moved this into blocked - it should be quite easy to do once we have T299789 done, so there's no point in wasting effort doing it before then

Mar 21 2022, 10:29 AM · Structured-Data-Backlog (Current Work), Image-Suggestions
Cparle placed T296507: fetchSuggestions opens connection to depooled database after nine hours up for grabs.
Mar 21 2022, 10:17 AM · MW-1.39-notes (1.39.0-wmf.18; 2022-06-27), Structured-Data-Backlog (Current Work), MachineVision

Mar 14 2022

Cparle added a comment to T295369: Exclude biographies from image suggestions notifications.

In the source-of-truth for image suggestions (Cassandra, see T293808) we'll be storing the value of "instance of" (P31) for each article. This means we can exclude articles where "instance of" is Q5 (human).

Mar 14 2022, 5:44 PM · MW-1.39-notes (1.39.0-wmf.16; 2022-06-13), Structured-Data-Backlog (Current Work), Image-Suggestions
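
A minimal sketch of the exclusion described in the comment above, assuming each stored article record carries a list of its "instance of" values. The record shape, the field names and the sample rows are hypothetical; only the use of Q5 to identify biographies comes from the comment.

```python
from typing import Dict, List

HUMAN_QID = "Q5"  # Wikidata item for "human"


def exclude_biographies(articles: List[Dict]) -> List[Dict]:
    """Drop every article whose 'instance of' values include Q5."""
    return [a for a in articles if HUMAN_QID not in a["instance_of"]]


# Hypothetical sample records, for illustration only.
sample = [
    {"page_id": 1, "instance_of": ["Q5"]},
    {"page_id": 2, "instance_of": ["Q515"]},
    {"page_id": 3, "instance_of": []},
]

print([a["page_id"] for a in exclude_biographies(sample)])
```

Treating "instance of" as a list rather than a single value allows for items that carry more than one such statement.
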
Cparle added a comment to T283865: [XL] Estimate coverage of image suggestions at different confidence levels.

We'll need at least a preliminary dataset from to do this work

Mar 14 2022, 5:41 PM · Structured-Data-Backlog (Current Work), Image-Suggestions
Cparle updated the task description for T283865: [XL] Estimate coverage of image suggestions at different confidence levels.
Mar 14 2022, 5:33 PM · Structured-Data-Backlog (Current Work), Image-Suggestions
Cparle updated the task description for T295369: Exclude biographies from image suggestions notifications.
Mar 14 2022, 5:32 PM · MW-1.39-notes (1.39.0-wmf.16; 2022-06-13), Structured-Data-Backlog (Current Work), Image-Suggestions
Cparle updated the task description for T299885: [L] Push unillustrated articles with their suggestions, suggestion reasons and confidence scores to Cassandra.
Mar 14 2022, 5:31 PM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions
Cparle updated the task description for T299789: [XL] Store a list of unillustrated articles with suggested images in hdfs.
Mar 14 2022, 5:28 PM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions
Cparle closed T208810: Consider changes to search as Resolved.

This ticket was a change to an interface that never made it to production and is no longer in development, so closing

Mar 14 2022, 4:39 PM · Structured-Data-Backlog, Structured-Data-Design, SDC Design

Mar 8 2022

Cparle updated the task description for T303271: Investigate improvements to the 'multimedia' widget on the search results page.
Mar 8 2022, 2:25 PM · Structured-Data-Backlog, WikibaseMediaInfo
Cparle updated the task description for T303271: Investigate improvements to the 'multimedia' widget on the search results page.
Mar 8 2022, 2:24 PM · Structured-Data-Backlog, WikibaseMediaInfo
Cparle created T303271: Investigate improvements to the 'multimedia' widget on the search results page.
Mar 8 2022, 2:24 PM · Structured-Data-Backlog, WikibaseMediaInfo

Mar 3 2022

Cparle added a comment to T294468: [SPIKE] Decide on best approach for API access to Cassandra.

ok grand

Mar 3 2022, 2:18 PM · Spike, Generated Data Platform
Cparle added a comment to T294468: [SPIKE] Decide on best approach for API access to Cassandra.

Will there be another API with some business logic to complement the generic API?

Mar 3 2022, 11:54 AM · Spike, Generated Data Platform

Feb 23 2022

Cparle closed T301687: Calculate image suggestions confidence score without using elasticsearch, a subtask of T299789: [XL] Store a list of unillustrated articles with suggested images in hdfs, as Resolved.
Feb 23 2022, 6:45 PM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions
Cparle closed T301687: Calculate image suggestions confidence score without using elasticsearch as Resolved.
Feb 23 2022, 6:45 PM · Structured-Data-Backlog (Current Work)
Cparle added a comment to T301687: Calculate image suggestions confidence score without using elasticsearch.

After running queries on the labeled data, it turns out the most reliable confidence score is simply based on the source of the match

Feb 23 2022, 6:15 PM · Structured-Data-Backlog (Current Work)
Cparle added a comment to T299885: [L] Push unillustrated articles with their suggestions, suggestion reasons and confidence scores to Cassandra.

Note that we currently have a notebook for gathering the data but we don't have agreement about how to get the data into cassandra yet

Feb 23 2022, 5:36 PM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions
Cparle placed T299885: [L] Push unillustrated articles with their suggestions, suggestion reasons and confidence scores to Cassandra up for grabs.
Feb 23 2022, 5:30 PM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions
Cparle claimed T299885: [L] Push unillustrated articles with their suggestions, suggestion reasons and confidence scores to Cassandra.
Feb 23 2022, 5:29 PM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions

Feb 22 2022

Cparle added a comment to T293878: [L] Gather labeled data relevant to synonyms.

Another source of ground truth might be images that were added then reverted within e.g. a day?

Feb 22 2022, 11:33 AM · Structured-Data-Backlog

Feb 14 2022

Cparle added a comment to T300045: Refactor commons_wikidata_links/gather_data.ipynb notebook as a python script.

Hmmm ok so you have no dump-and-reload mechanism? If not we'll have to keep the data from the previous run in order to work out the __DELETE_GROUPING__ part

Feb 14 2022, 6:09 PM · Structured-Data-Backlog (Current Work), Image-Suggestions
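
The point about keeping the previous run's data can be illustrated with a set difference: without a dump-and-reload, pages that had a suggestion last time but not this time have to be identified so their flag can be cleared. This is a sketch under that reading of the comment; the function name and the page ids are hypothetical, and it does not show how the __DELETE_GROUPING__ marker itself is written.

```python
from typing import Set, Tuple


def diff_runs(previous: Set[int], current: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Compare two runs and return (pages to clear, pages newly flagged)."""
    to_clear = previous - current   # flagged last run, no longer flagged
    to_set = current - previous     # flagged for the first time in this run
    return to_clear, to_set


# Hypothetical page ids from two consecutive runs.
to_clear, to_set = diff_runs({101, 102, 103}, {102, 103, 104})
print(sorted(to_clear), sorted(to_set))
```
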
Cparle added a parent task for T301687: Calculate image suggestions confidence score without using elasticsearch: T299789: [XL] Store a list of unillustrated articles with suggested images in hdfs.
Feb 14 2022, 4:50 PM · Structured-Data-Backlog (Current Work)
Cparle added a subtask for T299789: [XL] Store a list of unillustrated articles with suggested images in hdfs: T301687: Calculate image suggestions confidence score without using elasticsearch.
Feb 14 2022, 4:50 PM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions
Cparle updated the task description for T299789: [XL] Store a list of unillustrated articles with suggested images in hdfs.
Feb 14 2022, 4:44 PM · Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions
Cparle created T301687: Calculate image suggestions confidence score without using elasticsearch.
Feb 14 2022, 4:39 PM · Structured-Data-Backlog (Current Work)

Feb 8 2022

Cparle created T301223: [M] Handle deprecation of Serializable interface in WikibaseMediaInfo.
Feb 8 2022, 9:44 AM · MW-1.39-notes (1.39.0-wmf.7; 2022-04-11), Structured-Data-Backlog (Current Work), WikibaseMediaInfo, PHP 8.1 support

Feb 7 2022

Cparle added a comment to T301048: Structured data not visible in structured data tab on a lot of files.

@Multichill is the bot just using wbsetclaim then a null edit? Are you getting this with any of your other bots?

Feb 7 2022, 5:56 PM · Structured-Data-Backlog (Current Work), Editing-team, Commons, StructuredDataOnCommons, Structured Data Engineering

Feb 4 2022

Cparle added a comment to T287865: Review documentation - API and related.

@Zbyszko this isn't API documentation as such, but it explains how MediaSearch works (see https://www.mediawiki.org/wiki/Extension:WikibaseMediaInfo/MediaSearch ). MediaSearch just uses the standard search API, with a media-specific profile loaded if you're searching in the File namespace, though there are a couple of features that were developed specifically for media, namely haswbstatement and wbstatementquantity (in WikibaseCirrusSearch) and custommatch (in WikibaseMediaInfo).

Feb 4 2022, 10:08 AM · Discovery-Search, Wikidata, Documentation
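
As a rough illustration of the comment above, a haswbstatement query can be sent through the standard MediaWiki search API restricted to the File namespace. The sketch below only builds the request URL and makes no network call; the helper name and the example property and item (P180 "depicts", Q146 "house cat") are chosen for illustration and are not taken from the comment.

```python
from urllib.parse import urlencode, urlparse, parse_qs

COMMONS_API = "https://commons.wikimedia.org/w/api.php"
FILE_NAMESPACE = 6  # the File namespace in MediaWiki


def build_search_url(property_id: str, item_id: str) -> str:
    """Build a standard search API URL using the haswbstatement keyword."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"haswbstatement:{property_id}={item_id}",
        "srnamespace": FILE_NAMESPACE,
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


url = build_search_url("P180", "Q146")
print(url)
```
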