
Prevent storing duplicate Wikidata articles in page collection recommendations cache
Closed, Resolved · Public · 2 Estimated Story Points · BUG REPORT

Description

The Recommendation API can end up with duplicate instances of the same Wikidata article for a given collection in its cache.

The duplicates are produced by the method get_candidates_in_collection_page. It is still unclear how many downstream issues this causes, but it is incorrect behavior and should be fixed to keep the cached data consistent.
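A minimal sketch of the kind of deduplication needed. It assumes the candidates for one collection arrive as a list and that a Wikidata item ID (QID) is what identifies "the same Wikidata article". The Candidate class and dedupe_candidates function are hypothetical illustrations, not code from the recommendation-api repository; the real get_candidates_in_collection_page may structure its data differently.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Candidate:
    # Hypothetical shape of one cached recommendation candidate.
    wikidata_id: str  # Wikidata item ID, e.g. "Q42"
    lang: str         # language code the article was fetched under
    title: str        # article title on that wiki


def dedupe_candidates(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Return the candidates of a single collection with at most one entry
    per Wikidata item, keeping the first occurrence and preserving order."""
    seen: Set[str] = set()
    unique: List[Candidate] = []
    for candidate in candidates:
        if candidate.wikidata_id in seen:
            # Same Wikidata item already stored for this collection: skip it.
            continue
        seen.add(candidate.wikidata_id)
        unique.append(candidate)
    return unique


if __name__ == "__main__":
    cands = [
        Candidate("Q42", "en", "Douglas Adams"),
        Candidate("Q1", "en", "Universe"),
        Candidate("Q42", "en", "Douglas Adams"),
    ]
    print([c.wikidata_id for c in dedupe_candidates(cands)])  # ['Q42', 'Q1']
```

Keying on the QID rather than on (language, title) means two entries for the same item are collapsed even if they were fetched under different language codes; whether that is the desired rule depends on how the cache is meant to be keyed.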

Event Timeline

Change #1206883 had a related patch set uploaded (by Nik Gkountas; author: Nik Gkountas):

[research/recommendation-api@master] Page collections caching: Use sitematrix lang code for all articles

https://gerrit.wikimedia.org/r/1206883

ngkountas set the point value for this task to 2.

Change #1206883 merged by jenkins-bot:

[research/recommendation-api@master] Page collections caching: Use sitematrix lang code for all articles

https://gerrit.wikimedia.org/r/1206883

Change #1211118 had a related patch set uploaded (by Sbisson; author: Sbisson):

[operations/deployment-charts@master] Update recommendation-api to 2025-11-20-132855-production

https://gerrit.wikimedia.org/r/1211118

Change #1211118 merged by jenkins-bot:

[operations/deployment-charts@master] Update recommendation-api to 2025-11-20-132855-production

https://gerrit.wikimedia.org/r/1211118

Mentioned in SAL (#wikimedia-operations) [2025-11-25T14:23:00Z] <stephanebisson> Updated recommendation-api to 2025-11-20-132855-production (T410396, T410387)

This is not QA-testable, and the code change prevents the duplication. I will move this to Sign-off. Thanks for all your work!