
Create maintenance script to apply normalization and deduplicate existing reading list entries
Closed, Resolved | Public | 5 Estimated Story Points

Description

Follow up to T419466: Spike - Title normalization is not applied when saving pages to the reading list via the API

We need to create a maintenance script that applies normalization (space -> underscore) to existing reading list entry page titles that are currently saved with spaces, and then soft-deletes any duplicates (the entries saved in the space form).

  • Identify entries saved with spaces.
  • Check if there is a duplicate entry saved with underscores.
    • If the entry (with spaces) is a duplicate, soft delete it
    • If no duplicates, then apply normalization (convert spaces to underscores)
  • Adjust rl_size in the reading_list database table
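
The steps above can be sketched in a few lines. This is only an in-memory illustration in Python, not the merged maintenance script: the `Entry` fields, `normalize_entries`, and `adjust_sizes` are hypothetical names, and it assumes duplicates are matched per list and per project, and that rl_size counts only non-deleted entries.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical stand-in for a reading list entry row.
    entry_id: int
    list_id: int
    project: str
    title: str
    deleted: bool = False


def normalize_entries(entries):
    """Normalize space-form titles in place.

    Returns a Counter mapping list_id to the number of entries soft-deleted.
    """
    # Titles of live entries, keyed by (list, project, title), for duplicate lookup.
    live = {(e.list_id, e.project, e.title) for e in entries if not e.deleted}
    removed = Counter()
    for e in entries:
        if e.deleted or " " not in e.title:
            continue
        original_key = (e.list_id, e.project, e.title)
        normalized_key = (e.list_id, e.project, e.title.replace(" ", "_"))
        if normalized_key in live:
            # An underscore-form entry already exists: soft-delete the space form.
            e.deleted = True
            removed[e.list_id] += 1
        else:
            # No duplicate: rewrite the title to the underscore form.
            e.title = normalized_key[2]
            live.add(normalized_key)
        live.discard(original_key)
    return removed


def adjust_sizes(sizes, removed):
    """Return new per-list sizes after subtracting soft-deleted entries."""
    return {list_id: size - removed.get(list_id, 0) for list_id, size in sizes.items()}
```

For example, a list holding "Foo Bar", "Foo_Bar", and "Baz Qux" would end up with "Foo Bar" soft-deleted, "Baz Qux" renamed to "Baz_Qux", and its size reduced by one. A real script would do the same work in batched database queries rather than in memory.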

Event Timeline

aude triaged this task as High priority. Apr 1 2026, 4:10 PM
aude set the point value for this task to 5.
aude moved this task from Incoming to Ready for sprint on the Reader Experience Team board.
aude lowered the priority of this task from High to Medium. Apr 7 2026, 6:19 PM

Change #1271046 had a related patch set uploaded (by LorenMora; author: LorenMora):

[mediawiki/extensions/ReadingLists@master] Create maintenance script to normalize reading list entries

https://gerrit.wikimedia.org/r/1271046

Change #1271046 merged by jenkins-bot:

[mediawiki/extensions/ReadingLists@master] Create maintenance script to normalize reading list entries

https://gerrit.wikimedia.org/r/1271046

Jdlrobson-WMF subscribed.

@aude said this has been tested with @LMora-WMF. If you could leave a quick summary here of how we confirmed it, and resolve the ticket, that would be much appreciated!

Katie and I ran the script on Beta, as well as in production for items in Katie's ReadingList, and confirmed that it is working.