
Revisit failed token ranges in thin-out script
Closed, Resolved (Public)


While thinning out old revisions with the thin-out script, queries at some token offsets are failing. This might be caused by reads being overwhelmed by tombstones after many recent deletions.

The current workaround is to skip over those ranges:

  • decode the pagestate with new Buffer('<pagestate>', 'hex').toString() and look for the _domain and key
  • get the token for that _domain and key with select token('domain', 'key') in cqlsh
  • skip past that token by adding a where token("_domain", key) > <token + n>
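
The decode and skip steps above can be sketched in Node.js as follows. This is only an illustration, not code from the thin-out script: the helper names (decodePageState, readableRuns, skipClause) are hypothetical, and the sketch assumes that the _domain and key show up as readable text between binary bytes in the decoded pagestate. It uses Buffer.from, since the new Buffer(...) constructor is deprecated, and BigInt, since the tokens recorded here are too large to be represented exactly as JavaScript numbers.

```javascript
// Hypothetical helpers for illustration; not part of the thin-out script.

// Decode a hex-encoded pagestate into a UTF-8 string.
// Buffer.from(...) replaces the deprecated `new Buffer(...)` constructor.
function decodePageState(hex) {
    return Buffer.from(hex, 'hex').toString();
}

// Split the decoded pagestate on control characters and on U+FFFD
// (which toString() emits for invalid UTF-8), keeping runs of at least
// minLen characters. Assumption: the _domain and key are among these runs.
function readableRuns(decoded, minLen = 3) {
    return decoded
        .split(/[\u0000-\u001f\u007f\ufffd]+/)
        .filter((run) => run.length >= minLen);
}

// Build the where clause that skips past a failed token.
// The token is taken as a string (or BigInt) so that no precision is lost.
function skipClause(token, n = 1n) {
    return `where token("_domain", key) > ${BigInt(token) + BigInt(n)}`;
}
```

The token itself still has to be looked up in cqlsh as described in the second step; skipClause only does the token + n arithmetic and formats the clause.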

Since those failed ranges often correspond to extremely wide rows, it would be good to record and revisit those ranges later, in order to make sure that those super-wide rows are also thinned out successfully.

To do so, let's record the failed pagestates below:

wikipedia data-parsoid

  • token("_domain", key) > token('', 'User:OlEnglish/Dashboard')
  •, Wikipedia:WikiProject_Biography/Deletion_sorting
  •, Վիքիպեդիա:Նախագիծ:Վիքիընդլայնում
  •, User:JamesR/AdminStats
  •, Wikipedia:Auskunft
  •, Age_of_the_Vikings ... 3257335165771148493
  •, бэйда-Суміцкі
  •, User_talk:Lolomg ... 2207849253371058408
  •, Wikipedia:In_the_news/Candidates

wikipedia html

  •, User_talk: ... -793006042568050703
  •, Funkcja_Β; 314000445674974489 ... 314000545674974489
  •, Xestia_chosenbaja ... 3257335050961748689
  • 314000445674974489

wikimedia data-parsoid

  •, Commons:Quality_images_candidates/candidate_list

Event Timeline

GWicke raised the priority of this task to Needs Triage.
GWicke updated the task description.
GWicke added a subscriber: GWicke.
GWicke set Security to None.
GWicke updated the task description.

The most efficient way to resolve this should be to do a full run after most tombstones have been compacted away. That should be the case 2-3 weeks from now.

GWicke triaged this task as Medium priority. Jul 21 2015, 5:57 PM
GWicke claimed this task.