
Revisit failed token ranges in thin-out script
Closed, Resolved · Public


While thinning out old revisions with the thin-out script, some token ranges are failing. This might be caused by tombstone overload after many recent deletions.

The current workaround is to skip over those ranges by:

  • decoding the pagestate with new Buffer('<pagestate>', 'hex').toString() and looking for the _domain and key
  • getting the token for that _domain and key with select token("_domain", key) in cqlsh
  • skipping past it by adding where token("_domain", key) > <token + n> to the query
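The decode step above can be sketched in Node. This is only an illustration: the sample pagestate is round-tripped through hex for the example, and the assumption that the decoded pagestate is JSON with _domain and key fields is mine, not stated by the script:

```javascript
// Decode a hex-encoded pagestate and pull out _domain and key.
// Modern Node prefers Buffer.from over the deprecated new Buffer(...).
function decodePagestate(hexPagestate) {
  const json = Buffer.from(hexPagestate, 'hex').toString();
  const state = JSON.parse(json); // assumes the pagestate decodes to JSON
  return { domain: state._domain, key: state.key };
}

// Illustrative pagestate, round-tripped through hex for the example:
const sample = Buffer.from(
  JSON.stringify({ _domain: 'en.wikipedia.org', key: 'Some_Page' })
).toString('hex');
console.log(decodePagestate(sample));
// → { domain: 'en.wikipedia.org', key: 'Some_Page' }
```

The resulting domain and key are then fed to token("_domain", key) in cqlsh to find the range bound to skip past.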

Since those failed ranges often correspond to extremely wide rows, it would be good to record and revisit those ranges later, in order to make sure that those super-wide rows are also thinned out successfully.

To that end, let's record the failed pagestates below:

wikipedia data-parsoid

  • token("_domain", key) > token('', 'User:OlEnglish/Dashboard')
  • Wikipedia:WikiProject_Biography/Deletion_sorting
  • Վիքիպեդիա:Նախագիծ:Վիքիընդլայնում
  • User:JamesR/AdminStats
  • Wikipedia:Auskunft
  • Age_of_the_Vikings ... 3257335165771148493
  • бэйда-Суміцкі
  • User_talk:Lolomg ... 2207849253371058408
  • Wikipedia:In_the_news/Candidates

wikipedia html

  • User_talk: ... -793006042568050703
  • Funkcja_Β; 314000445674974489 ... 314000545674974489
  • Xestia_chosenbaja ... 3257335050961748689
  • 314000445674974489

wikimedia data-parsoid

  • Commons:Quality_images_candidates/candidate_list

Event Timeline

GWicke raised the priority of this task to Needs Triage.
GWicke updated the task description.
GWicke subscribed.
GWicke set Security to None.
GWicke updated the task description.

The most efficient way to resolve this should be to do a full run after most tombstones have been compacted away, which should be the case 2-3 weeks from now.

GWicke triaged this task as Medium priority. Jul 21 2015, 5:57 PM
GWicke claimed this task.