Revisit failed token ranges in thin-out script
Closed, Resolved; Public

Description

While thinning out old revisions with the thin-out script, some token ranges fail. This might be caused by tombstone overwhelm (too many tombstones scanned in a single query) after many recent deletions.

The current workaround is to skip over those ranges:

  • Decode the pagestate with new Buffer('<pagestate>', 'hex').toString() and look for the _domain and key.
  • Get the token for that _domain and key with select token('domain', 'key') in cqlsh.
  • Skip past it by adding a where token("_domain", key) > <token + n> clause to the query.
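
The decode and skip steps above can be sketched in Node.js roughly as follows. This is a minimal illustration, not the thin-out script's actual code: the helper names decodePageState, printable and skipClause are hypothetical, and Buffer.from() stands in for the deprecated new Buffer() constructor. BigInt is used because the tokens recorded in this task are larger than Number.MAX_SAFE_INTEGER, so plain number arithmetic would lose precision.

```javascript
// Sketch only: helper names are hypothetical, not part of the thin-out script.

// Decode a hex-encoded pagestate into text, as in the first step above.
// Buffer.from() replaces the deprecated `new Buffer()` constructor.
function decodePageState(hex) {
  if (hex.length % 2 !== 0 || !/^[0-9a-fA-F]*$/.test(hex)) {
    throw new Error('pagestate is not a valid hex string');
  }
  return Buffer.from(hex, 'hex').toString('utf8');
}

// Replace control characters with '.' so _domain and key are easier to spot.
function printable(text) {
  return text.replace(/[\u0000-\u001f\u007f]/g, '.');
}

// Build the skip clause from the third step. The tokens recorded here exceed
// Number.MAX_SAFE_INTEGER, so the addition is done with BigInt.
function skipClause(token, n) {
  const start = BigInt(token) + BigInt(n);
  return `token("_domain", key) > ${start}`;
}

// Usage with a token recorded in this task:
console.log(skipClause('3257335165771148493', 1));
// prints: token("_domain", key) > 3257335165771148494
```

Passing the token as a string keeps it exact; the same call also works for negative tokens.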

Since those failed ranges often correspond to extremely wide rows, it would be good to record and revisit those ranges later, in order to make sure that those super-wide rows are also thinned out successfully.

In order to do so, let's record the failed pagestates below:

wikipedia data-parsoid

  • token("_domain", key) > token('en.wikipedia.org', 'User:OlEnglish/Dashboard')
  • en.wikipedia.org, Wikipedia:WikiProject_Biography/Deletion_sorting
  • hy.wikipedia.org, Վիքիպեդիա:Նախագիծ:Վիքիընդլայնում
  • en.wikipedia.org, User:JamesR/AdminStats
  • de.wikipedia.org, Wikipedia:Auskunft
  • en.wikipedia.org, Age_of_the_Vikings ... 3257335165771148493
  • be-x-old.wikipedia.org, бэйда-Суміцкі
  • en.wikipedia.org, User_talk:Lolomg ... 2207849253371058408
  • en.wikipedia.org, Wikipedia:In_the_news/Candidates

wikipedia html

  • en.wikipedia.org, User_talk:77.65.63.46 ... -793006042568050703
  • pl.wikipedia.org, Funkcja_Β; 314000445674974489 ... 314000545674974489
  • sv.wikipedia.org, Xestia_chosenbaja ... 3257335050961748689
  • 314000445674974489

wikimedia data-parsoid

  • commons.wikimedia.org, Commons:Quality_images_candidates/candidate_list

Event Timeline

GWicke raised the priority of this task from to Needs Triage.
GWicke updated the task description.
GWicke added a subscriber: GWicke.
GWicke set Security to None.
GWicke updated the task description.

The most efficient way to resolve this should be to do a full run after most tombstones have been compacted away. That should be the case 2-3 weeks from now.

GWicke triaged this task as Medium priority. Jul 21 2015, 5:57 PM
GWicke claimed this task.