
Cluster-wide major compactions: parsoid.data-parsoid table
Closed, Resolved (Public)

Description

As noted elsewhere, the droppable tombstone ratio is abnormally high. Major compactions are not a viable long-term strategy for this problem, but they do significantly reduce the droppable ratio and reclaim significant disk space.

The parsoid.html compactions are nearly complete, and since parsoid.data-parsoid is another high-utilization table, we should run a pass of major compactions there as well.

  • eqiad
    • a
      • restbase1007.eqiad.wmnet
        • a
        • b
        • c
      • restbase1010.eqiad.wmnet
        • a
        • b
        • c
      • restbase1011.eqiad.wmnet
        • a
        • b
        • c
    • b
      • restbase1008.eqiad.wmnet
        • a
        • b
        • c
      • restbase1012.eqiad.wmnet
        • a
        • b
        • c
      • restbase1013.eqiad.wmnet
        • a
        • b
        • c
    • d
      • restbase1009.eqiad.wmnet
        • a
        • b
        • c
      • restbase1014.eqiad.wmnet
        • a
        • b
        • c
      • restbase1015.eqiad.wmnet
        • a
        • b
        • c
  • codfw
    • b
      • restbase2001.codfw.wmnet
        • a (0.007)
        • b
        • c
      • restbase2002.codfw.wmnet
        • a (0.006)
        • b
        • c
      • restbase2007.codfw.wmnet
        • a (0.006)
        • b
        • c
    • c
      • restbase2003.codfw.wmnet
        • a (0.006)
        • b
        • c
      • restbase2004.codfw.wmnet
        • a (0.006)
        • b
        • c
      • restbase2008.codfw.wmnet
        • a (0.006)
        • b
        • c
    • d
      • restbase2005.codfw.wmnet
        • a (0.007)
        • b
        • c
      • restbase2006.codfw.wmnet
        • a (0.006)
        • b
        • c
      • restbase2009.codfw.wmnet
        • a (0.006)
        • b
        • c

Event Timeline

Eevans moved this task from Backlog to Next on the Cassandra board.
Eevans updated the task description.
Eevans updated the task description.

Over in T143226: Cluster-wide major compactions: parsoid.html table, I removed repaired-at timestamps (where they existed), since they split the compaction pool and prevented major compactions from being effective at reducing the droppable tombstone ratio. The same is true for wikipedia parsoid.data-parsoid (in addition to others, I suspect). These timestamps will need to be removed as well before continuing here.
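A minimal sketch of how SSTables carrying a repaired-at timestamp might be located, assuming the `Repaired at: <millis>` line that Cassandra's `sstablemetadata` tool prints. The function names (`repaired_at`, `is_repaired`, `list_repaired`) are hypothetical and this is not the script that was actually used for this task.

```shell
# Print the repaired-at timestamp (epoch milliseconds) found in
# `sstablemetadata` output read from stdin; print nothing if the line is absent.
repaired_at() {
    awk -F': *' '/^Repaired at:/ { print $2; exit }'
}

# Succeed only when the metadata on stdin shows a non-zero repaired-at
# timestamp, i.e. the SSTable belongs to the "repaired" compaction pool.
is_repaired() {
    local ts
    ts="$(repaired_at)"
    [ -n "$ts" ] && [ "$ts" != "0" ]
}

# Hypothetical driver: print every *-Data.db file under a directory that is
# marked repaired. Assumes `sstablemetadata` is on PATH.
list_repaired() {
    local dir="$1" f
    find "$dir" -name '*-Data.db' -print | while IFS= read -r f; do
        if sstablemetadata "$f" 2>/dev/null | is_repaired; then
            printf '%s\n' "$f"
        fi
    done
}
```

The files printed by `list_repaired` could then be handed to `sstablerepairedset --really-set --is-unrepaired`, which should only be run while the Cassandra instance that owns them is stopped.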

Mentioned in SAL (#wikimedia-operations) [2016-10-05T15:39:44Z] <urandom> T146211: Restarting Cassandra on restbase1007-a.eqiad.wmnet to mark parsoid.data-parsoid tables unrepaired

Mentioned in SAL (#wikimedia-operations) [2016-10-05T15:48:06Z] <urandom> T146211: Restarting Cassandra on restbase1007-b.eqiad.wmnet to mark parsoid.data-parsoid tables unrepaired

Mentioned in SAL (#wikimedia-operations) [2016-10-05T15:54:15Z] <urandom> T146211: Restarting Cassandra on restbase1007-c.eqiad.wmnet to mark parsoid.data-parsoid tables unrepaired

Mentioned in SAL (#wikimedia-operations) [2016-10-05T17:31:59Z] <urandom> T146211: Performing rolling restart of restbase1010.eqiad.wmnet Cassandra instances, and marking SSTables unrepaired.

Mentioned in SAL (#wikimedia-operations) [2016-10-05T17:58:33Z] <urandom> T146211: Performing rolling restart of restbase1011.eqiad.wmnet Cassandra instances, and marking SSTables unrepaired.

Marking SSTables unrepaired with:

sudo c-foreach-restart --execute-post-shutdown "curl https://phab.wmfusercontent.org/file/data/uk5p7ehlbegir265rduu/PHID-FILE-52ba4hq35ljiymvcmxvg/Masterwork_From_Distant_Lands | bash -s {id}"
2016-10-05 17:59:48,236 INFO     [a] Disabling client ports...
2016-10-05 17:59:52,033 INFO     [a] Draining...
2016-10-05 18:01:19,562 INFO     [a] Stopping service cassandra-a
2016-10-05 18:01:22,275 INFO     [a] Executing post-shutdown command: curl https://phab.wmfusercontent.org/file/data/uk5p7ehlbegir265rduu/PHID-FILE-52ba4hq35ljiymvcmxvg/Masterwork_From_Distant_Lands | bash -s {id}
2016-10-05 18:01:54,919 INFO     [a] Found: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ-data-ka-13342-Data.db (repaired at 1426826797204)
2016-10-05 18:01:54,920 INFO     [a] -- Setting unrepaired...done
2016-10-05 18:01:54,920 INFO     [a] Starting service cassandra-a
2016-10-05 18:01:54,951 WARNING  [a] CQL (10.64.0.117:9042) not listening (will retry)...
2016-10-05 18:02:06,959 WARNING  [a] CQL (10.64.0.117:9042) not listening (will retry)...
2016-10-05 18:02:18,971 WARNING  [a] CQL (10.64.0.117:9042) not listening (will retry)...
2016-10-05 18:02:30,977 WARNING  [a] CQL (10.64.0.117:9042) not listening (will retry)...
2016-10-05 18:02:42,985 INFO     [a] CQL (10.64.0.117:9042) is UP
2016-10-05 18:02:43,060 INFO     [b] Disabling client ports...
2016-10-05 18:02:50,444 INFO     [b] Draining...
2016-10-05 18:04:27,981 INFO     [b] Stopping service cassandra-b
2016-10-05 18:04:30,848 INFO     [b] Executing post-shutdown command: curl https://phab.wmfusercontent.org/file/data/uk5p7ehlbegir265rduu/PHID-FILE-52ba4hq35ljiymvcmxvg/Masterwork_From_Distant_Lands | bash -s {id}
2016-10-05 18:05:06,698 INFO     [b] Found: la-21913-big-Data.db (repaired at 1426826797204)
2016-10-05 18:05:06,698 INFO     [b] -- Setting unrepaired...done
2016-10-05 18:05:06,699 INFO     [b] Starting service cassandra-b
2016-10-05 18:05:06,747 WARNING  [b] CQL (10.64.0.118:9042) not listening (will retry)...
2016-10-05 18:05:18,761 WARNING  [b] CQL (10.64.0.118:9042) not listening (will retry)...
2016-10-05 18:05:30,773 WARNING  [b] CQL (10.64.0.118:9042) not listening (will retry)...
2016-10-05 18:05:42,782 WARNING  [b] CQL (10.64.0.118:9042) not listening (will retry)...
2016-10-05 18:05:54,798 INFO     [b] CQL (10.64.0.118:9042) is UP
2016-10-05 18:05:54,800 INFO     [c] Disabling client ports...
2016-10-05 18:06:03,235 INFO     [c] Draining...
2016-10-05 18:07:32,517 INFO     [c] Stopping service cassandra-c
2016-10-05 18:07:35,346 INFO     [c] Executing post-shutdown command: curl https://phab.wmfusercontent.org/file/data/uk5p7ehlbegir265rduu/PHID-FILE-52ba4hq35ljiymvcmxvg/Masterwork_From_Distant_Lands | bash -s {id}
2016-10-05 18:08:12,688 INFO     [c] Found: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ-data-ka-61-Data.db (repaired at 1426826797204)
2016-10-05 18:08:12,689 INFO     [c] -- Setting unrepaired...done
2016-10-05 18:08:12,689 INFO     [c] Found: local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ-data-ka-299-Data.db (repaired at 1426826797204)
2016-10-05 18:08:12,689 INFO     [c] -- Setting unrepaired...done
2016-10-05 18:08:12,689 INFO     [c] Starting service cassandra-c
2016-10-05 18:08:12,733 WARNING  [c] CQL (10.64.0.119:9042) not listening (will retry)...
2016-10-05 18:08:24,743 WARNING  [c] CQL (10.64.0.119:9042) not listening (will retry)...
2016-10-05 18:08:36,755 WARNING  [c] CQL (10.64.0.119:9042) not listening (will retry)...
2016-10-05 18:08:48,767 WARNING  [c] CQL (10.64.0.119:9042) not listening (will retry)...
2016-10-05 18:09:00,777 INFO     [c] CQL (10.64.0.119:9042) is UP
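The log above shows each instance being restarted only after the previous one's CQL port is accepting connections again, with a probe roughly every 12 seconds. The retry loop can be sketched as below; `wait_until` is a hypothetical helper written for illustration, not part of `c-foreach-restart`.

```shell
# Retry a probe command until it succeeds or the attempts run out.
# usage: wait_until ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
wait_until() {
    local attempts="$1" interval="$2" n=1
    shift 2
    while [ "$n" -le "$attempts" ]; do
        # Run the probe; a zero exit status means the service is up.
        if "$@"; then
            return 0
        fi
        echo "probe failed (attempt $n of $attempts, will retry)..." >&2
        n=$((n + 1))
        # Sleep only if another attempt is still to come.
        if [ "$n" -le "$attempts" ]; then
            sleep "$interval"
        fi
    done
    return 1
}
```

For example, `wait_until 10 12 nc -z 10.64.0.117 9042` would probe the CQL port of instance a up to ten times, 12 seconds apart, and return non-zero if it never comes up, so a rolling restart can stop before moving on to the next instance.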

Mentioned in SAL (#wikimedia-operations) [2016-10-05T18:17:41Z] <urandom> T146211: Performing rolling restart of RESTBase rack 'b' Cassandra instances, and marking SSTables unrepaired.

Mentioned in SAL (#wikimedia-operations) [2016-10-05T18:46:37Z] <urandom> T146211: Performing rolling restart of RESTBase eqiad rack 'd' Cassandra instances, and marking SSTables unrepaired.

These ad hoc manual compactions were completed (more than once, in fact); closing.