
Restore dse-k8s' rdf-streaming-updater from savepoint/improve bootstrapping process
Closed, Resolved | Public | 3 Estimated Story Points

Description

While deploying Zookeeper in T344614, I left the service down too long and we lost the ability to restore from a checkpoint.

Fixing this requires bootstrapping a new savepoint, a process that is a bit ad hoc at the moment.

Creating this ticket to:

  • Get the rdf-streaming-updater in dse-k8s up and running
  • Improve/document the bootstrapping process

Event Timeline

Gehel moved this task from Incoming to In Progress on the Discovery-Search (Current work) board.
Gehel set the point value for this task to 3.
bking moved this task from Incoming to Needs Review on the Data-Platform-SRE board.

Thanks to @dcausse, the job has recovered and we've updated the docs. Moving to "Needs Review" so he can confirm the health of flink-app in dse-k8s and review the docs.

@bking thanks! I can confirm that the job is running fine: the dashboards show some activity, and the test stream wdqs_streaming_updater_test_T289836 is seeing all the mutations.
Regarding H/A with Zookeeper, I believe it's being used properly, as I don't see the usual k8s configmaps that were created when the KUBERNETES H/A mode was used. I believe we should be good to do some testing of the various Flink operations.

bking moved this task from Blocked / Waiting to Done on the Data-Platform-SRE board.

Thanks for all your help as well. I believe this is done, but please let us know if we need to revisit the docs or restore process.