
Backup opensearch dashboards data
Closed, Resolved · Public

Description

At the moment we have redundancy for dashboards indices (i.e. dashboards are stored in opensearch) and we should add backups for disaster recovery purposes.

  • author scripts to dump/restore dashboards/indices to/from plain text files (a minimal dump sketch follows this list)
  • run dumps periodically on hosts that run dashboards
  • instruct bacula to pick up said plain text dumps
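
A minimal sketch of what such a dump script could look like, assuming the dashboards instance listens on localhost:5601 and that /srv/backups/opensearch_dashboards is the target directory (both are assumptions here, not necessarily what the eventual puppet-managed script uses):

#!/bin/bash
# Sketch only: periodic saved-objects dump (assumed paths, port and retention).
set -euo pipefail

BACKUP_DIR=/srv/backups/opensearch_dashboards
OUTFILE="${BACKUP_DIR}/dashboards-$(date +%Y%m%d).ndjson"

mkdir -p "${BACKUP_DIR}"

# Export all saved object types as ndjson. OpenSearch Dashboards may expect an
# 'osd-xsrf' header instead of Kibana's 'kbn-xsrf'.
curl -s -X POST "http://localhost:5601/api/saved_objects/_export" \
  -H 'Content-Type: application/json' \
  -H 'kbn-xsrf: true' \
  -d '{"type":["index-pattern","url","search","visualization","dashboard","config","query"],"includeReferencesDeep":true}' \
  -o "${OUTFILE}"

# Keep roughly 30 days of local history.
find "${BACKUP_DIR}" -name 'dashboards-*.ndjson' -mtime +30 -delete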

Event Timeline

Grabbing exports directly from the Kibana API appears possible:

curl -X POST "http://localhost:5601/api/saved_objects/_export" -H 'Content-Type: application/json' -H 'kbn-xsrf: true' -d'{"type":["index-pattern","url","search","visualization","dashboard","config","query"],"includeReferencesDeep":true}' -o export.ndjson
[16:11]  <Lucas_WMDE> https://logstash.wikimedia.org/ only shows me an empty dashboard instead of the usual home page, is that an intentional change?
[16:11]  <Lucas_WMDE> (apparently there’s a default filter for host:ores2003, last 15 minutes, which yields no results…)
...
[16:13]  <    jynus> Lucas_WMDE: someone must have edited accidentally the home dashboard
[16:14]  <Lucas_WMDE> looks like it
[16:14]  <    jynus> Does anyone know if old versions are stored?
[16:14]  <Lucas_WMDE> now the ores2003 filter is gone
[16:14]  <    bd808> I just edited it to remove the filter
[16:14]  <Lucas_WMDE> I hope it wasn’t me >.<
[16:14]  <Lucas_WMDE> ok
[16:16]  <    bd808> did the "home" dashboard normally have some filters on it or has it historically just been a raw feed?
[16:16]  <    jynus> "The dashboards and visualizations are saved inside the .kibana index on your Elasticsearch cluster. If you had backup done to it, you can recover them from there, but otherwise there is no way"
[16:17]  <    jynus> bd808: it was an actually nice overview of the main dashboards
[16:17]  <Lucas_WMDE> yeah, it was quite useful
[16:17]  <    jynus> with links grouped by function
[16:17]  <    bd808> :((

I tracked down the markdown panel that had links to lots of dashboards, but it would have been ideal to just upload a known good version of the dashboard.
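
For reference, uploading a known good version appears possible through the saved objects import API (a hedged sketch; export.ndjson stands in for whichever dump you want to re-upload, and overwrite=true replaces the current objects):

curl -X POST "http://localhost:5601/api/saved_objects/_import?overwrite=true" \
  -H 'kbn-xsrf: true' \
  --form file=@export.ndjson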

The logstash Home page dashboard was just deleted (probably accidentally). I am afraid the same thing could happen to other (or all) dashboards. Is this something we can find someone to implement? (Hopefully it is just a script plus a small amount of puppet.)

Obviously, I can help with the bacula part. CC @lmata

colewhite renamed this task from Backup kibana indices to Backup opensearch dashboards data. May 19 2022, 7:39 PM
colewhite changed the task status from Open to In Progress.
colewhite claimed this task.
colewhite triaged this task as Medium priority.
colewhite updated the task description.

Change 798886 had a related patch set uploaded (by Cwhite; author: Cwhite):

[operations/puppet@production] opensearch_dashboards: add backup script enable job

https://gerrit.wikimedia.org/r/798886

Change 798886 merged by Cwhite:

[operations/puppet@production] opensearch_dashboards: add backup script enable job

https://gerrit.wikimedia.org/r/798886

We're now capturing 30d of dashboards data locally at /srv/backups/opensearch_dashboards.

Obviously, I can help with the bacula part.

@jcrespo Let me know if there is any more info you need or if I can help set up off-host backups.

There is an issue: that model doesn't really work with Bacula. Bacula requires a static set of paths (directories or files) to include and exclude, e.g. include => '/srv', exclude => '/srv/tmp'. While we could configure /srv/backups/opensearch_dashboards, that would end up storing 30 copies of the same backups every day, which is not very efficient.

It is ok to keep additional copies locally (we in fact do that with databases and gitlab), but we should have a way to back up only the "latest" one, either by giving it a distinct, static name or by putting it in its own directory, so that only one backup is sent to bacula, e.g. every day or every week, and stored there long term. For example (although it doesn't have to be exactly the same), for databases we have "ongoing", "latest" and "archive" folders, and we back up only the latest; this prevents backing up in-progress or older dumps, and we rotate them after a successful run, always keeping at least 2 copies locally. We also synchronize bacula's pickup with the dump schedule to ensure compatibility.
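
A minimal sketch of the "static name" idea described above, assuming the dated dumps from the earlier sketch and a dedicated latest/ directory that bacula would be pointed at (names and layout are assumptions, not necessarily what the puppet change uses):

BACKUP_DIR=/srv/backups/opensearch_dashboards
mkdir -p "${BACKUP_DIR}/latest"

# Copy the newest dated dump to a fixed path; bacula would include only latest/.
NEWEST=$(ls -1t "${BACKUP_DIR}"/dashboards-*.ndjson 2>/dev/null | head -n 1)
[ -n "${NEWEST}" ] && cp -f "${NEWEST}" "${BACKUP_DIR}/latest/dashboards.ndjson"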

Once that is solved, adding it to bacula is quite easy: https://wikitech.wikimedia.org/wiki/Bacula#Adding_a_new_client

Change 802149 had a related patch set uploaded (by Cwhite; author: Cwhite):

[operations/puppet@production] opensearch_dashboards: copy latest backup to a predictable name

https://gerrit.wikimedia.org/r/802149

Change 802151 had a related patch set uploaded (by Cwhite; author: Cwhite):

[operations/puppet@production] opensearch_dashboards: add and enable bacula backups

https://gerrit.wikimedia.org/r/802151

Change 802149 merged by Cwhite:

[operations/puppet@production] opensearch_dashboards: copy latest backup to a predictable name

https://gerrit.wikimedia.org/r/802149

Change 802151 merged by Cwhite:

[operations/puppet@production] opensearch_dashboards: add and enable bacula backups

https://gerrit.wikimedia.org/r/802151

Backups rolled out and tested.

logstash2023 backups failed; I am currently investigating:

backup1001	Backup freshness	CRITICAL 	2022-06-02 07:26:40 	0d 3h 8m 36s 	3/3 	All failures: 1 (logstash2023), Fresh: 115 jobs

False alarm: because it is the first of the month, full backups are running now and that is causing a bit of delay. The current status is "C" (the job is scheduled but not yet executed), and the icinga check is strict because there is no previous run. It should run in a few hours; I will keep monitoring it to ensure it runs as expected.

root@backup1001:~$ check_bacula.py logstash2023.codfw.wmnet-Daily-production-opensearch-dashboards
id: 447132, ts: None, type: F, status: C, bytes: 0