
503 error in Netbox accounting report cause systemd alert
Closed, Resolved, Public

Description

https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=netbox1001&service=Check+systemd+state
It was triggered by:

ayounsi@netbox1001:~$ sudo service netbox_report_accounting_run status
● netbox_report_accounting_run.service - Run report accounting.Accounting in Netbox
   Loaded: loaded (/lib/systemd/system/netbox_report_accounting_run.service; static; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2020-05-04 07:44:55 UTC; 10min ago
  Process: 6992 ExecStart=/srv/deployment/netbox/venv/bin/python /srv/deployment/netbox/deploy/src/netbox/manage.py runreport accounting.Accounting (code=exited, status=1/FAILURE)
 Main PID: 6992 (code=exited, status=1/FAILURE)

May 04 07:44:55 netbox1001 python[6992]:     config["service-credentials"], config["accounting"]["sheet_id"], config["accounting"]["range"],
May 04 07:44:55 netbox1001 python[6992]:   File "/srv/deployment/netbox-extras//reports/accounting.py", line 57, in get_assets_from_accounting
May 04 07:44:55 netbox1001 python[6992]:     dateTimeRenderOption="FORMATTED_STRING",
May 04 07:44:55 netbox1001 python[6992]:   File "/srv/deployment/netbox/venv/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
May 04 07:44:55 netbox1001 python[6992]:     return wrapped(*args, **kwargs)
May 04 07:44:55 netbox1001 python[6992]:   File "/srv/deployment/netbox/venv/lib/python3.7/site-packages/googleapiclient/http.py", line 898, in execute
May 04 07:44:55 netbox1001 python[6992]:     raise HttpError(resp, content, uri=self.uri)
May 04 07:44:55 netbox1001 python[6992]: googleapiclient.errors.HttpError: <HttpError 503 when requesting https://sheets.googleapis.com/v4/spreadsheets/[REDACTED]
May 04 07:44:55 netbox1001 systemd[1]: netbox_report_accounting_run.service: Main process exited, code=exited, status=1/FAILURE
May 04 07:44:55 netbox1001 systemd[1]: netbox_report_accounting_run.service: Failed with result 'exit-code'.

I *think* the exception should be caught rather than trigger a systemd alert. On the other hand, we should somehow alert on stale data.
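One way to catch the exception is to retry 5xx responses a few times with exponential backoff, and then log and return instead of exiting non-zero if the API is still unavailable. The following is a minimal sketch, not the actual `accounting.py` code. `ApiHTTPError` is a hypothetical stand-in for `googleapiclient.errors.HttpError` so the example runs without the Google client. `fetch_with_retry` and `run_report` are illustrative names, and the retry count and delays are assumptions. Note that googleapiclient's request `execute()` also accepts a `num_retries` argument for retries with backoff, which may be the simpler fix in practice.

```python
import time


class ApiHTTPError(Exception):
    """Hypothetical stand-in for googleapiclient.errors.HttpError,
    carrying the HTTP status code of the failed request."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_server_error(exc):
    """True for 5xx responses such as the 503 seen in this task."""
    return 500 <= exc.status < 600


def fetch_with_retry(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying 5xx errors with exponential backoff.

    Non-5xx errors are re-raised immediately. After the last attempt the
    5xx error is re-raised so the caller decides how to report it.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except ApiHTTPError as exc:
            if not is_server_error(exc) or attempt == attempts - 1:
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between tries.
            sleep(base_delay * 2 ** attempt)


def run_report(fetch, sleep=time.sleep):
    """Return the fetched rows, or None if the API stayed unavailable.

    Returning None (instead of letting the exception escape) keeps the
    process exit code at 0, so systemd does not mark the unit as failed.
    """
    try:
        return fetch_with_retry(fetch, sleep=sleep)
    except ApiHTTPError as exc:
        if is_server_error(exc):
            print(f"accounting sheet unavailable ({exc}); keeping previous results")
            return None
        raise
```

The trade-off is that a swallowed error no longer surfaces anywhere. This is why a separate staleness alert is needed.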

I'm not sure whether the report returns "failed" when the run is unsuccessful or keeps its previous state, nor whether its "Last run" date can be used to detect stale data.
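If a timestamp of the last *successful* run is available (whether Netbox's "Last run" date tracks success is the open question above), a staleness check could compare its age against thresholds and use the standard Nagios/Icinga plugin exit codes. This is a hypothetical sketch. `check_staleness` is an illustrative name, and the 26h/50h thresholds are assumptions that would need to match the report's real schedule.

```python
from datetime import datetime, timedelta, timezone

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_staleness(last_success, now=None,
                    warn_after=timedelta(hours=26),   # assumed threshold
                    crit_after=timedelta(hours=50)):  # assumed threshold
    """Return (exit_code, message) for the age of the last successful run.

    last_success is a timezone-aware datetime, or None if the report
    has never completed successfully.
    """
    if last_success is None:
        return UNKNOWN, "UNKNOWN: report has never completed successfully"
    now = now or datetime.now(timezone.utc)
    age = now - last_success
    if age >= crit_after:
        return CRITICAL, f"CRITICAL: last successful run {age} ago"
    if age >= warn_after:
        return WARNING, f"WARNING: last successful run {age} ago"
    return OK, f"OK: last successful run {age} ago"
```

This moves alerting from "one run failed" to "data has not been refreshed for too long", so a transient 503 like this one would not page anyone.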

Event Timeline

crusnov raised the priority of this task from Medium to Needs Triage.
crusnov triaged this task as Medium priority.
crusnov claimed this task.

This appears to have been a transient error, as it has resolved.