
grafana-loki.service Failed on grafana2001
Closed, Resolved | Public | BUG REPORT

Description

This issue appears to be related to the Write-Ahead Log (WAL) segments that Loki uses to record incoming log data. Here is the relevant part of the log:

Feb 07 16:59:55 grafana2001 loki[1060812]: level=error ts=2024-02-07T16:59:55.085085428Z caller=log.go:100 msg="error running loki" err="segments are not sequential\ngithub.com/grafana/loki/vendor/github.com/prometheus/prometheus/tsdb/wal.listSegments\n\>
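The stack trace points at `wal.listSegments` in the Prometheus TSDB code vendored into Loki. The following is a simplified Python model of that kind of check, not Loki's actual Go code. It assumes that segment files are named by a zero-padded integer index (e.g. `00000003`) and that any gap in the sorted indices is rejected; `check_sequential` and `list_segments` are illustrative names.

```python
import os


def check_sequential(names):
    """Return sorted segment indices from WAL entry names, or raise on a gap.

    Simplified model: segment files are named by zero-padded integer index
    (e.g. "00000003"); other entries such as "checkpoint.00000002" are ignored.
    """
    indices = sorted(int(n) for n in names if n.isdigit())
    for prev, cur in zip(indices, indices[1:]):
        if cur != prev + 1:
            raise ValueError(f"segments are not sequential: {prev:08d} -> {cur:08d}")
    return indices


def list_segments(wal_dir):
    """Apply the same check to a real WAL directory, e.g. /srv/loki/wal."""
    return check_sequential(os.listdir(wal_dir))


if __name__ == "__main__":
    print(check_sequential(["00000000", "00000001", "checkpoint.00000000"]))  # [0, 1]
    try:
        # Low-numbered segments written locally, mixed with a higher-numbered
        # segment that came from another host.
        check_sequential(["00000000", "00000001", "00000117"])
    except ValueError as err:
        print(err)
```

In this model, a WAL directory that holds segments from two different writers almost always has a gap, which is enough to stop the loader at startup.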

Event Timeline

I took these steps to fix the problem:

  1. Stop grafana-loki.service on grafana2001.
  2. Delete the corrupt Write-Ahead Log (WAL) at /srv/loki/wal/ on grafana2001.
  3. Synchronize the /srv/loki/ data directory from grafana1002 to grafana2001 using rsync-loki-data.service.
  4. Restart grafana-loki.service.
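The four steps above can be written down as an ordered runbook that only prints by default. This is a hypothetical Python sketch, not a script from this task: `recovery_plan` and `run` are illustrative names, and it assumes the commands run as root on grafana2001 and that `rsync-loki-data.service` is a `Type=oneshot` unit pulling from grafana1002, so that `systemctl start` returns only after the sync has finished.

```python
import shlex
import subprocess

# Path taken from the task description.
WAL_DIR = "/srv/loki/wal"


def recovery_plan(wal_dir=WAL_DIR):
    """Hypothetical runbook: commands to run on grafana2001, in order."""
    return [
        # 1. Stop Loki so nothing writes to the WAL while we work.
        ["systemctl", "stop", "grafana-loki.service"],
        # 2. Delete the corrupt WAL directory.
        ["rm", "-rf", wal_dir],
        # 3. Pull /srv/loki/ from grafana1002. Assumption: this is a
        #    Type=oneshot unit, so `systemctl start` blocks until rsync exits.
        ["systemctl", "start", "rsync-loki-data.service"],
        # 4. Only now start Loki again, after the sync has completed.
        ["systemctl", "start", "grafana-loki.service"],
    ]


def run(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(("would run: " if dry_run else "+ ") + shlex.join(argv))
        if not dry_run:
            # check=True aborts the runbook at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(recovery_plan())  # dry run by default
```

The ordering is the point of the sketch: Loki is started only after the sync step has returned, so it cannot begin writing a fresh WAL into a directory that is still being populated.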

The problem is resolved for now, but the root cause still needs further investigation.

Thank you for taking a look! I believe this was caused by Loki starting on grafana2001 before the sync and writing its own WAL, which then got mixed with the WAL segments synced from grafana1002. All good now.

Change 999820 had a related patch set uploaded (by Andrea Denisse; author: Andrea Denisse):

[operations/puppet@production] grafana: Prevent race condition by excluding 'wal' directory in Loki sync

https://gerrit.wikimedia.org/r/999820

Change 999820 merged by Andrea Denisse:

[operations/puppet@production] grafana: Prevent race condition by excluding 'wal' directory in Loki sync

https://gerrit.wikimedia.org/r/999820
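The merged change excludes the `wal` directory from the Loki sync. Assuming `rsync-loki-data.service` wraps a plain `rsync` pull (the actual Puppet change in Gerrit 999820 may express this differently), the idea looks roughly like the following. `build_loki_sync_cmd` is a hypothetical helper, and `SOURCE_HOST:/srv/loki/` is a placeholder for wherever grafana1002's data is read from.

```python
import shlex


def build_loki_sync_cmd(src, dest="/srv/loki/"):
    """Hypothetical rsync command that mirrors Loki data but skips the WAL.

    `src` is a placeholder for the primary's /srv/loki/; a trailing slash on it
    copies the directory's contents rather than the directory itself.
    """
    return [
        "rsync",
        "--archive",  # recurse and preserve permissions, times, symlinks
        "--delete",   # remove files on the replica that are gone on the primary
        # Leading "/" anchors the pattern to the transfer root and trailing "/"
        # matches directories only. Excluded paths are also shielded from
        # --delete (unless --delete-excluded is given), so the replica's own WAL
        # is neither overwritten nor removed.
        "--exclude=/wal/",
        src,
        dest,
    ]


if __name__ == "__main__":
    print(shlex.join(build_loki_sync_cmd("SOURCE_HOST:/srv/loki/")))
```

With the WAL excluded, each host's WAL is only ever written by its own Loki process, so the two sets of segments can no longer be interleaved regardless of whether Loki starts before or after the sync.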