Description
Details
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| arclamp: reduce compress days | operations/puppet | production | +1 -1 |
Event Timeline
I noticed a change in the pattern starting around Nov 26.
Custom log rotation seems to be working fine.
I took a look at the Arclamp Grafana dashboard (https://grafana.wikimedia.org/goto/q6WP_PMvg?orgId=1) and found another perturbation during June/July, likely related to “Errors from MediaWiki sampler.”
On Gerrit (https://gerrit.wikimedia.org/r/q/mergedbefore:2025-11-26+mergedafter:2025-11-24+excimer), I found the patch https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1160800 with a comment by Timo (https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1160800/comments/b5f4cad1_158b730c) stating that, theoretically, it does not affect Excimer.
Change #1218821 had a related patch set uploaded (by Herron; author: Herron):
[operations/puppet@production] arclamp: reduce compress days
^ I've done this to free up space in the past. It should be OK as a stopgap while the root cause is being addressed.
Change #1218821 merged by Herron:
[operations/puppet@production] arclamp: reduce compress days
Maybe we should look into implementing a way for arclamp to create tasks when this issue happens.
I learned some things.
- Each excimer stack frame is stored 4 times in uncompressed log files: daily-all, daily-tag, hourly-all, and hourly-tag.
- The compress job processes the log files serially: it sorts each one into a separate file, then compresses the sorted version. The sorting is done to improve the gzip compression ratio.
- Once a log file is compressed, the compressed file is uploaded to Swift.
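To illustrate why sorting before compression helps, here is a minimal Python sketch. It is not the arclamp compress job; the sample lines, frame names, and helper functions (`make_sample_log`, `gzip_size`) are made up for illustration. It only shows the general effect that grouping identical lines together gives gzip longer repeated runs to work with.

```python
import gzip
import random


def make_sample_log(n_lines=5000, n_stacks=200, seed=0):
    """Build made-up collapsed-stack lines; this is not real arclamp data."""
    rng = random.Random(seed)
    frames = [f"Class{i}::method{i}" for i in range(30)]
    # Each hypothetical stack is a ';'-joined chain of 3 to 8 frames.
    stacks = [
        ";".join(rng.choices(frames, k=rng.randint(3, 8)))
        for _ in range(n_stacks)
    ]
    # Every log line is one randomly chosen stack followed by a sample count.
    return [f"{rng.choice(stacks)} 1" for _ in range(n_lines)]


def gzip_size(lines):
    """Return the gzip-compressed size in bytes of the newline-joined lines."""
    data = ("\n".join(lines) + "\n").encode("utf-8")
    return len(gzip.compress(data))


lines = make_sample_log()
unsorted_size = gzip_size(lines)
sorted_size = gzip_size(sorted(lines))
print(f"unsorted: {unsorted_size} bytes, sorted: {sorted_size} bytes")
```

On this synthetic input the sorted version compresses to fewer bytes than the unsorted one. The trade-off is the one noted above: the sorted copy is written to a separate file first, so it temporarily needs extra disk space.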
Log volume increased between Nov 11-20 (avg 8,660,451) and Dec 1-10 (avg 9,909,823). (These averages are deduplicated, so 4x these values are on disk.)
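As a quick check of those numbers, the following Python sketch works out the relative increase and the on-disk totals. The two averages and the 4x factor come from the comment above; the variable names are mine, and the unit is whatever the averages are measured in, which the comment does not state.

```python
# Deduplicated averages quoted above (unit not stated in the task).
nov_avg = 8_660_451  # Nov 11-20
dec_avg = 9_909_823  # Dec 1-10

# Each stack frame is stored four times uncompressed:
# daily-all, daily-tag, hourly-all, and hourly-tag.
COPIES = 4

increase = dec_avg - nov_avg
pct_increase = increase / nov_avg * 100

on_disk_nov = nov_avg * COPIES
on_disk_dec = dec_avg * COPIES

print(f"increase: {increase:,} ({pct_increase:.1f}%)")
print(f"on disk: {on_disk_nov:,} -> {on_disk_dec:,}")
```

That is an increase of 1,249,372 in the deduplicated average, roughly 14%, or about 5 million more on disk once the four copies are counted.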



