Page MenuHomePhabricator

Improve MediaWiki session logging
Closed, ResolvedPublic

Description

Logging was of limited use for {T390514} and should be improved:

  • all session writes should be logged, not just session creations
  • differentiate between writes to new keys and rewrites
  • gain more confidence that session creations are actually logged (seems like only 10-15% of session writes create a new session, which seems too low given how many sessions seem to be associated with scrapers)
  • be able to filter by session provider type
  • be able to filter by app, if easily doable
  • make sure client-identifying information (like IP/UA) are logged

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Change #1140683 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/core@master] session: Add logging for session store writes

https://gerrit.wikimedia.org/r/1140683

Eevans added a parent task: Restricted Task.May 2 2025, 9:15 PM

Change #1140683 merged by jenkins-bot:

[mediawiki/core@master] session: Add logging for session store writes

https://gerrit.wikimedia.org/r/1140683

Dashboard: https://logstash.wikimedia.org/app/dashboards#/view/174d68c0-2c45-11f0-a1ab-6f85ab09a17d
Not super informative so far. Almost all sessions are from CentralAuth, not OAuth etc. (expected, but good to verify). Anonymous sessions are about 80% of total. Login/signup are about 70% of total, autologin is 5%. About 10-15% of writes are for API requests, the rest for web. There are a few users who do something weird with bots and have tens of thousands of session writes per day, but that's still just a drop in the bucket. IP and user agent look like a normal spread.

The number of events is exactly 1080000 for every 3 hour block on the histogram, which looks very fake. Logstash might be hitting some sort of throttling there?

Change #1143959 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/core@master] Improve session logging

https://gerrit.wikimedia.org/r/1143959

The number of events is exactly 1080000 for every 3 hour block on the histogram

which is exactly 100 / second, so yeah. The Prometheus stats imply there are something like 800 writes per second. So I guess the dashboard should be thought of as having been downsampled by one magnitude?

Not sure if this is a limitation of the dashboard, or the log event intake per normalized message (not by channel for sure, the session channel has way more stuff).

Change #1143959 merged by jenkins-bot:

[mediawiki/core@master] Improve session logging

https://gerrit.wikimedia.org/r/1143959

Change #1144532 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/core@wmf/1.44.0-wmf.28] Improve session logging

https://gerrit.wikimedia.org/r/1144532

Change #1144532 merged by jenkins-bot:

[mediawiki/core@wmf/1.44.0-wmf.28] Improve session logging

https://gerrit.wikimedia.org/r/1144532

Mentioned in SAL (#wikimedia-operations) [2025-05-12T13:59:41Z] <tgr@deploy1003> Started scap sync-world: Backport for [[gerrit:1144532|Improve session logging (T393038)]]

Mentioned in SAL (#wikimedia-operations) [2025-05-12T14:04:35Z] <tgr@deploy1003> tgr: Backport for [[gerrit:1144532|Improve session logging (T393038)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-05-12T14:17:05Z] <tgr@deploy1003> Finished scap sync-world: Backport for [[gerrit:1144532|Improve session logging (T393038)]] (duration: 17m 24s)

I think this is done, the current logs are good enough.

Logstash might be hitting some sort of throttling there?

This was fixed in T395967: Allow sampling of Logstash events.