
Improve logging, monitoring and test coverage for MediaWiki Platform team authentication extensions
Open, Needs Triage, Public

Description

MediaWiki Platform team owned authentication extensions should have good logging and monitoring coverage to help debugging and incident response, and to allow periodic health checks and alerts to spot issues early. This is a tracking/planning task to organize subtasks.

  • MediaWiki-Core-AuthManager
    • Has very good unit and integration test coverage, but no end-to-end tests. AuthManager/SessionManager itself isn't really end-to-end-testable as it's just a framework, but the default providers (password, temp password, cookie) and the CentralAuth providers should have such tests (both API and browser). See also T125599#2005924.
    • Maybe create smoke tests for Set-Cookie poisoning (T274514 T264370 T264369 T256395 etc) detection in the caching layers.
    • AFAIK it has decent logging. We should probably ask Security-Team if they need any improvements. Maybe look into T246471/T246462.
      • Make sure bot password management is logged per T151590. IIRC this is done but we should double-check.
    • Logging is extensive, but the SessionManager messages are hard to understand. We should improve the text or documentation of the cryptic session-related log messages: T181869 T158365 T204459 T204787 T292812
      • Maybe review logging for the Set-Cookie poisoning issues.
    • The main monitoring mechanism is the authevents channel, which is partially broken (autocreation, specifically) due to T275085 / gerrit 658443. (There is also the authmanager channel, basically the same data.) Other than the breakage, I think it is in okay shape.
      • Once T240685: MediaWiki Prometheus support is done and it's possible to monitor multi-dimensional stats, we should rethink what information to add. (Wiki? Browser family? Authn provider name?)
  • MediaWiki-extensions-CentralAuth (split by functionality/component because it is large)
    • SSO (authentication providers, CentralIdLookup, SpecialCentralLogin, SpecialCentralAutoLogin)
    • Authorization (global groups, wiki sets)
      • No tests for wiki sets.
      • It's about publicly logged actions so I don't think logging or monitoring is particularly important.
    • User management (locking, suppression, merge/unmerge, delete, create local account)
      • Local account creation has tests, locking has some basic tests but the rest doesn't seem to.
      • Used by power users so logging/monitoring is not that important.
    • Global rename
      • Has decent test coverage and logging. Probably too infrequent for monitoring to be meaningful.
  • MediaWiki-extensions-OAuth
    • High-stakes extension, about 30% of all edits happen via OAuth. It badly needs end-to-end tests.
    • Has decent PHPUnit test coverage, but some important classes are not covered, e.g. Control, RefreshTokenRepository, ScopeRepository, MWOAuthServer, MWOAuthDataStore.
    • Should log consumer creation/management events (T151590).
    • Should log authorizations, revocations and the use of the identify/user profile endpoint. This was done in T208007 but that was before OAuth 2 support was added, so it probably needs to be redone.
    • Error messages given to clients are in general pretty bad and confusing (T245477). That should be included, but in cases where we don't want to tell the user the error details, or it's not easy to get them from the error site to the output (since we use multiple external libraries), they should at least be logged so the support desk can look into the issue.
    • There should probably be monitoring of OAuth request rate, split by whether authentication was successful, by OAuth version, and whether it was owner-only. This would help catch major breakage quickly via alerts.
  • MediaWiki-extensions-OAuthRateLimiter
    • Seems to have decent PHPUnit coverage. Should probably have an end-to-end test that actually checks for the presence of a rate limit in the JWT.
    • Doesn't really need logging or monitoring.
  • MediaWiki-extensions-OATHAuth
    • Lacks unit tests. There should probably be tests for the user repository, more tests for the module registry, more tests for TOTPKey, a test for the authentication provider. Probably also an AuthManager integration test for the authentication provider.
    • Should probably have a browser test for the main workflow, if we can reliably mock or simulate the TOTP logic. Maybe also tests for enable/disable.
    • Should have monitoring for successful/unsuccessful checks, so it can be used for alerts (T150903: Alert sre/security on many 2FA failures)
  • WebAuthn (also MediaWiki-extensions-OATHAuth; see T303495 about merging the two extensions)
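
On the OATHAuth browser test item above ("if we can reliably mock or simulate the TOTP logic"): one possible approach is to have the test compute the expected code itself from a known secret and an injected timestamp, so nothing depends on the wall clock. The sketch below is a generic RFC 6238 TOTP implementation in Python, not OATHAuth's actual code; the function name `totp` and the constant `RFC_SECRET` are illustrative assumptions, and the secret is the published RFC 6238 test secret rather than anything used in production.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for an explicit timestamp.

    Passing the time in, instead of reading the clock, is what makes
    the result reproducible in a test.
    """
    # Number of whole time steps since the Unix epoch.
    counter = unix_time // step
    # HOTP (RFC 4226): HMAC over the 8-byte big-endian counter.
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select the offset.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# Shared secret from the RFC 6238 test vectors (SHA-1 variant).
RFC_SECRET = b"12345678901234567890"

# Hypothetical test usage: enroll the test user with RFC_SECRET, then
# submit totp(RFC_SECRET, frozen_time) as the second factor.
```

A browser or API test built this way would still need some means of pinning the server's notion of "now" (or of tolerating the verification window); how to do that in the MediaWiki test environment is left open here.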

Event Timeline

Tgr removed Tgr as the assignee of this task. Oct 23 2023, 4:19 PM

There is also T346327: Track impressions, success and abandonment rate on the signup form being done by the Growth team, which is focused on usability rather than operations, but there's probably some overlap.