
Implement Enterprise auth token refresh logic
Open, Needs Triage · Public · Spike

Description

The Enterprise auth layer is strict about token freshness. The refresh_token is valid for 90 days, and the access_token is valid for 24 hours. Our client will be used in a long-running periodic job that will probably be launched once per month and can take as long as one week to complete. A week-long run outlasts the 24-hour access_token several times over, so there's a high chance of either token timing out during or between job runs.

Our client must be designed to proactively keep track of these timeouts, and recover from failures:

  • Store the auth token in application memory (e.g. in Erlang Term Storage, :ets). Ideally it is only refreshed when necessary, so that each token lasts the full 24h.
  • Parse the expires_in time limit returned with authentication responses, and store it as a future datetime.
  • If the job is started within 6 weeks of a refresh_token timeout, send an alert email.
  • If the access_token will expire in less than e.g. 2h, proactively get a fresh token.
  • If an API call results in a token error, or if the stored expiry datetime is in the past, get a new access token and store it.
  • If the refresh token fails to authenticate, send a very loud alert email.
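
The expiry rules above can be sketched as a few pure functions. This is only an illustration in Python, not the planned Elixir implementation: the function names are hypothetical, and it assumes expires_in is a lifetime in seconds (as in OAuth 2.0), which the task does not state.

```python
from datetime import datetime, timedelta, timezone

# Thresholds taken from the list above; both are tunable.
ACCESS_REFRESH_MARGIN = timedelta(hours=2)   # proactively refresh below this
REFRESH_ALERT_WINDOW = timedelta(weeks=6)    # alert if refresh_token expires within this


def expiry_from_response(expires_in_seconds, now):
    """Turn the relative expires_in value into an absolute datetime.

    Assumption: expires_in is expressed in seconds.
    """
    return now + timedelta(seconds=expires_in_seconds)


def access_token_action(expires_at, now):
    """Return "refresh" if no token is stored, it has expired, or it
    expires within the margin; otherwise return "reuse"."""
    if expires_at is None or expires_at - now < ACCESS_REFRESH_MARGIN:
        return "refresh"
    return "reuse"


def should_alert_on_refresh_token(refresh_expires_at, now):
    """True when a job starts within the alert window of refresh_token expiry."""
    return refresh_expires_at - now <= REFRESH_ALERT_WINDOW


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    expires_at = expiry_from_response(24 * 3600, now)
    print(access_token_action(expires_at, now))
    print(should_alert_on_refresh_token(now + timedelta(days=30), now))
```

Keeping the clock as an explicit `now` argument makes the thresholds easy to test without waiting for real tokens to expire. An expired token needs no separate branch, since a negative remaining lifetime is also below the margin.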

Implementation notes

  • If possible, rely on the Airflow capabilities to send alert emails.
  • In Elixir, ideally we would manage the token expiries using a standalone process which can refresh the tokens as needed and keep some global state. This belongs inside the mediawiki_client library.
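
As a language-neutral sketch of the "single owner of token state" idea, the Python class below serializes access with a lock, which is roughly what a standalone Elixir process would provide by handling one message at a time. The names `TokenManager`, `fetch_token`, and `invalidate` are hypothetical, and `fetch_token` stands in for whatever call exchanges the refresh_token for a new access_token.

```python
import threading
from datetime import datetime, timedelta, timezone


class TokenManager:
    """Holds the current access token and refreshes it when needed."""

    def __init__(self, fetch_token, margin=timedelta(hours=2), clock=None):
        # fetch_token: hypothetical callable returning (token, expires_in_seconds)
        self._fetch_token = fetch_token
        self._margin = margin
        self._clock = clock or (lambda: datetime.now(timezone.utc))
        self._lock = threading.Lock()
        self._token = None
        self._expires_at = None

    def get_token(self):
        """Return a token that is valid for at least `margin` longer."""
        with self._lock:
            now = self._clock()
            if self._token is None or self._expires_at - now < self._margin:
                token, expires_in = self._fetch_token()
                self._token = token
                self._expires_at = now + timedelta(seconds=expires_in)
            return self._token

    def invalidate(self):
        """Drop the stored token, e.g. after an API call reports a token error."""
        with self._lock:
            self._token = None
            self._expires_at = None


if __name__ == "__main__":
    manager = TokenManager(lambda: ("example-token", 24 * 3600))
    print(manager.get_token())
```

A caller that receives a token error from the API would call `invalidate()` and then `get_token()` again, so the retry logic stays outside the shared state. Alerting when `fetch_token` itself fails is left out of this sketch.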

Open questions

  • Could we keep two refresh_tokens available, and switch over when the first expires? These would be renewed e.g. 45 days apart.
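
If the two-token idea were adopted, the switch-over could be as small as the selection below. This is a hypothetical Python sketch; `pick_refresh_token` and the (token, expires_at) tuple layout are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone


def pick_refresh_token(tokens, now):
    """Given a list of (token, expires_at) pairs, return the still-valid
    token that expires soonest, or None if every token has expired.

    Using the soonest-expiring token first keeps the later one in reserve.
    """
    valid = [pair for pair in tokens if pair[1] > now]
    if not valid:
        return None
    return min(valid, key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    tokens = [("first", now + timedelta(days=10)),
              ("second", now + timedelta(days=55))]
    print(pick_refresh_token(tokens, now))
```

A `None` result corresponds to the "very loud alert" case, since no refresh_token is left to authenticate with.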