Page MenuHomePhabricator

Code Stewardship Review: OAuth extension
Closed, ResolvedPublic

Description

Intro

The OAuth extension is currently without a Code Steward and is exclusively maintained by the community. Recently this extensions unit tests seem to have stopped working.

Number, severity, and age of known and confirmed security issues

See https://phabricator.wikimedia.org/maniphest/query/8ylEU1K_uP9M/#R for those who have access to Security issues

Was it a cause of production outages or incidents? List them.

TBD

Does it have sufficient hardware resources for now and the near future (to take into account expected usage growth)?

n/a

Is it a frequent cause of monitoring alerts that need action, and are they addressed timely and appropriately?

No

When it was first deployed to Wikimedia production

August 2013 in rOMWC65b517e132eae567a5bf7be8e1912c63fd0da38f

Usage statistics based on audience(s) served

TBD

Changes committed in last 1, 3, 6, and 12 months

Reliance on outdated platforms (e.g. operating systems)

n/a

Number of developers who committed code in the last 1, 3, 6, and 12 months

Number and age of open patches

See https://gerrit.wikimedia.org/r/#/q/status:open+project:mediawiki/extensions/OAuth

Number and age of open bugs

See https://phabricator.wikimedia.org/maniphest/query/FPbf.0Tdtg6C/#R

Number of known dependencies?

TBD

Is there a replacement/alternative for the feature? Is there a plan for a replacement?

No plan for replacement.

Submitter's recommendation (what do you propose be done?)

Find Code Steward.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

I found this after seeing https://www.mediawiki.org/w/index.php?diff=3261645&oldid=3261642 and searching, as the relevant project tags don't seem to have been added to this task. If this is about MediaWiki-extensions-OAuth as that edit indicates, the summary seems incorrect. If it's not about that extension, then the confusion should be cleared up.

The OAuth extension is currently without a Code Steward and is exclusively maintained by the community.

Mostly it has been being maintained lately by @Tgr, I believe. I had been thinking that was in his work with the Reading Infrastructure team, but I don't know for sure.

I also watch for tasks reporting bugs, but I haven't had time lately to pay attention to feature enhancements. The extension is stable and there aren't many bugs.

Recently this extensions unit tests seem to have stopped working.

{{citation needed}}. Tests on recent patches such as https://gerrit.wikimedia.org/r/c/mediawiki/extensions/OAuth/+/513737 (on the same day this task was created) passed.

{{citation needed}}. Tests on recent patches such as https://gerrit.wikimedia.org/r/c/mediawiki/extensions/OAuth/+/513737 (on the same day this task was created) passed.

<ref> T223225: CI failing in CirrusSearch tests with “Undefined index: REQUEST_TIME_FLOAT” </ref>.

In summary: The tests for OAuth were failing for a while, then after we could not merge/deploy OAuth commits for ~ 3 weeks, the issue was fixed by https://gerrit.wikimedia.org/r/513746.

{{citation needed}}. Tests on recent patches such as https://gerrit.wikimedia.org/r/c/mediawiki/extensions/OAuth/+/513737 (on the same day this task was created) passed.

<ref> T223225: CI failing in CirrusSearch tests with “Undefined index: REQUEST_TIME_FLOAT” </ref>.

{{failed verification}} From that task it seems OAuth's tests weren't actually failing. They somehow (no one has determined how) caused something else to start failing, due to manipulation of $_SERVER.

As titled, I'd probably have overlooked that task because it claims to be about CirrusSearch unless someone @-mentioned me in a relevant comment.

Edit: Ok, looking through recent Gerrit it seems OAuth was failing too and people were just forcing it. Still, the task as written doesn't seem relevant when looking through a Phab notifications list.

Yeah, OAuth's PHPUnit test classes weren't failing because it didn't cover or integrate with any core or extension code that depended on the PHP feature it broke. However, OAuth's build (aka "its tests", including the shared gate it runs) was failing, making it not possible for anyone to merge or deploy trivial or hotfix commits in the extension through normal procedures.

This is an example of an issue that a "steward of the OAuth extension" would be expected to triage and perhaps perform the preliminary investigation of with relatively high urgency and/or to temporarily disable the OAuth test class were leak originated. Given that role is not currently held (officially), I recommended the extension last month for stewardship review.

I have been thinking about filing something like this for a while. Given the importance of OAuth (the majority of external tools rely on it, and it's a big part of power user workflows) it seems somewhat surprising how little support it has. Although the same could probably be said about the entire auth stack. And as Anomie says it's reasonably stable, what's missing is someone working on improving it. It has poor test coverage, poor mobile support for the authorization flow, the workflow for developing an OAuth app is terrible, there are a few parts that should be hardened as layered defense...

Mostly it has been being maintained lately by @Tgr, I believe. I had been thinking that was in his work with the Reading Infrastructure team, but I don't know for sure.

In the sense of managers looking the other way when I spend time on it, yes. It's not officially related to the team. (I think it was on the roadmap for the old RI team led by Bryan, but not since then.)

Number, severity, and age of known and confirmed security issues

Basically zero AFAIK. Security gets used both for vulnerabilities and for missing features in code doing some kind of security-related role, and for OAuth it tends to be the latter - logging could be improved, IP-based anti-abuse features are broken by the use of OAuth tools and so on. Of the linked list, T103023, T103022, T148600 and T124224 are the only actual vulnerabilities; two of those happened in 2015 and two in 2016.

Was it a cause of production outages or incidents? List them.

AFAIK there was one this year, T218608: OAuth doesn't work when $wgBlockDisablesLogin is true.

Is there a replacement/alternative for the feature? Is there a plan for a replacement?

There was some talk of including T125337: Add OAuth 2.0 support to MediaWiki to support Discourse in the MediaWiki-REST-API project, but no specifics so far. Although that would probably be a refactoring, not a replacement.

greg triaged this task as Medium priority.Jul 3 2019, 10:29 PM

Just a note in my role as concerned stakeholder, the OAuth functionality is used by a number of Wikimedia tools including things hosted on Toolforge and elsewhere. As far as I know it is the only secure way for a Wikimedia user to delegate account access to an external tool. It is central to Quarry and PAWS. It is used by Striker to securely associate SUL accounts with Developer accounts. It is also used by Phabricator as the preferred account creation and authentication system. See https://meta.wikimedia.org/wiki/Special:OAuthListConsumers for the ~900 currently approved OAuth applications.

I would *love* to see this project adopted by a Wikimedia Foundation or affiliate engineering team. Naively it seems like a natural fit for a team maintaining either the core AuthManager functionality or the Action API (and any future MediaWiki native APIs). The next most logical owner would be a group such as the Wikimedia Foundation's Technical Engagement team (/me waves to @Bmueller). Currently the Technical Engagement team is not staffed with sufficient software engineering staff to take this on, but adding staff could be a recommendation of the stewardship review!

CPT had plans to work on OAuth 2 (T229500: Add OAuth 2.0 support to MediaWiki) so maybe they are interested in taking ownership.

OTOH if you look at what kind of changes were needed in the extension, those are almost always bugs or missing features causing problems to tool authors, and in some cases the cloud team's own usage of OAuth, so in agile terms TechEng is the pig here, and would make a better owner. (Of course that might change if OAuth 2 does indeed become a fundamental part of the new REST API and does end up used widely. Although if OAuth 2 does become a key part of the API, maybe it should be in core and not a separate extension?)

@Jrbranaa Platform have done quite a bit on this I'm marking us as owners unless there is an objection and this task can likely resolve.

Removing task assignee due to inactivity, as this open task has been assigned for more than two years (see emails sent to assignee on May26 and Jun17, and T270544). Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be very welcome!

(See https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips how to best manage your individual work in Phabricator.)

For everyone's info, currently no Code-Stewardship-Reviews are taking place as there is no clear path forward and as this is not prioritized work.
(Entirely personal opinion: I also assume lack of decision authority due to WMF not having a CTO currently. However, discussing this is off-topic for this task.)

Should this close as by Platform Engineering is responsible for this extension?

Should this close as by Platform Engineering is responsible for this extension?

Does that mean firefighting responsibility or plans to actively improve it (e.g. the terrible app submission UX)?

AIUI both could be possible goals for a code stewardship review; I would argue that given the importance of OAuth in a huge variety of workflows, here the goal of the process should be a latter (although having clear responsibility for fixing regressions is certainly an improvement).

@Tgr That is an entirely reasonable question - maybe that is a reason for Platform to remove itself for now from the maintainers list. I think logically this would live near the API Platform team. Having said that, I think, we would need to maybe consider putting it back in stewardship review because we aren't resourced to support it in the manner you describe fully. Having that aspect as a key criteria for stewardship would be great - that taking responsibility should come with appropriate resourcing.

OAuth 2 access token generation was down (to some extent?) on group 1 for 24 hours due to T321160: Lcobucci\JWT\Signer\InvalidKeyProvided: Key cannot be empty. Per logstash a total of 167 requests failed, most on meta, some on Commons. (A surprisingly small amount, so maybe only code paths were broken? Or maybe some errors did not get logged? AccessTokenEntity::convertToJWT() was broken which I think should have blocked anything that involved creating new access tokens, ie. starting new sessions on OAuth 2 based web tools, but I can't navigate the codebase well enough to say that with confidence.) It impacted Wikimedia login (e.g. as used on Diff or the Movement Strategy Forum), generation of new API keys and probably more. There is no systematic monitoring of OAuth health so it only got noticed during train triage of the error log and only got fixed after a day. Fortunately most community tooling is still on the (older and less user-friendly) OAuth 1 protocol, otherwise this would have been massively disruptive.

OAuth 1 based logins have been randomly failing for the last several weeks (probably related to the DC switchover), causing disruption in a number of important WMF and community tools such as the Wikipedia Library or CopyPatrol: T332650: Frequent OAuth failures on Wikimedia wikis since eqiad was repooled due to db-mainstash replication lag

OAuth 1 based logins have been randomly failing for the last several weeks (probably related to the DC switchover), causing disruption in a number of important WMF and community tools such as the Wikipedia Library or CopyPatrol: T332650: Frequent OAuth failures on Wikimedia wikis since eqiad was repooled due to db-mainstash replication lag

This specific issue is at least fully-resolved now, correct?

This specific issue is at least fully-resolved now, correct?

Yes. I just wanted some record of it - OAuth is a key extension, I think this issue was indicative of the problematicness of the status quo.

Resolving this request: The MediaWiki Engineering Group takes on stewardship for OAuth, supported by Security-Team (cc @acooper). You can use MediaWiki-Platform-Team as tag if you'd like to get things surfaced to the team. Updates to the developers/maintainers page have been made!