Investigate external API call error on Special:GlobalContributions
Closed, Resolved (Public)

Description

Account support has been deployed for Special:GlobalContributions and it seems like this unresolvable error is showing up on search results:

image.png (286×1 px, 44 KB)

This error occurs when an external permission check fails. We should investigate and see if all calls are failing or if only some calls are. It looks like external revisions can be seen (see screenshot), but I'm not sure if any hidden ones the user may have access to are available at the moment:

image.png (894×374 px, 108 KB)

Acceptance Criteria:

  • Log a warning when this error is hit. It should probably detail the request and the call results (and the user/wiki, if those aren't already included), so the failure can be replicated or checked against if necessary.
  • The error no longer shows up as frequently (the frequency seems to be "always" at the moment)

Event Timeline

  • Log a warning when this error is hit. It should probably detail the request and the call results (and the user/wiki, if those aren't already included), so the failure can be replicated or checked against if necessary.

We do log this already, from CheckUserApiRequestAggregator.

The CheckUserApiRequestAggregator is based on Echo's ForeignWikiRequest, which logs a similar error when lookups fail. Filtering the error logs to the normalized message that they both use, we see thousands in the last week from each: https://logstash.wikimedia.org/goto/e23c2cb3342a3d01864c4ae9c653cebd

image.png (300×908 px, 25 KB)

The contents of the response seem to differ between Echo and CheckUser:

  • Filtering for "upstream connect error or disconnect/reset before headers. reset reason: connection failure" retains >80% of the Echo logs and <1% of the CheckUser logs.
  • Filtering for "(curl error: no status set)" retains >98% of the CheckUser logs and <10% of the Echo logs.
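The comparison above amounts to computing, per extension, the share of error logs that survive a substring filter. A minimal sketch of that calculation, assuming the logs have been exported as (source, message) pairs; the function name and the sample records are made up for illustration and are not real Logstash data:

```python
from collections import Counter


def retention_by_source(records, needle):
    """Return, per source, the fraction of records whose message contains needle."""
    totals = Counter()
    kept = Counter()
    for source, message in records:
        totals[source] += 1
        if needle in message:
            kept[source] += 1
    return {source: kept[source] / totals[source] for source in totals}


# Made-up sample records, for illustration only.
SAMPLE = [
    ("Echo", "upstream connect error or disconnect/reset before headers. "
             "reset reason: connection failure"),
    ("Echo", "(curl error: no status set)"),
    ("CheckUser", "(curl error: no status set)"),
    ("CheckUser", "(curl error: no status set)"),
]

shares = retention_by_source(SAMPLE, "(curl error: no status set)")
```

Running the same function with each of the two messages gives the kind of per-extension split quoted in the bullets.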

Things to note:

  • Other errors are being encountered - these two don't account for 100% of the failures
  • These errors aren't encountered 100% of the time, suggesting something non-deterministic
  • The imbalance might simply be a matter of timing. Otherwise, could it be related to the different types of load that the two extensions add?
  • The timing of Echo errors follows a periodic pattern, whereas the CheckUser errors come in spikes. It could be that GlobalContributions is accessed less frequently, but causes more errors when it is accessed.

Examining the types of request to GlobalContributions further:

  • Many of these errors come from requests where someone is checking GlobalContributions for a registered user account that has edited many wikis. I checked this by visiting https://logstash.wikimedia.org/goto/e23c2cb3342a3d01864c4ae9c653cebd, choosing a random sample of spikes (large and small), and filtering out errors by URL until there were none left. All spikes I examined were caused by a single-digit number of requests to Special:GlobalContributions for named accounts, each often causing many API lookups. E.g. one looked-up named user had edited >80 wikis.
  • Plenty of visits to Special:GlobalContributions aren't encountering this error. I checked this by filtering error logs where the URL contained Special:GlobalContributions, and there was a steady stream of other types of error.

Note also that the example given in T384717#10516298 is a GlobalContributions request for a wide IP range that has edited 45 wikis, according to XTools.

Summary

The spikes in this error seem to be caused by overloading connections: looking up a target who has edited many wikis makes too many API calls to different wikis.

Recommendations

Don't do permission checks up-front when looking up a named user

  • The API calls check the IP reveal permissions and the permissions to see hidden users/revisions. For named users, we don't need to check IP reveal permissions, and in most cases the permissions to see hidden information won't change the results.
  • We could therefore remove the API calls entirely, and assume the user does not have the rights to see hidden users/revisions.
  • We could offer a way for users who do have those rights to manually trigger a permissions look-up, and add a warning that if the user has edited many wikis, they may not see complete results. @KColeman-WMF perhaps we could discuss this and come up with a workflow for this situation?
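As a rough illustration of this recommendation (not the CheckUser implementation), the decision could be sketched as below. The function names and the `wants_hidden` opt-in flag are hypothetical; the sketch only distinguishes IPs and ranges from everything else, and ignores other target types such as temporary accounts.

```python
import ipaddress


def is_ip_or_range(target: str) -> bool:
    """True if target parses as an IP address or CIDR range."""
    try:
        ipaddress.ip_network(target, strict=False)
        return True
    except ValueError:
        return False


def needs_permission_lookup(target: str, wants_hidden: bool = False) -> bool:
    """Decide whether the per-wiki permission API calls are needed up-front.

    IPs and ranges always need them, because IP reveal rights must be known
    before showing results. For named users they are skipped unless the
    viewer explicitly asks to see hidden information (hypothetical opt-in).
    """
    if is_ip_or_range(target):
        return True
    return wants_hidden
```

With this shape, a lookup for a named user who has edited many wikis would make no permission API calls unless the viewer opts in.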

If the target is a wide IP range when the error is encountered, advise narrowing the range

  • The error message currently says: "Error loading data from some wikis. These results are incomplete. It may help to try again." Trying again may not help if the user has edited many wikis. E.g. the error is consistently encountered when trying the example from T384717#10516298.
  • We can't remove the permissions API calls for IP ranges, because we need to know if the user can reveal IPs at a given wiki before showing any results from that wiki.
  • Instead, we can explain that the error is due to making API calls to external wikis, and suggest that they narrow the range.
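A sketch of how the message could vary with the target, assuming illustrative wording and prefix-length thresholds that would need agreeing on; none of these values or names come from the extension:

```python
import ipaddress

# Illustrative wording only; the real message text is still under discussion.
GENERIC = ("Error verifying permissions from some wikis. "
           "Contributions from these wikis may not be shown.")
NARROW_HINT = " Narrowing the IP range may help."


def error_message_for(target, min_v4_prefix=24, min_v6_prefix=64):
    """Pick an error message, adding a hint when the target is a wide IP range.

    The thresholds are hypothetical: a range counts as wide when its prefix
    is shorter than the minimum for its IP version.
    """
    try:
        network = ipaddress.ip_network(target, strict=False)
    except ValueError:
        return GENERIC  # not an IP or range, e.g. a named user
    threshold = min_v4_prefix if network.version == 4 else min_v6_prefix
    if network.prefixlen < threshold:
        return GENERIC + NARROW_HINT
    return GENERIC
```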

Update the error message always to explain that the problem is due to looking up permissions at other wikis

  • Having a more informative error might help the user decide what else to try.

Other ideas

I considered these ideas, but wouldn't recommend them right now:

  • If a user has global IP reveal rights, don't check the permissions at each wiki. This doesn't work, because we do need to check whether the user is blocked at each wiki, so the per-wiki lookups are still needed.
  • If a user has global IP reveal rights, show them a rough summary at the top of the page, pre-permissions check. Perhaps we trust a user with global IP reveal rights enough to let them know some basic information, like which wikis were edited by an IP, even if they don't have full reveal rights at some wiki because they are blocked. In this case we could show all users with global rights some basic information even when we encounter the permissions API error. However, this would require further discussion and possibly a change to the access policy.

Update the error message always to explain that the problem is due to looking up permissions at other wikis

Does something like this communicate everything we need to? "Error verifying permissions from some wikis. Contributions from these wikis may not be shown."

That error isn't very useful to the user. Which wikis had a problem?

Workflow:
a) Try to use this utility
b) Get that error
c) ???

I suppose it would be "open a bug report" (as the standard workflow should result in no errors). In which case the error should include all the information needed to open a bug report, so that whoever responds to that report has enough information to remedy the problem.

Don't do permission checks up-front when looking up a named user

  • The API calls check the IP reveal permissions and the permissions to see hidden users/revisions. For named users, we don't need to check IP reveal permissions, and in most cases the permissions to see hidden information won't change the results.
  • We could therefore remove the API calls entirely, and assume the user does not have the rights to see hidden users/revisions.
  • We could offer a way for users who do have those rights to manually trigger a permissions look-up, and add a warning that if the user has edited many wikis, they may not see complete results. @KColeman-WMF perhaps we could discuss this and come up with a workflow for this situation?

I suggest we include a checkbox for users who wish to also see deleted revisions, which would trigger a permission lookup if the user selects the checkbox. We can include a warning in the description for users who do not have permissions.

image.png (758×1 px, 93 KB)

Sidenote: What terminology is best? The other checkboxes say latest revisions, minor edits and revision deleted. Should we be using edit or revision? Should it be revision deleted or deleted revisions?

That error isn't very useful to the user. Which wikis had a problem?

Workflow:
a) Try to use this utility
b) Get that error
c) ???

I suppose it would be "open a bug report" (as the standard workflow should result in no errors). In which case the error should include all the information needed to open a bug report, so that whoever responds to that report has enough information to remedy the problem.

The main problem here is that it's a known error that we can't currently do much about. There isn't an elegant way of looking up permissions at other wikis without making API requests (T380867), and if we have to do too many, they may fail.

We can't reveal which wikis the user has edited at until we know whether the user has the permission to know that, so if the API fails we can't reveal which wiki it failed from.

Unfortunately there isn't much a user can do, except narrow an IP range or narrow the date range (or check another tool, if the target is not an IP address).

Can we cache the permission lookup requests? And when building the cache, support multiple attempts at the API requests until the cache is fully populated?

If the caller has global-temporary-account-viewer bundled access, are the hundreds of API calls still needed?

The tricky part is that a user is not allowed to see IP addresses if blocked from the wiki, according to the policy. So for a user with global access, we're mainly looking up whether they are blocked.
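To make the constraint concrete: even with the global right, the outcome differs per wiki because of blocks, so a per-wiki answer is still needed. A tiny sketch, with hypothetical function names and made-up wiki IDs:

```python
def can_reveal_ips(has_global_right: bool, blocked_on_wiki: bool) -> bool:
    """A viewer may see IPs at a wiki only if they hold the right and are not blocked there."""
    return has_global_right and not blocked_on_wiki


def wikis_with_ip_reveal(has_global_right, block_status_by_wiki):
    """Return the wikis (in input order) where IP reveal is allowed."""
    return [
        wiki
        for wiki, blocked in block_status_by_wiki.items()
        if can_reveal_ips(has_global_right, blocked)
    ]
```

The `block_status_by_wiki` mapping is exactly the information that currently has to come from the per-wiki API calls.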

Can we cache the permission lookup requests? And when building the cache, support multiple attempts at the API requests until the cache is fully populated?

We'd probably need to check this with legal. Normally when looking up permissions to actually perform an action we need up-to-date info in case the permissions were changed.

Perhaps we might be allowed a short TTL, in which case this could allow a user to keep refreshing the page in one sitting, but if they came back later, they'd need to do the same again.

After some discussion, we're fairly confident the permissions are only updated by means that we could hook into to invalidate the cache, so this may be feasible after all. Let's explore this as a solution.
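A minimal sketch of the caching idea discussed above, combining a short TTL, repeated attempts until every wiki is populated, and an explicit invalidation entry point for whatever mechanism changes permissions. It is an in-memory illustration under assumed names (`PermissionCache`, the `fetch` callable, the default TTL and attempt count), not a proposal for the actual storage backend or API.

```python
import time


class PermissionCache:
    """In-memory cache of per-wiki permission lookups with a TTL and retries."""

    def __init__(self, fetch, ttl_seconds=60.0, max_attempts=3, clock=time.monotonic):
        # fetch is a hypothetical callable (user, wiki) -> permissions dict
        # that raises an exception when the API call fails.
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._max_attempts = max_attempts
        self._clock = clock
        self._entries = {}  # (user, wiki) -> (expires_at, permissions)

    def get_many(self, user, wikis):
        """Return (results, missing): cached or fetched permissions, and wikis that still failed."""
        now = self._clock()
        results = {}
        missing = []
        for wiki in wikis:
            entry = self._entries.get((user, wiki))
            if entry is not None and entry[0] > now:
                results[wiki] = entry[1]
            else:
                missing.append(wiki)

        # Retry only the wikis that failed, up to max_attempts rounds.
        for _ in range(self._max_attempts):
            if not missing:
                break
            still_missing = []
            for wiki in missing:
                try:
                    permissions = self._fetch(user, wiki)
                except Exception:
                    still_missing.append(wiki)
                    continue
                self._entries[(user, wiki)] = (self._clock() + self._ttl, permissions)
                results[wiki] = permissions
            missing = still_missing
        return results, missing

    def invalidate(self, user, wiki=None):
        """Drop cached entries for a user, for one wiki or for all wikis."""
        stale = [
            key for key in self._entries
            if key[0] == user and (wiki is None or key[1] == wiki)
        ]
        for key in stale:
            del self._entries[key]
```

A real version would presumably add backoff between attempts and call `invalidate` from the hooks that fire when rights or blocks change; both are left out here to keep the sketch short.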

Sidenote: What terminology is best? The other checkboxes say latest revisions, minor edits and revision deleted. Should we be using edit or revision? Should it be revision deleted or deleted revisions?

The way it looks in the screenshot looks fine to me. "Deleted revisions" might be confused with revisions to a since-deleted article (i.e. what shows up on Special:DeletedContributions), whereas "revision deleted" more clearly refers to the specific revision.

I think since the other checkbox says "Only show revision deleted", it makes sense for this one to say "Show revision deleted" since they refer to the same thing.

Change #1123674 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/core@master] http: report curl_multi_exec() errors in MultiHttpClient

https://gerrit.wikimedia.org/r/1123674

Change #1123687 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/core@master] http: remove outdated workaround for PHP bug 63411

https://gerrit.wikimedia.org/r/1123687

Change #1123674 merged by jenkins-bot:

[mediawiki/core@master] http: report curl_multi_exec() errors in MultiHttpClient

https://gerrit.wikimedia.org/r/1123674

Change #1123687 merged by jenkins-bot:

[mediawiki/core@master] http: remove outdated workaround for PHP bug 63411

https://gerrit.wikimedia.org/r/1123687

Change #1124771 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/extensions/CheckUser@master] GlobalContributions: Log unexpected 200 responses

https://gerrit.wikimedia.org/r/1124771

Change #1124771 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] GlobalContributions: Log unexpected 200 responses

https://gerrit.wikimedia.org/r/1124771

Change #1125992 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/core@master] http: Promote MultiHttpClient warnings to errors

https://gerrit.wikimedia.org/r/1125992

Change #1125992 merged by jenkins-bot:

[mediawiki/core@master] http: Promote MultiHttpClient warnings to errors

https://gerrit.wikimedia.org/r/1125992

Change #1126982 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/core@wmf/1.44.0-wmf.20] http: Promote MultiHttpClient warnings to errors

https://gerrit.wikimedia.org/r/1126982

Change #1126982 merged by jenkins-bot:

[mediawiki/core@wmf/1.44.0-wmf.20] http: Promote MultiHttpClient warnings to errors

https://gerrit.wikimedia.org/r/1126982

Mentioned in SAL (#wikimedia-operations) [2025-03-12T15:18:31Z] <mszabo@deploy2002> Started scap sync-world: Backport for [[gerrit:1126979|GlobalUserSelectQueryBuilder: Ignore unattached local users (T388125)]], [[gerrit:1126982|http: Promote MultiHttpClient warnings to errors (T384717)]]

Mentioned in SAL (#wikimedia-operations) [2025-03-12T15:22:21Z] <mszabo@deploy2002> mszabo: Backport for [[gerrit:1126979|GlobalUserSelectQueryBuilder: Ignore unattached local users (T388125)]], [[gerrit:1126982|http: Promote MultiHttpClient warnings to errors (T384717)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-03-12T15:30:33Z] <mszabo@deploy2002> Finished scap sync-world: Backport for [[gerrit:1126979|GlobalUserSelectQueryBuilder: Ignore unattached local users (T388125)]], [[gerrit:1126982|http: Promote MultiHttpClient warnings to errors (T384717)]] (duration: 12m 01s)

Change #1127145 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/core@master] http: add happy-path test for MultiHttpClient

https://gerrit.wikimedia.org/r/1127145

Change #1127146 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/core@master] http: ensure curl message queue is emptied in MultiHttpClient

https://gerrit.wikimedia.org/r/1127146

Change #1127145 merged by jenkins-bot:

[mediawiki/core@master] http: add happy-path test for MultiHttpClient

https://gerrit.wikimedia.org/r/1127145

Change #1127146 merged by jenkins-bot:

[mediawiki/core@master] http: ensure curl message queue is emptied in MultiHttpClient

https://gerrit.wikimedia.org/r/1127146

Tchanders added subscribers: mszabo, Tchanders.

Reassigning to @mszabo, who has picked up the work on this recently.

Change #1128387 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/extensions/CheckUser@master] GlobalContributions: Use unique CentralAuth tokens per wiki

https://gerrit.wikimedia.org/r/1128387

Change #1128387 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] GlobalContributions: Use unique CentralAuth tokens per request

https://gerrit.wikimedia.org/r/1128387

Change #1128493 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/extensions/CheckUser@wmf/1.44.0-wmf.20] GlobalContributions: Use unique CentralAuth tokens per request

https://gerrit.wikimedia.org/r/1128493

Change #1128493 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@wmf/1.44.0-wmf.20] GlobalContributions: Use unique CentralAuth tokens per request

https://gerrit.wikimedia.org/r/1128493

Mentioned in SAL (#wikimedia-operations) [2025-03-17T20:39:49Z] <tgr@deploy2002> Started scap sync-world: Backport for [[gerrit:1128493|GlobalContributions: Use unique CentralAuth tokens per request (T384717)]]

Mentioned in SAL (#wikimedia-operations) [2025-03-17T20:43:41Z] <tgr@deploy2002> tgr, mszabo: Backport for [[gerrit:1128493|GlobalContributions: Use unique CentralAuth tokens per request (T384717)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-03-17T20:57:49Z] <tgr@deploy2002> Finished scap sync-world: Backport for [[gerrit:1128493|GlobalContributions: Use unique CentralAuth tokens per request (T384717)]] (duration: 18m 00s)

I think this is done, and follow-up work is captured in tasks linked from here. @mszabo sounds ok?

Yeah, I think we can close this now.