Page MenuHomePhabricator

User not found by actor ID: [id]
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error

MediaWiki version: 1.36.0-wmf.35

message
User not found by actor ID: [id]

Impact

Unknown.

Actual IDs can be found in logstash.

Notes

Looks like 11 of these since deploy of wmf.35 (T274939) to group2.

Details

Request ID
8250daf8-146d-4856-8851-59af4771f4c8
Request URL
https://en.wikipedia.org/w/api.php
Stack Trace
exception.trace
from /srv/mediawiki/php-1.36.0-wmf.35/includes/changes/RecentChange.php(1255)
#0 /srv/mediawiki/php-1.36.0-wmf.35/includes/changes/RecentChange.php(1051): RecentChange->getUserIdentityFromAnyId(NULL, string, integer)
#1 /srv/mediawiki/php-1.36.0-wmf.35/includes/changes/RecentChange.php(155): RecentChange->loadFromRow(stdClass)
#2 /srv/mediawiki/php-1.36.0-wmf.35/includes/specials/SpecialRecentChanges.php(473): RecentChange::newFromRow(stdClass)
#3 /srv/mediawiki/php-1.36.0-wmf.35/includes/specialpage/ChangesListSpecialPage.php(1681): SpecialRecentChanges->outputChangesList(Wikimedia\Rdbms\ResultWrapper, FormOptions)
#4 /srv/mediawiki/php-1.36.0-wmf.35/includes/specialpage/ChangesListSpecialPage.php(681): ChangesListSpecialPage->webOutput(Wikimedia\Rdbms\ResultWrapper, FormOptions)
#5 /srv/mediawiki/php-1.36.0-wmf.35/includes/specials/SpecialRecentChanges.php(203): ChangesListSpecialPage->execute(string)
#6 /srv/mediawiki/php-1.36.0-wmf.35/includes/specialpage/SpecialPage.php(646): SpecialRecentChanges->execute(string)
#7 /srv/mediawiki/php-1.36.0-wmf.35/includes/specialpage/SpecialPageFactory.php(1375): SpecialPage->run(string)
#8 /srv/mediawiki/php-1.36.0-wmf.35/includes/specialpage/SpecialPageFactory.php(1433): MediaWiki\SpecialPage\SpecialPageFactory->executePath(Title, RequestContext, boolean, MediaWiki\Linker\LinkRenderer)
#9 /srv/mediawiki/php-1.36.0-wmf.35/includes/parser/Parser.php(3139): MediaWiki\SpecialPage\SpecialPageFactory->capturePath(Title, RequestContext, MediaWiki\Linker\LinkRenderer)
#10 /srv/mediawiki/php-1.36.0-wmf.35/includes/parser/PPFrame_Hash.php(263): Parser->braceSubstitution(array, PPFrame_Hash)
#11 /srv/mediawiki/php-1.36.0-wmf.35/includes/parser/Parser.php(2880): PPFrame_Hash->expand(PPNode_Hash_Tree, integer)
#12 /srv/mediawiki/php-1.36.0-wmf.35/includes/parser/Parser.php(1560): Parser->replaceVariables(string)
#13 /srv/mediawiki/php-1.36.0-wmf.35/includes/parser/Parser.php(653): Parser->internalParse(string)
#14 /srv/mediawiki/php-1.36.0-wmf.35/includes/content/WikitextContent.php(375): Parser->parse(string, Title, ParserOptions, boolean, boolean, integer)
#15 /srv/mediawiki/php-1.36.0-wmf.35/includes/content/AbstractContent.php(591): WikitextContent->fillParserOutput(Title, integer, ParserOptions, boolean, ParserOutput)
#16 /srv/mediawiki/php-1.36.0-wmf.35/includes/Revision/RenderedRevision.php(266): AbstractContent->getParserOutput(Title, integer, ParserOptions, boolean)
#17 /srv/mediawiki/php-1.36.0-wmf.35/includes/Revision/RenderedRevision.php(235): MediaWiki\Revision\RenderedRevision->getSlotParserOutputUncached(WikitextContent, boolean)
#18 /srv/mediawiki/php-1.36.0-wmf.35/includes/Revision/RevisionRenderer.php(215): MediaWiki\Revision\RenderedRevision->getSlotParserOutput(string, array)
#19 /srv/mediawiki/php-1.36.0-wmf.35/includes/Revision/RevisionRenderer.php(152): MediaWiki\Revision\RevisionRenderer->combineSlotOutput(MediaWiki\Revision\RenderedRevision, array)
#20 [internal function]: MediaWiki\Revision\RevisionRenderer->MediaWiki\Revision\{closure}(MediaWiki\Revision\RenderedRevision, array)
#21 /srv/mediawiki/php-1.36.0-wmf.35/includes/Revision/RenderedRevision.php(197): call_user_func(Closure, MediaWiki\Revision\RenderedRevision, array)
#22 /srv/mediawiki/php-1.36.0-wmf.35/includes/poolcounter/PoolWorkArticleView.php(137): MediaWiki\Revision\RenderedRevision->getRevisionParserOutput()
#23 /srv/mediawiki/php-1.36.0-wmf.35/includes/poolcounter/PoolCounterWork.php(162): PoolWorkArticleView->doWork()
#24 /srv/mediawiki/php-1.36.0-wmf.35/includes/poolcounter/PoolCounterWork.php(173): PoolCounterWork->execute(boolean)
#25 /srv/mediawiki/php-1.36.0-wmf.35/includes/page/ParserOutputAccess.php(260): PoolCounterWork->execute()
#26 /srv/mediawiki/php-1.36.0-wmf.35/includes/page/WikiPage.php(1314): MediaWiki\Page\ParserOutputAccess->getParserOutput(WikiPage, ParserOptions, MediaWiki\Revision\RevisionStoreCacheRecord, integer)
#27 /srv/mediawiki/php-1.36.0-wmf.35/includes/api/ApiParse.php(83): WikiPage->getParserOutput(ParserOptions, NULL, boolean)
#28 /srv/mediawiki/php-1.36.0-wmf.35/includes/poolcounter/PoolCounterWorkViaCallback.php(74): ApiParse::{closure}()
#29 /srv/mediawiki/php-1.36.0-wmf.35/includes/poolcounter/PoolCounterWork.php(162): PoolCounterWorkViaCallback->doWork()
#30 /srv/mediawiki/php-1.36.0-wmf.35/includes/api/ApiParse.php(90): PoolCounterWork->execute()
#31 /srv/mediawiki/php-1.36.0-wmf.35/includes/api/ApiParse.php(694): ApiParse->getPageParserOutput(WikiPage, NULL, ParserOptions, boolean)
#32 /srv/mediawiki/php-1.36.0-wmf.35/includes/api/ApiParse.php(218): ApiParse->getParsedContent(WikiPage, ParserOptions, boolean, NULL, NULL, boolean)
#33 /srv/mediawiki/php-1.36.0-wmf.35/includes/api/ApiMain.php(1646): ApiParse->execute()
#34 /srv/mediawiki/php-1.36.0-wmf.35/includes/api/ApiMain.php(616): ApiMain->executeAction()
#35 /srv/mediawiki/php-1.36.0-wmf.35/includes/api/ApiMain.php(587): ApiMain->executeActionWithErrorHandling()
#36 /srv/mediawiki/php-1.36.0-wmf.35/api.php(90): ApiMain->execute()
#37 /srv/mediawiki/php-1.36.0-wmf.35/api.php(45): wfApiMain()
#38 /srv/mediawiki/w/api.php(3): require(string)
#39 {main}

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Interesting. Definitely a fallout of https://gerrit.wikimedia.org/r/c/mediawiki/core/+/669946/16/includes/changes/RecentChange.php#1226

I've looked over a few of real actor IDs in logstash and checked that rows exist in the database with these IDs. Also interesting that it seems to be happening on during a parse if a SpecialRecentChanges is transcluded into the page.

Interesting. Definitely a fallout of https://gerrit.wikimedia.org/r/c/mediawiki/core/+/669946/16/includes/changes/RecentChange.php#1226

I've looked over a few of real actor IDs in logstash and checked that rows exist in the database with these IDs. Also interesting that it seems to be happening on during a parse if a SpecialRecentChanges is transcluded into the page.

Could this be a race condition? If an (anon?) user for whom we don't have an actor ID yet performs an edit, the parsing happens right after we assign the actor ID and save the RC entry - and we might fail to look up the actor while trying to render the RC entry belonging to the edit itself.

That would be in the same request, and normally, the cache in ActorStore would prevent the race condition. But if we are looping over RecentChanges, we would be looking up many actors, and the cache is small, so the entry for the new actor might have been evicted.

This seems like a rare edge case, but I can see how it would happen when transcluding RecentChanges...

If the above is the reason, I can see two solutions:

  • make sure we remember all actors we inserted in the current request, by putting them into a separate cache
  • try harder to look up actors by falling back to master of the ID isn't found. Since an actor with a known ID should always exist, this should be rarely needed, so we could always try it when an ID isn't found. We do something similar in RevisionStore.

It seems like it happens for recent changes for anons who just made their first contribution. Before the patch referenced above we were using User::newFromAnyId which would just instantiate a new User no matter whether it's found in the DB or not, and then upon ->load it will just show the user defaults, while now we hard-crash if the actor is not found in the actor table.

try harder to look up actors by falling back to master of the ID isn't found. Since an actor with a known ID should always exist, this should be rarely needed, so we could always try it when an ID isn't found. We do something similar in RevisionStore.

In User we've been checking LoadBalancer::hasOrMadeRecentMasterChanges and if that's the case we were reading from master. Actor store can do the same. We don't have that method on the DB that's passed, but maybe we can just look on the injected load balancer - it's for the same domain as the injected DB. (we need to expand the use of checkDatabaseDomain in ActorStore..

Now up to 43 of these since deploy. I'm inclined to think this is probably a train blocker for .35 unless we have a patch shortly.

cc: @dancy

Change 673363 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/core@master] ActorStore::getActorById - fall back to master.

https://gerrit.wikimedia.org/r/673363

Change 673115 had a related patch set uploaded (by Brennen Bearnes; owner: Ppchelko):
[mediawiki/core@wmf/1.36.0-wmf.35] ActorStore::getActorById - fall back to master.

https://gerrit.wikimedia.org/r/673115

Change 673363 merged by jenkins-bot:
[mediawiki/core@master] ActorStore::getActorById - fall back to master.

https://gerrit.wikimedia.org/r/673363

Change 673115 merged by jenkins-bot:
[mediawiki/core@wmf/1.36.0-wmf.35] ActorStore::getActorById - fall back to master.

https://gerrit.wikimedia.org/r/673115

Mentioned in SAL (#wikimedia-operations) [2021-03-18T22:53:04Z] <brennen@deploy1002> Synchronized php-1.36.0-wmf.35/includes/specials/SpecialContributions.php: Backport: [[gerrit:673115|ActorStore::getActorById - fall back to master. (T277795)]] (duration: 01m 07s)

If the above is the reason, I can see two solutions:

  • make sure we remember all actors we inserted in the current request, by putting them into a separate cache
  • try harder to look up actors by falling back to master of the ID isn't found. Since an actor with a known ID should always exist, this should be rarely needed, so we could always try it when an ID isn't found. We do something similar in RevisionStore.

RevisionStore has a fairly elaborate and known external use of it, because rev IDs can be publicly found in API responses and RCFeeds in a very quick way and sometimes a client can submit a follow-up query (without a write session or chronology protector) that hits a different replica that doesn't have it yet. That's still something we'd like to fix long-term since it's a bit of a load risk imho as it stands, but it's an old one so we don't care as much.

For ActorStore, it seems indeed like it can only be a logic bug, state bug, or an own-inserted row that we can get a hold of without a master query (or limited to after we have done writes already, and thus not query master on GET). The current patch inserts it without either of the proposed solutions (insertion cache, and hasWrites-check-for-master). Would it make sense to do one or both of those next, and then try reverting the hotfix?

Mentioned in SAL (#wikimedia-operations) [2021-03-18T23:06:44Z] <brennen> train status: 1.36.0-wmf.35 (T274939) stable on all wikis after deploy of hotfix for T277795

Mentioned in SAL (#wikimedia-operations) [2021-03-18T23:06:44Z] <brennen> train status: 1.36.0-wmf.35 (T274939) stable on all wikis after deploy of hotfix for T277795

I may have spoken too soon. 2 more of these since 23:05 UTC.

Mentioned in SAL (#wikimedia-operations) [2021-03-18T23:35:42Z] <brennen@deploy1002> Synchronized php-1.36.0-wmf.35/includes/user/ActorStore.php: Backport: [[gerrit:673115|ActorStore::getActorById - fall back to master. (T277795)]] (duration: 00m 58s)

I may have spoken too soon. 2 more of these since 23:05 UTC.

Disregard; synced wrong file. Now corrected. I'll watch logs for a bit, but I think this is a good indication I should back away from the keyboard for the day.

Mentioned in SAL (#wikimedia-operations) [2021-03-18T23:38:51Z] <brennen@deploy1002> Synchronized php-1.36.0-wmf.35/includes/user/ActorStore.php: Backport: [[gerrit:673115|ActorStore::getActorById - fall back to master. (T277795)]] (duration: 00m 57s)

For ActorStore, it seems indeed like it can only be a logic bug, state bug, or an own-inserted row that we can get a hold of without a master query (or limited to after we have done writes already, and thus not query master on GET). The current patch inserts it without either of the proposed solutions (insertion cache, and hasWrites-check-for-master). Would it make sense to do one or both of those next, and then try reverting the hotfix?

We could - if you think that would be valuable from a performance perspective, please file a ticket and put it on the Platform Team Workboards (MW Expedition) workboard (in this case - generally, please use the general Platform Engineering inbox). But I'm not convinced it's worthwhile:

  • The issue is expected to be relatively rare (minutes between occurrences)
  • It will happen only in requests that are already talking to master anyway (assumption, to be validated, e.g. by checking hasWrites)
  • If the above is wrong, and we are having this issue e.g. in GET request immediately following an edit (or other write operation), neither of the optimizations will help, and we will end up doing what we are doing now: just fall back to master.

So I think we stand to gain little by optimizing this. Am I overlooking anything?

RevisionStore has a fairly elaborate and known external use of it, because rev IDs can be publicly found in API responses and RCFeeds in a very quick way and sometimes a client can submit a follow-up query (without a write session or chronology protector) that hits a different replica that doesn't have it yet. That's still something we'd like to fix long-term since it's a bit of a load risk imho as it stands, but it's an old one so we don't care as much.

For actor store, we don't expose the actor ID externally. But the same issue applies for looking up actors by name, which we will do routinely for the current actor. If we fail to find the actor ID for a name because we are hitting a stale replica, we'll just assume there is no actor ID yet; if we need one, we'll try to insert it, the insert will fail, and we'll read it from master. Effectively, that's also falling back to master, but via an attempted insert...

We could avoid that by making lookups by name also fall back to master, but that's expensive, since it would happen very frequently for names we really don't have an ID for yet. As opposed to lookups by ID, which should never fail, unless the replica is stale.

Still seeing User not found by actor ID: messages at a lot rate (< 8 / hour). Most are from jobrunner.discovery.wmnet with en.wikipedia.org coming in second.

Can someone knowledgeable tell me if these errors have any ill effects other than on logs?

Can someone knowledgeable tell me if these errors have any ill effects other than on logs?

Depends on where it happens. Generally, it leads to whatever is currently happening to fail. It will probably make a job fail, or cause Special:Watchlist to error out, etc.

Still seeing User not found by actor ID: messages at a lot rate (< 8 / hour). Most are from jobrunner.discovery.wmnet with en.wikipedia.org coming in second.

I am looking at one of the instances, actor ID: 25670078 on dewiki. Three failures from the jobrunner, via RefreshLinksJob, within a couple of seconds. The stack trace indicates that this is another instance of RecentChanges being included as a template on a page.

The timestamp of the last error is 2021-03-19T16:09:31 (CET), the timestamp of the edit in the database is 2021-03-19T15:09:27 (UTC). Adjusting for time zone, the error happened three times within four seconds after the edit. This was the first edit recorded for the user. The user-account was auto-created from a global account in 2018, but there had not been any activity since, so I assume the actor ID was freshly assigned when the edit happened.

So, it's a race condition caused by replication lag. But why isn't it fixed by falling back to master?

Is it possible that the job runner has a "repeatable read" snapshot on master that is outdated? Can we somehow flush any such snapshots between jobs? @aaron, do you know?

Change 673534 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Try even harder if we can't find an actor by ID.

https://gerrit.wikimedia.org/r/673534

dancy triaged this task as Unbreak Now! priority.Mar 22 2021, 2:58 PM

Change 674084 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/core@master] Permanently cache actors we've just created in this request

https://gerrit.wikimedia.org/r/674084

After @brennen deployed PEtr's fix, I see exactly one occurrence that wasn't triggered via runSingleJob.php: fNYcTHgBGiM4niWIGqTc is an action=parse API request, which apparently tried to parse a page that transcludes Special:RecentChanges. I can see how this would fail the actor of a very recent edit on a replica, but I don't really see how it can fail to find it on the master database. But I don't see how this could happen during a runSingleJob.php query either...

It is curious to note that all failures happen when trying to transclude Special:RecentChanges. Never when just looking at Special:RecentChanges. This has to tell us something, but I'm stumped.

Looking at when these errors happen, I think my patch that changes the lock mode on the query will not help. It would only help to avoid a repeatable-read snapshot. I don't see how or why the queries in question would have such a stale snapshot.

Change 674105 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/core@master] Do not crash if RecentChange performer not found.

https://gerrit.wikimedia.org/r/674105

Change 674131 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/core@master] ActorStore: clean cache if transaction aborted.

https://gerrit.wikimedia.org/r/674131

This may have the same root cause as T224949: A transaction that inserted into the actor table was rolled back, but the actor ID produced in that transaction is still used and recorded into the database. When a subsequent requests try to find the row in the actor table, they fail - but only briefly, until a different actor is inserted into the actor table, with the same ID.

The fundamental issue is that any auto-increment ID from an insert is only valid within the same transaction. It must not be remembered or cached until the transaction has been committed, or appropriate steps must be taken to clean the cache when the transaction is rolled back. This implies in particular that DeferredUpdates and Jobs that rely on the outcome of a database update must not be enqueued until the transaction has been committed.

Change 674131 merged by jenkins-bot:
[mediawiki/core@master] ActorStore: clean cache if transaction aborted.

https://gerrit.wikimedia.org/r/674131

Status update: we expect https://gerrit.wikimedia.org/r/674131 to completely mitigate the issue once deployed.
There's some followups we would like to make maybe, but they're not required to unblock the train.

Shouldn't https://gerrit.wikimedia.org/r/674131 be cherry picked and deployed for 1.36.0-wmf.35?

It is at least included in 1.36.0-wmf.36 which I am rolling this week.

I guess it's not fixed:

@hashar wrote at T278479
message
User not found by actor ID: <id>
exception.trace
from /srv/mediawiki/php-1.36.0-wmf.36/includes/changes/RecentChange.php(1253)
#0 /srv/mediawiki/php-1.36.0-wmf.36/includes/changes/RecentChange.php(1049): RecentChange->getUserIdentityFromAnyId(NULL, string, integer)
#1 /srv/mediawiki/php-1.36.0-wmf.36/includes/changes/RecentChange.php(155): RecentChange->loadFromRow(stdClass)
#2 /srv/mediawiki/php-1.36.0-wmf.36/includes/specials/SpecialRecentChanges.php(473): RecentChange::newFromRow(stdClass)
#3 /srv/mediawiki/php-1.36.0-wmf.36/includes/specialpage/ChangesListSpecialPage.php(1681): SpecialRecentChanges->outputChangesList(Wikimedia\Rdbms\ResultWrapper, FormOptions)
#4 /srv/mediawiki/php-1.36.0-wmf.36/includes/specialpage/ChangesListSpecialPage.php(681): ChangesListSpecialPage->webOutput(Wikimedia\Rdbms\ResultWrapper, FormOptions)
#5 /srv/mediawiki/php-1.36.0-wmf.36/includes/specials/SpecialRecentChanges.php(203): ChangesListSpecialPage->execute(string)
#6 /srv/mediawiki/php-1.36.0-wmf.36/includes/specialpage/SpecialPage.php(646): SpecialRecentChanges->execute(string)
#7 /srv/mediawiki/php-1.36.0-wmf.36/includes/specialpage/SpecialPageFactory.php(1382): SpecialPage->run(string)
#8 /srv/mediawiki/php-1.36.0-wmf.36/includes/specialpage/SpecialPageFactory.php(1440): MediaWiki\SpecialPage\SpecialPageFactory->executePath(Title, RequestContext, boolean, MediaWiki\Linker\LinkRenderer)
#9 /srv/mediawiki/php-1.36.0-wmf.36/includes/parser/Parser.php(3138): MediaWiki\SpecialPage\SpecialPageFactory->capturePath(Title, RequestContext, MediaWiki\Linker\LinkRenderer)
#10 /srv/mediawiki/php-1.36.0-wmf.36/includes/parser/PPFrame_Hash.php(263): Parser->braceSubstitution(array, PPFrame_Hash)
#11 /srv/mediawiki/php-1.36.0-wmf.36/includes/parser/Parser.php(2879): PPFrame_Hash->expand(PPNode_Hash_Tree, integer)
#12 /srv/mediawiki/php-1.36.0-wmf.36/includes/parser/Parser.php(1549): Parser->replaceVariables(string)
#13 /srv/mediawiki/php-1.36.0-wmf.36/includes/parser/Parser.php(639): Parser->internalParse(string)
#14 /srv/mediawiki/php-1.36.0-wmf.36/includes/content/WikitextContent.php(375): Parser->parse(string, Title, ParserOptions, boolean, boolean, integer)
#15 /srv/mediawiki/php-1.36.0-wmf.36/includes/content/AbstractContent.php(591): WikitextContent->fillParserOutput(Title, integer, ParserOptions, boolean, ParserOutput)
#16 /srv/mediawiki/php-1.36.0-wmf.36/includes/Revision/RenderedRevision.php(266): AbstractContent->getParserOutput(Title, integer, ParserOptions, boolean)
#17 /srv/mediawiki/php-1.36.0-wmf.36/includes/Revision/RenderedRevision.php(235): MediaWiki\Revision\RenderedRevision->getSlotParserOutputUncached(WikitextContent, boolean)
#18 /srv/mediawiki/php-1.36.0-wmf.36/includes/Revision/RevisionRenderer.php(215): MediaWiki\Revision\RenderedRevision->getSlotParserOutput(string, array)
#19 /srv/mediawiki/php-1.36.0-wmf.36/includes/Revision/RevisionRenderer.php(152): MediaWiki\Revision\RevisionRenderer->combineSlotOutput(MediaWiki\Revision\RenderedRevision, array)
#20 [internal function]: MediaWiki\Revision\RevisionRenderer->MediaWiki\Revision\{closure}(MediaWiki\Revision\RenderedRevision, array)
#21 /srv/mediawiki/php-1.36.0-wmf.36/includes/Revision/RenderedRevision.php(197): call_user_func(Closure, MediaWiki\Revision\RenderedRevision, array)
#22 /srv/mediawiki/php-1.36.0-wmf.36/includes/jobqueue/jobs/RefreshLinksJob.php(259): MediaWiki\Revision\RenderedRevision->getRevisionParserOutput(array)
#23 /srv/mediawiki/php-1.36.0-wmf.36/includes/jobqueue/jobs/RefreshLinksJob.php(179): RefreshLinksJob->getParserOutput(MediaWiki\Revision\RevisionRenderer, ParserCache, WikiPage, BufferingStatsdDataFactory)
#24 /srv/mediawiki/php-1.36.0-wmf.36/includes/jobqueue/jobs/RefreshLinksJob.php(126): RefreshLinksJob->runForTitle(Title)
#25 /srv/mediawiki/php-1.36.0-wmf.36/extensions/EventBus/includes/JobExecutor.php(79): RefreshLinksJob->run()
#26 /srv/mediawiki/rpc/RunSingleJob.php(76): MediaWiki\Extension\EventBus\JobExecutor->execute(array)
#27 {main}

I went over a few instances of this.

The data in the database is consistent between revision, revision_actor_temp, actor and recent changes tables - they all reference each other correctly. And yet, 3 seconds after it's been written to the database, we crash not being able to fetch the actor even though we fallback to master now.

15:17 Pchelolo: Krinkle: what do you think about this theory: RefreshLinks is a loooong running job, so we establish a master connection when we begin. At some point, we read RecentChanges from a 'recentchanges' group replica - it's well caught up. Then we try to read 'actor' form a replica from another group - that one is not caught up, we fail. The we fallback to master connection, but it was opened looong time ago, and we read repeatable read s
15:17 Pchelolo: napshot - not find anything as well
15:17 Pchelolo: and thus we crash entirely.
15:18 Krinkle: Pchelolo: is ActorStore in master same as this weeks' wmf branch?
15:18 Pchelolo: same
15:18 Krinkle: k
15:20 Krinkle: Pchelolo: yeah, that seems plausible

In this case we actually have all the data in the row since we've joined it with actor when fetching RecentChange, so we can just instantiate UserIdentityValue directly.

Change 674975 had a related patch set uploaded (by Ppchelko; author: Ppchelko):
[mediawiki/core@master] RecentChange: directly build the user identity if we have the data

https://gerrit.wikimedia.org/r/674975

Change 674836 had a related patch set uploaded (by Ppchelko; author: Ppchelko):
[mediawiki/core@wmf/1.36.0-wmf.36] RecentChange: directly build the user identity if we have the data

https://gerrit.wikimedia.org/r/674836

Change 674975 merged by jenkins-bot:
[mediawiki/core@master] RecentChange: directly build the user identity if we have the data

https://gerrit.wikimedia.org/r/674975

@Pchelolo awesome! I am deploying this fix today given 1.36.0-wmf.36 is only on testwikis for now and this way we are settled for a train deployment on monday. Thank you!

Change 674836 merged by jenkins-bot:
[mediawiki/core@wmf/1.36.0-wmf.36] RecentChange: directly build the user identity if we have the data

https://gerrit.wikimedia.org/r/674836

Mentioned in SAL (#wikimedia-operations) [2021-03-26T17:44:53Z] <hashar@deploy1002> Synchronized php-1.36.0-wmf.36/includes/changes/RecentChange.php: RecentChange: directly build the user identity if we have the data - T277795 (duration: 01m 06s)

Pchelolo claimed this task.

Doesn't seem to happen anymore on wmf.36

Change 674105 abandoned by Ppchelko:

[mediawiki/core@master] Do not crash if RecentChange performer not found.

Reason:

not needed anymore

https://gerrit.wikimedia.org/r/674105

Change #673534 abandoned by Daniel Kinzler:

[mediawiki/core@master] Try even harder if we can't find an actor by ID.

Reason:

no longer needed

https://gerrit.wikimedia.org/r/673534