Page MenuHomePhabricator

[LEX] OutOfRangeException: Lexeme L136328 doesn't have Form L136328-F2
Open, Needs TriagePublicPRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   OutOfRangeException: Lexeme L136328 doesn't have Form L136328-F2
exception.trace
from /srv/mediawiki/php-1.40.0-wmf.18/extensions/WikibaseLexeme/src/Domain/Model/Lexeme.php(310)
#0 /srv/mediawiki/php-1.40.0-wmf.18/extensions/WikibaseLexeme/src/Domain/Diff/LexemePatcher.php(174): Wikibase\Lexeme\Domain\Model\Lexeme->getForm(Wikibase\Lexeme\Domain\Model\FormId)
#1 /srv/mediawiki/php-1.40.0-wmf.18/extensions/WikibaseLexeme/src/Domain/Diff/LexemePatcher.php(92): Wikibase\Lexeme\Domain\Diff\LexemePatcher->patchForms(Wikibase\Lexeme\Domain\Model\Lexeme, Wikibase\Lexeme\Domain\Diff\LexemeDiff)
#2 /srv/mediawiki/php-1.40.0-wmf.18/extensions/Wikibase/lib/packages/wikibase/data-model-services/src/Diff/EntityPatcher.php(40): Wikibase\Lexeme\Domain\Diff\LexemePatcher->patchEntity(Wikibase\Lexeme\Domain\Model\Lexeme, Wikibase\Lexeme\Domain\Diff\LexemeDiff)
#3 /srv/mediawiki/php-1.40.0-wmf.18/extensions/Wikibase/repo/includes/EditEntity/MediawikiEditEntity.php(464): Wikibase\DataModel\Services\Diff\EntityPatcher->patchEntity(Wikibase\Lexeme\Domain\Model\Lexeme, Wikibase\Lexeme\Domain\Diff\LexemeDiff)
#4 /srv/mediawiki/php-1.40.0-wmf.18/extensions/Wikibase/repo/includes/EditEntity/MediawikiEditEntity.php(730): Wikibase\Repo\EditEntity\MediawikiEditEntity->fixEditConflict(Wikibase\Lexeme\Domain\Model\Lexeme)
#5 /srv/mediawiki/php-1.40.0-wmf.18/extensions/Wikibase/repo/includes/EditEntity/StatsdSaveTimeRecordingEditEntity.php(77): Wikibase\Repo\EditEntity\MediawikiEditEntity->attemptSave(Wikibase\Lexeme\Domain\Model\Lexeme, string, integer, boolean, boolean, array)
#6 /srv/mediawiki/php-1.40.0-wmf.18/extensions/WikibaseLexeme/src/MediaWiki/Api/RemoveForm.php(185): Wikibase\Repo\EditEntity\StatsdSaveTimeRecordingEditEntity->attemptSave(Wikibase\Lexeme\Domain\Model\Lexeme, string, integer, boolean, NULL, array)
#7 /srv/mediawiki/php-1.40.0-wmf.18/includes/api/ApiMain.php(1901): Wikibase\Lexeme\MediaWiki\Api\RemoveForm->execute()
#8 /srv/mediawiki/php-1.40.0-wmf.18/includes/api/ApiMain.php(878): ApiMain->executeAction()
#9 /srv/mediawiki/php-1.40.0-wmf.18/includes/api/ApiMain.php(849): ApiMain->executeActionWithErrorHandling()
#10 /srv/mediawiki/php-1.40.0-wmf.18/api.php(90): ApiMain->execute()
#11 /srv/mediawiki/php-1.40.0-wmf.18/api.php(45): wfApiMain()
#12 /srv/mediawiki/w/api.php(3): require(string)
#13 {main}
Impact
Notes

Event Timeline

Interesting – when I call wblremoveform with a form ID that doesn’t exist (e.g. L123-F1000000), I get an ordinary error:

{
    "error": {
        "code": "not-found",
        "info": "Field \"id\" contains an ID of a Form \"L123-F1000000\" that could not be found.",
        "parameterName": "id",
        "fieldPath": [],
        "docref": "See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes."
    },
    "servedby": "mw1470"
}

So what’s different about the request that produced the OutOfRangeException?

Still can’t reproduce it when I try to make two concurrent wblremoveform requests locally – one succeeds and the other returns a proper “not found” error (edit: or an “editconflict” error – both seem about equally likely), without crashing:

( async function() {
  const api = new mw.Api();
  const lexemeId = ( await api.postWithEditToken( {
    action: 'wbeditentity',
    new: 'lexeme',
    data: JSON.stringify( {
      lexicalCategory: 'Q11',
      language: 'Q11',
      lemmas: {
        en: { language: 'en', value: `T326768 ${new Date().toISOString()}` },
      },
      forms: [ {
        add: '',
        representations: {
          en: { language: 'en', value: `T326768 ${new Date().toISOString()}` },
        },
        grammaticalFeatures: [],
      } ],
    } ),
    summary: 'T326768',
  } ) ).entity.id;
  const deleteForm = () => api.postWithEditToken( {
    action: 'wblremoveform',
    id: `${lexemeId}-F1`,
  } );
  await Promise.all( [ deleteForm(), deleteForm() ] );
} )();

(You might need to change the language and lexical category to a different item ID than Q11 depending on what your local wiki looks like.)

Upon closer inspection, this doesn’t look like two concurrent requests – L136328-F2 was removed 23:51, while the logstash error is from 23:59:46.647, about eight minutes later.

So I tried making a second non-concurrent wblremoveform request instead, with the same baserevid, but that still only gives me a proper error (always “editconflict” this time), not a crash:

(async function() {
  const api = new mw.Api();
  const { id: lexemeId, lastrevid: revisionId } = ( await api.postWithEditToken( {
    action: 'wbeditentity',
    new: 'lexeme',
    data: JSON.stringify( {
      lexicalCategory: 'Q11',
      language: 'Q11',
      lemmas: {
        en: { language: 'en', value: `T326768 ${new Date().toISOString()}` },
      },
      forms: [ {
        add: '',
        representations: {
          en: { language: 'en', value: `T326768 ${new Date().toISOString()}` },
        },
        grammaticalFeatures: [],
      } ],
    } ),
    summary: 'T326768',
  } ) ).entity;
  const deleteForm = () => api.postWithEditToken( {
    action: 'wblremoveform',
    id: `${lexemeId}-F1`,
    baserevid: revisionId,
  } );
  await deleteForm();
  await deleteForm();
})();

Also haven’t had luck trying to produce an edit conflict by editing the form and then removing it:

(async function() {
  const api = new mw.Api();
  const { id: lexemeId, lastrevid: revisionId } = ( await api.postWithEditToken( {
    action: 'wbeditentity',
    new: 'lexeme',
    data: JSON.stringify( {
      lexicalCategory: 'Q11',
      language: 'Q11',
      lemmas: {
        en: { language: 'en', value: `T326768 ${new Date().toISOString()}` },
      },
      forms: [ {
        add: '',
        representations: {
          en: { language: 'en', value: `T326768 ${new Date().toISOString()}` },
        },
        grammaticalFeatures: [],
      } ],
    } ),
    summary: 'T326768',
  } ) ).entity;
  const formId = `${lexemeId}-F1`;
  await api.postWithEditToken( {
    action: 'wbleditformelements',
    formId: formId,
    data: JSON.stringify( {
      representations: {
        en: { language: 'en', value: `T326768 (edited) ${new Date().toISOString()}` },
      },
      grammaticalFeatures: [],
    } ),
    baserevid: revisionId,
  } );
  await api.postWithEditToken( {
    action: 'wblremoveform',
    id: formId,
    baserevid: revisionId,
  } );
})();

Those edits actually all succeed.

The strange thing is that line 174 of LexemePatcher.php (the second line of the stack trace) is the second line here:

case $formDiff instanceof ChangeFormDiffOp:
	$form = $lexeme->getForm( $formDiff->getFormId() );
	if ( $form !== null ) {
		$this->formPatcher->patch( $form, $formDiff );
	}
	break;

Lexeme::getForm() can’t actually return null, so maybe that if ( $form !== null ) should just be turned into a try/catch for the OutOfRangeException we’re seeing here. But I don’t understand why the wblremoveform request would end up in this code at all – $formDiff should be a RemoveFormDiff, not a ChangeFormDiffOp, I think.

This has also happened a few other times in the last 7 days – 30 times on L136328-F2 (the form ID mentioned before), and once each on L679424-F5 (Jan 12, 2023 @ 15:01:16.442) and L942105-F1 (Jan 11, 2023 @ 17:49:23.770). The latter two lexemes were edited by a non-anonymous user, عُثمان, so I’ve contacted him to see if he can enlighten us about what might cause this error.

عُثمان reports no user impact: apparently his requests to remove the two forms still succeeded, without even needing a retry? Very confusing.

Lexeme::getForm() can’t actually return null, so maybe that if ( $form !== null ) should just be turned into a try/catch for the OutOfRangeException we’re seeing here.

@thiemowmde actually suggested this change years ago: Fix unchecked errors in LexemePatcher::patchForms and Lexeme::getForm (after a previous change, Remove convenience function Lexeme::hasForm, had produced the OutOfRangeException in the first place). However, without being able to reproduce this bug, I’m not sure if we should try this fix or not, to be honest.

It was the other way around. I suggested to remove the problematic hasForm after the "unchecked errors" patch got stale. Reading the discussion I still don't understand why it wasn't merged. (Well, not being linked to a ticket definitely played a role.) The patcher just throws the wrong exception. It could be so easy. Please feel free to restore or redo the patch.

Oh, sorry, then I didn’t pay attention to the timestamps.

Change 879855 had a related patch set uploaded (by Michael Große; author: Michael Große):

[mediawiki/extensions/WikibaseLexeme@master] Add more logging to unexpected exception being thrown

https://gerrit.wikimedia.org/r/879855

This adds logging close to where the exception is being thrown and where we know we're in the wrong place. Not sure how useful it will be, though? I hope at that point, all the diffs we get should actually be serializable 🤞.

But maybe we want to add this logging instead somewhere further up the stack trace, where we have different information available?

Change 879855 merged by jenkins-bot:

[mediawiki/extensions/WikibaseLexeme@master] Add more logging to unexpected exception being thrown

https://gerrit.wikimedia.org/r/879855

ItamarWMDE renamed this task from OutOfRangeException: Lexeme L136328 doesn't have Form L136328-F2 to [LEX] OutOfRangeException: Lexeme L136328 doesn't have Form L136328-F2.Nov 8 2023, 10:51 AM
ItamarWMDE moved this task from Incoming to [DOT] By Project on the wmde-wikidata-tech board.
ItamarWMDE added a project: Wikidata Dev Team.

Prio Notes

Impact AreaAffected
production / end usersmaybe
monitoringyes
development effortsno
onboarding effortsno
additional stakeholdersmaybe
ItamarWMDE renamed this task from [LEX] OutOfRangeException: Lexeme L136328 doesn't have Form L136328-F2 to [SW] [LEX] OutOfRangeException: Lexeme L136328 doesn't have Form L136328-F2.Nov 21 2023, 3:56 PM
ItamarWMDE renamed this task from [SW] [LEX] OutOfRangeException: Lexeme L136328 doesn't have Form L136328-F2 to [LEX] OutOfRangeException: Lexeme L136328 doesn't have Form L136328-F2.Nov 22 2023, 10:06 AM

I think this is almost a duplicate of T284061: [LEX] [TECH] OutOfRangeException: Lexeme L493003 doesn't have sense L493003-S7, by the way – this task is about forms, that one about senses, but it’s likely that the senses code is just a copy of the forms code, and we should see if we can fix both at once IMHO.

Looking at logstash, the sense version of this error seems to be somewhat more common (which is annoying since we only added logging to the form version), but there’s still a couple of occurrences in logstash, such as this one (reqId:"4c2aaeea-5a28-4d57-afa8-47f447afed52"; @timestamp Dec 26, 2023 @ 18:11:25.115).

  • formId: L679512-F3 (was removed, incidentally by the same user as in T326768#8523696, but with a revision timestamp three minutes before the logstash timestamp)
  • representationDiff: a:4:{s:4:"data";a:0:{}s:5:"index";i:0;s:12:"typePointers";a:6:{s:3:"add";a:0:{}s:6:"remove";a:0:{}s:6:"change";a:0:{}s:4:"list";a:0:{}s:3:"map";a:0:{}s:4:"diff";a:0:{}}s:5:"assoc";s:1:"n";}
  • grammaticalFeaturesDiff: a:4:{s:4:"data";a:0:{}s:5:"index";i:0;s:12:"typePointers";a:6:{s:3:"add";a:0:{}s:6:"remove";a:0:{}s:6:"change";a:0:{}s:4:"list";a:0:{}s:3:"map";a:0:{}s:4:"diff";a:0:{}}s:5:"assoc";s:1:"n";}
  • statementsDiff: a:4:{s:4:"data";a:0:{}s:5:"index";i:0;s:12:"typePointers";a:6:{s:3:"add";a:0:{}s:6:"remove";a:0:{}s:6:"change";a:0:{}s:4:"list";a:0:{}s:3:"map";a:0:{}s:4:"diff";a:0:{}}s:5:"assoc";s:1:"t";}
  • existingForms: L679512-F1, L679512-F2, L679512-F4, L679512-F5

All three diffs are effectively just empty, and in fact they’re almost identical. Unserialized, they look like this:

[
    "data" => [],
    "index" => 0,
    "typePointers" => [
      "add" => [],
      "remove" => [],
      "change" => [],
      "list" => [],
      "map" => [],
      "diff" => [],
    ],
    "assoc" => "n",
]

Where the statementsDiff has "assoc" => "t" instead of "n". So we’ve ended up with a ChangeFormDiffOp for a form that should already be deleted, with no visible changes for that form. (Though it might be worth checking whether the diff contains any other $this->diffOps?)

It might also be worth logging (or getting from Hive, like in T255566#9484983?) the request parameters, so we know which request this error comes from. (And if we’re adding more logging, we should definitely also add it to the senses code, so we have more chances of seeing the problem.)

Change 993671 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseLexeme@master] Log more information on LexemePatcher errors

https://gerrit.wikimedia.org/r/993671

Change 993671 merged by jenkins-bot:

[mediawiki/extensions/WikibaseLexeme@master] Log more information on LexemePatcher errors

https://gerrit.wikimedia.org/r/993671

waiting for deployment now, I guess, then we’ll see if we see any more detailed error messages in logstash.

Change 993503 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseLexeme@wmf/1.42.0-wmf.15] Log more information on LexemePatcher errors

https://gerrit.wikimedia.org/r/993503

Looking at logstash, the sense version of this error seems to be somewhat more common (which is annoying since we only added logging to the form version), but there’s still a couple of occurrences in logstash, such as this one (reqId:"4c2aaeea-5a28-4d57-afa8-47f447afed52"; @timestamp Dec 26, 2023 @ 18:11:25.115).

I found the request in Hive:

SELECT params FROM event.mediawiki_api_request WHERE year = 2023 AND month = 12 AND day = 26 AND hour = 18 AND meta.request_id = '4c2aaeea-5a28-4d57-afa8-47f447afed52' LIMIT 1;

The important params are:

  • action=wbsetclaim – this is not the request that deleted the form
  • claim=[json]
    • property=P11328Dehkhoda ID
    • id=L679512$22230cb5-4146-1d48-b3eb-1a01958e0b16 – this statement does not exist: the currently latest revision, whose timestamp matches the logstash timestamp, has a Dehkhoda ID with a different statement ID, L679512$06f7e04d-45b9-440f-5b29-d13c0a3d26c0; none of the old revisions of this lexeme, nor of the one that was merged into it, ever had a statement with this statement ID (I checked)
  • baserevid=2037662725a relatively old revision within the group of edits by the same user; specifically, this is the revision where another lexeme was merged into this one, and before L679512-F3 was removed

I’m quite confused about the statement ID – apparently, this request ended up creating a new statement, but why was the ID from the claim JSON not used for the new statement?

Looking at logstash, the sense version of this error seems to be somewhat more common (which is annoying since we only added logging to the form version), but there’s still a couple of occurrences in logstash, such as this one (reqId:"4c2aaeea-5a28-4d57-afa8-47f447afed52"; @timestamp Dec 26, 2023 @ 18:11:25.115).

Stack trace for this one, by the way:

from /srv/mediawiki/php-1.42.0-wmf.10/extensions/WikibaseLexeme/src/Domain/Model/Lexeme.php(310)
#0 /srv/mediawiki/php-1.42.0-wmf.10/extensions/WikibaseLexeme/src/Domain/Diff/LexemePatcher.php(176): Wikibase\Lexeme\Domain\Model\Lexeme->getForm(Wikibase\Lexeme\Domain\Model\FormId)
#1 /srv/mediawiki/php-1.42.0-wmf.10/extensions/WikibaseLexeme/src/Domain/Diff/LexemePatcher.php(93): Wikibase\Lexeme\Domain\Diff\LexemePatcher->patchForms(Wikibase\Lexeme\Domain\Model\Lexeme, Wikibase\Lexeme\Domain\Diff\LexemeDiff)
#2 /srv/mediawiki/php-1.42.0-wmf.10/extensions/Wikibase/lib/packages/wikibase/data-model-services/src/Diff/EntityPatcher.php(40): Wikibase\Lexeme\Domain\Diff\LexemePatcher->patchEntity(Wikibase\Lexeme\Domain\Model\Lexeme, Wikibase\Lexeme\Domain\Diff\LexemeDiff)
#3 /srv/mediawiki/php-1.42.0-wmf.10/extensions/Wikibase/repo/includes/EditEntity/MediaWikiEditEntity.php(455): Wikibase\DataModel\Services\Diff\EntityPatcher->patchEntity(Wikibase\Lexeme\Domain\Model\Lexeme, Wikibase\Lexeme\Domain\Diff\LexemeDiff)
#4 /srv/mediawiki/php-1.42.0-wmf.10/extensions/Wikibase/repo/includes/EditEntity/MediaWikiEditEntity.php(720): Wikibase\Repo\EditEntity\MediaWikiEditEntity->fixEditConflict(Wikibase\Lexeme\Domain\Model\Lexeme)
#5 /srv/mediawiki/php-1.42.0-wmf.10/extensions/Wikibase/repo/includes/EditEntity/StatsdSaveTimeRecordingEditEntity.php(77): Wikibase\Repo\EditEntity\MediaWikiEditEntity->attemptSave(Wikibase\Lexeme\Domain\Model\Lexeme, string, integer, string, boolean, array)
#6 /srv/mediawiki/php-1.42.0-wmf.10/extensions/Wikibase/repo/includes/Api/EntitySavingHelper.php(394): Wikibase\Repo\EditEntity\StatsdSaveTimeRecordingEditEntity->attemptSave(Wikibase\Lexeme\Domain\Model\Lexeme, string, integer, string, NULL, array)
#7 /srv/mediawiki/php-1.42.0-wmf.10/extensions/Wikibase/repo/includes/Api/SetClaim.php(215): Wikibase\Repo\Api\EntitySavingHelper->attemptSaveEntity(Wikibase\Lexeme\Domain\Model\Lexeme, string, array, DerivativeContext)
#8 /srv/mediawiki/php-1.42.0-wmf.10/extensions/Wikibase/repo/includes/Api/SetClaim.php(163): Wikibase\Repo\Api\SetClaim->executeInternal()
#9 /srv/mediawiki/php-1.42.0-wmf.10/includes/api/ApiMain.php(1931): Wikibase\Repo\Api\SetClaim->execute()
#10 /srv/mediawiki/php-1.42.0-wmf.10/includes/api/ApiMain.php(908): ApiMain->executeAction()
#11 /srv/mediawiki/php-1.42.0-wmf.10/includes/api/ApiMain.php(879): ApiMain->executeActionWithErrorHandling()
#12 /srv/mediawiki/php-1.42.0-wmf.10/api.php(95): ApiMain->execute()
#13 /srv/mediawiki/php-1.42.0-wmf.10/api.php(48): wfApiMain()
#14 /srv/mediawiki/w/api.php(3): require(string)
#15 {main}

None of the lines above the general API framework have any relevant catch clauses, as far as I can tell, and indeed Hive reports that the error failed with an internal_api_error_OutOfRangeException.

hive (default)> SELECT api_error_codes FROM event.mediawiki_api_request WHERE year = 2023 AND month = 12 AND day = 26 AND hour = 18 AND meta.request_id = '4c2aaeea-5a28-4d57-afa8-47f447afed52' LIMIT 1;
["internal_api_error_OutOfRangeException"]

So I have no idea how the edit was still made, unless we… transparently retry requests in JS?

So I have no idea how the edit was still made, unless we… transparently retry requests in JS?

Nope, locally it doesn’t look like we do that.

Let’s wait for the logging to be deployed and see if it enlightens us at all, I guess…

Change 993503 abandoned by Lucas Werkmeister (WMDE):

[mediawiki/extensions/WikibaseLexeme@wmf/1.42.0-wmf.15] Log more information on LexemePatcher errors

Reason:

not needed anymore I think, this can just roll out with the train

https://gerrit.wikimedia.org/r/993503

So I have no idea how the edit was still made, unless we… transparently retry requests in JS?

Nope, locally it doesn’t look like we do that.

There are in fact three wbsetclaim API requests for L679512 and P11328 in Hive (coming from the same IP as the one found earlier): one at 2023-12-26T18:11:20Z, one at 2023-12-26T18:11:25Z, one at 2023-12-26T18:11:37Z. (I.e., five seconds later and then twelve seconds after that.) The first two requests have the same baserevid (2037662725), the third has a newer one (2037664634). I’m guessing the user saw the error (i.e. it was in fact uncaught), tried again, got the error again, then reloaded the page and tried once more, at which point it worked and the statement was added (because it now used a newer baserevid). So that at least explains the mystery of how an edit was still made. (The third request also specified the statement ID that was mentioned above as the one the lexeme actually has, L679512$06f7e04d-45b9-440f-5b29-d13c0a3d26c0.)

(Found by querying Hive for SELECT meta.dt, params FROM event.mediawiki_api_request WHERE year = 2023 AND month = 12 AND day = 26 AND hour = 18 AND http.client_ip = '[redacted]' AND http.method = 'POST' and then grepping the results manually for L679512 and P11328 because I didn’t know how to filter on the params in Hive.)

Apparently the error hasn’t reoccurred since 22 January, so our new logging hasn’t logged anything yet. Let’s move it to In Task Breakdown while we wait.

And a quick recap from memory: The error seems to happen when resolving an edit conflict, where we somehow get an empty diff for a form or sense that was removed in the meantime. It’s unclear where the empty diff comes from, and we’re hoping that the added logging will tell us more. It should be possible to find occurrences in Logstash (mediawiki-errors) by searching for just OutOfRangeException, or OutOfRangeException AND Wikibase if any unrelated OORE shows up in the meantime.

2024-02-14 update: still no new occurrences in logstash.

2024-02-28 update: still nothing.

2024-04-04 update: still nothing. In fact at this point there are only five matches for “OutOfRangeException AND Wikibase” in Logstash left (four on 2024-01-07, one on 2024-01-22) – the other ones presumably expired or something like that. This is intensely frustrating ^^