Page MenuHomePhabricator

DBReplication warning: expected but failed to find position index {cpPosIndex}
Closed, DuplicatePublic5 Estimated Story PointsPRODUCTION ERROR

Description

Since approximately 2022-09-06 03:00, errors regarding replication have appeared on db-related logs (in particular, this is Chronology Protector and found on logs for DBReplication dashboard: https://logstash.wikimedia.org/goto/a822d07342bb4833ada55f3501ae1f67

seems to have started with deployment of php-1.39.0-wmf.28, continued on .29 .

It seems related to the train and not to any database operation, as it is happening for all wikis with traffic-relative frequency.

Example error ( https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-1-7.0.0-1-2022.09.13?id=kJ9RNoMBnIkU__s9I4CZ ):

{
  "_index": "logstash-mediawiki-1-7.0.0-1-2022.09.13",
  "_type": "_doc",
  "_id": "kJ9RNoMBnIkU__s9I4CZ",
  "_score": null,
  "_source": {
    "exception": {
      "message": "",
      "code": 0,
      "trace": "from /srv/mediawiki/php-1.39.0-wmf.28/includes/libs/rdbms/ChronologyProtector.php(455)\n#0 /srv/mediawiki/php-1.39.0-wmf.28/includes/libs/rdbms/ChronologyProtector.php(404): Wikimedia\\Rdbms\\ChronologyProtector->lazyStartup()\n#1 /srv/mediawiki/php-1.39.0-wmf.28/includes/libs/rdbms/ChronologyProtector.php(263): Wikimedia\\Rdbms\\ChronologyProtector->getStartupSessionPositions()\n#2 /srv/mediawiki/php-1.39.0-wmf.28/includes/libs/rdbms/lbfactory/LBFactory.php(761): Wikimedia\\Rdbms\\ChronologyProtector->applySessionReplicationPosition(Wikimedia\\Rdbms\\LoadBalancer)\n#3 /srv/mediawiki/php-1.39.0-wmf.28/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1454): Wikimedia\\Rdbms\\LBFactory->Wikimedia\\Rdbms\\{closure}(Wikimedia\\Rdbms\\LoadBalancer)\n#4 /srv/mediawiki/php-1.39.0-wmf.28/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1394): Wikimedia\\Rdbms\\LoadBalancer->lazyLoadReplicationPositions()\n#5 /srv/mediawiki/php-1.39.0-wmf.28/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1262): Wikimedia\\Rdbms\\LoadBalancer->reallyOpenConnection(integer, Wikimedia\\Rdbms\\DatabaseDomain, array)\n#6 /srv/mediawiki/php-1.39.0-wmf.28/includes/libs/rdbms/loadbalancer/LoadBalancer.php(968): Wikimedia\\Rdbms\\LoadBalancer->getForeignConnection(integer, string, integer)\n#7 /srv/mediawiki/php-1.39.0-wmf.28/includes/libs/rdbms/loadbalancer/LoadBalancer.php(944): Wikimedia\\Rdbms\\LoadBalancer->getServerConnection(integer, string, integer)\n#8 /srv/mediawiki/php-1.39.0-wmf.28/includes/libs/rdbms/database/DBConnRef.php(95): Wikimedia\\Rdbms\\LoadBalancer->getConnectionInternal(integer, array, string, integer)\n#9 /srv/mediawiki/php-1.39.0-wmf.28/includes/objectcache/SqlBagOStuff.php(1689): Wikimedia\\Rdbms\\DBConnRef->ensureConnection()\n#10 /srv/mediawiki/php-1.39.0-wmf.28/includes/objectcache/SqlBagOStuff.php(403): SqlBagOStuff->getConnectionViaLoadBalancer()\n#11 /srv/mediawiki/php-1.39.0-wmf.28/includes/objectcache/SqlBagOStuff.php(574): SqlBagOStuff->getConnection(integer)\n#12 /srv/mediawiki/php-1.39.0-wmf.28/includes/objectcache/SqlBagOStuff.php(213): SqlBagOStuff->modifyBlobs(array, double, array, integer)\n#13 /srv/mediawiki/php-1.39.0-wmf.28/includes/libs/objectcache/MediumSpecificBagOStuff.php(177): SqlBagOStuff->doSet(string, array, integer, integer)\n#14 /srv/mediawiki/php-1.39.0-wmf.28/extensions/ConfirmEdit/includes/Store/CaptchaCacheStore.php(28): MediumSpecificBagOStuff->set(string, array, integer)\n#15 /srv/mediawiki/php-1.39.0-wmf.28/extensions/ConfirmEdit/SimpleCaptcha/SimpleCaptcha.php(1139): MediaWiki\\Extension\\ConfirmEdit\\Store\\CaptchaCacheStore->store(string, array)\n#16 /srv/mediawiki/php-1.39.0-wmf.28/extensions/ConfirmEdit/SimpleCaptcha/SimpleCaptcha.php(1226): MediaWiki\\Extension\\ConfirmEdit\\SimpleCaptcha\\SimpleCaptcha->storeCaptcha(array)\n#17 /srv/mediawiki/php-1.39.0-wmf.28/extensions/ConfirmEdit/includes/Auth/CaptchaPreAuthenticationProvider.php(76): MediaWiki\\Extension\\ConfirmEdit\\SimpleCaptcha\\SimpleCaptcha->createAuthenticationRequest()\n#18 /srv/mediawiki/php-1.39.0-wmf.28/includes/auth/AuthManager.php(2275): MediaWiki\\Extension\\ConfirmEdit\\Auth\\CaptchaPreAuthenticationProvider->getAuthenticationRequests(string, array)\n#19 /srv/mediawiki/php-1.39.0-wmf.28/includes/auth/AuthManager.php(2253): MediaWiki\\Auth\\AuthManager->getAuthenticationRequestsInternal(string, array, array, User)\n#20 /srv/mediawiki/php-1.39.0-wmf.28/includes/specialpage/AuthManagerSpecialPage.php(275): MediaWiki\\Auth\\AuthManager->getAuthenticationRequests(string, User)\n#21 /srv/mediawiki/php-1.39.0-wmf.28/includes/specialpage/LoginSignupSpecialPage.php(144): AuthManagerSpecialPage->loadAuth(NULL)\n#22 /srv/mediawiki/php-1.39.0-wmf.28/includes/specialpage/LoginSignupSpecialPage.php(235): LoginSignupSpecialPage->load(NULL)\n#23 /srv/mediawiki/php-1.39.0-wmf.28/includes/specialpage/SpecialPage.php(701): LoginSignupSpecialPage->execute(NULL)\n#24 /srv/mediawiki/php-1.39.0-wmf.28/includes/specialpage/SpecialPageFactory.php(1428): SpecialPage->run(NULL)\n#25 /srv/mediawiki/php-1.39.0-wmf.28/includes/MediaWiki.php(316): MediaWiki\\SpecialPage\\SpecialPageFactory->executePath(string, RequestContext)\n#26 /srv/mediawiki/php-1.39.0-wmf.28/includes/MediaWiki.php(904): MediaWiki->performRequest()\n#27 /srv/mediawiki/php-1.39.0-wmf.28/includes/MediaWiki.php(562): MediaWiki->main()\n#28 /srv/mediawiki/php-1.39.0-wmf.28/index.php(50): MediaWiki->run()\n#29 /srv/mediawiki/php-1.39.0-wmf.28/index.php(46): wfIndexMain()\n#30 /srv/mediawiki/w/index.php(3): require(string)\n#31 {main}",
      "class": "RuntimeException",
      "file": "/srv/mediawiki/php-1.39.0-wmf.28/includes/libs/rdbms/ChronologyProtector.php:455"
    },
    "cpPosIndex": 1,
    "channel": "DBReplication",
    "monolog_level": 300,
    "logsource": "mw1387",
    "referrer": "https://sv.wikibooks.org/w/index.php?title=Special:Inloggning&returnto=Kokboken",
    "indexReached": null,
    "url": "/w/index.php?returnto=Kokboken&title=Special:Inloggning",
    "server": "sv.wikibooks.org",
    "reqId": "9a23f646-8c48-4e0b-a2dc-24ec0f5aaba7",
    "servergroup": "appserver",
    "severity": "warning",
    "timestamp": "2022-09-13T12:07:09+00:00",
    "message": "expected but failed to find position index 1",
    "@timestamp": "2022-09-13T10:07:09.570Z",
    "host": "mw1387",
    "clientIP": "[snipped]",
    "tags": [
      "input-kafka-rsyslog-udp-localhost",
      "rsyslog-udp-localhost",
      "kafka",
      "es",
      "es"
    ],
    "mwversion": "1.39.0-wmf.28",
    "clientAgent": "[snipped]",
    "clientId": "b79c6821caa305839a9a72ab511ce3d4",
    "shard": "s3",
    "wiki": "svwikibooks",
    "type": "mediawiki",
    "program": "mediawiki",
    "@version": 1,
    "http_method": "POST",
    "phpversion": "7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf5",
    "facility": "user",
    "normalized_message": "expected but failed to find position index {cpPosIndex}",
    "level": "WARNING"
  },
  "fields": {
    "@timestamp": [
      "2022-09-13T10:07:09.570Z"
    ]
  },
  "sort": [
    1663063629570
  ]
}

Could it be multi-dc related? E.g. searching for local writes on the other master? It happens more on codfw, but it happens on eqiad, too. Probably a minor issue, but reporting it mostly to discard user impact.

Details

MediaWiki Version
php-1.39.0-wmf.28
Request URL
https://sv.wikibooks.org/w/index.php?returnto=Kokboken&title=Special:Inloggning

Event Timeline

jcrespo triaged this task as Medium priority.Sep 13 2022, 10:21 AM
jcrespo created this task.
Restricted Application added a subscriber: Aklapper. · View Herald Transcript
jcrespo changed Release Version from php-1.39.0-wmf.27 to php-1.39.0-wmf.28.
jcrespo updated the task description. (Show Details)
jcrespo raised the priority of this task from Medium to Needs Triage.Sep 13 2022, 12:35 PM

Changing to untriaged- it was inherited from its parent. I think this is probably low, but leaving to owners to decide.

Krinkle renamed this task from Exception warning: /srv/mediawiki/php-1.39.0-wmf.28/includes/libs/rdbms/ChronologyProtector.php:455: expected but failed to find position index {cpPosIndex} to DBReplication warning: expected but failed to find position index {cpPosIndex}.Sep 14 2022, 9:00 AM

Exceptions are logged to the "exception" channel. This is not a production error. Diagnostic messages in other channels are by themselves not production errors. We sometimes add stacktraces to log messages by creating an unnamed exception object and appending it. These are not actual exceptions.

This log warning being very common is indicative of a train-blocking regression however, but it needs a story as for what the impact is. The story is that the Pos object stored in memcached is failing to unserialise, and as result we are not doing wait-for replication but just picking a replica as-is.

This has another side-effect, which happens first and has a task already: T317606: PHP Notice: Undefined index: asOfTime

Shouldn't this be merged into task if they have the same origin?