Page MenuHomePhabricator

If replica is lagged, an attempt is made to insert rows which violate unique index constraint on cu_useragent_clienthints
Closed, ResolvedPublic

Description

Steps to re-produce:

  1. Use MediaWiki docker - https://www.mediawiki.org/wiki/MediaWiki-Docker
  2. Follow the steps in https://www.mediawiki.org/wiki/MediaWiki-Docker/Configuration_recipes/Alternative_databases for mariadb with replication before running docker compose up -d
  3. Make an edit using a non-chromium based browser (such as Firefox).
  4. Make an edit using Chrome with Network tab open.
  5. Right click on the REST API call and copy the request as "fetch".
  6. Open a terminal for the "mariadb-replica" container
  7. Login to mysql using the user root and the password main_root_password
  8. Run the following (based on https://mariadb.com/kb/en/delayed-replication/):
STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY=3600;
START SLAVE;
  1. Load http://localhost:8080/w/api.php?action=query&format=json&meta=siteinfo&formatversion=2&siprop=dbrepllag
  2. Check that "lag" is set to "3600". If it is, then your replica is lagged by 60 minutes.
  3. Open the console for Firefox
  4. Paste the "fetch" command and modify the end of the URL to use the revision ID for the edit made in the Firefox browser

To finish, follow the following commands:

  1. Open a terminal for the "mariadb-replica" container
  2. Login to mysql using the user root and the password main_root_password
  3. Run the following:
STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY=0;
START SLAVE;
  1. Load http://localhost:8080/w/api.php?action=query&format=json&meta=siteinfo&formatversion=2&siprop=dbrepllag
  2. Check that "lag" is set to "0". If it is, then your replica is no longer lagged.

Because the check for existing Client Hints data rows is done on a replica DB, if this replica DB is lagged enough the replica DB could report no such Client Hints row. This means that the UserAgentClientHintsManager will attempt to insert with rows that would violate the unique constraint via the index uach_name_value. This will normally cause the insert query to fail. However, there may be rows that need inserting.

Adding the IGNORE option to the insert query should address the issue by ignoring rows that violate the unique constraint on combinations of uach_name and uach_value.

Event Timeline

Dreamy_Jazz renamed this task from If replica is lagged attempt is made to insert rows which violate unique index constraint on cu_useragent_clienthints to If replica is lagged, an attempt is made to insert rows which violate unique index constraint on cu_useragent_clienthints.Jul 26 2023, 11:16 PM

Change 942031 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] Use IGNORE option when inserting into cu_useragent_clienthints

https://gerrit.wikimedia.org/r/942031

Steps to re-produce:

  1. Use a DB with replication
  2. Make the replica DB(s) lag

Just wondered whether you did this locally and if so whether you could link to any helpful docs?

I didn't try to get this to actually happen, but it was based on a inspection of the code to solve T342828: Purging orphaned rows in cu_useragent_clienthints while inserting a mapping row can cause a race condition.

I saw that the exclusion of data in UserAgentClientHintsManager::excludeExistingClientHintData occurs on a replica, but the insert happens on primary.

Because I was interested to see if I could simulate the lag, I've found a configuration for mariadb that allows lagging the replicas. I've added this to the task description.

Applying this I can get the error:

{
  "message": "Error: exception of type Wikimedia\\Rdbms\\DBQueryError: Error 1062: Duplicate entry 'architecture-x86' for key 'uach_name_value'\nFunction: MediaWiki\\CheckUser\\Services\\UserAgentClientHintsManager::insertClientHintValues\nQuery: INSERT INTO `cu_useragent_clienthints` (uach_name,uach_value) VALUES ('architecture','x86'),('bitness','64'),('brands','Not/A)Brand 99'),('brands','Google Chrome 115'),('brands','Chromium 115'),('fullVersionList','Not/A)Brand 99.0.0.0'),('fullVersionList','Google Chrome 115.0.5790.110'),('fullVersionList','Chromium 115.0.5790.110'),('mobile','0'),('platform','Windows'),('platformVersion','15.0.0')\n",
  "exception": {
    "id": "34480cc13e87da396b3a1884",
    "type": "Wikimedia\\Rdbms\\DBQueryError",
    "file": "/var/www/html/w/includes/libs/rdbms/database/Database.php",
    "line": 1296,
    "message": "Error 1062: Duplicate entry 'architecture-x86' for key 'uach_name_value'\nFunction: MediaWiki\\CheckUser\\Services\\UserAgentClientHintsManager::insertClientHintValues\nQuery: INSERT INTO `cu_useragent_clienthints` (uach_name,uach_value) VALUES ('architecture','x86'),('bitness','64'),('brands','Not/A)Brand 99'),('brands','Google Chrome 115'),('brands','Chromium 115'),('fullVersionList','Not/A)Brand 99.0.0.0'),('fullVersionList','Google Chrome 115.0.5790.110'),('fullVersionList','Chromium 115.0.5790.110'),('mobile','0'),('platform','Windows'),('platformVersion','15.0.0')\n",
    "code": 0,
    "url": "/w/rest.php/checkuser/v0/useragent-clienthints/revision/414",
    "caught_by": "other",
    "backtrace": [
      {
        "file": "/var/www/html/w/includes/libs/rdbms/database/Database.php",
        "line": 1280,
        "function": "getQueryException",
        "class": "Wikimedia\\Rdbms\\Database",
        "type": "->"
      },
      {
        "file": "/var/www/html/w/includes/libs/rdbms/database/Database.php",
        "line": 1254,
        "function": "getQueryExceptionAndLog",
        "class": "Wikimedia\\Rdbms\\Database",
        "type": "->"
      },
      {
        "file": "/var/www/html/w/includes/libs/rdbms/database/Database.php",
        "line": 739,
        "function": "reportQueryError",
        "class": "Wikimedia\\Rdbms\\Database",
        "type": "->"
      },
      {
        "file": "/var/www/html/w/includes/libs/rdbms/database/Database.php",
        "line": 1560,
        "function": "query",
        "class": "Wikimedia\\Rdbms\\Database",
        "type": "->"
      },
      {
        "file": "/var/www/html/w/includes/libs/rdbms/database/DBConnRef.php",
        "line": 119,
        "function": "insert",
        "class": "Wikimedia\\Rdbms\\Database",
        "type": "->"
      },
      {
        "file": "/var/www/html/w/includes/libs/rdbms/database/DBConnRef.php",
        "line": 397,
        "function": "__call",
        "class": "Wikimedia\\Rdbms\\DBConnRef",
        "type": "->"
      },
      {
        "file": "/var/www/html/w/extensions/CheckUser/src/Services/UserAgentClientHintsManager.php",
        "line": 99,
        "function": "insert",
        "class": "Wikimedia\\Rdbms\\DBConnRef",
        "type": "->"
      },
      {
        "file": "/var/www/html/w/extensions/CheckUser/src/Api/Rest/Handler/UserAgentClientHintsHandler.php",
        "line": 57,
        "function": "insertClientHintValues",
        "class": "MediaWiki\\CheckUser\\Services\\UserAgentClientHintsManager",
        "type": "->"
      },
      {
        "file": "/var/www/html/w/includes/Rest/SimpleHandler.php",
        "line": 38,
        "function": "run",
        "class": "MediaWiki\\CheckUser\\Api\\Rest\\Handler\\UserAgentClientHintsHandler",
        "type": "->"
      },
      {
        "file": "/var/www/html/w/includes/Rest/Router.php",
        "line": 517,
        "function": "execute",
        "class": "MediaWiki\\Rest\\SimpleHandler",
        "type": "->"
      },
      {
        "file": "/var/www/html/w/includes/Rest/Router.php",
        "line": 422,
        "function": "executeHandler",
        "class": "MediaWiki\\Rest\\Router",
        "type": "->"
      },
      {
        "file": "/var/www/html/w/includes/Rest/EntryPoint.php",
        "line": 195,
        "function": "execute",
        "class": "MediaWiki\\Rest\\Router",
        "type": "->"
      },
      {
        "file": "/var/www/html/w/includes/Rest/EntryPoint.php",
        "line": 135,
        "function": "execute",
        "class": "MediaWiki\\Rest\\EntryPoint",
        "type": "->"
      },
      {
        "file": "/var/www/html/w/rest.php",
        "line": 31,
        "function": "main",
        "class": "MediaWiki\\Rest\\EntryPoint",
        "type": "::"
      }
    ]
  },
  "httpCode": 500,
  "httpReason": "Internal Server Error"
}

With the patch this does not error out.

Dreamy_Jazz updated the task description. (Show Details)
Dreamy_Jazz updated the task description. (Show Details)

Change 942031 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Use IGNORE option when inserting into cu_useragent_clienthints

https://gerrit.wikimedia.org/r/942031

@Dreamy_Jazz I can get a similar exception for the cu_useragent_clienthints_map if I attempt to submit client hints data a second time:

{
  "message": "Error: exception of type Wikimedia\\Rdbms\\DBQueryError: Error 1062: Duplicate entry '9-0-14' for key 'PRIMARY'\nFunction: MediaWiki\\CheckUser\\Services\\UserAgentClientHintsManager::insertMappingRows\nQuery: INSERT INTO `cu_useragent_clienthints_map` (uachm_uach_id,uachm_reference_type,uachm_reference_id) VALUES (9,0,14),(10,0,14),(11,0,14),(12,0,14),(13,0,14),(14,0,14),(15,0,14),(16,0,14)\n",
  "exception": {
    "id": "8a27b7d90b138daaaf9fd713",
    "type": "Wikimedia\\Rdbms\\DBQueryError",
    "file": "/var/www/html/w/includes/libs/rdbms/database/Database.php",
    "line": 1236,
    "message": "Error 1062: Duplicate entry '9-0-14' for key 'PRIMARY'\nFunction: MediaWiki\\CheckUser\\Services\\UserAgentClientHintsManager::insertMappingRows\nQuery: INSERT INTO `cu_useragent_clienthints_map` (uachm_uach_id,uachm_reference_type,uachm_reference_id) VALUES (9,0,14),(10,0,14),(11,0,14),(12,0,14),(13,0,14),(14,0,14),(15,0,14),(16,0,14)\n",
    "code": 0,
    "url": "/w/rest.php/checkuser/v0/useragent-clienthints/revision/14",
    "caught_by": "other",
...

My reproduction steps:

  1. Setup replica DBs like you describe
  2. Make edit on Firefox[1]
  3. In mariadb-replica run:
STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY=3600;
START SLAVE;
  1. Make the curl/fetch request with the revision ID from step 2 (this will be successful, and will enter data only into the mariadb-main database)
  2. Repeat the curl/fetch request (this will return the above exception, because mariadb-replica won't contain the up-to-date rows in cu_useragent_clienthints_map)

Notes

  1. I have to make the edit before setting the replica delay as otherwise the request to /rest.php/checkuser/v0/useragent-clienthints/revision/$id will return The specified revision ($id) does not exist. I have to do the curl/fetch request after setting the delay as otherwise there will be the appropriate entries in cu_useragent_clienthints_* on the replica database. I also find that, after setting a delay, Chromium won't send the client hints data after editing. I haven't investigated exactly why.

...I also find that, after setting a delay, Chromium won't send the client hints data after editing. I haven't investigated exactly why.

Setting MASTER_DELAY=10 I find that, after editing, the request will not send straight away. But if I wait ~10 seconds and refresh the page it will. It is also at this point I see the "Your edit has been saved" toast/popup. I assume that is how postHook works.

@Dreamy_Jazz I can get a similar exception for the cu_useragent_clienthints_map if I attempt to submit client hints data a second time:

{
  "message": "Error: exception of type Wikimedia\\Rdbms\\DBQueryError: Error 1062: Duplicate entry '9-0-14' for key 'PRIMARY'\nFunction: MediaWiki\\CheckUser\\Services\\UserAgentClientHintsManager::insertMappingRows\nQuery: INSERT INTO `cu_useragent_clienthints_map` (uachm_uach_id,uachm_reference_type,uachm_reference_id) VALUES (9,0,14),(10,0,14),(11,0,14),(12,0,14),(13,0,14),(14,0,14),(15,0,14),(16,0,14)\n",
  "exception": {
    "id": "8a27b7d90b138daaaf9fd713",
    "type": "Wikimedia\\Rdbms\\DBQueryError",
    "file": "/var/www/html/w/includes/libs/rdbms/database/Database.php",
    "line": 1236,
    "message": "Error 1062: Duplicate entry '9-0-14' for key 'PRIMARY'\nFunction: MediaWiki\\CheckUser\\Services\\UserAgentClientHintsManager::insertMappingRows\nQuery: INSERT INTO `cu_useragent_clienthints_map` (uachm_uach_id,uachm_reference_type,uachm_reference_id) VALUES (9,0,14),(10,0,14),(11,0,14),(12,0,14),(13,0,14),(14,0,14),(15,0,14),(16,0,14)\n",
    "code": 0,
    "url": "/w/rest.php/checkuser/v0/useragent-clienthints/revision/14",
    "caught_by": "other",
...

My reproduction steps:

  1. Setup replica DBs like you describe
  2. Make edit on Firefox[1]
  3. In mariadb-replica run:
STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY=3600;
START SLAVE;
  1. Make the curl/fetch request with the revision ID from step 2 (this will be successful, and will enter data only into the mariadb-main database)
  2. Repeat the curl/fetch request (this will return the above exception, because mariadb-replica won't contain the up-to-date rows in cu_useragent_clienthints_map)

Thanks. I will investigate this further.

Notes

  1. I have to make the edit before setting the replica delay as otherwise the request to /rest.php/checkuser/v0/useragent-clienthints/revision/$id will return The specified revision ($id) does not exist. I have to do the curl/fetch request after setting the delay as otherwise there will be the appropriate entries in cu_useragent_clienthints_* on the replica database. I also find that, after setting a delay, Chromium won't send the client hints data after editing. I haven't investigated exactly why.

Thanks for noticing that. I think in my testing I mocked the response of the revision existing check which is why the steps I followed were not exactly as I wrote.

...I also find that, after setting a delay, Chromium won't send the client hints data after editing. I haven't investigated exactly why.

Setting MASTER_DELAY=10 I find that, after editing, the request will not send straight away. But if I wait ~10 seconds and refresh the page it will. It is also at this point I see the "Your edit has been saved" toast/popup. I assume that is how postHook works.

This is what I saw too. I presumed that the postEdit hook only ran once the revision had been saved to the replica DBs. This in theory could be a bit of an issue (in that the client hints data could not be sent because the user did not refresh the page), but the replicas being lagged is something that happens very rarely. Plus, if it was enough for this to occur the lag would likely be fixed quickly (which limits how much data would have been lost).

Setting MASTER_DELAY=10 I find that, after editing, the request will not send straight away. But if I wait ~10 seconds and refresh the page it will. It is also at this point I see the "Your edit has been saved" toast/popup. I assume that is how postHook works.

This is what I saw too. I presumed that the postEdit hook only ran once the revision had been saved to the replica DBs. This in theory could be a bit of an issue (in that the client hints data could not be sent because the user did not refresh the page), but the replicas being lagged is something that happens very rarely. Plus, if it was enough for this to occur the lag would likely be fixed quickly (which limits how much data would have been lost).

I also think if, before refreshing the page, another user makes an edit to the page the request is not sent. I don't see a toast notification either, so I assume the postEdit hook is not fired. I don't know if that is a bug in postEdit or as designed.

  1. Set replication lag to 5 seconds (say)
  2. Make an edit on Chromium
  3. Quickly make an edit to the same page with Firefox
  4. Wait ~5 seconds
  5. Refresh Chromium

@Dreamy_Jazz I can get a similar exception for the cu_useragent_clienthints_map if I attempt to submit client hints data a second time:

I raised this as T345144.

I also think if, before refreshing the page, another user makes an edit to the page the request is not sent. I don't see a toast notification either, so I assume the postEdit hook is not fired. I don't know if that is a bug in postEdit or as designed.

I may raise this separately as a bug in postEdit.

@Dreamy_Jazz I can get a similar exception for the cu_useragent_clienthints_map if I attempt to submit client hints data a second time:

I raised this as T345144.

Thanks. I had forgot to follow-up on this.

I also think if, before refreshing the page, another user makes an edit to the page the request is not sent. I don't see a toast notification either, so I assume the postEdit hook is not fired. I don't know if that is a bug in postEdit or as designed.

I may raise this separately as a bug in postEdit.

I think this may be due to postEdit not being fired until the revision ID can be updated. Because the replica is lagged, the revision ID is not available on the replica until 10 seconds later. If this revision ID is being read from a replica, then the postEdit hook may not be run until the replicas catch up. This could be fixed by telling the server to read from primary after an edit.