
Determine cause of historical UIDs colliding
Open, High, Public

Description

The Unix time in seconds plus milliseconds is used as a key part of generating our UUIDs. For current UUIDs, we use the actual time (from microtime()).

However, for imports we don't have access to millisecond-level precision (database timestamps in MediaWiki don't include milliseconds), so we randomly generate the millisecond part of the timestamp. There can be collisions, which become more likely when there are multiple events in the same second (quite possible with normal activity, and even more so with bots).

I've been investigating this, and so far found https://gerrit.wikimedia.org/r/#/c/251265/ and https://gerrit.wikimedia.org/r/#/c/251256/ . The Flow one is merged.

The difficulty here is that there should be ~52 bits of entropy per second (a 32-bit random node ID regenerated on each call, plus log2(999) ≈ 10 random millisecond bits, plus a 10-bit counter), *not* counting the seconds themselves. (The counter is randomly initialized, then incremented.)
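For orientation, here is a simplified sketch of that layout in plain PHP. This is not the actual Flow\Import\HistoricalUIDGenerator code; the function name and the exact field order are illustrative assumptions based on the description above.

// Simplified sketch of the historical UID layout described above:
// 46-bit millisecond timestamp (real seconds, random milliseconds),
// a 10-bit counter (randomly seeded, then incremented), and a
// 32-bit random node ID regenerated on every call.
function sketchHistoricalUID88( $unixSeconds ) {
	static $counter = null;
	if ( $counter === null ) {
		$counter = mt_rand( 0, 1023 ); // random 10-bit starting value
	}
	$counter = ( $counter + 1 ) % 1024;

	// Fake millisecond precision: seconds come from the DB timestamp,
	// the milliseconds are random.
	$millis = $unixSeconds * 1000 + mt_rand( 0, 999 );

	// New random 32-bit node ID on every call.
	$node = ( mt_rand( 0, 0xFFFF ) << 16 ) | mt_rand( 0, 0xFFFF );

	// 46 + 10 + 32 = 88 bits, returned as a bit string.
	return sprintf( '%046b%010b%032b', $millis, $counter, $node );
}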

Will add a calculation of the expected number of UIDs generated within a single second before a collision occurs, then a script testing the actual behavior.

We don't use the same random number generator for all of these components, but I'm not sure that is a problem (the real question is whether there is some artifact in how they are generated).

Event Timeline

Mattflaschen-WMF added a comment. Edited Nov 5 2015, 10:41 PM

If there are 52 free bits, there should be 2^52 possible values. Assume these are uniformly random (not entirely true, since the counter starts at a random value and is then incremented).

Then, using the equation from https://en.wikipedia.org/w/index.php?title=Birthday_problem&oldid=683536746#Approximation_of_number_of_people , to have a 50% chance of a collision, there would need to be roughly

n ≈ 1/2 + sqrt( 1/4 + 2 · ln(2) · 2^52 ) ≈ 7.9 × 10^7

UIDs generated within a single second ( http://www.wolframalpha.com/input/?i=%281+%2F+2+%29+%2B+sqrt%28+1+%2F+4+%2B+2+*+ln%28+2+%29+*+2^52+%29 ).
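The same figure can be reproduced directly as a quick sanity check (assuming a pool of 2^52 equally likely values):

// Birthday-problem approximation: draws needed for a ~50% collision chance
// from d equally likely values: n ≈ 1/2 + sqrt( 1/4 + 2 * ln(2) * d )
$d = pow( 2, 52 );
$n = 0.5 + sqrt( 0.25 + 2 * log( 2 ) * $d );
printf( "%.2e\n", $n ); // ~7.90e+7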

Stay tuned for how many it actually takes.

The following script is running on terbium. So far it's at 653000.

<?php

require_once( getenv( 'MW_INSTALL_PATH' ) !== false
	? getenv( 'MW_INSTALL_PATH' ) . '/maintenance/Maintenance.php'
	: dirname( __FILE__ ) . '/../../../maintenance/Maintenance.php' );

class CollisionScript extends Maintenance {
	public function execute() {
		// Repeatedly generate historical UIDs for the same second and
		// stop as soon as one is generated twice.
		$timestamp = wfTimestamp();
		$usedUUIDs = array();
		$i = 0;
		do {
			$uuid = Flow\Import\HistoricalUIDGenerator::historicalTimestampedUID88( $timestamp );
			$duplicate = isset( $usedUUIDs[$uuid] );
			$usedUUIDs[$uuid] = true;
			$i++;
			if ( $i % 1000 === 0 ) {
				echo "$i UIDs generated.\n";
			}
		} while ( !$duplicate );

		echo "Duplicate found after $i iterations\n";
	}
}

$maintClass = 'CollisionScript';
require_once RUN_MAINTENANCE_IF_MAIN;

I stopped it at 1300000. If it's not generating duplicate UIDs, something may be causing it to actually try to insert the same entities twice.

I also got:

vagrant@mediawiki-vagrant:~$ mwscript extensions/Flow/maintenance/importDump.php --wiki=wiki /vagrant/full_Flow_wiki_dump_all_history_abef2046e70b4440d11078f238e3cf9eb03bdfb8.xml
A database query error has occurred.
Query: INSERT  INTO `flow_workflow` (workflow_id,workflow_type,workflow_wiki,workflow_page_id,workflow_namespace,workflow_title_text,workflow_lock_state,workflow_last_update_timestamp,workflow_name) VALUES ('4GE�����B','topic','wiki','112','3','New_board_2015-05-01','0','20150501233052','')
Function: Flow\Data\Storage\BasicDbStorage::insert (flow_workflow)
Error: 1062 Duplicate entry '\x054GE\xFF\xD2\x1F\xE9\x93\xE2B' for key 'PRIMARY' (127.0.0.1)

Backtrace:
#0 /vagrant/mediawiki/includes/db/Database.php(1047): DatabaseBase->reportQueryError('Duplicate entry...', 1062, 'INSERT  INTO `f...', 'Flow\Data\Stora...', false)
#1 /vagrant/mediawiki/includes/db/Database.php(1935): DatabaseBase->query('INSERT  INTO `f...', 'Flow\Data\Stora...')
#2 /vagrant/mediawiki/extensions/Flow/includes/Data/Storage/BasicDbStorage.php(68): DatabaseBase->insert('flow_workflow', Array, 'Flow\Data\Stora...')
#3 /vagrant/mediawiki/extensions/Flow/includes/Data/ObjectManager.php(245): Flow\Data\Storage\BasicDbStorage->insert(Array)
#4 /vagrant/mediawiki/extensions/Flow/includes/Data/ObjectManager.php(170): Flow\Data\ObjectManager->insert(Array, Array)
#5 /vagrant/mediawiki/extensions/Flow/includes/Data/ObjectManager.php(143): Flow\Data\ObjectManager->multiPut(Array, Array)
#6 /vagrant/mediawiki/extensions/Flow/includes/Data/ManagerGroup.php(80): Flow\Data\ObjectManager->put(Object(Flow\Model\Workflow), Array)
#7 /vagrant/mediawiki/extensions/Flow/includes/Dump/Importer.php(121): Flow\Data\ManagerGroup->put(Object(Flow\Model\Workflow), Array)
#8 /vagrant/mediawiki/extensions/Flow/includes/Dump/Importer.php(188): Flow\Dump\Importer->put(Object(Flow\Model\Workflow))
#9 /vagrant/mediawiki/extensions/Flow/includes/Dump/Importer.php(75): Flow\Dump\Importer->handleTopic()
#10 /vagrant/mediawiki/extensions/Flow/maintenance/importDump.php(97): Flow\Dump\Importer->doImport()
#11 /vagrant/mediawiki/extensions/Flow/maintenance/importDump.php(73): FlowBackupReader->importFromHandle(Resource id #313)
#12 /vagrant/mediawiki/extensions/Flow/maintenance/importDump.php(54): FlowBackupReader->importFromFile('/vagrant/full_F...')
#13 /vagrant/mediawiki/maintenance/doMaintenance.php(103): FlowBackupReader->execute()
#14 /vagrant/mediawiki/extensions/Flow/maintenance/importDump.php(102): require_once('/vagrant/mediaw...')
#15 /var/www/w/MWScript.php(89): require_once('/vagrant/mediaw...')
#16 {main}

when testing a full-wiki import (this is supposed to use the IDs from the XML file, not generate them).

That supports the idea that it's not really the generation that is the issue. But it could be something weird with the state of the wiki it was exported from (my Vagrant test wiki), or a bug in the export/import.

The problem with the import may have been unrelated. Since that comment, I've found a couple of errors in how the XML was parsed, so it's quite plausible that in certain circumstances it actually did try to insert the same node multiple times (due to an error in the import script).

Mattflaschen-WMF added a comment. Edited Nov 23 2015, 6:02 PM

> The problem with the import may have been unrelated. Since that comment, I've found a couple of errors in how the XML was parsed, so it's quite plausible that in certain circumstances it actually did try to insert the same node multiple times (due to an error in the import script).

Yeah, I haven't seen it on the XML import since. I think your addition of merge fixed it.

By turning the IDs from MySQL's duplicate-key exceptions into UUID objects [1] and fetching the timestamps, I found that these were some of the failed imports (up until the point where I aborted the script):

mysql:wikiadmin@db1035 [ptwikibooks]> SELECT * FROM revision WHERE rev_timestamp IN ('20110321130600','20151103141305','20151103141305','20110701140553','20151103070141','20101008144351','20101007213047','20110322005355','20110321130600','20151103141305','20151103141305','20101008120453','20101204122718');
+--------+----------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------------------------+----------------+----------------+-------------+---------+---------------+---------------------------------+-------------------+--------------------+
| rev_id | rev_page | rev_text_id | rev_comment                                                                                                                                                                                                    | rev_user | rev_user_text          | rev_timestamp  | rev_minor_edit | rev_deleted | rev_len | rev_parent_id | rev_sha1                        | rev_content_model | rev_content_format |
+--------+----------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------------------------+----------------+----------------+-------------+---------+---------------+---------------------------------+-------------------+--------------------+
| 191463 |    37315 |      184968 | moveu [[Tópico:Wikilivros:LiquidThreads/Blah]] para [[Tópico:Wikilivros:LiquidThreads/Blah2]]: ok?                                                                                                           |     2780 | He7d3r                 | 20101007213047 |              1 |           0 |       5 |        191462 | o5nimovdeg6mdmpr2aawo9w4eadraas | NULL              | NULL               |
| 191487 |    35583 |      176799 | moveu [[Tópico:Wikilivros:Diálogos comunitários/Tipo de avisos devemos colocar no Site Notice]] para [[Tópico:Wikilivros:Diálogos comunitários/Tipo de avisos que devemos colocar no Site Notice]]: typo |     6436 | He7d3r.bot             | 20101008120453 |              1 |           0 |     621 |        191485 | czpp8zie6l8q7u2tkgo5vtr8wuvgjid | NULL              | NULL               |
| 191488 |    37324 |      184992 | moveu [[Tópico:Wikilivros:Diálogos comunitários/Tipo de avisos devemos colocar no Site Notice]] para [[Tópico:Wikilivros:Diálogos comunitários/Tipo de avisos que devemos colocar no Site Notice]]: typo |     6436 | He7d3r.bot             | 20101008120453 |              0 |           0 |     114 |             0 | 2ap4k9kj3k7kuibj7ksbs0r7o9yk0qa | NULL              | NULL               |
| 191528 |    36770 |      182430 | moveu [[Tópico:Wikilivros:LiquidThreads/Testes/resposta (4)]] para [[Tópico:Wikilivros:LiquidThreads/Blah2/resposta]]: teste                                                                                 |     6436 | He7d3r.bot             | 20101008144351 |              1 |           0 |       7 |        188905 | ox6hw8rvhg9dt6zsnzsmzgc27yjor1g | NULL              | NULL               |
| 191529 |    37333 |      185029 | moveu [[Tópico:Wikilivros:LiquidThreads/Testes/resposta (4)]] para [[Tópico:Wikilivros:LiquidThreads/Blah2/resposta]]: teste                                                                                 |     6436 | He7d3r.bot             | 20101008144351 |              0 |           0 |      69 |             0 | aodvnwxu69ef19zw8f3bsp1xmbog5js | NULL              | NULL               |
| 199981 |    38585 |      193126 | moveu [[Tópico:Wikilivros:Comentários/Comentários]] para [[Tópico:Wikilivros:Diálogos comunitários/Sistema de avaliação e premiação/resposta (30)]]: reverter                                        |     5421 | Raylton P. Sousa       | 20101204122718 |              1 |           0 |     628 |        199980 | mx1lm8l5l6j3ny63w0vv77bj9yi355y | NULL              | NULL               |
| 215618 |    37432 |      185499 | moveu [[Tópico:Usuário Discussão:Heldergeovane/AWB]] para [[Tópico:Usuário Discussão:Helder.wiki/AWB]]: conta renomeada                                                                                  |     2780 | He7d3r                 | 20110321130600 |              1 |           0 |     157 |        192013 | i17h79nphkibq6x5hrgx5s9h4hvy5yi | NULL              | NULL               |
| 215731 |    40661 |      208845 | moveu [[Tópico:Wikilivros:LiquidThreads/abcd]] para [[Tópico:Wikilivros:LiquidThreads/abcde]]: test                                                                                                          |     6436 | He7d3r.bot             | 20110322005355 |              1 |           0 |      14 |        215729 | s3pkel2yev60ozuzan1jlwnzl21833a | NULL              | NULL               |
| 222764 |    40414 |      205702 | moveu [[Wikilivros Discussão:Diálogos comunitários/Apresente-se]] para [[Tópico:Wikilivros:Diálogos comunitários/Apresente-se/Oi (2)]]                                                                   |     2780 | He7d3r                 | 20110701140553 |              1 |           0 |      97 |        212584 | l4qh91kpxtur8k5qmt9qe9p3gdszndv | NULL              | NULL               |
| 316171 |    35691 |      303954 | Redirecionando uma publicação com LiquidThreads redirecionada para a publicação com Flow correspondente                                                                                                    |    31660 | Flow talk page manager | 20151103070141 |              0 |           0 |      36 |        182854 | pujd5z6x87iuy88o7aakjk93qptvdpp | NULL              | NULL               |
| 316172 |    35691 |      303955 | Redirecionando uma publicação com LiquidThreads redirecionada para a publicação com Flow correspondente                                                                                                    |    31660 | Flow talk page manager | 20151103070141 |              0 |           0 |      63 |        316171 | dhpfqrxv62sviqe4h6ls3q92irhnryz | NULL              | NULL               |
| 316170 |    68295 |      303953 | /* This page has been converted into a Flow discussion board */                                                                                                                                                |    31660 | Flow talk page manager | 20151103070141 |              0 |           0 |       1 |             0 | sog2plmaegseh8no5k2vw9jsnq7f3kn | NULL              | NULL               |
| 337138 |    40583 |      321011 | Redirecionando uma publicação com LiquidThreads redirecionada para a publicação com Flow correspondente                                                                                                    |    31660 | Flow talk page manager | 20151103141305 |              0 |           0 |      36 |        312811 | nasszl1jbjm8gxwfx8tpbltgsglylaj | NULL              | NULL               |
| 337139 |    40608 |      321012 | Redirecionando uma publicação com LiquidThreads redirecionada para a publicação com Flow correspondente                                                                                                    |    31660 | Flow talk page manager | 20151103141305 |              0 |           0 |      36 |        312814 | 7zr84hq8v7rx2o30gvir7obz9c7y9hu | NULL              | NULL               |
+--------+----------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------------------------+----------------+----------------+-------------+---------+---------------+---------------------------------+-------------------+--------------------+
14 rows in set (0.00 sec)

It looks like all of these revisions are moves...

[1]:

// turn binary ID from MySQL exception message into UUID object
$hex = '\x054GE\xFF\xD2\x1F\xE9\x93\xE2B';
$binary = preg_replace_callback( '/\\\\x([a-f0-9]{2})/i', function( $match ) {
    return hex2bin( $match[1] );
}, $hex );
$uuid = Flow\Model\UUID::create( $binary );
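From there, fetching the timestamp mentioned above should be a one-liner (assuming the getTimestamp() accessor on Flow\Model\UUID; check the class if the method name differs):

// Print the MediaWiki-format timestamp encoded in the UID.
echo $uuid->getTimestamp() . "\n";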
Catrope added a subscriber: Catrope.

In T117786#1874896 I found collisions when generating a lot of UUIDs on terbium (using FlowPopulateRefId.php, which simply calls UUID::create()). These collisions were all very similar, which may be of interest.

Ah, so it's likely not a case of double-insertion, but actual duplicate UUIDs.
Judging from Matt's earlier script (which I also ran), duplicate UUID generation doesn't happen (or is much less likely to happen) on a local machine than on terbium, where it has now happened a couple of times.

> Ah, so it's likely not a case of double-insertion, but actual duplicate UUIDs.
> Judging from Matt's earlier script (which I also ran), duplicate UUID generation doesn't happen (or is much less likely to happen) on a local machine than on terbium, where it has now happened a couple of times.

I did run it on terbium, though (without duplicates).

I tried some simple tests on terbium, like running echo \Flow\Model\UUID::create(); in a loop 2000 times, and I didn't get any duplicates. I did see a lot of the same patterns I saw before, but every ID was in fact different.
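For reference, that test amounts to roughly the following loop (run e.g. from maintenance/eval.php; the exact invocation is an assumption):

// Generate and print 2000 UUIDs; any repeated line would indicate a collision.
for ( $i = 0; $i < 2000; $i++ ) {
	echo \Flow\Model\UUID::create() . "\n";
}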

Mattflaschen-WMF added a comment. Edited Dec 14 2015, 9:39 PM

> I tried some simple tests on terbium, like running echo \Flow\Model\UUID::create(); in a loop 2000 times, and I didn't get any duplicates. I did see a lot of the same patterns I saw before, but every ID was in fact different.

It's normal to have some components the same between certain UUIDs (e.g. node ID, parts of the timestamp). However, if the node ID were the same between machines I think that would cause problems.