
Recover history loss on zh.wikipedia.org as result of several page renames (page move)
Open, LowPublic

Details

Reference
bz37591

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 12:23 AM
bzimport set Reference to bz37591.
bzimport added a subscriber: Unknown Object (MLST).
Shizhao created this task. Jun 14 2012, 1:59 PM
Petrb added a comment. Jun 14 2012, 2:02 PM

This bug may need to be solved at the database level (which requires shell access).

Here are some checks on Toolserver (which currently has a large replication lag):

$ sql zhwiki_p
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 88023986
Server version: 5.1.53 Source distribution

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select * from page where page_title='特色条目候选' and page_namespace=4;
+---------+----------------+--------------------+-------------------+--------------+------------------+-------------+----------------+----------------+-------------+----------+
| page_id | page_namespace | page_title         | page_restrictions | page_counter | page_is_redirect | page_is_new | page_random    | page_touched   | page_latest | page_len |
+---------+----------------+--------------------+-------------------+--------------+------------------+-------------+----------------+----------------+-------------+----------+
|   46728 |              4 | 特色条目候选       |                   |            0 |                0 |           0 | 0.464639958211 | 20120606104339 |    21416960 |     3421 |
+---------+----------------+--------------------+-------------------+--------------+------------------+-------------+----------------+----------------+-------------+----------+
1 row in set (0.01 sec)

mysql> select * from revision where rev_page=46728 order by rev_id desc limit 10;
+----------+----------+-------------+-----------------------------------------------------------+----------+---------------+----------------+----------------+-------------+---------+---------------+---------------------------------+
| rev_id   | rev_page | rev_text_id | rev_comment                                               | rev_user | rev_user_text | rev_timestamp  | rev_minor_edit | rev_deleted | rev_len | rev_parent_id | rev_sha1                        |
+----------+----------+-------------+-----------------------------------------------------------+----------+---------------+----------------+----------------+-------------+---------+---------------+---------------------------------+
| 21416960 |    46728 |    21613695 | consistency                                               |   775182 | Cravix        | 20120531141052 |              1 |           0 |    3421 |      21345461 | 0cu0xuhfnyfcgsfwowsww8qg7byyg6m |
| 21345461 |    46728 |    21539076 | /* 提名區 */                                              |   642079 | Lixihan       | 20120525092802 |              0 |           0 |    3363 |      21269220 | 04oty0im3wcbdarcd804v3ziwci7hbf |
| 21269220 |    46728 |    21458630 |                                                           |   522281 | Waihorace     | 20120519061839 |              0 |           0 |    3324 |      21269208 | tlo1y3elpf11bw27y911rcvp9bwdp0v |
| 21269208 |    46728 |    21458618 |                                                           |   522281 | Waihorace     | 20120519061755 |              0 |           0 |    3142 |      20815355 | 3s3656gdzau9xach8ppqbd7bs4qz7kd |
| 20815355 |    46728 |    20986838 |                                                           |  1047411 | 风雨同舟      | 20120419084310 |              0 |           0 |    3141 |      20813643 | c7gul0l1nnqnwlt6u3nx2j9zktix8ro |
| 20813643 |    46728 |    20984884 |                                                           |   441687 | Ai6z83xl3g    | 20120419054402 |              0 |           0 |    3184 |      19822979 | 6c9ow3fbiwbx62tdbmpo9nibbdw9iap |
| 19822979 |    46728 |    19976799 |                                                           |   441687 | Ai6z83xl3g    | 20120405051116 |              0 |           0 |    3233 |      19807560 | pn0lkty2amgxmwu6pavsu5jw9keol65 |
| 19807560 |    46728 |    19960395 | {{Wikipedia:特色條目候選/苏州市}}:進行評選                |   959311 | Mouse20080706 | 20120403231200 |              0 |           0 |    3184 |      19804982 | 6c9ow3fbiwbx62tdbmpo9nibbdw9iap |
| 19804982 |    46728 |    19957728 |                                                           |  1106366 | Huhaoyu321    | 20120403175655 |              0 |           0 |    3142 |      19804894 | 201a73ud6onvuoqo080az0bqwznguqn |
| 19804894 |    46728 |    19957638 |                                                           |  1106366 | Huhaoyu321    | 20120403175211 |              0 |           0 |    3184 |      18126033 | 6c9ow3fbiwbx62tdbmpo9nibbdw9iap |
+----------+----------+-------------+-----------------------------------------------------------+----------+---------------+----------------+----------------+-------------+---------+---------------+---------------------------------+
10 rows in set (0.02 sec)

mysql> Bye

$ curl "http://zh.wikipedia.org/w/index.php?oldid=19804894&action=render&uselang=en"
<p>The database did not find the text of a page that it should have found, named "Wikipedia:首页" (revision#: 19804894).</p>
<p>This is usually caused by following an outdated diff or history link to a page that has been deleted.</p>
<p>If this is not the case, you may have found a bug in the software. Please report this to an <a href="//zh.wikipedia.org/wiki/Special:%E7%94%A8%E6%88%B7%E5%88%97%E8%A1%A8/sysop" title="Special:用户列表/sysop">administrator</a>, making note of the URL.</p>

<!--
NewPP limit report
Preprocessor node count: 1/1000000
Post-expand include size: 0/2048000 bytes
Template argument size: 0/2048000 bytes
Highest expansion depth: 1/40
Expensive parser function count: 0/500
-->

Someone should check where these rev_ids and rev_sha1s went, either on the WMF cluster or once Toolserver replication catches up.

Petrb added a comment. Jun 14 2012, 2:18 PM

What is the name of the original page?

(In reply to comment #4)

What is the name of the original page?

See my SQL above.

These are just some notes.

mysql> select * from page where page_title='特色条目评选' and page_namespace=4;
+---------+----------------+--------------------+-------------------+--------------+------------------+-------------+----------------+----------------+-------------+----------+
| page_id | page_namespace | page_title         | page_restrictions | page_counter | page_is_redirect | page_is_new | page_random    | page_touched   | page_latest | page_len |
+---------+----------------+--------------------+-------------------+--------------+------------------+-------------+----------------+----------------+-------------+----------+
|  291154 |              4 | 特色条目评选       |                   |            0 |                1 |           0 | 0.429395508603 | 20120531141052 |     2353742 |       42 |
+---------+----------------+--------------------+-------------------+--------------+------------------+-------------+----------------+----------------+-------------+----------+
1 row in set (0.02 sec)

mysql> select * from revision where rev_page=291154 limit 10;
+---------+----------+-------------+------------------------------------------------------------------------------+----------+-----------------+----------------+----------------+-------------+---------+---------------+---------------------------------+
| rev_id  | rev_page | rev_text_id | rev_comment                                                                  | rev_user | rev_user_text   | rev_timestamp  | rev_minor_edit | rev_deleted | rev_len | rev_parent_id | rev_sha1                        |
+---------+----------+-------------+------------------------------------------------------------------------------+----------+-----------------+----------------+----------------+-------------+---------+---------------+---------------------------------+
| 1977486 |   291154 |     1955773 | [[Wikipedia:特色条目评选]]已移动到[[Wikipedia:特色条目候选]]                 |    23701 | 百家姓之四      | 20060523035902 |              0 |           0 |      42 |             0 | ek74pn6yc6sdzdk4q0kmxmbhnoa8g02 |
| 2353701 |   291154 |     2317697 | 重定向到[[Wikipedia:特色條目候選]]                                           |    23701 | 百家姓之四      | 20060802021731 |              1 |           0 |      42 |       1977486 | sk2aazj3a537cmwxq54jox860brntz7 |
| 2353742 |   291154 |     2317736 | 重定向到[[Wikipedia:特色条目候选]]                                           |    23701 | 百家姓之四      | 20060802022403 |              1 |           0 |      42 |       2353701 | ek74pn6yc6sdzdk4q0kmxmbhnoa8g02 |
+---------+----------+-------------+------------------------------------------------------------------------------+----------+-----------------+----------------+----------------+-------------+---------+---------------+---------------------------------+
3 rows in set (0.04 sec)

mysql>

(In reply to comment #4)

What is the name of the original page?

Original page is Wikipedia:特色条目候选

shizhao, can you describe your exact operating procedure?

Petrb added a comment. Jun 14 2012, 2:45 PM

When did this problem happen? After the last move at 12:56, 14 June 2012, or on some other date? Was it after this move:

12:56, 14 June 2012 Shizhao moved page Wikipedia:特色条目候选 to Wikipedia:特色条目评选 (开始改版)

I can't understand why there is a page that redirects to itself.

(In reply to comment #8)

shizhao, can you describe your exact operating procedure?

  1. Move "wikipedia:特色条目候选" -> "wikipedia:特色条目评选".
  2. Moving..... (this runs for a long time)
  3. A server error page appears: "We have a technical problem.....blablabla"
  4. Wait a few seconds.
  5. Close the tab.
  6. Open the page "wikipedia:特色条目评选"; it is now just a redirect.
  7. Restore history revision.
  8. End.

PS, step 4 in more detail:

  4. Wait a few seconds.
  4.1. Refresh the page.
  4.2. Wait a few more seconds; the page does not respond.
  5. Close the tab.

Petrb added a comment. Jun 14 2012, 2:52 PM

This is indeed a bug. The move wasn't done through the MediaWiki interface but with some tool; that user made many moves in the very same minute (more than 50). The first one was this page and, as you can see, every move consists of two log actions:

  1. a minor edit on the target page (the title the page is moved to)
  2. creation of a new page at the original title (the redirect string)

In this case only the second occurred. So I guess that if you delete the page (at the original name), you should be able to restore the previous version, which contains the full history. In fact, the page was never moved, only overwritten with a redirect.

Petrb added a comment. Jun 14 2012, 2:55 PM

It's even more complicated: the page was moved using the interface, but so many subpages were moved along with it that MediaWiki probably hit the execution timeout and left the move unfinished.
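
To illustrate the failure mode suspected above, here is a minimal sketch (not MediaWiki's actual move code) of a two-step move, rename plus redirect creation, wrapped in a single database transaction. The toy schema, the `move_page` and `history` helpers, and `SimulatedTimeout` are all hypothetical, and whether the real incident involved writes outside one transaction is not established in this thread.

```python
import sqlite3


class SimulatedTimeout(Exception):
    """Stands in for a request being killed partway through a move."""


def make_db():
    # Toy schema: one page titled 'Old' with three revisions (10, 11, 12).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT UNIQUE)")
    conn.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER)")
    conn.execute("INSERT INTO page VALUES (1, 'Old')")
    conn.executemany("INSERT INTO revision VALUES (?, 1)", [(10,), (11,), (12,)])
    conn.commit()
    return conn


def move_page(conn, old, new, fail_midway=False):
    """Rename `old` to `new` and leave a redirect page at `old`, atomically."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE page SET page_title = ? WHERE page_title = ?", (new, old))
        if fail_midway:
            raise SimulatedTimeout("request killed between the two steps")
        cur = conn.execute("INSERT INTO page (page_title) VALUES (?)", (old,))
        # The redirect page gets a single new revision of its own.
        conn.execute("INSERT INTO revision (rev_page) VALUES (?)", (cur.lastrowid,))


def history(conn, title):
    """Return the rev_ids currently attached to the page with this title."""
    rows = conn.execute(
        "SELECT rev_id FROM revision JOIN page ON rev_page = page_id "
        "WHERE page_title = ? ORDER BY rev_id",
        (title,),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = make_db()
    try:
        move_page(conn, "Old", "New", fail_midway=True)
    except SimulatedTimeout:
        pass
    print(history(conn, "Old"))  # history is still attached to the old title
    move_page(conn, "Old", "New")
    print(history(conn, "New"))  # history followed the rename
```

With the transaction in place, an interrupted move leaves the old title and its history untouched instead of leaving only a redirect behind.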

(In reply to comment #13)

It's even more complicated: the page was moved using the interface, but so many subpages were moved along with it that MediaWiki probably hit the execution timeout and left the move unfinished.

maybe

Petrb added a comment. Jun 14 2012, 3:22 PM

The last revision on that page was 21416960, text 21613695.

Selecting this data from the database would help you recover the content of the latest revision.

I have temporarily rewritten the page "wikipedia:特色条目评选". The history revisions have not been restored.

(In reply to comment #15)

The last revision on that page was 21416960, text 21613695.
Selecting this data from the database would help you recover the content of the latest revision.

Can't get page content (text 21613695) from toolserver database.

(In reply to comment #10)

(In reply to comment #8)

shizhao, can you describe your exact operating procedure?

  1. Move "wikipedia:特色条目候选" -> "wikipedia:特色条目评选".
  2. Moving..... (this runs for a long time)
  3. A server error page appears: "We have a technical problem.....blablabla"
  4. Wait a few seconds.
  5. Close the tab.
  6. Open the page "wikipedia:特色条目评选"; it is now just a redirect.
  7. Restore history revision.
  8. End.

Let's expand it in detail:

0. There was a page Wikipedia:特色条目评选 which was a redirect to Wikipedia:特色条目候选
1a. You try to move Wikipedia:特色条目候选 to Wikipedia:特色条目评选
1b. The system gives you a warning saying that Wikipedia:特色条目评选 already exists and provides you a checkbox to delete it before moving
1c. You tick the checkbox and submit the form again.

Is this correct?

  6. When you check Wikipedia:特色条目评选, it's a redirect to Wikipedia:特色条目候选 (the same as this page before moving? did you check page history and what did it say?)
  7. Which title did you do this operation on? Does "restore history revision" (only one? what's it?) mean undeleting an archive (=deleted in usual words) revision? If yes, are there any revisions in action=history before undeletion?

(In reply to comment #18)

(In reply to comment #10)

(In reply to comment #8)

shizhao, can you describe your exact operating procedure?

  1. Move "wikipedia:特色条目候选" -> "wikipedia:特色条目评选".
  2. Moving..... (this runs for a long time)
  3. A server error page appears: "We have a technical problem.....blablabla"
  4. Wait a few seconds.
  5. Close the tab.
  6. Open the page "wikipedia:特色条目评选"; it is now just a redirect.
  7. Restore history revision.
  8. End.

Let's expand it in detail:
0. There was a page Wikipedia:特色条目评选 which was a redirect to Wikipedia:特色条目候选
1a. You try to move Wikipedia:特色条目候选 to Wikipedia:特色条目评选
1b. The system gives you a warning saying that Wikipedia:特色条目评选 already exists and provides you a checkbox to delete it before moving
1c. You tick the checkbox and submit the form again.
Is this correct?

Yes, that is what I did.

  6. When you check Wikipedia:特色条目评选, it's a redirect to Wikipedia:特色条目候选 (the same as this page before moving? did you check page history and what did it say?)

When I checked Wikipedia:特色条目评选, the page did not exist.

  7. Which title did you do this operation on? Does "restore history revision" (only one? what's it?) mean undeleting an archive (=deleted in usual words) revision? If yes, are there any revisions in action=history before undeletion?

I restored 3 history revisions.

Priority +1: no one seems to care about this. I have added a few people with shell access to the CC list.

Peter Bena, please review the following document:
https://www.mediawiki.org/wiki/Bug_management/Bugzilla_usage

A bug with a 'high' priority means 'should be fixed within the next month'.

You can check the guidelines for highest-priority bugs at the following URL:
https://www.mediawiki.org/wiki/Bug_management/How_to_triage

You'll note there is a notification procedure to use for the highest/critical combination.

Also consider that a highest-priority bug will slow down general development and ops tasks until it is fixed, and so is the equivalent of a "red alert".

(In reply to comment #2)

mysql> select * from revision where rev_page=46728 order by rev_id desc limit 10;
+----------+----------+-------------+-----------------------------------------------------------+----------+---------------+----------------+----------------+-------------+---------+---------------+---------------------------------+
| rev_id   | rev_page | rev_text_id | rev_comment                                               | rev_user | rev_user_text | rev_timestamp  | rev_minor_edit | rev_deleted | rev_len | rev_parent_id | rev_sha1                        |
+----------+----------+-------------+-----------------------------------------------------------+----------+---------------+----------------+----------------+-------------+---------+---------------+---------------------------------+
| 21416960 |    46728 |    21613695 | consistency                                               |   775182 | Cravix        | 20120531141052 |              1 |           0 |    3421 |      21345461 | 0cu0xuhfnyfcgsfwowsww8qg7byyg6m |

It seems the relevant revision/archive rows are missing, while the text row is still there:

mysql> select * from archive where ar_rev_id=21416960;
Empty set (0.07 sec)

mysql> select * from archive where ar_timestamp='20120531141052' /* SLOW_OK */;
Empty set (40.81 sec)

mysql> select * from revision where rev_sha1='0cu0xuhfnyfcgsfwowsww8qg7byyg6m' /* SLOW_OK */;
Empty set (16 min 28.74 sec)

mysql> select * from revision where rev_text_id=21613695 /* SLOW_OK */;
Empty set (43 min 54.86 sec)

mysql> select * from text where old_id=21613695;
+----------+---------------+-----------+-----------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
| old_id   | old_namespace | old_title | old_text              | old_comment | old_user | old_user_text | old_timestamp | old_minor_edit | old_flags           | inverse_timestamp |
+----------+---------------+-----------+-----------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
| 21613695 |             0 |           | DB://cluster23/398889 |             |        0 |               |               |              0 | utf-8,gzip,external |                   |
+----------+---------------+-----------+-----------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
1 row in set (0.03 sec)
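
The manual checks above can be condensed into a single query for text rows that neither a revision nor an archive row points at. The following is a minimal sketch run against a tiny in-memory SQLite stand-in, not the real zhwiki schema. Table and column names follow the queries in this thread, except `ar_text_id`, which is assumed here to be the archive table's counterpart of `rev_text_id`. Only text 21613695 and revision 21345461 (text 21539076) come from the thread; the archive row is made up.

```python
import sqlite3


def make_sample_db():
    """Build a toy database with one referenced, one archived and one orphaned text row."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "text" (old_id INTEGER PRIMARY KEY)')
    conn.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_text_id INTEGER)")
    conn.execute("CREATE TABLE archive (ar_rev_id INTEGER, ar_text_id INTEGER)")
    conn.executemany('INSERT INTO "text" VALUES (?)', [(100,), (21539076,), (21613695,)])
    conn.execute("INSERT INTO revision VALUES (21345461, 21539076)")  # live revision
    conn.execute("INSERT INTO archive VALUES (50, 100)")  # hypothetical deleted revision
    conn.commit()
    return conn


def find_orphaned_text(conn):
    """Return old_id values that no revision row and no archive row references."""
    rows = conn.execute(
        """
        SELECT t.old_id
        FROM "text" AS t
        LEFT JOIN revision AS r ON r.rev_text_id = t.old_id
        LEFT JOIN archive  AS a ON a.ar_text_id  = t.old_id
        WHERE r.rev_text_id IS NULL AND a.ar_text_id IS NULL
        ORDER BY t.old_id
        """
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(find_orphaned_text(make_sample_db()))
```

On a production-sized wiki this kind of anti-join would be expensive; the unindexed `rev_text_id` lookup above already took about 44 minutes on Toolserver, so a real check would likely need to be restricted to a known range of ids.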

Reedy added a comment. May 16 2013, 9:44 PM

Why do we need this bug? There's bug 39008 requesting an import of history, and then bug 39007 for the actual underlying problem

(In reply to comment #23)

Why do we need this bug? There's bug 39008 requesting an import of history, and then bug 39007 for the actual underlying problem

It might be ok to close bug 39007 and use this one to track the underlying issue. But this one is in Wikimedia product and the problem itself may belong to MediaWiki (where bug 39007 is placed).

Meno25 removed a subscriber: Meno25. Feb 8 2016, 7:39 PM
1978Gage2001 moved this task from Triage to In progress on the DBA board. Dec 11 2017, 9:45 AM
Marostegui moved this task from In progress to Triage on the DBA board. Dec 11 2017, 11:07 AM
Marostegui edited projects, added Wikimedia-Rdbms; removed DBA. Jun 10 2018, 9:35 AM
Krinkle renamed this task from History of a page was lost after several move actions on zhwp to History of a page was lost after several page renames (page move) on zh.wikipedia.org. Jul 28 2018, 9:47 PM
Krinkle renamed this task from History of a page was lost after several page renames (page move) on zh.wikipedia.org to Investigate history loss on zh.wikipedia.org as result of several page renames (page move).
Krinkle renamed this task from Investigate history loss on zh.wikipedia.org as result of several page renames (page move) to Recover history loss on zh.wikipedia.org as result of several page renames (page move). Edited Jul 28 2018, 9:52 PM
Krinkle added subscribers: JEumerus, Matanya, Snowolf.
Krinkle added a subscriber: Krinkle.

Ughh... so the page ids are different now than they were at the time of the bug report (I checked this on the production db), the old text entries are indeed still there and are obviously orphaned now.
If we use the XML file, new text entries will be created. This isn't awesome, but at least the missing revisions will be back in there. I have no idea how importDump responds to importing revisions for a page where a page with that title already exists; it will silently skip existing revisions, however, and that's good for us.
If we wanted to try to construct the revision rows with the original text ids, there's a stub dump from around the beginning of June 2012 which would give us the info, and a check of the sql query on T39591 (bug 37591) shows that there was nothing new since then. Is it worth the work?
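
As a rough idea of what reconstructing revision rows from a stub dump could involve, here is a minimal sketch that reads one `<page>` element and produces revision-row dictionaries that keep the original text ids. The XML layout of `SAMPLE` is an assumption modeled on the MediaWiki export format (real dumps carry an XML namespace and more fields), `revision_rows` is a hypothetical helper, and the values are the ones quoted for revision 21416960 earlier in this task. Because the page ids have changed, the current page id is passed in rather than read from the dump.

```python
import xml.etree.ElementTree as ET

# Assumed shape of a stub-dump page entry; values taken from this task.
SAMPLE = """<page>
  <title>Wikipedia:特色条目候选</title>
  <id>46728</id>
  <revision>
    <id>21416960</id>
    <timestamp>2012-05-31T14:10:52Z</timestamp>
    <contributor><username>Cravix</username><id>775182</id></contributor>
    <minor/>
    <comment>consistency</comment>
    <sha1>0cu0xuhfnyfcgsfwowsww8qg7byyg6m</sha1>
    <text id="21613695" bytes="3421" />
  </revision>
</page>"""


def revision_rows(page_xml, current_page_id):
    """Turn the <revision> elements of one stub-dump <page> into row dicts."""
    page = ET.fromstring(page_xml)
    rows = []
    for rev in page.iter("revision"):
        text = rev.find("text")
        timestamp = rev.findtext("timestamp", default="")
        rows.append({
            "rev_id": int(rev.findtext("id")),
            "rev_page": current_page_id,  # page ids differ from 2012, so caller supplies it
            "rev_text_id": int(text.get("id")),  # original text row, not a new copy
            "rev_comment": rev.findtext("comment", default=""),
            "rev_user": int(rev.findtext("contributor/id", default="0")),
            "rev_user_text": rev.findtext("contributor/username", default=""),
            # 2012-05-31T14:10:52Z -> 20120531141052
            "rev_timestamp": "".join(ch for ch in timestamp if ch.isdigit()),
            "rev_minor_edit": int(rev.find("minor") is not None),
            "rev_len": int(text.get("bytes", "0")),
            "rev_sha1": rev.findtext("sha1", default=""),
        })
    return rows


if __name__ == "__main__":
    # 12345 is a placeholder for whatever the page id is in production today.
    for row in revision_rows(SAMPLE, current_page_id=12345):
        print(row)
```

The sketch only builds the rows; it does not fill in `rev_parent_id`, and inserting anything into a production database would still need review by someone with the appropriate access.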

jcrespo removed a project: DBA. Aug 10 2018, 10:58 AM