Some revisions on Chinese Wikisource have timestamps from January 1970 and January 2001
Open, Lowest, Public

Description

About 11 000 revisions on the Chinese Wikisource have timestamps from January 1970 (which seems like a Unix epoch bug) and January 2001. Rev 254184 and rev 187693 are representative. The rev_ids of these revisions all seem to be six digits, suggesting that they were actually made in 2008 or later.
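
For reference, `rev_timestamp` values are 14-digit `YYYYMMDDHHMMSS` strings in UTC, so a Unix time of 0 would show up as a January 1970 value. A minimal Python sketch of that conversion follows; the helper name `to_mw_timestamp` is hypothetical, and it only illustrates the suspected epoch default, not a confirmed cause:

```python
from datetime import datetime, timezone


def to_mw_timestamp(unix_seconds: int) -> str:
    """Format Unix seconds as a 14-digit YYYYMMDDHHMMSS string in UTC."""
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return moment.strftime("%Y%m%d%H%M%S")


# A zero (unset) Unix time lands on the first second of January 1970.
print(to_mw_timestamp(0))  # 19700101000000
```

Because the strings are fixed-width, plain string comparison orders them chronologically, which is why a `rev_timestamp < "..."` filter works.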

As best I can tell, the wiki was created around November 2003. Chinese Wikisource seems to be the only wiki with this kind of weird timestamping in the revision table.

SELECT COUNT(*)
FROM zhwikisource.revision
WHERE rev_timestamp < "20030711085956";

+----------+
| COUNT(*) |
+----------+
|    11277 |
+----------+
1 row in set (0.00 sec)

SELECT LEFT(rev_timestamp, 6) AS month, rev_user_text, rev_user, COUNT(*)
FROM zhwikisource.revision
WHERE rev_timestamp < "20030711085956"
GROUP BY month, rev_user_text;

+--------+---------------+----------+----------+
| month  | rev_user_text | rev_user | COUNT(*) |
+--------+---------------+----------+----------+
| 197001 | Liangent      |     3087 |        1 |
| 197001 | Wmr-bot       |     4959 |      547 |
| 197001 | Wmrwiki       |     2632 |       30 |
| 199911 | Wmrwiki       |     2632 |        1 |
| 199912 | Wmrwiki       |     2632 |        2 |
| 200101 |               |        0 |     9456 |
| 200101 | Wmr-bot       |     4959 |     1239 |
| 200101 | Wmrwiki       |     2632 |        1 |
+--------+---------------+----------+----------+
8 rows in set (0.37 sec)

Event Timeline

nshahquinn-wmf raised the priority of this task to Lowest.
nshahquinn-wmf updated the task description. (Show Details)
nshahquinn-wmf added a subscriber: nshahquinn-wmf.
Reedy added a subscriber: Reedy. Jan 11 2016, 11:25 PM

They're not imported revisions, are they? Well, the ones that aren't Unix epoch, that is.

@Neil_P._Quinn_WMF: Could you answer the last question?

Andre, thanks for the reminder.

@Reedy, good point. It looks like almost all of these are imported revisions, so it was probably misuse of the import tool. I imagine there's no good way to fix it, then?

SELECT COUNT(*)
FROM revision
WHERE rev_timestamp < "2003";
---
COUNT(*)
11277

SELECT COUNT(*)
FROM revision 
INNER JOIN page
ON rev_page = page_id
INNER JOIN logging
ON log_title = page_title
WHERE 
	log_type = "import" AND
	rev_timestamp < "2003";
---
COUNT(*)
11156
Reedy added a comment. Jan 21 2016, 9:15 PM

If you can find out what they're supposed to be, and then provide revision_id/timestamp pairs, we could import and update them.

The difficult part will be working out what they are supposed to be, I guess.
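
A rough Python sketch of what applying such pairs could look like. This is only an illustration against a toy in-memory SQLite table, not the real `revision` table or an actual maintenance script; the function name `apply_timestamp_fixes`, the rev_ids, and the corrected timestamp are all made up:

```python
import sqlite3


def apply_timestamp_fixes(conn, fixes):
    """Set rev_timestamp for each (rev_id, new_timestamp) pair; return rows changed."""
    changed = 0
    with conn:  # commits on success, rolls back if any pair is rejected
        for rev_id, new_ts in fixes:
            # Only accept 14-digit YYYYMMDDHHMMSS strings.
            if len(new_ts) != 14 or not new_ts.isdigit():
                raise ValueError(f"bad timestamp for rev {rev_id}: {new_ts!r}")
            cur = conn.execute(
                "UPDATE revision SET rev_timestamp = ? WHERE rev_id = ?",
                (new_ts, rev_id),
            )
            changed += cur.rowcount
    return changed


# Toy table with hypothetical rows standing in for the affected revisions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_timestamp TEXT)")
conn.executemany(
    "INSERT INTO revision VALUES (?, ?)",
    [(1, "19700101000000"), (2, "20010101000000")],
)
conn.commit()

# Hypothetical corrected value for rev 1; rev 2 is left untouched.
changed = apply_timestamp_fixes(conn, [(1, "20080315120000")])
```

Running all pairs inside one transaction means a malformed pair leaves the table unchanged rather than half-updated.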

Ah, interesting. I'll keep this on my to-do list, then, and if I ever get some time, I'll see if I can figure it out.

Wmrwiki added a subscriber: Wmrwiki. Edited Apr 23 2016, 2:04 PM

I think it was because I did not include the timestamp information when I was importing the text. The import function is meant for importing text from other wikis, but I used it to add text using manually created XML files. I sincerely apologize to every Wikisource user and to the Wikimedia Foundation for the mess I've made and the trouble it has caused.

Perhaps one way to deal with them is to delete these revisions? These pages were created when I (both wmrwiki and JerryofWong) imported them.
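
A hand-written XML file could be checked for this before uploading. The Python sketch below assumes the usual export layout of `page`, `title`, `revision`, and `timestamp` elements and ignores XML namespaces; the function name and the sample document are hypothetical:

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def pages_missing_timestamps(xml_text):
    """Return titles of pages with at least one revision lacking a timestamp."""
    root = ET.fromstring(xml_text)
    flagged = []
    for page in root.iter():
        if local_name(page.tag) != "page":
            continue
        title = next((c.text or "" for c in page if local_name(c.tag) == "title"), "")
        for rev in page:
            if local_name(rev.tag) != "revision":
                continue
            ts = next((c.text for c in rev if local_name(c.tag) == "timestamp"), None)
            if not ts or not ts.strip():
                flagged.append(title)
                break  # one bad revision is enough to flag the page
    return flagged


# Made-up sample: the second page's revision has no <timestamp> element.
SAMPLE = """<mediawiki>
  <page>
    <title>With timestamp</title>
    <revision><timestamp>2008-03-15T12:00:00Z</timestamp><text>a</text></revision>
  </page>
  <page>
    <title>Without timestamp</title>
    <revision><text>b</text></revision>
  </page>
</mediawiki>"""

print(pages_missing_timestamps(SAMPLE))  # ['Without timestamp']
```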

@Wmrwiki, don't worry too much about it! It hasn't been a problem for me; I just happened to notice the dates when I was doing some analysis and was curious to see what had caused them :)

Did you do all the imports at roughly the same time? If so, Reedy says it would be pretty easy to update the revisions to have that timestamp.

I think they were imported over several months, if not years.

@Shizhao: Why did you add Wikisource (looking at the project desc) and MediaWiki-Export-or-Import (looking at T123313#2232273) here?

Having revisions older than the wiki is expected. The bug is having fake revision timestamps.

Nemo_bis renamed this task from Some revisions on Chinese Wikisource have timestamps from before the wiki was created to Some revisions on Chinese Wikisource have timestamps from January 1970 and January 2001. Dec 27 2016, 7:30 PM
Shizhao added a comment. Edited Dec 28 2016, 6:56 AM

This problem arose because a user imported content with wrong timestamps into Wikisource. (I remember that...)

yes! just T123313#2232273

That did not answer my question, hence removing random tags. Please do not add tags before reading and understanding the project descriptions. Thanks!

The second tag that you removed is not really random: this affects zhwikisource, which is also one of the zh.* projects. Maybe you should remove Wikisource instead, because that one was just randomly added.

Shizhao added a comment. Edited Dec 29 2016, 2:15 PM

Solving this problem requires addressing two aspects:

  1. How to fix the existing timestamp errors.
  2. How to prevent the same problem in the future.

For 2, I think we should start with Export and Import:

  • Suggest and encourage wiki sites to use "Import from another wiki" instead of "Upload XML data", and make sure that the timestamps of imported content can't be modified.
  • Add a new verification method: when importing content, compare the timestamp of the imported content with the timestamp of the first edit (such as rev 1) on the source wiki.
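
The verification idea in the second bullet might be sketched in Python as follows. This is only an illustration of the proposal, not existing import behavior; the function names, the reason strings, and the source wiki's first-edit date are assumptions:

```python
from datetime import datetime, timezone


def parse_iso(ts):
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' timestamp into an aware UTC datetime."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def check_import(revision_timestamps, source_first_edit, now=None):
    """Return (timestamp, reason) pairs for revisions that fail the check."""
    now = now or datetime.now(timezone.utc)
    first = parse_iso(source_first_edit)
    problems = []
    for ts in revision_timestamps:
        try:
            when = parse_iso(ts)
        except (TypeError, ValueError):
            # None or an unparseable string: the timestamp was never supplied properly.
            problems.append((ts, "missing or malformed"))
            continue
        if when < first:
            problems.append((ts, "before first edit of source wiki"))
        elif when > now:
            problems.append((ts, "in the future"))
    return problems


# Placeholder first-edit date for the source wiki, for illustration only.
problems = check_import(
    ["1970-01-01T00:00:00Z", "2008-03-15T12:00:00Z", None],
    source_first_edit="2003-11-01T00:00:00Z",
)
```

Here the 1970 value and the missing timestamp are reported, while the 2008 value passes.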

@Aklapper: so I think this belongs under MediaWiki-Export-or-Import.

Liuxinyu970226 added a subscriber: MF-Warburg. Edited Dec 29 2016, 2:26 PM

@Shizhao:

For 2, I think we should start with Export and Import:

  • Suggest and encourage wiki sites to use "Import from another wiki" instead of "Upload XML data", and make sure that the timestamps of imported content can't be modified.

Strongly oppose; it would make the work of the New Wiki Importers (e.g. @MF-Warburg) harder and harder.

  • Add a new verification method: when importing content, compare the timestamp of the imported content with the timestamp of the first edit (such as rev 1) on the source wiki.

Feasibility investigation needed.

@Aklapper: so I think this belongs under MediaWiki-Export-or-Import.

So I rather doubt the original meaning of this component:

Issues relating to Special:Export and Special:Import.
This project is part of the core MediaWiki software itself.

IMO such tags, which "just track some random Special pages, no code projects happen", should be merged with MediaWiki-Special-pages.

Solving this problem requires addressing two aspects:

  1. How to fix the existing timestamp errors.

Probably some maintenance script is required.

  2. How to prevent the same problem in the future.

The solution is what https://meta.wikimedia.org/wiki/Importer says: avoiding importupload as much as possible. There is almost never a good reason for importupload. I don't know who gave out so many import flags, but you could ask the stewards to remove them all. As long as new imports keep being done in the same (incorrect) way, there is little gain in working on a fix for the data.

TTO added a subscriber: TTO. Dec 30 2016, 12:22 PM
Liuxinyu970226 added a comment. Edited Jan 3 2017, 4:58 AM

I don't know who gave out so many import flags, but you could ask the stewards to remove them all. As long as new imports keep being done in the same (incorrect) way, there is little gain in working on a fix for the data.

Jusjih? (unfortunately the steward right of this user was removed in Dec 2015)

see https://zh.wikisource.org/wiki/Wikisource:%E5%B0%8E%E5%85%A5%E8%80%85 (Requests for import), all permissions are community consensus.

Anyway I sent e-mail to @jusjih about this issue, but still no reply

see https://zh.wikisource.org/wiki/Wikisource:%E5%B0%8E%E5%85%A5%E8%80%85 (Requests for import), all permissions are community consensus.

The community should either

  • grant such permissions for a shorter time; or
  • operate under a strict policy to ensure that the permission is given only for specific, absolutely needed and limited imports, not for operations which disrupt edit histories; or
  • acknowledge that it's more important to preserve edit histories than to do whatever they're doing with importupload, and therefore give up all such local rights.