
Webcite date conversion incorrect
Closed, Resolved, Public

Description

Example diff:

https://en.wikipedia.org/w/index.php?title=1994_Winter_Olympics&type=revision&diff=773210110&oldid=773210033

For Webcite URL:

http://www.webcitation.org/5uvzM3yHO

IABot changed the date from December 13 to December 12, but the correct date is the 13th. I suspect the reason is that IABot uses local time when it does the Unix time conversion. WebCite IDs are keyed to GMT, so the conversion should use GMT as well; the PHP date library should have an option for that.
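To illustrate the suspected failure mode, here is a minimal Python sketch that decodes the WebCite ID above. It is not IABot's decoder. It assumes the ID is a base 62 number over the alphabet 0-9, A-Z, a-z and that the number is a count of microseconds since the Unix epoch; the function name `decode_webcite_id` and the UTC-5 offset are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed alphabet: digits, then uppercase, then lowercase letters.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def decode_webcite_id(short_id: str) -> datetime:
    """Decode a WebCite short ID into a timezone-aware UTC datetime."""
    value = 0
    for ch in short_id:
        value = value * 62 + ALPHABET.index(ch)
    # Assumption: the decoded number is microseconds since the Unix epoch.
    seconds, micros = divmod(value, 1_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(microseconds=micros)


snapshot = decode_webcite_id("5uvzM3yHO")
print(snapshot.date())  # 2010-12-13 (UTC)

# The same instant viewed from a hypothetical UTC-5 local zone falls on the previous day.
local = snapshot.astimezone(timezone(timedelta(hours=-5)))
print(local.date())  # 2010-12-12
```

Under these assumptions the snapshot was taken at 01:46 UTC, so any conversion done in a timezone west of GMT by two hours or more would report December 12.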

Event Timeline


IABot is set to UTC, so that's not the issue.

https://github.com/cyberpower678/Cyberbot_II/blob/master/IABot/deadlink.php#L19

I'll have to look, but definitely not an urgent issue.

It's very possible. I designed the base 62 decoder myself. I'll have to take a look.

The base 62 decoder is operating correctly.

Ah, they're bad snapshot times. Before IABot began validating archive URLs, it used to blindly trust the archive date given for snapshots in the cite template. Maybe it was a typo. However, these are all now ingrained in the DB until the snapshot gets reset.

If this occurs again, it can be fixed with the IABot Management Interface.

The problem is actually large, involving thousands of articles, too many to fix via the Management Interface. I believe another bot a few years ago was using local time instead of GMT, so it populated thousands of articles with wrong archivedate data. This bad data then got imported into the IABot database.

Can I suggest a couple of solutions? 1. IABot checks the archivedate against the base62 code before making any changes to the article. 2. Run a script on the IABot database for all WebCite links to make sure the archivedate matches the base62 result.
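The check proposed in option 2 could look roughly like the Python sketch below. It is not the actual maintenance script. It assumes the same base 62 alphabet (0-9, A-Z, a-z) and microsecond epoch as WebCite's short IDs appear to use, and it only handles URLs whose last path segment is the ID. The row layout and the names `webcite_utc_date` and `find_mismatches` are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

# Assumed alphabet: digits, then uppercase, then lowercase letters.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def webcite_utc_date(url: str) -> date:
    """Return the UTC snapshot date encoded in a WebCite short URL."""
    short_id = url.rstrip("/").rsplit("/", 1)[-1]
    value = 0
    for ch in short_id:
        value = value * 62 + ALPHABET.index(ch)
    # Assumption: the decoded number is microseconds since the Unix epoch.
    return (EPOCH + timedelta(microseconds=value)).date()


def find_mismatches(rows):
    """Yield (url, stored, decoded) for rows whose stored date disagrees with the ID."""
    for url, stored in rows:
        decoded = webcite_utc_date(url)
        if stored != decoded:
            yield url, stored, decoded


# Hypothetical rows: (archive URL, archivedate currently stored in the database).
rows = [
    ("http://www.webcitation.org/5uvzM3yHO", date(2010, 12, 12)),  # off by one day
    ("http://www.webcitation.org/5uvzM3yHO", date(2010, 12, 13)),  # matches the ID
]
mismatches = list(find_mismatches(rows))
for url, stored, decoded in mismatches:
    print(f"{url}: stored {stored}, decoded {decoded}")
```

Only the first row is reported, since its stored date is a day earlier than the date encoded in the ID.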

Option 1 cannot be done, as the date comes directly from the DB when IABot requests an archive URL. That being said, option 2 can be done. Option 1 is moot anyway, since the newest code force-validates all archives and doesn't blindly trust what's written.

Working on the maintenance script now.

I have the script ready to go for the DB cleanup. I tested it extensively to ensure it works correctly. To move forward with it, I'm waiting for https://github.com/wikimedia/DeadlinkChecker/pull/16 to be merged.

With the code merged, I'm now going to start the script.