
InternetArchiveBot removed archive-url and replaced it with (outdated) Internet Archive snapshot
Closed, Invalid (Public)

Description

Bot Version: v1.3.2.1 (old)
Page: https://en.wikipedia.org/wiki/Global_Positioning_System
Diff: https://en.wikipedia.org/w/index.php?title=Global_Positioning_System&type=revision&diff=781479255&oldid=781013068

Original Text:

As of January 2017, GPS time is 18 seconds ahead of UTC because of the leap second added to UTC on December 31, 2016.<ref>{{cite web|title=Notice Advisory to Navstar Users (NANU) 2016069|accessdate=January 2, 2017|url=http://www.navcen.uscg.gov/?pageName=currentNanus&format=txt|archive-url=https://gps.afspc.af.mil/gps/archive/2016/nanus/2016069.nnu|archive-date=November 30, 2016|publisher=GPS Operations Center}}</ref>

Changed To:

As of January 2017, GPS time is 18 seconds ahead of UTC because of the leap second added to UTC on December 31, 2016.<ref>{{cite web|title=Notice Advisory to Navstar Users (NANU) 2016069 |accessdate=January 2, 2017 |url=http://www.navcen.uscg.gov/?pageName=currentNanus&format=txt |archive-url=https://web.archive.org/web/20131909223500/http://www.navcen.uscg.gov/?pageName=currentNanus&format=txt |archive-date=May 21, 2017 |publisher=GPS Operations Center |deadurl=yes |df= }}</ref>

Context: The currentNanus announcement file hosted by www.navcen.uscg.gov was cited as the authoritative source. Because that file is updated over time, https://gps.afspc.af.mil/gps/archive/2016/nanus/2016069.nnu was provided as the archive-url for the specific announcement. (I'll admit that it was a bad idea for me, the human editor, to cite an announcement like this, but I still think InternetArchiveBot's behavior could be improved.)

I feel that InternetArchiveBot's actions were inappropriate because:

  • The bot shouldn't have changed the archive-url from gps.afspc.af.mil to web.archive.org without good reason (it's possible that it incorrectly assumed https://gps.afspc.af.mil to be invalid because of the DoD CA used to sign ".mil" domains)
  • It chose a very old archive from web.archive.org despite more recent archives being available: https://web.archive.org/web/*/http://www.navcen.uscg.gov/?pageName=currentNanus&format=txt (although in this instance, web.archive.org doesn't have any snapshots of the currentNanus text file taken at the correct time to capture the announcement)
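
To illustrate the second point, here is a minimal sketch of choosing the snapshot closest to the citation's accessdate. It is not the bot's actual selection logic; the function name `closest_snapshot` and the snapshot timestamps are hypothetical and used only for illustration.

```python
from datetime import datetime


def closest_snapshot(timestamps, target):
    """Return the 14-digit Wayback-style timestamp nearest to `target`."""
    def parse(ts):
        return datetime.strptime(ts, "%Y%m%d%H%M%S")

    return min(timestamps, key=lambda ts: abs(parse(ts) - target))


# Hypothetical snapshot timestamps (not real captures of the cited URL).
snapshots = ["20130919223500", "20161215000000", "20170105120000"]

# accessdate from the citation: January 2, 2017
access_date = datetime(2017, 1, 2)

print(closest_snapshot(snapshots, access_date))  # prints 20170105120000
```

Under these assumed timestamps, the 2017 capture is preferred over the 2013 one because it is only a few days from the accessdate.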

Incidentally, it also incorrectly flagged http://www.navcen.uscg.gov/?pageName=currentNanus&format=txt as a dead URL (deadurl=yes). Is it possible that the flag actually refers to the gps.afspc.af.mil archive-url that it removed?

Event Timeline

Anjsimmo renamed this task from InternetArchiveBot replaced archive-url with an outdated Internet Archive snapshot to InternetArchiveBot removed archive-url and replaced it with (outdated) Internet Archive snapshot. (Jun 9 2017, 2:31 AM)
Anjsimmo updated the task description.

Also applies to Bot Version: v1.4beta
Diff: https://en.wikipedia.org/w/index.php?title=User%3AAnjsimmo%2Fsandbox&type=revision&diff=784801427&oldid=784801068 (tested in my sandbox)

Original Text:

As of January 2017, GPS time is 18 seconds ahead of UTC because of the leap second added to UTC on December 31, 2016.<ref>{{cite web|title=Notice Advisory to Navstar Users (NANU) 2016069|accessdate=January 2, 2017|url=http://www.navcen.uscg.gov/?pageName=currentNanus&format=txt|archive-url=https://gps.afspc.af.mil/gps/archive/2016/nanus/2016069.nnu|archive-date=November 30, 2016|publisher=GPS Operations Center}}</ref>

Changed To:

As of January 2017, GPS time is 18 seconds ahead of UTC because of the leap second added to UTC on December 31, 2016.<ref>{{cite web|title=Notice Advisory to Navstar Users (NANU) 2016069 |accessdate=January 2, 2017 |url=http://www.navcen.uscg.gov/?pageName=currentNanus&format=txt |archive-url=https://web.archive.org/web/20131909223500/http://www.navcen.uscg.gov/?pageName=currentNanus&format=txt |archive-date=January 1, 1970 |publisher=GPS Operations Center |deadurl=yes |df= }}</ref>
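
One detail worth checking: the snapshot URL in both diffs carries the timestamp 20131909223500, which, read as YYYYMMDDhhmmss, has month 19. The sketch below validates a 14-digit timestamp of that form. The helper name `parse_wayback_timestamp` is hypothetical, and whether a failed parse is what produced archive-date=January 1, 1970 (the Unix epoch) in the v1.4beta output is only a guess, not something confirmed in this task.

```python
from datetime import datetime


def parse_wayback_timestamp(ts):
    """Return a datetime for a 14-digit YYYYMMDDhhmmss string, or None if invalid."""
    if len(ts) != 14 or not ts.isdigit():
        return None
    try:
        return datetime(
            int(ts[0:4]), int(ts[4:6]), int(ts[6:8]),
            int(ts[8:10]), int(ts[10:12]), int(ts[12:14]),
        )
    except ValueError:
        # Out-of-range field, e.g. month 19
        return None


# Timestamp taken from the archive-url in the diff above.
print(parse_wayback_timestamp("20131909223500"))  # prints None

# Hypothetical well-formed timestamp (month and day swapped), for comparison only.
print(parse_wayback_timestamp("20130919223500"))  # prints 2013-09-19 22:35:00
```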

If the URL in the archive-url field isn't recognized as a valid archiving service, the bot will overwrite it with one that is recognized, or just ignore it if the bot doesn't think it has anything better. The URL in this archive-url field doesn't look like a web service that hosts snapshots.
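
The rule described in this comment can be sketched roughly as follows. This is an illustration of the stated behavior, not the bot's source code: the host list and the names `RECOGNIZED_ARCHIVE_HOSTS`, `is_recognized_archive`, and `resolve_archive_url` are assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical allow-list; the bot's real list of recognized services is not shown in this task.
RECOGNIZED_ARCHIVE_HOSTS = {"web.archive.org", "www.webcitation.org"}


def is_recognized_archive(url):
    """True if the URL's host is on the (assumed) list of archiving services."""
    host = urlparse(url).hostname or ""
    return host.lower() in RECOGNIZED_ARCHIVE_HOSTS


def resolve_archive_url(current, replacement):
    """Decide which archive-url to keep.

    current:     the archive-url already in the citation
    replacement: a snapshot URL from a recognized service, or None if none was found
    """
    if is_recognized_archive(current):
        return current       # already a recognized archive: keep it
    if replacement is not None:
        return replacement   # unrecognized: overwrite with a recognized one
    return current           # nothing better available: leave it alone


old = "https://gps.afspc.af.mil/gps/archive/2016/nanus/2016069.nnu"
new = ("https://web.archive.org/web/20131909223500/"
       "http://www.navcen.uscg.gov/?pageName=currentNanus&format=txt")

print(resolve_archive_url(old, new) == new)   # prints True
print(resolve_archive_url(old, None) == old)  # prints True
```

Under this reading, the gps.afspc.af.mil link was replaced because its host is not an archiving service, not because of any certificate problem.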

You are right, it seems that I misunderstood the intended use of the archive-url field. Although in my defence, the guidelines were a bit unclear on whether the archive-url had to be a dedicated web archive service.

archive-url: ... Typically used to refer to services such as WebCite (see Wikipedia:Using WebCite) and Internet Archive ...