Author: wiki.bugzilla
Description:
Somewhat related to bug 4937:
We've got several complaints about http://www.archive.org/web/web.php
accidentally storing specific offensive revisions of Wikipedia's pages, especially
of user and user_talk pages (mainly privacy issues arising from the publication of
personal data). The Wayback Machine stores years-old data and may keep it even
after the original page is gone, for those few "bug readers" who don't know ...
So revisions that were already deleted on Wikipedia are still stored by the Wayback
crawler (unlike, e.g., Google's cache, which simply updates/overwrites old data).
According to
http://www.sims.berkeley.edu:8000/research/conferences/aps/removal-policy.html
our users normally don't have an easy and suitable way to request the removal
of "their" data, because they have to prove both their own identity and their
ownership of the Wikipedia account in question (possible, but complicated).
This is also a more general problem, because most users are not aware of the
Wayback Machine at all. A common argument in the discussion of this topic is
that there is no need for such external storage, since we have our own page
history, which is widely distributed ...
To exclude the Internet Archive's crawler (and have old documents there removed),
robots.txt should say:
User-agent: ia_archiver
Disallow: /
I don't see any disadvantage in adding this, at least for NS:2 and NS:3, which
nearly all complaints referred to, afaik.
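If the exclusion should cover only NS:2 (User) and NS:3 (User_talk) instead of
the whole site, a narrower sketch could look like the following. Note this
assumes the common /wiki/ article path prefix; the exact paths depend on this
wiki's URL configuration, and robots.txt Disallow rules are simple prefix
matches, so subpages would be covered too:

User-agent: ia_archiver
Disallow: /wiki/User:
Disallow: /wiki/User_talk:

Whether the Wayback Machine retroactively removes already-archived pages once
such rules appear would still need to be confirmed against the removal policy
linked above.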
Version: unspecified
Severity: normal