
[Errors 301] wikistats: Update entry URLs for wikis that redirect elsewhere or no longer exist
Closed, ResolvedPublic

Description

HTTP 301 entries

There are currently 121 wikis which return HTTP status 301 (Moved Permanently). Most are wikis which changed URL structure slightly. Some work in a browser, like http://www.wikilou.com/lou/api.php?action=query&meta=siteinfo&maxlag=5, and some don't because the redirection is not entirely correct; either way, their entries should be updated.
They are currently visible on page 8 (!) of the table, as in the URL field; CSV export attached.

(The 302 entries are shown in green in the table, but 302 is not the correct way to redirect permanently, and most of them are in fact dead wikis.)


Version: unspecified
Severity: normal
URL: http://wikistats.wmflabs.org/display.php?t=mw&s=http_desc&p=8

Details

Reference
bz44756

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:40 AM
bzimport set Reference to bz44756.

Comment on attachment 11746
HTTP 301 entries

Fix mime type.

(I need the HTTP error in the summary so I can quickly tell what is what.)

as of today, number of wikis with status 301 is 540.

update.php has a feature called "fixit" which goes through wikis with 301 or 302 and tries to get the new location, then offers it to the user, who can semi-automatically update the URLs.
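For illustration, the core of a "fixit"-style step could look roughly like the sketch below. This is not the actual PHP code in update.php; the function name `propose_new_url` and its arguments are made up here, and it assumes the HTTP status and the `Location` header have already been fetched for the entry.

```python
from urllib.parse import urljoin

# Statuses that the "fixit" pass looks at, per the comment above.
REDIRECT_STATUSES = {301, 302}


def propose_new_url(current_url, status, location):
    """Return a redirect target to offer to the user, or None.

    current_url: the URL stored for the wiki (hypothetical input)
    status:      HTTP status returned for current_url
    location:    value of the Location header, may be None or relative
    """
    if status not in REDIRECT_STATUSES or not location:
        return None
    # A Location header can be relative, so resolve it against the old URL.
    target = urljoin(current_url, location)
    # A redirect back to the same URL gives nothing to update.
    return target if target != current_url else None
```

The user would still confirm each proposed URL before it is written back, which matches the "semi-automatically" part.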

let me try at least reducing the number for now

actually, started with "302"s. there were 370 of them.

running this also finds some that are just not wikis at all anymore (deleting them on sight) and some duplicates that are showing up only if we try to change the URL to the redirect target.

i deleted a couple like "suspended page", ones where only the 'image' path was left but everything else was deleted.. and so on..

a lot of the ones that are legitimate changes and still active are just "http->https", updating those.

also, what is this "wiki.smu.edu.sg" with the weird URLs and so many of them? we have 52 entries in the table. all of them were "http->https" so that fixed a few

down from 370 to 280 entries with 302 .. will continue later

went through a lot more 302's, many are 200 now (mostly http->https changes) and fixed, also many are deleted (mostly suspended domains, also stuff like moved to other type of wiki, just broken and whatnot)

there are now only 27 left (and some of them are "method 7" without any API URL).

wanna check those manually?

now to the "301s". when starting there were 548 of them (across all methods). I adjusted the update.php and functions.php to make the "fixit" method work with 301s instead of 302s.

after doing some manually i added PHP code for an "autofix" mode that attempts to detect all the ones where only the protocol differs and fix all the http->https ones
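The "only the protocol differs" check could be sketched like this. It is an illustration, not the PHP that was added; `is_https_upgrade` is a hypothetical name, and it treats host names as case-insensitive while comparing path and query exactly.

```python
from urllib.parse import urlsplit


def is_https_upgrade(old_url, new_url):
    """True if new_url is old_url with only the scheme changed from http to https."""
    old, new = urlsplit(old_url), urlsplit(new_url)
    return (
        old.scheme == "http"
        and new.scheme == "https"
        # Host names are case-insensitive; everything else must match exactly.
        and old.netloc.lower() == new.netloc.lower()
        and old.path == new.path
        and old.query == new.query
    )
```

Anything that fails this check (different host, different path) would stay in the manual "fixit" queue.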

then even went a step further and had it delete the ones that would be duplicates when updating them..
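The duplicate handling can be pictured as follows. Again only a sketch under assumptions: the table is modelled as a plain dict of entry id to URL, and `plan_updates` is a made-up helper, not something in functions.php.

```python
def plan_updates(entries, new_urls):
    """Split proposed URL changes into updates and deletions.

    entries:  dict of entry id -> URL currently in the table (hypothetical shape)
    new_urls: dict of entry id -> redirect target proposed for that entry
    Returns (updates, deletions): updates maps id -> new URL, deletions is a
    list of ids whose new URL would duplicate another entry.
    """
    taken = set(entries.values())
    updates, deletions = {}, []
    for entry_id in sorted(new_urls):
        target = new_urls[entry_id]
        if target == entries.get(entry_id):
            continue  # nothing to change for this entry
        if target in taken:
            # Another entry already has (or was just given) this URL.
            deletions.append(entry_id)
        else:
            updates[entry_id] = target
            taken.add(target)
    return updates, deletions
```

With an http entry and an https entry for the same wiki already in the table, the http one ends up in `deletions` and the https one is left alone.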

down to 336 from 548 ...

..let it update all the remaining URLs automatically and ran another round of updates.. down to just 48 entries with 301. those would be left for manual inspection.

also note how on each update run that number changes slightly.

can't really invest much more time in this. there will always be some of them left at any given moment. it just needs a cleanup like this every once in a while

i'd like to close the bug at this point unless you guys want to check more manually. i have done what i could with reasonable effort. the rest can be done via "wsa" or you give me specific lists which URLs to change or delete.

(In reply to Daniel Zahn from comment #7)

can't really invest much more time in this. there will always be some of
them left at any given moment. it just needs a cleanup like this every once
in a while

i'd like to close the bug at this point unless you guys want to check more
manually. i have done what i could with reasonable effort. the rest can be
done via "wsa" or you give me specific lists which URLs to change or delete.

Yes, you've been wonderful. I'll do another round of checks in a couple weeks when I have more time.