
gutenberg.org returns a 403 on Travis
Closed, Resolved, Public

Description

The Travis run of tests.site_detect_tests.TestWikiSiteDetection.test_detect_site currently fails because http://www.gutenberg.org/wiki/$1 returns a 403. When I run the test locally, it passes.

Event Timeline

XZise raised the priority of this task from to Needs Triage.
XZise updated the task description.
XZise added a project: Pywikibot.
XZise subscribed.
XZise set Security to None.

Okay, depending on how you interpret http://www.gutenberg.org/wiki/Gutenberg:Terms_of_Use#Audience, it may be that they just don't want bots accessing their wiki, in which case we should remove it from the test.

Okay, I printed the response content on Travis:

<!DOCTYPE HTML>
<html><head><title>Error 403</title></head>
<body>
<h1>Error 403</h1>
<p>Maybe you have just a wrong url. Go to http://www.gutenberg.org/ebooks/ first to see if the error persists.</p>
<p>If you get the error again check that you:</p>
<ul>
  <li>Don't use anonymizers, open proxies, VPNs, or TOR to access Project Gutenberg. This includes the Google proxies that are used by Chrome.</li>
  <li>Don't access Project Gutenberg from hosted servers.</li>
  <li>Don't use automated software to download lots of books. We have a limit on how fast you can go while using this site. If you surpass this limit you get blocked for 24h.</li>
  <li>We have a daily limit on how many books you can download. If you exceeded this limit you get blocked for 24h.</li>
  <li>If you use the RSS feed, set your update interval to 24 hours.</li>
</ul>
<p>
If you are sure that none of the above applies to you, 
and wish us to investigate the problem,
we need to know your IP address.
Go to <a href="http://www.whatismyip.com/">this site</a>,
don't sign up, 
just copy the IP address 
(it looks like: 12.34.56.78 but your numbers will be different)
and
<a href="mailto:webmaster@gutenberg.org?subject=403%20help">mail it to us</a>.
If that page also shows a proxy address, we need that one too. 
</p>
</body>
</html>
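
For anyone who wants to reproduce this kind of dump, here is a minimal sketch of how the status code and body of a failing request could be captured. It uses only the standard library rather than Pywikibot's own HTTP layer, and the names `fetch` and `describe_failure` are hypothetical helpers, not functions from the test suite.

```python
import urllib.error
import urllib.request


def describe_failure(error):
    """Return (status code, body text) of an HTTPError for logging."""
    body = error.read().decode('utf-8', errors='replace')
    return error.code, body


def fetch(url, opener=urllib.request.urlopen):
    """Return (status code, body text), including for HTTP error responses.

    `opener` is injectable so the function can be exercised without
    network access; by default it is urllib.request.urlopen.
    """
    try:
        with opener(url) as response:
            body = response.read().decode('utf-8', errors='replace')
            return response.status, body
    except urllib.error.HTTPError as error:
        # urlopen raises for 4xx/5xx, but the error still carries the body.
        return describe_failure(error)
```

Calling `print(fetch('http://www.gutenberg.org/wiki/$1'))` inside the CI job would then show whether the server answered with the block page above or with something else.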

Just noticing: removing it from that one test doesn't solve the problem. test_IWM will also scan it, as it's in the interwiki map (IWM) of the English Wikipedia. I have also contacted the webmaster, so it may be possible to avoid the block.

IMO this entry can be removed so that our tests go green after one of the weblib fixes is merged.
Or don't run it on Travis, and adjust the expected success count accordingly (i.e. passes == total).
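
A rough sketch of the second option, assuming the test iterates over a plain list of URLs. Travis sets `TRAVIS=true` in the build environment. The names `BLOCKED_ON_CI`, `on_travis` and `urls_to_test` are made up for illustration and are not part of the existing test module.

```python
import os
from urllib.parse import urlparse

# Hypothetical set of hosts known to reject requests from hosted CI servers.
BLOCKED_ON_CI = {'www.gutenberg.org'}


def on_travis(environ=None):
    """Return True when running on Travis CI, which sets TRAVIS=true."""
    environ = os.environ if environ is None else environ
    return environ.get('TRAVIS') == 'true'


def urls_to_test(urls, environ=None):
    """Return the URLs to check, dropping blocked hosts when on Travis."""
    if not on_travis(environ):
        return list(urls)
    return [url for url in urls
            if urlparse(url).hostname not in BLOCKED_ON_CI]
```

Because the filtering happens before the test loop, the expected success count is simply `len(urls_to_test(urls))`, so passes == total holds both locally and on Travis.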

Change 231013 had a related patch set uploaded (by XZise):
[FIX] site_detect_tests: Don't test Gutenberg wiki

https://gerrit.wikimedia.org/r/231013

Change 231013 merged by jenkins-bot:
[FIX] site_detect_tests: Don't test Gutenberg wiki

https://gerrit.wikimedia.org/r/231013

jayvdb claimed this task.