pwb.py reflinks.py -lang:ru -family:wikipedia -start:! -v -ignorepdf
Traceback (most recent call last):
  File "pwb.py", line 262, in <module>
    if not main():
  File "pwb.py", line 255, in main
    run_python_file(filename, [filename] + args, argvu, file_package)
  File "pwb.py", line 121, in run_python_file
    main_mod.__dict__)
  File "./scripts/reflinks.py", line 798, in <module>
    main()
  File "./scripts/reflinks.py", line 793, in main
    bot.run()
  File "./scripts/reflinks.py", line 588, in run
    linkedpagetext = f.content
  File "/home/user/python/core/pywikibot/comms/threadedhttp.py", line 181, in content
    return self.decode(self.encoding)
  File "/home/user/python/core/pywikibot/comms/threadedhttp.py", line 138, in encoding
    if not self.charset and not self.header_encoding:
  File "/home/user/python/core/pywikibot/comms/threadedhttp.py", line 121, in header_encoding
    content_type = self.response_headers['content-type']
  File "/home/user/anaconda3/lib/python3.6/site-packages/requests/structures.py", line 54, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'content-type'
Dropped throttle(s).
<class 'KeyError'>
CRITICAL: Closing network session.
Network session closed.

(Pdb) self.response_headers
{'Date': 'Tue, 30 May 2017 20:03:29 GMT', 'Server': 'Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips', 'Last-Modified': 'Wed, 08 Feb 2006 19:49:40 GMT', 'ETag': '"338b0-40c4dcbea7d00"', 'Accept-Ranges': 'bytes', 'Content-Length': '211120', 'Keep-Alive': 'timeout=5, max=100', 'Connection': 'Keep-Alive'}
Subject | Repo | Branch | Lines +/- |
---|---|---|---|
threadedhttp: add default for content-type | pywikibot/core | master | +13 -1 |
Event Timeline
@Mpaa I see, it is caused by the following link: https://sbn.psi.edu/pds/asteroid/EAR_A_5_DDR_ALBEDOS_V1_1/data/albedos.tab
$ curl -I https://sbn.psi.edu/pds/asteroid/EAR_A_5_DDR_ALBEDOS_V1_1/data/albedos.tab
HTTP/1.1 200 OK
Date: Tue, 15 Aug 2017 08:41:16 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips
Last-Modified: Wed, 08 Feb 2006 19:49:40 GMT
ETag: "338b0-40c4dcbea7d00"
Accept-Ranges: bytes
Content-Length: 211120
But for me this is just a warning and is skipped:
$ python pwb.py reflinks -page:"(31) Евфросина" -lang:ru -user:Dvorapa
Retrieving 1 pages from wikipedia:ru.
No charset found for http://www.psi.edu/pds/asteroid/EAR_A_5_DDR_ALBEDOS_V1_1/data/albedos.tab
No content-type found for http://www.psi.edu/pds/asteroid/EAR_A_5_DDR_ALBEDOS_V1_1/data/albedos.tab
No changes were needed on [[(31) Евфросина]]
It is just a warning, but it should definitely end with "No title found" instead of the current "No content-type found".
Weird, for me it stops here, both in py2 and py3:
  File "/home/user/python/core/pywikibot/comms/threadedhttp.py", line 121, in header_encoding
    content_type = self.response_headers['content-type']
  File "/usr/local/lib/python2.7/dist-packages/requests/structures.py", line 54, in __getitem__
    return self._store[key.lower()][1]
@Mpaa I can reproduce this error only on a 6-month-old copy of pwb; the current version just skips the link. Which version of pwb are you using?
I am always aligned with master (pull --recurse-submodules). Am I missing something?
user@pc:~/python/core {master}$ python scripts/version.py
Pywikibot: [ssh] pywikibot-core.git (0fc98a7, g8516, 2017/08/15, 17:28:18, n/a)
Release version: 3.0-dev
requests version: 2.18.3
cacerts: /usr/local/lib/python2.7/dist-packages/certifi/cacert.pem
certificate test: ok
@Dvorapa, how does it pass this line?!
Does self.response_headers contain the key?!
  File "pywikibot/comms/threadedhttp.py", line 121, in header_encoding
    content_type = self.response_headers['content-type']
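To make the question above concrete, here is a minimal sketch that replays the lookup against the headers from the Pdb dump in the report. It uses a plain dict rather than the case-insensitive mapping from requests, and `has_header` is a hypothetical helper written only for this illustration.

```python
# Headers copied from the Pdb dump in the report; there is no Content-Type.
response_headers = {
    'Date': 'Tue, 30 May 2017 20:03:29 GMT',
    'Server': 'Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips',
    'Last-Modified': 'Wed, 08 Feb 2006 19:49:40 GMT',
    'ETag': '"338b0-40c4dcbea7d00"',
    'Accept-Ranges': 'bytes',
    'Content-Length': '211120',
    'Keep-Alive': 'timeout=5, max=100',
    'Connection': 'Keep-Alive',
}


def has_header(headers, name):
    """Case-insensitive membership test (hypothetical helper)."""
    return name.lower() in {key.lower() for key in headers}


# Subscript access on a missing key raises KeyError, as in the traceback.
try:
    response_headers['content-type']
    raised = False
except KeyError:
    raised = True

print('KeyError raised:', raised)
print('content-type present:', has_header(response_headers, 'content-type'))
```

If the key were present, the subscript lookup would succeed, so checking membership first is one way to tell the two situations apart when comparing setups.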
Change 371973 had a related patch set uploaded (by Dvorapa; owner: Dvorapa):
[pywikibot/core@master] [bugfix, i18n, PEP8] Make reflinks.py work smoothly
Change 372166 had a related patch set uploaded (by Mpaa; owner: Mpaa):
[pywikibot/core@master] threadedhttp: add default for content-type
Change 372166 merged by jenkins-bot:
[pywikibot/core@master] threadedhttp: add default for content-type
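The merged change is identified here only by its title, so the following is a rough sketch of what a default for the content-type lookup could look like, not the actual patch. The standalone function `header_encoding` and its regular expression are assumptions made for illustration; the real code is a property on a class in threadedhttp.py.

```python
import re


def header_encoding(response_headers):
    """Return the charset named in Content-Type, or None if there is none.

    Hypothetical standalone version of the lookup; it accepts a plain
    dict and normalises key case itself.
    """
    lowered = {key.lower(): value for key, value in response_headers.items()}
    # Fall back to an empty string instead of raising KeyError.
    content_type = lowered.get('content-type', '')
    match = re.search(r'charset=["\']?([^\s;"\']+)', content_type,
                      re.IGNORECASE)
    return match.group(1) if match else None


print(header_encoding({'Content-Length': '211120'}))
print(header_encoding({'Content-Type': 'text/html; charset=UTF-8'}))
```

With a default in place, a response without the header is treated the same as one whose Content-Type carries no charset, and the caller can fall back to other detection.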