See parent task and @Zoranzoki21's comment on https://gerrit.wikimedia.org/r/#/c/357849/
Description
Details
| Status | Assigned | Task |
|---|---|---|
| Resolved | Framawiki | T166934 weblinkchecker: Reporting dead links on talk page |
| Resolved | Xqt | T167463 weblinkchecker: Use https instead of http for web.archive.org |
| Resolved | Xqt | T185561 weblinkchecker.py slows down (itself, OS) to freeze after a while reaching 100% of RAM |
Event Timeline
I modified the code on Saturday and tested it, but it does not work. The only things that work are the Serbian translation I added and my change to add one blank line per report instead of three.
See my comment on Gerrit.
As far as I can see, the problem comes from the _get_closest_memento_url function, which uses memento_client.MementoClient() from an external library: https://github.com/mementoweb/py-memento-client
>>> import datetime
>>> import memento_client
>>> mc = memento_client.MementoClient()
>>> when = datetime.datetime.now()
>>> url = 'http://www.fallingrain.com/world/YI/2/Dunisice.html'
>>> memento_info = mc.get_memento_info(url, when)
>>> memento_info
{'mementos': {'last': {'uri': ['http://web.archive.org/web/20110228040245/http://www.fallingrain.com:80/world/YI/2/Dunisice.html'], 'datetime': datetime.datetime(2011, 2, 28, 4, 2, 45)}, 'closest': {'datetime': datetime.datetime(2011, 2, 28, 4, 2, 45), 'uri': [u'http://web.archive.org/web/20110228040245/http://www.fallingrain.com:80/world/YI/2/Dunisice.html'], 'http_status_code': 404}, 'first': {'uri': ['http://web.archive.org/web/20071001061940/http://www.fallingrain.com/world/YI/2/Dunisice.html'], 'datetime': datetime.datetime(2007, 10, 1, 6, 19, 40)}}, 'original_uri': 'http://www.fallingrain.com/world/YI/2/Dunisice.html', 'timegate_uri': 'http://timetravel.mementoweb.org/timegate/http://www.fallingrain.com/world/YI/2/Dunisice.html'}
>>> mementos = memento_info.get('mementos')
>>> mementos
{'last': {'uri': ['http://web.archive.org/web/20110228040245/http://www.fallingrain.com:80/world/YI/2/Dunisice.html'], 'datetime': datetime.datetime(2011, 2, 28, 4, 2, 45)}, 'closest': {'datetime': datetime.datetime(2011, 2, 28, 4, 2, 45), 'uri': [u'http://web.archive.org/web/20110228040245/http://www.fallingrain.com:80/world/YI/2/Dunisice.html'], 'http_status_code': 404}, 'first': {'uri': ['http://web.archive.org/web/20071001061940/http://www.fallingrain.com/world/YI/2/Dunisice.html'], 'datetime': datetime.datetime(2007, 10, 1, 6, 19, 40)}}
>>> mementos['closest']['uri'][0]
u'http://web.archive.org/web/20110228040245/http://www.fallingrain.com:80/world/YI/2/Dunisice.html'
Confirmed.
I'll create an issue on the memento client's GitHub repo and upload a temporary hacky patch for pywikibot.
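For illustration, a minimal sketch of what such a temporary workaround could look like: rewrite the scheme of the URL returned by the memento lookup before it is used in a report. This is not the actual patch from the Gerrit change; the helper name `force_https_for_archive` is hypothetical, and only the standard library is assumed.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https_for_archive(url):
    """Return url with an https scheme if it points at web.archive.org.

    Hypothetical helper for illustration; URLs on other hosts, and URLs
    that already use https, are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme == 'http' and parts.hostname == 'web.archive.org':
        # Only the outer scheme is changed; the archived target URL that
        # is embedded in the path stays exactly as it was.
        parts = parts._replace(scheme='https')
    return urlunsplit(parts)


closest = ('http://web.archive.org/web/20110228040245/'
           'http://www.fallingrain.com:80/world/YI/2/Dunisice.html')
print(force_https_for_archive(closest))
```

Parsing the URL instead of doing a plain string replace avoids touching the `http://` of the archived page inside the path.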
Change 358053 had a related patch set uploaded (by Framawiki; owner: Framawiki):
[pywikibot/core@master] [bugfix] weblinkchecker.py: Use https for web.archive.org
Change 358053 merged by jenkins-bot:
[pywikibot/core@master] [bugfix] weblinkchecker.py: Use https for web.archive.org
@Zoranzoki21 To get the latest version of pywikibot, you have to use Git: https://www.mediawiki.org/wiki/Manual:Pywikibot/Gerrit. Don't hesitate to tell me if I can help you use it.
What are we supposed to see there? There doesn't seem to be a web.archive.org link there?
Hmmm. I have now downloaded the weblinkchecker script from http://tools.wmflabs.org/pywikibot/ and it still does not use https. See here.
Aha. It seems getInternetArchiveURL uses https://archive.org/wayback/available?url=http://nl.wikipedia.org/, which also still returns an http link. So the search-and-replace should be moved to setLinkDead.
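A rough sketch of that idea, assuming the goal is to normalize the archive URL at the single place where the report is built, whichever lookup produced it. The names `normalize_archive_url`, `archive_url_for_report` and the two stand-in lookups are hypothetical and do not mirror the real weblinkchecker.py code.

```python
import re

# Matches only an http scheme directly in front of the web.archive.org host.
_ARCHIVE_HTTP = re.compile(r'^http://web\.archive\.org/')


def normalize_archive_url(url):
    """Upgrade an http web.archive.org URL to https; leave others alone."""
    return _ARCHIVE_HTTP.sub('https://web.archive.org/', url)


def archive_url_for_report(url, lookups):
    """Return the first archive URL found, normalized in one place.

    lookups is a list of callables taking the dead URL and returning an
    archive URL or None (stand-ins for the memento and Wayback lookups).
    """
    for lookup in lookups:
        archive_url = lookup(url)
        if archive_url:
            return normalize_archive_url(archive_url)
    return None


# Stand-in lookups for illustration; neither performs a network request.
def no_result(url):
    return None


def wayback_style(url):
    return 'http://web.archive.org/web/2017/' + url


print(archive_url_for_report('http://nl.wikipedia.org/',
                             [no_result, wayback_style]))
```

With the replacement done at the point of use, a lookup that still returns an http link no longer needs its own fix.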
Good news: The issue is fixed in the library, so I'll look into whether we can revert my hacky patch.
I updated, but I do not see any effect. See: https://sr.wikipedia.org/wiki/%D0%A0%D0%B0%D0%B7%D0%B3%D0%BE%D0%B2%D0%BE%D1%80:.sj
I still do not get https. With my bot I replaced http with https for web.archive.org links in all articles on Serbian Wikipedia. I then ran the script, but the reports still do not use https. See: https://sr.wikipedia.org/wiki/%D0%A0%D0%B0%D0%B7%D0%B3%D0%BE%D0%B2%D0%BE%D1%80:D%27Ilio,_Chieti
Change 380923 had a related patch set uploaded (by Zoranzoki21; owner: Zoranzoki21):
[pywikibot/core@master] [bugfix] weblinkchecker.py: Use https for web.archive.org
Change 380923 abandoned by Zoranzoki21:
[bugfix] weblinkchecker.py: Use https for web.archive.org
Reason:
All checks on GitHub failed, and I do not know how to make a permanent fix for https.
As the issue with memento is solved, is this issue solved too? Is the hacky patch reverted? Can this be marked as resolved?
Change 806554 had a related patch set uploaded (by Xqt; author: Xqt):
[pywikibot/core@master] Revert: [bugfix] weblinkchecker.py: Use https for web.archive.org
Change 806554 merged by jenkins-bot:
[pywikibot/core@master] Revert: [bugfix] weblinkchecker.py: Use https for web.archive.org
