Create an API server on tool that can replace reflinks.py's 404-links.txt file
Open, Lowest, Public

Description

In this change https://gerrit.wikimedia.org/r/#/c/393100/ @Zoranzoki21 showed the problem of downloading a large archive to use reflinks.py.
It is therefore proposed to create a web server with which reflinks.py could interact through an API.
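
As a rough illustration of the proposal, the client side of such an API might look like the sketch below. Everything here is an assumption made for illustration: the endpoint `API_URL`, the `url` query parameter, and the JSON reply shape `{"dead": [...]}` are hypothetical and are not defined anywhere in this task.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint; no such service is named in this task.
API_URL = "https://example.org/deadlinks/api"


def build_query(api_url, links):
    """Return a GET URL that asks the service about each link."""
    return api_url + "?" + urlencode([("url", link) for link in links])


def parse_response(body):
    """Extract the set of dead links from the assumed JSON reply.

    Assumed reply shape: {"dead": ["http://...", ...]}
    """
    return set(json.loads(body).get("dead", []))


def fetch_dead_links(links, api_url=API_URL, opener=urlopen):
    """Ask the (hypothetical) service which of the links are known dead."""
    with opener(build_query(api_url, links)) as response:
        return parse_response(response.read().decode("utf-8"))
```

With something like this, reflinks.py would only need to send the links found on the page it is processing, instead of downloading the whole archive first.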

Event Timeline

Framawiki created this task. Dec 4 2017, 6:58 PM
Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. Dec 4 2017, 6:58 PM
Kizule added a comment. Edited Dec 4 2017, 7:00 PM

In each installation, this file has to be in the main folder before the user can run the reflinks script. I support the idea of creating a web server with which reflinks.py will interact by API. Then users will not need to download this file every time.

Kizule added a comment. Dec 4 2017, 7:07 PM

Also, the file has not been updated for many years. The last update of the file was in 2007.

Isn't the dead link scanning now done by IABot?

What is reflinks?

Kizule added a comment. Edited Dec 4 2017, 8:51 PM

Isn't the dead link scanning now done by IABot?

But then why can reflinks not work without this file?

Change 395094 had a related patch set uploaded (by Zoranzoki21; owner: jenkins-bot):
[pywikibot/core@master] Disable needing text file for running reflinks.py script

https://gerrit.wikimedia.org/r/395094

Change 395094 abandoned by Zoranzoki21:
Disable needing text file for running reflinks.py script

Reason:
What happened?

https://gerrit.wikimedia.org/r/395094

@Zoranzoki21, @zhuyifei1999, @Cyberpower678: reflinks.py is a Python script that goes through the bare links in one or more wiki pages and looks up more details about them (page title, MIME type). The script uses a pre-generated text file containing 404 links (gathered from wiki pages) to avoid marking temporarily inaccessible links as dead links. The text file can be downloaded from the script author's webpage, but it is old and unmaintained there and contains only enwiki articles. You can also create it yourself using the Python script weblinkchecker.py, but it takes a week to create this list (in order to eliminate temporarily inaccessible links). We are currently looking for a better solution.
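
To make the role of the text file concrete, here is a minimal sketch of the idea described above. It is not the actual reflinks.py code: the one-URL-per-line file layout, the function names `load_dead_links` and `classify`, and the returned labels are all assumptions chosen for illustration.

```python
def load_dead_links(lines):
    """Build a set of known-dead URLs from an iterable of text lines.

    Assumes one URL per line; blank lines are skipped.
    """
    return {line.strip() for line in lines if line.strip()}


def classify(url, http_status, dead_links):
    """Decide how to treat a bare link after trying to fetch it.

    A 404 is only reported as a dead link when the URL is also in the
    pre-generated list; otherwise the failure may be temporary.
    """
    if http_status == 200:
        return "ok"
    if http_status == 404 and url in dead_links:
        return "dead"
    return "retry-later"
```

In this sketch a link that fails once but is missing from the list is not tagged as dead, which is the behaviour the pre-generated file is meant to provide.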

Change 395095 had a related patch set uploaded (by Zoranzoki21; owner: Zoranzoki21):
[pywikibot/core@master] reflinks.py: Disable needing 404-links.txt for running script

https://gerrit.wikimedia.org/r/395095

Kizule added a comment. Edited Dec 4 2017, 9:20 PM

@Zoranzoki21, @zhuyifei1999, @Cyberpower678: reflinks.py is a Python script that goes through the bare links in one or more wiki pages and looks up more details about them (page title, MIME type). The script uses a pre-generated text file containing 404 links (gathered from wiki pages) to avoid marking temporarily inaccessible links as dead links. The text file can be downloaded from the script author's webpage, but it is old and unmaintained there and contains only enwiki articles. You can also create it yourself using the Python script weblinkchecker.py, but it takes a week to create this list (in order to eliminate temporarily inaccessible links). We are currently looking for a better solution.

I am thinking of one solution: I could run weblinkchecker from Toolforge, and then we would get a clean, updated file.

I will create a file with dead links from srwiki. I have started: http://prntscr.com/hj4pjq

You know I have a container web script that can run checks on an array of links. I can set it up.

You know I have a container web script that can run checks on an array of links. I can set it up.

I know.

You know I have a container web script that can run checks on an array of links. I can set it up.

Tomorrow I will start an RfC on srwiki to enable IAB.

Change 395095 abandoned by Zoranzoki21:
reflinks.py: Disable needing 404-links.txt for running script

https://gerrit.wikimedia.org/r/395095

Kizule added a comment. Dec 5 2017, 8:13 PM

@Cyberpower678 I have started an RfC on srwiki for enabling InternetArchiveBot there.

Kizule added a comment. Dec 5 2017, 8:14 PM

After 7 days, if all is OK, I will create a task for it.

Xqt triaged this task as Lowest priority. Dec 6 2017, 8:16 AM