Pywikibot client to load ISBN related data into Wikidata
Open, Needs Triage, Public, Feature

Description

I have been writing a Pywikibot client to load ISBN related data into Wikidata. It is already functional from a Linux shell.

I thought it might be interesting to evaluate, test, or enhance it during the Wikimedia 2022 Hackathon.

Source code:

Prerequisite: a working Pywikibot environment.

Event Timeline

Xqt changed the subtype of this task from "Task" to "Feature Request".

@Geertivp: is it your intention that your script becomes part of the Pywikibot framework? How can I support you?

If you believe my script is good enough, I would be delighted to have your assistance.

Are you able to commit it to the Pywikibot repository on Gerrit, or do you need any “how to” support for it?

I have an account on Gerrit, but I have never used it... if you could provide me with a "next step", this would be great... Thanks for your support.

I have listed my proposed Pywikibot command line tool at https://wikimania.wikimedia.org/wiki/Hackathon/Showcase#Pywikibot_client_to_load_ISBN_related_data_into_Wikidata

There are two manuals for contributing code:

There is a Gerrit Patch Uploader tool which enables uploading code without a git repository or Gerrit knowledge.

I can also upload the code from your GitHub repository, if that is OK with you.

Thanks, @Xqt. I believe I might use https://gerrit-patch-uploader.toolforge.org, as a first step? In this case I do not need to install additional tools and learn complicated commands? I might later use the more advanced tools, once I get more experience?

Could you please let me know which project I should choose? Maybe: pywikibot/bots ?

pywikibot/core is the right repository. Once the patch is uploaded, every further edit can be made via the Gerrit web interface. The script itself must be located inside the scripts directory.

Change 826631 had a related patch set uploaded (by Xqt; author: Geertivp):

[pywikibot/core@master] Pywikibot client to load ISBN related data into Wikidata

https://gerrit.wikimedia.org/r/826631

Thanks for uploading. I have moved it to the scripts folder and added the Phabricator task. The patch is also editable through the Gerrit web interface.

Change 826631 merged by Xqt:

[pywikibot/core@master] Pywikibot client to load ISBN related data into Wikidata

https://gerrit.wikimedia.org/r/826631

Reopened: there are some issues to solve, questions to answer, and problems that were found.

@Geertivp: working on your script I found the following issues:

  1. There are related Phabricator tasks listed in the documentation (T282719, T214802, T208134, T138911, T20814). In which way are they related? Are they solved by your script, or can they be solved with your script?
  2. What is the meaning/function of the mainlang global variable (default description language)?
  3. Shouldn't we move the "Known problems" and "To do" lists from the script's documentation to a Phabricator task (e.g. this one) where they are to be solved?
  4. Are all of the "Documentation" links related to this script? For example, https://doc.wikimedia.org/pywikibot/master/ relates to the whole framework, and I do not see any reason to mention it explicitly within the script documentation.
  5. What about running the script on the Windows platform? Why is it necessary to mention those environment restrictions? Aren't they identical for all scripts, so that they can be placed in the documentation in a more general context?
  6. Your code is under the GNU v3.0 License. Pywikibot is licensed under MIT, and credit is given collectively for each script or module, like (C) Pywikibot team, 2022, Distributed under the terms of the MIT license. Authors are listed in the AUTHORS.rst file. Are you able to change the license of your contribution? This is not mandatory and it is OK to keep GNU, but then the license documentation of the framework would have to be changed in several places.

Any comments?

  1. I believe my script actually solves T282719, and I have taken care to implement the required data quality rules (only loading validated data, not creating duplicate statements). The other tasks are loosely related to ISBN, and could be omitted from the script if you feel the need.
  2. The mainlang global variable is indeed the default description language, used when the ISBN digital libraries do not return a language value. It determines in which language the label is written when creating a new instance. Therefore it is important to group ISBN numbers by language when executing the script, to ensure that the label is created in the correct language. It is also used to search for items, and for displaying properties and items in the user language.
    • I would propose that you reinstate the original code line inputfile = sys.stdin.read(). This made it possible to run the script on thousands of ISBN numbers, when needed, spread over multiple lines (e.g. the full references section of any Wikipedia page containing ISBN numbers, via regex).
    • You changed it into: inputfile = pywikibot.input('Get list of item numbers'), which basically allows processing only one single ISBN number (one single input line), which I don't find a good solution...
  3. I am currently working on "Known problems" such as ensuring the inverse relationship between "Written work" and "Edition", making sure that the "Is a written work" and "Edition" statements exist at the level of the Written work.
    • Other known problems can't be solved by the script, because they are caused by external, or complex internal data quality problems, and should stay. Examples: a Publisher is not found, because the statement "Is a publisher" was not assigned to its item.
    • Some other known problems could be moved to "To do", when they would require additional development.
    • I would personally want to add here another interesting functionality: "Implement a webservice on toolforge.org, based on the current shell script", accepting input from a textbox instead of from stdin.
  4. The documentation links could be split into specific or general documentation (the general, non script specific documentation could also be removed -- I included it for myself to easily find the documentation).
  5. The environment restrictions are indeed common for all Pywikibot scripts, so could be left out of the script.
  6. Since I am still the only author at https://github.com/geertivp/Pywikibot I have changed the license to MIT. So you can change the license to MIT in the source code as well.
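To illustrate points 1 and 2, here is a minimal sketch of reading a multi-line text from stdin, extracting ISBN-like strings with a regex, and keeping only those with a valid check digit. It is not the code of the script itself: the names ISBN_RE, extract_isbns, is_valid_isbn and main are hypothetical, the regex is deliberately loose, and the check-digit test is just one possible validation rule (the standard ISBN-10 and ISBN-13 checksums).

```python
import re
import sys

# Hypothetical, deliberately loose pattern: an optional 978/979 prefix,
# nine digits, and a final digit or X, with optional hyphens or spaces.
ISBN_RE = re.compile(r'\b(?:97[89][- ]?)?(?:\d[- ]?){9}[\dXx]\b', re.ASCII)


def extract_isbns(text):
    """Return ISBN-like strings found in text, normalised, without duplicates."""
    found = []
    for match in ISBN_RE.finditer(text):
        isbn = re.sub(r'[- ]', '', match.group()).upper()
        if isbn not in found:
            found.append(isbn)
    return found


def is_valid_isbn(isbn):
    """Check the ISBN-10 or ISBN-13 check digit of a normalised ISBN."""
    if len(isbn) == 13 and isbn.isdigit():
        # ISBN-13: weights alternate 1, 3, 1, 3, ...; the sum must be 0 mod 10.
        total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(isbn))
        return total % 10 == 0
    if len(isbn) == 10 and isbn[:9].isdigit() and (isbn[9].isdigit() or isbn[9] == 'X'):
        # ISBN-10: weights run from 10 down to 1; a final X counts as 10.
        values = [int(d) for d in isbn[:9]]
        values.append(10 if isbn[9] == 'X' else int(isbn[9]))
        total = sum(v * (10 - i) for i, v in enumerate(values))
        return total % 11 == 0
    return False


def main(stream=sys.stdin):
    """Print every valid ISBN found in the whole input, one per line."""
    # stream.read() consumes all lines at once, so the input can be an
    # arbitrarily long text, not just a single line.
    for isbn in extract_isbns(stream.read()):
        if is_valid_isbn(isbn):
            print(isbn)
```

Calling main() at the end of such a file would allow piping in, for example, the references section of a page saved as text. Reading the whole of stdin is what makes the multi-line case work; a single-line prompt would only see the first line.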

It seems that MIT, as opposed to GPL, allows more freedom for commercial implementations, does not require publishing the code, and does not enforce keeping the same license.

I will synchronise your and my changes in a future pull request. Thanks a lot for all your good work, and code review.