
Wikisource Export: Remove dependency on phetools/credits.py
Closed, Resolved | Public | 5 Estimated Story Points

Description

Background: As a follow-up to T257543, CommTech will look into what it would take to remove WSExport's reliance on the tool that it uses to create books' lists of contributors. This could involve moving this logic into WSExport, or perhaps replacing the credits list with a link to the book on Wikisource, or maybe a dynamic credits list produced by WSExport. This would have the advantage of improving the speed of book generation (fewer API requests etc.). What are the legal implications of the credits list (i.e. is it a hard requirement)?

Acceptance Criteria:

  • Remove dependency on phetools/credits.py

Event Timeline

Restricted Application added a subscriber: Aklapper.

I believe that the credits list is not a "hard" requirement. For example, the common pattern for citing Wikipedia articles is simply to link to the Wikipedia page and state that the list of authors can be found there. And Wikisource contributors have a much weaker authorship relation to the content than Wikipedia contributors do. So I guess that rewording the credits page might do the job (but I'm not a lawyer...).

However, this contributor list was by far the top improvement request when I made the first version of WSExport. At that time (2012) there was very strong demand for it from the Wikisource community. That is why it was introduced, even though it is quite costly on the implementation side.

Yes, definitely, good point. I don't really think we should remove the contributors list; it's not all that costly, and worth having if people do like it. I mentioned it only as one possibility. I think we could maybe improve the querying performance, or even switch to querying the DB directly? I'm not sure. But this is mostly about reducing dependencies.

ifried renamed this task from "Remove dependency on phetools/credits.py" to "Wikisource: Remove dependency on phetools/credits.py". (Jul 16 2020, 3:53 PM)
ifried updated the task description.

A related issue:

GH182: contributors name not picked properly from proofread page only from transclusion page
balajijagadesh commented on 10 Jun 2019:
The issue has started again. In the downloaded ebooks, the names of the people who have contributed are not showing up on Tamil Wikisource. The contributor list is empty.
balajijagadesh commented 15 hours ago
The problem is intermittent. Sometimes it works properly. Sometimes some of the contributors are left out.

Aklapper renamed this task from "Wikisource: Remove dependency on phetools/credits.py" to "Wikisource Export: Remove dependency on phetools/credits.py". (Nov 2 2020, 9:30 AM)
ARamirez_WMF set the point value for this task to 5. (Nov 19 2020, 7:10 PM)
dom_walden added a subscriber: dom_walden.

In comparison to Phetools, we have a more complete list of credits including people who contributed to subpages, images and transcluded pages (e.g. transcribed pages in the Pages namespace).

I have been comparing the credits list in exported EPUBs against the output of my own script, which uses various API endpoints.

They mostly match. A few (about 5 in about 130 ebooks in 13 languages) don't match, but I cannot rule out a bug in my own script.
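For readers who want to repeat this kind of check, here is a minimal sketch of how such a comparison could work. It is not the script used above. It assumes the MediaWiki Action API's `prop=contributors` query (with `formatversion=2`) as the source of the expected names. The function names `contributors_params`, `names_from_response` and `compare_credits` are made up for illustration. No network request is made; the sketch only builds the query parameters and processes an already-decoded JSON response.

```python
from __future__ import annotations

from typing import Iterable, Optional


def contributors_params(titles: list[str], pccontinue: Optional[str] = None) -> dict:
    """Build Action API parameters listing contributors for several pages.

    The titles would include the book's main page plus its subpages,
    images and transcluded pages (hypothetical helper, for illustration).
    """
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "contributors",
        "titles": "|".join(titles),
        "pclimit": "max",
    }
    if pccontinue is not None:
        # Continuation token taken from a previous response, if any.
        params["pccontinue"] = pccontinue
    return params


def names_from_response(response: dict) -> set[str]:
    """Collect the distinct contributor names from one decoded API response."""
    names: set[str] = set()
    for page in response.get("query", {}).get("pages", []):
        # Pages without logged-in contributors have no "contributors" key.
        for contributor in page.get("contributors", []):
            names.add(contributor["name"])
    return names


def compare_credits(exported: Iterable[str], expected: Iterable[str]) -> dict:
    """Report names that appear in only one of the two credits lists."""
    exported_set, expected_set = set(exported), set(expected)
    return {
        "missing_from_export": sorted(expected_set - exported_set),
        "unexpected_in_export": sorted(exported_set - expected_set),
    }
```

Passing the names parsed from an exported ebook as `exported` and the union of `names_from_response` results as `expected` yields two lists that are both empty when the credits match.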

Test Environment: https://ws-export.wmcloud.org version 2.5.1.

ifried added a subscriber: ifried.

This has been released to production, so I'm marking it as Done.