Thu, Mar 4
Fri, Feb 19
Jan 22 2021
Why is MOBI described as being for Calibre? I would've thought EPUB would be closer to Calibre's "default" format.
Jan 21 2021
Jan 20 2021
@Yash9265 wrote a change that removes navigation links when transcluding Special:IndexPages. I believe this problem is now solved.
I believe we want to display the export links only in namespaces where ProofreadPage allows the transclusion to happen and displays its navigation links.
For now it's only the main namespace, but some Wikisources have requested to be able to use it in other namespaces (T53980).
A possible way to fix both problems at once might be to create a per-wiki config inside ProofreadPage and reuse it for the Wsexport link.
Jan 15 2021
We don't show the links on https://wikisource.org. I don't know if this matters. I don't know if users use that site.
Jan 13 2021
The change adding "thai" numerals support is now deployed on Wikisource.
Sorry, I confused this task with another one.
My 2 cents: I like the idea of using Calibre's ePub-to-ePub conversion to split files that are too big. The current implementation in Wsexport is very bad. Another option would be to fix it inside Wsexport by looking at what Calibre actually does internally.
Jan 10 2021
So, the question is: Can the ProofreadPage extension be smarter about where the pagenum template is inserted to avoid putting it in "dead" table space?
Dec 16 2020
Dec 2 2020
Thank you! It looks like a great plan!
Nov 28 2020
I just had a look at the logs. The OPDS update cron job started to fail in September because of out of memory errors.
I just wrote a patch that should make it easy to add all the numbering systems supported by CLDR/ICU: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ProofreadPage/+/644010
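As a rough illustration of what supporting another numbering system involves, here is a minimal Python sketch that renders a page number with Thai digits. It is not the patch linked above (ProofreadPage is written in PHP and the patch relies on CLDR/ICU); the helper name to_thai_digits is hypothetical.

```python
# Thai digits occupy U+0E50 (zero) through U+0E59 (nine).
THAI_DIGITS = {ord(str(d)): chr(0x0E50 + d) for d in range(10)}


def to_thai_digits(page_number: int) -> str:
    """Render a non-negative page number with Thai digits (hypothetical helper)."""
    return str(page_number).translate(THAI_DIGITS)
```

A hard-coded table like this only covers one script, which is why delegating to CLDR/ICU scales better.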
Hi! It's definitely doable.
Oct 27 2020
Oct 21 2020
Yes, I believe it would be better.
Oct 15 2020
… then I probably still don't get how the scary output shown in T263371 was created. If this cannot happen in production, why is there a ticket?
@thiemowmde T263371 is indeed an XSS attack vector if you output the file content directly in HTML. But ProofreadPage only uses the file content to prefill a Wikitext content area when a Page: page is created. So, I guess there is no extra threat here compared with just allowing anyone to edit the wiki.
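To make the distinction concrete: file content is only dangerous when it is written into HTML unescaped. The following Python sketch uses a made-up payload and is not ProofreadPage code (the extension is PHP); it only shows the difference between the two outputs.

```python
import html

# Hypothetical text layer extracted from an uploaded file.
file_text = '<script>alert("xss")</script>'

# Unsafe: concatenating the raw text into HTML lets the browser run the script.
unsafe_html = "<div>" + file_text + "</div>"

# Safe: escaping turns the markup characters into entities first.
safe_html = "<div>" + html.escape(file_text) + "</div>"
```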
Probably caused by https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ProofreadPage/+/628753
This does indeed seem to be the cause of the problem. I have submitted a revert for review: https://gerrit.wikimedia.org/r/634105
Oct 9 2020
@Tderrick Sadly, the Wikisource transcription system does not keep word coordinates. It might be possible to match the Wikisource transcription against an OCR output that has coordinates, and use the transcription to fix the OCR, but it's not an easy task at all.
ALTO XML seems to be an XML format designed for OCR output. It encodes the text positioning data that we do not keep in Wikitext. It's closer to the DjVu OCR format.
Oct 7 2020
Oct 5 2020
Oct 4 2020
Oct 3 2020
Oct 2 2020
Aug 1 2020
It's strange. There should not be any cache on this page.
The special page we are talking about is Special:IndexPages. To display something you need to have already created some pages in the "Index:" namespace.
The code of this special page is in includes/Special/SpecialProofreadPages.php.
Jul 24 2020
As far as I’m aware, the real URL in RDF is more like http://commons.wikimedia.org/wiki/Special:FilePath/Leon%20Cogniet%20-%20Jean-Francois%20Champollion.jpg – the query service UI rewrites it to the file description URL (/wiki/File:) on display.
I also don’t understand your first example – is sdoc:P18 meant to be something like sdoc:M123 instead?
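The Special:FilePath form of the URL mentioned above can be derived from a file name by percent-encoding it. A minimal Python sketch, assuming the http:// base shown in the example; the helper name file_path_url is hypothetical.

```python
from urllib.parse import quote


def file_path_url(file_name: str) -> str:
    """Build a Special:FilePath URL for a Commons file name (hypothetical helper)."""
    base = "http://commons.wikimedia.org/wiki/Special:FilePath/"
    # quote() leaves letters, digits and "-._~" alone and encodes spaces as %20.
    return base + quote(file_name)
```

Calling it with "Leon Cogniet - Jean-Francois Champollion.jpg" reproduces the URL quoted above.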
Sorry for this problem.
I guess that Wikidata concept URIs are using http:// because it is what is usually done by RDF datasets (DBpedia...), mostly for backward compatibility reasons.
I would be slightly in favor of using http:// URIs for Commons entities in order to have all Wikibase entities and relations using http:// instead of having some with http:// and some with https://.
Jul 21 2020
Jul 16 2020
@Samwilson Yes, it's exactly what I mean. This way, we don't need to install all locales in the Wsexport servers.
Jul 14 2020
It's a great idea. Thank you!
Some relevant links:
The PHP Intl extension is now much more common than it was in 2012. It might be relevant to use the IntlDateFormatter class here, which would make this problem easy to fix.
I believe that the credit list is not a "hard" requirement. For example, the common pattern for citing Wikipedia articles is only to link to the Wikipedia page and state that the author list can be found there. And Wikisource contributors have a much weaker authorship relation to the content than Wikipedia contributors. So, I guess that a rewording of the credits page might do the job (but I'm not a lawyer...).
Jul 13 2020
Jul 9 2020
It should work now. The lighttpd configuration had not been updated for the Toolforge URL change.
Jun 28 2020
Jun 11 2020
Since yesterday, if a Wikisource page is connected to a Wikidata item that states that it is an edition of the work using P629, the sitelinks of the work item are displayed on the page in the "In other languages" sidebar.
The next step is to look for the other editions on Wikidata using P747.
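The two lookups described above (edition to work through P629, then work to its other editions through P747) can be sketched in Python against the standard Wikibase entity JSON layout. This is not the extension's code; the helper linked_items and the sample Q-ids are made up for illustration.

```python
def linked_items(entity: dict, prop: str) -> list:
    """Return the item ids that `entity` points to through property `prop`."""
    ids = []
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement.get("mainsnak", {})
        # "somevalue"/"novalue" snaks carry no datavalue, so skip them.
        if snak.get("snaktype") == "value":
            ids.append(snak["datavalue"]["value"]["id"])
    return ids


def _claim(item_id: str) -> dict:
    """Build a minimal item-valued statement (test data helper)."""
    return {"mainsnak": {"snaktype": "value",
                         "datavalue": {"value": {"id": item_id}}}}


# Made-up entities: Q111 is the work, Q222 and Q333 are its editions.
edition = {"id": "Q222", "claims": {"P629": [_claim("Q111")]}}
work = {"id": "Q111", "claims": {"P747": [_claim("Q222"), _claim("Q333")]}}

works = linked_items(edition, "P629")        # edition -> work
other_editions = [e for e in linked_items(work, "P747")
                  if e != edition["id"]]     # work -> sibling editions
```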
The changes were merged last week.
Jun 6 2020
May 31 2020
@Mahastama I have been bold and just created the beginning of a script here: https://id.wikisource.org/wiki/Pengguna:Mahastama/OCR.js
I have put it in one of your user subpages so that you can edit it. I hope that is fine with you.
It adds an "OCR" button to the Page: pages and calls the Trawaca API just as you presented.
However, the Trawaca API currently fails with an "authorization" error.
Could you or one of the OCR developers have a look at it?
To reproduce it, you have to load the script (as explained by @Xover) and try to run the OCR on any Page: page.
Apr 30 2020
I just tried to add recursiveTagParseFully and made a quick test.
Apr 29 2020
I have made an attempt to have one hook for sitelinks and other project sidebar here: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/ProofreadPage/+/574224/
Mar 25 2020
Mar 19 2020
Old session. Let's close this task.
Mar 3 2020
Fixed according to @beleg_tal
Mar 1 2020
Thank you for investigating. I'm going to fix this problem.
Feb 26 2020
Feb 14 2020
@Xover Indeed, I should have read the code more carefully. The existing code seems to trim the line breaks at the end of the header, and the line breaks at the end of the footer do not seem to be considered by the parser. So the change should hopefully not break anything. Sorry for the noise.
@WMDE-Fisch @awight A significant set of Wikisource users insert line breaks at the beginning of the Page: page body to control whether a new paragraph indentation is introduced. This behavior was broken for a short time in the past and a significant number of editors complained. Maybe removing line breaks is the right way to go, but this needs some thought in order to at least get an idea of the number of affected pages.
Feb 11 2020
I have upgraded Calibre to 3.48 (version in Buster backports) on both wsexport-prod01.wikisource.eqiad.wmflabs and wsexport-dev01.wikisource.eqiad.wmflabs.
@Samwilson Only wsexport-dev01 was running Calibre 3.48 from backports. I just upgraded wsexport-prod01 to the latest backport Calibre version.
Feb 5 2020
@Xover Thank you for the ping and the great summary of the problem.
A minor detail: ns250 is used by ProofreadPage only on the new wikis. Sadly, older wikis use different ids for historical reasons. But for all wikis with ProofreadPage installed, the canonical name of the Page: namespace is "Page".
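In practice this means a tool should look the namespace up by its canonical name rather than hard-code 250. Here is a minimal Python sketch over data shaped like the "namespaces" part of a MediaWiki siteinfo response; the helper namespace_id and both sample wikis are illustrative, not taken from a real response.

```python
from typing import Optional


def namespace_id(namespaces: dict, canonical_name: str) -> Optional[int]:
    """Find a namespace id by its canonical name (hypothetical helper)."""
    for info in namespaces.values():
        # The main namespace has no "canonical" key, hence .get().
        if info.get("canonical") == canonical_name:
            return info["id"]
    return None


# Illustrative data: a newer wiki using 250 and an older wiki using another id.
newer_wiki = {"0": {"id": 0, "*": ""},
              "250": {"id": 250, "canonical": "Page", "*": "Page"}}
older_wiki = {"0": {"id": 0, "*": ""},
              "104": {"id": 104, "canonical": "Page", "*": "Page"}}
```

With this data, namespace_id(newer_wiki, "Page") and namespace_id(older_wiki, "Page") return different ids for the same canonical name.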
Feb 4 2020
@MusikAnimal Thank you!
Jan 20 2020
@Urbanecm Thank you! Running cat /home/tpt/two_factor_reset on the Toolforge bastion should display T243240.
Jan 17 2020
T242517 seems to have fixed the problem. I believe that we could close this task. I don't see the point for ProofreadPage to have a workaround for this bug that is now solved and affects other parts of the MediaWiki platform.
Jan 16 2020
Note that wsexport used to be on its own VPS, at wsexport.wmflabs.org. I can't find anything about why it was moved. I guess it was just easier to maintain (and maybe some required packages became available on Toolforge?).
Indeed, the extension does not allow creating pages directly with the "validated" level.
Jan 7 2020
Thank you for the bug report! It should be fixed indeed.
Dec 19 2019
After spending an hour reading phe tools code and reading the tool logs, I believe I got an idea of the cause of the issue.
Dec 13 2019
Dec 12 2019
Deployed on all Wikisources \o/.
Thank you! Seems to work fine on enwikisource https://en.wikisource.org/wiki/The_Wind_in_the_Willows_(1913)