User Details
- User Since
- Apr 10 2024, 10:52 AM (48 w, 3 d)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- Arcorann
Oct 4 2024
For Wikisource, use the DjVu option "from original scans (JP2)" instead. This is currently preferred to uploading as PDF because of the various issues I described in T363619.
Aug 26 2024
Regarding the comment that "the original PDFs can be uploaded directly": our handling of PDFs currently has enough issues that DjVu is still recommended over PDF on enWS. The notable ones are bad text layer extraction (see T242169) and bad thumbnail generation (see e.g. T224355 and linked issues; also note the related issue T339845).
May 29 2024
While we're here, can we also implement something that doesn't have so many image issues when thumbnailing PDFs? While proofreading for Wikisource I ran into yet another issue, in which the text somehow just fails to render on the image (https://commons.wikimedia.org/w/index.php?title=File%3AThe_sayings_of_Confucius%3B_a_new_translation_of_the_greater_part_of_the_Confucian_analects_(IA_sayingsofconfuci00confiala).pdf&page=28). I then went to the original PDF hosted on Commons and could easily read the text off from there. Having had to do the same for a number of other works because of blurred text, I really think we can do a lot better than what we have now.
Apr 16 2024
I'd like to add a point to the "overhead" comments: while EditInSequence is intended to reduce much of that overhead, it's currently hampered by several bugs when it is used to create pages (notably T340986, where the text layer doesn't appear, forcing the editor to run OCR manually, and to a lesser extent T360282, where the index header/footer isn't loaded). I suspect caching OCR would be a nice option to add to EditInSequence, but these bugs ought to be fixed first IMO.