
Xover
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 16 2017, 6:32 PM (218 w, 1 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Xover

Recent Activity

Sun, Jun 20

Xover updated subscribers of T104566: Join hyphenated words across pages.

Any chance with this change working on cross-page references?

<ref name="note">hyphen-</ref><ref follow="note">ated</ref> currently outputs as hyphen- ated
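For illustration only, the joining behaviour being asked about could be sketched as below. `joinFragments` is a hypothetical helper, not ProofreadPage or Cite code, and it assumes a trailing hyphen always marks a word broken across pages.

```javascript
// Hypothetical helper: joins text fragments that were split across pages.
function joinFragments(fragments) {
  return fragments.reduce((joined, fragment) => {
    const next = fragment.trim();
    if (joined === "") return next;
    // Assumption: a trailing hyphen marks a word broken at the page boundary,
    // so drop the hyphen and join without a space.
    if (joined.endsWith("-")) return joined.slice(0, -1) + next;
    // Otherwise keep the fragments as separate words.
    return joined + " " + next;
  }, "");
}

console.log(joinFragments(["hyphen-", "ated"]));
```

A real implementation would also need a way to keep genuinely hyphenated compounds intact, which this sketch does not attempt.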

Sun, Jun 20, 4:40 PM · Parsoid, MediaWiki-Language-converter, MW-1.32-notes (WMF-deploy-2018-09-25 (1.32.0-wmf.23)), Parsing-Team--ARCHIVED, ProofreadPage, All-and-every-Wikisource
Xover added a comment to T104566: Join hyphenated words across pages.

Hmm. Is this really an issue with LST as such? The example at frWS uses the pseudo-LST ## section name ## syntax provided by a local Gadget, which is what forces a newline. So far as I know, raw <section begin="section name" /> syntax does not force a newline and should work out of the box for this.

Sun, Jun 20, 4:11 PM · Parsoid, MediaWiki-Language-converter, MW-1.32-notes (WMF-deploy-2018-09-25 (1.32.0-wmf.23)), Parsing-Team--ARCHIVED, ProofreadPage, All-and-every-Wikisource
Xover added a comment to T285128: Unnecessary empty space below edit form.

Why Flexbox for this?

Sun, Jun 20, 9:50 AM · MW-1.37-notes (1.37.0-wmf.11; 2021-06-21), ProofreadPage
Xover added a comment to T285128: Unnecessary empty space below edit form.

Related: T209939

Sun, Jun 20, 9:21 AM · MW-1.37-notes (1.37.0-wmf.11; 2021-06-21), ProofreadPage

Sat, Jun 12

Xover updated subscribers of T284617: Please upload large files to Wikimedia Commons.

@Catrope @Reedy As someone who has handled similar requests in the past (5+ years ago), can you advise on who one might ping now to get eyes on this request? While I'm sure Mathieu can be persuaded to refresh as needed, the current download links expire in ~3 days and it would be good, if practical, to get the job started before then.

Sat, Jun 12, 7:38 AM · Commons, Wikimedia-Site-requests

Fri, Jun 11

Xover added a comment to T284827: Wikimedia OCR: 500 error with lang "equ".

I’m pretty sure the math pseudo-language is not supported in Tesseract 4.x.

Fri, Jun 11, 3:47 PM · Community-Tech (CommTech-Sprint-2), Wikimedia OCR

Sun, Jun 6

Xover added a watcher for css-sanitizer: Xover.
Sun, Jun 6, 2:18 PM
Xover added a watcher for TemplateStyles: Xover.
Sun, Jun 6, 2:17 PM
Xover updated subscribers of T200632: Allow template parameters to provide CSS to a templatestyles stylesheet.

That particular example wouldn't work in MediaWiki because MW forbids "url(" in inline style attributes, but css-sanitizer and TemplateStyles shouldn't rely on that.

Sun, Jun 6, 7:44 AM · css-sanitizer, TemplateStyles

May 15 2021

Xover updated subscribers of T282530: Wikisource Indonesia OCR: Not working so well.

@Mnafisalmukhdi1 The idWS OCR Gadget is outdated. You need to get a local interface administrator to apply this diff, or you could just cross-load OCR.js from Multilingual Wikisource. The only current interface admin I see on idWS is @Rachmat04, but if they are unavailable you can probably request assistance from the global interface admins by making a request on meta.

May 15 2021, 1:46 PM · Community-Tech, Wikimedia OCR, All-and-every-Wikisource
Xover added a comment to T282892: Bizarre collision between index/style.css and incomplete table notation.

Actually, this appears to affect headings, and unordered lists (the : markup that is generally used to offset text) also. :)

May 15 2021, 1:18 PM · MW-1.37-notes (1.37.0-wmf.9; 2021-06-07), ProofreadPage, All-and-every-Wikisource
Xover added a comment to T28741: Migrate file tables to a modern layout (image/oldimage; file/file_revision; add primary keys).

I don't think that pulling the original every time is desirable, it would cause a lot of unnecessary internal network traffic. Some of those documents are in the hundreds of MB. … I also don't know how that metadata can be distributed inside PDFs, a format famous for having a lot of ways things can be done.

May 15 2021, 12:57 PM · Platform Engineering Roadmap, Patch-For-Review, Commons, Multimedia, Schema-change, MediaWiki-File-management
Xover added a comment to T275268: Avoid capacity issues from the image table holding the text of pdf/djvu files as part of their metadata.

Just to note, treating the OCR text layer as metadata is conceptually a bit awkward: it is a separate representation of the file, and that happens to be automatically generated. It's more akin to a MIME email message that contains both a text/html representation and a text/plain representation. Still speaking conceptually, metadata about the text layer would be stuff like "Does this file have a text layer?" and "What format/text encoding is the text layer using?" and "Is the text layer for this file in a structured format?" and "What is the size in bytes of the text layer for this file?".

May 15 2021, 12:37 PM · MW-1.37-notes (1.37.0-wmf.11; 2021-06-21), Patch-For-Review, DBA, Commons
Xover updated subscribers of T282892: Bizarre collision between index/style.css and incomplete table notation.

T253072 strikes again!

May 15 2021, 9:39 AM · MW-1.37-notes (1.37.0-wmf.9; 2021-06-07), ProofreadPage, All-and-every-Wikisource

May 2 2021

Xover added a comment to T163098: Fix the Watchlist visual layout.

@Jdlrobson The screenshots in the description are using the then-default view (the RC filters were still a beta feature, or needed explicit opt-in, until 2018-ish as I recall; T157642, maybe?), but it doesn't matter which variant you use. The problem is equally evident in the screenshot you provided:

May 2 2021, 7:52 PM · Readers-Web-Backlog (Tracking), Growth-Team-Filtering, MediaWiki-Watchlist, Growth-Team, MediaWiki-Interface

Apr 30 2021

Xover added a comment to T280848: Implement MVP of OCR in Wikisource extension.

Down for this approach-- or do we have any other loading state elements in our design components library that we could re-use?

Sounds like Nicolas will figure out the details here. For the first patch, I'll stick with only disabling the textarea.

Apr 30 2021, 6:05 AM · Community-Tech (CommTech-Sprint-2), MW-1.37-notes (1.37.0-wmf.5; 2021-05-11), Patch-For-Review, Wikimedia OCR, All-and-every-Wikisource
Xover added a comment to T281129: Wikimedia OCR: "Call to a member function getText() on null" when image has no text.

Getting an empty string back for an image that contains no recognizable text is not an error, that's just returning the correct output. There are any number of reasons people might ask for OCR of an image that would return no text: there is text but the OCR engine fails to recognize it; they are doing a page image in a sequence and hitting the OCR button by habit (or their gadget does more than just load the OCR); they have a gadget that automatically requests the OCR on page load; etc. And in a bulk OCR scenario it will be entirely normal for the sequence of images being processed to contain anything from a few to several tens of blank pages.
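A minimal sketch of that behaviour, assuming a hypothetical result object with a `getText()` method (the real tool is PHP, and its classes are not shown here): a missing result maps to an empty string instead of an error.

```javascript
// Hypothetical shape: `result` is whatever the OCR engine returned,
// possibly null or undefined when no text was recognized.
function ocrTextOrEmpty(result) {
  // No result object at all is treated as "no text", not as a failure.
  if (result == null || typeof result.getText !== "function") return "";
  const text = result.getText();
  return typeof text === "string" ? text : "";
}

console.log(JSON.stringify(ocrTextOrEmpty(null)));
```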

Apr 30 2021, 5:48 AM · Community-Tech (CommTech-Sprint-1), Wikimedia OCR

Apr 29 2021

Xover added a comment to T41510: Opening Special:EditWatchlist with a large watchlist hits server timeout (Create watchlist pager).

I just hit (what I think is) this issue with a ~30k page watchlist on enWS. The error message now (9 years after first report) looks like:

Apr 29 2021, 6:44 PM · Growth-Team-Filtering, affects-translatewiki.net, User-kostajh, Platform Engineering, Growth-Team, User-notice, Wikimedia-production-error, MediaWiki-Watchlist
Xover added a watcher for tech-decision-forum: Xover.
Apr 29 2021, 3:59 AM
Xover added a comment to T196878: FieldLayout: Remove need for extra `fieldLayout-header` element.

Going by the comments on the patch in Gerrit, isn't the actual state of this task "Declined"?

Apr 29 2021, 3:54 AM · Patch-For-Review, OOUI

Apr 26 2021

Xover added a comment to T278443: Wikisource OCR: fix issue with lines being formatted incorrectly.

Related: T230415 and T279019

Apr 26 2021, 10:45 AM · Wikimedia OCR, All-and-every-Wikisource, Community-Tech

Apr 24 2021

Xover added a comment to T270743: How to support templated fonts like Blackletter?.

Ping. It would be useful to get an idea of what would be involved in making this work, and whether there are any on-wiki workarounds or fixes that could be made.

Apr 24 2021, 11:58 AM · Community-Tech, WS Export
Xover added a comment to T259687: Wikisource Export: Remove support for Wikilivres.

People involved with the project have been saying on-wiki that it is actually dead (the front page is just a zombie) for going on a year now, and the iw prefix is on a todo for removal on enWS along with deleting and/or deprecating all the related templates and references to it. Linking to it was also always iffy legally speaking due to the concept of linking as contributory copyright infringement (with which concept one may disagree, but which courts and legislators appear entirely at ease with). I think the relevant support not only could but /should/ go.

Apr 24 2021, 11:54 AM · Community-Tech, WS Export

Apr 17 2021

Xover created T280448: Pages with non-wikitext content model are put into Pages using duplicate arguments in template calls.
Apr 17 2021, 3:48 PM · MediaWiki-Categories, MediaWiki-Parser

Apr 8 2021

Xover added a comment to T269628: Wikisource: investigate what data we can collect on OCR tools & potential instrumentation.

@ldelench_wmf That page is counting pages that have been marked as "Proofread" or "Validated"—using the radioboxes the Proofread Page extension adds to the edit form—as a result of a manual transcription, that may or may not have used OCR text from one of several different possible sources as a starting point. It does not directly measure anything related to OCR (but could, of course, conceivably provide an indirect measure).

Apr 8 2021, 4:28 PM · Community-Tech, Wikimedia OCR
Xover added a comment to T269518: IA Upload: Permit duplicate IA identifier if of a different format.

Do we want to allow duplicates of the same format?

Apr 8 2021, 12:27 PM · IA Upload, Community-Tech

Apr 5 2021

Xover added a comment to T275100: Change IA upload disallow on duplicate to a challenge.

The very simplest way would be to just change pageForIAItem() to always return an empty string.

Apr 5 2021, 10:38 AM · Community-Tech, IA Upload, Internet-Archive
Xover added a comment to T279118: Wikisource OCR: add support for tesseract on wikimedia ocr .

@Samwilson A couple of thoughts on skimming (and I do mean skimming) the diff…

Apr 5 2021, 9:43 AM · Community-Tech (CommTech-Sprint-1), Wikimedia OCR, All-and-every-Wikisource

Apr 1 2021

Xover updated subscribers of T268240: Provide a mechanism for detecting duplicate files in commons and a local wiki.

In fact, looking at the code in SpecialFileDuplicateSearch.php it looks like querying for Commons media isn't particularly more complicated than local media when inside core, and T175088 suggests Special:ListDuplicatedFiles should be on the monthly "expensive query pages" cron job in any case. In that context, is there any particular reason SpecialListDuplicatedFiles.php for a given project couldn't do a (very specialised version of a) cross-wiki join itself and stuff the results in a category?

Apr 1 2021, 5:36 PM · Data-Services
Xover added a comment to T268240: Provide a mechanism for detecting duplicate files in commons and a local wiki.

The file usage section on file pages lists duplicates, including from Commons. However, there is no way to find these since Special:ListDuplicatedFiles only lists local duplicates.

Apr 1 2021, 4:44 PM · Data-Services
Xover added a comment to T268240: Provide a mechanism for detecting duplicate files in commons and a local wiki.

I'm sure there are other uses for the functionality described here, but…

Apr 1 2021, 1:32 PM · Data-Services
Xover added a comment to T277768: Wikisource: Investigate adding support for bulk OCR to Wikimedia OCR [16H].

I think the current OCR tool will read ahead in the current file and OCR the other pages in the background and cache the results, on the assumption that if you want one, you or others will want more. But I'm not sure how far ahead it goes.

Apr 1 2021, 10:35 AM · Community-Tech (CommTech-Sprint-1), Wikimedia OCR, All-and-every-Wikisource
Xover added a comment to T277768: Wikisource: Investigate adding support for bulk OCR to Wikimedia OCR [16H].

  • it's not possible to add the text layer to the PDF/DjVu/etc.

Apr 1 2021, 9:14 AM · Community-Tech (CommTech-Sprint-1), Wikimedia OCR, All-and-every-Wikisource

Mar 31 2021

Xover added a comment to T278623: Create a Section for Numerically Sequencing Images on Index ns.

For your conversion of Lippincot's v45 from Hathi, you can do a lot better:

Mar 31 2021, 6:46 PM · ProofreadPage
Xover added a comment to T278623: Create a Section for Numerically Sequencing Images on Index ns.

There are multiple issues with PDF.

Mar 31 2021, 5:19 AM · ProofreadPage
Xover added a comment to T278443: Wikisource OCR: fix issue with lines being formatted incorrectly.

As Peter says, this needs some form of configurability and probably at the per-user level. English Wikisource generally unwraps lines, but even there there are users who rely on hard linebreaks when proofreading. OCR is also imperfect at detecting page features, so for some scans automatic unwrapping will end up going to the opposite extreme (all text in one big lump with no line breaks).
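As a rough sketch of what a per-user switch might look like (the `unwrapLines` option name is an assumption, not an existing setting): single line breaks are joined, blank lines are kept as paragraph breaks, and the text is returned untouched when the option is off.

```javascript
// Hypothetical formatter with a per-user `unwrapLines` preference.
function formatOcrText(text, options = { unwrapLines: true }) {
  // Users who proofread against hard line breaks get the text unchanged.
  if (!options.unwrapLines) return text;
  return text
    .split(/\n{2,}/) // blank lines are treated as paragraph separators
    .map((paragraph) => paragraph.replace(/\s*\n\s*/g, " ").trim())
    .join("\n\n");
}

console.log(formatOcrText("first line\nsecond line\n\nnew paragraph"));
```

If the OCR engine misses the paragraph breaks, this approach collapses the whole page into one block, which is the failure mode described above.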

Mar 31 2021, 4:47 AM · Wikimedia OCR, All-and-every-Wikisource, Community-Tech

Mar 28 2021

Xover added a comment to T278104: Unable to upload to Commons: uploadstash-file-not-found: Key "187kyl5ozj74.xtav8j.51508.djvu" not found in stash.

@Aklapper I'm not entirely steady on the projects/components and their scope, so apologies if I'm hopelessly confused, but looking at the descriptions for them I would say this task falls under MediaWiki-Uploading and UploadWizard? Or is this obviously pinpointed somewhere down in the Swift part of the stack? And maybe UploadWizard is excluded since this happens via API upload too?

Mar 28 2021, 5:55 PM · SRE-swift-storage, User-Inductiveload

Mar 22 2021

Xover added a comment to T278104: Unable to upload to Commons: uploadstash-file-not-found: Key "187kyl5ozj74.xtav8j.51508.djvu" not found in stash.

Possibly related: T254459

Mar 22 2021, 5:30 PM · SRE-swift-storage, User-Inductiveload
Xover added a comment to T278104: Unable to upload to Commons: uploadstash-file-not-found: Key "187kyl5ozj74.xtav8j.51508.djvu" not found in stash.

Ok, testing the >100MB file locally on enWS (I think most of the relevant bits of the stack are the same as for Commons), bigChunkedUpload.js tells me "Upload is stuck" for every single chunk (32 x 20MB chunks) but then seems to recover. After the last chunk hits 100% it tells me "Server error 0 after uploading chunk:" (I think this is an empty response from the server). After waiting and retrying a couple more times it terminates with the message "FAILED: internal_api_error_DBQueryError: [91f56af6-cec2-4969-938f-3aeaf9f35aff] Caught exception of type Wikimedia\Rdbms\DBQueryError" which I'm pretty certain is coming from somewhere inside MW proper rather than from Rillke's code.

Mar 22 2021, 5:06 PM · SRE-swift-storage, User-Inductiveload
Xover added a comment to T278104: Unable to upload to Commons: uploadstash-file-not-found: Key "187kyl5ozj74.xtav8j.51508.djvu" not found in stash.

I've successfully uploaded several <100MB files in the time period. The one >100MB file I've tried fails (I've been blindly trying different things so exact failure symptoms are a bit vague). All uploads with bigChunkedUpload.js with stash/async deselected.

Mar 22 2021, 2:07 PM · SRE-swift-storage, User-Inductiveload

Mar 16 2021

Xover added a comment to T276672: WS Export: Create separate credits page that can be viewed by everyone.

Random, possibly not useful or relevant, thought: there's an effort somewhere to tighten the privacy policy in such a way that IP addresses are no longer visible (not even to Checkusers). IPs are also not very useful as an entry in a "Contributors to this book" list. Perhaps both issues could be addressed by grouping all logged-out contributions at the end as "…, and n anonymous contributors."?
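A sketch of the grouping idea, assuming each contributor record carries a hypothetical `anonymous` flag (how WS Export actually represents contributors is not stated here):

```javascript
// Hypothetical input: [{ name: "Alice" }, { name: "192.0.2.1", anonymous: true }, ...]
function formatCredits(contributors) {
  const named = contributors.filter((c) => !c.anonymous).map((c) => c.name);
  const anonCount = contributors.length - named.length;
  const parts = [...named];
  // All logged-out contributions collapse into one trailing entry.
  if (anonCount > 0) {
    parts.push(`${anonCount} anonymous contributor${anonCount === 1 ? "" : "s"}`);
  }
  if (parts.length <= 1) return parts.join("");
  if (parts.length === 2) return parts.join(" and ");
  return parts.slice(0, -1).join(", ") + ", and " + parts[parts.length - 1];
}

console.log(formatCredits([
  { name: "Alice" },
  { name: "192.0.2.1", anonymous: true },
  { name: "Bob" },
  { name: "198.51.100.7", anonymous: true },
]));
```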

Mar 16 2021, 9:23 AM · Community-Tech, WS Export
Xover added a comment to T274959: Wikisource: Create option to disable credits in WSExport form.

Credits by default may be playing it safe, but does the risk really justify that much caution?

Mar 16 2021, 9:19 AM · Community-Tech (Kanban-2020-21-Q3), WS Export, All-and-every-Wikisource
Xover added a comment to T277435: Include copyright metadata based on Wikidata P6216.

Hmm. Does it actually need to be machine-readable? I would have thought what was wanted was a way to just identify the license template output so that it could be rendered in the appropriate place, but otherwise just use the on-wiki rendered template. Structured data is nice for all sorts of other reasons, but for this purpose I would think a simple CSS class would be sufficient; or possibly an ID in order to ensure there is only one container for license information.

Mar 16 2021, 9:07 AM · Community-Tech, WS Export

Mar 6 2021

Xover added a comment to T274959: Wikisource: Create option to disable credits in WSExport form.

The Wikisourcen (unlike Wikipedia) do not create original content that attracts a copyright.

Mar 6 2021, 10:51 AM · Community-Tech (Kanban-2020-21-Q3), WS Export, All-and-every-Wikisource
Xover added a comment to T274959: Wikisource: Create option to disable credits in WSExport form.

… we won’t be showing it to most downloaders.

Mar 6 2021, 10:43 AM · Community-Tech (Kanban-2020-21-Q3), WS Export, All-and-every-Wikisource

Mar 4 2021

Xover added a comment to T274959: Wikisource: Create option to disable credits in WSExport form.

@Prtksxna The Wikisourcen (unlike Wikipedia) do not create original content that attracts a copyright. They merely (mechanically) reproduce public domain or already-freely-licensed works. The standard licensing terms under the edit form are for contributions outside the content namespaces (Scriptorium, User pages, Talk, etc.). Thus the only relevant licensing information is the one for the work itself, much as the licensing for a media file on Commons.

Mar 4 2021, 6:45 PM · Community-Tech (Kanban-2020-21-Q3), WS Export, All-and-every-Wikisource
Xover added a comment to T273708: Don't show download button on subpages, and opt-out for top-level pages.

Not a very good idea. There are works like encyclopedias or periodicals with thousands of subpages.
This solution would require adding a magic word to every subpage.

Mar 4 2021, 7:41 AM · All-and-every-Wikisource, WS Export, Community-Tech

Mar 2 2021

Xover added a comment to T271710: Allow sanitized CSS subpages in the Index namespace of Wikisource.

Should the config change be a separate task for Site-Requests to be visible on the board?

Mar 2 2021, 4:58 PM · MW-1.36-notes (1.36.0-wmf.32; 2021-02-23), All-and-every-Wikisource, TemplateStyles, ProofreadPage
Xover added a watcher for User-Inductiveload: Xover.
Mar 2 2021, 8:37 AM

Feb 27 2021

Xover added a comment to T43614: ProofreadPage does not use image's full resolution when zooming in.

Hmm. As I recall, PRP uses a hard 1024px size for the "thumbnail" it requests. I am assuming this was a value picked as a sort of compromise between full fidelity to the user and various optimization concerns.

Feb 27 2021, 9:23 AM · All-and-every-Wikisource, ProofreadPage
Xover added a comment to T265219: Wikisource: Internet Archive Upload Fail.

Hmm. Based on this and a few other recent failures, I'm starting to wonder if php-exec-command (which is the Command::exec(); wrapper ia-upload is using to execute binaries) is broken and returning "Command not found" for any non-zero exit status.
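For comparison, POSIX shells reserve exit status 127 for "command not found" and 126 for "found but not executable"; any other non-zero status comes from the command itself. A sketch of a wrapper that keeps those cases apart (`describeExit` is a hypothetical name, and whether php-exec-command actually conflates them is only the suspicion voiced above):

```javascript
// Hypothetical helper: maps a shell exit status to a human-readable description.
function describeExit(status) {
  if (status === 0) return "success";
  if (status === 127) return "command not found";
  if (status === 126) return "command found but not executable";
  // Everything else is a failure reported by the command itself.
  return `command failed with exit status ${status}`;
}

console.log(describeExit(1));
```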

Feb 27 2021, 9:10 AM · All-and-every-Wikisource, IA Upload
Xover added a project to T275912: Create an Importer for Distributed Proofreaders (pgdp.net) for Wikisource: All-and-every-Wikisource.
Feb 27 2021, 7:55 AM · All-and-every-Wikisource, importbots

Feb 26 2021

Xover renamed T275735: Change api cache ttl to be an .env var from Change api cache ttl to be an .evn var to Change api cache ttl to be an .env var.
Feb 26 2021, 10:34 AM · Community-Tech (Kanban-2020-21-Q3), WS Export

Feb 25 2021

Xover added a comment to T101075: Do not save unused (or deliberately removed) suggested parameters when inserting or editing transclusions.

… there's a difference in wikitext between an empty parameter and a not-provided-at-all parameter, …

Feb 25 2021, 8:44 AM · Skipped QA, User-Ryasmeen, MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), Editing-team (FY2020-21 Kanban Board), VisualEditor-MediaWiki-Templates, VisualEditor

Feb 19 2021

Xover added a comment to T257066: Extension:Score / Lilypond is disabled on all wikis.

There is some progress being made on various protected tasks, …

Feb 19 2021, 4:06 PM · MW-1.36-notes (1.36.0-wmf.26; 2021-01-12), User-notice, Security-Team, Security, Wikimedia-General-or-Unknown, MediaWiki-extensions-Score, SRE
Xover added a comment to T257066: Extension:Score / Lilypond is disabled on all wikis.

So… we're currently waiting for a suitable volunteer to materialize out of thin air to address an issue whose details are not public for security reasons? And in the meantime we have many thousands of broken pages across multiple projects and all we can do is bleed contributors in those areas?

Feb 19 2021, 1:21 PM · MW-1.36-notes (1.36.0-wmf.26; 2021-01-12), User-notice, Security-Team, Security, Wikimedia-General-or-Unknown, MediaWiki-extensions-Score, SRE

Feb 11 2021

Xover added a comment to T274495: Genericize language on the Wikisource download button to remove specific models of tablet.

Absent specific proposals for better wording

I gave a proposal.

Feb 11 2021, 1:06 PM · Community-Tech, WS Export
Xover added a comment to T274495: Genericize language on the Wikisource download button to remove specific models of tablet.

Absent specific proposals for better wording I think the status quo works well enough. Far from every ebook user has any conception of file formats, much less any idea what kind is best for their device, so giving them enough information suited to their frame of reference to make a sensible choice is a priority.

Feb 11 2021, 12:37 PM · Community-Tech, WS Export

Feb 6 2021

Xover added a comment to T274027: WS Export: Don't show sidebar links in Page and Index namespaces.

Let me throw an extra angel on the head of this needle: a user might conceivably want to export a work when currently on a wikipage in these namespaces, and a user might conceivably want to export a single page, as defined by a Page: wikipage, of a work.

Feb 6 2021, 8:57 AM · Community-Tech, WS Export
Xover added a comment to T269726: Make 'pdf' format an alias for 'pdf-a5'.

I think that for any inherently paged format (like PDF), print should be a primary concern. For everything else we should nudge people to ePub where content can be dynamically reflowed. I have trouble imagining that a significant number of people actually print these onto dead trees, but that is the main rationale for the design of the PDF format the way it is.

Feb 6 2021, 8:35 AM · Community-Tech (Kanban-2020-21-Q3), WS Export
Xover added a comment to T269726: Make 'pdf' format an alias for 'pdf-a5'.

Uhm. A5? Every printer in the world is designed for A4 (or its bastard offshoot, US Letter), and every sheet of printer paper sold ditto. The other sizes, including A5, are barely measurable in comparison. In fact, I think some of the B sizes may actually outsell A5 due to use in automated mass-mailings of various kinds.

Feb 6 2021, 8:03 AM · Community-Tech (Kanban-2020-21-Q3), WS Export

Jan 20 2021

Xover added a comment to T272253: WS Export: open 'choose formats' link in new tab.

No, please don't. Forcing links to open in a new tab or window to keep the user on your site is literally a dark pattern in web design. Users are quite capable of opening a link in a new tab if they want to, and, conversely, those users who have trouble with this are also apt to be confused by navigating multiple tabs or windows.

Jan 20 2021, 7:24 AM · All-and-every-Wikisource, Community-Tech, WS Export

Jan 13 2021

Xover created T271958: Support "width: fit-content" in TemplateStyles/Sanitized CSS.
Jan 13 2021, 5:06 PM · css-sanitizer, TemplateStyles

Jan 10 2021

Xover added a watcher for WS Export: Xover.
Jan 10 2021, 1:26 PM

Dec 21 2020

Xover added a comment to T134469: doBlockLevels() inserts <p> and </p> randomly with no regard for HTML validity.

I bet something like __NO_P_WRAP__ would be fairly easy to support. Would it get enough adoption to get us closer to our goal of turning it off by default?

Dec 21 2020, 5:21 PM · MediaWiki-Parser

Dec 20 2020

Xover added a comment to T134469: doBlockLevels() inserts <p> and </p> randomly with no regard for HTML validity.

… In ten years, I'd love for us to be at the point where we don't do <p>-wrapping at all …

Dec 20 2020, 10:14 AM · MediaWiki-Parser

Dec 18 2020

Xover added a comment to T270387: Enable OPDS catalog for English Wikisource.

Yeah, daily would be better for newly added works. For changes to existing works the frequency could be much lower with not much problem I think. Alternatively new works could be manually triggered (we have lots of manual processes already) given an interface for it.

Dec 18 2020, 8:36 PM · Community-Tech (Kanban-2020-21-Q3), WS Export

Dec 11 2020

Xover updated subscribers of T230415: Stop ignoring paragraph and region separators in DjVu file OCR text layer.

Oh, no, wait… I think I'm just being a dummy!

Dec 11 2020, 1:52 PM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), All-and-every-Wikisource, MediaWiki-DjVu

Dec 10 2020

Xover added a comment to T230415: Stop ignoring paragraph and region separators in DjVu file OCR text layer.

It definitely isn't working. On this page the paragraphs run together, but the output from djvutxt thefile.djvu -page=17 -detail=page is:

Dec 10 2020, 9:41 PM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), All-and-every-Wikisource, MediaWiki-DjVu
Xover added a comment to T230415: Stop ignoring paragraph and region separators in DjVu file OCR text layer.

Hmm. $wgDjvuTxt is set in CommonSettings.php, so that should be ok.

Dec 10 2020, 7:18 PM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), All-and-every-Wikisource, MediaWiki-DjVu
Xover added a comment to T230415: Stop ignoring paragraph and region separators in DjVu file OCR text layer.

Hmm. I didn't think there'd be any caching of this, but I may have misunderstood. It might also be that retrieveMetaData() is called once on upload rather than on demand as I'd assumed. And we need to check what $wgDjvuTxt is set to, since this whole block is only executed if that config var isset().

Dec 10 2020, 7:28 AM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), All-and-every-Wikisource, MediaWiki-DjVu

Nov 18 2020

Xover added a comment to T215858: Plan a replacement for wiki replicas that is better suited to typical OLAP use cases than the MediaWiki OLTP schema.

Just to add a perspective…

Nov 18 2020, 7:31 AM · cloud-services-team (Kanban), Data-Services, Analytics

Nov 14 2020

Xover added a comment to T228594: [phetools] Wikisource OCR deletes old contents of a page, but does not generate new text..

Could you apply this diff

Done.

Nov 14 2020, 5:30 PM · Upstream, All-and-every-Wikisource, Tools
Xover updated subscribers of T228594: [phetools] Wikisource OCR deletes old contents of a page, but does not generate new text..

… every word is on a new line. …

Same feedback as @Jan.Kamenicek tonight, although it seemed to work great a week ago.

Nov 14 2020, 9:25 AM · Upstream, All-and-every-Wikisource, Tools
Xover added a comment to T228594: [phetools] Wikisource OCR deletes old contents of a page, but does not generate new text..

@Xover, I think it is a misunderstanding
data.text.substring(0,5) != "<?xml" -> XML is accepted; if it is not XML, it is considered an error.
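The condition under discussion can be restated as a standalone predicate. This is a sketch only: `isUsableHocr` is a hypothetical name, and the `error` and `text` fields are taken from the quoted condition, not from the full gadget source.

```javascript
// Returns true only when the response carries no error and its text starts with an XML prolog.
function isUsableHocr(data) {
  if (!data || data.error) return false;
  // Anything that is not XML (for example a plain-text error message) is rejected.
  return typeof data.text === "string" && data.text.substring(0, 5) === "<?xml";
}

console.log(isUsableHocr({ text: "<?xml version=\"1.0\"?>" }));
```

A caller would fall back to the old OCR whenever this returns false.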

Nov 14 2020, 9:13 AM · Upstream, All-and-every-Wikisource, Tools

Nov 13 2020

Xover added a comment to T228594: [phetools] Wikisource OCR deletes old contents of a page, but does not generate new text..

…fall back to the old OCR when the returned text is an error message instead of XML content:

function hocr_callback(data) {
	if ( data.error || data.text.substring(0,5)!="<?xml" ) {

Nov 13 2020, 9:07 AM · Upstream, All-and-every-Wikisource, Tools

Nov 12 2020

Xover updated subscribers of T228594: [phetools] Wikisource OCR deletes old contents of a page, but does not generate new text..

Ok, I've now had some independent testing (Big big thank you to Jan!) that confirms the tweaked Gadget code now produces results that are at least within a reasonable distance of what it used to produce.

Nov 12 2020, 1:39 PM · Upstream, All-and-every-Wikisource, Tools

Nov 11 2020

Xover added a comment to T228594: [phetools] Wikisource OCR deletes old contents of a page, but does not generate new text..

Ok, an update on the corrupted cache…

Nov 11 2020, 6:38 PM · Upstream, All-and-every-Wikisource, Tools
Xover added a comment to T228594: [phetools] Wikisource OCR deletes old contents of a page, but does not generate new text..

… the [OCR] result is very poor, …: every word is on a new line.

This is a separate problem, and is most likely related to Tesseract being upgraded to 4.x.

Nov 11 2020, 7:28 AM · Upstream, All-and-every-Wikisource, Tools

Nov 10 2020

Xover added a comment to T228594: [phetools] Wikisource OCR deletes old contents of a page, but does not generate new text..

Unfortunately, the OCR does not work with any of these at all

Nov 10 2020, 9:52 AM · Upstream, All-and-every-Wikisource, Tools

Nov 9 2020

Xover added a comment to T228594: [phetools] Wikisource OCR deletes old contents of a page, but does not generate new text..

… I tested it now e. g. on Page:John_Huss,_his_life,_teachings_and_death,_after_five_hundred_years.pdf/122 and some other pages of the same book and it still does not work here :-(

Nov 9 2020, 8:39 PM · Upstream, All-and-every-Wikisource, Tools
Xover added a comment to T228594: [phetools] Wikisource OCR deletes old contents of a page, but does not generate new text..

@Xover - What would be the effect of just deleting all the caches? Tesseract has been upgraded since most of those caches were generated anyway.

Nov 9 2020, 5:26 PM · Upstream, All-and-every-Wikisource, Tools
Xover added a comment to T228594: [phetools] Wikisource OCR deletes old contents of a page, but does not generate new text..
Nov 9 2020, 10:45 AM · Upstream, All-and-every-Wikisource, Tools

Nov 6 2020

Xover added a comment to T228594: [phetools] Wikisource OCR deletes old contents of a page, but does not generate new text..

The cache for a given work will be in a subdirectory of ~/cache/hocr/ created from the MD5 hash of the file's name (spaces replaced with underscores) concatenated with the invoking project's language code. So for Mexico_under_Carranza.djvu requested from English Wikisource, you can generate the hash with…
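The original command is elided above. As a hedged sketch of the derivation as described, using Node.js: the sketch assumes the file name and language code are concatenated directly with no separator, and that "en" is the code for English Wikisource. Neither detail is confirmed here, and `hocrCacheKey` is a hypothetical name.

```javascript
const crypto = require("crypto");

// Hypothetical helper: MD5 of the underscore-normalized file name plus language code.
function hocrCacheKey(fileName, langCode) {
  // Spaces in the file name are replaced with underscores before hashing.
  const normalized = fileName.replace(/ /g, "_");
  // Assumption: direct concatenation, no separator between name and language code.
  return crypto.createHash("md5").update(normalized + langCode, "utf8").digest("hex");
}

console.log(hocrCacheKey("Mexico_under_Carranza.djvu", "en"));
```

How the resulting hash maps onto the subdirectory layout under ~/cache/hocr/ is not specified above, so the sketch stops at the hash.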

Nov 6 2020, 9:57 PM · Upstream, All-and-every-Wikisource, Tools
Xover added a comment to T228594: [phetools] Wikisource OCR deletes old contents of a page, but does not generate new text..

Ok, having gotten access to the project in connection with T265640 I've been trying to debug this a bit.

Nov 6 2020, 5:04 PM · Upstream, All-and-every-Wikisource, Tools

Nov 5 2020

Xover added a comment to T265640: phe-tools: Match&Split bot is not running because of python2 deprecation in pywikibot.

@JJMC89 Thanks!

Nov 5 2020, 7:42 AM · Tools

Nov 4 2020

Xover added a comment to T265640: phe-tools: Match&Split bot is not running because of python2 deprecation in pywikibot.

@Candalua Thanks!

Nov 4 2020, 8:32 PM · Tools
Xover added a comment to T265640: phe-tools: Match&Split bot is not running because of python2 deprecation in pywikibot.

@Candalua That leaves you as the only admin on phetools with any likelihood of having the spare cycles to look at this (Phe and Tpt are highly unlikely to be available for this any time soon). Any chance you could poke around here a bit?

Nov 4 2020, 9:17 AM · Tools

Nov 2 2020

Xover added a comment to T265640: phe-tools: Match&Split bot is not running because of python2 deprecation in pywikibot.

@Aklapper Indeed. Community-Tech was added as their Toolforge group account is one of the four accounts set as admin for the phetools Toolforge project.

Nov 2 2020, 3:32 PM · Tools

Nov 1 2020

Xover added a comment to T244657: Visual Editor moves ProofreadPage header / footer into page text field, duplicating them.

… is this a challenge a lot of people are encountering?

Nov 1 2020, 2:02 PM · Editing-team, Community-Tech, VisualEditor, ProofreadPage
Xover merged T202200: Visual Editor set double header in ProofreadPage header into T244657: Visual Editor moves ProofreadPage header / footer into page text field, duplicating them.
Nov 1 2020, 1:55 PM · Editing-team, Community-Tech, VisualEditor, ProofreadPage
Xover merged task T202200: Visual Editor set double header in ProofreadPage header into T244657: Visual Editor moves ProofreadPage header / footer into page text field, duplicating them.
Nov 1 2020, 1:54 PM · ProofreadPage, VisualEditor
Xover merged T198688: Switching between editors on Wikisource, the header and footer are moved into the body into T244657: Visual Editor moves ProofreadPage header / footer into page text field, duplicating them.
Nov 1 2020, 1:53 PM · Editing-team, Community-Tech, VisualEditor, ProofreadPage
Xover merged task T198688: Switching between editors on Wikisource, the header and footer are moved into the body into T244657: Visual Editor moves ProofreadPage header / footer into page text field, duplicating them.
Nov 1 2020, 1:52 PM · VisualEditor, ProofreadPage
Xover merged T212347: Proofreading on Wikisource, switching editor from source to visual to source incorrectly moves header text into page body into T244657: Visual Editor moves ProofreadPage header / footer into page text field, duplicating them.
Nov 1 2020, 1:52 PM · Editing-team, Community-Tech, VisualEditor, ProofreadPage
Xover merged task T212347: Proofreading on Wikisource, switching editor from source to visual to source incorrectly moves header text into page body into T244657: Visual Editor moves ProofreadPage header / footer into page text field, duplicating them.
Nov 1 2020, 1:51 PM · ProofreadPage
Xover merged T266942: Visual Editor issue on Bengali Wikisource into T244657: Visual Editor moves ProofreadPage header / footer into page text field, duplicating them.
Nov 1 2020, 1:49 PM · Editing-team, Community-Tech, VisualEditor, ProofreadPage
Xover merged task T266942: Visual Editor issue on Bengali Wikisource into T244657: Visual Editor moves ProofreadPage header / footer into page text field, duplicating them.
Nov 1 2020, 1:48 PM · Bengali-Sites, ProofreadPage, All-and-every-Wikisource, VisualEditor

Oct 19 2020

Xover added a comment to T228594: [phetools] Wikisource OCR deletes old contents of a page, but does not generate new text..

@kaldari Nope, still seeing the same failure mode. It greys out the text in the editor and then throws an error in the JS console à la: An error occurred during ocr processing: /tmp/52004_6179/page_0199.tif.

Oct 19 2020, 8:19 PM · Upstream, All-and-every-Wikisource, Tools

Oct 16 2020

Xover added a comment to T265571: MediaWiki 1.36/wmf.13 needlessly HTML encodes ASCII characters in DjVu text layer.

Apparently the HTML entities are fixed automatically on English Wikisource (when I try it in this book).

Oct 16 2020, 11:52 AM · MW-1.36-notes (1.36.0-wmf.14; 2020-10-20), ProofreadPage, Editing-team, MediaWiki-DjVu, All-and-every-Wikisource