
Images not shown in Apple Books for ePub exported from Wikisource
Open, Needs Triage · Public · BUG REPORT

Description

Exporting an ePub containing images (test case) from English Wikisource creates an ePub which when viewed in Apple Books (macOS 13.5) does not display images.

The proximate cause is the included stylesheet which defines a selector for .mw-halign-center with the style display: table;. This class is used on the <figure>…</figure> element that wraps the images in Parsoid-generated HTML. Manually changing the style rules to display: block; in the downloaded ePub resolves the problem and makes the images display again. The issue was probably introduced in this diff (T330949) and has existed since whenever that was deployed (on March 16 this year).
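As a rough sketch of the manual fix described above: the override below assumes the exported stylesheet's rule contains only the display declaration mentioned in this report. The real rule may carry additional declarations that are not shown here.

```css
/* Rule as described in this report: images are not rendered in Apple Books. */
/*
.mw-halign-center {
  display: table;
}
*/

/* Manually edited rule that made the images display again. */
.mw-halign-center {
  display: block;
}
```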

Since display: table; is permitted by the standards for <figure>, this is strictly speaking an upstream bug, but 1) limitations and quirks in ebook readers are a known state of affairs, and 2) display: table; is a weird rule for any modern stylesheet to begin with (tables are among the hackiest and most awkward parts of HTML and CSS; explicitly asking for table behaviour on other elements is kinda weird).

To reproduce, open the "test case" linked above; hit the big blue "Download" button; choose "EPUB"; open the downloaded .epub file in Apple Books; scroll a little bit into the first chapter (or search for "They began to cuss") to find image captions (centred italic text) without an image. The "Wikisource Scribe" image on the about page is not affected by this due to different markup structure and styling.

Event Timeline

Restricted Application added a subscriber: Aklapper.
JWheeler-WMF claimed this task.
JWheeler-WMF subscribed.

Workaround identified.

Soda reopened this task as Open. Edited Mar 26 2024, 9:37 PM
Soda removed JWheeler-WMF as the assignee of this task.
Soda subscribed.

Boldly un-closing. The workaround identified here is wiki specific and does not address the root cause of the problem. I would assume a code change in the WS Export tool would be required.

Indeed. The workaround from enWS was linked here for the benefit of other projects until the problem can be fixed in WS Export. We can't carry manually added and manually updated custom CSS for this on 100+ Wikisourcen indefinitely.

We're adding display:table because that's what's done in content.media-common.less but it does seem like the wrong thing to do.

It's already display:block for enwikisource (and maybe others) so obviously the simple thing is to switch to that. But in my testing, it looks like that breaks the alignment, at least with Calibre. Should we also add text-align:center? We don't want to fix a bug in Apple Books if it means everyone else gets the wrong alignment.

The wikitext [[File:The Gentle Grafter (1908), cover.jpg|480px|center]] creates ws-export HTML that looks like this:

<figure xmlns="http://www.w3.org/1999/xhtml" class="mw-halign-center" typeof="mw:File" data-lnum="6"><a href="https://en.wikisource.org/wiki/File:The_Gentle_Grafter_(1908),_cover.jpg" class="mw-file-description" data-lnum="6"><img src="images/c16_The_Gentle_Grafter__1908___cover.jpg_480px_The_Gentle_Grafter__1908___cover.jpg" decoding="async" class="mw-file-element" style="width:480; height:717; " data-title="The_Gentle_Grafter_(1908),_cover.jpg-480px-The_Gentle_Grafter_(1908),_cover.jpg" data-lnum="6" /></a><figcaption data-lnum="6"></figcaption></figure>

The point of display:table on the figure is so that the caption stays left aligned and doesn't extend beyond the width of the figure, without having to set an explicit width. With display:block any caption becomes too wide.

Or do we assume that Wikisource images generally don't have captions? Then text-align:center on the figure works fine.

I'm not entirely sure I'm following the problem you're seeing, but...

So far as I know, the only way to get at the figcaption element from MediaWiki image syntax is using the Caption argument (the last unnamed argument to image markup). This is only visible if the thumb argument is used, which forces skin-mediated chrome around the thumbnail. On enWS this is not the acceptable way to add images in content namespaces (although we have historical texts with this construct). I would argue this should be the case for the other Wikisourcen too, but I'm not familiar with their practices in this area and I know they can vary significantly on such matters. I would also argue that since the thumb chrome is a function of the skin (which is exactly why we don't use it in content), which is not present and won't work in ePub output, this would also be a reasonable limitation to impose in WS Export.

I would love for there to be updated MediaWiki image syntax that let me control this efficiently from wikitext (so we could put our captions in the figcaption element), but tasks related to image syntax have been languishing since at least 2013 with no action so I don't think that'll happen any time soon. We might at some point generate the HTML image markup from Lua without going through wikimarkup though.

But since figcaption is always a child of figure, won't max-width: 100% be sufficient to prevent it from exceeding the width of the image?

I would also argue that since the thumb chrome is a function of the skin (which is exactly why we don't use it in content), which is not present and won't work in ePub output, this would also be a reasonable limitation to impose in WS Export.

I'll just note that, as all elements on a page, any css can be overridden and changed. So even if you don't want to use the skin frame, that is possible via Common.css, template styles etc etc.

The base HTML however is semantic, so can (and possibly should) be used in any situation where you have a thumbnail and a caption.

but it does seem like the wrong thing to do.

The point of display:table on the figure is so that the caption stays left aligned and doesn't extend beyond the width of the figure, without having to set an explicit width. With display:block any caption becomes too wide.

However IF display:block is set on this element, you likely should also set text-align: center; so that the image is still centered in the viewport.
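A minimal sketch of that combination, using the figure class from the exported markup quoted earlier in this thread. The figcaption rule is an assumption added for illustration, to keep caption text left-aligned. It is not taken from any existing stylesheet, and it does not limit the caption's width.

```css
figure.mw-halign-center {
  display: block;
  text-align: center; /* centres the inline image link inside the figure */
}

/* Assumption: restore left alignment for caption text,
   which would otherwise inherit the centring. */
figure.mw-halign-center > figcaption {
  text-align: left;
}
```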

display:table is a workaround for browsers not supporting width:min-content; and is the only way to have width limited figures without using inline styles, when you use responsive sizing for images.
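To illustrate the two approaches being compared, a hedged sketch: the first rule is the table-based technique under discussion (the auto margins are an assumption about how the figure is centred), and the second expresses the same intent with min-content. The class name example-min-content is hypothetical, and ebook reader support for min-content has not been verified here.

```css
/* Table layout: the figure is sized to its content
   rather than to the full width of its container. */
figure.mw-halign-center {
  display: table;
  margin-left: auto;
  margin-right: auto;
}

/* Hypothetical alternative: the figure takes the width of its
   widest unbreakable content, typically the image. */
figure.example-min-content {
  width: min-content;
  margin-left: auto;
  margin-right: auto;
}
```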

I suspect, btw, that the combination with responsive sizing is the actual reason it breaks in Apple Books. I see that ws uses

#columnContainer img, .ws-column-container img {
    max-width: 100%;
    height: auto;
}

If you then have a higher level that uses the intrinsic size of an element, then this is what you get. We see the same problem with images inside tables that can become 0x0 because they have no intrinsic minimum size.
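The suspected interaction can be sketched as follows. This only illustrates the hypothesis stated above, not a confirmed diagnosis of Apple Books: the figure asks to be sized from its content, while the image's maximum width is a percentage of its container, so a layout engine may end up with no usable minimum width for the image.

```css
/* The figure's width is derived from its content (the image). */
figure.mw-halign-center {
  display: table;
}

/* The image's width limit is derived from its container
   (ultimately the figure). */
.ws-column-container img {
  max-width: 100%;
  height: auto;
}
```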

I was inspecting the epub, and I noticed that it sets:

style="width:480; height:717; "

That's invalid css, it should be style="width:480px; height:717px;" or width="480" height="717". Does epub not support using the width and height attribute or something? It's kinda strange that width and height are converted into a style.

Another note, the epub css file includes:

/*
 * Force images to fit in their containers.
 * Specifically, this is to avoid them extending off the page
 */
.content a > img,
.content noscript > img {
  max-width:100% !important;
  height:auto !important
}

But as far as I can tell, the epub doesn't include a .content element, so this css will not apply. Considering that I do see the same effect in the Books app, I assume that the Books app applies its own stylesheet to limit the width of the images. I wonder if someone somewhere reverse engineered that stylesheet. Should be useful to know.
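For comparison, a sketch of a rule that would match the exported markup quoted earlier in this thread, which has no .content ancestor. The selector is an assumption based on that single figure example and is untested in Apple Books. If the responsive-sizing hypothesis above is right, combining it with display: table on the figure could trigger the same problem.

```css
/* Matches the img inside a Parsoid file figure;
   no .content ancestor is required. */
figure[typeof~="mw:File"] img.mw-file-element {
  max-width: 100%;
  height: auto;
}
```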

I'll just note that, as all elements on a page, any css can be overridden and changed. So even if you don't want to use the skin frame, that is possible via Common.css, template styles etc etc.

Have you ever tried fighting the skin for control over elements like heading wikimarkup and others where the platform or the skin wants to attach behaviour or is opinionated about styling? And especially to provide formatting templates so regular users can tweak the formatting with some sort of deterministic behaviour. The only reasonable approach for these is to simply not use those constructs inside content. MediaWiki and skin developers are not known for making this stuff cleanly overridable for the community.

The base HTML however is semantic, so can (and possibly should) be used in any situation where you have a thumbnail and a caption.

Sure, if I had sufficient control over it for that to be feasible. But MediaWiki has so much dark magic surrounding images that interferes between our use case and the output, that we can't go that route. Add in p-wrapping and "stuff that can appear in captions but will break the MediaWiki image syntax"… Get me a Lua lib that lets me manipulate the relevant parts of the HTML output programmatically, without going through the image wikimarkup and parser, and which produces a designed-to-be-overrideable default styling and I'm there in a heartbeat. In the meantime that whole figure/figcaption structure will only be used as a really overengineered classic img tag around which we wrap various div type wrappers with little semantics and structure but that give us the styling needed.

I was inspecting the epub, and I noticed that it sets:

style="width:480; height:717; "

I tried changing that to valid units (i.e. add px), and moving them to width/height attributes (without units), and it made no visible difference. The unitless values are spit out from MW (because CSS units aren't valid in HTML width/height attributes), so the bug there seems to be that Calibre converts these to inline CSS in the style attribute without also adding the implicit units. Not sure why it touches that at all, but… It doesn't seem to affect this issue though.

Another note, the epub css file includes:

/*
 * Force images to fit in their containers.
 * Specifically, this is to avoid them extending off the page
 */
.content a > img,
.content noscript > img {
  max-width:100% !important;
  height:auto !important
}

But as far as I can tell, the epub doesn't include a .content element, so this css will not apply.

Minerva adds a .content, and I suspect Vector Classic also used to do this before the consolidation last year(ish) during Vector 2022 development. I can find traces of this in some onwiki TemplateStyles. In any case, I have now removed it from epub.css on enWS since it was non-functional.