The epub format is an alias for epub-3, but no such shortcut exists for pdf. I think A4 (pdf-a4) should be the default, as it's the international standard (ISO 216).
Yeah, I found this confusing when I was using WS-Export, since EPUB had a number attached to it, but not PDF. Thanks for clarifying that PDF should ideally have this too. Lower priority at the moment (since we need to focus on reliability & font support work first), but it would be nice to fix this.
It looks like I was wrong about defaulting to A4; it should be A5 instead, based on the popularity of PDF formats in the last six months:
MariaDB [s52561__wsexport_p]> select format, COUNT(*) from books_generated where format like 'pdf%' and time > date_sub(now(), interval 6 month) group by format;
+------------+----------+
| format     | COUNT(*) |
+------------+----------+
| pdf-a4     |    57637 |
| pdf-a5     |   247238 |
| pdf-a6     |       52 |
| pdf-letter |       85 |
+------------+----------+
4 rows in set (2.19 sec)
I think this is because some Wikisources have their sidebar gadget set to use pdf-a5 for the PDF format.
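To put those counts in proportion, here is a small sketch that turns them into shares of all PDF exports. The numbers are copied from the query result above; the helper name `format_shares` is made up for illustration.

```python
# Counts copied from the query result above (PDF exports, last six months).
counts = {
    "pdf-a4": 57637,
    "pdf-a5": 247238,
    "pdf-a6": 52,
    "pdf-letter": 85,
}


def format_shares(counts):
    """Return each format's fraction of all PDF exports."""
    total = sum(counts.values())
    return {fmt: n / total for fmt, n in counts.items()}


shares = format_shares(counts)

# Print the formats from most to least used, as percentages.
for fmt, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{fmt:<12}{share:7.2%}")
```

By this reckoning pdf-a5 accounts for roughly 81% of PDF exports and pdf-a4 for about 19%, with the other two sizes barely registering.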
This is ready for QA.
Should this new alias be announced to the wikisource communities? (I guess the same question goes for epub -> epub-3)
Yes, good idea, but probably only after it's in production. No need to mention the epub one though, I think, because that was a regression (my fault).
If you export with format=pdf, it will export it in pdf-a5 format.
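For anyone following along, the alias behaviour amounts to something like the sketch below. This is only an illustration in Python, not the actual WS Export code; the names `FORMAT_ALIASES` and `resolve_format` are assumptions, and only the two mappings (epub to epub-3, pdf to pdf-a5) come from this ticket.

```python
# Assumed alias table for illustration; the real tool may organise this differently.
FORMAT_ALIASES = {
    "epub": "epub-3",
    "pdf": "pdf-a5",
}


def resolve_format(requested: str) -> str:
    """Map a short format name to its concrete format; pass anything else through."""
    return FORMAT_ALIASES.get(requested, requested)


print(resolve_format("pdf"))     # alias is expanded
print(resolve_format("pdf-a4"))  # explicit formats are left alone
```

Switching the default later would then be a one-line change to the table, which is part of why the choice of default can be revisited separately.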
It appears to be on production as well now: https://wsexport.wmflabs.org/?format=pdf&lang=en&page=Gulf_Railway_Company_v._Texas
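If it helps with testing, a link like the one above can be built from its three query parameters (format, lang, page). A rough sketch using only the Python standard library; the function name `export_url` and its defaults are made up for this example.

```python
from urllib.parse import urlencode

# Base URL taken from the link above.
BASE_URL = "https://wsexport.wmflabs.org/"


def export_url(page, lang="en", fmt="pdf"):
    """Build an export link using the same query parameters as the link above."""
    query = urlencode({"format": fmt, "lang": lang, "page": page})
    return f"{BASE_URL}?{query}"


print(export_url("Gulf_Railway_Company_v._Texas"))
```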
Good question! None anymore. It was being used by some of the gadgets, but we've since retired them, and the new sidebar links accidentally went with pdf-a4, because that's what English Wikisource used.
Should that be a new ticket?
I also wonder if my assessment above of A5 being more popular is actually wrong: A5 may only be more popular because it was the default used by some of the gadgets. If we were to make A6 the default, it'd soon enough look like that was most popular! Either way, readers in the US will find it less than optimal.
Uhm. A5? Every printer in the world is designed for A4 (or its bastard offshoot, US Letter), and every sheet of printer paper sold ditto. The other sizes, including A5, are barely measurable in comparison. In fact, I think some of the B sizes may actually outsell A5 due to use in automated mass-mailings of various kinds.
What's the rationale for preferring A5?
No, I quite agree! But the stats show A5 as more popular (see T269726#6730649 above). The stats might lie. :-)
Do we know if people are mainly downloading PDFs because they want to print? If so, then A4 makes most sense. If they're using them on computers, maybe it's a way to get shorter line lengths? Or better display on mobile?
So, the changes that're required here are:
- change the sidebar link to pdf;
- maybe change the alias in WS Export from pdf = pdf-a5 to pdf = pdf-a4.
I think that for any inherently paged format (like PDF), print should be a primary concern. For everything else we should nudge people to ePub where content can be dynamically reflowed. I have trouble imagining that a significant number of people actually print these onto dead trees, but that is the main rationale for the design of the PDF format the way it is.
The A5/A4 stats you quote above will be affected both by things like a randomly chosen default on, say, frWS (whose sustained rate of content production astounds me!), and by whether or not dynamic layouts are available and commonly used on a given WS. As the stats for pdf-letter show, users are mostly using whatever the default size is (and probably treating PDF as the default choice, as that's what they recognize).
Is it feasible to make the default configurable per-project as a safety valve? That would take the pressure off picking a global default for this.
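To make the idea concrete, a per-project default could be as simple as a lookup with a global fallback. This is a hypothetical sketch, not something WS Export is known to support: the names, the use of the language code as the key, and the example override for English Wikisource (which used pdf-a4 in its sidebar links, per the comment above) are all assumptions.

```python
# Global fallback; pdf-a5 is what the pdf alias currently resolves to.
GLOBAL_PDF_DEFAULT = "pdf-a5"

# Hypothetical per-project overrides, keyed by Wikisource language code.
PROJECT_PDF_DEFAULTS = {
    "en": "pdf-a4",
}


def resolve_pdf_alias(lang: str) -> str:
    """Pick the concrete PDF format for a project, falling back to the global default."""
    return PROJECT_PDF_DEFAULTS.get(lang, GLOBAL_PDF_DEFAULT)


print(resolve_pdf_alias("en"))  # project override applies
print(resolve_pdf_alias("fr"))  # no override, so the global default applies
```

Projects without an entry keep the global behaviour, so adding an override for one wiki would not affect any other.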
This has been on production for a while, so I'm marking it as Done. If we wish to make further changes to which format of PDF is automatically generated via download buttons & links, that can be explored in a separate ticket.