Page MenuHomePhabricator

Print to PDF with Prince produces page breaks after headlines
Closed, ResolvedPublicBUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What happens?:

When a headline is at the end of a page it may be orphaned:

Orphaned_headline.png (1,027×239 px, 4 KB)

The text belonging to this headline appears on the next page.

What should have happened instead?:
The headline and the text following the headline should appear on the next page.

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):
MediaWiki 1.39.7

Other information (browser name/version, screenshots, etc.):
We analyzed the issue and found out that there are two CSS setting from Mediawiki, which cause this behavior from Prince.

  1. In mediawiki/skins/Timeless/resources/libraries/common-print.less there is the setting:
        h1,
	h2,
	h3,
	h4,
	h5,
	h6 {
		font-weight: bold;
		page-break-after: avoid;
		page-break-before: avoid
	}

To avoid page breaks before a headline makes no sense. This should be changed to:

page-break-after: avoid !important;
page-break-before: auto !important;
  1. Normally (at least in our case) a paragraph follows after a headline. In mediawiki/skins/Vector/resources/common/print.less there is a setting:
p:before {
		content: '';
		display: block;
		width: 120pt;
		overflow: hidden
	}

This setting introduces empty content before each paragraph. A headline sticks to this empty paragraph, which makes the "page-break-after: avoid;" meaningless for the real content that follows the headline.

We propose to add this setting:

h1 + p::before,
h2 + p::before,
h3 + p::before,
h4 + p::before,
h5 + p::before,
h6 + p::before {
    display: none !important; 
}

This will avoid empty content before a paragraphs which follow a headline.

Event Timeline

RJ2904 subscribed.

Can I work on this issue? please provide the corresponding GitHub link.

I have made the required changes in change number 1113427 in gerrit. @User935823839, please review it.

@RJ2904: Thanks, please see https://www.mediawiki.org/wiki/Gerrit/Commit_message_guidelines and amend your commit message.
No need to ask task reporters/users to "review" some proposed code changes without instructions how to do test.

Change #1113427 had a related patch set uploaded (by Aklapper; author: RJ2904):

[mediawiki/skins/Timeless@master] Fix orphaned headlines in PDF rendering

https://gerrit.wikimedia.org/r/1113427

@User935823839 I am a bit confused by this bug report. You're saying that the problem needs fixed in two components that belong to different skins – Vector and Timeless – but only one of them will ever be loaded at the same time (see https://www.mediawiki.org/wiki/Manual:Skins). Can you clarify which skin you're actually using for the print jobs?

The changes you're proposing for Timeless are probably a good idea, but I don't understand why they need the addition of !important. We usually avoid it, since it makes it more difficult to see which styles are applied (see https://www.mediawiki.org/wiki/Manual:Coding_conventions/CSS#!important).

The changes you're proposing for Vector would break the fix for T175008, and so probably would not be accepted. You'll need to come up with a solution that doesn't break that (maybe applying some of the page-break-... rules to the :before block would work?). Also, the CSS you wrote would work with MW 1.39, but it would need to be adjusted for more recent versions, since the h1 tags are placed in a wrapper div since MW 1.43 (see https://www.mediawiki.org/wiki/Heading_HTML_changes).

@RJ2904 I am also confused by your patch. You proposed it for the Timeless skin, but it doesn't apply the proposed changes for Timeless (it only adds !important, again without any explanation), and instead applies the proposed changes for Vector, which won't do any good when applied to Timeless. I suggest waiting until we figure out what actually needs to be changed, and in which skin (see above).

@matmarex Thanks for the clarification! I will work on this again after getting a clearer understanding of the issue from you.

@matmarex
To your question concerning the different skins:
Fact is, when I open a page of the Wiki with the printing problem in Firefox and look at it with the tools for web developers and set this view to printing view, then for the headline I can see the CSS setting "page-break-after/before: avoid;":

Headline_CSS_sttings.png (1,575×253 px, 30 KB)

and for the paragraph that follows the headline I can see that a p::before with empty content is inserted:
Paragraph_CSS_setting.png (1,903×258 px, 46 KB)

So I do not know from which skins these CSS settings come from, but it is a fact that the client sees them. After grepping the Mediawiki folder for those setting I found them in the files mentioned in the bug report at hand. So I assume the settings come from there.

To your question with the !important:
I made some tests with our setup avoiding the !important. If one of the !important is removed from the CSS settings, then I get page breaks after headlines again.

To your remarks about the Vector changes:
I am not aware of all the constraints that need to be considered for a solution. But I think you can agree that a "page-break-after: avoid;" and a "page-break-before: avoid;" together for the same html tag does not make sense. I do not have the chance to work with MW 1.43 at the moment. I assume that when my company updated to this version, there will already be a newer version, which needs another bugfix again. So I will not propose another solution. But I hope that some developer from the community will resolve this contradictory CSS settings.

@matmarex
To your question concerning the different skins:
(…)
So I do not know from which skins these CSS settings come from, but it is a fact that the client sees them. After grepping the Mediawiki folder for those setting I found them in the files mentioned in the bug report at hand. So I assume the settings come from there.

Thanks for the details! I searched for the page-break-before: avoid declaration as well (we have a neat tool for it), and found another copy in MediaWiki core: https://gerrit.wikimedia.org/g/mediawiki/core/+/REL1_39/resources/src/mediawiki.skinning/elements-print.less#34 – this is probably the one that's being loaded.

To your question with the !important:
I made some tests with our setup avoiding the !important. If one of the !important is removed from the CSS settings, then I get page breaks after headlines again.

This seems weird and unexpected to me. Maybe something is still loading another copy of the wrong CSS in your setup somewhere? Try removing the declaration with the incorrect avoid value, instead of setting it to auto, and see if that helps (or if you can still see the declaration in the web developer tools after that, try to see where it's coming from).

To your remarks about the Vector changes:
I am not aware of all the constraints that need to be considered for a solution. But I think you can agree that a "page-break-after: avoid;" and a "page-break-before: avoid;" together for the same html tag does not make sense. I do not have the chance to work with MW 1.43 at the moment. I assume that when my company updated to this version, there will already be a newer version, which needs another bugfix again. So I will not propose another solution. But I hope that some developer from the community will resolve this contradictory CSS settings.

I definitely agree that page-break-before: avoid does not make sense here. I wanted to find out why it was there, so I blamed the line, and I found that it was introduced in a large refactoring in 2017, with no explanation: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/320720. That is tagged with two tasks: T169823, where someone else actually pointed out that it might be a mistake: T169823#3481083; and T135022, where people actually discussed adding the opposite value for that property, page-break-before: always (but ultimately decided against it). I would guess that there was some misunderstanding during that work, and this declaration should be removed.

As for the p:before { … } rule, I experimented a bit (on Chrome – I'm not going to install Prince for this, sorry), and I found that adding a page-break-after: avoid declaration to that block fixes the unexpected page breaks. This makes sense to me, and I would expect this to work on Prince as well, although it would be nice if someone could test that.


So to summarize, to resolve this problem, I think we need to:

(I have to say, I'm not a fan of how the skin authors just copy-paste things…)

@RJ2904 If you'd like to submit the patches for any of these changes (you don't have to do all 8 at once), I'd be happy to approve them.

@User935823839 If you want to backport these changes to MW 1.39 afterwards, you'll probably only need the patches in mediawiki/core and mediawiki/skins/Vector, as I doubt you're using any of the other skins (but you can check that by visiting Special:Version on your wiki).

@matmarex
I made some experiments. You were right. The "!important" is not necessary, when the "page-break-before: avoid" is either removed or set to "auto".

Thank you for fixing this.

@matmarex, thanks for the explanation; I will make the required changes.

@matmarex, I created a sample page in MediaWiki and made the changes told by you, but I didn't notice any changes in the printable view. Could you explain what might be wrong?

@RJ2904 You need to be testing in a browser that supports these CSS properties (e.g. Chrome), you need to look at the preview in the print dialog (not just the simulation in browser dev tools, as that doesn't split into paper pages), and you need to have just enough content on the page so that the heading is right at the end of a paper page.

@matmarex I got it. this is how the issue should look in the development environment.
:

Screenshot from 2025-01-30 12-17-16.png (1,215×849 px, 218 KB)

@matmarex , please tell me if the problem I just told above is the one you wanted to point out in this issue or not?

@RJ2904 That looks like a screenshot from Firefox, which doesn't support page-break-after: avoid – that's why you're seeing the page break still appearing after the headline.

If you test the same changes on Chrome, you should instead see some additional empty space at the end of the page, then the headline appearing on the next page after the page break.

Jdlrobson-WMF changed the task status from Open to In Progress.Jan 31 2025, 10:50 PM
Jdlrobson-WMF added a project: Web-Team.

@matmarex , So, I need to provide the correct commit message and remove the line instead of just commenting out the change I made in the change number rEMFR111590586ec8 ?

Change #1115905 had a related patch set uploaded (by Bartosz Dziewoński; author: RJ2904):

[mediawiki/core@master] print: Fix orphaned headlines in printable view

https://gerrit.wikimedia.org/r/1115905

@matmarex , So, I need to provide the correct commit message and remove the line instead of just commenting out the change I made in the change number 1115905 ?

Yes, thanks for doing that!

Can you also update your other patches similarly?

Change #1113427 abandoned by Bartosz Dziewoński:

[mediawiki/skins/Timeless@master] Fix orphaned headlines in PDF rendering

Reason:

Superseded by https://gerrit.wikimedia.org/r/c/mediawiki/skins/Timeless/+/1115913

https://gerrit.wikimedia.org/r/1113427

Change #1115905 merged by jenkins-bot:

[mediawiki/core@master] print: Fix orphaned headlines in printable view

https://gerrit.wikimedia.org/r/1115905

Change #1115913 had a related patch set uploaded (by Pppery; author: RJ2904):

[mediawiki/skins/Timeless@master] print: Fix orphaned headlines in printable view

https://gerrit.wikimedia.org/r/1115913

Change #1115914 had a related patch set uploaded (by Jdlrobson; author: RJ2904):

[mediawiki/skins/Vector@master] print: Fix in vector skin for orphaned headlines in printable view

https://gerrit.wikimedia.org/r/1115914

Change #1115912 had a related patch set uploaded (by Bartosz Dziewoński; author: RJ2904):

[mediawiki/skins/Splash@master] print: Fix orphaned headlines in printable view

https://gerrit.wikimedia.org/r/1115912

Change #1115910 had a related patch set uploaded (by Bartosz Dziewoński; author: RJ2904):

[mediawiki/skins/Cosmos@master] print: Fix orphaned headlines in printable view

https://gerrit.wikimedia.org/r/1115910

Change #1115913 merged by jenkins-bot:

[mediawiki/skins/Timeless@master] print: Fix orphaned headlines in printable view

https://gerrit.wikimedia.org/r/1115913

Change #1115912 merged by jenkins-bot:

[mediawiki/skins/Splash@master] print: Fix orphaned headlines in printable view

https://gerrit.wikimedia.org/r/1115912

Change #1115910 merged by Bartosz Dziewoński:

[mediawiki/skins/Cosmos@master] print: Fix orphaned headlines in printable view

https://gerrit.wikimedia.org/r/1115910

@matmarex, can you tell me the names of most of the browsers in which this printable view feature can be observed?

Change #1115914 had a related patch set uploaded (by VolkerE; author: RJ2904):

[mediawiki/skins/Vector@master] print: Fix in vector skin for orphaned headlines in printable view

https://gerrit.wikimedia.org/r/1115914

Change #1115914 merged by jenkins-bot:

[mediawiki/skins/Vector@master] print: Fix orphaned headlines in printable view

https://gerrit.wikimedia.org/r/1115914

matmarex removed a project: Patch-For-Review.

Thank you for all the patches @RJ2904!

For future reference: A few of them didn't get tagged with this task, but we have resolved all of the issues I found in T384174#10493831: https://gerrit.wikimedia.org/r/q/owner:jyotishirashi1610@gmail.com+is:merged

It was a really nice and enriching learning experience working on this
issue. My mid-semester exams are starting in two days, so I will catch up
and contribute more as soon as my exams are over.