Page MenuHomePhabricator

Log when active formatting elements are reopened inside figures
Closed, ResolvedPublic

Description

The generic media structure looks like,

<wrapper> figure,span
  <link> a,span
    <media element> audio, video, img, etc.

and the less mirrors that,

figure[ typeof~='mw:File' ]
  > *:first-child
    > img,

The child combinators are necessary because media can be nested in the figcaption,

<wrapper>
  <link>...</link>
  <caption>...</caption>
</wrapper>

and styles should not apply in there.

Unfortunately, active formatting elements reopening in the wrapper will prevent the styles from applying. For example, <p>'''''[[File:Foobar.jpg|thumb]]'''''</p> renders as,

<p data-parsoid='{"stx":"html","autoInsertedEnd":true}'><i data-parsoid='{"autoInsertedEnd":true}'><b data-parsoid='{"autoInsertedEnd":true}'></b></i></p><figure class="mw-default-size" typeof="mw:Image/Thumb"><i data-parsoid='{"autoInsertedStart":true,"autoInsertedEnd":true}'><b data-parsoid='{"autoInsertedStart":true,"autoInsertedEnd":true}'><a href="./File:Foobar.jpg" class="mw-file-description"><img resource="./File:Foobar.jpg" src="http://example.com/images/thumb/3/3a/Foobar.jpg/180px-Foobar.jpg" decoding="async" data-file-width="1941" data-file-height="220" data-file-type="bitmap" height="20" width="180" srcset="http://example.com/images/thumb/3/3a/Foobar.jpg/270px-Foobar.jpg 1.5x, http://example.com/images/thumb/3/3a/Foobar.jpg/360px-Foobar.jpg 2x"/></a><figcaption></figcaption></b></i></figure><p class="mw-empty-elt" data-parsoid='{"autoInsertedStart":true,"stx":"html"}'></p>

Let's add some logging or metrics see how common this is and under what conditions editors are producing it.

Event Timeline

Arlolra triaged this task as Medium priority.Jul 28 2022, 7:50 PM

Change 818220 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] Log when active formatting elements are reopened inside figures

https://gerrit.wikimedia.org/r/818220

Change 818220 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Log when active formatting elements are reopened inside figures

https://gerrit.wikimedia.org/r/818220

Change 819170 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.16.0-a18

https://gerrit.wikimedia.org/r/819170

Change 819170 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.16.0-a18

https://gerrit.wikimedia.org/r/819170

There have been about 125 instances of the log "Active formatting element reopened in figure" since this was deployed last week

The takeaway from T65642 is that this is a violation of our media spec and we have a post-processing pass to migrate the formatting elements into the caption already.

Serialization should be robust to this, which it is,
https://github.com/wikimedia/parsoid/blob/master/src/Core/MediaStructure.php#L136-L143

So, we should continue to log these warnings and fixup any cases that fall out.

If we continue to do that, the child combinators in the css will apply.

VE still fails if we violate the spec (ie. in the case above). It can be made more robust, but better to just comply with the spec if we can.

and we have a post-processing pass to migrate the formatting elements into the caption already.

Apparently I saw fit to remove that pass in,
https://github.com/wikimedia/parsoid/commit/d473791ea982178af7a0fe15aff5cf8e21aaa5e8

Change 826902 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] Restore cleanupFormattingTagFixup pass

https://gerrit.wikimedia.org/r/826902

Change 826902 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Restore cleanupFormattingTagFixup pass

https://gerrit.wikimedia.org/r/826902

Change 827533 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/vendor@master] Bump parsoid to 0.16.0-a20

https://gerrit.wikimedia.org/r/827533

Change 827533 merged by jenkins-bot:

[mediawiki/vendor@master] Bump parsoid to 0.16.0-a20

https://gerrit.wikimedia.org/r/827533

While Parsoid is now robust, the legacy parser is not, so when wgParserEnableLegacyMediaDOM is disabled, the styles don't apply to the core output,
https://www.mediawiki.org/w/index.php?title=User:Arlolra/sandbox&oldid=5443106
https://www.mediawiki.org/api/rest_v1/page/html/User:Arlolra%2Fsandbox/5443106

We could write a similar patch in core or make this a lint as suggested in the patch in T314059#8190280, or both.

Change 828107 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@master] Don't reconstruct formatting elements in figures

https://gerrit.wikimedia.org/r/828107

Change 828107 merged by jenkins-bot:

[mediawiki/core@master] Don't reconstruct formatting elements in figures

https://gerrit.wikimedia.org/r/828107

We could write a similar patch in core or make this a lint as suggested in the patch in T314059#8190280, or both.

The patch was written and now the styles apply properly to core,
https://www.mediawiki.org/w/index.php?title=User:Arlolra/sandbox&oldid=5443106

The analysis of what should be linted away can be done as part of T200562