
Don't output useless about attributes
Closed, Declined · Public · 0 Story Points

Description

about attributes exist to tie together groups of sibling nodes ("about groups"). If a node has an about attribute that is unique (i.e. there is no other node with the same about attribute value, so the size of the about group is one), then by definition that about attribute is useless (for grouping; @GWicke is investigating if there are consequences for the RDFa graph).

At http://parsoid-lb.eqiad.wikimedia.org/enwiki/Barack_Obama?oldid=644858117 there are 1298 about groups in the body. Of those, 803 are useless (have size 1). They break down into the following categories:

  • 37 transclusions that don't end up outputting multiple siblings: <div class="hatnote" about="#mwt1" typeof="mw:Transclusion">
  • 395 references: <span class="reference" about="#mwt25" typeof="mw:Extension/ref">
  • 1 reference list: <ol class="references" typeof="mw:Extension/references" about="#mwt1307">
  • 370 list items in the reference list: <li about="#cite_note-94" id="cite_note-94"> (note that these are not mwt abouts, instead they're duplicates of the id)
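A count like the one above could be reproduced with a short script. The following is only a sketch, not the script used for the numbers in this task: it assumes the Parsoid HTML is already available as a string, it tallies about values across the whole document without checking that group members are actually siblings, and the names AboutCounter and singleton_about_groups are made up for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class AboutCounter(HTMLParser):
    """Tally how many elements carry each `about` value."""

    def __init__(self):
        super().__init__()
        self.groups = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased.
        about = dict(attrs).get("about")
        if about is not None:
            self.groups[about] += 1


def singleton_about_groups(html):
    """Return the about values that occur on exactly one element."""
    parser = AboutCounter()
    parser.feed(html)
    parser.close()
    return sorted(a for a, size in parser.groups.items() if size == 1)


# Hypothetical sample modelled on the snippets in the description.
sample = (
    '<div class="hatnote" about="#mwt1" typeof="mw:Transclusion">note</div>'
    '<table about="#mwt2" typeof="mw:Transclusion"></table>'
    '<p about="#mwt2">sibling from the same transclusion</p>'
    '<span class="reference" about="#mwt25" typeof="mw:Extension/ref">[1]</span>'
)
print(singleton_about_groups(sample))  # ['#mwt1', '#mwt25']
```

Here #mwt2 is shared by two elements, so it forms a real about group; #mwt1 and #mwt25 each appear once and would be counted as useless in the sense of this task.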

Event Timeline

Catrope created this task. Feb 3 2015, 12:57 AM
Catrope raised the priority of this task from to Normal.
Catrope updated the task description.
Catrope added subscribers: Catrope, GWicke.
Restricted Application added a subscriber: Aklapper. Feb 3 2015, 12:57 AM
Catrope set Security to None. Feb 3 2015, 12:58 AM
Catrope edited a custom field.
marcoil claimed this task. Feb 3 2015, 3:38 PM
gerritbot added a subscriber: gerritbot.

Change 188380 had a related patch set uploaded (by Marcoil):
WIP: T88383: Remove unnecessary 'about' attributes

https://gerrit.wikimedia.org/r/188380

Patch-For-Review

Removing the 'about's in references and reference lists reduces Barack_Obama by ~9K. All tests pass and I don't think it affects anything else, but (as with the rest of 'about's) I'll wait for @GWicke's investigation into consequences for the RDFa graph. So the patch is a WIP for now.

ssastry moved this task from Backlog to VE Q3 on the Parsoid board. Feb 3 2015, 6:11 PM
marcoil changed the task status from Open to Stalled. Feb 10 2015, 2:34 PM

@GWicke, do you think removing the 'about's in <ref>s and <references> will have any consequence for the RDFa graph, independent of the rest of the cases? If not, I have a patch ready for just that case.

GWicke added a comment. Edited · Feb 13 2015, 4:18 PM

Unfortunately, abouts change the subject of nested RDF statements, so they can't generally be omitted without changing the RDF graph. @marcoil, are your numbers with compression or without? If it is without, then 9k seems not very much compared to ids. See T78676.
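To illustrate the point about subjects, here is a deliberately simplified sketch. It is not a conforming RDFa processor: it assumes only that a property statement takes its subject from the nearest about on the element itself or an ancestor, and otherwise falls back to the page. It ignores typeof, resource, href, rel and the rest of the RDFa rules. The property name ex:label, the "<page>" placeholder and the names SubjectTracker and statements are hypothetical.

```python
from html.parser import HTMLParser


class SubjectTracker(HTMLParser):
    """Record (subject, property) pairs under a simplified subject rule."""

    # Void elements never get an end tag, so they must not be pushed.
    VOID = {"meta", "link", "br", "img", "hr", "input"}

    def __init__(self, base="<page>"):
        super().__init__()
        self.stack = [base]      # subject in effect at each nesting level
        self.statements = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # An `about` here overrides the inherited subject (assumed rule).
        subject = attrs.get("about", self.stack[-1])
        if "property" in attrs:
            self.statements.append((subject, attrs["property"]))
        if tag not in self.VOID:
            self.stack.append(subject)

    def handle_endtag(self, tag):
        if tag not in self.VOID and len(self.stack) > 1:
            self.stack.pop()


def statements(html, base="<page>"):
    tracker = SubjectTracker(base)
    tracker.feed(html)
    tracker.close()
    return tracker.statements


with_about = '<span about="#mwt25"><meta property="ex:label" content="x"></span>'
without_about = '<span><meta property="ex:label" content="x"></span>'

print(statements(with_about))     # [('#mwt25', 'ex:label')]
print(statements(without_about))  # [('<page>', 'ex:label')]
```

Under this simplified rule, dropping the about moves the nested statement from #mwt25 to the page itself, which is the kind of graph change described in the comment above.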

> Unfortunately, abouts change the subject of nested RDF statements, so they can't generally be omitted without changing the RDF graph.

But at least the ones generated in <references/>, which are just repeating the ids, look like they are either redundant or wrong (should they point to the <ref> that generated them instead?)

> @marcoil, are your numbers with compression or without?

Without.

> If it is without, then 9k seems not very much compared to ids. See T78676.

Yep, in the other cases it seems like not much can be gained.

Jdforrester-WMF closed this task as Declined. Feb 17 2015, 8:46 PM
Jdforrester-WMF added a subscriber: Jdforrester-WMF.

Breaks things without sufficient cost advantage.

Change 188380 abandoned by Marcoil:
WIP: T88383: Remove unnecessary 'about' attributes

Reason:
Bug declined.

https://gerrit.wikimedia.org/r/188380