Try node bin/roundtrip-test.js --prefix dewiki Pentaquark
@ssastry says,
I expect it is because some element/citation/ref id is different in the second round, which throws off the diffing code.
{{Literatur|arxiv=1507.03414|Titel=Observation of J/ψp resonances consistent with pentaquark states in <math>\Lambda_b^0 \to J/\psi K^- p</math> decays|Jahr=2015-07-13|Sprache=en|Sammelwerk=Phys. Rev. Lett.| Band= 115 |Seiten= 072001}}
The "Literatur" template invokes "Modul:Vorlage:Literatur", which requires "Modul:Zitation", which uses Titel for string interpolation in Zitation.COinS. I'm not sure whose job it is to remove the strip markers, but that leads to,
"&rft.atitle=Observation+of+J%2F%CF%88p+resonances+consistent+with+pentaquark+states+in+%7F%27%22%60UNIQ--math-00000003-QINU%60%22%27%7F+decays"
and the "UNIQ--math-00000003-QINU" varies between parse requests.
I'm not sure whose job it is to remove the strip markers
T133477#2234972 says the module should be calling mw.text.killMarkers
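On the test side, the leaked markers can be masked before comparing the two rounds. A minimal sketch, assuming the marker shape seen in the COinS string above; normalizeStripMarkers is a hypothetical helper name, not necessarily what the Parsoid patch does:

```javascript
// Hypothetical helper for illustration; not Parsoid's actual implementation.
// Assumes strip markers look like \x7f'"`UNIQ--<name>-<hex counter>-QINU`"'\x7f,
// either raw or percent-encoded as in the rft.atitle value above.
const RAW_MARKER = /\x7f'"`UNIQ--(\w+)-[0-9a-f]+-QINU`"'\x7f/gi;
const ENCODED_MARKER = /%7F%27%22%60UNIQ--(\w+)-[0-9a-f]+-QINU%60%22%27%7F/gi;

function normalizeStripMarkers(str) {
	// Keep the marker type (e.g. "math") but drop the per-parse counter.
	return str
		.replace(RAW_MARKER, 'UNIQ--$1-QINU')
		.replace(ENCODED_MARKER, 'UNIQ--$1-QINU');
}
```

With that applied to both rounds, two strings that differ only in the counter compare equal.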
Change 422345 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Eliminate a source of indeterminacy from leaked strip markers
Change 422350 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Normalize away unnecessary attributes in data-mw.html too
Change 422345 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Eliminate a source of indeterminacy from leaked strip markers
Change 422350 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Normalize away unnecessary attributes in data-mw.html too
The template {{Zukunft|2018|3}} on dewiki probably uses the {{CURRENTTIMESTAMP}} variable to produce the sort key(?) for this category, [[Kategorie:Wikipedia:Veraltet nach März 2018| 20180328203041]], which Parsoid outputs as,
<link rel="mw:PageProp/Category" href="./Kategorie:Wikipedia:Veraltet_nach_März_2018#%2020180328203041" data-parsoid='{"stx":"piped","a":{"href":"./Kategorie:Wikipedia:Veraltet_nach_März_2018"},"sa":{"href":"Kategorie:Wikipedia:Veraltet nach März 2018"},"dsr":[0,63,null,null]}'/>
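One way to keep a time-dependent sort key from failing the comparison is to drop the fragment from category link hrefs before diffing. A rough sketch, assuming rel comes before href as in the output above; stripCategorySortKeys is a made-up name for illustration:

```javascript
// Hypothetical normalizer for illustration; not part of roundtrip-test.js.
// Assumes category links are serialized as
//   <link rel="mw:PageProp/Category" href="./Title#sortkey" .../>
// with rel before href, as in the Parsoid output above.
const CATEGORY_SORT_KEY =
	/(<link\b[^>]*\brel="mw:PageProp\/Category"[^>]*\bhref="[^"#]*)#[^"]*(")/g;

function stripCategorySortKeys(html) {
	// Keep everything up to the '#', then re-emit the closing quote.
	return html.replace(CATEGORY_SORT_KEY, '$1$2');
}
```

This throws away the sort key entirely, so it would also hide real sort key regressions; that tradeoff may or may not be acceptable.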
Whenever there's this variability in the content, roundtrip-test.js will automatically fail the "quick" semantic diff test,
https://github.com/wikimedia/parsoid/blob/master/bin/roundtrip-test.js#L407
after which, we need to rely on simplediff to give us comparable ranges to test. Unfortunately, it's not always great. For example, an extra newline is producing two diffs, which are trivially semantically different.
wt1 "|style=vertical-align:top|\n"
wt2 "|style=vertical-align:top|\n'''[[Goldene Schallplatte|Platin-Schallplatte]]'''\n"
@@ -1,0 +1,1 @@
+<p><b><a href="Goldene_Schallplatte" title="Goldene Schallplatte">Platin-Schallplatte</a></b></p>

wt1 "'''[[Goldene Schallplatte|Platin-Schallplatte]]'''\n"
wt2 ""
@@ -1,1 +1,0 @@
-<p><b><a href="Goldene_Schallplatte" title="Goldene Schallplatte">Platin-Schallplatte</a></b></p>
http://localhost:8000/es.wikipedia.org/v3/page/html/Cleveland_Cavaliers/106401442 has duplicate template arguments, which get normalized in the first roundtrip,
- |propietario = |propietario = [[Dan Gilbert]]
+ |propietario =[[Dan Gilbert]]
which means that only the first time around will we output the category,
[ '-', [ '<span> </span><link rel="mw:PageProp/Category" href="./Categoría:Wikipedia:Páginas_con_plantillas_con_argumentos_duplicados"/>\n' ] ],
Change 422583 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Follow up to e034960 w/ the ' variant
https://en.wikipedia.org/wiki/Brazil_at_the_2016_Summer_Olympics uses a template that invokes Module:Sports_table, which has the following,
-- Now define the identifier for this
note_id = '"table_note_'..team_code_ii..rand_val..'"' -- Add random end for unique ID if more tables are present on article (which might otherwise share an ID)
note_id_list[team_code_ii] = note_id
where rand_val is defined as,
-- Random value used for uniqueness
math.randomseed( os.clock() * 10^8 )
local rand_val = math.random()
which results in cite ids like,
[ '-', [ '<td>8<sup class="mw-ref" id="cite_ref-table_hth_GER0.75888191385143_67-0" rel="dc:references" typeof="mw:Extension/ref" data-mw=\'{"name":"ref","body":{"id":"mw-reference-text-cite_note-table_hth_GER0.75888191385143-67"},"attrs":{"group":"lower-alpha","name":"table_hth_GER0.75888191385143"}}\'><a href="./Brazil_at_the_2016_Summer_Olympics#cite_note-table_hth_GER0.75888191385143-67" data-mw-group="lower-alpha"><span class="mw-reflink-text">[lower-alpha 1]</span></a></sup></td>\n' ] ],
[ '+', [ '<td>8<sup class="mw-ref" id="cite_ref-table_hth_GER0.36878909374065_67-0" rel="dc:references" typeof="mw:Extension/ref" data-mw=\'{"name":"ref","body":{"id":"mw-reference-text-cite_note-table_hth_GER0.36878909374065-67"},"attrs":{"group":"lower-alpha","name":"table_hth_GER0.36878909374065"}}\'><a href="./Brazil_at_the_2016_Summer_Olympics#cite_note-table_hth_GER0.36878909374065-67" data-mw-group="lower-alpha"><span class="mw-reflink-text">[lower-alpha 1]</span></a></sup></td>\n' ] ],
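If the random suffix can't be fixed at the module level, the two sides could be compared after masking it. A sketch only, assuming the random part always looks like the math.random() values above (a "0." followed by a long run of digits); maskRandomFractions is a hypothetical name:

```javascript
// Hypothetical masking step for illustration; not existing Parsoid code.
// Assumes the random part is a Lua math.random() value printed as "0."
// followed by at least six digits, as in the cite ids above.
const RANDOM_FRACTION = /0\.\d{6,}/g;

function maskRandomFractions(html) {
	return html.replace(RANDOM_FRACTION, 'RAND');
}
```

The six-digit threshold is arbitrary and would also mask any genuine long decimal in the page content, so it is only suitable for a debugging comparison.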
Change 422583 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Follow up to e034960 w/ the ' variant
The imagemap extension looks like it uses timestamps in various places,
<div class="noresize" typeof="mw:Extension/imagemap" data-mw=\'{"name":"imagemap","attrs":{},"body":{"extsrc":"\\nՊատկեր:Italy location map.svg|300px|Պեսկարա (Իտալիա)\\nrect 0 0 0 0 [[##]]\\ndesc none\\n"}}\'><map name="ImageMap_1_890608218" id="ImageMap_1_890608218"> <area href="##" shape="rect" coords="0,0,0,0" alt="##" title="##"/></map><img alt="Պեսկարա (Իտալիա)" src="//upload.wikimedia.org/wikipedia/commons/thumb/b/be/Italy_location_map.svg/300px-Italy_location_map.svg.png" width="300" height="377" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/b/be/Italy_location_map.svg/450px-Italy_location_map.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/b/be/Italy_location_map.svg/600px-Italy_location_map.svg.png 2x" data-file-width="1034" data-file-height="1299" usemap="#ImageMap_1_890608218"/></div>
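The varying number shows up in the name, id, and usemap attributes, so all three need the same treatment. A minimal sketch, assuming the ids always have the ImageMap_<counter>_<number> shape shown above; normalizeImageMapIds is a hypothetical helper, not necessarily how the Parsoid patch handles it:

```javascript
// Hypothetical helper for illustration; the merged Parsoid change may differ.
// Assumes ids have the shape ImageMap_<counter>_<varying number>, and that
// only the trailing number changes between parses.
const IMAGEMAP_ID = /\bImageMap_(\d+)_\d+\b/g;

function normalizeImageMapIds(html) {
	// Keep the per-page counter, drop the varying trailing number.
	return html.replace(IMAGEMAP_ID, 'ImageMap_$1');
}
```

Because one regex covers every occurrence, the usemap reference stays consistent with the map it points at.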
Change 423038 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Eliminate variability from imagemap extension attributes
For posterity, these diffs were taken with,
diff --git a/bin/roundtrip-test.js b/bin/roundtrip-test.js
index f164566b..239d0706 100755
--- a/bin/roundtrip-test.js
+++ b/bin/roundtrip-test.js
@@ -416,6 +416,8 @@ var checkIfSignificant = function(offsets, data) {
 			});
 		}
 		return results;
+	} else {
+		console.log(Diff.diffLines(normalizedOld, normalizedNew))
 	}
 }
Change 423038 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Eliminate variability from imagemap extension attributes
Going to consider this round of investigation resolved. We can always reopen when this rears its ugly head again.