Set up mass round-trip testing infrastructure on real content
Open, Normal, Public

Description

Automated mass round-trip testing on actual page content would be useful to ensure proper HTML round-tripping in VE. This is very similar to your existing DOM sanity check: load the HTML into the data model (DM), export it again, and check that the result is identical to the input.
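
As a rough illustration of that check, here is a minimal Python sketch. VE itself is JavaScript, and the converter names below (to_model, from_model, and the toy stand-ins) are hypothetical placeholders, not VE functions. It only shows the shape of the comparison.

```python
from typing import Callable, List, TypeVar

Model = TypeVar("Model")


def round_trips_cleanly(html: str,
                        to_model: Callable[[str], Model],
                        from_model: Callable[[Model], str]) -> bool:
    """Return True if html is unchanged after a load/export cycle."""
    return from_model(to_model(html)) == html


# Toy stand-ins for the real converters (assumptions, for illustration only).
def toy_to_model(html: str) -> List[str]:
    return list(html)


def toy_from_model(model: List[str]) -> str:
    return "".join(model)


def lossy_from_model(model: List[str]) -> str:
    # Simulates an exporter that collapses double spaces, i.e. a dirty round trip.
    return "".join(model).replace("  ", " ")
```

With the toy converters the check passes for any input; with the lossy exporter it fails for input containing a double space, which is the kind of regression a mass run would surface.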

You can probably reuse parts of our distributed test infrastructure for this (currently rt testing 160k pages from various wikis through Parsoid), and can directly use the cached HTML from production as the input.


Version: unspecified
Severity: normal

Details

Reference
bz50513
bzimport raised the priority of this task from to High.
bzimport set Reference to bz50513.
GWicke created this task.Jul 1 2013, 6:49 PM

Timo,

Let's use this bug for what we discussed. As I suggested, we should probably run on:

  • enwiki featured articles (~ 4k), fixed revision (so if we regress we notice)
  • enwiki ~ 5k most recently-changed articles (Special:RecentChanges)
  • {en,fr,de,it,es,nl,he,ru,ar,ja,ko,vi}wiki ~ 5k random articles (Special:Random)

Thoughts?
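
For concreteness, the proposed sets could be written down as data along these lines. This is only a sketch: the PageSet type and its field names are made up for illustration, the sizes are the approximate figures from the list above, and it assumes "~ 5k random articles" means per wiki rather than in total.

```python
from dataclasses import dataclass
from typing import List

# Languages from the list above; not set in stone.
RANDOM_SAMPLE_LANGS = ["en", "fr", "de", "it", "es", "nl",
                       "he", "ru", "ar", "ja", "ko", "vi"]


@dataclass(frozen=True)
class PageSet:
    wiki: str               # e.g. "enwiki"
    source: str             # "featured", "recentchanges" or "random"
    size: int               # approximate number of articles
    pinned_revisions: bool  # True if tested at a fixed revision


def build_page_sets(langs: List[str]) -> List[PageSet]:
    sets = [
        # Fixed revisions, so that a regression is noticeable.
        PageSet("enwiki", "featured", 4000, True),
        PageSet("enwiki", "recentchanges", 5000, False),
    ]
    # Assumption: about 5k random articles for each wiki.
    sets += [PageSet(f"{lang}wiki", "random", 5000, False) for lang in langs]
    return sets
```

Adding another language is then a one-item change to RANDOM_SAMPLE_LANGS.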

He7d3r added a comment.Jul 1 2013, 7:33 PM

(In reply to comment #1)
...

  • {en,fr,de,it,es,nl,he,ru,ar,ja,ko,vi}wiki ~ 5k random articles

Could you add also 'pt' to this list?

(In reply to comment #2)

> (In reply to comment #1)
> ...
> > * {en,fr,de,it,es,nl,he,ru,ar,ja,ko,vi}wiki ~ 5k random articles
> Could you add also 'pt' to this list?

Sure. I was just writing a quick list rather than setting it in stone.

When we expand to cover language variants we'll want to expand the list further - for example, zh. :-)

So based on discussions with Gabriel:

  • Parsoid has a better organised infrastructure for this than we do, so let's use that as a base. Right now they periodically run their sets of roundtrip tests on a certain set of articles.
  1. Change that set of articles to include and/or match James' specification.
  2. Improve ve-dirtydiffbot to do not just the parsoid-ve-ve-parsoid roundtrip but also the parsoid-ve-ve roundtrip (e.g. parsoid dom > ve linmod > ve dom; "sanity check").
  3. Extend the test runner to include 2 pieces of information for each article in addition to the data parsoid gathers:
    • result of parsoid-dom > ve linmod > ve dom ("sanity check")
    • diff of parsoid-dom > ve linmod > ve dom > parsoid dom ("full wikitext roundtrip")[1]

[1] this is the one that ve-dirtydiffbot is currently doing.
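
The two extra pieces of information per article could be recorded roughly as follows. This is a Python sketch under loud assumptions: DOMs are represented as serialised HTML strings, the three converter callables are hypothetical stand-ins for the real Parsoid and VE steps, and the "sanity check" result is taken to mean "the VE DOM equals the input DOM".

```python
import difflib
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ArticleResult:
    title: str
    sanity_ok: bool            # parsoid dom > ve linmod > ve dom matched the input
    roundtrip_diff: List[str]  # unified diff lines; empty means a clean round trip


def check_article(title: str, parsoid_dom: str,
                  to_linmod: Callable[[str], object],
                  to_ve_dom: Callable[[object], str],
                  to_parsoid_dom: Callable[[str], str]) -> ArticleResult:
    # "Sanity check": parsoid dom > ve linmod > ve dom.
    ve_dom = to_ve_dom(to_linmod(parsoid_dom))
    # Full round trip: continue from the ve dom back to a parsoid dom.
    final_dom = to_parsoid_dom(ve_dom)
    diff = list(difflib.unified_diff(
        parsoid_dom.splitlines(), final_dom.splitlines(),
        fromfile="input", tofile="round-tripped", lineterm=""))
    return ArticleResult(title, ve_dom == parsoid_dom, diff)
```

Keeping the two results separate makes it possible to tell whether a dirty diff was introduced on the VE side or in the final step back to Parsoid.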

(In reply to comment #4)

So based on discussions with Gabriel:

  • Parsoid has a better organised infrastructure for this than we do, so let's use that as a base. Right now they periodically run their sets of roundtrip tests on a certain set of articles.
  • Change that set of articles to include and/or match James' specification.

Include, not switch, please; the stuff that Parsoid is doing for RT tests should also be expanded, IMO.

*** Bug 56330 has been marked as a duplicate of this bug. ***
Jdforrester-WMF edited a custom field.
Krenair edited a custom field.Feb 10 2015, 8:41 PM
Krinkle added a comment.EditedMar 10 2015, 2:40 AM

We currently have https://github.com/wikimedia/ve-dirtydiffbot, which:

  • For 720 random en.wikipedia.org pages:
    • Load VE
    • Trigger save dialog
    • Assert diff is empty

There's a potential race condition: if someone edits the page between VE loading it and the diff being generated, there may be a genuine diff or an edit conflict. We've never hit this so far, though.
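
One way a revived bot could guard against that race is to compare the page's revision ID at load time with the revision ID when the diff is produced, and to skip the page if they differ. The Python sketch below is hypothetical: ve-dirtydiffbot does not necessarily do this, and classify_run and its parameters are made-up names.

```python
def classify_run(rev_at_load: int, rev_at_diff: int, diff: str) -> str:
    """Classify one page run as "skipped", "clean" or "dirty"."""
    if rev_at_load != rev_at_diff:
        # The page was edited while the test was running, so a non-empty
        # diff (or an edit conflict) would not indicate a VE problem.
        return "skipped"
    return "clean" if diff.strip() == "" else "dirty"
```

Skipped pages can simply be retried later, so an edit made mid-run never gets reported as a dirty diff.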

The current dirtydiff bot has not successfully run since April 2014.

As of February 2015, we no longer have the "Sanity check" method.

Before reviving this test (the code is outdated and no longer works) and expanding its test input, I'd like to revisit what it is doing.

With Parsoid's selective serialisation mechanism (selser) this may not be a very useful test. I feel like I'm missing something. Isn't this nothing but a test of the browser's ability to parse and serialise HTML? What part of VE, related to different pages as input, would this test? Other than on-load exceptions for exotic content and such. Which is not the primary purpose of this test. Should we involve linmod somehow? Or is it already? What steps do we want the test to do instead?

Jdforrester-WMF lowered the priority of this task from High to Normal.
Jdforrester-WMF edited a custom field.
Jdforrester-WMF renamed this task from VisualEditor: Set up mass round-trip testing infrastructure on real content to Set up mass round-trip testing infrastructure on real content.Apr 29 2015, 3:05 PM
Jdforrester-WMF removed Krinkle as the assignee of this task.May 5 2015, 12:09 AM
Jdforrester-WMF edited a custom field.
Jdforrester-WMF moved this task from TR8: ???? to Backlog on the VisualEditor board.