I ran the comparison script (which compares the extract_html fields) again.
Here are the English results:
For the new run I added a 'b' to the file names; en.v2.txt is from the earlier run we did a while ago.
(http://jdlrobson.com/summaries/en.2b.html)
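For anyone who wants to reproduce the comparison locally, here is a minimal sketch of the idea. It assumes each run has already been loaded into a dict mapping page title to its extract_html string. The file format of en.v2.txt / en.2b.txt is not described here, so the loading step is left out, and the function name compare_extracts is hypothetical (it is not the actual comparison script).

```python
from typing import Dict, List, Tuple


def compare_extracts(
    old: Dict[str, str], new: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (title, old_html, new_html) for every title present in both
    runs whose extract_html changed. Hypothetical helper, not the real script."""
    changed = []
    # Only titles present in both runs can be compared field-to-field.
    for title in sorted(old.keys() & new.keys()):
        if old[title] != new[title]:
            changed.append((title, old[title], new[title]))
    return changed


if __name__ == "__main__":
    old_run = {"Qatar": "<p>Qatar is a country.</p>", "Botulism": "<p>A rare illness.</p>"}
    new_run = {"Qatar": "<p>Qatar 25°N is a country.</p>", "Botulism": "<p>A rare illness.</p>"}
    for title, before, after in compare_extracts(old_run, new_run):
        print(title, "|", before, "->", after)
```

Titles that exist in only one run are ignored here; a real comparison would probably want to report those separately.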
So far I've noticed several more classes of issues:
- coordinates show up in Qatar and United States (probably the order of operations is to blame)
- escaped HTML causes issues: <i id=\"mwCQ\">Transformers</i> in Transformers:_The_Last_Knight and an & inside Ariana Grande. See also Logan (Film), DJ Khaled, Keanu Reeves, Beyoncé, Clint Eastwood, Emma Watson and Chris Pine.
- "undefined" appears in Donald Trump, Barack Obama and Botulism
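To triage the next run faster, the three issue classes above could be flagged automatically. The sketch below is heuristic and hypothetical. The patterns are my assumptions about how each problem shows up in extract_html: a degree sign after a number for coordinates, a backslash-escaped quote or double-escaped entity for escaped HTML, and the bare word "undefined". They have not been checked against the actual results, and the & in Ariana Grande may well surface differently.

```python
import re
from typing import List

# Heuristic patterns (assumptions, not confirmed against the real output).
COORD_RE = re.compile(r"\d+°")  # e.g. "25°18′N"
DOUBLE_ESCAPED_RE = re.compile(r"&amp;(?:amp|lt|gt|quot|#\d+);")  # e.g. "&amp;lt;"
UNDEFINED_RE = re.compile(r"\bundefined\b")


def flag_issues(extract_html: str) -> List[str]:
    """Return the names of suspected issue classes found in one extract."""
    issues = []
    if COORD_RE.search(extract_html):
        issues.append("coordinates")
    # A literal backslash before a quote suggests the HTML was escaped twice.
    if '\\"' in extract_html or DOUBLE_ESCAPED_RE.search(extract_html):
        issues.append("escaped_html")
    if UNDEFINED_RE.search(extract_html):
        issues.append("undefined")
    return issues


if __name__ == "__main__":
    print(flag_issues('<p><i id=\\"mwCQ\">Transformers</i></p>'))
```

Expect false positives (e.g. an article that legitimately mentions the word "undefined"), so this is only a first filter before manual review.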