Subtask of: https://phabricator.wikimedia.org/T339232
There is a flaw in table parsing: if an infobox contains a table nested inside another table, the first field gets all of the content (the text of every descendant cell), and then the individual cells are also output correctly as separate fields. Example:
- JSON infobox
- Original HTML (see the <table> inside an infobox <tr>)
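A minimal reproduction of the symptom, as a sketch only: the parser project's actual language and selectors are not shown in this ticket, so this uses Python's `xml.etree.ElementTree` on a hypothetical, simplified infobox written as well-formed markup. `INFOBOX`, `naive_cell_text` and `naive_fields` are illustrative names, not the parser's API. Collecting all descendant text of the outer cell pulls the nested table's text into the first field, while iterating every `<tr>` still emits the nested rows on their own.

```python
import xml.etree.ElementTree as ET

# Hypothetical infobox: the first row's <td> contains a nested <table>.
INFOBOX = """
<table class="infobox">
  <tr>
    <th>Details</th>
    <td>Summary<table>
        <tr><th>Born</th><td>1900</td></tr>
        <tr><th>Died</th><td>1980</td></tr>
      </table></td>
  </tr>
</table>
"""


def naive_cell_text(cell):
    # itertext() yields the text of the element and ALL its descendants,
    # so the nested table's text leaks into the outer cell.
    return " ".join(t.strip() for t in cell.itertext() if t.strip())


def naive_fields(table):
    fields = []
    # iter("tr") visits every <tr> at any depth, nested rows included.
    for row in table.iter("tr"):
        cells = [c for c in row if c.tag in ("th", "td")]
        if len(cells) == 2:
            fields.append((naive_cell_text(cells[0]), naive_cell_text(cells[1])))
    return fields


if __name__ == "__main__":
    root = ET.fromstring(INFOBOX.strip())
    for name, value in naive_fields(root):
        print(f"{name}: {value}")
    # Details: Summary Born 1900 Died 1980
    # Born: 1900
    # Died: 1980
```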
Acceptance criteria
In this scenario, the first field should contain only the text of its own table cell, not the text of all its descendants.
ToDo
- Check whether the cell's descendants contain a <tr>; if so, change the text extraction to collect the cell's text without traversing into the embedded <tr> elements.
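The ToDo above could look roughly like this, again as a sketch in Python with `xml.etree.ElementTree` rather than the parser project's real code; `cell_text` and `infobox_fields` are hypothetical names. If a cell has no `<tr>` descendant, the full descendant text is kept unchanged; otherwise the walk collects text and tails but never descends into an embedded `<tr>`. Rows are still iterated at every depth, so the nested cells keep being output as their own fields.

```python
import xml.etree.ElementTree as ET

# Same hypothetical infobox: the first row's <td> contains a nested <table>.
INFOBOX = """
<table class="infobox">
  <tr>
    <th>Details</th>
    <td>Summary<table>
        <tr><th>Born</th><td>1900</td></tr>
        <tr><th>Died</th><td>1980</td></tr>
      </table></td>
  </tr>
</table>
"""


def cell_text(cell):
    """Text of a cell, skipping any embedded <tr> subtrees (hypothetical helper)."""
    # No <tr> among the descendants: full descendant text is already correct.
    if cell.find(".//tr") is None:
        return " ".join(t.strip() for t in cell.itertext() if t.strip())

    parts = []

    def walk(elem):
        if elem.text:
            parts.append(elem.text)
        for child in elem:
            if child.tag != "tr":
                walk(child)  # descend, but never into an embedded row
            if child.tail:
                parts.append(child.tail)  # text after a child belongs to elem

    walk(cell)
    return " ".join(p.strip() for p in parts if p.strip())


def infobox_fields(table):
    fields = []
    for row in table.iter("tr"):  # nested rows are still emitted on their own
        cells = [c for c in row if c.tag in ("th", "td")]
        if len(cells) == 2:
            fields.append((cell_text(cells[0]), cell_text(cells[1])))
    return fields


if __name__ == "__main__":
    for name, value in infobox_fields(ET.fromstring(INFOBOX.strip())):
        print(f"{name}: {value}")
    # Details: Summary
    # Born: 1900
    # Died: 1980
```

Skipping only `<tr>` subtrees (rather than every nested `<table>`) follows the ToDo literally; text that sits directly in a nested table outside any row, such as whitespace, is still visited but is discarded by the strip.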
Checklist for testing
- Manually run cli gen in the parser project.
Things to consider:
- Check that the change to the parser selectors does not cause side effects in other JSON outputs.
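One way to spot side effects, sketched under the assumption that the old and new extraction can be run side by side on the same markup: serialize both results to JSON and compare them. This does not replace running cli gen on real pages; the names `old_text`, `new_text`, `to_json` and `same_output` are hypothetical, and Python's `xml.etree.ElementTree` again stands in for the parser's own HTML handling. An infobox without nested rows should produce identical JSON, while only nested-table infoboxes should differ.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical flat infobox with no nested table: the change should not affect it.
FLAT = "<table><tr><th>Born</th><td>1900</td></tr><tr><th>Died</th><td>1980</td></tr></table>"


def old_text(cell):
    # Previous behaviour: all descendant text.
    return " ".join(t.strip() for t in cell.itertext() if t.strip())


def new_text(cell):
    # Changed behaviour: skip text inside embedded <tr> elements.
    parts = []

    def walk(elem):
        if elem.text:
            parts.append(elem.text)
        for child in elem:
            if child.tag != "tr":
                walk(child)
            if child.tail:
                parts.append(child.tail)

    walk(cell)
    return " ".join(p.strip() for p in parts if p.strip())


def to_json(table, extract):
    fields = []
    for row in table.iter("tr"):
        cells = [c for c in row if c.tag in ("th", "td")]
        if len(cells) == 2:
            fields.append([extract(cells[0]), extract(cells[1])])
    return json.dumps(fields)


def same_output(html):
    table = ET.fromstring(html)
    return to_json(table, old_text) == to_json(table, new_text)


if __name__ == "__main__":
    print(same_output(FLAT))  # True: no nested rows, so nothing changes
```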