
[BUG] WWT breaks or removes some article elements
Open, Needs Triage | Public | BUG REPORT

Assigned To
Authored By
Sep 5 2019, 5:47 PM
Referenced Files
F30231639: wwtbug_citeerror.png
Sep 5 2019, 9:26 PM
F30230170: nested.png
Sep 5 2019, 5:47 PM
F30230176: center.png
Sep 5 2019, 5:47 PM
F30230162: tables.png
Sep 5 2019, 5:47 PM
F30230182: links.png
Sep 5 2019, 5:47 PM


What is the problem?

The WhoColor API does not handle some wikitext well. When the WhoWroteThat tool is turned on in an article, the affected elements are broken or removed. See "Examples found so far".

These failures are often related to the regexes that WhoColor uses to tokenise wikitext.

We could investigate:

  1. How widely the unsupported wikitext is used
  2. How much effort it would take to fix (considering we would probably have to fix the WhoColor code)
  3. Whether we want to fix it

Examples found so far

This doesn't even look like valid wikitext to me, but apparently it is.

tables.png (355×1 px, 82 KB)

WhoColor uses a regex to detect where a template starts and ends. It is not clear how easily this approach could support nested templates.

nested.png (235×742 px, 21 KB)
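To illustrate why nesting is hard for a single regex, here is a minimal sketch. The pattern below is a hypothetical, simplified stand-in and not WhoColor's actual regex; `find_templates` is likewise an illustrative helper, not part of WhoColor. It compares a non-greedy regex match with a small scanner that counts brace depth. The scanner ignores triple-brace parameters (`{{{...}}}`) and other wikitext edge cases.

```python
import re

# Hypothetical, simplified pattern for illustration only; NOT WhoColor's real regex.
NAIVE_TEMPLATE = re.compile(r"\{\{.*?\}\}", re.DOTALL)


def find_templates(text):
    """Return (start, end) spans of top-level {{...}} templates.

    Tracks nesting depth instead of relying on a single regex match.
    """
    spans = []
    depth = 0
    start = None
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i  # remember where the outermost template opens
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                spans.append((start, i + 2))  # outermost template closed
            i += 2
        else:
            i += 1
    return spans


sample = "a {{outer|{{inner|x}}|y}} b"

# The non-greedy regex stops at the first "}}", cutting the outer template short.
naive = NAIVE_TEMPLATE.search(sample).group(0)

# The depth-counting scanner returns the whole outer template.
nested = [sample[s:e] for s, e in find_templates(sample)]

print(naive)   # {{outer|{{inner|x}}
print(nested)  # ['{{outer|{{inner|x}}|y}}']
```

Whether a depth counter like this would fit into WhoColor's tokeniser is an open question; the sketch only shows the kind of truncation a flat regex can produce on nested input.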

  • Cite Errors

wwtbug_citeerror.png (684×1 px, 392 KB)

  • Some HTML tags in wikitext are not supported (e.g. T232064).

This should be an easy fix.

center.png (90×491 px, 3 KB)


links.png (183×788 px, 18 KB)

  • Donation banners are removed, presumably because the HTML we get from the WhoColor API does not include them.

Event Timeline

@FaFlo Hello! We have begun some development work for the "Who Wrote That" tool. As a result, we have also begun identifying and investigating relevant bugs. In this case, we think that this bug report may be of interest to you. It appears that some of the data that we are receiving from the WhoColor API is breaking or removing some article elements. Perhaps this is something that the team might be interested in looking into? Thanks!

This is an important ticket, but it depends on changes to the WhoColor API. For this reason, I'm leaving this ticket open but removing it from the 'To Be Estimated' column (since we cannot make these changes ourselves and thus cannot estimate it).

Sorry for not responding to this earlier: Yes, most of these are known, but this overview and the examples are certainly helpful. See also:
And the assessment of this having to do with the regexes used is also correct.
With regard to fixing it: We do not have the (wo)manpower on our side at this time. You are certainly invited to become a collaborator (or submit pull requests) on the github repo, and we can deploy these changes provided they were tested beforehand, i.e. checked for whether the parsing of the HTML breaks (more) with whatever fixes are applied. So I do not see why your team could not make and test these changes, or estimate them, but maybe I'm missing something.

I'm moving this into the 'To Be Estimated' column, so we can discuss next steps or any communication relevant to this ticket during Estimation.