Web2Cit should not fail on misconfigured XPath selection
Closed, Resolved (Public)

Description

The Web2Cit translation server fails on XPath selection steps configured with certain XPath expressions.

For example, the following expression, which is valid in XPath 3.1 but not in XPath 1.0, results in a server error:

.//*[contains-token(@class, "byline__name")]//a

See https://web2cit.toolforge.org/sandbox/Diegodlh/https://www.newyorker.com/news/the-political-scene/pennsylvania-republican-primaries-trump-dr-oz-mccormick-barnette-mastriano

Event Timeline

diegodlh created this task.

These invalid XPath expressions are not being caught at config validation because for some reason jsdom's document.createExpression() is not failing on them. Reported to jsdom here.

On the one hand, the expression should not have passed validation. Note that, since v2, validation depends on the windowContext object passed to the Domain object constructor. That object comes from JSDOM in Web2Cit-Server, but from the user's browser in Web2Cit-Editor.

On the other hand, what should we do in case of an error at the actual selection stage? This would be equivalent to what happens if we can't fetch the external resource that a selection step depends on (see T305163). Is the wisest solution to silently return an empty step output in these cases, as proposed in T305163? Or should we automatically mark the translation result as non-applicable and continue with the next template in the queue?

diegodlh claimed this task.
diegodlh moved this task from Backlog to Done on the Web2Cit (Grant end) board.

On the one hand, the expression should not have passed validation. Note that, since v2, validation depends on the windowContext object passed to the Domain object constructor. That object comes from JSDOM in Web2Cit-Server, but from the user's browser in Web2Cit-Editor.

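As a workaround, before accepting a config value as valid, try calling the parsed expression's evaluate method on the Document object with which it was created. If createExpression() accepts an expression that evaluate() then rejects, the value should fail validation.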
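
A minimal sketch of that workaround, not the actual change in a280783d. The helper name isUsableXPath and the DocumentLike/ExpressionLike types are hypothetical, introduced only so the example does not depend on jsdom or on DOM typings. It assumes the document passed in is the one taken from the windowContext.

```typescript
// Hypothetical structural types: just the two DOM methods the check relies on
// (Document.createExpression and XPathExpression.evaluate).
interface ExpressionLike {
  evaluate(contextNode: unknown, type?: number, result?: unknown): unknown;
}

interface DocumentLike {
  createExpression(expression: string, resolver?: unknown): ExpressionLike;
}

// Numeric value of XPathResult.ANY_TYPE in the DOM XPath API.
const ANY_TYPE = 0;

// Hypothetical helper: treat an expression as valid only if it both parses
// AND evaluates against the document that created it.
function isUsableXPath(expression: string, doc: DocumentLike): boolean {
  try {
    const parsed = doc.createExpression(expression, null);
    // Parsing alone may not reject unsupported functions, so run the
    // expression once and let any evaluation error mark it as invalid.
    parsed.evaluate(doc, ANY_TYPE, null);
    return true;
  } catch {
    return false;
  }
}
```

With a real window, the call would presumably look like isUsableXPath(expression, window.document). The dry-run evaluation costs one extra XPath pass per validated value, which seems acceptable for config validation.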

Regarding what we should do if selection or transformation steps fail at the application stage, I'd rather address this in T305163.

Closing as resolved. Fixed in a280783d.