
Errors during step application may return an empty step output
Closed, Resolved; Public

Description

So far, the only parts of Web2Cit that rely on resources external to Web2Cit (i.e., excluding configuration files on Meta) are some selection steps. Namely, the Citoid selection, which relies on the Citoid response for the target webpage (i.e., the page requested to be translated), and the XPath selection, which relies on the HTTP response for the target.

However, external resources may be unavailable. For example, Citoid is known to fail for some URLs. Right now, this results in a Web2Cit-Core error, which the Web2Cit-Server handles by returning a 404 error indicating which external resource could not be fetched. See, for example, https://web2cit.toolforge.org/https://www.realestate.com.au/property/123-main-st-kangaroo-point-qld-4169

One possible solution is to have the Citoid and XPath selection steps, which rely on external resources, return an empty step output ([]) if the external resource cannot be fetched.
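A minimal TypeScript sketch of this idea follows. It is an illustration under stated assumptions, not the actual Web2Cit-Core code: `StepOutput`, `ResourceGetter` and `applySelection` are hypothetical names, and the resource getter is synchronous only to keep the example short.

```typescript
// Hypothetical types for illustration; not the actual Web2Cit-Core API.
type StepOutput = string[];

// Returns the external resource (e.g. the Citoid response or the target's
// HTML) or throws if it cannot be fetched. Synchronous for brevity; a real
// fetch would be asynchronous.
type ResourceGetter = () => string;

function applySelection(
  getResource: ResourceGetter,
  select: (resource: string) => StepOutput
): StepOutput {
  let resource: string;
  try {
    resource = getResource();
  } catch {
    // External resource unavailable: return an empty step output instead of
    // propagating the error and halting the whole translation.
    return [];
  }
  return select(resource);
}

// Example: one getter that fails and one that succeeds.
const unavailableResource: ResourceGetter = () => {
  throw new Error("Citoid response could not be fetched");
};
const availableResource: ResourceGetter = () => "Example title";

const failedOutput = applySelection(unavailableResource, (r) => [r]);
const okOutput = applySelection(availableResource, (r) => [r]);
```

Note that, in this sketch, the caller cannot tell a failed fetch apart from a selection that matched nothing; both yield an empty output.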

Note that this solution would make the fallback template non-applicable in cases where Citoid fails, because the fallback template uses Citoid selection for all fields, and mandatory fields (i.e., always-required fields) would not accept an empty output. The Web2Cit-Server may handle this by returning a 404 error if no applicable templates were found for a given target webpage. See T305166
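The consequence for the fallback template can be sketched as follows. The shapes and names (`FieldOutput`, `TemplateOutput`, `respond`) are hypothetical, and the applicability rules are simplified assumptions: a required field needs at least one value, and a template is applicable only if all of its fields are.

```typescript
// Hypothetical shapes for illustration; not the Web2Cit-Core or
// Web2Cit-Server API.
interface FieldOutput {
  fieldName: string;
  required: boolean; // mandatory fields are always required
  values: string[]; // empty when, e.g., the Citoid selection returned []
}

interface TemplateOutput {
  label: string;
  fields: FieldOutput[];
}

// Assumption: a required field needs at least one value to be applicable.
function isFieldApplicable(field: FieldOutput): boolean {
  return !field.required || field.values.length > 0;
}

// Assumption: a template is applicable only if all of its fields are.
function isTemplateApplicable(template: TemplateOutput): boolean {
  return template.fields.every(isFieldApplicable);
}

interface TranslationResponse {
  httpStatus: number;
  template?: string;
}

// Use the first applicable template; answer 404 if there is none.
function respond(templates: TemplateOutput[]): TranslationResponse {
  for (const template of templates) {
    if (isTemplateApplicable(template)) {
      return { httpStatus: 200, template: template.label };
    }
  }
  return { httpStatus: 404 };
}

// Fallback template when Citoid fails: every mandatory field is empty.
const fallbackWithoutCitoid: TemplateOutput = {
  label: "fallback",
  fields: [
    { fieldName: "itemType", required: true, values: [] },
    { fieldName: "title", required: true, values: [] },
  ],
};

const noTemplateResponse = respond([fallbackWithoutCitoid]);
```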

Also, we should consider what we want to do in cases where the target webpage does not exist (i.e., the HTTP request times out or returns a 404 error).

Event Timeline

As described in T314943, failing to fetch the HTML from the original source (as needed by XPath selection steps) makes the translation fail altogether. That is, Web2Cit does not proceed with the next template in these cases.

This would affect components using Web2Cit-Server: Web2Cit-Gadget and Web2Cit-Editor.

Note that, as described in a comment in T308666, (1) silently returning an empty step output in these cases may not be the wisest choice. An empty field output may be applicable (if the field has been marked as non-required), potentially hiding the error.

Alternatively, we may (2) return an empty step output AND mark the field as non-applicable, regardless of whether it has been marked as required or not. This ensures that translation is not halted at this step and that other translation steps in the template are processed, while at the same time ensuring that the next template in the queue will be attempted.
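The difference between options (1) and (2) shows up for non-required fields, as in the rough sketch below. `FieldResult` and `evaluateField` are hypothetical names, and the rule that an empty output is acceptable only for non-required fields is an assumption made for the example.

```typescript
// Hypothetical model for illustration; not the actual Web2Cit-Core classes.
interface FieldResult {
  output: string[];
  required: boolean;
  applicable: boolean;
}

// Option (2): if any step throws, return an empty output AND mark the field
// as non-applicable, whether or not the field is required.
function evaluateField(
  required: boolean,
  runSteps: () => string[]
): FieldResult {
  try {
    const output = runSteps();
    // Assumption: without errors, an empty output is acceptable only for
    // non-required fields.
    return { output, required, applicable: output.length > 0 || !required };
  } catch {
    // Translation is not halted here; the caller can keep processing the
    // other fields and later move on to the next template.
    return { output: [], required, applicable: false };
  }
}

// A non-required field whose step fails. Under option (1) it would count as
// applicable (empty but optional); under option (2) it does not.
const optionalFieldAfterError = evaluateField(false, () => {
  throw new Error("HTML for the target could not be fetched");
});

// A non-required field whose steps run fine but select nothing.
const optionalFieldNoMatch = evaluateField(false, () => []);
```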

Finally, (3) an enhanced output model (see T302431) may be considered that includes an optional error field indicating whether a config validation or step application error has occurred. This may be helpful for the Web2Cit-Editor, which is already prepared for such error fields.
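One way such an enhanced output model might look is sketched below. The field names (`output`, `error`) and the helper `applyStepSafely` are assumptions for illustration only; the model actually discussed in T302431 may differ.

```typescript
// Hypothetical output model with an optional error field; the real model
// discussed in T302431 may look different.
interface StepError {
  errorName: string;
  message: string;
}

interface StepResult {
  output: string[];
  error?: StepError; // present only if step application failed
}

// Run a step and record any error next to the (empty) output instead of
// throwing, so that consumers such as an editor UI can display it.
function applyStepSafely(step: () => string[]): StepResult {
  try {
    return { output: step() };
  } catch (e) {
    const err = e instanceof Error ? e : new Error(String(e));
    return {
      output: [],
      error: { errorName: err.name, message: err.message },
    };
  }
}

const failedStepResult = applyStepSafely(() => {
  throw new Error("Citoid response could not be fetched");
});
const successfulStepResult = applyStepSafely(() => ["Example title"]);
```

With a model like this, an empty output caused by an error can be told apart from a selection that simply matched nothing.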

To begin with, we may just do (1) or even (2), to address bugs such as T314943, and leave (3) for later.

diegodlh renamed this task from "Make selection steps relying on external sources (Citoid and Xpath) return an empty StepOutput if external resource is unavailable" to "Errors during step application may return an empty step output and mark the template as non-applicable". Sep 14 2022, 6:50 PM

"Alternatively, we may (2) return an empty step output AND mark the field as non-applicable"

Note that marking the template as non-applicable may make things more confusing for the person trying to interpret the debugging information, who would have to figure out not only why the step returned an empty output, but also why the template was marked as non-applicable.

Maybe we should just return an empty output here, and address T317441 to make sure we include translation errors in the translation output.

diegodlh renamed this task from "Errors during step application may return an empty step output and mark the template as non-applicable" to "Errors during step application may return an empty step output". Sep 14 2022, 7:08 PM
diegodlh claimed this task.

Fixed in 97ac70e5.