Consider circumventing Citoid failure by adding alternative selection steps to the fallback template
Closed, Declined (Public)

Description

As described in T305166, there may be cases where the fallback template (which currently uses Citoid selection for all fields) does not apply. Concretely, this would happen if Citoid is failing for the given target webpage. If T305166 is addressed, this would result in the translation server returning a 404 error if no applicable template is found for a target webpage.

Alternatively, we may consider changing the mandatory (i.e., always required) fields in the fallback template to prevent Web2Cit from failing in these cases:

  • itemType field: next to Citoid selection use fixed selection "website". Then use range transformation "0" to keep the first item selected. If Citoid is working, it will return the itemType returned by Citoid (current behavior). If it isn't, it will return "website".
  • title field: similarly, next to Citoid selection use (1) XPath selection "/html/head/title" and (2) URL selection "href" (see T304326), followed by range transformation "0" again. This way, it will return the title returned by Citoid, otherwise the title in the HTML head, or finally the full URL if neither of those is available.
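
The fallback order proposed for both fields can be sketched as follows. This is a minimal illustration, not Web2Cit's actual implementation: the names `run_field`, `fixed`, and `failing_citoid` are hypothetical, and it assumes that a failing Citoid selection contributes no output instead of aborting the whole template.

```python
from typing import Callable, List

# A selection step is modelled as a function returning a list of strings.
Selection = Callable[[], List[str]]


def fixed(value: str) -> Selection:
    """Hypothetical stand-in for a fixed selection (or any step that succeeds)."""
    return lambda: [value]


def failing_citoid() -> List[str]:
    """Hypothetical stand-in for a Citoid selection while Citoid is failing."""
    raise RuntimeError("Citoid request failed")


def run_field(selections: List[Selection], index: int = 0) -> List[str]:
    """Concatenate the outputs of all selection steps, then keep one item.

    Keeping the item at `index` mimics a range transformation "0".
    Assumption: a step that raises simply adds nothing to the output.
    """
    combined: List[str] = []
    for select in selections:
        try:
            combined.extend(select())
        except RuntimeError:
            continue
    return combined[index:index + 1]


# itemType: Citoid first, then the fixed value "website".
item_type_ok = run_field([fixed("newspaperArticle"), fixed("website")])
item_type_down = run_field([failing_citoid, fixed("website")])

# title: Citoid, then an empty <title> lookup, then the full URL.
title_down = run_field(
    [failing_citoid, lambda: [], fixed("https://example.org/a")]
)
```

With Citoid working, the first item is Citoid's value and the later steps are ignored; with Citoid failing, the first item comes from the next step that produced any output.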

Of course, this task would be irrelevant if we switch to the "use Citoid except for" approach described in T302019. Even in that case, though, it may still be worth considering an alternative to Citoid for main fields such as itemType and title.

Event Timeline

In the meantime, we may consider returning an empty template output (instead of no output at all), as suggested in T319074.

Note that since 7960712 the Web2Cit-Server shows a target-specific error message when no applicable template is found for a target webpage.

this would happen if Citoid is failing for the given target webpage

Note that Citoid may fail for different reasons, and we may not want to react the same way in all of these cases. For example, Citoid has at times returned a 504 Gateway Timeout error. This error should be temporary. If we circumvent it, the Web2Cit-Monitor may, for instance, write incorrect results.
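
One way to act on this distinction would be to use the alternative selection steps only for failures that are unlikely to resolve on their own. A minimal sketch, assuming a hypothetical `should_use_fallback` helper and an illustrative set of transient status codes (only the 504 is mentioned in this task):

```python
# Assumption: status codes treated as temporary. Only 504 comes from this task;
# 502 and 503 are added for illustration.
TRANSIENT_STATUS_CODES = {502, 503, 504}


def should_use_fallback(citoid_status: int) -> bool:
    """Return True if alternative selection steps should replace Citoid output."""
    if 200 <= citoid_status < 300:
        return False  # Citoid worked; there is nothing to circumvent
    if citoid_status in TRANSIENT_STATUS_CODES:
        return False  # temporary failure: surface the error instead of a degraded result
    return True  # assumed persistent failure
```

Under this sketch a 504 would still surface as an error, so a monitoring run would not record a degraded result as if it were the expected output.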

As described in T305166, there may be cases where the fallback template (which currently uses Citoid selection for all fields) does not apply.

Noting here that they are experimenting with Citoid fetching metadata from Wayback Machine snapshots (T95388), in cases where sites are blocking Citoid (T362379).

Then use range transformation "0" to keep the first item selected.

This would add a transformation step to the default procedure which would often pass unnoticed by editors. How could this impact their procedures if they customize the selection steps? Also consider T308354 regarding whether we want transformation steps in the fallback template at all.

we may consider changing the mandatory (i.e., always required) fields in the fallback template to prevent Web2Cit from failing in these cases:

Maybe we should just let the users fix these cases themselves by manually choosing a fixed selection for the itemType field and a URL selection (to be supported) for the title field.

Closing for now as declined.