
Consider automatically removing control characters from Web2Cit field outputs
Open, Needs Triage · Public · BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Create a template that outputs a field containing a control character. For example, this revision of templates.json (by @Kerry_Raymond) for www.findandconnect.gov.au includes an XPath selection step that returns a string containing the control character U+00A0 (no-break space). Alternatively, one may just add a fixed selection that returns a string containing a control character.
  • Use the Web2Cit user script to insert a citation for a webpage that uses the template defined above.

What happens?:

The Cite dialog shows an error:

image.png (469×529 px, 55 KB)

What should have happened instead?:

No error should have been shown.

Software version (skip for WMF-hosted wikis like Wikipedia):

Web2Cit server v1.1.0

Other information (browser name/version, screenshots, etc.):

We may consider automatically removing (or replacing, as appropriate) these control characters from Web2Cit field outputs.
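As a rough illustration of what such automatic cleanup could look like, here is a minimal sketch in TypeScript. The function name `sanitizeFieldOutput` is hypothetical and not part of Web2Cit's actual code, and the choice of which characters to replace with a space and which to drop is an assumption. Note that U+00A0 is formally a space separator rather than a control character in Unicode, so the sketch handles it explicitly.

```typescript
// Hypothetical helper (not part of Web2Cit's actual API): cleans a single
// field output string before it is handed to the citation dialog.
function sanitizeFieldOutput(value: string): string {
  return value
    // Assumption: whitespace-like characters (tab, line breaks, no-break
    // space U+00A0) are better replaced with a plain space than dropped.
    .replace(/[\t\n\r\u00A0]/g, " ")
    // Remove the remaining C0 controls, DEL, and C1 controls.
    .replace(/[\u0000-\u001F\u007F-\u009F]/g, "")
    // Collapse runs of spaces left behind by the replacements above.
    .replace(/ {2,}/g, " ")
    .trim();
}

// Example: a no-break space and a BEL character inside a title.
console.log(sanitizeFieldOutput("Find\u00A0and\u0007 Connect")); // "Find and Connect"
```

Replacing rather than removing whitespace-like characters avoids gluing adjacent words together; whether that is the right default for every field type is left open here.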

Note that Zotero translation (or Citoid, or at least the Embedded Metadata translator) seems to already do this (i.e., remove or replace the control characters), since citations created without the user script (that is, by having Citoid use the embedded metadata in the Web2Cit server's HTML response) do not show the same error. This can be checked by having Citoid generate a citation for https://web2cit.toolforge.org/https://www.findandconnect.gov.au/guide/qld/QE00439.

Once it is supported (see T302691), a replace transformation could be used to manually replace these control characters. In the meantime, one may use split + join transformations instead (as suggested by @Kerry_Raymond).
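The idea behind the split + join workaround can be sketched in plain TypeScript as follows. This only illustrates the string logic; it is not Web2Cit template configuration, and the function name `replaceViaSplitJoin` is hypothetical.

```typescript
// Hypothetical illustration of the split + join idea: splitting on the
// unwanted character and joining the pieces with a replacement has the
// same effect as a global replace.
function replaceViaSplitJoin(
  value: string,
  target: string,
  replacement: string
): string {
  return value.split(target).join(replacement);
}

// Replace every no-break space (U+00A0) with a regular space.
const cleaned = replaceViaSplitJoin("Find\u00A0and\u00A0Connect", "\u00A0", " ");
console.log(cleaned); // "Find and Connect"
```

In a template, the equivalent would be a split step on the offending character followed by a join step using a regular space (or an empty string, to remove the character entirely).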