Page MenuHomePhabricator

Don't "strip tags" from DOIs
Closed, ResolvedPublic0 Estimated Story Points

Description

Paste:

10.1002/1096-8628(20000612)96:3<302::aid-ajmg13>3.0.co;2-i

in Citoid and it will fill the citation, however, in the doi-field it wil add:

10.1002/1096-8628(20000612)96:33.0.CO;2-I

Which is a non-existing doi.

Checked on sv.wiki prodution

Event Timeline

Josve05a raised the priority of this task from to Needs Triage.
Josve05a updated the task description. (Show Details)
Josve05a added a project: Citoid.
Josve05a subscribed.

Hmm. It's getting caught by a fix that strips html tags out of any fields using a node library called stripTags. This is a validation measure to make sure we aren't accidentally sending html to wikis... not sure how to get around this except not to use it on the doi fields. I'll have to consult with security :).

Hmm. It's getting caught by a fix that strips html tags out of any fields using a node library called stripTags

Use a better library?

Stripping html tags does not sound like the correct solution here. Unless you're worried about dirty data that has extra html tags you don't want in it, I would expect that you would simply want to escape angled brackets (ie turn >, < into &gt; and &lt; respectively).

@Bawolff, That's exactly what we're worried about. It was originally a fix for some Zotero data that was coming in with div and i tags.

Fittingly their wikipedia translator, hah :).

Also, changing < into &gt; in a doi changes its meaning.

Also, changing < into &gt; in a doi changes its meaning

It should be different layers. turning < into &lt; when fed to the wiki, will eventually get turned back into a '<' when read by the user.

That's exactly what we're worried about. It was originally a fix for some Zotero data that was coming in with div and i tags.

I'm not exactly sure about the syntax of doi's, but that means you'd probably have to do something like strip html-type tags, but only those that are div, span, i, etc, and then escape angle brackets for the other uses of angle brackets.

Bawolff triaged this task as Medium priority.
Bawolff edited projects, added Security-Team; removed acl*security.

That's true, but we can't guarantee that every consumer of the doi field is
going to be html, so that makes me concerned. I think I'd be happier not
using striptags just on the doi field. The issue we had with polluting html
tags was not in the doi field and I think is unlikely to be. @mobrovac what
do you think?

I'm with @Mvolz that the best solution might be not to enforce tag stripping on all of the fields. But, before we do that, I'd like to see some research conducted into what happens if malicious tags (primarily <script>) are inserted in the DOI field by the client. How would Zotero process this? What about DOI indexers?

Personally, I'm sad to see such abuses of the DOI spec.

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips how to manage individual work in Phabricator (noisy notifications, lists of task, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the records, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

Mvolz renamed this task from Citoid converts ignores <302::aid-ajmg13> to Don't "strip tags" from DOIs.May 19 2021, 10:34 AM
Mvolz claimed this task.
Mvolz added a subscriber: AManWithNoPlan.