
Strip .pdf from arxiv identifiers before adding them
Closed, Resolved; Public

Description

See https://en.wikipedia.org/w/index.php?title=Insertion_sort&diff=826540785&oldid=823450699 and https://en.wikipedia.org/w/index.php?title=Many-worlds_interpretation&type=revision&diff=826183381&oldid=825333139

In both cases, the bot adds something like "| arxiv = quant-ph/0211138v2.pdf" when it should add just "| arxiv = quant-ph/0211138v2", without the trailing .pdf.

Event Timeline

Headbomb created this task. Feb 20 2018, 3:29 PM
Headbomb renamed this task from "Strip .pdg from arxiv identifiers before adding them" to "Strip .pdf from arxiv identifiers before adding them". Feb 20 2018, 3:31 PM

I'm not sure whether https://github.com/dissemin/oabot/commit/5361bf122c27ac6220fc113e1a3767fe7773f62b had already been deployed when this bug was observed.

Nemo_bis closed this task as Resolved. Apr 30 2018, 10:44 AM
Nemo_bis claimed this task.

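The problem with (.*)(.pdf)? is that the second group is optional while the first is greedy: (.*) consumes the whole string, including the .pdf suffix, and the optional group then matches nothing.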
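
To make the greedy-versus-optional interaction concrete, here is a minimal Python sketch. The function names are hypothetical and this is not oabot's actual code; it only reproduces the quoted pattern and contrasts it with a lazy variant, assuming the whole string has to match.

```python
import re

# The pattern quoted above: a greedy group followed by an optional suffix group.
NAIVE = re.compile(r"(.*)(.pdf)?")

# A lazy first group with an escaped dot. Because the whole string must match,
# the regex engine is forced to hand ".pdf" to the second group when present.
LAZY = re.compile(r"(.*?)(\.pdf)?")


def strip_naive(identifier: str) -> str:
    """Hypothetical helper: returns group 1 of the quoted pattern."""
    return NAIVE.fullmatch(identifier).group(1)


def strip_anchored(identifier: str) -> str:
    """Hypothetical helper: returns group 1 of the lazy pattern."""
    return LAZY.fullmatch(identifier).group(1)


print(strip_naive("quant-ph/0211138v2.pdf"))     # quant-ph/0211138v2.pdf
print(strip_anchored("quant-ph/0211138v2.pdf"))  # quant-ph/0211138v2
```

With the greedy pattern the suffix survives because nothing forces the engine to backtrack; the lazy variant only works because `fullmatch` anchors the end of the string.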

Let's just follow the identifier format: https://arxiv.org/help/arxiv_identifier
After https://github.com/dissemin/oabot/commit/de27de3fc40657ad2495cd892ab52576864ba8d5 , the edits look right.
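
For illustration, matching on the identifier format itself could look roughly like the sketch below. It is an approximation of the arXiv scheme (new-style YYMM.NNNN or YYMM.NNNNN, old-style archive/YYMMNNN with an optional subject class, each with an optional vN version), and it is not necessarily what the linked commit does. `ARXIV_ID` and `extract_arxiv_id` are hypothetical names.

```python
import re
from typing import Optional

# Approximate arXiv identifier pattern (an assumption, not oabot's regex).
ARXIV_ID = re.compile(
    r"(?:\d{4}\.\d{4,5}"               # new style, e.g. 0704.0001 or 1501.00001
    r"|[a-z-]+(?:\.[A-Z]{2})?/\d{7})"  # old style, e.g. quant-ph/0211138, math.GT/0309136
    r"(?:v\d+)?"                       # optional version suffix, e.g. v2
)


def extract_arxiv_id(text: str) -> Optional[str]:
    """Return the first arXiv-like identifier in text, or None if there is none."""
    match = ARXIV_ID.search(text)
    return match.group(0) if match else None


print(extract_arxiv_id("quant-ph/0211138v2.pdf"))  # quant-ph/0211138v2
print(extract_arxiv_id("1501.00001v1.pdf"))        # 1501.00001v1
```

Since the pattern describes only what a valid identifier looks like, a trailing .pdf (or any other suffix) is simply never captured, so no separate stripping step is needed.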

@Nemo_bis that's probably because the edits were cached and generated by an earlier version

Ah ok. Can I just run sed over the cache or are you working on an alternative fix?

I won't work on this for the next 2 weeks, the floor is yours!