Return >1 results from search in citoid service in the absence of a url, doi, isbn or pmcid/pmid
Closed, Resolved · Public · BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Type in "Hamas' use of human shields in Gaza' (PDF), NATO Strategic Communications Centre of Excellence" in the VE automatic citation generator. Click create.
  • Get the information about an unrelated book chapter (ISBN 978-1-349-45658-1)

What happens?:

  • The wrong citation is returned. When you type in only the title of the NATO report, you still get the book chapter. When you cut out even more words, you get a third unrelated citation, a website this time.

What should have happened instead?:

  • It should have given no output and an error message saying that there isn't enough information. (As a bonus, citoid could still guess, as long as it lets us know it's just a guess.)

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

Testing
This was deployed on Wed, Jan 7th. You now receive 2 results if you search for a string like "viral phylodynamics". If you search for a full citation that contains a url, e.g. "Example. http://www.example.com", you will receive up to 3 results if the url has metadata available. You will only receive 1 result if the string is a url, isbn, pmid, pmcid, or doi.
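
For verification purposes, the expected counts described above can be summarized in a minimal sketch. This is an illustration only: the function name `max_expected_results` and all of the regular expressions are assumptions made for this example, not citoid's actual detection logic.

```python
import re

# Hypothetical patterns for illustration only; citoid's real detection may differ.
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)
DOI_RE = re.compile(r"10\.\d{4,9}/\S+")
PMCID_RE = re.compile(r"PMC\d+", re.IGNORECASE)
PMID_RE = re.compile(r"\d{1,8}")
ISBN_RE = re.compile(r"(?:97[89][- ]?)?(?:\d[- ]?){9}[\dXx]")


def max_expected_results(query: str) -> int:
    """Upper bound on the number of citations expected for a query."""
    text = query.strip()
    # The whole input is a single identifier (url, doi, pmcid, pmid, isbn): 1 result.
    for pattern in (URL_RE, DOI_RE, PMCID_RE, PMID_RE, ISBN_RE):
        if pattern.fullmatch(text):
            return 1
    # Free text that embeds a url: up to 3 results, if the url has metadata.
    if URL_RE.search(text):
        return 3
    # Plain free-text search: 2 results.
    return 2
```

The values are upper bounds: a full citation with a url only yields the third result when metadata is available for that url.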

Event Timeline

This service is provided by crossref search. It's less than ideal but I'm not sure it's worth investing a lot of time into? (If any).

Maybe we could return a few more search results rather than just the top one to make it clear what's happening?

https://search.crossref.org/search/works?q=Hamas%27+use+of+human+shields+in+Gaza%27+%28PDF%29%2C+NATO+Strategic+Communications+Centre+of+Excellence&from_ui=yes

I imagine editors would only notice this in 50-70% of cases (as the title will partially match), meaning that this bug has the potential to lead to a lot of mistakes with source-text integrity.

I love your solution. If we give multiple search results, we slightly increase the chance that the correct title is returned, and we greatly increase the chance that editors notice it's just a guess from the software.

Mvolz renamed this task from "citoid creates wrong citation from string without doi/isbn identifier" to "Return >1 result from crossRef search in citoid service". Jan 30 2025, 11:45 AM
Mvolz triaged this task as Low priority.

@Mvolz, why has this been given a low priority?

I opened this bug report because somebody was almost topic-banned for not noticing the error in citoid. Her defense (it wasn't me, it was the software) seemed so outlandish that the admins who looked at the case before me had not taken the time to check. I did, and was shocked by this output. We're likely introducing text-source errors at a large scale if we get almost random citations from citoid.

I do think it is a problem that the UX doesn't indicate that this is a search. For an ISBN or another type of unique id the current behaviour makes sense, but here it just makes an arbitrary selection based on a few words, which is probably not what people expect.

Probably it should have an intermediate screen with language like: Maybe you meant one of these three things... and allow you to inspect and verify these results before inserting a specific one.

I agree that this is a problem that should be higher priority. Honestly, if a quick fix is desired, we could even disable the title search outright and only allow ISBN, etc. I have literally never had the correct item show up when I have used title search. Frankly, I had completely forgotten you even could title search with the citations tool.

Per offline discussion, we're going to:

  1. Build a PoC to explore the viability of showing multiple search results, per what @Mvolz described in T382446#10476282
  2. If "1." proves viable, prioritize work on improving the UX via T413679

Change #1223195 had a related patch set uploaded (by Mvolz; author: Mvolz):

[mediawiki/services/citoid@master] Return two results from CrossRef search

https://gerrit.wikimedia.org/r/1223195

This helps avoid confusion, but if we want to actually provide more useful search results to people, I would additionally suggest the following:

  1. Re-open discussions with worldcat to provide book search results T352571 or
  2. Search wikidata for books or
  3. Search open library for books T413785

And then:

Score results via Levenshtein distance (or some other algorithm) against the original input, and possibly implement a threshold.

Plausibly, once the redesign happens, we might also want to return fewer results (for example, only 1) if the others look too irrelevant based on distance.
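
As a rough sketch of what such scoring could look like: the example below computes the Levenshtein distance between the input and each returned title, normalizes it to a similarity between 0 and 1, and drops results under a threshold. The function names, the normalization, and the default threshold of 0.5 are all assumptions chosen for illustration; nothing here is taken from citoid itself.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(query: str, title: str) -> float:
    """Case-insensitive similarity in [0, 1]; 1.0 means identical strings."""
    q, t = query.casefold().strip(), title.casefold().strip()
    longest = max(len(q), len(t))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(q, t) / longest


def rank_results(query: str, titles: list[str], threshold: float = 0.5):
    """Return (score, title) pairs at or above the threshold, best first."""
    scored = [(similarity(query, title), title) for title in titles]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)
```

Comparing a long pasted citation against a short title would score poorly under this normalization, so a real implementation would likely need a different measure (for example, comparing against the best-matching substring) and a threshold tuned on actual queries.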

Change #1223195 merged by jenkins-bot:

[mediawiki/services/citoid@master] Return two results from CrossRef search

https://gerrit.wikimedia.org/r/1223195

Mvolz raised the priority of this task from Low to Medium.

@Mvolz can you please update the task description with the experience you ended up implementing, so that Editing QA has the info they need to verify this?

Mvolz renamed this task from "Return >1 result from crossRef search in citoid service" to "Return >1 results from search in citoid service in the absence of a url, doi, isbn or pmcid/pmid". Jan 14 2026, 4:26 PM
Mvolz updated the task description. (Show Details)

I have done this and also moved it to the QA column, but to be honest I don't think this really needs QA, as automated tests verify that the number of citations returned by the back-end has now increased as described.

ldelench_wmf edited projects, added Skipped QA; removed Editing QA.