
More user-friendly message for 520s (when the service is unable to obtain metadata about the resource)
Closed, Resolved · Public · 1 Story Points

Description

Steps to reproduce:

1. Open the Autofill from URL dialog from the menu
2. Type BBC and click on Lookup

Observed Result:
It is coming up with an invalid result "http://BBC".

And the following error in the console:
"NetworkError: 520 unknown - http://citoid.wikimedia.org/api?action=query&format=mediawiki&search=BBC"

After adding it, clicking on the URL of course tries to open the invalid URL.

Details

Related Gerrit Patches:
mediawiki/extensions/Citoid : master · Add general error to the inspector

Event Timeline

Ryasmeen created this task. Mar 17 2015, 8:21 PM
Ryasmeen raised the priority of this task from to Needs Triage.
Ryasmeen updated the task description.
Ryasmeen added a project: VisualEditor.
Ryasmeen added a subscriber: Ryasmeen.
Restricted Application added a subscriber: Aklapper. Mar 17 2015, 8:22 PM

This is expected behavior. When the user fills something in the 'lookup' field, the service expects a certain format. If the service doesn't recognize the format (say, the specialized "DOI" format), it assumes it's a link. That is expected and is actually good behavior, because it lets the user look up something like "foo.somewhere.us" and the service will assume it's a webpage.

The service, however, can't reach that page (in your case, "bbc" or "http://bbc"), so it returns an error. We display that error to the user (if you notice, a MediaWiki error should pop up in the top right-hand corner) *but* we also convert the input to a webpage citation by adding http:// to it. You can see this automatic conversion in the link above that shows the reply from the citoid service, already adding the http:// prefix.

Citoid can't really mind-read what the user wants if the user inserts random strings, though, especially since it can also read DOIs and (soon?) ISSN/ISBN etc etc.
If it's not in the anticipated format, we assume it's a link in those cases, which I think is pretty good behavior.
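
To make the fallback concrete, here is a rough sketch of the behavior described above. It is not citoid's actual code: the function name, the return shape, and the simplified DOI pattern are all assumptions made for illustration.

```python
import re
from typing import Tuple

# Simplified, hypothetical DOI pattern; real DOIs are more varied than this.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_lookup(search: str) -> Tuple[str, str]:
    """Return a (kind, value) pair for a lookup string.

    Anything that is not recognized as a known identifier is assumed
    to be a link, and gets an http:// prefix if it has no scheme.
    """
    text = search.strip()
    if DOI_PATTERN.match(text):
        return ("doi", text)
    if re.match(r"^https?://", text, re.IGNORECASE):
        return ("url", text)
    # Unrecognized format: assume it is a link.
    return ("url", "http://" + text)


print(normalize_lookup("foo.somewhere.us"))
print(normalize_lookup("BBC"))
```

Under these assumptions, "BBC" ends up as "http://BBC", which matches the result in the report.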

The only thing we can try to fix here is the error message. The console message is correct (and it doesn't break anything, just lets us know that we've received a network error 520 from citoid) but if this looks off to the users, then maybe we can add another error message or some sort of indication that we've taken it upon ourselves to assume it's a link.

Jdforrester-WMF set Security to None.
Jdforrester-WMF triaged this task as Unbreak Now! priority.
Mvolz added a subscriber: Mvolz. Mar 18 2015, 2:21 PM

I definitely think we should make this more user friendly, but I'm not sure exactly how we should do it.

I think having a message for 520s would be a start, something like "Sorry, we were unable to create a rich citation for you from the information you entered. You must enter a valid URL, DOI, PMID, or PMCID."

We could also consider not appending the http in the response. Or at the very least we could do some very minimal validation, like not have a web citation at all if there's no period in the entry at all. (Validating URLs is actually pretty prone to false negatives and I'd like to be as greedy with that as possible.) By the way, a smarter backend is in the works for this sort of thing :)
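
A minimal sketch of the "no period, no web citation" check, assuming a hypothetical helper name. It is deliberately greedy: it only rejects input that has no period at all and does not try to validate the URL any further.

```python
def could_be_web_address(search: str) -> bool:
    """Very minimal check: only reject input with no period in it."""
    return "." in search.strip()


print(could_be_web_address("BBC"))
print(could_be_web_address("foo.somewhere.us"))
```

Anything that passes this check would still go to the service as a possible URL, so a real address is never rejected, although plenty of non-URLs will still slip through.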

@KHammerstein, thoughts?

Mvolz renamed this task from Lookup is returning invalid url in Autofill Citation dialog to More user-friendly message for 520s (when the service is unable to obtain metadata about the resource). Mar 18 2015, 3:28 PM
Mvolz moved this task from Backlog to Extension on the Citoid board.

@Mvolz, I agree, and I like your idea about not prepending the http://. In fact, what we could do on 520 errors is not prepend anything and, instead of assuming the reference is a web page (which we do now by using cite web), use the "Basic" reference with whatever the user typed in.

I think that might be better user experience, and we won't have to change the interface text too much.

Mvolz added a comment. Mar 18 2015, 5:28 PM

I think that's a good idea, generally.

However, sometimes we get a 520 not because the user input is bad, but because we've been IP blocked. So it might be a real URL, and putting it in the url field and starting a webpage citation might be a good start. At some point we might even have slightly more data than that; for instance, 404 pages often have metadata about the website title and the like.

So I guess the question is, how much of that do we portion into front end versus backend :).

From citoid's perspective, we *have* to have an itemType (otherwise it breaks compatibility with zotero), so I have to give you webpages if we don't know what it is. But we could have more error groups:

Originally 520s were for when there was no server found, which we can be more confident means it was an invalid url at least, but we've added ones with http errors; before that, http errors returned 200s with whatever bad metadata there was. So we could have separate errors for "really, this is probably not a url" versus "we experienced http errors and it probably is a url". But then that's getting a bit complicated. We could return 200 for http errors and only fill out fields we're confident of, like publication title, but not scrape the actual title since it will be something like 404. Then the user will see that the metadata is a bit sparse, but on the other hand won't know why, in case they mistyped the full url or something.
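
As a sketch of what separate error groups could look like, the following assumes the service knows two things about a failed lookup: whether a server was found at all, and the HTTP status the target returned. The enum, its member names, and the function are hypothetical and not part of citoid.

```python
from enum import Enum
from typing import Optional


class LookupOutcome(Enum):
    OK = "ok"
    PROBABLY_NOT_URL = "probably_not_url"          # no server found
    PROBABLY_URL_HTTP_ERROR = "probably_url_http_error"  # server found, HTTP error


def classify_outcome(server_found: bool, http_status: Optional[int]) -> LookupOutcome:
    """Group lookup failures by how confident we are that the input was a URL."""
    if not server_found:
        return LookupOutcome.PROBABLY_NOT_URL
    if http_status is not None and http_status >= 400:
        return LookupOutcome.PROBABLY_URL_HTTP_ERROR
    return LookupOutcome.OK


print(classify_outcome(False, None))
print(classify_outcome(True, 404))
```

With a split like this, the front end could start a webpage citation for the HTTP-error group and fall back to something plainer for the no-server group.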

Okay, if we can get the back end service to give us more specific errors that can definitely help. My biggest worry (and correct me if I misunderstood) is that since we allow the user to insert a lot of formats that are very VERY different than one another (DOIs, ISBNs in the future, URLs and PMID, etc) then in the front end we'll have some problems validating. The code will either need to have a gigantic validation method checking if the given lookup string fits any of those formats, or we just send it off to citoid backend (which already does that anyways, I think?) and work with the responses as best we can.

Ideally, we should be able to handle the difference between the inputs --

  • "amazon.com" (website citation)
  • "nonexistentwebs.it" (520 server not found, should be website citation)
  • "BBC" (520 server not found, should be basic citation)
  • "Some sentence the user inserts as reference" (520 server not found, should be basic citation)
  • ISBN-10 and -13 that should be book citations (I assume zotero can handle those)
  • DOIs that have a fairly complicated and varied structure.

etc.
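
One possible front-end compromise for the cases listed above, sketched under assumptions: the function name, the "website"/"basic" labels, and the `service_type` parameter are made up for illustration. Successful lookups (including ISBNs and DOIs) keep whatever type the service reports, and only failed lookups fall back on the shape of the input.

```python
from typing import Optional


def choose_citation(search: str, status: int, service_type: Optional[str] = None) -> str:
    """Pick a citation type from the service reply and the raw input."""
    if status == 200 and service_type:
        # The service recognized the input; trust its item type.
        return service_type
    text = search.strip()
    # Lookup failed (e.g. 520): a single token with a period still
    # looks like a web address, anything else becomes a basic citation.
    if "." in text and not any(ch.isspace() for ch in text):
        return "website"
    return "basic"


for query, status, item_type in [
    ("amazon.com", 200, "website"),
    ("nonexistentwebs.it", 520, None),
    ("BBC", 520, None),
    ("Some sentence the user inserts as reference", 520, None),
]:
    print(query, "->", choose_citation(query, status, item_type))
```

This keeps the heavy format detection in the backend, so the front end only needs one cheap check for the strings nobody could identify.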

If we can get the backend to be more specific in the errors it would make the front end solution easier for sure, but we should probably still find some compromise on what would be the best way to handle unidentified strings.

In the short term, let's just show the user an error if for any reason their query didn't result in one or more options. The error can say something like "We weren't able to automatically create a reference, please create your reference using a form".
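
The short-term behavior could look roughly like the sketch below. It is not the merged patch: `run_lookup` and its `lookup` callable are hypothetical stand-ins for the real request to the service, and the message is the wording proposed above.

```python
from typing import Callable, List, Optional, Tuple

GENERAL_ERROR = (
    "We weren't able to automatically create a reference, "
    "please create your reference using a form"
)


def run_lookup(
    search: str, lookup: Callable[[str], List[dict]]
) -> Tuple[List[dict], Optional[str]]:
    """Return (results, error). Any failure or empty reply yields the general error."""
    try:
        results = lookup(search)
    except Exception:
        # Network errors, 520s, and anything else all end up in the same place.
        results = []
    if not results:
        return [], GENERAL_ERROR
    return results, None


def failing_lookup(search: str) -> List[dict]:
    raise RuntimeError("520 unknown")


print(run_lookup("BBC", failing_lookup))
```

The caller only needs to check whether the second element is set to decide between showing options and showing the error.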

Change 197718 had a related patch set uploaded (by Mooeypoo):
Add general error to the inspector

https://gerrit.wikimedia.org/r/197718

Change 197718 merged by jenkins-bot:
Add general error to the inspector

https://gerrit.wikimedia.org/r/197718

Verified in Betalabs and test2.