
Decreased internationalisation of automatic citations as a result of switch to new translation-server
Closed, Resolved · Public

Description

Regression: the new translation-server does not pass through the request context (the Accept-Language header), yet it handles more URLs than the old translation-server did. As a result, our internationalisation of citations will decrease: some citations that were previously returned in the requested language will now be returned in English.

This is blocked by https://github.com/zotero/translation-server/issues/16

Event Timeline

Mvolz created this task. Nov 29 2018, 10:58 PM
Mvolz changed the task status from Open to Stalled.
Mvolz triaged this task as Normal priority.
Restricted Application added a subscriber: Aklapper. Nov 29 2018, 10:58 PM

@Mvolz this sounds like a serious consequence for non-English projects. What is the product decision here? Is this a blocker for switching Citoid to use the new Zotero server? Please consult with the appropriate people. We planned on doing the switch on Tuesday 2018-12-04, but will hold it until we have a clear response here.

Mvolz added a comment. Edited Dec 3 2018, 12:39 PM

IMO it's relatively minor, and also not a new issue; it's in the patch summary for the update to citoid as well.

But basically it only affects websites that honour the Accept-Language header and return the site in a different language based on it. In our tests, the only place we saw this was when scraping Twitter.com, although we could probably have better test coverage here.
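To illustrate the kind of site that is affected, here is a minimal sketch of how a server might choose a response language from the Accept-Language header. It is not taken from Citoid, Zotero, or any real website; the function name `pick_language` and the simplified parsing (no wildcard handling) are assumptions made for the example.

```python
def pick_language(accept_language, available, default="en"):
    """Pick a response language for an Accept-Language header value.

    Hypothetical helper: a simplified reading of the header, ignoring
    wildcards. Falls back to `default` when no header is present or
    nothing matches.
    """
    if not accept_language:
        # No header was passed through, so the site serves its default.
        return default

    ranked = []
    for position, item in enumerate(accept_language.split(",")):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        quality = 1.0  # a missing q parameter means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:  # q=0 marks a language as not acceptable
            ranked.append((-quality, position, tag))

    offered = {lang.lower() for lang in available}
    # Highest quality first; ties keep the order given in the header.
    for _, _, tag in sorted(ranked):
        primary = tag.split("-")[0]
        if tag in offered:
            return tag
        if primary in offered:
            return primary
    return default
```

With a header such as `de-DE,de;q=0.9,en;q=0.5` a site offering `en`, `de` and `fr` would answer in German, but the same site answers in English when the header is missing, which is the regression described in this task.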

Most non-English newspapers still return metadata about the article in the language it was written in, regardless of the Accept-Language header, and so are unaffected. I have not been able to find a counterexample.

And any website which was previously scraped by Zotero will show no change; the regression is only for websites where we previously used our native scraper and did not use Zotero. This is a small subset of websites.

We do have a lot of work to do on internationalisation, but for the most part scraping a website gives metadata in the language it's written in. (This is actually a bigger problem for books and PMID/PMCID lookups, where we're using English-language databases.)

We've implemented Accept-Language forwarding upstream.
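As a rough sketch of what forwarding means on the calling side, the snippet below copies the Accept-Language header from an incoming request onto the request sent to translation-server. It is not the actual Citoid patch: the helper names are hypothetical, and the endpoint (`/web` on local port 1969, taking the target URL as a `text/plain` body) is assumed from translation-server's documented default setup.

```python
from typing import Mapping, Optional

# Assumed local translation-server endpoint; adjust for a real deployment.
TRANSLATION_SERVER_WEB = "http://127.0.0.1:1969/web"


def find_header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Case-insensitive header lookup (header names are case-insensitive)."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


def build_upstream_request(target_url: str, incoming_headers: Mapping[str, str]) -> dict:
    """Describe the request to send upstream (hypothetical helper).

    The client's Accept-Language value is forwarded unchanged when present,
    so a language-negotiating site can answer in the requested language.
    """
    headers = {"Content-Type": "text/plain"}
    language = find_header(incoming_headers, "Accept-Language")
    if language:
        headers["Accept-Language"] = language
    return {
        "url": TRANSLATION_SERVER_WEB,
        "headers": headers,
        "body": target_url.encode("utf-8"),
    }
```

The returned dictionary can be handed to whichever HTTP client is in use. When the incoming request carries no Accept-Language header, none is sent upstream and the scraped site falls back to its own default.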

Mvolz moved this task from Backlog to Service on the Citoid board. Dec 11 2018, 11:31 AM
Mvolz changed the task status from Stalled to Open. Dec 11 2018, 5:26 PM

Change 479012 had a related patch set uploaded (by Mvolz; owner: Mvolz):
[mediawiki/services/citoid@master] Pass accept-language header to Zotero (update req)

https://gerrit.wikimedia.org/r/479012

Change 479020 had a related patch set uploaded (by Mvolz; owner: Mvolz):
[mediawiki/services/zotero@master] Squashed commit of the following:

https://gerrit.wikimedia.org/r/479020

Change 479012 merged by jenkins-bot:
[mediawiki/services/citoid@master] Pass accept-language header to Zotero (update req)

https://gerrit.wikimedia.org/r/479012

Change 479020 merged by jenkins-bot:
[mediawiki/services/zotero@master] Update Zotero to 8b8355a

https://gerrit.wikimedia.org/r/479020

Mvolz closed this task as Resolved. Jan 31 2019, 1:40 PM
Mvolz claimed this task.
Mvolz removed a project: Patch-For-Review.

Mentioned in SAL (#wikimedia-operations) [2019-02-11T21:09:15Z] <mobrovac@deploy1001> Started deploy [citoid/deploy@0b91bea]: Use Zotero for DOIs and pass it the A-L header - T214766 T210806 T215755

Mentioned in SAL (#wikimedia-operations) [2019-02-11T21:13:03Z] <mobrovac@deploy1001> Finished deploy [citoid/deploy@0b91bea]: Use Zotero for DOIs and pass it the A-L header - T214766 T210806 T215755 (duration: 03m 47s)