Poor support in Zotero for major non-English language newspapers
Open, Needs Triage, Public

Description

This is not an issue with the template type map, also because NYTimes links work just fine. But the 3 major Italian newspapers are not recognized as such.
In this test, the quote marks were replaced with � marks.

Event Timeline

Elitre raised the priority of this task from to Needs Triage.
Elitre updated the task description.
Elitre added a project: Parsoid.
Elitre subscribed.
Elitre set Security to None.

@Elitre,

We get good results for news articles that are US-based because we're using Zotero, and they have better coverage in English than they do in Italian.

You can see a list of all the news outlets and publishers covered here:

https://github.com/zotero/translators

When Zotero doesn't have a translator, we have a fall-back web scraper that doesn't do as good a job at recognizing whether something is a newspaper or a website. Almost everything right now will be declared a website. We have a bunch of tickets for improving this, so hopefully it will get better with time. However, even if we get back good metadata from a news site, it is very hard to tell whether a website is a news institution or just a blog based only on metadata: most metadata will just call either one an "article".

Our default is to call everything a website, because we can really be guaranteed that it is... it just also happens to be a news article :).
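As a rough illustration of the behaviour described above, here is a minimal sketch. The function name, the parameters and the small template map are hypothetical and are not taken from the actual service code; `newspaperArticle` and `webpage` are Zotero item type names, and the mapping to "Cite news" and "Cite web" is only assumed for the example.

```python
# Hypothetical sketch of the fallback behaviour; not the real implementation.

# Assumed, simplified template type map (item type -> citation template).
TEMPLATE_FOR_TYPE = {
    "newspaperArticle": "Cite news",
    "webpage": "Cite web",
}


def choose_item_type(translator_type=None, scraped_type=None):
    """Return an item type for a URL.

    translator_type: type reported by a site-specific translator, if one exists.
    scraped_type: type found by the generic scraper (often just "article").
    """
    if translator_type is not None:
        # A dedicated translator knows what kind of site it handles.
        return translator_type
    # The scraped label is deliberately ignored: "article" fits both a
    # newspaper and a blog, so the safe default is a plain web page.
    return "webpage"


# A site with a dedicated translator is cited as news.
print(TEMPLATE_FOR_TYPE[choose_item_type(translator_type="newspaperArticle")])
# A site without one falls back to the generic template.
print(TEMPLATE_FOR_TYPE[choose_item_type(scraped_type="article")])
```

Under these assumptions, the only way a newspaper ends up with the news template is for a site-specific translator to exist, which matches the situation with the Italian newspapers reported in this task.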

If you would like to file tickets in https://github.com/zotero/translators asking for translators specifically for each of these major Italian newspapers, you can post the tickets here as well.

(tl;dr: we know why, it will hopefully get better with time, but probably not much better until people add specific translators to Zotero for those sites)

Is there a templated request that you could give in case someone really wants to create an account there and ask? Thank you.

Not really, just go to https://github.com/zotero/translators/issues/new and type what you want.

Unfortunately it looks like there's quite a backlog though...

https://github.com/zotero/translators/labels/New%20Translator

File a bug about unrecognized quotes though (here, not in Zotero). It looks like some encoding issues with the « quote mark. Or we could just convert this thread to that issue :).
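For anyone filing that bug, here is a small sketch of one common way this symptom arises. It assumes the page is encoded as ISO-8859-1 (Latin-1) and is decoded as UTF-8; that cause has not been confirmed for these newspapers, and the sample headline is made up.

```python
# Assumption: the bytes are Latin-1 but are read as UTF-8. Unconfirmed cause.

# "«Ciao»" written with escapes so the example does not depend on file encoding.
headline = "\u00abCiao\u00bb"

# What a Latin-1 page would send over the wire: one byte per quote mark.
raw = headline.encode("latin-1")

# Decoding those bytes as UTF-8 fails on 0xAB and 0xBB, which are not valid
# UTF-8 start bytes; errors="replace" substitutes U+FFFD for each of them.
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the page actually uses keeps the quote marks.
right = raw.decode("latin-1")

print(wrong)
print(right)
```

If the real cause is similar, the fix would be to honour the charset declared by the page rather than assuming UTF-8.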

https://github.com/zotero/translators/issues/826

Looks like someone made a request about major French newspapers as well, it might be up to us to do this work if we want native Zotero translators for them.

I'd guess so. It wouldn't be very useful to send people there to ask for something Zotero doesn't have the capacity to give them.

@Elitre, it still makes sense to request them. It just depends if I/anyone else ever has time to add new translators or not :). If we write them, they'll likely get merged. But if you think it's "false hope" then I guess not.

Of course! If someone at WMF commits to add them so that they get merged, asking makes a lot of sense.

@Qgil: Is writing translators for Zotero a task suitable for GSOC or similar initiatives? Thanks :)

@Elitre, Zotero is a separate open source project, and no WMF members have +2 on the translator repo. It would probably be weird/against some rules somewhere for one organisation to organise contributions to another organisation. I think the proper way to do this would probably be to encourage Zotero to join such initiatives and maybe offer to co-mentor?

We do have our own translator repo, so we could technically just encourage contributions to that one, but I think it's preferable if these changes go upstream, particularly new translators, which would be useful to the project at large.

@Elitre, looking at https://www.zotero.org/support/dev/translators#metadata , a possible Google Code-in task could be, e.g., to write five translators from a list that someone at Wikimedia would maintain. This should be done after discussing this collaboration with Zotero. The next GCI edition is expected to start in November, so you have time to plan. :)

Mvolz renamed this task from Find out why Italian news outlets link generate a "Cite web" template rather than a "Cite news" one to Poor support in Zotero for major Italian newspapers. Apr 8 2015, 8:52 AM
Mvolz renamed this task from Poor support in Zotero for major Italian newspapers to Poor support in Zotero for major foreign language newspapers. Apr 29 2015, 1:16 PM
Mvolz added subscribers: Nnemo, Liuxinyu970226, Josve05a.
Mvolz renamed this task from Poor support in Zotero for major foreign language newspapers to Poor support in Zotero for major non-English language newspapers. Apr 29 2015, 1:19 PM

I might be able to help with this, if we have a priority list for translators.

(None of the major news sites in Swedish is recognized as such. Note that sv.wp is the second-largest Wikipedia by number of articles. Just saying.)

FWIW: the Swedish Wikipedia has a list of "most frequent domains" at https://sv.wikipedia.org/wiki/Anv%C3%A4ndare:Edgars2007/Most_frequent_domains (https://phabricator.wikimedia.org/P691 is the source). It doesn't specify which ones are news sites though.

News sites extracted from that link, in order of use:

www.dn.se
www.svd.se
sverigesradio.se
www.aftonbladet.se
news.bbc.co.uk
arkivet.dn.se
www.expressen.se
www.svt.se
www.bbc.co.uk
www.sr.se       (Also a radio station, so some may be news, some not)
www.nytimes.com
www.sydsvenskan.se
www.gp.se
www.tv4.se    (Also a TV channel, so some may be news, some not)
www.telegraph.co.uk
wwwc.aftonbladet.se
www.theguardian.com
www.bbc.com
svenska.yle.fi
www.dailymail.co.uk
news.google.com
www.reuters.com
www.dagensmedia.se
query.nytimes.com
www.huffingtonpost.com
www.corren.se
www.skanskan.se
www.di.se
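Several hosts in this list belong to the same outlet (for example www.dn.se and arkivet.dn.se), so a priority list for translators would probably want them merged. The following is a naive sketch of that grouping; the helper names are made up for illustration, and the hard-coded suffix set is an assumption that only covers the hosts listed here (a real tool would use the Public Suffix List).

```python
# Naive sketch: group hosts by site so each outlet is counted once.

# Two-label public suffixes that occur in the list above (assumed sufficient
# for this example only).
TWO_LABEL_SUFFIXES = {"co.uk"}


def site_of(host):
    """Return the registrable part of a host, e.g. news.bbc.co.uk -> bbc.co.uk."""
    labels = host.lower().split(".")
    suffix_len = 2 if ".".join(labels[-2:]) in TWO_LABEL_SUFFIXES else 1
    return ".".join(labels[-(suffix_len + 1):])


def group_by_site(hosts):
    """Group hosts by site, keeping the order in which sites first appear."""
    groups = {}
    for host in hosts:
        groups.setdefault(site_of(host), []).append(host)
    return groups


# A few hosts from the list above, in their original order of use.
hosts = [
    "www.dn.se",
    "www.svd.se",
    "sverigesradio.se",
    "www.aftonbladet.se",
    "news.bbc.co.uk",
    "arkivet.dn.se",
    "www.bbc.co.uk",
    "wwwc.aftonbladet.se",
]

groups = group_by_site(hosts)
for site, members in groups.items():
    print(site, members)
```

Because the input is already ordered by use, the order of the resulting groups gives a rough first-cut ranking of which sites to write translators for.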

I did a test of the five most cited news sites on sv.wp here: https://sv.wikipedia.org/w/index.php?title=Anv%C3%A4ndare:Josve05a/Citoid&oldid=36543513 and I did a manual fill of the links as well to compare what the difference was.

FYI there is a related grant request here, which proposes a tool to help non-technical users create and edit translators: https://meta.wikimedia.org/wiki/Grants:Project/Diegodlh/Web2Cit:_Visual_Editor_for_Citoid_Web_Translators