This is not an issue with the template type map, also because NYTimes links work just fine. But the 3 major Italian newspapers are not recognized as such.
In this test, the quote marks were replaced with � marks.
Description
Related Objects
Mentioned In
- T160273: Create Zotero translator-writing priority list for each language Wikipedia
- T148320: Documenting process of writing Zotero translators through translation-servers
- T115158: Write a Zotero translator and document process for creating new Zotero translator and getting it live in production
Event Timeline
We get good results for US-based news articles because we're using Zotero, and it has better coverage of English-language sources than of Italian ones.
You can see a list of all the news outlets and publishers covered here:
https://github.com/zotero/translators
When Zotero doesn't have a translator, we have a fallback web scraper that doesn't do as good a job at recognizing whether something is a newspaper or a website. Almost everything right now will be declared a website. We have a bunch of tickets for improving this, so hopefully it will get better with time. However, even if we get good metadata back from a news site, it is very hard to tell whether a website is a news institution or just a blog based only on metadata; most metadata will just call either one an "article".
Our default is to call everything a website, because we can be sure that it is one... it just also happens to be a news article :).
If you would like to file tickets in https://github.com/zotero/translators asking for translators specifically for each of these major Italian newspapers, you can post the tickets here as well.
(tl;dr: we know why, it will hopefully get better with time, but probably
not much better until people add specific translators to zotero for those
sites)
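The fallback behaviour described above can be sketched roughly as follows. This is only an illustration, not Citoid's or Zotero's actual code: the `TRANSLATOR_HOSTS` set and the `pick_item_type` function are hypothetical names, and the real lookup is far more involved. The item type strings `newspaperArticle` and `webpage` are Zotero item types.

```python
from typing import Optional

# Hypothetical: hosts for which a dedicated Zotero translator exists.
TRANSLATOR_HOSTS = {"www.nytimes.com"}


def pick_item_type(host: str, scraped_type: Optional[str]) -> str:
    """Return a citation item type for a URL's host (illustrative only)."""
    if host in TRANSLATOR_HOSTS:
        # A site-specific translator knows it is dealing with a newspaper.
        return "newspaperArticle"
    # Generic metadata usually just says "article", which does not
    # distinguish a newspaper from a blog, so the scraped value is ignored
    # and the sketch falls back to the one type that is always true.
    return "webpage"


print(pick_item_type("www.nytimes.com", "article"))
print(pick_item_type("www.dn.se", "article"))
```

Under these assumptions, a site only stops being cited as a website once a translator for it is added to the set.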
Is there a templated request that you could give in case someone really wants to create an account there and ask? Thank you.
Not really, https://github.com/zotero/translators/issues/new and just type what you want.
Unfortunately it looks like there's quite a backlog though...
https://github.com/zotero/translators/labels/New%20Translator
File a bug about unrecognized quotes though (here, not in zotero). It looks like some encoding issues with the « quote mark. Or we could just convert that thread to that issue :).
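For whoever files that bug, here is a minimal sketch of one common way a « mark ends up as �. It assumes the page bytes are in a single-byte encoding such as Latin-1 but get decoded as UTF-8; whether that is what actually happens here has not been confirmed. The sample string is made up.

```python
# Made-up headline fragment using Italian-style guillemets.
text = "«Ciao»"

# Assumption: the server sends Latin-1 bytes.
raw = text.encode("latin-1")

# Decoding those bytes as UTF-8 fails on the guillemet bytes; with
# errors="replace" each invalid byte becomes U+FFFD (the � mark).
garbled = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the bytes were written in restores the text.
restored = raw.decode("latin-1")

print(garbled)
print(restored)
```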
https://github.com/zotero/translators/issues/826
Looks like someone made a request about major French newspapers as well, it might be up to us to do this work if we want native Zotero translators for them.
I'd guess so. It wouldn't be very useful to send people there to ask for something Zotero doesn't have the capacity to give them.
@Elitre, it still makes sense to request them. It just depends if I/anyone else ever has time to add new translators or not :). If we write them, they'll likely get merged. But if you think it's "false hope" then I guess not.
Of course! If someone at WMF commits to adding them so that they get merged, asking makes a lot of sense.
@Qgil: Is writing translations for Zotero a task suitable for GSOC or similar initiatives? Thanks :)
@Elitre, Zotero is a separate open source project, and no WMF members have +2 on the translator repo. It would probably be weird/against some rules somewhere for one organisation to organise contributions to another organisation. I think the proper way to do this would probably be to encourage Zotero to join such initiatives and maybe offer to co-mentor?
We do have our own translator repo, so we could technically just encourage contributions to that one, but I think it's preferable if these changes go upstream, particularly new translators, which would be useful to the project at large.
@Elitre, looking at https://www.zotero.org/support/dev/translators#metadata , a possible Google Code-in task could be, e.g., to write five translators from a list that someone at Wikimedia would maintain. This should be done after discussing this collaboration with Zotero. The next GCi edition is expected to start in November, so you have time to plan. :)
(None of the major news sites in Swedish is recognized as such. Note that sv.wp is the second largest Wikipedia in number of articles. Just saying.)
FWIW: the Swedish Wikipedia has a list of "most frequent domains" at https://sv.wikipedia.org/wiki/Anv%C3%A4ndare:Edgars2007/Most_frequent_domains (https://phabricator.wikimedia.org/P691 is the source). It doesn't specify which ones are news sites though.
News sites extracted from that link, in order of use:
www.dn.se
www.svd.se
sverigesradio.se
www.aftonbladet.se
news.bbc.co.uk
arkivet.dn.se
www.expressen.se
www.svt.se
www.bbc.co.uk
www.sr.se (Also a radio station, so some may be news, some not)
www.nytimes.com
www.sydsvenskan.se
www.gp.se
www.tv4.se (Also a TV channel, so some may be news, some not)
www.telegraph.co.uk
wwwc.aftonbladet.se
www.theguardian.com
www.bbc.com
svenska.yle.fi
www.dailymail.co.uk
news.google.com
www.reuters.com
www.dagensmedia.se
query.nytimes.com
www.huffingtonpost.com
www.corren.se
www.skanskan.se
www.di.se
I did a test of the five most cited news sites on sv.wp here: https://sv.wikipedia.org/w/index.php?title=Anv%C3%A4ndare:Josve05a/Citoid&oldid=36543513 and I also filled in the links manually to compare what the difference was.
FYI there is a related grant request here, which proposes a tool to help non-technical users create and edit translators: https://meta.wikimedia.org/wiki/Grants:Project/Diegodlh/Web2Cit:_Visual_Editor_for_Citoid_Web_Translators