
Make use of Wikidata items to define if a URL should be using {{Cite web}} or {{Cite news}} and enhancing other fields in the citation template
Open, Needs Triage | Public | Feature

Description

Feature summary (what you would like to be able to do and where):
After a URL has been processed by the Zotero service, Citoid should consult the relevant Wikidata item to determine whether {{Cite web}} or {{Cite news}} should be used for the citation template.

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):
Currently, when https://www.bxtimes.com/orchard-beach-nature-center-reopens-after-2-35m-renovation-officials-hold-ribbon-cutting-event/ is processed, Citoid uses {{Cite web}} as the citation template.

Benefits (why should this be implemented?):
Ideally, it should be {{Cite news}} as bxtimes.com is a newspaper.

However, the citation defaults to {{Cite web}} for itemType=article. This happens because bxtimes.com (at the time of filing this task) does not yet have a dedicated Zotero translator, so Zotero's default translator falls back to "article".
Conversely, for https://www.straitstimes.com/singapore/politics/tel-stage-4-from-tanjong-rhu-to-bayshore-to-open-for-passenger-service-on-june-23, Zotero returns newspaperArticle as the item type because a translator exists for the site (written by yours truly; it needs an update, though), so Citoid returns the citation as {{Cite news}}.

Short of writing translators for every single newspaper in the world, we can make use of Wikidata items. From the Wikidata item, we can identify whether the site is a website, newspaper, magazine, etc. (check against https://aurimasv.github.io/z2csl/typeMap.xml for other possible type mappings), and also fill in other fields, such as linking to the wiki article if one exists on that wiki, and setting the article/website language, place of publication, ISSN/ISBN numbers, etc.

A possible process

  1. Check if there is a translator. If there isn't one, proceed. If there is one, chances are that certain sets of URLs on the newspaper site have deliberately been given a different type, e.g. an About us page of the site is not a newspaperArticle.
  2. Query Wikidata for an item whose official website (Property:P856) ~= bxtimes.com. This should return https://www.wikidata.org/wiki/Q4974211
  3. From the Wikidata item, extract the relevant fields to enrich the respective values (example):
    1. If P31 (instance of) = newspaper, use {{Cite news}} template; magazine, use {{Cite magazine}}, etc.
    2. If P1476 (title) is set, set |work=<title>, in this case |work=Bronx Times-Reporter
    3. If P407 (language of work) is set, set |language=<language>, in this case |language=English (but on enwiki, do not set it, since references are assumed to be in English unless otherwise specified)
    4. If there is an article about the publication on that wiki, e.g. [[Bronx Times-Reporter]] on enwiki, set |work=[[<link>|<title>]], in this case [[Bronx Times-Reporter]] since the link and the title have the same value.
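
The steps above can be sketched roughly as follows. This is only an illustration, not how Citoid works today: the function names (`candidate_site_urls`, `build_sparql`, `enrich`) and the simplified item dictionary are assumptions made for this example, and the "~=" match on P856 is approximated by trying a few common URL variants of the host. It assumes the P31 and P407 values have already been resolved to plain labels and language codes.

```python
from urllib.parse import urlparse

# Hypothetical mapping from a resolved P31 (instance of) label to a template.
TEMPLATE_BY_TYPE = {"newspaper": "Cite news", "magazine": "Cite magazine"}


def candidate_site_urls(url):
    """Return common spellings of the site's root URL to match against P856."""
    host = urlparse(url).hostname or ""
    bare = host[4:] if host.startswith("www.") else host
    return [
        f"{scheme}://{prefix}{bare}{slash}"
        for scheme in ("https", "http")
        for prefix in ("www.", "")
        for slash in ("/", "")
    ]


def build_sparql(url):
    """Build a SPARQL query for items whose official website (P856) matches."""
    values = " ".join(f"<{u}>" for u in candidate_site_urls(url))
    return f"SELECT ?item WHERE {{ VALUES ?site {{ {values} }} ?item wdt:P856 ?site . }}"


def enrich(item, wiki_lang="en"):
    """Pick a template and extra parameters from a simplified item dict.

    `item` is an assumed, pre-digested structure (not the raw Wikidata JSON):
    {"instance_of": [...], "title": str, "language": {"code", "label"},
     "sitelinks": {"enwiki": "Page title", ...}}
    """
    template = "Cite web"  # fallback when no known type is found
    for type_label in item.get("instance_of", []):
        if type_label in TEMPLATE_BY_TYPE:
            template = TEMPLATE_BY_TYPE[type_label]
            break

    params = {}
    title = item.get("title")  # P1476
    sitelink = item.get("sitelinks", {}).get(f"{wiki_lang}wiki")
    if sitelink and title and sitelink != title:
        params["work"] = f"[[{sitelink}|{title}]]"
    elif sitelink:
        params["work"] = f"[[{sitelink}]]"
    elif title:
        params["work"] = title

    language = item.get("language")  # P407
    if language and language["code"] != wiki_lang:
        # Skip the language when it matches the wiki's own language.
        params["language"] = language["label"]

    return template, params
```

For the bxtimes.com example, an item with `instance_of` containing "newspaper", the title "Bronx Times-Reporter" and an enwiki sitelink of the same name would give `Cite news` with `|work=[[Bronx Times-Reporter]]` and no `|language=` on enwiki.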

Event Timeline

This seems like an excellent idea to me!

On further consideration, this should be layered with a cache somewhere to prevent an excessive number of API calls being made to Wikidata on the fly. This could be done either in the same vein as T369928, or with a JSON file in the MediaWiki namespace on wiki containing key-value pairs that map the most frequently cited domains (for a start) to the relevant citation template. The JSON file could then be updated by the community as and when required/requested, independently of Citoid server/extension development.
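
As a rough illustration of the JSON option, the lookup could be as simple as the sketch below. The JSON shape (domain as key, template name as value), the sample entries, and the function name `template_for` are assumptions for this example only; no such page exists yet.

```python
import json
from urllib.parse import urlparse

# Assumed shape of the community-maintained JSON: domain -> template name.
SAMPLE_JSON = '{"bxtimes.com": "Cite news", "straitstimes.com": "Cite news"}'


def template_for(url, mapping, default=None):
    """Look up the template for a URL, trying the host and its parent domains."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Try "www.bxtimes.com", then "bxtimes.com"; never the bare TLD.
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in mapping:
            return mapping[candidate]
    return default


mapping = json.loads(SAMPLE_JSON)
```

A miss (the `default`) would then fall through to the live Wikidata lookup, so only uncached domains cost an API call.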