Citoid-created references for YouTube videos only show title, not channel or release date
Open, Needs Triage, Public

Description

Something is going wrong with the references Citoid produces for YouTube videos on English Wikipedia: the reference only shows the title of the video. It should also include:

  • That it is on YouTube
  • The name of the channel (e.g. BBC, Guardian, etc.)
  • The date published
  • Length (although that seems less important)

Example link: https://www.youtube.com/watch?v=5iSd8OBYOMM
Citoid output: https://en.wikipedia.org/api/rest_v1/data/citation/mediawiki/https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3D5iSd8OBYOMM
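
For anyone wanting to reproduce this with other videos, here is a minimal Python sketch of how the Citoid output URL above is formed: the target URL is percent-encoded in full and appended to the REST endpoint. The helper name `citoid_url` and the `fmt` parameter are made up for illustration; only the endpoint and the `mediawiki` format come from the example above.

```python
from urllib.parse import quote

# Endpoint taken from the "Citoid output" URL in this task.
CITOID_BASE = "https://en.wikipedia.org/api/rest_v1/data/citation"


def citoid_url(target: str, fmt: str = "mediawiki") -> str:
    """Build the Citoid REST URL for a target page (hypothetical helper)."""
    # safe="" makes quote() percent-encode ':', '/', '?' and '=' as well,
    # so the whole target URL fits into a single path segment.
    return f"{CITOID_BASE}/{fmt}/{quote(target, safe='')}"


print(citoid_url("https://www.youtube.com/watch?v=5iSd8OBYOMM"))
```

Fetching that URL returns the JSON that the citation templates are filled from, which is where the missing fields can be checked.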

Event Timeline

YouTube has a Zotero translator at https://github.com/zotero/translators/blob/master/YouTube.js, which configures title, url, duration, date, author and description. Most of these should be returned by Citoid; the exceptions are description and duration, which are not configured in the en:Template:Citation TemplateData Citoid mapping.
Looking at ZoteroBib, it returns mostly the same information as Citoid, except that it also shows the website. For some reason Citoid is excluding that.

I'm not sure how Zotero returns the channel name and date published, but Citoid, at least, is not returning them. The fact that the video is on YouTube can be retrieved from the "libraryCatalog": "www.youtube.com" attribute.

Thanks, who might know how to fix this?

Potentially we could switch it from using Template:Citation to Template:Cite AV media and configure it to use the libraryCatalog as the work. I don't advise adding the library catalog to "website" on Template:Citation, as this will have side effects for other, non-YouTube things. Of course, I'm not sure adding it for videos is great either. Type videoRecording doesn't have a website field in general; these are the available fields: https://aurimasv.github.io/z2csl/typeMap.xml#map-videoRecording

Re: duration - we have that; there just isn't a place to put it in Template:Citation / Template:Cite AV media as far as I know. There's "time", but that's for referencing a point in time, I think.

Re: failure to get author - this could be reported at https://github.com/zotero/translators/issues and potentially be fixed. It looks like there is code in there to set the channel as author: https://github.com/zotero/translators/blob/master/YouTube.js#L106 But I think it's scraping JavaScript-loaded HTML, which won't be available to us because we are not a browser and can't execute JavaScript (Zotero used to be, but hasn't been for several years now). I did find some schema.org metadata that the translator could use instead and that would be available to us, so potentially that's one fix that could go upstream.
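
A rough sketch of what reading schema.org metadata from the static HTML could look like, written in Python rather than the translator's JavaScript, and assuming the metadata is embedded as microdata (`itemprop` attributes); the comment above does not say which form it takes. The sample markup, the names `SchemaOrgProps` and `extract_props`, and the dotted key format are all invented for illustration and are not taken from a real YouTube page or from the translator.

```python
from html.parser import HTMLParser

# Tags that carry their value in an attribute and never get a closing tag.
VOID = {"meta", "link"}


class SchemaOrgProps(HTMLParser):
    """Collect itemprop values from <meta>/<link> tags, keyed by nesting."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._stack = []  # (tag, itemprop or None) for open non-void elements

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        prop = attr.get("itemprop")
        if tag in VOID:
            value = attr.get("content") or attr.get("href")
            if prop and value:
                # Prefix with enclosing itemprops, e.g. "author.name".
                scope = [p for _, p in self._stack if p]
                self.props.setdefault(".".join(scope + [prop]), value)
        else:
            self._stack.append((tag, prop))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed elements.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break


def extract_props(html: str) -> dict:
    parser = SchemaOrgProps()
    parser.feed(html)
    return parser.props


# Invented sample markup, NOT copied from a real page.
SAMPLE = """
<div itemscope itemtype="http://schema.org/VideoObject">
  <meta itemprop="name" content="Example video title">
  <span itemprop="author" itemscope itemtype="http://schema.org/Person">
    <link itemprop="name" content="Example Channel">
  </span>
</div>
"""

print(extract_props(SAMPLE))
```

The point is only that nothing here needs JavaScript execution: if the channel name is present in the served HTML in some such form, a non-browser client can read it.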

The date, unfortunately, seems to be loaded entirely by JavaScript. It might not be possible to get it using HTML scraping.

Thanks very much. I guess "time" could be very useful for letting people add the point in the video that the reference relies on. Is this possible or helpful?

@Mvolz I'm just coming back to this as I've tried to cite some news channels on YouTube again and it is still quite bad. Is the backend stuff working better now? Is there any way to provide a list of the data you get from YouTube, just to see whether the existing fields can be used better? According to Wikipedia, YouTube is the second most used website in the world, so it's a real issue if the Cite tool creates poor-quality citations for Wikipedia. https://en.wikipedia.org/wiki/List_of_most-visited_websites

Also, which team at WMF would this fall under?

I have tried to report the bug in Zotero but have absolutely no idea what I'm doing, and the note in the new issue form basically says don't report issues here. Please can you suggest a way forward for this issue :)

While we're on this topic, people really, really, really need to stop doing |publisher=YouTube. Unless you're citing something like YouTube's own terms-of-use policy (something actually published by YouTube), it's |via=YouTube, and the |publisher= (if not omitted as often completely redundant with the name of the channel, which belongs in |work=, with the specific video title in |title=) must be the name of the entity that editorially controls and originated the content. YouTube is just a conduit/platform/carrier. Doing |publisher=YouTube is like saying that your mobile service provider owns and originated and has editorial control over your phone conversations with your sister.

Citoid and any related tool being worked on here should comply with this, and not generate a false |publisher=YouTube in citation code.
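
To make the convention above concrete, here is a minimal Python sketch that turns a Zotero-style item into a Template:Cite AV media call with |via= and never emits |publisher=. It is not how Citoid or the TemplateData mapping actually works; the `PLATFORMS` lookup, the `channel` field and the function name are hypothetical, and only `libraryCatalog` with the value "www.youtube.com" comes from this thread.

```python
# Hypothetical lookup from Zotero's libraryCatalog value to a platform name.
PLATFORMS = {"www.youtube.com": "YouTube"}


def cite_av_media(item: dict) -> str:
    """Render a Zotero-style item as {{cite AV media}} wikitext (sketch)."""
    params = {
        "title": item.get("title"),
        "url": item.get("url"),
        "work": item.get("channel"),  # hypothetical field: the channel name
        "date": item.get("date"),
        # The platform is the carrier, so it goes in |via=, not |publisher=.
        "via": PLATFORMS.get(item.get("libraryCatalog", "")),
    }
    # Skip empty values. Note: '|' inside a value is not escaped here.
    body = " ".join(f"|{key}={value}" for key, value in params.items() if value)
    return "{{cite AV media " + body + "}}"


item = {
    "title": "Example video title",
    "url": "https://www.youtube.com/watch?v=5iSd8OBYOMM",
    "libraryCatalog": "www.youtube.com",
    "channel": "Example Channel",
}
print(cite_av_media(item))
```

With the made-up item above this prints a citation carrying |work=Example Channel and |via=YouTube; when the channel or date is missing, those parameters are simply left out rather than replaced with the platform name.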