Citoid requests for YouTube metadata is giving 429: too many requests HTTP error
Open, Stalled, Lowest, Public

Description

Citoid no longer populates citation information for YouTube links.

Originally reported at enwiki (see https://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(technical)&oldid=961340886#Are_automatic_citations_to_Youtube_videos_broken?)

Steps to reproduce:

  1. Use the Visual Editor to edit a page
  2. Click "Cite"
  3. Enter a YouTube link (e.g. https://www.youtube.com/watch?v=j3prAYfIL28)
  4. Citation generation fails at this point

Expected output: a citation will be generated
Current output: an error message, "We couldn't make a citation for you. You can create one manually using the "Manual" tab above."

Event Timeline

FYI: these are also failing using the legacy toolbar tool (see discussion linked in description)

Xaosflux renamed this task from "Citoid is no longer autfilling information for youtube.com" to "Citoid is no longer populating information for Youtube". Jun 8 2020, 4:02 PM
Xaosflux updated the task description.

@Mvolz - Is this something you could look into?

I've found a few YouTube videos that do work, but it's inconsistent. I'm also getting issues with some newspaper articles, e.g.:

https://www.washingtonpost.com/dc-md-va/2020/06/07/dc-black-lives-matter-defund-police/

Mvolz renamed this task from "Citoid is no longer populating information for Youtube" to "Citoid requests for YouTube metadata is giving 429: too many requests HTTP error". Jun 9 2020, 9:31 AM

We've been blocked, it looks like.

https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2020.06.09/citoid?id=AXKYaGn5sWch-KE77UY1&_g=h@0a3b269

YouTube is giving us a 429; too many requests HTTP error :/

Thanks for finding the issue. Would it be possible to provide a graph or spreadsheet of the number of Citoid requests to YouTube over the past few years? I feel like it would be really interesting for researchers to see, and it may reflect a larger availability of good-quality sources in video (YouTube is something like 95% of all video).

@zhuyifei1999 @Varnent do you have contacts at YouTube / Google for potentially getting citoid whitelisted for this use case?

Sorry, I have COI.

NP, Wikipedia Library is looking into it ^-^

In T254700#6205179, @Mrjohncummings wrote:

Thanks for finding the issue. Would it be possible to provide a graph or spreadsheet of the number of Citoid requests to YouTube over the past few years? I feel like it would be really interesting for researchers to see, and it may reflect a larger availability of good-quality sources in video (YouTube is something like 95% of all video).

Unfortunately we don't keep statistics or logs on this sort of thing. We only log failed requests for debugging purposes.

@Mvolz - Can you tell if we're still getting 429 errors from YouTube? (Also, can you tell me how to look that up in Kibana?)

@Mvolz - And if we are still getting 429 errors, can you let me know all the info that Google would need to fix this on their end? For example, the URLs we are hitting and what IP addresses the requests are coming from.

@kaldari I just tried and it doesn't work; I get the same message: "We couldn't make a citation for you. You can create one manually using the "Manual" tab above."

Here's the URL I tried: https://www.youtube.com/watch?v=zH_ZNzsNLmI

@Mvolz - Can you tell if we're still getting 429 errors from YouTube? (Also, can you tell me how to look that up in Kibana?)

Yup, in search bar search for:

type:citoid AND err_body_internalURI:"https://www.youtube.com/watch?v=zH_ZNzsNLmI"

or whichever video you try.

Still happening: https://logstash.wikimedia.org/app/kibana#/discover?_g=()&_a=(columns:!(_source),index:'logstash-*',interval:auto,query:(query_string:(query:'type:citoid+AND+err_body_internalURI:%22https:%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DzH_ZNzsNLmI%22')),sort:!('@timestamp',desc))

@Mvolz - Do you know what IP addresses the citoid requests to YouTube would be coming from?

More info from Marielle: Citoid's user-agent string is "Citoid (Wikimedia tool; learn more at https://www.mediawiki.org/wiki/Citoid)" but we also need Zotero allowed.

Talked with some folks at YouTube today. They explained that the https://www.youtube.com/watch URLs are intended for human viewers only and not for machine requests. They have another endpoint that we should be able to use to pull the metadata using Citoid. We'll just have to remap the URLs on the fly. They're going to send me some more info by email shortly.

In which case this will require modifications to the Zotero translator; I'm not sure to what extent translators are able to remap URLs. https://github.com/zotero/translators/blob/master/YouTube.js

A brief Google search suggests it requires an API key, so it might not be possible to do this upstream; Zotero doesn't support API keys in general. https://developers.google.com/youtube/v3/docs

@Mvolz - Yes, YouTube suggested we use the video list API, which as you point out requires a Google API key. I've asked them if there's anything else we could use that doesn't require authentication.
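For reference, the "remap the URLs on the fly" idea could look roughly like the sketch below. It pulls the video ID out of a watch URL and builds a request URL for the YouTube Data API v3 `videos.list` endpoint. This is only an illustration, not what Citoid or the Zotero translator actually does: the helper names `extract_video_id` and `build_api_url` are hypothetical, the host list is an assumption, and `api_key` is a placeholder for the Google API key discussed above.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# YouTube Data API v3 "videos.list" endpoint (requires an API key).
API_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"

# Hosts treated as YouTube watch pages in this sketch (an assumption).
WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url):
    """Return the video ID from a /watch URL, or None if it is not one."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host not in WATCH_HOSTS or parsed.path != "/watch":
        return None
    ids = parse_qs(parsed.query).get("v")
    return ids[0] if ids else None


def build_api_url(watch_url, api_key):
    """Map a human-facing watch URL to a videos.list request URL."""
    video_id = extract_video_id(watch_url)
    if video_id is None:
        return None
    # part=snippet asks for title, channel, publish date, and description.
    query = urlencode({"part": "snippet", "id": video_id, "key": api_key})
    return f"{API_ENDPOINT}?{query}"


if __name__ == "__main__":
    print(build_api_url("https://www.youtube.com/watch?v=zH_ZNzsNLmI", "YOUR_API_KEY"))
```

The key still has to be stored and sent somewhere, which is the part that doesn't fit the upstream translator model.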

kaldari claimed this task.

I asked YouTube to whitelist our Zotero user agent string "ZoteroTranslationServer/WMF (mailto:services@lists.wikimedia.org)" and it seems to be working again. Citoid requests for YouTube URLs no longer seem to be returning an error. The data it does return isn't super useful, but it's better than an error:

CiteTB.autoFill({"title":"YouTube","journal":"www.youtube.com"}, 'web', 'url')

I'm still seeing the error in logs; https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2020.07.09/citoid?id=AXMzMjzu3_NNwgAU2dEI&_g=h@44136fa

https://en.wikipedia.org/api/rest_v1/data/citation/mediawiki/https%3A%2F%2Fm.youtube.com%2Fwatch%3Fv%3DklZeS-g0UHo
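For anyone who wants to re-test other videos: the REST URL above is the citation endpoint followed by the output format and the percent-encoded target URL. A minimal sketch of building it, assuming the base path shown in that URL; the helper name `citoid_rest_url` is hypothetical.

```python
from urllib.parse import quote

# Base path copied from the REST URL quoted in this task.
REST_BASE = "https://en.wikipedia.org/api/rest_v1/data/citation"


def citoid_rest_url(target_url, fmt="mediawiki"):
    """Build the citation request URL for target_url.

    safe="" makes quote() encode ':', '/', '?' and '=' as well, so the
    whole target URL fits into a single path segment.
    """
    return f"{REST_BASE}/{fmt}/{quote(target_url, safe='')}"


if __name__ == "__main__":
    print(citoid_rest_url("https://m.youtube.com/watch?v=klZeS-g0UHo"))
```

Opening the printed URL in a browser shows whether the request still fails for that video.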

@Mvolz - Thanks for catching that. I'll follow up with YouTube.

Did you give them the citoid agent string too? I think it needs both.

I'm having the same issue on all YouTube links I've tried, including ones that worked after I originally reported the issue.

@Mvolz - I heard back from YouTube. Apparently, they can't whitelist by user agent, but they can whitelist by IP address. Is there a consistent IP address or IP range that the Zotero requests would come from?

I told them to whitelist 208.80.152.0/22 and 2620:0:860::/46. Hope that's right.
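To sanity-check whether a given source address falls inside those two ranges, something like the following sketch with Python's standard `ipaddress` module should work. The function name `in_allowed_range` and the sample addresses are made up for illustration; only the two prefixes come from this task.

```python
import ipaddress

# The two prefixes given to YouTube in this task.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("208.80.152.0/22"),
    ipaddress.ip_network("2620:0:860::/46"),
]


def in_allowed_range(address):
    """Return True if address (IPv4 or IPv6 string) is inside either prefix."""
    ip = ipaddress.ip_address(address)
    # Membership tests across IP versions simply evaluate to False.
    return any(ip in network for network in ALLOWED_NETWORKS)


if __name__ == "__main__":
    for sample in ("208.80.154.10", "208.80.156.1", "2620:0:861::1"):
        print(sample, in_allowed_range(sample))
    # A /22 covers 1024 IPv4 addresses.
    print(ALLOWED_NETWORKS[0].num_addresses)
```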

Reply from YouTube: "Our Trust&Safety team, which is responsible for whitelisting is requesting a much smaller set of IPs. Would it be possible for you to bind Citoid service to static IPs or a smaller prefix group?"

That is not advisable. It will become a maintenance burden both for us and them and a source of continuous problems and friction between the 2 organizations. It will expose them to how our network is structured and require that they adapt to whatever changes we make to it.

We can provide them with assurances that we very tightly control that IP space and that only sanctioned WMF production applications will reach out from that IP space.

Thanks for the feedback @akosiaris. I've conveyed it to our contact at YouTube. I also mentioned that binding Citoid to a static IP address would probably increase the chances of it being blocked from other websites.

I'm still getting the error. Is there anything I can do as a non-programmer to help?

Hello All,

Yael here, Director of Strategic Partnerships at the Foundation.

First, I want to apologize for the painfully long time it has taken me or anyone on my team to follow up from @Varnent's previous message. At first, the reason for the delay was ongoing (initially hopeful, productive-seeming) conversations with YouTube. More recently, the reason was... 2020 life getting in the way and me dropping the ball on communicating back to you all.

Unfortunately, my update is not what I would have wanted to share. Despite ongoing conversations (involving folks from Partnerships, Product and Legal from both organizations), we were not able to reach any resolution in our discussions with YouTube about this issue, and, unfortunately, I do not expect any changes from them coming in the future.

I'm personally disappointed about this, as I feel we offered them a potential way to work closely with the movement and support our mission with little risk or downside to their business. Unfortunately, they have chosen not to prioritize this at the moment.

I'm sorry I don't have happier news, and thank you to all of you who continued to try to find a solution. Again, my apologies that I haven't been more communicative about this (and thanks to @bd808 for continuing to nudge me).

Please don't hesitate to reach out to me directly at yweissburg@wikimedia.org if you have any questions.

Cheers,

Yael

Yael-weissburg changed the task status from Open to Stalled. Sep 18 2021, 12:03 AM
Yael-weissburg triaged this task as Lowest priority.