
Bloomberg is redirecting us to "Are you a robot?" captcha
Open, Needs Triage | Public | BUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

  • Attempt translation of a webpage on www.bloomberg.com

What happens?:
Selection steps that rely on the page's HTML invariably return unexpected output, because the server redirects us to an "Are you a robot?" page.

Citoid selection steps mostly work, but eventually they fail after repeated requests.

What should have happened instead?:
We should minimize redirections to this "Are you a robot?" page.

When I started testing, only XPath selection steps were failing; Citoid selection steps were working OK. I wondered whether sending some of the headers that Citoid/Zotero include in their requests to the server (e.g., user-agent; see T302591) would help prevent or minimize this.
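To make the header idea concrete, here is a minimal sketch of building a request that carries an explicit user-agent header. It uses Python's standard library rather than Web2Cit's own code, and `EXAMPLE_USER_AGENT` and `build_request` are hypothetical names; the header value Citoid/Zotero actually send is not stated in this task, so a placeholder is used.

```python
from urllib.request import Request

# Hypothetical placeholder; the real Citoid/Zotero user-agent value is not given here.
EXAMPLE_USER_AGENT = "ExampleBot/0.1 (+https://example.org/contact)"


def build_request(url: str, user_agent: str = EXAMPLE_USER_AGENT) -> Request:
    """Return a GET request for `url` that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent}, method="GET")


# Building the request does not contact the server; pass it to
# urllib.request.urlopen() to actually fetch the page.
req = build_request("https://www.bloomberg.com/")
```

Whether this header alone is enough to avoid the "Are you a robot?" page is exactly what would need testing.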

However, Citoid/Zotero requests would eventually start failing too after repeated requests, both when running the Citoid service locally and on Wikimedia's servers: T210871.

I wasn't able to hit the "Are you a robot?" page by repeatedly refreshing the page in my browser, though.

Event Timeline

diegodlh moved this task from To do to Backlog on the Web2Cit-Core board.

This feels related to T290834. In both cases we get an intermediary page, which not only breaks translation but, more significantly, loses the original URL. That is, if the user does not spot the error, they will insert a citation with a completely useless URL.
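One possible way to guard against losing the original URL is sketched below. This is only an illustration, not how Web2Cit currently behaves: `FetchResult`, `is_intermediary` and `citation_url` are hypothetical names, and the marker text is assumed from this task's title, since the actual markup of the block page has not been inspected here.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

# Assumption: the block page contains this phrase (taken from the task title).
ROBOT_MARKER = "are you a robot"


@dataclass
class FetchResult:
    requested_url: str  # URL the user asked to translate
    final_url: str      # URL reached after following redirects
    html: str           # body of the final response


def _normalize(url: str) -> tuple:
    """Reduce a URL to (host, path) so trivial differences are ignored."""
    parts = urlsplit(url)
    return (parts.netloc.lower(), parts.path.rstrip("/"))


def is_intermediary(result: FetchResult) -> bool:
    """Heuristic: we were redirected AND the page shows the robot-check phrase."""
    redirected = _normalize(result.requested_url) != _normalize(result.final_url)
    has_marker = ROBOT_MARKER in result.html.lower()
    return redirected and has_marker


def citation_url(result: FetchResult) -> str:
    """Keep the requested URL when the final page looks like a robot check."""
    if is_intermediary(result):
        return result.requested_url
    return result.final_url


# Usage with made-up values (the redirect target below is hypothetical):
blocked = FetchResult(
    requested_url="https://www.bloomberg.com/news/example-article",
    final_url="https://www.bloomberg.com/hypothetical-robot-check",
    html="<html><body><h1>Are you a robot?</h1></body></html>",
)
print(citation_url(blocked))
```

Requiring both a redirect and the marker phrase is a deliberately conservative choice, so that an ordinary article that merely mentions robots is not flagged; a block page served without a redirect would slip through this check.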

Confirm whether this still happens on the first attempt now that we include a user-agent header.