Citoid fails to properly process references to zeit.de, picks cookie banner instead
Open, Needs Triage | Public | BUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

  • Use Citoid in the Visual Editor or the 2017 wikitext editor to insert a reference to a zeit.de article into a Wikipedia article

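What happens?:

  • Citoid picks up the cookie banner page instead of the article, so the reference is filled with the banner's metadata.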

What should have happened instead?:

  • The metadata for the selected web page should appear in the reference instead.

Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc:

  • German Wikipedia, as of today.

I would like to add that I have personal access to Zeit Online (zeit.de) through The Wikipedia Library and I would like to use zeit.de more easily on Wikipedia.

Cookie banners like the one Citoid picks up instead of the actual article appear on other websites as well. They are common throughout the EU because of European data protection law (GDPR), so it should be possible to make Citoid work with zeit.de too.

Thanks in advance!

Event Timeline

I think this bug report could do with a bit of attention, couldn't it? Thanks.

Currently zeit.de redirects the reader, and therefore Citoid, to a page where the reader can choose between reading with ads or paying for the content.

The JSON object Citoid returns for a zeit.de URL:

  {
    "0": {
      "key": "MYN2NSQS",
      "version": 0,
      "itemType": "webpage",
      "url": "https://www.zeit.de/zustimmung?url=https%3A%2F%2Fwww.zeit.de%2Fwirtschaft%2F2022-06%2Ferdbeeren-ernte-verkauf-preis",
      "title": "ZEIT ONLINE | Lesen Sie zeit.de mit Werbung oder im PUR-Abo. Sie haben die Wahl.",
      "abstractNote": "",
      "accessDate": "2022-06-05",
      "websiteTitle": "www.zeit.de",
      "source": [
        "Zotero"
      ]
    }
  }

Translation for the title: "ZEIT ONLINE | Read zeit.de with advertising or with a PUR subscription. The choice is yours."
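The returned "url" field still carries the requested article address in its url query parameter, so it can be recovered. A minimal Python sketch, assuming the consent page always lives at /zustimmung and wraps the target in a url parameter as in the response above (the function name unwrap_consent_url is hypothetical and not part of Citoid):

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Assumption: the consent page path and the "url" query parameter are taken
# from the single response shown above; zeit.de may change them at any time.
CONSENT_PATH = "/zustimmung"


def unwrap_consent_url(url: str) -> Optional[str]:
    """Return the wrapped article URL if `url` is a consent page, else None."""
    parts = urlsplit(url)
    if parts.path.rstrip("/") != CONSENT_PATH:
        return None
    # parse_qs percent-decodes the values for us.
    targets = parse_qs(parts.query).get("url")
    return targets[0] if targets else None


returned = (
    "https://www.zeit.de/zustimmung"
    "?url=https%3A%2F%2Fwww.zeit.de%2Fwirtschaft%2F2022-06%2F"
    "erdbeeren-ernte-verkauf-preis"
)
print(unwrap_consent_url(returned))
```

This only recovers the address; fetching the real metadata would still require getting past the consent page.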

grafik.png (445×681 px, 42 KB)

Possible solutions:

  1. Improve Citoid so that it recognizes such intermediate consent pages and simulates a click to get to the real page. In Firefox I have installed an add-on for this: https://www.i-dont-care-about-cookies.eu/
  2. Give the user a warning that Citoid cannot fetch the real content
  3. Block the creation of such a reference
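Options 2 and 3 could be sketched roughly as follows in Python. This is only an illustration, assuming a citation dict shaped like Citoid's response with a "url" field; the names same_page and check_citation are hypothetical, and comparing host and path is a simplification that would also flag harmless redirects (for example from zeit.de to www.zeit.de):

```python
from urllib.parse import urlsplit


def same_page(requested: str, returned: str) -> bool:
    """Compare two URLs by host and path, ignoring scheme, query and fragment."""
    a, b = urlsplit(requested), urlsplit(returned)
    return (a.hostname, a.path.rstrip("/")) == (b.hostname, b.path.rstrip("/"))


def check_citation(requested_url: str, citation: dict, block: bool = False) -> str:
    """Return "ok", "warn" or "block" for a Citoid-style citation dict."""
    if same_page(requested_url, citation.get("url", "")):
        return "ok"
    # The service ended up on a different page than the one requested,
    # for example a consent or paywall page.
    return "block" if block else "warn"


requested = "https://www.zeit.de/wirtschaft/2022-06/erdbeeren-ernte-verkauf-preis"
citation = {
    "url": "https://www.zeit.de/zustimmung?url=https%3A%2F%2Fwww.zeit.de"
           "%2Fwirtschaft%2F2022-06%2Ferdbeeren-ernte-verkauf-preis",
}
print(check_citation(requested, citation))              # option 2: warn the user
print(check_citation(requested, citation, block=True))  # option 3: block the reference
```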

This also affects Web2Cit (a tool to collaboratively work around automatic citation problems), both where it relies on Citoid (i.e., Citoid selection steps) and where it relies on the webpage's HTML (i.e., XPath selection steps). I hope it's OK that I add the Web2Cit-Core tag too.

Edit: Curiously, when a user agent is not included in the request headers (which is what Web2Cit currently does, although it shouldn't: T302591), the intermediary page is not shown:

  • cURL request with mock user agent header: redirect to intermediary page: curl -L -H "User-Agent: Web2Cit" https://www.zeit.de/wirtschaft/2021-09/bundesfinanzministerium-razzia-geldwaesche-strafvereitelungpezialeinheit-ermittlungen-bundesjustizministerium
  • cURL request without user agent header: expected page is loaded: curl -L https://www.zeit.de/wirtschaft/2021-09/bundesfinanzministerium-razzia-geldwaesche-strafvereitelungpezialeinheit-ermittlungen-bundesjustizministerium