citoid/zotero creates citations without title for makorrishon.co.il, israelhayom.co.il, and news.walla.co.il
Open, Needs Triage, Public

Description

Citations generated from at least three popular Hebrew news websites often lack the title field (כותרת) when VisualEditor creates an automatic footnote from them. These sites are makorrishon.co.il, israelhayom.co.il, and news.walla.co.il. It doesn't happen every time, but I found three URLs where it happens consistently, and it also happens with many other URLs from these sites.

To reproduce:

Expected: A <ref> is supposed to be added with the template קישור כללי, and the parameter כותרת ("title") is supposed to be filled.

Observed: A <ref> is added with the template קישור כללי, and without the parameter כותרת.

See this page for examples: https://he.wikipedia.org/wiki/User:Amire80/test-citoid-empty-title

I'm not sure whether it is a bug somewhere in the Citoid/VE/Cite pipeline, or a misconfiguration in the Hebrew Wikipedia. Any help will be appreciated. Thanks! :)

Meta-comments:

  • This is somewhat similar to T241293, but maybe the issue is different, and the examples here are, hopefully, more detailed. But if they are the same, feel free to merge them.
  • I'm not sure whether the problem is with Citoid, VisualEditor, or Cite, so tagging all of them. Remove what's unnecessary. Thanks!

Event Timeline

If it works with other websites, it's probably a Citoid problem.

Mvolz renamed this task from "Visual Editor adds citation templates without title for makorrishon.co.il, israelhayom.co.il, and news.walla.co.il" to "citoid/zotero creates citations without title for makorrishon.co.il, israelhayom.co.il, and news.walla.co.il". Jul 22 2020, 8:59 AM
Mvolz removed projects: Cite, VisualEditor.

We might also be getting blocked or something, because the first time I ran it I was able to get the metadata, but the second time I wasn't. It's weird, though: when I try to fetch the HTML of these pages locally, it's just a bunch of gibberish, basically a <script> tag inside an <html> tag and that's it.

If I had to guess, it might be a captcha or something, but I don't know for sure, because we're not a browser and can't interpret what the script does!

Anyway, I think we can chalk this up to the website not liking that we're not a browser, regardless of whether it's intentional or just unfriendly.
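For anyone who wants to check a saved response by hand, here is a rough sketch of how a "script-only" page like the one described above could be recognized. It is not Citoid or Zotero code; the names `PageProbe` and `looks_like_script_only_page` are made up for illustration, and the heuristic (no <title> text and no visible text outside <script>/<style>) is only an assumption about what such a challenge page looks like.

```python
from html.parser import HTMLParser


class PageProbe(HTMLParser):
    """Records <title> text and whether any text exists outside <script>/<style>."""

    def __init__(self):
        super().__init__()
        self._in = None  # "title", "script", "style", or None
        self.title = ""
        self.has_visible_text = False

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "script", "style"):
            self._in = tag

    def handle_endtag(self, tag):
        if tag == self._in:
            self._in = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in == "title":
            self.title += text
        elif self._in is None:
            # Text that a reader would actually see on the page.
            self.has_visible_text = True


def looks_like_script_only_page(html: str) -> bool:
    """Hypothetical heuristic: no title and no visible text means nothing to scrape."""
    probe = PageProbe()
    probe.feed(html)
    probe.close()
    return not probe.title and not probe.has_visible_text
```

Running `looks_like_script_only_page` on the HTML saved from one of the affected URLs would at least confirm whether the response contains a title at all before it reaches the translator.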

The no-title thing is definitely bad, though; we should probably just show "couldn't make a citation for you" instead.
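A minimal sketch of that suggestion, in Python for illustration only: treat a citation with an empty or missing title as a failure instead of passing it on. The `title` key, the `CitationError` class, and the `require_title` helper are all assumptions made for this example and are not taken from the Citoid codebase.

```python
class CitationError(Exception):
    """Raised when a generated citation is too incomplete to be useful."""


def require_title(citation: dict) -> dict:
    """Return the citation unchanged, or raise if it has no usable title.

    Assumes the citation is a dict with an optional string under "title".
    """
    title = citation.get("title")
    if not isinstance(title, str) or not title.strip():
        raise CitationError("Couldn't make a citation for this URL")
    return citation
```

The caller could catch `CitationError` and show the "couldn't make a citation" message rather than inserting a template with no כותרת parameter.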