
Zotero translator needed to get correct author for Condé Nast requests
Open, Needs Triage, Public, BUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

What happens?:
https://en.wikipedia.org/w/index.php?title=&diff=prev&oldid=1058626070

Condé Nast is a global mass media company, not a person.

What should have happened instead?:
Insert "Annabel Sampson" for the Tatler article and "Brianna Wiest" for the Teen Vogue article.

Event Timeline

Unfortunately, this is a case where you would have to write a custom translator for Zotero (see https://www.mediawiki.org/wiki/Citoid#My_favourite_site_isn't_recognised_by_citoid_and_only_gets_basic_information) for these sites.

What's happening here is that all those pages contain the following metadata tag:

<meta name="author" content="Condé Nast">

Without a specific translator pointing Zotero to the right place in the HTML to pull the author from, Zotero defaults to using those metadata tags.
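To illustrate the fallback described above, here is a minimal Python sketch. It is not Zotero's actual code (Zotero translators are written in JavaScript); the `AuthorMetaParser` class, the `generic_author` function, and the byline markup in the sample page are all hypothetical. It only shows why a generic reader of `<meta name="author">` ends up with the publisher rather than the writer.

```python
from html.parser import HTMLParser


class AuthorMetaParser(HTMLParser):
    """Collect the content of every <meta name="author"> tag."""

    def __init__(self):
        super().__init__()
        self.authors = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("name") == "author" and attributes.get("content"):
            self.authors.append(attributes["content"])


def generic_author(html):
    """Return the first generic author value, or None if the page has none."""
    parser = AuthorMetaParser()
    parser.feed(html)
    return parser.authors[0] if parser.authors else None


# Hypothetical page: only the meta tag comes from the report above;
# the byline markup is invented for illustration.
SAMPLE_PAGE = """
<html><head>
<meta name="author" content="Condé Nast">
</head><body>
<span class="byline">By Annabel Sampson</span>
</body></html>
"""

print(generic_author(SAMPLE_PAGE))  # prints: Condé Nast
```

The real byline sits elsewhere in the page, which is why a site-specific translator (or an equivalent site-specific rule) is needed to find it.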

Mvolz renamed this task from Citoid incorrectly inserts "Condé Nast" as author in references to Zotero translator for Tatler/Teen Vogue request. Jan 6 2022, 1:30 PM
Mvolz renamed this task from Zotero translator for Tatler/Teen Vogue request to Zotero translator needed to get correct author for Tatler/Teen Vogue requests.
Mvolz moved this task from Backlog to Zotero on the Citoid board.
GoingBatty renamed this task from Zotero translator needed to get correct author for Tatler/Teen Vogue requests to Zotero translator needed to get correct author for Condé Nast requests. Jan 6 2022, 2:58 PM

@Mvolz - The issue extends to other Condé Nast publications. For example:

Plus:

  • Allure
  • Epicurious
  • Lenny Letter

With support from a Wikimedia Foundation grant we are currently developing Web2Cit, a tool to collaboratively fix cases like this, requiring much less technical skills than those required to write a Zotero translator. Of course writing a Zotero translator will continue to be the best, most robust choice, but using Web2Cit may help in cases where a JavaScript programmer may not be available, or until a proposed translator has been approved by the Zotero community.

The tool is still under development, but some features are ready to try. You may read our introduction guide here, or join the session we will be hosting at the Wikimedia Hackathon this Friday (see T308449).

By chance, earlier today I saw that someone had defined a Web2Cit translation template for www.newyorker.com, which seems to be another Condé Nast publication. The screenshot below shows the Web2Cit user script in action, which tweaks the visual editor to use Web2Cit in addition to Citoid. It shows two citations automatically generated for the target URL https://www.newyorker.com/news/letter-from-the-southwest/coffeezilla-the-youtuber-exposing-crypto-scams: the one at the top was built using the Citoid response; the one at the bottom was built using the Web2Cit response (i.e., using collaboratively defined translation procedures). Note that both the item type and the author metadata are fixed:

Screenshot from 2022-05-16 18-01-14.png (442×532 px, 49 KB)

Note that Web2Cit may not support all citation fields supported by Citoid. A list of currently supported fields is available here. As a result, some metadata returned by Citoid may be dropped by Web2Cit (e.g., ISSN). This was not the case in the example shown above.

@diegodlh Looks interesting! I watched both the "Short video explaining how Web2Cit works" and the "Short video explaining how to use Web2Cit". In the latter video, the field you demonstrate around the 6:00 mark doesn't require a transformation field. Then at 6:30, you instruct us to repeat the process for fields that do require the transformation field. But then you speed through it without allowing us to see the details?!?!?!? This is the most important part, so please consider editing your video to show the details of extracting the info needed for the transformation field from the web page. Thanks!

Hi, @GoingBatty. You are right that I shouldn't have fast-forwarded that. I meant it to be a short introduction video, but I should have made available the full video too. I will upload a new version including the fast-forwarded part at the end (T308586).

Nonetheless, in this Friday's session (T308449) we will be covering this in more detail. I hope you can join! If participants agree, we will record it and make it available later on. (Edit: we won't be able to record it, I'm afraid; we will try to record one soon)

In the meantime, you may read about the transformation steps we currently support here.

You may also check the templates configuration for www.newyorker.com:

  1. Open the translation results for any webpage in that domain. For example: https://web2cit.toolforge.org/https://www.newyorker.com/news/the-political-scene/pennsylvania-republican-primaries-trump-dr-oz-mccormick-barnette-mastriano
  2. Click "edit" beneath "Translation output"

In this case, a combination of XPath selection + Split & Range transformations was used to extract author metadata.
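As a rough illustration of what "XPath selection + Split & Range" means, here is a minimal Python sketch. It is not Web2Cit's implementation, and its configuration syntax may differ. The function names, the sample markup, the `|` separator, and the name "Jane Doe" are all hypothetical. The sketch uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and needs well-formed markup.

```python
import xml.etree.ElementTree as ET


def xpath_select(markup, expression):
    """Return the text of every element matched by a (limited) XPath expression."""
    root = ET.fromstring(markup)
    return ["".join(node.itertext()).strip() for node in root.findall(expression)]


def split_step(values, separator):
    """Split each value on the separator and flatten the pieces into one list."""
    pieces = []
    for value in values:
        pieces.extend(part.strip() for part in value.split(separator))
    return [piece for piece in pieces if piece]


def range_step(values, start, stop=None):
    """Keep the items from index start up to, but not including, stop."""
    return values[start:stop]


# Hypothetical, well-formed snippet standing in for a real article page.
SAMPLE = '<div><p class="byline">By | Jane Doe | May 16, 2022</p></div>'

selected = xpath_select(SAMPLE, ".//p[@class='byline']")
pieces = split_step(selected, "|")
authors = range_step(pieces, 1, 2)
print(authors)  # prints: ['Jane Doe']
```

The selection step narrows the page down to one string, the split step breaks that string into candidate values, and the range step keeps only the positions that hold the author.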

Coming up with XPath expressions to configure XPath selection steps is probably the most technical aspect of using Web2Cit right now. We hope we will be able to provide a simplified way when the Web2Cit-Editor is available. Alternatively, we also plan to support CSS selection. In the meantime, you may use your browser's inspector to get a weak XPath and tweak it manually, or leave those template fields empty for other Web2Cit contributors to help you with them.

Thank you for your interest! Feel free to start a new thread in our discussion page, or create a separate task in Phabricator using tag Web2Cit if you have any questions or comments :)

With support from a Wikimedia Foundation grant we are currently developing Web2Cit, a tool to collaboratively fix cases like this, requiring much less technical skills than those required to write a Zotero translator.

That's BRILLIANT!

There's an issue though, I tried to edit https://web2cit.toolforge.org/translate?url=https://www.independent.ie/life/food-drink/table-talk/a-day-in-the-life-of-chef-clodagh-mckenna-31196664.html and landed on https://meta.wikimedia.org/?action=submit resulting in a 405 "method not allowed" error. I had to enter web developer mode to add the missing /w/index.php. This is very impractical. At that point I was able to create https://meta.wikimedia.org/wiki/Web2Cit/data/ie/independent/www/templates.json. I don't see a difference in the output though, maybe changes are not instant due to vetting or cache or something?

Now I try to edit https://meta.wikimedia.org/wiki/User:Alexis_Jazz_T46787/Web2Cit/data/ie/independent/www/templates.json from the web2cit interface (after switching into sandbox mode) and it says:

Starting with an empty form
TypeError: page.revisions is undefined

And when I try to edit (after, once again, adding /w/index.php), I find that my username should not have been percent-encoded, because I am presented with a "bad title" error.
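For reference, a minimal Python sketch of how a well-formed edit link might be built. This is not Web2Cit's code, and the helper names are hypothetical. The reversed-hostname page title is inferred from the single example quoted above, and the thread does not say what actually caused the "bad title" error. Encoding a title twice is shown only as one common way percent-encoding goes wrong.

```python
from urllib.parse import quote, urlencode

WIKI_BASE = "https://meta.wikimedia.org"


def storage_title(hostname, prefix="Web2Cit/data"):
    """Reverse the hostname labels, following the page title quoted above."""
    labels = hostname.split(".")
    return "/".join([prefix, *reversed(labels), "templates.json"])


def edit_url(title):
    """Build an edit URL that goes through /w/index.php and encodes the title once."""
    query = urlencode({"title": title, "action": "edit"}, quote_via=quote, safe="/:")
    return f"{WIKI_BASE}/w/index.php?{query}"


title = storage_title("www.independent.ie")
print(title)  # prints: Web2Cit/data/ie/independent/www/templates.json
print(edit_url(title))

# Encoding an already-encoded value escapes the "%" itself,
# so the wiki would see a different title.
once = quote("Example User")
twice = quote(once)
print(once, twice)  # prints: Example%20User Example%2520User
```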

I'm sure these issues will be worked out though. They're not unlike the kind of issues I have at the start of a big project. As such, I added web2cit support to Bawl.

Thank you, @AlexisJazz! I've opened a separate task to discuss this, so we don't continue detouring from the topic of this task (i.e., fixing Citoid response for Condé Nast publications). I know it was me who started it, sorry: T309310

Just a heads-up: the issue has been fixed upstream in the Zotero translator (https://github.com/zotero/translators/pull/3287), so it should be fixed for Citoid as well once Citoid and the Zotero translation server are updated with the latest translators.

@Yeeno - That's great news! I run a bot to fix articles with this error regularly (including 26 errors today), so it will be wonderful to see this go away.