
Web2Cit configuration for www.independent.ie
Closed, Resolved, Public

Description

Creating separate task for comments posted by @AlexisJazz on T298427:

There's an issue though, I tried to edit https://web2cit.toolforge.org/translate?url=https://www.independent.ie/life/food-drink/table-talk/a-day-in-the-life-of-chef-clodagh-mckenna-31196664.html and landed on https://meta.wikimedia.org/?action=submit resulting in a 405 "method not allowed" error. I had to enter web developer mode to add the missing /w/index.php. This is very impractical. At that point I was able to create https://meta.wikimedia.org/wiki/Web2Cit/data/ie/independent/www/templates.json. I don't see a difference in the output though, maybe changes are not instant due to vetting or cache or something?

Now I try to edit https://meta.wikimedia.org/wiki/User:Alexis_Jazz_T46787/Web2Cit/data/ie/independent/www/templates.json from the web2cit interface (after switching into sandbox mode) and it says:

Starting with an empty form
TypeError: page.revisions is undefined

And when I try to edit (after, once again, adding /w/index.php) I find out you shouldn't have percent-encoded my username, as I am presented with a "bad title" error.

I'm sure these issues will be worked out though. They're not unlike the kind of issues I have at the start of a big project. As such, I added web2cit support to Bawl.

I'll reply to them below in a minute.

Event Timeline

Thank you very much for your interest in Web2Cit, for helping us test it, and for reporting the issues you found! It's very helpful for us.

landed on https://meta.wikimedia.org/?action=submit resulting in a 405 "method not allowed" error

Thanks for reporting this. There must have been a change in Wikimedia servers recently, because this was working a few days ago. But definitely yes, it should be https://meta.wikimedia.org/w/index.php?action=submit, not https://meta.wikimedia.org/?action=submit. I'll create a separate task and fix that ASAP: T309320
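For illustration only, here is a minimal sketch of building the edit URL so that it always goes through /w/index.php. This is not Web2Cit's actual code; the function name `build_edit_url` and the choice of passing the page title as a `title` query parameter are assumptions made for the example.

```python
from urllib.parse import urlencode


def build_edit_url(wiki_origin: str, title: str) -> str:
    """Return an edit URL that explicitly targets /w/index.php.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Keep "/" and ":" readable in the title; everything else is encoded.
    query = urlencode({"title": title, "action": "submit"}, safe="/:")
    return f"{wiki_origin}/w/index.php?{query}"


edit_url = build_edit_url(
    "https://meta.wikimedia.org",
    "Web2Cit/data/ie/independent/www/templates.json",
)
print(edit_url)
```

Spelling out the script path avoids relying on the server to map the bare origin to index.php.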

At that point I was able to create https://meta.wikimedia.org/wiki/Web2Cit/data/ie/independent/www/templates.json. I don't see a difference in the output though, maybe changes are not instant due to vetting or cache or something?

From what I see in the original revision of www.independent.ie's templates configuration file, Web2Cit was likely ignoring the translation template you had configured because it lacked mandatory template fields itemType and title. These template fields are mandatory and Web2Cit will ignore translation templates that do not include both of them. This is explained in the information box that pops up in the configuration file editor, next to the Fields property title (though I acknowledge it may be somewhat hidden in this initial Web2Cit-Editor version):

image.png (571×981 px, 96 KB)

Because of that, Web2Cit was using its default fallback template to translate the target webpage, which simply uses Citoid's response for all Web2Cit-supported fields. That's why you were not seeing a difference in the output. You can tell that Web2Cit was using the fallback template from the translation results page: "Translation result using fallback template".

Make sure you include in your translation template all the fields that you want an output for. If you want to reuse Citoid's response for a field, explicitly say so by using the Citoid selection step (provided by default). Try adding itemType and title fields to your template. Use the default procedure for both. This should fix the "I don't see a difference in the output" part of the problem.
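The mandatory-field rule can be sketched roughly as follows. The template layout used here (a `fields` list whose entries carry a `fieldname` key) is an assumption made for the example and may not match the real templates.json schema; `missing_mandatory_fields` and `template_in_use` are hypothetical names.

```python
MANDATORY_FIELDS = {"itemType", "title"}


def missing_mandatory_fields(template: dict) -> set:
    """Return the mandatory field names that the template does not define.

    Assumes a hypothetical layout: {"fields": [{"fieldname": ...}, ...]}.
    """
    present = {field.get("fieldname") for field in template.get("fields", [])}
    return MANDATORY_FIELDS - present


def template_in_use(template: dict) -> str:
    """Say which template would be applied under the rule described above."""
    return "fallback" if missing_mandatory_fields(template) else "custom"


# A template that only overrides publishedIn is ignored in favour of the fallback.
only_published_in = {"path": "/", "fields": [{"fieldname": "publishedIn"}]}
print(sorted(missing_mandatory_fields(only_published_in)))
print(template_in_use(only_published_in))
```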

I see you then manually changed the templates file, which unfortunately made matters worse, as you ended up with an invalid JSON file (there is an extra comma at the end of the selections array). JSON files are complex and we are likely to make mistakes when manually editing them. Even more so for JSON files in our main storage, which do not have the JSON editor available (see T305571). If possible, please use the configuration file editor instead. I've just removed that extra comma to make it valid JSON again.
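As a quick way to catch this kind of mistake before saving a hand-edited file, a strict JSON parser can be run over the text first. The sketch below uses Python's standard json module; the sample snippet is made up for illustration and is not the actual file content.

```python
import json
from typing import Optional


def find_json_error(text: str) -> Optional[str]:
    """Return a short description of the first JSON syntax error, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# Illustrative snippet with an extra comma at the end of the selections array.
broken = '{"selections": [{"type": "citoid"},]}'
fixed = '{"selections": [{"type": "citoid"}]}'

print(find_json_error(broken))
print(find_json_error(fixed))
```

Strict JSON does not allow trailing commas, so the first call reports an error and the second returns None.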

Now I try to edit https://meta.wikimedia.org/wiki/User:Alexis_Jazz_T46787/Web2Cit/data/ie/independent/www/templates.json from the web2cit interface (after switching into sandbox mode) and it says:

Starting with an empty form
TypeError: page.revisions is undefined

Thanks for reporting this...

you shouldn't have percent-encoded my username, as I am presented with a "bad title" error.

...and this!

Both seem to be caused by spaces in your username. I'll open a separate task and fix it: T309321. In the meantime, as a workaround, it should work with underscores instead of spaces: User:Alexis_Jazz_T46787
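A minimal sketch of the workaround, assuming the fix amounts to converting spaces to underscores before the title is placed in a URL. The helper name `to_url_title` is hypothetical and this is not the actual patch for T309321.

```python
from urllib.parse import quote


def to_url_title(title: str) -> str:
    """Convert a page title to its URL form: underscores first, then encoding.

    Hypothetical helper for illustration only.
    """
    with_underscores = title.replace(" ", "_")
    # Leave "/" and ":" as they are so namespace and subpage separators survive.
    return quote(with_underscores, safe="/:")


print(to_url_title("User:Alexis Jazz T46787"))
```

Replacing spaces before encoding means no percent-encoded space ever reaches the title.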

I added web2cit support to Bawl.

Thanks! I've never used Bawl, but from what I've read I understand that it adds a reply button to on-wiki discussion messages. Could you please clarify what it means that you added Web2Cit support to it? Thanks!

Try adding itemType and title fields to your template. Use the default procedure for both. This should fix the "I don't see a difference in the output" part of the problem.

@AlexisJazz, you may use what I've done in my sandbox as an example:

  • I added mandatory itemType and title fields to the template, both using Citoid selection.
  • I added authorLast and date fields, both using XPath selection.
  • I changed the template's path from / to the actual path of the webpage that we are using as a translation template: /life/food-drink/table-talk/a-day-in-the-life-of-chef-clodagh-mckenna-31196664.html. This may seem useless now, but it will make sense when we support translation tests (T302722), and if multiple templates are needed per domain.

Feel free to copy this over to the main storage or to your sandbox!

Thank you very much for your interest in Web2Cit, for helping us test it, and for reporting the issues you found! It's very helpful for us.

You're welcome!

From what I see in the original revision of www.independent.ie's templates configuration file, Web2Cit was likely ignoring the translation template you had configured because it lacked mandatory template fields itemType and title. These template fields are mandatory and Web2Cit will ignore translation templates that do not include both of them. This is explained in the information box that pops up in the configuration file editor, next to the Fields property title (though I acknowledge it may be somewhat hidden in this initial Web2Cit-Editor version):

image.png (571×981 px, 96 KB)

Because of that, Web2Cit was using its default fallback template to translate the target webpage, which simply uses Citoid's response for all Web2Cit-supported fields. That's why you were not seeing a difference in the output. You can tell that Web2Cit was using the fallback template from the translation results page: "Translation result using fallback template".

Make sure you include in your translation template all the fields that you want an output for. If you want to reuse Citoid's response for a field, explicitly say so by using the Citoid selection step (provided by default). Try adding itemType and title fields to your template. Use the default procedure for both. This should fix the "I don't see a difference in the output" part of the problem.

I see you then manually changed the templates file, which unfortunately made matters worse, as you ended up with an invalid JSON file (there is an extra comma at the end of the selections array). JSON files are complex and we are likely to make mistakes when manually editing them. Even more so for JSON files in our main storage, which do not have the JSON editor available (see T305571). If possible, please use the configuration file editor instead. I've just removed that extra comma to make it valid JSON again.

If the user is about to do something stupid, you should turn backgrounds for relevant elements pale red. (but make sure text contrast remains sufficient) If the user is about to do something really stupid, disable buttons.

I wasn't interested in changing any of those other fields. The value of publishedIn for Irish Independent is just "independent" (lowercase); that's all I wanted to change/override.

Never require fields if you can reasonably guess the value. In this case, for any omitted field you should assume default behavior. If the user wants any particular field to be omitted, they should explicitly say so.

And perform tests on total crap, like an old smartphone with an outdated browser. Whatever you consider the bare minimum. If you have a passable experience on that, it'll be awesome on anything better. For Bawl, I use a Core 2 Duo laptop that's roughly the same age as your average Fortnite player and occasionally I also test on a smartphone I bought used for $15 or so.

I added web2cit support to Bawl.

Thanks! I've never used Bawl, but from what I've read I understand that it adds a reply button to on-wiki discussion messages. Could you please clarify what it means that you added Web2Cit support to it? Thanks!

In addition to that, Bawl can be used (opt-in) to edit whole pages or sections where references will be more useful. With Bawl opened, click the link icon and enter the source URL. This will cause a link named "Web2Cit" to appear next to the "Cancel" button to check the Web2Cit configuration of the entered URL. If "Insert as reference" is pressed, well, it inserts the reference. Currently this works only on enwiki because Citemap.json is a hard requirement for it.

Thanks again for your feedback, @AlexisJazz!

Never require fields if you can reasonably guess the value. In this case, for any omitted field you should assume default behavior. If the user wants any particular field to be omitted, they should explicitly say so.

I had in the past thought of including by default all fields supported by Web2Cit, set to use the default Citoid selection. That is, offer our fallback template to begin with, and let the user explicitly change or remove any field they'd like. However, some users preferred to start with an empty form. As a middle ground I came up with what we have now: begin with an empty form, but use the default configuration for any fields explicitly added.

Of course we could revise this again. But because this editor is just a preliminary workaround until we have our real-time Web2Cit-Editor available, I'm not putting too much effort into fine-tuning it. We are already tracking this here anyway: T302590.

We also have in mind the possibility of changing Web2Cit's fallback approach to one that uses the Citoid response for any field returning an empty output. We won't be able to consider this alternative approach in the following months, though: T302019
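To make the alternative concrete, here is a rough sketch of a per-field fallback, assuming both the template output and the Citoid response can be represented as dictionaries mapping field names to lists of values. That representation, the function name `apply_field_fallback`, and the sample values are all assumptions for the example, not Web2Cit's implementation.

```python
def apply_field_fallback(template_output: dict, citoid_output: dict) -> dict:
    """Use the template's value per field, falling back to Citoid when empty.

    Both arguments are assumed to map field names to lists of strings.
    """
    merged = {}
    for field in sorted(template_output.keys() | citoid_output.keys()):
        values = template_output.get(field) or citoid_output.get(field) or []
        if values:
            merged[field] = values
    return merged


# Illustrative values only: the template overrides publishedIn and nothing else.
template_output = {"publishedIn": ["Irish Independent"], "date": []}
citoid_output = {
    "title": ["Example title"],
    "publishedIn": ["independent"],
    "date": ["example date"],
}
print(apply_field_fallback(template_output, citoid_output))
```

Under this approach a template that overrides a single field would still produce output for every field Citoid can fill.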

And perform tests on total crap, like an old smartphone with an outdated browser

This sounds like a reasonable thing to do with our to-be-implemented real-time Web2Cit-Editor. Nonetheless, I'm not sure how comfortable one may be editing these translation procedures from a smartphone. I acknowledge that it may be the only device available in some cases, though.

In addition to that, Bawl can be used (opt-in) to edit whole pages or sections where references will be more useful. With Bawl opened, click the link icon and enter the source URL. This will cause a link named "Web2Cit" to appear next to the "Cancel" button to check the Web2Cit configuration of the entered URL. If "Insert as reference" is pressed, well, it inserts the reference. Currently this works only on enwiki because Citemap.json is a hard requirement for it.

Oh, I see! Nice! I haven't reviewed Citemap.json in detail, but it seems to have more or less the same information as the Citoid maps here and the TemplateData section of some citation templates (for example, here), which is what the Citoid extension uses to convert citation metadata into citation templates. Could Bawl be configured to use these settings instead? These settings are already configured in Wikipedias using the Citoid extension.

Do we have anything pending from this task? Can we mark it as resolved otherwise? Thanks!

And perform tests on total crap, like an old smartphone with an outdated browser

This sounds like a reasonable thing to do with our to-be-implemented real-time Web2Cit-Editor. Nonetheless, I'm not sure how comfortable one may be editing these translation procedures from a smartphone. I acknowledge that it may be the only device available in some cases, though.

Some people do absolutely everything on a mobile device. I even know people who use Cat-a-lot on a mobile device which I severely doubt was ever designed with mobile in mind. It makes no sense to me and probably not much to you either, but this is just how it is. And indeed, some people (for various reasons) don't have any other device.

Another thing to keep in mind is screen readers. Can't help you with that atm as I've yet to figure it out myself.

What is this "Web2Cit-Editor" going to be? (please no exe please no exe please no exe)

In addition to that, Bawl can be used (opt-in) to edit whole pages or sections where references will be more useful. With Bawl opened, click the link icon and enter the source URL. This will cause a link named "Web2Cit" to appear next to the "Cancel" button to check the Web2Cit configuration of the entered URL. If "Insert as reference" is pressed, well, it inserts the reference. Currently this works only on enwiki because Citemap.json is a hard requirement for it.

Oh, I see! Nice! I haven't reviewed Citemap.json in detail, but it seems to have more or less the same information as the Citoid maps here and the TemplateData section of some citation templates (for example, here), which is what the Citoid extension uses to convert citation metadata into citation templates. Could Bawl be configured to use these settings instead? These settings are already configured in Wikipedias using the Citoid extension.

I just ripped it from https://en.wikipedia.org/wiki/User:Salix_alba/Citoid.js so yes, I would expect it to work with Citoid as well.

Never really looked into TemplateData. Seems fine for extensions, they could just check the configuration every 10 minutes or whatever. A luxury I don't quite have. (unless I cache the stuff in localStorage) With Citemap.json, I get everything I need in 1 request, guaranteed. (the page title of the JSON is cached in localStorage) With the Citoid maps, and correct me if I'm wrong, I have to:

  1. Get content of MediaWiki:Citoid-template-type-map.json
  2. Get content of Template:(name of template according to previous step)
  3. If the template contains TemplateData, done. Otherwise:
  4. Scan the template for {{Documentation}} (Q4608595)
  5. If not found, request linksHere to get redirects to Template:Documentation
  6. Scan the template for redirects to Template:Documentation
  7. Figure out where the documentation is??? Download Template:Documentation and search it? Decompile it if it just invokes Lua? (I'm joking, but seriously, HOW?)
  8. If not found, check Special:PrefixIndex for subpages
  9. Request content of whatever page(s) were found
  10. If present, extract TemplateData and continue, otherwise, throw stupid error in the user's face.

And this would have to be done when you insert a reference. And I'm not sure I could reliably determine beforehand that the required info is unavailable to hide the reference button entirely.

If TemplateData was 100% guaranteed to be available at, say, Template:Cite example/TemplateData.json and I added Wikidata IDs for all known citation templates to my list of sitelinks to request I could get the source of MediaWiki:Citoid-template-type-map.json and TemplateData.json for all citation templates in a single request. But I can't!
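To show what the single-request scenario could look like, the sketch below builds one MediaWiki Action API query for several page titles at once. It only constructs the URL and does not send it. The helper name `build_batch_query` is hypothetical, and Template:Cite example/TemplateData.json is the made-up location from the paragraph above, not a page that is known to exist.

```python
from urllib.parse import urlencode


def build_batch_query(api_url: str, titles: list) -> str:
    """Build one Action API URL that requests the content of several pages."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        # Multiple titles are separated with a pipe character.
        "titles": "|".join(titles),
        "format": "json",
        "formatversion": "2",
    }
    return f"{api_url}?{urlencode(params)}"


batch_url = build_batch_query(
    "https://en.wikipedia.org/w/api.php",
    [
        "MediaWiki:Citoid-template-type-map.json",
        "Template:Cite example/TemplateData.json",  # hypothetical page
    ],
)
print(batch_url)
```

This only works if the titles are known up front, which is exactly the guarantee that is missing today.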

Seems I better stick with Citemap here. TemplateData could be worth a look if they fix it. A TemplateData to Citemap conversion script seems more realistic.

Do we have anything pending from this task? Can we mark it as resolved otherwise? Thanks!

If I insert https://www.independent.ie/life/food-drink/table-talk/a-day-in-the-life-of-chef-clodagh-mckenna-31196664.html it still returns <ref name="independent">{{Cite web|title=A day in the life of chef Clodagh McKenna|website=independent|url=https://www.independent.ie/life/food-drink/table-talk/a-day-in-the-life-of-chef-clodagh-mckenna-31196664.html|access-date=2022-06-01}}</ref>.

And I noticed only now that the date is also missing.

To be honest, I just don't quite understand how I could change this. And the time to explain it to me is probably better spent improving the editor (or building the new one) so I won't need any explanation. The current editor isn't exactly following the KISS principle and I just don't want to spend time trying to understand something you say will be replaced anyway.

diegodlh claimed this task.

I've created a translation template for domain www.independent.ie based on the webpage at /life/food-drink/table-talk/a-day-in-the-life-of-chef-clodagh-mckenna-31196664.html. This (as well as any other templates defined for this domain in the future) can be revised by clicking on "edit" under "Translation output" in any translation summary page for this domain (example).

This translation template will be used to translate similar webpages from the same domain. See for example translation results for path /life/family/parenting/the-manny-why-some-families-prefer-having-a-male-nanny-41948507.html here.

Regarding the usability problems of our current editor, after discussing the issue with our Advisory Board we have decided to down-prioritize the development of our new real-time editor, and focus on fixing bugs and strengthening our core features for now instead. I'm creating a list of such tasks here. There might have been some updates to our current editor since this task was posted, though. You may check the workshop recently recorded here for an overview of how to use it.

Nonetheless, so we don't have to start the new editor from scratch in the future, a (very draft, buggy and incomplete!) version has been published and we hope that it may be updated soon. It works as a bookmarklet that injects the editor as a sidebar on the webpage you are visiting. You can follow the (also draft) instructions here to find out how to give it a try. But do note that for now it is just a viewer (i.e., no editing functionality yet) and that it still needs lots of design and functional fixes. I'd say it's more of a prototype for now.

All in all, given that a configuration for www.independent.ie has been provided, I will close this task for now. But feel free to reopen it if you think that's warranted.