
Citoid and bot disagree about use of language parameter in cite book
Closed, Declined · Public


After editing a bunch of cites in [[Rose Pesotta]], a bot helpfully corrected the new citations by removing the language parameter.

No idea why, but seems worth investigating.

(Tangentially, this did make me notice that sometimes language is "English" and sometimes "en"? Not sure if that is worth addressing.)

Event Timeline

LuisVilla raised the priority of this task from to Needs Triage.
LuisVilla updated the task description.
LuisVilla added a project: Citoid.
LuisVilla added a subscriber: LuisVilla.

Possibly also useful, there is another bot going around correcting citoid-authored citations:

That diff changes JSTOR and Amazon links. I have no opinion on whether the changes are a good idea, but if we're making edits that then regularly get changed further by bots, we should probably either fix Citoid or talk to the bot authors about changing their bots.

Possibly also useful, there is another bot going around correcting citoid-authored citations:

No, I activated that bot because the cites were incomplete and missing information. It had nothing to do with their having been created by Citoid.

@Josve05a sorry! I did not mean that your bot was anti-citoid, or that you were doing it because of citoid! Just that I am noticing it because I know I created them with citoid, and obviously we should work to make sure that citoid generates complete/accurate cites.

No worries. It's just that JSTOR links and ASIN links should use their own parameters, not url. The same goes for using Google Books links in place of an ISBN: Google Books links should only be used if they point to a "preview/page".
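
As a rough illustration of the point above, a cleanup script could move an identifier out of url and into its own parameter. This is only a sketch: the function name, the URL patterns, and the jstor/asin parameter names used here are assumptions for illustration, not the code of the bot discussed in this task.

```python
import re

# Assumed URL shapes; real links vary more than this.
JSTOR_RE = re.compile(r"^https?://(?:www\.)?jstor\.org/stable/([^/?#]+)")
ASIN_RE = re.compile(
    r"^https?://(?:www\.)?amazon\.[a-z.]+/(?:.*/)?(?:dp|gp/product)/([A-Z0-9]{10})"
)


def move_identifier(params):
    """Return a copy of params with a JSTOR/Amazon url turned into an identifier."""
    result = dict(params)
    url = result.get("url", "")
    for name, pattern in (("jstor", JSTOR_RE), ("asin", ASIN_RE)):
        match = pattern.match(url)
        # Only rewrite when the identifier parameter is not already filled in.
        if match and name not in result:
            result[name] = match.group(1)
            del result["url"]
            break
    return result
```

Any url that matches neither pattern is left untouched, so ordinary links are not affected.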

See also

So, this is because the citation templates on en wiki complain if the value of the language parameter is English.

There are two ways to fix this:

  1. Don't put the language parameter in the citoid map in the template data. Unfortunately this means that no language parameters will be added automatically. But, it will stop the values from being added.
  2. Change the en wiki template to not care if the language parameter has a value of en.

On the back end, we won't change this, as it makes sense to report the language code no matter what. On the front end, there really isn't much we can do about this, as Template Data is a bit of a blunt instrument: we can't say "only include this parameter if it doesn't have this value" without making template data programmable, which is definitely something we don't want to do at this point.
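
To make the "blunt instrument" point concrete, here is a minimal sketch of how a plain field-to-parameter map behaves. The map contents and the fill_template helper are hypothetical, simplified stand-ins, not the actual TemplateData or citoid extension code.

```python
# Hypothetical, simplified stand-in for the "citoid" map in a template's
# TemplateData: citoid field name -> template parameter name.
CITOID_MAP = {
    "title": "title",
    "publisher": "publisher",
    "language": "language",
}


def fill_template(citation, field_map):
    """Copy every mapped field that citoid returned; no per-value conditions."""
    return {
        param: citation[field]
        for field, param in field_map.items()
        if field in citation
    }


citation = {"title": "Example Book", "language": "en"}

# With the full map, language=en is always carried over.
with_language = fill_template(citation, CITOID_MAP)

# Option 1: drop the mapping, which drops the parameter for every language.
map_without_language = {k: v for k, v in CITOID_MAP.items() if k != "language"}
without_language = fill_template(citation, map_without_language)
```

Because the map only names fields, the choice is all or nothing: either every language value is passed through, or none is.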

I personally think the templates are wrong about this. There's nothing wrong with recording that a source is originally in English even if you're on en wiki; it's assumed, yes, but the cost is just a few more bits to be extra sure :).

Mvolz set Security to None.
Mvolz added a project: TemplateData.

I would not mark as resolved until a solution is negotiated with the template or bot authors. I agree with you that it seems fine to put the information in even if redundant, but it isn't our call. Perhaps the CLs could help facilitate the discussion?

It's declined because I don't foresee doing anything on our end. It's up to the community to decide what is done with templates and bots. You could go in right now as a community member and change the TD, or the template! :) It's inefficient, sure, but I think there will always be a role for bots; we're providing software that can theoretically be used with any templates on any wiki in any language, and there will always be things we can't address without "over fitting" to a particular template or language.

I'm open to reopening if anyone has any ideas on how to address this in the service, citoid extension, or template data extension that seem reasonable.

Adding @Qgil, @Whatamidoing-WMF, @Elitre and @Quiddity to see how this can perhaps be made into a bot developer engagement process. May require different tickets but definitely worth considering solutions to.

There's actually nothing "wrong" here. Bots and scripts like AWB clean up formatting all the time, and this edit would have been made anyway, to deal with misplaced punctuation and the (common) use of a hyphen instead of the (correct) endash for the page numbers. The template functions correctly with the information present, but that community has decided not to bother including |language=English. It's a local stylistic decision, not a technical issue.
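
For a sense of how small this kind of cleanup is, here is a sketch of the page-range fix. It is an illustration only, assuming a simple numeric range; it is not how AWB or any particular bot implements it.

```python
import re

# Matches a simple numeric range such as "12-15"; anything else is left alone.
PAGE_RANGE_RE = re.compile(r"^(\d+)\s*-\s*(\d+)$")


def normalize_pages(pages):
    """Replace the hyphen in a plain numeric page range with an en dash."""
    match = PAGE_RANGE_RE.match(pages.strip())
    if not match:
        return pages
    return f"{match.group(1)}\u2013{match.group(2)}"
```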

As Mvolz says, it is difficult to resolve this without making it a programmatic exception (which might complicate things a lot).
I.e., in Luis's original edit, the 3 items in the #References section have "(in English)" midway through, and if this pattern were applied consistently throughout Enwiki, the vast majority of our citations would include this string.
This longer string would bulk up the references section with information that the reader can generally assume.
(Semi-relatedly, on my volunteer account I already have this CSS: .reference-accessdate {display:none} to hide the "Retrieved 2015-03-04." element in all references. Simply because it annoyed me as being generally not relevant, and a distraction when I'm looking for the date of publication.)

However, I can see how it would probably be useful to researchers to have this meta-info on the language of the citation available programmatically (rather than just assumed). I'm not sure how much the added code complexity would cost (in initial creation and ongoing maintenance).

(If I understand correctly...)

  • Ideally: we could tag all citations with their actual language, but hide the language from displaying to readers if it is the local-default-language (i.e. English at Enwiki).
    • This would be done in a way that didn't cause a FOUC, and didn't annoy people using screen-readers.
  • Aha!! the French Wikipedia seems to have this, e.g. this diff contains many citations with langue=fr and langue=français but only the 3 citations marked as langue=anglais (English) are marked as such in the rendered article (with the (en) prefix). Their Lua sub-module is - but I don't know how to find (let alone suggest a fix for) the equivalent within Enwiki's
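
If I read the idea in the list above correctly, it amounts to something like the sketch below: the parameter stays in the wikitext, and only the displayed annotation is suppressed when the language matches the wiki's own. The lookup table, the function name, and the "(in ...)" output format are assumptions for illustration; this is not the code of any existing module.

```python
# Hypothetical lookup tables; a real module would use the wiki's own data.
LANGUAGE_NAMES = {"en": "English", "fr": "French", "ru": "Russian"}
NAME_TO_CODE = {name.lower(): code for code, name in LANGUAGE_NAMES.items()}


def language_annotation(value, local_code="en"):
    """Return the text to display for a language parameter, or "" to hide it."""
    cleaned = value.strip().lower()
    # Accept both "English" and "en" by normalizing names to codes.
    code = NAME_TO_CODE.get(cleaned, cleaned)
    if code == local_code:
        return ""  # parameter stays in the wikitext, it is just not displayed
    name = LANGUAGE_NAMES.get(code, value.strip())
    return f"(in {name})"
```

Making local_code a setting means the same logic could be reused on another wiki, for example with local_code="ru" so that English sources are labeled and Russian ones are not.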

I recommend that someone who understands the issue and the code suggest an update to Enwiki's module at (and ping user:Trappist_the_monk, as he/she seems to have been part of the main development of that aspect in the past).
Outline the benefits, and ask if there are any edge-cases to be considered, or past discussions to be consulted (I couldn't see anything in that subpage's archives, but discussions about citation style could be in any of a few dozen locations!).

(edit for clarity) Could someone in CL notify the community so they're aware, at least?

(follow up) - Nick, I must have had a blind spot to your recommendation. Absolutely agreed, and thank you. I'll Phab task it (I'll assign you initially, but feel free to reassign).

French Wikipedia citation templates just ignore |langue = fr

This is done in module (sub-module of, an equivalent to, lines 219 and 245 today (twice, because the function handles multiple language codes, as in |langue = en, fr).

So it's not a problem if Citoid adds the French language code on
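
A rough Python sketch of the behaviour described above, handling a comma-separated value such as |langue = en, fr. This is not a translation of the actual Lua module: the function name and the exact rendered format for several languages are assumptions, and it handles only codes, not names such as français.

```python
def language_prefix(value, local_code="fr"):
    """Build a "(en)"-style prefix from a comma-separated list of codes."""
    codes = [part.strip().lower() for part in value.split(",") if part.strip()]
    # The local language is ignored; every other code is shown.
    shown = [code for code in codes if code != local_code]
    return " ".join(f"({code})" for code in shown)
```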

Zebulon84, as Quiddity explained above, it is true that on the references are not preceded by (fr) if the reference is in French: still, the parameter langue=français is added to the template itself. Is there a way to avoid this entirely (on enwiki, by not getting language=en added I mean)? Thank you!

Is there a way to avoid this entirely (on enwiki, by not getting language=en added I mean)? Thank you!

My suggestion does involve keeping the language=en in the cite templates, because that meta-info is useful for researchers to be able to extract, and it's more complicated to prevent Citoid from adding it (at each language).
I'll try to clarify over at
@Zebulon84 thanks for the details, I'll include a link to those.

The decision at enwiki appears to be that these |language=en parameters will be kept, and the citation template has been modified to not display this information on the English Wikipedia. Presumably anyone who imports the module to another wiki will want to change that setting (e.g., so that English-language sources get visibly marked as English on the Russian Wikipedia, but Russian-language ones don't get labeled as being in Russian).