
[feature] Change the color of an interwiki link depending on the language sections of the linked page
Open, Needs Triage, Public

Description

The suggestion is to color an interwiki link differently depending on whether the linked page has a section for the current wiki's language.

Example:
I'm on the French Wiktionary, on the page "pain". There is an interwiki link to the English Wiktionary because a page "pain" also exists there.
The en:pain page has several sections, for definitions in several languages, including a section about the French word "pain".
In that case, on the French page, the link will be blue.
Otherwise, if the English page has no French section, the link to it on the French page will be (for example) green.
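The display rule in this example boils down to a two-way decision per interwiki link. A minimal sketch, assuming the set of languages that have a section on the target page is already known; the function name is hypothetical and the color names are simply the ones from the example.

```python
def interwiki_link_colour(viewer_lang: str, target_sections: set[str]) -> str:
    """Return the color for an interwiki link (hypothetical helper).

    viewer_lang: language of the wiki being read, e.g. "fr" on fr.wiktionary.
    target_sections: languages that have a section on the linked page.
    """
    # Blue: the linked page already covers the reader's language.
    # Green: the page exists but lacks that section, an invitation to contribute.
    return "blue" if viewer_lang in target_sections else "green"


# On fr.wiktionary "pain", the link to en.wiktionary "pain":
print(interwiki_link_colour("fr", {"en", "fr", "de"}))  # blue
print(interwiki_link_colour("fr", {"en", "de"}))        # green
```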

Use case: this informs the user that even if a page exists in another language version, it has no section about the user's own language, so they could go there and add one. This would encourage contribution, including cross-wiki contribution.

This feature could be considered as a gadget.

Event Timeline

Update: we've been discussing the feasibility with @Addshore. It doesn't look too difficult, but we're blocked on one point: each language version of Wiktionary has its own way of formatting the titles of language sections. For example:

  • in French: {{langue|en}} (a template with the language code)
  • in Catalan: {{-fr-}} (another template with the language code)
  • in German: eau ({{Sprache|Französisch}}) (the title of the page then a template with the language name in German)
  • in English: French (only the language name in English)
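To make the problem concrete: without a shared marker, a detector would need one pattern per wiki. The sketch below covers only the four formats listed above; the regular expressions are illustrative assumptions (real pages vary in spacing and template parameters). Note that two wikis yield language codes while the other two yield language names, in different languages, which would still need mapping.

```python
import re

# Hypothetical per-wiki patterns for level-2 language headings;
# each has one capture group for the language identifier.
HEADING_PATTERNS = {
    "fr": re.compile(r"^==\s*\{\{langue\|([^}|]+)\}\}\s*==\s*$", re.M),
    "ca": re.compile(r"^==\s*\{\{-([^}]+)-\}\}\s*==\s*$", re.M),
    "de": re.compile(r"^==\s*.+?\(\{\{Sprache\|([^}|]+)\}\}\)\s*==\s*$", re.M),
    # English: the whole heading text is the language name.
    "en": re.compile(r"^==\s*([^=]+?)\s*==\s*$", re.M),
}

def heading_languages(wiki: str, wikitext: str) -> list[str]:
    """Language identifiers (codes or names, depending on the wiki)."""
    return HEADING_PATTERNS[wiki].findall(wikitext)

samples = {
    "fr": "== {{langue|en}} ==",
    "ca": "== {{-fr-}} ==",
    "de": "== eau ({{Sprache|Französisch}}) ==",
    "en": "==French==",
}
for wiki, text in samples.items():
    print(wiki, heading_languages(wiki, text))
# fr ['en']
# ca ['fr']
# de ['Französisch']
# en ['French']
```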

In order to continue the investigation further, we need to build a list with the formatting of all the Wiktionaries.
I couldn't think of any easy programmatic way to build this list, so I started building it manually. It may take a while, so I'd be happy if anyone wants to help complete the list!
Of course, if you think of a better option, let me know.
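One possible semi-automatic starting point, sketched under assumptions: fetch the raw wikitext of a sample page from each Wiktionary (MediaWiki's `action=raw` returns it) and list its level-2 headings, so the formats can be compared side by side. The helper names are hypothetical, and the network call is left out so the parsing step runs offline.

```python
import re
from urllib.parse import quote

def raw_wikitext_url(wiki_lang: str, title: str) -> str:
    """URL returning a page's raw wikitext on a given Wiktionary."""
    return (f"https://{wiki_lang}.wiktionary.org/w/index.php"
            f"?title={quote(title)}&action=raw")

# Level-2 heading: exactly two '=' on each side, on a single line.
LEVEL2 = re.compile(r"^==(?!=)([^\n]*?[^=\n])==[ \t]*$", re.M)

def level2_headings(wikitext: str) -> list[str]:
    """Heading texts of all level-2 headings, stripped of outer spaces."""
    return [h.strip() for h in LEVEL2.findall(wikitext)]

print(raw_wikitext_url("en", "pain"))
# https://en.wiktionary.org/w/index.php?title=pain&action=raw
print(level2_headings("==French==\n===Noun===\n== {{langue|fr}} ==\n"))
# ['French', '{{langue|fr}}']
```

Collecting the heading lines from a handful of common words per wiki would surface the main patterns, though unusual local conventions would still need a human eye.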

The list of different formats is now complete.
As expected, the formatting is very diverse across the Wiktionaries, and some sites don't have a harmonized way to format the titles.
@Addshore can you have a look and tell me what you think? How complex would it be to achieve the feature?

@Addshore Can you describe here the way to go, so it can be picked up by someone else if needed? :)

To avoid having any silly configuration for individual title styles on individual Wiktionary projects, our extension can instead provide a parser function / magic word for use in the headings of Wiktionary pages.
For example

{{COGNATE_HEADER_LANG:en}}

This can then be used in headings or on pages in whatever way individual projects want:

<!-- In the heading itself -->
==English{{COGNATE_HEADER_LANG:en}}==
<!-- As part of a template (the template emits the magic word) -->
=={{heading|en}}==
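The key property of this proposal is that the marker survives template expansion: whether the magic word is written directly in the heading or emitted by a project's own heading template, the parser sees it and can record the code. A toy simulation under assumptions: template expansion is faked with a single hypothetical `{{heading|…}}` template, and the parser function is assumed to render as an empty string so headings display unchanged. In MediaWiki itself this would presumably be a parser function hook storing the codes in the parser output.

```python
import re

# Hypothetical project template {{heading|xx}}: shows the language name
# and embeds the magic word, as a project's own heading template might.
LANG_NAMES = {"en": "English", "fr": "French"}
HEADING_TEMPLATE = re.compile(r"\{\{heading\|([^}]+)\}\}")
MAGIC_WORD = re.compile(r"\{\{COGNATE_HEADER_LANG:([^}]+)\}\}")

def expand_templates(wikitext: str) -> str:
    """Toy stand-in for template expansion (handles only {{heading|...}})."""
    return HEADING_TEMPLATE.sub(
        lambda m: LANG_NAMES[m.group(1)] + "{{COGNATE_HEADER_LANG:" + m.group(1) + "}}",
        wikitext,
    )

def render(wikitext: str) -> tuple[str, set[str]]:
    """Expand templates, collect every language code, render markers as ''."""
    expanded = expand_templates(wikitext)
    codes = set(MAGIC_WORD.findall(expanded))
    return MAGIC_WORD.sub("", expanded), codes

text, codes = render("==English{{COGNATE_HEADER_LANG:en}}==\n=={{heading|fr}}==\n")
print(repr(text))     # '==English==\n==French==\n'
print(sorted(codes))  # ['en', 'fr']
```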

Use of these magic words / parser functions can then be detected in a normalized way at page save time and a new db table can be updated (along with the other cognate db tables).
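At save time, keeping the new table in sync reduces to comparing the codes found in the new revision with those already stored for the page. A sketch, assuming the set of codes found by the parser is available and that rows are (page_id, code) pairs; the function name is hypothetical.

```python
def section_row_changes(page_id: int, stored: set[str], parsed: set[str]):
    """Rows to insert and delete so the sections table matches the new revision."""
    to_insert = sorted((page_id, code) for code in parsed - stored)
    to_delete = sorted((page_id, code) for code in stored - parsed)
    return to_insert, to_delete

added, removed = section_row_changes(42, stored={"en", "fr"}, parsed={"fr", "de"})
print(added)    # [(42, 'de')]
print(removed)  # [(42, 'en')]
```

Only touching the rows that changed keeps writes small, since most edits leave a page's language sections untouched.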
With the current design, this new data really needs to be stored alongside the 'page' entry in the Cognate pages table, which currently has a primary key of (site ID, namespace, title), all ints. It might be an idea to add an auto-incrementing ID to this table to avoid duplicating all three fields in a new sections table.

If the pages table had an auto-increment ID to reference, then a sections table (or two) could be introduced.

Table: cognate_sections?
Fields:
 - page_id INT NOT NULL
 - section_id / section name

section_id would either reference another table, to avoid duplicating the section strings, or it could simply be the language code used as the section name (e.g. "en"), as all such strings will be short.
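The proposed layout can be sketched in SQL. This uses SQLite (through Python's sqlite3 module) purely as a stand-in for the production database, and the table and column names are hypothetical. It shows how the auto-increment page ID lets the sections table store one small key instead of repeating (site, namespace, title), and how a lookup joins back through the existing natural key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cognate_pages (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- proposed surrogate key
    site      INTEGER NOT NULL,
    namespace INTEGER NOT NULL,
    title     INTEGER NOT NULL,                   -- titles stored as ints
    UNIQUE (site, namespace, title)               -- the current natural key
);
CREATE TABLE cognate_sections (
    page_id INTEGER NOT NULL REFERENCES cognate_pages(id),
    section TEXT    NOT NULL,                      -- short language code, e.g. 'fr'
    PRIMARY KEY (page_id, section)
);
""")

def has_section(site: int, namespace: int, title: int, code: str) -> bool:
    """Does the given page have a section for language `code`?"""
    row = conn.execute(
        """SELECT 1 FROM cognate_sections s
           JOIN cognate_pages p ON p.id = s.page_id
           WHERE p.site = ? AND p.namespace = ? AND p.title = ? AND s.section = ?""",
        (site, namespace, title, code),
    ).fetchone()
    return row is not None

# Hypothetical ids: site 1 = en.wiktionary, title int 12345 = "pain".
cur = conn.execute("INSERT INTO cognate_pages (site, namespace, title) VALUES (1, 0, 12345)")
conn.executemany("INSERT INTO cognate_sections VALUES (?, ?)",
                 [(cur.lastrowid, "en"), (cur.lastrowid, "fr")])
print(has_section(1, 0, 12345, "fr"))  # True
print(has_section(1, 0, 12345, "de"))  # False
```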

We should check with the DBAs to see which path sounds best in their opinion, and also get an estimate for altering the page table.
If it would be hard to add a new auto inc PK for the page table then site, namespace and title could be duplicated into the sections table, but this seems wasteful.
A migration adding a new PK to the table should be fairly easy; it could even be done by running multiple tables in parallel rather than as any sort of "big bang", or through some other fancy staged migration.
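A staged migration of this kind could look like the following sketch: create the new table with the surrogate key alongside the old one, copy rows across in bounded batches ordered by the old composite key, and only switch reads once both are in sync (dual writes to both tables during the transition are assumed and not shown). SQLite again stands in for the production database; table names and batch size are hypothetical, and the starting key assumes all key columns are non-negative.

```python
import sqlite3

def backfill(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Copy rows from pages_old to pages_new in keyset-paginated batches.

    Returns the number of rows processed (including ones already present).
    """
    processed = 0
    last = (-1, -1, -1)  # assumes non-negative (site, namespace, title)
    while True:
        rows = conn.execute(
            """SELECT site, namespace, title FROM pages_old
               WHERE (site, namespace, title) > (?, ?, ?)
               ORDER BY site, namespace, title LIMIT ?""",
            (*last, batch_size),
        ).fetchall()
        if not rows:
            return processed
        # INSERT OR IGNORE: rows already written by dual writes are skipped.
        conn.executemany(
            "INSERT OR IGNORE INTO pages_new (site, namespace, title) VALUES (?, ?, ?)",
            rows,
        )
        conn.commit()  # short transactions keep each batch cheap
        processed += len(rows)
        last = rows[-1]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pages_old (site INT, namespace INT, title INT,
                        PRIMARY KEY (site, namespace, title));
CREATE TABLE pages_new (id INTEGER PRIMARY KEY AUTOINCREMENT,
                        site INT, namespace INT, title INT,
                        UNIQUE (site, namespace, title));
""")
conn.executemany("INSERT INTO pages_old VALUES (?, ?, ?)", [(1, 0, t) for t in range(25)])
print(backfill(conn, batch_size=10))  # 25
```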

"ContentAlterParserOutput" cannot be used to color the links; however, Cognate could add extra information to the parser output there for use by JS, for example as JS config vars.
Alternatively, or in combination, a hook later in the process could likely be used to add the styles to certain links in the backend (without relying on JS).
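For the JS route, the server would only need to ship a small map per page. A sketch of building such a payload, assuming a hypothetical config variable name and a lookup of which sites' versions of the page have a section in the reader's language; client code would then add a CSS class to the matching interwiki links.

```python
import json

def cognate_config_vars(viewer_lang: str, sections_by_site: dict[str, set[str]]) -> str:
    """Serialize, per linked site, whether its page has a section for viewer_lang."""
    payload = {site: viewer_lang in langs for site, langs in sections_by_site.items()}
    # 'wgCognateHasSection' is a hypothetical variable name.
    return json.dumps({"wgCognateHasSection": payload}, sort_keys=True)

print(cognate_config_vars("fr", {"en": {"en", "fr"}, "de": {"de"}}))
# {"wgCognateHasSection": {"de": false, "en": true}}
```

A boolean per site keeps the payload tiny; the backend-hook alternative would apply the same lookup but emit the class directly in the link HTML.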

So if I understand correctly, there's not much we can do before the Wiktionaries agree on having a common template in their titles?


Ahah, we don't even agree on a common logo!

I didn't fully get Addshore's suggestion, but I thought he had an idea on how to handle this, didn't he?


I agree with @Noe. @Addshore presents a possible technical solution. There are two points:

  • From the Wiktionary community side: we need to agree, project by project, to change the template that generates the language section title (or at least to include the future magic word in it).
  • From the developer side: write the code and make this new magic word/function available to all the Wiktionary communities.

If we agree, I would say that on the Wiktionary community side we can start discussions to gauge how people feel. If the reception is rather good on the main Wiktionary projects (en, fr, ru, de, pl, el, es, nl, ja, id, ..., let us say the Wiktionary projects with more than 100k pages), then I think the developers can start on the code and provide a proof of concept (or more).
What do you think?

Both ideas (normalizing the titles or adding a magic word) will require the Wiktionaries to agree on changing something in their titles on a large scale. I guess these discussions will take some time, so it would be wise to start them as soon as possible :)

I like the idea of starting with a few Wiktionary projects to see how it works and test the feature. That could also help convince the other projects that this change could bring a nice feature to their Wiktionary.

Realistically, the development could not start this year, but we can include it in our roadmap for 2019.

Discussions on Wiktionaries could be quite straightforward: communities are small, and technical issues are usually settled by everyone following a trusted colleague who says it's good. And a development dedicated to Wiktionaries by your team can't be bad.

If I get the idea right, a magic word could be created for sections, something like SECTIONSORT. This could also solve T183747, another major issue on Wiktionaries: orthographic ordering when several languages are present on the same page.
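If each language heading carried a sort key (for instance through the SECTIONSORT idea, whose syntax is not specified in this thread), checking the order of sections on a page would become mechanical. A sketch assuming each section exposes a key string; the accent folding here is a deliberately crude stand-in for real locale-aware collation, which differs per Wiktionary.

```python
import unicodedata

def fold(key: str) -> str:
    """Crude collation key: strip accents, then case-fold."""
    decomposed = unicodedata.normalize("NFD", key)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()

def misordered(sort_keys: list[str]) -> list[tuple[str, str]]:
    """Adjacent pairs of section sort keys that are out of order."""
    return [(a, b) for a, b in zip(sort_keys, sort_keys[1:]) if fold(a) > fold(b)]

# Hypothetical keys on a French Wiktionary page: "Écossais" belongs before "Espagnol".
print(misordered(["Anglais", "Espagnol", "Écossais"]))  # [('Espagnol', 'Écossais')]
```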

This MediaWiki development could be of great use for Wiktionaries, and I am pretty sure no one will oppose harmonizing the way titles are formatted if it solves ordering and makes it possible to improve the Cognate Dashboard with more specific data!

(I am very enthusiastic!)

I have added to the tracking page the main templates used to standardize language section titles in some projects. That makes a total of 20 projects using a main template, including 5 of the top 10. Adding a magic word to the template on these projects is not an issue. This is a good starting point.

This feature would be useful, but using language codes would be complex.

English Wiktionary (at least) has a lot of custom, non-ISO, language codes containing hyphens. For instance, codes for proto-languages often (always?) end in -pro. And the scope of each language code is not always the same across Wiktionaries: a single language on one Wiktionary sometimes corresponds to two or more on another. I guess the most prominent example is Serbo-Croatian vs. Bosnian, Croatian, Montenegrin, Serbian. (Sometimes one Wiktionary has three languages that partially overlap: Norwegian, Norwegian Bokmål, and Norwegian Nynorsk.) The code for a given language is not necessarily the same between Wiktionaries: English Wiktionary uses ine-pro for Proto-Indo-European, but French Wiktionary apparently uses ine-pie for the same language (indo-européen commun). I'm not sure what this all means for the extension.
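One way an extension might cope is a per-wiki alias table mapping each local code to a set of shared codes, treating a target page as covering the reader's language when the sets overlap. The table below is illustrative only: `ine-pie`/`ine-pro` come from the comment above, while the Serbo-Croatian split (`sh` vs. `bs`, `hr`, `cnr`, `sr`) uses assumed codes. Any real table would have to be maintained by the communities.

```python
# Hypothetical alias table: (wiki, local code) -> shared codes it covers.
ALIASES = {
    ("fr", "ine-pie"): {"ine-pro"},
    ("en", "sh"): {"bs", "hr", "cnr", "sr"},
}

def shared_codes(wiki: str, code: str) -> set[str]:
    # Codes without an entry are assumed to mean the same on every wiki.
    return ALIASES.get((wiki, code), {code})

def covers(reader_wiki: str, reader_code: str,
           target_wiki: str, target_codes: set[str]) -> bool:
    """Does any section on the target page match the reader's language?"""
    wanted = shared_codes(reader_wiki, reader_code)
    return any(wanted & shared_codes(target_wiki, c) for c in target_codes)

print(covers("fr", "ine-pie", "en", {"ine-pro"}))  # True
print(covers("fr", "hr", "en", {"sh"}))            # True
print(covers("fr", "hr", "en", {"sr"}))            # False
```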

The language data for the English Wiktionary can be found in our language data modules. This is what templates use to validate language codes (and sometimes language names). Sometimes the code for a language name or the language name for a code is changed, or two codes are merged (by edits to those modules). The extension would have to access these modules somehow, unless the method of storing this data is changed. [Edit: Actually, I am not sure what information, if any, from the data modules the extension might need. At least the data modules would indicate which language codes are actually used (for instance, en rather than eng), which might be useful to the extension.]
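Whatever form the data takes, the extension would at least need to know which codes are valid on a given wiki, e.g. that `en` is used rather than `eng`. A sketch assuming a hypothetical JSON export of the language data (the real data modules are Lua, so some export step would be needed) and a plain check of the codes passed to the magic word.

```python
import json

# Hypothetical export of a wiki's language data: code -> canonical name.
EXPORT = '{"en": "English", "fr": "French", "ine-pro": "Proto-Indo-European"}'

def unknown_codes(used: list[str], export_json: str) -> list[str]:
    """Codes used in headings that the wiki's language data does not define."""
    known = json.loads(export_json)
    return [c for c in used if c not in known]

print(unknown_codes(["en", "eng", "ine-pro"], EXPORT))  # ['eng']
```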

I don't like the fact that putting templates in section headers (which has been proposed above as the mechanism for making this extension possible) breaks the links in edit summaries. For instance, in this edit summary on the English Wiktionary there's an arrow that successfully links to the Catalan section, whereas in this edit summary on the French Wiktionary, the link to the Tchèque section doesn't work because the anchor contains the wikitext of the template that is in the header, rather than the text from the expansion of the template. It is convenient on the other hand that the language headers contain anchors for both the language name and code. If the template-in-header solution is used, English Wiktionarians would have to get used to the fact that some of the links from edit summaries will not work.
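A simplified model of why the links break: the edit-summary link is built from the raw heading wikitext, while the anchor on the rendered page comes from the expanded heading. The `simple_anchor` rule below (trim, then spaces to underscores) is a deliberate simplification of MediaWiki's real anchor encoding, and the expansion of `{{langue|cs}}` is hard-coded for illustration.

```python
def simple_anchor(heading_text: str) -> str:
    # Simplification: MediaWiki's real encoding also escapes other characters.
    return heading_text.strip().replace(" ", "_")

# Hard-coded stand-in for template expansion on the French Wiktionary.
EXPANSIONS = {"{{langue|cs}}": "Tchèque"}

raw = " {{langue|cs}} "             # heading text as seen by the edit summary
rendered = EXPANSIONS[raw.strip()]  # heading text as shown on the page
print(simple_anchor(raw))           # {{langue|cs}}
print(simple_anchor(rendered))      # Tchèque
print(simple_anchor(raw) == simple_anchor(rendered))  # False
```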