Allow categories in Commons in all languages
Closed, Declined (Public)


This card tracks a project that's in the Community Wishlist Survey top 10:

Original proposal: Categories on Wikimedia Commons currently must be in English only, but Commons is a multilingual project and it should allow other languages. This change would benefit Commons users who don't know English or know it only poorly. I know many Wikipedia users who have a lot of problems with this, and the problem keeps many of them from using Commons. I think the problem exists on a lot of Wikipedias. Bye, --Elisardojm (talk) 11:32, 10 November 2015 (UTC)

Community Tech preliminary assessment:

Support: Very high. Unanimous support votes for the concept; comments were about different ways to implement the idea.

Impact: High with structured data, Medium to Low for a straight translation. Searching and sorting images on Commons is already difficult and confusing in English. If we really want to make Commons work well across languages, then we may need to do more than take the confusing English categories and make them confusing multilingual categories. Incorporating structured data would be a more scalable long-term solution.

Feasibility: Difficult. There are hundreds of thousands of categories, maybe more than a million. Starting with A in Special:Categories, the 20,000th category on the list is Abraham. There are also a lot of overlapping categories, e.g. Camels, Camelus, Camelus dromedarius, Camel anatomy, Camels eating, Drinking camels, Camel markets, Camel milk, Camels in art, etc. Can humans realistically make a dent in translating all these categories into every language? If the solution is creating duplicate categories in each language, it's daunting to think about how to manage up to 200x the current number of categories. That being said, using structured data concepts is also difficult.

Risk: High. This needs scoping and consensus on how this could work, with the Commons community as well as Wikidata.

Status: This is an important problem, and we want to help figure out the best solution. Right now, the most promising line of thought seems to be the "concept tagging" that Wikidata could provide through a structured data platform. The idea is: tag images using concepts that exist in Wikidata, and then cross-reference concepts to find the images you're looking for. Made-up example: instead of having separate categories for Camels, Camel milk and Camel markets, you might be able to mark the images with the concepts "Camels", "Milk" and "Markets". Then you can search for images that have both Camels and Markets, and then drill down into concepts like specific locations. If the concepts are pulled from Wikidata, then they're already translated or translatable. We don't know for sure if that's going to be the ideal solution for this project, but we want to learn more when Wikidata has a working prototype. We'll have more to say as we learn more.
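
To make the made-up camel example concrete, here is a minimal Python sketch of concept tagging with intersection search. Everything in it is an assumption for illustration: the file names, the concept keys, the label table and the function names are hypothetical and do not reflect any existing Wikidata or Commons API.

```python
from collections import defaultdict

# Hypothetical files and the concepts they are tagged with (made up).
FILE_CONCEPTS = {
    "Camel_market_Cairo.jpg": {"camel", "market"},
    "Camel_milk_bottle.jpg": {"camel", "milk"},
    "Fish_market_Bergen.jpg": {"market"},
}

# Hypothetical per-language labels; in the proposal these would come from Wikidata.
LABELS = {
    "camel": {"en": "Camels", "es": "Camellos"},
    "market": {"en": "Markets", "es": "Mercados"},
    "milk": {"en": "Milk", "es": "Leche"},
}


def build_index(file_concepts):
    """Invert the file -> concepts mapping into concept -> files."""
    index = defaultdict(set)
    for filename, concepts in file_concepts.items():
        for concept in concepts:
            index[concept].add(filename)
    return index


def concept_for_label(label, lang):
    """Find the concept whose label in `lang` matches, ignoring case."""
    for concept, labels in LABELS.items():
        if labels.get(lang, "").lower() == label.lower():
            return concept
    return None


def search(index, labels, lang):
    """Return the files tagged with every concept named in `labels`."""
    result = None
    for label in labels:
        concept = concept_for_label(label, lang)
        files = index.get(concept, set()) if concept else set()
        result = files if result is None else result & files
    return sorted(result or [])


index = build_index(FILE_CONCEPTS)
print(search(index, ["Camels", "Markets"], "en"))    # ['Camel_market_Cairo.jpg']
print(search(index, ["Camellos", "Mercados"], "es")) # ['Camel_market_Cairo.jpg']
```

The point of the sketch is that the same tagged file is found from either language, because only the labels differ while the concepts stay the same.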

Event Timeline

DannyH raised the priority of this task from to Needs Triage.
DannyH updated the task description.
DannyH added a subscriber: DannyH.

I think this needs to be considered in the context of structured data for Commons. That doesn't mean that multilingual categories should be blocked on the implementation of structured metadata for media, but the efforts should be coordinated.

Off the top of my head, there seem to be several different possible approaches:

  1. Make a nice UI for managing redirects to category pages.
  2. Attach structured data as a second content object to each category page, which would contain translations of the category name.
  3. Use labels of Wikidata items connected with categories, for display.
  4. Use Wikidata concepts as tags instead of categories. This would be blocked on structured metadata.

IMPORTANT: If you are a community developer interested in working on this task: The Wikimedia Hackathon 2016 (Jerusalem, March 31 - April 3) focuses on #Community-Wishlist-Survey projects. There is some budget for sponsoring volunteer developers. THE DEADLINE TO REQUEST TRAVEL SPONSORSHIP IS TODAY, JANUARY 21. Exceptions can be made for developers focusing on Community Wishlist projects until the end of Sunday 24, but not beyond. If you or someone you know is interested, please REGISTER NOW.

Daniel proposed several different solutions in the post above. I do not much like approaches #1 and #2. The redirects proposed in #1 do not seem like enough. Approach #2, I think, requires starting with a new structured data system, and I would rather use Wikidata. I do like approach #3, where each Commons category associated with a Wikidata item's Q-code (through property P373 on Wikidata or through some template on Commons) would be able to access Wikidata labels and article titles in other languages, either for display or as alternative text recognized as category names by HotCat, Cat-a-lot and other standard tools. Categories linked with Wikidata Q-codes could be categorized automatically based on Wikidata properties: if someone's Wikidata properties say he is a painter and he is from Spain, then he would be automatically added to the category Painters from Spain. I think we should have more categories closely linked with Wikidata entries. I also like approach #4 of using fully translatable concept tags, but as a parallel system to Commons categories.
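
A rough Python sketch of what approach #3 might look like, under heavy assumptions: the item record, its field names, the placeholder ID and the plural lookup table are all hypothetical, and real Wikidata items identify properties and values by IDs rather than plain strings.

```python
# Hypothetical, simplified item record keyed by a placeholder ID.
ITEMS = {
    "Q_EXAMPLE_1": {
        "labels": {"en": "Example Painter", "es": "Pintor de ejemplo"},
        "occupation": "painter",
        "country": "Spain",
    },
}

# Hypothetical lookup from an occupation to its plural category prefix.
PLURALS = {"painter": "Painters"}


def display_label(item, lang, fallback="en"):
    """Show the label in the reader's language, else fall back."""
    labels = item["labels"]
    return labels.get(lang) or labels.get(fallback)


def derived_category(item):
    """Derive a parent category name from the item's properties, if possible."""
    occupation = item.get("occupation")
    country = item.get("country")
    if occupation in PLURALS and country:
        return f"{PLURALS[occupation]} from {country}"
    return None


item = ITEMS["Q_EXAMPLE_1"]
print(display_label(item, "es"))  # Pintor de ejemplo
print(display_label(item, "de"))  # Example Painter (fallback)
print(derived_category(item))     # Painters from Spain
```

The fallback language is a design choice made here for illustration; the thread does not say which language should be shown when a label is missing.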

This is complex, as the Commons category system is far deeper than any other category system. Take as an example Category:Cathédrale Notre-Dame de Chartres (which is not even linked to Q180274, as preference was given to the gallery page, which is of no real importance to Commons). This category comes with subcategories like Category:Exterior of Cathédrale Notre-Dame de Chartres, Category:Portals of Notre-Dame de Chartres, Category:Royal portal of Cathédrale Notre-Dame de Chartres, and finally even Category:Right bay of Royal portal of Cathédrale Notre-Dame de Chartres, all properly nested in this order. It seems impractical to link all these categories to Wikidata items. If you look at these cases, you see a stem, i.e. "Cathédrale Notre-Dame de Chartres" (which is in French as it is considered a proper noun), and various general specializations like "exterior of", "portals of" etc. We need an approach to tackle these composite category names by defining naming schemes which should be translated, instead of translating each category individually.
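
The naming-scheme idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scheme keys, the Spanish patterns and the function name are hypothetical, and plain string substitution ignores the articles and grammatical agreement that many languages would require.

```python
# Hypothetical naming schemes: each pattern is translated once per language
# and then reused for every stem (the proper noun stays untranslated).
SCHEMES = {
    "exterior_of": {"en": "Exterior of {stem}", "es": "Exterior de {stem}"},
    "portals_of": {"en": "Portals of {stem}", "es": "Portales de {stem}"},
}


def category_name(scheme, stem, lang, fallback="en"):
    """Build a localized composite category name from a scheme and a stem."""
    patterns = SCHEMES[scheme]
    pattern = patterns.get(lang, patterns[fallback])
    return pattern.format(stem=stem)


stem = "Cathédrale Notre-Dame de Chartres"
print(category_name("exterior_of", stem, "en"))
print(category_name("exterior_of", stem, "es"))
print(category_name("portals_of", stem, "ja"))  # no Japanese pattern, falls back to English
```

With this shape, adding a language means translating one pattern per scheme rather than one name per category.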

Another problem at Commons is multidimensionality, which is not yet supported by the category system. Look at Category:Art, where you find various "Art by ..." categories like Category:Art by location, Category:Art by medium, or Category:Art by period. There are many categories resulting from some combination of these dimensions, like Category:Art by year by country, Category:Sculptures by year by country, and Category:1888 sculptures in the United States. None of these categories would have been necessary if automatic intersections of multidimensional categories were supported. Then "sculpture" and "United States" would have to be translated just once, and not for each odd category combination that exists at Commons.

@AFBorchert: For the Cathédrale Notre-Dame example, do you think it would work if there were "concept tags" from Wikidata for: Cathedrals, Cathédrale Notre-Dame, Church portals, Gothic portals, Royal portals, 12th-century, Central bay, Paris, France and so on? Instead of going to an existing category, you'd do a search for the intersection of several concepts.

It would be complicated to come up with the correct concept tags -- I'm not sure how you'd handle "Church exterior", for example -- but the people who developed that existing category system are apparently okay with creating complicated things. :)

This task raises some old discussions and ideas from the Commons community: having tags, categories or both.

To complete AFBorchert's example, I take an example with this image.

There are two concepts in one:

  • categorization, related to concepts, in a tree structure: Royal portal of Cathédrale Notre-Dame de Chartres, included in Portals of Notre-Dame de Chartres, included in Exterior of Cathédrale Notre-Dame de Chartres, included in Catégorie Notre-Dame de Chartres. This primal category is the only necessary one.
  • tagging, related to characteristics: Pilaster, Gothic art, CC-BY-SA, File by Andreas F. Borchert. It is possible to add more tags: "statues", "12th century"... These tags do not exist at the moment. They are easily translatable.

A faceted search using tags can give users access to all the information they need. In the portal image example, there are two categories that exist for characteristics:

  • Category:Gothic sculptures of Chartres cathedral: can be found with "Gothic art" and "Notre-Dame of Chartres Cathedral" (or, in French, "Art gothique" and "Cathédrale Notre-Dame de Chartres" - yay, translations!). It is possible to use both categories and tags, in a way that is invisible to users.
  • Category:Pilasters in France: can be found with "Pilasters" and "France". "France" is the parent category for "Notre-Dame of Chartres Cathedral".

The example image can be found by typing, for example "Notre-Dame de Chartres", "pilasters" and "portal".

Looking for exterior views of the Cathedral? Type "exterior view of Chartres' cathedral". The category will do the job.
Looking for exterior sculptures of the Cathedral? Type "exterior sculptures of Chartres' cathedral". The category and the tagging will do the job.
But I'm pretty sure these features will need a little bit of improvement to MediaWiki's search engine.

With a category, it is the same. Categories should just be for concepts: we replace "Category:Women facing left and looking right in art" (you like it? See more!) by tagging only. Adding translatable category descriptions would be very helpful (better than the existing ones).

With a page... well, pages must be nuked. Definitely.

Thinking about the future, IMHO there are three aspects that have to be handled separately:

  • Category structure issues (including category "attributes")
  • Category naming issues
  • Searching and tagging issues

A cat has a name, but in fact it does not need one. A cat should be identified by its ID only, and all category "names" should be "name aliases" of that ID in a 1-N relation. That way we'd get rid of these never-ending, back-breaking discussions.

  • English or a different language? No matter, some more aliases
  • AE vs BE? So what, one more alias
  • Singular vs plural? No problem, one more alias
  • Serbian vs Albanian or Russian vs Ukrainian (and so on) name? No problem, one more alias
  • Real name or artist's name of a person? One more alias
  • Cat names in ASCII plus diacritics? No need, even Japanese or Korean or Hindi aliases
  • Category redirects? Ridiculous, no longer needed

And the user would get to see not "the" cat name at the top of the page but the name aliases in his two or three favorite languages. Of course, maintaining the whole thing is not easy, since the tree's structure is made up of ID pointers only, so special maintenance UIs will be needed.
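
The ID-plus-aliases model described above can be sketched in Python as follows. The class names, the numeric ID and the example aliases are all hypothetical; this is one possible shape for the 1-N relation, not an existing MediaWiki data structure.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    cat_id: int
    parent_ids: set = field(default_factory=set)   # tree structure uses IDs only
    aliases: dict = field(default_factory=dict)    # language code -> list of names

    def display_names(self, preferred_langs):
        """First alias in each of the reader's preferred languages, if any."""
        return [self.aliases[lang][0] for lang in preferred_langs
                if self.aliases.get(lang)]


class CategoryStore:
    def __init__(self):
        self.categories = {}
        self.alias_index = {}  # case-folded alias -> category ID

    def create(self, cat_id, parent_ids=()):
        self.categories[cat_id] = Category(cat_id, set(parent_ids))
        return self.categories[cat_id]

    def add_alias(self, cat_id, lang, name):
        """Any number of names may point at one ID (the 1-N relation)."""
        self.categories[cat_id].aliases.setdefault(lang, []).append(name)
        self.alias_index[name.casefold()] = cat_id

    def resolve(self, name):
        """Look a category up by any alias; no redirects are needed."""
        return self.alias_index.get(name.casefold())


store = CategoryStore()
camels = store.create(1001)               # hypothetical ID
store.add_alias(1001, "en", "Camels")
store.add_alias(1001, "en", "Camel")      # singular vs plural: one more alias
store.add_alias(1001, "es", "Camellos")
store.add_alias(1001, "ja", "ラクダ")

print(store.resolve("camel"))               # 1001
print(camels.display_names(["es", "en"]))   # ['Camellos', 'Camels']
```

This sketch assumes that an alias maps to exactly one category; handling the same name in two languages for two different categories would need a language-aware index.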

In my opinion, the proposed implementation (redirects!) would be a huge waste of time. Unstructured Commons will never manage to scale to hundreds of languages as well as Wikidata has done. The category system shows its weaknesses.
While the WMDE folks are working on T68108, the Commons community should gather and devise a solid model for translating categories into specific properties. Internationalization and ease of search will come consequently.

@Achim55, that does sound like a good and workable idea, although it would be a lot of work to implement. However, as @Ricordisamoa says, Wikidata is currently doing work that leads to a more flexible and scalable system.

The Community Tech team is planning to support Wikidata's work on concept tagging, rather than building a new type of category system on Commons. Using Wikidata concept tags will allow for full internationalization, and it will serve as the basis of a much more powerful file search.

That being said, Achim -- thanks for your idea!

Given T69223#2511591, it would be nice for people promoting this task to state that they do not wish to block the existing work on improving translation support for *pages* on Meta-Wiki, the WMBE wiki and so on.

DannyH claimed this task.

Closing this ticket; Community Tech won't be working on this project. But there will be lots of work done by other teams in 2017!

Is "resolved" a correct resolution? Seems to me either invalid or stalled would be more correct, or even leaving it open and setting priority to low/lowest.

Aklapper changed the task status from Resolved to Declined. Dec 16 2016, 1:43 PM

@DannyH: I don't see how anything got resolved here, hence changing to declined.

Okay, I identified what Danny means by "won't be working on this project": at this permalink for the 2016 community wishlist survey FAQ, the group is saying that the work done under T68108: [Epic] Store media information for files on Wikimedia Commons as structured data is going to be done instead of this work. So I agree with aklapper, "declined" is better.

Removing extra tags for which this feature request is certainly not declined.