
[Task] Decide on structure of entity ID for media info entity type
Closed, Resolved, Public

Description

Options we have so far:

  • page ID
  • file name
  • numeric ID

Event Timeline

Lydia_Pintscher raised the priority of this task from to Medium.
Lydia_Pintscher updated the task description.

Arguments for the ongoing filenames vs. IDs discussion:

  • Outside of Commons, the only identifier to reference a file is the filename.
    • Do we want this to change? Do we want to provide an alternative? We have the chance to do so. Files currently don't have an internal identifier. Imagine https://commons.wikimedia.org/media/M42.jpg as a guaranteed, permanent link to each file.
    • The file description page does have an id, but is it sensible to reuse it?
  • Users change filenames a lot: https://commons.wikimedia.org/wiki/Special:Log/move
    • Thousands of users have the right to rename files: https://commons.wikimedia.org/wiki/Special:ListUsers/filemover
    • Some of the reasons for renaming files will become less important once everything we want to do actually works (e.g. once search has enough information to find a file and the actual filename no longer matters much). But this will take time, and it will most likely not stop users from renaming files for consistency.
    • We must make sure that a media item moves with the file and its description page when a file is renamed. Redirects must keep working, and so on. We must do this no matter what. But how ugly will this code be? Is it more or less ugly with numeric IDs?
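To make the rename argument above concrete, here is a minimal toy model, assuming a store where every file gets a stable numeric ID at upload time. All names (`FileStore`, `upload`, `rename`, `resolve_name`) are hypothetical and are not MediaWiki APIs. It shows the asymmetry being debated: references by file name need redirects (possibly chained) to survive renames, while references by stable ID are untouched by renames.

```python
class FileStore:
    """Toy model of renames; all names here are hypothetical, not MediaWiki APIs."""

    def __init__(self):
        self._next_id = 1
        self.id_by_name = {}   # current file name -> stable numeric ID
        self.redirects = {}    # old file name -> the name it was moved to
        self.media_info = {}   # stable numeric ID -> media info data

    def upload(self, name, info):
        file_id = self._next_id
        self._next_id += 1
        self.redirects.pop(name, None)  # a new file replaces any redirect of that name
        self.id_by_name[name] = file_id
        self.media_info[file_id] = info
        return file_id

    def rename(self, old, new):
        # The numeric ID travels with the file; only the name mapping changes.
        self.id_by_name[new] = self.id_by_name.pop(old)
        self.redirects.pop(new, None)  # the new name is now a real file again
        # Leave a redirect so that links using the old name keep resolving.
        self.redirects[old] = new

    def resolve_name(self, name):
        """Follow redirects (possibly chained) to the file's stable ID."""
        seen = set()
        while name in self.redirects:
            if name in seen:
                raise ValueError(f"redirect loop at {name!r}")
            seen.add(name)
            name = self.redirects[name]
        return self.id_by_name[name]


store = FileStore()
fid = store.upload("Sunset.jpg", {"label": "Sunset"})
store.rename("Sunset.jpg", "Sunset over Berlin.jpg")
store.rename("Sunset over Berlin.jpg", "Berlin sunset 2015.jpg")
print(store.resolve_name("Sunset.jpg") == fid)  # True, after two redirect hops
print(store.media_info[fid]["label"])           # Sunset, no redirects involved
```

The redirect bookkeeping (chains, names being reused, moves back to an old name) is exactly the kind of code the question "how ugly will this code be?" refers to; the ID-keyed lookup needs none of it.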

Overall: What will cause less trouble, for all users combined?

I just want to throw into the room that file names will become more and more irrelevant once we have labels for images. The only place users will actually notice them is the wikitext source; everywhere else, the label will be what is displayed and used for searching. Also, as VisualEditor grows in popularity, the file names in the source code will matter less too, imo.

@thiemowmde if we use the ID of the file description page instead of an auto-increment ID, we have a stable ID and don't have to track renames; the page table does it for us. I'm currently favoring that dirty hack... This will, however, only work once we integrate with the File namespace. The first iteration is planned to be completely standalone.
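A minimal sketch of why reusing the page ID avoids rename tracking, assuming an in-memory stand-in for the page table: a move rewrites the title but never the page ID, so an entity ID derived as "M" + page ID is unaffected. `Page`, `PageTable`, and `media_info_id` are hypothetical names; the sketch also ignores the redirect page that a real MediaWiki move leaves behind (that redirect is a separate page with its own page ID).

```python
from dataclasses import dataclass


@dataclass
class Page:
    page_id: int  # assigned once when the page is created
    title: str    # changes on every move


class PageTable:
    """In-memory stand-in for MediaWiki's page table (hypothetical API)."""

    def __init__(self):
        self._next_id = 1
        self._by_id = {}
        self._id_by_title = {}

    def create(self, title):
        page = Page(self._next_id, title)
        self._next_id += 1
        self._by_id[page.page_id] = page
        self._id_by_title[title] = page.page_id
        return page

    def move(self, old_title, new_title):
        # A move rewrites the title but keeps the page ID. (The redirect
        # left at the old title is a separate page; it is omitted here.)
        page_id = self._id_by_title.pop(old_title)
        self._by_id[page_id].title = new_title
        self._id_by_title[new_title] = page_id

    def get(self, page_id):
        return self._by_id[page_id]


def media_info_id(page):
    """Hypothetical helper: derive the entity ID as "M" + page ID."""
    return f"M{page.page_id}"


pages = PageTable()
page = pages.create("File:Sunset.jpg")
before = media_info_id(page)
pages.move("File:Sunset.jpg", "File:Berlin sunset.jpg")
after = media_info_id(pages.get(page.page_id))
print(before, after)  # M1 M1
```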

I'd prefer to introduce our own ID system like on Wikidata, with incremental IDs created automatically when a new entity is created. This way we do not have to adjust our code too much (e.g. we aren't blocked on the prefixed ID issue), and we have something stable that doesn't depend on MediaWiki internals, which I think is much more desirable than using something from the "deep dark side" of MediaWiki. I think it's correct that your proposal is a "dirty hack"... ;-)
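A minimal sketch of the incremental scheme proposed here, in the spirit of Wikidata's Q/P IDs. `EntityIdAllocator` is a hypothetical name, and this in-memory counter is only a toy: a real deployment would presumably need a persistent counter that is incremented atomically in the database. The point it illustrates is that the ID is independent of titles and page IDs, so no MediaWiki internals need to be consulted to mint one.

```python
import itertools
import threading


class EntityIdAllocator:
    """Hands out prefixed incremental IDs: M1, M2, M3, ...

    Assumption: a real system needs a persistent counter that is incremented
    atomically in the database; this in-memory version only shows the idea.
    """

    def __init__(self, prefix="M", start=1):
        self._prefix = prefix
        self._counter = itertools.count(start)
        self._lock = threading.Lock()  # keep concurrent callers from racing

    def next_id(self):
        with self._lock:
            return f"{self._prefix}{next(self._counter)}"


media = EntityIdAllocator("M")
print([media.next_id() for _ in range(3)])  # ['M1', 'M2', 'M3']
```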

Lydia_Pintscher claimed this task.

We discussed this in yesterday's story time. The result is: We will use m+pageID of the file page.

Clarification of Lydia's comment: we will use M+pageId of the corresponding image description page, once we have integration with image description pages. For the baseline version without integration with file pages, we will use M+incrementalId, just like we do for items and properties.
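Because both the baseline (M+incrementalId) and the integrated version (M+pageId) produce IDs of the same shape, code that formats and validates IDs can presumably be shared across both phases; only the source of the number changes. A minimal sketch, assuming the syntax is "M" followed by a positive decimal integer without leading zeros (analogous to Q42); `format_media_info_id` and `parse_media_info_id` are hypothetical names.

```python
import re

# Assumed syntax: "M" followed by a positive decimal integer without leading
# zeros, analogous to item IDs such as Q42. [0-9] (not \d) rejects non-ASCII digits.
MEDIA_INFO_ID = re.compile(r"M([1-9][0-9]*)")


def format_media_info_id(number: int) -> str:
    """Build an entity ID from either a counter value or a page ID."""
    if number < 1:
        raise ValueError(f"ID number must be positive, got {number}")
    return f"M{number}"


def parse_media_info_id(entity_id: str) -> int:
    """Return the numeric part of an ID like "M42"; reject anything else."""
    match = MEDIA_INFO_ID.fullmatch(entity_id)
    if match is None:
        raise ValueError(f"not a media info ID: {entity_id!r}")
    return int(match.group(1))


print(format_media_info_id(42))    # M42
print(parse_media_info_id("M42"))  # 42
```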

Using the M+pageId approach, it will be simple to find the file description page for an entity, and find the media info entity "connected" to a given file description page.
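A minimal sketch of the two lookups, assuming a toy index of the page table keyed both by page ID and by title. With M+pageId, neither direction needs a separate mapping table; both are plain page table lookups. `PageIndex`, `entity_id_for_file_page`, and `file_page_for_entity_id` are hypothetical names, and detecting the File namespace by a "File:" title prefix is a simplification (MediaWiki stores the namespace as a number, 6 for File).

```python
import re
from typing import Dict, Optional

FILE_PREFIX = "File:"  # assumption: namespace detected by title prefix


class PageIndex:
    """Hypothetical stand-in for the page table, indexed both ways."""

    def __init__(self, titles_by_id: Dict[int, str]):
        self.title_by_id = dict(titles_by_id)
        self.id_by_title = {title: pid for pid, title in self.title_by_id.items()}


def entity_id_for_file_page(index: PageIndex, title: str) -> Optional[str]:
    """File description page title -> media info entity ID, if any."""
    if not title.startswith(FILE_PREFIX):
        return None  # only file description pages have a media info entity
    page_id = index.id_by_title.get(title)
    return None if page_id is None else f"M{page_id}"


def file_page_for_entity_id(index: PageIndex, entity_id: str) -> Optional[str]:
    """Media info entity ID -> title of its file description page, if any."""
    match = re.fullmatch(r"M([1-9][0-9]*)", entity_id)
    if match is None:
        raise ValueError(f"not a media info ID: {entity_id!r}")
    title = index.title_by_id.get(int(match.group(1)))
    if title is None or not title.startswith(FILE_PREFIX):
        return None  # no such page, or a page outside the File namespace
    return title


index = PageIndex({42: "File:Sunset.jpg", 7: "Main Page"})
print(entity_id_for_file_page(index, "File:Sunset.jpg"))  # M42
print(file_page_for_entity_id(index, "M42"))              # File:Sunset.jpg
print(file_page_for_entity_id(index, "M7"))               # None
```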

We also discussed using the file name as the identifier for the media info. However, files can be renamed, so the name is not a stable identifier. The page ID is stable across page moves and, apparently as of recently, also across deletion and restoration (but we should verify that).