Page MenuHomePhabricator

Users may add so-called descriptions to MediaInfo entities
Open, HighPublic

Description

e.g. this edit added a so-called description to a MediaInfo entity via Special:SetDescription (which makes no differences in the page layout, but the information is somehow exposed in JSON). This should be disallowed.

Not only should this not be done via special page (T214691: Disable some special pages in Commons), but also no descriptions or aliases may be added via API.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJan 25 2019, 1:51 PM
Addshore triaged this task as High priority.Jan 25 2019, 1:52 PM
Addshore moved this task from incoming to monitoring on the Wikidata board.Jan 25 2019, 2:22 PM

@daniel had an idea last week about using isValid to disable this. I think we need a bit more detail on the implementation he had in mind, as James mentioned MediaInfo code doesn't currently use that (if I understood him correctly). Thoughts, Daniel? (also thanks for helping!)

daniel added a comment.Feb 8 2019, 9:14 PM

The MediaInfoContent class can just override the isValid() method, call $this->getMediaInfo()->getDescriptions(), and return false of that is not empty. This will prevent any MediaInfo to be stored if it has any descriptions set.

Other code that can provide a helpful message to the user should also do this check. isValid() is really the fail-safe.

Change 489320 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[mediawiki/extensions/WikibaseMediaInfo@master] MediaInfoContent: Hard-fail if the content has descriptions

https://gerrit.wikimedia.org/r/489320

The MediaInfoContent class can just override the isValid() method, call $this->getMediaInfo()->getDescriptions(), and return false of that is not empty. This will prevent any MediaInfo to be stored if it has any descriptions set.
Other code that can provide a helpful message to the user should also do this check. isValid() is really the fail-safe.

Aha, it was the jump from MediaInfoContent up to the MediaInfo class that confused me. Yeah, this works, but will presumably need us to remove all existing descriptions from production before deploying as otherwise those pages will be invalid stored content and so won't be editable any more?

daniel added a comment.EditedFeb 9 2019, 4:34 AM

Aha, it was the jump from MediaInfoContent up to the MediaInfo class that confused me. Yeah, this works, but will presumably need us to remove all existing descriptions from production before deploying as otherwise those pages will be invalid stored content and so won't be editable any more?

Yes, indeed. You should be able to find the in the wb_terms table, I guess. If they do get written to the terms table, which they should, but I have not verified.

There is also a hack solution to fix this: override unserializeContent() in MediaInfoHandler, and just blank out any existing descriptions there by calling setDesciptions() on the return value of the parent method.

I don't claim to understand all details on how MediaInfo is implemented, but if I understood the problem, and I were to "fix" it, I would see three steps:

  1. Make Special:SetDescriptions and relevant API actions really only work with items and properties (unless already done in all places, description of this task suggests it might not be the case yet).
  2. Adjust the code turning the PHP object into JSON object for storage, and the opposite when loading JSON object from the database, so it ignores anything which claims to be a description. E.g. skip "descriptions" key etc. This would prevent storing any "description" data even if it somehow made up to the storage layer, and also unexpected leaking of incorrect data should it be already stored in historical revisions etc.
  3. Adjust the code turning the loaded the PHP object for JSON output, e.g. in API results. This code should be able to control what data is used for presentation, e.g. what "elements" of the MediaInfo object to display. This would prevent exposing any "descriptions" even if they for whatever reason ended up to be stored in the database.

Re points 2 and 3. It has seemed to be the case that "serialization/deserialization" in the storage layer, and "serialization/deserialization" in the presentation layer have been using the same logic, i.e. whatever was stored in DB was basically directly presented to the user. I would not consider this the best idea, and would suggest considering separating those two "serialization/deserialization" processes both conceptually, and also in terms of code.

I am not sure if this helps, should I be more specific (there have been quite exact class names being referred to in above comments etc), please request so.

Ramsey-WMF moved this task from Untriaged to Next up on the Multimedia board.Mar 8 2019, 1:41 AM