Add an API module for querying MediaInfo entities by file page title
Closed, ResolvedPublic

Description

This should be a new API module, since the existing systems for getting Wikibase entities won't be sufficient, and there are specialized needs for this particular type of entity. However, it should basically be wbgetentities except with only titles instead of the current ids|site&titles system.

Rough draft of an example API query:

action=wbgetmediainfo&titles=File:LighthouseinDublin.jpg|File:Foobar.jpg&languages=en|de&props=sitelinks|aliases|labels|descriptions|claims

Related Objects

Change 382603 had a related patch set uploaded (by MarkTraceur; owner: MarkTraceur):
[mediawiki/extensions/Wikibase@master] Make GetEntities API class extensible

https://gerrit.wikimedia.org/r/382603

Change 382604 had a related patch set uploaded (by MarkTraceur; owner: MarkTraceur):
[mediawiki/extensions/WikibaseMediaInfo@master] Add wbgetmediainfo API module

https://gerrit.wikimedia.org/r/382604

These two patches are the first stab...this could probably be cleaned up and I haven't dealt with missing title errors yet. Marking as WIP, sorry.

Addshore added a subscriber: Addshore.EditedOct 6 2017, 9:51 AM

Just curious but what would be wrong with requests such as:

action=wbgetentities&ids=File:LighthouseinDublin.jpg|File:Foobar.jpg&languages=en|de&props=sitelinks|aliases|labels|descriptions|claims

or something similar, instead of introducing a new api module, that essentially does exactly what the other one does?

This works on the assumption that I remember from the offsite that the name of the file is the unique identifier within the repo.
Although if we allow changing of these name then this won't make as much sense.

I just found your comment @ T177005#3644641 which briefly talks about something like action=wbgetentities&sites=local&titles=File:LighthouseinDublin.jpg
Potentially down the line this more general approach (or something similar) would be nice.

Get entities by id:

  • action=wbgetentities&ids=Q123
  • action=wbgetentities&ids=Q1|Q2

Get entity by remote site title combination

  • action=wbgetentities&titles=Foo&sites=enwiki
  • action=wbgetentities&titles=Foo|Bar|Baz&sites=enwiki

Get entity by local site title combination?

  • action=wbgetentities&titles=File:Foo.jpg
  • action=wbgetentities&titles=Property:P3
  • action=wbgetentities&titles=Q234

This could generically be implemented within the main wbgetentities API module and not be something specific to mediainfo

Ramsey-WMF moved this task from Untriaged to Next up on the Multimedia board.Oct 6 2017, 6:01 PM

in progress

thiemowmde moved this task from in progress to monitoring on the Wikidata board.Oct 9 2017, 9:19 AM

I have to agree with @Aleksey_WMDE. The suggested approach basically makes an entire class a public interface. The past five years we learned the hard way that this is a pattern that should really, really be avoided.

That's why everything in our classes is private by default, and never protected.

You should consider everything in the Wikibase.git code base final. We are not technically enforcing this by adding the literal "final" keyword all over the place, because this would be more clutter than anything. But that's how you should think of our code.

What @Addshore wrote in T177022#3663888 also sums up what I would like to suggest:

  • If you really need to introduce a new API module (which I would find really unfortunate), please do just that and make the code your own. Copy what you need, and leave out as much as possible.
  • I also wonder what the problem is with resolving a page name to a page ID first, and then using this to query the existing "wbgetentities" module via the parameters it provides? What is the actual benefit the proposed shortcut gives you?
  • An idea I like is to expand the existing "wbgetentities" API module in a way that it always accepts local page names. This could be done with a new parameter, or with the existing "titles" parameter. Such a feature will not only be useful for the MediaInfo entity type, but for all of them: Properties for example can then be requested via their full qualified page name, e.g. "Property:P1" (while "P1" is the entity ID to be provided via the "ids" parameter).

Change 382603 abandoned by MarkTraceur:
Make GetEntities API class extensible

Reason:
OK, I gave it a shot, but I'll try a different approach today.

https://gerrit.wikimedia.org/r/382603

Change 382604 abandoned by MarkTraceur:
[WIP] Add wbgetmediainfo API module

Reason:
OK, this is not the approach we want to use, so I'll attempt a different one in a new patch (probably in Wikibase, not in MediaInfo)

https://gerrit.wikimedia.org/r/382604

@Addshore just to clarify, if I go in and implement this within Wikibase itself, I'll need to make the following changes, and I want to make sure it makes sense...

  • The titles parameter no longer requires sites to be set, if it's not, we assume the titles correspond to local page/entity pairs
  • Connect file pages to MediaInfo entities like Wikidata pages are currently connected to remote pages (unclear how this works currently)
  • Also somehow connect up the titles parameter with the ids parameter so you can use e.g. Q42 in both to pull the same entity, or Property:P4 in titles instead of P4 in ids

Does that jive with your understanding?

To me this sounds about right. The second point is indeed a bit unclear: There should be a MediaInfo-aware implementation of either an EntityTitleLookup (to look up a Title from a known EntityId), or an EntityIdLookup (to look up an EntityId from a known Title), or both. That service should be made available to Wikibase code via WikibaseMediaInfo.entitytypes.php. We might need a new configuration in this file to make this possible.

  • The titles parameter no longer requires sites to be set, if it's not, we assume the titles correspond to local page/entity pairs

Yep, we can either do this or add a separate parameter for looking on the local site called "localtitle" for example.
I haven't put much thought into which would be a cleaner implementation both internally and for external users though.

  • Connect file pages to MediaInfo entities like Wikidata pages are currently connected to remote pages (unclear how this works currently)

So, this point sounds a bit misleading, as it sounds like MediaInfo entities will have a sitelink to a file page (or something).
To me the MediaInfo entity is the file page, they are the same entity.
As @thiemowmde says, we will need some sort of MediaInfo aware EntityTitleLookup, but that is all internal.

  • Also somehow connect up the titles parameter with the ids parameter so you can use e.g. Q42 in both to pull the same entity, or Property:P4 in titles instead of P4 in ids

So this as I see it would be the target base implementation with wbgetentities, allowing you to pass title=Q42 or title=Property:P4 and it still return the correct entity.
The next step would then be allowing the MediaInfo extension to extend this and add ways in which to lookup entities by title (EntityTitleLookups)

OK, I think I have the lookup services mostly written (probably some PHP wiring that I'm not quite perfect on, will deal with details during testing), but I'm a little unclear how I'm meant to hook those up to the existing API module in Wikibase. I obviously can't put MediaInfo-specific code in there, but I'm also not sure if there's already a generalized way to say "for whatever entity type/namespace got passed in, determine the ID or title lookup service, call it with whatever information we have, and return the result", so I'll go into a documentation hole now but if you all have a faster answer I'd love to hear it.

Change 384996 had a related patch set uploaded (by MarkTraceur; owner: MarkTraceur):
[mediawiki/extensions/WikibaseMediaInfo@master] [WIP] Add lookup services for title <--> ID

https://gerrit.wikimedia.org/r/384996

Change 386692 had a related patch set uploaded (by MarkTraceur; owner: MarkTraceur):
[mediawiki/extensions/WikibaseMediaInfo@master] [WIP] Look up entity IDs based on title

https://gerrit.wikimedia.org/r/386692

Change 386693 had a related patch set uploaded (by MarkTraceur; owner: MarkTraceur):
[mediawiki/extensions/Wikibase@master] [WIP] Add hook for more entity lookups based on title

https://gerrit.wikimedia.org/r/386693

Change 391051 had a related patch set uploaded (by MarkTraceur; owner: MarkTraceur):
[mediawiki/extensions/Wikibase@master] Add hook for injecting alternate entity ID lookup

https://gerrit.wikimedia.org/r/391051

Change 391052 had a related patch set uploaded (by MarkTraceur; owner: MarkTraceur):
[mediawiki/extensions/WikibaseMediaInfo@master] Replace default entity ID lookup for MediaInfo

https://gerrit.wikimedia.org/r/391052

Change 386693 abandoned by MarkTraceur:
[WIP] Add hook for more entity lookups based on title

Reason:
Obviated by https://gerrit.wikimedia.org/r/391051

https://gerrit.wikimedia.org/r/386693

Change 386692 abandoned by MarkTraceur:
[WIP] Look up entity IDs based on title

Reason:
Obviated by https://gerrit.wikimedia.org/r/391052

https://gerrit.wikimedia.org/r/386692

thiemowmde added a subscriber: aude.

Change 386422 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Introduce EntityByTitleLookup for use by EntityByTitleHelper and EntityLoadingHelper.

https://gerrit.wikimedia.org/r/386422

Change 391051 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add hook for injecting alternate entity ID lookup

https://gerrit.wikimedia.org/r/391051

Abit moved this task from Blocked to Doing on the Multimedia-Team-Working-Board board.

Change 391052 merged by jenkins-bot:
[mediawiki/extensions/WikibaseMediaInfo@master] Replace default entity ID lookup for MediaInfo

https://gerrit.wikimedia.org/r/391052

Ramsey-WMF closed this task as Resolved.Mar 12 2018, 4:28 PM