
[EPIC] Image-positioning service for storing and retrieving image focal points
Closed, Declined (Public)


(Moving our mobile tech discussion to Phab to let the Services folks see it and chime in...)

Readers love the new lead banner images in the latest Android release of the Wikipedia app[0] – but currently, if we wanted to port this design change to the mobile or desktop site, images would be cropped randomly and most of the images of people would have their heads chopped off. No bueno.

The Apps team has used native face-detection libraries to avoid the chopped-off-head problem, and Max from the mobile web team found a library[1][2] that we could use to build a generalizable face-detection service for everyone (apps, mobile web, desktop). In the short term, this would unblock the mobile web team from releasing a design update to give our users more parity between our two mobile experiences (apps and mobile web). This would also unlock our ability to evolve the design of desktop lead images, too, replacing the templates that projects like WikiVoyage are using to create banner images (which are static and pretty broken on mobile).[3]

Face-detection would get us nine-tenths of the way to a good user experience, but even in the apps there are currently still some edge-case issues with cropping and positioning, so folks from the mobile teams have also discussed a more general image-positioning service. I'm not one to let the perfect be the enemy of the good, though, so if we could just request help from the Services team to conquer the chopped-off-head problem, I'd be very grateful ;)


Event Timeline

Maryana raised the priority of this task from to Needs Triage.
Maryana updated the task description.
Maryana added subscribers: Maryana, MaxSem, Dbrant and 5 others.

@GWicke @Jdouglas @mobrovac Thoughts from you guys? Does this seem like a use case that a Node.js service and RESTBase could support?

Definitely! Let's put it in the hopper.

Is there a way to designate this task as a backlog item for RESTBase, other than to just tag it as RESTBase?

Additional use-case to consider: sometimes the lead image is really not appropriate as a banner image (think scary medical conditions you don't want to see close up). It would be great if this service could support having new images pushed to it, so users could select a better lead image from all the ones in the article or on Commons without having to edit the page :)

My dream scenario
This data would be associated with the images themselves (via CommonsData?) rather than being tied to a service or specific use case.

Here's what I imagine it would look like:
Assume Commons images had a "focal area" concept which comprised a rectangle* and an optional article title (omitting lang to simplify the example):

{"rect": [0.20, 0.20, 0.12, 0.12], "title": "dog"}

( * Unit rectangle coordinates mean they can be easily applied to any size variant of the image. Note the rect coordinates above denote [origin x, origin y, width, height] )
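To illustrate why unit coordinates are convenient, here is a minimal sketch of scaling one rect to a concrete size variant. The names `PixelRect` and `to_pixels` are made up for this example and are not part of any existing API; rounding to whole pixels is also just an assumption.

```python
from typing import NamedTuple, Sequence


class PixelRect(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def to_pixels(rect: Sequence[float], image_width: int, image_height: int) -> PixelRect:
    """Scale a unit rect [origin x, origin y, width, height] to pixel units."""
    origin_x, origin_y, width, height = rect
    return PixelRect(
        x=round(origin_x * image_width),
        y=round(origin_y * image_height),
        width=round(width * image_width),
        height=round(height * image_height),
    )


# The same unit rect applied to two size variants of one image.
dog = [0.20, 0.20, 0.12, 0.12]
print(to_pixels(dog, 1000, 500))
print(to_pixels(dog, 320, 240))
```

The stored rect never changes; only the multiplication by the variant's dimensions does.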

A given image can, of course, have more than one of these focal areas, so imagine a picture of Obama watching a dog chase a cat:

     {"rect": [0.20, 0.20, 0.12, 0.12], "title": "dog"}, 
     {"rect": [0.60, 0.20, 0.22, 0.30], "title": "cat"}, 
     {"rect": [0.10, 0.80, 0.12, 0.30], "title": "Barack Obama"} 

So a get/set-able "focalareas" array property would be super cool:

    "focalareas": [
        {"rect": [0.20, 0.20, 0.12, 0.12], "title": "dog"},
        {"rect": [0.60, 0.20, 0.22, 0.30], "title": "cat"},
        {"rect": [0.10, 0.80, 0.12, 0.30], "title": "Barack Obama", "isFace": true, "isMainFocus": true}
    ]

( Note: "rect" could instead be named "region" and "focalareas" could be "subregions", if people like that better. )

With "isFace" faces would be easy to distinguish.

With "isMainFocus" an image's primary focal area would be easy to specify - it may or may not be a face, of course, but only one focal area per image can have "isMainFocus" set to true.
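The "only one main focus per image" rule could be enforced wherever the data is written. A rough sketch, assuming the focal areas arrive as a list of dicts shaped like the examples above; `main_focus` is a hypothetical helper name:

```python
from typing import Optional


def main_focus(focal_areas: list) -> Optional[dict]:
    """Return the entry flagged "isMainFocus", or None if no entry is flagged.

    Raises ValueError when more than one entry carries the flag, since the
    proposal allows at most one main focus per image.
    """
    flagged = [area for area in focal_areas if area.get("isMainFocus")]
    if len(flagged) > 1:
        titles = ", ".join(area.get("title", "?") for area in flagged)
        raise ValueError(f"more than one main focus: {titles}")
    return flagged[0] if flagged else None


areas = [
    {"rect": [0.20, 0.20, 0.12, 0.12], "title": "dog"},
    {"rect": [0.60, 0.20, 0.22, 0.30], "title": "cat"},
    {"rect": [0.10, 0.80, 0.12, 0.30], "title": "Barack Obama",
     "isFace": True, "isMainFocus": True},
]
print(main_focus(areas)["title"])
```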

Ease of use
With this approach it would be super easy for desktop, mobile or apps to clip (CSS "clip" property for web) any size variant of an image intelligently, without the need to maintain variant-specific data or literally cropped image binaries.
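As a sketch of what a client might do with one focal area: pick a full-width banner crop of a given aspect ratio, centred vertically on the focal area and clamped so it stays inside the image. `banner_crop` and its parameters are hypothetical, and "full width, vertical centring" is just one possible policy, not something the thread has settled on.

```python
from typing import Sequence, Tuple


def banner_crop(rect: Sequence[float], image_width: int, image_height: int,
                banner_aspect: float) -> Tuple[int, int, int, int]:
    """Return a (left, top, right, bottom) pixel crop for a banner.

    The crop spans the full image width, has width / height == banner_aspect
    (or the whole image if the image is too short for that), and is centred
    vertically on the focal rect as far as the image bounds allow.
    """
    crop_height = min(image_height, round(image_width / banner_aspect))
    _, origin_y, _, height = rect
    focus_center_y = (origin_y + height / 2) * image_height
    top = round(focus_center_y - crop_height / 2)
    # Clamp so the crop window never leaves the image.
    top = max(0, min(top, image_height - crop_height))
    return (0, top, image_width, top + crop_height)


# A face in the upper part of a 1000x800 image, cropped to a 2.5:1 banner.
print(banner_crop([0.40, 0.10, 0.20, 0.20], 1000, 800, 2.5))
```

The same rect works for every size variant, so no per-variant crop data needs to be stored.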

If the "focalareas" array could be stored such that entries were sorted in area-descending order, it would be even better - you'd always know the biggest focal area is first in the array and the smallest is last.
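If the storage layer did not keep the order, a client could sort on read. A minimal sketch, again assuming the list-of-dicts shape from the examples above (`sort_by_area` is a made-up name):

```python
def sort_by_area(focal_areas: list) -> list:
    """Return a new list with the largest focal area first."""
    def area(entry: dict) -> float:
        _, _, width, height = entry["rect"]
        return width * height

    return sorted(focal_areas, key=area, reverse=True)


areas = [
    {"rect": [0.20, 0.20, 0.12, 0.12], "title": "dog"},
    {"rect": [0.60, 0.20, 0.22, 0.30], "title": "cat"},
    {"rect": [0.10, 0.80, 0.12, 0.30], "title": "Barack Obama"},
]
print([entry["title"] for entry in sort_by_area(areas)])
```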

Edge cases
An optional qualitative flag (such as "isBadCoverImage") could be used to address edge cases such as surgery images, but this is a property of the overall image and as such is probably a candidate for a CommonsData property of the image, not a flag on an image sub-region.

Apps interfaces for editing regions or curating recently modified regions
Apps specifically could very quickly mock up super simple interfaces for experimenting with user editing of these focal areas if we had a way to store/retrieve such data. Think simple pinch-zoom-drag adjustment of translucent focal area overlays with one-tap to search and select an article "title" associated with the tapped focal area. An app interface for quickly curating and reviewing recently modified image focal areas would be easy too.

Web interfaces too
Web interfaces for the same, especially if informed by learnings from app proofs-of-concept, should be fairly simple.

Better Commons search
Commons image search could also be enhanced to (optionally) search against this richer dataset for better matches. Think an option to restrict searches to matching regions in images - i.e. against titles associated with image focal areas - "dog cat obama", from the example above.
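A naive sketch of the "restrict to regions" idea, assuming an in-memory mapping from file name to its focal areas. The file names and the `images_matching` helper are invented for illustration; a real search backend would do this in its index rather than in a loop.

```python
def images_matching(index: dict, query: str) -> list:
    """Return file names whose focal-area titles cover every query term.

    Matching is a case-insensitive substring test, which is deliberately
    simplistic ("cat" would also match a title like "Catalog").
    """
    terms = query.lower().split()
    matches = []
    for file_name, focal_areas in index.items():
        titles = [area.get("title", "").lower() for area in focal_areas]
        if all(any(term in title for title in titles) for term in terms):
            matches.append(file_name)
    return matches


index = {
    "Obama_dog_cat.jpg": [
        {"rect": [0.20, 0.20, 0.12, 0.12], "title": "dog"},
        {"rect": [0.60, 0.20, 0.22, 0.30], "title": "cat"},
        {"rect": [0.10, 0.80, 0.12, 0.30], "title": "Barack Obama"},
    ],
    "Dog.jpg": [{"rect": [0.10, 0.10, 0.80, 0.80], "title": "dog"}],
}
print(images_matching(index, "dog cat obama"))
```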

I see several requirements here:

  • some service or library for face detection / other alignment inference
  • a place to store image positions, in a way that
    • is updated when images are re-uploaded
    • makes it convenient to retrieve alignment along with other image info (prop=imageinfo?)

From a logical grouping perspective I would think that whatever code does our image scaling would be best placed to integrate the image position stuff at a higher level. The calculation itself could well happen in a discrete service, but I'm hesitant to store alignment information separately from the other image information without a good reason & a good plan for keeping it up to date. The move to content hash-based image urls can help with the update problem even if stored externally, but the query part still remains.

For the lead image, we currently don't have a great way to store page properties persistently in a way that survives a re-parse. This is an area where RESTBase can potentially help, but we would again need to make sure that this information is properly updated / degrades well if the stored lead image preference is removed from the page or outright deleted.

More generally, I am a bit hesitant to start storing random bits of separate metadata per page without having a better idea for how we plan to organize page metadata in the longer term. Maybe a collection of random blobs is fine, but it might also pay off to think a bit about the general update and query requirements. There is also the idea to move wikitext-encoded page properties like categories, behavior switches etc. to their own separate blob (see T55508).

If we don't get a commons-wikibase setup or something to store the data in, I recommend entering the data into the pages via a parser function, just as we do with coordinates in GeoData.

This could store to page_props and would survive reparses, and would already give you versioning, ability to revert, etc.

Could then migrate it along with other things to wikibase or whatever Commons ends up with for versioned media metadata...

... and makes it convenient to retrieve alignment along with other image info (prop=imageinfo?)

IMO this is a separate concern and we should stick to solving the root problem: helping the user frame the article's lead image (for a given platform/device?). There also seems to be a desire for the user to select (or upload?) a better lead image.

Maybe there's a separate service which suggests focal areas to assist user cropping, but AFAICT we've determined that tweaking will need to happen regardless. Also, we can already do some of this on (native) clients. Any aggregation w/ articles or imageinfo data can happen downstream.

I like Brion's idea of implementing this as a parser function (at least until Commons has some solution for storing image metadata).

@Mhurd I think having a method to just get a single aggregated rectangle, which represents the outer bounds of potentially multiple smaller focal areas (plus some amount of padding), would be good. No need to transfer all the data to the client if it's not interested in the details just to crop a lead image. Let's keep it simple and light-weight for the clients.
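Roughly what that aggregated rectangle could look like, as a sketch only: take the union of all focal rects, grow it by some padding, and clamp it to the unit square. `bounding_rect` and the default padding of 0.05 are assumptions for the example, not agreed values.

```python
from typing import List, Optional


def bounding_rect(focal_areas: list, padding: float = 0.05) -> Optional[List[float]]:
    """Return one unit rect [origin x, origin y, width, height] enclosing
    every focal area plus padding, clamped to the image bounds.

    Returns None when there are no focal areas.
    """
    if not focal_areas:
        return None
    rects = [area["rect"] for area in focal_areas]
    left = min(r[0] for r in rects) - padding
    top = min(r[1] for r in rects) - padding
    right = max(r[0] + r[2] for r in rects) + padding
    bottom = max(r[1] + r[3] for r in rects) + padding
    # Clamp to the unit square so the result is always a valid crop.
    left, top = max(0.0, left), max(0.0, top)
    right, bottom = min(1.0, right), min(1.0, bottom)
    return [left, top, right - left, bottom - top]


areas = [
    {"rect": [0.20, 0.20, 0.10, 0.10], "title": "dog"},
    {"rect": [0.60, 0.50, 0.20, 0.30], "title": "cat"},
]
print(bounding_rect(areas))
```

A client that only wants to crop a lead image would fetch this single rect and skip the per-region details.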

Jdlrobson moved this task from 2015-16 Q4 to Upcoming on the Readers-Web-Backlog board.
Jdlrobson moved this task from Upcoming to 2016-17 Q4 on the Readers-Web-Backlog board.

I recommend entering the data into the pages via a parser function, just as we do with coordinates in GeoData.

That's T91683: Allow editors control of the page image, I'll make it a blocking task for this.

So many people struggling because readers want lead images...
When will real issues get the attention they deserve?

Jdlrobson triaged this task as Medium priority. Sep 16 2015, 6:56 PM
Jdlrobson added a subscriber: Jdlrobson.

Wikivoyage now has lead images and supports an origin parameter.

A service like this would go a long way toward making its banners more mobile-friendly.

Jdlrobson renamed this task from Image-positioning service to [EPIC] Image-positioning service for storing and retrieving image focal points. Sep 18 2015, 8:41 PM

Closing, given the age, the lack of activity, and the fact that everyone is tracking/monitoring :)