
Collect ResourceTiming data of top article image
Closed, Resolved · Public

Description

The ResourceTiming data of the top article image is a signal worth studying for correlation with user sentiment as part of the survey, especially since we don't collect any ResourceTiming data at the moment.

Event Timeline

Gilles triaged this task as Medium priority. Jun 22 2018, 4:51 PM
Vvjjkkii renamed this task from Collect ResourceTiming data of top article image to 6paaaaaaaa. Jul 1 2018, 1:03 AM
Vvjjkkii removed Gilles as the assignee of this task.
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot renamed this task from 6paaaaaaaa to Collect ResourceTiming data of top article image. Jul 2 2018, 4:31 AM
CommunityTechBot assigned this task to Gilles.
CommunityTechBot lowered the priority of this task from High to Medium.
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: Aklapper.

Identifying the top image in the article (from a content-meaningfulness perspective) is a bit tricky. Some infoboxes reference images directly rather than as thumbnails in the wikitext sense (thumbnail syntax gives the resulting img elements a special class, which would make them easy to select). Those infoboxes also sometimes contain tiny icons which, while visually insignificant, can appear in the DOM before the first large image people actually pay attention to.

Maybe a minimum size would be enough to filter out icons? That is, pick the first image in the article DOM whose area is greater than X square pixels.
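A minimal sketch of that size threshold, written against a plain array of objects exposing `width` and `height` (as `HTMLImageElement` does) so it can be tried outside a browser. `pickTopImage` and the mock image list are hypothetical, and 100 × 100 is just one possible choice of X.

```javascript
// Hypothetical helper: return the first image (in DOM order) whose area is
// strictly greater than minArea square pixels, or null if none qualifies.
// `images` is any array-like of objects with numeric width/height.
function pickTopImage(images, minArea) {
  for (let i = 0; i < images.length; i++) {
    const img = images[i];
    if (img.width * img.height > minArea) {
      return img;
    }
  }
  return null;
}

// Mock DOM order: two small infobox icons come before the lead image.
const candidates = [
  { name: "flag-icon", width: 23, height: 15 },
  { name: "map-pin", width: 8, height: 8 },
  { name: "lead-photo", width: 220, height: 165 },
  { name: "second-photo", width: 300, height: 200 },
];
const chosen = pickTopImage(candidates, 100 * 100);
console.log(chosen.name); // "lead-photo"
```

Because the check runs in DOM order and stops at the first match, a large image further down the page never wins over an earlier one that merely clears the threshold.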

This seems to do the trick:

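// Pick the first image in the article content whose rendered area exceeds
// 100x100 px, skipping the tiny icons that some infoboxes contain.
var img = $( '.mw-parser-output img' ).filter( function ( idx, e ) {
        return e.width * e.height > 100 * 100;
    } )[ 0 ],
    urls = [],
    resources;

if ( img && window.performance && performance.getEntriesByType ) {
    resources = performance.getEntriesByType( 'resource' );

    // The image may have been fetched from its src or from any srcset candidate.
    urls.push( img.src );
    ( img.srcset || '' ).split( ',' ).forEach( function ( src ) {
        // Each candidate looks like "URL [descriptor]", e.g. "//host/a.jpg 2x".
        var url = src.trim().split( /\s+/ )[ 0 ];

        if ( url ) {
            // srcset URLs may be protocol-relative; resolve them against the
            // page URL instead of blindly prepending 'https:'.
            urls.push( new URL( url, location.href ).href );
        }
    } );

    // Log the ResourceTiming entries for whichever of those URLs was fetched.
    resources.forEach( function ( resource ) {
        if ( resource.initiatorType === 'img' && urls.indexOf( resource.name ) !== -1 ) {
            console.log( resource );
        }
    } );
}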
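The URL matching in the snippet above depends on the DOM and the Performance API, so it can help to factor it into pure functions that are testable in isolation. This is a sketch, not the code that was merged: `candidateUrls`, `matchingEntries` and the sample URLs are hypothetical, and URLs are resolved with the standard `URL` constructor so that both protocol-relative and absolute srcset entries work.

```javascript
// Hypothetical helper: every URL an <img> might have been fetched from,
// i.e. its src plus each srcset candidate, resolved to absolute form.
function candidateUrls(src, srcset, pageUrl) {
  const urls = [new URL(src, pageUrl).href];
  for (const candidate of (srcset || "").split(",")) {
    // A candidate is "URL [descriptor]", e.g. "//host/a.jpg 1.5x".
    const url = candidate.trim().split(/\s+/)[0];
    if (url) {
      urls.push(new URL(url, pageUrl).href);
    }
  }
  return urls;
}

// Hypothetical helper: keep only ResourceTiming-like entries initiated by an
// <img> whose name matches one of the candidate URLs.
function matchingEntries(entries, urls) {
  const wanted = new Set(urls);
  return entries.filter((e) => e.initiatorType === "img" && wanted.has(e.name));
}

const page = "https://en.wikipedia.org/wiki/Example";
const urls = candidateUrls(
  "//upload.example.org/thumb/a.jpg/220px-a.jpg",
  "//upload.example.org/thumb/a.jpg/330px-a.jpg 1.5x, //upload.example.org/thumb/a.jpg/440px-a.jpg 2x",
  page
);
const entries = [
  { name: "https://upload.example.org/thumb/a.jpg/330px-a.jpg", initiatorType: "img", duration: 120 },
  { name: "https://upload.example.org/thumb/a.jpg/330px-a.jpg", initiatorType: "css", duration: 80 },
  { name: "https://en.wikipedia.org/static/logo.png", initiatorType: "img", duration: 30 },
];
console.log(matchingEntries(entries, urls).length); // 1
```

In the browser, the inputs would presumably come from `img.src`, `img.srcset`, `location.href` and `performance.getEntriesByType( 'resource' )`.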

Created the schema to collect the ResourceTiming data: https://meta.wikimedia.org/wiki/Schema:ResourceTiming

I ended up making it generic rather than specific to the top image, in case we later want to record data for resources other than the top image.
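As an illustration of what a resource-agnostic record could look like, the sketch below flattens one `PerformanceResourceTiming` entry into per-phase durations. The output field names (`dnsTime`, `ttfb`, etc.), the `label` parameter and `toEvent` itself are hypothetical and are not the actual fields of Schema:ResourceTiming; only the input attributes (`domainLookupStart`, `responseStart`, ...) are standard ResourceTiming properties.

```javascript
// Duration between two timestamps in ms, or null when either is 0.
// Cross-origin entries without a passing Timing-Allow-Origin check report 0
// for the detailed timestamps, so a naive subtraction would be meaningless.
function phase(start, end) {
  return start > 0 && end > 0 ? Math.round(end - start) : null;
}

// Hypothetical event builder: nothing here is specific to the top image;
// `label` records which resource the entry describes.
function toEvent(entry, label) {
  return {
    label: label,
    name: entry.name,
    initiatorType: entry.initiatorType,
    startTime: Math.round(entry.startTime),
    duration: Math.round(entry.duration),
    dnsTime: phase(entry.domainLookupStart, entry.domainLookupEnd),
    connectTime: phase(entry.connectStart, entry.connectEnd),
    ttfb: phase(entry.requestStart, entry.responseStart),
    downloadTime: phase(entry.responseStart, entry.responseEnd),
    transferSize: entry.transferSize,
  };
}

const sample = {
  name: "https://upload.example.org/thumb/a.jpg/330px-a.jpg",
  initiatorType: "img",
  startTime: 512.4, duration: 180.2,
  domainLookupStart: 520, domainLookupEnd: 540,
  connectStart: 540, connectEnd: 590,
  requestStart: 591, responseStart: 650, responseEnd: 692.6,
  transferSize: 24310,
};
const record = toEvent(sample, "top-image");
console.log(record.ttfb, record.downloadTime); // 59 43
```

Reporting `null` rather than `0` for restricted timings keeps "not measurable" distinct from "instant" when the data is analysed later.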

Change 458778 had a related patch set uploaded (by Gilles; owner: Gilles):
[mediawiki/extensions/NavigationTiming@master] Collect ResourceTiming data of top article image

https://gerrit.wikimedia.org/r/458778

Change 458778 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@master] Collect ResourceTiming data of top article image

https://gerrit.wikimedia.org/r/458778

Change 460340 had a related patch set uploaded (by Gilles; owner: Gilles):
[mediawiki/extensions/NavigationTiming@master] Add comments about top image resource timing

https://gerrit.wikimedia.org/r/460340

Change 460340 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@master] Add comments about top image resource timing

https://gerrit.wikimedia.org/r/460340

Data is being recorded correctly in the event.resourcetiming table in Hive.