[Feature Request]: Querying image description with Pywikibot API
Open, Needs Triage (Public)

Description

It's impossible to get image description from Commons via pywikibot API.

An image is represented as a FilePage (https://doc.wikimedia.org/pywikibot/master/api_ref/pywikibot.html#pywikibot.FilePage). While it gives access to the Commons HTML page or, e.g., the revision history, there is no way to get useful descriptive properties such as the description.

Below is a workaround, although it is error-prone and inefficient (the entire HTML page has to be transferred and then parsed). It would therefore be great to have this in the API (e.g. get_description or get_property(name="description") of some kind):

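import pywikibot
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change the nesting depth.
_VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
              'link', 'meta', 'source', 'track', 'wbr'}


class _MyHTMLParser(HTMLParser):
    """Collect the text inside elements marked class="description"."""

    def __init__(self):
        super().__init__()
        self._description = ""
        self._tag_counter = 0  # nesting depth inside the description element

    def handle_starttag(self, tag, attrs):
        if tag in _VOID_TAGS:
            return
        if self._tag_counter > 0:
            # Already inside the description: track nested tags.
            self._tag_counter += 1
        elif ('class', 'description') in attrs:
            # Entering the description element.
            self._tag_counter = 1

    def handle_endtag(self, tag):
        if tag in _VOID_TAGS:
            return
        if self._tag_counter > 0:
            self._tag_counter -= 1

    def handle_data(self, data):
        if self._tag_counter > 0:
            self._description += data

    def get_description(self):
        return self._description


def get_description(img):
    # Fetch the file page HTML once and parse it.
    html = img.getImagePageHtml()

    parser = _MyHTMLParser()
    parser.feed(html)
    return parser.get_description().replace("\n", "")


site = pywikibot.Site('en', 'wikipedia')
page = pywikibot.Page(site, "Mary_Shelley")
img = next(iter(page.imagelinks()))  # first image linked from the page

print(get_description(img))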

Event Timeline

> It's impossible to get image description from Commons via pywikibot API.
Because it is not exposed by mediawiki API, afaik.

Anomie added a subscriber: Anomie.

> Because it is not exposed by mediawiki API, afaik.

The core API is not in the business of parsing data out of the wikitext.

The CommonsMetadata extension provides an ImageDescription extracted from the {{Information}} template with prop=imageinfo&iiprop=extmetadata.
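
For illustration, the extmetadata route could be sketched roughly as follows. This is a minimal sketch, not a Pywikibot API: it talks to the MediaWiki API directly with the standard library. The helper names (build_query_url, extract_description, fetch_description) and the User-Agent string are made up for this example, and the response layout is assumed to be the default JSON format of action=query. Note that the returned value may contain HTML markup.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://commons.wikimedia.org/w/api.php"


def build_query_url(file_title):
    """Build an action=query URL asking for the file's extmetadata."""
    params = {
        "action": "query",
        "format": "json",
        "titles": file_title,
        "prop": "imageinfo",
        "iiprop": "extmetadata",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_description(response):
    """Return the ImageDescription value from a decoded response, or None."""
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        for info in page.get("imageinfo", []):
            field = info.get("extmetadata", {}).get("ImageDescription")
            if field is not None:
                return field.get("value")
    return None


def fetch_description(file_title):
    """Query the API over HTTP (hypothetical helper; needs network access)."""
    request = urllib.request.Request(
        build_query_url(file_title),
        headers={"User-Agent": "description-example/0.1"},  # placeholder value
    )
    with urllib.request.urlopen(request) as resp:
        return extract_description(json.load(resp))
```

Calling fetch_description with a title such as "File:Example.jpg" would transfer only the metadata instead of the whole rendered page, which addresses the efficiency concern from the task description.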

StructuredDataOnCommons may someday provide a description as structured data (currently it only provides "captions", I'm not sure of the difference), which would be accessed via Wikibase API endpoints such as action=wbgetentities.
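
As a rough sketch of the structured-data route, the captions mentioned above could be read from a wbgetentities response like this. The assumptions here are that the MediaInfo entity ID is "M" followed by the file's page ID on Commons, and that captions appear under the entity's "labels" key; the helper names are hypothetical and only parameter building and response parsing are shown, not the HTTP request itself.

```python
def mediainfo_id(page_id):
    """Assumed mapping from a Commons page ID to its MediaInfo entity ID."""
    return "M{}".format(page_id)


def build_wbgetentities_params(page_id):
    """Parameters for an action=wbgetentities request on Commons."""
    return {
        "action": "wbgetentities",
        "format": "json",
        "ids": mediainfo_id(page_id),
    }


def extract_caption(response, page_id, language="en"):
    """Return the caption in the given language, or None if it is missing."""
    entity = response.get("entities", {}).get(mediainfo_id(page_id), {})
    label = entity.get("labels", {}).get(language)
    return label.get("value") if label else None
```

A caption is a short plain-text label per language, so unlike ImageDescription from extmetadata it would not need any HTML stripping.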