Expose image metadata to wikitext
Closed, Resolved | Public | Feature

Description

Images carry metadata (EXIF, ...), but this metadata is currently not readable from wikitext. As a result, some Wikimedia Commons templates cannot be filled in automatically, for example to take the description from EXIF tags.

Proposal 1: new parser function

Originally, it was proposed to add a parser function called {{#EXIF}} for use in the File: namespace, which could extract metadata from EXIF tags. The parser function was later renamed to {{#filemetadata:}} to be more generic (EXIF is not the only kind of metadata).

[ Adding esby in CC, who wanted such a means of printing EXIF data in image descriptions ]

In 2013 a PHP patch was proposed by Brian Wolff:

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/67047

⚠️ That PHP patch was abandoned in 2018. Among the reasons: there was not much consensus to create yet another parser function. Doing it in Lua seems better (confirmed by patch author).

Proposal 2: expose from Lua (all metadata)

(This proposal was abandoned)

In 2013 a Lua patch was proposed:

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Scribunto/+/67588

⚠️ That Lua patch was abandoned in 2019. Among the reasons:

  • it's too complicated/confusing to expose all the metadata, because a single generic metadata key can have multiple values
  • unclear what should be done with "more expensive" metadata like DjVu/PDF with long OCR transcripts (suggested by Brian Wolff)

Proposal 3: expose from Lua (small subset)

Late proposal, 2025 (revived thanks to Giovanni Pennisi): maybe Lua could "just" expose a limited subset of the metadata. Some candidates:

  1. latitude
  2. longitude
  3. altitude
  4. bearing of destination
  5. (more?)

These metadata are clearly useful in at least one important workflow, for example in these or similar templates:

For development, here is an example file which has latitude, longitude, altitude, bearing, in file metadata:

https://commons.wikimedia.org/wiki/File:Chiesa_di_Nostra_Signora_di_Lourdes_-_2024-09-01_rear.JPG

Maybe Lua could expose a simple associative table with the mentioned keys, each holding a single value, partially like MediaWiki does on the file page.

For example, it could expose a table like this, or something similar:

local metadata = {
    GPSLatitude = 37.5283972222222,
    GPSLatitudeRef = "North",
    GPSLongitude = 15.084325,
    GPSLongitudeRef = "East",
    GPSAltitudeRef = "Above Sea Level",
    GPSAltitude = "97.2340584",
    GPSDestBearingRef = "True North",
    GPSDestBearing = 161.4809877,
}
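
As a rough sketch of how a module might consume such a table, the function below folds the `Ref` fields into signed decimal values. It assumes exactly the table shape proposed above and is not an existing Scribunto API. The helper name `toSignedCoordinates` is made up for illustration, and the "South", "West" and "Below Sea Level" strings are guesses by analogy with the values shown in the example.

```lua
-- Hypothetical helper: turns the proposed metadata table into signed values.
-- Assumes the key names and "Ref" strings from the example table above.
local function toSignedCoordinates(metadata)
    -- tonumber() accepts both numbers and numeric strings ("97.2340584").
    local lat = tonumber(metadata.GPSLatitude)
    local lon = tonumber(metadata.GPSLongitude)
    if not lat or not lon then
        return nil -- no usable position in this file
    end
    -- Assumed spellings of the opposite hemispheres.
    if metadata.GPSLatitudeRef == "South" then lat = -lat end
    if metadata.GPSLongitudeRef == "West" then lon = -lon end

    local alt = tonumber(metadata.GPSAltitude)
    if alt and metadata.GPSAltitudeRef == "Below Sea Level" then
        alt = -alt
    end

    return {
        latitude = lat,
        longitude = lon,
        altitude = alt, -- may be nil
        bearing = tonumber(metadata.GPSDestBearing), -- may be nil
    }
end
```

A location template could then read `latitude` and `longitude` from the result, and treat a `nil` return as "no coordinates in the file".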

See Also:

Details

Reference
bz41498

Event Timeline

bzimport raised the priority of this task from to Low. Nov 22 2014, 12:52 AM
bzimport set Reference to bz41498.
bzimport added a subscriber: Unknown Object (MLST).

I think you'd have to specify the internal key of the data you want to extract, so we'll need to come up with some way of exposing it to the user... Maybe in the table on the image page, add another column which is shown to users with a certain preference, or just in the HTML source?

Do we assume the file is the current page? Or does the file name need to be specified in an argument?

Also are we sure this should be in an extension, rather than just another core parser function?

Note: the mediaFunctions extension already supports this. (However, that extension is super old; we probably want something new.)

(In reply to comment #0)

Add a {{#EXIF}} parser function to use in File: namespace, which can extract metadata from EXIF tags.

Let's not call it EXIF. We extract all sorts of metadata. Exif is but one type.

Do we assume the file is the current page? Or does the file name need to be specified in an argument?

That's kind of a minor detail, but I would say default to current page, and allow override in an argument.


The main issue with this is that the way metadata is stored is kind of messy.
*Different file handlers store the metadata in different ways
**Which raises the question of how much knowledge we want to put in the extension about how metadata is stored.

Most bitmap images store metadata in a somewhat similar way, but still use different keys. Some things use a totally different scheme (Ogg files, for example). Some metadata fields can have multiple values: how should we expose that to the user? Do we show all of them, or can they select which one? Certain values can have language alternatives (certain XMP properties and PNG iTXt fields).
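
To make the multi-value question concrete, here is one possible way a Lua consumer could collapse such a field to a single string. The three shapes handled (plain scalar, array of values, table keyed by language code) are assumptions for illustration, not the format any file handler is known to store. The name `pickValue` and the "x-default" fallback key are likewise hypothetical.

```lua
-- Hypothetical: reduce one metadata field to a single display value.
local function pickValue(field, lang)
    if type(field) ~= "table" then
        -- Scalar (or nil): nothing to choose.
        return field
    end
    if field[1] ~= nil then
        -- Array of values: show all of them, comma-separated.
        local parts = {}
        for i, v in ipairs(field) do
            parts[i] = tostring(v)
        end
        return table.concat(parts, ", ")
    end
    -- Language alternatives: requested language first, then the assumed default key.
    return field[lang] or field["x-default"]
end
```

This answers "show all of them" for plain lists and "select which one" for language alternatives; a real interface might instead hand the raw table to the caller and let the module decide.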

Related URL: https://gerrit.wikimedia.org/r/67047 (Gerrit Change I49b7d8a05173090f8173e19f8c0d9a878fa346f9)

Wouldn't it make more sense to do this with a Lua module instead of adding more parser functions?

(In reply to comment #5)

Wouldn't it make more sense to do this with a Lua module instead of adding more parser functions?

Perhaps. OTOH Lua is not guaranteed to be installed on all third-party wikis.

(In reply to comment #6)

Perhaps. OTOH Lua is not guaranteed to be installed on all third-party wikis.

Actually, personally I would like to have both, with more advanced functionality available to Lua (such as getting a table of all the metadata).

(In reply to comment #7)

Actually, personally I would like to have both, with more advanced functionality available to Lua (such as getting a table of all the metadata).

OK, I tried my hand at adding this to Lua as well: https://gerrit.wikimedia.org/r/67588

For the Lua part, I just added a method to the title class that can retrieve a table of all the (standard) file metadata. This is the raw unformatted data.

I imagine that people using Lua for this are more interested in the raw data, and will probably want to format it themselves. If they don't, there's still the parser function (Lua after all outputs wikitext).

One of the use cases I imagine for the parser function version is that someone might want to put the description from the metadata on the image description page of their image. This is a simple use case, and I think it is well suited to a parser function.

One point that I'm worried might be sticky in review is that the parser function, when in formatted output mode, formats according to user language. There are a number of reasons for this:
*Output then looks exactly like the formatted output in the metadata box on the image page, which varies by user language.
*Commons people (who realistically are probably going to be the main people using this) tend to like everything changing with user language.
*Image description pages on Commons effectively already always vary by user language (almost every template on Commons has an {{int:...}} in it), so it's not like we're splitting the parser cache; it's already split.
*And the bad reason: the formatMetadata class already assumes using user language via a bunch of global state. (This is really a non-reason. It can be changed if people don't like the user language thing. The main reasons are the first 3.)

The idea of adding yet another parser function, particularly a complex one, gives me weird feelings.

What's the use-case here, exactly? Comment 0 seems to indicate that the goal is to incorporate EXIF data into file description pages. I'm not sure a parser function is the appropriate means of doing so, especially one as scary-looking as "{{#filemetadata:Property[|R][|file=name of file]}}".

I haven't looked at any of the code here, to be clear, and I'm sure it's fine. I'm only speaking as to whether the overall idea here is sound. It'd be helpful to have a better idea of what this parser function will be used for. If it's only going to add more magic hackery to Commons, I'd prefer that we ... not. :-)

(In reply to comment #9)

The idea of adding yet another parser function, particularly a complex one, gives me weird feelings.

What's the use-case here, exactly? Comment 0 seems to indicate that the goal is to incorporate EXIF data into file description pages. I'm not sure a parser function is the appropriate means of doing so, especially one as scary-looking as "{{#filemetadata:Property[|R][|file=name of file]}}".

The way I see it, it would be used as follows: if you're on an image page, you can just write {{#filemetadata:ImageDescription}} and it outputs the image description from EXIF. The other options are things I would consider "advanced options". Most of the time they'd be omitted.

I personally like parser functions for simple quick jobs, reserving Lua for complex templates, but I may just be weird.

Change 67588 had a related patch set uploaded (by Ricordisamoa):
Add a Lua interface for getting file metadata

https://gerrit.wikimedia.org/r/67588

Change 67588 abandoned by Brian Wolff:
Add a Lua interface for getting file metadata

Reason:
This isn't going anywhere

https://gerrit.wikimedia.org/r/67588

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:14 AM
Aklapper removed a subscriber: wikibugs-l-list.

We can try to improve the task to first explain the need, not the solution.

From my side, I'm quite surprised that it's not easy to retrieve the location coordinates, and the bearing.

Example:

I don't know if @Dereckson agrees on this use case, or if there was a completely different use case in mind. Thanks for sharing

valerio.bozzolan renamed this task from New extension: Image metadata parser functions to Expose image metadata to wikitext. Jul 28 2025, 12:18 PM
valerio.bozzolan updated the task description.
valerio.bozzolan added a subscriber: GiovanniPen.

@GiovanniPen (who raised a metadata question in the chat): note that both patches were abandoned, so this task risked being closed as "wontfix".

We can try to improve the task to first explain the need, not the solution.

So it's probably a good idea to rephrase the title, mention the already-tried implementations, and append another small proposal.

Thanks for any opinion about exposing a small subset, e.g. with just latitude, longitude, altitude, and bearing (and what else?). Thanks again to the task author for their extra comments. I'm not sure if I'm helping you both, or just Giovanni. Thanks :3

Hello, just to note that https://commons.wikimedia.org/wiki/Category:Media_with_GPS_EXIF contains more than 300k files, most of them there because of the {{GPS EXIF}} template added by BotMultichill (not BotMultichillT). So the bot can recognize that a file has coordinates in its EXIF metadata but no location template.
Edit: the bot owner says here (https://commons.wikimedia.org/w/index.php?title=User_talk:Multichill&diff=prev&oldid=1065446591) that "The bot that was extracting it from EXIF broke down", so he started tagging files with {{GPS EXIF}} (filling up Category:Media with GPS EXIF) to get the missing coordinates added (editor's note: how? Not manually? lol).

I think if people want to pick this up again, you first need to give a very good answer to the question: WHY and HOW would you use this, instead of using the information from CommonsData (which is clearly the preferred route) or from the wiki text itself (which is the traditional route and the one most of the community still see as 'canonical').

One of the problems I see with information extracted directly from the images is that it is not versioned together with the page. So if someone uploads a new image, your page using this information can change, which is simply strange. A lot of the information is also already provided by the user, and we put more trust in that. In some ways, we REQUIRE this information to be user-provided, as you need to know the magic incantations that something like Commons wants you to use. There is both a CommonsData bot reading this info and putting it in CommonsData, and a step inside the UploadWizard that reads GPS information and puts it in a Location template. The more things we add, the harder it becomes to keep everything aligned and understandable.

Looking specifically at the example of @valerio.bozzolan wrt T363052, especially, I'm thinking it is way better to have bots update CommonsData to add heading info, and then reading that info with your external tool.

I think if people want to pick this up again, you first need to give a very good answer to the question: WHY and HOW would you use this, instead of using the information from CommonsData (which is clearly the preferred route) or from the wiki text itself (which is the traditional route and the one most of the community still see as 'canonical')

Answer: on new images, this info is often not available in the wikitext.

Also, on already-existing images, somebody may have made typos in the wikitext.

So exposing EXIF metadata to Lua may still be useful in these two cases: to have a good default for some templates, and to allow easy cross-checks with CommonsData (I'm not interested in this, though).

But yes, a bot that extracts EXIF and puts it into CommonsData directly would also close the gap, though maybe this does not scale very well to other, smaller external wikis. With Lua access they would not need a bot to show the lat/lng/altitude/compass; they could just write a simple template in Lua, I guess.

The problem is when the UploadWizard didn't do the job properly, or when you use a different uploader... The bot on Commons that previously did this job broke; I spoke to the owner and he told me that now it only adds the {{GPS EXIF}} template. For the rest I agree with what bozzy said; it could be especially useful in places where the bot doesn't exist.

Change #67588 restored by Brian Wolff:

[mediawiki/extensions/Scribunto@master] Add a Lua interface for getting file metadata

https://gerrit.wikimedia.org/r/67588

Wow, awkward moment where I disagree with everything I wrote in 2013.

The other day I was writing a template where this would have been useful, so I'd like to try to resurrect this (the version that adds a function to Lua).

I think if people want to pick this up again, you first need to give a very good answer to the question: WHY and HOW would you use this, instead of using the information from CommonsData (which is clearly the preferred route) or from the wiki text itself (which is the traditional route and the one most of the community still see as 'canonical').

One of the problems I see with information extracted directly from the images is that it is not versioned together with the page. So if someone uploads a new image, your page using this information can change, which is simply strange. A lot of the information is also already provided by the user, and we put more trust in that. In some ways, we REQUIRE this information to be user-provided, as you need to know the magic incantations that something like Commons wants you to use. There is both a CommonsData bot reading this info and putting it in CommonsData, and a step inside the UploadWizard that reads GPS information and puts it in a Location template. The more things we add, the harder it becomes to keep everything aligned and understandable.

Looking specifically at the example of @valerio.bozzolan wrt T363052, especially, I'm thinking it is way better to have bots update CommonsData to add heading info, and then reading that info with your external tool.

The use case I thought would be cool would be to make things like the {{pano360}} template automatically get added to the {{information}} box if the file's metadata said it was a panorama. I'd agree that in most cases what you want is SDC, but I think there are some cases where getting the actual info out of the file can be useful too.

In any case, we basically already have this information, it seems like a waste not to expose it.
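
For illustration only, the panorama check described above could look roughly like this in a module. The key name `ProjectionType` and the value "equirectangular" are assumptions about how a panorama might be flagged in the metadata table, not confirmed names from the merged patch, and `isPanorama` is a made-up helper.

```lua
-- Hypothetical: decide from file metadata whether a file is a 360° panorama.
-- "ProjectionType" is an assumed key name, not a confirmed one.
local function isPanorama(metadata)
    local projection = metadata and metadata.ProjectionType
    -- Compare case-insensitively; anything that is not a string counts as "no".
    return type(projection) == "string"
        and projection:lower() == "equirectangular"
end
```

A wrapper around {{information}} could call this and only then add {{pano360}}, leaving pages for ordinary images unchanged.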

Change #67588 merged by jenkins-bot:

[mediawiki/extensions/Scribunto@master] Add lua interface for fetching metadata.

https://gerrit.wikimedia.org/r/67588

Bawolff claimed this task.