
Conflicting timestamp in file history
Open, Needs Triage, Public

Description

This bug caused a script of mine to crash, and I cannot work out what is going on with the file.
File:HK 九龍灣 Kln Bay Wang Tai Road 一號九龍 One Kowloon indoor carpark entrance April-2012.jpg

On querying the file history, I get 'FileInfo' object has no attribute 'timestamp'.

The file history displayed on Commons appears to be out of order by date, presumably because the record has been corrupted in some way.

Event Timeline

Reedy added a subscriber: Reedy.

On querying the file history, I get 'FileInfo' object has no attribute 'timestamp'.

Querying how? Via the API?

Via pywikibot:

page = pywikibot.ImagePage(site, f['title'])
rev = page.getFileVersionHistory()[-1]

Same error on:
File:HK 九龍灣 Kln Bay 臨豐街 Lam Fung Street view 企業廣場 Enterprise Square II April-2012.jpg

Only these two examples so far. The Commons image page shows the file history out of order and does not render the first thumbnail.

https://commons.wikimedia.org/w/api.php?action=query&titles=File:HK%20%E4%B9%9D%E9%BE%8D%E7%81%A3%20Kln%20Bay%20Wang%20Tai%20Road%20%E4%B8%80%E8%99%9F%E4%B9%9D%E9%BE%8D%20One%20Kowloon%20indoor%20carpark%20entrance%20April-2012.jpg&prop=imageinfo&iilimit=max

Looks like there's some discrepancy between what the API returns and what the file page displays:

"imageinfo": [
    {
        "timestamp": "2012-04-03T07:03:18Z",
        "user": "Maegistro"
    },
    {
        "filemissing": ""
    }
]

[Screenshot: Screenshot 2019-11-26 at 10.45.18.png]

The API just marks the file as missing and omits the user and timestamp, which seems like a bug to me...
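For anyone reading the raw response, here is a minimal sketch of how the affected revisions could be located. The page id and title in the sample are made up, the imageinfo entries mirror the excerpt above, and `missing_revisions` is a hypothetical helper, not part of MediaWiki or Pywikibot.

```python
def missing_revisions(response):
    """Yield (title, index) for each imageinfo entry flagged 'filemissing'."""
    pages = response.get("query", {}).get("pages", {})
    # The default format keys pages by page id; formatversion=2 returns a list.
    page_list = pages.values() if isinstance(pages, dict) else pages
    for page in page_list:
        for index, info in enumerate(page.get("imageinfo", [])):
            if "filemissing" in info:
                yield page.get("title"), index


# Hypothetical response: page id and title are placeholders.
sample = {
    "query": {
        "pages": {
            "12345": {
                "title": "File:Example.jpg",
                "imageinfo": [
                    {"timestamp": "2012-04-03T07:03:18Z", "user": "Maegistro"},
                    {"filemissing": ""},
                ],
            }
        }
    }
}

found = list(missing_revisions(sample))
print(found)
```

The index is the position in the imageinfo list, so it can be matched against the rows shown on the file page.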

There also appears to be an issue in how Pywikibot handles this case... And this is seemingly how MW has displayed it historically (I have not checked if/when it changed)...

Restricted Application added a subscriber: pywikibot-bugs-list.

And in the case of the API, it's due to $exists being used in a few places as a guard on the output:

		// Some information will be unavailable if the file does not exist. T221812
		$exists = $file->exists();

		// Timestamp is shown even if the file is revdelete'd in interface
		// so do same here.
		if ( isset( $prop['timestamp'] ) && $exists ) {
			$vals['timestamp'] = wfTimestamp( TS_ISO_8601, $file->getTimestamp() );
		}

And also

		if ( ( $user || $userid ) && $exists ) {
			if ( $file->isDeleted( File::DELETED_USER ) ) {
				$vals['userhidden'] = true;
				$anyHidden = true;
			}
			if ( $canShowField( File::DELETED_USER ) ) {
				if ( $user ) {
					$vals['user'] = $file->getUser();
				}
				if ( $userid ) {
					$vals['userid'] = $file->getUser( 'id' );
				}
				if ( !$file->getUser( 'id' ) ) {
					$vals['anon'] = true;
				}
			}
		}

I'm not sure it makes sense to show 'dimensions' if the file doesn't exist. But if we're making it consistent with the UI, the date/time and user should at least be shown.

Change 553085 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/core@master] Make API fileinfo display same info as File Page

https://gerrit.wikimedia.org/r/553085

Reedy added subscribers: mobrovac, daniel.

It's caused by the fix for T221812 ("Some ApiQueryImageInfo queries consistently fail with a fatal BadMethodCallException from LocalFile.php") in bdc6b4e378c6872a20f6fb5842f1a49961af91b4 (only in 1.34 and newer).

Reading the changelog...

* In the response to queries that use 'prop=imageinfo', entries for
  non-existing files (indicated by the 'filemissing' field) now omit the
  following fields, since they are meaningless in this context:
  'timestamp', 'userhidden', 'user', 'userid', 'anon', 'size', 'width',
  'height', 'pagecount', 'duration', 'commenthidden', 'parsedcomment',
  'comment', 'thumburl', 'thumbwidth', 'thumbheight', 'thumbmime',
  'thumberror', 'url', 'sha1', 'metadata', 'extmetadata', 'commonmetadata',
  'mime', 'mediatype', 'bitdepth'.
  Clients that process these fields should first check if 'filemissing' is
  set. Fields that are supported even if the file is missing include:
  'canonicaltitle', 'archivename' (deleted files only), 'descriptionurl',
  'descriptionshorturl'.

It's clear Pywikibot needs an update for this use case.
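As a rough illustration of the check the changelog asks clients to make, the sketch below works on plain imageinfo dicts as returned by the API, not on Pywikibot's FileInfo objects. The helper names are hypothetical, and it assumes the list is ordered newest first, as in the response quoted earlier. How Pywikibot itself should handle this is a separate question.

```python
from datetime import datetime, timezone


def parse_timestamp(info):
    """Return the entry's timestamp as a datetime, or None if unavailable."""
    # Entries flagged 'filemissing' omit 'timestamp' in 1.34 and newer.
    if "filemissing" in info or "timestamp" not in info:
        return None
    parsed = datetime.strptime(info["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def oldest_usable_revision(imageinfo):
    """Return the oldest entry that still carries a timestamp, or None."""
    # Assumes newest-first ordering, so walk the list from the end.
    for info in reversed(imageinfo):
        if parse_timestamp(info) is not None:
            return info
    return None


history = [
    {"timestamp": "2012-04-03T07:03:18Z", "user": "Maegistro"},
    {"filemissing": ""},
]
revision = oldest_usable_revision(history)
print(revision)
```

With this guard, taking the last element of the history no longer fails on the missing entry; the caller gets the oldest revision that still has metadata, or None if there is none.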

Though I don't completely agree with the change log of all these things being "meaningless in this context", especially if the UI is showing some of them...

Even if the file is physically missing (for whatever reason: corruption, deleted to save disk space), at least some of those entries still have value.

CC @daniel and @mobrovac as author and reviewers

Though I don't completely agree with the change log of all these things being "meaningless in this context", especially if the UI is showing some of them...

T221812 ("Some ApiQueryImageInfo queries consistently fail with a fatal BadMethodCallException from LocalFile.php") seems to have been concerned with images that don't exist at all, while this task is about images where the metadata exists in image or oldimage but the file itself is missing.

It's meaningless to display "Unknown user" as the user, the current time as the timestamp, 0 for size, width, and height, and so on for the former, but for the latter those all have valid values.

As examples of this bug seem rare, it seems worth noting another example that popped up during categorization this afternoon:
2013-03-26 File:HK 銅鑼灣 Causeway Bay 糖街 Sugar Street evening The Point Causeway Square shop 領域電訊 CityLink Mar-2013 Miss Chrissie Chau.JPG

Same error of 'FileInfo' object has no attribute 'timestamp'. This image was uploaded nearly a year after the last example, probably the same real-life user based in Hong Kong creating a 'disposable' account for the uploads but using a different camera type.

Additional:
2012-11-24 File:Five Districts Business Welfare Association School.JPG

Change 553085 abandoned by Reedy:
Make API fileinfo display same info as File Page for missing files

https://gerrit.wikimedia.org/r/553085

A further example is Negros_Oriental_State_University.jpg. Per diff, the file cannot be deleted, which seems to be caused by the first entry in the file history being corrupted.

Any image with a version showing "No thumbnail" will probably give you the response described in this task. It's not necessary to give more examples of the same.

The inability to delete that file is not related to this task, although it may be another side effect of whatever is the reason for the "No thumbnail" version. See, for example, T173374 which was filed long before the change that caused the responses causing you trouble here.

To avoid clogging this task up with examples, I have posted 408 examples from June 2013 of this same file overwrite bug to Faebot/SandboxU.

These are cases where the entry in the log shows both an upload and an overwrite at precisely the same timestamp. Whether this type of search returns all these types of failures, I don't know.
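The kind of search described above could be sketched roughly as follows. This is not the script that produced the list; it assumes upload log events are available as dicts with 'title', 'timestamp' and 'action' keys (the shape I would expect from list=logevents with letype=upload), and the sample events are invented.

```python
from collections import defaultdict


def same_timestamp_overwrites(log_events):
    """Return (title, timestamp) pairs logged as both upload and overwrite."""
    actions = defaultdict(set)
    for event in log_events:
        key = (event["title"], event["timestamp"])
        actions[key].add(event["action"])
    # Keep only keys where both actions occurred at the identical timestamp.
    return sorted(
        key for key, seen in actions.items() if {"upload", "overwrite"} <= seen
    )


# Invented sample data for illustration only.
events = [
    {"title": "File:A.jpg", "timestamp": "2013-06-01T00:00:00Z", "action": "upload"},
    {"title": "File:A.jpg", "timestamp": "2013-06-01T00:00:00Z", "action": "overwrite"},
    {"title": "File:B.jpg", "timestamp": "2013-06-02T00:00:00Z", "action": "upload"},
]
suspects = same_timestamp_overwrites(events)
print(suspects)
```

Timestamps in the log have one-second resolution, so this only catches pairs logged within the same second.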

Another test case is an upload from 2007. Though overwritten in 2008, this does not stop the timestamp problem from cocking up programs:
File:Potenzmenge_von_A.png