Query-continue for imageinfo returns timestamp which is ambiguous if multiple files were uploaded within the same second under the same file name
Open, Lowest, Public

Description

Consider the following query:
http://commons.wikimedia.org/w/api.php?action=query&titles=File:Test.svg&prop=imageinfo&iilimit=10&format=jsonfm

... and the result:

"query-continue": {
    "imageinfo": {
        "iistart": "2014-05-07T18:26:53Z"
    }
},

... when that iistart value is used as the continue parameter with iilimit set to 1, the API can return imageinfo for the same file revision again, because it is in theory possible to upload *two file revisions to an image within one second*.
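To make the ambiguity concrete, here is a minimal toy model in Python. It is not MediaWiki code: the revision list, the `id` labels, and the helpers `page_by_timestamp` and `walk` are hypothetical, and it assumes the continuation simply selects revisions whose timestamp is less than or equal to iistart, newest first.

```python
# Hypothetical in-memory model of timestamp-only continuation; not MediaWiki code.
# Revisions are listed newest first. Two of them share the same second.
REVISIONS = [
    {"id": "c", "timestamp": "2014-05-07T18:26:54Z"},
    {"id": "b", "timestamp": "2014-05-07T18:26:53Z"},
    {"id": "a", "timestamp": "2014-05-07T18:26:53Z"},
]


def page_by_timestamp(revisions, limit, start=None):
    """Return (page, next_start), continuing from a timestamp only."""
    # ISO 8601 UTC strings of equal length compare chronologically.
    candidates = [r for r in revisions if start is None or r["timestamp"] <= start]
    page = candidates[:limit]
    rest = candidates[limit:]
    next_start = rest[0]["timestamp"] if rest else None
    return page, next_start


def walk(revisions, limit, max_requests=5):
    """Follow the continuation and return the ids seen, in request order."""
    seen, start = [], None
    for _ in range(max_requests):  # capped, because limit=1 never terminates here
        page, start = page_by_timestamp(revisions, limit, start)
        seen.extend(r["id"] for r in page)
        if start is None:
            break
    return seen
```

In this model, `walk(REVISIONS, 1)` keeps receiving revision "b" and never reaches "a", while a limit large enough to cover both same-second revisions in one response does not show the problem.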

Expected: In the short term, a more precise timestamp; in the long term, one has to consider adding IDs to image revisions. The latter requires DB changes (add an ID column, [use parent IDs for old file names instead of parent timestamps, move a lot of files,] ...) and amendments to the API.
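A rough sketch of why a per-revision ID would help, again as a toy Python model rather than actual MediaWiki or FileRepo code. The numeric `id` field, the `(timestamp, id)` continuation key, and the helpers `page_by_key` and `walk_by_key` are assumptions made purely for illustration.

```python
# Hypothetical model: each file revision carries a unique numeric id (an assumption).
REVISIONS = [
    {"id": 3, "timestamp": "2014-05-07T18:26:54Z"},
    {"id": 2, "timestamp": "2014-05-07T18:26:53Z"},
    {"id": 1, "timestamp": "2014-05-07T18:26:53Z"},
]


def sort_key(revision):
    """Unique ordering key: timestamp first, id as the tie-breaker."""
    return (revision["timestamp"], revision["id"])


def page_by_key(revisions, limit, start=None):
    """Return (page, next_start), continuing from a (timestamp, id) pair."""
    ordered = sorted(revisions, key=sort_key, reverse=True)  # newest first
    candidates = [r for r in ordered if start is None or sort_key(r) <= start]
    page = candidates[:limit]
    rest = candidates[limit:]
    next_start = sort_key(rest[0]) if rest else None
    return page, next_start


def walk_by_key(revisions, limit):
    """Follow the continuation to the end and return the ids seen."""
    seen, start = [], None
    while True:
        page, start = page_by_key(revisions, limit, start)
        seen.extend(r["id"] for r in page)
        if start is None:
            return seen
```

Because the key is unique per revision, every request starts strictly after the last revision returned, so even a limit of 1 visits each revision exactly once.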

Sooner or later we'll have to fix that anyway.


Version: unspecified
Severity: minor
See Also:
T26782: API uses non-unique value for paging for some modules
T17441: Some tables lack unique or primary keys, may allow confusing duplicate data

Details

Reference
bz65251

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 22 2014, 3:23 AM
bzimport set Reference to bz65251.
bzimport added a subscriber: Unknown Object (MLST).

I'm sure this is another exposure of some existing bug.

Can be fixed using the same technique as bug 24782, but bug 15441 on oldimage needs to be resolved first to make it doable.

Should Bug 24782 be re-opened and depend on this one which in turn would depend on Bug 15441? Or duped as Bug 24782?

There's nothing the API can do about this right now, since the FileRepo code treats (title,timestamp) as a unique ID. It may be that FileRepo actually does enforce (title,timestamp) being unique[1]; I don't see an actual *example* of this bug given here, just a hypothetical.

So if this bug can actually occur, it would be blocked by a bug on FileRepo needing a truly unique ID in its interface. If it can't actually occur, it's just invalid.

CCing Aaron on this, since he actually knows how the FileRepo and FileBackend code works.

[1]: Effectively rate-limiting uploads to 1 per second per title, which seems entirely sane if it does so.

Effectively rate-limiting uploads to 1 per second per title

That's not a real solution and a bit unintuitive. No one would expect a one-second-limit - including API client authors, core contributors, and people abstracting FileRepo and FileBackend.

Based on comments above I did some experiments at https://zh.wikipedia.org/wiki/File:%E6%B2%99%E7%9B%92.png and didn't upload any two files in the same second successfully.

In the meantime I got a bunch of errors from the API, like MediaWikiApiError: internal_api_error_MWException: Exception Caught: Transaction idle callbacks still pending.; request: {'comment': u"UTFComment: le`

&\\?' (random)", 'ignorewarnings': True, 'format': 'json', 'filename': u'\u6c99\u76d2.png', 'token': <function <lambda> at 0x194aa28>, 'file': <open

file '/tmp/filefHOtkX', mode 'rb' at 0x189a930>, 'action': 'upload'}

(In reply to Liangent from comment #6)

In the meantime I got a bunch of errors from the API, like MediaWikiApiError:
internal_api_error_MWException: Exception Caught: Transaction idle callbacks
still pending.; request: {'comment': u"UTFComment: le`

&\\?' (random)", 'ignorewarnings': True, 'format': 'json', 'filename':

u'\u6c99\u76d2.png', 'token': <function <lambda> at 0x194aa28>, 'file':
<open
file '/tmp/filefHOtkX', mode 'rb' at 0x189a930>, 'action': 'upload'}

Filed that as bug 65263, FYI.

(In reply to Liangent from comment #6)

I did some experiments

Oh, thank you. That's very kind of you.

and didn't upload any two files in the same second successfully

Some luck is required to do so:
http://commons.wikimedia.org/w/api.php?action=query&titles=File:Treppe%2022%2022%20test%20upload.jpg&prop=imageinfo&iilimit=10&format=jsonfm

{
    "query": {
        "pages": {
            "32597583": {
                "pageid": 32597583,
                "ns": 6,
                "title": "File:Treppe 22 22 test upload.jpg",
                "imagerepository": "local",
                "imageinfo": [
                    {
                        "timestamp": "2014-05-05T13:00:59Z",
                        "user": "Rillke"
                    },
                    {
                        "timestamp": "2014-05-05T13:00:59Z",
                        "user": "Rillke"
                    }
                ]
            }
        }
    }
}

{
    "query-continue": {
        "imageinfo": {
            "iistart": "2014-05-05T13:00:59Z"
        }
    },
    "query": {
        "pages": {
            "32597583": {
                "pageid": 32597583,
                "ns": 6,
                "title": "File:Treppe 22 22 test upload.jpg",
                "imagerepository": "local",
                "imageinfo": [
                    {
                        "timestamp": "2014-05-05T13:00:59Z",
                        "user": "Rillke"
                    }
                ]
            }
        }
    }
}

Awesome, the bug is confirmed now. I'll file the bug that blocks this one momentarily.
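For anyone who wants to check other files for the same condition, a small Python sketch along these lines could scan an already-parsed imageinfo response for timestamps shared by more than one revision. The function name `ambiguous_timestamps` and the `SAMPLE` dict are illustrative only; the sample mirrors the response shown above and assumes the whole history fits in one response.

```python
from collections import Counter


def ambiguous_timestamps(response):
    """Map each page title to the timestamps shared by more than one revision."""
    result = {}
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        counts = Counter(info["timestamp"] for info in page.get("imageinfo", []))
        dupes = sorted(ts for ts, n in counts.items() if n > 1)
        if dupes:
            result[page["title"]] = dupes
    return result


# Sample data copied from the response shown earlier in this thread.
SAMPLE = {
    "query": {
        "pages": {
            "32597583": {
                "pageid": 32597583,
                "ns": 6,
                "title": "File:Treppe 22 22 test upload.jpg",
                "imagerepository": "local",
                "imageinfo": [
                    {"timestamp": "2014-05-05T13:00:59Z", "user": "Rillke"},
                    {"timestamp": "2014-05-05T13:00:59Z", "user": "Rillke"},
                ],
            }
        }
    }
}
```

A non-empty result means an iistart value for that file cannot identify a single revision.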

(In reply to Rainer Rillke @commons.wikimedia from comment #8)

Some luck is required to do so:
http://commons.wikimedia.org/w/api.php?action=query&titles=File:Treppe%2022%2022%20test%20upload.jpg&prop=imageinfo&iilimit=10&format=jsonfm

Hmm I guess the requirement is to stash files first then "commit" simultaneously?

(In reply to Liangent from comment #12)

Hmm I guess the requirement is to stash files first then "commit"
simultaneously?

Yes. I was reproducing an UploadWizard bug. UploadWizard is not supposed to overwrite files at all.

@Anomie Will the new API continue parameter fix this issue?

No. T67264 is needed before this can be fixed.