
API categories on files have a huge lag
Closed, Declined (Public)

Description

When a file is uploaded, sometimes its categories are not reported by the API for a very long time after the initial upload.

For example, File:DINNER (held by) ? (at) NIAGARA TO THE SEA (SS?;) (NYPL Hades-274181-468456).jpg looked like this 15 hours after upload:

{
    "batchcomplete": "",
    "query": {
        "pages": {
            "48145796": {
                "pageid": 48145796,
                "ns": 6,
                "title": "File:DINNER (held by) ? (at) NIAGARA TO THE SEA (SS?;) (NYPL Hades-274181-468456).jpg"
            }
        }
    }
}

Even purging the page does not help. However, making a null edit does in fact always fix it:

{
    "batchcomplete": "",
    "query": {
        "pages": {
            "48145796": {
                "pageid": 48145796,
                "ns": 6,
                "title": "File:DINNER (held by) ? (at) NIAGARA TO THE SEA (SS?;) (NYPL Hades-274181-468456).jpg",
                "categories": [
                    {
                        "ns": 14,
                        "title": "Category:Buttolph collection of menus"
                    },
                    {
                        "ns": 14,
                        "title": "Category:Images from the New York Public Library"
                    },
                    {
                        "ns": 14,
                        "title": "Category:Images uploaded by F\u00e6"
                    },
                    {
                        "ns": 14,
                        "title": "Category:Media needing category review as of 12 April 2016"
                    },
                    {
                        "ns": 14,
                        "title": "Category:PD-scan (PD-1923)"
                    },
                    {
                        "ns": 14,
                        "title": "Category:PD 1923"
                    }
                ]
            }
        }
    }
}
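
For tools that need to detect affected files, the difference between the two responses above can be checked mechanically. The following is a minimal sketch, assuming the responses came from an `action=query&prop=categories` request (the task does not show the exact query); `build_query_params` and `missing_categories` are illustrative names, not part of any library.

```python
def build_query_params(title):
    """Parameters for a MediaWiki action=query request listing a page's categories.

    Assumption: this is the kind of query that produced the responses above.
    """
    return {
        "action": "query",
        "prop": "categories",
        "titles": title,
        "cllimit": "max",
        "format": "json",
    }


def missing_categories(response):
    """Return the titles of pages whose entry has no non-empty "categories" list."""
    pages = response.get("query", {}).get("pages", {})
    return [
        page.get("title", page_id)
        for page_id, page in pages.items()
        if not page.get("categories")
    ]
```

Note that a file that genuinely has no categories would also be flagged, so a result from this check is only a hint that the categorylinks data may be stale.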

It looks like this has been occurring for over a year (link). It is causing tools to report the categories of files incorrectly; for example, see the number of false positives on this page.

Event Timeline

Magog_the_Ogre renamed this task from "API categories on files have a huhttps://phabricator.wikimedia.org/maniphest/task/edit/form/1/#ge lag" to "API categories on files have a huge lag". Apr 13 2016, 3:08 AM
Poyekhali triaged this task as Medium priority. Apr 13 2016, 4:47 AM
Poyekhali subscribed.

@Pokefan95 Are you planning to work on this? If not, why did you set the priority?

Anomie subscribed.

This is unlikely to have anything to do with the API itself, and everything to do with whatever jobs are supposed to be populating the categorylinks table on upload.

Aklapper lowered the priority of this task from Medium to Low. Apr 30 2016, 5:03 PM

However, making a null edit does in fact always fix it

Just FYI: my bot null edits the page twice before doing anything. Once was usually enough before category updates went to the job queue.
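
A rough sketch of that workaround as a retry loop is below. Unlike the bot described above, which null edits unconditionally, this variant only null edits when no categories come back. `fetch_categories` and `null_edit` are hypothetical callables supplied by the caller (for example, wrappers around whatever API client the bot already uses); nothing here is an actual MediaWiki or bot-framework API.

```python
def categories_with_null_edit_fallback(fetch_categories, null_edit, title, max_null_edits=2):
    """Fetch categories; if none come back, null edit the page and retry.

    fetch_categories(title) -> list of category titles (hypothetical callable)
    null_edit(title)        -> performs a null edit    (hypothetical callable)

    Returns (categories, number_of_null_edits_performed).
    """
    categories = fetch_categories(title)
    attempts = 0
    while not categories and attempts < max_null_edits:
        null_edit(title)
        attempts += 1
        categories = fetch_categories(title)
    return categories, attempts
```

The default of two attempts mirrors the comment above; since link updates triggered by an edit may themselves be deferred, a real bot would probably also want a delay between attempts.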

Krinkle subscribed.

The job queue can sometimes take up to 24 hours to complete a task when many jobs are enqueued. Various improvements have been made over the years to ensure jobs run as quickly as they can, but they can still take longer sometimes. For example, we already fragment queues by wiki and by job type (e.g. if Commons has a lot of video transcode tasks that take hours to process, a new job for a category links update will still be processed immediately by another job runner). But if all queues have a backlog, then it may take a while.
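
To illustrate the fragmentation idea, here is a toy model with one FIFO queue per (wiki, job type) pair. It is only a sketch of the concept, not how the MediaWiki job queue is implemented; the class name and the job type labels `video_transcode` and `category_links_update` are made up for the example.

```python
from collections import defaultdict, deque


class PartitionedJobQueue:
    """Toy model: one FIFO queue per (wiki, job_type) pair."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def push(self, wiki, job_type, job):
        """Append a job to the queue for this wiki and job type."""
        self._queues[(wiki, job_type)].append(job)

    def pop(self, wiki, job_type):
        """Take the next job from one partition, or None if it is empty."""
        queue = self._queues[(wiki, job_type)]
        return queue.popleft() if queue else None

    def backlog(self, wiki, job_type):
        """Number of jobs still waiting in one partition."""
        return len(self._queues[(wiki, job_type)])


queue = PartitionedJobQueue()
for n in range(1000):
    queue.push("commonswiki", "video_transcode", n)
queue.push("commonswiki", "category_links_update", "File:Example.jpg")

# A runner serving the category partition is not stuck behind the transcodes.
next_job = queue.pop("commonswiki", "category_links_update")
```

In this model the category job is available immediately despite 1000 pending transcodes. The delay described in this task corresponds to the case where the category partition itself, or every partition, has a backlog.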

I'm closing this for now as there isn't an immediate bug here that we can solve.

See https://wikitech.wikimedia.org/wiki/Job_queue for more information, especially the status dashboards.