Page MenuHomePhabricator

action=query list=checkuserlog API response is inconsistent with other action=query responses
Open, Stalled, Needs TriagePublic

Description

querying logevents returns something like this:

{
    "warnings": {
        "query": {
            "*": "…"
        }
    },
    "batchcomplete": "",
    "continue": {
        "lecontinue": "20150511204708|66314508",
        "continue": "-||"
    },
    "query": {
        "logevents": [
            {
                "logid": 66314515,
                "ns": 2,
                "title": "User:Jimbo Wales/Facebook/picture.css",
                "pageid": 0,
                "logpage": 14655653,
                "params": {},
                "type": "delete",
                "action": "delete",
                "user": "Jimbo Wales",
                "timestamp": "2015-05-11T20:47:24Z",
                "comment": ""
            }
        ]
    }
}

By contrast, Querying the CheckUser log returns data like this:

{
    "batchcomplete": "",
    "query": {
        "checkuserlog": {
            "entries": []
        }
    }
}

Note the extra layer of nesting in checkuserlog. This makes building API clients more complicated. :(

Event Timeline

lfaraone raised the priority of this task from to Needs Triage.
lfaraone updated the task description. (Show Details)
lfaraone added a project: CheckUser.
lfaraone added subscribers: lfaraone, Anomie.
lfaraone set Security to None.

Changing this would be a backwards compatibility break, which isn't something to do lightly. But this also seems likely to be a little-used module, so it's not necessarily impossible.

To proceed, we'd need to try to determine how many clients are actually using checkuserlog to weigh the advantage of improved consistency against the disadvantage of breaking all existing clients.

Changing this would be a backwards compatibility break, which isn't something to do lightly. But this also seems likely to be a little-used module, so it's not necessarily impossible.

To proceed, we'd need to try to determine how many clients are actually using checkuserlog to weigh the advantage of improved consistency against the disadvantage of breaking all existing clients.

I suspect the number is quite low :)

Where can we find such statistics?

Change 791821 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] Standardise CheckUserLog API response

https://gerrit.wikimedia.org/r/791821

I've written a patch that removes "entries" from the API response so that this might be bumped.

+1 to the change on technical merits, but as noted, this is a breaking change for anyone using this API, which is a non-zero amount of people. @Dreamy_Jazz: if I (privately) gave you a list of all the users who've used the API in the past week, could you reach out via the checkuser mailing list to get their support/consent/etc. for making this breaking change? It's unclear to me if it's just one script that everyone uses which will need updating, or if there are a bunch of scripts that would need updating.

For my reference, the list can be obtained with legoktm@mwlog1002:~$ grep list=checkuserlog culog.txt | awk '{ print $11 }' | sort | uniq

I could ask right now on the list. If you'd like me to get affirmations from everyone that uses it then I would probably need a list. I would keep it private as requested.

One tool that I know uses the checkuserlog API is SPI Tools (maintained by @RoySmith) which may explain some of the uses because it uses OAuth to view the CU log and collate the info. It is being kept up to date so if this change was to go through the tool would be updated. I think it it's only active on enwiki.

I've sent an email to checkuser-l asking for anyone that uses the CheckUserLog API. SPI Tools would account for a good number of enwiki CUs, but I'm pretty sure it's enwiki only.

@Dreamy_Jazz thanks for the ping. Yes, this change would break SPI-Tools. When I was building that part of the code, I did notice that things were wrapped in an extra layer. At the time, I just assumed something was strange on my end and worked around it (code here).

I'm fine with updating my code to handle this breaking change. It's an easy fix on my end and would certainly make my code cleaner. Let's just make sure we coordinate on when the change happens. Perhaps you could add a

"version": 2

attribute to the result, so client code would know which flavor it was parsing. Define the "version" attribute as defaulting to 1 if missing. Then I could roll out code that handles either version, and after you've updated the server side code, I can clean it up at my leisure.

+1 to the change on technical merits, but as noted, this is a breaking change for anyone using this API, which is a non-zero amount of people.

My understanding is the only way this API can be used is if you have an OAuth consumer key with the "Access checkuser data" attribute. Shouldn't it be trivial to query the database of consumer keys that have been issued to find the ones with that attribute set?

+1 to the change on technical merits, but as noted, this is a breaking change for anyone using this API, which is a non-zero amount of people.

My understanding is the only way this API can be used is if you have an OAuth consumer key with the "Access checkuser data" attribute. Shouldn't it be trivial to query the database of consumer keys that have been issued to find the ones with that attribute set?

nae, as the API can be called directly (e.g.) by suitably authenticated users — for example, in a user script using mw.Api:

var api = new mw.Api();
api.get( {
    action: 'query',
    list: 'checkuserlog',
    culuser: 'TheresNoTime',
    cullimit: 5
} ).done( function ( data ) {
    console.log( data );
} );

I could ask right now on the list. If you'd like me to get affirmations from everyone that uses it then I would probably need a list. I would keep it private as requested.

Thanks! I don't think we actually need anything as formal as affirmations, it's mostly that we want to avoid breaking critical workflows/tools for a pretty trivial tech debt cleanup.

I'm fine with updating my code to handle this breaking change. It's an easy fix on my end and would certainly make my code cleaner. Let's just make sure we coordinate on when the change happens. Perhaps you could add a

"version": 2

attribute to the result, so client code would know which flavor it was parsing. Define the "version" attribute as defaulting to 1 if missing. Then I could roll out code that handles either version, and after you've updated the server side code, I can clean it up at my leisure.

We typically don't add versions to the response like this. Can you detect the absence of the extra "entries" key to switch parsing behavior?

My understanding is the only way this API can be used is if you have an OAuth consumer key with the "Access checkuser data" attribute. Shouldn't it be trivial to query the database of consumer keys that have been issued to find the ones with that attribute set?

No, people could use it with a BotPassword grant (also tracked in the DB), or client-side JS requests just like any other API module.

OK, that makes sense, thanks.

TheresNoTime changed the task status from Open to Stalled.Jun 16 2022, 7:33 PM

Updating status to stalled to reflect actual state — to unstall we need to address the comments made at T106282#7981660 (namely, has enough notice been given to make this breaking change?)

Change 791821 abandoned by Dreamy Jazz:

[mediawiki/extensions/CheckUser@master] Standardise CheckUserLog API response

Reason:

Discussion stalled - unlikely to restart soon - no need to keep the patch WIP as it's a very small change which can be remade easily if needed.

https://gerrit.wikimedia.org/r/791821