Page MenuHomePhabricator

CheckUser Api Module
Closed, ResolvedPublic

Description

Author: FT2.wiki

Description:
Mockup showing how the extra formats might appear.

Checkuser currently has one output format, namely HTML "pretty layout". Unfortunately in many CU cases, a Checkuser needs to export this data to another application. A single checkuser run can contain up to 5000 results, which there is no easy way to sort or filter. Examples include:

  • To export to a spreadsheet (eg Excel or Openoffice Calc) to allow sorting, filtering, and further processing on long reports
  • To combine multiple CU run's (eg on possibly connected users or IPs) that normally cannot be shown on the Checkuser output together - eg to list 5 users' activities over the last while, in order of IP. (Some complex checks can need dozens of runs to acuqire all data, and that data cannot be combined within the confines of the extension or easily sorted and correlated.)
  • To combine with existing checkuser runs, to correlate with past reports and look for significant items.

There is no means to do this except "HTML scraping", nor any CheckUser API interface. These are tremendously useful CU functions, and formats more amenable to exporting would greatly help checkusers on all projects.

Requested - that an "output option" is added to the CU extension, that allows a user to switch between normal (pretty HTML), sortable wikitable, and XML, output. (Tabular output can usually be copy/pasted directly into relevant applications if needed.)

Note that this isn't a change to the database usage, but purely an option to choose other useful output formats for the results. The idea would be that all three views are available on separate tabs, saving repeat runs from being needed.


Version: unspecified
Severity: enhancement

Attached:

cu_formats.png (1×1 px, 50 KB)

Details

Reference
bz15129

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 10:18 PM
bzimport added a project: CheckUser.
bzimport set Reference to bz15129.

I'd rather combined/filtered reports be added in the core, rather than using external software. Also, I don't want to encourage people to store CU data, since they have to remember to delete it as it would normally expire. Data is stored temporarily for privacy policy reasons, which should also carry over to individual users.

FT2.wiki wrote:

Sorry to reopen, but then - how do we do these kinds of complex cases? I have had cases routinely requiring 50 - 200 checks to cover it. Checkuser is a tool, it needs to meet the task being done.

(In reply to comment #2)

Sorry to reopen, but then - how do we do these kinds of complex cases? I have
had cases routinely requiring 50 - 200 checks to cover it. Checkuser is a tool,
it needs to meet the task being done.

Features should be added to the internal software, rather than relying on excel.

mike.lifeguard+bugs wrote:

Could you suggest what those features might be? I cannot immediately think of another output format that would solve this issue.

To provide sorting/filtering, an "output format" is not enough. Whatever filtering is used with these files, which hasn't been specified but must exists, should be added to the checkuser tool itself.

Additionally, the 5000 limit can be increased with timeout limit tweaks if really needed.

Why not just write an API module that interfaces with Checkuser?

FT2.wiki wrote:

Yes.

Yes, yes, yes, yes, yes :)

The request for output formats is more because whatever is done in software, most checkusers will not be using API programs or unusual configurations. They need basic tabular and sortable format, because unlike editing, checkuser work is pure data manipulation (and lots of it) and mediawiki will never provide all that's needed.

So a way to export data to a program or format that will, is needed.

FT2.wiki wrote:

What about, above the data you get a few icons to click, that let you download or open the data in 2 or 3 common formats.

mike.lifeguard+bugs wrote:

Changing description etc per above; let's have an API module.

soxred93 wrote:

I've started to work on this. So far, I have the API base file set up, and now I have to calculate the checkuser results.

soxred93 wrote:

Patch to add CheckUser API module

I have written some bare code enabling this functionality (in other words, it needs review).

attachment cudiff.patch ignored as obsolete

soxred93 wrote:

Fix copyright

Because Reedy wanted me to. :)

attachment cudiff.patch ignored as obsolete

Bryan.TongMinh wrote:

A few general comments on this:

  • You are basically duplicating the CheckUser interface. This is bad because changes in one part of the CheckUser code will not affect other parts. Code duplication should be avoided when possible.
  • You are constructing raw sql, whereas using the wrapper functions is strongly recommended
  • Limits are hardcoded
  • XFF is added as /xff. In the api you will simply want to use a boolean xff parameter

(In reply to comment #11)

Created an attachment (id=5676) [details]
Patch to add CheckUser API module

I have written some bare code enabling this functionality (in other words, it
needs review).

Here you are:

  • By convention, class names of API modules start with 'API', so use e.g. ApiCheckUser as class name (and hence ApiCheckUser.php as filename)
  • 'cantcheckuser' message: "don't" misspelled as "dont"
  • Parameters:
    • Use ApiBase::PARAM_TYPE => 'user' for the user parameter (accepts IPs too) so you won't have to do username normalization
    • Use ApiBase::PARAM_TYPE => array('allowed', 'values', 'here') for the type parameter
    • If the duration parameter is a timestamp (not clear to me), use ApiBase::PARAM_TYPE => 'timestamp'
    • If it's a number of days, use ApiBase::PARAM_TYPE => 'integer' and set ApiBase::PARAM_MIN and ApiBase::PARAM_MAX to appropriate values
    • Use ApiBase::PARAM_DFLT => '' for the reason parameter rather than doing if(!isset($params['reason'])) $reason = '';
  • You're using regexes to check for IPv4 and IPv6 IP's while you shouldn't: the /xff thing should go (like Bryan said) and the rest will be handled for you when you change the user parameter to be of type 'user'

(In reply to comment #13)

A few general comments on this:

  • You are basically duplicating the CheckUser interface. This is bad because

changes in one part of the CheckUser code will not affect other parts. Code
duplication should be avoided when possible.

Bryan's right. You should call existing functions in CheckUser, not copy them. If CheckUser doesn't have nice functions for your purposes, create them and/or adapt existing ones to be nice to you.

  • You are constructing raw sql, whereas using the wrapper functions is strongly

recommended

Eek! Looks like that's part of the code copied from CheckUser, though.

  • Limits are hardcoded

Your module should accept a 'limit' parameter:

'limit' => array (
                         ApiBase :: PARAM_DFLT => 10,
                         ApiBase :: PARAM_TYPE => 'limit',
                         ApiBase :: PARAM_MIN => 1,
                         ApiBase :: PARAM_MAX => ApiBase :: LIMIT_BIG1,
                         ApiBase :: PARAM_MAX2 => ApiBase :: LIMIT_BIG2
                 )
  • XFF is added as /xff. In the api you will simply want to use a boolean xff

parameter

'xff' => false will do that. $params['xff'] will be a simple boolean then, so no need for isset($params['xff']) checks.

soxred93 wrote:

(In reply to comment #13)

A few general comments on this:

  • You are basically duplicating the CheckUser interface. This is bad because

changes in one part of the CheckUser code will not affect other parts. Code
duplication should be avoided when possible.

Bryan's right. You should call existing functions in CheckUser, not copy them.
If CheckUser doesn't have nice functions for your purposes, create them and/or
adapt existing ones to be nice to you.

All the functions in the CheckUser class are protected functions, and I can't have the code output them. In addition, they output pure HTML (which is bad in its own right).

The entire CheckUser code could use a rewrite.

(In reply to comment #15)

(In reply to comment #13)

A few general comments on this:

  • You are basically duplicating the CheckUser interface. This is bad because

changes in one part of the CheckUser code will not affect other parts. Code
duplication should be avoided when possible.

Bryan's right. You should call existing functions in CheckUser, not copy them.
If CheckUser doesn't have nice functions for your purposes, create them and/or
adapt existing ones to be nice to you.

All the functions in the CheckUser class are protected functions, and I can't
have the code output them. In addition, they output pure HTML (which is bad in
its own right).

The entire CheckUser code could use a rewrite.

No doubt it could. But unless it's extremely API-unfriendly, it shouldn't be too hard to split off the HTML-outputting functions into a backend part (that does the database stuff) and a frontend part (that does the HTML generation), and then just call the backend part from the API module (of course that means you should make the backend functions public). That's the tactic I used when writing most of the write API.

Not quite sure this is still under Api, assigning to CentralAuth

soxred93 wrote:

For the curious, I've started work on this again. So far I've got most of the base functions, all of the classes laid out, and the API is successfully doing a User -> IP check.

*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*

Is this patch/file any use?

What else needs doing?

Created attachment 8572
Patch updated, stylized

attachment bug 15129.patch ignored as obsolete

(In reply to comment #20)

Is this patch/file any use?

What else needs doing?

Looks like it needs a lot of code issues (raw sql) etc fixing up...

Created attachment 8573
Non raw sql version!

attachment bug 15129.patch ignored as obsolete

Committed the API module in r88704

Comment on attachment 8573
Non raw sql version!

Obsolete as it's supplied

Bryan.TongMinh wrote:

Except for the second one, my comments in #13 were not addressed:

A few general comments on this:

  • You are basically duplicating the CheckUser interface. This is bad because

changes in one part of the CheckUser code will not affect other parts. Code
duplication should be avoided when possible.

  • You are constructing raw sql, whereas using the wrapper functions is strongly

recommended

  • Limits are hardcoded
  • XFF is added as /xff. In the api you will simply want to use a boolean xff

parameter

Especially with a high sensitivity module such as CU, I really do want to have a common backend for the special page and the API module. With all the code duplication the chances that are high that security vulnerabilities and other bugs that are found in the API module and fixed are not propagated to the special page and vice versa. Considering the sensitivity of CheckUser I do not think that this is acceptable.

soxred93 wrote:

The patch is pretty outdated, and I don't even think it works with the current version. I'm planning on finishing this during the summer, when I finally have an abundance of time to work on it.

Reverted the commit of the old patch in r88728; please clean it up and remove code duplication before recommitting.

(The module was disabled already in r88719.)

Bryan.TongMinh wrote:

*** Bug 31723 has been marked as a duplicate of this bug. ***

john wrote:

Solved in Bug 31723

*** This bug has been marked as a duplicate of bug 31723 ***