Page MenuHomePhabricator

Display ipinfo log lines on Special:Log
Closed, ResolvedPublic3 Estimated Story Points

Description

Background

In T263756: Create a table to store which users have access to IPInfo, and the timestamp when access was granted [L], Anti-Harassment decided to log the desired log lines to Special:Log. T292842: Log when the user's access level changes, T294657: Create a log entry when the LogHandler is called, and T294658: Create a log entry when the RevisionHandler is called cover logging those log lines. This task covers displaying those log lines on Special:Log.

NOTE: Logging to Special:Log can be implemented independently of displaying log lines on Special:Log correctly.
AC
  • The ipinfo log can be viewed on Special:Log
  • The ipinfo log is only visible to users with the ipinfo-view-log right (will be assigned to WMF staff and CheckUser users in a separate task)
  • Log lines in the ipinfo log are formatted as follows:
ActionExample
User A viewed information about 1.1.1.1 with basic access via the popupUser:A viewed limited IP Information popup for 1.1.1.1 at $time
User B viewed information about 2.2.2.2 with full access via the popupUser:B viewed full IP Information popup for 2.2.2.2 at $time
User C viewed information about 3.3.3.3 with basic access via the infoboxUser:C viewed limited IP Information infobox for 3.3.3.3 at $time
User D viewed information about 4.4.4.4 with full access via the infoboxUser:D viewed full IP Information infobox for 4.4.4.4 at $time
  • No log entry appears if the user had the ipinfo right but no rights to see any information
Notes

Related Objects

Event Timeline

Change 736267 had a related patch set uploaded (by Phuedx; author: Phuedx):

[mediawiki/extensions/IPInfo@master] WIP: Add IPInfo log to Special:Log

https://gerrit.wikimedia.org/r/736267

Change 736267 abandoned by Phuedx:

[mediawiki/extensions/IPInfo@master] WIP: Add IPInfo log to Special:Log

Reason:

This patch is just a demonstration.

https://gerrit.wikimedia.org/r/736267

@phuedx does this task also covers gating access to the log lines? We should not display these logs to everyone. They will be limited to T&S (staff) and checkusers to begin with.

@phuedx does this task also covers gating access to the log lines? We should not display these logs to everyone. They will be limited to T&S (staff) and checkusers to begin with.

Thanks. I've updated the task's description.

@phuedx and @Niharika: shouldn't this be limited to Staff, CU, Stewards and OmbCom members? The reasoning being the possibility of audit by the last two, that already have access to the CU log on all wikis.

@phuedx and @Niharika: shouldn't this be limited to Staff, CU, Stewards and OmbCom members? The reasoning being the possibility of audit by the last two, that already have access to the CU log on all wikis.

@Niharika What are your thoughts on this?

User:A viewed limited IP Information popup for 1.1.1.1 at $time

@Niharika Can we remove at $time since the timestamp is at the start of the log row anyway?

To specify full vs limited information, we'll need to update the Logger, since we don't currently record this. I've made a subtask for this: T300773: Log whether a user views limited or full IP information

  • The ipinfo log is only visible to WMF staff and CheckUser users

@Niharika Instead of making these logs visible to WMF staff, can we make them visible to the staff group? Otherwise we run into the difficulties outlined in T277223: Add a new user-group for WMF staff accounts. Judging from T295017#7515763 this seems OK?

I've clarified in the task description that this should be done via assigning a new right, (e.g. ipinfo-view-log) to the relevant groups. We'll need to do the assignation in a separate task (in the config repo), which I'll make once we know which groups we're assigning it to.

@Niharika Another question - sorry!

User A viewed information about 1.1.1.1 with basic access via the popupUser:A viewed limited IP Information popup for 1.1.1.1 at $time
User B viewed information about 2.2.2.2 with full access via the popupUser:B viewed full IP Information popup for 2.2.2.2 at $time

There's another possibility: in T292626 it was decided that the ipinfo-view-basic right be created explicitly, rather than just letting users with the ipinfo right see basic information. Therefore, it is possible for users to have ipinfo but neither ipinfo-view-basic nor ipinfo-view-full. These users can see the popup and infobox, but will see no data (it will all appear unavailable). In these cases, presumably there is no need to make a log entry? Otherwise, if we should make a log entry, what should it say?

Change 759563 had a related patch set uploaded (by Tchanders; author: Tchanders):

[mediawiki/extensions/IPInfo@master] Display access log lines on Special:Log

https://gerrit.wikimedia.org/r/759563

@Niharika @Prtksxna I have a question about translations. The task description asks for two possible log lines, depending on the user's access level. E.g. for the popup:

User:A viewed limited IP Information popup for 1.1.1.1

or:

User:B viewed full IP Information popup for 2.2.2.2

This can be done by inserting a message into the log line's message translation, e.g.:

User:A viewed {ipinfo-log-access-basic} IP Information popup for 1.1.1.1

...but the placement of that message is quite specific to English. Could we instead add something in at the end, so that it would work for any language? E.g.:

User:A viewed IP Information popup for 1.1.1.1. {ipinfo-log-access-basic}
User:A viewed IP Information popup for 1.1.1.1. Basic information viewed.

@Niharika @Prtksxna I have a question about translations. The task description asks for two possible log lines, depending on the user's access level. E.g. for the popup:

User:A viewed limited IP Information popup for 1.1.1.1

or:

User:B viewed full IP Information popup for 2.2.2.2

This can be done by inserting a message into the log line's message translation, e.g.:

User:A viewed {ipinfo-log-access-basic} IP Information popup for 1.1.1.1

...but the placement of that message is quite specific to English. Could we instead add something in at the end, so that it would work for any language? E.g.:

User:A viewed IP Information popup for 1.1.1.1. {ipinfo-log-access-basic}
User:A viewed IP Information popup for 1.1.1.1. Basic information viewed.

I am not an expert and I don't have a opinion on this but an alternative approach would to create a message per desired log line format. The obvious con is that we repeat ourselves but a pro is that each message can be translated as a whole in isolation.

Summarising a conversation with @Niharika:

@Niharika Can we remove at $time since the timestamp is at the start of the log row anyway?

@Niharika Instead of making these logs visible to WMF staff, can we make them visible to the staff group? Otherwise we run into the difficulties outlined in T277223: Add a new user-group for WMF staff accounts. Judging from T295017#7515763 this seems OK?

There's another possibility: in T292626 it was decided that the ipinfo-view-basic right be created explicitly, rather than just letting users with the ipinfo right see basic information. Therefore, it is possible for users to have ipinfo but neither ipinfo-view-basic nor ipinfo-view-full. These users can see the popup and infobox, but will see no data (it will all appear unavailable). In these cases, presumably there is no need to make a log entry?

Could we instead add something in at the end, so that it would work for any language? E.g.:
User:A viewed IP Information popup for 1.1.1.1. {ipinfo-log-access-basic}

Niharika agreed to these.

I am not an expert and I don't have a opinion on this but an alternative approach would to create a message per desired log line format. The obvious con is that we repeat ourselves but a pro is that each message can be translated as a whole in isolation.

@phuedx Thanks for outlining this approach - it would be doable by overriding getMessageKey. I think I marginally prefer the separate message for the access level, in case more access levels (or UI elements) are added - then we only need to add messages for each access level, rather than each access level x each UI element. (It might be worth mentioning that both approaches are used in the BlockLogFormatter when dealing with log parameters - a whole different message key for the log line is chosen depending on the partal/sitewide parameter, but separate sub-messages are used for the other parameters.)

Could we instead add something in at the end, so that it would work for any language? E.g.:
User:A viewed IP Information popup for 1.1.1.1. {ipinfo-log-access-basic}

There was a question about copy for the log lines on T292842, so I've made a follow-up task to sort these questions out: T302000: Fix messages for Special:Log/ipinfo log lines

Change 759563 merged by STran:

[mediawiki/extensions/IPInfo@master] Display access log lines on Special:Log

https://gerrit.wikimedia.org/r/759563

I was able to view the following logs

Screen Shot 2022-02-21 at 5.02.52 PM.png (378×2 px, 177 KB)

Image shows IP logs for LogHandler and RevisionHandler events
User is set up with ipinfo-view-log rights
Screen Shot 2022-02-22 at 10.49.49 PM.png (1×2 px, 402 KB)

Just a couple of questions for @Niharika:

One
When a user requests IP information, the log entry is written to the database with a deferred job. The timestamp for the log entry is the time the deferred job completed, not the time the user requested the IP information.

I don't know how big a possible delay there might be between the user making the request and the job completing (especially on production servers during a busy time). Is this a problem? How accurate does the time need to be? To be fair, Special:Log only shows times to the nearest minute.

Two
There might be scenarios where a user requests information for the same IP more than once in a 48 hour period, but only one log event is recorded.

For example, this might happen when the browser caches the IPInfo response as described in the diagram:

sequence_diagram_logging_scenario_1.png (785×1 px, 42 KB)

How important is it to have a new entry every 24 hours?

Question for @Tchanders:

The Logger checks a user's rights and won't log anything if they don't have either ipinfo-view-basic or ipinfo-view-full.

I understand that user rights can be cached. Is it possible that Logger would see an outdated set of rights for a user? For example, if they have only just been added to the ipinfo group the Logger might think they have no rights and so not log anything. Or, by the time we have got to logging we have already looked up the user's rights (e.g. in the RevisionHandler) and so it should return the same set of rights both times?

I have briefly tried to reproduce this locally, but no success. But, I don't know how things stand on production regarding consistency across servers, replica lag, etc.

EDIT: I also worry if someone makes a change to the code which breaks the authorisation logic in LogHander or RevisionHandler and a user sees IPInfo who does not have the ipinfo-* rights, this won't be logged and we won't know.

Just a couple of questions for @Niharika:

One
When a user requests IP information, the log entry is written to the database with a deferred job. The timestamp for the log entry is the time the deferred job completed, not the time the user requested the IP information.

I don't know how big a possible delay there might be between the user making the request and the job completing (especially on production servers during a busy time). Is this a problem? How accurate does the time need to be? To be fair, Special:Log only shows times to the nearest minute.

If the delay is fairly small (say less than a minute), which I think is usually the case, then I don't think it matters much.

Two
There might be scenarios where a user requests information for the same IP more than once in a 48 hour period, but only one log event is recorded.

For example, this might happen when the browser caches the IPInfo response as described in the diagram:

sequence_diagram_logging_scenario_1.png (785×1 px, 42 KB)

How important is it to have a new entry every 24 hours?

Wow, thanks for making this Dom. Super useful! I will need to check in with Legal on this one. I'll share this with them.

@imaigwilo I am going to move this back into QA as there are still some questions outstanding and there might be a bit more testing to do on this.

I did some testing around key risks (as I see them):

Accessing ipinfo does not get logged

I have had scripts making loads of requests to the IPInfo API and so far I have not noticed any which were not logged, unless it would be a duplicate of a log entry already made based on performer, IP, context, right and time (as per T294658).

I found the logic around when a log entry is made a little complicated, so it is possible I have missed something. I might do more testing later.

As mentioned in T295017#7729296, I don't think we can guarantee one log entry per IP accessed in a 24 hour period. However, I think we should have at least one log entry per IP accessed in a 48 hour period.

As mentioned in T295017#7729314, we only record a log entry if the user has either ipinfo-view-basic or ipinfo-view-full. If a user has somehow managed to see ip information without those two rights (or if the logging code does not think they have those two rights), it won't be recorded.

I haven't tested how this will perform in production (where there might be higher load).

Access is logged when it should not be (i.e. someone who has not accessed ip information nevertheless has it logged that they have)

We only submit the logging job just before returning data to the user. Therefore, if anything prevents us returning data to the user it will also prevent the job being submitted.

I guess one exception might be if there is a network issue preventing the user receiving the response, but I don't think we have much control over this.

Or possibly the user receives the data but a UI bug prevents the user actually seeing it. But I don't think we can tell whether this has happened.

Log leaks more information than it should

We only record a limited amount of data in the log and (the plan is) only a small number of people will see it.

A user with suppressor rights can always change the visibility of a log entry if it is troublesome.

In the logging table we record:

  • user who looked up the info
  • IP they looked up
  • whether it was popup or infobox
  • what right they had when they did it
  • timestamp

I also noticed that if you don't have ipinfo-view-log you cannot view ip information for an ipinfo log, and therefore cannot see what the IP address associated with that log is. I didn't test this very systematically though. I will do this separately.

People have access to the logs who should not

We have setup access control logic in a standard way (using extension.json). I did briefly check that a user who does not have ipinfo-view-log could not view ipinfo log lines either via Special:Log or the API (action=query&list=logevents&letype=ipinfo).

Accuracy

So far I have not seen any inaccuracies in the data recorded in the logging table, with the exception noted above for the timestamp (T295017#7729296).

Could a deferred job fail?

We write to the logging table using a deferred job and we return the response to the user before this job has completed. I guess it is possible that a user could see ip information even if the job subsequently fails.

If we made the write synchronous on the request I guess we could make it so that if writing to the logging table fails the whole request fails, but I think this violates certain performance guidelines regarding performing database writes on GET requests.

I don't know much about the reliability of deferred jobs.

@imaigwilo let me know if anything you saw contradicts anything I have written above.

@Niharika let me know if there are any other risks you are concerned about and would like further testing on.

I will leave this in QA for now to give @Tchanders a chance to respond to T295017#7729314.

Test environment: Local docker MediaWiki 1.38.0-alpha (4a373f6) 07:23, 24 February 2022. IP Info 0.0.0 (d851112) 19:55, 23 February 2022. (latest version tested)

Thanks for raising these questions @dom_walden

Just a couple of questions for @Niharika:

One
When a user requests IP information, the log entry is written to the database with a deferred job. The timestamp for the log entry is the time the deferred job completed, not the time the user requested the IP information.

I don't know how big a possible delay there might be between the user making the request and the job completing (especially on production servers during a busy time). Is this a problem? How accurate does the time need to be? To be fair, Special:Log only shows times to the nearest minute.

If the delay is fairly small (say less than a minute), which I think is usually the case, then I don't think it matters much.

I think it would be a good idea to pass the timestamp into the job, so that if we are going slowly at any point, we still see the original time. I've raised T302501 for this - @Niharika would you mind taking a look?

Two
There might be scenarios where a user requests information for the same IP more than once in a 48 hour period, but only one log event is recorded.

For example, this might happen when the browser caches the IPInfo response as described in the diagram:

sequence_diagram_logging_scenario_1.png (785×1 px, 42 KB)

How important is it to have a new entry every 24 hours?

Wow, thanks for making this Dom. Super useful! I will need to check in with Legal on this one. I'll share this with them.

@Niharika In case it helps with Legal, I just wanted to highlight that when the user sees a cached response, they will see the same version of the info as they saw when the log was recorded, a little bit like the user looking again later at a screenshot they took when they initially saw the data.

Question for @Tchanders:

The Logger checks a user's rights and won't log anything if they don't have either ipinfo-view-basic or ipinfo-view-full.

I understand that user rights can be cached. Is it possible that Logger would see an outdated set of rights for a user? For example, if they have only just been added to the ipinfo group the Logger might think they have no rights and so not log anything. Or, by the time we have got to logging we have already looked up the user's rights (e.g. in the RevisionHandler) and so it should return the same set of rights both times?

I have briefly tried to reproduce this locally, but no success. But, I don't know how things stand on production regarding consistency across servers, replica lag, etc.

EDIT: I also worry if someone makes a change to the code which breaks the authorisation logic in LogHander or RevisionHandler and a user sees IPInfo who does not have the ipinfo-* rights, this won't be logged and we won't know.

This is a really good point. I think we should pass the user's rights to the job to ensure they're the same, along with the timestamp. I've included this in T302501, mentioned above.

@Niharika In case it helps with Legal, I just wanted to highlight that when the user sees a cached response, they will see the same version of the info as they saw when the log was recorded, a little bit like the user looking again later at a screenshot they took when they initially saw the data.

Thanks @Tchanders. I'll clarify this when I talk to Legal.

I don't think there is anything more outstanding. Moving into Done.

@phuedx and @Niharika: shouldn't this be limited to Staff, CU, Stewards and OmbCom members? The reasoning being the possibility of audit by the last two, that already have access to the CU log on all wikis.

@Tks4Fish do you know the process for handing out the access to OmbCom members? Is it supposed to be granted manually by someone?

Since Legal has given the OK, it can be added to the global group by the stewards.