Page MenuHomePhabricator

Provide a way to search all of the XFF addresses listed in the XFF string
Open, MediumPublicFeature

Description

I recently ran some checks on multiple accounts in a enwiki SPI case which came back with:

IP: xxx.xxx.xxx.xxx XFF: xxx.xxx.xxx.xxx, 136.226.255.25, 10.132.0.109

The intermediate address (136.226.255.25) is listed by Shodan as a webhost (see https://bullseye.toolforge.org/ip/136.226.255.25) and is blocked as a zscaler open proxy (https://en.wikipedia.org/wiki/Special:Contributions/136.226.255.25).

Using checkuser, I was able to search for the exit node (xxx.xxx.xxx.xxx), but what I really wanted to do was search to see all the traffic that was coming through the same 136.226.255.25 proxy intermediary. It is my understanding that while the intermediate XXF address is stored in the CU database, it is not stored in a way which makes it searchable, per my email conversation with Dreamy Jazz:

The reason why that check returned no results is that because CheckUser only currently stores the IP for xxx.xxx.xxx.xxx as a hex. This means while you see the three XFF IPs, you can only search by the last XFF IP which is xxx.xxx.xxx.xxx. I think this also extends to be the last trusted XFF IP.

Related Objects

StatusSubtypeAssignedTask
OpenFeatureNone
OpenFeatureNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
ResolvedDreamy_Jazz
OpenNone
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedMarostegui
DeclinedNone
ResolvedPapaul
OpenNone
ResolvedMarostegui

Event Timeline

Dreamy_Jazz renamed this task from Provide a way to search by XFF addresses to Provide a way to search all of the XFF addresses listed in the XFF string.Aug 22 2022, 7:34 PM
Dreamy_Jazz subscribed.

I think this is something that's needed based on the given use case. To achieve this would require a change in how the DB data is stored in the xff_hex column or a change to the DB to allow multiple XFF IPs. The solutions that I see are:

  • By some method store multiple XFF IPs in the xff_hex column. However, this makes searching by XFF a more computationally difficult task as the query will require a LIKE query. This would add SQL processing time especially as it would have to include wildcards on both sides. This could be done via a list separated by "|" characters like "|XFF hex 1|XFF hex 2|" with a LIKE query of "%|XFF hex 1|%".
  • Add extra columns to the DB to store up to certain number of XFF hexes. This would likely cover most cases of a couple of XFF hexes, but still limits and adds unnecessary columns in most cases.
  • Create a new table that stores the hexes for both IPs and XFF IPs (along with possibly other information such as the string representation) which is linked together via a many-to-many relationship to the cu_changes table. This would add two new tables but would move towards solving T305930 (as this would normalise the IP and XFF)

One issue with the third option is whether pruning the data would apply to the table with IPs and their hexes. This shouldn't contain any information unique to the client, but would have all the IPs used by all users. The many-to-many table would have to be also pruned but this would be a relatively easier task.