Page MenuHomePhabricator

Provide a way to search all of the XFF addresses listed in the XFF string
Open, LowPublicFeature

Description

I recently ran some checks on multiple accounts in a enwiki SPI case which came back with:

IP: xxx.xxx.xxx.xxx XFF: xxx.xxx.xxx.xxx, 136.226.255.25, 10.132.0.109

The intermediate address (136.226.255.25) is listed by Shodan as a webhost (see https://bullseye.toolforge.org/ip/136.226.255.25) and is blocked as a zscaler open proxy (https://en.wikipedia.org/wiki/Special:Contributions/136.226.255.25).

Using checkuser, I was able to search for the exit node (xxx.xxx.xxx.xxx), but what I really wanted to do was search to see all the traffic that was coming through the same 136.226.255.25 proxy intermediary. It is my understanding that while the intermediate XXF address is stored in the CU database, it is not stored in a way which makes it searchable, per my email conversation with Dreamy Jazz:

The reason why that check returned no results is that because CheckUser only currently stores the IP for xxx.xxx.xxx.xxx as a hex. This means while you see the three XFF IPs, you can only search by the last XFF IP which is xxx.xxx.xxx.xxx. I think this also extends to be the last trusted XFF IP.

Event Timeline

Dreamy_Jazz renamed this task from Provide a way to search by XFF addresses to Provide a way to search all of the XFF addresses listed in the XFF string.Aug 22 2022, 7:34 PM
Dreamy_Jazz subscribed.

I think this is something that's needed based on the given use case. To achieve this would require a change in how the DB data is stored in the xff_hex column or a change to the DB to allow multiple XFF IPs. The solutions that I see are:

  • By some method store multiple XFF IPs in the xff_hex column. However, this makes searching by XFF a more computationally difficult task as the query will require a LIKE query. This would add SQL processing time especially as it would have to include wildcards on both sides. This could be done via a list separated by "|" characters like "|XFF hex 1|XFF hex 2|" with a LIKE query of "%|XFF hex 1|%".
  • Add extra columns to the DB to store up to certain number of XFF hexes. This would likely cover most cases of a couple of XFF hexes, but still limits and adds unnecessary columns in most cases.
  • Create a new table that stores the hexes for both IPs and XFF IPs (along with possibly other information such as the string representation) which is linked together via a many-to-many relationship to the cu_changes table. This would add two new tables but would move towards solving T305930 (as this would normalise the IP and XFF)

One issue with the third option is whether pruning the data would apply to the table with IPs and their hexes. This shouldn't contain any information unique to the client, but would have all the IPs used by all users. The many-to-many table would have to be also pruned but this would be a relatively easier task.

Dreamy_Jazz lowered the priority of this task from Medium to Low.Feb 27 2026, 2:03 PM

Re-prioritising this as it's been several years at Medium and I'm not sure there are plans to do this any time soon. If this is still needed and should be considered more a priority, then please do say.