Log source port for anonymous users and expose it for sysops/checkusers
Closed, Resolved · Public

Assigned To

None

Authored By

	eranroz
	Nov 26 2017, 7:19 PM

Description

Some ISPs deploy carrier grade NAT designs, making use of the port in addition to the IP to resolve the client connection.
With this design, when sysops report trolls and vandals to ISPs and provide the specific IP and the time of the edit, that isn't always enough to identify the specific client; hence it is important to also provide the source port in such cases.
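
To illustrate why the port matters, here is a minimal sketch of the lookup an ISP might perform against its NAT mapping records. The record layout, the `find_subscribers` helper, the subscriber labels, and the sample data are all hypothetical (the addresses come from a documentation range); real CGN logging differs between vendors.

```python
from datetime import datetime
from typing import Iterable, NamedTuple, Optional, Set


class NatMapping(NamedTuple):
    """One hypothetical CGN log record: who held a public IP:port, and when."""
    public_ip: str
    public_port: int
    start: datetime
    end: datetime
    subscriber: str  # hypothetical subscriber label


def find_subscribers(
    mappings: Iterable[NatMapping],
    public_ip: str,
    when: datetime,
    public_port: Optional[int] = None,
) -> Set[str]:
    """Return every subscriber whose mapping matches the abuse report."""
    return {
        m.subscriber
        for m in mappings
        if m.public_ip == public_ip
        and m.start <= when <= m.end
        and (public_port is None or m.public_port == public_port)
    }


# Illustrative data only: two subscribers share one public IP at the same time.
MAPPINGS = [
    NatMapping("203.0.113.7", 40001,
               datetime(2017, 11, 26, 19, 0), datetime(2017, 11, 26, 19, 30),
               "subscriber-A"),
    NatMapping("203.0.113.7", 40002,
               datetime(2017, 11, 26, 19, 10), datetime(2017, 11, 26, 19, 40),
               "subscriber-B"),
]

edit_time = datetime(2017, 11, 26, 19, 19)
without_port = find_subscribers(MAPPINGS, "203.0.113.7", edit_time)
with_port = find_subscribers(MAPPINGS, "203.0.113.7", edit_time, 40002)
```

With only the IP and time, both subscribers match; adding the source port narrows the result to one.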

It would be nice to log the source port of the client ($_SERVER['REMOTE_PORT']?), so that some privileged users (sysops or checkusers) could access it later if needed for reporting to ISPs.

Details

	Subject	Repo	Branch	Lines +/-
	varnish: include X-Client-Port in X-Analytics	operations/puppet	production	+6 -0
	varnish: include X-Client-Port in X-Analytics	operations/puppet	production	+11 -6


Related Objects

		Status	Subtype	Assigned	Task
		Resolved		None	T181368 Log source port for anonymous users and expose it for sysops/checkusers
					Restricted Task

Event Timeline

eranroz created this task. · Nov 26 2017, 7:19 PM

Restricted Application added a subscriber: Aklapper. · Nov 26 2017, 7:20 PM

Framawiki subscribed. · Nov 26 2017, 7:21 PM

Huji subscribed. · Nov 26 2017, 7:29 PM

<Krenair> I would expect $_SERVER['REMOTE_PORT'] to be useless inside WMF infrastructure
<Krenair> I would not expect nginx and varnish to use the same source port as the client

Restricted Application added a project: SRE. · Nov 26 2017, 7:29 PM

ema triaged this task as Medium priority. · Nov 27 2017, 8:33 AM

In T181368#3787516, @Krenair wrote:

<Krenair> I would expect $_SERVER['REMOTE_PORT'] to be useless inside WMF infrastructure

Yeah, I'm also not really sure that adding the client source port to the current list of fields available in webrequests would help much. What type of questions could we answer with such a field that we cannot answer now? Analytics, any input?

<Krenair> I would not expect nginx and varnish to use the same source port as the client

The source port as seen by nginx is indeed the remote end's source port. We could thus simply map
$remote_port to a new header as follows, and then add the relevant varnishkafka config:

proxy_set_header X-Real-Port $remote_port;
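
As a rough sketch of the receiving side, an application behind the proxy could read and validate the forwarded value along these lines. The `client_port_from_headers` helper is hypothetical and not part of MediaWiki or any WMF code; it assumes the proxy always sets the header itself, so that a client-supplied value cannot reach the backend unmodified.

```python
from typing import Mapping, Optional


def client_port_from_headers(
    headers: Mapping[str, str],
    name: str = "X-Real-Port",  # header name taken from the nginx line above
) -> Optional[int]:
    """Return the forwarded client source port, or None if absent or invalid."""
    raw = None
    for key, value in headers.items():
        # HTTP header names are case-insensitive.
        if key.lower() == name.lower():
            raw = value.strip()
            break
    if raw is None:
        return None
    # Accept only 1-5 plain ASCII digits before converting.
    if not (raw.isascii() and raw.isdigit()) or len(raw) > 5:
        return None
    port = int(raw)
    return port if 1 <= port <= 65535 else None


port = client_port_from_headers({"x-real-port": "51234"})
```

Anything that is not a valid TCP port number is treated as missing rather than logged as-is.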

ema moved this task from Backlog to Caching on the Traffic board. · Nov 27 2017, 8:47 AM

In T181368#3787869, @ema wrote:

In T181368#3787516, @Krenair wrote:

<Krenair> I would expect $_SERVER['REMOTE_PORT'] to be useless inside WMF infrastructure

Yeah, I'm also not really sure that adding the client source port to the current list of fields available in webrequests would help much. What type of questions could we answer with such a field that we cannot answer now? Analytics, any input?

The request here is to make the source port available to MediaWiki's CheckUser extension, so that trusted users can look it up when filing abuse reports with ISPs, which need to know the source port to identify the source of the abuse.

Per @Legoktm this has nothing to do with Analytics as far as I am aware.

In T181368#3787869, @ema wrote:
<Krenair> I would not expect nginx and varnish to use the same source port as the client

The source port as seen by nginx is indeed the remote end's source port. We could thus simply map
$remote_port to a new header as follows, and then add the relevant varnishkafka config:
proxy_set_header X-Real-Port $remote_port;

Yeah, so nginx will see the actual remote source port, then use it to set an X-Real-Port header which I guess Varnish will simply pass on to MW servers.

I did a bit of searching around and found some people talking about X-Forwarded-Port, but it sounds like they are using it for the destination port rather than the source port: https://mattrobenolt.com/handle-x-forwarded-port-header-in-django/
Is there a chance the outermost layer of LVS might screw up anything in this plan?

Phabricator_maintenance moved this task from Backlog to Acknowledged on the SRE board. · Jan 26 2019, 9:36 PM

Is this still desirable for checkusers? Infrastructure has changed since then and is still changing, but we could probably find a way to pass the data along in a header.

It is desirable when there are trolls using ISPs that use CGN (and maybe in other cases). I think this is quite a rare case, but when it is required, it's important to have it.

For Wikimedia wikis: I'm not familiar enough with the infra in that regard (or with how much effort it requires), but I guess it would be good to have it logged at least somewhere (the MW database accessible to the CheckUser extension, or some logs available to ops and analytics), so that the data can be retrieved when needed.

As for the right logs and whether or not to keep it logged in the MW database: this may involve privacy/legal/bureaucratic aspects. My guess is that the easiest approach would be to log it somewhere, and analytics can give this data to checkusers upon request. If this becomes a common request (very unlikely), we can explore how to go the extra mile of exposing it to checkusers directly in the extension.

Niharika subscribed. · Sep 25 2020, 9:20 PM

In T181368#6488698, @BBlack wrote:

Is this still desirable for checkusers? Infrastructure has changed since then and is still changing, but we could probably find a way to pass the data along in a header.

It is indeed desirable. Just today I attempted to disclose IP data (after WMF approval) to an ISP; however, they said that IPs are not enough and source ports are also needed.

Having the source port would have helped me enable an ISP to identify a long-term abuser (LTA) today.

NickK awarded a token. · Sep 29 2020, 2:41 PM

L235 subscribed. · Sep 29 2020, 4:09 PM

stwalkerster subscribed. · Sep 29 2020, 4:52 PM

Tks4Fish subscribed. · Oct 2 2020, 8:24 PM

jrbs added a subtask: Restricted Task. · Oct 15 2020, 11:29 PM

Huji added a comment. · Oct 16 2020, 2:35 PM

This comment was removed by Huji.

While I understand how this can be helpful when reporting abusers to ISPs, this use case is narrow and uncommon. If we decide to add this to CU logs, we should certainly not show it in typical CU results; it would clutter the interface.

However, I'm not sure this should ever be in every CU's hands. The use case (reporting an abuser to their ISP) is something that I think WMF should handle, not volunteer CUs. So even if we decide that CU logs are the most appropriate place to store this data, it should not be shown in any of the CU interfaces. Only those with shell access (through a direct query of the DB) should be able to pull this, and in my opinion, volunteers with shell access should be expected not to handle such requests either. If we do want to have a separate web-based view in CU that exposes this data, it should be restricted through another permission setting that is false for CUs and only true for certain WMF-employee users.

Lastly, if what I just wrote is agreeable, that raises the question: are the CU logs really the best place to store this at all? If varnish/nginx already keeps a log of all requests, could that be matched by timestamp to the CU logs, with the port data extracted on demand? Why store something in two places if it is already stored in one place and can be queried with a reasonable amount of effort?
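
The on-demand matching described above could look roughly like the following sketch. The `RequestLogEntry` layout, the `candidate_ports` helper, and the two-second tolerance are assumptions made for illustration; they do not reflect the actual varnish/nginx log schema or how CU timestamps relate to request timestamps.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple


class RequestLogEntry(NamedTuple):
    """Hypothetical request log row; not the real log schema."""
    ip: str
    port: int
    timestamp: datetime


def candidate_ports(
    entries: Iterable[RequestLogEntry],
    ip: str,
    edit_time: datetime,
    tolerance: timedelta = timedelta(seconds=2),  # assumed clock skew allowance
) -> List[int]:
    """Return source ports seen from `ip` within `tolerance` of `edit_time`."""
    ports = {
        e.port
        for e in entries
        if e.ip == ip and abs(e.timestamp - edit_time) <= tolerance
    }
    return sorted(ports)


# Illustrative data only (documentation address range).
LOG = [
    RequestLogEntry("198.51.100.4", 51234, datetime(2020, 10, 16, 14, 35, 1)),
    RequestLogEntry("198.51.100.4", 51300, datetime(2020, 10, 16, 14, 50, 0)),
    RequestLogEntry("198.51.100.9", 40000, datetime(2020, 10, 16, 14, 35, 0)),
]

ports = candidate_ports(LOG, "198.51.100.4", datetime(2020, 10, 16, 14, 35, 0))
```

A match can return several ports if the same IP made multiple requests inside the window, so the result is a list of candidates rather than a single answer.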

If T265692 ends up being easy to do, that supports my last point above.

Ladsgroup subscribed. · Nov 1 2020, 3:06 AM

I think this shouldn't go on the MW side of things; it should be part of the analytics data lake (the webrequest Hadoop table, for example).

JAllemandou mentioned this in T271953: Add client TCP source port to webrequest. · Jan 13 2021, 3:32 PM

I'm inclined to close this as declined in favor of T271953: Add client TCP source port to webrequest, which basically lets people who have access to Hadoop see the source port. If it's needed for reporting, let those people know and they can get it for you. I assume this happens pretty rarely.

taavi subscribed. · Jan 20 2021, 9:56 AM

Change 657416 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] varnish: include X-Client-Port in X-Analytics

https://gerrit.wikimedia.org/r/657416

gerritbot added a project: Patch-For-Review. · Jan 20 2021, 9:42 PM

Change 657416 abandoned by Effie Mouzeli:
[operations/puppet@production] varnish: include X-Client-Port in X-Analytics

Reason:
rebase probs

https://gerrit.wikimedia.org/r/657416

Change 658567 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] varnish: include X-Client-Port in X-Analytics

https://gerrit.wikimedia.org/r/658567

Framawiki unsubscribed. · Jan 26 2021, 10:37 PM

Change 658567 merged by Vgutierrez:
[operations/puppet@production] varnish: include X-Client-Port in X-Analytics

https://gerrit.wikimedia.org/r/658567
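
For readers unfamiliar with X-Analytics, here is a minimal sketch of pulling a port out of such a header value. It assumes the header is a semicolon-separated list of `key=value` pairs, and the `client_port` key name and the sample value are guesses for illustration; the merged patch is the authority on the actual key.

```python
from typing import Dict


def parse_x_analytics(value: str) -> Dict[str, str]:
    """Split an X-Analytics style value into a dict of its key=value pairs."""
    fields: Dict[str, str] = {}
    for part in value.split(";"):
        key, sep, val = part.strip().partition("=")
        if sep and key:  # skip empty or malformed segments
            fields[key] = val
    return fields


# Hypothetical header value; the key name "client_port" is an assumption.
sample = "https=1;client_port=51234;nocookies=1"
fields = parse_x_analytics(sample)
client_port = fields.get("client_port")
```

Values stay as strings here; converting and range-checking the port is left to the consumer.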

Base subscribed. · Feb 4 2021, 12:31 PM

This is half resolved, half declined. The data is now available in the data lake and can be disclosed by people with access if needed. But we shouldn't add this information to MediaWiki, where it would make it even easier for CUs and admins to identify users, unless there's a strong benefit from it (which I can't see).

Maintenance_bot removed a project: Patch-For-Review. · Mar 21 2021, 12:10 PM

Urbanecm closed subtask Restricted Task as Declined. · Jan 23 2023, 8:00 PM
