Add proxies, tunnels API endpoints to ipoid service
Closed, Resolved (Public)

Description

We want to be able to feed IP reputation data to our CDN caches. We're currently importing the data directly from the feed, but once iPoid is up and running, we'd like to be able to get the data from it instead of implementing a second feed import mechanism.

We would need the following endpoints:

  • /proxies returning a {LABEL: [ip1, ip2,...], LABEL2: ...} json map
  • /proxies/LABEL returning only the list of IPs
  • /vpns returning a similar map to the one returned by proxies, but for vpn tunnels
  • /vpns/LABEL returning only the list of IPs for that label, as with /proxies/LABEL.
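
To make the requested response shapes concrete, here is a minimal Python sketch of how (label, IP) pairs could be grouped into the map and the per-label list. The function names and the sample rows are hypothetical, and the addresses are documentation-range placeholders; this is not the ipoid implementation.

```python
import json
from collections import defaultdict


def build_label_map(rows):
    """Group (label, ip) pairs into {label: [ip, ...]}, keeping input order."""
    label_map = defaultdict(list)
    for label, ip in rows:
        label_map[label].append(ip)
    return dict(label_map)


def ips_for_label(label_map, label):
    """Return only the list of IPs for one label (empty if the label is unknown)."""
    return label_map.get(label, [])


# Hypothetical sample rows, not real feed data.
rows = [
    ("LABEL", "192.0.2.1"),
    ("LABEL", "192.0.2.2"),
    ("LABEL2", "198.51.100.7"),
]
proxies = build_label_map(rows)
print(json.dumps(proxies))                       # shape of the map endpoint
print(json.dumps(ips_for_label(proxies, "LABEL")))  # shape of the per-label endpoint
```

The same two shapes would apply to the vpn endpoints, only with a different underlying data set.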

I would suggest we don't set up pagination from the start, as it shouldn't be needed IMHO.

Details

Related Changes in GitLab:
Title: Add proxies, tunnels API endpoints to ipoid service
Reference: repos/mediawiki/services/ipoid!49
Author: tsepothoabala
Source Branch: T344273-add-endpoints
Dest Branch: main
Event Timeline

Joe renamed this task from Add proxies API endpoint to ipoid service to Add proxies, tunnels API endpoints to ipoid service. Aug 16 2023, 8:46 AM
Joe updated the task description.

I have created a separate ticket, T345747, for writing tests.

@TThoabala I am finding that requests to /vpns and /vpn/:label can be very slow.

I imported part of the Spur data, about 600000 rows in actor_data. Because the tunnels table did not have anything in the type column (see T346634), I set it to VPN in every row. This might not be realistic.

Calls to /vpn/:label can take anywhere from a few seconds to 30 minutes.

For example, /vpn/ABCPROXY_PROXY takes about 10 minutes (I tried both running the request with curl and querying the database directly). It only returns 139 IPs.

/vpn/LUMINATI_PROXY took about 30 minutes and returned ~1300 IPs.

The call to /vpns took over 10 minutes before I cancelled it.

This is running on docker.
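
For anyone wanting to reproduce these timings, a small helper along the following lines could wrap whatever call is being measured. This is only a sketch: `timed` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real curl request or database query.

```python
import time


def timed(fetch):
    """Call fetch() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fetch()
    return result, time.perf_counter() - start


def fake_fetch():
    # Stand-in for a real HTTP request or database query (assumption).
    return ["192.0.2.1", "192.0.2.2"]


ips, elapsed = timed(fake_fetch)
print(f"{len(ips)} IPs in {elapsed:.3f}s")
```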

The :label in /vpn/:label refers to a proxy (from the proxies table) not to a tunnel operator (operator column in the tunnels table). Is this right? What if an IP has multiple proxies but one tunnel? Is it assumed the tunnel operator is running all the proxies? What about multiple tunnels but one proxy?

With a large amount of data, the number of IPs returned for a particular proxy sometimes differs depending on whether you call /proxies or /proxy/:label.

For example, importing the attached data, the number of IPs for the LUMINATI_PROXY is 74232 for /proxies but is 99858 for /proxy/LUMINATI_PROXY.
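
A check like the one described above could be sketched as follows, assuming both responses have already been fetched and parsed into Python objects. `count_mismatches` and the toy data are hypothetical and only illustrate the comparison; they are not taken from the testing script.

```python
def count_mismatches(label_map, per_label):
    """Compare IP counts per label between the map response and the
    per-label responses.

    label_map: {label: [ip, ...]} as returned by the map endpoint.
    per_label: {label: [ip, ...]} built from one per-label request each.
    Returns {label: (count_in_map, count_in_per_label)} for differing labels.
    """
    mismatches = {}
    for label in sorted(set(label_map) | set(per_label)):
        in_map = len(label_map.get(label, []))
        in_single = len(per_label.get(label, []))
        if in_map != in_single:
            mismatches[label] = (in_map, in_single)
    return mismatches


# Toy data: LABEL2 is missing one IP in the map response.
from_map = {"LABEL": ["192.0.2.1"], "LABEL2": ["198.51.100.7"]}
from_single = {"LABEL": ["192.0.2.1"], "LABEL2": ["198.51.100.7", "198.51.100.8"]}
print(count_mismatches(from_map, from_single))
```

Comparing counts only shows that the two responses disagree; comparing the sets of IPs would be needed to see which addresses are missing.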

I haven't tested yet whether a similar thing happens for vpns.

{F37741556}

The results of /vpn/:label and /vpns sometimes include duplicate IP addresses.

For example, with the test data from T344273#9181633, /vpn/LUMINATI_PROXY produces duplicates for 5 IPs (I won't mention which they are here). The same duplicate IPs appear for that proxy in /vpns.
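
Duplicates of this kind can be spotted with a short check over a single label's list. The sketch below is an assumption about how one might do it, not the method used in testing; `duplicate_ips` is a hypothetical name, and addresses are normalized with the standard `ipaddress` module so that different spellings of the same IPv6 address count as one.

```python
import ipaddress
from collections import Counter


def duplicate_ips(ips):
    """Return {normalized_ip: occurrences} for every IP listed more than once."""
    counts = Counter(str(ipaddress.ip_address(ip)) for ip in ips)
    return {ip: n for ip, n in counts.items() if n > 1}


# Placeholder addresses from documentation ranges.
sample = [
    "192.0.2.1",
    "192.0.2.1",
    "2001:db8::1",
    "2001:0db8:0000:0000:0000:0000:0000:0001",
    "198.51.100.7",
]
print(duplicate_ips(sample))
```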

Further testing of the accuracy of the data returned by the new endpoints is hard due to T346643 (meaning I cannot reliably import data to test), the performance issues in T344273#9177308 and (to a lesser extent) T346634. Moving to blocked just for now.

Aklapper added a subscriber: TThoabala.

Removing inactive task assignee (please do so as part of offboarding steps).

Further testing of the accuracy of the data returned by the new endpoints is hard due to T346643 (meaning I cannot reliably import data to test),

This was fixed (a duplicate task was filed and cited in the commit message, so I've merged the original in as a duplicate).

the performance issues in T344273#9177308

These should hopefully be fixed by the patches for T345156: DBA audit of ipoid database.

and (to a lesser extent) T346634. Moving to blocked just for now.

This was half fixed, and the other half is up for discussion (see comment on the task for more info).

@dom_walden The actual issues you found via testing haven't been fixed yet. Do you think we should wait for those to be fixed before further QA?

@dom_walden The actual issues you found via testing haven't been fixed yet. Do you think we should wait for those to be fixed before further QA?

Thanks. I will raise the remaining bugs separately and move this back into the QA column.

With a large amount of data, the number of IPs returned for a particular proxy sometimes differs depending on whether you call /proxies or /proxy/:label.

Raised as T348745.

So far, the only bug I have found is T348745. More testing will be done after the fix for that.

But, otherwise, the IPs being returned from the API have been accurate.

I note that:

  • /proxies and /proxy/:label return information for client.proxies
  • /vpns and /vpn/:label return information for tunnels of type VPN

We don't do anything with tunnels of type PROXY.

I also don't know whether these API endpoints need to be documented more publicly than in code comments. Perhaps it does not matter if they are only used by SRE.

Testing script: https://gitlab.wikimedia.org/dwalden/ipmasking-testing/-/blob/main/ipoid_api_accuracy.py

STran claimed this task.