
Maxmind: GeoIP Download Failed
Closed, ResolvedPublic

Description

noc@ got this email from Maxmind:

Dear MaxMind Customer,

There was a recent attempt to update your GeoIP Legacy Region Database database using the automated GeoIP Update program from the IP address 2620:0:861:1:208:80:154:32 located in United States. The update failed because the term covered by your most recent invoice for this product has ended. To renew the invoice, contact sales@maxmind.com. Alternatively, to switch to online purchasing through your account portal, contact support@maxmind.com.

If you have any questions, please contact us at support@maxmind.com.

Sincerely,
The Team at MaxMind

2620:0:861:1:208:80:154:32 is install1003, eqiad Squid proxy.

The full list of hosts querying Maxmind through the proxies can be looked up in the Squid access list.
install1003:/var/log/squid$ grep -i maxmind access.log | grep -o "client.ip[^,]*," (thanks @Volans for the query)

But if I remember correctly it's mostly used by Traffic, Analytics and Service Ops.
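The grep query above can be turned into a per-client tally, which makes it easier to see which hosts hit Maxmind most often through the proxy. This is only a sketch: it reuses the same `client.ip[^,]*,` pattern as the grep, and the layout of the real access.log is an assumption, so the function name and any sample lines are hypothetical.

```python
import re
from collections import Counter

# Same pattern as the grep above: "client.ip" and everything up to the next comma.
CLIENT_IP_FIELD = re.compile(r"client.ip[^,]*,")


def count_maxmind_clients(lines):
    """Tally the client-ip field of every log line that mentions maxmind.

    Matching on "maxmind" is case-insensitive, like `grep -i`.
    The real Squid log layout is not shown in this task, so the
    field is counted as an opaque string rather than parsed.
    """
    counts = Counter()
    for line in lines:
        if "maxmind" not in line.lower():
            continue
        for field in CLIENT_IP_FIELD.findall(line):
            counts[field.rstrip(",")] += 1
    return counts
```

It could be fed with `open("/var/log/squid/access.log")` on the proxy host, and `counts.most_common()` then lists the busiest clients first.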

Opening this task as procurement as it seems contract/invoice related.

Event Timeline

ayounsi created this task.
ayounsi created this object in space Restricted Space.
Restricted Application added a subscriber: Aklapper.

I sent them an email asking them to confirm when the event happened; however, the download job should only be active on puppetmaster1001.

So I have looked into this a bit further and noticed a number of things. We currently have two systemd timers, geoip_update_legacy.timer and geoip_update.timer. There was also an old cron job in /etc/cron.d/geoipupdate, which I have now removed. These timers use different config files (/etc/GeoIP.conf and /etc/GeoIPInfo.conf), including different edition IDs, account IDs and licence keys. (It seems the latter job was created as part of T288844.)

The job that is failing is the legacy job, and it is only failing when trying to fetch edition ID 115 (GeoIPRegion.dat):

Received an unexpected HTTP status code of 403 from https://updates.maxmind.com/geoip/databases/115/update?db_md5=00000000000000000000000000000000:
Invalid product ID or subscription expired for 115
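When several edition IDs are fetched by the same job, it can help to pull the failing ID and HTTP status out of the error text automatically. The following is a minimal sketch that only assumes the message has the shape quoted above; `UpdateFailure` and `parse_failure` are hypothetical names, not part of geoipupdate.

```python
import re
from typing import NamedTuple, Optional


class UpdateFailure(NamedTuple):
    status: int      # HTTP status code reported by the update server
    edition_id: str  # edition/product ID taken from the URL path


# Matches the error line quoted above; other message shapes are not handled.
FAILURE = re.compile(
    r"unexpected HTTP status code of (?P<status>\d{3}) from "
    r"https://updates\.maxmind\.com/geoip/databases/(?P<edition>[^/]+)/update"
)


def parse_failure(message: str) -> Optional[UpdateFailure]:
    """Return the status and edition ID from a failure message, or None."""
    match = FAILURE.search(message)
    if match is None:
        return None
    return UpdateFailure(int(match.group("status")), match.group("edition"))
```

Applied to the message above, this yields status 403 and edition ID "115", which matches the expired-subscription diagnosis.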

@Dzahn recently looked at the Maxmind stuff so may know something useful.

OK, that seems confirmed: the issue is puppetmaster1001 not being able to download dataset 115 due to an expired subscription. I think we just need to track down who needs to pay this bill.

@jbond Confirmed the existence of 2 separate update scripts/timers. That was on purpose, to separate classic DBs from a much later task to add additional DBs, which were using a new license acquired by a different team. To avoid any issues with the legacy DBs I kept those separate. And we could NOT just download all of them together using the new license.

Deleting an old cron that is replaced by timers is correct though! That should not have been there. Thanks!

And regarding the actual issue in this ticket, yes, I think we got the warnings that it would expire (maybe T228533) and this is just a licensing issue for the "legacy account".

Problem is that it was tied to an individual many years ago and they are not working for WMF anymore. So, yes, we need to find someone to pay this bill, but it's not as easy as finding the previous bill payer I'm afraid.

@jbond It's about the file modules/secret/secrets/geoip/GeoIP.conf on puppetmaster1001. see the comment in there " UserId, LicenseKey, ProductIds from ngautam@wikimedia.org account" / UserId 55111. ngautam@ hasn't been here for a long time but that's the one we need to replace and it's independent of other newer tasks you might see that were about adding extra databases (not legacy).

So.. I think another fix here, maybe easier maybe not, could be if the Anti-Harrassment-Team (@phuedx but they can't see this ticket) would add the legacy databases to their newer license. See how they said "The team has recently purchased an extended license" on T288844. If they could add "ProductIds 106 133 115" to that license.. then we could download all DBs, legacy and extended, using the single newer license. Then we could also get rid of the 2 timers and reduce it to just 1. But I am not sure if that is possible and what new budget questions it may cause. I could contact their team directly but maybe this should have manager involvement?

I sent an email to Sam Smith asking if this is feasible, CCed @jbond and @cmooney .

@Dzahn thanks for the update. I have also been speaking with MaxMind support and have a few other data points. First and foremost, the legacy datasets will not be available after May. As such, I wonder if it's even worth renewing these. How can we check and ensure all consumers of this data are using the newer data sets?

I also asked for a mapping of legacy to current data sets and we get the following

| Legacy dataset | GeoIP2 dataset | Notes |
| --- | --- | --- |
| 115, 132, 133 | GeoIP2-City | (Geo-115) does not have a direct GeoIP2 replacement however GeoIP2-City provides all the data in 115. |
| 171, 177 | GeoIP2-Connection-Type | |
| 106 | GeoIP2-Country | This data is also available in GeoIP2-City |
| | GeoIP2-ISP | |
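The mapping from MaxMind support can be kept as a small lookup table, for example to check which GeoIP2 datasets would cover a given list of legacy product IDs. This is a sketch built only from the rows above; `LEGACY_TO_GEOIP2` and `replacements_for` are hypothetical names.

```python
# Legacy product ID -> GeoIP2 dataset, as given in the mapping above.
# GeoIP2-ISP is omitted because no legacy ID was listed for it.
LEGACY_TO_GEOIP2 = {
    "115": "GeoIP2-City",
    "132": "GeoIP2-City",
    "133": "GeoIP2-City",
    "171": "GeoIP2-Connection-Type",
    "177": "GeoIP2-Connection-Type",
    "106": "GeoIP2-Country",
}


def replacements_for(product_ids):
    """Return the sorted GeoIP2 datasets that replace the given legacy IDs.

    Raises KeyError for an ID that is not in the mapping, so unknown
    products are noticed instead of silently dropped.
    """
    return sorted({LEGACY_TO_GEOIP2[str(pid)] for pid in product_ids})
```

For the product IDs mentioned in this task, `replacements_for("106 133 115".split())` returns `["GeoIP2-City", "GeoIP2-Country"]`. Per the notes column, GeoIP2-City alone also carries the country data.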

Further to this i also asked why only 115 expired and got the following response:

As for why only the GeoIP Legacy Region database is failing, it looks like your subscriptions are updating on different dates. That subscription lapsed yesterday (3/1). Your GeoIP2 City (with GeoIP2 Country included) and GeoIP2 Connection Type subscriptions are due to end on May 1st, and your GeoIP2 ISP subscription will end on December 4th.

how can we check and ensure all consumers of this data are using the newer data sets

I am not sure ANY consumer is using the new data sets yet. If anyone is using it then it's the Anti-Harrassment-Team but their tickets about that might still be open even though SRE did their part and made them available on appservers.

And our legacy setup has not been changed. Old and new datasets are in separate directories on the appservers.

I am wondering _how_ urgent this really is. Like.. what is the worst that happens if we stop getting updates.. users might be served by a non-optimal DC .. but how long would it take until changes start to become more serious.

Then separately I think the question is "should WMF have a single unified Maxmind account or should separate teams have their own".

We can either go the route to try and add legacy databases to the AH-team license or SRE/serviceops can have their own license and talk to Maxmind directly without going through another team.

I am not sure ANY consumer is using the new data sets yet.

Just to clarify, the only legacy datasets are the numbered ones. The datasets named GeoIP2*, which have been available since at least 2014, will still remain. Are you saying that these are not being used?

Then separately I think the question is "should WMF have a single unified Maxmind account or should separate teams have their own".

I guess this is more a question for finance? But I would vote for a single licence.

We can either go the route to try and add legacy databases

Again, just to point out, these legacy datasets will not work in 2 months' time; I don't think it's worth migrating them anywhere.

I am not sure ANY consumer is using the new data sets yet.

Just to clarify, the only legacy datasets are the numbered ones. The datasets named GeoIP2*, which have been available since at least 2014, will still remain. Are you saying that these are not being used?

Sorry, the term "legacy" was overloaded. I used it meaning "the setup before we started getting the 'extended' ones for the IPInfo extension".

To clarify:

USED => what is inside /usr/share/GeoIP/ on appservers, e.g. mwdebug1001 and all mw*

these are downloaded using the old license key and include the GeoIP2*:

GeoIP2-City.mmdb
6.2M -rw-r--r-- 1 root root 6.2M Mar  2 03:48 GeoIP2-Connection-Type.mmdb
6.6M -rw-r--r-- 1 root root 6.6M Mar  1 06:53 GeoIP2-Country.mmdb
 14M -rw-r--r-- 1 root root  14M Mar  1 06:53 GeoIP2-ISP.mmdb
3.8M -rw-r--r-- 1 root root 3.8M Feb 16  2021 GeoIPASNum.dat
4.5M -rw-r--r-- 1 root root 4.5M Feb 16  2021 GeoIPASNumv6.dat
 48M -rw-r--r-- 1 root root  48M Mar  1 06:53 GeoIPCity.dat
2.2M -rw-r--r-- 1 root root 2.2M Mar  1 06:53 GeoIP.dat
1.2M -rw-r--r-- 1 root root 1.2M Mar  2 03:48 GeoIPNetSpeedCell.dat
1.3M -rw-r--r-- 1 root root 1.3M Mar  2 03:48 GeoIPNetSpeed.dat
7.9M -rw-r--r-- 1 root root 7.9M Mar  1 06:53 GeoIPRegion.dat
1.4M -rw-r--r-- 1 root root 1.4M Feb 16  2021 GeoIPv6.dat
 31M -rw-r--r-- 1 root root  31M Feb 16  2021 GeoLite2-City.mmdb
 16M -rw-r--r-- 1 root root  16M Feb 16  2021 GeoLiteCity.dat
 16M -rw-r--r-- 1 root root  16M Feb 16  2021 GeoLiteCityv6.dat
744K -rw-r--r-- 1 root root 743K Feb 16  2021 GeoLite.dat

NOT sure if used already => what is inside /usr/share/GeoIPInfo on appservers

These are downloaded using the new license key, owned by anti-harassment; it includes "enterprise", which the other one does not have.

9.8M -rw-r--r--   1 root root 9.8M Mar  3 04:48 GeoIP2-Anonymous-IP.mmdb
366M -rw-r--r--   1 root root 365M Mar  1 06:53 GeoIP2-Enterprise.mmdb

additionally the file modules/secret/secrets/geoip/GeoIP.conf on puppetmaster1001 which I mentioned before holds the same old license key (UserID 55111) but only lists 3 product IDs "106 133 115".

This added to confusion but looks like it's not actually used. Instead the user IDs and license keys are both in passwords::geoip and the list of product IDs is in public, as you have linked to.

It looks very much like a remnant from before we moved the config file to a template and the passwords to the passwords module.

@Dzahn thanks for the update. I think it's specifically the following that will no longer be available from May. Do we know who is the best person to check with and confirm whether changes need to be made?

3.8M -rw-r--r-- 1 root root 3.8M Feb 16  2021 GeoIPASNum.dat
4.5M -rw-r--r-- 1 root root 4.5M Feb 16  2021 GeoIPASNumv6.dat
 48M -rw-r--r-- 1 root root  48M Mar  1 06:53 GeoIPCity.dat
2.2M -rw-r--r-- 1 root root 2.2M Mar  1 06:53 GeoIP.dat
1.2M -rw-r--r-- 1 root root 1.2M Mar  2 03:48 GeoIPNetSpeedCell.dat
1.3M -rw-r--r-- 1 root root 1.3M Mar  2 03:48 GeoIPNetSpeed.dat
7.9M -rw-r--r-- 1 root root 7.9M Mar  1 06:53 GeoIPRegion.dat
1.4M -rw-r--r-- 1 root root 1.4M Feb 16  2021 GeoIPv6.dat
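To re-check a host for files like the ones listed above, a directory scan by extension is one option. This sketch assumes that the `.dat` extension marks the legacy format. That is consistent with the listing but is an assumption, and it would also pick up the GeoLite `.dat` files that were not part of the list. `legacy_dat_files` is a hypothetical helper name.

```python
from pathlib import Path


def legacy_dat_files(directory):
    """Return the sorted names of *.dat files directly inside directory.

    Only the extension is inspected; file contents and subscription
    status are not checked.
    """
    return sorted(p.name for p in Path(directory).glob("*.dat") if p.is_file())
```

On an appserver this could be called as `legacy_dat_files("/usr/share/GeoIP")`. Whether any consumer still reads those files has to be confirmed separately.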

@jbond I am not sure either about that part. But that's basically why Lukasz did T302864#7750787 to bring it to the attention of Alex and Kwaku for serviceops and traffic. I noticed though Kwaku can't read this ticket which is limited to SRE.

The data engineering team is the primary contact on the subscription. The legacy GeoIP dataset will be available until May 1. I have asked to extend the subscription until then. We have an invoice. Once it is settled we can resume the download.

The deprecation of that dataset in May creates some urgency to switch over. Here is the ticket to handle migration to GeoIP2: https://phabricator.wikimedia.org/T302989.

The invoice has been provisioned, and once the payment is made the dataset should be available. I will provide an update when it is.

The GeoIP Legacy Region Database subscription has been extended to April 7. @ayounsi does the SRE manage the download?

The GeoIP Legacy Region Database subscription has been extended to April 7. @ayounsi does the SRE manage the download?

I can confirm that the download job on the puppet master (which is how they ultimately get distributed to the appservers) is able to download all datasets again; updated files should have already been deployed elsewhere.

Dzahn shifted this object from the Restricted Space space to the S1 Public space. May 5 2022, 4:20 PM

Change 789648 merged by Dzahn:

[operations/puppet@production] puppetmaster::geoip: remove legacy product IDs even for fallback option

https://gerrit.wikimedia.org/r/789648