PHP Warning: geoip_country_code_by_name(): Required database not available at /usr/share/GeoIP/GeoIP.dat.
Closed, Resolved. Public. PRODUCTION ERROR

Description

Error

Which came from MediaWiki-extensions-WikimediaEvents

normalized_message
[{reqId}] {exception_url}   PHP Warning: geoip_country_code_by_name(): Required database not available at /usr/share/GeoIP/GeoIP.dat.
exception.trace
from /srv/mediawiki/php-1.42.0-wmf.5/extensions/WikimediaEvents/includes/BlockUtils.php(134)
#0 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-1.42.0-wmf.5/extensions/WikimediaEvents/includes/BlockUtils.php(134): geoip_country_code_by_name(string)
#2 /srv/mediawiki/php-1.42.0-wmf.5/extensions/WikimediaEvents/includes/BlockUtils.php(101): WikimediaEvents\BlockUtils::getCountryCode()
#3 /srv/mediawiki/php-1.42.0-wmf.5/extensions/WikimediaEvents/includes/EditPage/EditPageHooks.php(45): WikimediaEvents\BlockUtils::logBlockedEditAttempt(MediaWiki\User\User, MediaWiki\Title\Title, string, string)
#4 /srv/mediawiki/php-1.42.0-wmf.5/includes/HookContainer/HookContainer.php(161): WikimediaEvents\EditPage\EditPageHooks->onEditPage__showReadOnlyForm_initial(MediaWiki\EditPage\EditPage, MediaWiki\Output\OutputPage)
#5 /srv/mediawiki/php-1.42.0-wmf.5/includes/HookContainer/HookRunner.php(1593): MediaWiki\HookContainer\HookContainer->run(string, array)
#6 /srv/mediawiki/php-1.42.0-wmf.5/includes/editpage/EditPage.php(1004): MediaWiki\HookContainer\HookRunner->onEditPage__showReadOnlyForm_initial(MediaWiki\EditPage\EditPage, MediaWiki\Output\OutputPage)
#7 /srv/mediawiki/php-1.42.0-wmf.5/includes/editpage/EditPage.php(993): MediaWiki\EditPage\EditPage->displayViewSourcePage(WikitextContent, string)
#8 /srv/mediawiki/php-1.42.0-wmf.5/includes/editpage/EditPage.php(664): MediaWiki\EditPage\EditPage->displayPermissionsError(array)
#9 /srv/mediawiki/php-1.42.0-wmf.5/includes/actions/EditAction.php(66): MediaWiki\EditPage\EditPage->edit()
#10 /srv/mediawiki/php-1.42.0-wmf.5/includes/MediaWiki.php(583): EditAction->show()
#11 /srv/mediawiki/php-1.42.0-wmf.5/includes/MediaWiki.php(363): MediaWiki->performAction(Article, MediaWiki\Title\Title)
#12 /srv/mediawiki/php-1.42.0-wmf.5/includes/MediaWiki.php(960): MediaWiki->performRequest()
#13 /srv/mediawiki/php-1.42.0-wmf.5/includes/MediaWiki.php(613): MediaWiki->main()
#14 /srv/mediawiki/php-1.42.0-wmf.5/index.php(50): MediaWiki->run()
#15 /srv/mediawiki/php-1.42.0-wmf.5/index.php(46): wfIndexMain()
#16 /srv/mediawiki/w/index.php(3): require(string)
#17 {main}

Another one from LandingCheck

exception.trace
from /srv/mediawiki/php-1.42.0-wmf.5/extensions/LandingCheck/includes/SpecialLandingCheck.php(111)
#0 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-1.42.0-wmf.5/extensions/LandingCheck/includes/SpecialLandingCheck.php(111): geoip_country_code_by_name(string)
#2 /srv/mediawiki/php-1.42.0-wmf.5/includes/specialpage/SpecialPage.php(727): MediaWiki\Extension\LandingCheck\SpecialLandingCheck->execute(NULL)
#3 /srv/mediawiki/php-1.42.0-wmf.5/includes/specialpage/SpecialPageFactory.php(1637): MediaWiki\SpecialPage\SpecialPage->run(NULL)
#4 /srv/mediawiki/php-1.42.0-wmf.5/includes/MediaWiki.php(357): MediaWiki\SpecialPage\SpecialPageFactory->executePath(string, RequestContext)
#5 /srv/mediawiki/php-1.42.0-wmf.5/includes/MediaWiki.php(960): MediaWiki->performRequest()
#6 /srv/mediawiki/php-1.42.0-wmf.5/includes/MediaWiki.php(613): MediaWiki->main()
#7 /srv/mediawiki/php-1.42.0-wmf.5/index.php(50): MediaWiki->run()
#8 /srv/mediawiki/php-1.42.0-wmf.5/index.php(46): wfIndexMain()
#9 /srv/mediawiki/w/index.php(3): require(string)
#10 {main}
Impact

The first entry was logged on Monday 11/27 at 8:24:26 UTC. As far as I can tell it only affects MediaWiki on Kubernetes; I guess the image we build is missing the proprietary GeoIP database.

Note that the traces are for 1.42.0-wmf.5, which is the MediaWiki code from two weeks ago. So it is not really related to this week's deployment (1.42.0-wmf.7, T350083) but seems to be an issue with how we define the MediaWiki on Kubernetes image.

Notes

SAL entries around that time for Nov 27:

08:43 	<taavi@deploy2002> 	Finished scap: Backport for [[gerrit:966598|Add virtual domain mapping for OATHAuth (T348484)]] (duration: 07m 53s) 	[production]
08:41 	<godog> 	restart prometheus/k8s-staging in eqiad - T343529 	[production]
08:37 	<taavi@deploy2002> 	taavi: Continuing with sync 	[production]
08:36 	<taavi@deploy2002> 	taavi: Backport for [[gerrit:966598|Add virtual domain mapping for OATHAuth (T348484)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) 	[production]
08:35 	<taavi@deploy2002> 	Started scap: Backport for [[gerrit:966598|Add virtual domain mapping for OATHAuth (T348484)]] 	[production]
08:29 	<taavi@deploy2002> 	Finished scap: Backport for [[gerrit:976804|GrowthExperiments: enable AddLink frontend for 16,17th rounds of wikis (T308142 T308143)]] (duration: 19m 54s) 	[production]
08:23 	<taavi@deploy2002> 	taavi and sgimeno: Continuing with sync 	[production]
08:18 	<taavi@deploy2002> 	taavi and sgimeno: Backport for [[gerrit:976804|GrowthExperiments: enable AddLink frontend for 16,17th rounds of wikis (T308142 T308143)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) 	[production]
08:14 	<moritzm> 	installing dpkg bugfix updates on bullseye 	[production]
08:09 	<taavi@deploy2002> 	Started scap: Backport for [[gerrit:976804|GrowthExperiments: enable AddLink frontend for 16,17th rounds of wikis (T308142 T308143)]]

This is merely the first deployment of the week, which caused a new image to be built. The actual root cause would be in the image definition.

Details

MediaWiki Version
1.42.0-wmf.5
Request URL
https://en.wikipedia.org/w/index.php?action=edit&title=*

Event Timeline

hashar triaged this task as Unbreak Now! priority. Nov 28 2023, 9:52 AM
hashar created this task.

The problem is that the new Kubernetes nodes don't have a copy of the .dat files, because those files were discontinued in April 2022 and we should have converted every use of them a long time ago, per T269475, to the geoip2/geoip2 library instead.

The problem was introduced in https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/WikimediaEvents/+/baaf1182661d6430991499f4a54e9da6fb4f061b which happened well after we should have stopped using GeoIP v1.

We can re-provide those files, but I think we should instead fix the code, here and anywhere else we're still using the old geoip module.
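To illustrate the shape of the "fix the code" option, here is a minimal sketch of a GeoIP2-style lookup that degrades quietly when the database file is absent instead of emitting a warning. It is written in Python purely for illustration: the affected extensions are PHP, and this is not the approach taken in the patches on this task. The function name and the database path are hypothetical; the `geoip2` Python package calls are the only real API used.

```python
import os


def lookup_country_code(ip, db_path):
    """Return the ISO country code for ip, or None if it cannot be determined.

    Hypothetical helper: the name and signature are assumptions made for
    this sketch, not part of any extension discussed in the task.
    """
    # Guard against the failure mode seen here: the database file is simply
    # not present on the host or in the container image.
    if not os.path.isfile(db_path):
        return None

    # Imported lazily so the missing-database path works even where the
    # geoip2 package is not installed.
    import geoip2.database
    import geoip2.errors

    with geoip2.database.Reader(db_path) as reader:
        try:
            return reader.country(ip).country.iso_code
        except geoip2.errors.AddressNotFoundError:
            # The address is valid but has no entry in the database.
            return None
```

A caller would treat `None` as "country unknown" and carry on, so a missing or incomplete database never surfaces as a production error.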

Change 978034 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/WikimediaEvents@master] [WIP] BlockUtils: Use IPInfo's GeoLite2InfoRetriever to get country name

https://gerrit.wikimedia.org/r/978034

@jbond @Muehlenhoff It appears that we are missing GeoIP.dat, GeoIPRegion.dat, and GeoIPCity.dat from the new puppetmasters. Would it be alright if we just copied them over from the old puppetmasters for now?

@jijiki yes, you can copy them all to puppetserver1001:/srv/puppet_fileserver/volatile and then run cumin A:puppetserver 'systemctl start sync-puppet-volatile.service' to get them pulled in everywhere else.

I would go with re-providing the geoip files, since their removal has led to code being broken. I guess the remaining usage of the legacy geoip_country_code_by_name() could have been investigated and migrated before the removal, potentially with a guard in CI to prevent it from being reinstated :)
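A CI guard of the kind suggested above could be as simple as a script that scans PHP sources for legacy `geoip_*` calls and fails the build when it finds any. The following is a rough sketch under stated assumptions: the function names are hypothetical, the regular expression is a heuristic (it also matches calls mentioned in comments or strings), and nothing here reflects how Wikimedia CI is actually configured.

```python
import re
from pathlib import Path

# Heuristic match for calls to the legacy PECL geoip (v1) functions,
# e.g. geoip_country_code_by_name( $ip ).
LEGACY_GEOIP_CALL = re.compile(r"\bgeoip_[a-z0-9_]+\s*\(")


def find_legacy_geoip_calls(root):
    """Return (path, line number, line) for each legacy geoip_* call under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.php")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if LEGACY_GEOIP_CALL.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


def main(argv):
    """Print every hit and return a process exit code (1 if any were found)."""
    root = argv[0] if argv else "."
    hits = find_legacy_geoip_calls(root)
    for path, lineno, line in hits:
        print(f"{path}:{lineno}: {line}")
    return 1 if hits else 0
```

In a CI job this would be wired up with something like `sys.exit(main(sys.argv[1:]))`, so that copying old GeoIP v1 code into another extension is caught at review time rather than in production.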

The usages had allegedly all been removed, minus one, back in the day; and that code was copied over after the migration to a second extension (this one).

Change 978031 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/WikimediaEvents@master] BlockUtils: Don't use geoip v1 methods

https://gerrit.wikimedia.org/r/978031

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/978031 would remove the code from WikimediaEvents, which is probably OK to get the train moving forward?

Change 978031 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] BlockUtils: Don't use geoip v1 methods

https://gerrit.wikimedia.org/r/978031

Change 978058 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] puppetserver::rsync_module: pass the ca_server

https://gerrit.wikimedia.org/r/978058

jijiki claimed this task.

Files copied over :)

Change 978058 merged by JHathaway:

[operations/puppet@production] puppetserver::rsync_module: pass the ca_server

https://gerrit.wikimedia.org/r/978058

This was an infrastructure issue (the legacy GeoIP data was no longer available) rather than a code issue. https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/978031 should be reverted to restore GeoIP coding in production for WikimediaEvents.

As Giuseppe mentioned above, geoip_country_code_by_name is the legacy GeoIP v1 API (which relies on the files that were removed), and the two extensions using it should be moved to the geoip2 library (whatever it is). I guess those should be filed as new tasks for MediaWiki-extensions-WikimediaEvents and LandingCheck and made blockers of whatever task we have to remove GeoIP v1.

As far as this week's MediaWiki train and I are concerned, the issue is resolved, but the above actions should be filed and acted on.

No, it's a code issue, as those files have not been updated since 2022. So any geoip lookup on those stale files is actually harmful.

I need to reiterate that this remains a UBN! software issue even if it's not a train blocker.

Joe reopened this task as Open. Nov 29 2023, 8:34 AM

The task is not resolved until both uses of the old geoip library calls are removed. We can either leave this UBN! open (and assign it to the maintainers of the libraries) or create UBN! subtasks for those.

Frankly, I'd be inclined to just remove that fallback lookup now, and design a better geolookup system in core.

I created the subtasks, assigned the one about WikimediaEvents to @kostajh.