
Stop pulling netbox-exported-dns repo from Phabricator Diffusion (which mirrors netbox-exports.wikimedia.org)
Closed, Resolved (Public)

Description

In parent T405596, we'd like to reduce / switch off Diffusion IO.
According to the Phabricator pull logs for the last 30 days (https://phabricator.wikimedia.org/diffusion/pulllog/?repositories=PHID-REPO-hje5czxxz2myfpljuoul), netbox-exported-dns is regularly pulled from a single internal IP that belongs to the cloud-instances realm.

Codesearch points to itself: https://gerrit.wikimedia.org/r/plugins/gitiles/labs/codesearch/+/refs/heads/master/write_config.py includes the line
conf['repos']['netbox DNS'] = phab_repo('netbox-exported-dns')

Ideally, that line should be switched to the canonical git repository instead of the mirror in Phabricator Diffusion. There is a caveat, however, as pointed out by taavi on IRC:

<taavi> andre: fwiw there is no netbox-exported-dns repo in Gerrit, the phab repo is there to mirror from the netbox host as cloning it from there is suuuuper slow

Indeed, the URI configuration at https://phabricator.wikimedia.org/source/netbox-exported-dns/manage/uris/ states that Phabricator Diffusion mirrors from https://netbox-exports.wikimedia.org/dns.git

In Codesearch's write_config.py, the entire phab_repo() function should likely be removed, as we no longer host anything canonically on Phabricator Diffusion.
This may need a new, special-case netbox_exports_repo() function, but cloning directly from the netbox host is very slow (see taavi's comment above).
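
A minimal sketch of what such a special case could look like. This is not the actual Codesearch code: the function body, the NETBOX_EXPORTS_BASE constant and the default poll interval are assumptions. The key names ('url', 'ms-between-poll') are taken from the repo entry that write_config.py logs later in this task, and 5400000 ms matches the roughly 90-minute pull interval seen in the pull log. Whether the indexer accepts an entry without 'url-pattern' is not confirmed here.

```python
# Hypothetical helper: the name comes from the task description,
# everything else is an assumption for illustration.
NETBOX_EXPORTS_BASE = 'https://netbox-exports.wikimedia.org'


def netbox_exports_repo(name: str, ms_between_poll: int = 5_400_000) -> dict:
    """Build a repo entry that clones directly from netbox-exports.

    No 'url-pattern' is set because netbox-exports has no web view
    that a search hit could link to.
    """
    return {
        'url': f'{NETBOX_EXPORTS_BASE}/{name}.git',
        'ms-between-poll': ms_between_poll,
    }


conf = {'repos': {}}
conf['repos']['netbox DNS'] = netbox_exports_repo('dns')
```

The entry is stored under its display name, so every value in conf['repos'] stays a dict.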

Event Timeline

amir@amir:~$ time git clone https://netbox-exports.wikimedia.org/dns.git dns2
Cloning into 'dns2'...
Fetching objects: 29530, done.

real	4m11,948s
user	0m7,969s
sys	0m2,358s

And it wasn't a shallow clone. I think it's fiiine.

Sigh

amir@amir:~$ time git clone --depth 1 https://netbox-exports.wikimedia.org/dns.git dns3
Cloning into 'dns3'...
fatal: dumb http transport does not support shallow capabilities

real	0m0,823s
user	0m0,000s
sys	0m0,011s
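
The error above means the server speaks git's "dumb" HTTP protocol, which serves plain files and cannot do shallow clones. As a rough sketch, a server can be classified by requesting info/refs with the service parameter and checking the Content-Type. A smart server answers with application/x-git-upload-pack-advertisement. The function names here are made up for illustration.

```python
import urllib.request

SMART_CONTENT_TYPE = 'application/x-git-upload-pack-advertisement'


def is_smart_http(content_type: str) -> bool:
    """Return True if an info/refs response advertises the smart protocol."""
    # Ignore parameters such as "; charset=utf-8" and letter case.
    return content_type.split(';')[0].strip().lower() == SMART_CONTENT_TYPE


def probe(repo_url: str) -> bool:
    """Fetch info/refs the way a git client does and classify the server."""
    url = repo_url.rstrip('/') + '/info/refs?service=git-upload-pack'
    with urllib.request.urlopen(url, timeout=30) as resp:
        return is_smart_http(resp.headers.get('Content-Type', ''))
```

Calling probe('https://netbox-exports.wikimedia.org/dns.git') would be expected to return False given the error message, but that was not run here.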

How slow are we talking about? :D

From Phab:

taavi@runko:/tmp $ time g clone https://phabricator.wikimedia.org/source/netbox-exported-dns/
Cloning into 'netbox-exported-dns'...
remote: Enumerating objects: 29530, done.
remote: Counting objects: 100% (29530/29530), done.
remote: Compressing objects: 100% (7530/7530), done.
remote: Total 29530 (delta 23009), reused 28412 (delta 21953), pack-reused 0
Receiving objects: 100% (29530/29530), 5.84 MiB | 9.46 MiB/s, done.
Resolving deltas: 100% (23009/23009), done.

real	0m3.279s
user	0m4.514s
sys	0m0.308s

From Netbox:

taavi@runko:/tmp $ time g clone https://netbox-exports.wikimedia.org/dns.git
Cloning into 'dns'...
Fetching objects: 29530, done.

real	4m17.921s
user	0m7.439s
sys	0m1.759s
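
To put a number on "how slow", the two `real` timings above can be compared directly. This is only a small arithmetic sketch with a hypothetical helper, parse_real_time(). It also accepts the comma decimals from the other clone log in this task.

```python
import re


def parse_real_time(text: str) -> float:
    """Convert a `time` value such as '4m17.921s' or '4m11,948s' to seconds."""
    match = re.fullmatch(r'(\d+)m(\d+)[.,](\d+)s', text.strip())
    if match is None:
        raise ValueError(f'unrecognised time format: {text!r}')
    minutes, seconds, fraction = match.groups()
    return int(minutes) * 60 + float(f'{seconds}.{fraction}')


phab = parse_real_time('0m3.279s')      # clone from Phabricator
netbox = parse_real_time('4m17.921s')   # clone from netbox-exports
slowdown = netbox / phab
print(f'{netbox:.1f}s vs {phab:.1f}s: about {slowdown:.0f}x slower')
```

This is a single measurement from one host, so the ratio is only indicative.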

So I don't have a way to point to the file, since nothing except the .git data exists on the netbox-exports endpoint:

root@netbox1003:/srv/netbox-exports# ls 
dns.git  netbox-dns  netbox-hiera

You might think: oh, there is a netbox-dns directory right there. But it is just the .git directory, with no easily accessible content:

root@netbox1003:/srv/netbox-exports/netbox-dns# ls
branches  config  description  HEAD  hooks  info  objects  refs

The branches directory is empty, and find . doesn't turn up any useful files, only git internals (which can be used to rebuild the actual data, but I can't point to them in Codesearch results):

root@netbox1003:/srv/netbox-exports/netbox-dns# find .
.
./description
./objects
./objects/ca
./objects/ca/a0833d3aeb41b8b0ddd841b39a5254cc84e8d7
./objects/ca/8ef1592fb3a4e84fb33b15ee616baea21c1a37
./objects/ca/4c0a1731965268f958ed8df75644eab934c9bd
./objects/df
./objects/df/fe67ae6b5968f231c989526ae9ab73bba2fc09
./objects/68
./objects/68/d6531823abd59e2f119f4549142f97c5a9367b
./objects/08
./objects/08/5ee5ac01c3e581ac6e8188b452a2a4d12303a2
./objects/08/8f6edc9b14d452cf7e4f15c3ba1d817984f007
./objects/f0
./objects/f0/e795411cef05628e73f9a3da895861a54785b3
./objects/34
./objects/34/5543879dd9c6157bf0767cf0e889c4fa4074c5
./objects/9f
./objects/9f/f1514140640b48517ec0c41842f3238683a8a5
./objects/b4
./objects/b4/f3555a42a9fe4df46e130d1369ace3687300ac
[a lot of similar directories and files]
./objects/ce/13129f33ebf4e1ec5d23eaeddc0423933e4061
./branches
./hooks
./hooks/update.sample
./hooks/post-update
./hooks/pre-receive.sample
./hooks/push-to-checkout.sample
./hooks/post-update.sample
./hooks/prepare-commit-msg.sample
./hooks/fsmonitor-watchman.sample
./hooks/pre-commit.sample
./hooks/applypatch-msg.sample
./hooks/commit-msg.sample
./hooks/pre-push.sample
./hooks/pre-rebase.sample
./hooks/pre-applypatch.sample
./hooks/pre-merge-commit.sample
./info
./info/exclude
./info/refs
./HEAD
./refs
./refs/heads
./refs/heads/master
./refs/tags
./config
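
For context on why these files cannot be linked to: each path under objects/ is a loose git object, zlib-compressed and named after its hash. The sketch below shows how such a path maps to an object ID and how the file content is laid out. It assumes the default SHA-1 object format, which the 40-hex-digit names in the listing suggest. Both function names are made up for illustration.

```python
import hashlib
import zlib


def object_id_from_path(path: str) -> str:
    """Join the two-character directory and the file name into an object ID."""
    parts = path.strip('/').split('/')
    return parts[-2] + parts[-1]


def read_loose_object(raw: bytes) -> tuple[str, bytes, str]:
    """Decode a loose object file: returns (type, content, object ID)."""
    data = zlib.decompress(raw)
    # Layout after decompression: b'<type> <size>\x00<content>'
    header, _, content = data.partition(b'\x00')
    obj_type, _, size = header.decode('ascii').partition(' ')
    if int(size) != len(content):
        raise ValueError('size in header does not match content length')
    return obj_type, content, hashlib.sha1(data).hexdigest()


print(object_id_from_path('./objects/ca/a0833d3aeb41b8b0ddd841b39a5254cc84e8d7'))
```

On the host itself, git cat-file -p with such an ID prints the object's content, but the file names in the repository only appear inside tree objects, not as paths on disk.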

Change #1208445 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[labs/codesearch@master] [WIP] Switch off netbox-exports from diffusion

https://gerrit.wikimedia.org/r/1208445

(honestly it wouldn't be the end of the world if we just switch indexing it off)

Change #1208445 merged by jenkins-bot:

[labs/codesearch@master] Switch off netbox-exports from diffusion

https://gerrit.wikimedia.org/r/1208445

Ladsgroup claimed this task.

It'll be live in 24 hours; please reopen if it isn't.

Hmm, https://phabricator.wikimedia.org/diffusion/pulllog/?repositories=PHID-REPO-hje5czxxz2myfpljuoul still looks alive and kicking: the repo is pulled approx. every 90 min from 172.16.4.133.

I'm forcing a restart of the services, let's see if that makes an impact.

write_config.py is broken:

Nov 28 14:24:49 codesearch9 write_config.py[705103]: Skip unsupported remote URL: https://git.push-f.com/mw-code/
Nov 28 14:24:49 codesearch9 write_config.py[705103]: Traceback (most recent call last):
Nov 28 14:24:49 codesearch9 write_config.py[705103]:   File "/srv/codesearch/write_config.py", line 632, in <module>
Nov 28 14:24:49 codesearch9 write_config.py[705103]:     main()
Nov 28 14:24:49 codesearch9 write_config.py[705103]:   File "/srv/codesearch/write_config.py", line 581, in main
Nov 28 14:24:49 codesearch9 write_config.py[705103]:     make_conf('search', args,
Nov 28 14:24:49 codesearch9 write_config.py[705103]:   File "/srv/codesearch/write_config.py", line 548, in make_conf
Nov 28 14:24:49 codesearch9 write_config.py[705103]:     new = extract_urls(conf)
Nov 28 14:24:49 codesearch9 write_config.py[705103]:           ^^^^^^^^^^^^^^^^^^
Nov 28 14:24:49 codesearch9 write_config.py[705103]:   File "/srv/codesearch/write_config.py", line 568, in extract_urls
Nov 28 14:24:49 codesearch9 write_config.py[705103]:     return {repo['url'] for repo in conf['repos'].values()}
Nov 28 14:24:49 codesearch9 write_config.py[705103]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Nov 28 14:24:49 codesearch9 write_config.py[705103]:   File "/srv/codesearch/write_config.py", line 568, in <setcomp>
Nov 28 14:24:49 codesearch9 write_config.py[705103]:     return {repo['url'] for repo in conf['repos'].values()}
Nov 28 14:24:49 codesearch9 write_config.py[705103]:             ~~~~^^^^^^^
Nov 28 14:24:49 codesearch9 write_config.py[705103]: TypeError: string indices must be integers, not 'str'

Don't ask how I got this:

Nov 28 14:38:09 codesearch9 write_config.py[705745]: <class 'dict'> {'url': 'https://gerrit-replica.wikimedia.org/r/labs/countervandalism/stillalive.git', 'url-pattern': {'base-url': 'https://gerrit.wikimedia.org/g/labs/countervandalism/stillalive/+/{rev}/{path}{anchor}', 'anchor': '#{line}'}, 'ms-between-poll': 5400000}
Nov 28 14:38:09 codesearch9 write_config.py[705745]: <class 'str'> https://github.com/toolforge/paws
Nov 28 14:38:09 codesearch9 write_config.py[705745]: <class 'int'> 5400000
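
The debug output shows that conf['repos'].values() contains a string and an integer next to the expected dicts, which is what makes repo['url'] fail. The sketch below reproduces that shape and lists the offending keys. The broken config is an assumption reconstructed from the log (one repo's fields landing directly under conf['repos']), and malformed_repos() is a hypothetical helper, not part of write_config.py.

```python
def extract_urls(conf: dict) -> set:
    # Same expression as line 568 in the traceback.
    return {repo['url'] for repo in conf['repos'].values()}


def malformed_repos(conf: dict) -> dict:
    """Return the entries under conf['repos'] that are not dicts with a 'url'."""
    return {
        name: value
        for name, value in conf['repos'].items()
        if not (isinstance(value, dict) and 'url' in value)
    }


# Hypothetical broken config: one repo's fields were merged into
# conf['repos'] itself instead of being stored under the repo's name.
conf = {'repos': {
    'stillalive': {
        'url': 'https://gerrit-replica.wikimedia.org/r/labs/countervandalism/stillalive.git',
        'ms-between-poll': 5400000,
    },
    'url': 'https://github.com/toolforge/paws',
    'ms-between-poll': 5400000,
}}

print(malformed_repos(conf))
```

With this config, extract_urls(conf) raises a TypeError on the string entry, as in the traceback. A check like malformed_repos() before extract_urls() would name the bad keys instead of failing inside the set comprehension.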

Change #1212595 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[labs/codesearch@master] Fix PAWS

https://gerrit.wikimedia.org/r/1212595

Change #1212595 merged by jenkins-bot:

[labs/codesearch@master] Fix PAWS

https://gerrit.wikimedia.org/r/1212595

It has finally stopped.