fix puppet issues when applying role::gerrit::server in labs
Closed, Resolved (Public)

Event Timeline

Getting error

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Must pass ipv4 to Class[Role::Gerrit::Server] on node gerrit-test3.git.eqiad.wmflabs
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

Dzahn renamed this task from "fix puppet issues when applying role:;gerrit::server in labs" to "fix puppet issues when applying role::gerrit::server in labs". Aug 1 2016, 6:16 PM

hosts/lead.yaml:role::gerrit::server::ipv4: '208.80.154.85'

You'll have to set a value for ipv4 in Hiera. In Labs you can either do that in the repo or on the special wiki page.
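
The parameter named in a "Must pass" error maps onto a Hiera key of the form class::name::parameter, as the hosts/lead.yaml entry above shows. A minimal sketch of that mapping in shell; hiera_key_from_error is a hypothetical helper name, not something from the puppet repo:

```shell
# Hypothetical helper: derive the Hiera key from a Puppet "Must pass" error.
hiera_key_from_error() {
    # Capture the parameter and the class name, emit "Class::Name::param",
    # then lowercase the result to match Hiera key naming.
    printf '%s\n' "$1" |
        sed -n 's/.*Must pass \([A-Za-z0-9_]*\) to Class\[\([A-Za-z0-9_:]*\)].*/\2::\1/p' |
        tr '[:upper:]' '[:lower:]'
}

hiera_key_from_error 'Error 400 on SERVER: Must pass ipv4 to Class[Role::Gerrit::Server] on node gerrit-test3.git.eqiad.wmflabs'
# prints: role::gerrit::server::ipv4
```

The printed key is the one to set in the project's Hiera data, whether in the repo or on the wiki page.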

I now get

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Must pass ipv6 to Class[Role::Gerrit::Server] on node gerrit-test3.git.eqiad.wmflabs
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

Now I get

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find data item gerrit::host in any Hiera data file and no default supplied at /etc/puppet/modules/role/manifests/gerrit/server.pp:7 on node gerrit-test3.git.eqiad.wmflabs
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

Alright, that's the same thing, just IPv6 instead of IPv4

hosts/lead.yaml:role::gerrit::server::ipv6: '2620:0:861:3:208:80:154:85'

Now I get

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Must pass replication to Class[Gerrit::Jetty] at /etc/puppet/modules/gerrit/manifests/init.pp:9 on node gerrit-test3.git.eqiad.wmflabs
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

I now get this error

Notice: /Stage[main]/Role::Gerrit::Server/Interface::Ip[role::gerrit::server_ipv6]/Exec[ip addr add /128 dev eth0]/returns: Error: an inet prefix is expected rather than "/128".
Error: ip addr add /128 dev eth0 returned 1 instead of one of [0,2]
Error: /Stage[main]/Role::Gerrit::Server/Interface::Ip[role::gerrit::server_ipv6]/Exec[ip addr add /128 dev eth0]/returns: change from notrun to 0 2 failed: ip addr add /128 dev eth0 returned 1 instead of one of [0,2]

It does not let you get away with setting a blank value for IPv6. It wants to see a real IP there.

You can try using one from 2001:0DB8::/32, which RFC 3849 reserves for documentation and examples.
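
The failing Exec above is what an empty address produces: the prefix length is appended to nothing, leaving "/128". A minimal sketch of a guard against that, assuming a hypothetical helper ip_addr_add_cmd and an example address from 2001:db8::/32 (neither comes from the puppet module):

```shell
# Hypothetical helper: build the "ip addr add" command line, refusing an empty address.
ip_addr_add_cmd() {
    addr=$1
    prefixlen=$2
    dev=$3
    if [ -z "$addr" ]; then
        echo "refusing to build command: empty address" >&2
        return 1
    fi
    # Only print the command; actually running it would require root.
    echo "ip addr add ${addr}/${prefixlen} dev ${dev}"
}

ip_addr_add_cmd "2001:db8::1" 128 eth0
# prints: ip addr add 2001:db8::1/128 dev eth0
```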

Now I get

Error: Could not set uid on user[gerrit2]: Execution of '/usr/sbin/usermod -u 444 gerrit2' returned 6: usermod: user 'gerrit2' does not exist in /etc/passwd

The user gerrit2 cannot be created in Labs, probably because it already exists in LDAP.

According to this error, the user gerrit2 does not exist:

Error: Could not set uid on user[gerrit2]: Execution of '/usr/sbin/usermod -u 444 gerrit2' returned 6: usermod: user 'gerrit2' does not exist in /etc/passwd

But the user actually does exist on the instance:

root@gerrit-test3:/home/paladox# id gerrit2
uid=2069(gerrit2) gid=1002(nda) groups=1005(labsadminbots),1002(nda)

@demon how to handle puppetized system users in labs when they conflict with LDAP users?
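
The two outputs above are consistent if gerrit2 resolves through NSS (for example LDAP) but has no entry in the local /etc/passwd, which is the file the usermod error refers to. A rough way to tell the cases apart, assuming getent is available; account_source and its optional passwd-file argument are hypothetical, added only for illustration:

```shell
# Hypothetical helper: report where an account is defined.
#   local     entry exists in the passwd file (default /etc/passwd)
#   nss-only  resolvable via NSS (e.g. LDAP) but absent from the passwd file
#   missing   not resolvable at all
account_source() {
    user=$1
    passwd_file=${2:-/etc/passwd}
    if grep -q "^${user}:" "$passwd_file" 2>/dev/null; then
        echo local
    elif getent passwd "$user" >/dev/null 2>&1; then
        echo nss-only
    else
        echo missing
    fi
}

account_source gerrit2
```

On the instance described here, the expected answer would be nss-only, which would explain why id succeeds while usermod fails.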

Change 302356 had a related patch set uploaded (by Chad):
Gerrit: Default to no replication

https://gerrit.wikimedia.org/r/302356

Change 302356 merged by Dzahn:
Gerrit: Default to no replication

https://gerrit.wikimedia.org/r/302356

Change 302491 had a related patch set uploaded (by Dzahn):
gerrit: ensure symlink /etc/default/gerritcodereview

https://gerrit.wikimedia.org/r/302491

Hitting error

root@gerrit-test3:/var/lib/gerrit2/review_site/logs# journalctl -xn

-- Logs begin at Mon 2016-08-01 18:08:54 UTC, end at Tue 2016-08-02 18:49:43 UTC. --

Aug 02 18:49:25 gerrit-test3 puppet-agent[1386]: (/Stage[main]/Apache/Service[apache2]) Dependency Service[gerrit] has failures: true
Aug 02 18:49:25 gerrit-test3 puppet-agent[1386]: (/Stage[main]/Apache/Service[apache2]) Skipping because of failed dependencies
Aug 02 18:49:25 gerrit-test3 puppet-agent[1386]: (/Stage[main]/Gerrit::Proxy/Letsencrypt::Cert::Integrated[gerrit]/Exec[acme-setup-acme-gerrit]) Dependency Service[gerrit] has failures: true
Aug 02 18:49:25 gerrit-test3 puppet-agent[1386]: (/Stage[main]/Gerrit::Proxy/Letsencrypt::Cert::Integrated[gerrit]/Exec[acme-setup-acme-gerrit]) Skipping because of failed dependencies
Aug 02 18:49:25 gerrit-test3 puppet-agent[1386]: Finished catalog run in 11.10 seconds
Aug 02 18:49:26 gerrit-test3 sudo[1385]: pam_unix(sudo:session): session closed for user root
Aug 02 18:49:42 gerrit-test3 sudo[2153]: diamond : TTY=unknown ; PWD=/ ; USER=puppet ; COMMAND=list /bin/cat /var/lib/puppet/state/last_run_summary.yaml
Aug 02 18:49:43 gerrit-test3 sudo[2154]: diamond : TTY=unknown ; PWD=/ ; USER=puppet ; COMMAND=/bin/cat /var/lib/puppet/state/last_run_summary.yaml
Aug 02 18:49:43 gerrit-test3 sudo[2154]: pam_unix(sudo:session): session opened for user puppet by (uid=0)
Aug 02 18:49:43 gerrit-test3 sudo[2154]: pam_unix(sudo:session): session closed for user puppe

root@gerrit-test3:/var/lib/gerrit2/review_site/logs# bash -x /etc/init.d/gerrit start
+ test 1 -gt 0
+ ACTION=start
+ shift
+ test 0 -gt 0
+ test -z ''
+ NO_START=0
+ test -z ''
+ START_STOP_DAEMON=1
+ test -f /etc/default/gerritcodereview
+ . /etc/default/gerritcodereview
++ GERRIT_SITE=/var/lib/gerrit2/review_site
++ GERRIT_WAR=/var/lib/gerrit2/review_site/bin/gerrit.war
+ test -z ''
+ TMP=/tmp
+ TMPJ=/tmp/j3426
+ GERRIT_INSTALL_TRACE_FILE=etc/gerrit.config
+ type git
+ : OK
+ test -z /var/lib/gerrit2/review_site
+ test -z /var/lib/gerrit2/review_site
++ pwd
+ INITIAL_DIR=/var/lib/gerrit2/review_site/logs
+ cd /var/lib/gerrit2/review_site
++ pwd
+ GERRIT_SITE=/var/lib/gerrit2/review_site
+ GERRIT_CONFIG=/var/lib/gerrit2/review_site/etc/gerrit.config
+ test -f /var/lib/gerrit2/review_site/etc/gerrit.config
+ test -r /var/lib/gerrit2/review_site/etc/gerrit.config
+ GERRIT_PID=/var/lib/gerrit2/review_site/logs/gerrit.pid
+ GERRIT_RUN=/var/lib/gerrit2/review_site/logs/gerrit.run
+ GERRIT_TMP=/var/lib/gerrit2/review_site/tmp
+ export GERRIT_TMP
+ JAVA_HOME_OLD=
++ get_config --get container.javaHome
++ test -f /var/lib/gerrit2/review_site/etc/gerrit.config
++ test x--get = x--int
++ git config --file /var/lib/gerrit2/review_site/etc/gerrit.config --get container.javaHome
+ JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/jre
+ test -z /usr/lib/jvm/java-7-openjdk-amd64/jre
+ test -z /usr/lib/jvm/java-7-openjdk-amd64/jre
+ test -z '' -a -n /usr/lib/jvm/java-7-openjdk-amd64/jre -a -x /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java -a '!' -d /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
+ JAVA=/usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
+ test -z /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
+ test -z ''
+ JSTACK=/usr/lib/jvm/java-7-openjdk-amd64/jre/bin/jstack
++ get_config --get-all container.javaOptions
++ test -f /var/lib/gerrit2/review_site/etc/gerrit.config
++ test x--get-all = x--int
++ git config --file /var/lib/gerrit2/review_site/etc/gerrit.config --get-all container.javaOptions
+ GERRIT_OPTIONS=
+ test -n ''
++ get_config --get container.heapLimit
++ test -f /var/lib/gerrit2/review_site/etc/gerrit.config
++ test x--get = x--int
++ git config --file /var/lib/gerrit2/review_site/etc/gerrit.config --get container.heapLimit
+ GERRIT_MEMORY=28g
+ test -n 28g
+ JAVA_OPTIONS=' -Xmx28g'
++ get_config --int core.packedGitOpenFiles
++ test -f /var/lib/gerrit2/review_site/etc/gerrit.config
++ test x--int = x--int
+++ git config --file /var/lib/gerrit2/review_site/etc/gerrit.config --int core.packedGitOpenFiles
++ n=4096
++ test x0 = x4096
++ echo 4096
+ GERRIT_FDS=4096
+ test -z 4096
++ expr 4096 + 4096
+ GERRIT_FDS=8192
+ test 8192 -lt 1024
++ get_config --get container.user
++ test -f /var/lib/gerrit2/review_site/etc/gerrit.config
++ test x--get = x--int
++ git config --file /var/lib/gerrit2/review_site/etc/gerrit.config --get container.user
+ GERRIT_USER=gerrit2
+ ulimit -c 0
+ ulimit -d unlimited
+ ulimit -f unlimited
+ ulimit -m
+ ulimit -m unlimited
+ ulimit -n 8192
+ ulimit -t unlimited
+ ulimit -v unlimited
+ ulimit -x
+ ulimit -x unlimited
+ test -z /var/lib/gerrit2/review_site/bin/gerrit.war
+ test -z /var/lib/gerrit2/review_site/bin/gerrit.war
+ test -z /var/lib/gerrit2/review_site/bin/gerrit.war -a -n gerrit2
+ test -z /var/lib/gerrit2/review_site/bin/gerrit.war
+ test -z gerrit2
+ RUN_ARGS='-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site'
++ get_config --bool container.slave
++ test -f /var/lib/gerrit2/review_site/etc/gerrit.config
++ test x--bool = x--int
++ git config --file /var/lib/gerrit2/review_site/etc/gerrit.config --bool container.slave
+ test '' = true
++ get_config --get-all container.daemonOpt
++ test -f /var/lib/gerrit2/review_site/etc/gerrit.config
++ test x--get-all = x--int
++ git config --file /var/lib/gerrit2/review_site/etc/gerrit.config --get-all container.daemonOpt
+ DAEMON_OPTS=
+ test -n ''
+ test -n ' -Xmx28g'
+ RUN_ARGS=' -Xmx28g -jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site'
+ test -x /usr/bin/perl
+ export JAVA
+ RUN_EXEC=/usr/bin/perl
+ RUN_Arg1=-e
+ RUN_Arg2='$x=$ENV{JAVA};exec $x @ARGV;die $!'
+ RUN_Arg3='-- GerritCodeReview'
+ case "$ACTION" in
+ printf %s 'Starting Gerrit Code Review: '
Starting Gerrit Code Review: + test 1 = 0
+ test -z 0
++ date +%s
+ RUN_ID=1470165935.3426
+ RUN_ARGS=' -Xmx28g -jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site --run-id=1470165935.3426'
+ test 1 = 1
+ type start-stop-daemon
+ test 0 = 0
+ CH_USER='-c gerrit2'
+ start-stop-daemon -S -b -c gerrit2 -p /var/lib/gerrit2/review_site/logs/gerrit.pid -m -d /var/lib/gerrit2/review_site -a /usr/bin/perl -- -e '$x=$ENV{JAVA};exec $x @ARGV;die $!' -- GerritCodeReview -Xmx28g -jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site --run-id=1470165935.3426
+ : OK
+ test 0 = 0
++ cat /var/lib/gerrit2/review_site/logs/gerrit.pid
+ PID=3449
+ test -f /proc/3449/oom_score_adj
+ echo -1000
+ TIMEOUT=90
+ sleep 1
+ running /var/lib/gerrit2/review_site/logs/gerrit.pid
+ test -f /var/lib/gerrit2/review_site/logs/gerrit.pid
++ cat /var/lib/gerrit2/review_site/logs/gerrit.pid
+ PID=3449
+ ps -p 3449
+ return 1
+ echo FAILED
FAILED
+ exit 1
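
The tail of the trace shows what FAILED means here: start-stop-daemon returned OK and a PID file was written, but one second later that PID was no longer alive, so the process had already exited. A sketch of that liveness check; it uses kill -0 where the init script uses ps -p, and is an illustration rather than the script's actual code:

```shell
# Sketch of a PID-file liveness check, similar in spirit to the init script's "running".
# Returns 0 if the PID file exists and the recorded process is alive, 1 otherwise.
# Note: kill -0 also fails for a live process owned by another user unless run as root.
running() {
    test -f "$1" || return 1
    pid=$(cat "$1")
    kill -0 "$pid" 2>/dev/null
}

if running /var/lib/gerrit2/review_site/logs/gerrit.pid; then
    echo "gerrit is running"
else
    echo "gerrit is not running"
fi
```

When the check fails right after start, the logs directory under review_site is a reasonable next place to look for the reason.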

Getting error

root@gerrit-test3:/var/lib/gerrit2# /usr/bin/java -jar gerrit.war reindex -d review_site --threads 4
fatal: DbInjector failed
fatal: Unable to determine SqlDialect
fatal: caused by com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
fatal:
fatal: The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
fatal: caused by java.net.ConnectException: Connection refused

I fixed it by setting

"gerrit::jetty::db_pass": passwordhere

It randomly creates your DB password for you in Gerrit.

You will find the password in /var/lib/gerrit2/review_site/etc/secure.config

You will also need to install MySQL beforehand

and follow https://gerrit-review.googlesource.com/Documentation/install.html#createdb_mysql

Remember to use gerrit, not gerrit2, as the DB user for now.

Change 302491 merged by Dzahn:
gerrit: ensure symlink /etc/default/gerritcodereview

https://gerrit.wikimedia.org/r/302491

Change 303146 had a related patch set (by Paladox) published:
Gerrit: Support http only, configured by a config

https://gerrit.wikimedia.org/r/303146

Change 303435 had a related patch set uploaded (by Paladox):
Gerrit: Support labs https

https://gerrit.wikimedia.org/r/303435

Yay, we now have

https://gerrit.git.wmflabs.org/r/#/q/status:open

working. It should not need any manual hacks anymore.

But we want to prove this by deleting the instance one more time and recreating it by applying the puppet role and nothing else.

The MySQL part was still done manually, but it is documented at P3637.

Change 303146 abandoned by Chad:
Gerrit: Support http only, configured by a config

Reason:
Gerrit needs HTTPS in all environments, per IRC and other discussions. Our puppet manifests are written for letsencrypt support out of the box

https://gerrit.wikimedia.org/r/303146

What's actually left to do here?

Nothing much now.

We just need to go through it again, i.e. retest it by deleting the test instance and recreating it.

Change 303435 abandoned by Paladox:
Gerrit: Support labs https

https://gerrit.wikimedia.org/r/303435

So, we did this and re-created it one more time to prove everything is actually fixed now.

Also we split the DB part of it into a separate instance as we talked about before with Chad,
letting us easily re-create fully puppetized gerrits while leaving the DB backend the same.

  • created new instance gerrit-mysql, following docs at P3939
  • deleted instance gerrit-test3
  • re-created gerrit-test3, following docs at P3637, configured to use role::gerrit::server
  • ran puppet and got errors on P3957
  • added security group to let gerrit-test3 connect to gerrit-mysql on mysql port 3306
  • let the db server listen on its LAN IP instead of just 127.0.0.1
  • adjusted mysql GRANTs
  • ran puppet again ..

Blocked because one instance can't talk MySQL to the other instance, which looks like it's caused by T142165.

Dzahn changed the task status from Open to Stalled. Aug 31 2016, 11:18 PM
Dzahn triaged this task as Medium priority.

I'm not convinced that it's T142165:

  • gerrit-test.git.eqiad.wmflabs and jenkins-slave-01.git.eqiad.wmflabs can traceroute gerrit-mysql and telnet gerrit-mysql 3306
  • gerrit-test3.git.eqiad.wmflabs and alex-test.git.eqiad.wmflabs cannot
  • All can sudo traceroute gerrit-mysql -T and sudo traceroute gerrit-mysql -I

@Andrew?

gerrit-test3 uses kernel

Linux gerrit-test3 4.4.0-1-amd64 #1 SMP Debian 4.4.2-3+wmf3 (2016-07-28) x86_64 Debian GNU/Linux 8.5 (jessie)

and gerrit-mysql uses

Linux gerrit-mysql 4.4.0-1-amd64 #1 SMP Debian 4.4.2-3+wmf2 (2016-05-11) x86_64 Debian GNU/Linux 8.5 (jessie)

telnet gerrit-mysql 3306 connects when run on gerrit-mysql itself, but fails on gerrit-test3, which has the newer kernel build.

root@gerrit-mysql:/home/paladox# telnet gerrit-mysql 3306
Trying 10.68.23.211...
Connected to gerrit-mysql.git.eqiad.wmflabs.
Escape character is '^]'.
GHost '10.68.23.211' is not allowed to connect to this MariaDB serverConnection closed by foreign host.

but on gerrit-test3 it just hangs on Trying 10.68.23.211...

But it doesn't seem to be the kernel that is the problem.

Maybe the image, but I'm not sure.

Only two days ago I set up gerrit-mysql and that works, but gerrit-test3, set up today, won't connect to MySQL on gerrit-mysql. That is strange, and probably a bug in the image.

Adding the Labs project since we're stuck until it is fixed.

Paladox changed the task status from Stalled to Open. Sep 1 2016, 3:47 PM

@chasemp @Paladox

confirmed, working now (yea, Access denied = "works" in this case, before it was all timeout)

thank you!

dzahn@gerrit-test3:~$ mysql -h gerrit-mysql
ERROR 1045 (28000): Access denied for user 'dzahn'@'10.68.19.170' (using password: NO)
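
The distinction being drawn is between a reply from the server (even a refusal) and no reply at all. A small sketch that classifies client output on that basis; classify_db_probe is a hypothetical helper, the first two patterns are taken from messages in this thread, and the third is the MySQL client's usual connection-failure text, which does not appear here:

```shell
# Hypothetical helper: decide from a client message whether the DB server answered.
classify_db_probe() {
    case "$1" in
        # The server replied, so the network path and port are fine.
        *"Access denied"*|*"is not allowed to connect"*)
            echo reachable ;;
        # Assumed pattern: typical client text when no connection could be made.
        *"Can't connect to MySQL server"*)
            echo unreachable ;;
        *)
            echo unknown ;;
    esac
}

classify_db_probe "ERROR 1045 (28000): Access denied for user 'dzahn'@'10.68.19.170' (using password: NO)"
# prints: reachable
```

A hang, as seen earlier with telnet from gerrit-test3, produces no message at all and would fall under unknown unless the probe is run with a timeout.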

We went through the instructions one more time and edited them slightly.

It's done now: we have repeatable instructions for getting Gerrit up and running in Labs with just Puppet.

We will now just dump the pastebin content on wikitech and we'll have docs to follow in the future.

Paladox claimed this task.

Closing as resolved now.