
codfw1dev: rabbitmq is not working because of some auth failures
Closed, Resolved, Public

Description

The rabbitmq logs are showing something like this:

root@cloudcontrol2004-dev:/var/log/rabbitmq# tail -f rabbit@rabbitmq02.codfw1dev.wikimediacloud.org.log
[..]
2024-09-04 12:30:06.567879+00:00 [error] <0.9920.26>     supervisor: {<0.9920.26>,rabbit_channel_sup}
2024-09-04 12:30:06.567879+00:00 [error] <0.9920.26>     errorContext: shutdown_error
2024-09-04 12:30:06.567879+00:00 [error] <0.9920.26>     reason: killed
2024-09-04 12:30:06.567879+00:00 [error] <0.9920.26>     offender: [{pid,<0.9924.26>},
2024-09-04 12:30:06.567879+00:00 [error] <0.9920.26>                {id,channel},
2024-09-04 12:30:06.567879+00:00 [error] <0.9920.26>                {mfargs,
2024-09-04 12:30:06.567879+00:00 [error] <0.9920.26>                    {rabbit_channel,start_link,
2024-09-04 12:30:06.567879+00:00 [error] <0.9920.26>                        [1,<0.9874.26>,<0.9921.26>,<0.9874.26>,
2024-09-04 12:30:06.567879+00:00 [error] <0.9920.26>                         <<"172.20.5.6:53760 -> 172.20.5.6:5671">>,
2024-09-04 12:30:06.567879+00:00 [error] <0.9920.26>                         rabbit_framing_amqp_0_9_1,
2024-09-04 12:30:06.567879+00:00 [error] <0.9920.26>                         {user,<<"neutron">>,[],
2024-09-04 12:30:06.567879+00:00 [error] <0.9920.26>                             [{rabbit_auth_backend_internal,none}]},
2024-09-04 12:30:06.567879+00:00 [error] <0.9920.26>                         <<"/">>,
2024-09-04 12:30:06.567879+00:00 [error] <0.9920.26>                         [{<<"authentication_failure_close">>,bool,true},
2024-09-04 12:30:06.567879+00:00 [error] <0.9920.26>                          {<<"connection.blocked">>,bool,true},
2024-09-04 12:30:06.567879+00:00 [error] <0.9920.26>                          {<<"consumer_cancel_notify">>,bool,true}],
2024-09-04 12:30:06.567879+00:00 [error] <0.9920.26>                         <0.9912.26>,<0.9923.26>]}},
2024-09-04 12:30:06.567879+00:00 [error] <0.9920.26>                {restart_type,intrinsic},
2024-09-04 12:30:06.567879+00:00 [error] <0.9920.26>                {shutdown,70000},
2024-09-04 12:30:06.567879+00:00 [error] <0.9920.26>                {child_type,worker}]
2024-09-04 12:30:06.568412+00:00 [error] <0.9930.26>     supervisor: {<0.9930.26>,rabbit_channel_sup}
2024-09-04 12:30:06.568412+00:00 [error] <0.9930.26>     errorContext: shutdown_error
2024-09-04 12:30:06.568412+00:00 [error] <0.9930.26>     reason: killed
2024-09-04 12:30:06.568412+00:00 [error] <0.9930.26>     offender: [{pid,<0.9934.26>},
2024-09-04 12:30:06.568412+00:00 [error] <0.9930.26>                {id,channel},
2024-09-04 12:30:06.568412+00:00 [error] <0.9930.26>                {mfargs,
2024-09-04 12:30:06.568412+00:00 [error] <0.9930.26>                    {rabbit_channel,start_link,
2024-09-04 12:30:06.568412+00:00 [error] <0.9930.26>                        [1,<0.9891.26>,<0.9931.26>,<0.9891.26>,
2024-09-04 12:30:06.568412+00:00 [error] <0.9930.26>                         <<"172.20.5.6:53768 -> 172.20.5.6:5671">>,
2024-09-04 12:30:06.568412+00:00 [error] <0.9930.26>                         rabbit_framing_amqp_0_9_1,
2024-09-04 12:30:06.568412+00:00 [error] <0.9930.26>                         {user,<<"neutron">>,[],
2024-09-04 12:30:06.568412+00:00 [error] <0.9930.26>                             [{rabbit_auth_backend_internal,none}]},
2024-09-04 12:30:06.568412+00:00 [error] <0.9930.26>                         <<"/">>,
2024-09-04 12:30:06.568412+00:00 [error] <0.9930.26>                         [{<<"authentication_failure_close">>,bool,true},
2024-09-04 12:30:06.568412+00:00 [error] <0.9930.26>                          {<<"connection.blocked">>,bool,true},
2024-09-04 12:30:06.568412+00:00 [error] <0.9930.26>                          {<<"consumer_cancel_notify">>,bool,true}],
2024-09-04 12:30:06.568412+00:00 [error] <0.9930.26>                         <0.9916.26>,<0.9932.26>]}},
2024-09-04 12:30:06.568412+00:00 [error] <0.9930.26>                {restart_type,intrinsic},
2024-09-04 12:30:06.568412+00:00 [error] <0.9930.26>                {shutdown,70000},
2024-09-04 12:30:06.568412+00:00 [error] <0.9930.26>                {child_type,worker}]
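
To see at a glance which AMQP users and endpoints are affected, supervisor reports like the ones above can be grouped by supervisor, user, and destination address. The following is a minimal sketch, not an existing tool: the names `parse_reports` and `summarize` are hypothetical, and the regular expressions are inferred only from the excerpt in this task, so they may need adjusting for other RabbitMQ versions or log formats.

```python
import re
from collections import Counter

# Start of a supervisor report, e.g. "supervisor: {<0.9920.26>,rabbit_channel_sup}"
SUP_RE = re.compile(r'supervisor: \{<[\d.]+>,(\w+)\}')
# AMQP user tuple, e.g. {user,<<"neutron">>,[],
USER_RE = re.compile(r'\{user,<<"([^"]+)">>')
# Connection name, e.g. <<"172.20.5.6:53760 -> 172.20.5.6:5671">>
CONN_RE = re.compile(r'<<"(\S+:\d+) -> (\S+:\d+)">>')


def parse_reports(lines):
    """Return one dict per supervisor report found in the given log lines."""
    reports = []
    for line in lines:
        match = SUP_RE.search(line)
        if match:
            # A "supervisor:" line opens a new report.
            reports.append({"supervisor": match.group(1), "user": None, "dest": None})
            continue
        if not reports:
            # Ignore anything that appears before the first report.
            continue
        match = USER_RE.search(line)
        if match:
            reports[-1]["user"] = match.group(1)
        match = CONN_RE.search(line)
        if match:
            # Keep only the destination (broker) side of the connection.
            reports[-1]["dest"] = match.group(2)
    return reports


def summarize(lines):
    """Count reports per (supervisor, user, destination) combination."""
    return Counter(
        (r["supervisor"], r["user"], r["dest"]) for r in parse_reports(lines)
    )
```

Passing an open log file to `summarize` (for example `summarize(open(path))`, where `path` points at the RabbitMQ log) returns a `Counter` keyed by `(supervisor, user, destination)`.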

Event Timeline

aborrero triaged this task as Medium priority.
aborrero added a project: User-aborrero.

Mentioned in SAL (#wikimedia-cloud) [2024-09-05T08:46:49Z] <arturo> [codfw1dev] restart rabbitmq @ codfw1dev T374002

aborrero lowered the priority of this task from Medium to Low. (Sep 5 2024, 9:00 AM)
aborrero moved this task from Backlog to Radar/observer on the User-aborrero board.

comment by andrew: this may be a consequence of rabbitmq being colocated on the cloudcontrols. Consider running it on separate hardware, as in eqiad1.

I believe this is resolved; reopen if necessary.