
ATS ts_lua coredumps on config reload
Closed, Resolved, Public

Description

Yesterday, after we applied some apparently innocuous changes to the ATS remap rules, several traffic_server instances dumped core on config reload.
Triggering changes:

We've observed at least two similar stack traces, one through tslua.so TSRemapDeleteInstance and one through tslua.so TSRemapNewInstance. Both end in a segfault inside LuaJIT's lua_gc.

Event Timeline

Restricted Application added a subscriber: Aklapper.
Vgutierrez triaged this task as Medium priority. Mar 31 2020, 4:53 AM
Vgutierrez moved this task from Backlog to Caching on the Traffic board.
TSRemapDeleteInstance stacktrace
Mar 30 12:07:56 cp2013 traffic_manager[32876]: traffic_server: received signal 11 (Segmentation fault)
Mar 30 12:07:56 cp2013 traffic_manager[32876]: traffic_server - STACK TRACE:
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /usr/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0xa0)[0x55744cb7b010]
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7f03732ec730]
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /lib/x86_64-linux-gnu/libluajit-5.1.so.2(+0x1627d)[0x7f029100f27d]
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /lib/x86_64-linux-gnu/libluajit-5.1.so.2(+0xbe36)[0x7f0291004e36]
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /lib/x86_64-linux-gnu/libluajit-5.1.so.2(+0x368f0)[0x7f029102f8f0]
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /lib/x86_64-linux-gnu/libluajit-5.1.so.2(lua_gc+0xd8)[0x7f0291050638]
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /usr/lib/trafficserver/modules/tslua.so(+0x1338a)[0x7f033001938a]
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /usr/lib/trafficserver/modules/tslua.so(TSRemapDeleteInstance+0x16)[0x7f033000ee56]
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /usr/bin/traffic_server(_ZN11url_mappingD1Ev+0xec)[0x55744cc5de3c]
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /usr/bin/traffic_server(_ZN4TrieI11url_mappingE5ClearEv+0x28)[0x55744cc65c08]
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /usr/bin/traffic_server(_ZN19UrlMappingPathIndexD2Ev+0x4b)[0x55744cc64efb]
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /usr/bin/traffic_server(_ZN19UrlMappingPathIndexD0Ev+0x9)[0x55744cc64f99]
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /usr/bin/traffic_server(_ZN10UrlRewriteD1Ev+0x7d)[0x55744cc5e72d]
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /usr/bin/traffic_server(_ZN10UrlRewriteD0Ev+0x9)[0x55744cc5ec59]
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /usr/bin/traffic_server(_ZN19DeleterContinuationI10UrlRewriteE8dieEventEiPv+0x13)[0x55744cbfc6e3]
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /usr/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x92)[0x55744ce60112]
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /usr/bin/traffic_server(_ZN7EThread13process_queueEP5QueueI5EventNS1_9Link_linkEEPiS5_+0x27e)[0x55744ce60b1e]
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /usr/bin/traffic_server(_ZN7EThread15execute_regularEv+0x18f)[0x55744ce60fff]
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /usr/bin/traffic_server(+0x3ad9fa)[0x55744ce5f9fa]
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3)[0x7f03732e1fa3]
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f0372eea4cf]
TSRemapNewInstance stacktrace
Mar 30 12:25:19 cp2010 traffic_manager[38653]: traffic_server: received signal 11 (Segmentation fault)
Mar 30 12:25:19 cp2010 traffic_manager[38653]: traffic_server - STACK TRACE:
Mar 30 12:25:19 cp2010 traffic_manager[38653]: /usr/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0xa0)[0x55de69e02010]
Mar 30 12:25:19 cp2010 traffic_manager[38653]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7ffbf3b9e730]
Mar 30 12:25:19 cp2010 traffic_manager[38653]: /lib/x86_64-linux-gnu/libluajit-5.1.so.2(+0x15ead)[0x7ffb71845ead]
Mar 30 12:25:19 cp2010 traffic_manager[38653]: /lib/x86_64-linux-gnu/libluajit-5.1.so.2(+0xbe36)[0x7ffb7183be36]
Mar 30 12:25:19 cp2010 traffic_manager[38653]: /lib/x86_64-linux-gnu/libluajit-5.1.so.2(+0x36584)[0x7ffb71866584]
Mar 30 12:25:19 cp2010 traffic_manager[38653]: /lib/x86_64-linux-gnu/libluajit-5.1.so.2(lua_gc+0xd8)[0x7ffb71887638]
Mar 30 12:25:19 cp2010 traffic_manager[38653]: /usr/lib/trafficserver/modules/tslua.so(+0x135c8)[0x7ffbe80775c8]
Mar 30 12:25:19 cp2010 traffic_manager[38653]: /usr/lib/trafficserver/modules/tslua.so(TSRemapNewInstance+0x2ef)[0x7ffbe806cd0f]
Mar 30 12:25:19 cp2010 traffic_manager[38653]: /usr/bin/traffic_server(_Z17remap_load_pluginPPKciP11url_mappingPciiPi+0x704)[0x55de69ede3a4]
Mar 30 12:25:19 cp2010 traffic_manager[38653]: /usr/bin/traffic_server(+0x1a8bb7)[0x55de69ee1bb7]
Mar 30 12:25:19 cp2010 traffic_manager[38653]: /usr/bin/traffic_server(_Z18remap_parse_configPKcP10UrlRewrite+0x9a)[0x55de69ee2aea]
Mar 30 12:25:19 cp2010 traffic_manager[38653]: /usr/bin/traffic_server(_ZN10UrlRewrite10BuildTableEPKc+0x71)[0x55de69ee6f11]
Mar 30 12:25:19 cp2010 traffic_manager[38653]: /usr/bin/traffic_server(_ZN10UrlRewriteC1Ev+0x270)[0x55de69ee72c0]
Mar 30 12:25:19 cp2010 traffic_manager[38653]: /usr/bin/traffic_server(_Z16reloadUrlRewritev+0x49)[0x55de6a045ab9]
Mar 30 12:25:19 cp2010 traffic_manager[38653]: /usr/bin/traffic_server(_ZN21UR_UpdateContinuation19file_update_handlerEiPv+0x9)[0x55de6a0461d9]
Mar 30 12:25:19 cp2010 traffic_manager[38653]: /usr/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x92)[0x55de6a0e7112]
Mar 30 12:25:19 cp2010 traffic_manager[38653]: /usr/bin/traffic_server(_ZN7EThread13process_queueEP5QueueI5EventNS1_9Link_linkEEPiS5_+0x27e)[0x55de6a0e7b1e]
Mar 30 12:25:19 cp2010 traffic_manager[38653]: /usr/bin/traffic_server(_ZN7EThread15execute_regularEv+0x18f)[0x55de6a0e7fff]
Mar 30 12:25:19 cp2010 traffic_manager[38653]: /usr/bin/traffic_server(+0x3ad9fa)[0x55de6a0e69fa]
Mar 30 12:25:19 cp2010 traffic_manager[38653]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3)[0x7ffbf3b93fa3]
Mar 30 12:25:19 cp2010 traffic_manager[38653]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7ffbf379c4cf]
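The two traces enter Lua from different ATS code paths. The first comes from the old UrlRewrite being destroyed (DeleterContinuation → url_mapping destructor → TSRemapDeleteInstance). The second comes from the new remap.config being parsed (reloadUrlRewrite → remap_load_plugin → TSRemapNewInstance). Both then descend through an unnamed tslua.so frame into lua_gc. When triaging more cores, the following minimal Python sketch can pull that tslua.so entry point out of a syslog-formatted backtrace. It is not part of ATS or any Wikimedia tooling. The names `parse_frames` and `gc_entry_point` are hypothetical, and it assumes the exact `traffic_manager[PID]: /path(symbol+0xoff)[0xaddr]` line format shown above, with frames listed innermost first:

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Iterable, List, Optional

# Matches one crash_logger backtrace frame as logged via syslog (format assumed from the traces above), e.g.
#   ... traffic_manager[32876]: /lib/x86_64-linux-gnu/libluajit-5.1.so.2(lua_gc+0xd8)[0x7f0291050638]
FRAME_RE = re.compile(
    r"traffic_manager\[\d+\]: "
    r"(?P<obj>/[^(\s]+)"          # path of the binary or shared object
    r"\((?P<sym>[^+)]*)"          # (possibly mangled) symbol; empty if unresolved
    r"\+(?P<off>0x[0-9a-f]+)\)"   # offset from the symbol, or from the object base
    r"\[(?P<addr>0x[0-9a-f]+)\]"  # absolute return address
)


@dataclass(frozen=True)
class Frame:
    obj: str  # basename of the object, e.g. "tslua.so"
    sym: str  # symbol name, "" when the backtrace only has an offset
    off: int  # offset as an integer


def parse_frames(lines: Iterable[str]) -> List[Frame]:
    """Extract frames in backtrace order (innermost first); non-frame log lines are skipped."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            frames.append(Frame(PurePosixPath(m["obj"]).name, m["sym"], int(m["off"], 16)))
    return frames


def gc_entry_point(frames: List[Frame], module: str = "tslua.so") -> Optional[str]:
    """Return the first named frame from `module` that appears after (i.e. called into) lua_gc."""
    for i, frame in enumerate(frames):
        if frame.sym == "lua_gc":
            for caller in frames[i + 1:]:
                if caller.obj == module and caller.sym:
                    return caller.sym
            return None
    return None


# Usage: an excerpt of the cp2013 trace.
delete_trace = """\
Mar 30 12:07:56 cp2013 traffic_manager[32876]: traffic_server - STACK TRACE:
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /lib/x86_64-linux-gnu/libluajit-5.1.so.2(lua_gc+0xd8)[0x7f0291050638]
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /usr/lib/trafficserver/modules/tslua.so(+0x1338a)[0x7f033001938a]
Mar 30 12:07:56 cp2013 traffic_manager[32876]: /usr/lib/trafficserver/modules/tslua.so(TSRemapDeleteInstance+0x16)[0x7f033000ee56]
""".splitlines()

print(gc_entry_point(parse_frames(delete_trace)))  # TSRemapDeleteInstance
```

Fed the cp2010 trace, the same call returns TSRemapNewInstance. Symbols are left mangled to avoid depending on an external demangler such as c++filt.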

This issue appears to have been identified upstream in https://github.com/apache/trafficserver/pull/6403, but the fix hasn't been backported to ATS 8.x.

Change 584812 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/debs/trafficserver@master] Release 8.0.6-1wm5

https://gerrit.wikimedia.org/r/584812

T242952 is another ongoing tslua-related issue that causes traffic_server to crash on configuration reloads.

Change 584812 merged by Vgutierrez:
[operations/debs/trafficserver@master] Release 8.0.6-1wm5

https://gerrit.wikimedia.org/r/584812

Mentioned in SAL (#wikimedia-operations) [2020-03-31T09:05:25Z] <vgutierrez> upload trafficserver 8.0.5-1wm6 to apt.wm.o (buster) - T248938

Mentioned in SAL (#wikimedia-operations) [2020-03-31T12:23:00Z] <vgutierrez> rolling upgrade of ATS to version 8.0.6-1wm5 - T248938

Change 589551 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] ATS: Disable KA on cp1077

https://gerrit.wikimedia.org/r/589551

Change 589551 merged by Vgutierrez:
[operations/puppet@production] ATS: Disable KA on cp1077

https://gerrit.wikimedia.org/r/589551

Mentioned in SAL (#wikimedia-operations) [2020-04-17T09:07:39Z] <vgutierrez> disable KA between ats-tls and varnish-fe on cp1077 - T248938

BBlack added a subscriber: BBlack.

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all such tickets that haven't been updated in 6 months or more. This does not imply any human judgement about the validity or importance of the task, and is simply the first step in a larger task cleanup effort. Further manual triage and/or requests for updates will happen this month for all such tickets. For more detail, have a look at the extended explanation on the main page of Traffic-Icebox. Thank you!

@Vgutierrez Three years later, have you seen these crashes again?

BCornwall claimed this task.

Since the patch was merged long ago and we've upgraded well past that release, I'm assuming this is fixed.