T389734 reported emergency timeout exceptions being thrown before the specified time had elapsed. The cause was glibc delivering a timer event with the wrong sigevent (BZ#32833).
The workaround in MediaWiki was to disable Lua profiling, but that only lowers the rate at which the bug occurs. Excimer and LuaSandbox are usable outside of MediaWiki and should not have such bugs in them. It's not acceptable to wait for a fix to be merged in glibc and then for that fix to filter down to all our users.
The broad implementation options are:
1. Delayed deletion. Don't call timer_delete() until we are pretty sure that there are no events in flight.
2. Event validation. Discard bad events based on some property of the event.
3. Reimplement the glibc timer handler using SIGEV_THREAD_ID.
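To make the event validation option concrete, here is a minimal sketch of one possible check: discard an event if its timer has been marked dead, or if it arrives before the timer could have expired. The `timer_state` struct and the `event_is_valid()` helper are hypothetical names for illustration, not Excimer's actual structures, and the choice of elapsed time as the validated property is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-timer bookkeeping; not Excimer's actual structure. */
typedef struct {
    struct timespec armed_at; /* CLOCK_MONOTONIC time when the timer was armed */
    int64_t period_ns;        /* requested delay before the first expiry */
    bool live;                /* cleared by the owner before timer_delete() */
} timer_state;

/* Nanoseconds from b to a (a - b). */
static int64_t timespec_diff_ns(const struct timespec *a,
                                const struct timespec *b)
{
    return (int64_t)(a->tv_sec - b->tv_sec) * 1000000000LL
         + (int64_t)(a->tv_nsec - b->tv_nsec);
}

/*
 * Return true if an event attributed to timer t at time `now` is plausible.
 * An event for a dead timer, or one arriving before the requested delay has
 * elapsed, is treated as misdelivered and should be discarded by the caller.
 */
static bool event_is_valid(const timer_state *t, const struct timespec *now)
{
    assert(t->period_ns > 0);
    if (!t->live) {
        return false;
    }
    return timespec_diff_ns(now, &t->armed_at) >= t->period_ns;
}
```

A handler would call `clock_gettime(CLOCK_MONOTONIC, &now)` and drop the event when `event_is_valid()` returns false. This only catches events that arrive early or for a dead timer, so it reduces the impact of misdelivery rather than preventing it.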
Option 3 would be similar to this patch, although not so similar as to violate the author's copyright, since Excimer is not GPL licensed. Per my comment there, it would be convenient for us if the handler thread were joinable, because joining it guarantees that no events are delivered after timer_delete(), allowing us to simplify the calling code. glibc itself cannot use a joinable handler thread, because POSIX specifies that the thread is not joinable.
SIGEV_THREAD_ID is specific to Linux, although most other UNIX-based operating systems can use the kqueue implementation. The documentation says "This flag is intended only for use by threading libraries" which would be fine if our threading library was doing a decent job. I suppose we can call Excimer a threading library.
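As a rough illustration of the SIGEV_THREAD_ID approach with a joinable handler thread, the following Linux-only sketch has a thread create a timer targeted at itself, consume events with sigwaitinfo(), delete the timer, and exit. Once pthread_join() returns, no further events can be handled. The names `demo_timer`, `handler_thread` and `run_demo_timer` are hypothetical, as are the use of SIGRTMIN and the fixed event count. The fallback `#define` assumes an older glibc that does not expose `sigev_notify_thread_id`. Older glibc versions may also need `-lrt -pthread` when linking.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#ifndef sigev_notify_thread_id
#define sigev_notify_thread_id _sigev_un._tid /* assumption: older glibc */
#endif

/* Hypothetical demo state; not Excimer's actual structure. */
typedef struct {
    pthread_t thread;
    long period_ns;   /* timer period in nanoseconds */
    int max_events;   /* stop after this many events */
    int events_seen;  /* written by the handler thread, read after join */
    int error;        /* errno from timer setup, or 0 */
} demo_timer;

static void *handler_thread(void *arg)
{
    demo_timer *t = arg;
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    /* Block the signal so that it stays queued until sigwaitinfo(). */
    pthread_sigmask(SIG_BLOCK, &set, NULL);

    struct sigevent sev;
    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_THREAD_ID;
    sev.sigev_signo = SIGRTMIN;
    sev.sigev_value.sival_ptr = t;
    sev.sigev_notify_thread_id = (pid_t)syscall(SYS_gettid);

    timer_t id;
    if (timer_create(CLOCK_MONOTONIC, &sev, &id) != 0) {
        t->error = errno;
        return NULL;
    }
    struct itimerspec its;
    memset(&its, 0, sizeof its);
    its.it_value.tv_sec = t->period_ns / 1000000000L;
    its.it_value.tv_nsec = t->period_ns % 1000000000L;
    its.it_interval = its.it_value;
    if (timer_settime(id, 0, &its, NULL) != 0) {
        t->error = errno;
        timer_delete(id);
        return NULL;
    }

    while (t->events_seen < t->max_events) {
        siginfo_t info;
        if (sigwaitinfo(&set, &info) < 0) {
            continue; /* interrupted; retry */
        }
        /* Count only events that carry our own sigval. */
        if (info.si_code == SI_TIMER && info.si_value.sival_ptr == t) {
            t->events_seen++;
        }
    }

    timer_delete(id);
    /* Drain anything still queued so nothing is left pending at exit. */
    struct timespec zero = { 0, 0 };
    while (sigtimedwait(&set, NULL, &zero) > 0) {
    }
    return NULL;
}

/* Start the handler thread and wait for it; returns 0 on success. */
static int run_demo_timer(demo_timer *t)
{
    assert(t->period_ns > 0 && t->max_events > 0);
    t->events_seen = 0;
    t->error = 0;
    int rc = pthread_create(&t->thread, NULL, handler_thread, t);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_join(t->thread, NULL);
    return rc != 0 ? rc : t->error;
}
```

For example, with `demo_timer t = { .period_ns = 1000000, .max_events = 3 };`, calling `run_demo_timer(&t)` returns once three events have been counted. A real implementation would need a way to stop the thread on request rather than after a fixed count.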