In VictorOps / Splunk On-Call parlance, business hours oncall is implemented as an "Escalation Policy".
The Batphone is also an "Escalation Policy", and one escalation policy can trigger another if alerts go unacknowledged for a configured interval of time.
This is all well and good; however, the docs state:
Note: If there is no on-call user scheduled in a rotation at the time when this escalation action is triggered, the resulting behavior is that no page will occur in this step. The time delay before the next step will remain as configured. For example, if an incident triggers an Escalation Policy during off-hours and there is no one on call in the rotation to immediately page, the escalation policy will page no one and then wait however long is specified before executing step two.
The VictorOps entity that provides "glue" between an incoming alert and the appropriate Escalation Policy is known as a "Routing Key".
Currently we have a few relevant ones: icinga, netops, sre-batphone, and the default (fall-through). All four are set to trigger the SRE Business Hours escalation policy.
So when no one is on call in that policy's rotation, alerts are delayed by 5 minutes.
In the interests of expediency I recommend working around this ourselves.
We already have code that can check the membership of the current business hours rotation -- it was committed to Klaxon last week.
This code is already installed, configured, and running on role::alerting_host hosts.
We could create a new routing key, direct-batphone, that would page the batphone escalation policy directly, skipping business hours.
Then it would be trivial to poll the VO API once a minute and (say) write a file to disk somewhere containing the Routing Key to use: the usual value if there are business hours oncallers, or direct-batphone if there are not.
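A minimal sketch of that poller, under stated assumptions, might look like the following. The get_oncallers callable stands in for the existing Klaxon membership check (its real name and signature are not given here, so it is hypothetical), and the function names, file path, and file mode are illustrative choices rather than anything already deployed. The file is written via a temporary file and a rename so that a reader never sees a half-written key.

```python
import os
import tempfile
from typing import Callable, Sequence

# Routing key proposed above that pages the batphone policy directly.
DIRECT_KEY = "direct-batphone"


def choose_routing_key(oncallers: Sequence[str], usual_key: str) -> str:
    """Return the usual key if anyone is on call, else the direct key."""
    return usual_key if oncallers else DIRECT_KEY


def write_routing_key(path: str, key: str) -> None:
    """Atomically write the key to path (temp file + rename)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".routing_key.")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(key + "\n")
        # Assumption: the alerting software runs as a different user
        # and needs read access; mkstemp creates the file as 0600.
        os.chmod(tmp, 0o644)
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def refresh(get_oncallers: Callable[[], Sequence[str]],
            path: str, usual_key: str) -> str:
    """One polling pass: look up oncallers, write the key, return it.

    get_oncallers is a hypothetical wrapper around the existing
    rotation-membership check; if it raises, the file is left untouched.
    """
    key = choose_routing_key(get_oncallers(), usual_key)
    write_routing_key(path, key)
    return key
```

Running refresh once a minute (from a timer or cron, for instance) would keep the file current. One design question this sketch leaves open is what to do when the lookup itself fails: here the previous file contents simply stay in place, but falling back to direct-batphone on error may be the safer choice.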
