Background:
wmf_data_ops.data_quality_metrics is a centralized metrics collection table, and we expect to have many writers doing INSERTs and DELETEs on it. On T386114, we found ourselves deadlocked, unable to progress on generating metrics, as one such writer failed ungracefully, and its Hive Metastore exclusive lock on the table was not released, making any other mutations on the table impossible.
Although this table is currently the one with the most risk, we have seen deadlocks elsewhere, and a custom mechanism to deal with this was proposed on T365563. On this ticket, however, we suggest using the built-in Hive Metastore mechanism for clearing deadlocks, as it is purpose-built, would require minimal maintenance, and would also allow us to use the SHOW LOCKS statement to inspect anything fishy on any table.
Long story short, the Hive folks built this transaction manager for their support for ACID tables (unrelated to Iceberg). However, we can leverage one particular mechanism from this manager called the AcidHouseKeeperService, which was built specifically for our problem:
AcidHouseKeeperService
This process looks for transactions that have not heartbeated in hive.txn.timeout time and aborts them. The system assumes that a client that initiated a transaction stopped heartbeating crashed and the resources it locked should be released.
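To make the quoted behavior concrete, here is a minimal sketch of the heartbeat-timeout idea. It is not Hive's actual implementation: the `Lock` class, the `reap` function, and the 300-second timeout are all hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Lock:
    """Hypothetical in-memory stand-in for a row tracking a held lock."""
    lock_id: int
    table: str
    last_heartbeat_ms: int


def reap(locks, now_ms, timeout_ms):
    """Split locks into (remaining, expired).

    A lock counts as expired when its holder has not heartbeated
    for longer than timeout_ms, i.e. the holder is assumed to have crashed.
    """
    expired = [l for l in locks if now_ms - l.last_heartbeat_ms > timeout_ms]
    expired_ids = {l.lock_id for l in expired}
    remaining = [l for l in locks if l.lock_id not in expired_ids]
    return remaining, expired


# Illustrative values only: one writer stopped heartbeating long ago,
# the other heartbeated recently.
locks = [
    Lock(1, "wmf_data_ops.data_quality_metrics", last_heartbeat_ms=0),
    Lock(2, "wmf_data_ops.data_quality_metrics", last_heartbeat_ms=250_000),
]
remaining, expired = reap(locks, now_ms=400_000, timeout_ms=300_000)
```

With these inputs the first lock would be released and the second kept, which is the outcome we want when a writer fails ungracefully.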
In fact, we have been implicitly using parts of the Hive transaction management: upon commit, the Iceberg code automatically requests a lock from the Hive Metastore, and the Hive Metastore honors this transactionally via the HIVE_LOCKS metastore table (see T386114#10546814 for an example of the content of that table). What we want now is to officially use this mechanism and have proper deadlock management, so that on failures other Iceberg writers can still progress.
Work to be done:
For our use cases, it looks like this is what we need in our hive-site.xml:
# this allows us to do SHOW LOCKS
hive.support.concurrency = true
hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
# this enables an always running daemon as part of the Hive Metastore process that will remove deadlocked/expired locks from the HIVE_LOCKS table
# It also enables other Hive services that we do not need, but this is the best we can do with Hive 2.3.6.
hive.compactor.initiator.on = true
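Since hive-site.xml stores each setting as a `<property>` element with `<name>` and `<value>` children inside `<configuration>`, the three settings above could be rendered roughly as follows. This is only a sketch for reviewing the resulting XML: `render_hive_site` is a hypothetical helper, and the real change would presumably go through however SRE manages the file.

```python
import xml.etree.ElementTree as ET

# The three settings proposed on this ticket.
SETTINGS = {
    "hive.support.concurrency": "true",
    "hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",
    "hive.compactor.initiator.on": "true",
}


def render_hive_site(settings):
    """Render a name -> value mapping as hive-site.xml style XML text."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # ET.indent only exists on Python 3.9+; skip pretty-printing otherwise.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_hive_site(SETTINGS))
```

In practice these properties would be merged into the existing hive-site.xml rather than replacing it.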
Roughly, we want to:
- Change hive-site.xml to have these settings. (SRE)
- Roll new conf to the test cluster, restart Hive Metastore (SRE)
- Confirm reaper is working, perhaps by setting hive logging to DEBUG to catch this log: https://github.com/apache/hive/blob/2c2fdd524e8783f6e1f3ef15281cc2d5ed08728f/metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L3069 (@xcollazo) (Not confirmed, but seems like it should work :D )
- Confirm SHOW LOCKS on hive cli is working (@xcollazo)
- Roll new conf to prod cluster, restart Hive Metastore (SRE)
- Confirm with same tests as above (@xcollazo)
- After a successful release of this mechanism, we should revert the pool definition deployed as a temporary fix on T386114. (@xcollazo)
- Monitor system for a while to see whether we get into a deadlock situation or if the fix works. (@xcollazo)
Sources:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=40509723#HiveTransactions-Transaction/LockManager
https://stackoverflow.com/questions/56930375/hive-acid-table-locks-deadlock-never-expires
https://issues.apache.org/jira/browse/HIVE-17967