Page MenuHomePhabricator

Enable lock transaction management on the Hive Metastore
Closed, ResolvedPublic

Description

Background:

wmf_data_ops.data_quality_metrics is a centralized metrics collection table, and we expect to have many writers doing INSERTs and DELETEs on it. On T386114, we found ourselves deadlocked, unable to progress on generating metrics, as one such writer failed ungracefully, and its Hive Metastore exclusive lock on the table was not released, making any other mutations on the table impossible.

Although this table is currently the one with the most risk, we have seen deadlocks elsewhere, and a custom mechanism to deal with this was proposed on T365563. On this ticket, however, we suggest to use the built-in Hive Metastore mechanism for clearing deadlocks, as it is purpose built, would minimal maintenance, and would also allow us to use the SHOW LOCKS statement to inspect anything fishy for any table.

Long story short, the Hive folks built this transaction manager for their support for ACID tables (unrelated to Iceberg). However, we can leverage one particular mechanism from this manager called the AcidHouseKeeperService, which was built specifically for our problem:

AcidHouseKeeperService
This process looks for transactions that have not heartbeated in hive.txn.timeout time and aborts them. The system assumes that a client that initiated a transaction stopped heartbeating crashed and the resources it locked should be released.

In fact, we have being implicitly using parts of the Hive transaction management, as the Iceberg code, upon commit, does automatically request a Lock from the Hive Metastore, and the Hive Metastore does honor this transactionally via the HIVE_LOCKS metastore table (See T386114#10546814 for an example of the content of that table). What we want now is to officially use this mechanism, and have proper deadlock management so that on failures, other Iceberg writers can still progress.


Work to be done:

For our use cases, it looks like this is what we need in our hive-site.xml:

# this allows us to do SHOW LOCKS
hive.support.concurrency = true
hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

# this enables an always running daemon as part of the Hive Metastore process that will remove deadlocked/expired locks from the HIVE_LOCKS table
# It also enables other Hive services that we do not need, but this is the best we can do with Hive 2.3.6.
hive.compactor.initiator.on = true

Roughly, we want to:


Sources:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=40509723#HiveTransactions-Transaction/LockManager
https://stackoverflow.com/questions/56930375/hive-acid-table-locks-deadlock-never-expires
https://issues.apache.org/jira/browse/HIVE-17967

Details

Other Assignee
brouberol
Related Changes in Gerrit:
Related Changes in GitLab:
TitleReferenceAuthorSource BranchDest Branch
Revert mutex pool for data_quality_metrics.repos/data-engineering/airflow-dags!1387xcollazorevert-092b9d31main
Customize query in GitLab

Event Timeline

Manual test of SHOW LOCKS:

xcollazo@stat1011:~$ hostname -f
stat1011.eqiad.wmnet

hive -hiveconf hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager -hiveconf hive.support.concurrency=true

hive (default)> show locks;
OK
lockid	database	table	partition	lock_state	blocked_by	lock_type	transaction_id	last_heartbeat	acquired_at	user	hostname	agent_info
Lock ID	Database	Table	Partition	State	Blocked By	Type	Transaction ID	Last Heartbeat	Acquired At	User	Hostname	Agent Info
Time taken: 0.059 seconds, Fetched: 1 row(s)

Note how these particular confs are only needed on the client side for SHOW LOCKS to work. However, let's add it do the hive-site.xml so that any client would pick it up by default.

Our Hive version as of today:

$ hive --version
Hive 2.3.6
Git file:///ws/output/hive/hive-2.3.6 -r Unknown
Compiled by root on Thu Jun 22 12:19:21 UTC 2023
From source with checksum 378697e1b585b3ee945a8a37d465a24a

The AcidHouseKeeperService is indeed available in this version: https://github.com/apache/hive/blob/rel/release-2.3.6/ql/src/java/org/apache/hadoop/hive/ql/txn/AcidHouseKeeperService.java

Unfortunately, this version does not have the metastore.task.threads.always conf flag, as this version does not support running the Hive Metastore in standalone mode, so we need to find another way to enable this.

This other flag seems to be hive.compactor.initiator.on = true as per https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Compactor, and as per code. Note we do get other unnecessary threads to run via this flag, but that is the best we can do at this version.

Will update description with these findings.


To confirm this is working, we can set hive logging to DEBUG to catch this log: https://github.com/apache/hive/blob/2c2fdd524e8783f6e1f3ef15281cc2d5ed08728f/metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L3069

Change #1125486 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Enable lock transaction management in the hive metastore on hadoop_test

https://gerrit.wikimedia.org/r/1125486

I have created the first patch, which applies the settings to the hadoop test cluster: Enable lock transaction management in the hive metastore on hadoop_test
We can deploy that on Monday, if that suits you @xcollazo.

I have created the first patch, which applies the settings to the hadoop test cluster: Enable lock transaction management in the hive metastore on hadoop_test
We can deploy that on Monday, if that suits you @xcollazo.

Left some comments on patchset, but yes Monday works for me.

Change #1125486 merged by Brouberol:

[operations/puppet@production] Enable lock transaction management in the hive metastore on hadoop_test

https://gerrit.wikimedia.org/r/1125486

The change has been deployed!

The change has been deployed!

Thanks @brouberol!

This is the first part. Now I need to verify that all is good in the analytics test cluster.

oh sorry, I was overzealous with phabricator, wasn't I?

Actually, we need to restart the Hive metastore from the test cluster. Could you help with that @brouberol ?

Gehel triaged this task as High priority.Mar 21 2025, 9:40 AM

Actually, we need to restart the Hive metastore from the test cluster. Could you help with that @brouberol ?

Bump. :)

brouberol@an-test-coord1001:~$ sudo systemctl restart hive-metastore.service
brouberol@an-test-coord1001:~$

Done! I'm so sorry about the delay, I completely missed the notification.

I've updated the description above to be clear about who owns the steps.

Next up is two tasks for me, thus grabbing this task, and I will give it back to @brouberol when we are ready for next step for SRE.

Config looks good:

xcollazo@an-test-client1002:/etc/hive/conf$ cat hive-site.xml | grep compactor -B 1 -A 6
  <property>
      <name>hive.compactor.initiator.on</name>
      <value>true</value>
  </property>
  <property>
      <name>hive.txn.manager</name>
      <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
  </property>

Confirmed SHOW LOCKS works without extra settings on the test cluster:

~$ hostname -f
an-test-client1002.eqiad.wmnet

$ hive
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/log4j-slf4j-impl-2.17.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in file:/etc/hive/conf.analytics-test-hadoop/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive (default)> SHOW LOCKS;
OK
lockid	database	table	partition	lock_state	blocked_by	lock_type	transaction_id	last_heartbeat	acquired_at	user	hostname	agent_info
Lock ID	Database	Table	Partition	State	Blocked By	Type	Transaction ID	Last Heartbeat	Acquired At	User	Hostname	Agent Info
Time taken: 1.347 seconds, Fetched: 1 row(s)

Unfortunately, I was not able to confirm whether the reaper was on or not (i.e the helper thread that should be on with the hive.compactor.initiator.on flag), as I don't see an easy way to enable DEBUG logging on the test cluster's Hive Metastore. I'd say, however, that the settings are safe enough that we can move them to prod, as they have not had any adverse effect on the test cluster.

WDYT @brouberol, @BTullis?

I'm happy to have it be deployed prod, as we've enabled it in staging for a month an a half w/o nefarious effect. I'll send a patch.

Change #1145762 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] analytics-hive: Enable lock transaction management in the production hive metastore

https://gerrit.wikimedia.org/r/1145762

Change #1145762 merged by Brouberol:

[operations/puppet@production] analytics-hive: Enable lock transaction management in prod hive metastore

https://gerrit.wikimedia.org/r/1145762

The change has been deployed to the hive metastore, and the services has been restarted. I'll let you monitor and close if appropriate.

The change has been deployed to the hive metastore, and the services has been restarted. I'll let you monitor and close if appropriate.

Great, thanks @brouberol!

Production config looks good:

xcollazo@stat1011:/etc/hive/conf$ cat hive-site.xml | grep compactor -B 1 -A 6
  <property>
      <name>hive.compactor.initiator.on</name>
      <value>true</value>
  </property>
  <property>
      <name>hive.txn.manager</name>
      <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
  </property>

Confirmed SHOW LOCKS works without extra settings on the production cluster:

$ hostname -f
stat1011.eqiad.wmnet

$ hive
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/log4j-slf4j-impl-2.17.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in file:/etc/hive/conf.analytics-hadoop/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive (default)> show locks;
OK
lockid	database	table	partition	lock_state	blocked_by	lock_type	transaction_id	last_heartbeat	acquired_at	user	hostname	agent_info
Lock ID	Database	Table	Partition	State	Blocked By	Type	Transaction ID	Last Heartbeat	Acquired At	User	Hostname	Agent Info
Time taken: 0.536 seconds, Fetched: 1 row(s)
hive (default)> show locks;
xcollazo lowered the priority of this task from High to Medium.May 30 2025, 2:48 PM
xcollazo updated the task description. (Show Details)

Work remaining here as per description:

  • After a successful release of this mechanism, we should revert the pool definition deployed as a temporary fix on T386114. (@xcollazo)

Will be doing this soon.

Work remaining here as per description:

  • After a successful release of this mechanism, we should revert the pool definition deployed as a temporary fix on T386114. (@xcollazo)

Will be doing this soon.

Ok now we need to monitor for a bit to see if all is fine. I will do the monitoring.

(Manually deleted pool mutex_for_wmf_data_ops_data_quality_metrics via Airflow UI)