While working on T321605 and T331300, we noticed [[ https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater?from=1683651995441&orgId=1&to=1683675992002&var-cluster_name=wdqs&var-k8sds=codfw%20prometheus%2Fk8s&var-opsds=codfw%20prometheus%2Fops&var-site=codfw&viewPanel=6 | a performance discrepancy between wdqs2022 (newest active host) and prior hosts ]]. The linked graph suggests that the older hosts are 20-40% faster at triples ingestion, a key metric for WDQS. Data import times are recorded in T241128.
Hardware differences are noted [[ https://phabricator.wikimedia.org/P48178 | here ]]. wdqs2022 is our first R450 in production, and it is also the first Bullseye host running the WDQS stack, so both the hardware and the OS differ from the older hosts.
I also noticed that our CPU frequency governors are set to 'powersave', and they should probably be 'performance'. Per an IRC conversation in #wikimedia-sre, tickets T225713, T315398, and T328957 have some history and insights on past efforts to choose a CPU frequency governor. Note that even the older hosts are using 'powersave', so this is probably not our root cause.
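For comparing governor settings across hosts, here is a minimal sketch that reads the active governor for each CPU from the standard Linux cpufreq sysfs files (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`). The function names `read_governors` and `summarize` are hypothetical and used only for illustration; this is not an existing WMF tool, and it assumes the host exposes the cpufreq interface.

```python
from collections import Counter
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq.

    cpu_root is a parameter only so the function can be pointed at a
    test directory; on a real host the default sysfs path applies.
    """
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name  # e.g. "cpu0"
        try:
            governors[cpu_name] = gov_file.read_text().strip()
        except OSError:
            # Skip CPUs whose governor file cannot be read.
            continue
    return governors


def summarize(governors):
    """Count how many CPUs use each governor."""
    return Counter(governors.values())


if __name__ == "__main__":
    summary = summarize(read_governors())
    if not summary:
        print("no cpufreq governors found")
    for governor, count in sorted(summary.items()):
        print(f"{governor}: {count} CPU(s)")
```

Running this on wdqs2022 and on one of the older hosts would confirm whether both report 'powersave' on every core, which is what the observation above implies.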
Creating this ticket to:
- Identify the root cause of performance differences
- Make adjustments to optimize performance on newer hosts