The staging and production deployments of datahub share an Opensearch cluster
Closed, Resolved, Public

Description

We deploy datahub to wikikube via the deployment charts repository here:
https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/helmfile.d/services/datahub

When designing the system, we had intended to use a single Opensearch cluster of three nodes to support both the production and staging deployments.

Whilst this wasn't perfect isolation between the two deployments, we had intended for the staging version to use its own indices within the cluster.

One of the values in the values-staging.yaml file is this:

global:
  elasticsearch:
    prefix: "staging-"

The helm chart for the frontend component is supposed to set this as an environment variable here:

{{- with .Values.global.elasticsearch.indexPrefix }}
- name: ELASTIC_INDEX_PREFIX
  value: {{ . }}
{{- end }}

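However, there is a typo. The key names do not match: the values file sets prefix, but the template reads indexPrefix, so the with block is skipped and the environment variable is never set.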

This prefix is therefore not used, which means that the staging deployment shares not just the same cluster as the production deployment, but the same indices.
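A mismatch like this could be caught before deployment with a simple lint. The following is a minimal sketch, not the check used in the deployment-charts repository: the function names referenced_keys and missing_keys are hypothetical, and the match is deliberately naive (it looks for the key name at the start of any line in the values file and ignores YAML nesting).

```shell
# List the key names that a template reads under .Values.global.elasticsearch.
# referenced_keys and missing_keys are hypothetical helper names.
referenced_keys() {
  grep -o '\.Values\.global\.elasticsearch\.[A-Za-z0-9_]*' "$1" \
    | sed 's/.*\.//' \
    | sort -u
}

# Print every referenced key that the values file never sets.
# Naive assumption: a key counts as set if "key:" starts any line,
# regardless of where it sits in the YAML hierarchy.
missing_keys() {
  template="$1"
  values="$2"
  for key in $(referenced_keys "$template"); do
    grep -Eq "^[[:space:]]*${key}:" "$values" || echo "$key"
  done
}

# Assumed usage, with placeholder file names:
#   missing_keys path/to/template.yaml values-staging.yaml
```

With the template and values shown above, this would print indexPrefix, because the values file only sets prefix.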

This means that we can't safely proceed with T329514: Upgrade Datahub to v0.10.4, because this requires a reindexing of the Opensearch indices.

Event Timeline

BTullis triaged this task as High priority. (Mar 30 2023, 4:03 PM)
BTullis moved this task from Backlog to In Progress on the Data-Catalog board.

Expediting into the current sprint, since it is unexpected work (an incident) and we need to fix it before we can upgrade DataHub in production.

We were expecting to see indices here with the prefix of staging-, but there are none.

btullis@datahubsearch1001:~$ curl -s http://localhost:9200/_cat/indices|grep staging
btullis@datahubsearch1001:~$ curl http://localhost:9200/_cat/indices
green open chart_chartusagestatisticsaspect_v1                                    rjhY340LTuqjvNXfSECGNA 1 1     0     0    416b    208b
green open dashboardindex_v2_1661860547255                                        DvHLtcmCTTSJFtFCm7l1xA 1 1   225     0 137.2kb  68.6kb
green open mlmodelgroupindex_v2_1661860485210                                     -l2ZCJz0TDmfbKcnKirovA 1 1     0     0    416b    208b
green open datahub_usage_event-000228                                             T09QvX_IRLSDmMxC9T9cgg 1 1     0     0    416b    208b
green open datahub_usage_event-000227                                             JFTHh4VqTcmuGybdEzqlHQ 1 1     0     0    416b    208b
green open corpuserindex_v2_1661860574348                                         i9mlB57mSqekkaNx8P8DfQ 1 1  5081   217   2.4mb   1.2mb
green open invitetokenindex_v2_1661860496607                                      kx2_Yz8OQSCMh5or6hqeXw 1 1     1     0   8.5kb   4.2kb
green open datahub_usage_event-000229                                             LpBbPdfQTXqlZy48fAAuOw 1 1     0     0    416b    208b
green open datahubretentionindex_v2_1661860563546                                 3-5qwSrfSPGGiH2x9wtSbg 1 1     0     0    416b    208b
green open graph_service_v1                                                       rcEIo52WT3GR2s8pthhOdg 1 1  5776     2   1.3mb 678.5kb
green open .tasks                                                                 iZip7NxVTuCxgoLC0_CpOg 1 1   225     0 169.7kb  84.8kb
green open dataflowindex_v2_1661860601418                                         sudwCTxVQ6C7A-I3O_eH7g 1 1     1     0  25.6kb  12.8kb
green open testindex_v2_1661860579720                                             SvHrlnEuQKup3DKjm4oM3Q 1 1     0     0    416b    208b
green open mlmodeldeploymentindex_v2_1661860558114                                ZhpDuOGxTpeFk5yrCWqCKQ 1 1     0     0    416b    208b
green open datahubaccesstokenindex_v2_1661860451498                               rxAfLR7JTZiX3Daw4IvSxA 1 1     0     0    416b    208b
green open datahubexecutionrequestindex_v2_1661860491196                          9SXzyEZKTZi-uMNmjFqeVQ 1 1  1457    21 440.9kb   212kb
green open dataprocessinstance_dataprocessinstanceruneventaspect_v1_1655305168714 TXUsmYC9RbyW9OPBV03VYA 1 1     0     0    416b    208b
green open postindex_v2                                                           hvY8_ViTQk6kRy3Ur7RNLQ 1 1     0     0    416b    208b
green open dataset_assertionruneventaspect_v1                                     inc1cx8PSPq_LrcNCul7wQ 1 1     0     0    416b    208b
green open datahub_usage_event-000279                                             -eMS0G72SNyvTw9vybL7CA 1 1     0     0    416b    208b
green open datahub_usage_event-000278                                             I6tiHjNETWKPSlTiYsGu0g 1 1     0     0    416b    208b
green open dataplatforminstanceindex_v2_1661860516506                             rQAThVVpTl2A5oo9LrQEUg 1 1     0     0    416b    208b
green open tagindex_v2_1661860528365                                              0rBsGeauTSutRXEpLZY9RA 1 1    14     8 148.3kb  74.1kb
green open datahub_usage_event_backup                                             LviArtaRR_qOqhSRoo_MiA 1 1  1083     0   1.6mb 821.8kb
green open datahub_usage_event-000271                                             ZlN5MOA7QYyy1r5BuZi1jA 1 1     0     0    416b    208b
green open datahub_usage_event-000270                                             iFqEmCmjTWWwcFP3EVJc_g 1 1     0     0    416b    208b
green open datahub_usage_event-000273                                             VNhmTKytTt-26fKl1yScYQ 1 1     0     0    416b    208b
green open notebookindex_v2_1661860552689                                         761QzXEiTCibNtujDeC1LQ 1 1     0     0    416b    208b
green open datahub_usage_event-000272                                             cOuM8-czTMCg7st6rwSeBg 1 1     0     0    416b    208b
green open datahub_usage_event-000275                                             PbvkaVbrT9K6sHI8Tt569w 1 1     0     0    416b    208b
green open datahub_usage_event-000274                                             7nZcWa7dSLWIbT-D1ZvcbQ 1 1     0     0    416b    208b
green open datahub_usage_event-000277                                             wh1O8w4lSLy1cA4mBIN8jg 1 1     0     0    416b    208b
green open datahub_usage_event-000276                                             w6PO_wjJR6OXFT5N6fjqSw 1 1     0     0    416b    208b
green open datahub_usage_event-000280                                             W7MGlw12Rd6eYE3l4Qw74g 1 1     0     0    416b    208b
green open datahubupgradeindex_v2_1661860457764                                   LWijtMrbTX-WjU4zgbqXbQ 1 1     0     0    416b    208b
green open dataset_datasetprofileaspect_v1_1655305152728                          uzuNZrbST9WsrxptgpP-8Q 1 1  9719     0   5.3mb   2.6mb
green open datajob_datahubingestioncheckpointaspect_v1_1655305142033              IX2yn5KJSGqHuZQ6XwlAbg 1 1     0     0    416b    208b
green open datahubpolicyindex_v2_1655304529793                                    PVNn6FwoRPy0QlczZTTkRw 1 1     6     0  17.7kb   8.8kb
green open datahub_usage_event-000282                                             7CmaQPAiR1-GTuwin1TbQQ 1 1     0     0    416b    208b
green open datahub_usage_event-000281                                             -YNgy39yQ8Ww6WV-xlfvpA 1 1     0     0    416b    208b
green open datahub_usage_event-000284                                             e6vPnLYrTiW_4Ofgz9_veA 1 1     0     0    416b    208b
green open system_metadata_service_v1_1649957649071                               zjlKP3zuR22O_o9YYDrhsA 1 1 55513  1052   9.9mb   4.9mb
green open datahubpolicyindex_v2_1661860445517                                    yV5VuaDtRtGQRBNgW9JmcA 1 1    16     9 158.1kb  81.5kb
green open datahub_usage_event-000283                                             ucGfShALQVa3820t7pHENQ 1 1     0     0    416b    208b
green open datahub_usage_event-000286                                             ZVeD44uHTTiwBAu0m04AxQ 1 1     0     0    416b    208b
green open datahub_usage_event-000285                                             iAYHikX6SayvrODz6mQRng 1 1     0     0    416b    208b
green open .opendistro-job-scheduler-lock                                         wFrGeN2ySGKez1rbU_UWgg 1 1    60 71520 143.3mb  71.8mb
green open assertion_assertionruneventaspect_v1_1655305147370                     Sz26XSOrTaOPvCBc4YsmKw 1 1     0     0    416b    208b
green open datahub_usage_event-000257                                             yiX0EnCoTViZEZIj8MLUQg 1 1     0     0    416b    208b
green open datahub_usage_event-000256                                             FrlJs7juSS2dqxqtI4OK4Q 1 1     0     0    416b    208b
green open datahub_usage_event-000259                                             x7I_QBSbQeacRtXmWdN4ZA 1 1     0     0    416b    208b
green open datahub_usage_event-000258                                             eWEnh2Q7RPuAtRZYQv5OpA 1 1     0     0    416b    208b
green open datahub_usage_event-000251                                             jLybaWeyTau_FWEBRgJdmA 1 1     0     0    416b    208b
green open chartindex_v2_1661860633899                                            cCzdMxBHS2WJJta5jlg0PQ 1 1  2027     0   1.3mb 689.4kb
green open datahub_usage_event-000250                                             r2S2ZL4sQMy2PlJnj7uetw 1 1     0     0    416b    208b
green open datahub_usage_event-000253                                             7_ptI56-TC2TTOLOCK_UvA 1 1     0     0    416b    208b
green open datajob_datahubingestionrunsummaryaspect_v1_1655305136712              tZIXWMAKSUCohTISYJ-4RQ 1 1     0     0    416b    208b
green open datahub_usage_event-000252                                             R5WuR2IuRGGBNQGdBIBotA 1 1     0     0    416b    208b
green open dataset_operationaspect_v1_1655305163393                               ck13gaAcSHOGYCYxw2b69Q 1 1     0     0    416b    208b
green open datahub_usage_event-000255                                             bKrJK87MTEORAUjglo-y4w 1 1     0     0    416b    208b
green open datahub_usage_event-000254                                             S5GCbrIIR_OmdjxJOdlP9w 1 1     0     0    416b    208b
green open datahubsecretindex_v2_1661860623124                                    DxhtqEYRTfmj1YTSelIl_A 1 1     0     0    416b    208b
green open dataprocessinstanceindex_v2_1661860639329                              UOKyK8CjRBmbl_kqt2g-2A 1 1     0     0    416b    208b
green open mlfeatureindex_v2_1661860595969                                        fl1MkvVlSrep8UKut5v8Og 1 1     0     0    416b    208b
green open datahub_usage_event-000268                                             erkMFtrLTg-yTQu-QYOb5g 1 1     0     0    416b    208b
green open dataprocessindex_v2_1661860471088                                      2dgnztiKTP-3g_RVoIQhxw 1 1     0     0    416b    208b
green open datahub_usage_event-000267                                             8cSfuJULRuCnEgr7XwtRDQ 1 1     0     0    416b    208b
green open datahub_usage_event-000269                                             WhD4ekEFSiKEW7lfZjecQQ 1 1     0     0    416b    208b
green open datasetindex_v2_1661860628516                                          9Ksu8YZgQxO69xqtRg2bEg 1 1  1672  2942  11.3mb   7.9mb
green open mlfeaturetableindex_v2_1661860478914                                   OE4dkaU9SVaXODUOyVKdrA 1 1     0     0    416b    208b
green open datahub_usage_event-000260                                             Nl2NrSdqQh-4qQ9M1leaWQ 1 1     0     0    416b    208b
green open schemafieldindex_v2_1661860522520                                      GAwGuL0KS4G1to3MvvoOmg 1 1     0     0    416b    208b
green open datahub_usage_event-000262                                             6B2Y3hSvRNS7X1JtbMO3DA 1 1     0     0    416b    208b
green open datahub_usage_event-000261                                             BSW7Xvy7Q9usBnorkdKA_g 1 1     0     0    416b    208b
green open datahub_usage_event-000264                                             EzLQKYQgRRS-4nkr_markw 1 1     0     0    416b    208b
green open datahub_usage_event-000263                                             2sf4cYmnTy2HX4bnNmkXtg 1 1     0     0    416b    208b
green open datahubingestionsourceindex_v2_1661860606806                           w7wAQHSRQ266SENrZIeLkA 1 1     5     3  63.6kb  55.8kb
green open datahub_usage_event-000266                                             jtwc3VP3Tmaiw9s84X4nzw 1 1     0     0    416b    208b
green open datahub_usage_event-000265                                             YXhg1Y1OTj6QCcSuSnNbVQ 1 1     0     0    416b    208b
green open dataset_datasetusagestatisticsaspect_v1_1655305158052                  MRecfSbrQ2KIz0AfAuQ1oA 1 1     0     0    416b    208b
green open corpgroupindex_v2_1661860463677                                        _1BFeVZaTNGw17ToMOBunw 1 1   603    18 458.8kb 229.4kb
green open datahubroleindex_v2                                                    GYUfoUvvRQ65Xcmm8cywEg 1 1     3     0  15.6kb   7.8kb
green open datahub_usage_event-000235                                             HqA0hCiCSQ27BHC1IRYSfg 1 1     0     0    416b    208b
green open mlprimarykeyindex_v2_1661860540976                                     vRPcnYHNS0SiUrK9ZFwGbw 1 1     0     0    416b    208b
green open datahub_usage_event-000234                                             JZuhZG-KS8Cw1VV38o5cgg 1 1     0     0    416b    208b
green open datahub_usage_event-000237                                             wiG7rPZ6QN-2lX9LmAlApw 1 1     0     0    416b    208b
green open datahub_usage_event-000236                                             kERWosAFShu1_cMwKpJ5hA 1 1     0     0    416b    208b
green open datahub_usage_event-000239                                             rx2WzcDNQKOiQWyaROy-kQ 1 1     0     0    416b    208b
green open mlmodelindex_v2_1661860585129                                          E6kWoTTWSB6iVrTYj6pjww 1 1     0     0    416b    208b
green open datahub_usage_event-000238                                             sv4NwtyDTT-PopawdGpiCQ 1 1     0     0    416b    208b
green open assertionindex_v2_1661860509015                                        Ddh1t1M_Sj-0YdLjMoQdtQ 1 1     0     0    416b    208b
green open datahub_usage_event-000231                                             mbzSaJbFR4yroEA9RIiSPA 1 1     0     0    416b    208b
green open datahub_usage_event-000230                                             YP6lvurpTr6mFJdjTSwXyA 1 1     0     0    416b    208b
green open datahub_usage_event-000233                                             CwfXqsgQTAKSim94E12SHw 1 1     0     0    416b    208b
green open datahub_usage_event-000232                                             YpfZyqFcSVamXLM1LsA5xQ 1 1     0     0    416b    208b
green open glossarytermindex_v2_1661860534809                                     -dYhw3ePT8KZmSBBAXHtOA 1 1    20     9 384.2kb 192.1kb
green open telemetryindex_v2_1661860617691                                        FwZlVUiJT4ytQenUCx59Sw 1 1     0     0    416b    208b
green open dashboard_dashboardusagestatisticsaspect_v1                            IZu9esQeQB6YNRWWZpd0aw 1 1     0     0    416b    208b
green open datahub_usage_event-000246                                             qKqvQ8TeQMKqw6cGbBMhpw 1 1     0     0    416b    208b
green open dataplatformindex_v2_1661860568990                                     oh507n5HTd6bUAogREup6A 1 1     0     0    416b    208b
green open datahub_usage_event-000245                                             uDSeTQs9SjSejAq9Yg4lxw 1 1     0     0    416b    208b
green open datahub_usage_event-000248                                             8nEhXTUZTVaiVXk_CM3kqQ 1 1     0     0    416b    208b
green open domainindex_v2_1661860612272                                           o2ZsjpHjSROZ327RoNXTAg 1 1     0     0    416b    208b
green open datahub_usage_event-000247                                             J5-OKbMWT62edeGNn_EFpQ 1 1     0     0    416b    208b
green open datahub_usage_event-000249                                             ehqG27NqRlGo7-iuNY4UAA 1 1     0     0    416b    208b
green open glossarynodeindex_v2_1661860590525                                     PkiQw1XJSseefr-kyljXfg 1 1     3     0  46.7kb  23.3kb
green open datajobindex_v2_1661860502981                                          q59VMEo2RmGTU1qnhaYlMg 1 1     0     0    416b    208b
green open datahub_usage_event-000240                                             adir28tXTCKloHEaCDpajA 1 1     0     0    416b    208b
green open datahub_usage_event-000242                                             Bv0ngf9oRNSMqrMtwH1hQA 1 1     0     0    416b    208b
green open datahub_usage_event-000241                                             DieMVo0oQ9CuVAcQRdIRjA 1 1     0     0    416b    208b
green open datahub_usage_event-000244                                             GN_jzK1uQ56YQsEfUS12hw 1 1     0     0    416b    208b
green open datahub_usage_event-000243                                             m10GOBT-RTKvmFBCJUl-gw 1 1     0     0    416b    208b
green open containerindex_v2_1661860439293                                        ER9-al1TTLim3tc3lxm-ng 1 1    18    55 137.6kb   111kb
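The same check can be made a little more direct by asking the cat API for the index names only and counting those that start with a given prefix. This is a rough sketch: count_prefixed is a hypothetical helper name, and the usage line assumes the same unauthenticated localhost:9200 endpoint as the commands above.

```shell
# Count the index names (one per line on stdin) that start with the given prefix.
# count_prefixed is a hypothetical helper name, not part of any existing tooling.
count_prefixed() {
  awk -v p="$1" 'index($1, p) == 1 { n++ } END { print n + 0 }'
}

# Assumed usage on a search node; h=index limits the cat output to the index name column:
#   curl -s 'http://localhost:9200/_cat/indices?h=index' | count_prefixed staging-
```

A count of 0 for staging- corresponds to the empty grep result above.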

Change 904571 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Correct the datahub elasticsearch index prefix for staging

https://gerrit.wikimedia.org/r/904571

Change 904571 merged by jenkins-bot:

[operations/deployment-charts@master] Correct the datahub elasticsearch index prefix for staging

https://gerrit.wikimedia.org/r/904571

Mentioned in SAL (#wikimedia-analytics) [2023-03-31T12:23:56Z] <btullis> deploying datahub to staging T333580

Change 904782 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Bump the main datahub chart version

https://gerrit.wikimedia.org/r/904782

Change 904782 merged by jenkins-bot:

[operations/deployment-charts@master] Bump the main datahub chart version

https://gerrit.wikimedia.org/r/904782

BTullis moved this task from In Progress to Done on the Data-Catalog board.

The INDEX_PREFIX environment variable is now applied to the pods in the staging deployment:

btullis@deploy2002:~$ kube_env datahub staging
btullis@deploy2002:~$ kubectl describe pod datahub-gms-main-646cbdb857-6b44l datahub-frontend-main-78dd5bbcfb-vsqr2 |grep PREFIX
      INDEX_PREFIX:                 staging-
      INDEX_PREFIX:                        staging-
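For a repeatable version of this check, the describe output can be piped through a small filter that succeeds only when every INDEX_PREFIX value matches the expected prefix. This is a sketch under assumptions: prefix_values and all_prefixed are hypothetical helper names, and the parsing relies on the "NAME: value" layout of the kubectl describe output shown above.

```shell
# Print the value of every INDEX_PREFIX line found on stdin
# (expects the "INDEX_PREFIX:   value" layout of kubectl describe).
prefix_values() {
  awk '$1 == "INDEX_PREFIX:" { print $2 }'
}

# Succeed only if at least one INDEX_PREFIX value is present on stdin
# and every value equals the expected prefix given as the first argument.
all_prefixed() {
  expected="$1"
  values=$(prefix_values)
  [ -n "$values" ] || return 1
  for v in $values; do
    [ "$v" = "$expected" ] || return 1
  done
}

# Assumed usage, with placeholder pod names:
#   kubectl describe pod <gms-pod> <frontend-pod> | all_prefixed staging- && echo OK
```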

I think that the staging database and search indices are now empty, which is OK. I can go ahead and try a test ingestion.