Try defining the configuration as follows. Likely we need to improve the documentation in this area.
Fri, Apr 19
Only seeing timeouts against php-1.33.0-wmf.25, nothing against 1.34-wmf.1 yet. Should let this run for a week before declaring victory on the timeouts though.
Thu, Apr 18
A few variations that might be useful to test (using gor middleware to modify the queries; a rough sketch of such a middleware follows the list). These would mostly inform our options for reducing server load if necessary for incident response:
- Reduce the LTR rescore window
- Remove the LTR rescore entirely
- Reduce the popularity rescore window
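A minimal sketch of the kind of middleware this implies. The payload framing follows GoReplay's documented middleware protocol (hex-encoded messages on stdin/stdout); the JSON field names (rescore / window_size) are the standard Elasticsearch rescore syntax and may not match the CirrusSearch payload exactly:

```
#!/usr/bin/env python3
import json
import sys

def shrink_rescore_window(http: bytes) -> bytes:
    head, sep, body = http.partition(b'\r\n\r\n')
    if not sep:
        return http
    try:
        query = json.loads(body)
    except ValueError:
        return http  # not a JSON search body; pass through untouched
    rescores = query.get('rescore')
    if not rescores:
        return http
    if isinstance(rescores, dict):
        rescores = [rescores]
    for rescore in rescores:
        if 'window_size' in rescore:
            # halve the window as a load-reduction experiment
            rescore['window_size'] = max(1, rescore['window_size'] // 2)
    new_body = json.dumps(query).encode()
    # keep Content-Length consistent with the rewritten body
    headers = [h for h in head.split(b'\r\n')
               if not h.lower().startswith(b'content-length:')]
    headers.append(b'Content-Length: ' + str(len(new_body)).encode())
    return b'\r\n'.join(headers) + b'\r\n\r\n' + new_body

for line in sys.stdin:
    payload = bytes.fromhex(line.strip())
    header, _, http = payload.partition(b'\n')
    if header.startswith(b'1'):  # type 1 marks an intercepted request
        http = shrink_rescore_window(http)
    sys.stdout.write((header + b'\n' + http).hex() + '\n')
    sys.stdout.flush()
```

Removing the LTR rescore entirely would be the same shape of change, dropping the rescore clause instead of shrinking its window.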
Wed, Apr 17
Sorting functionality deployed and appears to be returning correct results now. Some followup will be needed for UI elements, probably in T197525
Example query now returns appropriate results. It seems the processes involved here are all working as intended, so I'm calling this complete.
Tue, Apr 16
While not fully documented, the results of previous load testing rounds and the methodology used are described here:
Looks like because insource uses a different connection than the standard one (a mitigation for the cluster overloads over the weekend), the attempt to source the last sent request from the connection fails. Will need to get the right connection object into ElasticsearchIntermediary::multiFailure()
Mon, Apr 15
Looked into a few angles but nothing conclusive:
Sun, Apr 14
I don't know that it's necessarily related, but I noticed that full text qps is up in the last month. Over the last year we've been pretty consistent between 400-500 qps, but since late March we've been at 550-650 or so.
Patch does not fix the overall problem; it fixes the per-node percentiles data collection, which usually helps in tracking down these kinds of problems.
A previous time this happened we added some new metrics endpoints inside elasticsearch and started logging them to prometheus, collecting per-node latency metrics based on stats buckets we provide at query time. Unfortunately the prometheus graphs seem empty. We should also see how to get these back; they would potentially help.
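For reference, tagging a query with a stats bucket and reading the per-group counters back looks roughly like the sketch below, using the stock Elasticsearch stats-group mechanism via elasticsearch-py. The index and group names are made up, and our actual setup layered custom per-node endpoints on top of this:

```
from elasticsearch import Elasticsearch

es = Elasticsearch()

# Tag the search so its latency accumulates under the 'full_text' group.
es.search(index='enwiki_content',
          body={'query': {'match_all': {}}, 'stats': ['full_text']})

# Per-group totals are then available from the index stats API.
stats = es.indices.stats(index='enwiki_content', groups='full_text')
group = stats['_all']['total']['search']['groups']['full_text']
print(group['query_total'], group['query_time_in_millis'])
```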
Fri, Apr 12
As a very rough comparison, I pulled sum(irate(elasticsearch_indices_search_query_total[5m])) from prometheus, which gives the total shard queries executed per second across the cluster. We vary between about 12k and 21k shard queries per second, or roughly 720k to 1.26M per minute. This at least puts the volume of requests discussed here in the plausible range.
Thu, Apr 11
Should we roll back the addition to the shared build until this can be resolved?
Related workaround: https://gerrit.wikimedia.org/r/#/c/mediawiki/vendor/+/503068/
Not expected, although it's hard to say what the error is. GeoData error handling needs to be updated to log whatever response it didn't like
Wed, Apr 10
Tue, Apr 9
Created a horrible first draft that lists most of the properties and provides a short description for the ones used across most wikis. We should figure out how we want to format this before going much further:
This is an intentional feature added by the people behind AdvancedSearch. The high-level goal there is for the URL to represent what is being searched. In particular, if a user has a set of namespaces saved as their default search namespaces, their search URLs would otherwise not be shareable. The exact implementation details are debatable, but the overall goal is reasonable. See T217445 for more details; discussion of the feature should likely also happen there.
Mon, Apr 8
- Can WikibaseCirrusSearch easily be updated to support the above queries?
Actually, there might be a minimum delay hardcoded into the deleteByQuery code. Will write up something to scale the delay up over time, from perhaps 100ms to 5s; a rough sketch follows.
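A minimal sketch of that kind of growing delay. The 100ms floor and 5s ceiling come from the note above; the doubling factor, attempt cap, and the operation hook are assumptions:

```
import time

def growing_delays(initial=0.1, ceiling=5.0, factor=2.0):
    """Yield retry delays that grow from `initial` toward `ceiling` seconds."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, ceiling)

def run_with_backoff(operation, max_attempts=8):
    """Retry `operation` (a callable returning True on success) with growing delays."""
    delays = growing_delays()
    for _ in range(max_attempts):
        if operation():
            return True
        time.sleep(next(delays))
    return False
```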
Some new 1200s timeouts from jobs in Task.php came up today: https://logstash.wikimedia.org/goto/314bfac86ad15374a5cd8223f8867cbd
If being out of sync isn't a big deal, it seems the simplest and most direct way to resolve this is to set conflicts=proceed and let the jobs continue deleting instead of failing the delete-by-query.
Fri, Apr 5
After further review the inf loss wasn't actually a problem; that was just hyperopt reporting before any training runs had completed.
Doesn't seem to be needed anymore, feel free to start moving this to a more production configuration.
Thu, Apr 4
Synthetic benchmarks of runtime performance of CNN training in images/sec between CPU and WX9100. This essentially confirms what we already know: even a GPU that is not top of the line is an order of magnitude faster than training on CPU. Distributed training isn't a linear speedup, so it would likely take a significant portion of the hadoop cluster to achieve the same runtime performance as a single GPU. It's good to get verification that the GPU is mostly working in this configuration. Note also that the current case can only fit a single GPU, but ideally future hardware would be purchased with the ability to fit at least 2 cards, or possibly 4, in a single server.
That means no, miopen-opencl functionality is not supported within TF.
Something that would also need to be investigated: queries only return documents that have been refreshed (on 5s intervals). I suspect that documents that have been written to elasticsearch but not yet refreshed would not be deleted by a delete-by-query in that timespan. At a high level, read-your-writes is not guaranteed, or even expected, in elasticsearch, as it is eventually consistent.
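To make the concern concrete, a hedged sketch with elasticsearch-py; the index name, document, and query are all made up:

```
from elasticsearch import Elasticsearch

es = Elasticsearch()

# Freshly indexed document; not visible to searches until the next refresh.
es.index(index='ttm-test', body={'wiki': 'testwiki', 'text': 'stale?'})

# A delete-by-query issued right away only matches refreshed documents,
# so the document above can survive this call:
es.delete_by_query(index='ttm-test',
                   body={'query': {'term': {'wiki': 'testwiki'}}})

# Forcing a refresh first would make it visible, at the cost of refresh load:
es.indices.refresh(index='ttm-test')
```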
While related to search, someone from mobile frontend will likely need to take a look at this. Will leave it to them to triage the priority.
As insource is special functionality of CirrusSearch, and the search UI is all in core, this is unfortunately not a 5-line patch. Not sure of the best route to an implementation. There is also a design question about clutter in the UI that I'm not an expert on.
With the changes in packages, trying to run any model now returns:
Still not seeing any job runner timeouts that are obviously related to this since the last one at 2019-03-25T22:52:37. Should still probably leave this task around for a little while to check into this a few more times.
Wed, Apr 3
Not timeouts, but seeing a few delete-by-query failures in the logs now. These appear to be due to version conflicts; we can set conflicts=proceed to at least let the delete-by-query complete rather than abort mid-delete. This might require some input from @Nikerabbit and @abi_ regarding what is appropriate here. Basically what is happening is that during the delete-by-query operation some document that was supposed to be deleted was updated. What should happen in that case?
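For reference, the proposed mitigation looks roughly like this with elasticsearch-py; the index name and query are placeholders:

```
from elasticsearch import Elasticsearch

es = Elasticsearch()

# With conflicts='proceed' a version conflict is counted and skipped
# instead of aborting the whole delete-by-query (the default behaviour).
resp = es.delete_by_query(
    index='ttmserver',  # placeholder index name
    body={'query': {'term': {'wiki': 'examplewiki'}}},
    conflicts='proceed',
)
print(resp['deleted'], resp['version_conflicts'])
```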
All results for the demo query now show one highlighted thing, so that is progress. Some items still don't show a snippet, but clicking through to the result page I'm not sure what could have been displayed in the snippet anyway. The initial goal seems to be complete.