User Details
- User Since
- Oct 7 2014, 4:49 PM (608 w, 4 d)
- Availability
- Available
- LDAP User
- EBernhardson
- MediaWiki User
- EBernhardson (WMF)
Thu, Jun 4
Wed, Jun 3
Tue, Jun 2
Fri, May 29
First round of defining some user stories to guide potential implementation:
Thu, May 28
The test is rolled out, but the fix only applies to wmf.4. When analyzing the results we must ignore all events prior to May 29th.
Wed, May 27
Tue, May 26
Thu, May 21
Next steps:
- Release a new version of the search-extra plugin for 2.x
- Update the plugins .deb
- Update docker-registry.wikimedia.org/repos/search-platform/cirrussearch-opensearch-image
Tue, May 19
Mon, May 18
The AB test shows a significant fraction (~40% of enwiki sessions) receiving multiple AB test buckets. Joining search IDs from the AB test against our backend logging confirmed that we did use different buckets on different requests. The logs contain all the input data to bucketing except the x-forwarded-for header. Patch above is scheduled for deployment today and should get us enough data to understand where things went wrong.
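The check described above, finding sessions that were assigned more than one bucket, can be sketched roughly as follows. This is only an illustration: the `(session_id, bucket)` event shape, the function names, and the sample data are hypothetical and do not reflect the actual logging schema.

```python
from collections import defaultdict


def sessions_with_multiple_buckets(events):
    """Map each session id that saw more than one bucket to its set of buckets.

    `events` is an iterable of (session_id, bucket) pairs (hypothetical shape).
    """
    buckets_by_session = defaultdict(set)
    for session_id, bucket in events:
        buckets_by_session[session_id].add(bucket)
    return {s: b for s, b in buckets_by_session.items() if len(b) > 1}


def multi_bucket_fraction(events):
    """Fraction of distinct sessions that received more than one bucket."""
    events = list(events)
    all_sessions = {session_id for session_id, _ in events}
    if not all_sessions:
        return 0.0
    return len(sessions_with_multiple_buckets(events)) / len(all_sessions)


# Made-up sample: session "s1" was bucketed inconsistently.
sample_events = [
    ("s1", "control"),
    ("s1", "test"),
    ("s2", "control"),
    ("s2", "control"),
]
print(sessions_with_multiple_buckets(sample_events))
print(multi_bucket_fraction(sample_events))
```

Grouping by session first keeps the check independent of event ordering, which matters when frontend and backend logs are joined after the fact.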
Wed, May 13
Tue, May 12
Mon, May 11
Went down a few wrong paths, but should have a fix ready for this today or tomorrow. A quick summary:
May 7 2026
Graph for (abc|def)xyz
May 6 2026
May 5 2026
>>! In T424820#11888501, @HakanIST wrote:
I looked into this locally and it seems like null-ngram transitions from wildcard ranges are consuming the maxTransitions budget (max_ngrams_extracted=100), which may cause trigrams like wev from DFA fallback paths instead of valid ones. I tested a small change in NGramAutomaton.traceRemainingStates() that only counts trigram-producing transitions toward the limit, and it appears to fix the issue while passing all existing tests.
May 4 2026
Research has decided to switch to a different model, we can reopen a new task once we have a full handle on the specific models we will use in the future.
Apr 29 2026
I started looking over this, unfortunately it's a bit complex. Something appears to be going awry during regexp query acceleration. We essentially rewrite something like Clover into a boolean trigram expression like (clo AND lov AND ove AND ver), while also handling regexp syntax. This tends to generate much longer expressions than necessary, so there is also a step that simplifies the boolean expression. Plugging this query into our test suite, it appears we are generating some nonsensical trigrams from the .* portion, and the simplify stage then happens to pick those. I haven't pinned it down exactly yet, but I suspect the query variants that work do so because the simplification stage chooses different trigrams.
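For the plain-literal case, the trigram rewrite mentioned above can be illustrated with a minimal sketch. It covers only a literal string with no regexp syntax and no simplification step; the function names are hypothetical, and lowercasing is an assumption taken from the Clover example rather than a statement about how the plugin normalizes text.

```python
def literal_to_trigrams(text: str) -> list:
    """Return the overlapping trigrams of a literal string, in order.

    Lowercasing is an assumption based on the Clover -> clo example.
    """
    lowered = text.lower()
    return [lowered[i:i + 3] for i in range(len(lowered) - 2)]


def trigram_expression(text: str) -> str:
    """Render the trigrams of a literal as a boolean AND expression."""
    trigrams = literal_to_trigrams(text)
    if not trigrams:
        # Fewer than three characters: no trigram can narrow the candidates.
        return "(match all)"
    return "(" + " AND ".join(trigrams) + ")"


print(trigram_expression("Clover"))  # (clo AND lov AND ove AND ver)
```

Every document matching the literal must contain all of these trigrams, so the AND expression can only pre-filter candidates; the regexp itself still has to be run against whatever survives.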
Apr 28 2026
After going through all the pages, I don't have all the search results, so I need to do multiple runs through the pages to get them all.
Apr 27 2026
This got backburnered, but on review it looks like everything necessary is in place.
Apr 23 2026
Apr 21 2026
We talked about this at our Wednesday meeting and decided it's going to be a mid-sized investment to get this working. We need to pick a new unique id (plausibly log_id, but needs verification), then we would need to migrate to the new ids. We've never done a migration of doc ids, so while we have some ideas, it will need further exploration and evaluation to determine how that change can be done in production without disabling archive search while the change is in progress.
Apr 20 2026
I don't believe I've used the event_sanitized tables either. We do use some of the data beyond 90 days, but that's in a separate rollup table. It should be safe, afaik, to drop searchsatisfaction from the event_sanitized database.
Apr 2 2026
Apr 1 2026
By chance, are you using postgresql? SearchHighlighter::highlightSimple is documented as using the result of SearchDatabase::regexTerm. That looks to be applied in sqlite and mysql, but I suspect it is not being applied in the postgresql context.
The patch is not 100% related, but also addresses this issue as part of updating the messages posted to gerrit.
Mar 31 2026
For Integrated Technology Group (ITG) the problem looks to be that log_page is 0, but we use log_page as the unique id of the page. There is a relevant ar_page_id in the archive table, but for reasons I don't remember the archive indexing works off the logging table, not the archive table. These particular rows are from 2014. Querying enwiki shows 0 delete logs since Jan 1 2026 with log_type='delete' and log_action='delete' and log_page = 0, which makes me suspect this is a historical artifact. We could potentially change ForceSearchIndex to recognize log_page = 0 and try to look the id up in the archive table.
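The fallback idea at the end of that comment could look roughly like the sketch below. It is not ForceSearchIndex code: the function name, the dict-based rows, the sample ids, and matching archive rows by (namespace, title) are all assumptions made for illustration.

```python
from typing import Optional


def resolve_page_id(log_row: dict, archive_page_ids: dict) -> Optional[int]:
    """Return a usable page id for a delete log row, or None if none is known.

    `archive_page_ids` maps (namespace, title) to ar_page_id. Looking rows up
    this way is an assumption, not how the maintenance script works today.
    """
    if log_row["log_page"] != 0:
        return log_row["log_page"]
    # log_page = 0: fall back to the archive table lookup.
    key = (log_row["log_namespace"], log_row["log_title"])
    return archive_page_ids.get(key)


# Made-up rows and ids, for illustration only.
archive_page_ids = {(0, "Integrated_Technology_Group"): 12345}
old_row = {"log_page": 0, "log_namespace": 0,
           "log_title": "Integrated_Technology_Group"}
new_row = {"log_page": 777, "log_namespace": 0, "log_title": "Some_Page"}

print(resolve_page_id(old_row, archive_page_ids))
print(resolve_page_id(new_row, archive_page_ids))
```

Returning None when neither source has an id leaves it to the caller to decide whether to skip the row or report it.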
Mar 30 2026
One change the integration test suite found:
This could have a better error message. What happens is that no junit logs were created, which is where the failure count comes from, while the pass/fail comes from the return code of running the tests. The 0 failures case seems to happen when docker gets wedged and refuses to bring up new containers.
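One way the reporting could tell these cases apart is sketched below. The function name, the message wording, and the input shape (one failure count per junit log found) are hypothetical and not taken from the actual CI scripts.

```python
def describe_test_run(return_code: int, junit_failure_counts: list) -> str:
    """Build a summary message from the test return code and junit logs.

    `junit_failure_counts` holds one failure count per junit log found;
    an empty list means no junit logs were created at all.
    """
    if not junit_failure_counts:
        if return_code == 0:
            return "Tests exited cleanly, but no junit logs were found."
        # Avoid reporting a misleading "0 failures" when nothing ran.
        return (
            f"Test run failed with exit code {return_code} and produced no "
            "junit logs; the tests may never have started."
        )
    failures = sum(junit_failure_counts)
    status = "passed" if return_code == 0 else "failed"
    return f"Tests {status} with {failures} failure(s)."


print(describe_test_run(1, []))
print(describe_test_run(1, [2, 0]))
```

Treating "no logs" as its own case keeps the count and the pass/fail status from contradicting each other in the posted message.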
Mar 26 2026
Mar 23 2026
It looks like the search components are doing as expected; the issue is in an external add-on and will need to be addressed there.