We talked about this at our Wednesday meeting and decided it's going to be a mid-sized investment to get this working. We need to pick a new unique id (plausibly log_id, but that needs verification), and then migrate to the new ids. We've never done a migration of doc ids, so while we have some ideas, it will need further exploration and evaluation to determine how the change can be done in production without disabling archive search while it is in progress.
Yesterday
In T420239#11787578, @MidnightLG wrote: Thanks for the fix!
Will it be backported to 1.45.x and released sooner than 1.46?
Mon, Apr 20
I don't believe I've used the event_sanitized tables either. We do use some of the data beyond 90 days, but that's in a separate rollup table. It should be safe, afaik, to drop searchsatisfaction from the event_sanitized database.
Wed, Apr 1
By chance are you using PostgreSQL? SearchHighlighter::highlightSimple is documented as using the result of SearchDatabase::regexTerm. That looks to be applied in sqlite and mysql, but I suspect it is not being applied in the postgresql context.
The patch is not 100% related, but also addresses this issue as part of updating the messages posted to gerrit.
Tue, Mar 31
For Integrated Technology Group (ITG), the problem looks to be that log_page is 0, but we use log_page as the unique id of the page. There is a relevant ar_page_id in the archive table, but for reasons I don't remember the archive indexing works off the logging table, not off the archive table. These particular rows are from 2014; querying enwiki shows there are 0 delete logs since Jan 1, 2026 with log_type='delete' and log_action='delete' and log_page = 0, making me suspect this is a historical artifact. We could potentially change ForceSearchIndex to recognize log_page = 0 and try to look the page up in the archive table.
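To make the proposal concrete, here is a rough sketch of the fallback, in Python rather than the actual PHP maintenance script, using an illustrative DB-API cursor: when log_page is 0, recover the page id from the archive table via namespace/title.

```
def resolve_page_id(log_row, cursor):
    """Return the page id for a delete-log row, falling back to the archive
    table when log_page is 0 (the historical rows described above)."""
    if log_row["log_page"] != 0:
        return log_row["log_page"]
    # ar_page_id records the id the page had before deletion; match on the
    # same namespace/title the logging row carries.
    cursor.execute(
        "SELECT ar_page_id FROM archive"
        " WHERE ar_namespace = %s AND ar_title = %s"
        " ORDER BY ar_timestamp DESC LIMIT 1",
        (log_row["log_namespace"], log_row["log_title"]),
    )
    row = cursor.fetchone()
    return row[0] if row else None
```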
Mon, Mar 30
One change the integration test suite found:
This could have a better error message. What happens is that no junit logs were created, which is where the count comes from, but the pass/fail comes from the return code of running the tests. We could at least have a better message. The 0-failures case seems to happen when docker gets wedged and refuses to bring up new containers.
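A minimal sketch of what a clearer summary could look like, assuming a runner that gathers junit XML reports and the tests' exit code (names and layout here are hypothetical):

```
import glob
import xml.etree.ElementTree as ET

def summarize_run(junit_dir, exit_code):
    """Combine junit-derived failure counts with the runner's exit code and
    call out the confusing 'failed, but 0 failures' state explicitly."""
    reports = glob.glob(f"{junit_dir}/*.xml")
    failures = 0
    for path in reports:
        root = ET.parse(path).getroot()
        for suite in root.iter("testsuite"):
            failures += int(suite.get("failures", 0))
    if not reports and exit_code != 0:
        # No junit logs were produced (e.g. docker never brought the
        # containers up), yet the run itself failed.
        return "Run failed before any tests executed (no junit reports found)"
    status = "PASS" if exit_code == 0 else "FAIL"
    return f"{status}: {failures} failures across {len(reports)} report(s)"
```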
Mon, Mar 23
It looks like the search components are behaving as expected; the issue is in an external add-on and will need to be addressed in that add-on.
Mar 17 2026
The problem was traced to null timestamps coming out of query_clicks_hourly. This was due to an overly specific format specifier and the source data adding millisecond precision to the timestamp. Timestamp conversion was changed to a more permissive conversion. The last three months of query_clicks_hourly and query_clicks_daily were backfilled. The mjolnir dag was unpaused and completed a run.
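For illustration only (the pipeline's real format strings aren't shown here), the failure mode and the more permissive fix look roughly like this:

```
from datetime import datetime

def parse_ts(value):
    """Accept timestamps with or without fractional seconds instead of
    producing nulls when the strict pattern stops matching."""
    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {value!r}")

# A specifier that only accepts the second form turns rows like the first
# into nulls once the source starts emitting millisecond precision.
print(parse_ts("2026-03-17 12:00:00.123"))
print(parse_ts("2026-03-17 12:00:00"))
```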
Mar 10 2026
Had a bit of time to start looking into this; some findings:
Checked with some people that have admin rights; it is still returning different results for each query. The cased query returns 4 results, one of which is the lower-cased variant. The uncased query returns 2 results, one cased and one uncased. Curiously, of the two results that go missing, one is an exact match other than casing.
I poked through the code and tested this locally, but I wasn't able to reproduce it. Potentially the problem has been fixed over the last several years, or maybe it requires more specific conditions to trigger. I don't have rights to test on enwiki directly; I would appreciate it if someone could re-verify the links in the ticket.
Mar 9 2026
The index/memory ratio has been a bit vague; to be more concrete:
Thanks for the report! It looks like your fix should do the trick; I've put it up in gerrit for review.
This has expanded a bit; it now also handles roles and role groups, since we need to set permissions such that the incoming requests can execute msearch and load models.
@Reedy It looks like in 1.42 the vendor/ directory was included, but in 1.43-1.45 it was not. This feels like a change in the way MediaWiki is packaged, but I didn't find anything in the release notes. Any ideas?
Mar 6 2026
There are probably still a number of rough edges, but this is generally working now: frwiki example
Mar 5 2026
Potentially related: T418976. Not certain, but that ticket involved changes to the helm bits that serve qwen3 and had deployments today.
I put together a self-contained .html page that will issue the queries that are executed for two different deepcat queries and report on the differences in the categories that will be included/excluded:
If we adjust the second query to exclude English-language SVG maps of the world instead of English-language SVG maps, we get matching result counts of 1086 for both:
I finally had a chance to dig into this one. As far as I can tell, English-language SVG maps is not excluded in the first query, but is explicitly added as an exclusion in the second query. So the result discrepancy is likely due to this addition.
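At its core the report from the page above is a set difference over the category filters each query expands to; a toy sketch with made-up category sets, only to show the shape of the comparison:

```
# Hypothetical expansions; the real page pulls the category lists from the
# queries actually sent to the backend.
first_query = {"SVG maps", "SVG maps of the world"}
second_query = {"SVG maps", "SVG maps of the world", "English-language SVG maps"}

print("excluded only by the second query:", sorted(second_query - first_query))
print("excluded only by the first query:", sorted(first_query - second_query))
```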
Mar 4 2026
I was thinking of access not as solving the current issue, since we have a plan forward for that, but more as addressing possibilities on a longer-term basis. It seems like once or twice a year I run into something that would go more easily if I had more access. I see from the puppet data.yaml file that we have a couple of engineers with ops access, but very few. This isn't the first time the question of ops-level access has come up, but in the past I've pushed off requesting access as it seemed not strictly necessary. It's still not strictly necessary, but I'm leaning towards it easing some of the work I do. The full solutions, like the readahead support being set up now, would still be the end state we would be looking for, but the additional access would make it easier to figure out where these things need to be before the full solution is ready to be deployed.
Mar 2 2026
First tests with the full frwiki semantic search dataset showed high latency and significant ceph IO at ~4GB/sec. This appears to be a problem with readahead on the ceph-backed storage system. It defaults to 8MB, which is far too much for the random-access nature of knn search.
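For reference, the knob in question can be inspected and lowered through sysfs; a sketch assuming a hypothetical rbd device name and an example target value (the right value still needs evaluation):

```
from pathlib import Path

# Hypothetical device name; the actual hosts expose ceph-backed (rbd) block devices.
ra_path = Path("/sys/block/rbd0/queue/read_ahead_kb")

print("current readahead KiB:", ra_path.read_text().strip())  # 8192 KiB == the 8MB default

# For random-access kNN reads a much smaller window avoids pulling in megabytes
# of unrelated data per small read. 128 KiB is only an example; writing requires root.
ra_path.write_text("128")
```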