User Details
- User Since
- Jun 9 2015, 9:03 AM (505 w, 2 d)
- Availability
- Available
- IRC Nick
- dcausse
- LDAP User
- DCausse
- MediaWiki User
- DCausse (WMF)
Yesterday
Wed, Feb 12
@Ladsgroup thanks for the quick fix! The scripts appear to have run properly this time, all alerts related to categories lag resolved today.
Tue, Feb 11
@Isaac awesome, thanks! I'll get something ready this week and report back here.
Please detach 'DCausse' from SUL, rename it to 'DCausse (WMF)', and reattach to SUL.
@brouberol sorry, I forgot to link T385005 from here; the plugins should be ready to be packaged in https://gerrit.wikimedia.org/r/c/operations/software/opensearch/plugins/+/1118553
stconvert has its own 1.3.x branch at https://gitlab.wikimedia.org/repos/search-platform/opensearch-analysis-stconvert/-/tree/1.3.x?ref_type=heads
@MolecularPilot have you considered using the intitle: search keyword for this use case? (cf. https://www.mediawiki.org/wiki/Help:CirrusSearch#Intitle_and_incategory)
@Isaac we have some utilities to push weighted tags into the search index from a spark job. 43M is a lot, and we should be careful not to slow down real-time indexing while doing so. I'll prepare something and update the weighted tags documentation so that we can easily re-use it for future work.
Unfortunately our system requires a bit more info than what you have in your CSV:
- namespace_id
- page_title
- page_id
Mon, Feb 10
Related to T270033 and https://gerrit.wikimedia.org/r/c/mediawiki/core/+/667358 specifically I think
Thu, Feb 6
Tue, Feb 4
Wed, Jan 29
Tue, Jan 28
I'm not sure we can safely change the schema_title of an existing stream, so we might have to create separate streams and adapt the pipeline to consume from multiple update streams.
Mon, Jan 27
A dataset to fix missing suggestions has been imported and I confirm seeing 89k of them on eswiki.
The dataset has been imported and I confirm seeing 89k suggestions on eswiki.
page_title and page_id should already be part of the mediawiki.page_change.v1 events. IIRC the outlink model seems to have access to the whole event; could the same technique be used here to avoid fetching something additional via the mw-api?
Thu, Jan 23
Tue, Jan 21
Starting from MW 1.44.0-wmf.14 CirrusSearch should no longer push any metrics to graphite.
Mon, Jan 20
Fri, Jan 17
Wed, Jan 15
Jan 14 2025
Re-opening because the approach taken might not work if the duplicates are spread across the list of endpoints; please see T374021#10458231 for a possible solution.
Jan 13 2025
@Michael I think this is relatively stable now. Since search does not own the individual sources of tags, I think it might be better to have more fine-grained alerts (per tag?) on your side if you want. On our side I might set a very broad alert to capture only obvious problems (i.e. no tags updated in the last hour), but it might not cover failures specific to a particular tag.
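To illustrate the difference between the broad alert and per-tag alerts, here is a minimal sketch. It assumes we have, for each tag, the timestamp of its last update; the function names and the tag names in the usage below are hypothetical and not part of the actual alerting setup.

```python
from datetime import datetime, timedelta, timezone


def stale_tags(last_update, now, max_age=timedelta(hours=1)):
    """Per-tag check: return the sorted tags whose last update is older than max_age."""
    return sorted(tag for tag, ts in last_update.items() if now - ts > max_age)


def broad_alert(last_update, now, max_age=timedelta(hours=1)):
    """Broad check: fire only when no tag at all was updated within max_age."""
    return all(now - ts > max_age for ts in last_update.values())


# Hypothetical data: one tag is fresh, one has been silent for three hours.
now = datetime(2025, 1, 13, 12, 0, tzinfo=timezone.utc)
updates = {
    "tag.a": now - timedelta(minutes=5),
    "tag.b": now - timedelta(hours=3),
}
print(stale_tags(updates, now))   # the per-tag check reports tag.b
print(broad_alert(updates, now))  # the broad check stays silent
```

With this data the broad check does not fire because one tag is still being updated, which is exactly the kind of tag-specific failure it would miss.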
@Jdforrester-WMF ack, thanks.
@DMartin-WMF regarding inlanguage:en also matching Q1860, I'm not sure about the implementation details, and it might be possible that CirrusSearch would have to do some lookups too (or have a map defined in its config) if it provided such a feature. First I wanted to know if re-using inlanguage would be OK, since the description explicitly asked for a new keyword haslang.
Jan 10 2025
we have this data in relforge so I don't think we'll need this in druid, but feel free to re-open if we believe it might still be useful.
this dataset is cleaned by airflow now
Jan 9 2025
Updated https://superset.wikimedia.org/superset/dashboard/530 with minimal info in the Fulltext Abandonment chart. If I understood the metric properly, only ~1% of sessions get a virtual PV without a click. I don't know if this is enough to consider it a meaningful interaction, nor if these could be considered successful sessions.
It is not entirely obvious how to integrate this data in the current charts of the dashboard. It might make sense to have a "Fulltext Engagement" section where we could graph the various ways users interact with the SERP.
This ticket describes a problem in blazegraph's query planner, but I believe that you are interested in the impact of the split on Wikibase-Quality-Constraints; please see T355298 for this.
Regarding this particular query: as explained in Internal_Federation_Guide, property paths have to be made explicit if one link crosses multiple subgraphs. In this case the link ?entity wdt:P31 ?class might indeed cross subgraphs.
But for Quality-Constraints my understanding is that this query has evolved to only check the class hierarchy and is now more like:
ASK { ?classOfTheEntity wdt:P279* wd:Q386724. hint:Prior hint:gearing "forward". }
Where ?classOfTheEntity is injected from the PHP side (which is good, Wikibase-Quality-Constraints should not assume that the data of the entity it is checking is already available in the sparql backend).
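As an illustration of what "injected" means here, a minimal sketch of building such a query from a class ID known to the caller. This is Python for brevity (the real code is on the PHP side), and the helper name and the ID validation are assumptions, not what Wikibase-Quality-Constraints actually does.

```python
import re

# Item IDs look like Q123; validating them keeps arbitrary text out of the query.
_ITEM_ID = re.compile(r"Q[1-9][0-9]*")


def build_subclass_ask(class_id: str, target_id: str = "Q386724") -> str:
    """Return an ASK query testing whether class_id is a (transitive) subclass of target_id."""
    for item_id in (class_id, target_id):
        if not _ITEM_ID.fullmatch(item_id):
            raise ValueError(f"not an item id: {item_id!r}")
    return (
        f"ASK {{ wd:{class_id} wdt:P279* wd:{target_id}. "
        'hint:Prior hint:gearing "forward". }'
    )


# Q5 is only an example value for the class of the entity being checked.
print(build_subclass_ask("Q5"))
```

The query never references the entity itself, so it does not depend on the entity's data being present in the sparql backend.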
Jan 8 2025
@MSantos I can certainly attempt to write a patch and continue the discussion in gerrit, but before doing so I wanted to know:
- is it on purpose that the fragment is omitted when building the Special:GoToInterwiki link from Title#getFullUrlForRedirect()?
- if not, would there be any objections to propagating the Title fragment when building the Special:GoToInterwiki link, and are there any special considerations to take into account when doing so?
Linking T379288 since we might explore security features too for the (upcoming) WMF internal opensearch cluster used for search.
The indirection via Special:GoToInterwiki was added in T122209: Special:Search allows redirects to any interwiki link.
It's not clear to me whether omitting the fragment was intentional. I believe that it might be possible to propagate it, but I'm unsure of the consequences.
In the case of local interwiki I think the fragment will be propagated by the browser (if not specified in the Location Header): https://de.wikipedia.org/wiki/Special:GoToInterwiki/en:Wikipedia#History (works for me with firefox)
In the case of external interwiki I think we might have to adapt the splash page to reconstruct it in the link presented to the user (via javascript?)
Special consideration might have to be made for external interwiki links that already declare a fragment, e.g. https://en.wikipedia.org/wiki/Special:GoToInterwiki/mixnmatch:collection where it might be impossible to propagate the user provided fragment.
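A minimal sketch of the propagation rule discussed above, assuming we simply keep the target URL untouched when it already declares a fragment. The function name is hypothetical, and the example.org URL in the usage only stands in for an interwiki target that already carries a fragment; it is not the actual mixnmatch URL pattern.

```python
from urllib.parse import urlsplit, urlunsplit


def propagate_fragment(target_url: str, user_fragment: str) -> str:
    """Append user_fragment to target_url unless the target already has a fragment."""
    parts = urlsplit(target_url)
    if parts.fragment or not user_fragment:
        # Either nothing to add, or the target declares its own fragment:
        # a URL can carry only one fragment, so the user's one is dropped.
        return target_url
    return urlunsplit(parts._replace(fragment=user_fragment))


print(propagate_fragment("https://en.wikipedia.org/wiki/Wikipedia", "History"))
print(propagate_fragment("https://example.org/#/list/collection", "History"))
```

The second call shows the problematic case: the user-provided fragment cannot be added without replacing the one the interwiki target depends on.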
Jan 7 2025
This is the expected behavior: in some conditions the search syntax is allowed to escape the namespace filtering provided by other means in the UI or query parameters.
This happens for:
- the case you mention with a namespace followed by a colon
- the prefix keyword which also allows passing a namespace prefix
This is documented at https://www.mediawiki.org/wiki/Help:CirrusSearch#Prefix_and_namespace
Thanks for pointing this out, you were correct: mediawiki.page_outlink_topic_prediction_change.v1 is indeed the new stream being populated and used by the search update pipeline. I updated the doc with new links and stream names. I think that predictions can be run by calling liftwing, from https://meta.wikimedia.org/wiki/Machine_learning_models/Production/Language_agnostic_link-based_article_topic:
curl https://api.wikimedia.org/service/lw/inference/v1/models/outlink-topic-model:predict -X POST -d '{"page_title": "Frida_Kahlo", "lang": "en", "threshold": 0.1}' -H "Content-type: application/json"
This should generate the predictions, but I believe that using pre-computed predictions from hive, if possible, is better.
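For scripting, the same call could be sketched in Python with only the standard library. The endpoint, payload and header are copied from the curl command; the function names are hypothetical, and since the response format is not described here the result is returned as parsed JSON without further interpretation.

```python
import json
import urllib.request

LIFTWING_URL = (
    "https://api.wikimedia.org/service/lw/inference/v1/models/"
    "outlink-topic-model:predict"
)


def build_prediction_request(page_title, lang, threshold=0.1):
    """Build (without sending) the POST request equivalent to the curl command."""
    payload = json.dumps(
        {"page_title": page_title, "lang": lang, "threshold": threshold}
    ).encode("utf-8")
    return urllib.request.Request(
        LIFTWING_URL,
        data=payload,
        headers={"Content-type": "application/json"},
        method="POST",
    )


def fetch_predictions(request):
    """Send the request and return the decoded JSON body (performs a network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_prediction_request("Frida_Kahlo", "en")
print(request.get_method(), request.full_url)
```

Calling fetch_predictions(request) would perform the actual prediction; it is left out of the example so that the sketch can be run offline.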
Jan 6 2025
The example mentions wikidata, where one can exclude items referring to disambiguation pages by adding -haswbstatement:P31=Q4167410 to the search query: https://www.wikidata.org/w/api.php?action=query&list=search&srsearch=test+-haswbstatement%3AP31%3DQ4167410&format=json&formatversion=2 (excludes items that are instances of Wikimedia disambiguation page).
Indeed, https://grafana-rw.wikimedia.org/d/2DIjJ6_nk/cirrussearch-saneitizer-historical-fix-rate?orgId=1 shows a bump mid October and is now back to "normal". The cause is unknown and might be hard to investigate now, so boldly closing.
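A minimal sketch of building that API URL programmatically, so the negated keyword gets escaped properly. The helper name and its parameters are hypothetical; the API parameters are the ones from the URL above.

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def search_url(terms, exclude_statements=()):
    """Build a search API URL, negating each haswbstatement given in exclude_statements."""
    query = " ".join(
        [terms, *(f"-haswbstatement:{stmt}" for stmt in exclude_statements)]
    )
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "format": "json",
        "formatversion": "2",
    }
    # urlencode escapes ':' and '=' and turns spaces into '+'.
    return f"{API}?{urlencode(params)}"


# P31=Q4167410: instance of Wikimedia disambiguation page.
print(search_url("test", ["P31=Q4167410"]))
```

Further statements can be excluded by adding more entries to the list.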
Dec 20 2024
Dec 19 2024
Dec 18 2024
Blocked by T382398
Dec 17 2024
@Lucas_Werkmeister_WMDE should be done, only one schema is now missing an English label (E15): https://test.wikidata.org/w/index.php?search=EntitySchema%3A-haslabel%3Aen&title=Special:Search&profile=default&fulltext=1 (from the UI it's not clear if it's empty or unset)