MGerlach (Martin Gerlach)
Senior Research Scientist

User Details

User Since
Sep 9 2019, 9:50 AM (344 w, 6 d)
Availability
Available
IRC Nick
mgerlach
LDAP User
MGerlach
MediaWiki User
MGerlach (WMF) [ Global Accounts ]

Recent Activity

Fri, Mar 27

MGerlach added a comment to T414795: Run evaluation of 2 or more search models using benchmark dataset.

weekly update:

  • We have made some progress on collecting the results for the two remaining experiments
  • We generated the embeddings and the search results of the semantic search model for 5 different languages (en, de, es, fr, id). We should have the evaluation metrics ready next week.
  • We have started running experiments with the multilingual-e5-large-instruct model, which was used for the first prototype. We are currently working out how to run the other embedding models in our existing pipeline using spark-nlp. This poses some challenges but is not yet a blocker. If necessary, we might need to adapt the set of embedding models in the evaluation.
Fri, Mar 27, 4:46 PM · Semantic Search, Research (FY2025-26-Research-January-March)
MGerlach added a comment to T414797: Maintenance templates: Generate dataset and analysis/modeling (first round).

weekly update:

  • we have concluded the first set of exploratory analysis
  • Next steps are:
    • mapping the maintenance templates to a set of the main policies and guidelines (for English Wikipedia)
    • running models to predict maintenance templates in untagged articles
Fri, Mar 27, 4:32 PM · Research (FY2025-26-Research-January-March)
MGerlach closed T414793: Implementation and dissemination of the reader research direction as Resolved.

Final update:

  • heavily revised the reader research direction based on feedback (latest version here)
    • mainly re-organized existing content to make it shorter/more compact and align better with objectives for the next FY
    • shared the latest version with Leila; will pick this up in Q4 based on feedback I receive and cover the work in a separate task
  • had a more detailed discussion about two proposed projects for Q4: one on understanding reader retention, and one using a natural experiment to estimate a causal effect (whether new content leads to additional pageviews)
    • my current understanding is that I will start working on those in Q4.
    • the work on these projects will be covered in separate tasks
Fri, Mar 27, 1:30 PM · Research (FY2025-26-Research-January-March)

Wed, Mar 25

MGerlach updated the task description for T419409: Get search results from semantic search using MIRACL benchmark dataset.
Wed, Mar 25, 12:26 PM · Discovery-Search (2026.04.06 - 2026.05.01), Research, Semantic Search

Tue, Mar 24

MGerlach renamed T421103: [wiki-nlp-tools] Sentence Tokenization: Keeping track of parentheses and quotations from Sentence Tokenization: Keeping track of parentheses and quotations to [wiki-nlp-tools] Sentence Tokenization: Keeping track of parentheses and quotations.
Tue, Mar 24, 2:07 PM · Research-engineering
MGerlach created T421113: [mwconstants] Page title normalization function.
Tue, Mar 24, 2:06 PM · Research-engineering
MGerlach created T421112: [mwconstants] Split image formatting options into types.
Tue, Mar 24, 2:06 PM · Research-engineering
MGerlach created T421110: [mwconstants] Expand functions to work for different Wikimedia projects.
Tue, Mar 24, 2:05 PM · Research-engineering
MGerlach created T421109: [wiki-nlp-tools] Sentence Tokenization: leading/trailing whitespace stripping.
Tue, Mar 24, 2:03 PM · Research-engineering
MGerlach created T421106: [wiki-nlp-tools] Sentence Tokenization: adapt sentence-split-logic to take into account right-to-left languages.
Tue, Mar 24, 2:03 PM · Research-engineering
MGerlach created T421103: [wiki-nlp-tools] Sentence Tokenization: Keeping track of parentheses and quotations.
Tue, Mar 24, 2:01 PM · Research-engineering
MGerlach created T421102: [wiki-nlp-tools] Packaging: incorporate mwconstants.
Tue, Mar 24, 2:00 PM · Research-engineering
MGerlach created T421100: [wiki-nlp-tools] Word tokenization: edge-cases for whitespace-delimited languages.
Tue, Mar 24, 2:00 PM · Research-engineering
MGerlach created T421097: [wiki-nlp-tools] Word Tokenization: treat numbers as punctuation.
Tue, Mar 24, 1:59 PM · Research-engineering
MGerlach created T421096: [wiki-nlp-tools] Evaluation: expand the existing testing modules.
Tue, Mar 24, 1:58 PM · Research-engineering
MGerlach created T421094: [wiki-nlp-tools] Tokenizer: update asset loading and initiations.
Tue, Mar 24, 1:57 PM · Research-engineering
MGerlach created T421093: [wiki-nlp-tools] Split sentences on newline characters?.
Tue, Mar 24, 1:56 PM · Research-engineering
MGerlach created T421090: Armenian has low sentence performance due to use of standard colon in Flores data.
Tue, Mar 24, 1:55 PM · Research-engineering
MGerlach created T421089: [wiki-nlp-tools] Expand non-whitespace-required sentence-ending punctuation list.
Tue, Mar 24, 1:54 PM · Research-engineering
MGerlach created T421088: [wiki-nlp-tools] Set the English abbreviation list as a global list.
Tue, Mar 24, 1:54 PM · Research-engineering
MGerlach updated the task description for T421084: [wiki-nlp-tools] Remove numbers from abbreviation list.
Tue, Mar 24, 1:52 PM · Research-engineering
MGerlach created T421084: [wiki-nlp-tools] Remove numbers from abbreviation list.
Tue, Mar 24, 1:52 PM · Research-engineering
MGerlach created T421079: [wiki-nlp-tools] Fix single letter words being treated as abbreviations.
Tue, Mar 24, 1:49 PM · Research-engineering
MGerlach created T421075: [wiki-nlp-tools] spurious leading whitespaces from sentencepiece in non-whitespace languages.
Tue, Mar 24, 1:47 PM · Research-engineering

Mar 20 2026

MGerlach added a comment to T414795: Run evaluation of 2 or more search models using benchmark dataset.

weekly update:

  • We started running the experiments for semantic search with 3 different models and 5 different languages. For each case, we are generating the set of embeddings of all millions of passages. Once the embeddings are generated, we can obtain the top-10 results for each query and calculate the relevance metric. Hopefully, the data is ready early next week.
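The retrieval step described above can be sketched as follows, assuming passage and query embeddings are already available as plain lists of floats. The function names and toy vectors are illustrative only and are not part of the actual pipeline, which has to handle millions of passages.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, passage_vecs, k=10):
    """Return the ids of the k passages most similar to the query."""
    scored = ((cosine(query_vec, vec), pid) for pid, vec in passage_vecs.items())
    return [pid for _, pid in heapq.nlargest(k, scored)]


# Toy data (hypothetical): three passages in a 2-d embedding space.
passages = {"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [0.7, 0.7]}
results = top_k([1.0, 0.1], passages, k=2)
```

A brute-force scan like this is only practical for small collections; at the scale mentioned above an approximate nearest-neighbour index would typically be used instead.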
Mar 20 2026, 7:45 PM · Semantic Search, Research (FY2025-26-Research-January-March)
MGerlach added a comment to T414797: Maintenance templates: Generate dataset and analysis/modeling (first round).

weekly update:

  • we made a lot of progress on analyzing statistics of the dataset. For now this covers 6 wikis during the exploratory phase, but we will scale this up to more (eventually all) wikis:
    • Number of templates
    • Number of articles with a template
    • Distribution of number of templates per article
    • Number of add/remove events
    • Number of currently added templates (backlog)
    • Average time to fix, Fraction fixed in 1st week
    • Time evolution of number of templates added to articles
    • Per template data: Most common templates across languages, Most/least used templates, Templates which take the longest/shortest from being added to being removed (i.e. time to fix)
  • Next step is to organize these results better (plots, tables)
  • We will start mapping templates to policies. This is a crucial piece for demonstrating the usefulness of this dataset to broader audiences:
    • this paper from Wiki Workshop presents a dataset for studying policy invocation and enactment. Templates (that are mapped to policies and guidelines) could provide a complementary approach as they can be considered implicit invocations.
    • this paper discusses the importance of data about Wikipedia policies for improving natural language understanding tasks of large language models.
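As an illustration of the time-to-fix statistics listed above, here is a minimal sketch that assumes each template instance is represented as an (added, removed) pair of timestamps. The data layout and the choice of denominator for the first-week fraction (all added templates, including those still open) are assumptions made for this example, not the definitions used in the analysis.

```python
from datetime import datetime, timedelta


def time_to_fix_stats(events, window=timedelta(days=7)):
    """Summarise template lifetimes.

    events: list of (added_at, removed_at) pairs; removed_at is None
    while the template is still on the article (i.e. in the backlog).
    """
    durations = [removed - added for added, removed in events if removed is not None]
    backlog = sum(1 for _, removed in events if removed is None)
    if not durations:
        return {"backlog": backlog, "mean_days": None, "fraction_fixed_in_window": 0.0}
    mean_days = sum(d.total_seconds() for d in durations) / len(durations) / 86400
    fixed_fast = sum(1 for d in durations if d <= window)
    return {
        "backlog": backlog,
        "mean_days": mean_days,
        # Denominator: all added templates, including those not yet removed.
        "fraction_fixed_in_window": fixed_fast / len(events),
    }


# Hypothetical events: two fixed templates and one still open.
events = [
    (datetime(2026, 1, 1), datetime(2026, 1, 3)),   # fixed after 2 days
    (datetime(2026, 1, 1), datetime(2026, 1, 11)),  # fixed after 10 days
    (datetime(2026, 1, 5), None),                   # still in the backlog
]
stats = time_to_fix_stats(events)
```

Note that the mean here only covers templates that were eventually removed, so it underestimates the time to fix whenever the backlog contains long-standing templates.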
Mar 20 2026, 7:32 PM · Research (FY2025-26-Research-January-March)
MGerlach added a comment to T414793: Implementation and dissemination of the reader research direction.

weekly update:

  • reached out to and discussed with Rita, Maryana, and Suman about relevant research questions on readers. Started revising the reader direction incorporating the feedback.
  • got feedback about the project on understanding reader retention. This is a question that is relevant to folks in the readers teams. My recommendation would be to start by looking at existing data being collected in ongoing projects (attribution and comparative reader research) to get some initial insights. Based on the feedback, we can set up a dedicated experiment in test kitchen in the next FY, if needed.
Mar 20 2026, 7:23 PM · Research (FY2025-26-Research-January-March)

Mar 18 2026

MGerlach reassigned T419409: Get search results from semantic search using MIRACL benchmark dataset from Trokhymovych to dcausse.
Mar 18 2026, 10:37 AM · Discovery-Search (2026.04.06 - 2026.05.01), Research, Semantic Search
MGerlach updated the task description for T419409: Get search results from semantic search using MIRACL benchmark dataset.
Mar 18 2026, 10:36 AM · Discovery-Search (2026.04.06 - 2026.05.01), Research, Semantic Search

Mar 16 2026

MGerlach assigned T419409: Get search results from semantic search using MIRACL benchmark dataset to Trokhymovych.
Mar 16 2026, 1:45 PM · Discovery-Search (2026.04.06 - 2026.05.01), Research, Semantic Search
MGerlach updated the task description for T419397: Get search results for different embedding models from semantic search.
Mar 16 2026, 1:43 PM · Discovery-Search (2026.04.06 - 2026.05.01), Research, Semantic Search

Mar 13 2026

MGerlach added a comment to T414795: Run evaluation of 2 or more search models using benchmark dataset.

Weekly update:

Mar 13 2026, 7:47 PM · Semantic Search, Research (FY2025-26-Research-January-March)
MGerlach added a comment to T414793: Implementation and dissemination of the reader research direction.

weekly update:

  • went back to the original draft of the reader direction and am starting to revise it heavily based on the feedback I received. No major new items were brought up so far. Thus, I plan to re-organize and shorten the content into at most 5 major research directions. I will then try to get a new round of feedback.
  • I wrote up different options for a research project on better understanding retention of readers (googledoc). I shared this with Sherry and Hsuanwei to get feedback on what would be most useful for Product.
  • I wrote up a research plan for running an analysis using content translation as a natural experiment (googledoc). Shared this with Debra as a potential project for the reader growth bucket. Waiting for feedback on prioritization.
Mar 13 2026, 4:54 PM · Research (FY2025-26-Research-January-March)
MGerlach added a comment to T414797: Maintenance templates: Generate dataset and analysis/modeling (first round).

weekly update:

  • obtained first results for summary statistics of templates in 6 languages. We will refine the analysis based on the paper On the Evolution of Quality Flaws and the Effectiveness of Cleanup Tags in the English Wikipedia. Main extension will be to calculate statistics beyond English Wikipedia.
    • Number of templates
    • Usage: How often are they used, what are the most commonly used templates
    • Evolution: How is usage increasing/decreasing over time? How is the backlog increasing/decreasing? Compare against the total number of articles.
    • Time to remove templates
  • For this we will also want to aggregate templates by template types (e.g. verifiability vs style issues)
  • Discussed with several folks that one way to increase the utility of the dataset would be to match maintenance templates with the main content policies and guidelines.
Mar 13 2026, 4:47 PM · Research (FY2025-26-Research-January-March)

Mar 9 2026

MGerlach moved T419409: Get search results from semantic search using MIRACL benchmark dataset from FY2025-26-Research-January-March to Support Needed on the Research board.
Mar 9 2026, 11:16 AM · Discovery-Search (2026.04.06 - 2026.05.01), Research, Semantic Search
MGerlach created T419409: Get search results from semantic search using MIRACL benchmark dataset.
Mar 9 2026, 11:16 AM · Discovery-Search (2026.04.06 - 2026.05.01), Research, Semantic Search
MGerlach moved T419397: Get search results for different embedding models from semantic search from Backlog to Support Needed on the Research board.
Mar 9 2026, 10:39 AM · Discovery-Search (2026.04.06 - 2026.05.01), Research, Semantic Search
MGerlach created T419397: Get search results for different embedding models from semantic search.
Mar 9 2026, 10:39 AM · Discovery-Search (2026.04.06 - 2026.05.01), Research, Semantic Search

Mar 6 2026

MGerlach added a comment to T414793: Implementation and dissemination of the reader research direction.

weekly update:

  • met and discussed with Debra: the focus should be on identifying causal relationships that affect reader metrics. We think that the retention survey will be restricted to correlational insights. In addition, some of the ongoing work from others (attribution research and comparative reader research) will already provide some insights into retention. We would like to first assess how much these insights could serve questions around retention from product teams before investing more.
  • We also discussed an alternative: The hypothesis is that additional content being available to internet users will lead to additional traffic from external search engines. Content translations could serve as individual natural experiments where some group of internet users are able to access new content (e.g. when an article is translated into ptwiki it is now accessible to speakers of Portuguese) whereas some group of internet users is not able to access the new content (e.g. when an article is translated into ptwiki it does not make a difference for non-Portuguese speakers). These two groups can be considered treatment and control groups and we can do a diff-in-diff comparison of pageviews with search-engine referer between the two groups. I will sketch an outline of this analysis next week.
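The diff-in-diff comparison mentioned above can be illustrated with a small worked example. This is a sketch under simplifying assumptions: the pageview numbers are invented, and each group is reduced to a list of daily counts before and after the translation date.

```python
def mean(values):
    return sum(values) / len(values)


def diff_in_diff(treat_pre, treat_post, control_pre, control_post):
    """Difference-in-differences estimate from four lists of daily pageviews."""
    treated_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical daily search-referred pageviews around a translation date.
effect = diff_in_diff(
    treat_pre=[100, 110, 90],      # treatment group, before translation
    treat_post=[150, 160, 140],    # treatment group, after translation
    control_pre=[200, 210, 190],   # control group, before
    control_post=[210, 220, 200],  # control group, after
)
```

In this toy example the treatment group gains 50 pageviews per day and the control group gains 10, giving an estimated effect of 40. The estimate is only meaningful if both groups would have followed parallel trends in the absence of the translation.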
Mar 6 2026, 5:03 PM · Research (FY2025-26-Research-January-March)
MGerlach added a comment to T414795: Run evaluation of 2 or more search models using benchmark dataset.

weekly update:

  • Collecting feedback from the first round of analysis for digging deeper.
  • We identified different models for testing variations of the semantic search, to assess whether observations from the first round are due to the specific underlying model (qwen-3-0.6B) or hold generally for semantic search.
  • We identified a multilingual benchmark dataset to i) test variation across languages; and ii) compare our current model's performance with results reported in literature
    • Among the languages that are relevant for the current semantic search project, MIRACL is available in EN, DE, ES, FR, ID (but not in IT, NL, PT)
    • it only covers natural language questions (not from actual search logs, so not representative of WP queries), but our first round of results indicated that these are the queries where semantic search offers the biggest advantage over our current search.
Mar 6 2026, 4:52 PM · Semantic Search, Research (FY2025-26-Research-January-March)
MGerlach added a comment to T414797: Maintenance templates: Generate dataset and analysis/modeling (first round).

weekly update:

  • manual check of 100 samples in 6 languages in this spreadsheet. Overall, the data looks reliable, though we identified some parsing issues, such as nested templates, multi-tag templates, and some false positives probably due to reverted edits. We plan to fix those in the next iteration.
  • We are starting the analysis of high-level metrics
  • We are starting parsing of the content of templates to map them to the corresponding policy or guideline (e.g. identifying links to Wikipedia namespace). The first approach is to capture all links from the page to the Wikipedia-namespace and then do a manual filtering.
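A minimal sketch of the link-capturing step, assuming the template content is available as raw wikitext. The regular expression and the list of namespace prefixes are assumptions for English Wikipedia; other wikis use localized namespace names, and shortcut redirects are not resolved here.

```python
import re

# Matches [[Wikipedia:Target]], [[WP:Target|label]] and [[Project:Target#Section]].
# The prefix list is an assumption for English Wikipedia; other wikis
# use localized namespace names and aliases.
LINK_RE = re.compile(r"\[\[\s*(?:Wikipedia|WP|Project)\s*:\s*([^\]\|#]+)", re.IGNORECASE)


def policy_link_targets(wikitext):
    """Return de-duplicated Wikipedia-namespace link targets in order of appearance."""
    seen = []
    for match in LINK_RE.finditer(wikitext):
        target = match.group(1).strip().replace("_", " ")
        if target and target not in seen:
            seen.append(target)
    return seen


sample = (
    "This article needs additional citations for [[Wikipedia:Verifiability|verification]]. "
    "See [[WP:Reliable sources#Overview]] and [[Help:Referencing for beginners]]."
)
targets = policy_link_targets(sample)
```

The extracted targets would still need the manual filtering described above, since many Wikipedia-namespace links point to pages that are neither policies nor guidelines.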
Mar 6 2026, 4:39 PM · Research (FY2025-26-Research-January-March)

Feb 27 2026

MGerlach added a comment to T414793: Implementation and dissemination of the reader research direction.

weekly updates:

  • still figuring out the feasibility of work to better understand retention of users. So far, we are considering two threads
    • retention survey: this would require some engineering work on T417185: Migrate QuickSurveys data collection to Test Kitchen. there are ongoing discussions around prioritization of this work
    • retention action: this work would try to identify which reader actions are correlated with high or low reader retention. This question might be possible to address with the data/analysis from the attribution research.
  • I started to look in more detail into traffic from external search engines over time. Specifically, I stratified by article topic (aggregating timeseries of all articles in English Wikipedia belonging to the same topic). While very exploratory, there are some interesting aspects: for some topics (such as biographies or films) traffic from external search engines has been stable or even increasing; while for other topics (most notably STEM) the traffic from external search engines has been decreasing. https://gitlab.wikimedia.org/mgerlach/external-traffic/-/blob/main/trend-external-traffic_timeseries-topics.ipynb?ref_type=heads
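The stratification by topic can be sketched as below. The data layout, the toy numbers, and the rule that an article with several topics contributes to each of them are assumptions for illustration; the linked notebook is the reference for the actual analysis.

```python
from collections import defaultdict


def aggregate_by_topic(article_views, article_topics):
    """Sum per-article monthly pageviews into one timeseries per topic.

    article_views: dict article -> dict month -> pageviews.
    article_topics: dict article -> list of topics; an article with several
    topics contributes to each of them.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for article, series in article_views.items():
        for topic in article_topics.get(article, []):
            for month, views in series.items():
                totals[topic][month] += views
    return {topic: dict(series) for topic, series in totals.items()}


def relative_change(series):
    """Change between the first and last month, as a fraction of the first.

    Assumes the first month has a non-zero value.
    """
    months = sorted(series)
    first, last = series[months[0]], series[months[-1]]
    return (last - first) / first


# Hypothetical search-referred pageviews for three articles.
views = {
    "Film A": {"2025-01": 100, "2025-12": 120},
    "Film B": {"2025-01": 50, "2025-12": 60},
    "Theorem C": {"2025-01": 200, "2025-12": 150},
}
topics = {"Film A": ["Films"], "Film B": ["Films"], "Theorem C": ["STEM"]}
by_topic = aggregate_by_topic(views, topics)
trend = {t: relative_change(s) for t, s in by_topic.items()}
```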
Feb 27 2026, 3:43 PM · Research (FY2025-26-Research-January-March)
MGerlach added a comment to T414797: Maintenance templates: Generate dataset and analysis/modeling (first round).

weekly update:

  • Refactored the dataset pipeline into 2 parts: i) getting the full (raw) dataset, ii) then applying the filtering step. With this, we can now also run the pipeline for enwiki, and we are fairly confident that we can, in principle, run it on all wikis.
  • We built a dataset for an initial set of 6 languages (selected based on language familiarity, so that we can manually check results): bn, de, en, hi, pt, simple. We created a smaller random subsample for manual investigation/verification.
  • We started to identify the main summary statistics to report a high-level overview of the dataset (number of templates, number of revisions, number of articles, time between adding/removing a template).
Feb 27 2026, 3:04 PM · Research (FY2025-26-Research-January-March)
MGerlach added a comment to T414795: Run evaluation of 2 or more search models using benchmark dataset.

weekly update

  • Compiled results for the offline evaluation of semantic search in English Wikipedia using the benchmark dataset and comparing it with our current lexical search.
  • Specifically, we evaluate different search models on the new benchmark dataset. We consider Wikipedia search (lexical search) and semantic search (the current model for the MVP) with different variations (e.g. re-ranking results after retrieval). For each model, we get the top-10 search results for each query. We calculate different evaluation metrics to quantify the relevance of the search results using the pytrec_eval package: NDCG, Precision, Recall, Binary preference (bpref). We evaluate the relevance of the retrieved results on the article and the paragraph level by comparing with the annotations in the benchmark dataset.
  • Results can be found in this doc: https://docs.google.com/document/d/1xgdzD0TFIqyAw45mf9uHjzdpeMauugMRefQBlEu8i6I/edit?tab=t.x3x7obtlsqmn
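To make the relevance metrics above more concrete, here is a self-contained sketch of NDCG for a single query. It is not the pytrec_eval implementation: it uses linear gains and a log2 discount, and other gain formulations exist, so values need not match what a given tool reports. The judgments and rankings are invented.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with the log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_ids, relevance, k=10):
    """nDCG@k for one query.

    ranked_ids: retrieved document ids, best first.
    relevance: dict mapping judged document ids to graded relevance;
    unjudged documents count as 0.
    """
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical judgments and rankings for a single query.
qrels = {"a": 2, "b": 1, "c": 0}
perfect = ndcg_at_k(["a", "b", "c"], qrels)
swapped = ndcg_at_k(["b", "a", "c"], qrels)
```

The same function works at the article and the paragraph level; only the ids and the judgments passed in change.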
Feb 27 2026, 2:19 PM · Semantic Search, Research (FY2025-26-Research-January-March)
MGerlach closed T417242: Get search results for queries from benchmark dataset for semantic search model, a subtask of T414795: Run evaluation of 2 or more search models using benchmark dataset, as Resolved.
Feb 27 2026, 2:15 PM · Semantic Search, Research (FY2025-26-Research-January-March)
MGerlach closed T417242: Get search results for queries from benchmark dataset for semantic search model as Resolved.

@dcausse Thanks for generating this dataset.
We successfully used this to run the offline evaluation with the benchmark dataset.

Feb 27 2026, 2:15 PM · Discovery-Search (2026.02.02 - 2026.02.27), Research, Semantic Search

Feb 20 2026

MGerlach added a comment to T414793: Implementation and dissemination of the reader research direction.

weekly update:

  • from discussions in #talk-to-experiment-platform (thread), my understanding is that, in principle, it is possible to combine quicksurveys with test-kitchen. However, it seems that some engineering work still needs to be done before this is ready. Thus, we might not be able to start working on this immediately.
  • I started exploring other opportunities. One potential direction is to understand in more detail the traffic from external search; i.e. what are the factors that lead to more/less traffic from external search engines and potentially identifying quasi-causal relationships via natural experiments.
Feb 20 2026, 8:00 PM · Research (FY2025-26-Research-January-March)
MGerlach added a comment to T414795: Run evaluation of 2 or more search models using benchmark dataset.

weekly update:

  • We refined the set of metrics for evaluation: nDCG@k, precision@k, recall@k, MAP@k, bpref@k for both paragraph and article level
  • We collected search results for the 600 queries of the benchmark dataset for Wikipedia search and the semantic search MVP (qwen-model) with different variations (adding re-ranker, additional context) T417242#11636952
  • Next step: calculating metrics
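For reference, the set-based metrics in this list can be sketched in a few lines for binary relevance. This is an illustration rather than the evaluation code: the normalisation of average precision by min(|relevant|, k) is one common convention and is an assumption here, and MAP@k is then the mean of this value over all queries.

```python
def precision_at_k(ranked_ids, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant) / k


def recall_at_k(ranked_ids, relevant, k):
    """Fraction of all relevant documents found in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked_ids[:k] if d in relevant) / len(relevant)


def average_precision_at_k(ranked_ids, relevant, k):
    """Sum of precision@i at relevant ranks i <= k, normalised (see below)."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / i
    # Normalising by min(|relevant|, k) is one common convention (assumption).
    return total / min(len(relevant), k)


ranking = ["a", "x", "b", "y"]
relevant = {"a", "b", "c"}
p = precision_at_k(ranking, relevant, k=4)        # 2 hits in 4 results
r = recall_at_k(ranking, relevant, k=4)           # 2 of 3 relevant found
ap = average_precision_at_k(ranking, relevant, k=4)
```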
Feb 20 2026, 7:55 PM · Semantic Search, Research (FY2025-26-Research-January-March)
MGerlach added a comment to T414797: Maintenance templates: Generate dataset and analysis/modeling (first round).

weekly updates

  • Extracted full dataset for 4 languages: simple, de, bn, hi. en is still pending as we need to figure out settings for resource allocation to avoid memory issues.
  • We are looking manually through a small subset of samples to spot-check any processing issues. One potential issue we have identified is that for some edits, one template is removed and at the same time another one (or more) is added. This might indicate that the former issue is not resolved but rather the latter templates provide a more specific characterization of the issue.
  • Next step: Starting analysis of basic summary stats (e.g. number of templates, affected articles) over time.
Feb 20 2026, 7:43 PM · Research (FY2025-26-Research-January-March)

Feb 13 2026

MGerlach added a comment to T414795: Run evaluation of 2 or more search models using benchmark dataset.

weekly update:

Feb 13 2026, 2:23 PM · Semantic Search, Research (FY2025-26-Research-January-March)
MGerlach added a comment to T414797: Maintenance templates: Generate dataset and analysis/modeling (first round).

weekly update:

Feb 13 2026, 2:20 PM · Research (FY2025-26-Research-January-March)
MGerlach closed T406207: Create a dataset for evaluation of search on Wikipedia as Resolved.

Final update:

  • We collected the final dataset
  • Documentation is available in this google doc (for now internal only)
Feb 13 2026, 2:15 PM · Discovery-Search (2026.02.02 - 2026.02.27), Research (FY2025-26-Research-January-March), Semantic Search

Feb 12 2026

MGerlach moved T417242: Get search results for queries from benchmark dataset for semantic search model from Backlog to Support Needed on the Research board.
Feb 12 2026, 8:33 AM · Discovery-Search (2026.02.02 - 2026.02.27), Research, Semantic Search
MGerlach created T417242: Get search results for queries from benchmark dataset for semantic search model.
Feb 12 2026, 8:32 AM · Discovery-Search (2026.02.02 - 2026.02.27), Research, Semantic Search

Feb 6 2026

MGerlach added a comment to T414793: Implementation and dissemination of the reader research direction.

weekly update:

  • Started to focus in more detail on one project around better understanding retention.
  • One potential approach would be to combine a reader survey via quicksurvey with measurement of retention in test kitchen. First feedback I got was that this is likely possible but my goal next week is to spend some time to figure out in more detail what are the technical options/limitations.
Feb 6 2026, 2:27 PM · Research (FY2025-26-Research-January-March)
MGerlach added a comment to T414797: Maintenance templates: Generate dataset and analysis/modeling (first round).

weekly update:

  • implemented several refinements for the processing pipeline: resolving redirect names of templates, marking edits with specific template that was added/removed
  • next step is to expand the pipeline to include all maintenance templates from a single wiki
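The two refinements above can be sketched as follows, assuming the set of template names on each revision has already been extracted. The redirect table and template names are hypothetical examples, and the function names are not taken from the actual pipeline.

```python
def resolve(name, redirects):
    """Map a template name to its canonical target, following redirects."""
    seen = set()
    while name in redirects and name not in seen:
        seen.add(name)  # guards against redirect loops
        name = redirects[name]
    return name


def template_changes(parent_templates, child_templates, redirects):
    """Return (added, removed) canonical template names for one edit."""
    before = {resolve(t, redirects) for t in parent_templates}
    after = {resolve(t, redirects) for t in child_templates}
    return after - before, before - after


# Hypothetical redirect table and revision pair.
redirects = {"Cn": "Citation needed", "Fact": "Citation needed"}
added, removed = template_changes(
    parent_templates=["Fact", "Cleanup"],
    child_templates=["Cn", "Unreferenced"],
    redirects=redirects,
)
```

Resolving redirects before comparing avoids counting a rename from one alias to another as a removal plus an addition. In the example, the same edit both adds one template and removes another.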
Feb 6 2026, 2:17 PM · Research (FY2025-26-Research-January-March)
MGerlach added a comment to T414795: Run evaluation of 2 or more search models using benchmark dataset.

weekly update:

  • confirmed with Jazmin that this should be captured as a hypothesis under WE3.10
  • as we have collected the search result relevance annotation, I am starting to think about the best approach to evaluation.
    • metric: likely, we will use nDCG@10 as this is the main metric in some of the retrieval benchmarks such as MTEB https://arxiv.org/pdf/2210.07316
    • coordinating with Search to make sure our approach is meaningful
Feb 6 2026, 2:07 PM · Semantic Search, Research (FY2025-26-Research-January-March)
MGerlach added a comment to T406207: Create a dataset for evaluation of search on Wikipedia.

weekly update:

  • we ran the relevance annotation for the full dataset of 600 queries.
  • will spend another week on cleaning the dataset and putting together documentation before closing
Feb 6 2026, 10:14 AM · Discovery-Search (2026.02.02 - 2026.02.27), Research (FY2025-26-Research-January-March), Semantic Search
MGerlach closed T409561: Annotate relevance of search results for sample queries, a subtask of T406207: Create a dataset for evaluation of search on Wikipedia, as Resolved.
Feb 6 2026, 10:13 AM · Discovery-Search (2026.02.02 - 2026.02.27), Research (FY2025-26-Research-January-March), Semantic Search
MGerlach closed T409561: Annotate relevance of search results for sample queries as Resolved.

Final update:

Feb 6 2026, 10:13 AM · Research (FY2025-26-Research-January-March)
MGerlach updated the task description for T406207: Create a dataset for evaluation of search on Wikipedia.
Feb 6 2026, 10:10 AM · Discovery-Search (2026.02.02 - 2026.02.27), Research (FY2025-26-Research-January-March), Semantic Search

Jan 29 2026

MGerlach added a comment to T414793: Implementation and dissemination of the reader research direction.

weekly update

  • Had a meeting with Miriam and Debra, we reached consensus on a shortlist of potential research projects on readers
  • We identified one top contender around understanding factors that affect retention of readers. I will try to sketch an outline for what this project could look like.
  • Set up follow-up meeting next week to identify/discuss one other focus area
Jan 29 2026, 4:11 PM · Research (FY2025-26-Research-January-March)
MGerlach added a comment to T406207: Create a dataset for evaluation of search on Wikipedia.

weekly update

  • We collected candidate results for the 600 final selected queries
  • We are freezing/storing the corpus containing all paragraphs from all enwiki articles using the 20260125 snapshot
Jan 29 2026, 1:28 PM · Discovery-Search (2026.02.02 - 2026.02.27), Research (FY2025-26-Research-January-March), Semantic Search
MGerlach updated the task description for T406207: Create a dataset for evaluation of search on Wikipedia.
Jan 29 2026, 1:26 PM · Discovery-Search (2026.02.02 - 2026.02.27), Research (FY2025-26-Research-January-March), Semantic Search
MGerlach closed T409559: Collect candidate search results for set of sample queries, a subtask of T406207: Create a dataset for evaluation of search on Wikipedia, as Resolved.
Jan 29 2026, 1:26 PM · Discovery-Search (2026.02.02 - 2026.02.27), Research (FY2025-26-Research-January-March), Semantic Search
MGerlach closed T409559: Collect candidate search results for set of sample queries as Resolved.

final update - task is completed

  • We collect the top-10 article results from Wikipedia search and from an external search, respectively.
  • We identify the top-10 paragraphs from the selected articles (with at most 2 paragraphs from the same article)
  • We collected candidate results for the 600 final selected queries
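The per-article cap in the paragraph selection can be sketched like this, assuming the candidate paragraphs are already ranked. How the paragraphs are scored is not covered here, and the function and variable names are illustrative.

```python
from collections import Counter


def select_paragraphs(ranked, k=10, max_per_article=2):
    """Pick the first k paragraphs from a ranked list, capping each article.

    ranked: list of (article_id, paragraph_id) pairs, best first.
    """
    per_article = Counter()
    selected = []
    for article_id, paragraph_id in ranked:
        if per_article[article_id] >= max_per_article:
            continue  # this article already contributed its share
        per_article[article_id] += 1
        selected.append((article_id, paragraph_id))
        if len(selected) == k:
            break
    return selected


# Hypothetical ranking: article "A" dominates the top of the list.
ranked = [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("A", 4), ("C", 1)]
picked = select_paragraphs(ranked, k=4)
```

The cap keeps a single long article from filling all candidate slots, at the cost of skipping some highly ranked paragraphs.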
Jan 29 2026, 1:26 PM · Research (FY2025-26-Research-January-March)
MGerlach updated the task description for T409559: Collect candidate search results for set of sample queries.
Jan 29 2026, 1:23 PM · Research (FY2025-26-Research-January-March)

Jan 28 2026

MGerlach added a comment to T414797: Maintenance templates: Generate dataset and analysis/modeling (first round).

weekly updates:

  • no updates this week
Jan 28 2026, 8:17 PM · Research (FY2025-26-Research-January-March)

Jan 16 2026

MGerlach added a comment to T406207: Create a dataset for evaluation of search on Wikipedia.

weekly update:

Jan 16 2026, 7:36 PM · Discovery-Search (2026.02.02 - 2026.02.27), Research (FY2025-26-Research-January-March), Semantic Search
MGerlach added a comment to T414797: Maintenance templates: Generate dataset and analysis/modeling (first round).

weekly update:

Jan 16 2026, 7:07 PM · Research (FY2025-26-Research-January-March)
MGerlach added a comment to T414795: Run evaluation of 2 or more search models using benchmark dataset.

weekly update:

Jan 16 2026, 7:04 PM · Semantic Search, Research (FY2025-26-Research-January-March)
MGerlach updated subscribers of T414793: Implementation and dissemination of the reader research direction.

weekly update:

  • set up a meeting with @Miriam and @DKumar-WMF to have a first round of discussion on prioritization
Jan 16 2026, 7:03 PM · Research (FY2025-26-Research-January-March)
MGerlach created T414797: Maintenance templates: Generate dataset and analysis/modeling (first round).
Jan 16 2026, 12:26 PM · Research (FY2025-26-Research-January-March)
MGerlach updated subscribers of T414795: Run evaluation of 2 or more search models using benchmark dataset.
Jan 16 2026, 12:19 PM · Semantic Search, Research (FY2025-26-Research-January-March)
MGerlach created T414795: Run evaluation of 2 or more search models using benchmark dataset.
Jan 16 2026, 12:18 PM · Semantic Search, Research (FY2025-26-Research-January-March)
MGerlach updated the task description for T406207: Create a dataset for evaluation of search on Wikipedia.
Jan 16 2026, 12:11 PM · Discovery-Search (2026.02.02 - 2026.02.27), Research (FY2025-26-Research-January-March), Semantic Search
MGerlach closed T408121: Collect a set of representative queries for the benchmark dataset, a subtask of T406207: Create a dataset for evaluation of search on Wikipedia, as Resolved.
Jan 16 2026, 12:11 PM · Discovery-Search (2026.02.02 - 2026.02.27), Research (FY2025-26-Research-January-March), Semantic Search
MGerlach closed T408121: Collect a set of representative queries for the benchmark dataset as Resolved.

final update:

Jan 16 2026, 12:11 PM · Research (FY2025-26-Research-January-March)
MGerlach closed T400030: Draft first version of research direction on readers as Resolved.

final update

Jan 16 2026, 12:04 PM · Research (FY2025-26-Research-October-December)
MGerlach created T414793: Implementation and dissemination of the reader research direction.
Jan 16 2026, 12:01 PM · Research (FY2025-26-Research-January-March)

Jan 9 2026

MGerlach added a comment to T406207: Create a dataset for evaluation of search on Wikipedia.

weekly updates:

  • We collected a set of queries with applied manual filtering (sheet)
  • Submitted a request via L3SC for reviewing Terms of Services of 3rd party search platforms for generation of candidate results (asana ticket)
  • We are updating the processing pipeline that extracts paragraphs from all articles, switching from wikitext to Enterprise's structured content snapshots, as these provide a cleaner representation of the article text.
Jan 9 2026, 7:10 PM · Discovery-Search (2026.02.02 - 2026.02.27), Research (FY2025-26-Research-January-March), Semantic Search
MGerlach updated the task description for T400030: Draft first version of research direction on readers.
Jan 9 2026, 6:54 PM · Research (FY2025-26-Research-October-December)
MGerlach updated subscribers of T400030: Draft first version of research direction on readers.

weekly update:

  • did an iteration to address comments from the previous round of feedback.
  • also shared with @leila for high-level feedback. Based on that I will decide on next steps.
Jan 9 2026, 6:54 PM · Research (FY2025-26-Research-October-December)

Dec 19 2025

MGerlach added a comment to T406207: Create a dataset for evaluation of search on Wikipedia.

weekly updates:

  • Overall, we are fully on track to get a search result dataset. Before running a smaller test pilot study we need to make minor tweaks to the query filtering.
  • Collect a set of representative queries in WP search:
    • We implemented a filter on query frequency (>=25 users) so that the analysis is considered high-level. For this, we also needed to optimize the processing pipeline so that we can consider queries from all 3 months that are available in the logs.
    • We are iterating to improve our query filtering to remove, e.g., navigational queries. One example is to make sure we remove queries that exactly match a page title, including all potential redirects.
    • We are adding an additional bucket for queries that are formulated as questions. Even when considering long queries (8+ terms), few are actually in the form of natural language questions. However, we want to capture those in our dataset as well even if they are currently rare in our logs.
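
The filtering steps described above can be illustrated with a minimal sketch. It is not the actual pipeline: the input format (pairs of user id and query), the function names, the normalization, and the list of question words are all assumptions made for illustration. Only the >=25 users threshold and the exact title/redirect match come from the update.

```python
from collections import defaultdict

MIN_USERS = 25  # threshold mentioned in the update
# Assumption: a simple heuristic word list, not taken from the source.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "which"}


def normalize(text):
    """Lowercase and collapse whitespace (hypothetical normalization)."""
    return " ".join(text.lower().split())


def filter_queries(log_rows, titles_and_redirects, min_users=MIN_USERS):
    """Keep queries issued by at least `min_users` distinct users
    that do not exactly match a page title or redirect.

    log_rows: iterable of (user_id, query) pairs (assumed format).
    titles_and_redirects: iterable of page titles and redirect titles.
    """
    users_per_query = defaultdict(set)
    for user_id, query in log_rows:
        users_per_query[normalize(query)].add(user_id)

    known_titles = {normalize(t) for t in titles_and_redirects}

    kept = []
    for query, users in users_per_query.items():
        if len(users) < min_users:
            continue  # too rare to be considered high-level
        if query in known_titles:
            continue  # navigational: exact title or redirect match
        kept.append(query)
    return sorted(kept)


def is_question(query):
    """Heuristic check for the question bucket (assumption)."""
    terms = normalize(query).split()
    return bool(terms) and (
        terms[0] in QUESTION_WORDS or query.strip().endswith("?")
    )
```

Counting distinct users rather than raw occurrences is a deliberate choice here, since a single user repeating a query should not lift it over the threshold.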
Dec 19 2025, 6:19 PM · Discovery-Search (2026.02.02 - 2026.02.27), Research (FY2025-26-Research-January-March), Semantic Search
MGerlach closed T293030: [EPIC] Specify new task for Linking articles as a structured tasks as Resolved.

Closing as there is currently no work planned or ongoing.

Dec 19 2025, 5:11 PM · Research, Epic
MGerlach moved T406207: Create a dataset for evaluation of search on Wikipedia from FY2025-26-Research-October-December to FY2025-26-Research-January-March on the Research board.
Dec 19 2025, 5:09 PM · Discovery-Search (2026.02.02 - 2026.02.27), Research (FY2025-26-Research-January-March), Semantic Search
MGerlach moved T408121: Collect a set of representative queries for the benchmark dataset from FY2025-26-Research-October-December to FY2025-26-Research-January-March on the Research board.
Dec 19 2025, 5:09 PM · Research (FY2025-26-Research-January-March)
MGerlach moved T409559: Collect candidate search results for set of sample queries from FY2025-26-Research-October-December to FY2025-26-Research-January-March on the Research board.
Dec 19 2025, 5:08 PM · Research (FY2025-26-Research-January-March)
MGerlach moved T409561: Annotate relevance of search results for sample queries from FY2025-26-Research-October-December to FY2025-26-Research-January-March on the Research board.
Dec 19 2025, 5:08 PM · Research (FY2025-26-Research-January-March)
MGerlach added a comment to T400030: Draft first version of research direction on readers.

weekly update

  • shared with team and received lots of feedback from @Miriam. I will try to do an iteration and address these comments in the next few days.
Dec 19 2025, 5:03 PM · Research (FY2025-26-Research-October-December)
MGerlach closed T406203: Start formal collaboration on understanding the use of maintenance templates, a subtask of T408523: [EPIC] Understanding the use of maintenance template, as Resolved.
Dec 19 2025, 4:38 PM · Epic, Research
MGerlach closed T406203: Start formal collaboration on understanding the use of maintenance templates as Resolved.

weekly update

  • Closing this task as completed
  • formal collaboration has been started and collaborators are fully onboarded
  • we started to generate the dataset of template usage but this will require some additional iterations. This work will be captured in a separate task together with a first exploratory analysis of the dataset.
Dec 19 2025, 4:38 PM · Research (FY2025-26-Research-October-December)
MGerlach assigned T409561: Annotate relevance of search results for sample queries to Trokhymovych.
Dec 19 2025, 4:35 PM · Research (FY2025-26-Research-January-March)

Nov 28 2025

MGerlach added a comment to T406207: Create a dataset for evaluation of search on Wikipedia.

weekly update:

  • Collect a set of representative queries in WP search:
    • Conducted privacy check-in about publishing set of queries. As a one-off dataset for English Wikipedia this was approved.
    • We will implement an additional filter on query frequency (>=25 users) so that the analysis is considered high-level
  • Collecting candidate search results:
    • Decided and implemented scheme for selecting top-5 paragraphs as candidate search results
  • Using annotation tool:
    • Requested a privacy survey statement for conducting the data annotation via prolific
    • We set up a test-study with synthetic data in the prolific AI task builder to finalize UI of the annotation
Nov 28 2025, 4:58 PM · Discovery-Search (2026.02.02 - 2026.02.27), Research (FY2025-26-Research-January-March), Semantic Search
MGerlach added a comment to T406203: Start formal collaboration on understanding the use of maintenance templates.

weekly update

  • starting data collection of revisions where maintenance templates are added or removed
Nov 28 2025, 4:39 PM · Research (FY2025-26-Research-October-December)
MGerlach added a comment to T400030: Draft first version of research direction on readers.

weekly update

  • incorporated feedback from Debra, Mike, and Yu-Ming
  • finalized new revised version available in this doc (internal)
Nov 28 2025, 12:21 PM · Research (FY2025-26-Research-October-December)

Nov 21 2025

MGerlach added a comment to T406207: Create a dataset for evaluation of search on Wikipedia.

weekly update:

  • We are continuing to make progress on setting up the full pipeline for the dataset generation.
  • Collect a set of representative queries in WP search:
    • This is completed from a technical side. We have a pipeline to extract a set of representative queries
    • We are waiting for the feedback from the privacy consultation about if and how we can store and publish the selected queries for annotation
  • Collecting candidate search results:
    • We are testing different options to select the most relevant paragraphs from a set of search results obtained from, e.g., Wikipedia search, to present as candidate search results to be annotated. This is important to avoid selection bias: potentially relevant paragraphs that are missed will not be available for annotation and are therefore implicitly marked as irrelevant.
  • Using annotation tool:
    • We are testing the study setup in prolific by using mock-up data (not from the actual queries).
    • In order to conduct the actual study I am requesting a survey privacy statement. Once I have the details figured out (e.g. retention time and publication) I will submit the request, probably early next week.
    • I confirmed that we have available budget in the team to run the study on prolific. I am figuring out the details about the process of how to request/spend the budget correctly.
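
As a rough illustration of the candidate selection step mentioned above, the sketch below pools scored paragraphs from several result lists and keeps the k best. The scoring itself, the (paragraph id, score) input format, and the function name are hypothetical; the update does not describe which options are being tested.

```python
import heapq


def select_top_paragraphs(candidates, k=5):
    """Pool scored paragraphs and return the k highest-scoring ones.

    candidates: iterable of (paragraph_id, score) pairs, possibly
    containing the same paragraph several times (assumed format).
    """
    best = {}
    for paragraph_id, score in candidates:
        # Keep the best score seen for each paragraph (deduplication).
        if paragraph_id not in best or score > best[paragraph_id]:
            best[paragraph_id] = score
    return heapq.nlargest(k, best.items(), key=lambda item: item[1])
```

Pooling candidates from more than one source before truncating is one way to reduce the selection bias described above, since a paragraph missed by one source can still enter the annotation set through another.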
Nov 21 2025, 6:33 PM · Discovery-Search (2026.02.02 - 2026.02.27), Research (FY2025-26-Research-January-March), Semantic Search
MGerlach closed T410389: Request kerberos identity for AnkitaM, a subtask of T406203: Start formal collaboration on understanding the use of maintenance templates, as Resolved.
Nov 21 2025, 4:08 PM · Research (FY2025-26-Research-October-December)
MGerlach closed T410389: Request kerberos identity for AnkitaM as Resolved.

@BTullis Thank you.

Nov 21 2025, 4:08 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Data-Engineering
MGerlach added a comment to T406203: Start formal collaboration on understanding the use of maintenance templates.

weekly update:

  • collaborators can now access stat-machines
  • only blocker is kerberos access in order to use hive tables in spark (T410389: Request kerberos identity for AnkitaM, resolved)
  • next step is to start collecting the dataset of templates being added/removed
Nov 21 2025, 1:55 PM · Research (FY2025-26-Research-October-December)