OKarakaya-WMF
User

User Details

User Since
Apr 1 2025, 7:13 AM (45 w, 21 h)
Availability
Available
LDAP User
Ozge
MediaWiki User
OKarakaya-WMF [ Global Accounts ]

Recent Activity

Mon, Feb 9

OKarakaya-WMF renamed T416869: Edit Suggestions - Dataset for hackathon from Dataset for hackathon to Edit Suggestions - Dataset for hackathon.
Mon, Feb 9, 12:20 PM · OKR-Work, Goal, Machine-Learning-Team
OKarakaya-WMF added a comment to T416869: Edit Suggestions - Dataset for hackathon.

Dataset for hackathon is created.
Please see the following notebook for a sample usage:
https://gitlab.wikimedia.org/repos/machine-learning/exploratory-notebook/-/blob/edit_suggestions_dataset/edit_suggestions/generate_suggestions.ipynb?ref_type=heads

Mon, Feb 9, 12:19 PM · OKR-Work, Goal, Machine-Learning-Team
OKarakaya-WMF moved T416869: Edit Suggestions - Dataset for hackathon from Unsorted to 2025-2026 Q2 Done on the Machine-Learning-Team board.
Mon, Feb 9, 12:19 PM · OKR-Work, Goal, Machine-Learning-Team
OKarakaya-WMF created T416869: Edit Suggestions - Dataset for hackathon.
Mon, Feb 9, 12:18 PM · OKR-Work, Goal, Machine-Learning-Team
OKarakaya-WMF moved T415878: Addalink: Impact analysis of the recommendations. from In Progress to 2025-2026 Q2 Done on the Machine-Learning-Team board.
Mon, Feb 9, 12:16 PM · Machine-Learning-Team
OKarakaya-WMF added a comment to T415878: Addalink: Impact analysis of the recommendations..
  • enwiki revert rates are around 1%. We need to keep this in mind when drawing conclusions from revert data.
  • There is no meaningful correlation between the accept/reject actions and how many links there are to the target article. (pearson: -0.01 )
  • There is no meaningful correlation between the accept/reject actions and the location of the recommendation.
    • section level: -0.01
    • paragraph level: -0.01
    • section level percentage: 0.01
    • paragraph level percentage: -0.005
  • There is no meaningful correlation between the accept/reject actions and probability scores (pearson: 0.11, although this is higher than the scores above).
  • There is no meaningful correlation between the accept/reject actions and the similarity between the source and the target page (0.2, although this is the highest correlation we have found so far).
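
For reference, coefficients like the ones above can be computed as a Pearson correlation between a 0/1 accept/reject outcome and a numeric feature. The sketch below is only an illustration: the function name and the toy data are hypothetical, and the analysis code itself is not part of this comment.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data: 1 = suggestion accepted, 0 = suggestion rejected.
accepted = [1, 0, 1, 1, 0, 0, 1, 0]
probability_score = [0.9, 0.6, 0.7, 0.8, 0.75, 0.5, 0.65, 0.85]
r = pearson(accepted, probability_score)
```

With a binary outcome this is the point-biserial correlation, so values near 0 mean the feature barely separates accepted from rejected suggestions.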
Mon, Feb 9, 12:10 PM · Machine-Learning-Team

Thu, Jan 29

OKarakaya-WMF added a comment to T415878: Addalink: Impact analysis of the recommendations..

enwiki revert rates and counts:

Thu, Jan 29, 12:18 PM · Machine-Learning-Team
OKarakaya-WMF added a comment to T415878: Addalink: Impact analysis of the recommendations..

Sharing some initial results:

Thu, Jan 29, 10:29 AM · Machine-Learning-Team
OKarakaya-WMF created T415878: Addalink: Impact analysis of the recommendations..
Thu, Jan 29, 9:38 AM · Machine-Learning-Team
OKarakaya-WMF moved T414297: Add a Link: Remove Country and Continent names in suggestions from In Progress to 2025-2026 Q2 Done on the Machine-Learning-Team board.
Thu, Jan 29, 9:33 AM · Machine-Learning-Team, Patch-For-Review, Add-Link-Structured-Task, Growth-Team
OKarakaya-WMF added a comment to T414297: Add a Link: Remove Country and Continent names in suggestions.

Deployment completed. I've checked some of the wikis and they work fine.

Thu, Jan 29, 9:30 AM · Machine-Learning-Team, Patch-For-Review, Add-Link-Structured-Task, Growth-Team

Tue, Jan 27

OKarakaya-WMF added a comment to T414297: Add a Link: Remove Country and Continent names in suggestions.

The following wikis are deployed to prod and the others are in the queue.
Large wikis (e.g. dewiki) take a long time (~4 hours), while small wikis (e.g. hiwiki) get deployed quickly (~10 minutes).
Overall, I think it still makes sense to deploy them sequentially so as not to add too much load to MariaDB.

Tue, Jan 27, 10:35 AM · Machine-Learning-Team, Patch-For-Review, Add-Link-Structured-Task, Growth-Team

Mon, Jan 26

OKarakaya-WMF added a comment to T414297: Add a Link: Remove Country and Continent names in suggestions.

final list of wikis to update with the release date today (26/01/2026)

Mon, Jan 26, 11:37 AM · Machine-Learning-Team, Patch-For-Review, Add-Link-Structured-Task, Growth-Team
OKarakaya-WMF added a comment to T414297: Add a Link: Remove Country and Continent names in suggestions.

Training and staging deployments are completed.
The following wikis are below the release threshold. I'll remove them from the deployment and update the rest of the wikis.

Mon, Jan 26, 7:54 AM · Machine-Learning-Team, Patch-For-Review, Add-Link-Structured-Task, Growth-Team

Thu, Jan 22

OKarakaya-WMF added a comment to T414297: Add a Link: Remove Country and Continent names in suggestions.

dinwiki has failed: it does not have enough data for training, and it's one of the smallest wikis.

Thu, Jan 22, 8:46 PM · Machine-Learning-Team, Patch-For-Review, Add-Link-Structured-Task, Growth-Team

Tue, Jan 20

OKarakaya-WMF added a comment to T414297: Add a Link: Remove Country and Continent names in suggestions.

Looking into the models that we need to update based on:
Frontend enabled models: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/+/refs/heads/master/wmf-config/ext-GrowthExperiments.php
Wikis supported by v2: https://analytics.wikimedia.org/published/wmf-ml-models/addalink/v2/

Tue, Jan 20, 3:11 PM · Machine-Learning-Team, Patch-For-Review, Add-Link-Structured-Task, Growth-Team
OKarakaya-WMF added a comment to T414297: Add a Link: Remove Country and Continent names in suggestions.

zhwiki new model checksum.
c4950228598e64c08ae817df316f2f3127d93df27dfbcddfadd5f2550586bdff zhwiki.linkmodel.json
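
A downloaded model file can be checked against a checksum like the one above. This is a minimal sketch that assumes the digest is SHA-256 (it has 64 hex characters, which matches SHA-256, but the comment does not name the algorithm); the helper names are hypothetical.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_checksum(path, expected_hex):
    """Compare the file digest with an expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_checksum("zhwiki.linkmodel.json", "c4950228...")` with the full digest would return True only for an identical file.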

Tue, Jan 20, 8:06 AM · Machine-Learning-Team, Patch-For-Review, Add-Link-Structured-Task, Growth-Team
OKarakaya-WMF moved T414297: Add a Link: Remove Country and Continent names in suggestions from Unsorted to In Progress on the Machine-Learning-Team board.
Tue, Jan 20, 7:59 AM · Machine-Learning-Team, Patch-For-Review, Add-Link-Structured-Task, Growth-Team

Mon, Jan 19

OKarakaya-WMF added a comment to T414297: Add a Link: Remove Country and Continent names in suggestions.

zhwiki v2 model checksum:

Mon, Jan 19, 1:31 PM · Machine-Learning-Team, Patch-For-Review, Add-Link-Structured-Task, Growth-Team
OKarakaya-WMF added a comment to T414297: Add a Link: Remove Country and Continent names in suggestions.

I've trained a model without countries and continents for zhwiki.
We get similar f1 scores. I'll proceed with deploying it.

Mon, Jan 19, 12:11 PM · Machine-Learning-Team, Patch-For-Review, Add-Link-Structured-Task, Growth-Team
OKarakaya-WMF added a comment to T414297: Add a Link: Remove Country and Continent names in suggestions.

https://gitlab.wikimedia.org/repos/machine-learning/ml-pipelines/-/merge_requests/90

Mon, Jan 19, 8:22 AM · Machine-Learning-Team, Patch-For-Review, Add-Link-Structured-Task, Growth-Team
OKarakaya-WMF claimed T414297: Add a Link: Remove Country and Continent names in suggestions.
Mon, Jan 19, 8:04 AM · Machine-Learning-Team, Patch-For-Review, Add-Link-Structured-Task, Growth-Team

Fri, Jan 16

OKarakaya-WMF updated subscribers of T414297: Add a Link: Remove Country and Continent names in suggestions.
Fri, Jan 16, 12:20 PM · Machine-Learning-Team, Patch-For-Review, Add-Link-Structured-Task, Growth-Team
OKarakaya-WMF added a comment to T414297: Add a Link: Remove Country and Continent names in suggestions.

Actually, I had an idea to stop recommending popular links.
If there are already too many links to a page (e.g. a page in the 99th percentile), we can stop recommending it.
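
A rough sketch of that idea, assuming we have an inbound-link count per candidate target page. The function names, the nearest-rank percentile definition, and the choice to drop only pages strictly above the cut-off are all assumptions for illustration, not the deployed logic.

```python
import math

def percentile_threshold(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of values <= it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def drop_popular_targets(inlink_counts, pct=99):
    """Keep only targets whose inbound-link count is at or below the pct-th percentile."""
    threshold = percentile_threshold(list(inlink_counts.values()), pct)
    return {page: count for page, count in inlink_counts.items() if count <= threshold}
```

The percentile would need tuning per wiki, since link-count distributions differ a lot between large and small wikis.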

Fri, Jan 16, 12:18 PM · Machine-Learning-Team, Patch-For-Review, Add-Link-Structured-Task, Growth-Team
OKarakaya-WMF added a comment to T414297: Add a Link: Remove Country and Continent names in suggestions.

hi @KStoller-WMF ,
crystal clear, thank you!

Fri, Jan 16, 12:11 PM · Machine-Learning-Team, Patch-For-Review, Add-Link-Structured-Task, Growth-Team
OKarakaya-WMF added a project to T414297: Add a Link: Remove Country and Continent names in suggestions: Machine-Learning-Team.
Fri, Jan 16, 12:10 PM · Machine-Learning-Team, Patch-For-Review, Add-Link-Structured-Task, Growth-Team

Wed, Jan 14

OKarakaya-WMF added a comment to T414297: Add a Link: Remove Country and Continent names in suggestions.

Do we need this change for all wikis or only for zhwiki?
In other words, do we want to change it for all wikis but more urgently for zhwiki?

Wed, Jan 14, 11:33 AM · Machine-Learning-Team, Patch-For-Review, Add-Link-Structured-Task, Growth-Team

Tue, Jan 13

OKarakaya-WMF added a comment to T409863: Q2 FY2025-26 Goal: Generate a list of edit suggestions using machine learning.

models to pick
We are currently discussing it here:
https://wikimedia.slack.com/archives/G01A0FNPLG4/p1768310436209719

Tue, Jan 13, 1:23 PM · OKR-Work, Goal, Machine-Learning-Team
OKarakaya-WMF claimed T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production.
Tue, Jan 13, 8:15 AM · OKR-Work, Goal, Machine-Learning-Team
OKarakaya-WMF moved T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production from Current Quarter Goals to 2025-2026 Q2 Done on the Machine-Learning-Team board.
Tue, Jan 13, 8:15 AM · OKR-Work, Goal, Machine-Learning-Team
OKarakaya-WMF updated the task description for T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production.
Tue, Jan 13, 8:14 AM · OKR-Work, Goal, Machine-Learning-Team

Jan 9 2026

OKarakaya-WMF added a comment to T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production.

Reporting 09/01/2026
Progress update on the hypothesis for the week, including if something has shipped:

Jan 9 2026, 12:36 PM · OKR-Work, Goal, Machine-Learning-Team
OKarakaya-WMF updated the task description for T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production.
Jan 9 2026, 12:34 PM · OKR-Work, Goal, Machine-Learning-Team
OKarakaya-WMF moved T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP from In Progress to 2025-2026 Q2 Done on the Machine-Learning-Team board.
Jan 9 2026, 11:52 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
OKarakaya-WMF added a comment to T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.

Thank you again @dcausse ,

Jan 9 2026, 11:52 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
OKarakaya-WMF added a comment to T409863: Q2 FY2025-26 Goal: Generate a list of edit suggestions using machine learning.

Dataset for hackathon is created.
Please see the following notebook for a sample usage:
https://gitlab.wikimedia.org/repos/machine-learning/exploratory-notebook/-/blob/edit_suggestions_dataset/edit_suggestions/generate_suggestions.ipynb?ref_type=heads

Jan 9 2026, 11:41 AM · OKR-Work, Goal, Machine-Learning-Team
OKarakaya-WMF moved T413854: errors in revscoring-editquality-goodfaith from In Progress to 2025-2026 Q2 Done on the Machine-Learning-Team board.
Jan 9 2026, 11:36 AM · Essential-Work, Machine-Learning-Team
OKarakaya-WMF added a comment to T413854: errors in revscoring-editquality-goodfaith.

Closing this task as we can follow it up with the previous one:

Jan 9 2026, 11:36 AM · Essential-Work, Machine-Learning-Team
OKarakaya-WMF claimed T413854: errors in revscoring-editquality-goodfaith.
Jan 9 2026, 11:33 AM · Essential-Work, Machine-Learning-Team
OKarakaya-WMF updated the task description for T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.
Jan 9 2026, 9:12 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team

Jan 8 2026

OKarakaya-WMF added a comment to T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.

I agree! thank you @dcausse

Jan 8 2026, 3:24 PM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
OKarakaya-WMF updated subscribers of T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.

With a median query length of 77 chars plus the prompt (108 chars):

  • max latency: 290ms.
  • 99.9 percentile latency: 280ms.
  • median latency: 34ms
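
For context, summary figures like these can be derived from raw per-request latencies roughly as follows. This is a sketch with a hypothetical helper name, using a nearest-rank definition for the 99.9th percentile; the load-testing tool may compute its percentiles differently.

```python
import math
import statistics

def latency_summary(latencies_ms):
    """Return max, 99.9th percentile (nearest rank) and median of latency samples."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    # Nearest-rank position of the 99.9th percentile, i.e. 999 per mille.
    rank = max(1, math.ceil(len(ordered) * 999 / 1000))
    return {
        "max": ordered[-1],
        "p99.9": ordered[rank - 1],
        "median": statistics.median(ordered),
    }
```

With fewer than about a thousand samples, the 99.9th percentile is effectively the maximum, so the two numbers being close is expected for short runs.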
Jan 8 2026, 1:05 PM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
OKarakaya-WMF added a comment to T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.

@dcausse , cool.
I'll update the service and the performance tests accordingly.

Jan 8 2026, 10:12 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
OKarakaya-WMF added a comment to T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.

We get better results on prod.

Jan 8 2026, 9:26 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
OKarakaya-WMF added a comment to T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.

Do we plan to query the api on prod with the following prompt?
We set the max length to 300 chars, so if the query text is longer than 300 chars, only the first 300 chars will be used.
We can increase it if we expect longer text.
The following prompt is ~90 chars.
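
The truncation rule can be sketched like this. It assumes the limit applies to the query text only and that the prompt is prepended afterwards; the comment does not say whether the prompt counts toward the 300 chars, and the names below are hypothetical.

```python
MAX_QUERY_CHARS = 300  # limit mentioned in the comment above

def build_model_input(prompt: str, query: str, max_chars: int = MAX_QUERY_CHARS) -> str:
    """Truncate the query to max_chars characters, then prepend the prompt."""
    truncated_query = query[:max_chars]
    return prompt + truncated_query
```

Slicing counts characters, not tokens, so the number of tokens the model sees still depends on the text and the tokenizer.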

Jan 8 2026, 9:21 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
OKarakaya-WMF updated Other Assignee for T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP, added: kevinbazira.
Jan 8 2026, 9:01 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team

Jan 6 2026

OKarakaya-WMF added a comment to T413854: errors in revscoring-editquality-goodfaith.

I think we have similar behavior in good faith as well.
We have calls with the same rev_id for the successful calls before errors.
Indeed, caching should reduce the load on the mwapi.
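
To illustrate the caching point: if repeated calls arrive with the same rev_id, memoising the upstream lookup means only the first call reaches mwapi. The sketch below uses the standard library's `functools.lru_cache`; the fetch function is a hypothetical stand-in, and the real service may be asynchronous and need a different cache.

```python
from functools import lru_cache

def make_cached_fetcher(fetch, maxsize=1024):
    """Wrap a rev_id -> data function so repeated rev_ids are served from memory."""
    @lru_cache(maxsize=maxsize)
    def cached(rev_id: int):
        return fetch(rev_id)
    return cached

# Hypothetical stand-in for the real mwapi call; records which rev_ids go upstream.
upstream_calls = []

def fake_mwapi_fetch(rev_id):
    upstream_calls.append(rev_id)
    return {"rev_id": rev_id}

get_revision = make_cached_fetcher(fake_mwapi_fetch)
get_revision(123)
get_revision(123)  # served from the cache, no second upstream call
get_revision(456)
```

`lru_cache` does not store raised exceptions, so a timed-out call would be retried on the next request rather than cached as a failure.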

Jan 6 2026, 11:07 AM · Essential-Work, Machine-Learning-Team
OKarakaya-WMF added a comment to T413854: errors in revscoring-editquality-goodfaith.

thank you @kevinbazira
Great! I was looking for this task

Jan 6 2026, 11:02 AM · Essential-Work, Machine-Learning-Team
OKarakaya-WMF updated subscribers of T413854: errors in revscoring-editquality-goodfaith.

Hey @gkyziridis @kevinbazira
I remember we were discussing timeout errors in mwapi, but I could not find the related task.
Do you know if we got a similar timeout error from the MediaWiki API before in another task?
Thank you!

Jan 6 2026, 10:09 AM · Essential-Work, Machine-Learning-Team
OKarakaya-WMF added a comment to T413854: errors in revscoring-editquality-goodfaith.

Looking into the errors, they are mostly due to timeouts during the mwapi calls.
The timeout is 5 seconds.

Jan 6 2026, 9:26 AM · Essential-Work, Machine-Learning-Team
OKarakaya-WMF created T413854: errors in revscoring-editquality-goodfaith.
Jan 6 2026, 9:19 AM · Essential-Work, Machine-Learning-Team

Dec 22 2025

OKarakaya-WMF added a comment to T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.

(venv) ozge@stat1010:~/repos/wiki/gerrit/inference-services/test/locust$ MODEL=embeddings locust
Min length: 250, Max length: 350

question_length

count 65.000000
mean 301.353846
std 28.316532
min 250.000000
25% 283.000000
50% 303.000000
75% 324.000000
max 348.000000
[2025-12-22 13:49:40,609] stat1010/INFO/locust.main: Run time limit set to 120 seconds
[2025-12-22 13:49:40,609] stat1010/INFO/locust.main: Starting Locust 2.31.5
[2025-12-22 13:49:40,610] stat1010/INFO/locust.runners: Ramping to 2 users at a rate of 10.00 per second
[2025-12-22 13:49:40,610] stat1010/INFO/locust.runners: All users spawned: {"Embeddings": 2} (2 total users)
[2025-12-22 13:51:40,147] stat1010/INFO/locust.main: --run-time limit reached, shutting down
Load test results are within the threshold
[2025-12-22 13:51:40,224] stat1010/INFO/locust.main: Shutting down (exit code 0)
Type Name # reqs # fails | Avg Min Max Med | req/s failures/s
--------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------
POST https://inference-staging.svc.codfw.wmnet:30443/v1/models/qwen3-embedding:predict 1902 0(0.00%) | 74 67 292 70 | 15.90 0.00
--------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------

Aggregated                                                                      1902     0(0.00%) |     74      67     292     70 |   15.90        0.00
Dec 22 2025, 2:01 PM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
OKarakaya-WMF updated the task description for T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.
Dec 22 2025, 1:48 PM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
OKarakaya-WMF added a comment to T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.

(venv) ozge@stat1010:~/repos/wiki/gerrit/inference-services/test/locust$ export MAX_LENGTH=350
(venv) ozge@stat1010:~/repos/wiki/gerrit/inference-services/test/locust$ export MIN_LENGTH=100
(venv) ozge@stat1010:~/repos/wiki/gerrit/inference-services/test/locust$ MODEL=embeddings locust
Min length: 100, Max length: 350

question_length

count 732.000000
mean 135.498634
std 58.858208
min 100.000000
25% 105.000000
50% 112.000000
75% 128.000000
max 348.000000
[2025-12-22 13:44:47,723] stat1010/INFO/locust.main: Run time limit set to 120 seconds
[2025-12-22 13:44:47,723] stat1010/INFO/locust.main: Starting Locust 2.31.5
[2025-12-22 13:44:47,724] stat1010/INFO/locust.runners: Ramping to 2 users at a rate of 10.00 per second
[2025-12-22 13:44:47,724] stat1010/INFO/locust.runners: All users spawned: {"Embeddings": 2} (2 total users)
[2025-12-22 13:46:47,260] stat1010/INFO/locust.main: --run-time limit reached, shutting down
Load test results are within the threshold
[2025-12-22 13:46:47,339] stat1010/INFO/locust.main: Shutting down (exit code 0)
Type Name # reqs # fails | Avg Min Max Med | req/s failures/s
--------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------
POST https://inference-staging.svc.codfw.wmnet:30443/v1/models/qwen3-embedding:predict 1890 0(0.00%) | 74 66 304 70 | 15.80 0.00
--------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------

Aggregated                                                                      1890     0(0.00%) |     74      66     304     70 |   15.80        0.00
Dec 22 2025, 1:47 PM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
OKarakaya-WMF added a comment to T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.

Staging results.

Dec 22 2025, 1:27 PM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
OKarakaya-WMF added a comment to T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.

Results with a new setup, run locally.

Dec 22 2025, 12:29 PM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
OKarakaya-WMF added a comment to T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.

Performance test results, run locally on CPU:

Dec 22 2025, 11:39 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
OKarakaya-WMF updated the task description for T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.
Dec 22 2025, 10:57 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
OKarakaya-WMF updated the task description for T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.
Dec 22 2025, 10:47 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team

Dec 18 2025

OKarakaya-WMF updated the task description for T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.
Dec 18 2025, 11:16 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
OKarakaya-WMF updated the task description for T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.
Dec 18 2025, 11:16 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team

Dec 17 2025

OKarakaya-WMF added a comment to T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.

Great idea! Let's turn this into a goal. I think it's fine not to create child tickets for now.
I have added checkboxes to the description indicating each step/task.
I'll update them as we progress and I can add weekly updates here.
Thank you!

Dec 17 2025, 1:06 PM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
OKarakaya-WMF updated the task description for T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.
Dec 17 2025, 10:23 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
OKarakaya-WMF updated the task description for T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.
Dec 17 2025, 10:22 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
OKarakaya-WMF updated the task description for T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.
Dec 17 2025, 10:20 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team

Dec 16 2025

OKarakaya-WMF added a comment to P86608 Prototype serving Qwen3 embeddings with KServe using HF Transformers and ROCm-compatible FlashAttention-2.

Hey @kevinbazira do we already have a dockerfile to try this on staging?

Dec 16 2025, 4:35 PM · Machine-Learning-Team

Dec 15 2025

OKarakaya-WMF added a comment to P86608 Prototype serving Qwen3 embeddings with KServe using HF Transformers and ROCm-compatible FlashAttention-2.

Nice implementation!
Does last token pooling come from qwenlm?

Dec 15 2025, 11:37 AM · Machine-Learning-Team

Dec 12 2025

OKarakaya-WMF added a comment to T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production.
  1. Reporting 12/11/2025
Dec 12 2025, 11:40 AM · OKR-Work, Goal, Machine-Learning-Team

Dec 11 2025

OKarakaya-WMF updated the task description for T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.
Dec 11 2025, 9:07 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
OKarakaya-WMF added a comment to T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.

Below I share how long it takes to generate embeddings with different setups, comparing two models:
model_name = "Qwen/Qwen3-Embedding-0.6B"

  • float16, all chars

205/207038 [00:55<15:41:24, 3.66it/s]

  • float16, first 300 chars.

206/207038 [00:27<7:37:03, 7.54it/s]

  • float32, first 300 chars.

264/207038 [01:06<14:31:14, 3.96it/s]

  • float32, all chars.

124/207038 [01:12<33:27:43, 1.72it/s]
model_name = "sentence-transformers/all-mpnet-base-v2"

  • float32, all chars.

209/207038 [00:21<5:51:39, 9.80it/s]

  • float32, first 300 chars.

228/207038 [00:10<2:32:48, 22.56it/s]
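
The remaining-time figures in the progress lines follow from the item counts and the throughput. A small sketch of that arithmetic, with a hypothetical helper name; the counts and rates are copied from the progress lines above.

```python
def remaining_hours(done, total, items_per_second):
    """Estimated hours left at a constant processing rate."""
    if items_per_second <= 0:
        raise ValueError("rate must be positive")
    return (total - done) / items_per_second / 3600

TOTAL_ITEMS = 207038

# (items done, items per second), taken from the progress lines above.
runs = {
    "Qwen3-Embedding-0.6B, float16, all chars": (205, 3.66),
    "Qwen3-Embedding-0.6B, float16, first 300 chars": (206, 7.54),
    "Qwen3-Embedding-0.6B, float32, first 300 chars": (264, 3.96),
    "Qwen3-Embedding-0.6B, float32, all chars": (124, 1.72),
    "all-mpnet-base-v2, float32, all chars": (209, 9.80),
    "all-mpnet-base-v2, float32, first 300 chars": (228, 22.56),
}

estimates = {
    name: remaining_hours(done, TOTAL_ITEMS, rate)
    for name, (done, rate) in runs.items()
}
```

These are extrapolations from the first minute or so of each run, so the real totals may drift as the text lengths vary across the dataset.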

Dec 11 2025, 9:05 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
OKarakaya-WMF updated the task description for T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.
Dec 11 2025, 7:55 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
OKarakaya-WMF updated the task description for T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.
Dec 11 2025, 7:46 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team
OKarakaya-WMF created T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP.
Dec 11 2025, 7:39 AM · OKR-Work, Goal, Semantic Search, Machine-Learning-Team

Dec 9 2025

OKarakaya-WMF added a comment to T412055: linkrecommendation API does not include the model version in its output.

I agree.
Currently, the only indicator about the model version is the model hash (c4796c3c193d983980a445bb2a76f65def9f2459599fa6df055984bd851d3ca3 is the v2 zhwiki model)
I think we can introduce semantic versioning.

Dec 9 2025, 10:37 AM · Add-Link-Structured-Task, Growth-Team

Dec 8 2025

OKarakaya-WMF added a comment to T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production.
  1. Reporting 05/11/2025
Dec 8 2025, 12:14 PM · OKR-Work, Goal, Machine-Learning-Team

Nov 28 2025

OKarakaya-WMF added a comment to T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production.

Looking into 17-day periods:

Nov 28 2025, 11:16 AM · OKR-Work, Goal, Machine-Learning-Team
OKarakaya-WMF added a comment to T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production.

I've created a list of the models currently in use.
The models below received at least one suggestion accept or suggestion reject since 2025-06-01.
The wikis are sorted by accept count. Therefore, the wikis above are used less.
I'll split the remaining deployments as follows.

  • Deployment 1: Deploy wikis between 1-50. (28/11/2025)
  • Deployment 2: Deploy wikis between 51-80. (01/12/2025)
  • Deployment 3: Deploy wikis between 81-113. (09/01/2026)
  • Deployment 4: Deploy enwiki. (12/01/2026)

Please feel free to suggest another order.
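
The rank-based split above could be produced from the sorted wiki list along these lines. This is a sketch only: the helper name is hypothetical, the ranges are the 1-based inclusive ones from the list above, and enwiki is treated as a separate final deployment.

```python
# 1-based, inclusive rank ranges taken from the deployment list above.
BATCH_RANGES = [(1, 50), (51, 80), (81, 113)]

def split_into_batches(sorted_wikis, ranges=BATCH_RANGES):
    """Slice a rank-ordered wiki list into deployment batches."""
    return [sorted_wikis[start - 1:end] for start, end in ranges]
```

Given a list of 113 wikis in rank order, this returns three batches of 50, 30 and 33 wikis.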

Nov 28 2025, 11:12 AM · OKR-Work, Goal, Machine-Learning-Team

Nov 26 2025

OKarakaya-WMF updated the task description for T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production.
Nov 26 2025, 12:00 PM · OKR-Work, Goal, Machine-Learning-Team

Nov 24 2025

OKarakaya-WMF added a comment to T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production.

Started updating following wikis:

Nov 24 2025, 11:56 AM · OKR-Work, Goal, Machine-Learning-Team
OKarakaya-WMF updated the task description for T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production.
Nov 24 2025, 11:54 AM · OKR-Work, Goal, Machine-Learning-Team
OKarakaya-WMF added a comment to T410744: model reference-risk: reference_risk_score is always 0..

cool, thank you @Pablo ,

Nov 24 2025, 11:11 AM · Machine-Learning-Team
OKarakaya-WMF added a comment to T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production.

We got results for itwiki:

Nov 24 2025, 10:54 AM · OKR-Work, Goal, Machine-Learning-Team

Nov 21 2025

OKarakaya-WMF moved T410744: model reference-risk: reference_risk_score is always 0. from Unsorted to 2025-2026 Q2 Done on the Machine-Learning-Team board.
Nov 21 2025, 2:46 PM · Machine-Learning-Team
OKarakaya-WMF added a comment to T410744: model reference-risk: reference_risk_score is always 0..

The service works fine:
curl https://api.wikimedia.org/service/lw/inference/v1/models/reference-risk:predict -X POST -d '{"rev_id": 1322686680, "lang": "en"}'
{"model_name":"reference-risk","model_version":"2024-11","wiki_db":"enwiki","revision_id":1322686680,"reference_count":37,"survival_ratio":{"min":0.16666666666666666,"mean":0.6632285937319566,"median":0.6505386708644346},"reference_risk_score":0.08108108108108109}
https://en.wikipedia.org/w/index.php?title=MarketStar&oldid=1322686680
The issue is that Deprecated or Blacklisted domains are quite rare (~120).
Please let me know if you get 0 for a URL that is Deprecated or Blacklisted, and we can take a further look.
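
For anyone who prefers to check this from Python, a rough equivalent of the curl call above using only the standard library is sketched below. The function names, the JSON Content-Type header, and the timeout value are assumptions; the endpoint and payload are the ones from the curl command.

```python
import json
import urllib.request

REFERENCE_RISK_URL = (
    "https://api.wikimedia.org/service/lw/inference/v1/models/reference-risk:predict"
)

def build_reference_risk_request(rev_id, lang):
    """Build the POST request with the same JSON payload as the curl example."""
    body = json.dumps({"rev_id": rev_id, "lang": lang}).encode("utf-8")
    return urllib.request.Request(
        REFERENCE_RISK_URL,
        data=body,
        headers={"Content-Type": "application/json"},  # assumption, not set in the curl call
        method="POST",
    )

def fetch_reference_risk(rev_id, lang, timeout=10.0):
    """Send the request and return the decoded JSON response (needs network access)."""
    request = build_reference_risk_request(rev_id, lang)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_reference_risk(1322686680, "en")` should return a dictionary with the same fields as the response shown above, including `reference_risk_score`.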

Nov 21 2025, 2:45 PM · Machine-Learning-Team
OKarakaya-WMF created T410744: model reference-risk: reference_risk_score is always 0..
Nov 21 2025, 2:44 PM · Machine-Learning-Team

Nov 20 2025

OKarakaya-WMF added a comment to T405185: Introduce case sensitivity to machine learning model for Add a Link.

thank you both @Sdkb and @Chipmunkdavis for reporting this issue,

Nov 20 2025, 10:20 AM · Community Feedback (Growth), Machine-Learning-Team, Growth-Team, Add-Link-Structured-Task

Nov 14 2025

OKarakaya-WMF added a comment to T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production.
  1. Reporting 14/11/2025
Nov 14 2025, 7:51 AM · OKR-Work, Goal, Machine-Learning-Team

Nov 6 2025

OKarakaya-WMF updated the task description for T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production.
Nov 6 2025, 3:49 PM · OKR-Work, Goal, Machine-Learning-Team
OKarakaya-WMF added a comment to T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production.

I've collected current performance rates and counts of the candidate wikis:

Nov 6 2025, 9:45 AM · OKR-Work, Goal, Machine-Learning-Team

Nov 5 2025

OKarakaya-WMF updated the task description for T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production.
Nov 5 2025, 8:06 AM · OKR-Work, Goal, Machine-Learning-Team

Nov 4 2025

OKarakaya-WMF updated the task description for T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production.
Nov 4 2025, 3:36 PM · OKR-Work, Goal, Machine-Learning-Team
OKarakaya-WMF updated the task description for T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production.
Nov 4 2025, 3:30 PM · OKR-Work, Goal, Machine-Learning-Team
OKarakaya-WMF updated the task description for T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production.
Nov 4 2025, 3:18 PM · OKR-Work, Goal, Machine-Learning-Team
OKarakaya-WMF moved T400446: Update blubber version in inference services images from In Progress to Unsorted on the Machine-Learning-Team board.
Nov 4 2025, 10:11 AM · Essential-Work, Machine-Learning-Team

Oct 31 2025

OKarakaya-WMF moved T405359: Semantic Search POC - In article QA from In Progress to 2025-2026 Q2 Done on the Machine-Learning-Team board.
Oct 31 2025, 8:58 AM · Semantic Search, Machine-Learning-Team
OKarakaya-WMF added a comment to T405359: Semantic Search POC - In article QA.
Oct 31 2025, 8:51 AM · Semantic Search, Machine-Learning-Team
OKarakaya-WMF added a comment to T405359: Semantic Search POC - In article QA.

I'm sharing final evaluation results for this phase:

Oct 31 2025, 8:40 AM · Semantic Search, Machine-Learning-Team

Oct 30 2025

OKarakaya-WMF updated the task description for T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production.
Oct 30 2025, 2:53 PM · OKR-Work, Goal, Machine-Learning-Team
OKarakaya-WMF updated the task description for T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production.
Oct 30 2025, 12:52 PM · OKR-Work, Goal, Machine-Learning-Team
OKarakaya-WMF updated the task description for T408790: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production.
Oct 30 2025, 12:52 PM · OKR-Work, Goal, Machine-Learning-Team