User Details
- User Since
- Aug 3 2019, 6:58 AM (358 w, 5 d)
- Availability
- Available
- IRC Nick
- kevinbazira
- LDAP User
- Kevin Bazira
- MediaWiki User
- KBazira (WMF)
Today
We have run load tests for the cope-b-a4b isvc. It handles ~32 requests/second with a median latency of ~36 ms and no failures across 3,830 requests, as shown below (response times and percentiles in ms, content size in bytes):
| Type | Name | Request Count | Failure Count | Median Response Time | Average Response Time | Min Response Time | Max Response Time | Average Content Size | Requests/s | Failures/s | 50% | 66% | 75% | 80% | 90% | 95% | 98% | 99% | 99.9% | 99.99% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| POST | /v1/models/cope-b-a4b:predict | 3830 | 0 | 36 | 48.32 | 22.24 | 1550.04 | 63.0 | 32.13 | 0.0 | 36 | 45 | 52 | 58 | 76 | 100 | 190 | 280 | 740 | 1600 | 1600 |
| Aggregated |  | 3830 | 0 | 36 | 48.32 | 22.24 | 1550.04 | 63.0 | 32.13 | 0.0 | 36 | 45 | 52 | 58 | 76 | 100 | 190 | 280 | 740 | 1600 | 1600 |
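The column layout above matches the stats CSV that Locust writes when run with --csv (an assumption: the load-testing tool isn't named in this update). A minimal sketch for pulling the headline numbers out of such a file; the summarize() helper is hypothetical, and the sample row uses the unrounded values from this run:

```python
import csv
import io

# Hypothetical excerpt of a Locust-style "<prefix>_stats.csv"; values are from the run above.
STATS_CSV = """Type,Name,Request Count,Failure Count,Median Response Time,Average Response Time,Min Response Time,Max Response Time,Average Content Size,Requests/s,Failures/s,50%,66%,75%,80%,90%,95%,98%,99%,99.9%,99.99%,100%
POST,/v1/models/cope-b-a4b:predict,3830,0,36,48.31993621008048,22.236770018935204,1550.042748451233,63.0,32.13438715363878,0.0,36,45,52,58,76,100,190,280,740,1600,1600
"""


def summarize(csv_text: str) -> dict:
    """Return headline throughput/latency figures for the first endpoint row."""
    row = next(csv.DictReader(io.StringIO(csv_text)))
    requests = int(row["Request Count"])
    rps = float(row["Requests/s"])
    return {
        "endpoint": row["Name"],
        "requests": requests,
        "failure_rate": int(row["Failure Count"]) / requests,
        "rps": round(rps, 2),
        "p50_ms": int(row["50%"]),
        "p95_ms": int(row["95%"]),
        "p99_ms": int(row["99%"]),
        # Rough test duration implied by request count / throughput.
        "approx_duration_s": round(requests / rps),
    }


print(summarize(STATS_CSV))
```

Reading the tail percentiles alongside the median matters here: the p99 (280 ms) and max (~1.55 s) are far above the 36 ms median, which a single average would hide.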
Yesterday
Thanks for the clarification, @Tchanders! The cope-b-a4b isvc response has been trimmed to just the violation, p_violation, and p_safe fields, as shown below:
$ time curl "https://inference.svc.eqiad.wmnet:30443/v1/models/cope-b-a4b:predict" -X POST \
    -d '{ "content": "CLICK HERE TO WIN $10000!!! Visit http://totallylegit.biz NOW!!!", "policy": "Content must not contain spam, phishing attempts, or deceptive links." }' \
    -H "Host: cope-b-a4b.experimental.wikimedia.org" \
    -H "Content-Type: application/json" --http1.1
{"violation":1,"p_violation":1.0,"p_safe":1.522997974471263e-8}
real    0m0.042s
user    0m0.011s
sys     0m0.004s

$ time curl "https://inference.svc.eqiad.wmnet:30443/v1/models/cope-b-a4b:predict" -X POST \
    -d '{ "content": "The library opens at 9am on weekdays and 10am on weekends.", "policy": "Content must not contain spam, phishing attempts, or deceptive links." }' \
    -H "Host: cope-b-a4b.experimental.wikimedia.org" \
    -H "Content-Type: application/json" --http1.1
{"violation":0,"p_violation":1.538173465229056e-7,"p_safe":0.9999998807907176}
real    0m0.045s
user    0m0.015s
sys     0m0.000s

$ time curl "https://inference.svc.eqiad.wmnet:30443/v1/models/cope-b-a4b:predict" -X POST \
    -d '{ "content": "Check out my new blog where I review productivity apps. Link in my bio!", "policy": "Content must not contain spam, phishing attempts, or deceptive links." }' \
    -H "Host: cope-b-a4b.experimental.wikimedia.org" \
    -H "Content-Type: application/json" --http1.1
{"violation":0,"p_violation":8.939699493298122e-6,"p_safe":0.999991059383269}
real    0m0.045s
user    0m0.015s
sys     0m0.000s
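For callers that aren't using curl, the same request can be sketched with only the Python standard library. The URL, Host header, and payload fields come from the curl calls above; the function names are hypothetical, and the network/TLS setup needed to reach inference.svc.eqiad.wmnet is not covered:

```python
import json
import urllib.request

# Endpoint and Host header taken from the curl examples; function names are hypothetical.
URL = "https://inference.svc.eqiad.wmnet:30443/v1/models/cope-b-a4b:predict"
HOST = "cope-b-a4b.experimental.wikimedia.org"


def build_request(content: str, policy: str) -> urllib.request.Request:
    """Build the POST request; the Host header routes it to the cope-b-a4b isvc."""
    body = json.dumps({"content": content, "policy": policy}).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,
        method="POST",
        headers={"Host": HOST, "Content-Type": "application/json"},
    )


def parse_prediction(raw: bytes) -> tuple:
    """Turn the trimmed response into (is_violation, p_violation)."""
    result = json.loads(raw)
    return result["violation"] == 1, float(result["p_violation"])


def classify(content: str, policy: str, timeout: float = 5.0) -> tuple:
    # Network call: only works from a host that can reach the internal endpoint.
    with urllib.request.urlopen(build_request(content, policy), timeout=timeout) as resp:
        return parse_prediction(resp.read())
```

Keeping request construction and response parsing separate from the network call makes both testable without access to the cluster.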
Tue, Jun 16
A vLLM 0.22.1 base image was published in T428577. This enabled us to migrate the cope-b-a4b model-server from HF transformers to vLLM. The latest cope-b-a4b isvc has been deployed in the experimental namespace in prod:
Mon, Jun 15
Fri, Jun 12
Weekly Update:
Thu, Jun 11
A vLLM 0.22.1 image has been published to the docker registry in: T428577#12008949
The updated WMF Debian vLLM image that supports the latest upstream software stack as of June 2026 is now available in the wikimedia docker registry: https://docker-registry.wikimedia.org/ml/amd-vllm022/tags/
This image successfully served both facebook/opt-125m and Qwen/Qwen3.6-27B LLMs as shown below:
Using the official upstream pre-built wheels, I've upgraded the wmf-debian-vllm image to support the latest vLLM software stack as of June 2026. The key updates are:
Following T428577#11998794, we added ROCm 7.2.0 packages to the Wikimedia bookworm mirror as shown here: https://apt-browser.toolforge.org/bookworm-wikimedia/thirdparty/amd-rocm72/
Wed, Jun 10
Tue, Jun 9
The Wikimedia bookworm mirror currently contains ROCm 7.0 as the latest packages:
We have also added a library page to the prototype so that you can browse what has been generated so far:
We have started running the batch-generation pipeline using the steps documented in the project README:
https://gitlab.wikimedia.org/toolforge-repos/wiki-tts/-/tree/f95b9c0c0d4642ea2b95d5995f2747b3f20596e7#7-batch-tts-generation-pipeline
Mon, Jun 8
Closing this task as this feature is now live. Please feel free to re-open if needed.
Fri, Jun 5
Weekly Update:
Thu, Jun 4
We have updated the liftwing_client to support this new cope-b-a4b endpoint and shared it with the PSI team so they can continue using it to fine-tune their policies.
The zentropi-ai/cope-b-a4b docs list three hosting options: the zentropi API, vLLM, and HF transformers. Since our vLLM base image doesn't yet support cope-b-a4b (see P93623), we built the model-server on HF transformers instead, which loads and runs the model successfully (see P93624).
Wed, Jun 3
HF transformers successfully loaded in: P93624
The cope-b-a4b model requires vLLM ≥ 0.20.2 based on: https://huggingface.co/zentropi-ai/cope-b-a4b#system-requirements
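Because the installed vLLM version decides whether cope-b-a4b can be served at all, a model-server could fail fast at startup with a guard like the one below. This is a hypothetical helper, not part of our model-server; the 0.20.2 threshold is from the model card linked above. Versions are compared as integer tuples, since plain string comparison would rank "0.9.0" above "0.20.2":

```python
MIN_VLLM = "0.20.2"  # minimum version from the cope-b-a4b system requirements


def parse_version(version: str) -> tuple:
    """Parse a release string like "0.22.1" into (0, 22, 1).

    Simplified: a local suffix after "+" is dropped, and each component is cut
    at its first non-digit character, so pre-release tags like "rc1" are ignored.
    """
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def supports_cope_b_a4b(vllm_version: str) -> bool:
    """True if the given vLLM version meets the model's minimum requirement."""
    return parse_version(vllm_version) >= parse_version(MIN_VLLM)
```

Inside an image, the installed version could be read with importlib.metadata.version("vllm"); packaging.version.Version is the more complete option if pre-release ordering matters.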
Tue, Jun 2
Sharing this from slack for posterity:
Mon, Jun 1
We have also added a demo of this feature in the TTS prototype UI. Now when you play a section's audio, the transcript below the audio player highlights each spoken word in real time so that you can follow along:
We have added word-level timestamps to the TTS prototype. Each audio section (.mp3) now comes with a companion WebVTT caption file (.vtt) with per-word start and end times:
| Endpoint | Purpose | Responses |
|---|---|---|
| https://wiki-tts.toolforge.org/audio/Earth/Lead.mp3 | Serving of .mp3 (existing) | HTTP 200 if .mp3 exists, HTTP 404 if .mp3 doesn't exist on disk |
| https://wiki-tts.toolforge.org/audio/Earth/Lead.vtt | Serving of .vtt (new) | HTTP 200 if .vtt exists, HTTP 404 if .vtt doesn't exist on disk |
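To illustrate the format, a WebVTT file with one cue per word is straightforward to produce, and the real-time highlighting reduces to finding the cue that contains the current playback time. The sketch below assumes the word timings are already available as (start, end, word) tuples; how the prototype obtains them, whether its .vtt files carry cue identifiers or settings, and the browser-side highlighting code are not shown in this update, so all names here are hypothetical:

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical word timings: (start_seconds, end_seconds, word), sorted by start.
WordTiming = Tuple[float, float, str]


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def words_to_vtt(words: List[WordTiming]) -> str:
    """Render word timings as a WebVTT file with one cue per word."""
    lines = ["WEBVTT", ""]
    for start, end, word in words:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(word)
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)


class WordTimeline:
    """Find which word is being spoken at playback time t (for highlighting)."""

    def __init__(self, words: List[WordTiming]):
        self.words = words
        self.starts = [start for start, _, _ in words]

    def at(self, t: float) -> Optional[int]:
        i = bisect_right(self.starts, t) - 1  # last word that started at or before t
        if i >= 0 and t < self.words[i][1]:
            return i
        return None  # before the first word or in a gap between words
```

In a browser, the same lookup would run against the audio element's currentTime on each timeupdate event; binary search keeps each lookup O(log n) even for long sections.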
Fri, May 29
Weekly Update:
We found another text normalization edge case while working on T427488: Add word-level timestamps in TTS prototype.
Thu, May 28
Wed, May 27
We have added a new endpoint that provides static audio URLs alongside the existing endpoint that handles on-demand generation and serving:
| Endpoint | Purpose | Responses |
|---|---|---|
| https://wiki-tts.toolforge.org/audio?article=Earth&section=Lead | On-demand generation + serving of .mp3 (existing) | HTTP 200 if .mp3 exists, HTTP 202 if .mp3 doesn't exist on disk (generation is queued), HTTP 404 if article/section doesn't exist on Wikipedia |
| https://wiki-tts.toolforge.org/audio/Earth/Lead.mp3 | Serving of .mp3 (new) | HTTP 200 if .mp3 exists, HTTP 404 if .mp3 doesn't exist on disk |
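A client can combine the two endpoints: call the on-demand endpoint until it stops returning 202, then use the static URL. The sketch below follows the status codes in the table; the helper names, the polling interval, and the assumption that article and section titles map directly onto URL path segments are hypothetical. The status check is injectable so the logic can be exercised without network access:

```python
import time
import urllib.error
import urllib.parse
import urllib.request

BASE = "https://wiki-tts.toolforge.org"


def on_demand_url(article: str, section: str) -> str:
    # urlencode escapes spaces or "&" inside titles; the "&" between params stays literal.
    return f"{BASE}/audio?" + urllib.parse.urlencode({"article": article, "section": section})


def static_url(article: str, section: str) -> str:
    # Assumption: titles map directly onto path segments; quote() escapes spaces etc.
    return f"{BASE}/audio/{urllib.parse.quote(article)}/{urllib.parse.quote(section)}.mp3"


def http_status(url: str) -> int:
    """Return the HTTP status code of a GET request (network call)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # raised for 4xx/5xx
        return err.code


def wait_for_audio(article, section, get_status=http_status,
                   attempts=10, delay_s=3.0, sleep=time.sleep):
    """Trigger generation via the on-demand endpoint, then return the static .mp3 URL."""
    url = on_demand_url(article, section)
    for _ in range(attempts):
        status = get_status(url)
        if status == 200:
            return static_url(article, section)
        if status == 404:
            raise LookupError(f"{article!r}/{section!r} not found on Wikipedia")
        if status != 202:
            raise RuntimeError(f"unexpected HTTP {status}")
        sleep(delay_s)  # 202: generation queued, poll again later
    raise TimeoutError("audio was not ready in time")
```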
Tue, May 26
Mon, May 25
Following T427173#11952147, we investigated how industry-leading TTS engines control pause durations and found that Google Cloud TTS, Amazon Polly, and Azure TTS all use the <break> tag, an element of the W3C Speech Synthesis Markup Language (SSML) that adds or modifies pauses and silences in the generated audio.
The Kokoro model demo shows we can use punctuation (; : , . ! ? — … " ( ) “ ”) to add pauses and intonation between words.
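One way to bridge the two approaches would be to accept SSML-style <break time="..."/> tags and rewrite them as punctuation before the text reaches Kokoro. This is a hypothetical sketch: only the time attribute is handled, and the duration thresholds and punctuation choices are illustrative guesses that have not been measured against Kokoro's actual pause lengths:

```python
import re

# Matches SSML-style breaks such as <break time="500ms"/> or <break time="1s"/>,
# together with the whitespace around them.
BREAK_TAG = re.compile(r'\s*<break\s+time="(\d+(?:\.\d+)?)(ms|s)"\s*/>\s*')


def break_to_punctuation(match) -> str:
    value, unit = float(match.group(1)), match.group(2)
    ms = value if unit == "ms" else value * 1000
    # Hypothetical thresholds: short pause -> comma, medium -> ellipsis, long -> full stop.
    if ms < 300:
        return ", "
    if ms < 800:
        return "… "
    return ". "


def ssml_breaks_to_kokoro(text: str) -> str:
    """Rewrite <break> tags as punctuation that Kokoro can treat as pauses."""
    return BREAK_TAG.sub(break_to_punctuation, text).strip()
```

Calibrating the thresholds would mean synthesizing the same sentence with each punctuation mark and measuring the resulting silence.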
Fri, May 22
Weekly Update:
Thu, May 21
We have also added a nemo_whitelist.tsv. Without it, NeMo would treat unrecognized domain-specific vocabulary like "UNESCO" as regular words, causing the TTS service to read "UNESCO" as "an eh sko" rather than "yoo neh sko". The whitelist pins these terms to a fixed spoken form (for example, UNESCO is rewritten to "yoo neh sko") so the TTS service pronounces them correctly. The best part is that this list can be expanded whenever we notice new custom, Wikipedia-specific, or domain-specific terms that aren't being spoken well.
>>> from wiki_tts.text import clean_spoken_text, init_nemo
>>>
>>> init_nemo()
NeMo-text-processing :: INFO :: Post processing graph was restored from /tmp/wiki-tts-nemo-grammars/en_tn_post_processing.far.
NeMo-text-processing :: INFO :: ClassifyFst.fst was restored from /tmp/wiki-tts-nemo-grammars/en_tn_True_deterministic_cased_nemo_whitelist.tsv_tokenize.far.
NeMo-text-processing :: INFO :: VerbalizeFinalFst graph was restored from /tmp/wiki-tts-nemo-grammars/en_tn_True_deterministic_verbalizer.far.
>>>
>>> # Custom/Wikipedia-specific terms: Domain vocabulary not handled by general-purpose text normalization
>>> clean_spoken_text("NASA launched a mission.")
'NASA launched a mission.'
>>> clean_spoken_text("UNESCO declared a world heritage site.")
'yoo neh sko declared a world heritage site.'
>>> clean_spoken_text("DNA and RNA are nucleic acids.")
'DNA and RNA are nucleic acids.'
>>> clean_spoken_text("AI technology is advancing.")
'AI technology is advancing.'
Following T426756#11944373, we integrated the NeMo text processing library into the TTS prototype, since it handles most of the nuanced text normalization edge cases outlined in this task's description.
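The idea behind nemo_whitelist.tsv can be illustrated with a plain-Python approximation. This is not how NeMo applies its whitelist internally (it compiles it into its grammars); the sketch only assumes the two-column written-form/spoken-form TSV layout, and the helper names and sample entries are hypothetical:

```python
import csv
import io
import re

# Hypothetical two-column whitelist: written form <TAB> spoken form.
WHITELIST_TSV = "UNESCO\tyoo neh sko\nNASA\tNASA\n"


def load_whitelist(tsv_text: str) -> dict:
    """Read written -> spoken mappings, skipping malformed rows."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def apply_whitelist(text: str, whitelist: dict) -> str:
    """Replace whole-word whitelist entries with their spoken form.

    \\b boundaries assume entries start and end with word characters.
    """
    if not whitelist:
        return text
    # Longest entries first so multi-word terms win over their prefixes.
    alternatives = sorted(map(re.escape, whitelist), key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(alternatives) + r")\b")
    return pattern.sub(lambda m: whitelist[m.group(1)], text)
```

Keeping the mapping in a data file rather than code is what makes the list cheap to extend as new mispronounced terms turn up.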
One thing we found while fixing subscript/superscript edge cases is that Wikipedia's plain-text extract API:
https://en.wikipedia.org/w/api.php?action=query&titles=Square_metre&prop=extracts&explaintext=1&format=json
strips all formatting. Content like m<sub>2</sub> (i.e. m₂) and m<sup>2</sup> (i.e. m²) both arrive as m2, which ends up being read as "m two": the digit is pronounced correctly, but the superscript meaning ("squared") is lost.
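A possible workaround is to normalize superscripts before the markup is stripped. The sketch below assumes an HTML source that still contains <sup>/<sub> tags (whether the TextExtracts HTML mode, i.e. without explaintext=1, preserves them would need checking); the spoken forms and helper names are hypothetical:

```python
import html
import re

# Hypothetical spoken forms for common superscript exponents.
SUPERSCRIPT_WORDS = {"2": "squared", "3": "cubed"}


def _sup(match) -> str:
    exponent = match.group(1)
    word = SUPERSCRIPT_WORDS.get(exponent)
    return f" {word}" if word else f" to the power of {exponent}"


def speakable_from_html(fragment: str) -> str:
    """Turn <sup>/<sub> digits into speakable text, then strip remaining tags."""
    text = re.sub(r"<sup>\s*(\d+)\s*</sup>", _sup, fragment)
    # Subscripts (e.g. chemical formulas) are read as separate plain digits.
    text = re.sub(r"<sub>\s*(\d+)\s*</sub>", r" \1 ", text)
    text = re.sub(r"<[^>]+>", "", text)  # drop any other tags
    # Unescape entities first so &nbsp; etc. are collapsed with other whitespace.
    return re.sub(r"\s+", " ", html.unescape(text)).strip()
```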
Wed, May 20
May 19 2026
May 15 2026
Weekly Update:
In T424378#11903226, the focus was on vertical scaling. However, after discussions in T425804#11913283 and T425804#11914308, this project was allocated 24Gi RAM to enable horizontal scaling instead, since vertical scaling is not currently supported on Toolforge. This approach deploys 10 small worker replicas (1 CPU and 2Gi RAM each) alongside the web server. Below are the steps I used to host this TTS prototype on Toolforge:
Thanks a lot, everyone : )
Following the quota bump requested in T425804#11914308, we are able to run 10 replicas with 1 CPU and 2Gi RAM each:
$ toolforge jobs delete celery-worker
$ toolforge jobs run celery-worker \
    --command "export ORT_NUM_THREADS=1 OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_NUM_THREADS=1 VECLIB_MAXIMUM_THREADS=1 NUMEXPR_NUM_THREADS=1 && cd ~/www/python/src && ~/www/python/venv/bin/celery -A worker worker --pool solo --loglevel=info" \
    --image python3.11 \
    --continuous \
    --replicas 10 \
    --mem 2Gi \
    --cpu 1
May 13 2026
thanks!
May 12 2026
Thanks @komla! I shared feedback from the ML and APPs team: T425804#11913105
May 11 2026
Thanks everyone for your suggestions, we have moved forward to T425909: Request creation of wikitts VPS project
May 8 2026
Weekly Update:
Following T424378#11888422, we started preparing to host the TTS prototype on Toolforge. We found that the celery-worker job can currently only run with --mem 4Gi --cpu 2 at most on Toolforge because of the resource quotas shown below:
tools.wiki-tts@tools-bastion-15:~$ kubectl describe resourcequotas
Name:                    tool-wiki-tts
Namespace:               tool-wiki-tts
Resource                 Used   Hard
--------                 ----   ----
configmaps               4      10
count/cronjobs.batch     0      50
count/deployments.apps   1      16
count/jobs.batch         0      15
limits.cpu               500m   16
limits.memory            512Mi  8Gi
persistentvolumeclaims   0      0
pods                     1      16
requests.cpu             125m   16
requests.memory          256Mi  8Gi
secrets                  4      64
services                 1      16
services.nodeports       0      0
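To check up front whether a replica plan fits in a namespace, the quota output can be parsed and compared against replicas × per-replica limits. This hypothetical sketch handles only the quantity suffixes that appear above (m, Ki, Mi, Gi) and only looks at the namespace quota; any per-container limits the platform applies are not visible in this output:

```python
import re

# Multipliers for the Kubernetes quantity suffixes used in this output.
UNITS = {"": 1, "m": 0.001, "Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_quantity(q: str) -> float:
    """Parse a quantity such as '500m', '512Mi' or '8Gi' (subset of suffixes)."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)(m|Ki|Mi|Gi)?", q)
    if not m:
        raise ValueError(f"unsupported quantity: {q}")
    return float(m.group(1)) * UNITS[m.group(2) or ""]


def parse_quota(describe_output: str) -> dict:
    """Map resource name -> (used, hard) from `kubectl describe resourcequotas` text."""
    quotas = {}
    for line in describe_output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] not in ("Resource", "--------"):
            quotas[parts[0]] = (parse_quantity(parts[1]), parse_quantity(parts[2]))
    return quotas


def fits(quotas: dict, replicas: int, mem: str, cpu: str) -> bool:
    """Check whether `replicas` more pods with these limits fit in the remaining quota."""
    used_mem, hard_mem = quotas["limits.memory"]
    used_cpu, hard_cpu = quotas["limits.cpu"]
    return (used_mem + replicas * parse_quantity(mem) <= hard_mem
            and used_cpu + replicas * parse_quantity(cpu) <= hard_cpu)
```

Against the 8Gi limits.memory shown above, fits(quotas, 10, "2Gi", "1") is False (10 × 2Gi alone is 20Gi), which is consistent with the 10-replica setup needing a larger memory allocation.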