
Request capacity increase in preparation for MinT for wiki Readers experiment
Closed, ResolvedPublic4 Estimated Story Points

Description

As part of the preparations (T381406) to continue with the MinT for Wiki Readers experiment, we want to identify if there are quick ways to improve the response time for machine translation requests. Slow response times can add some noise to the experiment results.

This ticket proposes to identify which server resources, among those that can be easily increased, could have the most impact on translation speed. In this way we can make a request and have the service in better condition for the experiment.

Details

Event Timeline

Nikerabbit triaged this task as Medium priority.Feb 20 2025, 8:46 AM
Nikerabbit set the point value for this task to 4.
Nikerabbit moved this task from Backlog to Infrastructure on the MinT board.
Nikerabbit renamed this task from Request server capcity increase in preparation for MinT for wiki Readers experient to Request server capacity increase in preparation for MinT for wiki Readers experiment.Mar 11 2025, 7:58 AM
Nikerabbit renamed this task from Request server capacity increase in preparation for MinT for wiki Readers experiment to Request capacity increase in preparation for MinT for wiki Readers experiment.

After deploying 'MinT for Wiki Readers' in 4 Wikipedias (T390023), traffic and other metrics (memory, CPU) seem normal. We will observe it this week before going forward with deploying to more wikis.

Nikerabbit changed the task status from Open to Stalled.Jun 3 2025, 10:36 AM

The recent report for the initial MinT for Wikipedia pre-pilot wikis (T391365) shows high loading times for machine translation views:

  • Over 3s for 68% of the views
  • Over 2s for 90% of the views.

Loading times over 2-3 seconds can affect the abandonment rate, introducing noise to the experiment results.
[This old Google post](https://blog.google/products/admanager/the-need-for-mobile-speed/) summarizes some of the research in this space:

  • 53% of visits are likely to be abandoned if pages take longer than 3 seconds to load.
  • One out of two people expect a page to load in less than 2 seconds.

(These numbers are from 2016, but I see no reason to expect that users today would tolerate slower response times.)

I think it may be worth requesting additional server capacity that could result in faster loading times, especially considering that the experiment is targeting more wikis with more traffic than the 4 in the pre-pilot.

@KartikMistry is there a simple request we can make to increase resources that could help with loading times?

Nikerabbit changed the task status from Stalled to Open.Jul 16 2025, 7:21 AM
KartikMistry changed the task status from Open to In Progress.Nov 6 2025, 8:25 AM
KartikMistry claimed this task.
KartikMistry moved this task from Backlog to In Progress on the LPL Essential (FY2025-26 Q2) board.
KartikMistry removed a subscriber: PWaigi-WMF.

Change #1202642 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] machinetranslation: Increase replica and memory

https://gerrit.wikimedia.org/r/1202642
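
For context, changes like this in operations/deployment-charts adjust the Helm values for the service. Below is a minimal, hypothetical sketch of what a replica and memory bump might look like; the key names and numbers are illustrative only and not taken from the actual patch (see the Gerrit change above for the real diff):

```yaml
# Hypothetical values snippet for the machinetranslation service.
# Key names and numbers are illustrative; the actual structure in
# operations/deployment-charts may differ.
resources:
  replicas: 4            # more pods to serve concurrent translation requests
main_app:
  requests:
    memory: 32Gi         # per-pod memory request
  limits:
    memory: 64Gi         # per-pod memory limit
```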

Looking at the memory usage:

https://grafana.wikimedia.org/goto/OxM6IrkvR?orgId=1

It seems like only one of the pods goes from 32G to 64G, and it is always the same one (this is visible if you toggle between the two pods and the two DCs, eqiad/codfw).

Is there anything else that might help explain this behaviour? Unless, of course, it is expected. Thank you!

Change #1202642 merged by jenkins-bot:

[operations/deployment-charts@master] machinetranslation: Increase replicas

https://gerrit.wikimedia.org/r/1202642

Mentioned in SAL (#wikimedia-operations) [2025-11-13T06:18:33Z] <kart_> machinetranslation: Increase replicas (T386371)

Capacity has been increased without apparent issues. When the experiment launches, we'll hopefully have a higher chance of fast request responses and a lower chance of the servers running out of capacity.