Ahmadali_dev
User

Projects

User is not a member of any projects.

User Details

User Since
Mar 29 2026, 6:33 PM (11 w, 3 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Ahmadali dev

Recent Activity

Mar 29 2026

Ahmadali_dev added a comment to T415145: GSoC 2026: Bulk OCR Improvements.

Hello mentors,
My name is Ahmad Ali, a Computer Science student interested in applying for this project under GSoC 2026. I have been exploring the existing Wikisource OCR workflow and the ProofreadPage extension codebase this week, and I find the problem of bulk digitization genuinely important.
I have drafted a proposal covering: asynchronous job queue design using the MediaWiki-native JobQueue, a side-by-side OCR preview interface with low-confidence highlighting, role-based access control with a new bulkocr user right, and an optional Python NLP post-processing microservice designed for graceful degradation.
My technical question: When a bulk OCR job is processing 200+ pages asynchronously, what is the recommended strategy for rate-limiting calls to the Wikimedia OCR service — is there an existing throttling mechanism in the extension I should build on, or would I need to implement a token-bucket style limiter in the job handler?
I would appreciate any guidance or feedback from mentors. I am also available on Wikimedia Zulip as AhmadAli.
Thank you for your time.
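
To make the rate-limiting question concrete, here is a minimal sketch of the token-bucket idea in Python. It is only an illustration under stated assumptions: `TokenBucket`, `rate`, `capacity`, and the injectable `clock` are hypothetical names, not part of any existing extension or of the Wikimedia OCR service, and a limiter in the actual job handler would likely need shared state across job runners rather than in-process state.

```python
import time


class TokenBucket:
    """Hypothetical in-process token-bucket limiter (illustrative only).

    Allows bursts of up to `capacity` calls and refills at `rate` tokens
    per second. The clock is injectable so the logic can be tested
    without sleeping.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = float(rate)
        self.capacity = float(capacity)
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last check, capped at capacity.
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)

    def try_acquire(self, tokens=1.0):
        """Take `tokens` if available and return True; otherwise return False."""
        self._refill()
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False

    def wait_time(self, tokens=1.0):
        """Seconds until `tokens` would be available (0.0 if available now)."""
        self._refill()
        deficit = tokens - self._tokens
        return max(0.0, deficit / self.rate)


def process_pages(pages, ocr_call, bucket, sleep=time.sleep):
    """Call `ocr_call(page)` for each page, pausing whenever the bucket is empty.

    `ocr_call` is a placeholder for whatever actually requests OCR for a page.
    """
    results = []
    for page in pages:
        while not bucket.try_acquire():
            sleep(bucket.wait_time())
        results.append(ocr_call(page))
    return results
```

With, say, `TokenBucket(rate=2, capacity=3)`, a 200-page job would send an initial burst of three requests and then settle at roughly two requests per second. Those numbers are placeholders; the real limits would have to come from the mentors or the service's documentation.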

Mar 29 2026, 6:55 PM · MW-1.46-notes (1.46.0-wmf.26; 2026-04-28), Patch-For-Review, Google-Summer-of-Code (2026)