Ores mwparserfromhell causes celery segfaults
Closed, Resolved (Public)

Description

This is similar to T222866: celery on some ORES nodes is segfaulting on the same text:

$1 = {ob_base = {ob_refcnt = 2, ob_type = 0x7f523de284e0 <TokenizerType>}, text = {
    object = '<noinclude>{{위키백과:질문방/보존2|년=2021|월=11|주=47}}</noinclude>\n\n== 파일 삭제 요청에 대한 질문 ==\n\n 파일 삭제 요청은 어디서 하나요? --\'\'\'[[사:밀크맛 우유|<span style="text-shadow: 0px 0px 5px #FF83AB; color: snow;">MILK</span>]] [[사토:밀크맛 우유|<span style="text-shadow: 0px 0px 5px #FF83AB; color: skyblue;"> ⍤⃝♬</span>]]\'\'\' 2021년 11월 22일 (월) 15:49 (KST)\n\n:{{핑|밀크맛 우유}}[[백:문서 관리 요청]]에서 하시면 됩니다.ㅡ [[사:Cream Latte|<span style="background: linear-gradient(135deg, slateblue, hotpink); -webkit-background-clip: text; -webkit-text-fill-color: transparent; white-space:nowrap; -webkit-text-stroke:0px white;">\'\'\'˖ ִֶָᶤ C.l ִֶָꨄˊ˗\'\'\'</span>]] 2021년 11월 22일 (월) 17:50 (KST)\n:{{ㄷ|밀크맛 우유}}일문 삭제하듯이 {{틀|ㅅ}} 쓰시면 됩니다. --[[사용자:KeySpace|<span style="color:Blue;font-family:Courier new, serif;">Key</span><span style="color:brown;font-family:Courier new, serif;">.S</span>]]<span style="font-size:85%">([[사토: KeySpace|토론]]/[[특: 기여/KeySpace|기여]]/[[특: 이메일보내기/KeySpace|이메일]])</span> 2021년 11월 22일 (월) 21:07 (KST)\n\n== 최초 ==\n\n위키백과 최초로 차단된 사람이 누구인가요?\n2021년 11월 2...(truncated), length = 25854, kind = 4, data = 0x560e6b9d9ad8}, topstack = 0x7f5226df9990, head = 3085, global = 1,
  depth = 2, route_state = 0, route_context = 524288, bad_routes = 0x7f52256ec428, skip_style_tags = 0}

Event Timeline

elukey created this object with visibility "WMF-NDA (Project)".

Following the same procedure used in T222866#5179563. In this case, the parameters are: length = 25854, kind = 4, data = 0x560e6b9d9ad8

Starting address: 0x560e6b9d9ad8
End address: 0x560e6b9d9ad8 + (4 * 25854) = 0x560E6B9F2ED0

dump memory esbugmem 0x560e6b9d9ad8 0x560E6B9F2ED0

Following PEP 393, kind = 4 means UCS-4, so each character takes 4 bytes.
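
The address arithmetic above can be double-checked with a few lines of Python. This is only a sketch: the helper name dump_range is hypothetical, and it assumes the string data is one contiguous buffer of `length` characters at `kind` bytes each, which is how PEP 393 describes the kind = 4 case.

```python
def dump_range(data_addr, length, kind):
    """Return (start, end) addresses of a PEP 393 string buffer.

    Assumes `kind` is the number of bytes per character (1, 2 or 4)
    and that the buffer is contiguous, as in the gdb output above.
    """
    return data_addr, data_addr + kind * length


# Values taken from the gdb output in the task description.
start, end = dump_range(0x560e6b9d9ad8, length=25854, kind=4)

# Build the gdb command that writes the buffer to the file "esbugmem".
gdb_command = f"dump memory esbugmem {start:#x} {end:#x}"
print(gdb_command)
```

The end address is exclusive, which matches how gdb's `dump memory` treats its second argument.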

Indeed with the following script I was able to cause a segfault:

import mwparserfromhell
import time
import os
import glob

# The file is a binary memory dump, so read it as binary and then decode it as UTF-32, which matches the 4-bytes-per-character (kind = 4, UCS-4) layout of the dumped string.
with open('esbugmem', 'rb') as f:
    data = f.read()
    text = data.decode('utf32')
    print(mwparserfromhell.parse(text))
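
The decoding step can be sanity-checked without the real dump. The sketch below uses a short, hypothetical stand-in string rather than the dumped page text, and it assumes the dump comes from a little-endian host, so the byte order is spelled out explicitly instead of relying on the codec's default when no BOM is present.

```python
# Hypothetical stand-in for the dumped buffer; the real dump is not reproduced here.
# The last character is outside the BMP, which is what forces kind = 4 in CPython.
sample = "== 최초 ==\n{{핑|밀크맛 우유}} \U0001F600"

# Simulate the raw memory: 4 bytes per code point, little-endian, no BOM.
raw = sample.encode("utf-32-le")
assert len(raw) == 4 * len(sample)

# Decoding with the same explicit byte order recovers the original text.
text = raw.decode("utf-32-le")
assert text == sample
```

If the buffer length were not a multiple of 4, or the byte order were wrong, the decode would fail or produce garbage, which is a quick way to notice a bad dump range.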

The good thing is that I cannot reproduce the problem with the latest version of mwparserfromhell, so in my opinion we should upgrade as soon as possible. More precisely, we run version 0.5.4, and 0.6+ seems to be unaffected.
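
A worker could be checked against that version boundary with something like the following. It is a rough sketch: the function name seems_unaffected is hypothetical, and the 0.6 cutoff is only what the reproduction above suggests, not a confirmed fix version.

```python
def seems_unaffected(version):
    """Return True if `version` is 0.6 or later.

    The cutoff reflects only the observation above that 0.6+ did not
    reproduce the segfault; it is not a confirmed fix boundary.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (0, 6)


print(seems_unaffected("0.5.4"))  # the version running at the time
print(seems_unaffected("0.6"))
```

On a host with the package installed, the version string could come from importlib.metadata.version("mwparserfromhell") on Python 3.8 or later.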

To limit the damage, I have disabled core dumps in the ORES celery systemd unit, since they were creating too many 10G files on the root partition of the worker nodes and triggering alarms.

I have created https://gerrit.wikimedia.org/r/c/research/ores/wheels/+/742195/ as a follow-up to update mwparserfromhell, following https://wikitech.wikimedia.org/wiki/ORES/Deployment#Update_wheels

Deployed to Beta and checked some scores on https://ores-beta.wmflabs.org, nothing horrible registered in the celery logs on deployment-ores01.deployment-prep.eqiad1.wikimedia.cloud.

There are not many segfaults, but I'd proceed with the deployment to one prod node anyway (given how much damage a wrong score can cause): thanos link

Deployed to ores1001; will proceed with the rest tomorrow if nothing comes up.

elukey claimed this task.

Change deployed fleetwide; I will keep monitoring metrics, but so far everything is green.

elukey changed the visibility from "Custom Policy" to "Public (No Login Required)". Nov 29 2021, 8:14 AM