
Arabic marytts fails on quoted latin text
Open, Low, Public

Description

The request was:
/wikispeech/?lang=ar&input_type=text&input=Script+error%3A+No+such+module+%22Unsubst%22.
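Decoding the query string makes the failing input easier to read. A minimal sketch using only the Python standard library; the host is omitted, as in the request above:

```python
from urllib.parse import urlsplit, parse_qs

# Request path exactly as quoted in this task (no host given).
request = ("/wikispeech/?lang=ar&input_type=text"
           "&input=Script+error%3A+No+such+module+%22Unsubst%22.")

# parse_qs decodes '+' to spaces and percent-escapes to characters.
params = parse_qs(urlsplit(request).query)

lang = params["lang"][0]
text = params["input"][0]
print(lang)
print(text)
```

The decoded input is a Latin-script sentence containing a colon and a double-quoted word, sent with lang=ar.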

Log:

opusenc opus-tools 0.1.8 (using libopus 1.1)
Copyright (C) 2008-2013 Xiph.Org Foundation
Encoding using libopus 1.1 (audio)
-----------------------------------------------------
 Input: 16kHz 1 channel
 Output: 1 channel (1 uncoupled)
 20ms packets, 33kbit/sec VBR
 Preskip: 312

[|] 00:00:00.02 2e+04x realtime, 33.07kbit/s
Encoding complete
-----------------------------------------------------
 Encoded: 0.88 seconds
 Runtime: 1e-06 seconds
 (8.8e+05x realtime)
 Wrote: 4425 bytes, 44 packets, 3 pages
 Bitrate: 31.9364kbit/s (without overhead)
 Instant rates: 28kbit/s to 37.6kbit/s
 (70 to 94 bytes per packet)
 Overhead: 20.6% (container+metadata)

  File "bin/wikispeech", line 28, in <module>
    import wikispeech_server.test_api
  File "./wikispeech_server/test_api.py", line 43, in <module>
    assert ( langs == expected_langs ) , "%s and %s are not equal" % (expected_langs, langs)
 * Running on http://0.0.0.0:10000/ (Press CTRL+C to quit)
[2017-09-26 15:22:21,241] ERROR in app: Exception on /wikispeech/ [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.4/dist-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.4/dist-packages/flask_cors/extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/usr/local/lib/python3.4/dist-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.4/dist-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/usr/local/lib/python3.4/dist-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.4/dist-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "./wikispeech_server/wikispeech.py", line 154, in wikispeech
    markup = textproc(lang, textprocessor_name, input, input_type=input_type)
  File "./wikispeech_server/wikispeech.py", line 367, in textproc
    utt = process(text, lang, component, input_type=input_type)
  File "./wikispeech_server/adapters/marytts_adapter.py", line 61, in marytts_preproc
    (marylang, utt) = maryxml2utt(xml, tp_config)
  File "./wikispeech_server/adapters/marytts_adapter.py", line 441, in maryxml2utt
    utt = mary2ws(xml, voice)
  File "./wikispeech_server/adapters/marytts_adapter.py", line 459, in mary2ws
    root = ET.fromstring(maryxml.encode('utf-8'))
  File "/usr/lib/python3.4/xml/etree/ElementTree.py", line 1325, in XML
    parser.feed(text)
  File "<string>", line None
xml.etree.ElementTree.ParseError: mismatched tag: line 1, column 2227
10.68.21.68 - - [26/Sep/2017 15:22:21] "POST /wikispeech/ HTTP/1.1" 500 -

Event Timeline

This seems to originate from an error in marytts. The data sent from wikispeech to marytts is correct, I think. See the stack trace below.

Other comments:

  • The URL works with lang=sv instead of lang=ar
  • The URL works if I remove special characters from the input text (quote and %3A)

I also added some error checking to wikispeech, in marytts_preproc in marytts_adapter.py:

# Fail early if marytts did not return 200, and log the response body.
if r.status_code != 200:
    log.debug("marytts call failed with error %d", r.status_code)
    log.debug("marytts error text %s", r.text)
    raise ValueError("marytts call failed with error", r.status_code, r.text)
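
Without a status check, a non-200 response body is handed straight to the XML parser. The sketch below suggests how that could produce the `mismatched tag` error seen in the log. The HTML string is a shortened, hypothetical stand-in for a marytts error page, not the actual response, and `try_parse` is an illustrative helper that does not exist in wikispeech.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for an HTML error page: <p> is never
# closed, which HTML tolerates but XML does not.
error_page = ("<html><body><h1>Internal server error</h1>"
              "<p>Processing failed.<pre>java.lang.Exception</pre>"
              "</body></html>")

def try_parse(text):
    """Return (root, None) on success or (None, error message) on failure."""
    try:
        return ET.fromstring(text.encode("utf-8")), None
    except ET.ParseError as err:
        return None, str(err)

root, error = try_parse(error_page)
print(error)
```

Checking the status code first means the caller sees the marytts failure itself rather than a parse error that hides it.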

MaryTTS stacktrace:

Internal server error</h1><p>Processing failed.<pre>java.lang.Exception: Module JPhonemiser: Problem processing the data.
	at marytts.server.Request.processOneChunk(Request.java:568)
	at marytts.server.Request.processOrLookupOneChunk(Request.java:403)
	at marytts.server.Request.process(Request.java:337)
	at marytts.server.http.SynthesisRequestHandler.process(SynthesisRequestHandler.java:261)
	at marytts.server.http.SynthesisRequestHandler.handleClientRequest(SynthesisRequestHandler.java:91)
	at marytts.server.http.BaseHttpRequestHandler.handle(BaseHttpRequestHandler.java:138)
	at org.apache.http.nio.protocol.BufferingHttpServiceHandler$RequestHandlerAdaptor.handle(BufferingHttpServiceHandler.java:189)
	at org.apache.http.nio.protocol.SimpleNHttpRequestHandler.handle(SimpleNHttpRequestHandler.java:51)
	at org.apache.http.nio.protocol.AsyncNHttpServiceHandler.processRequest(AsyncNHttpServiceHandler.java:453)
	at org.apache.http.nio.protocol.AsyncNHttpServiceHandler.requestReceived(AsyncNHttpServiceHandler.java:225)
	at org.apache.http.nio.protocol.BufferingHttpServiceHandler.requestReceived(BufferingHttpServiceHandler.java:127)
	at org.apache.http.impl.nio.DefaultNHttpServerConnection.consumeInput(DefaultNHttpServerConnection.java:161)
	at org.apache.http.impl.nio.DefaultServerIOEventDispatch.inputReady(DefaultServerIOEventDispatch.java:147)
	at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:161)
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:335)
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:275)
	at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:542)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Cannot syllabify empty phone string
	at marytts.modules.phonemiser.AllophoneSet.syllabify(AllophoneSet.java:481)
	at marytts.modules.phonemiser.TrainedLTS.syllabify(TrainedLTS.java:179)
	at marytts.language.ar.JPhonemiser.phonemise(JPhonemiser.java:116)
	at marytts.modules.JPhonemiser.process(JPhonemiser.java:193)
	at marytts.server.Request.processOneChunk(Request.java:566)
	... 19 more
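
For a more helpful error message, the adapter could surface only the innermost `Caused by:` line of the marytts response instead of the whole page. A minimal sketch; `root_cause` is a hypothetical helper, not part of wikispeech or marytts, and the sample text is abbreviated from the stack trace above.

```python
def root_cause(error_text):
    """Return the last 'Caused by:' message in a Java stack trace, or None."""
    cause = None
    for line in error_text.splitlines():
        line = line.strip()
        if line.startswith("Caused by:"):
            # Keep overwriting so the innermost (last) cause wins.
            cause = line[len("Caused by:"):].strip()
    return cause

sample = """java.lang.Exception: Module JPhonemiser: Problem processing the data.
\tat marytts.server.Request.processOneChunk(Request.java:568)
Caused by: java.lang.IllegalArgumentException: Cannot syllabify empty phone string
\tat marytts.modules.phonemiser.AllophoneSet.syllabify(AllophoneSet.java:481)
"""

print(root_cause(sample))
```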

This is a tokenisation problem in the Arabic marytts. Is it even a bug? Input like this is very unlikely for Arabic synthesis. Still, the error message could definitely be better!
Maybe the problem lies in handling the various types of quotes used in Arabic? Or perhaps quotes work with Arabic-script input but not with Latin-script input.
This works:
وكتابه "نهاية الإيجار في دراية الإعجاز" يعتبر من المراجع البلاغية المهمة.
But the example in this ticket doesn't, and neither does this:
"Bob Dylan"
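
To test the script hypothesis, inputs could be classified by script before they are sent to marytts. A rough sketch; `scripts_used` is a hypothetical helper, and matching on Unicode character names is an approximation rather than a proper script-property lookup.

```python
import unicodedata

def scripts_used(text):
    """Return the set of scripts ('LATIN', 'ARABIC') found among the letters."""
    found = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip quotes, spaces, digits and punctuation
        name = unicodedata.name(ch, "")
        for script in ("LATIN", "ARABIC"):
            if name.startswith(script):
                found.add(script)
    return found

print(scripts_used('"Bob Dylan"'))
print(scripts_used('وكتابه "نهاية الإيجار"'))
```

A check like this only tells the two failing and working examples apart by script; it does not show which marytts module mishandles the Latin-script tokens.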

HaraldBerthelsen renamed this task from "Server error: xml.etree.ElementTree.ParseError: mismatched tag: line 1, column 2227" to "Arabic marytts fails on quoted latin text". Dec 5 2017, 12:35 PM
HannaLindgren raised the priority of this task from Lowest to Low. Oct 23 2019, 9:43 AM
Aklapper added subscribers: HaraldBerthelsen, Aklapper.

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)