
#N is pronounced as "hashtag"
Closed, Resolved (Public)

Description

NOTE: I tested this with v0.2.1 and couldn't get any response from morf.se.

The character "#" followed by a number is pronounced only as "hashtag", with the number left out. The default behaviour should probably be "number N".

Example: /wikispeech/?lang=en&input_type=text&input=on+May+28+at+%23105. (%23 is the URL-encoded form of #.)
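
For anyone reproducing this, the query string above can be built rather than typed by hand. This is only a sketch: `build_query` is a hypothetical helper name, and the path and parameter names are copied from the example URL, not from any documented API.

```python
from urllib.parse import urlencode


def build_query(text: str, lang: str = "en") -> str:
    """Build the request path used in the example (hypothetical helper)."""
    # urlencode percent-encodes "#" as %23 and turns spaces into "+".
    params = {"lang": lang, "input_type": "text", "input": text}
    return "/wikispeech/?" + urlencode(params)


print(build_query("on May 28 at #105."))
# /wikispeech/?lang=en&input_type=text&input=on+May+28+at+%23105.
```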
Also, the tokens don't match the audio:

"tokens": [
    {
        "endtime": 0.155,
        "orth": "on"
    },
    {
        "endtime": 0.355,
        "orth": "May"
    },
    {
        "endtime": 1.055,
        "expanded": "twenty eight",
        "orth": "28"
    },
    {
        "endtime": 1.185,
        "orth": "at"
    },
    {
        "endtime": 1.81,
        "expanded": "hashtag 105",
        "orth": "#105"
    },
    {
        "endtime": 1.81,
        "orth": "."
    },
    {
        "endtime": 2.21,
        "orth": ""
    }
]
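
To compare the tokens with the audio, it may help to turn the end times into start/end spans. The sketch below assumes that each token starts where the previous one ends and that the first token starts at 0.0; that is an assumption about the format, not something confirmed here. `token_spans` is a hypothetical helper.

```python
# Token list copied from the response above ("expanded" fields omitted).
tokens = [
    {"endtime": 0.155, "orth": "on"},
    {"endtime": 0.355, "orth": "May"},
    {"endtime": 1.055, "orth": "28"},
    {"endtime": 1.185, "orth": "at"},
    {"endtime": 1.81, "orth": "#105"},
    {"endtime": 1.81, "orth": "."},
    {"endtime": 2.21, "orth": ""},
]


def token_spans(tokens):
    """Return (orth, start, end) tuples; assumes tokens are contiguous."""
    spans = []
    start = 0.0  # assumption: audio starts at 0.0 seconds
    for tok in tokens:
        spans.append((tok["orth"], start, tok["endtime"]))
        start = tok["endtime"]
    return spans


for orth, start, end in token_spans(tokens):
    print(f"{orth!r}: {start:.3f}-{end:.3f} s ({end - start:.3f} s)")
```

Under that assumption, "#105" spans 1.185 to 1.81 seconds and "." has zero length.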

Event Timeline

HannaLindgren renamed this task from #N is prnounced as "hashtag" to #N is pronounced as "hashtag". Nov 14 2017, 4:18 PM
HannaLindgren assigned this task to HaraldBerthelsen.

Yes, English marytts does this. English Flite says "hash" followed by the number. Swedish marytts actually just skips the whole "#N" expression.
So for now, what you can do is change the pronunciation using SSML.
But it also brings back a question we discussed before: should there be a preprocessing step, e.g. with regular expressions, to modify the input? Maybe a good idea.
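
A regular-expression preprocessing step of that kind could look roughly like this. It is a minimal sketch, not existing Wikispeech code: `expand_hash_numbers` is a hypothetical name, the rule is English-only, and the replacement wording "number N" is taken from the suggestion in the description.

```python
import re

# Match "#" directly followed by digits, but not inside a word ("a#5")
# and not when letters follow the digits ("#1st").
_HASH_NUMBER = re.compile(r"(?<!\w)#(\d+)\b")


def expand_hash_numbers(text: str) -> str:
    """Rewrite "#105" as "number 105" before the text reaches the synthesizer."""
    return _HASH_NUMBER.sub(r"number \1", text)


print(expand_hash_numbers("on May 28 at #105."))
# on May 28 at number 105.
```

Real hashtags such as "#topic" are left unchanged, so the synthesizer's own handling still applies to them.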

I can't reproduce the other part of the ticket (tokens don't match audio). "#105" is pronounced only as "hashtag", but for me the tokens match.
Closing this as "resolved" for now, but it should maybe be revisited later.