It would help if the server could handle some of the SSML parameters internally. What I have found to work at the moment is something like:
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" xml:lang="en-US"> <phoneme alphabet="x-sampa" ph="' m V N . k i">dummy</phoneme> </speak>
The client shouldn't need to specify the speak element, especially since xml:lang doesn't use the "normal" language codes (e.g. en-US above).
I think it would make sense if the request could look something like this:
?lang=en&input_type=ssml&input=<phoneme alphabet="x-sampa" ph="' m V N . k i"></phoneme>
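To make the idea concrete, here is a rough sketch of how the server could wrap a bare SSML fragment in the speak element itself. The function name `wrap_ssml` and the `LANG_TO_XML_LANG` table are hypothetical, and the `en` to `en-US` mapping is only an assumption based on the example above. This is not how the server currently works.

```python
from xml.sax.saxutils import quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Hypothetical mapping from the "normal" request codes to xml:lang values.
LANG_TO_XML_LANG = {"en": "en-US"}


def wrap_ssml(fragment: str, lang: str) -> str:
    """Wrap an SSML fragment in a <speak> element for the given language."""
    # Fall back to the request code itself if no mapping is known.
    xml_lang = LANG_TO_XML_LANG.get(lang, lang)
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" '
        f"xml:lang={quoteattr(xml_lang)}>{fragment}</speak>"
    )
```

With something like this on the server side, the client would only send the phoneme element plus `lang=en` (URL-encoded in a real request), and the speak boilerplate would never leave the server.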