It is common practice to include or indicate multiple pronunciations in one transcription, as in /ˈkɑːtɑːr, kəˈtɑːr/ or /maɪˈn(j)uːʃ(ɪ)ˌiː/. Providing a button for each pronunciation, or each permutation (the latter stands for four variants at once), would add considerable clutter.
Currently on Beta Cluster, the comma-separated input ˈkɑːtɑːr, kəˈtɑːr returns a file with no sound (and no error), while the space-separated input ˈkɑːtɑːr kəˈtɑːr is read as one run-on utterance that sounds like the name of someone called "Catarca Tar".
We need an operator to use inside ipa="" that results in one file with multiple distinct, isolated utterances with pauses in between.
I believe this is possible with something like
<speak>
  <p><!-- and <s> if required -->
    <phoneme alphabet="ipa" ph="ˈkɑːtɑːr">Qatar</phoneme>
  </p>
  <p>
    <phoneme alphabet="ipa" ph="kəˈtɑːr">Qatar</phoneme>
  </p>
</speak>
or
<speak>
  <phoneme alphabet="ipa" ph="ˈkɑːtɑːr">Qatar</phoneme>
  <break strength="x-strong"/><!-- compare which strength is best? -->
  <phoneme alphabet="ipa" ph="kəˈtɑːr">Qatar</phoneme>
</speak>
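To illustrate how such an operator could be turned into the second SSML form, here is a minimal Python sketch. It assumes, purely for illustration, that the operator is a comma; the function name `build_ssml` and its parameters are hypothetical and not part of any existing code.

```python
import xml.etree.ElementTree as ET


def build_ssml(ipa_value, text, separator=",", strength="x-strong"):
    """Build one <speak> document with a <break> between each pronunciation.

    `separator` is an assumed operator; the task does not settle on one.
    """
    # Split the ipa value into individual transcriptions, dropping empties.
    variants = [v.strip() for v in ipa_value.split(separator) if v.strip()]

    speak = ET.Element("speak")
    for index, ph in enumerate(variants):
        if index > 0:
            # Insert a pause before every utterance except the first.
            ET.SubElement(speak, "break", {"strength": strength})
        phoneme = ET.SubElement(speak, "phoneme", {"alphabet": "ipa", "ph": ph})
        phoneme.text = text

    # encoding="unicode" returns a str and keeps the IPA characters as-is.
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("ˈkɑːtɑːr, kəˈtɑːr", "Qatar"))
```

Building the document with an XML library rather than string concatenation means any special characters in the transcription or the text are escaped automatically.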
Failing that, we could send one request per pronunciation, then concatenate the resulting files locally and insert pauses between them.
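The local fallback could look roughly like the following Python sketch. It assumes the clips are available as uncompressed PCM WAV data with identical channel count, sample width and sample rate (a compressed format would need decoding first); `concat_with_pauses` and the default pause length are hypothetical choices, not existing code.

```python
import io
import wave


def concat_with_pauses(wav_blobs, pause_seconds=0.6):
    """Join PCM WAV clips (as bytes) into one WAV with silence in between."""
    if not wav_blobs:
        raise ValueError("at least one clip is required")

    out_buf = io.BytesIO()
    expected = None
    with wave.open(out_buf, "wb") as out:
        for index, blob in enumerate(wav_blobs):
            with wave.open(io.BytesIO(blob), "rb") as src:
                p = src.getparams()
                fmt = (p.nchannels, p.sampwidth, p.framerate)
                if expected is None:
                    # The first clip defines the output format.
                    expected = fmt
                    out.setnchannels(p.nchannels)
                    out.setsampwidth(p.sampwidth)
                    out.setframerate(p.framerate)
                elif fmt != expected:
                    raise ValueError("all clips must share the same format")

                if index > 0:
                    # 8-bit WAV is unsigned (silence is 0x80); wider PCM is signed.
                    silent_byte = b"\x80" if p.sampwidth == 1 else b"\x00"
                    frames = int(pause_seconds * p.framerate)
                    out.writeframes(silent_byte * (frames * p.nchannels * p.sampwidth))

                out.writeframes(src.readframes(src.getnframes()))
    return out_buf.getvalue()
```

Compared with the SSML approach, this costs one request per pronunciation, but the pause length is controlled exactly rather than left to the speech engine's interpretation of a break strength.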