In Record Wizard > Studio : Some words fails anyway
Closed, Resolved / Public / BUG REPORT

Description

"Even though I am correctly pronouncing every word, I see a lot of red-labelled words." -- Psubhashish

I noticed it as well. For some reason, out of the list List:Zho/hsk 2012 Shtooka missing audios, I couldn't record 出示, despite several careful attempts.


Going back to Details, removing the word 出示 from the list, adding 出示 back, then returning to Studio: failed.

Going back to Details, CLEARING the whole List:Zho/hsk 2012 Shtooka missing audios list, adding List:Zho/hsk 2012 Shtooka missing audios back, then returning to Studio: worked!

Event Timeline

Yug created this task. Dec 23 2018, 9:16 PM
Yug renamed this task from In Record Wizard > Studio : Saturation isn't properly explained to In Record Wizard > Studio : Some words fails anyway. Dec 23 2018, 9:40 PM
Yug updated the task description.
Yug moved this task from UI to RecordWizard on the Lingua Libre board. Dec 23 2018, 9:50 PM
0x010C added a subscriber: 0x010C. May 7 2020, 11:02 AM

I also got this error sometimes. Once the error has occurred on a word, it keeps occurring on that word for the whole session. For the record, this is the error returned by the API:

{
  "error": {
    "code": "verification-error",
    "info": "Files of the MIME type \"application/x-php\" are not allowed to be uploaded.",
    "details": [
      "filetype-badmime",
      "application/x-php"
    ],
    "*": "See https://v2.lingualibre.fr/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes."
  }
}

For a reason I don't understand yet, MW thinks the uploaded file is of type "application/x-php" instead of the correct "audio/wav".

0x010C added a comment. Edited May 7 2020, 12:02 PM

Looking at the log file:

ApiUpload::execute about to verify
[Mime] MimeAnalyzer::loadFiles: loading mime types from /home/www/v2.lingualibre.fr/includes/libs/mime/mime.types
[Mime] MimeAnalyzer::loadFiles: loading mime info from /home/www/v2.lingualibre.fr/includes/libs/mime/mime.info
[Mime] MimeAnalyzer::doGuessMimeType: analyzing head and tail of /tmp/phpqSyGgq for magic numbers.
[Mime] MimeAnalyzer::doGuessMimeType: recognized /tmp/phpqSyGgq as application/x-php
[Mime] MimeAnalyzer::guessMimeType: guessed mime type of /tmp/phpqSyGgq: application/x-php
[Mime] MimeAnalyzer::improveTypeFromExtension: improved mime type for .wav: application/x-php
MediaHandlerFactory::getHandler: no handler found for application/x-php.
mime: <application/x-php> extension: <wav>

Searching from that point, it looks like the issue comes from this section of code inside the MimeAnalyzer module in MW core (which is apparently already known for raising false positives).
Looking more closely at one of the files that produce this error, it contains the string <\u0000?\u0000= (i.e. <?= interleaved with null bytes), which causes the MimeAnalyzer to flag it as application/x-php.

EDIT: Also found a <\u0000?\u0000\n on another faulty record.
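
To check a recording for these byte sequences before uploading, something like the following could be used. This is only a sketch: `SUSPECT_PATTERNS` contains just the two sequences reported in this thread (the actual list in MW core may be longer), and `find_suspect_patterns` and its `chunk_size` default of 1024 bytes are hypothetical names and values chosen for illustration, based on the log saying that the head and tail of the file are analyzed.

```python
# Byte sequences seen in faulty records in this thread.
# Assumption: MimeAnalyzer may check for more patterns than these two.
SUSPECT_PATTERNS = (b"<\x00?\x00=", b"<\x00?\x00\n")


def find_suspect_patterns(data: bytes, chunk_size: int = 1024) -> list[tuple[int, bytes]]:
    """Return (offset, pattern) pairs found in the first or last chunk_size bytes.

    Only the first occurrence of each pattern in each chunk is reported.
    chunk_size is an assumed value, not taken from MW core.
    """
    hits: list[tuple[int, bytes]] = []
    tail_start = max(len(data) - chunk_size, 0)
    chunks = ((0, data[:chunk_size]), (tail_start, data[tail_start:]))
    for pattern in SUSPECT_PATTERNS:
        for base, chunk in chunks:
            pos = chunk.find(pattern)
            # Head and tail overlap for short files, so skip duplicates.
            if pos != -1 and (base + pos, pattern) not in hits:
                hits.append((base + pos, pattern))
    return hits
```

An empty result means that none of the listed sequences were found in the inspected regions; it does not guarantee that the upload will be accepted.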

0x010C added a comment. Edited May 7 2020, 1:25 PM

After digging a bit in MW core's code, it seems that using an external mime-detection command through $wgMimeDetectorCommand will be useless, because this is used only if the internal mime detection did not find anything.

For now, I see four solutions:

  • adding application/x-php to the allowed mime types in Lingua Libre's settings (very ugly, but according to this comment, the faulty check can already be bypassed by a motivated attacker; this would just widen the security hole a bit...)
  • manually searching for and removing those blocking patterns in the WAV files (or at least in the first 1024 bytes) before uploading them.
  • raising an error immediately if a blocking pattern is detected, asking the user to redo the recording right away (question: what about the UX?)
  • as the error keeps occurring because we try to upload the same file, a simple workaround is to re-record the blocked file. But this would lead to another problem: from a UX perspective, how do we explain the situation to the user?

The second option is imho the best one, but testing a fix will be kind of annoying since this error is a pain to reproduce. We could also choose the third or fourth one, as they are fairly easy to implement (hoping for a rewrite of the faulty check in MW core one day), but I'll need some UX advice beforehand.

0x010C triaged this task as Medium priority. May 7 2020, 1:26 PM
0x010C changed the subtype of this task from "Task" to "Bug Report".
Yug added a comment. May 7 2020, 1:48 PM

You can mix 2 and 3: search, and if a pattern is found, pretend the file failed to record (reddish ending) and suggest re-recording. No?
On the UX side, the user will just see and understand that the recording they just made failed.

0x010C added a comment. Edited May 7 2020, 6:42 PM

For the record, an example of a WAV file that is considered to be application/x-php:
(that's just a generated test file, nothing really interesting)

0x010C added a comment. Edited May 7 2020, 8:35 PM

I just came up with a not-too-complicated solution for the second option. Since samples are encoded as 16-bit signed integers, and looking at all the strings that need to be removed, we can actually just tweak (i.e. add 1 to) every occurrence of the 4 sample values listed below, so that the check is negative in all cases:

  • 0x003F = "\x00?"
  • 0x3F00 = "?\x00"
  • 0x7068 = "ph"
  • 0x6870 = "hp"

I implemented it and it seems to work for now, see commit df8f9bd8.
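
For illustration, the idea can be sketched as follows. This is not the code from the commit; `nudge_blocked_samples` is a hypothetical helper, and it assumes it is given only the raw 16-bit little-endian PCM payload (without the WAV header).

```python
import struct

# The four sample values listed above. The set is closed under byte swapping,
# so it covers both "\x00?" / "?\x00" and "ph" / "hp" whatever the byte order.
BLOCKED_SAMPLES = {0x003F, 0x3F00, 0x7068, 0x6870}


def nudge_blocked_samples(pcm: bytes) -> bytes:
    """Return the PCM data with every blocked sample value incremented by 1.

    Assumption: pcm holds 16-bit signed little-endian samples only.
    """
    if len(pcm) % 2:
        raise ValueError("expected an even number of bytes (16-bit samples)")
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm)
    # All blocked values are well below 32767, so adding 1 cannot overflow.
    patched = [s + 1 if s in BLOCKED_SAMPLES else s for s in samples]
    return struct.pack(f"<{count}h", *patched)
```

Whatever the alignment of a sequence such as <\x00?\x00= relative to the sample boundaries, one of its byte pairs "\x00?" or "?\x00" falls exactly on a sample, so changing that sample by one step breaks the sequence while altering the audio by a single quantization level.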

0x010C closed this task as Resolved. Jun 2 2020, 9:29 AM
0x010C claimed this task.