As discussed in T146908, it would be helpful to have a PHPUnit test that validates the global robots.txt file against the existing specifications. All projects rely on this global robots.txt being correct, so that their own MediaWiki:Robots.txt pages (which are appended per project to the global one) are also taken into account by web crawlers and work as expected.
There is a robots.txt PHP parser class on GitHub, which may be helpful in implementing the unit test, at least as a source of copy-paste inspiration for how a robots file can be parsed in PHP. Of particular interest may also be the project's wiki, which has a nice summary of the relevant specifications.
Unlike this parser library, the unit test most likely doesn't need to understand and implement the more intricate details of the specifications. It could instead focus only on what is already used in the existing robots.txt (e.g. fail if empty lines exist within a record, which was the particular problem in T146908). In other words, false positives caused by some obscure but valid syntax are probably fine, as long as there are no false negatives. This may make the task substantially easier.
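As a rough illustration of how narrow such a check could be, here is a minimal sketch of the "empty line within a record" case. The function name `findOrphanedRules` is hypothetical and not taken from any existing code. It assumes that a record starts with a User-agent line, that a truly blank line ends the record, and that comment-only lines do not. Any Allow/Disallow line found outside a record is reported.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the 1-based line numbers of Allow/Disallow
 * rules that are not attached to a User-agent line, e.g. because a blank
 * line ended the record early.
 *
 * @return int[]
 */
function findOrphanedRules(string $robotsTxt): array
{
    $orphaned = [];
    $inRecord = false;
    $lines = preg_split('/\r\n|\r|\n/', $robotsTxt);

    foreach ($lines as $index => $line) {
        // Drop any trailing comment, then surrounding whitespace.
        $content = trim((string)preg_replace('/#.*$/', '', $line));

        if ($content === '') {
            // Assumption: only a truly blank line ends a record;
            // a comment-only line is ignored entirely.
            if (trim($line) === '') {
                $inRecord = false;
            }
            continue;
        }

        if (preg_match('/^user-agent\s*:/i', $content) === 1) {
            $inRecord = true;
            continue;
        }

        // Other directives (e.g. Sitemap) are deliberately not checked.
        if (!$inRecord && preg_match('/^(allow|disallow)\s*:/i', $content) === 1) {
            $orphaned[] = $index + 1;
        }
    }

    return $orphaned;
}
```

In a PHPUnit test this might reduce to something like `$this->assertSame( [], findOrphanedRules( $contents ) )`, where `$contents` is the text of the global robots.txt. The sketch intentionally errs on the strict side, so unusual but valid files could be flagged.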
It may be nice to also cover cases like this old bug report, where the problem was not in the source robots.txt per se, but rather in how robots.php was handling it.