Parsoid (JS or PHP) has known issues where large pages either run out of memory (OOM) or time out. With Parsoid/PHP, the memory and request time limits are lower than with Parsoid/JS. We are addressing these issues separately in a number of ways. Given Parsoid's usage profile, these rendering or parse failures shouldn't block train deployments.
However, these failures are currently introducing artificial train blockers by triggering icinga or other train rollout alerts. So, this noise needs to be addressed so that these failures don't act as false positives for train rollout.
I am filing this task as a record of the conversation in the #wikimedia-operations channel. @Krinkle and @cscott are already working on this.
Note that these failures lead to WMFTimeout or OOM fatal errors that occur outside the Parsoid codebase and, as such, aren't easily handled within the Parsoid codebase.