If a PDF scan has a text layer, MediaWiki extracts it very poorly, even when the text layer itself is very good. DjVu files do not suffer from this problem: their text layer is extracted well. If the PDF is converted to DjVu, the text extracted from its text layer usually improves too. Copying and pasting the text from the PDF document into a word processor also gives a good result. This means that the text layer itself is good; MediaWiki simply fails to extract it properly from PDFs.
An example of text layer extraction from a PDF can be seen here: https://en.wikisource.org/w/index.php?title=Page:The_Hussite_Wars,_by_the_Count_L%C3%BCtzow.pdf/70&action=edit&redlink=1
The same PDF scan was converted to DjVu, and the result can be compared here: https://en.wikisource.org/w/index.php?title=Page:The_Hussite_wars,_by_the_Count_L%C3%BCtzow.djvu/70&action=edit&redlink=1
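For anyone who wants to reproduce the comparison outside the wiki, here is a minimal sketch. It assumes local copies of both files and that the `pdftotext` (Poppler) and `djvutxt` (DjVuLibre) command-line tools are installed. The file names and the helper functions are hypothetical, and whether this matches what MediaWiki runs internally is also an assumption, so it only indicates how good the text layers themselves are.

```python
import difflib
import shutil
import subprocess


def pdf_text_cmd(pdf_path, page):
    """Build a pdftotext command that prints a single page to stdout."""
    return ["pdftotext", "-f", str(page), "-l", str(page), pdf_path, "-"]


def djvu_text_cmd(djvu_path, page):
    """Build a djvutxt command that prints a single page to stdout."""
    return ["djvutxt", f"--page={page}", djvu_path]


def similarity(a, b):
    """Word-level similarity in [0, 1]; whitespace layout is ignored."""
    return difflib.SequenceMatcher(None, a.split(), b.split()).ratio()


def extract(cmd):
    """Run an extraction command; return its output, or None if the tool is missing."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def compare(pdf_path, djvu_path, page):
    """Extract one page from both files and report how similar the texts are."""
    pdf_text = extract(pdf_text_cmd(pdf_path, page))
    djvu_text = extract(djvu_text_cmd(djvu_path, page))
    if pdf_text is None or djvu_text is None:
        return None  # one of the tools is not installed
    return similarity(pdf_text, djvu_text)
```

Calling `compare("scan.pdf", "scan.djvu", 70)` with hypothetical local file names returns a value close to 1 when both text layers are essentially the same. In that case, any difference seen on the wiki would point to the extraction step rather than to the files.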
Most libraries, including the Internet Archive and HathiTrust, offer PDFs with text layers for download, not DjVu files. Besides, handling DjVu is difficult for many contributors, not only for newcomers. So we do need to fix text layer extraction from PDFs.