I'm trying to convert IA PDFs into DjVu with pdf2djvu, using the -d 600 parameter to increase the image resolution. The result was good on a test file, so I suggest considering pdf2djvu as a fallback when the default, more complex routine fails.
I'm working on the "Possibly failed" items in the IA Upload list.
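To make the suggestion concrete, here is a minimal sketch of such a fallback, not how IA Upload is actually implemented. The function names and the `primary_cmd` argument are hypothetical; the only real command line assumed is pdf2djvu with its `-d` (resolution) and `-o` (output file) options.

```python
import subprocess


def pdf2djvu_command(pdf_path, djvu_path, dpi=600):
    """Build a pdf2djvu command line: -d sets the resolution, -o the output file."""
    return ["pdf2djvu", "-d", str(dpi), "-o", str(djvu_path), str(pdf_path)]


def convert_with_fallback(primary_cmd, pdf_path, djvu_path, dpi=600, run=subprocess.run):
    """Run the default converter first; if it fails, retry with pdf2djvu.

    primary_cmd is a hypothetical placeholder for whatever command the
    default routine runs. Returns the name of the route that succeeded.
    """
    try:
        run(primary_cmd, check=True)
        return "primary"
    except (subprocess.CalledProcessError, OSError):
        # Default routine failed (non-zero exit or missing binary): fall back.
        run(pdf2djvu_command(pdf_path, djvu_path, dpi), check=True)
        return "pdf2djvu"
```

The `run` parameter exists only so the logic can be exercised without the real tools installed. If pdf2djvu also fails, the exception propagates and the item would still end up as a failed upload.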
Using the first failed upload in the present list, IA bnf-bpt6k5566792c (suggested DjVu name: Blandy - Revanche de femme.djvu), I successfully produced an excellent DjVu with pdf2djvu. I'd like to help the user who requested this upload. How can I find out who that is?
Currently IA Upload uploads a DjVu file obtained from one of three possible sources:
- Use existing DjVu
- From original scans (JP2)
- From PDF (maybe of lower quality)
The first option is usually not available for newer IA items, because IA stopped generating DjVu files in March 2016.
The second option seems the best but currently has some issues (see T300761).
The final option also has some issues (see T307956); suffice it to say that the conversion is currently done by DjVuDigital. The pages created by DjVuDigital then have the IA OCR text (created by ABBYY, or by Tesseract since 2021) embedded into them by djvused, using the text provided in IA's "Djvu XML" file (_djvu.xml).
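As a rough illustration of the djvused step, the sketch below builds a djvused script that attaches hidden text to each page with the `select` and `set-txt` commands and then saves the file. It assumes the OCR has already been converted from the _djvu.xml file into djvused's own hidden-text format, one file per page; that conversion is not shown, and the helper names and file names are hypothetical, not taken from the IA Upload code.

```python
def build_djvused_script(page_text_files):
    """Return a djvused script that sets the hidden text of each page.

    page_text_files maps 1-based page numbers to files that already hold
    hidden text in djvused's format (assumed to be prepared beforehand).
    """
    lines = []
    for page_no, path in sorted(page_text_files.items()):
        lines.append(f"select {page_no}")    # make this page the current one
        lines.append(f'set-txt "{path}"')    # load its hidden text from the file
    lines.append("save")                     # write the changes back to the DjVu file
    return "\n".join(lines) + "\n"


def djvused_command(djvu_path, script_path):
    """Build the command that runs a script file against a DjVu document."""
    return ["djvused", "-f", str(script_path), str(djvu_path)]
```

Writing the returned script to a file and running the command from `djvused_command` would apply it; paths containing quote characters would need extra escaping that this sketch does not handle.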
DjVuDigital is a script that uses GSDjVu, csepdjvu, etc. to build DjVu files.
GSDjVu is based on Ghostscript with some back-end drivers for DjVu image separation. This means that DjVuDigital can convert both PDF and PostScript files to DjVu, and that it uses the advanced image separation in GSDjVu, which often results in faster conversions with superior image compression. However, the code has some copyright issues and is older, somewhat inflexible, and not well maintained.
pdf2djvu is based on Poppler, so it can only convert PDF files, and it employs a simpler image separation algorithm. That said, it is well maintained and has several other benefits.
@Alex_brollo For a good comparison, look at: pdf2djvu vs djvudigital.