If an archive-url ends with “pdf#page=” followed by an integer, the bot should not remove “id_” from the Wayback archive-url, because doing so breaks the link to a specific page within the PDF. In fact, if the bot isn’t adding “id_” to archive-urls that end with “pdf#page=” and an integer, it is breaking the spirit, if not the letter, of WP:CITE, which says that “No editor is required to add page links [to links to Google Books citations], but if another editor adds them, they should not be removed without cause”.
The bot should not remove “id_” from archive-urls that contain “pdf#page=” because, without it, the Wayback Machine serves the archived PDF wrapped in the Wayback toolbar, and this mixed-content (HTML–PDF) hybrid prevents the “pdf#page=” part of the IRI from sending the user to the referenced page.
The before and after of this diff (https://en.wikipedia.org/?diff=856918530) show the problem:
• archive-url before IABot, sending the user to the 225th page of the archived PDF: https://web.archive.org/web/20180825193304id_/energy.gov/sites/prod/files/2017/09/f36/EIS-0527_FEIS_CH11.pdf#page=225
• archive-url after IABot, sending the user to the 1st page of the archived PDF: https://web.archive.org/web/20180825193304/http://energy.gov/sites/prod/files/2017/09/f36/EIS-0527_FEIS_CH11.pdf#page=225
Note that the absence of the http:// before energy.gov in the old URL doesn’t change the link’s functionality and isn’t related to this bug (even if the old URL were https://web.archive.org/web/20180825193304id_/http://energy.gov/sites/prod/files/2017/09/f36/EIS-0527_FEIS_CH11.pdf#page=225, it would have worked fine).
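To make the requested behaviour concrete, here is a rough Python sketch of the check I have in mind. It is not IABot’s actual code: the names must_keep_id_flag and strip_id_flag are hypothetical, and the regular expressions only assume the URL shapes shown in the two examples above (a numeric timestamp optionally followed by “id_”, and a URL ending in “pdf#page=” plus an integer).

```python
import re

# Hypothetical helpers for illustration only; not taken from IABot.
# A Wayback URL whose numeric timestamp carries the "id_" flag.
WAYBACK_ID_RE = re.compile(r"^https?://web\.archive\.org/web/\d{1,14}id_/")
# A URL that ends with "pdf#page=" followed by an integer.
PDF_PAGE_RE = re.compile(r"pdf#page=\d+$", re.IGNORECASE)


def must_keep_id_flag(archive_url: str) -> bool:
    """Return True if removing "id_" would break a link to a PDF page."""
    return bool(
        WAYBACK_ID_RE.match(archive_url) and PDF_PAGE_RE.search(archive_url)
    )


def strip_id_flag(archive_url: str) -> str:
    """Remove "id_" from the timestamp, except for PDF page links."""
    if must_keep_id_flag(archive_url):
        return archive_url  # leave the page-specific link untouched
    return re.sub(r"(/web/\d{1,14})id_/", r"\1/", archive_url, count=1)
```

With a check like this, the “before” URL from the diff would pass through unchanged, while an “id_” archive-url that does not end in “pdf#page=” and an integer would still be normalised as the bot does today.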
A less urgent problem is the access-date parameter in the same citation in the same diff:
• the access-date was added to a citation that didn’t have one
• the access-date was added to the middle of a quote parameter’s content
An even less urgent problem is the addition of blank df parameters to every citation the bot touches, which at best adds nothing useful for the reader, editor, or software.
While the bugs are being addressed, I will revert the affected parts of the diff in question.
Many thanks for this priceless service you’re providing,
LLarson