When I paste a URL like
https://archive.org/details/minutesofcommitt571newy
or
https://archive.org/stream/cu31924028853327
into citoid, it should detect that those are books and extract the publisher, author, and other bibliographic information from archive.org.
I'm not sure if this helps, but surely some of this information can be obtained from Open Library (for which there is already an extractor); e.g. http://openlibrary.org/ia/minutesofcommitt571newy for the first link above.
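A rough sketch of the Open Library idea, in Python for illustration only (this is not citoid's actual code, and the function names are hypothetical). It assumes the item identifier is the path segment right after /details/ or /stream/, and that the /ia/ URL pattern from the example above holds for other items, which may not always be the case:

```python
from urllib.parse import urlparse


def archive_identifier(url):
    """Return the archive.org item identifier from a /details/ or /stream/ URL.

    Returns None if the URL is not on archive.org or has a different shape.
    Hypothetical helper; assumes the identifier directly follows the
    'details' or 'stream' path segment.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host not in ("archive.org", "www.archive.org"):
        return None
    # Drop empty segments caused by leading or trailing slashes.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) >= 2 and parts[0] in ("details", "stream"):
        return parts[1]
    return None


def openlibrary_ia_url(url):
    """Build the Open Library /ia/ URL for an archive.org book URL.

    Follows the pattern of the example in this report; it does not check
    that Open Library actually has a record for the item.
    """
    ident = archive_identifier(url)
    if ident is None:
        return None
    return "http://openlibrary.org/ia/" + ident


if __name__ == "__main__":
    for u in (
        "https://archive.org/details/minutesofcommitt571newy",
        "https://archive.org/stream/cu31924028853327",
    ):
        print(u, "->", openlibrary_ia_url(u))
```

The resulting URL could then be handed to the existing Open Library extractor, with a fallback to archive.org's own metadata when Open Library has nothing for the item.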
[This could be a SoC project,