When I paste a URL like
into Citoid, it should detect that these are books and extract the publisher, author, and other metadata from archive.org.
I'm not sure if this helps, but some of this information can probably be obtained from Open Library (for which there is already an extractor); e.g. http://openlibrary.org/ia/minutesofcommitt571newy for the first link above.
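As a rough illustration of the Open Library idea, here is a minimal Python sketch that maps an archive.org item URL to the corresponding `openlibrary.org/ia/` URL. The function names are hypothetical and this is not Citoid code. It assumes archive.org item URLs have the form `https://archive.org/details/<identifier>` (or `/stream/<identifier>/...`), and that the Open Library URL is built by appending that identifier, as in the example above.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumption: these path prefixes precede the item identifier on archive.org.
ITEM_PREFIXES = ("details", "stream")


def archive_org_identifier(url: str) -> Optional[str]:
    """Return the item identifier from an archive.org URL, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host not in ("archive.org", "www.archive.org"):
        return None
    # Drop empty segments produced by leading or trailing slashes.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) >= 2 and parts[0] in ITEM_PREFIXES:
        return parts[1]
    return None


def open_library_url(url: str) -> Optional[str]:
    """Build the openlibrary.org/ia/ URL for an archive.org item URL."""
    identifier = archive_org_identifier(url)
    if identifier is None:
        return None
    return "http://openlibrary.org/ia/" + identifier


if __name__ == "__main__":
    # Hypothetical input URL using the identifier from the example above.
    print(open_library_url("https://archive.org/details/minutesofcommitt571newy"))
```

The resulting URL could then be handed to the existing Open Library extractor. Whether every archive.org book has an Open Library record is not something this sketch checks.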
[This could be a SoC project,