
Batch Upload for The Smart Set
Open, In Progress, Needs Triage, Public

Assigned To
Authored By
Oct 26 2021, 2:08 PM
Referenced Files
F34895420: Smart Set Issues - MJP.xlsx
Dec 24 2021, 6:54 PM
F34895388: List.xlsx
Dec 24 2021, 5:57 PM
F34894568: Smart Set Issues.xlsx
Dec 23 2021, 10:39 PM
F34894038: Smart Set Issues.xlsx
Dec 23 2021, 3:25 PM
F34894014: Smart Set Issues.xlsx
Dec 23 2021, 2:45 PM
F34712013: List.xlsx
Oct 26 2021, 2:08 PM


To keep your house warm and fulfill a popular request.

Event Timeline

Inductiveload changed the task status from Open to In Progress. Edited Dec 21 2021, 11:34 AM
Inductiveload subscribed.

Note that there needs to be a "year" column that only contains the year. Text like Mar.-June 1900 should go in the vol_detail column. I have adjusted the spreadsheet for this run, but keep it in mind for next time please :-)
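For next time, the year/vol_detail split could be checked before handing over the spreadsheet. This is only a rough sketch: the column names `year` and `vol_detail` come from the note above, but the function name `split_year`, the regular expression, and the choice to take the first year when a range spans two years are all assumptions, not how the upload script actually works.

```python
import re

# Assumption: years of interest are four-digit values from 1800 to 2099.
YEAR_RE = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


def split_year(raw):
    """Split a raw date cell into (year, vol_detail).

    A bare year goes into `year` with an empty `vol_detail`.
    Anything longer keeps its full text in `vol_detail`, and the
    first year found is copied into `year` (hypothetical rule).
    """
    text = raw.strip()
    years = YEAR_RE.findall(text)
    if not years:
        # No recognisable year: leave `year` blank for manual review.
        return "", text
    if text == years[0]:
        return text, ""
    return years[0], text


print(split_year("Mar.-June 1900"))  # ('1900', 'Mar.-June 1900')
print(split_year("1900"))            # ('1900', '')
```

Rows that come back with an empty `year` would need to be fixed by hand.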

Thank you for fixing this. I'll make sure to keep this in mind next time. Happy Holidays! :)

Unexpectedly got some time today. Here is the list of all the issues of the Smart Set currently available. It seems that Volume 64, Issues 2-4 are either missing or were never issued. Volume 64 and Volume 65 both start with January 1921, but have different TOCs.

Edit: Ok, there is definitely some weirdness going on. It appears that the Sims set and the HT set begin to diverge at some point.

BTW, please make sure the main sheet is the first in the document!

It's OK, I figured it out in the end, just completely wasn't expecting it and got some very confusing errors while looking at the wrong sheet!

OK, does this look like it makes sense?,_Number_2).djvu (I used "number", not "issue", because there were existing files and also that's what's actually written on the cover).

OK, please stand clear of the doors! The good thing about this is it's nearly all IA-based, so it can run in parallel to Lippincotts (which is HT-based) without causing rate limit issues on either side.

Oh yeah, one more thing: do you know if the SIM collection includes indexes or ToCs? They might be entirely separate records since presumably you'd want them on a dedicated reel for searching?

The discrepancy seems to come from the fact that some of the SIM set was published in the UK.

The Sim set has TOCs. There's going to be some untangling to do with the number weirdness. For example, Sims Volume 58 No. 3 is the same as HT Volume 59 No. 3 (July 1919), but September 1919 is different. The Sims set skips Volume 64 without any missing months.

OK, must just be the occasional one without it. Am I OK to keep the upload going, or should I wait? It's quite painful to manually shift things around later compared to just adjusting the spreadsheet.

Keep going. The spreadsheet and IA are not wrong. It's the publisher. I think they were trying out a British edition for a bit. Some of the months have two different TOCs, like the albums of the 1960s.

Here is the file for the MJP. Turns out it was pretty easy to scrape the data. Hurray for well-designed sites.

I am fiddling with this. The main question is whether I can figure out a way to get a pagelist from MJP. The PDFs have named pages, but I cannot see an alternative source for the data. Maybe I'll just run that as a separate task or something if I can't figure it out soon.

I tried to look into the page number and couldn't figure it out. Sorry.

It's odd: the PDFs have them, so the data is somewhere, but it's nowhere else in the metadata that I can find. So the question really is "can I get this out of the PDF sensibly before upload, or should I do it in two phases?"

I think the page number is present in the TEI data as <pb n=""/>. It would require a script to parse the TEI and convert them into a list of page numbers.
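A minimal sketch of such a script, using only the Python standard library. It assumes the TEI file is well-formed XML, that every scanned page has one `<pb>` element, and that the `<pb>` elements appear in scan order; none of that has been checked against the actual MJP files. The function name `page_labels_from_tei` is made up for illustration. Matching on the local tag name means it should work whether or not the file declares the TEI namespace.

```python
import xml.etree.ElementTree as ET


def page_labels_from_tei(tei_xml):
    """Return the n="" values of all <pb> elements, in document order."""
    root = ET.fromstring(tei_xml)
    labels = []
    for elem in root.iter():
        # Drop any "{namespace}" prefix so namespaced and plain TEI both match.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "pb":
            # A missing n attribute becomes an empty label.
            labels.append(elem.get("n", ""))
    return labels


# Tiny hand-written sample, not real MJP data.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<pb n="i"/><p>Front matter</p>
<pb n="1"/><p>First page</p>
<pb n="2"/><p>Second page</p>
</body></text></TEI>"""

for position, label in enumerate(page_labels_from_tei(sample), start=1):
    print(position, label)
```

The position/label pairs could then be turned into a pagelist in a second pass, which would also fit the "two phases" option if extracting from the PDF before upload turns out to be awkward.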