
Batch load existing users
Closed, Resolved; Public

Description

We can provide a CSV file of existing users, giving username, email address, and partner, for batch loading into the database.

Event Timeline

As discussed, application submission and approval dates are also desirable.

I have the first cut of the Taylor and Francis apps and user imports up on staging. I marked them all as sent to partner. The dates are not looking quite right, and I'm thinking maybe I should load a comment on each one noting that it came from a batch load.

Awesome, I'll take a look. I definitely think that some flagging of these applications is a good idea.

Hm, yeah, looks like the dates are showing as the date you added them, rather than the date supplied in the data file. I also notice (it's less of a big deal) that the 'satisfies terms of use' field is showing 'no' for users whose application was imported. 'Days since application was closed' seems correct, however.

One other issue that may come up: a user's join date is the date you imported their information. I don't know if this will be an issue if you then link them to another imported application that came earlier than the first you uploaded (i.e. they would have an application filed before they 'joined').

Had an opportunity to squeeze in some work on this in the morning. The noted issues should be resolved, except for the terms of use field.

Looks great.

The only outstanding issue is that the date the applications were 'sent to partner' is today's date. It would make more sense for this to match the application date, as if the application had been sent onwards immediately. This will skew the 'Median days from application to decision' statistic, but I think that's going to happen regardless and will need a more involved fix.

That issue should be resolved. It took more work than I had hoped. That date was set by the revision system, which I had to partially implement in the importer in order to set the date.

'Satisfies terms of use' is now set to true for imported users.
I updated the application model to include a flag for imported objects.
The time-to-decision metrics now exclude imported applications from the calculations.
Code pushed to staging.
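
For reference, the exclusion described above could look roughly like the sketch below. The `Application` class, its field names, and `median_days_to_decision` are hypothetical stand-ins for illustration, not the project's actual model or metric code.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional


@dataclass
class Application:
    # Hypothetical stand-in for the real application model.
    date_created: date
    date_closed: Optional[date]
    imported: bool = False  # assumed name for the flag set by the batch importer


def median_days_to_decision(apps):
    """Median days from application to decision, ignoring imported rows."""
    durations = [
        (app.date_closed - app.date_created).days
        for app in apps
        if not app.imported and app.date_closed is not None
    ]
    # With no qualifying applications, report None rather than raising.
    return median(durations) if durations else None


apps = [
    Application(date(2016, 5, 1), date(2016, 5, 4)),
    Application(date(2016, 5, 2), date(2016, 5, 9)),
    # Imported record whose dates would otherwise drag the median down.
    Application(date(2014, 1, 1), date(2014, 1, 1), imported=True),
]
print(median_days_to_decision(apps))
```

Filtering on the flag keeps the imported records in the database while leaving the statistic to reflect only applications that went through the normal workflow.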

Not seeing any outstanding issues - this looks great!

Some applications don't have email. Is username sufficient?

Leave the column in the data file, but just leave the cell for any apps without email empty.

Just ran a test on this by deleting random email addresses from the Taylor and Francis data. All users and apps came in just fine. Should I go ahead and run the Taylor and Francis import on the live site?

Great. Yes, I think so. Shall we leave the T&F data out of the full file for importing?

We also have a number of signups with no date information. Can that data be left out, or shall we fill in with 1-1-2000 or some other throwaway date?

Another (hopefully minor) complication is that sometimes a user has multiple collections. Is separating the collection numbers with a comma an ok solution?

Oh, I was assuming I'd get separate files, but a single file is much better. The importer should be able to skip over already existing applications, so you can leave all data in. Leave unknown dates empty and I'll update the code to handle it. As long as the collection field with multiple values is wrapped in quotes you can separate the values within with any character you like as long as you're consistent. Comma works for me! I'll need to update the importer to account for multivalue fields.
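
A minimal sketch of how a file in that shape could be read, assuming hypothetical column names (`username`, `email`, `partner`, `collection`, `date_applied`) and a hypothetical duplicate check keyed on username and partner; the real importer's columns and matching rules are not shown in this thread.

```python
import csv
import io

# Made-up sample data: a quoted multi-value collection cell, an empty email
# cell, and an empty date cell.
SAMPLE = '''username,email,partner,collection,date_applied
Alice,alice@example.org,Taylor & Francis,"12,14",2015-03-01
Bob,,Taylor & Francis,7,
'''


def parse_rows(text, existing_keys=frozenset()):
    """Yield one dict per application row that has not been imported yet."""
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["username"], row["partner"])
        if key in existing_keys:
            continue  # already imported, so the full file can be re-run safely
        yield {
            "username": row["username"],
            "email": row["email"] or None,  # empty cell means no email on file
            "partner": row["partner"],
            # The csv module strips the quotes; split the cell on the comma.
            "collections": [
                c.strip() for c in row["collection"].split(",") if c.strip()
            ],
            "date_applied": row["date_applied"] or None,  # unknown date
        }


for parsed in parse_rows(SAMPLE):
    print(parsed)
```

Because the multi-value cell is quoted, the CSV reader treats its internal commas as data rather than as column separators, which is why consistent quoting matters more than the choice of separator.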

Can you send me an updated file with what you have so far? That will let me test all of these circumstances with real data. I'll hold off on updating production until our importer code handles all known conditions.

Okay, I've loaded the collated data file on dev, and I was able to get ~5100 apps imported. I'm currently running the import on staging, but it may take some time to complete.

It looks like we're getting everything in correctly as long as I can identify a valid user to attach the app to.
We're importing specific collections and specific titles correctly as of about 15 minutes ago.
Applications have to have timestamps because of the way the versioning system works, so in places where we didn't have dates, I set the application date to the beginning of the Unix epoch, which is midnight on January 1, 1970.
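
The epoch fallback can be sketched as follows. The function name and the `%Y-%m-%d` date format are assumptions for illustration; the actual data file may use a different format.

```python
from datetime import datetime, timezone

# Start of the Unix epoch, used as a placeholder for unknown dates.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_application_date(raw, fmt="%Y-%m-%d"):
    """Return the parsed date, or the Unix epoch when the cell is empty."""
    raw = (raw or "").strip()
    if not raw:
        return EPOCH
    return datetime.strptime(raw, fmt).replace(tzinfo=timezone.utc)


print(parse_application_date("2015-03-01"))
print(parse_application_date(""))
```

A fixed sentinel like this keeps the versioning system satisfied while making undated records easy to find later, since no genuine application could carry that date.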

I've updated the importer to alter the email backend at runtime. This will prevent the production imports from sending any unanticipated emails, such as the waitlisted messages that got sent by the test import to staging.
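
As a framework-free illustration of swapping the email backend only for the duration of an import, something like the sketch below would work. Everything here (`settings`, `silenced_email`, `send_mail`, and the backend names) is hypothetical; if the site is a Django project, which this thread does not state, the equivalent would be pointing `EMAIL_BACKEND` at Django's dummy backend while the import runs.

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Hypothetical settings object; a real project would use its framework's own.
settings = SimpleNamespace(EMAIL_BACKEND="smtp")

sent = []  # stands in for messages actually delivered


def send_mail(message):
    """Deliver a message unless the dummy backend is active."""
    if settings.EMAIL_BACKEND == "dummy":
        return  # swallowed silently, nothing leaves the system
    sent.append(message)


@contextmanager
def silenced_email(settings_obj, dummy_backend="dummy"):
    """Temporarily switch to a backend that discards all mail."""
    original = settings_obj.EMAIL_BACKEND
    settings_obj.EMAIL_BACKEND = dummy_backend
    try:
        yield
    finally:
        # Always restore the real backend, even if the import fails.
        settings_obj.EMAIL_BACKEND = original


with silenced_email(settings):
    send_mail("You have been waitlisted")  # triggered by an imported app

send_mail("Regular notification")
print(sent)
```

Restoring the backend in a `finally` clause means a failed import cannot leave the site unable to send mail.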

Batch load to production done. We've pulled in over 5000 applications, with the rest being interesting/invalid records that will need to be dealt with separately.

For all intents and purposes this is done. We just have a handful of applications that required some manual verification.