
How many users accessed Newspapers.com in the past 6 months?
Closed, Resolved (Public)

Description

We are considering migrating Newspapers.com back to individual account setup because of the numerous issues we're encountering with proxy configuration (T322916). To inform that decision, it would help to know how many users accessed Newspapers.com in the past 6 months. This data isn't super straightforward to analyse, but the logs do exist.

Session logs can be accessed via our OCLC EZProxy admin interface. The .txt files list session IDs per user (logging in and logging out or timing out), and the files starting with spu match session IDs to individual accesses.

Previous analysis of these files, for wiley.com, can be found at https://github.com/Samwalton9/OCLC-Analysis, which may be helpful in understanding how to programmatically retrieve and combine these files.
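
As a rough illustration of the combining step, here is a minimal Python sketch. It is not taken from the OCLC-Analysis repository, and it assumes the two kinds of log files have already been parsed into simple tuples: (user, session ID) pairs from the .txt files and (session ID, URL) pairs from the spu files. The function name, the tuple layouts, and the sample records are all hypothetical; the real EZProxy formats will need their own parsing, and the URLs are assumed to carry the origin hostname rather than a proxied one.

```python
from urllib.parse import urlsplit


def unique_users_by_domain(sessions, accesses, domain):
    """Count distinct users with at least one access to `domain` or a subdomain.

    sessions: iterable of (user, session_id) pairs   (assumed layout)
    accesses: iterable of (session_id, url) pairs    (assumed layout)
    """
    # Map each session ID back to the user who owned it.
    user_for_session = {session_id: user for user, session_id in sessions}

    users = set()
    for session_id, url in accesses:
        host = (urlsplit(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            user = user_for_session.get(session_id)
            # Skip accesses whose session never appears in the session logs.
            if user is not None:
                users.add(user)
    return len(users)


# Made-up sample records, for illustration only.
sessions = [("alice", "s1"), ("alice", "s2"), ("bob", "s3"), ("carol", "s4")]
accesses = [
    ("s1", "https://www.newspapers.com/search/"),
    ("s2", "https://www.newspapers.com/image/1"),
    ("s3", "https://onlinelibrary.wiley.com/doi/x"),
    ("s4", "https://newspapers.com/"),
    ("s9", "https://www.newspapers.com/"),  # session with no known user
]

count = unique_users_by_domain(sessions, accesses, "newspapers.com")
print(count)
```

Users are counted once regardless of how many sessions they opened, which is why the session-to-user mapping is built first and the result is collected in a set.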

Event Timeline

Note that we also have some log parsing prior art in the twlight_ezproxy repository:
https://github.com/WikipediaLibrary/twlight_ezproxy/blob/master/tools/logsearch.sh
This is in shell, so maybe less useful!

I took a look at this and forked Sam's repository to get it sort of up and running to see how close it was to usable:
https://github.com/jsnshrmn/OCLC-Analysis
It still has some work to go, but we can pick it back up later.

Unique EZProxy users by domain and host between 2024-01-01 and 2024-06-30:

  • newspapers.com: 1518