Fundraising Analytics Infrastructure and Setup
Closed, Resolved · Public

Description

Toward the goal of creating automated analytics & reporting, we have talked about several different tools that will help build out the infrastructure for automation. Here is the list thus far:

  • BI tool for visualization (Superset)
  • Development environment on a server for analyst dev and QA (Jupyter Notebooks / Labs)
  • [Potentially] An additional server to support these asks, as well as additional data structures/automation

While we are all incredibly busy this time of year, I wanted to start the discussion of what we may need to do in order to begin putting these systems in place, testing them, etc. Please let me know what I can provide from my end.

Event Timeline

EYener created this task. · Nov 15 2019, 1:11 AM
Restricted Application added a subscriber: Aklapper. · Nov 15 2019, 1:11 AM
Jgreen added a subscriber: Jgreen. · Nov 15 2019, 5:36 PM

From an Ops/SRE perspective we're thinking the architecture should be a separate application server (like T237442, approx $2700) vs database server (like T237437 approx $12,700).

EBjune added a subscriber: EBjune. · Nov 15 2019, 7:04 PM
DStrine moved this task from Triage to FR-Ops on the Fundraising-Backlog board. · Nov 18 2019, 3:21 PM

Question for the analytics folks: what are the requirements in terms of uptime and disaster recovery? With a single database server we run daily backups and store them off-host, which would allow for a full restore but only after downtime to repair or replace the server. Is that an acceptable level of service?

@Jgreen that's a good question. What is the usual time frame for a full restore and replace process (worst case scenario)? If we successfully switch over reporting to an analytics server and rely exclusively on it for all reporting during peak times like Big English, that could be a hindrance to a number of workflows.

Hardware repair can take several days, full replacement is several weeks.

Time to restore a database is heavily dependent on the size and structure of the database. The main FR database is 1.7TB between data and indexes. Copying the backup from the archive server and restoring it to MariaDB takes on the order of 12 hours.
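
For a rough sense of scale, the two figures above (1.7TB, about 12 hours) imply an average end-to-end rate for the copy plus restore. The sketch below only does that arithmetic; the function names are hypothetical, it assumes decimal terabytes, and it treats the 12 hours as exact. Since restore time also depends on database structure, the linear estimate should be read as a ballpark and not a prediction.

```python
def effective_throughput_mb_s(size_tb: float, hours: float) -> float:
    """Average rate in MB/s implied by moving size_tb terabytes in `hours`."""
    size_bytes = size_tb * 1e12   # assumption: decimal terabytes (10**12 bytes)
    seconds = hours * 3600
    return size_bytes / seconds / 1e6


def estimated_restore_hours(size_tb: float, throughput_mb_s: float) -> float:
    """Naive linear estimate of restore time at a given average rate."""
    return size_tb * 1e12 / (throughput_mb_s * 1e6) / 3600


# Figures quoted in the comment above: 1.7TB restored in roughly 12 hours.
rate = effective_throughput_mb_s(1.7, 12)
print(f"implied average rate: {rate:.1f} MB/s")
```

The same rate can be fed to `estimated_restore_hours` with the size of a smaller database to get a first-order estimate of how long restoring it might take.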

Per jeff we will only need 2 servers at this point:
1 @ database server,
1 @ dual-cpu misc server

My rationale is that this is essentially a read-only civi+drupal replica (~1.7TB with indexes) with a much smaller read/write analytics database alongside it. As of 12/19 we have enough database machines to be able to borrow one from the pool in the event of hardware failure. So the recovery path would be to restore or regenerate the smaller analytics database to another database machine from the active pool and cut over to it. This seems like a solid starting point, and as the scope of the analytics project evolves we can budget and refactor accordingly.

Jgreen mentioned this in Unknown Object (Task). · Dec 17 2019, 8:50 PM
Jgreen added a subtask: Unknown Object (Task).
Jgreen renamed this task from Analytics Infrastructure and Setup to Fundraising Analytics Infrastructure and Setup. · Dec 17 2019, 8:53 PM
Jgreen mentioned this in Unknown Object (Task).
Jgreen added a subtask: Unknown Object (Task).

This is great; thank you!

Jgreen added a comment. · Feb 6 2020, 7:37 PM

As of 2020-02-06 the plan is to host this project in eqiad, using frdb1003 (already procured T236920) and a new procurement for the web/application server (T240994). We still need the database server (frdb2001/T240993) but it will offset frdb1003, going to the fundraising-database pool to improve our codfw redundancy.

EYener added a subscriber: jrobell. · Feb 10 2020, 3:12 PM
Jgreen claimed this task. · Feb 11 2020, 1:10 PM
Jgreen triaged this task as Medium priority.
Jgreen moved this task from Triage to In Progress on the fundraising-tech-ops board.
Papaul closed subtask Unknown Object (Task) as Resolved. · Feb 19 2020, 4:10 PM
jrobell added a subscriber: Nuria. · Feb 24 2020, 5:50 PM
Jclark-ctr closed subtask Unknown Object (Task) as Resolved. · Mar 4 2020, 1:02 AM
EYener updated the task description. · Mar 17 2020, 12:36 PM

T238395 was our previous task about investigating this for FR

mpopov added a subscriber: mpopov. · Mar 18 2020, 9:31 PM

By the way, as a demo I have a VM on Wikimedia Cloud Services' Cloud VPS that's running RStudio Server: https://rstudio-test.wmflabs.org/ (login with your Wikitech LDAP username & password)

Currently requires SSH'ing to the instance just once (ssh <your shell username>@rstudio-server-01.eqiad.wmflabs) so a homedir is auto-created for you. RStudio Server is not configured to take care of that step. See wikitech:Help:Accessing Cloud VPS instances#What you'll need for instructions on connecting to Cloud VPS.

Nuria added a comment. · Mar 19 2020, 4:46 AM

I would encourage @Jgreen to focus on tools we use broadly, to reduce the maintenance cost of updates. For exploration and daily work by analysts, Jupyter notebooks are the best solution; for dashboarding, Superset. Superset is flexible in that it can display data from a number of data sources, especially if Presto is used with it. It also has more sophisticated authentication, as it can be used with Kerberos. I would discourage setting up another dashboarding tool such as RStudio.

I would discourage setting up another dashboarding tool such as RStudio.

For the record RStudio is an IDE for working with R, Python, and C++. RStudio Server makes that IDE fully available in your browser.

Nuria added a comment. · Mar 19 2020, 8:44 PM

Sorry, R-based dashboarding.

Dwisehaupt closed subtask Restricted Task as Resolved. · May 7 2020, 6:06 PM
Jgreen closed subtask Restricted Task as Resolved. · May 8 2020, 2:40 PM
Jgreen closed this task as Resolved. · May 27 2020, 8:21 PM
Jgreen moved this task from In Progress to Done on the fundraising-tech-ops board.

The main requirements are done.