Wed, Oct 28
After speaking with Melanie and @LeanneS, I will be setting Melanie up for Superset access as well.
Thanks @MNoorWMF! So it sounds like we should be looking at the enUS segments only. Has this been for Email2 only, or should we go back to Email1?
Tue, Oct 27
Tagging @RMurthy to assist
Sure thing, we can help with this. Thanks for the ping, @Eileenmcnaughton.
Oct 15 2020
@RLewis I've added a Superset account for Sheyna as well.
Sep 9 2020
Sep 4 2020
@KHaggard and I discussed this last week (or so) and I wanted to post some highlights here.
Subscribing @spatton for awareness, as there was discussion of RML reporting on a call yesterday.
Sep 3 2020
To add context, the parent group I am looking at is group_id 701 from civicrm.civicrm_group_contact. It has child groups 685, 704, 723, and 724. All members of group 701 seem to be 'deleted'.
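A sketch of how this can be checked (standard CiviCRM schema assumed; connection details are placeholders):

```
# Sketch: count how many members of group 701 and its children are flagged as
# deleted contacts. Assumes standard CiviCRM schema; credentials are placeholders.
import pymysql

conn = pymysql.connect(host="localhost", db="civicrm", user="reader",
                       password="***", cursorclass=pymysql.cursors.DictCursor)
with conn.cursor() as cur:
    cur.execute("""
        SELECT gc.group_id,
               SUM(c.is_deleted = 1) AS deleted_contacts,
               COUNT(*)              AS total_contacts
        FROM civicrm_group_contact gc
        JOIN civicrm_contact c ON c.id = gc.contact_id
        WHERE gc.group_id IN (701, 685, 704, 723, 724)
        GROUP BY gc.group_id
    """)
    for row in cur.fetchall():
        print(row)
```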
Aug 26 2020
Cool! The suggested uniqueness check sounds good and beneficial to me. I would like to double-check that it captures the first event occurrence.
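To illustrate what I mean by capturing the first occurrence (a minimal sketch; the field names are hypothetical):

```
# Dedupe a stream of events, keeping only the EARLIEST occurrence of each key.
# 'event_id' and 'timestamp' are hypothetical field names for illustration.
def first_occurrences(events):
    seen = {}
    for e in sorted(events, key=lambda e: e["timestamp"]):
        seen.setdefault(e["event_id"], e)  # earliest timestamp wins
    return list(seen.values())

events = [
    {"event_id": "open-42", "timestamp": 1598400000},
    {"event_id": "open-42", "timestamp": 1598300000},  # earlier duplicate
]
print(first_occurrences(events))  # keeps the 1598300000 record
```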
Aug 25 2020
Thank you @elukey, I can!
Aug 14 2020
Hi all! I found this task because I am experiencing this same problem for the Netherlands banners. As @Pcoombe indicated in the last comment (back in '17!) this seems to be happening with long modifiers. For example, the B2021_0708_nlNL_dsk_p1_lg_txt_twin1_optIn1.no-LP.rtbt.rtbt_ideal utm_source does not join to drupal.contribution_tracking, while the B2021_0708_enNL_m_p1_lg_txt_twin1_optIn1.no-LP.paypal does.
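A quick way to confirm which utm_source values land in contribution_tracking (sketch; assumes direct read access to the drupal DB, and connection details are placeholders):

```
# Check whether each banner's utm_source appears in drupal.contribution_tracking.
import pymysql

sources = [
    "B2021_0708_nlNL_dsk_p1_lg_txt_twin1_optIn1.no-LP.rtbt.rtbt_ideal",  # missing
    "B2021_0708_enNL_m_p1_lg_txt_twin1_optIn1.no-LP.paypal",             # joins fine
]
conn = pymysql.connect(host="localhost", db="drupal", user="reader", password="***")
with conn.cursor() as cur:
    cur.execute(
        "SELECT utm_source, COUNT(*) FROM contribution_tracking "
        "WHERE utm_source IN (%s, %s) GROUP BY utm_source",
        sources,
    )
    print(cur.fetchall())  # the long-modifier source is expected to be absent
```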
Aug 11 2020
On its last run, this query took 42.7 seconds to execute in Superset:
Aug 5 2020
Aug 3 2020
I might need more context here to give input. When does a failed recur email get sent out? What information might we want to report on and analyze? I.e., would this affect multiple mediums/campaigns, and would we want that level of granularity?
Jul 24 2020
Hi @Dwisehaupt, I just added the locking to the _insert.py files, tested it, and everything ran just fine, so I merged your commit. I think we're good to close this for now and open new tasks as needed. Thanks again!
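For reference, the guard follows roughly this pattern (a sketch, not the exact merged code; the lock path is a placeholder):

```
# Non-blocking file lock so overlapping cron runs exit cleanly instead of
# double-inserting. The lock path here is a placeholder.
import fcntl
import sys

lock_file = open("/tmp/frdev_insert.lock", "w")
try:
    fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
except BlockingIOError:
    sys.exit("Another insert run is still in progress; exiting.")

# ... insert logic runs here while the lock is held ...
```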
Jul 20 2020
Jul 13 2020
@Jgreen I'd like to set up more alerting on frdb1003 before we make this change, so that my data cube testing and runs don't cause the server to lag behind master. Any ideas you have on this - or on how I can leverage existing alerting - would be great!
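One option, as a rough sketch (host, credentials, and the threshold are placeholders): have each cube run check replica lag first and skip itself if the server is behind:

```
# Skip a cube run if the replica is lagging too far behind master.
# Connection details and the 300-second threshold are placeholders.
import pymysql

conn = pymysql.connect(host="frdb1003", user="reader", password="***",
                       cursorclass=pymysql.cursors.DictCursor)
with conn.cursor() as cur:
    cur.execute("SHOW SLAVE STATUS")
    status = cur.fetchone()

lag = status["Seconds_Behind_Master"] if status else None
if lag is None or lag > 300:
    raise SystemExit(f"Replica lag is {lag}s; skipping this cube run.")
```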
Jul 9 2020
Added new cube (impressions_hourly) that runs once per week on Thursdays
Jul 8 2020
@Dwisehaupt logging has been added to:
Jun 30 2020
Hi @Eileenmcnaughton I just want to confirm that after the new silverpop job runs, the tables in the silverpop DB will be overwritten with the new fields, correct? Or will the new job write to a separate db / table structure?
Jun 24 2020
Jun 18 2020
Hi @Dwisehaupt, how about 9 PM UTC / 5 PM Eastern / 2 PM Pacific on Monday? Alternatively, the same time on Tuesday or Thursday would be my next preference.
Good question - let me check and get back to you. Thanks, @Dwisehaupt!
Great, thanks! I will mark this as resolved (I think I can do that...)
The table above is in frdb1003: analytics_ad_hoc.targetsmart_export_june_2020. However, I ended up running a tunnel through R, creating a query, and writing that query output to a CSV on the file server via RStudio. Thanks to @Dwisehaupt for the help figuring out how to connect to a volume from a local application!
Hi @Eileenmcnaughton - I'm curious which of these fields you're referring to? What complication do you mean? Asking because I'm also working with similar metrics and would like to brainstorm!
Jun 16 2020
Thank you! It's working great now.
Hmm, @Jgreen, I'm not seeing it there. I added a table, so the DB is not empty, if that helps.
Jun 15 2020
@LeanneS is able to access the list through Superset. I updated the export fields to be more human-readable (see below). I'll take another pass tomorrow to make sure I'm not missing anything.
Awesome! Two in one. Thanks!
Jun 10 2020
Jun 5 2020
I chatted with @LeanneS about it, and it seems that keeping all communication preferences is okay for now (please correct me if this changes!).
Jun 4 2020
Adding query for transparency and any future needs:
Jun 1 2020
@Jgreen could you also add the time module, please?
May 28 2020
Hi @Eileenmcnaughton - I ran the CIDs (both deleted and kept) above through activities yesterday as a test, and I'm not getting the expected match.
May 27 2020
Hi @Jgreen, I actually found a hack for current day (within the UI) by searching Stack Overflow. The time range filter supports entering "today:tomorrow" for current-day reporting, and I believe there are other workarounds within the UI for month-to-date and year-to-date. It's not intuitive based on their selections, but I don't believe this requires a code-level change - just more self-education!
May 21 2020
To add to the context and background on this, it is often important in Fundraising to report on impression rate (impressions / pageviews) for a particular campaign, to ensure that there are no technical issues during campaigns. Viewing this metric for a live campaign alongside other health metrics in a Fundraising dashboard would be of great value to our Creative team. @AndyRussG had a great example of monitoring impression rates by country during our last Big English - and we were just discussing this use case today - so I will let him discuss in more detail.
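A toy illustration of the health check (all numbers made up):

```
# Impression rate = impressions / pageviews, per country. A rate well below
# expectations during a live campaign usually signals a technical issue.
impressions = {"NL": 90_000, "BE": 40_000}   # made-up figures
pageviews = {"NL": 100_000, "BE": 100_000}

for country in impressions:
    rate = impressions[country] / pageviews[country]
    flag = "  <-- investigate" if rate < 0.5 else ""
    print(f"{country}: impression rate {rate:.1%}{flag}")
# NL: impression rate 90.0%
# BE: impression rate 40.0%  <-- investigate
```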
May 18 2020
May 13 2020
I've deleted my notebooks as well. Thank you!
May 8 2020
May 7 2020
May 6 2020
@Jgreen pointed out that we should be able to connect fr-superset to Hive without pulling data into Fundraising from the data lake itself - this might be a great option.
@Jgreen what do you think of adding the new ChangeOwnPassword role to Beta and Gamma as well? I can see all new users needing to change their own passwords - it might be good to do this ahead of needing it.
That cadence works for me - thanks!
Mar 23 2020
Hi @elukey! Apologies - yes, notebook1003. This seems to have fixed the problem!
Mar 18 2020
@Jgreen we can close this as resolved. At least one data cube is running both inserts and updates on a schedule in the new ecosystem now.
My mistake - removed from the list. Thank you!
Very similar, with a few additions, to T236750.
Mar 17 2020
Mar 13 2020
Thank you, @Nuria ! It works seamlessly.
Hi @Nuria and all, we're ready to try a 'mock' data set as well. Can someone point me toward instructions on accessing and utilizing the staging environment so that I can get started with the upload? Thank you!
Mar 12 2020
Mar 5 2020
Removed the test cube from cron and set up a v1 production cube on a 10-minute schedule, for inserts only. Next, I'll be working on updates across the whole cube, which will be more resource-intensive.
Feb 25 2020
It would be good to touch base, @Milimetric - I'll find time on your calendar for later next week.
Feb 24 2020
@Milimetric Likewise excited to collaborate! I agree that visualization is the final piece of this puzzle. In parallel with discussing a front-end tool, I have been creating and automating (via cron job) OLAP-style data cubes, based on the business units within Fundraising, that could interact with many visual front ends. I would be happy to discuss the full approach and hear more about best practices at any time!
@Jgreen do you have an estimated level of effort on installing Superset on the application server? We are trying to determine if this is something we should scope out and commit to being our visualization tool of choice before requesting. If the level of effort is relatively low, it would be great to have.
Feb 21 2020
I have a small test cube on dev_analytics pulling down new donation IDs, contact IDs, and utm_medium on a 10-minute interval via a python3 script in my home directory. I will change the schedule to a 1-hour interval to run overnight (through Saturday, possibly the weekend) for additional timing tests, and pick this up again Monday for a fuller-scale trial with larger data sets.
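The script's shape is roughly this (sketch only; the table and column names are stand-ins):

```
# Incremental cube load: remember the highest donation id already loaded and
# pull only newer rows each run. Table and column names are stand-ins.
import pymysql

conn = pymysql.connect(host="localhost", db="dev_analytics", user="cube",
                       password="***")
with conn.cursor() as cur:
    cur.execute("SELECT COALESCE(MAX(donation_id), 0) FROM test_cube")
    last_id = cur.fetchone()[0]
    cur.execute(
        "INSERT INTO test_cube (donation_id, contact_id, utm_medium) "
        "SELECT id, contact_id, utm_medium FROM donations WHERE id > %s",
        (last_id,),
    )
conn.commit()
```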
Feb 10 2020
Feb 6 2020
I'm curious, @Ottomata and @JAllemandou, if there is an elegant solution for dynamically filling partitions. I.e., once a table is created with the partition columns declared and a main location established, is the best way to fill the table to issue individual ALTER statements by day, hour, etc.? Or is there a better way to accomplish this?
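For context, the dynamic-partition INSERT I've seen documented looks roughly like this (untested sketch; the table names are hypothetical):

```
# Let Hive derive the partition values from the SELECT output instead of
# issuing one ALTER ... ADD PARTITION per day/hour. Table names are hypothetical.
import subprocess

hql = """
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE target_table PARTITION (year, month, day, hour)
SELECT col_a, col_b, year, month, day, hour FROM source_table;
"""
subprocess.run(["hive", "-e", hql], check=True)
```

(And if the data already sits in partition-style directories under the table's location, my understanding is that MSCK REPAIR TABLE can register all of the partitions in one statement.)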
Thanks for the suggestion, @Ottomata - I'll take it back to the team and see what makes sense. We've been using the json_string format since 2016, so it might make sense to have the 2019 data in the same format as well.
Thank you, @JAllemandou! I am new to Hive, and was not aware I could do this myself. It worked well, though, and I appreciate the resources. You can close this ticket; much appreciated.
Thank you, @JAllemandou! I've learned several new Hive features and commands.
Feb 4 2020
Jan 24 2020
Jan 9 2020
Thank you Eileen!
Jan 7 2020
Hi @Eileenmcnaughton, could you please backfill data again? That would help me do an en6C wrap up while a fix is being worked on. Thanks!
Dec 17 2019
This is great; thank you!
@Jgreen that's a good question. What is the usual time frame for a full restore-and-replace process (worst case)? If we switch reporting over to an analytics server and rely on it exclusively during peak times like Big English, a lengthy restore could be a hindrance to a number of workflows.
Dec 11 2019
Great! Thank you, @Eileenmcnaughton!
Hi @Eileenmcnaughton, do you think I can start working with this data for current (FY1920) mailings, or are things still being investigated?
Nov 27 2019
Nov 21 2019
I would also like to request Kerberos credentials for stat100x and notebook100x machines. My username is eyener. Thank you!
To clarify the expected downtime: Is it 45 minutes or 3 to 4 hours?
Nov 15 2019
@Jgreen can you install SQLAlchemy as well, please?
Oct 31 2019
Hi Eileen - I completely agree. I would prefer the data coming in to be as raw as possible, so that we're future-proofed against changes in how we want to analyze the data. So if a donor opens the email multiple times, seeing multiple opens is great. The second ask is just to confirm (which I think we're both saying) that "what the user does, the data reflects," without any wrangling, cleaning, or summarizing at this stage.