
Event Logging Schema Ownership and Maintenance
Closed, Resolved, Public


This is a task to identify existing Event Logging schemas and classify them as app/web within Reading.

Event Timeline

dr0ptp4kt set the priority of this task to Needs Triage.
dr0ptp4kt updated the task description.
dr0ptp4kt added a project: Reading-Admin.
dr0ptp4kt moved this task to Admin on the Reading-Admin board.

@dr0ptp4kt: Is this tracking the email thread? Are the results of that thread to be published on-wiki?

An email thread triggered this task, but I think this task will be the communication channel from now on.
And yes, the Analytics team will post the results on the EventLogging schema talk pages when finished.

For the record, the only thing that we need is, for each schema in the list[1]:

  1. "Owner" (go-to person for questions on the schema, for notifications, alerts, etc.)
  2. Team that the schema belongs to

[1] List of EventLogging schemas that are initially assumed to belong to the Mobile (App/Web) team:


@mforns: The MobileOperatorCode schema page has been deleted. The Readership Web team doesn't own that schema but I'd hazard a guess that the data could be archived.

@Jdlrobson: To my knowledge the MobileBetaWatchlist, MobileLeftNavbarEditCTA, and MobileWatchlistInteraction schemas are no longer used (they were all last edited in early 2013). Can we archive them? /cc @mforns

@phuedx @Jdlrobson
I created a Google Docs spreadsheet with just the Mobile schemas and the fields we need to fill in.
I hope this will help our discussion. Here it is:

@Jdlrobson, @kaldari: Can either of you comment on the four schemas mentioned above? I don't want to archive and delete them before verifying.

9:28 AM <mforns> phuedx, BTW, so these 4 schemas (MobileBetaWatchlist, MobileLeftNavbarEditCTA, MobileOperatorCode and MobileWatchlistInteraction) have been mentioned in the task as obsolete and ok to delete, would be that correct?
9:28 AM <phuedx> as far as i can tell, yes
9:29 AM <phuedx> though i would like jdlrobson or kaldari to comment on the ticket
9:29 AM <mforns> phuedx, OK sure, thx
9:29 AM <jdlrobson> phuedx: mforns hey
9:29 AM <jdlrobson> yeh all those can die, I have never heard of MobileOperatorCode though

MobileOperatorCode can be deleted. It's superseded by MobileWikiAppOperatorCode. It's Wikipedia Zero related. I've updated the Google Sheet accordingly.

I have added the following to the spreadsheet based on input from @bearND and @phuedx as well.

MobileWikiAppShareAFactOnboarding (not implemented yet in the app)

@mforns, is the Google Sheet now filled out to your liking? Please work with @JKatzWMF re: purging old data (as opposed to dropping tables, although there you should probably double-check just in case; MobileOperatorCode *can* be safely deleted).

@dr0ptp4kt: As the Product Owner of the Reading Web (web) team for Q1, shouldn't @Jhernandez be listed as the owner for the schemas used by web and not @JKatzWMF?

Thank you guys for filling in the spreadsheet :]

@phuedx: @JKatzWMF or me (and Jon said it's him, not me) should be the "owner", but should we replace your name in there with @Jhernandez's name?

@dr0ptp4kt: Now I'm unsure as to the granularity of this. If you do, then are you going to replace @bearND as well?

@phuedx, another alternative would be to add yet another column to capture the various roles:

Owner (on the hook for data retention matters)
Engineering PO
Engineering Tech Lead

How about that? Then for web we have Jon, Joaquin, and you, and for apps we have Jon, Dmitry, and Bernd?

cc @Dbrant

Thanks for pinging me. I've removed my name from MobileWikiAppReadingAction since the Android app is not using it. Maybe iOS is. I don't know.
@BGerstle-WMF: It would be good if you added yourself next to my name and also added iOS next to Android in the rows of the schemas you are using.

For us in Analytics, the most important role right now is the "Owner (on the hook for data retention matters)".
But if you add the other roles too, that's cool. We can add them to the schema talk page, too.


In that case @dr0ptp4kt: @JKatzWMF it is!

@mforns, while @JKatzWMF is OoO, is any further support from me needed? Or is this being deferred until his return?

@dr0ptp4kt, hi! Yes, please. I'd appreciate your support. Is it possible for you to have a 1-hour meeting with me before next Tuesday?

Here is the spreadsheet with the schemas marked as yours:

The desired outcome is to have each of those schemas marked with one of the options in columns F, G, or H.
If the selected option is H, we'd like to have an alternative sanitizing plan.

To reduce the time we'll need, before the meeting I would go through all the schemas in the spreadsheet, identify potentially sensitive information that we think should be taken care of, and present it to you.

If this is OK with you, please let me know and I'll set up the meeting.
Or please set it up, if you want to. My calendar is up-to-date.
Feel free to choose any time that is good for you; I work US hours.

Many thanks!

@mforns, that sounds good. Please schedule a meeting - my calendar is up-to-date.

Loads of thanks to @dr0ptp4kt for starting this ticket :). Is it possible to publicly document the following somewhere:

Can I move the googledoc to a wiki after maintenance?

@mforns and I worked through more of the spreadsheet and will meet again. @mforns is going to discuss aggregation approaches with his team and set up another meeting to go over the remainder of the spreadsheet.

@Moushira, for documenting who owns what, I believe is one place where info like this has been catalogued in the past. @mforns, is that where the metadata about owners and such ought to be documented in the future? Or will there be some other place? If so, @Moushira, once the exercise is complete, yes, please feel free to update it accordingly. I expect it will be several weeks before the exercise is complete.

@Moushira, regarding the example you noted: @BGerstle-WMF, any idea if Schema:MobileWikiAppReadingAction can just be deleted? The corresponding META schema was deleted a long time ago, so I don't see a need for the code anymore.

@Moushira, I realized I failed to answer this question. It usually simplifies the data analysis. There are often particulars for one or more of the channels that make it hard to have "one schema to rule them all", as people fondly refer to this concept.


Who owns which Schema?

The right place to look for the schema owner is the talk page associated with each schema page. The schema pages are listed, as @dr0ptp4kt mentioned, on the schema list page. However, the EventLogging database is currently undergoing a privacy and data retention audit, so this schema list, and potentially all schema talk pages, are out of date! When the audit ends, the Analytics team will make sure both sources are updated correctly.

and, in general who owns schemas (PM, tech lead, depends..etc)

We (the Analytics team) expect the schema owner to be the go-to person for notifications, alerts, questions, etc. about the schema. The owner doesn't need to know every answer, but should be able to redirect you to the person who does. Normally the schema owner is the person who created the schema, the PM of the team that develops it, or the tech lead who knows the most about it.

Can I move the googledoc to a wiki after maintenance?

I'd say there's no need. We used a script to pull information from the talk pages and populate the spreadsheet. When the audit is done, we will use a similar script to update all schema talk pages and the schema list page. We'd like to keep the schema talk pages the single source of information on schemas.
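The script mentioned above isn't included in this thread. Purely as an illustration of the talk-page-to-spreadsheet direction, here is a minimal Python sketch. It assumes the talk-page wikitext has already been fetched and that each page records its metadata as hypothetical `Owner: ...` / `Team: ...` lines; that convention, the function names, and the CSV layout are assumptions for illustration, not the Analytics team's actual script.

```python
import csv
import io
import re

# Hypothetical convention: each talk page holds lines such as "Owner: SomeUser"
# and "Team: SomeTeam". Real EventLogging talk pages may be laid out differently.
FIELD_RE = re.compile(r"^[ \t]*(Owner|Team)[ \t]*:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def extract_metadata(wikitext):
    """Return {'Owner': ..., 'Team': ...} for one talk page; missing fields stay ''."""
    found = {"Owner": "", "Team": ""}
    for field, value in FIELD_RE.findall(wikitext):
        found[field] = value  # a later line for the same field overrides an earlier one
    return found


def talk_pages_to_csv(pages):
    """Map {schema name: talk-page wikitext} to CSV text with one row per schema."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Schema", "Owner", "Team"])
    for schema in sorted(pages):  # stable row order for the spreadsheet
        meta = extract_metadata(pages[schema])
        writer.writerow([schema, meta["Owner"], meta["Team"]])
    return out.getvalue()


if __name__ == "__main__":
    # Placeholder page texts, not real schema metadata.
    sample = {
        "ExampleSchemaB": "Owner: ExampleUser\nTeam: ExampleTeam\n",
        "ExampleSchemaA": "No metadata recorded yet.",
    }
    print(talk_pages_to_csv(sample), end="")
```

Treating the talk pages as the single source and regenerating the sheet from them matches the approach described above; the reverse step after the audit (writing owners back to the talk pages) is not covered by this sketch.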


is that where the metadata about owners and such ought to be documented in the future? Or will there be some other place?

Yes, the right place should be the schema talk page, and at the end of the audit, the analytics team is going to update them all accordingly.

Hi @dr0ptp4kt, I wrote this wiki page that tries to explain the ideas that we Analytics had on sanitizing/aggregating the discussed schemas.
Please let me know what you think about it, whether I missed something, or if you have other ideas! As we agreed, I'll set up a second meeting this week where we can discuss these options and go through the other schemas. Thanks!

@mforns, thanks. I've asked people to weigh in on the discussion page.

Looking forward to the meeting. If no input is received on the discussion page, I can weigh in on the preferred approach on my own.

Nemo_bis triaged this task as Medium priority. Jan 31 2016, 9:43 PM
Nemo_bis set Security to None.