Start refining all blacklisted EventLogging streams
Closed, Resolved, Public

Description

According to [table_blacklist_regex in refine.pp](https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/analytics/refinery/job/refine.pp#L41), two schemas are still not being refined into Parquet tables that can be queried via Hive.
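As a rough illustration of what a table blacklist does, here is a minimal Python sketch. It is not the Refine job's actual code. It assumes the pattern is matched against the whole table name, and it uses hypothetical schema names (`SchemaA`, `SchemaB`) in place of the real value in refine.pp, which is not reproduced here. Any table that matches the pattern is skipped and never lands in Hive.

```python
import re

# Hypothetical blacklist: the real pattern lives in refine.pp and is not copied here.
TABLE_BLACKLIST_REGEX = r"(SchemaA|SchemaB)"


def tables_to_refine(tables, blacklist_regex=TABLE_BLACKLIST_REGEX):
    """Return the tables that would be refined, i.e. those whose whole
    name does NOT match the blacklist pattern (whole-name match is an assumption)."""
    pattern = re.compile(blacklist_regex)
    return [t for t in tables if not pattern.fullmatch(t)]


def blacklisted_tables(tables, blacklist_regex=TABLE_BLACKLIST_REGEX):
    """Return the tables the blacklist would exclude from refinement."""
    pattern = re.compile(blacklist_regex)
    return [t for t in tables if pattern.fullmatch(t)]


if __name__ == "__main__":
    streams = ["EditAttemptStep", "SchemaA", "SchemaB", "NavigationTiming"]
    print("refined:", tables_to_refine(streams))
    print("skipped:", blacklisted_tables(streams))
```

Under this sketch, "unblacklisting" a schema simply means removing its name from the pattern, after which it shows up in `tables_to_refine` on the next run.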

We should fix this so all the EventLogging data is available in the Data Lake.

Event Timeline

nshahquinn-wmf created this task.

Ok! So just like the Edit schema data (which was migrated to EditAttemptStep), these are blacklisted because of schema nastiness or incompatibilities. I can help you figure out why each one was blacklisted when you are ready to work on this, but the likely solution will be a schema redesign, which will require clients to change how they send their data.

Thanks, that's essentially what I expected! My idea is that we could help the maintainers implement the necessary schema updates, although we haven't actually decided to prioritize that work.

If nothing else, these tickets will be a good place to point people when they ask why their data isn't in the Data Lake 😁

@Neil_P._Quinn_WMF, now that T212367 is unblacklisted, can we close this (and that) task?

Everything seems good to me and Growth doesn't even want the schema anymore (T212367#5613242), so yes!