
Make Refine use JSONSchemas of event data to support Map types and proper types for integers vs decimals
Closed, Resolved · Public · 13 Story Points

Description

In order to support Map types in JSON data, as well as to use the proper integer vs. decimal types, we need to examine the JSONSchema for event data during refinement.

Once done, we will be able to convert schemas with fields like:

map_field: 
  type: object
  additionalProperties:
    type: string # Or whatever the map value type is
integer_field:
  type: integer
decimal_field:
  type: number

This will allow us to use JSON data like:

{"map_field": {"key1": "val1", "key2": "val2"}, "integer_field": 2, "decimal_field": 1.2}

And automatically create Hive tables like

`map_field`     map<string,string>,
`integer_field` bigint,
`decimal_field` double
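
The mapping described above can be sketched in a few lines of plain Scala. This is only an illustration under stated assumptions, not the converter that was later added to refinery: the `JsonType` model and the `hiveType` function are hypothetical names, and the model covers only the small JSONSchema subset used in this description.

```scala
// Hypothetical, minimal model of the JSONSchema subset discussed in this task.
sealed trait JsonType
case object JString extends JsonType
case object JInteger extends JsonType
case object JNumber extends JsonType
case object JBoolean extends JsonType
// "type: object" + "additionalProperties": free-form keys, one shared value type.
final case class JMap(valueType: JsonType) extends JsonType
// "type: object" + "properties": a fixed set of named fields.
final case class JObject(properties: Seq[(String, JsonType)]) extends JsonType
final case class JArray(items: JsonType) extends JsonType

// Render the Hive column type for a schema node.
def hiveType(t: JsonType): String = t match {
  case JString  => "string"
  case JInteger => "bigint"   // integers stay integers
  case JNumber  => "double"   // decimals become doubles
  case JBoolean => "boolean"
  case JMap(v)  => s"map<string,${hiveType(v)}>"
  case JArray(items) => s"array<${hiveType(items)}>"
  case JObject(props) =>
    props.map { case (n, pt) => s"$n:${hiveType(pt)}" }.mkString("struct<", ",", ">")
}

// The three example fields from the description.
val exampleSchema: Seq[(String, JsonType)] = Seq(
  "map_field"     -> JMap(JString),
  "integer_field" -> JInteger,
  "decimal_field" -> JNumber
)

val columns: Seq[String] =
  exampleSchema.map { case (name, t) => s"`$name` ${hiveType(t)}" }

columns.foreach(println)
```

The key point is that an object with `additionalProperties` becomes a map, while an object with fixed `properties` becomes a struct. Inferring the schema from the data alone cannot make that distinction.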

Event Timeline

Ottomata triaged this task as Medium priority. Feb 6 2019, 5:41 PM
Ottomata created this task.
Restricted Application added a subscriber: Aklapper. Feb 6 2019, 5:41 PM
fdans assigned this task to JAllemandou. Feb 7 2019, 6:22 PM
fdans added a project: Analytics-Kanban.
fdans moved this task from Incoming to Smart Tools for Better Data on the Analytics board.

@Ottomata: Double reading is the way to go when we have schema discrepancies that can't be solved through casting (struct -> map).
See the test run below:

case class TestMap(
    k1: String = "v1",
    k2: String = "v2",
    k3: String = "v3",
    k4: String = "v4",
    k5: String = "v5"
)

case class TestObj(
    k1: Int = 1,
    k2: TestMap = TestMap()
)


val df = spark.createDataFrame(spark.sparkContext.parallelize((0 to 1000).map(_ => TestObj())))

// Testing cast approach
df.selectExpr("CAST(k2 AS map<string, string>)").show(10)
// org.apache.spark.sql.AnalysisException: cannot resolve '`k2`' due to data type mismatch: cannot cast struct<k1:string,k2:string,k3:string,k4:string,k5:string> to map<string,string>;


// Testing double-reading approach
df.write.mode("overwrite").json("/tmp/joal/test_json_maps")

// Checking by-default read format
spark.read.json("/tmp/joal/test_json_maps").printSchema
// Struct it is
//root
// |-- k1: long (nullable = true)
// |-- k2: struct (nullable = true)
// |    |-- k1: string (nullable = true)
// |    |-- k2: string (nullable = true)
// |    |-- k3: string (nullable = true)
// |    |-- k4: string (nullable = true)
// |    |-- k5: string (nullable = true)


// Create schema to be read using a map
import org.apache.spark.sql.types._
val testSchema = StructType(
        Seq(StructField("k1", LongType, nullable = false),
            StructField("k2", MapType(StringType, StringType, valueContainsNull = false), nullable = false)
        )
    )
// Read using predefined schema
val df2 = spark.read.schema(testSchema).json("/tmp/joal/test_json_maps")
// Check read schema
df2.printSchema
// \o/ We have a map
//root
// |-- k1: long (nullable = true)
// |-- k2: map (nullable = true)
// |    |-- key: string
// |    |-- value: string (valueContainsNull = true)

// Check data
df2.show(10, false)
// It looks good :)
//+---+--------------------------------------------------+
//|k1 |k2                                                |
//+---+--------------------------------------------------+
//|1  |[k1 -> v1, k2 -> v2, k3 -> v3, k4 -> v4, k5 -> v5]|
//|1  |[k1 -> v1, k2 -> v2, k3 -> v3, k4 -> v4, k5 -> v5]|
//|1  |[k1 -> v1, k2 -> v2, k3 -> v3, k4 -> v4, k5 -> v5]|
//|1  |[k1 -> v1, k2 -> v2, k3 -> v3, k4 -> v4, k5 -> v5]|
//|1  |[k1 -> v1, k2 -> v2, k3 -> v3, k4 -> v4, k5 -> v5]|
//|1  |[k1 -> v1, k2 -> v2, k3 -> v3, k4 -> v4, k5 -> v5]|
//|1  |[k1 -> v1, k2 -> v2, k3 -> v3, k4 -> v4, k5 -> v5]|
//|1  |[k1 -> v1, k2 -> v2, k3 -> v3, k4 -> v4, k5 -> v5]|
//|1  |[k1 -> v1, k2 -> v2, k3 -> v3, k4 -> v4, k5 -> v5]|
//|1  |[k1 -> v1, k2 -> v2, k3 -> v3, k4 -> v4, k5 -> v5]|
//+---+--------------------------------------------------+
Ottomata added a comment. Edited Feb 8 2019, 3:27 PM

Ok, I'm ready to cave and go back to double reading for JSON data. :/ Thanks so much Joseph!

I must say I also pushed to stop double reading, and I still think we should stop.
I wonder if having a first Refine step that gathers schemas and converts JSON schemas to Spark schemas wouldn't make more sense here (and actually apply schema changes in Hive from schema change detection only, before even reading).

Not sure I understand...?

Arf - I'll try to be clearer: instead of inferring the schema by reading the JSON, and then reading a second time when that inferred schema is not castable to the expected one, could we gather the "real" schema from the schema repo, convert it to a Spark schema, and read the JSON with that? I know you'd always prefer not to, but now that we're back to double reading, maybe it'd make more sense? (For schemas with a large number of events, double reading will be expensive!)

Hm, I guess we could! It would be faster for us to implement it here than to
figure out how to run Kafka Connect now. It would be nice if we had
these schemas available via HTTP now though. I suppose shipping them with
the job is ok...
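
To make the "schema change detection before even reading" idea a bit more concrete, here is a rough sketch in plain Scala. It assumes columns are represented as name and Hive type string pairs. `SchemaDiff` and `diffColumns` are hypothetical names used only for illustration; this is not how Refine is actually implemented.

```scala
// Hypothetical result type: columns to add to the table, and columns whose
// declared type differs from what the table already has.
final case class SchemaDiff(toAdd: Seq[(String, String)], conflicting: Seq[String])

// Compare the columns of an existing Hive table with the columns derived
// from the event's JSONSchema. Names are compared case-insensitively.
def diffColumns(existing: Map[String, String], expected: Seq[(String, String)]): SchemaDiff = {
  val known = existing.map { case (name, hiveType) => name.toLowerCase -> hiveType }
  val toAdd = expected.filterNot { case (name, _) => known.contains(name.toLowerCase) }
  val conflicting = expected.collect {
    case (name, hiveType) if known.get(name.toLowerCase).exists(_ != hiveType) => name
  }
  SchemaDiff(toAdd, conflicting)
}

// Example: the table was created from inferred data (k2 is a struct),
// while the schema says k2 is a map and adds a new field k3.
val existingTable = Map("k1" -> "bigint", "k2" -> "struct<k1:string,k2:string>")
val fromSchema    = Seq("k1" -> "bigint", "k2" -> "map<string,string>", "k3" -> "double")

val diff = diffColumns(existingTable, fromSchema)
println(s"columns to add: ${diff.toAdd}")
println(s"type conflicts: ${diff.conflicting}")
```

With something like this, new columns could be added to the table up front, and struct vs. map conflicts would be flagged before any event data is read.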

Ottomata raised the priority of this task from Medium to High. Feb 21 2019, 5:53 PM

Change 492756 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery/source@master] [WIP] Add JsonSchemaConverter to spark package

https://gerrit.wikimedia.org/r/492756

Change 493307 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[mediawiki/extensions/EventLogging@master] Allow looking up of latests JSONSchema by title without specific revid

https://gerrit.wikimedia.org/r/493307

Change 493307 merged by jenkins-bot:
[mediawiki/extensions/EventLogging@master] Allow looking up of latest JSONSchema by title without specific revid

https://gerrit.wikimedia.org/r/493307

Ottomata moved this task from Backlog to In Progress on the Event-Platform board. Mar 4 2019, 10:29 PM

Change 494831 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/refinery/source@master] Add SparkSchemaLoader capabilities to Refine and RefineTarget

https://gerrit.wikimedia.org/r/494831

Ottomata renamed this task from "Spike: Can Refine handle map types if Hive Schema already exists with map fields?" to "Make Refine use JSONSchemas of event data to support Map types and proper types for integers vs decimals". Mar 6 2019, 10:17 PM
Ottomata updated the task description.

Change 492756 merged by Ottomata:
[analytics/refinery/source@master] Add JsonSchemaConverter to spark package

https://gerrit.wikimedia.org/r/492756

Ottomata changed the point value for this task from 0 to 8. Mar 8 2019, 5:05 PM
Ottomata claimed this task. Mar 8 2019, 5:08 PM

Change 495733 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[mediawiki/extensions/EventLogging@master] Use isset instead of array_key_exists to check for params

https://gerrit.wikimedia.org/r/495733

Change 495733 merged by Ottomata:
[mediawiki/extensions/EventLogging@master] Use isset instead of array_key_exists to check for params

https://gerrit.wikimedia.org/r/495733

Change 494831 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/refinery/source@master] Add SparkSchemaLoader capabilities to Refine and RefineTarget

https://gerrit.wikimedia.org/r/494831

Change 494831 merged by jenkins-bot:
[analytics/refinery/source@master] Add SparkSchemaLoader capabilities to Refine and RefineTarget

https://gerrit.wikimedia.org/r/494831

nettrom_WMF updated the task description. Apr 1 2019, 9:32 PM
kostajh added a subscriber: kostajh. Apr 2 2019, 5:31 PM

Change 505287 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] New Refine job to refine events using remote JSONSchemas

https://gerrit.wikimedia.org/r/505287

Change 494831 merged by Fdans:
[analytics/refinery/source@master] Add SparkSchemaLoader capabilities to Refine and RefineTarget

https://gerrit.wikimedia.org/r/494831

Change 492756 merged by Fdans:
[analytics/refinery/source@master] Add JsonSchemaConverter to spark package

https://gerrit.wikimedia.org/r/492756

Change 505287 merged by Ottomata:
[operations/puppet@production] New Refine job to refine events using remote JSONSchemas

https://gerrit.wikimedia.org/r/505287

Change 505867 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Use webproxy for mediawiki_events refine job

https://gerrit.wikimedia.org/r/505867

Change 505867 merged by Ottomata:
[operations/puppet@production] Use webproxy for mediawiki_events refine job

https://gerrit.wikimedia.org/r/505867

Yahoo!

hive (event)> describe event.mediawiki_api_request;
OK
col_name	data_type	comment
_schema             	string
meta                	struct<uri:string,request_id:string,id:string,dt:string,domain:string,stream:string>
http                	struct<method:string,client_ip:string,request_headers:map<string,string>>
database            	string
backend_time_ms     	bigint
api_error_codes     	array<string>
params              	map<string,string>
datacenter          	string
year                	bigint
month               	bigint
day                 	bigint
hour                	bigint

map<string,string> :)

Does it mean we can now have a field that contains arbitrary key/values in EventLogging? If so, how do we specify it in the JSON schema?

It almost does! We're blocked by T218617 atm, but I think that code will roll out next week. Once it is out, we can fix a few offending EventLogging schemas. We can then set up a new Refine job that creates Hive tables based on the JSONSchemas, rather than inferring the Hive schemas from the event data.

We'll need to be careful to make sure the JSONSchemas are compatible with the Hive tables that already exist.  I expect most of them to be, but we need to make sure and adjust anywhere we find they aren't.

Map types aren't compatible with MySQL though.

Anyway, the map field type is specified like:

"map_field": {
  "type": "object",
  "additionalProperties": {
    "type": "string" // or whatever type your values are.
  }
}

I just updated https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Schema_Guidelines#Schema_set_up

Change 508007 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/refinery/source@master] Fix EventLoggingSchemaLoader to properly set useragent is_bot and is_mediawiki fields as booleans

https://gerrit.wikimedia.org/r/508007

Change 508008 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/refinery/source@master] EventLoggingSchemaLoader - Don't include unused timestamp field in EventCapsule

https://gerrit.wikimedia.org/r/508008

Change 508007 merged by jenkins-bot:
[analytics/refinery/source@master] Fix EventLoggingSchemaLoader to properly set useragent is_bot and is_mediawiki fields as booleans

https://gerrit.wikimedia.org/r/508007

Change 508008 merged by jenkins-bot:
[analytics/refinery/source@master] EventLoggingSchemaLoader - Don't include unused timestamp field in EventCapsule

https://gerrit.wikimedia.org/r/508008

I just ran a Refine test using JSONSchemas against the existing Hive table schemas. Most worked fine, but a number failed with errors like:

java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: The 2th field 'variant' of input row cannot be null.

I believe this only happens on required fields. I haven't looked into the real problem here yet (are these fields somehow null in the source data?), so I need to dig deeper. Here's a list of the offending schemas:

MediaViewer
MobileWikiAppAppearanceSettings
MobileWikiAppArticleSuggestions
MobileWikiAppCreateAccount
MobileWikiAppDailyStats
MobileWikiAppEdit
MobileWikiAppFeedConfigure
MobileWikiAppFeed
MobileWikiAppFindInPage
MobileWikiAppIntents
MobileWikiAppLangSelect
MobileWikiAppLinkPreview
MobileWikiAppLogin
MobileWikiAppMediaGallery
MobileWikiAppNavMenu
MobileWikiAppOnThisDay
MobileWikiAppPageScroll
MobileWikiAppProtectedEditAttempt
MobileWikiAppRandomizer
MobileWikiAppReadingLists
MobileWikiAppSearch
MobileWikiAppSessions
MobileWikiAppShareAFact
MobileWikiAppTabs
MobileWikiAppToCInteraction
TestSearchSatisfaction2

Change 508655 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/refinery/source@master] Refine - Make all fields not required when reading data using JSONSchema

https://gerrit.wikimedia.org/r/508655

Change 508655 merged by Ottomata:
[analytics/refinery/source@master] Refine - Make all fields not required when reading data using JSONSchema

https://gerrit.wikimedia.org/r/508655

Ok, with the latest patches, I seem to be able to use the JSONSchemas to read EventLogging data and still use the existing Hive tables.
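
For reference, relaxing a Spark schema so that no field is required can be sketched roughly as below. This only illustrates the idea behind the "make all fields not required" patch and is not the merged code. `makeNullable` is a hypothetical name, and the snippet assumes Spark SQL is on the classpath (e.g. in spark-shell).

```scala
import org.apache.spark.sql.types._

// Recursively mark every struct field, array element and map value as nullable.
// Map keys are left untouched, since Spark does not allow null map keys.
def makeNullable(dataType: DataType): DataType = dataType match {
  case StructType(fields) =>
    StructType(fields.map { f =>
      f.copy(dataType = makeNullable(f.dataType), nullable = true)
    })
  case ArrayType(elementType, _) =>
    ArrayType(makeNullable(elementType), containsNull = true)
  case MapType(keyType, valueType, _) =>
    MapType(keyType, makeNullable(valueType), valueContainsNull = true)
  case other => other
}

// A strict schema, as it might be derived from a JSONSchema with required fields.
val strictSchema = StructType(Seq(
  StructField("variant", StringType, nullable = false),
  StructField("params", MapType(StringType, StringType, valueContainsNull = false), nullable = false)
))

val relaxedSchema = makeNullable(strictSchema).asInstanceOf[StructType]
relaxedSchema.printTreeString()
```

With every field nullable, a record that is missing a "required" field is read with a null in that column, instead of failing the whole job with the "cannot be null" encoding error above.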

Change 508863 had a related patch set uploaded (by Mforns; owner: Mforns):
[operations/puppet@production] Bump up refinery version for refine.pp

https://gerrit.wikimedia.org/r/508863

Change 508863 merged by Ottomata:
[operations/puppet@production] Enable schema aware eventlogging hive refinement

https://gerrit.wikimedia.org/r/508863

yessss!

19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`ClickTiming` (total # refined records: 78241)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppiOSFeed` (total # refined records: 9295)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`WikibaseTermboxInteraction` (total # refined records: 78)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppProtectedEditAttempt` (total # refined records: 67)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppArticleSuggestions` (total # refined records: 531)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`HomepageModule` (total # refined records: 9)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppFeed` (total # refined records: 2318)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppNotificationPreferences` (total # refined records: 1)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppiOSLoginAction` (total # refined records: 30)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppiOSSettingAction` (total # refined records: 5)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`QuickSurveysResponses` (total # refined records: 1195)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`ContentTranslationCTA` (total # refined records: 1152)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`ServerSideAccountCreation` (total # refined records: 566)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`CitationUsagePageLoad` (total # refined records: 596)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWebSearch` (total # refined records: 1352)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`EditAttemptStep` (total # refined records: 36698)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppOnThisDay` (total # refined records: 43)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`ElementTiming` (total # refined records: 32441)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`CentralAuth` (total # refined records: 3618)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MediaViewer` (total # refined records: 70664)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`UploadWizardErrorFlowEvent` (total # refined records: 201)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`WikidataCompletionSearchClicks` (total # refined records: 2638)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`CitationUsage` (total # refined records: 20)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppLanguageSearching` (total # refined records: 353)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`ContentTranslation` (total # refined records: 49)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`CentralNoticeTiming` (total # refined records: 32307)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`LayoutJank` (total # refined records: 15295)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppLinkPreview` (total # refined records: 301864)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppPageScroll` (total # refined records: 10883)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppFeedConfigure` (total # refined records: 163)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`PrefUpdate` (total # refined records: 4395)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MultimediaViewerAttribution` (total # refined records: 1061)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`Popups` (total # refined records: 1)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppNavMenu` (total # refined records: 16)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`EchoInteraction` (total # refined records: 3002)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`NavigationTiming` (total # refined records: 69744)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppAppearanceSettings` (total # refined records: 1016)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`UploadWizardUploadFlowEvent` (total # refined records: 312)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`HomepageVisit` (total # refined records: 1)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`SearchSatisfactionErrors` (total # refined records: 247)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`GuidedTourExited` (total # refined records: 53)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`UploadWizardExceptionFlowEvent` (total # refined records: 24)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`ReadingDepth` (total # refined records: 3270556)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MultimediaViewerDuration` (total # refined records: 680)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`EventError` (total # refined records: 1577)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`WikipediaPortal` (total # refined records: 1345)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppIntents` (total # refined records: 723)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppShareAFact` (total # refined records: 12817)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`UniversalLanguageSelector` (total # refined records: 3713)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`TranslationRecommendationAPIRequests` (total # refined records: 14)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`EventTiming` (total # refined records: 32311)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppWiktionaryPopup` (total # refined records: 8)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppNotificationInteraction` (total # refined records: 16)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`SaveTiming` (total # refined records: 11356)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`GuidedTourGuiderHidden` (total # refined records: 39)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`CpuBenchmark` (total # refined records: 45924)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppSessions` (total # refined records: 674424)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`TestSearchSatisfaction2` (total # refined records: 25681)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppiOSReadingLists` (total # refined records: 317)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`AdvancedSearchRequest` (total # refined records: 23028)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`HelpPanel` (total # refined records: 110)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`UploadWizardStep` (total # refined records: 1199)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppInstallReferrer` (total # refined records: 888)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`ContentTranslationError` (total # refined records: 4)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`CentralNoticeBannerHistory` (total # refined records: 1823)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`CentralNoticeImpression` (total # refined records: 51072)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`VirtualPageView` (total # refined records: 3581226)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppToCInteraction` (total # refined records: 128318)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`GettingStartedRedirectImpression` (total # refined records: 264)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppLogin` (total # refined records: 832)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`QuickSurveyInitiation` (total # refined records: 44386)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`UploadWizardFlowEvent` (total # refined records: 7)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`EchoMail` (total # refined records: 454)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppEdit` (total # refined records: 1223)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppCreateAccount` (total # refined records: 214)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWebMainMenuClickTracking` (total # refined records: 5598)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppTabs` (total # refined records: 982)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppiOSSearch` (total # refined records: 17366)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppReadingLists` (total # refined records: 4608)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`ExternalGuidance` (total # refined records: 14027)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`RUMSpeedIndex` (total # refined records: 47637)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppRandomizer` (total # refined records: 167)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MultimediaViewerDimensions` (total # refined records: 3927)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppMediaGallery` (total # refined records: 422)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`GuidedTourButtonClick` (total # refined records: 318)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`VisualEditorFeatureUse` (total # refined records: 2529)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppFindInPage` (total # refined records: 532)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppiOSSessions` (total # refined records: 14466)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppStuffHappens` (total # refined records: 1)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`ContentTranslationSuggestion` (total # refined records: 80)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`WikimediaBlogVisit` (total # refined records: 45)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`PaintTiming` (total # refined records: 100002)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`GuidedTourExternalLinkActivation` (total # refined records: 37)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`UploadWizardTutorialActions` (total # refined records: 530)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`ResourceTiming` (total # refined records: 47262)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MediaWikiPingback` (total # refined records: 11417)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppiOSUserHistory` (total # refined records: 1387)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppSearch` (total # refined records: 3103)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`EditConflict` (total # refined records: 122)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`EditorJourney` (total # refined records: 344)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppSavedPages` (total # refined records: 204)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`GuidedTourGuiderImpression` (total # refined records: 495)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`ServerTiming` (total # refined records: 50503)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MultimediaViewerNetworkPerformance` (total # refined records: 3155)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`ChangesListFilterGrouping` (total # refined records: 1029)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`LandingPageImpression` (total # refined records: 18)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppLanguageSettings` (total # refined records: 636)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`EditorActivation` (total # refined records: 175)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`TwoColConflictConflict` (total # refined records: 6)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`TemplateWizard` (total # refined records: 12027)
19/05/09 18:36:56 INFO Refine: Successfully refined 1 of 1 dataset partitions into table `event`.`MobileWikiAppDailyStats` (total # refined records: 37095)

This is using EventLogging JSONSchemas!

Ottomata changed the point value for this task from 8 to 13.

I'd say this task is done. We still need to migrate all of the EventBus ones to use JSONSchemas from schema.svc, but we'll do that one at a time as we move them to eventgate-main as part of T211248: Modern Event Platform: Stream Intake Service: Migrate eventlogging-service-eventbus events to eventgate-main

Ottomata moved this task from In Progress to Done on the Event-Platform board. May 14 2019, 1:33 PM
Nuria closed this task as Resolved. May 14 2019, 8:47 PM