
Publish detailed Action API request information to Hadoop
Closed, Resolved (Public)

Description

Instrument MediaWiki on the WMF production cluster to send structured Action API request information to Hadoop via Kafka.

Data to collect:

Data                                             Name              Type
Timestamp                                        ts                int (unix epoch seconds)
Requesting IP                                    ip                string
Requesting User-Agent                            userAgent         string
Wiki (wfWikiId())                                wiki              string
Time spent processing request (ms resolution)    timeSpentBackend  int
Were errors encountered?                         hadError          boolean
List of error codes                              errorCodes        array<string>
Request parameters (name=value pairs)            params            map<string,string>
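As a rough illustration, the field list above could be written out as an Avro-style record schema. The sketch below builds one as a plain Python dict. The field names and types come from the description; the record name and the absence of a namespace or default values are assumptions, and the schema actually committed to mediawiki/event-schemas may differ.

```python
import json

# Field names and types are copied from the task description.
# Complex types use the standard Avro "array"/"map" schema forms.
FIELDS = [
    ("ts", "int"),
    ("ip", "string"),
    ("userAgent", "string"),
    ("wiki", "string"),
    ("timeSpentBackend", "int"),
    ("hadError", "boolean"),
    ("errorCodes", {"type": "array", "items": "string"}),
    ("params", {"type": "map", "values": "string"}),
]


def build_schema(name="ApiAction"):
    """Return an Avro-style record schema as a dict (record name is assumed)."""
    return {
        "type": "record",
        "name": name,
        "fields": [{"name": n, "type": t} for n, t in FIELDS],
    }


def schema_json(name="ApiAction"):
    """Serialize the schema the way an .avsc file would store it."""
    return json.dumps(build_schema(name), indent=2)
```

Calling `schema_json()` yields a JSON document that can be compared field by field against the table.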

Data will be collected by adding a new debug logging channel (ApiRequest) with structured data in the PSR-3 context. Any MediaWiki deployment can then choose where and how to route these log messages.
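The idea of a named channel carrying structured context, with routing left to the deployment, can be sketched with Python's standard logging module. This is only an analogy to the PSR-3 approach described here, not MediaWiki code: the handler class, the `context` attribute name, and the helper functions are all hypothetical.

```python
import json
import logging


class JsonContextHandler(logging.Handler):
    """Hypothetical sink: keeps each record's structured context as a JSON line."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        # "context" is attached to the record through the `extra` argument.
        context = getattr(record, "context", {})
        self.lines.append(json.dumps(context, sort_keys=True))


def make_channel(name, handler):
    """Create a debug-level channel and route it to the given handler."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # only the configured handler sees these events
    logger.addHandler(handler)
    return logger


def log_request(logger, context):
    """Emit one debug message whose payload lives in the structured context."""
    logger.debug("API request", extra={"context": context})
```

A deployment that wants the events elsewhere would swap `JsonContextHandler` for a different handler without touching the code that calls `log_request`.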

For the WMF production cluster, introduce configuration to route this log channel to the local Kafka cluster in a topic that can be loaded into Hadoop.

Deployment Checklist

These steps are intended to be done in order.

  1. Commit schema to mediawiki/event-schemas repository (gerrit)
  2. Commit submodule bump to analytics/refinery/source repository (gerrit)
  3. Commit Oozie job to create partitions to analytics/refinery repository (gerrit)
  4. Commit property changes for Camus to operations/puppet repository (gerrit)
  5. Wait for analytics to deploy new versions of refinery and refinery-source to analytics cluster
  6. T129889: Create mediawiki_ApiAction Kafka topic
  7. Go back and fix things that were done incorrectly (T108618#2132875)
  8. Commit submodule bump along with proper configuration to operations/mediawiki-config repository (gerrit)
  9. Deploy initial mediawiki-config patch to production with a sampling rate of a few events per minute for testing
  10. Verify events in Kafka are as expected. Check MediaWiki logs for errors.
  11. After enough time has passed (Camus runs once per hour) verify events are showing up in HDFS
  12. Create table in Hive pointing at the events in HDFS (T129886: Create wmf_raw.ApiAction table)
  13. Submit coordinator to Oozie to auto-create partitions
  14. Adjust (or remove) sampling of events in operations/mediawiki-config repository
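Steps 9 and 14 rely on sampling the event stream. A minimal sketch of 1-in-N random sampling follows; whether the production configuration samples this way is an assumption, and `should_log` and `sample_every` are hypothetical names.

```python
import random


def should_log(sample_every, rng=random):
    """Return True for roughly one call in `sample_every`.

    A value of 1 (or less) disables sampling, so every event is logged.
    """
    if sample_every <= 1:
        return True
    return rng.randrange(sample_every) == 0
```

Raising `sample_every` keeps the test volume low during the initial deployment; setting it to 1 corresponds to removing sampling in the final step.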

Original task description

log user agent in api.log
We tell clients to use an interesting user agent, but don't log it to api.php.

In T102079#1417411 Anomie commented

User agent could be included easily enough, but would need to be run by Ops for the text logfile and @bd808 for logstash (if it wouldn't already be there) to verify that it wouldn't make a prohibitive difference to the storage requirements.

Seems a simple change to ApiMain->logRequest(); ApiBase->logFeatureUsage() already logs the user agent to api-feature-usage.log.

Event Timeline

There are a very large number of changes, so older changes are hidden.
Ainali removed a subscriber: Ainali. Nov 13 2015, 9:50 PM

Change 240614 merged by jenkins-bot:
Add structured API request debug logging

https://gerrit.wikimedia.org/r/240614

Anomie moved this task from In Dev to Done on the MediaWiki-API board. Jan 4 2016, 7:02 PM
Milimetric moved this task from Incoming to Radar on the Analytics board. Jan 12 2016, 7:32 PM

Change 265164 had a related patch set uploaded (by BryanDavis):
Avro schema for Action API request logging

https://gerrit.wikimedia.org/r/265164

bd808 updated the task description. Jan 21 2016, 4:08 PM
bd808 added a comment. Jan 21 2016, 4:10 PM

I updated the data table in the description to match guidance given during Avro schema code review: event time changed from ISO8601 to unix timestamp and camelCase naming for fields to match EventLogging conventions.
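The two conversions mentioned in this comment can be sketched as follows. The comment does not say what the earlier field names looked like, so the snake_case input is an assumption, and both function names are hypothetical.

```python
from datetime import datetime


def iso8601_to_epoch(value):
    """Convert an ISO 8601 string with an explicit UTC offset to epoch seconds."""
    return int(datetime.fromisoformat(value).timestamp())


def snake_to_camel(name):
    """Turn a snake_case field name into camelCase (input style is assumed)."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)
```

For instance, `iso8601_to_epoch("2001-09-09T01:46:40+00:00")` gives 1000000000, and `snake_to_camel("time_spent_backend")` gives `timeSpentBackend`.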

Change 265507 had a related patch set uploaded (by BryanDavis):
Update ApiAction logging channel values

https://gerrit.wikimedia.org/r/265507

Change 265164 merged by Nuria:
Avro schema for Action API request logging

https://gerrit.wikimedia.org/r/265164

bd808 updated the task description. Feb 27 2016, 12:32 AM

Change 273556 had a related patch set uploaded (by BryanDavis):
Add ApiAction avro schema

https://gerrit.wikimedia.org/r/273556

Change 273557 had a related patch set uploaded (by BryanDavis):
Add initial oozie job for ApiAction

https://gerrit.wikimedia.org/r/273557

bd808 updated the task description. Feb 27 2016, 1:10 AM

Change 273558 had a related patch set uploaded (by BryanDavis):
Camus: specify latest schema for ApiAction

https://gerrit.wikimedia.org/r/273558

Change 273559 had a related patch set uploaded (by BryanDavis):
Logging: Add ApiRequest kafka logging

https://gerrit.wikimedia.org/r/273559

bd808 updated the task description. Feb 27 2016, 1:19 AM
Qgil removed a subscriber: Qgil. Feb 29 2016, 9:00 AM

Change 265507 merged by jenkins-bot:
Update ApiAction logging channel values

https://gerrit.wikimedia.org/r/265507

Change 273556 merged by Nuria:
Add ApiAction avro schema

https://gerrit.wikimedia.org/r/273556

Change 273558 merged by Ottomata:
Camus: specify latest schema for ApiAction

https://gerrit.wikimedia.org/r/273558

bd808 updated the task description. Mar 5 2016, 3:23 PM

Change 273557 merged by Joal:
Add initial oozie job for ApiAction

https://gerrit.wikimedia.org/r/273557

bd808 updated the task description. Mar 14 2016, 4:07 PM
bd808 added a comment. Mar 14 2016, 4:12 PM

Now at the "Wait for analytics to deploy new versions of refinery and refinery-source to analytics cluster" stage of the checklist.

I keep intending to work on a way for Camus to read the schemas from outside of the jar resources ... just haven't gotten around to it yet :(

bd808 updated the task description. Mar 14 2016, 5:22 PM
bd808 updated the task description.
bd808 updated the task description. Mar 17 2016, 5:35 PM
bd808 updated the task description. Mar 17 2016, 5:38 PM

Change 273559 merged by jenkins-bot:
Logging: Add ApiRequest kafka logging

https://gerrit.wikimedia.org/r/273559

Mentioned in SAL [2016-03-17T22:04:21Z] <bd808@tin> Synchronized wmf-config/event-schemas: Add ApiRequest kafka logging (T108618) (duration: 00m 38s)

Mentioned in SAL [2016-03-17T22:05:17Z] <bd808@tin> Synchronized wmf-config/InitialiseSettings.php: Add ApiRequest kafka logging (T108618) (duration: 00m 34s)

Mentioned in SAL [2016-03-17T22:10:29Z] <bd808@tin> Synchronized wmf-config/InitialiseSettings.php: Disable ApiRequest kafka logging (T108618) (duration: 00m 31s)

Change 278136 had a related patch set uploaded (by BryanDavis):
Cast API timeSpentBackend to an int

https://gerrit.wikimedia.org/r/278136

Change 278170 had a related patch set uploaded (by BryanDavis):
Cast API timeSpentBackend to an int

https://gerrit.wikimedia.org/r/278170

Change 278180 had a related patch set uploaded (by BryanDavis):
Disable ApiRequest properly

https://gerrit.wikimedia.org/r/278180

So I messed up and named the logging channel "ApiRequest" in some places (MediaWiki, event-schemas) and "ApiAction" in others (Oozie, HDFS). I'm going to have to make some patches to fix that. I think the "ApiRequest" parts will be easiest to change.

Change 278136 merged by jenkins-bot:
Cast API timeSpentBackend to an int

https://gerrit.wikimedia.org/r/278136

Change 278180 merged by jenkins-bot:
Disable ApiRequest properly

https://gerrit.wikimedia.org/r/278180

Mentioned in SAL [2016-03-17T22:42:18Z] <bd808@tin> Synchronized wmf-config/InitialiseSettings.php: Disable ApiRequest properly (T108618) (duration: 00m 27s)

Change 278187 had a related patch set uploaded (by BryanDavis):
Rename ApiRequest to ApiAction

https://gerrit.wikimedia.org/r/278187

Change 278188 had a related patch set uploaded (by BryanDavis):
Rename ApiRequest to ApiAction

https://gerrit.wikimedia.org/r/278188

Change 278170 merged by jenkins-bot:
Cast API timeSpentBackend to an int

https://gerrit.wikimedia.org/r/278170

Change 278193 had a related patch set uploaded (by BryanDavis):
Fix ApiAction record name

https://gerrit.wikimedia.org/r/278193

bd808 added a comment. Edited Mar 17 2016, 11:06 PM

New checklist:

Change 278207 had a related patch set uploaded (by BryanDavis):
Rename ApiRequest to ApiAction

https://gerrit.wikimedia.org/r/278207

Change 278187 merged by jenkins-bot:
Rename ApiRequest to ApiAction

https://gerrit.wikimedia.org/r/278187

Change 278193 merged by Ottomata:
Fix ApiAction record name

https://gerrit.wikimedia.org/r/278193

Change 278188 merged by Ottomata:
Rename ApiRequest to ApiAction

https://gerrit.wikimedia.org/r/278188

Change 278346 had a related patch set uploaded (by BryanDavis):
Update mediawiki/event-schemas submodule

https://gerrit.wikimedia.org/r/278346

Change 278347 had a related patch set uploaded (by BryanDavis):
Logging: add ApiAction kafka logging

https://gerrit.wikimedia.org/r/278347

Change 278346 merged by Ottomata:
Update mediawiki/event-schemas submodule

https://gerrit.wikimedia.org/r/278346

bd808 updated the task description. Mar 22 2016, 11:32 PM

Change 278207 merged by jenkins-bot:
Rename ApiRequest to ApiAction

https://gerrit.wikimedia.org/r/278207

Change 278347 merged by jenkins-bot:
Logging: add ApiAction kafka logging

https://gerrit.wikimedia.org/r/278347

Mentioned in SAL [2016-03-23T16:57:30Z] <bd808@tin> Synchronized wmf-config/InitialiseSettings.php: Logging: add ApiAction kafka logging (34f236c) (T108618) (duration: 00m 28s)

bd808 added a comment. Mar 23 2016, 5:16 PM

New checklist:

bd808 updated the task description. Mar 23 2016, 5:17 PM
bd808 added a comment. Mar 23 2016, 5:19 PM

Verified that data is reaching Kafka from MediaWiki via kafkacat -b kafka1012 -t mediawiki_ApiAction -c 1 on stat1002. Next step is to check back after ~1h to verify that Camus is copying the events to HDFS.
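Once a message has been pulled from the topic and decoded, checking that it "is as expected" could look like the sketch below. It assumes the event has already been decoded into a Python dict, and it only checks top-level types against the field list in the task description. `validate_event` and `EXPECTED_TYPES` are hypothetical names.

```python
# Expected Python types for each field, derived from the task description.
EXPECTED_TYPES = {
    "ts": int,
    "ip": str,
    "userAgent": str,
    "wiki": str,
    "timeSpentBackend": int,
    "hadError": bool,
    "errorCodes": list,
    "params": dict,
}


def validate_event(event):
    """Return a list of problems found in a decoded event dict (empty if none)."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in event:
            problems.append(f"missing field: {field}")
            continue
        value = event[field]
        # bool is a subclass of int in Python, so reject it for int fields.
        if expected is int and isinstance(value, bool):
            ok = False
        else:
            ok = isinstance(value, expected)
        if not ok:
            problems.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems
```

A check like this would flag a non-integer `timeSpentBackend`, the kind of mismatch that the "Cast API timeSpentBackend to an int" patches appear to address.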

bd808 updated the task description. Mar 25 2016, 2:51 PM
hive (wmf_raw)> describe ApiAction;
OK
col_name        data_type       comment
ts                      int
ip                      string
useragent               string
wiki                    string
timespentbackend        int
haderror                boolean
errorcodes              array<string>
params                  map<string,string>
year                    string
month                   string
day                     string
hour                    string

# Partition Information
# col_name              data_type               comment

year                    string
month                   string
day                     string
hour                    string
hive (wmf_raw)> select count(*) from ApiAction where year = 2016;
OK
_c0
133739
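The table is partitioned by year, month, day and hour. As a sketch of how an event's `ts` value relates to those partition columns, the function below derives them from epoch seconds. Using UTC and returning plain integers are assumptions; the output above does not show how Camus and the Oozie job actually compute or format partition values.

```python
from datetime import datetime, timezone


def partition_for(ts):
    """Map epoch seconds to hourly partition values (UTC is assumed)."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return {"year": dt.year, "month": dt.month, "day": dt.day, "hour": dt.hour}
```

For example, `partition_for(1000000000)` returns year 2001, month 9, day 9, hour 1.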

Change 279614 had a related patch set uploaded (by BryanDavis):
Logging: Remove sampling from ApiAction kafka channel

https://gerrit.wikimedia.org/r/279614

Change 279614 merged by jenkins-bot:
Logging: Remove sampling from ApiAction kafka channel

https://gerrit.wikimedia.org/r/279614

bd808 moved this task from Needs Review/Feedback to Done on the User-bd808 board. Apr 6 2016, 11:14 PM
Qgil awarded a token. Apr 7 2016, 6:22 AM
bd808 moved this task from Done to Archive on the User-bd808 board. Apr 13 2016, 4:08 PM

Change 240617 abandoned by EBernhardson:
Send the api request log to kafka

Reason:
this patch was more an example, the feature has been written and deployed since.

https://gerrit.wikimedia.org/r/240617