Page MenuHomePhabricator

How to use Wikipedia EventLogging schemas in Vagrant setup?
Closed, ResolvedPublic3 Estimated Story Points

Description

I'm trying to test the Wikipedia EventLogging schemas in my local Vagrant setup. But somehow they won't get recognized by the eventlogging server (/vagrant/logs/eventlogging.log is empty):

# /var/log/upstart/eventlogging-devserver.log 
Traceback (most recent call last):
  File "/usr/lib/python2.7/wsgiref/handlers.py", line 85, in run
    self.result = application(self.environ, self.start_response)
  File "/vagrant/srv/eventlogging/bin/eventlogging-devserver", line 156, in handle_event
    event, errors = validate(log_line)
  File "/vagrant/srv/eventlogging/bin/eventlogging-devserver", line 131, in validate
    schema = eventlogging.get_schema(scid, encapsulate=True)
  File "/vagrant/srv/eventlogging/eventlogging/schema.py", line 79, in get_schema
    validate_scid(scid)
  File "/vagrant/srv/eventlogging/eventlogging/schema.py", line 153, in validate_scid
    raise ValidationError('Invalid revision ID %s' % revision)
ValidationError: Invalid revision ID -1

I enabled the event logging role via Vagrant:

vagrant roles enable eventlogging
vagrant provision

I trigger events via JS console:

mw.loader.load( 'ext.eventLogging' ); 
mw.eventLog.logEvent( 'CentralAuth', { 'version': 1, 'action': 'sul2-autologin-login', 'userId': 1 } );
# CentralAuth is active ( https://meta.wikimedia.org/wiki/Schema_talk:CentralAuth )

I added schema API url to several config files:

$wgEventLoggingSchemaApiUri = 'https://meta.wikimedia.org/w/api.php';

# /vagrant/LocalSettings.php 
# /vagrant/settings.d/puppet-managed/05-EventLogging.php 
# /vagrant/settings.d/01-EventLogging.php

What's wrong with my setup and how can I fix it?

Event Timeline

srishakatux subscribed.

I'm also having the same issue and my setup looks exactly the same. I am using Vagrant, https://meta.wikimedia.org/w/api.php as the schema api url, enabled EventLogging role and /vagrant/logs/eventlogging.log is empty.

As per the devserver installation instructions in the EventLogging/Guide, there should be a server folder inside the extension directory. But, I don't see that in my vagrant setup and maybe these instructions are not for Vagrant setup.

As per these instructions, Extension:EventLogging#Using_mediawiki-vagrant, maybe I don't need to setup devserver separately? In any case, my eventlogging.log file is empty and there isn't eventlogging-devserver.log file generated.

Any helpful pointers on this?

fdans triaged this task as Medium priority.
fdans moved this task from Incoming to Operational Excellence on the Analytics board.
fdans added a project: Analytics-Kanban.

TL;DR;

It looks like the eventlogging vagrant install is in a funky state because of a distribution upgrade. The nuclear option may be to blow everything away and start over. I did manage to fix my setup without too much pain, so give this a shot first:

vagrant ssh
cd /vagrant/srv/eventlogging
./virtualenv/bin/pip uninstall mysqlclient
./virtualenv/bin/pip install --no-binary mysqlclient -e .

The uninstall mysqlclient is the key, I guess it has some old incompatible binaries but it doesn't know to refresh them.

Details/Thoughts

Maybe someone can look at the puppet and see if there's a way to prevent this bug: https://github.com/wikimedia/mediawiki-vagrant/blob/master/puppet/modules/eventlogging/manifests/init.pp#L22

When I looked at /vagrant/logs/eventlogging.log, the last events I had were from March 2017, so it may have been broken for a while. Status on the service told me (sudo journalctl -u eventlogging-devserver.service --no-pager):

Jan 04 02:13:04 mediawikivagrant systemd[1]: Started EventLogging Dev Server.
Jan 04 02:13:05 mediawikivagrant eventlogging-devserver[4142]: Traceback (most recent call last):
Jan 04 02:13:05 mediawikivagrant eventlogging-devserver[4142]:   File "/vagrant/srv/eventlogging/virtualenv/bin/eventlogging-devserver", line 4, in <module>
Jan 04 02:13:05 mediawikivagrant eventlogging-devserver[4142]:     __import__('pkg_resources').require('eventlogging==0.9.post20170315')
Jan 04 02:13:05 mediawikivagrant eventlogging-devserver[4142]:   File "/vagrant/srv/eventlogging/virtualenv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3036, in <module>
Jan 04 02:13:05 mediawikivagrant eventlogging-devserver[4142]:     @_call_aside
Jan 04 02:13:05 mediawikivagrant eventlogging-devserver[4142]:   File "/vagrant/srv/eventlogging/virtualenv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3020, in _call_aside
Jan 04 02:13:05 mediawikivagrant eventlogging-devserver[4142]:     f(*args, **kwargs)
Jan 04 02:13:05 mediawikivagrant eventlogging-devserver[4142]:   File "/vagrant/srv/eventlogging/virtualenv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3049, in _initialize_master_working_set
Jan 04 02:13:05 mediawikivagrant eventlogging-devserver[4142]:     working_set = WorkingSet._build_master()
Jan 04 02:13:05 mediawikivagrant eventlogging-devserver[4142]:   File "/vagrant/srv/eventlogging/virtualenv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 656, in _build_master
Jan 04 02:13:05 mediawikivagrant eventlogging-devserver[4142]:     return cls._build_from_requirements(__requires__)
Jan 04 02:13:05 mediawikivagrant eventlogging-devserver[4142]:   File "/vagrant/srv/eventlogging/virtualenv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 669, in _build_from_requirements
Jan 04 02:13:05 mediawikivagrant eventlogging-devserver[4142]:     dists = ws.resolve(reqs, Environment())
Jan 04 02:13:05 mediawikivagrant eventlogging-devserver[4142]:   File "/vagrant/srv/eventlogging/virtualenv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 854, in resolve
Jan 04 02:13:05 mediawikivagrant eventlogging-devserver[4142]:     raise DistributionNotFound(req, requirers)
Jan 04 02:13:05 mediawikivagrant eventlogging-devserver[4142]: pkg_resources.DistributionNotFound: The 'eventlogging==0.9.post20170315' distribution was not found and is required by the application
Jan 04 02:13:05 mediawikivagrant systemd[1]: eventlogging-devserver.service: Main process exited, code=exited, status=1/FAILURE
Jan 04 02:13:05 mediawikivagrant systemd[1]: eventlogging-devserver.service: Unit entered failed state.
Jan 04 02:13:05 mediawikivagrant systemd[1]: eventlogging-devserver.service: Failed with result 'exit-code'.

This led me to believe I needed to reinstall eventlogging, which is now at a much later version and I guess vagrant provision didn't know to update (but the puppet seems to do a git pull... hm). Anyway, reinstalling and fixing the problem with mysqlclient seems to do the trick.

Thanks @Milimetric for looking into this! Uninstalling mysqlclient did the trick and I am now able to see logging in the browser in the format below:

UserFeedback {event: {page_id: 6, vote: "Yes"}, revision: -1, schema: "UserFeedback", webHost: "dev.wiki.local.wmftest.net", wiki: "wiki"}

I've a couple more doubts:

Go to /vagrant/mediawiki/extensions/EventLogging/server/

I don't see a server directory. I believe the other three steps don't apply to me as I've the Vagrant setup. Still not sure where these logs are getting stored and how I can get access to them. Hope this browser testing is not affecting the production schema or the logs?

  • To gain access to production logs, I am guessing I might have to request access to analytics server and follow the Python script here?
Go to /vagrant/mediawiki/extensions/EventLogging/server/

I don't see a server directory. I believe the other three steps don't apply to me as I've the Vagrant setup. Still not sure where these logs are getting stored and how I can get access to them. Hope this browser testing is not affecting the production schema or the logs?

I'm sorry, I think what's happening is that the docs are out of sync with reality. I didn't do this so I'm guessing a little, but it looks like the devserver submodule was removed completely from the mediawiki extension and vagrant knows how to install it as a separate part of the eventlogging role. So no manual steps are required to install it, and once you fixed the python problem we discussed above, you can start the server like this:

sudo systemctl start eventlogging-devserver

And then you can follow logs like this:

sudo journalctl -u eventlogging-devserver.service -f --no-pager

I will update the docs with this. Thank you so much for finding the issue and letting me know. Ping me if anything's still broken.

  • To gain access to production logs, I am guessing I might have to request access to analytics server and follow the Python script here?

If the schema is on the whitelist, not on the blacklist, and is able to be automatically refined, then events from it will land in Hive tables for easy querying in the event Hive database. This is accessible from stat1007 or stat1004. Let me know if you need help with that or if you don't have access. The python script you reference above would pluck the events off of kafka directly, which is fine, but it's harder to do any real investigation that way.

Also, no worries, these wouldn't end up in production unless you changed the endpoint you're posting events to (by default it's localhost)

@Milimetric Thanks! I'm now able to see the journal logs. But, still running into exactly the same error as mentioned in the task description. Error as shown on the Journal log:

Jan 09 06:59:08 vagrant eventlogging-devserver[2995]: Traceback (most recent call last):
Jan 09 06:59:08 vagrant eventlogging-devserver[2995]:   File "/usr/lib/python2.7/wsgiref/handlers.py", line 85, in run
Jan 09 06:59:08 vagrant eventlogging-devserver[2995]:     self.result = application(self.environ, self.start_response)
Jan 09 06:59:08 vagrant eventlogging-devserver[2995]:   File "/vagrant/srv/eventlogging/bin/eventlogging-devserver", line 156, in handle_event
Jan 09 06:59:08 vagrant eventlogging-devserver[2995]:     event, errors = validate(log_line)
Jan 09 06:59:08 vagrant eventlogging-devserver[2995]:   File "/vagrant/srv/eventlogging/bin/eventlogging-devserver", line 131, in validate
Jan 09 06:59:08 vagrant eventlogging-devserver[2995]:     schema = eventlogging.get_schema(scid, encapsulate=True)
Jan 09 06:59:08 vagrant eventlogging-devserver[2995]:   File "/vagrant/srv/eventlogging/eventlogging/schema.py", line 94, in get_schema
Jan 09 06:59:08 vagrant eventlogging-devserver[2995]:     validate_scid(scid)
Jan 09 06:59:08 vagrant eventlogging-devserver[2995]:   File "/vagrant/srv/eventlogging/eventlogging/schema.py", line 175, in validate_scid
Jan 09 06:59:08 vagrant eventlogging-devserver[2995]:     raise ValidationError('Invalid revision ID %s' % revision)
Jan 09 06:59:08 vagrant eventlogging-devserver[2995]: ValidationError: Invalid revision ID -1

From network tab on javascript console; I am getting 'Internal Server Error' for the request URL that looks like this:

http://dev.wiki.local.wmftest.net:8080/event.gif?%7B%22event%22%3A%7B%22page_id%22%3A1%2C%22vote%22%3A%22Yes%22%7D%2C%22revision%22%3A-1%2C%22schema%22%3A%22UserFeedback%22%2C%22webHost%22%3A%22dev.wiki.local.wmftest.net%22%2C%22wiki%22%3A%22wiki%22%7D;

Assuming that default settings in LocalSettings.php are ideal, I haven't changed anything there. Could you tell why this might be happening?

If the schema is on the whitelist, not on the blacklist, and is able to be automatically refined, then events from it will land in Hive tables for easy querying in the event Hive database. This is accessible from stat1007 or stat1004. Let me know if you need help with that or if you don't have access. The python script you reference above would pluck the events off of kafka directly, which is fine, but it's harder to do any real investigation that way.

If whitelist means those schemas, the data from which needs to be stored beyond 90-day window, then the schema we've got is on the blacklist. The initial pilot would be to deploy the gadget in a namespace and interpret the results via a custom solution. For something small as this experiment, I imagine the Python script might work for us. If this is what we should move forward with, then I am guessing I would need access to stat1007 / analytics-privatedata-users? If so, I can file a separate task for that!

Still trying to connect all these different pieces together :D

I'm sorry, I focused on the devserver problems and completely missed the more obvious error you posted. That's just a matter of registering the schema, which appears to be documented incorrectly as well, I'll have to fix that too. So in your extension.json, you need to set a top level property called "EventLoggingSchemas" that points to the revision of the schema you want to use. For example, if you're using the latest of CentralAuth available now, it would be:

"EventLoggingSchemas": {
    "CentralAuth": 5690875
}

But the helper code on the meta schema page on the top right tells you to do something completely different in extension.json, so this is entirely our fault. I'm sorry again, will try to fix this as soon as possible.

And yes, file an access request with SRE-Access-Requests for analytics-privatedata-users, you can cc me in case anyone has questions and I think the policy is to also cc your manager.

"EventLoggingSchemas": {
    "CentralAuth": 5690875
}

By registering the schema in the format you've indicated above, the revision ID error is gone, no other errors on journal logs and I am able to see the schema ID value for revision parameter in query string.

{event: {…}, revision: 18716449, schema: "UserFeedback", webHost: "dev.wiki.local.wmftest.net", wiki: "wiki"}. Any pointers on this?

But, still don't see any entry in /vagrant/logs/eventlogging.log

Also, as Lego points, just using "manifest_version": 2 instead breaks the wiki.

Maybe if you use "manifest_version": 2 you have to change the format of the whole extension.json file, couldn't find out from the docs, jumped back out of that rabbit hole. In the meantime it seems fine to stick with the syntax that works.

As for nothing showing up in eventlogging.log, that seems weird. Do you want to push a [WIP] change to gerrit and link me so I can try it in my vagrant?

@Milimetric I am working on a script for a gadget, and I've copy/pasted the code here: https://pastebin.com/uEetfGQd. So, you could copy to your custom javascript and test it that way.

@srishakatux I have good news and bad news.

The good news is that I added this to my extension.json (with "manifest_version": 1):

"EventLoggingSchemas": {
    "UserFeedback": 18716449
}

And took the mw.eventLog.logEvent code from your gadget, ran it in the console, and saw the event in /vagrant/logs/eventlogging.log right away

The bad news is I don't know what's different or wrong in your setup. Have you updated everything with vagrant git-update? If that doesn't work, I'm really not sure. Maybe just hit it with a hammer and re-install vagrant? I guess the other question is, if this is a gadget, which extension.json are you using to register the schema in? Maybe you're not doing it in a place that vagrant looks at?

@Milimetric Good news from my end too :D I am able to see the events in the log file.

vagrant git-update and re-installing the extension did the trick. Thanks for all your help!!

For future readers, if you've encountered a similar issue, here are some tips that you might find helpful:

  • Your vagrant install might be in a funky state because of a distribution upgrade. To resolve this problem, uninstall mysqlclient first.
vagrant ssh
cd /vagrant/srv/eventlogging
./virtualenv/bin/pip uninstall mysqlclient
./virtualenv/bin/pip install --no-binary mysqlclient -e .
  • Make sure to use vagrant git-update to keep your git repositories, external libraries, and database schema up to date.
  • Install the EventLogging extension with vagrant roles enable eventlogging --provision
vagrant ssh
sudo systemctl start eventlogging-devserver
sudo journalctl -u eventlogging-devserver.service -f --no-pager
  • Register your Schema by adding the following lines (change schema name and ID) in the extension.json file in the EventLogging extension folder:
"EventLoggingSchemas": {
    "CentralAuth": 5690875 #Schema ID
}

If all goes well, you should be able to see the logs in /vagrant/logs/eventlogging.log

Thank you for pointing out the issues and helping with docs @srishakatux

Nuria set the point value for this task to 3.