
Install superset on front end server for analytics
Closed, Resolved (Public)

Description

This is the tracking task. Ops might need to sync with Analytics Engineering, but this is already in use at WMF.

Here are some recommendations from folks working on analytics at WMF:

From Mikhail Popov:

Relevant Puppet code for ideas and reference:

Andrew Otto:

WIP design document on how SWAP works and problems with it:

From Nuria:

I would encourage @Jgreen to focus on tools we use broadly, to reduce the maintenance cost of updates. For exploration and daily work by analysts, Jupyter notebooks are the best solution; for dashboarding, Superset. Superset is flexible in that it can display data from a number of data sources, especially if Presto is used with it. It also has more sophisticated authentication, as it can be used with Kerberos. I would discourage setting up another dashboarding tool such as RStudio.

Related Objects

StatusSubtypeAssignedTask
ResolvedJgreen
ResolvedJgreen

Event Timeline

DStrine created this task. Feb 20 2020, 5:12 PM
Restricted Application added a subscriber: Aklapper. Feb 20 2020, 5:12 PM
Jgreen moved this task from Triage to Up Next on the fundraising-tech-ops board. Feb 20 2020, 6:00 PM
DStrine moved this task from Triage to FR-Ops on the Fundraising-Backlog board. Feb 20 2020, 6:49 PM

@Jgreen do you have an estimated level of effort on installing Superset on the application server? We are trying to determine if this is something we should scope out and commit to being our visualization tool of choice before requesting. If the level of effort is relatively low, it would be great to have.

Nuria added a subscriber: Nuria.

Putting this on the radar for Analytics. @Jgreen can ping us if he needs help.

Excited to collaborate on this. My first thought is to caution against seeing superset as a solution. In our data pipelines, Superset comes in at the very end as a way to see across a variety of datasets using a variety of query mechanisms. But the more important part, in my opinion, is organizing how data is ingested, transformed, packaged, maintained (aggregated, sanitized, etc.), and optimized to answer the most relevant queries. We have years of experience with those decisions, and are more than happy to share.

EYener added a subscriber: jrobell. Feb 24 2020, 7:14 PM

@Milimetric Likewise excited for collaboration! I agree that visualization is the final piece of this puzzle. In parallel to discussing a front end tool, I have been testing/working on creating and automating (via cron job) OLAP-style data cubes, based on the business units found within Fundraising, that would be able to interact with many visual front ends. I would be happy to discuss the full approach and hear more about best practices at any time!
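For readers unfamiliar with the term, here is a minimal sketch of what an OLAP-style cube roll-up looks like. It is not the actual Fundraising job: the rows, the field names (`unit`, `country`, `amount`), and the `build_cube` helper are all made up for illustration. It simply pre-aggregates one measure over every combination of dimensions so that a front end can read totals without scanning raw rows.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample rows; field names and values are illustrative only.
rows = [
    {"unit": "email", "country": "US", "amount": 10.0},
    {"unit": "email", "country": "FR", "amount": 5.0},
    {"unit": "banner", "country": "US", "amount": 7.5},
]


def build_cube(rows, dimensions, measure):
    """Sum `measure` for every subset of `dimensions` (a full roll-up)."""
    cube = defaultdict(float)
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in rows:
                # The key records which dimensions are fixed and to what value.
                key = tuple((d, row[d]) for d in dims)
                cube[key] += row[measure]
    return dict(cube)


cube = build_cube(rows, ["unit", "country"], "amount")
print(cube[()])                      # grand total across all rows
print(cube[(("unit", "email"),)])    # total for a single unit
```

A job like this could be run on a schedule (for example from cron) and its output written to whatever store the visualization tool reads from.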

For OLAP-style dimensional data we like Druid as a data store. So the flow there for us is:

  • Kafka -> Camus (bucket hourly) -> HDFS
  • Mediawiki MySQL -> sqoop -> HDFS
  • HDFS -> Oozie jobs transform the data -> Druid

From Druid, we visualize with either Turnilo or Superset, depending on whether exploration or long-lived dashboards are the goal.

Those pipelines are pretty battle-tested, though we are looking at replacing Oozie in the mid-term (still evaluating options). For an example, you can look at datasets in Turnilo like:

That last one is an interesting dataset, feel free to set up a call and we can talk more details.

It would be good to touch base, @Milimetric - I'll find time on your calendar for later next week.

Please feel free to add others to the meeting - or for those on the ticket, to ping the ticket if interested so I can include you!

Milimetric moved this task from Incoming to Radar on the Analytics board. Mar 2 2020, 4:58 PM
Jgreen updated the task description. Mar 19 2020, 4:27 PM

Spent some time digging into Superset and how to get it packaged up for use on our hosts. There is no Debian package, and the install process is very pip heavy, so it will take some work. There is a published Dockerfile that could be used as a basis for doing an install. Unsure how it will go, but I'll try to get time in the coming weeks to take a deep dive on it.

Also did a brief overview of the SWAP/newpyter doc. Will definitely need to do more digesting to know how that process and proposed improvements might work for us.

Nuria added a comment. Apr 1 2020, 7:52 PM

@Dwisehaupt you can take advantage of the already existing puppet modules for superset. See a recent example of changes to those: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/580359/

@Nuria Thanks for those links. I hadn't dug into the puppet portions yet but will do so. Hopefully there will be some bits we could reuse given how different our setups are.

Jgreen added a comment. Edited May 1 2020, 1:17 PM

After a bunch of refactoring and testing, I've managed to adapt the Analytics superset deploy project and puppet code for the frack environment.

In the process I found the following bug in Werkzeug 1.0.0 that breaks mariadb-backed authentication.

File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/security/manager.py", line 766, in auth_user_db
  elif check_password_hash(user.password, password):
File "/srv/superset/venv/lib/python3.7/site-packages/werkzeug/security.py", line 219, in check_password_hash
  if pwhash.count("$") < 2:
TypeError: argument should be integer or bytes-like object, not 'str'

As far as I can tell, Superset stores the password hash UTF-8 encoded, so it reaches Werkzeug as bytes. Werkzeug's check_password_hash then breaks at string operations that mix bytes and str values. Here's a hack to get it working:

- if pwhash.count("$") < 2:
-    method, salt, hashval = pwhash.split("$", 2)
+ pwhash_decoded = pwhash.decode('utf-8')
+ if pwhash_decoded.count("$") < 2:
+    method, salt, hashval = pwhash_decoded.split("$", 2)

I'm not sure how we want to fix this. There are quite a few encoding-related bugs in the Werkzeug changelog, but the latest 1.0.1 does not seem to address this bug. I'm not even sure this should be considered a Werkzeug bug. Maybe the password hash shouldn't be stored UTF-8 encoded in the first place? Also, as far as I can tell Flask is using Werkzeug as a testing webserver. Can we use a proper webserver like nginx or Apache instead?

Jgreen claimed this task. May 1 2020, 6:44 PM

I worked up a local patch for werkzeug security.py which will hopefully allow us to move forward on this project until an upstream fix is available.

Jgreen added a subtask: Restricted Task. May 4 2020, 1:44 PM
Jgreen added a subtask: Restricted Task. May 4 2020, 1:50 PM
ayounsi closed subtask Restricted Task as Resolved. May 4 2020, 2:21 PM

Change 594235 had a related patch set uploaded (by Jgreen; owner: Jgreen):
[operations/dns@master] add analytics.frdev.wikimedia.org

https://gerrit.wikimedia.org/r/594235

Change 594235 merged by Jgreen:
[operations/dns@master] add analytics.frdev.wikimedia.org

https://gerrit.wikimedia.org/r/594235

Jgreen added a comment. May 4 2020, 5:47 PM

Failure at https://analytics.frdev.wikimedia.org/users/userinfo/

Sorry, something went wrong
500 - Internal Server Error
Stacktrace

Traceback (most recent call last):
  File "/srv/superset/venv/lib/python3.7/site-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/srv/superset/venv/lib/python3.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/srv/superset/venv/lib/python3.7/site-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/srv/superset/venv/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/srv/superset/venv/lib/python3.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/srv/superset/venv/lib/python3.7/site-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/security/decorators.py", line 123, in wraps
    return f(self, *args, **kwargs)
  File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/security/views.py", line 348, in userinfo
    appbuilder=self.appbuilder,
  File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/baseviews.py", line 272, in render_template
    template, **dict(list(kwargs.items()) + list(self.extra_args.items()))
  File "/srv/superset/venv/lib/python3.7/site-packages/flask/templating.py", line 140, in render_template
    ctx.app,
  File "/srv/superset/venv/lib/python3.7/site-packages/flask/templating.py", line 120, in _render
    rv = template.render(context)
  File "/srv/superset/venv/lib/python3.7/site-packages/jinja2/environment.py", line 1090, in render
    self.environment.handle_exception()
  File "/srv/superset/venv/lib/python3.7/site-packages/jinja2/environment.py", line 832, in handle_exception
    reraise(*rewrite_traceback_stack(source=source))
  File "/srv/superset/venv/lib/python3.7/site-packages/jinja2/_compat.py", line 28, in reraise
    raise value.with_traceback(tb)
  File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/templates/appbuilder/general/model/show.html", line 2, in top-level template code
    {% import 'appbuilder/general/lib.html' as lib %}
  File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/templates/appbuilder/base.html", line 1, in top-level template code
    {% extends base_template %}
  File "/srv/superset/venv/lib/python3.7/site-packages/superset/templates/superset/base.html", line 19, in top-level template code
    {% extends "appbuilder/baselayout.html" %}
  File "/srv/superset/venv/lib/python3.7/site-packages/superset/templates/appbuilder/baselayout.html", line 20, in top-level template code
    {% import 'appbuilder/baselib.html' as baselib %}
  File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/templates/appbuilder/init.html", line 46, in top-level template code
    {% block body %}
  File "/srv/superset/venv/lib/python3.7/site-packages/superset/templates/appbuilder/baselayout.html", line 39, in block "body"
    {% block content %}
  File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/templates/appbuilder/general/model/show.html", line 25, in block "content"
    {% block show_form %}
  File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/templates/appbuilder/general/model/show.html", line 27, in block "show_form"
    {{ widgets.get('show')()|safe }}
  File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/widgets.py", line 37, in __call__
    return template.render(args)
  File "/srv/superset/venv/lib/python3.7/site-packages/jinja2/environment.py", line 1090, in render
    self.environment.handle_exception()
  File "/srv/superset/venv/lib/python3.7/site-packages/jinja2/environment.py", line 832, in handle_exception
    reraise(*rewrite_traceback_stack(source=source))
  File "/srv/superset/venv/lib/python3.7/site-packages/jinja2/_compat.py", line 28, in reraise
    raise value.with_traceback(tb)
  File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/templates/appbuilder/general/widgets/show.html", line 8, in top-level template code
    {{ formatter(v) if formatter else v }}
  File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/templates/appbuilder/general/widgets/show.html", line 18, in block "columns"
    {% else %}
  File "/srv/superset/venv/lib/python3.7/site-packages/jinja2/runtime.py", line 679, in _invoke
    rv = self._func(*arguments)
  File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/templates/appbuilder/general/lib.html", line 271, in template
    {{ caller() }}
  File "/srv/superset/venv/lib/python3.7/site-packages/jinja2/runtime.py", line 679, in _invoke
    rv = self._func(*arguments)
  File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/templates/appbuilder/general/widgets/show.html", line 24, in template
    {% for item in fieldset_item[1].get('fields') %}
  File "/srv/superset/venv/lib/python3.7/site-packages/jinja2/runtime.py", line 679, in _invoke
    rv = self._func(*arguments)
  File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/templates/appbuilder/general/widgets/show.html", line 6, in template
    {%- set formatter = formatters_columns.get(item) -%}
TypeError: __repr__ returned non-string (type bytes)

Jgreen added a comment. May 4 2020, 5:49 PM

Failure at https://analytics.frdev.wikimedia.org/users/list/

Sorry, something went wrong
500 - Internal Server Error
Stacktrace

Traceback (most recent call last):
  File "/srv/superset/venv/lib/python3.7/site-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/srv/superset/venv/lib/python3.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/srv/superset/venv/lib/python3.7/site-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/srv/superset/venv/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/srv/superset/venv/lib/python3.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/srv/superset/venv/lib/python3.7/site-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/security/decorators.py", line 123, in wraps
    return f(self, *args, **kwargs)
  File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/views.py", line 553, in list
    self.list_template, title=self.list_title, widgets=widgets
  File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/baseviews.py", line 272, in render_template
    template, **dict(list(kwargs.items()) + list(self.extra_args.items()))
  File "/srv/superset/venv/lib/python3.7/site-packages/flask/templating.py", line 140, in render_template
    ctx.app,
  File "/srv/superset/venv/lib/python3.7/site-packages/flask/templating.py", line 120, in _render
    rv = template.render(context)
  File "/srv/superset/venv/lib/python3.7/site-packages/jinja2/environment.py", line 1090, in render
    self.environment.handle_exception()
  File "/srv/superset/venv/lib/python3.7/site-packages/jinja2/environment.py", line 832, in handle_exception
    reraise(*rewrite_traceback_stack(source=source))
  File "/srv/superset/venv/lib/python3.7/site-packages/jinja2/_compat.py", line 28, in reraise
    raise value.with_traceback(tb)
  File "/srv/superset/venv/lib/python3.7/site-packages/superset/templates/appbuilder/general/model/list.html", line 20, in top-level template code
    {% import 'appbuilder/general/lib.html' as lib %}
  File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/templates/appbuilder/base.html", line 1, in top-level template code
    {% extends base_template %}
  File "/srv/superset/venv/lib/python3.7/site-packages/superset/templates/superset/base.html", line 19, in top-level template code
    {% extends "appbuilder/baselayout.html" %}
  File "/srv/superset/venv/lib/python3.7/site-packages/superset/templates/appbuilder/baselayout.html", line 20, in top-level template code
    {% import 'appbuilder/baselib.html' as baselib %}
  File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/templates/appbuilder/init.html", line 46, in top-level template code
    {% block body %}
  File "/srv/superset/venv/lib/python3.7/site-packages/superset/templates/appbuilder/baselayout.html", line 39, in block "body"
    {% block content %}
  File "/srv/superset/venv/lib/python3.7/site-packages/superset/templates/appbuilder/general/model/list.html", line 26, in block "content"
    {% block list_search scoped %}
  File "/srv/superset/venv/lib/python3.7/site-packages/superset/templates/appbuilder/general/model/list.html", line 27, in block "list_search"
    {{ widgets.get('search')()|safe }}
  File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/widgets.py", line 115, in __call__
    form_fields[col] = self.template_args["form"][col]()
  File "/srv/superset/venv/lib/python3.7/site-packages/wtforms/fields/core.py", line 155, in __call__
    return self.meta.render_field(self, kwargs)
  File "/srv/superset/venv/lib/python3.7/site-packages/wtforms/meta.py", line 56, in render_field
    return field.widget(field, **render_kw)
  File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/fieldwidgets.py", line 176, in __call__
    return super(Select2ManyWidget, self).__call__(field, **kwargs)
  File "/srv/superset/venv/lib/python3.7/site-packages/wtforms/widgets/core.py", line 324, in __call__
    html.append(self.render_option(val, label, selected))
  File "/srv/superset/venv/lib/python3.7/site-packages/wtforms/widgets/core.py", line 337, in render_option
    return HTMLString('<option %s>%s</option>' % (html_params(**options), escape_html(label, quote=False)))
  File "/srv/superset/venv/lib/python3.7/site-packages/wtforms/widgets/core.py", line 31, in escape_html
    s = escape(text_type(s), quote=quote)
TypeError: __str__ returned non-string (type bytes)

Jgreen added a comment. May 4 2020, 5:49 PM

Failure at https://analytics.frdev.wikimedia.org/roles/list/

Sorry, something went wrong
500 - Internal Server Error
Stacktrace

Traceback (most recent call last):
File "/srv/superset/venv/lib/python3.7/site-packages/flask/app.py", line 2446, in wsgi_app
  response = self.full_dispatch_request()
File "/srv/superset/venv/lib/python3.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
  rv = self.handle_user_exception(e)
File "/srv/superset/venv/lib/python3.7/site-packages/flask/app.py", line 1820, in handle_user_exception
  reraise(exc_type, exc_value, tb)
File "/srv/superset/venv/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
  raise value
File "/srv/superset/venv/lib/python3.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
  rv = self.dispatch_request()
File "/srv/superset/venv/lib/python3.7/site-packages/flask/app.py", line 1935, in dispatch_request
  return self.view_functions[rule.endpoint](**req.view_args)
File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/security/decorators.py", line 123, in wraps
  return f(self, *args, **kwargs)
File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/views.py", line 553, in list
  self.list_template, title=self.list_title, widgets=widgets
File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/baseviews.py", line 272, in render_template
  template, **dict(list(kwargs.items()) + list(self.extra_args.items()))
File "/srv/superset/venv/lib/python3.7/site-packages/flask/templating.py", line 140, in render_template
  ctx.app,
File "/srv/superset/venv/lib/python3.7/site-packages/flask/templating.py", line 120, in _render
  rv = template.render(context)
File "/srv/superset/venv/lib/python3.7/site-packages/jinja2/environment.py", line 1090, in render
  self.environment.handle_exception()
File "/srv/superset/venv/lib/python3.7/site-packages/jinja2/environment.py", line 832, in handle_exception
  reraise(*rewrite_traceback_stack(source=source))
File "/srv/superset/venv/lib/python3.7/site-packages/jinja2/_compat.py", line 28, in reraise
  raise value.with_traceback(tb)
File "/srv/superset/venv/lib/python3.7/site-packages/superset/templates/appbuilder/general/model/list.html", line 20, in top-level template code
  {% import 'appbuilder/general/lib.html' as lib %}
File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/templates/appbuilder/base.html", line 1, in top-level template code
  {% extends base_template %}
File "/srv/superset/venv/lib/python3.7/site-packages/superset/templates/superset/base.html", line 19, in top-level template code
  {% extends "appbuilder/baselayout.html" %}
File "/srv/superset/venv/lib/python3.7/site-packages/superset/templates/appbuilder/baselayout.html", line 20, in top-level template code
  {% import 'appbuilder/baselib.html' as baselib %}
File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/templates/appbuilder/init.html", line 46, in top-level template code
  {% block body %}
File "/srv/superset/venv/lib/python3.7/site-packages/superset/templates/appbuilder/baselayout.html", line 39, in block "body"
  {% block content %}
File "/srv/superset/venv/lib/python3.7/site-packages/superset/templates/appbuilder/general/model/list.html", line 26, in block "content"
  {% block list_search scoped %}
File "/srv/superset/venv/lib/python3.7/site-packages/superset/templates/appbuilder/general/model/list.html", line 27, in block "list_search"
  {{ widgets.get('search')()|safe }}
File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/widgets.py", line 115, in __call__
  form_fields[col] = self.template_args["form"][col]()
File "/srv/superset/venv/lib/python3.7/site-packages/wtforms/fields/core.py", line 155, in __call__
  return self.meta.render_field(self, kwargs)
File "/srv/superset/venv/lib/python3.7/site-packages/wtforms/meta.py", line 56, in render_field
  return field.widget(field, **render_kw)
File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/fieldwidgets.py", line 176, in __call__
  return super(Select2ManyWidget, self).__call__(field, **kwargs)
File "/srv/superset/venv/lib/python3.7/site-packages/wtforms/widgets/core.py", line 324, in __call__
  html.append(self.render_option(val, label, selected))
File "/srv/superset/venv/lib/python3.7/site-packages/wtforms/widgets/core.py", line 337, in render_option
  return HTMLString('<option %s>%s</option>' % (html_params(**options), escape_html(label, quote=False)))
File "/srv/superset/venv/lib/python3.7/site-packages/wtforms/widgets/core.py", line 31, in escape_html
  s = escape(text_type(s), quote=quote)
File "/srv/superset/venv/lib/python3.7/site-packages/flask_appbuilder/security/sqla/models.py", line 81, in __repr__
  return str(self.permission).replace("_", " ") + " on " + str(self.view_menu)

TypeError: __str__ returned non-string (type bytes)

Alrighty, I finally figured out that superset does not play well with 'binary' as a database character set. Worked around this by overriding the character set for the superset database in mariadb to 'utf8'. Rolled back the local patch, which was of course not a functional fix.
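The three tracebacks above share one shape, which can be reproduced without a database. In the sketch below, the `Role` class is a made-up stand-in, not Flask-AppBuilder's model. It assumes a text column comes back as bytes (as could happen with a 'binary' character set) and that the model's `__repr__` returns it unchanged.

```python
class Role:
    """Hypothetical stand-in for a model whose name column may be bytes."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # Python requires __repr__ to return str; bytes triggers a TypeError.
        return self.name


def describe(obj):
    """Return repr(obj), or the error text if repr() itself fails."""
    try:
        return repr(obj)
    except TypeError as exc:
        return f"TypeError: {exc}"


print(describe(Role(b"Admin")))  # value as returned with a bytes column
print(describe(Role("Admin")))   # value as returned with a text column
```

Once the column is returned as text, the same `__repr__` works unchanged. That is consistent with the character set override fixing the pages, and with the earlier local patch turning out not to be needed.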

ayounsi closed subtask Restricted Task as Resolved. May 5 2020, 12:39 PM
Jgreen closed this task as Resolved. May 5 2020, 4:42 PM
Jgreen moved this task from In Progress to Done on the fundraising-tech-ops board.

This is done.

Aklapper edited projects, added Analytics-Radar; removed Analytics. Jun 10 2020, 6:43 AM