
Staging environment for upgrades of Superset
Closed, Resolved · Public · 8 Estimated Story Points

Description

Create staging environment to easily test superset upgrades, while we have a setup in labs it is hard to test w/o exercising dashboards over our current datasets.

The plan is composed of two parts:

  1. Upgrade the current Superset host to Debian Buster (notably to get Python 3.7, since Superset requires >= 3.6)
  • Create a ganeti instance called analytics-tool1004 with Debian Buster. T217640
  • Configure superset on analytics-tool1004 to use the superset-production database, and populate it with a snapshot of the superset one
  • Deploy and test Superset on analytics-tool1004
  • Swap traffic from analytics-tool1003 to analytics-tool1004
  • Decom analytics-tool1003
  2. Create a staging environment
  • Create a ganeti instance called an-tool1005. T217640
  • Configure superset on an-tool1005 to use the superset-staging database, and populate it with a snapshot of the superset-production one.
  • Update the documentation on Wikitech about how to deploy/test Superset

Event Timeline

Milimetric triaged this task as Medium priority. Jan 3 2019, 6:37 PM
Milimetric moved this task from Incoming to Operational Excellence on the Analytics board.

@Ottomata I have a proposal, let me know what you think about it:

  • we create a ganeti instance called analytics-tool-test1001 (or whatever name we think is appropriate :)
  • we deploy superset and turnilo to it, with a single httpd instance in front of them (listening on different ports)
  • we create a superset-staging database on an-coord1001
  • we could also create two DNS names (+ Varnish config) for turnilo-staging.wikimedia.org and superset-staging.wikimedia.org (useful if we want to let people test a release candidate before going live). Tunneling via SSH is also easy, so I am fine if we skip this.

The deployment workflow should become:

  • pull the new superset code into a separate branch on deploy1001, and scap deploy only to the staging host
  • do testing, let people test, etc..
  • rinse and repeat for the 'production' instance

There is a caveat though: Debian Buster and Python 3.6. Would it make sense to do the following first:

  • create analytics-tool1004 with Buster when Moritz is ready
  • deploy Superset there (even using the same database) and then, if all works, flip the Varnish config to point to it
  • decom analytics-tool1003
  • create analytics-tool-test1001 with Debian Buster, and start the testing of (hopefully) 0.29 with Python 3.6
elukey changed the task status from Open to Stalled. Mar 4 2019, 10:23 AM

This task is blocked by the Debian Installer for Buster not being available for amd64 - https://d-i.debian.org/daily-images/daily-build-overview.html

elukey changed the task status from Stalled to Open. Mar 5 2019, 11:54 AM

Change 494473 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Assign role::analytics_cluster::superset to analytics-tool1004

https://gerrit.wikimedia.org/r/494473

Change 494473 abandoned by Elukey:
Assign role::analytics_cluster::superset to analytics-tool1004

https://gerrit.wikimedia.org/r/494473

Change 494473 restored by Elukey:
Assign role::analytics_cluster::superset to analytics-tool1004

Reason:
Wrong CR :)

https://gerrit.wikimedia.org/r/494473

elukey added a project: Analytics-Kanban.
elukey moved this task from Next Up to In Progress on the Analytics-Kanban board.

Change 494473 merged by Elukey:
[operations/puppet@production] Assign role::analytics_cluster::superset to analytics-tool1004

https://gerrit.wikimedia.org/r/494473

Change 495182 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/superset/deploy@master] Add artifacts for Debian Buster and upgrade to 0.29rc7

https://gerrit.wikimedia.org/r/495182

0.29rc7 is deployed on analytics-tool1004. So far the only issue that I found is that graphs showing data on a World Map are broken. To reproduce, it is sufficient to check the "Pageviews Overview" dashboard and see that one graph fails with "Too many indexers".

The stack trace from the logs is:

Mar 11 11:08:03 analytics-tool1004 superset[27831]: 2019-03-11 11:08:03,985:ERROR:root:Too many indexers
Mar 11 11:08:03 analytics-tool1004 superset[27831]: Traceback (most recent call last):
Mar 11 11:08:03 analytics-tool1004 superset[27831]:   File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/views/base.py", line 96, in wraps
Mar 11 11:08:03 analytics-tool1004 superset[27831]:     return f(self, *args, **kwargs)
Mar 11 11:08:03 analytics-tool1004 superset[27831]:   File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/views/core.py", line 1211, in explore_json
Mar 11 11:08:03 analytics-tool1004 superset[27831]:     samples=samples,
Mar 11 11:08:03 analytics-tool1004 superset[27831]:   File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/views/core.py", line 1142, in generate_json
Mar 11 11:08:03 analytics-tool1004 superset[27831]:     payload = viz_obj.get_payload()
Mar 11 11:08:03 analytics-tool1004 superset[27831]:   File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/viz.py", line 374, in get_payload
Mar 11 11:08:03 analytics-tool1004 superset[27831]:     payload['data'] = self.get_data(df)
Mar 11 11:08:03 analytics-tool1004 superset[27831]:   File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/viz.py", line 1745, in get_data
Mar 11 11:08:03 analytics-tool1004 superset[27831]:     ndf['m1'] = df[metric].iloc[:,0]
Mar 11 11:08:03 analytics-tool1004 superset[27831]:   File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/pandas/core/indexing.py", line 1471, in __getitem__
Mar 11 11:08:03 analytics-tool1004 superset[27831]:     return self._getitem_tuple(key)
Mar 11 11:08:03 analytics-tool1004 superset[27831]:   File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/pandas/core/indexing.py", line 2012, in _getitem_tuple
Mar 11 11:08:03 analytics-tool1004 superset[27831]:     self._has_valid_tuple(tup)
Mar 11 11:08:03 analytics-tool1004 superset[27831]:   File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/pandas/core/indexing.py", line 219, in _has_valid_tuple
Mar 11 11:08:03 analytics-tool1004 superset[27831]:     raise IndexingError('Too many indexers')
Mar 11 11:08:03 analytics-tool1004 superset[27831]: pandas.core.indexing.IndexingError: Too many indexers

Pandas and numpy got updated for this upgrade, as can be seen in frozen-requirements.txt (pandas 0.22 -> 0.23.1). I am currently using the latest version contained in 0.29rc8 (not published on PyPI).

The following change fixes the problem:

  • vim /srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/viz.py
  • go to line 1745 and replace ndf['m1'] = df[metric].iloc[:,0] with ndf['m1'] = df[metric].iloc[:]
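The failure and the workaround can be reproduced outside Superset with a minimal sketch (hypothetical data; only the column name is taken from the logs below). When the selected metric comes back as a 1-D Series, the 2-D indexer `iloc[:, 0]` is one indexer too many, while `iloc[:]` remains valid:

```python
import pandas as pd

# A metric selected from a DataFrame with a unique column name comes
# back as a 1-D Series (as in the logs on analytics-tool1004 below).
df = pd.DataFrame({"sum__view_count": [97187689.0, 26184404.0, 20536118.0]})
series = df["sum__view_count"]

# The upstream code's 2-D indexer fails on a Series:
try:
    series.iloc[:, 0]
except Exception as exc:          # pandas raises IndexingError("Too many indexers")
    print(type(exc).__name__)

# The workaround from the steps above is valid for a Series:
fixed = series.iloc[:]
print(fixed.tolist())
```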

I tried to dump df[metric] and I can see the following:

Mar 11 09:08:49 analytics-tool1004 superset[8195]: 0      97187689.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 1      26184404.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 2      20536118.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 3      10864359.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 4       7827900.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 5       7514454.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 6       6213216.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 7       5776987.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 8       3284091.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 9       3120546.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 10      2843286.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 11      2538632.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 12      2520153.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 13      2272735.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 14      2154346.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 15      1945710.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 16      1900448.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 17      1813370.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 18      1727824.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 19      1562032.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 20      1554327.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 21      1326832.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 22      1303095.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 23      1268391.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 24      1210362.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 25      1075065.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 26       986889.0
Mar 11 09:08:49 analytics-tool1004 superset[8195]: 27       959068.0

I didn't find any upstream issue so far; I guess that either something in pandas changed or our dashboards might need an upgrade. I tried to make everything build with pandas 0.22 (our current version) plus its related numpy, but the build yields compile errors for numpy (I think due to Python 3.7, but I need to investigate).

To quickly test Superset: ssh -L 9080:analytics-tool1004.eqiad.wmnet:80 analytics-tool1004.eqiad.wmnet and then open localhost:9080 in a browser.

I added some logging on the main superset instance. This is what df[metric] looks like for the pageview dashboard:

Mar 11 14:32:35 analytics-tool1003 superset[8338]: 2019-03-11 14:32:35,615:INFO:root:ilo:      sum__view_count  sum__view_count
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 0         97187689.0       97187689.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 1         26184404.0       26184404.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 2         20536118.0       20536118.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 3         10864359.0       10864359.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 4          7827900.0        7827900.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 5          7514454.0        7514454.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 6          6213216.0        6213216.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 7          5776987.0        5776987.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 8          3284091.0        3284091.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 9          3120546.0        3120546.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 10         2843286.0        2843286.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 11         2538632.0        2538632.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 12         2520153.0        2520153.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 13         2272735.0        2272735.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 14         2154346.0        2154346.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 15         1945710.0        1945710.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 16         1900448.0        1900448.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 17         1813370.0        1813370.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 18         1727824.0        1727824.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 19         1562032.0        1562032.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 20         1554327.0        1554327.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 21         1326832.0        1326832.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 22         1303095.0        1303095.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 23         1268391.0        1268391.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 24         1210362.0        1210362.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 25         1075065.0        1075065.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 26          986889.0         986889.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 27          959068.0         959068.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 28          926334.0         926334.0
Mar 11 14:32:35 analytics-tool1003 superset[8338]: 29          923704.0         923704.0
[..]

This is on the 0.29rc7 version:

Mar 11 14:35:35 analytics-tool1004 superset[27205]: 2019-03-11 14:35:35,070:INFO:root:iloc: - 0      97187689.0
Mar 11 14:35:35 analytics-tool1004 superset[27205]: 1      26184404.0
Mar 11 14:35:35 analytics-tool1004 superset[27205]: 2      20536118.0
Mar 11 14:35:35 analytics-tool1004 superset[27205]: 3      10864359.0
Mar 11 14:35:35 analytics-tool1004 superset[27205]: 4       7827900.0
Mar 11 14:35:35 analytics-tool1004 superset[27205]: 5       7514454.0
Mar 11 14:35:35 analytics-tool1004 superset[27205]: 6       6213216.0
Mar 11 14:35:35 analytics-tool1004 superset[27205]: 7       5776987.0
Mar 11 14:35:35 analytics-tool1004 superset[27205]: 8       3284091.0
Mar 11 14:35:35 analytics-tool1004 superset[27205]: 9       3120546.0
Mar 11 14:35:35 analytics-tool1004 superset[27205]: 10      2843286.0
[..]

This comment on the code might be useful:

# df[metric] will be a DataFrame
# because there are duplicate column names

Maybe it doesn't hold anymore?

EDIT: in theory, though, iloc[:,0] should return the first column, but now it fails with the indexing error. I am wondering if the indexer is no longer a well-formed tuple.
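The comment above can be checked directly: whether selecting the metric yields a 2-D DataFrame (so iloc[:, 0] is valid) or a 1-D Series depends only on whether the column name is duplicated. A minimal sketch, reusing the column name and values from the logs above:

```python
import pandas as pd

# Duplicate column names, as in the analytics-tool1003 log: selection by
# label returns a DataFrame, and iloc[:, 0] picks the first column.
dup = pd.DataFrame([[97187689.0, 97187689.0], [26184404.0, 26184404.0]],
                   columns=["sum__view_count", "sum__view_count"])
picked = dup["sum__view_count"]
print(type(picked).__name__)        # DataFrame
first_col = picked.iloc[:, 0]       # valid on a 2-D frame

# Unique column name, as in the 0.29rc7 log: selection returns a Series,
# so the tuple indexer iloc[:, 0] no longer applies.
uniq = pd.DataFrame({"sum__view_count": [97187689.0, 26184404.0]})
print(type(uniq["sum__view_count"]).__name__)   # Series
```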

Turned out that simply re-creating the world maps from scratch worked fine!

EDIT: nope! It is the "bubble" functionality (bottom left of the chart edit panel) that triggers the issue!

I tried to force pandas to 0.22, but this requires numpy 1.13.2, which is (apparently) not compatible with Python 3.7, leading to compilation errors while building numpy's C extensions.

Change 495182 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/superset/deploy@master] Add artifacts for Debian Buster and upgrade to 0.31.0rc18-wikimedia1

https://gerrit.wikimedia.org/r/495182

The 0.29/0.31 releases were broken: https://github.com/apache/incubator-superset/issues/7171 was generated by our setup.py script running webpack. I removed that step and ran pypi_push.py instead, and everything got fixed.

Very good news: it seems that the Superset project has finally found a way to release under the Apache license. I upgraded today to 0.32rc2 (which is currently being voted on for release by upstream), but most of the above issues are still there. Going to ping upstream again to see if all the issues can be fixed and included in the next release.

Change 506179 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::analytics::database::meta: add properties to my.cnf

https://gerrit.wikimedia.org/r/506179

Change 506179 merged by Elukey:
[operations/puppet@production] profile::analytics::database::meta: add properties to my.cnf

https://gerrit.wikimedia.org/r/506179

Change 510310 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Move superset.wikimedia.org to analytics-tool1004

https://gerrit.wikimedia.org/r/510310

Mentioned in SAL (#wikimedia-operations) [2019-05-15T08:36:31Z] <elukey> stop superset on analytics-tool1003 as prep step for the migration to the new host - T212243

Change 495182 merged by Elukey:
[analytics/superset/deploy@master] Add artifacts for Debian Buster and upgrade to 0.32rc2

https://gerrit.wikimedia.org/r/495182

Change 510447 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/superset/deploy@master] Upgrade pyhive to 0.6.1

https://gerrit.wikimedia.org/r/510447

Change 510447 merged by Elukey:
[analytics/superset/deploy@master] Upgrade pyhive to 0.6.1

https://gerrit.wikimedia.org/r/510447

Change 510310 merged by Elukey:
[operations/puppet@production] Move superset.wikimedia.org to analytics-tool1004

https://gerrit.wikimedia.org/r/510310

Change 510502 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Set analytics-tool1004 as primary superset host

https://gerrit.wikimedia.org/r/510502

Change 510502 merged by Elukey:
[operations/puppet@production] Set analytics-tool1004 as primary superset host

https://gerrit.wikimedia.org/r/510502

Change 511699 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/superset/deploy@master] scap: update targets

https://gerrit.wikimedia.org/r/511699

Change 511699 merged by Elukey:
[analytics/superset/deploy@master] scap: update targets

https://gerrit.wikimedia.org/r/511699

elukey moved this task from Paused to Done on the Analytics-Kanban board.
elukey set the point value for this task to 5.
elukey changed the point value for this task from 5 to 3.
elukey changed the point value for this task from 3 to 8.