
Upgrade to Superset 1.0
Closed, Resolved · Public

Assigned To
Authored By
elukey
Jan 19 2021, 3:41 PM

Description

Superset 1.0 passed the vote and is now a release! Once it is available on https://pypi.org/project/apache-superset/ we should upgrade :)

Event Timeline

fdans triaged this task as Medium priority. Jan 21 2021, 5:49 PM
fdans moved this task from Incoming to Operational Excellence on the Analytics board.

We should talk as a team about how to manage and own Superset upgrades going forward. Razzi will bring this up in a meeting.

I tried to deploy superset to the staging box, but it failed with

aiohttp-3.7.3-cp37-cp37m-manylinux2014_x86_64.whl is not a supported wheel on this platform.

Furthermore the rollback failed with

Rollback all deployed groups? [Y/n]: y
17:38:42
== DEFAULT ==
:* an-tool1005.eqiad.wmnet
17:40:45 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'analytics/superset/deploy', '--force', '-g', 'default', 'rollback', '--refresh-config'] on an-tool1005.eqiad.wmnet returned [70]: Rolling back from revision c7147389288b54b0a25c3b3e9d64b9ebed32e638 to 828ef03a8b0bc1b8a59c63f50b4c7ff131183bbb
Removing old revision /srv/deployment/analytics/superset/deploy-cache/revs/5d4f5ac8530e10c468ab9156f52daf6b96ed29ee
Restarting service 'superset'
Port 9080 not up. Waiting 3.00s
...
Port 9080 not up. Waiting 3.00s
Unhandled error:
deploy-local failed: <OSError> {}

analytics/superset/deploy: rollback stage(s): 100% (ok: 0; fail: 1; left: 0)
17:40:45 1 targets had deploy errors
17:40:45 1 targets failed

So staging superset is currently unavailable.

Ok, the problem was that I had upgraded the pip version in the docker container when building the wheels, which made the wheels incompatible with the staging server.
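
A quick way to confirm this kind of mismatch (a generic pip feature, not something from our runbook; it needs pip >= 19.2) is to list the wheel tags a given pip accepts:

# run with the pip that will do the install; an old pip lists no manylinux2014 tags
$ pip debug --verbose | grep -i manylinux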

I was able to keep the old pip version and build all packages by using apt to install cargo, which was needed by the python cryptography package.
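
For the record, the wheel-building step inside the container now looks roughly like this (a sketch; the requirements file name and paths are illustrative, not the literal build script):

# Inside docker:
# apt-get update && apt-get install -y cargo   # cargo is needed to build the cryptography package
# # do NOT upgrade pip here: a newer pip pulls manylinux2014 wheels that the
# # older pip on the staging host refuses to install
# pip wheel --wheel-dir ./wheels -r frozen-requirements.txt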

Now I'm running into an error when following the instructions to upgrade superset (https://wikitech.wikimedia.org/wiki/Analytics/Systems/Superset#Upgrading):

(venv) superset@an-tool1005:/home/razzi$ superset db upgrade
Loaded your LOCAL configuration at [/etc/superset/superset_config.py]
logging was configured successfully
...
sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (1709, 'Index column size too large. The maximum column size is 767 bytes.')
[SQL: CREATE INDEX ix_row_level_security_filters_filter_type ON row_level_security_filters (filter_type)]
(Background on this error at: http://sqlalche.me/e/13/e3q8)

I tried creating the mysql database with

create database superset_staging DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci

rather than the documented

create database superset_staging DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

but got the same error. I'm not a mysql expert, @elukey @Ottomata any ideas?

Not sure what the 'right' thing to do is, but a quick search for that error brought me to https://stackoverflow.com/questions/30761867/mysql-error-the-maximum-column-size-is-767-bytes with some suggestions. It's been a while since I did MySQL stuff too; do we use the InnoDB Barracuda file format?

I have something in my notes: https://wikitech.wikimedia.org/wiki/User:Elukey/Analytics/Superset#Upgrade_DB

We already have innodb_file_format=Barracuda in the config; can you check whether ALTER TABLE row_level_security_filters ROW_FORMAT=DYNAMIC; works? (Assuming that row_level_security_filters is a table, of course.)

MariaDB [(none)]> show global variables like 'innodb_file_format';
+--------------------+-----------+
| Variable_name      | Value     |
+--------------------+-----------+
| innodb_file_format | Barracuda |
+--------------------+-----------+
1 row in set (0.00 sec)
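
For completeness, two related things worth checking (generic MariaDB commands, not from our docs) are the large-prefix setting and the table's current row format: the 767-byte index key limit only applies to the COMPACT/REDUNDANT row formats, and an index on a utf8mb4 VARCHAR(255) column can need up to 4 × 255 = 1020 bytes, hence the error.

$ sudo mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_large_prefix';"
$ sudo mysql -e "SELECT table_name, row_format FROM information_schema.tables WHERE table_schema='superset_staging' AND table_name='row_level_security_filters';"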

@razzi remember also to drop/re-create the staging database as we did last time, so we'll have a more up-to-date copy for testing dashboards etc.

ALTER TABLE row_level_security_filters ROW_FORMAT=DYNAMIC; fixed it, thanks! Here's the full procedure so the order is clear:

On an-coord1001:

$ sudo mysql
> drop database superset_staging;
> create database superset_staging DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
> exit
$ sudo sh -c 'mysqldump superset_production > superset_production_$(date +%s).sql'
$ sudo mysql superset_staging < superset_production_1613751453.sql
$ sudo mysql superset_staging
> ALTER TABLE row_level_security_filters ROW_FORMAT=DYNAMIC;

On an-tool1005:

razzi@an-tool1005:~$ sudo su superset
superset@an-tool1005:/home/razzi$ source /srv/deployment/analytics/superset/venv/bin/activate
(venv) superset@an-tool1005:/home/razzi$ export PYTHONPATH=/etc/superset
(venv) superset@an-tool1005:/home/razzi$ superset db upgrade
(venv) superset@an-tool1005:/home/razzi$ superset init
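
(Not part of the documented procedure, but superset db current — a standard Flask-Migrate/alembic subcommand exposed by the superset CLI — can be run at this point to confirm the schema is at the expected revision:)

(venv) superset@an-tool1005:/home/razzi$ superset db current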

Found a client-side error: when creating a new chart from pageviews_hourly and attempting to add a metric in the chart creator, the frontend app crashes.

(Screenshots of the crash attached: image.png, image.png)

I'm guessing this has to do with the frontend needing to be recompiled. There's a step in the deploy repo's frozen-requirements-custom-build.txt that seems to relate to this:

# Inside docker:
# cd /superset_upstream/superset-frontend/
# npm ci && npm run build

I'll see if I can run that in docker and if it produces artifacts that fix the error.

Hm, I see now that the built javascript files are contained within the superset 1.0.1 wheel, so they should be in sync.

Tested with another data source and another browser (originally used Firefox; Safari also shows the error), and each setup reproduced the error.

@razzi great find! If you don't manage to solve the problem, I suggest checking https://github.com/apache/superset/issues to see if anything is already outstanding for Superset 1.1 and, if not, opening an issue :)

I installed superset from source to run the frontend in development mode, and reported the error I found upstream: https://github.com/apache/superset/issues/13396

Found another bug in 1.0.1:

Viewing https://superset.wikimedia.org/superset/dashboard/165/, the top left chart "Impression Count Pie Chart | Banners Selected | FY2021 | India Campaign" loads in production; in staging, it errors.

(Screenshot of the chart error attached: image.png)

Here's the traceback from the logs:

File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/common/query_context.py", line 399, in get_df_payload
  query_result = self.get_query_result(query_obj)
File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/common/query_context.py", line 106, in get_query_result
  result = self.datasource.query(query_object.to_dict())
File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/connectors/sqla/models.py", line 1295, in query
  query_str_ext = self.get_query_str_extended(query_obj)
File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/connectors/sqla/models.py", line 766, in get_query_str_extended
  sqlaq = self.get_sqla_query(**query_obj)
File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/connectors/sqla/models.py", line 1074, in get_sqla_query
  eq = utils.cast_to_num(flt["val"])
KeyError: 'val'

This one has been reported upstream at https://github.com/apache/superset/issues/13229, and there is a PR open, so it should be resolved soon.

Change 670959 had a related patch set uploaded (by Razzi; owner: Razzi):
[operations/puppet@production] superset: allow analytics networks through staging firewall

https://gerrit.wikimedia.org/r/670959

Change 670959 merged by Razzi:
[operations/puppet@production] superset: allow analytics networks through staging firewall

https://gerrit.wikimedia.org/r/670959

@Ottomata and I enabled forwarding traffic from analytics hosts, so teams like Product Analytics with access to the stat boxes will be able to run ssh -NL 8080:an-tool1005.eqiad.wmnet:80 stat1004.eqiad.wmnet to test things out on staging. I'll present this at the next PA sync.
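
For convenience, the tunnel spelled out (local port 8080 is an arbitrary choice):

# from a machine that can ssh to the stat boxes
$ ssh -NL 8080:an-tool1005.eqiad.wmnet:80 stat1004.eqiad.wmnet
# then browse to http://localhost:8080/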

Change 665130 had a related patch set uploaded (by Razzi; owner: Razzi):
[analytics/superset/deploy@master] Upgrade superset to 1.0.1

https://gerrit.wikimedia.org/r/665130

I built the latest superset wheel from source so it'd have the fix for https://github.com/apache/superset/issues/13229, but the wheel is too large to be uploaded to gerrit:

$ git review
error: Object too large (107,217,599 bytes), rejecting the pack. Max object size limit is 104,857,600 bytes.

While I'm sure we could find a workaround, I think it's a good time to think about installing Python packages from a package index, or moving the deployment to Kubernetes.
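
(For the curious: a generic git one-liner — not something I actually ran against this repo — to find which blobs exceed gerrit's 100 MB object limit:)

$ git rev-list --objects --all \
    | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
    | awk '$1 == "blob" && $3 > 100000000 {print $3, $4}'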

@EBernhardson somehow uses archiva.wikimedia.org.

+1 for k8s.

Adding some notes from IRC: Superset is kerberized, so the move to Kubernetes is a little trickier, since we would need to figure out how that works (passing the keytab, allowing Superset to authenticate with the KDC, etc.).
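
To illustrate the keytab part only (hypothetical names and paths, nothing is designed yet): on Kubernetes the keytab would typically be stored as a Secret and mounted into the pod, along the lines of:

# hypothetical sketch
$ kubectl create secret generic superset-keytab \
    --from-file=superset.keytab=/path/to/superset.keytab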

Alright, building the superset wheel from the latest upstream source on our build server deneb.codfw.wmnet produced a wheel that is 25M, which is in line with the size of the upstream wheels. Not sure why that one wheel I built was 100M, but it looks like we fortunately don't have to use git lfs... yet. The latest superset source is deployed on staging (an-tool1005) and the upstream error https://github.com/apache/superset/issues/13229 has gone away.

Also the error that was causing the frontend to crash when editing charts using druid datasources now only causes the metric selector component to crash, so a user could fix things by changing the dataset, rather than seeing a blank page as I originally did. Screenshot:

(Screenshot attached: image.png)

razzi changed the task status from Open to Stalled. Mar 24 2021, 5:57 PM

Moving this to paused while I await feedback from Product Analytics. The only known issues are with druid datasources, which can be migrated to druid tables.

@razzi if you have tested Superset and it looks good, let's set a deadline for feedback and pick an upgrade day (so we don't wait too long and can move on).

Good idea @elukey. I posted in the product-analytics Slack channel and on #wikimedia-analytics IRC, and I'll share here as well: Superset staging is open for feedback and testing, and unless any blockers are found by next Tuesday 3/30, we'll release to production Superset on Wednesday 3/31.

razzi changed the task status from Stalled to Open. Mar 24 2021, 8:52 PM

Also, I don't think "Stalled" is what I was looking for; moving this to "In Code Review" on the kanban is a better fit.

Change 665130 merged by Razzi:

[analytics/superset/deploy@master] Upgrade superset to 1.0.1

https://gerrit.wikimedia.org/r/665130

Ok, I released superset 1.0! I'll keep this open for now, for reporting any regressions.

I've noticed a couple of issues related to annotation layers:

  1. The annotation text is shown on a transparent background with default font color, so it blends into other text on the screen. My recollection of pre-1.0 was that it used inverse colours, making it easy to see what the annotation was.
  2. For time-series bar charts, the transparent annotation layer is laid on top of the chart. This means that it's no longer possible to hover over the bar to see what the value is. It would be nice to have the bar on top instead, but I'm not sure if that's even possible.

Here's a dashboard that shows these two issues.

Hmm, I definitely see these issues @nettrom_WMF; thanks for commenting. I'll see if upstream is aware and file a bug if not.

Alright, I reported the annotation text issue upstream at https://github.com/apache/superset/issues/13959. I'll look into the other issue another day.