Page MenuHomePhabricator

Superset's rolling average feature results in error message
Closed, ResolvedPublic3 Estimated Story Points

Description

Trying to use this option ("Advanced Analytics" --> "Rolling" --> "mean") results in the following error message: module 'pandas' has no attribute 'rolling_mean'

This is https://github.com/apache/incubator-superset/issues/5324 and was apparently fixed in https://github.com/apache/incubator-superset/pull/5328 .

Per https://github.com/apache/incubator-superset/blob/master/CHANGELOG.md , this fix was included in 0.26.2, but we are currently on 0.26.3 already. @jlinehan took a closer look, and it appears that 0.26.3 actually does not include the fix. It however is included in 0.28.0, so I'm going to mark T211605 as a subtask of this for now.

Event Timeline

We're planning on upgrading superset to the latest release once the project is a bit more stable, which won't happen at least until after the all hands. Hopefully by that release this issue will be solved.

fdans moved this task from Incoming to Operational Excellence on the Analytics board.

Since the Superset 0.28.1 upgrade (T211605) has run aground for now, I talked with @Nuria about the possibility of patching this problem on the fork maintained by the Analytics Engineering team. I made a pull request to do just that.

Do we have a Superset test environment available that mirrors the production image? -- because this patch is a few steps above eyeballed. The patch itself should not cause instability (the only lines of code that change are lines of code that currently break anyway), but I don't want to look like a total cowboy :)

Nuria raised the priority of this task from High to Needs Triage.Feb 7 2019, 10:56 PM
Nuria moved this task from Operational Excellence to Incoming on the Analytics board.

We would like to test upstream in a staging environment before making more changes to fork, once the db cluster migration is done we can spend some cycles on testing upstream and deciding whether is worth it to port this fix to fork. Thanks your doing PR.

@jlinehan sorry for the lag in answering :(

I just updated T212243 with a plan to create a staging environment for Turnilo/Superset. If my team likes it, I'll start working on it asap. I'd prefer to dedicate resources in moving away from the Analytics fork to follow upstream releases, it is very painful and a bit messy to keep our own version of Superset. 0.29 is in RC and I hope that it will be released soon(ish), would it be acceptable for you to wait a bit more?

elukey changed the task status from Open to Stalled.Feb 26 2019, 2:58 PM
elukey lowered the priority of this task from High to Medium.

@elukey Sure, but the reason for the patch being offered in the first place was due to the uncertainty about how long an upgrade would be stalled, so we are back where we started. I was under the impression from @Nuria that the blocker on tracking upstream was migrating our own systems to Python 3 rather than the 0.29 release schedule -- are we ready to go as soon as 0.29 hits stable? This patch is quite a bit older than 0.29, so it's not as if it requires the latest and greatest.

It won't bother me to wait, since I don't use Superset to do my job, but I can't speak for the PMs and analysts who would like to do something simple like calculate a moving average and can't because the current install is broken. I personally hate maintaining forks and I feel your pain, but since this patch does not introduce novel functionality, but rather a commit that already exists in the Superset tree (slightly ahead of the current version we are forked from), and a small one at that, I don't agree that it creates a support burden. When it comes time to upgrade, this patch will wash away in the merge.

@elukey Sure, but the reason for the patch being offered in the first place was due to the uncertainty about how long an upgrade would be stalled, so we are back where we started. I was under the impression from @Nuria that the blocker on tracking upstream was migrating our own systems to Python 3 rather than the 0.29 release schedule -- are we ready to go as soon as 0.29 hits stable? This patch is quite a bit older than 0.29, so it's not as if it requires the latest and greatest.

All the details in https://phabricator.wikimedia.org/T212243 - basically we would like to set up a staging environment before attempting any other deploy to superset. The idea is to build something like 'superset-staging.wikimedia.org' that gets updated when a new upstream release comes, and it is testable by everybody to gather feedback before hitting production. This is needed since the last time we had to roll forward/back three times before getting (what we thought was) a simple deployment, so we decided to add a bit more safety to our process to avoid impacting users. The current idea is implementable in a couple of days, but we are currently blocked by Debian upstream and their install image of Buster (we are in fact planning to make Superset run directly on a OS that runs Python 3.7).

It won't bother me to wait, since I don't use Superset to do my job, but I can't speak for the PMs and analysts who would like to do something simple like calculate a moving average and can't because the current install is broken. I personally hate maintaining forks and I feel your pain, but since this patch does not introduce novel functionality, but rather a commit that already exists in the Superset tree (slightly ahead of the current version we are forked from), and a small one at that, I don't agree that it creates a support burden. When it comes time to upgrade, this patch will wash away in the merge.

I completely get your point and I am available to help as much as possible to solve this problem. If this specific case is pressing we can try to merge/test/deploy another time (without the staging environment). Would it be possible for you to ask to PMs if this is a feature that can wait or not?

Also @HaeB, do you have an example of dashboard that I can use to trigger this issue? I tried today and failed to reproduce :(

...

Also @HaeB, do you have an example of dashboard that I can use to trigger this issue? I tried today and failed to reproduce :(

Actually it works for me too now, on the same view where I previously encountered this error (result now; compare to the unsmoothened chart I ended up using instead in our presentation last month). Good news! Did anything change since the time that this bug was filed? (I'm seeing "0.26.3-wikimedia2" in https://superset.wikimedia.org/static/assets/version_info.json .)

Actually it works for me too now, on the same view where I previously encountered this error

Next time let's please add the view that encounters the error to the bug report (besides the error itself).

In any case, thanks to @jlinehan for providing fix (it might very well be we will issue appearing again) . In any case , let's wait to move to new debian for staging environment as it seems the best value for efforts.

Next time let's please add the view that encounters the error to the bug report (besides the error itself).

Similarly, it would be helpful to advertise changes to the production Superset instance. Is there a CI or deployment feed that can be subscribed to? I assume this bug vanished due to a change in Superset version or Pandas version.

As long as an upgrade is truly feasible in the short-term, I can see the engineering rationale for waiting. But I've asked around and surfaced other bugs encountered by @JKatzWMF and @mpopov and I'm sure there are more to come. The sooner we can start addressing them, the better.

Superset adoption In Audiences has been a little bumpy in part because of rough edges like this one. It's tough trying to socialize it with this level of polish. There's willingness to contribute improvements that would have a big impact on various Audiences teams, so I hope we can get this done.

If there's any way I can help your team with standing up the new environment, let me know and I'll dedicate some time.

Next time let's please add the view that encounters the error to the bug report (besides the error itself).

Similarly, it would be helpful to advertise changes to the production Superset instance. Is there a CI or deployment feed that can be subscribed to? I assume this bug vanished due to a change in Superset version or Pandas version.

We have a tracking task for that in https://phabricator.wikimedia.org/T211706. The ideal scenario that we imagined was to deploy upstream versions as they were coming out (so the list of changes was known and published by upstream), but sadly this project is really unstable now (still in the Apache incubator) so to speed up the fix of another bug we had to deploy the "wikimedia" fork, and probably some new/updated dependencies got in. The complete list of deps deployed are listed in https://gerrit.wikimedia.org/r/#/admin/projects/analytics/superset/deploy (that is basically what we deploy to the Superset host).

As long as an upgrade is truly feasible in the short-term, I can see the engineering rationale for waiting. But I've asked around and surfaced other bugs encountered by @JKatzWMF and @mpopov and I'm sure there are more to come. The sooner we can start addressing them, the better.

We are committed to provide a regular update to Superset, hopefully upstream will be able to solve their release issues in the meantime so our life will be easier (backporting patches to our fork is a real pain). The staging host will be a key piece of the Analytics infrastructure to make everything better.

Superset adoption In Audiences has been a little bumpy in part because of rough edges like this one. It's tough trying to socialize it with this level of polish. There's willingness to contribute improvements that would have a big impact on various Audiences teams, so I hope we can get this done.

Me too!

If there's any way I can help your team with standing up the new environment, let me know and I'll dedicate some time.

Thanks!

@jlinehan I am currently half way through the plan for the staging environment, one thing that we could try to do is to test https://pypi.org/project/superset/0.29.0rc7/ as first use case (rc8 is the newest tag but they haven't released it via pypi). There is still little traction on https://github.com/apache/incubator-superset/issues/6785 for their first Apache release, so we'll probably have to wait more for the first 0.29 stable. Thoughts? I believe that this task can be closed, lemme know if there are other steps todo/missing.

Once staging is setup let's go over issues users have opened for 0.29 and see how things look.

Thoughts? I believe that this task can be closed, lemme know if there are other steps todo/missing.

Yeah, since the bug this task is reporting no longer exists, I think it's fine to mark it closed, and move the discussion over to T211706 or new tasks we'll make for any bugs uncovered testing 0.29.0rc7. Thanks!

elukey set the point value for this task to 3.