
Puppet tab in Horizon unusably slow
Open, Normal, Public

Tokens
"Like" token, awarded by Gehel."Mountain of Wealth" token, awarded by mobrovac."Manufacturing Defect?" token, awarded by Nemo_bis."Burninate" token, awarded by Joe."Like" token, awarded by scfc."Like" token, awarded by Volans.
Assigned To
None
Authored By
yuvipanda, Oct 31 2016

Description

It takes anywhere between 20 and 30 seconds every time I click the 'Puppet roles' tab anywhere in Horizon (per instance, project, or prefix), and a good 20-25% of the time it just spins forever. This is sad - it isn't really a performance issue as much as a total usability issue...

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript · Oct 31 2016, 4:19 PM

I can confirm all the slowness, particularly in the Puppet-related parts of the Horizon UI, but also in the UI in general, from login to every page change.

scfc awarded a token. Dec 1 2016, 10:15 AM
Joe added a subscriber: Joe. Jan 20 2017, 2:50 PM

Today I wanted to go through Horizon to check and refactor hiera keys before merging https://gerrit.wikimedia.org/r/#/c/332355/.

It was a very frustrating experience, and I think it is worth reporting here.

It took 34 seconds to open https://horizon.wikimedia.org/project/prefixpuppet/ for the tools project.

Then I had to click on one specific prefix. It took 27 seconds to get a page back.

I changed some hiera value (loading the pop-up editor was slow, but on the order of 1 second), made my edit, waited 28 seconds and almost fainted: all the hiera config seemed to be gone! I quickly realized the problem was that I hadn't been returned to the page for the same prefix, so to check that my edit was OK I had to click on that specific prefix again.

It took me another 21 seconds to get that page back and confirm my edit was successful.

So overall it took me around 2 minutes and four separate click-throughs just to find, edit, and confirm the value of a single hiera key.

Since I have to check multiple projects and several different prefixes per project, and possibly fix multiple hiera keys, I quickly switched to changing the values in the database directly; otherwise this rather simple task would have taken me a full day.

This should show everyone how the Horizon puppet UI, as it is, is utterly unusable and causes frustration and discomfort in anyone trying to use it.

Please unbreak it.

Joe triaged this task as Unbreak Now! priority. Jan 20 2017, 2:50 PM
Restricted Application added subscribers: Jay8g, TerraCodes. · View Herald Transcript · Jan 20 2017, 2:50 PM
Joe awarded a token. Jan 21 2017, 12:05 AM
scfc added a subscriber: scfc. Jan 21 2017, 3:06 AM

+1. The last time this bugged me I thought maybe the Puppet roles were re-read from the filesystem each time, looked at the source (modules/openstack/files/liberty/horizon/puppettab/puppet_roles.py), and, no, the data is (or at least should be) cached with Memcached for five minutes. Yet if I reload an instance's page's Puppet tab and look at Firefox's network stats, the URL (for example) https://horizon.wikimedia.org/project/instances/44fe77b0-e544-47b2-a9a4-1fc10e935090/?tab=instance_details__puppet is fetched by a script and takes ~ 10 s. As other tabs load reasonably "fast" (~ 1 s), maybe the cache is not working?
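
For reference, the pattern I'd expect to see is roughly the following; the key name and the fetch_roles_from_puppet_api() call are placeholders I made up, not the actual code in puppet_roles.py:

from django.core.cache import cache

ROLE_CACHE_KEY = 'puppet_available_roles'  # placeholder key name
ROLE_CACHE_SECONDS = 300                   # the "five minutes" mentioned above

def cached_roles():
    roles = cache.get(ROLE_CACHE_KEY)
    if roles is None:
        # placeholder for the slow call out to the puppet API
        roles = fetch_roles_from_puppet_api()
        cache.set(ROLE_CACHE_KEY, roles, ROLE_CACHE_SECONDS)
    return roles

If that is roughly what the code does and the data fits in the cache, the repeated ~10 s loads are hard to explain by cache misses alone.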

It looks like what actually takes most of the time is rendering each of the 468 rows it has to display.

Krenair added a comment (edited). Jan 22 2017, 5:15 AM

It seems we can bypass all of the django/horizon template machinery involved in each row render by implementing render() on UpdateRow in puppet_tables.py, shaving off a few seconds. Something like this, perhaps:

from django.utils.safestring import mark_safe

...
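    # On the UpdateRow class in puppet_tables.py: build the row HTML directly
    # instead of rendering a Django template per row. Note that mark_safe
    # means the cell values bypass autoescaping.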
    def render(self):
        ret = '<tr' + self.attr_string + '>'
        for cell in self:
            ret += '<td' + cell.attr_string + '>' + str(cell.value) + '</td>'
        ret += '</tr>'
        return mark_safe(ret)
Joe added a comment. Jan 22 2017, 8:04 AM

I would suggest a few things:

  1. Check if the data is too large to fit in memcached (hint: it probably is). If that's the case, save it in smaller data structures; a sketch of what I mean is at the end of this comment. I would go as far as to suggest we store the info in MySQL and update the MySQL data via some script that queries the puppet API asynchronously.
  2. By default, show only the classes already added to the prefix, and load the others on demand via a search box. I don't think it's reasonable to expect anyone to get to the interface without any knowledge of our puppet tree and just "explore" to find whichever role she wants and pick it.
  3. Check if it's possible to know when to evict the classes cache by checking some metadata on the puppet API.
  4. Once it's fast enough, give the UX a thorough review so that its usability flow is clearer.
  5. I find it hard to believe that rendering 500 rows in a Django template takes 10 seconds or so, but if that's the case, Django has a lot of caching mechanisms (or there's what Krenair suggested).

I'd say an acceptable target for the page load speed is 2 seconds, and a good speed is around 1 second.
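
As an illustration of point 1 (the names here are made up, not a concrete patch), the cached data could be split into one small entry per class plus an index, so each value stays well below memcached's default 1 MB item limit:

from django.core.cache import cache

CACHE_SECONDS = 300

def cache_roles_in_chunks(roles):
    # roles: hypothetical dict mapping puppet class name -> metadata
    names = sorted(roles)
    cache.set('puppet_role_names', names, CACHE_SECONDS)
    # one small entry per class instead of one huge blob
    cache.set_many({'puppet_role:%s' % n: roles[n] for n in names},
                   CACHE_SECONDS)

def load_cached_roles():
    names = cache.get('puppet_role_names') or []
    values = cache.get_many(['puppet_role:%s' % n for n in names])
    return {n: values.get('puppet_role:%s' % n) for n in names}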

  • I will double-check the caching, although I'm pretty sure I verified previously that the cache was working.
  • I'm currently experimenting with the next rev of Horizon, just in case it is generally faster. I fear that the actual performance-focused rewrite isn't present until version O, though.
  • Possibly the best option for this is simply to remove the [all] section from the filter entirely. That will speed up the panel immensely, but will also leave users digging in the puppet code more than they might like. If we do this we should first audit all the currently used classes and make sure they appear in the other, more restrictive filters.
greg added a subscriber: greg. Jan 30 2017, 7:12 PM

UBN! for over a week?

Andrew lowered the priority of this task from Unbreak Now! to Normal. Jan 30 2017, 7:17 PM

Change 335593 had a related patch set uploaded (by Andrew Bogott):
Horizon: Only display puppet roles that have filtertags in the puppet comments.

https://gerrit.wikimedia.org/r/335593

Andrew added a comment. Feb 2 2017, 2:51 AM

Updates:

  • The cache is working properly
  • Rendering in version Mitaka is much faster, so this will be less of an issue as soon as we're able to upgrade.
  • In the meantime, I have a patch ready that prunes a lot of classes out of the GUI. This will speed things up a lot, at the cost of making the class list less discoverable.
    • Before I merge that patch I want to make an audit of all currently-used classes and make sure they're all included in the filters.
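
Conceptually, filtering on filtertags amounts to something like the sketch below; this is illustration only, with made-up names and an assumed comment format, not the actual patch:

import re

# Matches a "filtertags: tag1 tag2" line in a puppet class comment;
# the exact comment format is assumed here for illustration.
FILTERTAG_RE = re.compile(r'filtertags:\s*(.+)', re.IGNORECASE)

def roles_with_filtertag(roles, wanted_tag):
    # roles: hypothetical list of objects carrying the class doc comment
    matched = []
    for role in roles:
        m = FILTERTAG_RE.search(role.doc_comment or '')
        if m and wanted_tag in m.group(1).split():
            matched.append(role)
    return matched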
scfc added a comment. Feb 2 2017, 3:17 AM

I'm not particularly fond of that idea (but can offer no alternative), because then "click here to test a role" becomes either a) "submit a patch to add filtertags", wait for merge, wait for Puppet, "click", or b) DIY with "Other Classes" and "Hiera Config".

Like @Joe, I find the notion that Django takes 10 s to "render" 500 rows hard to believe. @bd808, do you have some idea if this is normal or what could be suboptimal?

Andrew added a comment. Feb 2 2017, 2:36 PM

@scfc part of why I'm conflicted about this issue is that WMF Ops keep asserting that no one would ever use this GUI to discover roles that they don't already know about from reading the code. If they're right then option b) is just fine, but of course my original goal was to minimize the need for code-digging.

Regarding the slow render times... I encourage you to make yourself an account on labtestwikitech and then see if you agree that rendering is faster on labtesthorizon.

Change 335869 had a related patch set uploaded (by Andrew Bogott):
Add a bunch of filtertags to puppet class comments

https://gerrit.wikimedia.org/r/335869

Change 335869 merged by Andrew Bogott:
Add a bunch of filtertags to puppet class comments

https://gerrit.wikimedia.org/r/335869

Change 335593 merged by Andrew Bogott:
Horizon: Only display puppet roles that have filtertags in the puppet comments.

https://gerrit.wikimedia.org/r/335593

scfc added a comment. Feb 17 2017, 4:11 PM

AFAIUI, https://horizon.wikimedia.org/ has been updated to Mitaka, which shows all roles regardless of filtertags. Clicking on a Puppet tab now takes about 4 s for me compared to about 10 s before. A 60% decrease in load time is pretty remarkable, thanks, @Andrew.

On the page https://horizon.wikimedia.org/project/prefixpuppet/ (for the project "tools"), clicking on, for example, the "tools-redis" "tab" still takes 25 s. As this data is also used on the instance pages, could it be that this prefix handling needs an index or something like that?

Using the material skin, it still takes a long time to load this tab, so somehow the performance improvements weren't applied to that skin.

Volans added a comment. Jun 5 2017, 2:04 PM

To add some data here: I'm getting very slow responses when opening an instance page, like https://horizon.wikimedia.org/project/instances/edbb1ea0-6e77-4159-8e6f-29886fad5dfa/. It takes around 15 seconds the first time and is then quicker for a while, I guess while some of the results are still cached. Opening the Puppet Configuration tab then takes another 4-5 seconds. See the timings below with the details for the instance GET:

elukey added a subscriber: elukey. Jan 11 2018, 10:20 AM
Gehel awarded a token. Apr 5 2018, 3:53 PM

Change 479755 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[openstack/horizon/wmf-puppet-dashboard@master] puppet_config: cache hiera and roles that we get from external APIs

https://gerrit.wikimedia.org/r/479755

Change 479755 merged by Andrew Bogott:
[openstack/horizon/wmf-puppet-dashboard@master] puppet_config: cache hiera and roles that we get from external APIs

https://gerrit.wikimedia.org/r/479755

Change 479759 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[openstack/horizon/wmf-puppet-dashboard@ocata] puppet_config: cache hiera and roles that we get from external APIs

https://gerrit.wikimedia.org/r/479759

Change 479759 merged by Andrew Bogott:
[openstack/horizon/wmf-puppet-dashboard@ocata] puppet_config: cache hiera and roles that we get from external APIs

https://gerrit.wikimedia.org/r/479759

Change 479761 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[openstack/horizon/deploy@ocata] Update wmf-puppet-dashboard submodule

https://gerrit.wikimedia.org/r/479761

Change 479761 merged by Andrew Bogott:
[openstack/horizon/deploy@ocata] Update wmf-puppet-dashboard submodule

https://gerrit.wikimedia.org/r/479761

Bstorm added a subscriber: Bstorm. Mar 21 2019, 5:14 PM
ema added a subscriber: ema. Aug 7 2019, 1:16 PM

Horizon really is unbearably slow, to the point of being almost unusable.

To add a data point, I've measured 16.21s simply to get the details of an instance (https://horizon.wikimedia.org/project/instances/65ef22bd-6373-4be4-aba7-d8defb7317aa/). See screenshot:

Is this just the way it is: the upstream software performs terribly and there is no way to improve it? If so, is there any alternative we could switch to?

Horizon really is unbearably slow, to the point of being almost unusable.

I couldn't reproduce the issue right now.

Is this just the way it is: the upstream software performs terribly and there is no way to improve it? If so, is there any alternative we could switch to?

Overall, we know our Horizon setup is slow. It may also be related to the underlying OpenStack APIs being slow. We are currently running a very old version of OpenStack that we expect to upgrade soon (we are already working on it as part of our quarterly goals).

JHedden added a subscriber: JHedden. Aug 7 2019, 3:10 PM

Viewing the instance console log can occasionally take longer than expected. This process queries multiple APIs and communicates directly with the hypervisor hosting the VM, i.e. there are lots of potential places for delay and resource contention.

IMO I wouldn't blame this directly on upstream Horizon. While I wouldn't say it's fast, there are many other considerations, like customizations, implementation details, and utilization, that can directly affect the user experience.

The OpenStack CLIs are probably the best alternative. Unfortunately this isn't an option for our platform today, but it's something we're working towards in the future. T225932 T223907

Horizon really is unbearably slow, to the point of being almost unusable.

I couldn't reproduce the issue right now.

Does this ever take less than 10 seconds (my personal threshold for unbearable) to load for you?
https://horizon.wikimedia.org/project/instances/00bede6f-6352-4432-8be7-91fede12a107/?tab=instance_details__puppet

It just took 25.16 seconds for me to get that page to render:

I'm not trying to be provocative; this actually happened: I forgot what I wanted to do while waiting for the "Puppet Configuration" tab to load. You can argue that this says more about me than it does about Horizon; I accept that. :)

Yes, the puppet information in Horizon is extremely slow, especially the Prefix Puppet pages. That in particular is a known issue with no short-term fix :-(
35 seconds for me:

But as I said before, I do not observe the long load time when browsing other parts of horizon:

@ema I understand you. We are very aware that our horizon/openstack setup requires a good deal of love to get in shape, and we will get to it eventually.
Just for reference, let me mention here our phab task tracking the current quarterly goal: T212302: CloudVPS: upgrade: jessie -> stretch & mitaka -> newton

bd808 added a comment. Aug 7 2019, 4:53 PM

I have dreams of a complete rewrite of our Puppet dashboard ("dashboard" being what plugins are called in Horizon), but for now that is stuck behind more urgent infra updates in the priority queue.