Apr 20 2020
The elasticsearch version 5 cluster is being shut down today. Your tool account credentials have been migrated to the new cluster, which can be reached at http://elasticsearch.svc.tools.eqiad1.wikimedia.cloud
Apr 18 2020
Server groups are working now and have been re-enabled in Horizon https://gerrit.wikimedia.org/r/c/openstack/horizon/horizon/+/585802
Apr 17 2020
Apr 16 2020
Apr 15 2020
The new prometheus server is up and scraping node-exporter metrics from all the VMs in tools and cloudinfra
Command used to update security groups for tools and cloudinfra
Hi @diego, it looks like this ticket slipped through. We identified an issue with SSH logins on the bastion nodes that was resolved on April 13th, 2020. If it happens again, please let us know.
$ sudo /usr/local/sbin/maintain-replica-indexes --database grwikimedia --debug
$ sudo /usr/local/sbin/maintain-views --databases grwikimedia --debug
$ sudo /usr/local/sbin/maintain-meta_p --databases grwikimedia
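The three maintenance commands above differ only in the database name, so they can be sketched as a small dry-run helper. This is an illustration only: the function name `print_replica_setup_commands` is hypothetical and not part of the existing tooling, and it only prints the commands (with the flags exactly as quoted above) rather than running them.

```shell
#!/bin/sh
# Hypothetical helper, not part of the existing tooling.
# Prints (does not execute) the three maintenance commands quoted above
# for the database name given as the first argument.
print_replica_setup_commands() {
    if [ -z "$1" ]; then
        echo "usage: print_replica_setup_commands DATABASE" >&2
        return 1
    fi
    db="$1"
    echo "sudo /usr/local/sbin/maintain-replica-indexes --database $db --debug"
    echo "sudo /usr/local/sbin/maintain-views --databases $db --debug"
    echo "sudo /usr/local/sbin/maintain-meta_p --databases $db"
}

# Example: print the commands for the database used above.
print_replica_setup_commands grwikimedia
```

Printing first and running by hand keeps the operator in control of each step, which seems prudent for commands that need sudo.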
$ /usr/local/sbin/wmcs-wikireplica-dns --aliases
WARNING:mwopenstackclients.DnsManager:Creating grwikimedia.analytics.db.svc.eqiad.wmflabs.
WARNING:mwopenstackclients.DnsManager:Creating grwikimedia.web.db.svc.eqiad.wmflabs.
In T250206#6059740, @Krenair wrote: Ah, is the plan to use the existing grafana-labs in prod?
Apr 14 2020
In T250206#6056586, @Krenair wrote: In T250206#6056556, @JHedden wrote: Maybe a new port with a reserved address and DNS name in addition to the local VM's network interface would be better?
I'm not sure I understand the distinction beyond the addition of a particular DNS name?
In T250206#6056475, @Krenair wrote: Doesn't it also make it impossible to fully replace the instance without ops intervention? I'd prefer we had a special DNS name that stayed the same.
The new VM is created with a dedicated network port. Having a dedicated port reserves an IP address, which makes future architecture changes or rebuilds more flexible.
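As a rough sketch of the dedicated-port approach described above: the port is created first (which allocates its address), and the VM is then booted attached to that port. The exact commands used for this VM are not recorded here, so the network, port, and server names, the `<image>`/`<flavor>` placeholders, and the helper name `print_dedicated_port_commands` are all assumptions for illustration. The helper only prints the OpenStack CLI commands; it does not run them.

```shell
#!/bin/sh
# Dry-run sketch: prints OpenStack CLI commands instead of executing them.
# All names are hypothetical placeholders, not the real project values.
print_dedicated_port_commands() {
    if [ "$#" -ne 3 ]; then
        echo "usage: print_dedicated_port_commands NETWORK PORT_NAME SERVER_NAME" >&2
        return 1
    fi
    network="$1"
    port_name="$2"
    server_name="$3"
    # Step 1: create the port on its own; the address allocated to it stays
    # reserved for as long as the port exists.
    echo "openstack port create --network $network $port_name"
    # Step 2: boot the VM attached to the existing port. The port ID comes
    # from the output of step 1; image and flavor are left as placeholders.
    echo "openstack server create --image <image> --flavor <flavor> --nic port-id=<id-of-$port_name> $server_name"
}

# Example with made-up names.
print_dedicated_port_commands example-net example-port example-vm
```

Because the port exists independently of the server, a replacement VM can be attached to the same port later and keep the same address.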
Renaming this project to 'metricsinfra' for consistency with the existing 'cloudinfra' name.
Thanks @jcrespo! Once I have more information I'll update this task for further review and advice.
Apr 13 2020
Apr 10 2020
+1
@jcrespo and/or @Marostegui, I'm looking at using MariaDB with Galera for the OpenStack services, and I'm curious whether you have any thoughts or recommendations. It's a fairly common architecture used by OpenStack that works nicely with multiple masters without running into the known limitations [0]
Apr 9 2020
Reminder: the version 5 elasticsearch cluster will be shut down on April 20th, 2020.
Apr 8 2020
Apr 7 2020
I think this is the full conversation stream between xtools and de.wikipedia.org (after URI: http://xtools.wmflabs.org/api/page/articleinfo/de.wikipedia.org/Benutzer:Epischel?format=html&uselang=de)
Apr 6 2020
Apr 5 2020
The haproxy check for nova-placement on /healthcheck is generating keystone errors:
INFO keystonemiddleware.auth_token [req-74e64099-5d1e-4b5b-bdc2-c2aa950f1a8f novaadmin admin - default default] Rejecting request
Looks like there are database connection issues too
This is not related to the port work I did last week; OVS (openvswitch) is left over from the VXLAN work @aborrero is working on.
Apr 3 2020
Apr 2 2020
Changes made on cloudnet2002-dev to enable basic networking using our existing configuration
This updated interface is working as expected, thanks!
Scratch that ^, I was able to verify I can see the traffic on the other hypervisors over the 2105 VLAN.
Could you double check the interface has access to cloud-instances2-b-codfw? I'm not able to communicate on VLAN 2105 using the eno1 interface.
In T248425#6009405, @ayounsi wrote: Let me know when you want the switch ports to be re-configured.
Apr 1 2020
In the Horizon dashboard, you'll need to add the role::labs::lvm::srv puppet role to your instance.
Mar 31 2020
Seems fine after soft restarting the iDRAC card with racadm racreset. If this happens again we should look at upgrading the firmware, which may require a full restart of the host.
Mar 30 2020
Tools project and misc NFS storage usage is down to 79%
Mar 27 2020
In T248610#6003245, @JHedden wrote: The missing user is currently causing the systemd-tmpfiles-clean service to fail. The ceph-common package is currently not logging anything, so as a quick fix and temporary workaround I've removed the logging configuration until the service account is added.
sudo cumin 'P{R:Class = role::wmcs::openstack::eqiad1::virt}' 'rm /usr/lib/tmpfiles.d/ceph.conf && systemctl restart systemd-tmpfiles-clean'
/usr/lib/tmpfiles.d/ceph.conf
d /run/ceph 0770 ceph ceph
CloudVPS virtual machine names must be unique across all projects. Unfortunately, the name "staging" is already taken by another project. Could you try again with a different host name?
Ceph-common package requirements
Mar 26 2020
The missing user is currently causing the systemd-tmpfiles-clean service to fail. The ceph-common package is currently not logging anything, so as a quick fix and temporary workaround I've removed the logging configuration until the service account is added.
No longer seeing any errors in the Horizon logs.
cloudvirt1015 has crashed again using @Andrew's stress test.
Mar 24 2020
cloudvirt1008:~# /opt/hp/hpssacli/bld/hpssacli controller slot=0 pd all show
Any updates on this?
Mar 23 2020
The above patch ensures we're using the correct virtual environment and directly addresses this exception: