Was able to get a puppet run on elastic2088, but since that run a couple hours ago the host is ssh unreachable (it hangs indefinitely). Seeing some concerning stuff in the drac via getsel on elastic2088.mgmt.codfw.wmnet:
Yesterday
Mon, Apr 15
Added further context to the SLI section of the documentation explaining what each query type actually means. I believe there are no more outstanding TODOs on this task.
In T339347#9565775, @Hannah_Bast wrote: @RKemper Is your point that the queries should return a result? Neither DBLP nor Wikidata have the predicate foaf:name, so it's clear that both SERVICE queries return an empty result. Here is an example for a query that gives a result:
PREFIX schema: <http://schema.org/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?editor ?editorName WHERE {
  SERVICE <https://qlever.cs.uni-freiburg.de/api/wikidata> {
    wd:Q113544723 wdt:P179 ?editor.
    ?editor schema:name ?editorName.
  }
}
I vote that we choose the default based on the cluster being operated on. So 3 for eqiad and codfw, 2 for cloudelastic and 1 for relforge.
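A minimal sketch of that per-cluster default in Python. The constant and function names are hypothetical, and the setting being defaulted is left abstract because the comment above does not name it; only the cluster-to-value mapping comes from the proposal.

```python
# Hypothetical names; only the cluster -> value mapping comes from the proposal.
DEFAULT_BY_CLUSTER = {
    "eqiad": 3,
    "codfw": 3,
    "cloudelastic": 2,
    "relforge": 1,
}


def default_for(cluster, override=None):
    """Return the explicit override if one is given, else the per-cluster default."""
    if override is not None:
        return override
    try:
        return DEFAULT_BY_CLUSTER[cluster]
    except KeyError:
        # Fail loudly rather than guess a value for an unknown cluster.
        raise ValueError(f"no default defined for cluster {cluster!r}") from None
```

An explicit value passed by the operator still wins, so the mapping only changes what happens when nothing is specified.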
Mon, Apr 8
Created subtask for dc-ops' side of the decom. Resolving this parent ticket now.
Thu, Apr 4
Tue, Apr 2
With dc-ops having closed out the decom subtask, this should be all done.
Thu, Mar 28
In T353878#9664756, @Volans wrote: elastic2088 is unreachable and reported as missing from PuppetDB by Netbox report. No host should be powered on with puppet disabled or not working for longer period of time. Please either reimage it or shut it down now and reimage it at a later stage (before powering it on).
In T358046#9670564, @VRiley-WMF wrote: @RKemper Thanks for bringing this up! I missed running the script for this device. It's been run and decommissioned.
@VRiley-WMF In netbox I see cloudelastic1003 still listed as decommissioning, whereas the other cloudelastic hosts are marked as Offline. Is it just the step to set netbox status to Offline that we're missing or are there other steps that still need to be run on cloudelastic1003 as well?
Thu, Mar 21
Glancing at ryankemper@apifeatureusage1001:~$ sudo journalctl -u curator_actions_apifeatureusage_eqiad:
Wed, Mar 20
Had forgotten to properly assign dc-ops as well as tag for the DC. Straightened that out now, so this should be ready for dc-ops to do the decom.
Mar 15 2024
Mar 13 2024
With https://github.com/wikimedia/restbase/pull/1336 being merged, is this ticket now resolved?
Mar 4 2024
Mar 1 2024
This should be all done; @TJones can you confirm all is working as it should?
Feb 29 2024
@fnegri @brouberol Yeah, Brian and I will work on getting this tested and merged. Thanks for the heads up!
Feb 21 2024
@HinMar Okay, I think we've got the endpoints properly allowed. Queries appear to be working for me. Are you seeing the same?
@Hannah_Bast Okay, we figured out what was preventing the allowed endpoints from being updated properly. https://w.wiki/6q2i doesn't get an error message anymore, although the query itself returns no results.
@Loz.ross Yes, the change to allowed endpoints did not get properly deployed; we've fixed that now. However, there's another issue now that we've gotten past the Service URI not being allowed:
Feb 20 2024
Feb 15 2024
Finished the upload process; next up is rolling restart of cluster and merge of https://gitlab.wikimedia.org/repos/search-platform/cirrussearch-elasticsearch-image/-/merge_requests/7?commit_id=f1028a26dff38603bea67a8edda8337dab07bbfc
Feb 6 2024
Added new threshold markers at 95% for the 4 SLO graphs. We may want to revise the % SLO upwards, but let's stick with 95% for now until we get another quarter of data.
Feb 1 2024
In T351488#9505091, @HinMar wrote: @RKemper : Thank you for your message. The project has ended, but we still kindly ask you to whitelist this endpoint. We at the Trier Center for Digital Humanities will continue to work with LOD beyond this one project and would be delighted to be able to run federated queries starting from Wikidata directed towards the MiMoTextBase. We have developed an approach in MiMoText that we now want to transfer and adapt for other domains in a new project (“LODinG” – Linked Open Data in the Humanities). We are still very interested in the 'wikiverse' and in gaining as much experience as possible in the area of 'federation', which we see as the absolute key to the LOD vision. We are also planning to provide a showcase and if the whitelisting could be done relatively soon, we would like to include this new 'federation direction'. Can you estimate how long it will take?
In T339347#9504488, @Hannah_Bast wrote: Yes, https://qlever.cs.uni-freiburg.de/api/dblp is the URL for API calls, whereas https://qlever.cs.uni-freiburg.de/dblp (without the /api) is the URL of the QLever UI. Same for all the other endpoints.
For example, https://qlever.cs.uni-freiburg.de/api/dblp?query=SELECT+%2A+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D+LIMIT+10 gives you the results for SELECT * WHERE { ?s ?p ?o } LIMIT 10 as application/sparql-results+json .
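As a rough illustration of how the example URL above is formed, here is a small Python sketch that URL-encodes a SPARQL query into the `query` parameter of the API endpoint. The helper name `build_query_url` is hypothetical; the endpoint and query are the ones quoted above, and no request is sent.

```python
from urllib.parse import urlencode

# API endpoint quoted above (the /api/ path, not the QLever UI path).
API_BASE = "https://qlever.cs.uni-freiburg.de/api/dblp"


def build_query_url(sparql, base=API_BASE):
    """Return the GET URL for a SPARQL query; spaces become '+', symbols are %-escaped."""
    return f"{base}?{urlencode({'query': sparql})}"


url = build_query_url("SELECT * WHERE { ?s ?p ?o } LIMIT 10")
print(url)
```

The printed URL should match the one in the comment above, which makes this a quick way to check that a federated endpoint is being addressed at its `/api/` path.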
Jan 31 2024
@Loz.ross Sorry for the delay, we've added the endpoint. Can you confirm it's working with an example query?
In T339347#9431527, @Nikki wrote: Could https://qlever.cs.uni-freiburg.de/api/wikimedia-commons also be added?
It looks like https://qlever.cs.uni-freiburg.de/api/wikidata is working now (https://w.wiki/8iGM) but https://qlever.cs.uni-freiburg.de/api/dblp (https://w.wiki/6m6B, from the description) is still not allowed.
@HinMar Sorry for missing this request - our bad! I see your earlier comment mentioned the project expiring by end of 2023. Is the project still ongoing and therefore we should still whitelist this new endpoint or should I instead close this ticket out?
Jan 30 2024
Jan 25 2024
Old masters are no longer master-eligible. They're still participating in the actual cluster; we're holding off on the physical decom until T355617 is done
Jan 24 2024
Forgot to add the Bug: label but https://gerrit.wikimedia.org/r/c/operations/puppet/+/992826 is part of this ticket as well
Jan 23 2024
This should be all done, with the new experimental services accessible at:
Experimental microsites are up and externally reachable:
We've rolled this out following the steps in https://wikitech.wikimedia.org/wiki/Cergen#Update_a_certificate
Jan 22 2024
These 3 new services have their internal certs working with Envoy. Moving this to Done; I've spun off https://phabricator.wikimedia.org/T355593 for the last cert-related work.
This should be moved to Blocked / Waiting. However, for now I think I need to leave it in incoming until it's been triaged by the Search Platform team.
Finished the documentation. With the new dashboard up in https://grafana-rw.wikimedia.org/d/xiWr1c5Iz/search-slos?orgId=1, this work is complete.
Jan 19 2024
Deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/991680 to back out these microsites over the weekend and cut down on noise
Jan 18 2024
Deployed the following changes via sudo -i authdns-update:
Jan 16 2024
We'll need to add 3 entries to https://gerrit.wikimedia.org/r/c/operations/puppet/+/668543/4/modules/profile/manifests/microsites/wdqs.pp following the model in https://gerrit.wikimedia.org/r/c/operations/puppet/+/668543/
Talked with gehel, ebernhardson, and inflatador. We're going to start with full-experimental.query.wikidata.org, main-experimental.query.wikidata.org, scholarly-experimental.query.wikidata.org to get these 3 test endpoints up. Meanwhile, we can open up the convo with the community as far as what the ultimate "final" naming/domain scheme will be wrt https://phabricator.wikimedia.org/T354043
We did the initial work to get envoy via PKI / cfssl operational in https://phabricator.wikimedia.org/T354555#9454855. Next up is adding specific alt-names for the three new endpoints. Here are a few different proposals for the naming scheme:
- full.query.wikidata.org, main.query.wikidata.org, scholar.query.wikidata.org
- full.wikidata.org, main.wikidata.org, scholar.wikidata.org
- full-query.wikidata.org, main-query.wikidata.org, and scholarly-query.wikidata.org
- full-query.wikidata.org, main-query.wikidata.org, and scholar.wikidata.org
- full-graph.wikidata.org, main-graph.wikidata.org, and scholar-graph.wikidata.org
Made various improvements to the dashboard: collated the SLIs into a single row, added threshold markers for every SLI, added y-axis labels, and added a soft max of 600ms to the autocomplete latency panel, since Grafana was otherwise setting the y-axis max below 600 because no data points >= 600 existed.
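To illustrate what a soft max does to the axis, here is a tiny Python sketch. It is not Grafana's implementation, only the behaviour described above: the axis extends to at least the soft max, and grows further only if the data exceeds it. The function name is hypothetical.

```python
def y_axis_max(values, soft_max=600.0):
    """Upper y-axis bound: at least soft_max, higher only if the data exceeds it."""
    return max([soft_max, *values])
```

With latencies that all sit below 600ms the axis still reaches 600, so a threshold marker at 600 stays visible.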
Jan 10 2024
@MoritzMuehlenhoff Oops it appears we made the same mistake twice :P Can you do one more check for us? I think everything is all set now:
Jan 9 2024
Finished adding the SLO dashboards to https://grafana-rw.wikimedia.org/d/H6f-bA7Sk/rkemper-search-sli-test?orgId=1&from=now-90d&to=now. Remaining steps:
Jan 8 2024
@Jclark-ctr Yes, these hosts are fully ready to be decom'd.
Jan 4 2024
elastic2087 has joined the cluster as a bullseye host. I haven't officially pooled it yet.
@MoritzMuehlenhoff This should be all done. Let us know if you see any rogue java processes hanging around!
Jan 2 2024
Dec 20 2023
After talking in the #wikimedia-sre IRC channel, I'll run the sre.network.configure-switch-interfaces cookbook myself, and then Volans will take care of the puppetdb/debmonitor stuff after seeing if the cookbook can be improved to handle those idempotently.
Decom cookbook ran: https://sal.toolforge.org/log/tSXrhYwBhuQtenzvzt4I