
EBernhardson (EBernhardson)
User

User Details

User Since
Oct 7 2014, 4:49 PM (379 w, 6 d)
Availability
Available
LDAP User
EBernhardson
MediaWiki User
EBernhardson (WMF)

Recent Activity

Thu, Jan 13

EBernhardson claimed T295734: Bring up two copies of the CirrusSearch browser integration env in cloud.
Thu, Jan 13, 9:44 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson moved T295734: Bring up two copies of the CirrusSearch browser integration env in cloud from Ready for Development to In Progress on the Discovery-Search (Current work) board.
Thu, Jan 13, 9:44 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to P18732 elastic2051.codfw.wmnet reimage failure.

--fix-broken says it will only install the one package, suggesting the package is the problem:

ebernhardson@elastic2051:/var/log$ apt --fix-broken install --dry-run
NOTE: This is only a simulation!
      apt needs root privileges for real execution.
      Keep also in mind that locking is deactivated,
      so don't depend on the relevance to the real current situation!
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Correcting dependencies... Done
The following additional packages will be installed:
  elasticsearch-oss
The following NEW packages will be installed:
  elasticsearch-oss
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
1 not fully installed or removed.
Inst elasticsearch-oss (6.5.4 Wikimedia:9/stretch-wikimedia [all])
Conf elasticsearch-oss (6.5.4 Wikimedia:9/stretch-wikimedia [all])
Conf wmf-elasticsearch-search-plugins (6.5.4-7~stretch Wikimedia:9/stretch-wikimedia [all])
Thu, Jan 13, 7:29 PM

Tue, Jan 11

EBernhardson updated the task description for T296470: Initialize WCQS production servers.
Tue, Jan 11, 6:57 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Mon, Jan 10

EBernhardson updated the task description for T296470: Initialize WCQS production servers.
Mon, Jan 10, 8:48 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson moved T295705: Cleanup missing Commons index on Elasticsearch eqiad from In Progress to Needs review on the Discovery-Search (Current work) board.

In addition to the patch to re-enable the saneitizer, the patch to remove the swift plugin is also waiting for review: https://gerrit.wikimedia.org/r/c/operations/software/elasticsearch/plugins/+/741734/

Mon, Jan 10, 7:35 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T280487: Redirect requests from wcqs-beta.wmflabs.org to the final URL for WCQS.

For clarity, this is waiting for the production rollout of wcqs.

Mon, Jan 10, 7:16 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson moved T296468: Modify flink-job.py script to handle WCQS streaming updater deployment from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Mon, Jan 10, 7:12 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson moved T293638: Create and update the process for separate WCQS mutation topic from In Progress to Needs Reporting on the Discovery-Search (Current work) board.
Mon, Jan 10, 7:03 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Fri, Jan 7

EBernhardson moved T298622: Adapt EntityRevisionMapGenerator for wcqs from Incoming to Needs Reporting on the Discovery-Search (Current work) board.
Fri, Jan 7, 6:34 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
EBernhardson moved T298622: Adapt EntityRevisionMapGenerator for wcqs from Incoming to Current work on the Wikidata-Query-Service board.
Fri, Jan 7, 6:34 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
EBernhardson claimed T298622: Adapt EntityRevisionMapGenerator for wcqs.
Fri, Jan 7, 6:33 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
EBernhardson moved T297454: WCQS gives "502 Bad Gateway Error" from In Progress to Needs Reporting on the Discovery-Search (Current work) board.
Fri, Jan 7, 5:34 PM · Discovery-Search (Current work), Cloud-Services-Origin-User, Cloud-Services-Worktype-Unplanned, User-dcaro, Wikidata, SDC General, Wikidata-Query-Service

Thu, Jan 6

EBernhardson moved T296468: Modify flink-job.py script to handle WCQS streaming updater deployment from To Be Deployed to Needs review on the Discovery-Search (Current work) board.
Thu, Jan 6, 9:59 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson moved T297319: PHP Warning: {"type":"error","message":"cirrussearch-too-busy-error","params":[]} [Called from Wikibase\Lexeme\Search\Elastic\LexemeSearchEntity::getRankedSearchResults in /srv/mediawiki/php-1.38.0-wmf.9/extensions/WikibaseLexemeCirrusSearch/src/LexemeSearchEntity.php at line 207] from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Thu, Jan 6, 9:40 PM · MW-1.38-notes (1.38.0-wmf.13; 2021-12-13), Discovery-Search (Current work), Wikimedia-production-error
EBernhardson committed rWDAN32979913bb00: Use correct rdf-spark-tools jar (authored by EBernhardson).
Use correct rdf-spark-tools jar
Thu, Jan 6, 8:13 PM
EBernhardson committed rWDAN63c162d31162: Generate entity revision maps for commons (authored by EBernhardson).
Generate entity revision maps for commons
Thu, Jan 6, 7:51 PM
EBernhardson committed rWDAN6f5caf9b101b: export queries to relforge: Allow for null columns (authored by EBernhardson).
export queries to relforge: Allow for null columns
Thu, Jan 6, 7:17 PM
EBernhardson committed rWDAN95872df49143: Pin wtforms to <3.0.0 (authored by EBernhardson).
Pin wtforms to <3.0.0
Thu, Jan 6, 7:17 PM

Wed, Jan 5

EBernhardson created T298648: mjolnir inputs are empty.
Wed, Jan 5, 6:54 PM · Discovery-Search (Current work)

Tue, Jan 4

EBernhardson moved T280008: Set up WCQS monitoring from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Tue, Jan 4, 8:06 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson added a comment to T280008: Set up WCQS monitoring.

I've redirected the lag metric in the primary Wikidata Query Service dashboard to read from lag reported by the kafka updater, rather than from the last update timestamp written into the database. This is sufficient for metrics purposes generally, but if we need to support reporting lag to mediawiki (like we do for wikidata, so they can slow edits if we're getting behind) we may have to look into how the metric gets to mediawiki. While WCQS doesn't have the updaters running yet, so doesn't have this value, this metric will show up when the updaters start running.

Tue, Jan 4, 8:05 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson added a comment to T297454: WCQS gives "502 Bad Gateway Error".

The beta service looks to have unintentionally picked up some of the configuration of the production cluster. I've put the configuration back and stopped the beta instance from updating itself (via puppet), which should keep things in the current state while we make changes to roll out the production service.

Tue, Jan 4, 4:39 PM · Discovery-Search (Current work), Cloud-Services-Origin-User, Cloud-Services-Worktype-Unplanned, User-dcaro, Wikidata, SDC General, Wikidata-Query-Service

Dec 15 2021

EBernhardson added a comment to T280008: Set up WCQS monitoring.

I can see metrics re-appearing in the WDQS dashboards; it looks like jmx metrics are working again. The Thread count and Heap used graphs still didn't show anything for wcqs, so I've updated the queries, and values are now shown for both wcqs and wdqs. This leaves only the Lag metric missing for WCQS in the primary dashboard. After looking into it, the problem here is that the prometheus exporter tries to query blazegraph through nginx, and wcqs nginx sends that query through auth. Viable solutions include setting up a lag endpoint in nginx, opening an un-auth'd nginx port, or having the prometheus exporter talk to blazegraph directly. A lag endpoint in nginx might be the best of these options, but there might be more options available.

Dec 15 2021, 11:12 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson added a comment to T297454: WCQS gives "502 Bad Gateway Error".

I had to stop (shutoff) and start the VM again (to clear the error state), the VM is up and running, and what's using all the space is:

root@wcqs-beta-01:/srv/wdqs-data# du -hs *
4.0K    aliases.map
4.0K    dumps
28G     latest-mediainfo.ttl.gz
28G     munged
72K     sdoc.jnl
4.0K    target
3.1T    wcqs.jnl

I'll turn off the VM again, so you can turn it on and clean up before it fills up again. Let me know how it goes.

Dec 15 2021, 6:11 PM · Discovery-Search (Current work), Cloud-Services-Origin-User, Cloud-Services-Worktype-Unplanned, User-dcaro, Wikidata, SDC General, Wikidata-Query-Service

Dec 14 2021

EBernhardson added a comment to T297454: WCQS gives "502 Bad Gateway Error".

I've moved the other VM around, so there's a little bit of space free now. It should be enough to start the VM, but you'll have to make sure to clean up right when it comes back up. If it's not possible to get it to less than half its size, we will not be able to shrink the image though.

Let me know how it goes.

Dec 14 2021, 6:05 PM · Discovery-Search (Current work), Cloud-Services-Origin-User, Cloud-Services-Worktype-Unplanned, User-dcaro, Wikidata, SDC General, Wikidata-Query-Service

Dec 13 2021

EBernhardson added a comment to T297454: WCQS gives "502 Bad Gateway Error".
In T297454#7567004, @Sj wrote:

Is this monitored by any of the status tools? Does it just need to be restarted?

Dec 13 2021, 5:52 PM · Discovery-Search (Current work), Cloud-Services-Origin-User, Cloud-Services-Worktype-Unplanned, User-dcaro, Wikidata, SDC General, Wikidata-Query-Service

Dec 10 2021

EBernhardson added a comment to T297454: WCQS gives "502 Bad Gateway Error".

This is still the correct URL. I don't have exact logs, but it looks like around 2021-12-09T17:44Z the instance stopped running and upon inspection reports a status of error and power state of paused. Around 20 minutes after I started poking it the power state changed to No State. I have projectadmin rights for the project, but attempting to start the instance reports I don't have appropriate rights. Will likely need a wmcs admin to poke it.

Dec 10 2021, 7:54 PM · Discovery-Search (Current work), Cloud-Services-Origin-User, Cloud-Services-Worktype-Unplanned, User-dcaro, Wikidata, SDC General, Wikidata-Query-Service

Dec 9 2021

sbassett awarded T296767: Rotate swift auth key for mw:media account a Like token.
Dec 9 2021, 5:24 PM · SecTeam-Processed, SRE-swift-storage, SRE, Security, Security-Team

Dec 8 2021

EBernhardson moved T296897: Eqiad Geosearch API queries return errors on Commons from In Progress to Needs Reporting on the Discovery-Search (Current work) board.

The other (archive, content and general) commonswiki indices have been snapshotted from codfw and then restored on eqiad, same as we previously did for the file index. Traffic is back on eqiad and the queries above now work as expected.

Dec 8 2021, 12:11 AM · Discovery-Search (Current work), CirrusSearch, Commons, GeoData

Dec 7 2021

EBernhardson claimed T293638: Create and update the process for separate WCQS mutation topic.
Dec 7 2021, 7:23 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson moved T296897: Eqiad Geosearch API queries return errors on Commons from Ready for Development to In Progress on the Discovery-Search (Current work) board.
Dec 7 2021, 7:22 PM · Discovery-Search (Current work), CirrusSearch, Commons, GeoData
EBernhardson claimed T296897: Eqiad Geosearch API queries return errors on Commons.
Dec 7 2021, 7:22 PM · Discovery-Search (Current work), CirrusSearch, Commons, GeoData
EBernhardson closed T297221: Search backend error during sending 1 documents to the commonswiki_content_1617495209 index(s): primary shard is not active, a subtask of T293953: 1.38.0-wmf.12 deployment blockers, as Resolved.
Dec 7 2021, 6:47 PM · Release-Engineering-Team (Doing), User-brennen, Patch-For-Review, Release, Train Deployments
EBernhardson closed T297221: Search backend error during sending 1 documents to the commonswiki_content_1617495209 index(s): primary shard is not active as Resolved.

These are no longer being emitted, the cluster now has primaries available for the given indices.

Dec 7 2021, 6:47 PM · Discovery-Search
EBernhardson added a comment to T297221: Search backend error during sending 1 documents to the commonswiki_content_1617495209 index(s): primary shard is not active.

These errors are unrelated; it is an expected outage (user traffic is on another cluster). The errors come out of a snapshot restore process that's currently running re T296897 and T295705. Ideally these would be a bit quieter, but we didn't have a process already set for snapshot/restore and are just kinda making it work.

Dec 7 2021, 6:28 PM · Discovery-Search
EBernhardson added a comment to T293638: Create and update the process for separate WCQS mutation topic.

Provide a way to either configure or autodetect correct mutation topics in cookbooks.

Dec 7 2021, 4:51 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson moved T295316: Add an image: pre-deployment model refresh from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Dec 7 2021, 4:47 PM · MW-1.38-notes (1.38.0-wmf.12; 2021-12-06), Discovery-Search (Current work), Image-Suggestions, Growth-Team (Current Sprint), Growth-Structured-Tasks
EBernhardson moved T296468: Modify flink-job.py script to handle WCQS streaming updater deployment from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Dec 7 2021, 4:46 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Dec 6 2021

EBernhardson added a comment to T293638: Create and update the process for separate WCQS mutation topic.

Create the topic (proposed name mediainfo-streaming-updater-mutation, up for debate)

Dec 6 2021, 6:38 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Dec 1 2021

EBernhardson moved T296468: Modify flink-job.py script to handle WCQS streaming updater deployment from In Progress to Needs review on the Discovery-Search (Current work) board.
Dec 1 2021, 11:38 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson added a comment to T296767: Rotate swift auth key for mw:media account.

If that's correct, then we'll need to co-ordinate such that the new credential is deployed to clients at (about) the same time as we do the rolling restart of the frontends. @EBernhardson are you the right person to know how this credential is deployed and when is going to be a good time to do its rollover?

Dec 1 2021, 10:45 PM · SecTeam-Processed, SRE-swift-storage, SRE, Security, Security-Team

Nov 30 2021

EBernhardson added a comment to T296468: Modify flink-job.py script to handle WCQS streaming updater deployment.

I started looking into this, changing the script is reasonably easy but the correct values to use are a bit more of a mystery. A few questions:

Nov 30 2021, 11:01 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson claimed T296468: Modify flink-job.py script to handle WCQS streaming updater deployment.
Nov 30 2021, 9:14 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson moved T294961: Resolve kernel hang on wcqs* instances from Needs Reporting to Needs review on the Discovery-Search (Current work) board.
Nov 30 2021, 6:53 PM · wdwb-tech, SRE, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson added a comment to T296767: Rotate swift auth key for mw:media account.

As far as I'm aware swift is not publicly accessible, using the key for anything requires being inside the prod network already. Not sure how that plays into risk assessment.

Nov 30 2021, 6:18 PM · SecTeam-Processed, SRE-swift-storage, SRE, Security, Security-Team
EBernhardson created T296767: Rotate swift auth key for mw:media account.
Nov 30 2021, 6:17 PM · SecTeam-Processed, SRE-swift-storage, SRE, Security, Security-Team

Nov 29 2021

EBernhardson claimed T279698: WDQS should retry when getting 404s.
Nov 29 2021, 10:52 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson moved T279698: WDQS should retry when getting 404s from In Progress to To Be Deployed on the Discovery-Search (Current work) board.
Nov 29 2021, 10:51 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson moved T280008: Set up WCQS monitoring from In Progress to Needs review on the Discovery-Search (Current work) board.
Nov 29 2021, 10:51 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson added a comment to T296376: Investigate rendering speed variations starting around 10 November.

From looking at https://sal.toolforge.org/production?p=0&q=deploy1002&d=2021-11-11, it looks like all CirrusSearch traffic was routed to codfw: https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-11-10_cirrussearch_commonsfile_outage. If search traffic is still routed through codfw (I think it is) that might be part of the reason why we see increased variability in the rendering speed.

Nov 29 2021, 10:01 PM · MW-1.38-notes (1.38.0-wmf.12; 2021-12-06), Performance Issue, GrowthExperiments-Homepage, Growth-Team (Current Sprint)
EBernhardson moved T295676: Redirect to previous URL after auth redirect in WCQS from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Nov 29 2021, 6:15 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
EBernhardson moved T294961: Resolve kernel hang on wcqs* instances from Waiting to Needs Reporting on the Discovery-Search (Current work) board.

Another round of import tests completed, nothing fell over. Calling this done for now.

Nov 29 2021, 5:56 PM · wdwb-tech, SRE, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson added a comment to T295705: Cleanup missing Commons index on Elasticsearch eqiad.

Snapshots have been deleted from swift. The snapshot configuration has been dropped from the eqiad, codfw and relforge clusters. The plugin removal should be mergeable. I imagine we won't need a specific rolling restart and the plugin removal can roll out whenever.

Nov 29 2021, 5:23 PM · Patch-For-Review, Discovery-Search (Current work)

Nov 24 2021

EBernhardson added a comment to T280008: Set up WCQS monitoring.

Most of the metrics are in. Parts of the dashboard were limiting to data from :9193, to avoid the instance (categories?) on :9194. wcqs comes in on :9195, so for now I made the patterns :919[^4] and it "works". Longer term we need a better way to distinguish the two instances that run in wdqs other than the port number.

Nov 24 2021, 9:44 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
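
A minimal sketch of what the :919[^4] pattern from the T280008 comment above selects. The instance names are hypothetical, and plain Python re.search is used here as a stand-in; how the dashboard itself anchors or wraps the pattern is an assumption not covered by the comment.

```python
import re

# Pattern quoted in the comment: any port 919x except 9194.
PORT_PATTERN = re.compile(r":919[^4]")


def matches_pattern(instance: str) -> bool:
    """Return True if the instance label contains a port matching :919[^4]."""
    return PORT_PATTERN.search(instance) is not None


# Hypothetical instance labels, one per port mentioned in the comment.
instances = ["wdqs-host:9193", "wdqs-host:9194", "wcqs-host:9195"]
selected = [i for i in instances if matches_pattern(i)]
print(selected)
```

Note that the negated character class also accepts any other final character (for example :9190), which is one reason a port number is a fragile way to tell instances apart.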
EBernhardson added a comment to T295705: Cleanup missing Commons index on Elasticsearch eqiad.

Restore ran much faster than the snapshot: it started at 16:10 and finished around 19:00, and then elastic took another hour to spread replicas across the cluster. Ran the catchup procedure for the ~15 hours since the snapshot was taken; it ran better than in the past and took ~45 minutes to replay 110k updates.

Nov 24 2021, 8:54 PM · Patch-For-Review, Discovery-Search (Current work)

Nov 23 2021

EBernhardson closed T114849: Log lines on flourine overflow at 8092 bytes. as Declined.

might as well close it, this should mostly be irrelevant infrastructure.

Nov 23 2021, 11:49 PM · SRE Observability, observability, Wikimedia-Logstash, SRE
EBernhardson closed T114849: Log lines on flourine overflow at 8092 bytes., a subtask of T157850: Interacting with Wikimedia logs should be a pleasant experience, as Declined.
Nov 23 2021, 11:48 PM · Epic, WMF-General-or-Unknown
EBernhardson added a comment to T294961: Resolve kernel hang on wcqs* instances.

Started another round of imports today to see how it goes. If it doesn't fall over might as well call this done for now.

Nov 23 2021, 11:43 PM · wdwb-tech, SRE, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson added a comment to T295705: Cleanup missing Commons index on Elasticsearch eqiad.

Snapshot started earlier today; turns out I logged it to the previous ticket instead of this one:

Nov 23 2021, 10:57 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T295316: Add an image: pre-deployment model refresh.

Data loaded from clarakosi.search_imagerec into the eqiad and codfw cirrus clusters. This updated ~75k pages in each DC; the majority of the import was nop'd at indexing time due to not causing any change to indexed content. I've started up the process to clear pages. A dry run reported it will clear old recommendations from ~70k pages per cluster, and I expect it to finish in an hour or so.

Nov 23 2021, 9:22 PM · MW-1.38-notes (1.38.0-wmf.12; 2021-12-06), Discovery-Search (Current work), Image-Suggestions, Growth-Team (Current Sprint), Growth-Structured-Tasks

Nov 22 2021

EBernhardson added a comment to T295365: Alert when the rate of pages fixed by Saneitizer is too high.

Poking at the data, the current formulation used on the graph (direct count of documents fixed against a time axis of when) isn't very clearly actionable. Brief moments of high fix rates aren't particularly important; what we want to alert on is things that look like a systemic issue. I'm not sure how to get there though. Taking something like the sum of fixes over the last 24h gives a graph that mostly looks the same, maybe a few less spikes but not really. To really push down the spikes we have to go with something like a sum of fixes over the last 7 days. Reviewing graphs for the number of documents fixed per 7d period a few things pop out:

Nov 22 2021, 6:44 PM · Discovery-Search (Current work), CirrusSearch
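
A minimal sketch of the windowed sum discussed in the T295365 comment above, assuming one sample per day. The function name and the sample counts are made up for illustration; this is not the query used on the actual graph.

```python
from collections import deque


def rolling_sum(values, window):
    """Sum of the most recent `window` samples at each position."""
    out = []
    buf = deque()
    total = 0
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            # Drop the sample that just fell out of the window.
            total -= buf.popleft()
        out.append(total)
    return out


# Made-up daily fix counts with a single one-day spike.
daily_fixes = [10, 10, 500, 10, 10, 10, 10, 10, 10, 10]
weekly = rolling_sum(daily_fixes, 7)
print(weekly)
```

With a 7-sample window the one-day spike stays visible in every window that contains it and only drops out once it ages past the window, so a threshold on this series reacts to sustained volume rather than to a single noisy sample.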
EBernhardson moved T291818: Document WikibaseCirrusSearch dependency on wikimedia-extra Elasticsearch plugin from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Nov 22 2021, 5:35 PM · MW-1.38-notes (1.38.0-wmf.12; 2021-12-06), Discovery-Search (Current work), Wikidata, CirrusSearch

Nov 18 2021

EBernhardson claimed T295676: Redirect to previous URL after auth redirect in WCQS.
Nov 18 2021, 11:25 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
EBernhardson moved T295676: Redirect to previous URL after auth redirect in WCQS from Ready for Development to Needs review on the Discovery-Search (Current work) board.
Nov 18 2021, 11:25 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Nov 17 2021

EBernhardson claimed T291818: Document WikibaseCirrusSearch dependency on wikimedia-extra Elasticsearch plugin.
Nov 17 2021, 10:27 PM · MW-1.38-notes (1.38.0-wmf.12; 2021-12-06), Discovery-Search (Current work), Wikidata, CirrusSearch
EBernhardson moved T280487: Redirect requests from wcqs-beta.wmflabs.org to the final URL for WCQS from Ready for Development to Waiting on the Discovery-Search (Current work) board.
Nov 17 2021, 9:40 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson claimed T280008: Set up WCQS monitoring.
Nov 17 2021, 9:40 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson moved T280008: Set up WCQS monitoring from Ready for Development to Needs review on the Discovery-Search (Current work) board.
Nov 17 2021, 9:40 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson moved T295316: Add an image: pre-deployment model refresh from Ready for Development to Needs review on the Discovery-Search (Current work) board.
Nov 17 2021, 9:39 PM · MW-1.38-notes (1.38.0-wmf.12; 2021-12-06), Discovery-Search (Current work), Image-Suggestions, Growth-Team (Current Sprint), Growth-Structured-Tasks

Nov 16 2021

EBernhardson claimed T295705: Cleanup missing Commons index on Elasticsearch eqiad.
Nov 16 2021, 8:04 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson moved T295705: Cleanup missing Commons index on Elasticsearch eqiad from Ready for Development to In Progress on the Discovery-Search (Current work) board.
Nov 16 2021, 8:03 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson claimed T290604: Create alerts for GC death spiral.
Nov 16 2021, 1:22 AM · Discovery-Search (Current work), CirrusSearch

Nov 15 2021

EBernhardson moved T293462: Add user blocking in WCQS from Needs review to To Be Deployed on the Discovery-Search (Current work) board.

Patches are merged, we could deploy but we can't test the deploy so I'm holding off. Before we can test the deploy we need to bring the wcqs servers back into lvs production state.

Nov 15 2021, 11:43 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
EBernhardson created T295736: Evaluate deployed elasticsearch plugins for 7.10 compatability.
Nov 15 2021, 11:36 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson created T295735: Decide on the future of Cirrus development/integration environment.
Nov 15 2021, 11:35 PM · CirrusSearch, Discovery-Search
EBernhardson added a comment to T295734: Bring up two copies of the CirrusSearch browser integration env in cloud.

In the past when rebuilding the integration env I often forget to save the ssh keys from the instance, and then have to spend extra time getting that all reset. This time around will try to remember to keep those.

Nov 15 2021, 11:35 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson created T295734: Bring up two copies of the CirrusSearch browser integration env in cloud.
Nov 15 2021, 11:34 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to T295705: Cleanup missing Commons index on Elasticsearch eqiad.
  • I checked with network ops, if we rate limit to 2 gigabits (8MB/s/partition in repository config) of traffic it should leave plenty of room for everything else.
  • Talked with the team, agreed to ship a locally compiled jar of elasticsearch-repository-swift in our debian package, and then take it out when we are done prior to the 6.8 upgrade.
  • Ideally we should still check in regarding swift auth, but I don't suspect there is any problem with us using our existing data shipping credentials for this purpose.
Nov 15 2021, 5:11 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T295316: Add an image: pre-deployment model refresh.

@EBernhardson right, versioning would be simpler for the infrastructure but more complicated for clients and users who would have to somehow figure out what search keyword to use.

The exact versioning shouldn't be exposed to end users; indeed it would be crazy to expect end users to know which dump they should be referencing. That's what I was suggesting to use the metastore for. The translation from the end-user visible keyword to whatever internal version is currently promoted should be able to happen by maintaining an array/map of the currently promoted versions in MetaStore. That map can be updated by the importing process whenever a new version finishes importing.

Nov 15 2021, 3:44 PM · MW-1.38-notes (1.38.0-wmf.12; 2021-12-06), Discovery-Search (Current work), Image-Suggestions, Growth-Team (Current Sprint), Growth-Structured-Tasks
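
A rough sketch of the keyword-to-version indirection described in the T295316 comment above. Everything here is hypothetical: the class, the keyword and the version strings are illustrative only, and an in-memory dict stands in for the MetaStore rather than using its real API.

```python
class PromotedVersions:
    """In-memory stand-in for a map of keyword -> currently promoted version."""

    def __init__(self):
        self._promoted = {}

    def promote(self, keyword, version):
        # Called by the import process once a new version has finished importing.
        self._promoted[keyword] = version

    def resolve(self, keyword):
        # Called at query time; end users only ever see the keyword.
        return self._promoted.get(keyword)


store = PromotedVersions()
store.promote("image_recommendation", "dump-a")  # hypothetical first import
store.promote("image_recommendation", "dump-b")  # hypothetical refresh
current = store.resolve("image_recommendation")
print(current)
```

The point of the indirection is that a refresh only changes the stored mapping, so clients keep using the same keyword across imports.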

Nov 12 2021

EBernhardson updated subscribers of T295478: Searching on Special:Search and MediaSearch on Commons returns error.

After doing some testing, I have a rough recovery plan:

Nov 12 2021, 10:28 PM · Discovery-Search (Current work), Commons, Structured-Data-Backlog (Current Work), SDAW-MediaSearch
EBernhardson added a comment to T295316: Add an image: pre-deployment model refresh.

I think in the long term we'd like the search data to be automatically regularly refreshed (e.g. monthly) so versioning wouldn't be easy to manage on the client side. As a short-term solution it would work fine for us if that's your preferred approach.

Nov 12 2021, 4:13 PM · MW-1.38-notes (1.38.0-wmf.12; 2021-12-06), Discovery-Search (Current work), Image-Suggestions, Growth-Team (Current Sprint), Growth-Structured-Tasks

Nov 10 2021

EBernhardson added a comment to T295316: Add an image: pre-deployment model refresh.

We can clear the old suggestions, that's not a problem. I wanted it to be clear though that the search systems only update the pages that are referenced. If we want to update pages not referenced in a data dump it has to be done explicitly; it doesn't just happen.

Nov 10 2021, 7:53 PM · MW-1.38-notes (1.38.0-wmf.12; 2021-12-06), Discovery-Search (Current work), Image-Suggestions, Growth-Team (Current Sprint), Growth-Structured-Tasks
EBernhardson added a comment to T295480: Searching for files on Commons returns error.

Incident report will be coming up, but the short answer is the commonswiki_file index on eqiad cluster went missing. Traffic has been moved to a cluster that still has the index, working up processes now to copy the good index between clusters and restore the eqiad cluster.

Nov 10 2021, 7:48 PM · Commons, Discovery-Search

Nov 8 2021

EBernhardson added a comment to T294961: Resolve kernel hang on wcqs* instances.

The import that caused everything to fall over last time completed. I'm not sure that's enough to declare this fixed (it ran once before as well) but after putting the puppet patch in place we can probably wait on this one to see if it reoccurs.

Nov 8 2021, 5:52 PM · wdwb-tech, SRE, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Nov 5 2021

EBernhardson added a comment to T295192: missing search index on checkuserwiki.

With the error consistently reproducible when talking to individual machines, I set up a script to check all the different directions in which we do cluster to cluster communications. There were 5 instances (1047, 1046, 1044, 1042, 1035) showing the issue, all in the psi->chi direction. I restarted each of the instances, and logging shows the errors have stopped.

Nov 5 2021, 10:26 PM · Datacenter-Switchover, Discovery-Search (Current work)
EBernhardson added a comment to T295192: missing search index on checkuserwiki.

This doesn't look to be anything related to reindexing, rather something is out-of-sync inside the elasticsearch clusters with regards to cross-cluster functionality.

Nov 5 2021, 8:32 PM · Datacenter-Switchover, Discovery-Search (Current work)

Nov 3 2021

EBernhardson updated the task description for T294961: Resolve kernel hang on wcqs* instances.
Nov 3 2021, 9:40 PM · wdwb-tech, SRE, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson edited P17669 wcqs2002 kernel hang.
Nov 3 2021, 8:14 PM
EBernhardson edited P17668 wcqs1003 kernel hang.
Nov 3 2021, 8:13 PM
EBernhardson edited P17667 wcqs1001 kernel hang.
Nov 3 2021, 8:13 PM
EBernhardson updated subscribers of T294961: Resolve kernel hang on wcqs* instances.

Some random info i looked up:

Nov 3 2021, 7:19 PM · wdwb-tech, SRE, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson updated subscribers of T294961: Resolve kernel hang on wcqs* instances.
Nov 3 2021, 6:17 PM · wdwb-tech, SRE, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson updated the task description for T294961: Resolve kernel hang on wcqs* instances.
Nov 3 2021, 6:14 PM · wdwb-tech, SRE, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson added a project to T294961: Resolve kernel hang on wcqs* instances: SRE.
Nov 3 2021, 6:13 PM · wdwb-tech, SRE, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson created T294961: Resolve kernel hang on wcqs* instances.
Nov 3 2021, 6:12 PM · wdwb-tech, SRE, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson created P17669 wcqs2002 kernel hang.
Nov 3 2021, 6:04 PM
EBernhardson created P17668 wcqs1003 kernel hang.
Nov 3 2021, 6:03 PM
EBernhardson created P17667 wcqs1001 kernel hang.
Nov 3 2021, 6:03 PM
EBernhardson edited P17664 wcqs1001 disk problems.
Nov 3 2021, 5:09 PM