EBernhardson (EBernhardson)
User

User Details

User Since
Oct 7 2014, 4:49 PM (188 w, 6 d)
Availability
Available
LDAP User
EBernhardson
MediaWiki User
EBernhardson (WMF)

Recent Activity

Sat, May 19

EBernhardson created T195042: insource fails to return highlights on some queries.
Sat, May 19, 10:24 AM · Discovery, CirrusSearch, Discovery-Search

Wed, May 16

EBernhardson edited P7136 (An Untitled Masterwork).
Wed, May 16, 4:06 PM
EBernhardson created P7136 (An Untitled Masterwork).
Wed, May 16, 4:05 PM

Tue, May 15

EBernhardson added a comment to T194678: Update OtherIndex to operate on a cluster other than the one holding the wiki.

For splitting wikis between clusters and ensuring sister searches stay on the same cluster, I was hoping I could get by with a test case in mw-config that pokes at the SiteMatrix configuration (unfortunately without the SiteMatrix code in the mw-debug test suite) and verifies that everything belongs on the "correct" cluster.

Tue, May 15, 6:23 PM · Discovery-Search

Mon, May 14

EBernhardson added a comment to T194678: Update OtherIndex to operate on a cluster other than the one holding the wiki.

For indexing we need to be a little more involved. The use cases look to be:

Mon, May 14, 7:21 PM · Discovery-Search
EBernhardson added a comment to T194678: Update OtherIndex to operate on a cluster other than the one holding the wiki.

For search we might consider Cross Cluster Search. This was added in 5.4 and came out of beta in 6.0 and is the blessed replacement for tribe nodes. It essentially allows us to query the other index as if it were local by prefixing the index with the cluster name, such as eqiad-large:commonsiwiki_file. This allows us to ignore the question of which cluster (eqiad, codfw?) to read from in the cirrus code, relegating it to elasticsearch configuration.
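
As an illustration of the cluster-prefix addressing described above, here is a minimal Python sketch. The helper names remote_index and search_path are hypothetical and are not part of CirrusSearch; the only things taken from Elasticsearch itself are the cluster:index form and the comma-separated index list in the search URL. The cluster and index names are copied from the comment, and the local index name is an assumption.

```python
from typing import Iterable


def remote_index(cluster: str, index: str) -> str:
    """Build a cross-cluster index expression of the form cluster:index.

    Hypothetical helper: the cluster alias must match a remote cluster
    configured on the Elasticsearch side.
    """
    if not cluster or ":" in cluster:
        raise ValueError("cluster alias must be non-empty and contain no ':'")
    if not index:
        raise ValueError("index name must be non-empty")
    return f"{cluster}:{index}"


def search_path(indices: Iterable[str]) -> str:
    """Join local and remote index expressions into a _search URL path."""
    return "/" + ",".join(indices) + "/_search"


# A local index (assumed name) queried together with the remote file index.
path = search_path([
    "enwiki_content",
    remote_index("eqiad-large", "commonsiwiki_file"),
])
```

With this shape the calling code only decides which cluster alias to prefix; which hosts sit behind that alias stays in the elasticsearch configuration, as the comment suggests.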

Mon, May 14, 7:11 PM · Discovery-Search
EBernhardson triaged T194678: Update OtherIndex to operate on a cluster other than the one holding the wiki as Normal priority.
Mon, May 14, 6:51 PM · Discovery-Search

Fri, May 11

EBernhardson added a comment to T194534: Researches by prefixes are out of order on the French Wiktionary.

This looks like fallout from https://gerrit.wikimedia.org/r/#/c/429815/. Will revert and reconsider how to fix this.

Fri, May 11, 9:21 PM · Patch-For-Review, Discovery-Search, CirrusSearch, Discovery
EBernhardson moved T192615: Add 'type' field to store same information as was in es5 types. from Backlog to Needs review on the Discovery-Search (Current work) board.
Fri, May 11, 5:53 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson moved T192699: Build a unified namespace index from Backlog to Needs review on the Discovery-Search (Current work) board.
Fri, May 11, 5:53 PM · Discovery-Search (Current work), Patch-For-Review
EBernhardson moved T192615: Add 'type' field to store same information as was in es5 types. from This Quarter to Current work on the Discovery-Search board.
Fri, May 11, 5:53 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson claimed T192615: Add 'type' field to store same information as was in es5 types..
Fri, May 11, 5:52 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T192615: Add 'type' field to store same information as was in es5 types..

We've decided to split archive into its own index, and move namespace into metastore. As such, only metastore needs the new type field now.

Fri, May 11, 5:52 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson moved T192699: Build a unified namespace index from This Quarter to Current work on the Discovery-Search board.
Fri, May 11, 5:51 PM · Discovery-Search (Current work), Patch-For-Review
EBernhardson claimed T192699: Build a unified namespace index.
Fri, May 11, 5:51 PM · Discovery-Search (Current work), Patch-For-Review

Thu, May 10

EBernhardson closed T192693: Resolve document id clashes with unified type as Declined.

We decided not to add a new type field, and instead to split archive into its own index. To handle the sharding problem we will create 2 new "tiny" clusters on the existing hardware and split all of the tiny wikis between them.

Thu, May 10, 5:01 PM · Discovery-Search
EBernhardson closed T192693: Resolve document id clashes with unified type, a subtask of T183282: [epic] Search cluster upgrade to 6.x, as Declined.
Thu, May 10, 5:01 PM · Epic, Discovery-Search
EBernhardson closed T192681: Identify and update all query building to filter on the new 'type' field, a subtask of T183282: [epic] Search cluster upgrade to 6.x, as Declined.
Thu, May 10, 5:00 PM · Epic, Discovery-Search
EBernhardson closed T192681: Identify and update all query building to filter on the new 'type' field as Declined.

We decided to not add a new type field

Thu, May 10, 5:00 PM · Discovery-Search
EBernhardson closed T192616: Update saneitizer to reindex documents that havn't been indexed in N days as Declined.

We decided to split archive into its own index, removing the need for a type field.

Thu, May 10, 5:00 PM · Discovery-Search
EBernhardson closed T192616: Update saneitizer to reindex documents that havn't been indexed in N days, a subtask of T183282: [epic] Search cluster upgrade to 6.x, as Declined.
Thu, May 10, 5:00 PM · Epic, Discovery-Search
EBernhardson added a subtask for T183282: [epic] Search cluster upgrade to 6.x: T193654: [epic] Run multiple elasticsearch clusters on same hardware.
Thu, May 10, 4:59 PM · Epic, Discovery-Search
EBernhardson added a parent task for T193654: [epic] Run multiple elasticsearch clusters on same hardware: T183282: [epic] Search cluster upgrade to 6.x.
Thu, May 10, 4:59 PM · Epic, Discovery-Search
EBernhardson added a comment to T182717: Move fine tuning of search configs to mediawiki-config.

It looks like all the supporting code has shipped and deployed, we need only to deploy the config patch now? https://gerrit.wikimedia.org/r/419367

Thu, May 10, 4:08 PM · MW-1.32-release-notes (WMF-deploy-2018-04-24 (1.32.0-wmf.1)), MW-1.31-release-notes (WMF-deploy-2018-03-20 (1.31.0-wmf.26)), Patch-For-Review, Discovery-Search (Current work), CirrusSearch, Discovery, Wikidata
EBernhardson moved T173774: Create script to dump recently changed categories from Needs review to Done on the Discovery-Search (Current work) board.
Thu, May 10, 4:07 PM · MW-1.32-release-notes (WMF-deploy-2018-05-15 (1.32.0-wmf.4)), Advanced-Search, Datasets-General-or-Unknown, Discovery-Search (Current work), Patch-For-Review, Discovery-Wikidata-Query-Service-Sprint, User-Smalyshev, Discovery, Wikidata-Query-Service, Wikidata
EBernhardson moved T192614: Resolve current deprecation warnings in elasticsearch 5 from Needs review to Waiting/Blocked on the Discovery-Search (Current work) board.
Thu, May 10, 3:36 PM · Discovery-Search (Current work), MW-1.32-release-notes (WMF-deploy-2018-05-08 (1.32.0-wmf.3)), Patch-For-Review
EBernhardson added a comment to T192614: Resolve current deprecation warnings in elasticsearch 5.

There are also some deprecation warnings coming from the phabricator index, I pinged them in T181393.

Thu, May 10, 3:36 PM · Discovery-Search (Current work), MW-1.32-release-notes (WMF-deploy-2018-05-08 (1.32.0-wmf.3)), Patch-For-Review
EBernhardson added a comment to T181393: Make sure elasticsearch 6 is supported in phabricator.

6 months is approaching :) Still don't have a date, but I am doing some preliminary work resolving current problems that would block the upgrade. Specifically, the phabricator index in elasticsearch is created with include_in_all: false set on a variety of fields. The false is perfect, and is the default value, but the property is to be removed in elasticsearch 6 and needs to be removed from the mapping.
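
To make the mapping change concrete, here is a small Python sketch that strips include_in_all from a mapping held as a plain dict. This is not phabricator's code (phabricator builds its mapping in PHP); the function name and the sample mapping are assumptions for illustration only. Keys directly under "properties" or "fields" are treated as field names, so a field that happens to be called include_in_all is left alone.

```python
# Maps whose keys are field names rather than mapping parameters.
NAME_MAPS = ("properties", "fields")


def strip_include_in_all(node, in_name_map=False):
    """Return a copy of a mapping with the include_in_all parameter removed."""
    if isinstance(node, list):
        return [strip_include_in_all(item) for item in node]
    if not isinstance(node, dict):
        return node
    cleaned = {}
    for key, value in node.items():
        # Only drop the key when it is a mapping parameter, not a field name.
        if key == "include_in_all" and not in_name_map:
            continue
        child_is_name_map = key in NAME_MAPS and not in_name_map
        cleaned[key] = strip_include_in_all(value, child_is_name_map)
    return cleaned


# Hypothetical sample mapping, not the real phabricator one.
sample = {
    "properties": {
        "title": {"type": "text", "include_in_all": False},
        "body": {
            "type": "text",
            "include_in_all": False,
            "fields": {"raw": {"type": "keyword", "include_in_all": False}},
        },
    }
}
cleaned = strip_include_in_all(sample)
```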

Thu, May 10, 3:04 PM · Release-Engineering-Team (Kanban), Phabricator
EBernhardson moved T179266: search.wikimedia.org is source of lots of 500s from Needs review to Done on the Discovery-Search (Current work) board.
Thu, May 10, 2:56 PM · Patch-For-Review, Discovery-Search (Current work), Operations

Wed, May 9

EBernhardson added a comment to T194059: Index redirect content in categories.

Redirects are indexed, but not as their own thing. Redirects in search are considered a property of the parent document. Because of this the only information kept is the namespace and title of the redirect.

Wed, May 9, 11:27 PM · CirrusSearch, Discovery, Discovery-Search
EBernhardson added a comment to T188136: Migrate Mediawiki Monolog Kafka producer to Kafka Jumbo.

Potential avenues to investigate:

  • The send timeout on mediawiki kafka is 10ms. We could try increasing? Although this should be more than enough.
  • Exceptions are currently logged to 'wfDebugLogFile' channel, but that channel looks unconfigured in our production logging so the messages are all thrown away. We could start logging that channel, or turn on a dedicated channel. Whatever errors it's emitting currently are being thrown away (there might also be some uncertainty about logging new messages while the app is shutting down and already flushing logs, hard to say).
Wed, May 9, 6:59 PM · Discovery, Analytics, Patch-For-Review, Analytics-Kanban, Analytics-Cluster
EBernhardson added a comment to T188136: Migrate Mediawiki Monolog Kafka producer to Kafka Jumbo.

php-rdkafka would be our best bet, but unfortunately they do not support hhvm and we will not likely be rid of hhvm in this calendar year.

Wed, May 9, 6:42 PM · Discovery, Analytics, Patch-For-Review, Analytics-Kanban, Analytics-Cluster

Tue, May 8

EBernhardson added a comment to T194144: Find a solution for SpecialEntitiesWithoutPage (EntitiesWithoutTermFinder).

As another option, this would be a very simple option to provide as an additional filter in fulltext search as a keyword. I'm not sure what the use cases are for this tool and if that would help.

Tue, May 8, 9:36 PM · Wikidata-Ministry-Of-Magic-Tech-Debt, Performance, Wikidata, MediaWiki-extensions-WikibaseRepository
EBernhardson added a comment to T194013: Possible deadlock in the elastic cache used by the ltr plugin.

Upstream opened a bug: https://github.com/elastic/elasticsearch/issues/30428 which has a pull request now attached, and the bug is tagged to be backported to 5.6.x. Probably we will skip 5.6 and go straight to 6.x, which will get the fix as well.

Tue, May 8, 7:37 PM · Discovery-Search, CirrusSearch, Discovery
EBernhardson added a comment to T194139: The argument //deepcategory// in CirrusSearch only reports the members of the root category for nowiki.

We might want to add a warning when using deepcat on an unsupported wiki as well.

Tue, May 8, 7:35 PM · Discovery-Search (Current work), Patch-For-Review, User-Smalyshev, CirrusSearch, Discovery
EBernhardson moved T194184: rack/setup/install wdqs10[09|10].eqiad.wmnet from Needs triage to Up Next on the Discovery-Search board.
Tue, May 8, 7:26 PM · Patch-For-Review, Discovery-Wikidata-Query-Service-Sprint, Epic, Discovery, Wikidata, Operations, Discovery-Search, Wikidata-Query-Service
EBernhardson moved T194185: Implement searching of 'depicts' on commons with the 'inscription' qualifier from Needs triage to Watching/Waiting on the Discovery-Search board.
Tue, May 8, 7:25 PM · Multimedia-Team-Working-Board, Discovery-Search, Epic, Multimedia, Structured-Data-Commons, Wikidata

Fri, May 4

EBernhardson added a comment to T193365: Evaluate using NUMA for Blazegraph.

Well, there were some high level and low level changes visible on the elasticsearch servers:

Fri, May 4, 3:57 PM · Discovery-Wikidata-Query-Service-Sprint, Discovery, Wikidata, Wikidata-Query-Service

Thu, May 3

EBernhardson added a comment to T193195: Outdated "insource" in search.

There is an automated process that visits all pages and verifies they contain the latest revision every two weeks, so it would have eventually been fixed. But it would certainly be better if the problem never occurred in the first place.

Thu, May 3, 9:15 PM · Discovery-Search, Discovery, CirrusSearch
EBernhardson moved T193605: Alert when elasticsearch writes are frozen for too long from Needs triage to Current work on the Discovery-Search board.
Thu, May 3, 8:50 PM · Patch-For-Review, Discovery-Search (Current work), Operations, Elasticsearch, CirrusSearch, Discovery
EBernhardson removed a project from T177520: Experiment with different grouping of queries that get fed into the DBN: Discovery-Search (Current work).
Thu, May 3, 8:48 PM · Discovery
EBernhardson added a comment to T192972: Evaluate impact of adding ~2700 new shards to production cluster.

Calling this one done, I've exported the report to pdf so it lasts longer:

Thu, May 3, 8:34 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson moved T192972: Evaluate impact of adding ~2700 new shards to production cluster from In progress to Done on the Discovery-Search (Current work) board.
Thu, May 3, 8:12 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson moved T193392: Deprecate global namespace handling of the prefix keyword (reverted) from Needs review to Done on the Discovery-Search (Current work) board.
Thu, May 3, 8:09 PM · MW-1.32-release-notes (WMF-deploy-2018-05-22 (1.32.0-wmf.5)), Discovery-Search (Current work), Patch-For-Review, Discovery, CirrusSearch
EBernhardson added a comment to T193605: Alert when elasticsearch writes are frozen for too long.

After https://gerrit.wikimedia.org/r/430441 it will work fairly simply. Each cluster can have the following request issued:

Thu, May 3, 2:24 AM · Patch-For-Review, Discovery-Search (Current work), Operations, Elasticsearch, CirrusSearch, Discovery

Wed, May 2

EBernhardson moved T193112: Jobs writing to the Elasticsearch cluster in codfw are timing out, causing all type of issues from In progress to Waiting/Blocked on the Discovery-Search (Current work) board.
Wed, May 2, 11:31 PM · Discovery-Search (Current work), Operations, Discovery, CirrusSearch, Search-Platform-Programs
EBernhardson moved T188530: Externalize the parsing logic from SimpleKeywordFeature and FullTextQueryStringQueryBuilder from Needs review to Done on the Discovery-Search (Current work) board.
Wed, May 2, 11:30 PM · MW-1.31-release-notes (WMF-deploy-2018-03-20 (1.31.0-wmf.26)), Patch-For-Review, Discovery-Search (Current work), Discovery, CirrusSearch
EBernhardson claimed T192614: Resolve current deprecation warnings in elasticsearch 5.
Wed, May 2, 11:29 PM · Discovery-Search (Current work), MW-1.32-release-notes (WMF-deploy-2018-05-08 (1.32.0-wmf.3)), Patch-For-Review
EBernhardson moved T192614: Resolve current deprecation warnings in elasticsearch 5 from Backlog to Needs review on the Discovery-Search (Current work) board.
Wed, May 2, 11:29 PM · Discovery-Search (Current work), MW-1.32-release-notes (WMF-deploy-2018-05-08 (1.32.0-wmf.3)), Patch-For-Review
EBernhardson moved T192614: Resolve current deprecation warnings in elasticsearch 5 from This Quarter to Current work on the Discovery-Search board.
Wed, May 2, 11:29 PM · Discovery-Search (Current work), MW-1.32-release-notes (WMF-deploy-2018-05-08 (1.32.0-wmf.3)), Patch-For-Review
EBernhardson created T193684: Reindex should retry requests for certain error classes.
Wed, May 2, 10:17 PM · Discovery-Search
EBernhardson moved T179266: search.wikimedia.org is source of lots of 500s from Backlog to Needs review on the Discovery-Search (Current work) board.
Wed, May 2, 9:30 PM · Patch-For-Review, Discovery-Search (Current work), Operations
EBernhardson claimed T179266: search.wikimedia.org is source of lots of 500s.
Wed, May 2, 9:30 PM · Patch-For-Review, Discovery-Search (Current work), Operations
EBernhardson added a comment to T192614: Resolve current deprecation warnings in elasticsearch 5.

I think I've gone through all the deprecation warnings from the last week and either resolved or submitted patches for them. There is one remaining that I haven't been able to track down:

Wed, May 2, 7:38 PM · Discovery-Search (Current work), MW-1.32-release-notes (WMF-deploy-2018-05-08 (1.32.0-wmf.3)), Patch-For-Review
EBernhardson updated the task description for T193654: [epic] Run multiple elasticsearch clusters on same hardware.
Wed, May 2, 5:48 PM · Epic, Discovery-Search
EBernhardson updated the task description for T193654: [epic] Run multiple elasticsearch clusters on same hardware.
Wed, May 2, 5:44 PM · Epic, Discovery-Search
EBernhardson created T193654: [epic] Run multiple elasticsearch clusters on same hardware.
Wed, May 2, 5:04 PM · Epic, Discovery-Search

Tue, May 1

EBernhardson added a comment to T192614: Resolve current deprecation warnings in elasticsearch 5.

Upgraded metastore on eqiad and codfw from 0.2 to 0.3 to fix more deprecation warnings about "index": "not_analyzed" which should be "index": "no". I'm not sure why, but the minor upgrade didn't work, so I forced a major (re-create and reindex) upgrade.

Tue, May 1, 8:57 PM · Discovery-Search (Current work), MW-1.32-release-notes (WMF-deploy-2018-05-08 (1.32.0-wmf.3)), Patch-For-Review
EBernhardson added a comment to T192614: Resolve current deprecation warnings in elasticsearch 5.

eqiad elasticsearch cluster (not logstash) was out of sync with the apifeatureusage template in puppet causing it to create indices with deprecation warnings. I've updated the template from the one in puppet and new indices going forward should not log deprecation warnings. Some day we have to figure out how those templates get from logstash to elasticsearch, somehow or another it wasn't auto-magically deployed (should it be?).

Tue, May 1, 8:38 PM · Discovery-Search (Current work), MW-1.32-release-notes (WMF-deploy-2018-05-08 (1.32.0-wmf.3)), Patch-For-Review
EBernhardson added a comment to T192614: Resolve current deprecation warnings in elasticsearch 5.

Surprisingly, some of these warnings are simply for very old indices. Somehow commonswiki_general is dated Feb 2017, although we've certainly done full reindexes since then. It's possible a reindex failed somewhere and we never noticed; the logs for a reindex are so large that we don't actually check they all succeed.

Tue, May 1, 7:59 PM · Discovery-Search (Current work), MW-1.32-release-notes (WMF-deploy-2018-05-08 (1.32.0-wmf.3)), Patch-For-Review

Mon, Apr 30

EBernhardson added a comment to T192972: Evaluate impact of adding ~2700 new shards to production cluster.

Finished new eqiad test with expected number of archives, notebook linked above has been updated. This shows a similar problem to the 2x-archive test, in that adding new shards to the cluster is typically finished in a reasonable timeframe, but sometimes waiting for the cluster to return to green takes several minutes (up to 5 in this test).

Mon, Apr 30, 9:46 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T192972: Evaluate impact of adding ~2700 new shards to production cluster.

Currently re-running the eqiad tests.

Mon, Apr 30, 8:27 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson created P7054 elastic_cluster_latency.py.
Mon, Apr 30, 4:55 PM · Discovery-Search, Elasticsearch

Fri, Apr 27

EBernhardson added a comment to T192972: Evaluate impact of adding ~2700 new shards to production cluster.

Basic report: http://paws-public.wmflabs.org/paws-public/User:EBernhardson_(WMF)/ElasticsearchMasterLatency/TooManyShards.ipynb

Fri, Apr 27, 9:54 PM · Patch-For-Review, Discovery-Search (Current work)

Thu, Apr 26

EBernhardson added a comment to T193112: Jobs writing to the Elasticsearch cluster in codfw are timing out, causing all type of issues.

I suppose we should lower the drop timeout, in $wgCirrusSearchDropDelayedJobsAfter, to something more reasonable as well. This was added to put a cap on the amount of time we back up into the job queue to keep its size limited, but it seems the current value of three hours created more load than the system can handle.

Thu, Apr 26, 4:49 AM · Discovery-Search (Current work), Operations, Discovery, CirrusSearch, Search-Platform-Programs
EBernhardson added a comment to T193112: Jobs writing to the Elasticsearch cluster in codfw are timing out, causing all type of issues.

It looks like writes were frozen to the codfw cluster and never thawed. Moving forward we need a timestamp indexed along with the freeze. We should then start alerting on a freeze that has lasted more than N (60?) minutes so someone can unfreeze before we start dropping jobs on the ground.
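
The alerting rule proposed above could be sketched roughly as follows in Python. This is an assumption-laden illustration, not the check that was eventually deployed: the function name is hypothetical, the freeze timestamp is assumed to be available as a timezone-aware datetime (or None when writes are not frozen), and the 60 minute threshold is the tentative N from the comment.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Tentative threshold from the comment ("N (60?) minutes").
MAX_FREEZE_AGE = timedelta(minutes=60)


def freeze_is_stale(
    frozen_at: Optional[datetime],
    now: datetime,
    max_age: timedelta = MAX_FREEZE_AGE,
) -> bool:
    """Return True when writes have been frozen for longer than max_age.

    frozen_at is the timestamp stored alongside the freeze, or None when
    the cluster is not frozen at all.
    """
    if frozen_at is None:
        return False
    return now - frozen_at > max_age


# Example: a freeze that started 90 minutes ago should alert.
started = datetime(2018, 4, 26, 1, 0, tzinfo=timezone.utc)
should_alert = freeze_is_stale(started, started + timedelta(minutes=90))
```

A monitoring check would call this once per cluster and alert on True, giving someone time to unfreeze before delayed jobs start being dropped.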

Thu, Apr 26, 4:44 AM · Discovery-Search (Current work), Operations, Discovery, CirrusSearch, Search-Platform-Programs

Wed, Apr 25

ema awarded T191236: Resolve elasticsearch latency alerts a Love token.
Wed, Apr 25, 9:14 AM · Patch-For-Review, Discovery-Search (Current work)

Tue, Apr 24

EBernhardson triaged T192972: Evaluate impact of adding ~2700 new shards to production cluster as Normal priority.
Tue, Apr 24, 10:40 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson moved T191236: Resolve elasticsearch latency alerts from In progress to Done on the Discovery-Search (Current work) board.
Tue, Apr 24, 9:38 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T191236: Resolve elasticsearch latency alerts.

Latency numbers for p95 and lower all look great, probably the most stable they've been since bringing load back to eqiad. p99 is still a bit spiky; some quick looks through graphs suggest a correlation between high io-wait and p99, but I think investigating that will need to be prioritized separately from this ticket. This looks to most likely be resolved, although a full cluster restart should follow to bring things into a consistent state. Currently -XX:+UseNUMA has been deployed to all the machines, but only 1024-31 have been restarted in this configuration. The cluster restart will coincide with some plugin updates we have in the pipeline as well.

Tue, Apr 24, 9:38 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T192693: Resolve document id clashes with unified type.

I hadn't thought about copy_to and tried it out, indeed elasticsearch seems to handle multiple types with varied copy_to on the same field correctly. I don't see any obvious solution to this while moving away from multiple types, short of adding a field that is only populated for archive documents. Looking around the cluster we have at most 50k archive documents per-wiki.

Tue, Apr 24, 9:33 PM · Discovery-Search
EBernhardson created T192940: Run analysis on query explorer ab test.
Tue, Apr 24, 5:32 PM · Product-Analytics, Discovery-Search (Current work)
EBernhardson added a comment to T192699: Build a unified namespace index.

This could probably even use the existing metastore, but need to double check what analysis chains and query types we use.

Tue, Apr 24, 4:12 PM · Discovery-Search (Current work), Patch-For-Review
EBernhardson added a comment to T191236: Resolve elasticsearch latency alerts.

It's perhaps also worth considering that GC will likely behave a little differently with numa awareness enabled. At a minimum I've seen that GC will not compact across numa regions, which means those parts of the heap are working with some fraction of the heap instead of the whole thing.

Tue, Apr 24, 12:09 AM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T191236: Resolve elasticsearch latency alerts.

Restarted elastic1024-1031 with -XX:+UseNUMA. Inspecting the state of the jvm memory maps, it looks like this causes the jvm to allocate three separate memory regions from the kernel instead of the single allocation it used before. This allocation is split between a shared heap that is interleaved between numa nodes, and then two allocations that are pinned to a specific node. The main benefit of numa awareness instead of a brute force interleave would be to have better memory locality, so I took a look (again with intel pcm):

Tue, Apr 24, 12:06 AM · Patch-For-Review, Discovery-Search (Current work)

Mon, Apr 23

EBernhardson added a comment to T192693: Resolve document id clashes with unified type.

I was actually thinking namespace could be its own single index, shared between the wikis. I suppose we could use the metastore for that; it's tiny data and fits into the metastore concept.

Mon, Apr 23, 4:48 PM · Discovery-Search
EBernhardson added a comment to T191236: Resolve elasticsearch latency alerts.

Reviewing per-node latency graphs since rolling out the numa --interleave=all approach looks like a success. Pretty much all of the minor latency spikes above cluster baseline are coming from the older systems that have not had interleave enabled.

Mon, Apr 23, 4:43 PM · Patch-For-Review, Discovery-Search (Current work)

Apr 21 2018

EBernhardson claimed T191236: Resolve elasticsearch latency alerts.
Apr 21 2018, 2:37 AM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson moved T191236: Resolve elasticsearch latency alerts from Backlog to In progress on the Discovery-Search (Current work) board.
Apr 21 2018, 2:37 AM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T192693: Resolve document id clashes with unified type.

If we split namespace into its own single index for all wikis, the main difference remaining may now only be archive documents. Perhaps we need to revisit archive and see if it makes sense to unify the two as a single document with two different states (archived/current)? Will need to think about it.

Apr 21 2018, 2:21 AM · Discovery-Search
EBernhardson triaged T192699: Build a unified namespace index as Normal priority.
Apr 21 2018, 2:20 AM · Discovery-Search (Current work), Patch-For-Review

Apr 20 2018

EBernhardson triaged T192693: Resolve document id clashes with unified type as Normal priority.
Apr 20 2018, 11:35 PM · Discovery-Search
EBernhardson renamed T192680: Update ttmserver for elasticsearch 6 from Update ttmserver for elasticsearch 6 index type removal to Update ttmserver for elasticsearch 6.
Apr 20 2018, 11:27 PM · Discovery-Search
EBernhardson added a comment to T192615: Add 'type' field to store same information as was in es5 types..

To clarify, index types are not being removed. What has been removed is the ability to have an index with more than 1 type. This means titlesuggest and ttmserver should need no direct changes, as they already meet this requirement afaict. The content/general indices contain multiple types and need this new field.

Apr 20 2018, 11:26 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson triaged T192681: Identify and update all query building to filter on the new 'type' field as Normal priority.
Apr 20 2018, 6:07 PM · Discovery-Search
EBernhardson triaged T192680: Update ttmserver for elasticsearch 6 as Normal priority.
Apr 20 2018, 6:06 PM · Discovery-Search
EBernhardson moved T192609: Search backend error during sending {numBulk} documents to the {index} index(s) after {tookMs}: {error_message} from Needs triage to Current work on the Discovery-Search board.
Apr 20 2018, 6:05 PM · Discovery-Search (Current work), MW-1.31-release-notes (WMF-deploy-2018-04-17 (1.31.0-wmf.30)), MW-1.32-release-notes (WMF-deploy-2018-04-24 (1.32.0-wmf.1)), Patch-For-Review, Discovery, CirrusSearch, Wikimedia-log-errors
EBernhardson added a comment to T192609: Search backend error during sending {numBulk} documents to the {index} index(s) after {tookMs}: {error_message}.

Having a hard time reproducing directly, although I am seeing semi-regular occurrences on mediawiki.org. For reference, this isn't limited to the page type; I've seen logs for archive as well. It's some sort of generic problem, but elasticsearch isn't logging any errors, and mediawiki isn't logging any useful errors. Will need to revisit what is logged on the mediawiki side after figuring out what should have been logged here.

Apr 20 2018, 1:30 AM · Discovery-Search (Current work), MW-1.31-release-notes (WMF-deploy-2018-04-17 (1.31.0-wmf.30)), MW-1.32-release-notes (WMF-deploy-2018-04-24 (1.32.0-wmf.1)), Patch-For-Review, Discovery, CirrusSearch, Wikimedia-log-errors

Apr 19 2018

EBernhardson removed a project from T192615: Add 'type' field to store same information as was in es5 types.: Epic.
Apr 19 2018, 11:37 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson triaged T192616: Update saneitizer to reindex documents that havn't been indexed in N days as Normal priority.
Apr 19 2018, 11:36 PM · Discovery-Search
EBernhardson triaged T192615: Add 'type' field to store same information as was in es5 types. as Normal priority.
Apr 19 2018, 11:35 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson removed a project from T192614: Resolve current deprecation warnings in elasticsearch 5: Epic.
Apr 19 2018, 11:34 PM · Discovery-Search (Current work), MW-1.32-release-notes (WMF-deploy-2018-05-08 (1.32.0-wmf.3)), Patch-For-Review
EBernhardson triaged T192614: Resolve current deprecation warnings in elasticsearch 5 as Normal priority.
Apr 19 2018, 11:34 PM · Discovery-Search (Current work), MW-1.32-release-notes (WMF-deploy-2018-05-08 (1.32.0-wmf.3)), Patch-For-Review
EBernhardson added a comment to T192609: Search backend error during sending {numBulk} documents to the {index} index(s) after {tookMs}: {error_message}.

The error message seems to be mostly of the form:

Apr 19 2018, 10:51 PM · Discovery-Search (Current work), MW-1.31-release-notes (WMF-deploy-2018-04-17 (1.31.0-wmf.30)), MW-1.32-release-notes (WMF-deploy-2018-04-24 (1.32.0-wmf.1)), Patch-For-Review, Discovery, CirrusSearch, Wikimedia-log-errors
EBernhardson added a comment to T191236: Resolve elasticsearch latency alerts.

Cluster just alerted on latency again. elastic1027 and 1025 have pushed server load > # of cores and are now doing about 2x the latency of other servers. The initial spike started at about 21:40 UTC; I was able to start taking measurements at 21:50 UTC after it alerted. Latency was still abnormally high, resolving around 22:15.

Apr 19 2018, 8:53 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T191236: Resolve elasticsearch latency alerts.

Ori had suggested looking at memory installed, and it brought up a pertinent point with respect to the memory bandwidth:

Apr 19 2018, 5:30 PM · Patch-For-Review, Discovery-Search (Current work)

Apr 18 2018

EBernhardson added a comment to T123442: Pageview API: Better filtering of bot traffic on top enpoints.

I was talking to someone about bot detection, and they mentioned that they have gotten good mileage in bot filtering by grading ip addresses by the ratio of html pages requested. I ran a quick query against a day's webrequest logs to get a top level idea of what's plausible:
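
For readers unfamiliar with the grading idea, a rough Python sketch of the per-ip ratio follows. It is not the query that was run against the webrequest logs; the function name, the (ip, content_type) record shape and the sample rows are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def html_ratio_by_ip(requests: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return, per ip, the fraction of its requests that were html pages.

    Each record is an (ip, content_type) pair. A client that fetches html
    but almost none of the images, css or js a browser would also load
    ends up with a ratio close to 1.0.
    """
    total: Counter = Counter()
    html: Counter = Counter()
    for ip, content_type in requests:
        total[ip] += 1
        if content_type.startswith("text/html"):
            html[ip] += 1
    return {ip: html[ip] / total[ip] for ip in total}


# Made-up sample rows, not real log data.
sample_requests = [
    ("10.0.0.1", "text/html"),
    ("10.0.0.1", "image/png"),
    ("10.0.0.1", "text/css"),
    ("10.0.0.1", "image/png"),
    ("10.0.0.2", "text/html; charset=UTF-8"),
]
ratios = html_ratio_by_ip(sample_requests)
```

Where to put the cutoff between browser-like and bot-like ratios is exactly what a query over real logs would need to inform; the sketch only shows how the ratio is computed.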

Apr 18 2018, 10:42 PM · Analytics, Pageviews-API
EBernhardson updated subscribers of T191236: Resolve elasticsearch latency alerts.
Apr 18 2018, 8:14 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T191236: Resolve elasticsearch latency alerts.

High level theory: We are overconsuming some resource on the machines. This is basically IO (network, disk), CPU, and Memory. IO was a problem in the past, but doesn't look like a problem this time around. So I grabbed intel's performance counter monitor and used it to look at some top level cpu/io stats and look for differences between the older servers performing poorly, and the newer servers that are doing well.

Apr 18 2018, 8:06 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson edited P7007 (An Untitled Masterwork).
Apr 18 2018, 3:34 PM