This will still send a lot of logs next time we do a reindex. We should be doing one in the near future. I can leave this open until the extra logs are removed.
Apr 15 2020
Ended up tweaking it just a little bit, so I'll leave that here on the off chance someone else (or me) needs it:
import elasticsearch
import sys
Apr 13 2020
From the analysis chain analysis comparing the chain with and without the homoglyph token filter on a sample of 10,000 random articles for each language:
Discussed, and this might be a task better served by SRE tooling, possibly for a future Search Platform SRE person.
Apr 9 2020
Apr 8 2020
I tested and everything works. Thanks @elukey so much for all of your help getting this done! I'm going to go ahead and mark this as closed.
Apr 6 2020
Apr 2 2020
Mar 18 2020
Mar 11 2020
Instead of filtering the query string queries, we want to move off of query string for spaceless languages and on to the full text simple match (FTSM) query builder. This will help when we upgrade Elasticsearch and no longer use query strings. In order to make this move, the FTSM query builder has to be tested in relforge for Japanese. There's currently an upgrade going on with relforge, so this task will be paused until the python upgrade for relforge is complete.
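To make the difference between the two query types concrete, here is a minimal sketch of the request bodies, written as plain Python dicts. It only contrasts the stock Elasticsearch `query_string` and `match` queries; it is not the CirrusSearch FTSM query builder, which builds something more elaborate. The function names and the field name `text` are assumptions for illustration.

```python
# Sketch only: contrasts two stock Elasticsearch query types.
# The helper names and the "text" field are hypothetical.

def query_string_body(user_query, field="text"):
    """Build a query_string search body.

    query_string runs the input through a query parser, so operators
    such as AND, OR, quotes and wildcards in user input are interpreted.
    """
    return {
        "query": {
            "query_string": {
                "query": user_query,
                "default_field": field,
            }
        }
    }


def match_body(user_query, field="text"):
    """Build a match search body.

    match analyzes the input with the field's analyzer and does not
    interpret query syntax, so the input is treated as plain text.
    """
    return {
        "query": {
            "match": {
                field: {
                    "query": user_query,
                    "operator": "and",
                }
            }
        }
    }
```

Either body could be passed as the request body of a search call. The point of the comparison is that the `match` variant never parses operators out of the user's input.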
Mar 10 2020
Mar 5 2020
Feb 13 2020
Config changes have been deployed, but due to a configuration conflict with jawiki using the default for wgCirrusSearchFullTextQueryBuilderProfile (see https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Spaceless_Writing_Systems_and_Wiki-Projects), some further changes will have to be made to have the model name show up here.
Feb 10 2020
@Jdforrester-WMF WMDE will be taking on responsibility for any new deployment methods. That work will be tracked in T192006 and T210286.
Feb 7 2020
Jan 21 2020
@Addshore that's correct, after removing the gui submodule, I won't be doing any further work
Jan 17 2020
Very exciting to see it work here: https://gerrit.wikimedia.org/r/c/search/extra/+/563267. I know @Gehel mentioned trying to refactor so that the same job runs for both pre-merge and post-merge, but after chatting with @Jdforrester-WMF, it seems that the convention is to have separate pre-merge and post-merge jobs. I would be happy to call this done.
Tabling this for now as it's not urgent
After a bunch of discussion with the team, it's been decided that removing the gui submodule from the RDF repository will suffice for now. That will fix our broken build issues (see https://phabricator.wikimedia.org/T242640). @Ladsgroup I definitely think you should work on that patch and getting things going with service runner if you have the bandwidth.
Jan 16 2020
@kostajh Do we still need to separate sonar args for master vs. non-master branches then? It seems that we should be able to send all of the same sonar args whether or not the branch is master. I'm not sure how to tell the bot if something is pre- or post-merge.
Jan 13 2020
Had a quick sync meeting with WMDE. The outcome of that was to use this node patch as a starting point for service runner. It's unclear whether or not blubber needs to be involved in this process. Also, ideally the public image for the WDQS UI would be eliminated in favor of the new image used for this new build process.
Jan 10 2020
@Gehel I think we can consider this closed unless someone is able to reproduce
Jan 9 2020
@akosiaris Could we possibly use miscweb in front of a VM as an interim to serve up the static files before moving to service template?
@kostajh everything has been merged, and the code health job runs with sonar analysis after a patch for java projects. However we're not seeing any results from bots in the test patch in search extra (https://gerrit.wikimedia.org/r/563250) with analysis here: https://sonarcloud.io/project/activity?id=org.wikimedia.search%3Aextra-parent. Does the bot know about java projects?
You're right, it's a typo. It should be /run-java.sh. Pushing up a patch now.
For clarification, the correct response will contain a list that looks like this:
@prefix schema: <http://schema.org/> .
@prefix pq: <http://www.wikidata.org/prop/qualifier/> .
@prefix pr: <http://www.wikidata.org/prop/reference/> .
@prefix ps: <http://www.wikidata.org/prop/statement/> .
and the incorrect response is HTML that looks similar to:
<!DOCTYPE html><html lang="en" dir="ltr"><head><meta charset="utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta name="viewport" content="width=device-width,initial-scale=1,user-scalable=yes"><link rel="stylesheet" href="css/style.min.6c0e4865f687302c4d99.css"><link id="favicon" rel="shortcut icon"><script src="js/shim.min.6d0a3b4d4b50e4f73d3e.js"></script><style id="MJX-CHTML-styles">/* placeholder for MathJax */</style></head><body><div class="wikibase-queryservice container-fluid">
Jan 8 2020
From inside any of the WDQS machines ('wdqs1004.eqiad.wmnet', 'wdqs1005.eqiad.wmnet', 'wdqs1006.eqiad.wmnet', 'wdqs1007.eqiad.wmnet'), the following curls return the correct data:
curl localhost:80/bigdata/ldf -> direct to nginx server on host
curl localhost:9999/bigdata/ldf -> direct to query service on host
But curl https://query.wikidata.org/bigdata/ldf is not working, indicating some problem with routing traffic.
This could be caused by the recent switch from varnish to apache.
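The manual curl checks above can be sketched as a small script that walks the layers from the query service outward and reports the first one that fails or serves the HTML page instead of RDF. This is only a sketch: the URLs come from the comment (the `http://` scheme for localhost is an assumption), and `fetch_body`, `looks_like_html`, `first_bad_layer` and `LAYERS` are hypothetical names.

```python
import urllib.request

# Layers ordered from the backend outward. URLs are taken from the
# comment; the http:// scheme on localhost is an assumption.
LAYERS = [
    ("query service on host", "http://localhost:9999/bigdata/ldf"),
    ("nginx on host", "http://localhost:80/bigdata/ldf"),
    ("public endpoint", "https://query.wikidata.org/bigdata/ldf"),
]


def fetch_body(url, timeout=10):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def looks_like_html(body):
    """True if the body starts with an HTML doctype (the bad case)."""
    return body.lstrip().lower().startswith("<!doctype html")


def first_bad_layer(layers, fetch=fetch_body):
    """Return the name of the first layer that errors or serves HTML.

    Returns None if every layer returns a non-HTML body.
    """
    for name, url in layers:
        try:
            body = fetch(url)
        except OSError:
            # urllib's URLError and HTTPError are OSError subclasses.
            return name
        if looks_like_html(body):
            return name
    return None
```

Run on one of the WDQS hosts, `first_bad_layer(LAYERS)` would be expected to name the public endpoint if the local layers are healthy and only the routing in front of them is at fault. The `fetch` parameter exists so the walk can be exercised without network access.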
Dec 21 2019
Dec 20 2019
I am all for reducing duplication but in this case, perhaps we can see if we get it working first and then try to reduce the duplication?
Dec 18 2019
What I was trying to say is that all of the projects are currently sending their analysis to SonarQube, so I didn't want to change any postmerge jobs. I put what I thought in the patch
I think specifically the updates are around this ticket, https://phabricator.wikimedia.org/T235833
Thanks @Jdforrester-WMF
@kostajh I had some questions about the layout. I know you said to create an extension-codehealth-java similar to https://github.com/wikimedia/integration-config/blob/master/zuul/layout.yaml#L884. There doesn't seem to be a standardized pattern for the URLs for non-extension projects in the same way. Also, some of the java projects have post-merge jobs that differ from each other. I was thinking of adding codehealth separately to each project, in these blocks: https://gerrit.wikimedia.org/r/plugins/gitiles/integration/config/+/master/zuul/layout.yaml#8115. I'm not sure if that will affect future plans that you have for the codehealth stuff.
@Gehel, @EBernhardson mentioned that our new Elasticsearch cluster version doesn't have the same issue with data replication when upgrading the cluster, which means that stopping writes might be less important in the next upgrade.
Dec 13 2019
Dec 12 2019
I put a WIP patch out. I haven't included the changes needed for the php renaming or anything that will have to happen in the jib directory. Those could also go in a separate patch if that makes things more readable. I would love to get feedback and I was wondering if there were any other ways to test outside of running the docker container locally which I've been doing. Also, I plan to put the project/job-template in the search.yaml file, that seems like a reasonable home.
Dec 6 2019
I too am requesting Kerberos credentials for the stat and notebook machines. My username is mstyles
Dec 3 2019
Thanks so much for your very detailed write-up. I hope that can make it into official documentation somewhere.
email address: mstyles@wikimedia.org
wikitech name: mstyles
Dec 2 2019
Nov 27 2019
Nov 26 2019
Yep, I'll take a look