Thu, Jun 13
Interesting, I wonder what's different:
$ docker run --name some-mariadb -e MYSQL_ROOT_PASSWORD=password -d mariadb:10.3.15
a8a35ac96e5556b78a813df08eec9198e0ef6499e37428783d56ddfb8c531a65
Pulled a mysql 5.7.26 container, and indeed in boolean mode mysql doesn't like +-. Interestingly, I tried with mariadb 10.3.15 and it emits the same error. Does this suggest we're doing some sort of different query building on mysql vs. mariadb?
Tue, Jun 11
Thanks! The string keys thing is a bit of a shame. I did some googling and there are some flags to try flipping (hive.metastore.try.direct.sql = false on the metastore daemon), but no clue what the knock-on effects are. This may still be useful in some cases even if it only handles strings.
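For reference, that flag would be a hive-site.xml property on the metastore host (a sketch only; the exact file location and description wording depend on the deployment):

```xml
<!-- hive-site.xml on the metastore daemon's host (sketch) -->
<property>
  <name>hive.metastore.try.direct.sql</name>
  <value>false</value>
  <description>Fall back to the ORM path instead of direct SQL for
  metastore queries; might avoid the string-key limitation, but the
  knock-on effects (notably performance) are untested.</description>
</property>
```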
Mon, Jun 10
Fri, Jun 7
Thu, Jun 6
Tue, Jun 4
Certainly matching an _all field is the only thing reasonably performant here. We could create a descriptions_all if that's needed.
Mon, Jun 3
not exactly on-wiki, but this is being worked on in a tool: https://tools.wmflabs.org/global-search/
The hive tables in glent.db need to be reassigned to the analytics-search user. Otherwise all the data is droppable.
Zero is perhaps too low a threshold. While not expected, it's not a big deal if 1 or 100 updates fail in a short blip. What we want to know is whether updates start failing at some significant percentage that requires intervention. Perhaps alert if >1% of updates fail?
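A minimal sketch of that alerting rule (function and parameter names are hypothetical, not from any actual monitoring config):

```python
def should_alert(failed: int, total: int, threshold: float = 0.01) -> bool:
    """Alert only when the failure rate crosses a meaningful percentage,
    rather than paging on every transient blip of 1-100 failed updates."""
    if total == 0:
        # no updates attempted; nothing to alert on
        return False
    return failed / total > threshold
```

So a short blip of a few failures out of thousands of updates stays quiet, while a sustained 2% failure rate trips the alert.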
Fri, May 31
Everything looks to be running appropriately. Thanks!
Thu, May 30
seems reasonable enough, should be relatively easy to implement.
Wed, May 29
you won't be able to include the relevance sort in global-search; it's a highly customized thing that needs the full CirrusSearch implementation to work out. The remainder are property-based sorts and can all be ported to global-search easily enough.
hiprand looks to be rocm specific, it should come from the package here: http://repo.radeon.com/rocm/apt/debian/pool/main/r/rocrand/
Tue, May 28
Moving it to stat1007 seems reasonable to me. A test run of the script using sudo shows it completing as expected. I expect the log output to /var/log/refinery on stat1007 will fail, as the analytics-search user does not have write access there. Since this isn't really refinery, should we create a new log directory in /var/log/analytics-search or some such to emit to?
We have an analytics-search user, a member of the analytics-search group. This was primarily done because we needed an account search platform could sudo to to submit oozie jobs. Would it make sense to also run the systemd timer under this user and add it to the analytics group for read access to hive-site.xml?
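A sketch of what that could look like in the timer's service unit (the unit name and ExecStart path are hypothetical; SupplementaryGroups= requires a reasonably recent systemd):

```ini
# hypothetical unit: /lib/systemd/system/search-hive-cleanup.service
[Service]
User=analytics-search
Group=analytics-search
# grants read access to files owned by the analytics group, e.g. hive-site.xml
SupplementaryGroups=analytics
ExecStart=/usr/local/bin/search-hive-cleanup
```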
Fri, May 24
This is still only taking group0 updates; rolling out group1 updates is waiting on figuring out a proper inbound load balancer for job runners -> cloudelastic. Without this, a single unavailable host in cloudelastic will result in a constant stream of errors in logstash.
Didn't end up building a presentation, but talked to a few people at hackathon and helped create https://tools.wmflabs.org/global-search/
No matches in last 7 days, this looks resolved.
Error message changed today:
Traceback (most recent call last):
  File "/srv/deployment/analytics/refinery/bin/refinery-drop-hive-partitions", line 153, in <module>
    hive.drop_partitions(table, partition_specs_to_drop)
  File "/srv/deployment/analytics/refinery/python/refinery/hive.py", line 190, in drop_partitions
    return self.query(q, use_tempfile=True)
  File "/srv/deployment/analytics/refinery/python/refinery/hive.py", line 355, in query
    out = self.script(f.name, check_return_code)
  File "/srv/deployment/analytics/refinery/python/refinery/hive.py", line 365, in script
    return self._command(['-f', script], check_return_code)
  File "/srv/deployment/analytics/refinery/python/refinery/hive.py", line 372, in _command
    return sh(cmd, check_return_code)
  File "/srv/deployment/analytics/refinery/python/refinery/util.py", line 121, in sh
    .format(command_string, p.returncode), stdout, stderr)
RuntimeError: ('Command: hive --service cli --database discovery -f /tmp/tmp-hive-query-cFTKZi.hiveql failed with error code: 1', '', 'Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8\nPicked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8\nPicked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8\nPicked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8\nlog4j:WARN No such property [maxBackupIndex] in org.apache.log4j.DailyRollingFileAppender.\n\nLogging initialized using configuration in file:/etc/hive/conf.analytics-hadoop/hive-log4j.properties\nOK\nTime taken: 0.874 seconds\nFAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Got exception: java.io.IOException Failed to move to trash: hdfs://analytics-hadoop/wmf/data/discovery/query_clicks/hourly/year=2019/month=2/day=19/hour=11\n')
Thu, May 23
We made a few fixes for other cloudelastic tickets and it seems to have resolved this as well.
The documentation isn't amazing, but I feel like we need user feedback to improve it much. Will wait and see what people ask about, and expand the parts people are curious about.
Copying from the merged task: this should be fixed by the following patch, which runs on this week's train.
I took a closer look at this and indeed, the mapping is lowercasing all queries to the template field (and we only have a single analysis chain applied). We can probably simply change the analysis chain there; it will require a quick review that template boosting is using appropriately cased template names across the wikis that have configured it.
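For illustration only (the analyzer names below are assumptions, not the actual CirrusSearch mapping): the change amounts to swapping a keyword analyzer that lowercases for one that preserves case, roughly:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        },
        "case_sensitive_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": []
        }
      }
    }
  }
}
```

Since existing documents were indexed through the lowercasing chain, this would also need a reindex of the affected field.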
Patches are progressing to fix this, I think there is intention to cherry-pick the change if it can be finished up before no-deploy friday.
Wed, May 22
@Trizek-WMF Seems this might be useful to announce in tech news? I'm not entirely sure, but it seems global wikitext search ought to be useful to a number of editors.
Sorry to take so long, been out recently. I documented my process on wikitech in my user space but would be happy to answer any questions.