Create puppet configs for CQS. These will be applied to CQS once the servers are available and racked.
Description
Details
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| Role for SDoC WDQS | operations/puppet | production | +137 -15 |
| Role for SDoC WDQS | operations/puppet | production | +137 -15 |
Event Timeline
The configuration changes for SDC data are as follows. The namespace 'sdc' is used here to store RDF data in the Blazegraph journal; it can be changed as needed. Keeping the same namespace as Wikidata (wdq) is not recommended: it could cause conflicts if the services are deployed on a shared server (should such a configuration be implemented), and it could lead to queries addressing the wrong namespace in the Blazegraph journal and returning incorrect data.
- Blazegraph journal config (RWStore.properties)
Replace the equivalent WDQS configuration (search for the com.bigdata.namespace.wdq prefix to find the parameters to replace):
# Bump up the branching factor for the lexicon indices on the default kb.
com.bigdata.namespace.sdc.lex.BLOBS.com.bigdata.btree.BTree.branchingFactor=400
com.bigdata.namespace.sdc.lex.ID2TERM.com.bigdata.btree.BTree.branchingFactor=599
com.bigdata.namespace.sdc.lex.TERM2ID.com.bigdata.btree.BTree.branchingFactor=300
# Bump up the branching factor for the statement indices on the default kb.
com.bigdata.namespace.sdc.spo.JUST.com.bigdata.btree.BTree.branchingFactor=1024
com.bigdata.namespace.sdc.spo.OSP.com.bigdata.btree.BTree.branchingFactor=866
com.bigdata.namespace.sdc.spo.POS.com.bigdata.btree.BTree.branchingFactor=954
com.bigdata.namespace.sdc.spo.SPO.com.bigdata.btree.BTree.branchingFactor=934
Note that the final configuration should be adjusted for the real production data according to the instructions in T232768.
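The prefix replacement described above could be scripted rather than done by hand. The following is a minimal sketch, not part of the WDQS tooling: the function name `rename_namespace` is hypothetical, and it assumes the namespace-scoped keys all start with `com.bigdata.namespace.<name>.` as in the listing above. It rewrites only keys at the start of a line, so comments and values that merely mention the old namespace are left alone.

```python
def rename_namespace(properties_text: str, old: str = "wdq", new: str = "sdc") -> str:
    """Rewrite namespace-scoped property keys from `old` to `new`.

    Hypothetical helper: only lines whose key starts with
    com.bigdata.namespace.<old>. are changed.
    """
    old_prefix = f"com.bigdata.namespace.{old}."
    new_prefix = f"com.bigdata.namespace.{new}."
    out = []
    for line in properties_text.splitlines(keepends=True):
        stripped = line.lstrip()
        # Comments start with '#', so they never match the key prefix.
        if stripped.startswith(old_prefix):
            indent = line[: len(line) - len(stripped)]
            line = indent + new_prefix + stripped[len(old_prefix):]
        out.append(line)
    return "".join(out)
```

Reading RWStore.properties, passing its text through this function, and reviewing the diff before writing it back would keep the change auditable. The branching factor values themselves still need the tuning mentioned in T232768.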
- Scripts that load data or run the Updater should be called with the proper namespace:
On data load:
./loadRestAPI.sh -n wdq -d `pwd`/data/split
replace with
./loadRestAPI.sh -n sdc -d `pwd`/data/split
On single file load:
./loadRestAPI.sh -n wdq -d `pwd`/data/split/wikidump-000000001.ttl.gz
replace with
./loadRestAPI.sh -n sdc -d `pwd`/data/split/wikidump-000000001.ttl.gz
On run updater:
./runUpdate.sh -n wdq
replace with
./runUpdate.sh -n sdc
On any calls to Blazegraph REST, instead of
http://localhost:9999/bigdata/namespace/wdq/sparql
use
http://localhost:9999/bigdata/namespace/sdc/sparql
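To avoid hard-coding the namespace in every caller, the endpoint URL can be derived from a single namespace value. This is a small sketch under stated assumptions: the function names are hypothetical, the host and port defaults are taken from the URLs above, and the query is passed as the standard SPARQL protocol `query` parameter on a GET request.

```python
from urllib.parse import quote, urlencode


def sparql_endpoint(namespace: str, host: str = "localhost", port: int = 9999) -> str:
    """Build the Blazegraph SPARQL endpoint URL for a namespace."""
    # quote() guards against characters that are not valid in a path segment.
    return f"http://{host}:{port}/bigdata/namespace/{quote(namespace, safe='')}/sparql"


def query_url(namespace: str, sparql: str) -> str:
    """Build a GET URL that sends `sparql` to the namespace's endpoint."""
    return sparql_endpoint(namespace) + "?" + urlencode({"query": sparql})
```

With this, switching a client from Wikidata to SDC data is a matter of passing "sdc" instead of "wdq", which reduces the risk of a stray request hitting the wrong namespace.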
The categories store might need similar changes, but whether separate categories are needed for production SDC data still has to be discussed.
Matt's initial work has gotten us most of the way there. I reviewed what's available now and booted a test instance to see if puppet can fully set up a new instance from scratch (hint: no).
Background info
- The current sdcquery instance applies the role::wdqs::labs.
- There is a tiny amount of instance-specific puppet config in horizon.
- Most puppet config is in the main puppet repo at hieradata/cloud/eqiad1/wikidata-query/common.yaml. This config is for a wikidata query service though, not the SDoC query service.
Issues to address
- Logging is configured to ship to deployment-logstash2, but logstash-beta.wmflabs.org doesn't report any logs from the existing instance or the one I recently booted.
- The instance is also configured with the beta cluster eventgate endpoint (for request logging). Once an instance is running we can verify correct operation. Even if we get the events flowing, follow-up work will be required if we want to do anything with them.
- Puppet currently includes the updater and categories; afaik we want those disabled for the new instance.
- Brand new instances currently don't complete the puppet run because it tries to clone the wdqs repo into a parent directory that doesn't exist.
- Once the instance was up it only responded for a few minutes, after which jetty reports Service Unavailable on /bigdata/. It's unclear yet what causes this.
- The current sdcquery sets use_deployed_config=true, which means puppet doesn't control the blazegraph configuration; it's whatever happens to be on the machine.
- Currently profile::query_service::blazegraph installs the primary blazegraph instance, but profile::query_service::categories installs the categories-specific blazegraph instance. It's unclear if sdoc should have a new role and new profile, or if a new sdoc role should point at the current blazegraph profile. There is a significant amount of duplication between the blazegraph and categories query_service profiles, but puppet doesn't make it super elegant to put all these things together generically and still understand what the differences are.
Probably more, but this is an initial look through. I'm going to put together a patch to address some of the above and get the instance starting from a cold boot, but there are some open questions above that could use any insight others might have.
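For the Service Unavailable symptom on /bigdata/ listed above, knowing exactly when the instance stops responding makes it easier to line the failure up with the jetty and blazegraph logs. The following is a rough sketch only, with hypothetical function names; the URL, interval and check count are assumptions to adjust for the instance under test.

```python
import time
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0):
    """Return the HTTP status code for url, or None if the connection failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Raised for 4xx/5xx responses such as 503 Service Unavailable.
        return err.code
    except OSError:
        # URLError, timeouts and refused connections all derive from OSError.
        return None


def first_failure(probe_fn, url, interval=10.0, max_checks=60, sleep=time.sleep):
    """Poll url until it stops answering 200.

    Returns (check_index, status) for the first non-200 result,
    or None if every check succeeded.
    """
    for i in range(max_checks):
        status = probe_fn(url)
        if status != 200:
            return i, status
        sleep(interval)
    return None
```

Calling `first_failure(probe, "http://localhost:9999/bigdata/")` right after boot and noting the wall-clock time of the returned check would give a timestamp to search for in the service logs.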
Change 595041 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/puppet@production] Role for SDoC WDQS
Instructions for booting a new instance. Currently this requires pointing the instance at a puppetmaster in the wikidata-query project.
- Start a new instance in horizon. Use debian-stretch and m1.large.
- Set the puppetmaster to wdqspuppet.wikidata-query.eqiad.wmflabs
- Apply hiera first, then roles
- Run sudo puppet agent -tv to apply the change now (or wait and it will happen eventually).
hiera:
profile::query_service::blazegraph_heap_size: 6g
profile::query_service::blazegraph_use_deployed_config: false
profile::query_service::data_dir: /srv/wdqs-data
profile::query_service::forward_rsyslog_host: deployment-logstash03.deployment-prep.eqiad.wmflabs
profile::query_service::load_categories: none
profile::query_service::package_dir: /srv/wdqs-package
puppetmaster: wdqspuppet.wikidata-query.eqiad.wmflabs
roles:
role::labs::lvm::srv
role::wdqs::sdoc
Change 602102 had a related patch set uploaded (by Gehel; owner: EBernhardson):
[operations/puppet@production] Role for SDoC WDQS
Change 602102 abandoned by Gehel:
Role for SDoC WDQS
Reason:
replaced by I11763c3ebbfa21e958a5933573eef627b134e573