
Install and configure new WDQS nodes on codfw
Closed, ResolvedPublic

Description

We have 2 servers ready and available in codfw for WDQS (T142864). Configuration must now take place.

I want to take this opportunity to start using the default partitioning config and move data to /srv. This requires some changes in the way we deploy / configure blazegraph, and thus some discussion with @Smalyshev to make sure the situation is optimal both from an operations and from a dev perspective.

Event Timeline

Gehel created this task. Aug 31 2016, 9:47 AM
Restricted Application added a project: Wikidata. Aug 31 2016, 9:47 AM
Restricted Application added a subscriber: Aklapper.

Ideally, I'd like to have the data directory entirely configurable from puppet. At the moment, this is managed through symlinks declared both in puppet and in scap configuration. We should be able to simplify this by:

  • move RWStore.properties to /etc/wdqs/RWStore.properties, under puppet control
  • use a template to generate journal file location: com.bigdata.journal.AbstractJournal.file=<%= @data_dir %>/wikidata.jnl
  • modify runBlazegraph.sh to use this new configuration file: -Dcom.bigdata.rdf.sail.webapp.ConfigParams.propertyFile=/etc/wdqs/RWStore.properties
  • remove /srv/deployment/wdqs/wdqs/wikidata.jnl symlink
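The templating step above can be sketched roughly as follows: expand the ERB-style placeholder in the RWStore.properties template with the puppet-managed data directory. The data_dir value here is illustrative; puppet would supply the real one.

```shell
#!/bin/sh
# Sketch, assuming a hypothetical data_dir of /srv/wdqs: render the
# journal-file property line from the ERB-style template shown above.
data_dir="/srv/wdqs"
template='com.bigdata.journal.AbstractJournal.file=<%= @data_dir %>/wikidata.jnl'
printf '%s\n' "$template" | sed "s|<%= @data_dir %>|${data_dir}|"
# prints: com.bigdata.journal.AbstractJournal.file=/srv/wdqs/wikidata.jnl
```

In the real setup this substitution would be done by puppet's ERB templating rather than sed; the sketch only shows the intended input/output.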

Additional cleanup (nice but not absolutely necessary at this point):

  • externalize log4j configuration from blazegraph-service-*-dist.war, or at least the path to the rules.log file. This could be done in different ways:
    • -Dlog4j.configurationFile=/etc/wdqs/blazegraph-log4j.xml added to JVM options
    • use of environment variables for the path of the log file
    • use an include in the log4j configuration inside of the blazegraph war
  • remove /srv/deployment/wdqs/wdqs/rules.log symlink
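Putting the two externalized paths together, runBlazegraph.sh might end up assembling JVM options like the following. This is a hedged sketch using the paths proposed above; the actual java invocation and any extra flags are elided.

```shell
#!/bin/sh
# Sketch of the JVM flags runBlazegraph.sh would gain if both the RWStore
# properties file and the log4j configuration lived under /etc/wdqs.
# The paths are the ones proposed in this task, not necessarily deployed ones.
JVM_OPTS="-Dcom.bigdata.rdf.sail.webapp.ConfigParams.propertyFile=/etc/wdqs/RWStore.properties"
JVM_OPTS="$JVM_OPTS -Dlog4j.configurationFile=/etc/wdqs/blazegraph-log4j.xml"
echo "$JVM_OPTS"
# java $JVM_OPTS ... (actual service start omitted)
```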

All those changes would move part of the configuration outside of the wdqs-deploy repo, and thus outside of the developers' full control. It might have an impact on the way developers work on wdqs.

Change 307868 had a related patch set uploaded (by Smalyshev):
Make config file path configurable

https://gerrit.wikimedia.org/r/307868

Change 307868 merged by jenkins-bot:
Make config file path configurable

https://gerrit.wikimedia.org/r/307868

Smalyshev moved this task from Needs triage to WDQS on the Discovery board. Aug 31 2016, 11:30 PM
Gehel added a comment. Sep 5 2016, 4:11 PM

The cleanup of rules.log file is low priority and tracked on T144539. It will not be done as part of this task.

Mentioned in SAL [2016-09-08T08:52:56Z] <gehel> initial data import on wdqs codfw cluster - T144380

Mentioned in SAL [2016-09-08T12:52:04Z] <gehel> redeploying wdqs on wdqs2002.codfw.wmnet - T144380

Gehel added a comment. Sep 8 2016, 5:32 PM

RWStore.properties is not actually created during scap deployment (we get the default one). There is some issue with the scap groups. I need to dig a bit deeper into how scap works...

Gehel added a comment. Sep 9 2016, 4:44 PM

I'm starting to understand a few things about scap:

  • the [codfw.wmnet] section in scap.cfg is activated if we deploy from codfw, not to codfw
  • --environment cannot be used to enable a specific section

It might just be easier to deploy a templated RWStore.properties on both eqiad and codfw, each using a different vars.yaml with the correct path to the data file. Especially since this is only a temporary solution until we move data to the same place on both eqiad and codfw.
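The "one template, different vars per datacenter" idea above can be sketched as follows. The site names (eqiad/codfw) are real; the directory values are illustrative stand-ins for what each site's vars.yaml would carry.

```shell
#!/bin/sh
# Sketch: select a per-site data directory, then render the journal path.
# /var/lib/wdqs for eqiad is a hypothetical stand-in for the old location;
# /srv/wdqs is the new default-partitioning location proposed in this task.
site="codfw"
case "$site" in
  eqiad) data_dir="/var/lib/wdqs" ;;   # hypothetical legacy location
  codfw) data_dir="/srv/wdqs" ;;       # new /srv-based location
esac
printf 'com.bigdata.journal.AbstractJournal.file=%s/wikidata.jnl\n' "$data_dir"
# prints: com.bigdata.journal.AbstractJournal.file=/srv/wdqs/wikidata.jnl
```

In scap terms, the case statement stands in for per-environment vars.yaml files feeding the same template.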

the [codfw.wmnet] section in scap.cfg is activated if we deploy from codfw, not to codfw

That looks like a bug. Who needs different behavior on the same target depending on deployment host? Should we report it somewhere?

But if that doesn't work, let's do templates everywhere.

One strange thing that I see is that the symlink is still created, even though the check is listed only for the canary and wdqs groups. I wonder what's going on there.

the [codfw.wmnet] section in scap.cfg is activated if we deploy from codfw, not to codfw

That looks like a bug. Who needs different behavior on the same target depending on deployment host? Should we report it somewhere?
But if that doesn't work, let's do templates everywhere.

What I understood is that there is a concept of "realm" and a concept of "environments", which are similar and confusing, but not quite the same. I had some conversations with @thcipriani about that, and spent a lot of time reading documentation and code, but I'm still kind of lost.

Overall, I don't think scap3 is meant to do what we are trying to do with it.

Change 309961 had a related patch set uploaded (by Gehel):
wdqs - use the same deployment strategy on both eqiad and codfw

https://gerrit.wikimedia.org/r/309961

Change 309961 merged by Smalyshev:
wdqs - use the same deployment strategy on both eqiad and codfw

https://gerrit.wikimedia.org/r/309961

Change 310307 had a related patch set uploaded (by Gehel):
wdqs - make RWStore configuration file configurable

https://gerrit.wikimedia.org/r/310307

Change 310308 had a related patch set uploaded (by Gehel):
wdqs - use configuration file generated by scap

https://gerrit.wikimedia.org/r/310308

Change 310307 merged by Gehel:
wdqs - make RWStore configuration file configurable

https://gerrit.wikimedia.org/r/310307

Change 310316 had a related patch set uploaded (by Gehel):
wdqs - generate RWStore config file in /etc/wdqs

https://gerrit.wikimedia.org/r/310316

Change 310316 merged by Smalyshev:
wdqs - generate RWStore config file in /etc/wdqs

https://gerrit.wikimedia.org/r/310316

Mentioned in SAL (#wikimedia-operations) [2016-09-13T18:08:23Z] <gehel> moving to scap deployed configuration for wdqs - T144380

Change 310308 merged by Gehel:
wdqs - use configuration file generated by scap

https://gerrit.wikimedia.org/r/310308

Lydia_Pintscher moved this task from incoming to monitoring on the Wikidata board. Dec 16 2016, 9:59 AM
Gehel closed this task as Resolved. Dec 22 2016, 9:52 AM

Oops... yes, it has been done for some time... We now have 2 new servers (T152643 and T152644) but that's a different task...