P54284
WDQS Graph Split Manual Data Load Notes
Authored by RKemper on Dec 7 2023, 8:12 PM.
Tags
Discovery-Search (Current work)
Referenced Files
F41571200: WDQS Graph Split Manual Data Load Notes
Subscribers
bking
dr0ptp4kt
RKemper
# Downtime host(s) to reduce noise
ryankemper@cumin1001:~$ sudo -E cookbook sre.hosts.downtime --days 7 -r 'graph split experiments T350106' wdqs102[2-4].eqiad.wmnet
# Set permissions on files if not already sufficient
chmod 555 /srv/T350106/gzips/gzips/gzips/nt_wd_schol/*
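# A minimal sketch for checking whether the chmod is needed at all. The helper
# name `list_unreadable` is hypothetical and not part of the original notes; it
# only lists regular files that lack read permission for user, group, or other.

```shell
#!/bin/sh
# Hypothetical helper: print regular files under a directory that are
# missing at least one of the three read bits (mode 444).
list_unreadable() {
    dir="$1"
    find "$dir" -type f ! -perm -444 -print
}
```

# Example: list_unreadable /srv/T350106/gzips/gzips/gzips/nt_wd_schol
# Empty output would suggest the files are already readable by everyone.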
# Run from `/srv/T350106/gzips/gzips/gzips/nt_wd_schol/` to change the file extension from .txt.gz to .ttl.gz
for FILE in *; do NEW_FILE="$(echo "$FILE" | sed 's~\.txt\.gz$~.ttl.gz~')"; sudo mv "$FILE" "$NEW_FILE"; done
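# Before renaming for real, it may help to preview the result. The sketch below
# is a dry run under stated assumptions: `plan_renames` is a hypothetical name,
# and it only prints the old and new paths without moving anything.

```shell
#!/bin/sh
# Hypothetical helper: show which .txt.gz files in a directory would be
# renamed to .ttl.gz. Nothing is moved.
plan_renames() {
    dir="$1"
    for f in "$dir"/*.txt.gz; do
        # When nothing matches, the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        # Strip the old suffix and append the new one.
        printf '%s -> %s\n' "$f" "${f%.txt.gz}.ttl.gz"
    done
}
```

# Example: plan_renames /srv/T350106/gzips/gzips/gzips/nt_wd_schol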
# Disable puppet, stop blazegraph, clear out jnl file, start blazegraph, restart exporter
sudo disable-puppet "T350106" && sudo systemctl stop wdqs-blazegraph && sleep 5 && rm -fv /srv/wdqs/wikidata.jnl && sleep 5 && sudo systemctl start wdqs-blazegraph && sudo systemctl restart prometheus-blazegraph-exporter-wdqs-blazegraph.service
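# The chain above relies on fixed `sleep 5` pauses. As a possible alternative,
# the sketch below polls a command until it succeeds. `wait_until` is a
# hypothetical helper, not something the original notes use.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to N times, one second apart,
# and return 0 as soon as it succeeds (1 if it never does).
wait_until() {
    attempts="$1"
    shift
    while [ "$attempts" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        attempts=$((attempts - 1))
        # Only pause when another attempt is still coming.
        if [ "$attempts" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}
```

# Example: wait_until 30 systemctl is-active --quiet wdqs-blazegraph
# Note that an active unit does not by itself prove that Blazegraph is
# already accepting requests, so treat this as a coarse check only.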
# Get slightly-modified loadData.sh into place if not present
scp loadData.sh ryankemper@wdqs1023.eqiad.wmnet:/home/ryankemper/loadData.sh
# Modify further to match file format of /srv/T350106/gzips/gzips/gzips/nt_wd_schol/* (in this case) if necessary
vi /srv/T350106/loadData.sh
# Run on first chunk
sudo /srv/T350106/loadData.sh -n wdq -d /srv/T350106/gzips/gzips/gzips/nt_wd_schol -s 0 -e 0
# Run on all remaining chunks
sudo /srv/T350106/loadData.sh -n wdq -d /srv/T350106/gzips/gzips/gzips/nt_wd_schol
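# To illustrate how a start/end range such as `-s 0 -e 0` can select chunks,
# here is a sketch under assumptions. `list_chunks` and the file name format
# used in the example are hypothetical; the modified loadData.sh may name and
# select files differently. The sketch assumes each chunk index maps to a file
# name through a printf-style format and that listing stops at the first
# missing file.

```shell
#!/bin/sh
# Hypothetical stand-in for chunk selection: print the chunk files in
# [start, end] that exist, stopping at the first gap.
list_chunks() {
    dir="$1"
    start="$2"
    end="$3"
    format="$4"   # printf-style pattern, e.g. 'chunk-%05d.ttl.gz' (assumed)
    i="$start"
    while [ "$i" -le "$end" ]; do
        name="$(printf "$format" "$i")"
        # Assumption: a missing file marks the end of the data set.
        if [ ! -f "$dir/$name" ]; then
            break
        fi
        printf '%s\n' "$dir/$name"
        i=$((i + 1))
    done
}
```

# Example: list_chunks /srv/T350106/gzips/gzips/gzips/nt_wd_schol 0 0 'chunk-%05d.ttl.gz'
# With a range of 0 to 0 this prints at most one file, which mirrors loading
# only the first chunk before committing to the full run.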
Event Timeline
RKemper created this paste. Dec 7 2023, 8:12 PM (2023-12-07 20:12:52 UTC+0)
RKemper mentioned this in T350106: Implement a spark job that converts a RDF triples table into a RDF file format.
dr0ptp4kt mentioned this in T359062: Assess Wikidata dump import hardware. Apr 4 2024, 11:08 AM (2024-04-04 11:08:29 UTC+0)
dr0ptp4kt mentioned this in T362920: Benchmark Blazegraph import with increased buffer capacity (and other factors). Thu, May 2, 7:57 PM (2024-05-02 19:57:03 UTC+0)