P8201
splitdump.py
Authored by Gilles on Mar 14 2019, 4:17 PM.
Tags: Performance-Team
Referenced Files
F28388247: raw.txt
Mar 14 2019, 4:17 PM
2019-03-14 16:17:39 (UTC+0)
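import subprocess

input_file = '/mnt/data/xmldatadumps/public/svwiki/20190301/svwiki-20190301-pages-meta-history.xml.bz2'

# Ask getlastidinbz2xml for the last ID in the dump. check_output returns
# bytes on Python 3, so decode before splitting on ':'.
output = subprocess.check_output(
    ['getlastidinbz2xml', '-f', input_file, '--type', 'page']
).decode('utf-8')

# The second ':'-separated field is taken as the ID. Despite the variable
# name, '--type page' suggests this is a page ID rather than a revision ID.
revid = int(output.split(':')[1])

file_id = 0
step = 1000000
fspec = []

# Build one 'filename:start:end' entry per block of `step` IDs.
for range_start in range(0, revid, step):
    file_id += 1
    range_end = range_start + step

    # The first block starts at ID 1 rather than 0.
    if range_start == 0:
        range_start = 1

    # Clamp the final block to the last ID in the dump.
    if range_end > revid:
        range_end = revid

    fspec.append('svwiki-%02d.bz2:%d:%d' % (file_id, range_start, range_end))

fspec = ','.join(fspec)

command = ['writeuptopageid', '-fspec', fspec, '-i', input_file, '-odir', '.']

# Echo the command before running it so the invocation can be reproduced.
print(' '.join(command))
subprocess.call(command)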
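The range-building loop above can be tried without the dump tools installed. The following is a minimal sketch that pulls it into a standalone function; `build_fspec` and its `prefix` parameter are hypothetical names introduced here for illustration and are not part of the original paste. It only reproduces the string the script passes to `-fspec`, using a small last ID so the result is easy to read.

```python
def build_fspec(last_id, step, prefix='svwiki'):
    """Return the comma-separated 'name:start:end' spec covering IDs 1..last_id."""
    parts = []
    for file_id, range_start in enumerate(range(0, last_id, step), start=1):
        # Clamp the final block to the last ID.
        range_end = min(range_start + step, last_id)
        # The first block starts at 1 rather than 0, as in the script.
        range_start = max(range_start, 1)
        parts.append('%s-%02d.bz2:%d:%d' % (prefix, file_id, range_start, range_end))
    return ','.join(parts)


print(build_fspec(2500, 1000))
# svwiki-01.bz2:1:1000,svwiki-02.bz2:1000:2000,svwiki-03.bz2:2000:2500
```

Note that each block's end equals the next block's start. Whether that boundary ID ends up in one file or both depends on how writeuptopageid interprets the end of a range, which the paste does not state.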
Event Timeline
Gilles created this paste. Mar 14 2019, 4:17 PM (2019-03-14 16:17:39 UTC+0)
Gilles mentioned this in T218316: writeuptopageid failing to split svwiki dump.