make sure all datasets in xmldatadumps/public/other on dataset1001 are accounted for on new labs boxes
Closed, Resolved · Public

Description

Going to list here everything in that subdirectory and indicate whether it's old static data or there's a job that updates it from somewhere.
As the job for each item that gets updates is moved, the item can be checked off.

Related Objects

Status    Assigned    Task
Resolved  bd808
Resolved  ArielGlenn
Resolved  madhuvishy
Resolved  madhuvishy
Resolved  madhuvishy
Resolved  madhuvishy
Resolved  ArielGlenn
Resolved  ArielGlenn
Resolved  ezachte
Resolved  ArielGlenn
Resolved  madhuvishy
Resolved  madhuvishy
Resolved  madhuvishy
Resolved  madhuvishy
Resolved  madhuvishy
Resolved  madhuvishy
Resolved  madhuvishy
Resolved  madhuvishy
Resolved  madhuvishy
Resolved  madhuvishy
Resolved  madhuvishy

Event Timeline

ArielGlenn triaged this task as Normal priority. Mar 2 2018, 12:32 PM
ArielGlenn created this task.
Restricted Application removed a project: Patch-For-Review. Mar 2 2018, 12:32 PM
ArielGlenn added a comment. Edited Mar 2 2018, 1:16 PM
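| DIRECTORY | STATIC | JOB | MOVED |
|---|---|---|---|
| analytics | Y | puppet | - |
| android | Y | - | - |
| articlefeedback | Y | - | - |
| bugzilla | Y | - | - |
| categoriesrdf | | "misc" dump | same job |
| cirrussearch | | "misc" dump | same job |
| clickstream | | dumps::web::fetches::stats | same job |
| contenttranslation | | "misc" dump | same job |
| diffdb | Y | - | - |
| docs | Y | - | - |
| experimental | Y | - | - |
| fundraising | Y | - | - |
| fundraising.20120213 | Y | - | - |
| globalblocks | | "misc" dump | same job |
| imageinfo | | "misc" dump | same job |
| imageinfo_archive | Y | - | - |
| incr | | "misc" dump | same job |
| incrs | Y | - | - |
| index.html | Y | puppet | - |
| index.html.20120301 | Y | - | - |
| iOS | Y | - | - |
| kiwix | | dumps::web::fetches::kiwix | same job |
| media | | EZachte | dumps::web::fetches::stat_dumps |
| mediacounts | | dumps::web::fetches::stats | same job |
| mediatitles | | "misc" dumps | same job |
| mediawiki | Y | - | - |
| misc | | profile::phabricator::main | dumps::web::fetches::phab |
| multistream | Y | - | - |
| openzim | Y | - | - |
| pagecounts-all-sites | Y | - | - |
| pagecounts-ez | | EZachte | dumps::web::fetches::stat_dumps |
| pagecounts-new | Y | - | - |
| pagecounts-raw | Y | - | - |
| pagetitles | | "misc" dumps | same job |
| pageviews | | dumps::web::fetches::stats | same job |
| PlayBook | Y | - | - |
| poty | Y | - | - |
| scanset | Y | - | - |
| search | Y | - | - |
| shorturls | Y | - | - |
| slow-parse | | role::logging::mediawiki::udp2log | discontinued |
| static | Y | - | - |
| static_html_dumps | Y | - | - |
| surveys | Y | - | - |
| testfiles | Y | - | - |
| tools | Y | - | - |
| unique_devices | | dumps::web::fetches::stats | same job |
| wep | Y | - | - |
| wikibase | | "misc" dumps | same job |
| wikichallenge | Y | - | - |
| wikidata | | "misc" dumps | same job |
| wikitech | | dumps::web::fetches::wikitech_dumps | same job |
| win8 | Y | - | - |
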
  • "misc" dumps are everything except xml/sql dumps, run off of the 'misc dumps cron' snapshot host. They should already be available on labstore1006 and labstore1007.
  • "EZachte": cron jobs from EZachte's home directory on stat1005, see separate list below
  • "puppet": these html files are managed by puppet.
  • "dumps::web::fetches::stats": pulls via rsync from stat1005.eqiad.wmnet::hdfs-archive
  • "dumps::web::fetches::kiwix": pulls via rsync from download.kiwix.org
  • "dumps::web::fetches::wikitech_dumps": pulls via wget from https://wikitech.wikimedia.org/dumps/
  • "profile::phabricator::main": rsync from phab1001
  • "role::logging::mediawiki::udp2log": rsync from mwlog1001.eqiad.wmnet
  • "-" indicates there's no job, as these folders are not updated.
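
The rsync-based fetch jobs listed above all follow the same pull pattern, which can be sketched roughly as below. This is an illustration only, not the puppet-managed job itself: the function name `rsync_pull_command`, the `pageviews/` subpath under the `hdfs-archive` module, and the destination path are assumptions made for the example, and the flags the real jobs use may differ. The sketch defaults to `--dry-run` so nothing is copied by accident.

```python
import shlex
from typing import List


def rsync_pull_command(source: str, dest: str, dry_run: bool = True) -> List[str]:
    """Build (but do not run) an rsync command that pulls source into dest."""
    cmd = ["rsync", "-a"]  # -a: archive mode, preserves times and permissions
    if dry_run:
        cmd.append("--dry-run")  # only report what would be transferred
    cmd += [source, dest]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths: the module name comes from the list above,
    # the subdirectory and local destination are assumed for illustration.
    cmd = rsync_pull_command(
        "stat1005.eqiad.wmnet::hdfs-archive/pageviews/",
        "/srv/example/other/pageviews/",
    )
    print(shlex.join(cmd))
```

Returning the argument list instead of running it keeps the sketch side-effect free; it could be handed to `subprocess.run` once the paths are confirmed.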

Notes on EZachte's jobs

  • wikistats/dumps/bash/count_editors_yoy.sh -- rsyncs to dataset1001=dataset1001.wikimedia.org::pagecounts-ez/wikistats/
  • wikistats/dumps/bash/rsync.sh -- rsyncs to dataset1001::pagecounts-ez/projectcounts
  • wikistats/image_sets/bash/publish_zips.sh -- rsyncs to dataset1001.wikimedia.org::media/contest_winners/WLM
  • wikistats/dammit.lt/bash/dammit_compact_daily.sh -- rsyncs to dataset1001.wikimedia.org::pagecounts-ez/merged/
  • wikistats/dammit.lt/bash/dammit_compact_monthly.sh -- rsyncs to dataset1001.wikimedia.org::pagecounts-ez/merged/
  • wikistats/dammit.lt/bash/dammit_projectviews_monthly.sh -- rsyncs to dataset1001.wikimedia.org::pagecounts-ez/projectviews
  • wikistats/dammit.lt/bash/dammit_published_merged.sh -- rsyncs to dataset1001.wikimedia.org::pagecounts-ez/merged/dataset1001.wikimedia.org::pagecounts-ez
  • wikistats/dammit.lt/bash/t2.sh -- rsyncs to dataset1001.wikimedia.org::pagecounts-ez/merged/
  • wikistats/dammit.lt/bash/t.sh -- rsyncs to dataset1001.wikimedia.org::pagecounts-ez/merged/
  • wikistats/dammit.lt/bash/dammit_sync.sh -- rsyncs to

Marking off which jobs have been moved or terminated; at least one still has not been.
The job for other/media explicitly covers only the contest-winners subdirectory.
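
The marking-off could also be done mechanically. Below is a minimal sketch, assuming the inventory is kept as structured rows; only a handful of rows from the table are copied in. The names `Entry`, `needs_follow_up` and `summarize` are hypothetical, and the rule that an empty or "-" MOVED value on a non-static directory means "still pending" is an assumption about how the table is meant to be read.

```python
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    directory: str
    static: bool  # True when the STATIC column is "Y"
    job: str      # "-" when nothing updates the directory
    moved: str    # "-" when there is nothing to move


# A few sample rows taken from the table; the full list is longer.
INVENTORY = [
    Entry("analytics", True, "puppet", "-"),
    Entry("android", True, "-", "-"),
    Entry("cirrussearch", False, '"misc" dump', "same job"),
    Entry("media", False, "EZachte", "dumps::web::fetches::stat_dumps"),
    Entry("slow-parse", False, "role::logging::mediawiki::udp2log", "discontinued"),
]


def needs_follow_up(entry: Entry) -> bool:
    """Assumed rule: a non-static directory with no MOVED value is pending."""
    return not entry.static and entry.moved in ("", "-")


def summarize(entries: List[Entry]) -> Dict[str, List[str]]:
    """Split the inventory into static, job-updated and still-pending names."""
    return {
        "static": [e.directory for e in entries if e.static],
        "updated": [e.directory for e in entries if not e.static],
        "pending": [e.directory for e in entries if needs_follow_up(e)],
    }


if __name__ == "__main__":
    for group, names in summarize(INVENTORY).items():
        print(group, names)
```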

bd808 moved this task from Backlog to Dumps on the Data-Services board. Mar 2 2018, 1:42 PM

Change 416863 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] dumps: Switch rsyncer profile to use host settings from hiera

https://gerrit.wikimedia.org/r/416863

Change 416866 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] dumps: Move rsyncer to distribution profile path and rename

https://gerrit.wikimedia.org/r/416866

Change 416869 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] dumps: Split up rsync config to base, mirrors, and datasets

https://gerrit.wikimedia.org/r/416869

Change 416901 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] dumps: Fold distribution related rsync config to profile/ path

https://gerrit.wikimedia.org/r/416901

Change 416863 merged by ArielGlenn:
[operations/puppet@production] dumps: Switch rsyncer profile to use host settings from hiera

https://gerrit.wikimedia.org/r/416863

Change 416866 merged by ArielGlenn:
[operations/puppet@production] dumps: Move rsyncer to distribution profile path and rename

https://gerrit.wikimedia.org/r/416866

Change 416869 merged by ArielGlenn:
[operations/puppet@production] dumps: Split up rsync config to base, mirrors, and datasets

https://gerrit.wikimedia.org/r/416869

Change 416901 merged by ArielGlenn:
[operations/puppet@production] dumps: Fold distribution related rsync config to profile/ path

https://gerrit.wikimedia.org/r/416901

Change 420078 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] analytics: Allow labstore1006|7 to rsync from stat*

https://gerrit.wikimedia.org/r/420078

Change 420078 merged by Madhuvishy:
[operations/puppet@production] analytics: Allow labstore1006|7 to rsync from stat*

https://gerrit.wikimedia.org/r/420078

Note: the slow-parse logs in other/ are being deprecated (T189284).

madhuvishy closed this task as Resolved. Apr 20 2018, 3:28 AM
madhuvishy claimed this task.
ArielGlenn reopened this task as Open. Apr 25 2018, 12:45 PM

T188149 is open and there's a changeset (wrong for our purposes but at least gives us the info we need) here: https://gerrit.wikimedia.org/r/#/c/428540/

Change 429197 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] pull phabricator dumps from phab server to dumps web server

https://gerrit.wikimedia.org/r/429197

bd808 added a subscriber: bd808.

If assigning this to you is the wrong thing to do @ArielGlenn, let me know what help is needed and I'll try to round up another person.

Change 429197 merged by ArielGlenn:
[operations/puppet@production] pull phabricator dumps from phab server to dumps web server

https://gerrit.wikimedia.org/r/429197

Change 429666 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] fix up source dir for sync of phab dumps to public webserver

https://gerrit.wikimedia.org/r/429666

Change 429666 merged by ArielGlenn:
[operations/puppet@production] fix up source dir for sync of phab dumps to public webserver

https://gerrit.wikimedia.org/r/429666

ArielGlenn closed this task as Resolved. Apr 30 2018, 6:59 AM

All datasets are now accounted for. Closing for good!