
Research team process and technical blockers (items from our 2018 offsite)
Closed, Resolved (Public)

Description

Here are the action items and team norms from the team process and technical blockers meeting at Research Offsite 2018. @everyone means everyone on the Research team.

Action items

  • Shorten the Monday Research Weekly meeting to one hour. @DarTar
  • Provide centralized onboarding and expectation-setting documentation covering the processes we have for formal collaborators, the way that we work together, template emails, etc. @everyone
  • Come up with a way to handle the backlog. Create a process for grooming it. @everyone
  • Create documentation that describes the process of writing research tasks in Phabricator. @leila

[declined] Modify the Research Weekly project to include forward-looking tasks.
[declined] Document Phabricator columns to indicate their meanings. For example, the Blocked column is currently used to indicate external dependencies. Could we possibly use this to indicate internal blockers too? This would allow us to avoid daily standup emails. @bmansurov

  • Dario needs specs for dedicated machines for Research. Dario makes the request to Mark via a ticket. @DarTar / @analytics

[addressed by other solutions] Diego tries stat1004, Hadoop for a month or so, and reports back if it's good or bad for his needs. @diego

[ongoing work with some improvements] Find a way to access Commons images easily and efficiently for research purposes. @bmansurov

  • Dario to talk with Ops folks to see if they can support GPU (either with current resources or with the resources that will join Ops in the coming months). @DarTar
  • Research needs to figure out the use-cases for GPU to justify introducing machines with GPUs. @Miriam
  • We need Commons data on stats machine. Analytics (Dan) to ask Filippo about direct access to Swift Object Store. @Milimetric See T184744.

[declined] We need better ways of communicating the pipeline of data generation - and reproducibility of generating data. @DarTar
[declined as we are approaching needs and prioritization differently] We need to find ways of learning about our users: Who is using what? Who needs what? @DarTar
[declined. data engineering takes care of this.] Think about data infrastructures. @everyone

  • Discuss the future of the dumps (during FY19 session). One example of such needs is at T182351.
  • Better ways of accessing dumps and media files.
  • Many of our questions involve dump parsing, and scripts have to be rewritten every time we mine data from the dumps.
  • Formalize ways of representing data structures of articles including: wiki projects, category, presence of media files. Talk to the parsing team. Research to work with Analytics to have this as part of Data Lake.

[x now on GitLab] Research our options for hosting code on GitHub. @bmansurov See T187795

Team norms

  • The point of contact, rather than an external collaborator, should take ownership of tasks in Phabricator.
  • When a task is being discussed outside Phabricator with external collaborators, update the task with a short description of developments.

Event Timeline

DarTar triaged this task as Medium priority.
DarTar unsubscribed.
DarTar renamed this task from Team process and technical blockers to Research team process and technical blockers (items from our 2018 offsite). Feb 5 2018, 6:47 PM

Marked the Swift task done. I asked Filippo and we updated the task Miriam opened. In short, we can access all of the files at low resolution through the API. If that's too slow, we can dump them to HDFS. Swift can do anything we want, but ideally we try the API first.
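
For anyone trying the API route first, here is a minimal sketch of what a low-resolution request could look like. It assumes the public MediaWiki Action API on Commons (`action=query` with `prop=imageinfo` and `iiurlwidth`), which may not be the exact endpoint discussed on the Swift task. The function names, the default width, and the example file title are hypothetical and only for illustration.

```python
from urllib.parse import urlencode

# Assumption: the public Commons endpoint of the MediaWiki Action API.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def thumbnail_query_url(file_title, width=320):
    """Build an API URL asking Commons for a scaled-down rendition of one file."""
    params = {
        "action": "query",
        "titles": file_title,
        "prop": "imageinfo",
        "iiprop": "url",
        "iiurlwidth": width,  # requested thumbnail width in pixels
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


def extract_thumb_urls(response):
    """Collect every thumbnail URL from a decoded JSON response, if any are present."""
    pages = response.get("query", {}).get("pages", {})
    return [
        info["thumburl"]
        for page in pages.values()
        for info in page.get("imageinfo", [])
        if "thumburl" in info
    ]
```

Fetching the built URL with any HTTP client and passing the decoded JSON to `extract_thumb_urls` would give the low-resolution image links. If that turns out to be too slow for bulk work, the HDFS dump mentioned above remains the fallback.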

leila updated the task description.
leila added subscribers: DarTar, bmansurov.