Research team process and technical blockers (items from our 2018 offsite)
Closed, Resolved
Public
Assigned To

None

Authored By

	• bmansurov
	Feb 1 2018, 9:31 PM

Description

Here are the action items and team norms from the team process and technical blockers meeting at Research Offsite 2018. @everyone means everyone on the Research team.

Action items

Shorten the Monday Research Weekly meeting to one hour. @DarTar
Provide centralized onboarding and expectation-setting documentation covering the processes we have for formal collaborators, the way that we work together, template emails, etc. @everyone
Come up with a way to handle the backlog. Create a process for grooming it. @everyone
Create documentation that describes the process of writing research tasks in Phabricator. @leila

[declined] Modify the Research Weekly project to include forward-looking tasks.
[declined] Document Phabricator columns to indicate their meanings. For example, the Blocked column is currently used to indicate external dependencies. Could we possibly use this to indicate internal blockers too? This would allow us to avoid daily standup emails. @bmansurov

Dario needs specs for dedicated machines for Research. Dario makes the request to Mark via a ticket. @DarTar / @analytics

[addressed by other solutions] Diego tries stat1004, Hadoop for a month or so, and reports back if it's good or bad for his needs. @diego

Create a Research project on Toolforge. @bmansurov See T186519.

[ongoing work with some improvements] Find a way to access Commons images easily/efficiently for research purposes. @bmansurov

Dario to talk with Ops folks to see if they can support GPU (either with current resources or with the resources that will join Ops in the coming months). @DarTar
Research needs to figure out the use-cases for GPU to justify introducing machines with GPUs. @Miriam
We need Commons data on stats machine. Analytics (Dan) to ask Filippo about direct access to Swift Object Store. @Milimetric See T184744.

[declined] We need better ways of communicating the pipeline of data generation - and reproducibility of generating data. @DarTar
[declined as we are approaching needs and prioritization differently] We need to find ways of learning about our users: Who is using what? Who needs what? @DarTar
[declined; Data Engineering takes care of this] Think about data infrastructures. @everyone

Discuss the future of the dumps (during FY19 session). One example of such needs is at T182351.
Better ways of accessing dumps and media files.
Many questions that we have involve dump parsing: every time we mine data from the dumps, scripts need to be rewritten.
Formalize ways of representing data structures of articles including: wiki projects, category, presence of media files. Talk to the parsing team. Research to work with Analytics to have this as part of Data Lake.
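
As an illustration of the dump-parsing item above, here is a minimal sketch of a reusable streaming reader, assuming the standard MediaWiki XML export layout (page, title, revision, text). The names iter_pages and local_name and the sample data are hypothetical, not existing team code.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace, e.g. '{http://...}page' becomes 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) for each <page> in a dump-like XML stream.

    If a page holds several revisions, the text of the last one is kept.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # drop the parsed subtree to keep memory use low


# Tiny in-memory sample standing in for a real dump file (illustrative only).
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Alpha</title><revision><text>first</text></revision></page>
  <page><title>Beta</title><revision><text>second</text></revision></page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE.encode("utf-8"))))
```

For a real dump, source could be an opened (and decompressed) file object instead of the in-memory sample.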

[x now on GitLab] Research our options for hosting code on GitHub. @bmansurov See T187795.

Team norms

The point of contact, rather than the external collaborator, should take ownership of tasks in Phabricator.
When a task is being discussed outside Phabricator with external collaborators, update the task with a short description of developments.

Related Objects

Status	Assigned	Task
Resolved	None	T186270 Research team process and technical blockers (items from our 2018 offsite)
Declined	None	T187178 Mount XML dumps on stat1004
Resolved	• DarTar	T187795 Give access to Wikimedia github account

Event Timeline

• bmansurov created this task. Feb 1 2018, 9:31 PM

• DarTar claimed this task. Feb 5 2018, 6:37 PM

• DarTar triaged this task as Medium priority.

• DarTar unsubscribed.

• DarTar renamed this task from Team process and technical blockers to Research team process and technical blockers (items from our 2018 offsite). Feb 5 2018, 6:47 PM

• DarTar updated the task description. Feb 6 2018, 12:08 AM

• bmansurov updated the task description. Feb 15 2018, 2:50 PM

to make stat1004 we need to solve this: https://phabricator.wikimedia.org/T187178

Marked the Swift task done. I asked Filippo and we updated the task Miriam opened. In short, we can access all of the files at low resolution through the API. If that's too slow, we can dump them to HDFS; Swift can do anything we want, but ideally we try the API first.
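
A minimal sketch of what a low-resolution request could look like, assuming "the API" here means the MediaWiki Action API's imageinfo query on Commons (the comment does not say which API is meant). The helper name thumbnail_query_url, the default width, and the example file title are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: the public Commons Action API endpoint.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def thumbnail_query_url(file_title, width=320):
    """Build an imageinfo query URL asking for a scaled-down rendition.

    With iiurlwidth set, the JSON response includes a thumbnail URL
    for the requested width alongside the original file URL.
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "imageinfo",
        "iiprop": "url",
        "iiurlwidth": str(width),
        "titles": file_title,
    }
    return f"{COMMONS_API}?{urlencode(params)}"


# Build (but do not send) a request for one hypothetical file.
url = thumbnail_query_url("File:Example.jpg")
query = parse_qs(urlparse(url).query)
```

The URL is only constructed here; fetching it and reading the thumbnail URL out of the response is left to whichever HTTP client is in use.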

• bmansurov updated the task description. Feb 15 2018, 4:56 PM

• bmansurov updated the task description. Feb 20 2018, 4:03 PM

• fdans closed subtask T187178: Mount XML dumps on stat1004 as Declined. Feb 22 2018, 6:18 PM

• bmansurov closed subtask T187795: Give access to Wikimedia github account as Resolved. Mar 5 2018, 5:19 AM

• DarTar removed • DarTar as the assignee of this task. Feb 15 2019, 11:37 PM

• DarTar subscribed.

leila edited projects, added Research-management; removed Research. Jul 11 2019, 12:31 AM

leila removed subscribers: • DarTar, • bmansurov.

leila closed this task as Resolved. Mar 3 2022, 6:24 PM

leila updated the task description.

leila added subscribers: • DarTar, • bmansurov.