Hey Baha,
This is the generic "systems and programs onboarding" checklist we put together for our newest team members. Many of these items won't apply to you since you're already familiar with our stack and workflows, but I list them here anyway for consistency. Please check items off as done, or ping me if there are specific ones you would like to prioritize to fill gaps in your familiarity with any of these systems or programs. I expect your first week(s) will be pretty heavy on reading and asking the rest of the team questions.
# Data and systems
[X] Stats machines (Analytics)
[ ] Hadoop / Hive / Analytics cluster (Analytics) -- familiarize yourself with [[ https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Webrequest | webrequest logs ]]. (Note: this is good to know in general, and also helpful for a task such as T171694, which is a well-contained task.)
[ ] EventLogging (instrumentation) (Analytics)
[ ] Dump parsing libraries: http://pythonhosted.org/mediawiki-utilities/ (Diego)
[ ] SQL replicas (Analytics)
[ ] Quarry (Jonathan)
[ ] Wikidata
[ ] general intro to the data model (Dario)
[ ] WDQS / SPARQL (Lukas)
[ ] JSON dumps
# Programs
Familiarize yourself with the various programs Research is contributing to through the end of FY18: https://phabricator.wikimedia.org/tag/research-programs/
//High-priority programs with engineering dependencies//
**Program 9: knowledge gaps**
* [ ] Read the annual plan: https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2017-2018/Final/Programs/Technology#Program_9:_Growing_Wikipedia_across_languages_via_recommendations
* [ ] Read about vertical expansion: https://arxiv.org/abs/1604.03235
* [ ] Check out GapFinder: https://www.mediawiki.org/wiki/GapFinder
* [ ] Read about horizontal expansion: https://meta.wikimedia.org/wiki/Research:Expanding_Wikipedia_stubs_across_languages
* [ ] Read about recommendations of other assets: https://meta.wikimedia.org/wiki/Research:Recommending_Images_to_Wikidata_Items
* [ ] Schedule a 1:1 overview with the program owner (Leila) as needed
**Program 11: improving citations**
* [ ] Read the annual plan: https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2017-2018/Final/Programs/Technology#Program_11:_Improving_citations_across_Wikimedia_projects
* [ ] Read the following:
  - https://meta.wikimedia.org/wiki/Research:Identification_of_Unsourced_Statements
  - https://meta.wikimedia.org/wiki/Research:Citation_Click_Data
* [ ] Browse the 2017 WikiCite report for an overview of the program and its goals: https://doi.org/10.6084/m9.figshare.5648233
* [ ] Schedule a 1:1 overview with the program owner (Dario) as needed
//Other programs with fewer or no engineering dependencies//
**Program 4: understand technical audiences**
* [ ] Read a summary of our main contribution: T171220 and https://meta.wikimedia.org/wiki/Research:Growth_and_diversity_of_Technology_team_audiences
* [ ] Schedule a 1:1 overview with the program owner (Jonathan) as needed
**Program 12: increasing diversity**
* [ ] Read the annual plan: https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2017-2018/Final/Programs/Technology#Program_12:_Grow_contributor_diversity
* [ ] Read about the general direction of the research: https://meta.wikimedia.org/wiki/Research:Voice_and_exit_in_a_voluntary_work_environment
* [ ] Schedule a 1:1 overview with the program owner (Leila) as needed
**CD - Community Health (Segment 3: Research)**
* [ ] Read the annual plan: https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2017-2018/Final/Community_Health#Segment_3:_Research_on_harassment
* [ ] Read the following research pages:
  - https://meta.wikimedia.org/wiki/Research:Study_of_harassment_and_its_impact
  - https://meta.wikimedia.org/wiki/Research:Wikihounding_and_Machine_Learning_Analysis
  - https://meta.wikimedia.org/wiki/Research:Detox
  - https://meta.wikimedia.org/wiki/Research:Topical_coverage_of_Edit_Wars
  - https://meta.wikimedia.org/wiki/Research:Sockpuppet_detection_in_Wikimedia_projects
* [ ] Schedule a 1:1 overview with the segment owner (Dario) as needed
**CD - Structured data (Segment 4: Programs)**
* [ ] Read the annual plan: https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2017-2018/Final/Structured_Data#Segment_4:_Programs
* [ ] Schedule a 1:1 overview with the segment owner (Jonathan) as needed