MGerlach (Martin Gerlach)
Senior Research Scientist

User Details

User Since
Sep 9 2019, 9:50 AM (241 w, 1 d)
Availability
Available
IRC Nick
mgerlach
LDAP User
MGerlach
MediaWiki User
MGerlach (WMF) [ Global Accounts ]

Recent Activity

Today

MGerlach added a comment to T349774: Maintain wikiworkshop.org website.

Request for a small (but important) change:

Thanks

Wed, Apr 24, 7:59 AM · Patch-For-Review, Research

Fri, Apr 19

MGerlach added a comment to T361944: Orphan articles as reading recommendations.

weekly update:

  • no update this week
Fri, Apr 19, 4:14 PM · Research (FY2023-24-Research-April-June)
MGerlach added a comment to T361942: Run revised 3rd survey on readability perception.

weekly update:

  • extracted 100 suitable pairs of original/simplified snippets for comparison (sheet)
  • from these we will select two groups of 10 pairs each for the pilot: i) "treatment group": the two versions of the pair differ strongly in their automatic readability score; ii) "control group": the two versions of the pair have very similar automatic readability scores.
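
The group selection described above could be sketched as follows. This is only an illustration: the function name, the (pair_id, score_original, score_simplified) tuple layout, and the ranking by absolute score difference are assumptions, not the procedure actually used for the pilot.

```python
def select_pilot_groups(pairs, group_size=10):
    """Split scored snippet pairs into a treatment and a control group.

    pairs: list of (pair_id, score_original, score_simplified) tuples
    (hypothetical layout). Returns (treatment_ids, control_ids).
    """
    if len(pairs) < 2 * group_size:
        raise ValueError("need at least 2 * group_size pairs")
    # Rank pairs by how far apart the two readability scores are.
    ranked = sorted(pairs, key=lambda p: abs(p[1] - p[2]), reverse=True)
    treatment = [p[0] for p in ranked[:group_size]]   # largest score differences
    control = [p[0] for p in ranked[-group_size:]]    # smallest score differences
    return treatment, control
```

With 100 candidate pairs and group_size=10, this would return the 10 pairs with the largest score gap and the 10 with the smallest; a real selection might also add random sampling within each range.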
Fri, Apr 19, 4:13 PM · Research (FY2023-24-Research-April-June)
MGerlach added a comment to T361947: Evaluate simplification in multilingual setting.

weekly update:

  • no update (still working on T354653)
Fri, Apr 19, 4:08 PM · Research (FY2023-24-Research-April-June)
MGerlach added a comment to T354653: Work on model optimization and scaling.

weekly update:

  • finished experiments with Flan-T5 model in English
  • writing up results for the meta-page (incomplete draft as doc)
Fri, Apr 19, 4:07 PM · Research (FY2023-24-Research-April-June)
MGerlach added a comment to T352543: Review and select submissions for research track.

weekly update:

  • checked the review form
  • testing the bidding interface for reviewers and prepared upcoming comms with PC members
  • monitoring submissions (deadline next week) and answering questions by authors
Fri, Apr 19, 4:05 PM · Research (FY2023-24-Research-April-June), Research-outreach
MGerlach updated subscribers of T362419: Attend Wikimedia Hackathon 2024.
Fri, Apr 19, 8:51 AM · address-knowledge-gaps, Research-foundational, Research-outreach, Research
MGerlach updated the task description for T362419: Attend Wikimedia Hackathon 2024.
Fri, Apr 19, 8:48 AM · address-knowledge-gaps, Research-foundational, Research-outreach, Research
MGerlach created T362957: Create a dataset for training/evaluating models for summarizing (long) discussions.
Fri, Apr 19, 8:44 AM · Wikimedia-Hackathon-2024

Wed, Apr 17

MGerlach updated the task description for T362419: Attend Wikimedia Hackathon 2024.
Wed, Apr 17, 8:24 AM · address-knowledge-gaps, Research-foundational, Research-outreach, Research
MGerlach moved T362751: [Session] ML/AI models on LiftWing from Backlog to Proposed sessions on the Wikimedia-Hackathon-2024 board.
Wed, Apr 17, 7:52 AM · Wikimedia-Hackathon-2024
MGerlach created T362751: [Session] ML/AI models on LiftWing.
Wed, Apr 17, 7:52 AM · Wikimedia-Hackathon-2024

Fri, Apr 12

MGerlach moved T354653: Work on model optimization and scaling from FY2023-24-Research-January-March to FY2023-24-Research-April-June on the Research board.
Fri, Apr 12, 3:43 PM · Research (FY2023-24-Research-April-June)
MGerlach added a comment to T361944: Orphan articles as reading recommendations.

weekly update:

  • spent some time trying to figure out what the bottleneck is in the current setup
  • starting to brainstorm options for improvement, such as i) pre-computing look-up tables, ii) narrowing down candidates earlier in the pipeline, iii) using embeddings to take advantage of fast approximate nearest-neighbor lookup, etc.
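
As a rough illustration of options i) and iii), the sketch below pre-computes normalized embeddings once and then answers nearest-neighbor queries with dot products. Note that this is an exact brute-force search in plain Python, shown only as the baseline that an approximate nearest-neighbor index would replace; all names (build_index, top_k) and the toy vectors are hypothetical and not part of the existing pipeline.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def build_index(embeddings):
    """Pre-compute unit-length vectors once so queries only need dot products."""
    return {key: normalize(vec) for key, vec in embeddings.items()}

def top_k(index, query, k=3):
    """Return the k keys whose vectors have the highest cosine similarity to query."""
    q = normalize(query)
    scored = ((sum(a * b for a, b in zip(q, vec)), key) for key, vec in index.items())
    return [key for _, key in sorted(scored, reverse=True)[:k]]
```

The brute-force query cost grows linearly with the number of candidates, which is the part an approximate index would be expected to speed up.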
Fri, Apr 12, 3:42 PM · Research (FY2023-24-Research-April-June)
MGerlach closed T347702: Improve link recommendation model for orphan articles as Resolved.

weekly update:

Fri, Apr 12, 3:38 PM · Research (FY2023-24-Research-January-March)
MGerlach closed T347702: Improve link recommendation model for orphan articles, a subtask of T293030: [EPIC] Specify new task for Linking articles as a structured tasks, as Resolved.
Fri, Apr 12, 3:38 PM · Research, Epic
MGerlach added a comment to T361942: Run revised 3rd survey on readability perception.

weekly update:

  • Set up a new survey in limesurvey with adapted questions (clarified instructions and additional scales to capture participants' education level)
  • Revised the layout of the survey for easier navigation/answering
  • Revising sampling of snippets (e.g. formatting of text by switching parsing from wikitext to HTML)
Fri, Apr 12, 3:35 PM · Research (FY2023-24-Research-April-June)
MGerlach updated the task description for T361942: Run revised 3rd survey on readability perception.
Fri, Apr 12, 3:34 PM · Research (FY2023-24-Research-April-June)
MGerlach added a comment to T361947: Evaluate simplification in multilingual setting.

weekly update:

  • no update this week
Fri, Apr 12, 3:34 PM · Research (FY2023-24-Research-April-June)
MGerlach added a comment to T354653: Work on model optimization and scaling.

weekly update:

  • no update this week
Fri, Apr 12, 3:34 PM · Research (FY2023-24-Research-April-June)
MGerlach added a comment to T352543: Review and select submissions for research track.

weekly update:

  • added review form to OpenReview (thanks @KinneretG for helping with that)
  • preparing communication with PC members for bidding and review assignments
Fri, Apr 12, 3:33 PM · Research (FY2023-24-Research-April-June), Research-outreach
MGerlach updated the task description for T352543: Review and select submissions for research track.
Fri, Apr 12, 3:32 PM · Research (FY2023-24-Research-April-June), Research-outreach
MGerlach updated the task description for T362416: Attend ICWSM 2024 conference.
Fri, Apr 12, 3:04 PM · Research-foundational, address-knowledge-gaps, Research-outreach, Research
MGerlach created T362419: Attend Wikimedia Hackathon 2024.
Fri, Apr 12, 2:32 PM · address-knowledge-gaps, Research-foundational, Research-outreach, Research
MGerlach created T362416: Attend ICWSM 2024 conference.
Fri, Apr 12, 2:27 PM · Research-foundational, address-knowledge-gaps, Research-outreach, Research

Fri, Apr 5

MGerlach added a comment to T354653: Work on model optimization and scaling.

weekly update:

  • organizing results from different experiments with t5 models
  • came across a paper [1] which showed that the Flan-T5 large model (also considered above) performs exceptionally well for the related task of summarization:

Our experimental results show that most smaller LLMs, even after fine-tuning, fail to outperform larger zero-shot LLMs in meeting summarization datasets. However, a notable exception is a fine-tuned FLAN-T5-Large, which achieves performance on par with much larger LLMs (from 7B to more than 70B) used in zero-shot settings, while being significantly smaller. This makes smaller LLMs like FLAN-T5 a suitable cost-efficient LLM for real-world deployment.

  • Assuming that things hold similarly for simplification, this would support the choice of the Flan-T5-large model for our simplification task; it also suggests that we would not gain much from using much larger models (which we wouldn't be able to host in our infrastructure anyway)
Fri, Apr 5, 4:10 PM · Research (FY2023-24-Research-April-June)
MGerlach added a comment to T347702: Improve link recommendation model for orphan articles.

weekly update:

  • starting sprint to finish corresponding paper with results T356504
Fri, Apr 5, 3:57 PM · Research (FY2023-24-Research-January-March)
MGerlach created T361947: Evaluate simplification in multilingual setting.
Fri, Apr 5, 2:24 PM · Research (FY2023-24-Research-April-June)
MGerlach created T361944: Orphan articles as reading recommendations.
Fri, Apr 5, 2:13 PM · Research (FY2023-24-Research-April-June)
MGerlach created T361942: Run revised 3rd survey on readability perception.
Fri, Apr 5, 1:41 PM · Research (FY2023-24-Research-April-June)
MGerlach renamed T347701: [Stretch] Generate the google2wiki dataset from Generate the google2wiki dataset to [Stretch] Generate the google2wiki dataset.
Fri, Apr 5, 1:23 PM · Research (FY2023-24-Research-April-June)
MGerlach moved T347701: [Stretch] Generate the google2wiki dataset from FY2023-24-Research-January-March to FY2023-24-Research-April-June on the Research board.
Fri, Apr 5, 1:21 PM · Research (FY2023-24-Research-April-June)
MGerlach updated the task description for T361929: [Research Engineering Request] Building end-to-end training pipeline for the add-a-link model.
Fri, Apr 5, 11:04 AM · Research (FY2023-24-Research-April-June)
MGerlach created T361929: [Research Engineering Request] Building end-to-end training pipeline for the add-a-link model.
Fri, Apr 5, 11:02 AM · Research (FY2023-24-Research-April-June)
MGerlach added a project to T361926: Improve training and inference pipeline for multilingual link recommendation model: Research (FY2023-24-Research-April-June).
Fri, Apr 5, 10:38 AM · Research (FY2023-24-Research-April-June)
MGerlach created T361926: Improve training and inference pipeline for multilingual link recommendation model.
Fri, Apr 5, 10:36 AM · Research (FY2023-24-Research-April-June)

Tue, Apr 2

MGerlach changed the status of Restricted Task, a subtask of T335799: Review papers and give feedback, from Stalled to Open.
Tue, Apr 2, 6:17 PM · Epic, Research-outreach, Research

Thu, Mar 28

MGerlach added a comment to T347702: Improve link recommendation model for orphan articles.

weekly updates

  • work on the model is finished
  • adding results to the meta-page
  • also working towards finishing the paper for submission by April 15 T356504
Thu, Mar 28, 4:59 PM · Research (FY2023-24-Research-January-March)
MGerlach added a comment to T354653: Work on model optimization and scaling.

weekly updates:

  • collecting the results from the different experiments we have been running.
  • was especially interested in manually checking some of the differences between the models to better understand their pros and cons. in particular, using a filtered, higher-quality dataset for training seemed to yield much better simplification
  • did not progress as much as I wanted in the 2nd half of the week, as I unexpectedly had to switch attention at short notice to responding within 2 days to reviews for the submitted paper on readability
Thu, Mar 28, 4:58 PM · Research (FY2023-24-Research-April-June)
MGerlach added a comment to T347701: [Stretch] Generate the google2wiki dataset.

weekly update:

  • there hasn't been any progress on this, as the work is still blocked by the bug we found in the data generation pipeline. in recent weeks, we (incl. collaborators) have not had the capacity to debug it systematically.
Thu, Mar 28, 4:55 PM · Research (FY2023-24-Research-April-June)

Mar 22 2024

MGerlach updated the task description for T347702: Improve link recommendation model for orphan articles.
Mar 22 2024, 3:45 PM · Research (FY2023-24-Research-January-March)
MGerlach added a comment to T347702: Improve link recommendation model for orphan articles.

weekly update:

Mar 22 2024, 3:43 PM · Research (FY2023-24-Research-January-March)

Mar 15 2024

MGerlach closed T347700: Present a synthesis about recent research on readers as Resolved.

weekly update:

Mar 15 2024, 8:33 AM · Research (FY2023-24-Research-January-March)
MGerlach updated the task description for T347700: Present a synthesis about recent research on readers.
Mar 15 2024, 8:30 AM · Research (FY2023-24-Research-January-March)

Mar 1 2024

MGerlach added a comment to T354653: Work on model optimization and scaling.

weekly update:

  • no updates this week
Mar 1 2024, 2:50 PM · Research (FY2023-24-Research-April-June)
MGerlach added a comment to T347702: Improve link recommendation model for orphan articles.

weekly update:

  • no updates this week
Mar 1 2024, 2:50 PM · Research (FY2023-24-Research-January-March)
MGerlach added a comment to T347700: Present a synthesis about recent research on readers.

weekly update:

Mar 1 2024, 10:38 AM · Research (FY2023-24-Research-January-March)
MGerlach closed T352542: Recruit reviewers for Program Committee as Resolved.

weekly update:

  • finalized the program committee recruiting: we have 50 confirmed members for the PC. this is exactly our target to be able to cover up to 100 submissions. if we anticipate that we will receive more submissions, we can re-open the task and recruit more reviewers for the PC.
  • additionally, we have a handful of emergency reviewers for short-notice reviews.
  • task completed
Mar 1 2024, 10:36 AM · Research (FY2023-24-Research-January-March), Research-outreach
MGerlach closed T352542: Recruit reviewers for Program Committee, a subtask of T355729: Organize the Research track - Wiki Workshop 2024, as Resolved.
Mar 1 2024, 10:36 AM · Research
MGerlach updated the task description for T352542: Recruit reviewers for Program Committee.
Mar 1 2024, 10:28 AM · Research (FY2023-24-Research-January-March), Research-outreach

Feb 23 2024

MGerlach added a comment to T347702: Improve link recommendation model for orphan articles.

weekly update:

  • no update this week
Feb 23 2024, 2:50 PM · Research (FY2023-24-Research-January-March)
MGerlach added a comment to T354653: Work on model optimization and scaling.

weekly update:

  • investigated in more detail the output of the larger models trained with cloud-GPU:
    • No substantial improvement in metrics
    • Qualitative evaluation of a small random sample shows 3 main types of results: no/small changes, good changes, nonsensical changes (e.g. lots of repetitions)
    • Overall conclusion: The key to improving the simplification is not necessarily to go to larger model sizes. instead, we see that the model can be really good in some cases, but this only happens in extreme cases when the original text is very complicated.
  • this suggests that we need cleaner training data containing pairs of parallel text that capture meaningful text simplification operations:
    • generated a filtered dataset where i) the simplified version is substantially simpler as measured via the Flesch-Kincaid grade level (i.e. at least 5 grades simpler), ii) the simplified text is the same size or a bit shorter but not too extreme (reduction up to 66%). the values were chosen ad-hoc. this yields a much smaller training dataset (10% of original size) but, hopefully, provides a better signal for the model fine-tuning.
    • started to re-train the model with the filtered dataset.
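
The two filter criteria above could be sketched roughly as follows. This is a minimal illustration, not the code used for the actual dataset: the Flesch-Kincaid formula is the standard one, but the vowel-run syllable counter, measuring size in characters, and all function names are assumptions made for the example.

```python
import re

def fk_grade_from_counts(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw counts (standard formula)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def count_syllables(word):
    # Crude heuristic (assumption): one syllable per run of vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    """Estimate the Flesch-Kincaid grade level of an English text."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return fk_grade_from_counts(len(words), len(sentences), syllables)

def keep_pair(original, simplified, min_grade_drop=5.0, max_reduction=0.66):
    """Keep a pair if it is >= 5 grades simpler and at most 66% shorter."""
    if not original:
        return False
    grade_drop = fk_grade(original) - fk_grade(simplified)
    # Size is measured in characters here (assumption); 0.0 means same size.
    reduction = 1.0 - len(simplified) / len(original)
    return grade_drop >= min_grade_drop and 0.0 <= reduction <= max_reduction
```

A pair whose two versions are identical is rejected because the grade drop is zero, and a pair whose simplified version is longer than the original is rejected by the size criterion.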
Feb 23 2024, 2:50 PM · Research (FY2023-24-Research-April-June)
MGerlach added a comment to T347700: Present a synthesis about recent research on readers.

weekly update:

  • organized feedback from last week
  • reduced the amount of content
  • added dedicated slides about findings from yu-ming (surveys) and mike (interviews) to give a more holistic picture of the team's work in this space
Feb 23 2024, 2:40 PM · Research (FY2023-24-Research-January-March)
MGerlach added a comment to T347701: [Stretch] Generate the google2wiki dataset.

weekly update:

  • no update this week
Feb 23 2024, 2:40 PM · Research (FY2023-24-Research-April-June)
MGerlach added a comment to T352542: Recruit reviewers for Program Committee.

weekly update:

  • sent first large batch of invitations to join program committee
  • after 1 week, we have 40 accepts (target size of PC is 50)
  • response rate is 50% so far, so we will send a reminder next week to those who haven't responded yet.
Feb 23 2024, 2:38 PM · Research (FY2023-24-Research-January-March), Research-outreach
MGerlach updated the task description for T352542: Recruit reviewers for Program Committee.
Feb 23 2024, 2:35 PM · Research (FY2023-24-Research-January-March), Research-outreach

Feb 16 2024

MGerlach added a comment to T347702: Improve link recommendation model for orphan articles.

weekly update:

  • didn't manage to fully finish the paper due to some unexpected commitments for the lead author
  • instead, we will aim for the next deadline in the cycle of rolling reviews (April 15, https://aclrollingreview.org/). this will give us some extra time to polish the writing in the coming weeks.
Feb 16 2024, 3:58 PM · Research (FY2023-24-Research-January-March)
MGerlach added a comment to T354653: Work on model optimization and scaling.

weekly update:

  • no update this week
Feb 16 2024, 3:56 PM · Research (FY2023-24-Research-April-June)
MGerlach added a comment to T347700: Present a synthesis about recent research on readers.

weekly update:

  • gave a dry-run of the presentation in the tuesday meeting
  • slightly over time so needs some shortening
  • received lots of good feedback for how to improve the content.
  • will work on implementing the suggested changes in the next week.
Feb 16 2024, 3:55 PM · Research (FY2023-24-Research-January-March)
MGerlach added a comment to T347701: [Stretch] Generate the google2wiki dataset.

weekly update:

  • When preparing documentation of some example cases, I discovered a bug in fetching clickstream data over time, caused by changes in page titles
  • this needs fixing before we can proceed with data publication; the easiest fix is to properly capture redirects of article titles and combine their pageviews
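
The proposed fix could look roughly like the sketch below, which maps every title to its final redirect target and sums the counts per target. It assumes a redirect mapping of old title to new title is already available; the function names and the dictionary-based data layout are hypothetical and do not reflect the actual data generation pipeline.

```python
from collections import defaultdict

def resolve_title(title, redirects):
    """Follow redirects (old title -> new title) until a final title is reached."""
    seen = set()
    # The 'seen' set guards against redirect loops.
    while title in redirects and title not in seen:
        seen.add(title)
        title = redirects[title]
    return title

def combine_pageviews(pageviews, redirects):
    """Sum view counts of all titles that resolve to the same final title."""
    combined = defaultdict(int)
    for title, views in pageviews.items():
        combined[resolve_title(title, redirects)] += views
    return dict(combined)
```

For example, if an article was renamed from "A" to "B" and later to "C", the counts recorded under all three titles end up under "C".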
Feb 16 2024, 3:54 PM · Research (FY2023-24-Research-April-June)
MGerlach added a comment to T352542: Recruit reviewers for Program Committee.

weekly update:

  • figured out how to do the reviewer recruitment in openreview; sent a test invitation to myself and solved some smaller issues (avoiding visibility of submitted papers to reviewers before the submission deadline)
  • prepared and formatted the list of the first batch of invitees. I will wait until Monday next week to send invites so that I can answer potential questions without delay
Feb 16 2024, 3:52 PM · Research (FY2023-24-Research-January-March), Research-outreach

Feb 15 2024

MGerlach changed the status of Restricted Task, a subtask of T335799: Review papers and give feedback, from Open to Stalled.
Feb 15 2024, 12:27 PM · Epic, Research-outreach, Research

Feb 9 2024

MGerlach added a comment to T347702: Improve link recommendation model for orphan articles.

weekly update:

  • working on the paper with collaborators to finalize it for the submission deadline next week (Feb 15)
Feb 9 2024, 4:47 PM · Research (FY2023-24-Research-January-March)
MGerlach added a comment to T354653: Work on model optimization and scaling.

weekly update:

  • coordinating with Fabian on next steps for using cloud-GPU
  • one of the challenges is to find a compromise between thorough experiments and costs
  • looking in more detail at the first training run of a large model with 3B parameters to decide whether it makes sense to continue in this direction
Feb 9 2024, 4:43 PM · Research (FY2023-24-Research-April-June)
MGerlach added a comment to T347700: Present a synthesis about recent research on readers.

weekly update:

  • revised slidedeck for the dry-run of the presentation in next week's tuesday meeting with the research team.
Feb 9 2024, 4:39 PM · Research (FY2023-24-Research-January-March)
MGerlach added a comment to T347701: [Stretch] Generate the google2wiki dataset.

weekly update:

  • no update
Feb 9 2024, 4:38 PM · Research (FY2023-24-Research-April-June)
MGerlach added a comment to T352542: Recruit reviewers for Program Committee.

weekly update:

  • checked contact details for potential reviewers
  • set up a co-working session with Kinneret next week to learn about using openreview reviewer invitations
Feb 9 2024, 4:38 PM · Research (FY2023-24-Research-January-March), Research-outreach

Feb 2 2024

MGerlach added a comment to T347702: Improve link recommendation model for orphan articles.

weekly update:

  • working on the paper to submit by Feb 15
    • methods/results with tables/figures are complete
    • the text still needs work, especially in the intro/discussion, but the main lines of thought and talking points are sketched out and won't change
    • the paper is still too long. we are working on shortening it by removing/compressing (but nothing substantial will be added)
  • shared a draft with Leila for review https://phabricator.wikimedia.org/T356504
Feb 2 2024, 3:38 PM · Research (FY2023-24-Research-January-March)
MGerlach closed T347697: Drafting a paper about the multilingual readability model as Resolved.

weekly update:

  • Completed first full draft of the paper
  • Created task for leila to review readability paper T356415
  • pending minor corrections and the review, we have everything for submission on Feb 15
  • thus, closing task
Feb 2 2024, 9:39 AM · Research (FY2023-24-Research-January-March)
MGerlach closed T347697: Drafting a paper about the multilingual readability model, a subtask of T293028: [EPIC] Initiate Multilingual Readability Research, as Resolved.
Feb 2 2024, 9:39 AM · address-knowledge-gaps, Research, Epic
MGerlach updated subscribers of T354653: Work on model optimization and scaling.

weekly update:

  • Created repository with main code blocks for re-use https://gitlab.wikimedia.org/repos/research/text-simplification
  • @fkaelin used the data/code to start testing model training with cloud-GPUs. The hope is to train larger (and thus better) models which require more computational resources (e.g. more RAM) and then use them for inference in our infrastructure
  • First successful tests with t5-large on A100 GPU with 40GB of ram
Feb 2 2024, 9:36 AM · Research (FY2023-24-Research-April-June)
MGerlach added a comment to T347700: Present a synthesis about recent research on readers.

weekly update:

  • Adding results from the most recent reader survey (especially about readers' education level) to complement results on readability of articles. Coordinating with yu-ming.
Feb 2 2024, 9:34 AM · Research (FY2023-24-Research-January-March)
MGerlach added a comment to T347701: [Stretch] Generate the google2wiki dataset.

weekly update:

  • no update this week
Feb 2 2024, 9:33 AM · Research (FY2023-24-Research-April-June)
MGerlach added a comment to T352542: Recruit reviewers for Program Committee.

weekly update:

  • finalized shortlist
  • defined target size for program committee and decided who to invite
  • prepared invitation email with timeline for review process (bidding period, etc)
  • next step: sending out invitations on openreview (coordinating with @KinneretG to set this up efficiently next week)
Feb 2 2024, 9:33 AM · Research (FY2023-24-Research-January-March), Research-outreach
MGerlach closed T352540: Create CfP / Call for contributions for research track as Resolved.

weekly update:

Feb 2 2024, 9:29 AM · Research (FY2023-24-Research-January-March), Research-outreach
MGerlach closed T352540: Create CfP / Call for contributions for research track, a subtask of T355729: Organize the Research track - Wiki Workshop 2024, as Resolved.
Feb 2 2024, 9:29 AM · Research
MGerlach updated the task description for T352540: Create CfP / Call for contributions for research track.
Feb 2 2024, 9:29 AM · Research (FY2023-24-Research-January-March), Research-outreach

Jan 26 2024

MGerlach added a comment to T347702: Improve link recommendation model for orphan articles.

weekly update:

  • during the past weeks, the focus of how to best improve the model has shifted
  • instead of optimizing which link to recommend to an orphan, we now systematically approached where to insert recommended links in the text. this would be even more useful to editors who would want to de-orphanize orphan articles, as the current model only recommends which link but not the position in the text. thus, the existing model would become even more actionable.
  • together with collaborators led by Akhil, we have now completed the analysis of a multilingual model and shown that our model can identify suitable positions in the text for specific link targets (such as orphans), beating all other baselines. we are currently preparing a paper for submission to a conference (deadline Feb 15).
Jan 26 2024, 6:01 PM · Research (FY2023-24-Research-January-March)
MGerlach added a comment to T347697: Drafting a paper about the multilingual readability model.

weekly update:

  • added abstract and discussion section to the paper as well as results from the main experiments (comparison with previous benchmarks, and applying the model to get an overview on the overall state of readability in selected wikis)
  • With only a few minor results missing, we have a full first draft that will be ready to be shared with the Head of Research by the middle of next week; this is in time for the planned submission deadline on Feb 15 to ACL
Jan 26 2024, 5:50 PM · Research (FY2023-24-Research-January-March)
MGerlach updated the task description for T347697: Drafting a paper about the multilingual readability model.
Jan 26 2024, 5:47 PM · Research (FY2023-24-Research-January-March)
MGerlach added a comment to T354653: Work on model optimization and scaling.

weekly update:

  • started discussions with Fabian about using external cloud GPUs to train larger models. for example, for the case of the t5-model discussed above, this would allow us to take advantage of models with up to 11B parameters (instead of the 770M for large). adapting the existing pipeline would be straightforward, and we have well-defined evaluation metrics to check whether the additional resources required to train those models are justified
  • Completed experiments with the multilingual variant. I now train the model in different languages (not just in English) with our multilingual dataset from readability. We observe a substantial improvement in performance of these models in non-English languages.
Jan 26 2024, 5:43 PM · Research (FY2023-24-Research-April-June)
MGerlach added a comment to T347700: Present a synthesis about recent research on readers.

weekly update:

  • started reviewing feedback for improvement
  • started working on changes highlighting actionable insights (e.g. preparing statistics for the number of articles above readability level X)
Jan 26 2024, 5:35 PM · Research (FY2023-24-Research-January-March)
MGerlach added a comment to T347701: [Stretch] Generate the google2wiki dataset.

weekly update:

  • no updates this week as I was busy with reviews for the research fund
Jan 26 2024, 5:33 PM · Research (FY2023-24-Research-April-June)
MGerlach updated subscribers of T352542: Recruit reviewers for Program Committee.

weekly update:

  • onboarded @Ptbeytia to the task; we coordinated how to proceed
  • we are in the process of finalizing the shortlist and defining a prioritization of who to invite to the program committee (until next week)
Jan 26 2024, 5:33 PM · Research (FY2023-24-Research-January-March), Research-outreach
MGerlach updated subscribers of T352540: Create CfP / Call for contributions for research track.

weekly update:

  • @KinneretG is working to get the content of the CfP on the website (thanks). update should be online by the start of next week (or earlier)
  • onboarded @Ptbeytia to remainder of the task (advertising)
Jan 26 2024, 5:31 PM · Research (FY2023-24-Research-January-March), Research-outreach
MGerlach updated Other Assignee for T352545: Organize sessions for research track, added: Ptbeytia.
Jan 26 2024, 9:19 AM · Research (FY2023-24-Research-April-June), Research-outreach
MGerlach updated Other Assignee for T352543: Review and select submissions for research track, added: Ptbeytia.
Jan 26 2024, 9:19 AM · Research (FY2023-24-Research-April-June), Research-outreach
MGerlach updated Other Assignee for T352542: Recruit reviewers for Program Committee, added: Ptbeytia.
Jan 26 2024, 9:19 AM · Research (FY2023-24-Research-January-March), Research-outreach
MGerlach updated Other Assignee for T352540: Create CfP / Call for contributions for research track, added: Ptbeytia.
Jan 26 2024, 9:18 AM · Research (FY2023-24-Research-January-March), Research-outreach

Jan 24 2024

MGerlach added a comment to T355729: Organize the Research track - Wiki Workshop 2024.

@MGerlach I was reviewing the learnings from last year's Wiki Workshop (T334112) and there is one item that is related to your track and that is: we decided to prioritize (based on popular vote) preparing a 1-2 page document that helps Research track chairs learn how to prepare for their session and how to run it more smoothly. Is this something you and Pablo can pick up for 2024? Bob and I would be happy to lend you 2 pairs of eyes for feedback.

Jan 24 2024, 8:14 AM · Research
MGerlach updated the task description for T352545: Organize sessions for research track.
Jan 24 2024, 8:12 AM · Research (FY2023-24-Research-April-June), Research-outreach

Jan 23 2024

MGerlach moved T347701: [Stretch] Generate the google2wiki dataset from FY2023-24-Research-October-December to FY2023-24-Research-January-March on the Research board.
Jan 23 2024, 1:23 PM · Research (FY2023-24-Research-April-June)
MGerlach moved T347702: Improve link recommendation model for orphan articles from FY2023-24-Research-October-December to FY2023-24-Research-January-March on the Research board.
Jan 23 2024, 1:23 PM · Research (FY2023-24-Research-January-March)
MGerlach added a comment to T355501: Check home/HDFS leftovers of shubhankar.

@MoritzMuehlenhoff could we keep Shubhankar's data on stat1008 for some time (e.g. in my home directory under /home/mgerlach/shubhankar)? Data on hdfs can be dropped (though I don't think there is any).
Thanks.

Jan 23 2024, 8:36 AM · Data-Platform-SRE (2024.02.12 - 2024.03.03), Data-Engineering

Jan 19 2024

MGerlach added a comment to T347697: Drafting a paper about the multilingual readability model.

weekly update:

  • on track with writing the paper
  • full sketch for all sections/subsections and the corresponding content that should be added
  • good first drafts for sections describing the model and discussion
  • adding results of experiments
Jan 19 2024, 4:47 PM · Research (FY2023-24-Research-January-March)
MGerlach added a comment to T354653: Work on model optimization and scaling.

weekly update:

  • finished experiments and evaluation to increase model size for T5 (small, base, large)
  • while there are larger models, they can't be trained with the GPUs on our stat-boxes
  • larger models do yield a substantial increase in performance. considering the SARI score, only t5-large yields scores on par with SOTA for the D-wikipedia benchmark data. this suggests there might be a benefit in training larger models with external resources. serving them for inference in our own infrastructure would potentially be feasible, as this requires fewer resources than training
  • starting experiments in multilingual setup
Jan 19 2024, 4:44 PM · Research (FY2023-24-Research-April-June)
MGerlach added a comment to T352542: Recruit reviewers for Program Committee.

weekly update:

  • Scheduled a meeting with co-chair next week to revise the shortlist of PC members and prepare invitations
Jan 19 2024, 4:38 PM · Research (FY2023-24-Research-January-March), Research-outreach
MGerlach updated the task description for T352542: Recruit reviewers for Program Committee.
Jan 19 2024, 4:38 PM · Research (FY2023-24-Research-January-March), Research-outreach
MGerlach updated the task description for T352540: Create CfP / Call for contributions for research track.
Jan 19 2024, 4:38 PM · Research (FY2023-24-Research-January-March), Research-outreach
MGerlach added a comment to T352540: Create CfP / Call for contributions for research track.

weekly update

  • privacy statement was updated
  • coordinated with co-chair to revise CfP (approved from their side)
  • CfP is ready to publish for the website, only blocker is decision on website layout
  • compiled advertising material (incl. update of the logo)
Jan 19 2024, 4:36 PM · Research (FY2023-24-Research-January-March), Research-outreach