Rewrite the Wikimedia Commons Wikidata-based Infobox in Lua
Closed, Resolved (Public)

Description

Approved license

I assert that this Outreachy internship project will be released under either an OSI-approved open source license that is also identified by the FSF as a free software license, OR a Creative Commons license approved for free cultural works

  • CC-BY-SA

No proprietary software:

I assert that this Outreachy internship project will forward the interests of free and open source software, not proprietary software.

  • Yes

How long has your team been accepting publicly submitted contributions?

  • More than 2 years

How many regular contributors does your team have?

  • 1-2 people

Brief summary

Wikimedia Commons is a multilingual project that hosts over 60 million freely licensed multimedia files, which are available for use on Wikipedia and other websites. These files are mostly organised using categories, which often only include monolingual descriptions.

The current solution is https://commons.wikimedia.org/wiki/Template:Wikidata_Infobox . This is currently used to include a multilingual infobox in over 3 million Wikimedia Commons categories. The categories contain media related to a specific topic, while Wikidata holds structured data about the topic: the infobox brings them together to display summary information about the category contents, in around 300 different languages.

It is coded using MediaWiki ParserFunctions and calls to Lua modules, and it consumes a lot of server resources. The overall aim of this project is to rewrite it completely in Lua, so that it loads significantly faster and more efficiently, and so that it is easy to expand in the future.

This project is co-mentored by Mike Peel and RexxS. Knowledge of Lua is an advantage, although it can be learnt during the project. Knowing multiple human languages is useful to check the multilingual contents of the infobox, but is not required.

Minimum system requirements

Only an internet browser is required.

How can applicants make a contribution to your project?

To start with, you will edit the sandbox version of the current infobox, which mostly relies on ParserFunctions, by adding new properties and fixing bugs. This will help you gain familiarity with how the infobox works, how it handles input data, and what other Wikimedia editors want from it. You will then start migrating specific parts of the infobox to use Lua functions. Ultimately, you will convert the whole infobox into Lua, test its performance, and expand it with new features.

You will need to create an account on Wikimedia Commons (if you don't already have one) via https://commons.wikimedia.org/w/index.php?title=Special:CreateAccount . Potential tasks are listed at https://commons.wikimedia.org/wiki/Template_talk:Wikidata_Infobox (the tasks at the bottom of the page are the best place to start). If you are interested in working on any of them, comment there and either I or RexxS (or another community member) will reply with guidance on the specific task. If you have any questions, please ask through Outreachy or at https://commons.wikimedia.org/wiki/Template_talk:Wikidata_Infobox#Outreachy_project .

Repository

https://commons.wikimedia.org/wiki/Template:Wikidata_Infobox/core/sandbox

Issue tracker

https://commons.wikimedia.org/wiki/Template_talk:Wikidata_Infobox

Intern tasks

Potential tasks include:

  • Improving documentation: reading through the current code and documenting how it works
  • Adding support for additional properties in the infobox
  • Tackling some of the technical issues raised on the talk page (https://commons.wikimedia.org/wiki/Template_talk:Wikidata_Infobox)
  • Converting parts of the infobox to use Lua functions rather than ParserFunctions
  • Installing the current version of the template on other wikis

Intern benefits

You will learn, or improve your knowledge of, Lua coding. You will gain familiarity with how structured data is maintained on Wikidata, and how it is used on Wikimedia Commons. You will see your work live in millions of categories on Commons.

Community benefits

Improved load times and improved display of Wikidata information in Commons categories. A cleaner code base, written entirely in Lua, to be able to better maintain and expand it in the future.

Event Timeline

Based on my initial thoughts at T270429. @RexxS might you be interested in being a co-mentor of this project?

Sure, Mike. I've mentored for the last few years at Google Code-In, and I'm currently co-mentoring an Outreachy 21 project on Lua documentation, so I guess I'd be okay to help out in the next round. My co-mentor for that project, Pavithra Eswaramoorthy, is the expert on documentation, so you might want to ask her if she has time to lend a hand or just do a review of any documentation produced.

@Mike_Peel Sounds like an interesting project for Outreachy! A quick question: how much time do you think it would take for an intern with beginner-level skills on the topic to complete this project? Is it a sufficient amount of work for 3 months?

@RexxS Thanks! I've added you to the task as a possible mentor.

@srishakatux I think this would take me about a week to code up, as someone that knows various programming languages but not Lua. So I think 3 months is a reasonable timeframe for a beginner programmer. It's also a flexible project: if they only manage to do part of the work, then it's still useful, and if they finish it quickly then I have a lot of follow-up tasks they can work on - see the feature requests at https://commons.wikimedia.org/wiki/Template_talk:Wikidata_Infobox ! BTW, could you clarify the difference between Outreachy and Google Summer of Code, please? I think this project (and the other one) could fit into both programs, unless Lua isn't a suitable coding language for Google? I'm happy either way, though. :-)

@Mike_Peel The main difference is that a Google Summer of Code project is purely coding related. In contrast, Outreachy also covers non-coding areas (such as research, documentation, design, translation, outreach, etc.)

In my opinion, as GSoC is a purely coding project, ideally, interns feel rewarded for their contributions and can easily showcase them to others (like on Gerrit / Github).

My understanding is that this project will also touch a bit upon documentation, and the code contribution would likely stay on the wiki, right? If so, then the project fits under Outreachy.

@srishakatux The code produced would reside entirely in a module on-wiki, probably developed on Commons (or perhaps enwiki, but that's less likely) and ported to other wikis as required. As we expect editors to deploy the infobox in their native wikis, good documentation is an essential part of the concept. It certainly looks more like an Outreachy project from my perspective.

@srishakatux Thanks for the background. I'm surprised that GSoC doesn't include documentation, as that's a rather important part of code writing! But as RexxS says, it will stay on-wiki (it wouldn't make sense on Github etc. since it only works on-wiki), so that is a bigger difference. I'm happy either way, though.

(as per discussion keeping it for Outreachy for now)

@Mike_Peel Whenever you are ready (but before the deadline which is March 14th), follow the steps in Step 3 here https://www.mediawiki.org/wiki/Outreachy/Mentors#_Before_the_program to upload a proposal on the Outreachy site. Let me know if you need any help w/ this.

A rewrite of {{Wikidata Infobox}} in Lua is a big task which is very much needed. The current implementation is functionally great, but it long ago outgrew the wiki-template "language" it is written in. If someone can produce concise, clean, modular, well documented, and well tested (through unit tests) Lua code which would be easy for others to pick up and maintain, then this task can have a very good impact on the Commons category namespace.

Mike_Peel renamed this task from Rewrite the Commons Wikidata Infobox in Lua to Rewrite the Wikimedia Commons Wikidata-based Infobox in Lua. (Feb 17 2021, 9:01 PM)
Mike_Peel updated the task description.

@Srichakradhar @RexxS I started to fill in the Outreachy form, but it asks very different questions from those that were asked here (or for other student projects that I'm used to). That's not a problem, but it was a bit unexpected. I'm rewriting the task description to match the questions they are asking, and will check back before submitting the project.

@Mike_Peel Probably you added @Srichakradhar by mistake.

Right, Outreachy asks a different set of questions, but I believe that some of the questions are optional and for some you can reuse the information you have added to this task. Once you submit a request for mentoring a project, I will get a notification asking to approve it and then I can also take a look.

@Mike_Peel The deadline to submit a project on the Outreachy site is coming up. Ideally, you submit a project before the end of this week following the instructions in the comment below :)

@Mike_Peel Whenever you are ready (but before the deadline which is March 14th), follow the steps in Step 3 here https://www.mediawiki.org/wiki/Outreachy/Mentors#_Before_the_program to upload a proposal on the Outreachy site. Let me know if you need any help w/ this.

@srishakatux Sorry for the delay. I've rewritten the task description above, and I've now submitted it, hopefully that's OK. @RexxS any changes you want making? Thanks for the follow-up messages, and sorry for the wrong ping before!

@Mike_Peel No worries! I've approved the project :) As this will be your first time mentoring in Outreachy, if you have any questions about the program, feel free to send me an email anytime. If you would like a more in-depth overview of the program, we can also set up a call to discuss.

@srishakatux Unfortunately, it turns out that RexxS is no longer available to co-mentor, so I am going to have to withdraw this, sorry. The problem is that I am not too familiar with Lua (RexxS is), and without that the project can't really go ahead. I'm not sure who else could co-mentor it with me. Hopefully that will change in the future (if I can find another co-mentor, or if I learn Lua myself) and this can be submitted to a future Outreachy round instead.

@Mike_Peel Ahha :-/ The problem is that we reserved a slot (and budget) already for this project for Outreachy, and if this project doesn't make it to the list, then the slot will go unused. Would it be possible for you to find another mentor, maybe by reaching out to potential candidates via mailing lists / IRC / Telegram, and see if you can find someone? I can also offer a bit of help in finding a mentor with Lua skills.

@srishakatux If you can help find another mentor that knows Lua, that would be great. I pinged most of the relevant people I could think of in the other proposal, but didn't get any response. I'll try to think of other people too. I still want to see this project take place, I just didn't foresee this issue.

Another option, if you definitely need a project this time round: I've been working on auto-creating Wikidata items for new Wikipedia articles, using Python/pywikibot, and there's a good project there to improve matching with existing articles (particularly in other languages), and also to import more content from Wikipedia to Wikidata when creating the item (and also going back through existing items to import more info). This is https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Pi_bot_19 at present. I could sole-supervise a student working on this project, or finding a co-mentor that knows Python might be somewhat easier. I can write this up tomorrow if that would help.

@srishakatux I set up T276329 for the task I described in my previous comment, if that looks OK? As I said, that's a project where I know all of the steps, so I could sole-mentor it or could co-mentor with anyone that knows Python.

Or, @Jarekt @Ederporto @Multichill Might you be interested in co-mentoring this infobox project, particularly from the Lua side of things? Or know someone else that could?

This is a nice project too and would work! Maybe then as a next step you can get the description up on Outreachy website and then withdraw the other project you uploaded?

Thanks - hopefully I've managed to do this now! I didn't understand before that I had to withdraw the mentor approval rather than just the project. The new project has also now been submitted.

I'm boldly removing the tag to avoid any confusion among applicants. :)

@srishakatux I'd like to propose this as a GSOC 2022 project, if that would be OK. I think it's important that this template gets converted to Lua, as it would lead to significant performance increases on Commons. Without RexxS being able to participate it's a much more challenging programming exercise with a prerequisite on Lua programming experience ('code this in Lua instead of wikitext' rather than 'learn Lua while doing this project'), which I think is more suitable to GSOC than Outreachy. There are also a *lot* of requests on the template talk page, which would challenge a decent programmer to implement, so I think there would be more than enough to do for even the best student. I'd be happy to co-supervise with others if anyone would be interested (and would do my best to find a Lua expert to help).

(I'm thinking that this would be much more focused on Lua coding than my initial proposal - documentation/research/design has mostly already been handled, and I could improve it as this work progresses - the challenge really is to implement this as efficiently as possible in Lua.)

@srishakatux Reading around the current Outreachy round, I'm more convinced that this would be more suitable for GSOC than Outreachy, since Outreachy explicitly says at https://www.outreachy.org/docs/community/#not-work-for-hire that "Outreachy is not a way to find a skill set that your community lacks. Mentors should be able to coach Outreachy interns on the skills they use in the project. For example, if a community wants an Outreachy intern work on a JavaScript project, they need at least one mentor experienced in JavaScript." - so if we don't have a Lua mentor, then we can't run this as an Outreachy project. I'm not sure if Google has a similar rule, but I got the feeling that they were more open to harder programming challenges.

Unless @Jarekt or others with Lua experience might be willing to co-mentor this in a future Outreachy round?

@Mike_Peel Would it be possible for you to also add the details of the project here: https://www.mediawiki.org/wiki/Google_Summer_of_Code/2022#Ideas_for_projects ? Thanks! Also, ensure that this task description adheres to the format described at https://phabricator.wikimedia.org/project/view/2537/ .

@srishakatux Thanks! Let's make a fresh start with the project description - follow-up is at T302098.