GSoC 2022 Proposal: Rewrite the Wikidata Infobox on Commons in Lua

Description

Profile Information

Name: Lennard Hofmann
GitHub Profile: Ordoviz
Location: Germany
Typical working hours: 10:00–20:00 (UTC+2)

Synopsis

Almost 4 million category pages on Wikimedia Commons use the Wikidata Infobox template, which is not implemented efficiently: previewing a category page can take several seconds, long enough to be annoying. This project aims to address this problem by rewriting the infobox in Lua.

@Mike_Peel is the mentor for this project.

Implementation Ideas

Currently, the infobox requests the values of certain Wikidata properties individually, resulting in hundreds of calls to Lua modules. Since the infobox needs almost everything from the connected Wikidata item, I plan to write a Lua module that fetches this data in a single request using mw.wikibase.getEntity. A similar infobox, Module:Databox, uses this approach successfully: it even processes "COVID-19 pandemic in Colombia" (one of the biggest Wikidata items) quickly.
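
A minimal sketch of the single-fetch idea, not the final module: the entity is loaded once with mw.wikibase.getEntity, and every property is then read from the returned table. The function names collectValues and loadValues are hypothetical, and the stub entity exists only so the snippet also runs outside a wiki.

```lua
-- `mw` is supplied by Scribunto on-wiki. The fallback below is a hypothetical
-- stub with made-up data so the sketch also runs in a plain Lua interpreter.
local mw = mw or {
    wikibase = {
        getEntity = function(id)
            return {
                id = id,
                claims = {
                    -- One ordinary statement whose value is an item.
                    P31 = {
                        { mainsnak = { snaktype = 'value',
                            datavalue = { type = 'wikibase-entityid', value = { id = 'Q5' } } } },
                    },
                    -- One statement with an unknown value (no datavalue).
                    P569 = {
                        { mainsnak = { snaktype = 'somevalue' } },
                    },
                },
            }
        end,
    },
}

local p = {}

-- Collect the main values of every property from an already-fetched entity.
-- Statements without a concrete value ('somevalue', 'novalue') are skipped.
function p.collectValues(entity)
    local result = {}
    for pid, statements in pairs(entity.claims or {}) do
        local values = {}
        for _, statement in ipairs(statements) do
            local snak = statement.mainsnak
            if snak.snaktype == 'value' then
                values[#values + 1] = snak.datavalue.value
            end
        end
        result[pid] = values
    end
    return result
end

-- Fetch the entity once; all later lookups use the returned table.
function p.loadValues(qid)
    local entity = mw.wikibase.getEntity(qid)
    if not entity then
        return {}
    end
    return p.collectValues(entity)
end

-- A real Scribunto module would end with `return p`.
```

With this shape, formatting code would receive the collected table instead of issuing one request per property. How the real module structures its data is still open.
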

Timeline

If I manage to complete any task ahead of schedule, I will start working on the next task. In that case, the left-over time at the end of the project will be used to fix bugs reported on the talk page, such as 'none' values breaking the template.

May 20 – June 12 (Community Bonding Period)
  • Figure out how to communicate and work together effectively.
  • Discuss how to handle Wikidata items with dozens of values for a single property (as seen on Category:LibreOffice).
  • Write Lua functions that fetch content from Wikidata.
  • Document these functions so that Mike Peel and others can easily use them to extend the infobox.
June 13 – July 10 [4 weeks]
  • Rewrite most property requests in Lua, many of which look like {{#invoke:Wikidata Infobox|formatLine|P196| {{#invoke:WikidataIB|getValue|P196|…}} }} and can easily be rewritten. The other requests are more complex; porting those will take some time. [3 weeks]
  • Test and deploy the new property requests. Fix bugs reported by the community. [1 week]
July 11 – July 31 [3 weeks]
  • Port features from the old infobox: label, description, images, and sitelinks (displayed at the top of the infobox). [2 weeks]
  • Test and deploy those features. Fix bugs reported by the community. [1 week]
  • Phase 1 Evaluation [July 25 – July 29]
August 1 – August 21 [3 weeks]
  • Port a feature from the old infobox: automatic categorization (autocat). [2 weeks]
  • Test and deploy autocat. Fix bugs reported by the community. [1 week]
August 22 – September 4 [2 weeks]
  • Port features from the old infobox: authority control and helper links (displayed at the bottom of the infobox). [1 week]
  • Test and deploy those features. Fix bugs reported by the community. [1 week]
September 5 – September 12 [1 week]

The code should look like this now (massively simplified):

Template:Wikidata Infobox/core
{{#invoke:Wikidata Infobox|autocat}}
<table class="infobox">
 {{#invoke:Wikidata Infobox|header}}
 {{#invoke:Wikidata Infobox|properties}}
 {{#invoke:Wikidata Infobox|footer}}
</table>
  • Rewrite it so that it invokes Wikidata Infobox only once for maximum performance.
  • Call for feedback from the community ("Tell us how the new infobox looks on your favorite categories!")
  • Submit finished project and final mentor evaluation.
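
The single-invocation step could look roughly like the sketch below. It is an illustration under assumptions rather than the planned code: the entry point main, the helper render, and the four section functions with their placeholder output are all hypothetical. The point is only that one call fetches the entity and passes it to every section.

```lua
local p = {}

-- Hypothetical section renderers; each receives the already-fetched entity.
local function autocat(entity)
    return '[[Category:Example tracking category]]' -- placeholder category
end

local function header(entity)
    local label = entity.labels and entity.labels.en and entity.labels.en.value
    return '<tr><th>' .. (label or entity.id) .. '</th></tr>'
end

local function properties(entity)
    -- Sort property IDs (as strings) so the output order is stable.
    local pids = {}
    for pid in pairs(entity.claims or {}) do
        pids[#pids + 1] = pid
    end
    table.sort(pids)
    local rows = {}
    for _, pid in ipairs(pids) do
        rows[#rows + 1] = '<tr><td>' .. pid .. '</td></tr>'
    end
    return table.concat(rows, '\n')
end

local function footer(entity)
    return '<tr><td>' .. entity.id .. '</td></tr>'
end

-- Build the whole infobox from one entity table.
function p.render(entity)
    return table.concat({
        autocat(entity),
        '<table class="infobox">',
        header(entity),
        properties(entity),
        footer(entity),
        '</table>',
    }, '\n')
end

-- Single entry point: called without an ID, mw.wikibase.getEntity returns
-- the item connected to the current page (or nil if there is none).
function p.main(frame)
    local entity = mw.wikibase.getEntity()
    if not entity then
        return ''
    end
    return p.render(entity)
end

-- A real Scribunto module would end with `return p`.
```

Under these assumptions, Template:Wikidata Infobox/core would shrink to a single {{#invoke:Wikidata Infobox|main}}.
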
Participation

I will publish my code on sandbox pages (here and here to be precise; this should allow us to test the new module on any category using Special:TemplateSandbox). I will stay in contact with Mike Peel on Zulip. For proposing changes to the template that need to be discussed in a larger group, I will resort to the template's talk page. I will write bi-weekly blog posts documenting my progress.

About Me

I am a 19-year-old student from Germany. When the community bonding period starts on May 20, I will have written my final Abitur exam, allowing me to fully focus on this project. Apart from a few individual days, I have no time commitments during the coding phase.

When I signed up on Wikipedia in 2017, I had no clue what I was getting myself into. Now, I am an active editor on Wikimedia and Fandom wikis, partly because I enjoy solving the technical challenges that come with maintaining a wiki. Since I love wikis and free software, I immediately went to see which projects Wikimedia has to offer when I heard about GSoC on Mastodon. This project caught my attention because figuring out how a template works is something I regularly do, but writing Lua modules is still pretty new and exciting to me.

I hope to gain experience through my internship that I can use in future projects, so that I can remain a contributing member of the community beyond the summer.

Note that I am intentionally not linking to my Wikimedia account because I want to keep it separate from my real-life identity for now. I have created a new account for this project.

Past Experience

Recently, I have fixed a little bug in Module:Databox (see its talk page) and an "expression error" in the Wikidata Infobox (see here).

I have made small contributions to various other open-source projects, including a MediaWiki syntax highlighter for the Kakoune text editor.

Event Timeline

Hi @LennardHofmann, this is looking good! I think you also have to submit this via Google's system (please make sure you do so!).

One substantive comment: it would be great if the timeline could take a more iterative approach. You've already broken the infobox down into key parts: ideally for each of these, after they're converted to Lua, we'd test and deploy them during the project, rather than having one big update at the end of the project. That way, it's easier to catch edge cases, there's more time for community feedback, and it doesn't matter so much if there are things that can't be converted before the end of the project. Since the infobox is quite modular, this might be possible - although it's a bit more complex. What do you think?

Hi! I am Srishti, one of the org admins - it's great to see your interest in applying to GSoC with Wikimedia! You can safely ignore this message if you have already followed our participants' guide. As you develop your proposal, we want to ensure that you follow the application process steps: https://www.mediawiki.org/wiki/Google_Summer_of_Code/Participants#Application_process_steps, primarily communicate with project mentors, integrate their feedback in your proposal, adhere to the guidelines around proposal submission, contribute to microtasks, etc. Let us know if there are any questions!

@Mike_Peel Great suggestion, thanks! Replacing each key part of the infobox one after the other would be no problem from a technical standpoint. But it would mean that I might have to fix urgent bugs reported after the deployment of a key part, while I'm already working on the next feature. Considering that the alternative is that I would need to fix my buggy code (written weeks ago) at the end of the project, this seems reasonable, though.

I've edited the proposal and submitted it to Google.

@LennardHofmann Thanks! Those changes look good. I can now see it in the Google dashboard. With "I might have to fix urgent bugs reported after the deployment of a key part, while I'm already working on the next feature" - yes, this is the norm when developing community-facing code!

As the GSoC deadline is less than 24 hours away (April 19, 2022, 18:00 UTC), please ensure that the information in your proposal on Phabricator is complete and that you have already submitted it on Google's program website in the recommended format. When you have done so, please move your proposal on the Phabricator workboard https://phabricator.wikimedia.org/project/board/5716/ from "Proposals in Progress" to the "Proposals Submitted" column by simply dragging it. Let us know if you have any questions.