
GSoC 2019 Proposal: Integrate SVG Translate with Content Translation
Closed, Resolved (Public)

Description

Profile Information

Name: Ton Creus Costa
IRC handle: townie
Location: Barcelona, Catalonia / Spain
Working hours: 10 AM - 5 PM EST

Synopsis

SVG Translation was one of the top wishes in the 2017 Community Wishlist. The tool makes it easier for users with no prior experience to translate SVG files. It could also be integrated into the interface of the Content Translation tool, making it even easier for users to translate a file's labels. This project will focus on that integration, working hand in hand with the developers of Content Translation, the developers of the SVG Translation tool, and the community.

Mentors: @Petar.petkovic (Content Translation)
Co-mentors: @Samwilson and @Niharika (SVG Translation tool), @Barcelona (Amical Wikimedia, community)

Timeline

| Period | Task |
| --- | --- |
| May 6 - 27, 2019 | Community Bonding period. In-depth study of the Content Translation interface, beginning mock-ups with the Content Translation team, maybe start working on the project during the 2019 Wikimedia Hackathon. |
| May 27 - 31, 2019 | GSoC officially begins. Finish the UI mock-ups and user-experience work begun during the Community Bonding period. |
| June 3 - 7, 2019 | Study the SVG Translate back end, so as to reproduce it from the Content Translation tool, and begin this integration. |
| June 10 - 14, 2019 | Detection of potentially translatable vector files. Detect whether a file has already been translated; if it hasn't, work on a warning and extend the Image Options card in order to replace the image, or start working on a translation from Content Translation itself. |
| June 17 - 21, 2019 | Create a dialog to start working on a translation from Content Translation itself. Replicate the SVG Translate tool interface, making the adjustments needed to follow the Content Translation conventions. |
| June 24 - 28, 2019 | Phase I Evaluation. Meanwhile, work on the "Replace the Image" option, and suggest possible replacements. |
| July 1 - 5, 2019 | Work on the automatic translation of the fields, as well as on uploading the translation to Commons. |
| July 8 - 12, 2019 | Ask the Catalan Wikipedia (my home wiki, where Content Translation is widely used) and other communities for feedback. Study issues that may arise from translating images; for instance, finding ways to accommodate different text lengths, options to avoid overlapping and overflowing, etc. |
| July 15 - 19, 2019 | Make sure that the Content Translation front-end doesn't become obsolete due to changes in the SVG Translation back-end. |
| July 22 - 26, 2019 | Phase II Evaluation. |
| July 29 - August 2, 2019 | Incorporate automatic translations in the SVG Translate tool; that is, doing the inverse of what I'd been doing up to that point. |
| August 5 - 9, 2019 | Study how to incorporate annotations using the data from text boxes. If it's feasible and worthwhile, work on it during this and the following week. |
| August 12 - 16, 2019 | Code clean-up, documentation and guides, make announcements, etc. |
| August 19 - 26, 2019 | Final Evaluation. |
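
To illustrate the detection step planned for June 10 - 14, here is a minimal sketch of how one might check whether an SVG has text labels and whether a given language is already covered. It assumes translations are stored the way the SVG specification allows, as `<text>` elements carrying a `systemLanguage` attribute (typically grouped in a `<switch>`). The function names `label_languages` and `needs_translation` are hypothetical and are not part of Content Translation or SVG Translate.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def label_languages(svg_source):
    """Return (has_labels, languages) for an SVG document given as a string."""
    root = ET.fromstring(svg_source)
    has_labels = False
    languages = set()
    for text in root.iter(SVG_NS + "text"):
        has_labels = True
        # systemLanguage holds a comma-separated list of language tags.
        for tag in text.get("systemLanguage", "").split(","):
            if tag.strip():
                languages.add(tag.strip().lower())
    return has_labels, languages


def needs_translation(svg_source, target_language):
    """True when the file has text labels but none tagged for target_language."""
    has_labels, languages = label_languages(svg_source)
    return has_labels and target_language.lower() not in languages


# Hypothetical sample: one Catalan label plus an untagged fallback label.
SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg">
  <switch>
    <text systemLanguage="ca">Cor</text>
    <text>Heart</text>
  </switch>
</svg>"""

print(needs_translation(SAMPLE, "ca"))  # False
print(needs_translation(SAMPLE, "es"))  # True
```

A real implementation would also have to handle `systemLanguage` placed on other elements (for example `<g>` or `<tspan>`) and files without the SVG namespace, which this sketch ignores.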

Other deliverables

  • Weekly posts on the Amical Wikimedia blog
  • Work on further developing and consolidating Amical's technical team, finding future challenges (possible meet-up)
  • Create guides for Amical / Catalan Wikipedia community based on what I learn during GSoC

Participation

The main tools for communicating would be Phabricator, IRC, and e-mail/Skype when needed. I plan on publishing my code at Amical Wikimedia's GitHub. I may attend the 2019 Wikimedia Hackathon in Prague, so if any fellow developers of SVG Translation / Content Translation are there, we could already start. I'd also like to work on future editions of Google Summer of Code, and hopefully encourage other members of Amical to participate.

About Me & Past Experience

I am a freshman studying Aerospace Engineering at the Polytechnic University of Catalonia, and this is my first time applying for Google Summer of Code. My older brother, who is also a Wikipedian, introduced me to this program years ago, when I still couldn't participate. When I suggested applying for this program during the last annual meeting of Catalan Wikipedians, I got an overwhelmingly positive response, and they have been encouraging me throughout the process. I would be completely free of other commitments (college) from the 14th of June onward, although I could start earlier than the official date and attend the 2019 Wikimedia Hackathon. I am not applying for Outreachy.

I have experience working with C++, Python, Java, JavaScript, HTML and CSS. The PC I used growing up always ran Ubuntu, and I have been familiar with open source culture since I was a kid. I've been a Wikipedian for over 9 years, and I'm mostly active at the Catalan Wikipedia, where I'm a sysop. I have run this bot with the rest of my family, and most recently I worked on the new Main Page for the project, and this associated tool, hosted at Toolforge, which allows users to modify the database that contains the texts which appear on the Main Page. Last year, as a member of Amical Wikimedia, I helped out during the 2018 Wikimedia Hackathon, and it is possible that I will attend this year's hackathon in Prague.

As someone who has been using the Content Translation tool for years, I would find it incredibly fulfilling to collaborate on the technical side of this project and see this proposal come to life. One of the requirements for articles at cawiki to become Featured Articles is to translate images when possible, and since I am one of the few users at cawiki who regularly uses Inkscape, I often end up translating them. Also, cawiki was one of the early adopters of the Content Translation tool, and it would be nice for someone from the community to end up working on the technical side of this tool.

Any Other Info

@Pginer-WMF has proposed a UI/UX in task T206433, the [Epic] from which this proposal spawned. I'll keep working there until the official coding period begins.

Event Timeline

If you would like us to consider your proposal for review, please move it to the submitted column on the Google-Summer-of-Code (2019) board.

We need to discuss some details of the proposal and scope of the project.

The original task from which the idea of integrating SVG translation into Content Translation came about (T206433) includes detecting whether an SVG has translatable labels and adding the option to translate the image through an external tool, SVG Translate.
On the other hand, the proposal written in this ticket is about integrating the SVG translation UI into Content Translation, placed inside a dialog.

Content Translation has useful translation features that would benefit those translating SVGs, mainly machine translation, which isn't available in the SVG Translate tool.

I don't know the inner workings of the SVG Translate tool. Maybe it can provide some APIs that make this integration easier, but I doubt that.
Also, Content Translation has plans to go out of beta, and relying on integration with, and keeping up to date with, a tool on labs doesn't sound like a good idea, even if we integrated (parts of) SVG Translate's UI directly into Content Translation. Someone would need to maintain this. There is a point in this timeline stating "Make sure that the Content Translation front-end doesn't become obsolete due to changes in the SVG Translation back-end", but I don't see a clear path to achieving this in the long run.
Whether we integrate a full-fledged SVG editor into CX is also not a product decision I should make.

Given that, I'm willing to take a mentorship role for this project if it is narrowed down to the original proposal by the Content Translation product manager (@Pginer-WMF), T206433.

With the comments above, I complete the "Review proposals on Phabricator and give feedback" part of this guideline. As for the "Point them to self-contained, easy and newcomer-friendly bugs to fix" point, T193132 seems like a good place to start exploring Content Translation.

Let me also mention that I am not attending the hackathon in Prague.

@Townie Hello! I tried to reach out to you via IRC but couldn't. We would like to learn what your thoughts are on @Petar.petkovic's comment above to be able to make a decision on your proposal in the next 1-2 days.

Sorry! I thought I had answered it and I completely forgot about it.

@Petar.petkovic When I started working on the timeline, I felt the original proposal was way too small for a three-month project. When I discussed this with fellow Amical members, they thought integrating tools from WMFLabs into the MediaWiki interface was the way to go: easier access, especially for newcomers, more visibility, and overall more practical for everyone. I understand that making this project's scope bigger makes it more complex, but in my view it also makes it more attractive.

My concern is that I may finish @Pginer-WMF's proposal too early. I may run into unexpected difficulties during the project which could set me back and take up the whole summer, but if that's not the case, I wouldn't like to stay idle. How do you think I could advance after the first goal? How about doing the original proposal during the first phase, and if everything goes well and we feel confident enough, start working on the goals set in this proposal? An alternative could be fixing random bugs, but I'd rather have an ambitious objective.

> Let me also mention that I am not attending the hackathon in Prague.

What a pity :( If anyone from the CT / SVG Translate teams is attending, I'd love to meet some time.

I also think what is proposed in T206433: SVG Translate to integrate with Content Translation is too small for a 3-month project.
We could start with that initial goal, but I don't know how we could proceed after its completion. Integrating an SVG editor is not an idea I support, for the reasons stated above.

Integrating an SVG editor into Content Translation is a product decision that @Pginer-WMF should make, not me. He also created the initial proposal (T206433) and might be able to suggest next steps to take after it is done.

@Pginer-WMF Do you have some ideas that can be worked on once SVG translate integration with Content Translation is complete? @Townie If there are any other suggested ideas, would you be willing to work on those?


Of course, gladly!

My understanding is that the Content Translation tool is optimized for the use case where the two translations are independent (e.g. between wikis), whereas E:Translate is optimized for the use case of translating documents where one of the languages is controlling (e.g. software translation). Admittedly I don't follow translation stuff very closely, so I may just be mistaken, but assuming that is correct, may I ask what the rationale is for integrating into ContentTranslation rather than E:Translate?

> @Pginer-WMF Do you have some ideas that can be worked on once SVG translate integration with Content Translation is complete? @Townie If there are any other suggested ideas, would you be willing to work on those?

I'm happy to share some related areas of work that can help expand the project:

Supporting user corrections in Content translation is an area that forms a small but consistent project and will bring value to users. Considering that machine translation is not always correct, it provides a quick way for users to correct it with an alternative. It can be applied to three areas with an increasing level of complexity (more details in the tickets below), so it gives room to complete as much or as little as time permits:

Another area that may be of interest is user mapping for templates where users can help provide the mapping between templates in different languages so that infoboxes and references can be reliably transferred across languages by Content Translation. This is in an early stage of exploration for the back-end part to define how/where to capture the metadata (T221534). The project would be to build on top of that a user interface that makes it easy to provide such metadata. There are some more details in T222059 (but some design aspects are still pending).

@Townie, feel free to comment on your interest in these areas or share any questions you may have. Thanks for your interest in helping to make the project better. @Petar.petkovic, feel free to share any thoughts about the technical complexity of the proposals, how they fit in the GSoC schedule or any other question.

> My understanding is that the Content Translation tool is optimized for the use case where the two translations are independent (e.g. between wikis), whereas E:Translate is optimized for the use case of translating documents where one of the languages is controlling (e.g. software translation). Admittedly I don't follow translation stuff very closely, so I may just be mistaken, but assuming that is correct, may I ask what the rationale is for integrating into ContentTranslation rather than E:Translate?

Your assumption is correct. I guess the key aspect is where SVGs with labels inside are more frequent:

  • Content Translation is used to translate Wikipedia articles. Many of these have images, and some of those are SVGs with labels. Looking for diagrams and checking where they are used, it is easy to find cases where content is provided in a language different from that of the wiki (example).
  • The Translate extension is mainly used for the translation of user interface strings. There, SVGs with text labels inside are less frequent. The extension is also used for the translation of documentation pages, where this kind of content may be more common. So I think it may also be useful to integrate, but the impact in terms of content and effect on readers may be lower than in the previous scenario.

@Pginer-WMF These are really solid projects! I particularly like controlling whether references are translated or not (this error often pops up at ca.wiki), and it could tie in really well with the user mapping for templates. Learning from user corrections is a daunting project, and I doubt we could do that within a single GSoC project.


Great. Yes, learning from user corrections is the more complex step. It is ok to leave it out, or create a solution that is more limited in scope (e.g., working only for a single word when it appears preceded by the same exact word in the machine translation). In any case, my point is that this line of work can be taken gradually and adapted to the time available. Thanks for your interest in the project!
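
As a rough illustration of the limited-scope idea described above (a single word, matched only when preceded by the same exact word in the machine translation), one could imagine something like the following. This is only a sketch under that assumption: the functions `learn_correction` and `apply_corrections` and the in-memory dictionary are hypothetical and do not reflect how Content Translation stores or applies corrections.

```python
def learn_correction(machine_text, corrected_text, memory):
    """Record a single-word correction keyed by (preceding word, original word)."""
    mt_words = machine_text.split()
    fixed_words = corrected_text.split()
    if len(mt_words) != len(fixed_words):
        return  # this sketch only handles edits that keep the word count
    diffs = [i for i, (a, b) in enumerate(zip(mt_words, fixed_words)) if a != b]
    if len(diffs) != 1:
        return  # only exactly one changed word is learned
    i = diffs[0]
    previous = mt_words[i - 1] if i > 0 else ""
    memory[(previous, mt_words[i])] = fixed_words[i]


def apply_corrections(machine_text, memory):
    """Replace words that match a learned (preceding word, word) pair."""
    result = []
    previous = ""
    for word in machine_text.split():
        result.append(memory.get((previous, word), word))
        previous = word  # context is always the machine translation's own word
    return " ".join(result)


memory = {}
learn_correction("the blue home is big", "the blue house is big", memory)
print(apply_corrections("a blue home", memory))  # a blue house
print(apply_corrections("my home", memory))      # my home
```

Because the preceding word must match exactly, the correction is applied conservatively: "home" is only replaced after "blue", which keeps the scope small at the cost of missing many valid cases.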

@Petar.petkovic What do you think of these proposals? Which one would you go for first?


First, the SVG Translate integration should be completed. Next, the tasks @Pginer-WMF proposed should be tackled in the order he listed them.

If there isn't anything remaining in your proposal to address, feel free to close this task. Before you do so, make sure your project is listed here https://www.mediawiki.org/wiki/Google_Summer_of_Code/Past_projects#2019 and has the following information: Student name, Mentors, Relevant links and Outcomes (in not more than two lines). Thank you for your participation!

Since GSoC 2019 is over, I will resolve this one. I will keep T206433 open, however, because the patch for the GSoC project is not yet merged.

Also, I've completed the list on https://www.mediawiki.org/wiki/Google_Summer_of_Code/Past_projects#2019