[EPIC] (Proposal) Replicate core OCG features and sunset OCG service
Open, Needs Triage · Public

Description

Updated timeline

August - September 2017

October, 2017

The following is a proposal, pending the outcome of a consultation with the Wikimedia communities: https://www.mediawiki.org/wiki/Reading/Web/PDF_Functionality (and previously https://www.mediawiki.org/wiki/Reading/Web/PDF_Rendering)

Summary

We would like to create a plan that results in there being only one service for rendering PDFs. OCG has not aged well, and rather than support two solutions, the Foundation needs to focus on one, which will allow for better user experience development and maintenance.

Goal Visibility

This represents a response to community interest in an improved PDF solution, demonstrated in the wishlist of Wikimedia Deutschland's TCB team (T135643), and necessitated by their work. Investing in PDFs is strategically aligned with WMF's "New Readers" project, which is Product's Program #3, Goal 3 in the 2016-2017 annual plan. That project has identified offline access as one of three barriers to Wikipedia access that it is looking to solve.

Rationale

As documented here, in the comment by @cscott (https://en.wikipedia.org/wiki/Wikipedia_talk:Offline_Content_Generator) and various other places, OCG was built quickly to replace code by an outside organization, PediaPress, and has had scaling and architectural issues.

The OCG service does the following:

  1. converts wikitext pages to LaTeX-formatted PDF and plain text. In the past, it has also supported ZIM, EPUB and possibly more
  2. per above, applies an attractive layout where the print.css has not provided an attractive option
  3. when integrated with the Collection extension, collates articles selected by a user into books and creates a table of contents

OCG is currently not well supported by the WMF, and there are difficulties with LaTeX that have disabled table rendering in PDFs. LaTeX is a fairly brittle framework which is not well suited to our incredibly flexible content types. Furthermore, bugs in OCG or the Collection extension have greatly diminished the third use of OCG (creating books).

There was significant desire from the community to provide a LaTeX alternative for single-article PDF rendering (captured in T135643), and we are doing this via a new service called Electron. Some of the decision making around Electron is captured in T134205. Development and implementation of this service is currently underway. The WMF’s Operations team has strong reservations about running both services at once, given the heavy overlap in functionality, particularly when one of them is not well supported. They have kindly, and rightly, asked us to transfer the remaining features from OCG to Electron and sunset OCG.

Success Metrics

  • Readers and contributors do not lose essential or popular functionality
  • Operations only has one service to support

External Dependencies

This will take effort from Wikimedia Deutschland TCB and the following WMF teams:

  • reading infrastructure
  • reading web
  • community liaisons
  • services
  • operations

Unknowns

  • we have a solid overview of what readers' needs are around PDF creation, but lack nuance and edge cases

EPIC Plan

  1. Stage 1, in parallel, Dec 2016 - March 2017
    • turn on Electron alternative to OCG to allow tables in PDFs, per community wishlist #9 (T135643)
    • improve print CSS so that default PDFs are more attractive (T135022#2672465)
    • measure user preference for new v. old PDFs (T150326)
    • introduce community to the implications of proposal and ask for feedback (T146757)
  2. Stage 2, April - May, 2017
    • replicate the core missing functionality, collation of articles into a single PDF within "book creator", using Electron (pending)
    • identify OCG uses that we have missed via community consultation (T146757)
  3. Stage 3, May - Jul, 2017
    • act on above results
    • communicate sunsetting (an announcement following the consultation in the earlier stage)
  4. Stage 4, August 2017
    • retire OCG service

Probable drawbacks

  • currently there are no plans to continue to support the two-column layout favored by LaTeX
  • currently there are no plans to continue to support plain-text conversion, or to restore EPUB or ZIM output (both currently not supported by OCG)

Metrics Implementation

  • Current usage of APIs is <1% of pageviews, which we consider significant
    • measure user preference for new v. old PDFs (T150326)
  • Current usage of book creator is very limited

Delivery Estimate

September 30th is our deadline for turning off OCG

Restricted Application added a subscriber: Aklapper. · Nov 16 2016, 5:06 PM
Kghbln added a subscriber: Kghbln. · Nov 16 2016, 5:09 PM
Krenair updated the task description. · Nov 16 2016, 5:10 PM
JKatzWMF updated the task description. · Nov 16 2016, 5:30 PM
GWicke updated the task description. · Nov 16 2016, 6:16 PM
GWicke added a subscriber: GWicke. · Nov 16 2016, 6:18 PM

@JKatzWMF, corrected a minor point about OCG's origin: It was developed by WMF to replace third-party PediaPress code. There was a lot of time pressure during the development process, as the PediaPress code was the last thing running in the Tampa datacenter (blocking its decommissioning), and was considered impossible to move.

@GWicke Thanks!

Other corrections and input would be very welcome and the earlier the better.

Why "Identify and communicate sunsetting" a mere month before the service is scheduled to be turned off? If this task is the committed-to goal of the Foundation, we should communicate that *now*, not wait until the last minute.

CC @Kelson and others who have worked on zim and epub support, and @Tbayer who has found plaintext output handy in the past.

Tgr added a subscriber: Tgr. · Nov 17 2016, 7:53 AM
Boshomi added a subscriber: Boshomi.
Bernd_Gross removed a subscriber: Bernd_Gross.
Bernd_Gross added a subscriber: Bernd_Gross.
Tgr added a comment. · Nov 23 2016, 3:10 AM

This approach seems backwards. Shouldn't we have a community consultation first, before making technology choices and putting resources into feature development? At least PDF-only vs. multiformat support needs to be known as early as possible.

JKatzWMF updated the task description. · Nov 23 2016, 5:07 PM

@cscott an email bump caused me to see your comment only just now; I apologize for the delay. I agree with you that up-front communication is preferred, and I changed the description to make clearer that the final communication mentioned is simply the last communication (i.e. "this thing we've been talking about is happening in 2 weeks"). The primary consultation is intended to take place in April or May.

To your and @Tgr's broader point about starting with a consultation and then determining the path, I agree that it is not ideal. A consultation would be the ideal way to start something like this. I think we can try to move consultation earlier in the process, but think some of the assumptions should continue to operate for now.

So I want to (for those interested) give an explanation for why we are presuming a technical direction rather than starting with a consultation.

Given the work that Wikimedia Germany is doing with Electron (T135643), a new service was presupposed and our own operations team wanted to see a plan for migrating from OCG to Electron in order to operate sustainably. @Tgr and you had earlier brought up the options of repairing rather than sunsetting OCG. I can't comment on the merit of one v. the other, but given the development of Electron, the history of OCG, its current state, and the confusion around it, it seemed wise to decouple some of its features at the very least. Someone might find fault with any one of the reasons, but the aggregate is sufficient in my mind.

Setting up and conducting a consultation takes time and coordination. Your help would be appreciated! We know that better print.css and collation are necessary elements for replication so I figured we would start there and then identify where the other needs were. I made some assumptions about what would and wouldn't be necessary in order to generate this plan, and included consultation to check those assumptions.

Had we pounced on this in February when it first came up, I think it would have been possible to start with a consultation, and the blame for not addressing this rests with me.

JKatzWMF updated the task description. · Jan 10 2017, 5:22 PM
JKatzWMF renamed this task from [EPIC] Replicate core OCG features and sunset OCG service to [EPIC] (Proposal) Replicate core OCG features and sunset OCG service. · Feb 28 2017, 5:45 PM
JKatzWMF updated the task description.

Exploratory testing of PDF creation, OCG vs. Electron: https://phabricator.wikimedia.org/T163287#3301632

Ljonka added a subscriber: Ljonka. · May 31 2017, 11:47 AM
Ljonka removed a subscriber: Ljonka.
phuedx added a subscriber: phuedx. · Aug 8 2017, 3:16 PM
greg added a subscriber: greg. · Aug 16 2017, 6:02 PM

Stage 4, August 2017

  • retire OCG service

Just a note from T129142: Deploy ocg with scap3:

We (RelEng, Ops, and Services) need OCG gone (from our servers) or migrated to scap3 (from trebuchet) by end of this quarter (Q1) as Ops will be removing the underlying technology (Salt) at that time.

How are things looking? :)

@ovasileva @phuedx, could you update this task with your current estimate for OCG's sunsetting?

@GWicke - currently, the estimate for OCG replacement is the end of September; however, the full work for the replacement service (implementing the post-processing step) will require an extra month or so.

@ovasileva, thank you for the update. Does this mean that OCG will be switched off by the end of September, or end of October?

ovasileva updated the task description. · Tue, Aug 29, 2:32 PM
ovasileva added a subscriber: faidon.

@GWicke - timeline now updated in task description. OCG switching will be done by the end of September with the post-processing portion being completed immediately afterwards. @faidon - in terms of switching off OCG, are we missing any tasks to be created for the actual sunsetting?

Aklapper updated the task description. · Tue, Sep 5, 8:33 AM

Thanks for the update & clarity on the timeline, @ovasileva! It is much appreciated.

ovasileva updated the task description. · Wed, Sep 6, 10:13 AM

As already announced in Tech News, OfflineContentGenerator (OCG) will not be used anymore after October 1st, 2017 on Wikimedia sites. OCG will be replaced by Electron. You can read more on mediawiki.org.