[EPIC] (Proposal) Replicate core OCG features and sunset OCG service
Closed, ResolvedPublic

Description

Updated timeline

August - September 2017

October, 2017

The following is a proposal, pending the outcome of a consultation with the Wikimedia communities: https://www.mediawiki.org/wiki/Reading/Web/PDF_Functionality (and previously https://www.mediawiki.org/wiki/Reading/Web/PDF_Rendering)

Summary

We would like to create a plan that results in there being only one service for rendering PDFs. OCG has not aged well, and rather than support two solutions, the Foundation needs to focus on one, which will allow for better user experience, development, and maintenance.

Goal Visibility

This represents a response to community interest in an improved PDF solution, demonstrated in the wishlist of Wikimedia Deutschland's TCB team (T135643), and necessitated by their work. Investing in PDFs is strategically aligned with WMF's "New Readers" project. The New Readers project is Product's Program #3, Goal 3 in the 2016-2017 annual plan, and it has identified offline access as one of three barriers to Wikipedia access that it aims to address.

Rationale

As documented here, in the comment by @cscott (https://en.wikipedia.org/wiki/Wikipedia_talk:Offline_Content_Generator) and various other places, OCG was built quickly to replace code by an outside organization, PediaPress, and has had scaling and architectural issues.

The OCG service does the following:

  1. converts wikitext pages to LaTeX-formatted PDF and plain text. In the past, it has also supported ZIM, EPUB and possibly more
  2. per above, applies an attractive layout where the print.css has not provided an attractive option
  3. when integrated with the Collection extension, collates articles selected by a user into books and creates a table of contents

OCG is currently not well supported by the WMF, and there are difficulties with LaTeX that have disabled table rendering in PDFs. LaTeX is a fairly brittle framework which is not well suited to our incredibly flexible content types. Furthermore, bugs in OCG or the Collection extension have greatly diminished the third use of OCG (creating books).

There was significant desire from the community for a LaTeX alternative for single-article PDF rendering (captured in T135643), and we are providing this via a new service called Electron. Some of the decision making around Electron is captured in T134205. Development and implementation of this service is currently underway. The WMF’s Operations team has strong reservations about running both services at once, given the heavy overlap in functionality, particularly when one of them is not well supported. They have kindly, and rightly, asked us to transfer the remaining features from OCG to Electron and sunset OCG.

Success Metrics

  • Readers and contributors do not lose essential or popular functionality
  • Operations only has one service to support

External Dependencies

This will take effort from Wikimedia Deutschland TCB and the following WMF teams:

  • reading infrastructure
  • reading web
  • community liaisons
  • services
  • operations

Unknowns

  • we have a solid overview of what readers' needs are around PDF creation, but lack nuance and edge cases

EPIC Plan

  1. Stage 1, in parallel, Dec - March, 2017
    • turn on the Electron alternative to OCG to allow tables in PDFs, per community wishlist #9 (T135643)
    • improve print CSS so that default PDFs are more attractive (T135022#2672465)
    • measure user preference for new v. old pdfs (T150326)
    • introduce community to the implications of proposal and ask for feedback (T146757)
  2. Stage 2, April - May, 2017
    • replicate collation of articles into a single PDF within "book creator" using Electron, restoring the core functionality that Electron is still missing (pending)
    • identify OCG uses that we have missed via community consultation (T146757)
  3. Stage 3, May - Jul, 2017
    • act on above results
    • communicate sunsetting (an announcement following the consultation in the earlier stage)
  4. Stage 4, August 2017
    • retire OCG service

Probable drawbacks

  • currently there are no plans to continue to support the two-column layout favored by LaTeX
  • currently there are no plans to continue to support plain-text conversion, EPUB or ZIM (currently not supported by OCG)

Metrics Implementation

  • Current usage of APIs is <1% of pageviews, which we consider significant
    • measure user preference for new v. old pdfs (T150326)
  • Current usage of book creator is very limited

Delivery Estimate

September 30th is our deadline for turning off OCG

Event Timeline

This approach seems backwards. Shouldn't we have a community consultation first, before making technology choices and putting resources into feature development? At least PDF-only vs. multiformat support needs to be known as early as possible.

@cscott an email bump caused me to just see your comment now; I apologize for the delay. I agree with you that up-front communication is preferred and changed the description to be clearer that the final communication mentioned is simply the last communication (i.e. this thing we've been talking about is happening in 2 weeks). The primary consultation is intended to take place in April or May.

To your and @Tgr's broader point about starting with a consultation and then determining the path, I agree that it is not ideal. A consultation would be the ideal way to start something like this. I think we can try to move consultation earlier in the process, but think some of the assumptions should continue to operate for now.

So I want to (for those interested) give an explanation for why we are presuming a technical direction rather than starting with a consultation.

Given the work that Wikimedia Germany is doing with Electron (T135643), a new service was presupposed and our own operations team wanted to see a plan for migrating from OCG to Electron in order to operate sustainably. @Tgr and you had earlier brought up the options of repairing rather than sunsetting OCG. I can't comment on the merit of one v. the other, but given the development of Electron, the history of OCG, its current state, and the confusion around it, it seemed wise to decouple some of its features at the very least. Someone might find fault with any one of the reasons, but the aggregate is sufficient in my mind.

Setting up and conducting a consultation takes time and coordination. Your help would be appreciated! We know that better print.css and collation are necessary elements for replication so I figured we would start there and then identify where the other needs were. I made some assumptions about what would and wouldn't be necessary in order to generate this plan, and included consultation to check those assumptions.

Had we pounced on this in February when it first came up, I think it would have been possible to start with a consultation, and the blame for not addressing this rests with me.

JKatzWMF renamed this task from [EPIC] Replicate core OCG features and sunset OCG service to [EPIC] (Proposal) Replicate core OCG features and sunset OCG service. Feb 28 2017, 5:45 PM
JKatzWMF updated the task description.

Stage 4, August 2017

  • retire OCG service

Just a note from T129142: Deploy ocg with scap3:

We (RelEng, Ops, and Services) need OCG gone (from our servers) or migrated to scap3 (from trebuchet) by end of this quarter (Q1) as Ops will be removing the underlying technology (Salt) at that time.

How are things looking? :)

@ovasileva @phuedx, could you update this task with your current estimate for OCG's sunsetting?

@GWicke - currently, the estimate for OCG replacement is the end of September; however, the full work for the replacement service (implementing the post-processing step) will require an extra month or so.

@ovasileva, thank you for the update. Does this mean that OCG will be switched off by the end of September, or end of October?

ovasileva added a subscriber: faidon.

@GWicke - timeline now updated in task description. OCG switching will be done by the end of September with the post-processing portion being completed immediately afterwards. @faidon - in terms of switching off OCG, are we missing any tasks to be created for the actual sunsetting?

Thanks for the update & clarity on the timeline, @ovasileva! It is much appreciated.

As already announced in Tech News, OfflineContentGenerator (OCG) will not be used anymore after October 1st, 2017 on Wikimedia sites. OCG will be replaced by Electron. You can read more on mediawiki.org.

I'm not attached to OCG, but https://www.mediawiki.org/wiki/Reading/Web/PDF_Functionality#Update_September_2017 and https://www.mediawiki.org/wiki/Reading/Web/PDF_Functionality#Functionality_available_in_October.2C_2017 fail to state what the advantage is of switching from OCG to the new PDF creator at this point. It's also not clear whether the hard requirements for production use (such as correct copyleft attribution) have already been met.

No changes to licensing and attribution information will be made outside of styling.

Can somebody give me a short status description of this project? On "Reading/Web/PDF_Functionality" [1] it says "Release of new PDF renderer – Jan, 2018" and "We should know more in early February". Is there any prototype available to the public yet?

[1] https://www.mediawiki.org/wiki/Reading/Web/PDF_Functionality#Later_addendum:_Turning_PDF_book_rendering_OFF_for_the_short_term

TheDJ subscribed.

I think this can be closed, as OCG is clearly no longer running (whether or not its replacement was feature complete).