[8hrs Spike] Propose solutions from the issues identified in mobilefrontend - ENG
Closed, Resolved, Public

Description

Acceptance criteria

  • Identify and report on the pros and cons of different solution approaches from the issues identified in previous spikes
  • Look at:
    • performance
    • amount of technical debt
    • ease of working with codebase
    • ability to integrate with broader OSS ecosystem
    • ability to work with 3rd party developers and volunteers
    • other

Exercise should be done by

  • Jon
  • Joaquin
  • Baha
  • Sam
  • Piotr
ovasileva triaged this task as High priority.Jan 26 2017, 12:19 PM
ovasileva moved this task from To Triage to Needs Analysis on the Readers-Web-Backlog board.
ovasileva renamed this task from [Spike] Possible solutions - rewrite - ENG to [Spike] Possible solutions - rewrite vs refactor - ENG.Feb 9 2017, 2:07 PM
ovasileva moved this task from Needs Analysis to Upcoming on the Readers-Web-Backlog board.
ovasileva renamed this task from [Spike] Possible solutions - rewrite vs refactor - ENG to [8hrs Spike] Possible solutions - rewrite vs refactor - ENG.Feb 9 2017, 5:40 PM
ovasileva renamed this task from [8hrs Spike] Possible solutions - rewrite vs refactor - ENG to [8hrs Spike] Propose solutions from the issues identified in mobilefrontend - ENG.Feb 15 2017, 6:28 PM
ovasileva updated the task description.

FYI, engineers, I've gone over the first 10 problems that we've identified (in gdoc) and left comments with pros and cons. Please feel free to continue the discussion there. I'll continue with the remaining problems tomorrow.

[Disclaimer: This is an attempt at organizing activity inside my brain. Please assume good faith if anything I've said doesn't make complete sense or is downright wrong :)] I've read through all the problems and tried to group them into what I see as 4 broader problems....

Things we need to prioritise and give attention to.

The problems of tech debt; responsiveness of the site; accessibility; no clear development styles; standardization; no clear product; tests not well organized; no consistent voice/shared definition of high quality, robust software; low code coverage; very little documentation; and "we don't clean up after ourselves" all seem to come under the same problem area - from my perspective these are all failures in our process.

The only way to solve them, IMO, is to either focus on them and fix them OR accept some of them as not important OR throw everything away and start again.
Of course you could do the latter - and rebuild the site entirely - but in my opinion you are then only trading tech debt for a new set of problems (e.g. losing features that people are now going to want and losing engineers who previously had expertise in what you are building). The first approach maybe hasn't happened because this kind of work is dull and can disengage certain engineers, maybe pushing them away to other companies and projects. A balance has to be struck when solving this.

Let's be realistic. We are a small group of engineers and we've had multiple team members who've left us with different views on the world. The backlog is never going to be empty and we are not going to be able to solve every problem and the code is always going to have its problems.
The important thing for us to do is identify and solve the issues we care about the most.

Limitations of current architecture

Tight coupling to MediaWiki seems to mostly be slowing us down these days. You could argue this is a good thing - maybe clunky, reliable and faithful software should be the goal rather than rapidly changing software. We've tried to control the architecture with concepts such as the targets system (to keep out JS code) but arguably that might not be enough.

This leads to problems such as being constrained to a one-size-fits-all solution across all connection speeds, browsers and locations; lack of a well-known frontend framework; PHP code is messy; some limitations in tooling; radical changes are impossible or extremely hard (e.g. introducing service workers) as they break this stability model; code sharing is hard.

An alternative solution is to break out that architecture but that will come with risk.
I think this would be a worthy activity if we can run it as an experiment on a single wiki for a specific enhancement that would be tricky in the existing stack (e.g. Service Workers).

Yes. The current site is slow and would be faster if we lazy loaded content and pre-cached the application shell, something our existing architecture makes difficult/impossible. Service workers are clearly the answer here and a cheap first step would be to generate HTML using service workers (as is currently being proposed for the Discovery portal and pushed by GWicke). The risk here is low, but my hunch is that this won't feel snappy enough without us using ajax to load content a la mobile apps.
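To make the "pre-cache the application shell" idea concrete, here is a minimal sketch of a cache-first service worker. Everything named here is an assumption for illustration: the cache name, the shell URLs and the `registerShellCaching` helper are hypothetical and not taken from MobileFrontend or any existing proposal. It only covers caching static shell assets, not generating HTML inside the worker.

```javascript
// Hypothetical names: SHELL_CACHE and SHELL_URLS are illustrative only.
const SHELL_CACHE = "app-shell-v1";
const SHELL_URLS = ["/shell.html", "/shell.css", "/shell.js"];

// Attach app-shell caching to a service worker global scope.
// `scope` and `cacheStorage` are passed in (instead of using `self` and
// `caches` directly) so the logic can also be exercised outside a browser.
function registerShellCaching(scope, cacheStorage) {
  // On install, store every shell asset before the worker finishes installing.
  scope.addEventListener("install", (event) => {
    event.waitUntil(
      cacheStorage.open(SHELL_CACHE).then((cache) => cache.addAll(SHELL_URLS))
    );
  });

  // On fetch, answer from the cache when possible, else fall back to the network.
  scope.addEventListener("fetch", (event) => {
    event.respondWith(
      cacheStorage
        .match(event.request)
        .then((hit) => hit || scope.fetch(event.request))
    );
  });
}

// Inside a real service worker the browser provides `self` and `caches`.
if (
  typeof ServiceWorkerGlobalScope !== "undefined" &&
  self instanceof ServiceWorkerGlobalScope
) {
  registerShellCaching(self, caches);
}
```

With something like this the shell would render from cache on repeat visits, and article content would still need to be fetched separately (e.g. via ajax), which is the part my hunch above is about.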

We'd need to demonstrate the advantages of such an approach; be disciplined about the time we spend on it; justify the trade-offs; demonstrate that it works at scale for many users.

Alternatively, we need to invest more in the RFC process to make change the norm rather than the exception.

Building things in both PHP and JS will always cause us problems. One extreme is to drop non-JS support and do everything in JavaScript. Gasp.

Does MediaWiki slow us down?
It's hard to say whether this is MediaWiki or the fact that we have to support millions of users and the quality bar is a little higher than average. The deployment schedule also slows us down, or at least makes us more timid about putting things out there, as we can't fix something 1 hr after deployment with another deployment. Unlike at a lot of startups, deployment is hard for us, but on the plus side this forces us to be disciplined by working within the constraints.

Of course building a web app on GitHub is going to be quicker and easier as it doesn't need to scale.

Writing things twice is certainly an issue that we can address by writing in one language either at the .

We don't have complete control of a complex ecosystem

This is a tough one. The existence of templates makes it near impossible to replicate production locally unless you test on production data e.g. use the mobile content service. Templates are not going away and it would be wrong to think that we can ever tame them and completely control the experience 100% of the time, nor would we want to. There are too many templates out there and too many niche concepts; with one designer we'd be overwhelmed. Instead template editors need to be engaged and we need to help them think about mobile ... even better, make it impossible for them to not think about mobile.

Likewise, a developer can merge a patch that impacts our product directly e.g. increase CSS loaded in head.

I believe this helps Wikipedia scale at a small cost and is an acceptable trade off.

Even the apps are not safe from this, as we've seen on countless occasions. This is the world we live in, a world where we don't have control.
Have we ever thought of *running a mobile specific workshop targeting and sharing knowledge with editors about how to improve templates on mobile*? Could be something at Wikimania for example.

Attracting new contributor developers

The problem with this is we are guessing a lot. We might believe a nicer stack would attract more developers but that's not necessarily true.

We could get better answers by adding a link to a survey in the JavaScript console. This would give us a much better idea of why people don't contribute to Wikimedia - whether it's because they don't have the time; don't like the stack or didn't know they could. We could also run a similar survey at hackathons.
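A rough sketch of what the console survey link could look like. The survey URL, the wording and the `buildSurveyMessage` helper are all hypothetical; where this would actually be loaded from is an open question.

```javascript
// Hypothetical survey URL, used only for illustration.
const SURVEY_URL = "https://example.org/contributor-survey";

// Build the text shown to anyone who opens the browser's JavaScript console.
function buildSurveyMessage(url) {
  return [
    "Curious about how this site works?",
    "Tell us what would help you contribute: " + url,
  ].join("\n");
}

// Logging once at page load is enough: only people who open the console,
// i.e. likely developers, will ever see it.
console.log(buildSurveyMessage(SURVEY_URL));
```

Keeping the message in a small function means the same text could be reused elsewhere, e.g. printed on a hackathon handout.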

This requires support from community engagement (specifically targeting engineers).

I should note I've focused on the larger problems as I think the smaller problems such as "what to do with Minerva desktop", "code coverage is low", "the backlog is too big" can easily be resolved with conversation (I also don't agree with many of the assumptions that some of them are in fact problems).

I've finished leaving comments in gdoc. I chose to not summarize the points here in the interest of not losing important ones (they may not seem important to everyone, but may in fact be really important). It's also nicer to scrutinize each point in the comments of the document. We can have a real conversation going there. Once we come to an agreement about each point we can easily summarize the winning points.

phuedx updated the task description. Mar 15 2017, 4:25 PM
phuedx added a subscriber: phuedx.
Jdlrobson moved this task from To Do to Doing on the Reading-Web-Sprint-94 board.Mar 20 2017, 7:45 PM
Jdlrobson assigned this task to ovasileva.

We talked about this during the offsite.

We spoke about this problem and I think almost everything is already said. The biggest problems are related to technical debt and current MediaWiki architecture. I see a trend - "Let's use restbase, our life will be better" but somehow I don't feel like the restbase is a cure for all problems. It's not stress tested yet, lots of stuff is still missing and we have to implement it anyway, just in JS, not PHP.

First step: Alignment

The first step is to align ourselves with a set of expectations and rules for how to write code, how to deal with tech debt and how to write a good extension that is not highly coupled with MW. We're getting much better but there are still some areas to improve. We could use tech lead time to guide us through conventions and updates. I also noticed that we already had something similar, called Dev sessions. We have a pretty good process; we need to be more communicative and transparent with each other, and pair programming could help.

Second step: scope reduction

The second step: instead of trying to fix a monster, let's split MobileFrontend into smaller bits, for example:

  • extract SkinMinerva to standalone extension
  • extract MobileParser to an extension
  • pull mobile domain/auth stuff into MediaWiki

Currently, MobileFrontend is one huge piece that handles too many things at once. It's not a standalone app but is built around MediaWiki, so once MediaWiki changes we have to adjust. Coupling between MobileFrontend and MediaWiki is so high that it is not possible to change things quickly, as most changes are applied via complicated logic inside hooks or some other MW quirks.

Third step: improve, drop all dead/unnecessary code, repeat

I'm against rewriting as it always takes longer than expected (chasing a moving target), there is no direct benefit to the user and time devoted to rewriting isn't used to improve the existing feature set.
IMHO the third step is an ongoing process where we try to make our life easier and slowly improve code quality, adjust after each loop. Before each loop, we have to check the document and ask ourselves - what is the worst thing right now? what causes the most pain? Pick it and fix it.
It's difficult to set a strict roadmap as we have many issues related to the MobileFrontend codebase. Maybe 3-4 sprints focused on improving existing code quality would be enough? Maybe we want to drop MF support and focus on the new progressive web app?

  • Someone said "let's remove all FIXME" - is it going to make our life better? Not sure. For some changes probably, for others not really; we can probably leave some FIXMEs as they don't hurt us until we have to modify current behavior. I don't see a reason to spend time on something if the outcome is just "better code". The key to success is not perfection. It's making peace with the phrase "Good enough" and moving on.
  • Another person said "let's create better tools" - is it going to make life better? Again not sure, for some changes we already have proper tooling. If we find we struggle with something we can automate, definitely let's write a tool; this should be in our lazy developers' blood. You don't want to do the same thing more than 2 times. If you're doing the same thing for the second or even third time, it's time to automate it, no excuses.

What I'm trying to say is that most of our troubles happen on projects across all organizations and companies. If someone doesn't care enough about code quality, the code will rot, and even if we spend 10 sprints or even a year and improve everything, after some time it will rot again. There is no "immediate cure" that would ease the pain of software development; only a shared vision and shared code quality norms/conventions plus a proper process can help us tackle MF technical debt and keep it nice & clean. It's an ongoing process, not a one-time thing.

pmiazga updated the task description. Apr 5 2017, 12:38 AM
ovasileva closed this task as Resolved.Apr 5 2017, 5:04 PM

We spoke about this problem and I think almost everything is already said. The biggest problems are related to technical debt and current MediaWiki architecture. I see a trend - "Let's use restbase, our life will be better" but somehow I don't feel like the restbase is a cure for all problems. It's not stress tested yet...

I can't speak to what was discussed at the offsite, but I think your trend is an unfair characterisation of some of the arguments put forward for using RESTBase 😉 However, I think we all agree on the need for alignment and continuous improvement on the codebase.

What I really want to highlight is that Page Previews makes roughly 2.4k requests per second to RESTBase at peak time. This number will be going up within three weeks.