Mar 10 2020
okay, hotfix made it to production branch. dropped the local changes on production host, pulled, and redeployed the stack. Production is looking happy. Staging rebase will be another day.
before making that change, I tried a stack redeploy, some manual network dis/reconnects, package updates, and a reboot.
it's not clear why the alias name spontaneously stopped resolving on both staging and production hosts.
hotfix posted to master. I was able to work around this issue by using the name entry used internally by the docker network manager, e.g. s/db/tasks.production_db/g, where production is the stack name and db is the service alias
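As a minimal sketch of that workaround (function and default names here are made up for illustration, not our actual config): docker swarm always serves a `tasks.<stack>_<service>` DNS name for a stack's service, so we can derive that from the bare alias rather than depending on the alias resolving.

```python
# Sketch of the s/db/tasks.production_db/g workaround above. The helper
# and default stack name are hypothetical, not our real settings code.
def qualified_service_name(alias: str, stack: str = "production") -> str:
    """Map a bare service alias like 'db' to docker swarm's internal
    'tasks.<stack>_<service>' DNS name, which keeps resolving even when
    the alias entry does not."""
    return f"tasks.{stack}_{alias}"

DB_HOST = qualified_service_name("db")  # -> "tasks.production_db"
```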
I'll stash the change on prod and pull again once it makes it to the production branch. Will do the same for staging as part of the upcoming rebase.
Feb 28 2020
the should-be fix has been merged to staging. It should be live on the staging site upon deploy of the latest staging image
merged into staging.
Feb 27 2020
and now we've got Editor.ignore_wp_blocks to force Editor.wp_valid to true if blocks are the reason it's false.
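The shape of that override, as a minimal sketch (the real Editor is a Django model; the field names besides ignore_wp_blocks and wp_valid are assumptions for illustration):

```python
# Minimal sketch of Editor.ignore_wp_blocks forcing Editor.wp_valid to
# True when a block is the only reason validity fails. Fields other than
# the two named in the journal entry are hypothetical.
class Editor:
    def __init__(self, account_old_enough=True, enough_edits=True,
                 blocked=False, ignore_wp_blocks=False):
        self.account_old_enough = account_old_enough
        self.enough_edits = enough_edits
        self.blocked = blocked
        # Per-editor override: don't let block status fail validity.
        self.ignore_wp_blocks = ignore_wp_blocks

    @property
    def wp_valid(self):
        block_ok = (not self.blocked) or self.ignore_wp_blocks
        return self.account_old_enough and self.enough_edits and block_ok
```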
Now that we've gone through all of the bundle launch blockers, I decided to see if I could do something simpler than what I had initially planned here.
This PR should now have global block checks working (they were broken previously).
The way in which the central auth extension presents its data means that we don't even have to make an extra query after all. We're just getting back a larger response from a request we're already having to make.
estimate 25 hours of work from here
Feb 25 2020
PR in progress
Feb 14 2020
there are statistics available for download via the EZProxy reporting interface. The login details are in 1Password
Feb 11 2020
okay, tracked down the query for global blocks:
It returns some really nice contextual information, such as block duration and reason. I'm going to store concatenated json of just the records that contain blocks. It's also really fast, so I think we can just let it stay synchronous after all.
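A sketch of the "store concatenated json of just the records that contain blocks" idea (the row field names here are assumptions about what the query returns, not its actual column names):

```python
import json

# Keep only the rows that actually represent blocks and store them as a
# single JSON string; rows without a block reason are dropped. The
# 'block_reason' field name is a hypothetical stand-in for whatever the
# global-block query actually returns.
def serialize_block_records(rows):
    blocked = [row for row in rows if row.get("block_reason")]
    return json.dumps(blocked) if blocked else ""
```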
ah, yeah I can see that we already have some bundle info in the about page.
The signal in this pr posts a comment to the application indicating that the partner is now part of the bundle. We discussed adding a link to the 'my collections' page which makes sense for bundle eligible editors. Should we be sending them to the terms page (or somewhere else) if they are ineligible for bundle at the time?
as a sidenote, this doesn't try to do anything clever in the case of open apps from editors who have deleted their data/accounts. Not sure if that is even a case that needs to be accounted for, but I thought I'd mention that I haven't even looked.
Feb 10 2020
note that the current code doesn't actually update block status: we're only checking it against the oauth information, which is updated at user login. Related questions are in the T234552 comments
in the current code on staging, users marked as blocked from the oauth process have no way to access bundle content.
this code has been written and is live on staging. awaiting feedback.
As I was looking at this, I realized that we need to verify that our current check for blocks is actually a global check, not just checking for blocks on meta. Assuming we have/get a global block check in place, allowing overrides might turn out to be tricky. I'm imagining implementing this as a per-editor flag to not enforce the blocked status check for validity.
Feb 3 2020
Okay, I have the eligibility as described here implemented:
Per our off-thread conversations, we're not going to merge this directly to master. Rather, I've force pushed this branch to staging, against which we will do dependent bundle-related PRs. Once all bundle-related work is merged into staging and happy, we'll do a staging -> master PR for all bundle functionality.
Jan 14 2020
we need separate props for each requirement: account age, total edit count, not blocked, and 30-day edit count
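A minimal sketch of what one property per requirement could look like; the class name and the specific thresholds below are assumptions, not the values the requirements actually use:

```python
from datetime import datetime, timedelta

# Sketch of splitting eligibility into one property per requirement, as
# described above. Class name and thresholds are illustrative only.
class EditorEligibility:
    def __init__(self, registered, total_edits, recent_edits, blocked):
        self.registered = registered      # account creation datetime
        self.total_edits = total_edits    # lifetime edit count
        self.recent_edits = recent_edits  # edits in the last 30 days
        self.blocked = blocked

    @property
    def account_old_enough(self):
        return datetime.now() - self.registered >= timedelta(days=182)

    @property
    def enough_total_edits(self):
        return self.total_edits >= 500

    @property
    def enough_recent_edits(self):
        return self.recent_edits >= 10

    @property
    def not_blocked(self):
        return not self.blocked

    @property
    def eligible(self):
        return (self.account_old_enough and self.enough_total_edits
                and self.enough_recent_edits and self.not_blocked)
```

Keeping the requirements separate makes it easy to tell an ineligible editor exactly which requirement they're missing, rather than just "not eligible".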
Dec 12 2019
@HAKSOAT what version of Ubuntu are you running? Also, can you verify if your computer has a 64-bit AMD/Intel processor?
Nov 20 2019
Since the terms agreement is monolithic, we probably shouldn't allow authenticated users who haven't agreed to do anything other than agree, delete their data, or log out.
Nov 19 2019
we're now on python 3.
Python 3 upgrade is merged in and wending its way through the deployment pipeline.
Nov 18 2019
Okay, fix deployed. I was able to access Rock's Backpages. If you run into the error message, try using that logout link and then try again. Let me know how it goes.
I verified that this is an EZProxy configuration issue on our end. Working on fix.
Hmm, I thought there was another error message being discussed, but I can see this is it. I don't think this is an issue with RBP; I think it's an issue with us
So, just to pitch in a little additional info:
Nov 15 2019
good deal. For the record, the error definitely didn't have anything to do with your account. For whatever reason, the production system never picked up on the additional proxy configuration details that were added some weeks ago. We had to delete the docker stack and re-add it for the configuration to be available.
@Nikkimaria This issue *should* be resolved. I verified that I was able to apply for and access MIT via proxy. Could you please verify?
Nov 12 2019
Nov 8 2019
Work is underway:
Nov 7 2019
I'm going to close this issue out. The underlying cause (that we can't complete the oauth process when it gets interrupted) isn't really resolvable, but we are now doing something more useful than letting an exception whiz by uncaught and sending users to a generic 500 error.
Verified that we no longer dump users out to a 500 error in this circumstance. Now we throw a 403 with a warning message:
It *looks* like this is from a user trying to complete an oauth process that got interrupted midway through. That can happen a few ways:
- halt and continue (that is hitting 'x' during oauth, then going to the address bar and hitting enter)
- hitting reload during the login process.
- having a network disruption during login and then trying to continue
@AVasanth_WMF It looks like this is impacting you directly? Does this only happen intermittently? I set my client settings to be as similar to yours as possible (language set to en-GB, tried both WMF and personal handles) and was not able to reproduce. Clearly a problem is occurring as evidenced by your reports and the error logs. If the issue is intermittent, I suspect that it's due to meta not completing the oauth process occasionally, or perhaps a network interruption. Your browser is what makes the requests between the Oauth IdP on meta and the Oauth SP on twl (eg. there is no backend server <-> server communication for this part of the process). Do you have any interesting browser configuration that you could turn off for a while? or have you been experiencing intermittent network problems?
Nov 4 2019
hopeful fix deployed. Let's see if this goes away.
It looks like we're not getting a request_token from the oauth initialization. Upon researching this further, it looks like there are several ways in which the mediawiki oauth implementation deviates from standards, and it also appears to have some limitations in dealing with unicode data from service providers. It looks like there are situations in which initialization will silently fail for non-english users due to one of these unicode issues. We use the mwoauth package as a shim that works around many of these issues, and it looks like they released an update 5 days ago that might fix this particular issue. The update is currently wending its way through the deployment pipeline
Oct 22 2019
This is complete. The total number of applications without a populated sent_by was 5389.
so, we decided on slack to add a system user.
The steps to resolve this will look like:
There are about 2500 imported sent apps for which the partner has no coordinator. Should I just set sent_by to @Nikkimaria in that case? Or I could create a system/statistical user for these historical cases where we can't know
Oct 21 2019
hotfix deployed, keeping open until we get some verification that it's definitely fixed all known cases.
based on the work I did on T233508, I think this issue will best be addressed by looping through sent applications and setting sent_by to the latest reviewer (the step that was missing on affected applications); then I can loop through applications with no authorizer and fill it in with the sent_by data as appropriate. That should catch many affected applications and authorizations.
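The two-pass backfill described above could be sketched like this (plain Python over dicts for illustration; the real code would use the Django ORM, and these field names are assumptions):

```python
# Sketch of the backfill loop: first set sent_by from the latest
# reviewer on sent applications that are missing it, then fill missing
# authorizers from sent_by. Each application is a dict here;
# 'reviewers' is the review history in order, latest last.
def backfill(applications):
    for app in applications:
        # The step that was missing on affected applications:
        if app["status"] == "sent" and app.get("sent_by") is None:
            if app["reviewers"]:
                app["sent_by"] = app["reviewers"][-1]
        # Then fill in a missing authorizer from sent_by, as appropriate.
        auth = app.get("authorization")
        if auth is not None and auth.get("authorizer") is None:
            auth["authorizer"] = app.get("sent_by")
    return applications
```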
deleted the backfilled auths with this issue
hotfix in the pipeline
Oct 17 2019
So we talked about this in the weekly meeting, and the only need we can think of for having this is that this is a place where coordinators can see applicant email addresses. We talked about adding a show/hide email address toggle in the review and application views to serve that need.
Oct 10 2019
My thought is that we should just show authorizations and not applications at all. This would be implemented after we backfill missing authorizations.
Sep 5 2019
Sep 3 2019
This was a hangover from an accidental encoding regression that occurred when we migrated to docker. It's been resolved with a combination of config, code, and ops.
Aug 27 2019
deployed. Let's wait till users confirm things are working to close this out
Aug 22 2019
Okay, I created a PR with code that seems to resolve the auth limbo issue, but it wouldn't be bad for @Samwalton9 to take a peek at it since it touches auth.
I've got the underlying issue resolved.
The user accounts that encountered this are in a half-created state that in practice will keep the user in authentication purgatory until we make the login process a little more robust or delete the affected accounts.
I'm going to try the coding option first, as we have some other non-operational accounts that were created for statistical purposes, and I don't want to have to try to identify just the right set of strange looking accounts to delete.
Jun 13 2019
still an issue post-dockerization
This is still an outstanding task post-dockerization
This was closed when we moved to docker.
Jan 10 2019
I hadn't paid too much attention to this until looking at @AVasanth_WMF 's excellent mockup this morning.
From a spaminess perspective, I'd recommend that we not have a public/anonymous form that sends email.
Jan 8 2019
This has been resolved. It turns out that our underlying email infrastructure changed, but we weren't aware of it. All withheld emails should now be delivered.
Jan 3 2019
fix merged into master
So, while fixing this problem in a branch, I encountered/reproduced the originally reported issue. The scripts that directly use a virtualenv do in fact have the user check, thanks to the virtualenv_activate script. Some of the wrapper scripts that compose those into useful workflows don't, because the test for the "right thing" needs to be a little more complicated.
This is a super simple fix. I'll get it pushed right in.
Dec 4 2018
hotfix in place. verified that things are happy.
pushed hotfix in git. Will get it to live site asap
Oct 31 2018
Yep, we definitely need nfs, that's where we keep a month's worth of nightly backups, which we've occasionally used to roll back a bad data migration or just as a place to keep our state when we throw away a vm and provision a new one.
Oct 19 2018
I'm planning on separating the different bits of our platform into containers running on a small kubernetes cluster, and running the database as a vm outside the cluster. We haven't determined the best way to configure persistent storage (e.g. user uploaded attachments). If we can use a project-wide nfs export (which we only use for backups in twl) that's great. If that's a performance/workload nonstarter, then I'm open to guidance, including just running a container that provides storage, which backs itself up to a project nfs export.
Aug 1 2018
This is done
Jul 19 2018
A few thoughts:
I'm not even sure if we should fix this, since blocking comments is one of the design goals of this addon. @Xover might be interested to know that the makers of 1Blocker have also released 1Blocker X, which appears to be designed to address this concern for their users.
Jul 17 2018
Yeah, it's because the username is part of a larger block of text that's marked for translation.
Jul 13 2018
The tags were running over the bullets because the bullets weren't in the right place. The background-position CSS attribute must be explicitly set for CSSJanus to modify the position of those images. The default is top left. That issue is now fixed in the dev branch.
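A minimal illustration of the fix (the selector name here is made up, not our actual class):

```css
/* Sketch: CSSJanus only flips background-position when it is declared
   explicitly, so state the LTR default instead of relying on the
   implicit "top left". Selector name is illustrative. */
.tag-bullet {
    background-position: top left; /* flipped to "top right" for RTL */
}
```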
Okay, I've got the issue with the empty values resolved so that we're defaulting to English when there's no suitable translation. The biggest issue I now see is that our pipe delimited tag list is running into the blue tag bullet when there are a lot of tags. Working on it.
@Samwalton9 In a dev branch, I've added the partial Arabic translation and fixed up numerous issues that were apparent to me, both on the RTL rendering side and the translatewiki "add languages from the translation files" side. I can see that we have the additional issue that if we're missing a translation for tags or descriptions, we're not displaying anything rather than defaulting to the English version.
Jul 11 2018
The fix for number 1 has been pushed to master, along with a fix related to number 2. In this case, we really needed to be testing the management command that creates the reminder email signal, as that's where the problem was. Let's keep this open until we see reminder emails coming down the wire.
Jul 6 2018
I've deployed the fix for item #3.
Jun 29 2018
@Samwalton9 These emails are still scheduled. There are 3 issues that together caused them to stop working without our knowledge.
- I've modified the ListApplicationsView in such a way that the reminder email template no longer works.
- As we've previously identified, we have inadequate test coverage for emails, so the new builds didn't start failing after the change.
- We're not alerting on failed cron tasks, so the system didn't tell us that it wasn't successfully completing its housekeeping as designed.
May 2 2018
looks like I could be doing this more efficiently:
just keep me posted, as it's quite time consuming to reformulate the inserts from a dump into non-destructive updates. Let's try to get any other missing data identified so that we can catch it in one pass.
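For the record, the shape of that reformulation, sketched with sqlite3 and made-up table/column names (our production database isn't sqlite, but the upsert idea is the same):

```python
import sqlite3

# Illustration of turning destructive dump-style INSERTs into
# non-destructive updates. Table and column names are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE partners (id INTEGER PRIMARY KEY, name TEXT, notes TEXT)")
con.execute("INSERT INTO partners VALUES (1, 'Old Name', 'local-only notes')")

# A raw dump INSERT would fail on the existing id (or, with OR REPLACE,
# clobber the whole row and wipe 'notes'). An upsert updates only the
# columns we mean to restore:
con.execute(
    "INSERT INTO partners (id, name) VALUES (1, 'Restored Name') "
    "ON CONFLICT(id) DO UPDATE SET name = excluded.name"
)
row = con.execute("SELECT name, notes FROM partners WHERE id = 1").fetchone()
# row is ('Restored Name', 'local-only notes')
```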
Apr 30 2018
Issue resolved. There was a new CSRF setting that didn't come into play in local dev and Travis because of the lack of SSL there.
It's looking like it's related to the fact that we're behind a proxy for the labs infrastructure. The issue doesn't happen in local dev, but it's there for prod and staging. I've verified that csrf is working correctly for non-admin forms.
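For context, this is the kind of Django settings interaction involved when CSRF checks meet SSL termination at a proxy. Which exact setting was at fault isn't restated in this entry, so treat this fragment as illustrative rather than our actual fix:

```python
# settings.py fragment (illustrative). These are standard Django
# settings; whether either was the one involved here is an assumption.
CSRF_COOKIE_SECURE = True  # only send the CSRF cookie over HTTPS

# Trust the proxy's forwarded-protocol header so Django treats the
# original request as HTTPS even though the proxied hop is plain HTTP.
SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")
```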
Apr 16 2018
This change has been pushed to master, and should get picked up at tomorrow's scheduled update.
I suppose we should mark this as closed since I have resolved all related display issues I could produce on iOS. I haven't heard back from the reporting editor, though.
Apr 5 2018
First cut of the app code is complete. I'm in the process of extending the test suite to accommodate the way the code works now. Right now the test suite creates partners and coordinators in an unrelated way. I'll need to designate coordinator/partner relationships in the test suite so that I can separately test display for coordinators who should or shouldn't have access to a given bit of content.
In addition to our recent css updates, I added a 400 error template, as that was the most interesting situation I was able to create while trying to break discussions. I tested on a recently acquired iphone and application discussions seem to work as expected. Awaiting feedback from the user that reported the issue.
Apr 3 2018
Yeah, we could probably be using requests a little smarter. It returns the status code of the logged http object... which means that you get a 404 when clicking on a 404 request object. We've got some context on that buried in a closed phab task somewhere.
Mar 6 2018
I'm currently evaluating using a report builder of some kind to meet this and similar needs.