
Figure out a way to keep MerlBot running when the HTTP POST loophole is closed
Closed, DeclinedPublic

Description

@Merl runs a Java bot that requires Java 1.8 to run with SSL, which is of course required these days on Wikimedia servers. As I understand it, it is difficult to run it under Java 1.7, so this task is to list options for providing Java 1.8.

More details about the bot: T121020#1867916

Possible options:

  1. Backport OpenJDK 8 to Trusty. This takes nontrivial effort (see T121020: Update Java 7 to Java 8), so it is not possible.
  2. Use Jessie exec hosts. Not possible due to the lack of SGE packages for Jessie.
  3. Help @Merl to backport the code to Java 1.7?
  4. Run the bot on k8s/Jessie, which has an openjdk-8 backport. @yuvipanda, is this feasible yet?
  5. Maybe a temporary private Jessie node? No SGE, just SSH in, until k8s is production-ready?

I think the most reasonable options are 4 and 5, but I would like some input from others. They would also mean a lower support level for @Merl, but maybe this is 'good enough' for this use case.

Event Timeline

Restricted Application added a subscriber: Aklapper.
  1. Put this in its own project until we can do (4), which is a few months off now.

What features of tools does merlbot use? I think that's an important consideration in figuring out what to do.

Specifically:

  1. Does it use qsub/jsub?
  2. Does it interact with other bots via NFS?
  3. Does it use labsdb?
  4. Does it have a webservice?
  5. Does it store persistent data on NFS?
  6. Does it use Redis?

Are there any plans to support Ubuntu 16.04 on SGE execution nodes next year? If so, the easiest solution would be to keep the exception from forced SSL for Labs until this is available on Tools.

I don't know why this is such a big problem, because Labs is a known, internally routed IP range. Maybe this can be discussed internally at Wikimedia between the two admin teams for Labs and the production API servers.

Another easy option would be a special SGE execution host running Oracle Java, which has been available since Ubuntu 12.04. But I don't know whether this exception to the open source rule is acceptable.

My bot uses SGE and the labsdb replication databases with user databases. It currently uses Redis and catgraph as external tools, but that's not so important. The webservice is not affected by this problem.

Btw: I am running Funtoo/Gentoo on my two-year-old laptop, and emerging jdk8 with a clean compile cache uses about 6 GB of RAM and 7 GB of temp space, but completes within 40 minutes, not the 3.5 hours mentioned in T121020#1870527.

Are there any plans to support Ubuntu 16.04 on SGE execution nodes next year? If so, the easiest solution would be to keep the exception from forced SSL for Labs until this is available on Tools.

Unfortunately SGE is unavailable on more recent distributions (including debian jessie, which is what we're switching to).

I don't know why this is such a big problem, because Labs is a known, internally routed IP range. Maybe this can be discussed internally at Wikimedia between the two admin teams for Labs and the production API servers.

@BBlack, would you be able to say something on this?

My bot uses SGE and the labsdb replication databases with user databases. It currently uses Redis and catgraph as external tools, but that's not so important. The webservice is not affected by this problem.

Does it use SGE to submit jobs, or do you only use SGE to start the bot? In other words, would the bot be able to run in a screen session on a separate host that doesn't have SGE?

Upcoming version of my Commons:WLX Jury Tool will use Play 2.4 which requires Java 8. So I won't be able to run it on toollabs either.

@Merl - Note: T105794#2294355 - deadlines for HTTPS are coming soon. MerlBot continues to be on the shortlist of bots making the highest numbers of insecure requests / week.

I assume we still can't have jessie exec nodes.

Looking at Launchpad, it seems that Ubuntu has openjdk-8 packages that are maintained for Wily (15.10), Xenial (16.04), and Yakkety (16.10), which were last updated about a month ago. Are any of these possible to backport successfully to Trusty?

Regardless - Ping @Merl again - time is running short to come up with solutions here.

See T121020: Update Java 7 to Java 8 for discussion. The lack of official packages for 14.10 and 15.04 suggests to me that backporting to those distributions is nontrivial, and there is no-one in the Tool Labs admin team with enough experience in debian packaging and java to actually get this done if it's more than 'run these 5 commands'.

k8s (option 4) might be possible by now, but it's nontrivial (see e.g. the current grrrit-wm situation). I think @yuvipanda might be able to clarify what is possible and not.

I can definitely make k8s work, but on looking at the scripts merlbot is running, they all make *heavy* use of SGE options specified in the script file (look at the *.qsub files in the merlbot dir), none of which will work on k8s since it is not SGE.

bd808 raised the priority of this task from Medium to High. Jun 2 2016, 9:29 PM
bd808 subscribed.

This bot will begin to break randomly on 2016-06-12 and be completely broken by 2016-07-12 if the insecure POST issue is not handled.

I am a Java expert, and I am using a library which normally needs Java 1.8. I rewrote some parts so that I can use the library with Java 1.7 on Labs.
The problem is that this library, in this limited rewritten version, does not know the SSL protocol, but I have added a trigger which adds SSL support when the server answers with a redirect.
This was quite simple to implement, but rewriting the whole code to have full SSL support with Java 1.7 is, in my opinion, too much unnecessary work.
The first HTTP request of my bot is always a sitematrix request on Meta, followed by an HTTP siteinfo request on the local wikis. While doing these first requests the bot changes to HTTPS. So when it performs the login action later, my bot is already using HTTPS.

I don't understand why I should invest so much work when I have a currently working solution, and when, once Labs changes to Java 1.8, no additional work will be needed for full SSL support (except replacing the library with the original version).
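The redirect-triggered switch described in this comment can be sketched roughly as follows. This is only an illustration of the idea, not MerlBot's code (which is not published); the class name `SchemeUpgrade` and method `nextBase` are hypothetical. It also shows why the very first request of each run still goes out over plain HTTP: the switch only happens after the server has replied.

```java
import java.net.URI;

final class SchemeUpgrade {
    /**
     * Decides which base URI to use for subsequent requests.
     * If the response is a redirect whose Location is an https: URL,
     * the base is switched to https; otherwise it is returned unchanged.
     */
    static URI nextBase(URI base, int status, String location) {
        boolean redirect = status == 301 || status == 302 || status == 307 || status == 308;
        if (!redirect || location == null) {
            return base;
        }
        URI target = base.resolve(location);
        if (!"https".equalsIgnoreCase(target.getScheme())) {
            return base;
        }
        // Keep host, path and query; replace only the scheme.
        return URI.create("https" + base.toString().substring(base.getScheme().length()));
    }

    public static void main(String[] args) {
        URI base = URI.create("http://meta.wikimedia.org/w/api.php");
        // The first request still travels over plain HTTP; only the reply triggers the switch.
        URI next = nextBase(base, 301, "https://meta.wikimedia.org/w/api.php");
        System.out.println(next);
    }
}
```

A client that started from an https: base URL in the first place would never need this step.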

Hi, sorry, I am inactive at the moment and will not be able to access my tools in the next few weeks.

My mailbox is full of failed cronjobs reports because of:
/bin/sh: 1: date: not found
/bin/sh: 1: jlocal: not found

@bd808 can you please correct the path in the tools.merlbot and tools.merlbot2 crontabs, so that these scripts are found again? I hope that helps to get all my bot scripts running again.

And sorry that I am using the advantages of SGE. I have no experience with Kubernetes, so please keep in mind that I will need help to migrate my scripts to Kubernetes. Needed features such as dependency trees, limiting the number of parallel executions and so on are currently all handled by SGE.

The current situation is that MerlBot is a highly active bot on dewiki that is implemented with dependencies on both advanced job control SGE features and an unnamed library using Java8 features. SGE is not supported on distributions that Labs has available with Java8 support and vice versa.

The options as I see them are to:

  1. Forwardport SGE to Jessie and add a special exec node just for that combination.
  2. Backport JDK8 to Trusty and add a special exec node just for that combination.
  3. Help @Merl and anyone interested in helping him get up and running with a Labs project that they maintain which can support the combination of needs that MerlBot has.
  4. Help @Merl and anyone interested in helping him replace the JDK8 or SGE dependencies in the project.

Option #1 is realistically off the table based on past investigation.

Option #2 is possible but has a very high maintenance cost for the Cloud-Services team in terms of adding more non-standard software and dedicated hosts. Both types of exceptions are prone to having a higher than normal number of issues in the long term.

Option #3 is not trivial, but could be done. It would leave MerlBot's maintainers to manage an SGE cluster on their own, which may require a significant time investment.

Option #4 is really the best way forward. Moving in that direction requires a body of FLOSS-licensed source code, a reasonable understanding of how the existing bot jobs work, and some volunteers to join the project and help. There's not much help that can be offered in fixing the code today. I cannot find java source code matching the binary jar files in /data/project/merlbot. Maybe it is listed on wiki somewhere that I haven't found?

@Luke081515 I know you have shown interest in the jobs that MerlBot supports in the past. Are you interested in helping find a way to keep the bot up and running?

See T121020: Update Java 7 to Java 8 for discussion. The lack of official packages for 14.10 and 15.04 suggests to me that backporting to those distributions is nontrivial, and there is no-one in the Tool Labs admin team with enough experience in debian packaging and java to actually get this done if it's more than 'run these 5 commands'.

From what I saw of the past discussion there, it seemed to be based on an older backport that was relatively unmaintained, and while it could be built on trusty it failed a bunch of tests. I was hoping that, since there are new packages that look maintained, a backport of those might actually pass tests?

  1. Put a http: → https: proxy running locally (eg. on localhost) that handles the tls layer with the Wikimedia servers, while Merlbot still only speaks http:

Possible. Assuming Merlbot doesn't have to follow links from the data it receives? The proxy would have to edit the responses in that case...

A generic non-modifying proxy might work if the java code has some generic support for explicitly using a proxy too, e.g. via some env var.

Change 292960 had a related patch set uploaded (by Yuvipanda):
tools: Install jdk8 in trusty nodes

https://gerrit.wikimedia.org/r/292960

A generic non-modifying proxy might work if the java code has some generic support for explicitly using a proxy too, e.g. via some env var.

Java does support this via the -Dhttp.proxyHost=... -Dhttp.proxyPort=... command line options.
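As a small, hedged illustration of what these options do inside the JVM: the standard `java.net` stack consults the default `ProxySelector`, which reads the `http.proxyHost` and `http.proxyPort` system properties. The sketch below sets the properties programmatically (equivalent to passing them with `-D`) and asks the selector which proxy it would use. The class name `ProxyPropertyCheck` and the proxy host are placeholders, not anything from the merlbot setup.

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

final class ProxyPropertyCheck {
    /** Returns the proxies the JVM's default selector would use for the given URI. */
    static List<Proxy> proxiesFor(String uri) {
        return ProxySelector.getDefault().select(URI.create(uri));
    }

    public static void main(String[] args) {
        // Equivalent to -Dhttp.proxyHost=... -Dhttp.proxyPort=... on the command line.
        System.setProperty("http.proxyHost", "proxy.example.invalid"); // placeholder host
        System.setProperty("http.proxyPort", "8080");

        for (Proxy p : proxiesFor("http://de.wikipedia.org/w/api.php")) {
            System.out.println(p.type() + " " + p.address());
        }
    }
}
```

Two caveats: the property names are case-sensitive, and only code that goes through `java.net` (or that explicitly consults the `ProxySelector`) is affected. A third-party HTTP library that opens its own connections can ignore these settings entirely.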

I went ahead and set up an nginx reverse proxy that is designed to work with this and translate HTTP input to HTTPS requests to the real server. See T137235: Create temporary http -> https reverse proxy for MerlBot for details.

With this running I think that someone can edit all of the /data/project/merlbot/**/*.qsub job control files to include -Dhttp.proxyHost=tools-merlbot-proxy.tools.eqiad.wmflabs -Dhttp.proxyPort=80 and transparently switch from HTTP POST messages to HTTPS.

bd808 renamed this task from Figure out a way to support java 1.8 on tool labs (Merl's bot) to Figure out a way to keep MerlBot running when the HTTP POST loophole is closed. Jun 8 2016, 3:23 AM
bd808 set Security to None.

(...)
@Luke081515 I know you have shown interest in the jobs that MerlBot supports in the past. Are you interested in helping find a way to keep the bot up and running?

(Sorry, I missed the ping.) Yep, I'm interested, but I can't help with the code, since I don't have access to it and I don't have the level of Java knowledge that he has.

Maybe interesting for you: As @Umherirrender pointed out at https://de.wikipedia.org/wiki/Benutzer_Diskussion:Merlissimo#Merlbot_Status_-_Info_und_Bitte_um_Unterst.C3.BCtzung

Merl once wrote (though I can no longer find it) that only the first request (of the day?) goes over http, and that after that the URLs are taken from the sitematrix, which are all https. That need not mean anything and probably won't be reassuring, but if the first request fails, the bot is presumably not very usable either.

In other words, Merl wrote some time ago that only the bot's first request uses HTTP (we don't know the time window for "first request"; it could be a day, or less), and that after that the bot uses the URLs from the sitematrix. That means most of the bot's traffic is secure (if this information is up to date), but it also means that if the first request fails, the bot fails for that whole time period.

(...)
@Luke081515 I know you have shown interest in the jobs that MerlBot supports in the past. Are you interested in helping find a way to keep the bot up and running?

(Sorry, I missed the ping.) Yep, I'm interested, but I can't help with the code, since I don't have access to it and I don't have the level of Java knowledge that he has.

Since there is no source code on Tool Labs we are pretty much stuck with making changes that are outside the Java source until @Merl can come back to help with deeper changes. Hopefully we will be able to keep things limping along with the -Dhttp.proxyHost=tools-merlbot-proxy.tools.eqiad.wmflabs -Dhttp.proxyPort=80 flags to the JVM.

Maybe interesting for you: As @Umherirrender pointed out at https://de.wikipedia.org/wiki/Benutzer_Diskussion:Merlissimo#Merlbot_Status_-_Info_und_Bitte_um_Unterst.C3.BCtzung

Merl once wrote (though I can no longer find it) that only the first request (of the day?) goes over http, and that after that the URLs are taken from the sitematrix, which are all https. That need not mean anything and probably won't be reassuring, but if the first request fails, the bot is presumably not very usable either.

In other words, Merl wrote some time ago that only the bot's first request uses HTTP (we don't know the time window for "first request"; it could be a day, or less), and that after that the bot uses the URLs from the sitematrix. That means most of the bot's traffic is secure (if this information is up to date), but it also means that if the first request fails, the bot fails for that whole time period.

I had seen this claim before on another ticket. In the last 24 hours processes associated with the MerlBot account on-wiki have made 3391 insecure HTTP requests to the Action API. I don't have numbers for the total count of requests made in the same period as the logging I'm looking at is only for requests that have the https-expected warning attached.

I have been in contact with @Bmueller and others in the dewiki community about how to proceed with helping @Merl in advance of the 2016-06-12 deadline. We have not been able to contact @Merl directly for various reasons that are reasonable but not pertinent to completing the current task. We have a situation where either the bot is allowed to break or someone takes action without the explicit consent of the bot owner. I have decided to act on behalf of the dewiki community in my role as the Tool Labs developer liaison for Community-Tech and an admin in the Toolforge project and attempt to fix the HTTPS problem in the short term using the proxy solution.

This is not a long term commitment by me or the WMF teams I represent to monitor and support the merlbot tool. @Bmueller and the WMDE-TechWish will continue to work on a longer term plan to help @Merl attract co-maintainers to assist in longer term solutions or to migrate key tasks managed by the bot to other projects. Ideally we can start some broader discussions around T87730: Set up process / criteria for taking over abandoned tools and other norms such as multiple maintainers, FLOSS licensing, versioned software, and maintenance documentation for critical tools as well.

I've done an audit of the *.qsub files in /data/project/merlbot and found that there are 28 job control files using /usr/bin/java ... that will need to be updated. To make tracking the changes and rolling them back if necessary easy I have initialized a local git repository (no remotes) in /data/project/merlbot and added all of the files that are not log output, java jar files, mysql credential files, or dotfiles.

My next step will be to actually edit the 28 qsub control files and add the -Dhttp.proxyHost=tools-merlbot-proxy.tools.eqiad.wmflabs -Dhttp.proxyPort=80 configuration. Once this is done we can monitor the https-expected warnings related to the Merlbot account to see if there is an improvement.

The job control files have been edited. I have left the changes as uncommitted diffs in /data/project/merlbot making them easy to examine and revert if necessary.

@bd808, thanks for all your efforts and for coming up with an interim solution!

My attempt to intercept and correct HTTP traffic from MerlBot's scripts to the Wikimedia servers has not been successful. Error logs for the jobs are still showing Action API warnings:

api-warning/error(de.mabomuja.mabomwframework.merlbot.lists.A_GetPagesParser): /api/warnings/main->HTTP used when HTTPS was expected

Logs at the proxy do show some activity, but only for access to the http://tools.wmflabs.org/copyvios API. This indicates that the Java configuration is correct but that the library being used for communicating with the MediaWiki API is not using standard java.net.* classes.

Looking at the jar files present, it seems likely that the Apache HttpCore library is being used. This library does support -Dhttp.proxyHost=... settings, but only if the java code [[https://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html#d5e485|explicitly installs a SystemDefaultRoutePlanner instance]]. This puts us back at the point of needing access to the source code to fix the bot.

@valhallasw dug up some information that setting an http_proxy environment variable may influence the HttpCore library. I have added #$ -v http_proxy=http://tools-merlbot-proxy.tools.eqiad.wmflabs:80 control comments to the qsub job control files to see if this makes any difference.

@valhallasw dug up some information that setting an http_proxy environment variable may influence the HttpCore library. I have added #$ -v http_proxy=http://tools-merlbot-proxy.tools.eqiad.wmflabs:80 control comments to the qsub job control files to see if this makes any difference.

I've seen more https-expected warnings logged since the env var change, so I'm fairly confident that fix does not work either.
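One likely reason the environment variable had no effect: to the best of my knowledge the JVM's standard networking stack does not read `http_proxy` on its own, so a Java program has to translate it into proxy settings itself. A minimal sketch of such a translation is below; the class name `EnvProxy` and method `applyHttpProxy` are hypothetical, and this only helps code paths that actually honor the `http.proxyHost` and `http.proxyPort` properties.

```java
import java.net.URI;

final class EnvProxy {
    /** Copies an http_proxy-style URL into the JVM's http.proxyHost/http.proxyPort properties. */
    static boolean applyHttpProxy(String httpProxy) {
        if (httpProxy == null || httpProxy.isEmpty()) {
            return false; // variable not set: leave the JVM defaults untouched
        }
        URI uri = URI.create(httpProxy);
        if (uri.getHost() == null) {
            return false; // not a usable proxy URL
        }
        int port = uri.getPort() == -1 ? 80 : uri.getPort();
        System.setProperty("http.proxyHost", uri.getHost());
        System.setProperty("http.proxyPort", Integer.toString(port));
        return true;
    }

    public static void main(String[] args) {
        boolean applied = applyHttpProxy(System.getenv("http_proxy"));
        System.out.println("proxy configured from environment: " + applied);
    }
}
```

Something like this would have to run inside the bot before its first request, which again requires access to the source.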

<whine>With source code (and a FLOSS license) I think I could have fixed all of this in an afternoon.</whine>

Why is it not possible to postpone the closing of the HTTP POST loophole?

On German Wikipedia, MerlBot is a very important bot with lots of scripts and tasks. Unfortunately its owner, Merlissimo, is unavailable for medical reasons, though not permanently. We have tried hard to contact him, but without success.

So I hope you will be kind enough to postpone the planned closing of the HTTP POST loophole for a limited time.

Thank you very much indeed ...

Because Merlbot is a relatively-special case, we've already discussed in some related email threads about leaving a loophole only for this one bot if necessary, and that's most likely what will happen at least temporarily.

It's a slippery slope as a temporary solution, though. There are several other bots that probably won't be ready in time either, and odds are high we'll never actually close this loophole if we wait for them all indefinitely. Only certain kinds of pressure actually cause change in situations like this...

Several of these cases (not just Merlbot) are clear examples of where a stronger FLOSS policy on tools/bots would be beneficial. Locked-up source code with a short list of maintainers with access, who may go dormant while the tool itself is still in critical use, is always going to lead to situations like these over time. We shouldn't have important wiki-maintenance happening with closed-source tools at all, IMHO, for many of the same reasons that the Foundation itself only works with open source code where at all possible.

The 10% random-failure patch went live ~13:00 UTC. Merlbot is temporarily excluded from the 10% (with a match on Merlbot's UA string *and* the eqiad labs instances subnet).
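The rule described here (fail a random 10% of insecure requests, except when both the user agent and the source subnet match) can be sketched as below. This is not the production rule, only an illustration of the logic; the class name, the subnet `198.51.100.0/24` (a documentation range) and the user-agent substring are placeholders.

```java
import java.util.Locale;
import java.util.concurrent.ThreadLocalRandom;

final class InsecurePostPolicy {
    static final double FAILURE_RATE = 0.10;               // 10% random failures
    static final String EXEMPT_SUBNET = "198.51.100.0/24"; // placeholder, not the real Labs subnet

    /** Packs a dotted-quad IPv4 address into an int. */
    static int toInt(String dotted) {
        int value = 0;
        for (String part : dotted.split("\\.")) {
            value = (value << 8) | Integer.parseInt(part);
        }
        return value;
    }

    /** True if the IPv4 address lies inside the given CIDR block. */
    static boolean inSubnet(String ip, String cidr) {
        String[] pieces = cidr.split("/");
        int bits = Integer.parseInt(pieces[1]);
        int mask = bits == 0 ? 0 : -1 << (32 - bits);
        return (toInt(ip) & mask) == (toInt(pieces[0]) & mask);
    }

    /** The exemption needs BOTH the user-agent match and the subnet match. */
    static boolean isExempt(String userAgent, String ip) {
        return userAgent != null
                && userAgent.toLowerCase(Locale.ROOT).contains("merlbot")
                && inSubnet(ip, EXEMPT_SUBNET);
    }

    /** roll is a uniform random number in [0, 1). */
    static boolean shouldFail(String userAgent, String ip, double roll) {
        return !isExempt(userAgent, ip) && roll < FAILURE_RATE;
    }

    public static void main(String[] args) {
        double roll = ThreadLocalRandom.current().nextDouble();
        System.out.println(shouldFail("SomeBot/1.0", "198.51.100.7", roll));
    }
}
```

Requiring both conditions keeps the exemption narrow: a client elsewhere that merely copies the user-agent string would still be subject to the random failures.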

@BBlack, thanks for the info & action, that gives everyone a bit more time. About a stronger FOSS policy on tools/bots - there is already the ticket on how to deal with abandoned tools (T87730), but do we have a more general one somewhere? I think it would be good to start/continue collecting lessons learned and ideas on this issue and go into a broader discussion soon.

@bd808 Concerning the HTTP library etc. that MerlBot is using, I found T105794#1486712.

Hi, I have tried to read a lot of the discussions to catch up on the current state.
What I really don't understand is why it should not be possible to set up a special SGE execution host with Ubuntu 14.04 and OpenJDK 8 from a PPA.

In T121020#1965525 @valhallasw ruled out any use of a PPA because of security concerns. Could you consider an exception in this special case? The package is maintained by Matthias Klose. He is very well known in the Debian community because he started contributing patches to Debian during his computer science studies in Berlin at the end of the 90s. For many years he has been responsible for security fixes, especially for components like gcc/toolchain and Java. If you don't trust him, you shouldn't trust the complete Debian/Ubuntu distributions either.

The only difference is that Matthias is now offering OpenJDK 8 support for Ubuntu 14.04 directly through a PPA. You would prefer to see Canonical republish, in their own repository, the code that Matthias previously added to the official Debian repository.

So is it really impossible under your Labs policy to make an exemption? Implementing a proxy should also be possible. I'll check that.

Btw: all my code is licensed under the MPL.

Btw: all my code is licensed under the MPL.

Is the code available somewhere? I don't think anyone looking at this issue has been able to find it.

Hi, I have tried to read a lot of the discussions to catch up on the current state.
What I really don't understand is why it should not be possible to set up a special SGE execution host with Ubuntu 14.04 and OpenJDK 8 from a PPA.

Maintaining a special exec node is actually a pretty annoying state for system administration and for general grid health.

In T121020#1965525 @valhallasw ruled out any use of a PPA because of security concerns. Could you consider an exception in this special case? The package is maintained by Matthias Klose. He is very well known in the Debian community because he started contributing patches to Debian during his computer science studies in Berlin at the end of the 90s. For many years he has been responsible for security fixes, especially for components like gcc/toolchain and Java. If you don't trust him, you shouldn't trust the complete Debian/Ubuntu distributions either.

The only difference is that Matthias is now offering OpenJDK 8 support for Ubuntu 14.04 directly through a PPA. You would prefer to see Canonical republish, in their own repository, the code that Matthias previously added to the official Debian repository.

So is it really impossible under your Labs policy to make an exemption?

The patch at https://gerrit.wikimedia.org/r/292960 proposes instead to install JDK8 on all bastion and exec hosts. We are not planning on using the PPA directly but instead compiling and hosting the package locally based on the PPA's provided source. The techops team is already tracking security issues in JDK8 for WMF production servers which makes this easier to deal with than it would be for an arbitrary software package.

The JDK8 option was discarded as a quick fix on the presumption that code changes and/or recompilation would be needed to use the full featured library mentioned in T105794#1574563.

Implementing a proxy should also be possible. I'll check that.

See T121279#2371978 for notes on what you will need to do to make Apache HttpCore honor the -Dhttp.proxyHost=... options that have already been added to all of the qsub files.

Btw: all my code is licensed under the MPL.

Excellent. Is the source published somewhere as well?

Change 292960 abandoned by Yuvipanda:
tools: Install jdk8 in trusty nodes

Reason:
This was solved in other ways for now, not sure what the long term solution is - but if it is this, we can resurrect it.

https://gerrit.wikimedia.org/r/292960

Ubuntu Xenial 16.04 LTS seems to have openjdk-8-jre. Perhaps we can have exec nodes in xenial for this?

Wikimedia as a whole has been moving to Debian for a while, so I do not think we'll support Ubuntu 16.04 ever.

The general insecure access cutoff date is coming up tomorrow. We had exempted Merlbot from the 10% random failure rate a month ago. Has there been any movement on the Merlbot issues in the past two weeks or so?

MerlBot is now down; it is not making edits anymore. How can we resolve the current situation? I think we should:

  • a) Add an exemption for MerlBot again, OR
  • b) Wait until Merl is reachable again, so that he can work with us. While he is offline, we can disable the merlbot crontab entries (with #) so that the jobs do not run, because the bot uses a lot of resources. Using them is not a problem in itself, but using a lot of resources without getting any benefit from them is not good. So my proposal is simply to comment out the crontab entries, so that Merl can easily re-enable them when he is back.

MerlBot is now down; it is not making edits anymore. How can we resolve the current situation? I think we should:

  • a) Add an exemption for MerlBot again, OR

I don't think that we can reasonably re-introduce an exception for Merlbot. This would set a precedent that we don't want; namely that some bots/tools/api clients are "too big to fail" and that even when >12 months of warning has been given the Wikimedia software and network cannot be changed in breaking ways.

Here's my current proposal for "resolving" this task:

  • Close this ticket with a status of "Declined" since we weren't able to rescue the bot before the POST loophole cutoff.
  • Disable the cron jobs for the merlbot tool account.
  • Open a new ticket about getting the java source updated to either use the http reverse proxy created in T137235: Create temporary http -> https reverse proxy for MerlBot or to find another means to correctly use HTTPS for all communications with the Wikimedia Action API. Any resolution of that task is going to require @Merl to publish the source code so that others can help or to do the work himself. The ideal solution from my point of view would be publishing the source and recruiting some additional maintainers.

@Bmueller and @Lea_WMDE: does this sound reasonable and would WMDE-TechWish be willing to follow up on the new task to fix and re-enable the tool's cron jobs?

I don't think that we can reasonably re-introduce an exception for Merlbot.

I agree as well. Keep in mind Merl first responded to notification about this issue on phab back in July 2015: T105794#1484721 .

I can replace MerlBot's bot actions in a very short time - I am working on this right now.

Java 1.8 is supported on Kubernetes in Tool Labs. If the SGE scripts could be converted or replaced with Kubernetes equivalents, then MerlBot could probably run on that. I recall that not being able to use Java 1.8 was a blocker in getting MerlBot to use HTTPS because it could not use an updated version of a library or additional features provided in Java 1.8 (I might be remembering this wrong. Probably)

Java 1.8 is supported on Kubernetes in Tool Labs. If the SGE scripts could be converted or replaced with Kubernetes equivalents, then MerlBot could probably run on that. I recall that not being able to use Java 1.8 was a blocker in getting MerlBot to use HTTPS because it could not use an updated version of a library or additional features provided in Java 1.8 (I might be remembering this wrong. Probably)

The system of scripts that power the merlbot tool are very heavy users of grid engine features. Porting to run on k8s would be a major code change. We can find a way to make JDK8 available on our OGE nodes, but that won't help without other source level or at the very least packaging changes in the merlbot scripts. See T105794#1486712 where @Merl said "My main problem is that i am using a java http libary which normally would need java 1.8. To keep my bot working with https i added a dirty hack which makes it complicated to also send the initial request using http. After java 1.8 is installed on labs i can change this a lot easier." With source code it should be trivial to install SystemDefaultRoutePlanner for the Apache HttpCore library and get the scripts to use the http->https proxy (T121279#2371978).

@doctaxon, this is great news! If we (wmde WMDE-TechWish) can support you with code reviews please let us know!

Thank you, @Bmueller ... but it's not written in Java. I'll replace the bot jobs using another scripting language.

Here's my current proposal for "resolving" this task:

  • Close this ticket with a status of "Declined" since we weren't able to rescue the bot before the POST loophole cutoff.
  • Disable the cron jobs for the merlbot tool account.
  • Open a new ticket about getting the java source updated to either use the http reverse proxy created in T137235: Create temporary http -> https reverse proxy for MerlBot or to find another means to correctly use HTTPS for all communications with the Wikimedia Action API. Any resolution of that task is going to require @Merl to publish the source code so that others can help or to do the work himself. The ideal solution from my point of view would be publishing the source and recruiting some additional maintainers.

The cron tasks for the merlbot tool have been disabled. See T140925: merlbot cron jobs disabled due to HTTP POST errors for the follow up work that is needed.

I can replace MerlBot's bot actions in a very short time - I am working on this right now.

@doctaxon let me know if I can be of any help to you in this project.

Here are a few things I would suggest for any project attempting to replace merlbot (or really any project in Tool Labs):

  • Pick an OSI approved license for your source code and document that with a LICENSE file in the tool's home directory and header comments in the source code. See http://choosealicense.com/ if you need help deciding what license to choose.
  • Use public version control (gerrit, diffusion, github, bitbucket, ...) for your tool's source code and deploy changes to the Tool Labs servers by updating a checkout of that public version control.
  • Keep passwords and other credentials (OAuth secrets, etc) separated from the main application code so that they are not exposed publicly in your version control system of choice.
  • Create a page in the [[https://wikitech.wikimedia.org/wiki/Category:Tool_Labs_tools|Tool: namespace on wikitech]] documenting the basics of what your tool does and how to start and stop it.
  • Find co-maintainers for your tools who can help out at least with starting/stopping jobs when needed.
  • Make many small tools that do one specific task rather than a catch-all tool that does many different tasks.
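For the point about keeping credentials out of the main application code, one common pattern is to read them at startup from a file that lives outside the version-controlled tree. A minimal Java sketch follows; the class name `BotCredentials`, the property keys `user` and `password`, and the file name are all hypothetical choices for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

final class BotCredentials {
    final String user;
    final String password;

    private BotCredentials(String user, String password) {
        this.user = user;
        this.password = password;
    }

    /** Parses "key=value" lines; both user and password must be present. */
    static BotCredentials load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String user = props.getProperty("user");
        String password = props.getProperty("password");
        if (user == null || password == null) {
            throw new IllegalStateException("credentials file must define user and password");
        }
        return new BotCredentials(user, password);
    }

    /** Reads the credentials from a file in the home directory, outside the source checkout. */
    static BotCredentials loadFromHome(String fileName) {
        Path path = Paths.get(System.getProperty("user.home"), fileName);
        try (Reader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            return load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A tool would call something like `BotCredentials.loadFromHome(".bot-credentials.properties")` at startup, with that file readable only by the tool account and excluded from the public repository.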