Page MenuHomePhabricator

Huge increase in log sizes on Phabricator's notification service host
Closed, ResolvedPublicSecurity

Description

Size of Aphlict logs on host aphlict1001.eqiad.wmnet has dramatically increased in the last couple of days:

root@aphlict1001:/# ls -lh var/log/aphlict/*
-rw-r--r-- 1 aphlict aphlict  11G Oct 19 11:57 var/log/aphlict/aphlict.log
-rw-r--r-- 1 aphlict aphlict  60M Oct 17 00:00 var/log/aphlict/aphlict.log.3.gz
-rw-r--r-- 1 aphlict aphlict  53M Oct 16 00:01 var/log/aphlict/aphlict.log.4.gz
-rw-r--r-- 1 aphlict aphlict 116M Oct 15 00:01 var/log/aphlict/aphlict.log.5.gz
-rw-r--r-- 1 aphlict aphlict  93M Oct 14 00:01 var/log/aphlict/aphlict.log.6.gz
-rw-r--r-- 1 aphlict aphlict  69M Oct 13 00:01 var/log/aphlict/aphlict.log.7.gz
-rw-r--r-- 1 aphlict aphlict  58M Oct 12 00:00 var/log/aphlict/aphlict.log.8.gz

Noticeably, a significant portion of the messages in the latest log come from user PHID-USER-2jawdezwv2ifbl2sgpfe:

jnuche@aphlict1001:~$ wc -l /var/log/aphlict/aphlict.log
74330212 /var/log/aphlict/aphlict.log
jnuche@aphlict1001:~$ grep -c 'PHID-USER-2jawdezwv2ifbl2sgpfe' /var/log/aphlict/aphlict.log
27362217

No sensitive data in this ticket, but tagged with "security" just in case this could be the symptom of some kind of attack.

Details

Risk Rating
Low
Author Affiliation
WMF Technology Dept

Event Timeline

hashar set Security to Software security bug.Oct 19 2022, 3:19 PM
hashar added a project: Security-Team.
hashar changed the visibility from "Public (No Login Required)" to "Custom Policy".
hashar changed the subtype of this task from "Task" to "Security Issue".
hashar subscribed.

Tagging with Security is not sufficient to protect a task ;)

@hashar It's not clear yet this is a security issue. I wanted to draw this to their attention in case it made sense to them to promote the ticket

I think logrotate had a trouble, the log increment jump straight to 3:

-rw-r--r-- 1 aphlict aphlict  53M Oct 16 00:01 var/log/aphlict/aphlict.log.4.gz
-rw-r--r-- 1 aphlict aphlict  60M Oct 17 00:00 var/log/aphlict/aphlict.log.3.gz
-rw-r--r-- 1 aphlict aphlict  11G Oct 19 11:57 var/log/aphlict/aphlict.log

The first entry in aphlict.log shows [10/17/2022, 12:00:57 AM]. That leads me to believe the logrotate did not work on Oct 18th and October 19th at midnight. We thus end up with multiple GBytes file where as when compressed a day worth of log is less than a hundred MBytes.

The logrotate config is more than two years old:

$ ls -la /etc/logrotate.d/aphlict 
-r--r--r-- 1 root root 127 Jul 17  2020 /etc/logrotate.d/aphlict
/etc/logrotate.d/aphlict
/var/log/aphlict/*.log {
    daily
    compress
    delaycompress
    copytruncate
    missingok
    notifempty
    rotate 7
}

Looking at some random logs:

$ sudo grep logrotate /var/log/daemon.log
Oct 16 00:01:35 aphlict1001 systemd[1]: logrotate.service: Succeeded.
Oct 17 00:00:58 aphlict1001 systemd[1]: logrotate.service: Succeeded.
Oct 18 00:01:37 aphlict1001 logrotate[30497]: error: error writing to /var/log/aphlict/aphlict.log.1: No space left on device
Oct 18 00:01:37 aphlict1001 logrotate[30497]: error: error copying /var/log/aphlict/aphlict.log to /var/log/aphlict/aphlict.log.1: No space left on device
Oct 18 00:01:37 aphlict1001 systemd[1]: logrotate.service: Main process exited, code=exited, status=1/FAILURE
Oct 18 00:01:37 aphlict1001 systemd[1]: logrotate.service: Failed with result 'exit-code'.
Oct 19 00:00:23 aphlict1001 logrotate[29327]: error: error writing to /var/log/aphlict/aphlict.log.1: No space left on device
Oct 19 00:00:23 aphlict1001 logrotate[29327]: error: error copying /var/log/aphlict/aphlict.log to /var/log/aphlict/aphlict.log.1: No space left on device
Oct 19 00:00:24 aphlict1001 systemd[1]: logrotate.service: Main process exited, code=exited, status=1/FAILURE
Oct 19 00:00:24 aphlict1001 systemd[1]: logrotate.service: Failed with result 'exit-code'.
hashar@aphlict1001:/var/log$ sudo ^C

So disk went full. On Oct 18th logrotate failed cause of full disk and kept failing subsequently.

Note the unit is marked as failing, that should raise an alarm somewhere:

$ sudo systemctl status logrotate
● logrotate.service - Rotate log files
   Loaded: loaded (/lib/systemd/system/logrotate.service; static; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2022-10-19 00:00:24 UTC; 15h ago
     Docs: man:logrotate(8)
           man:logrotate.conf(5)
  Process: 29327 ExecStart=/usr/sbin/logrotate /etc/logrotate.conf (code=exited, status=1/FAILURE)
 Main PID: 29327 (code=exited, status=1/FAILURE)

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

See also https://secure.phabricator.com/T12908 Allow verbosity of Aphlict log to be reduced

I think this was at least approximately resolved by D21115, which made the default log much quieter and the "debug" log more verbose.

We run a much older version of Aphlict from what I can tell. The currently running version seems to be out of release/2019-08-21/1. I am guessing aphlict1001 is not part of the general Phabricator scap deployment.

sbassett subscribed.

@hashar It's not clear yet this is a security issue. I wanted to draw this to their attention in case it made sense to them to promote the ticket

Just FYI, you'll want to tag our team tag for that - Security-Team - as @hashar did. Security is a super-general tag with I think just over a thousand open issues right now. We typically triage newly-created security tasks (which automatically tags our team) and anything else that has been newly-created and tagged with Security-Team, every Monday during our clinic.

I don't see a relation to security here. Worst case this would do is that Phabricator users don't get some pop-up notifications.

manually compressed and cleaned up some logs, started logrotate.

we are back to 92% usage

Arguably it's not even a huge increase since the large one was simply not compressed, while the other existing ones were compressed. It's about 3 times the average size for a single day and this would easily happen if some user / bot is mass editing.

We can figure out which user did this though.

MariaDB [phabricator_user]> select username,realname from user where phid="PHID-USER-2jawdezwv2ifbl2sgpfe";
+----------+--------------+
| username | realname     |
+----------+--------------+
| cmelo    | Claudio Melo |

https://phabricator.wikimedia.org/p/cmelo/

https://www.mediawiki.org/wiki/User:CMelo_(WMF)

edit: I have reached out to the user to find out what they were doing. I think that's all that is todo here.

Dzahn claimed this task.

The user has replied and they are not aware of mass editing or anything like that. Just a normal Phabricator user. shrug.

I have also cleaned up some more files from /.

Hey all - I don't see anything here that would obviously prevent us from opening up this bug, save the mention of outdated dependencies and the light DoS vector, if it's even that. If nobody objects, I'll go ahead and make this task public.

@Dzahn @sbassett Thank you so much for following up so quickly

Just FYI, you'll want to tag our team tag for that - Security-Team - as @hashar did. Security is a super-general tag with I think just over a thousand open issues right now. We typically triage newly-created security tasks (which automatically tags our team) and anything else that has been newly-created and tagged with Security-Team, every Monday during our clinic.

Thanks for sharing that. No objections to making the task public.

sbassett changed Author Affiliation from N/A to WMF Technology Dept.
sbassett changed the visibility from "Custom Policy" to "Public (No Login Required)".
sbassett changed Risk Rating from N/A to Low.
sbassett moved this task from Incoming to Our Part Is Done on the Security-Team board.
sbassett added a project: SecTeam-Processed.