Google Compute Engine VMs Multiple Remote Denial of Service Vulnerabilities
----------------------------------------------------------------------------------------------------

Overview
------------

Google Compute Engine (GCE) is a "cloud"-based, virtualized
infrastructure-as-a-service.  Users may "rent" compute resources in
the form of virtual machines running various distributions of Linux.


Users may select from a variety of machine types and locations.
Unfortunately, no combination of machine type or location has a
meaningful impact on our ability to exploit this vulnerability.


Each GCE VM appears to have a pool of resources assigned to it, which
covers its ability to process network traffic, CPU cycles, IO, etc.
We are able to trivially overflow this pool and cause a DoS on the
targeted instance(s).


The GCE platform provides users with a mechanism to "firewall" traffic
so that it never reaches the OS/kernel of the VM.  However, network
traffic targeting ports that are not permitted by the GCE firewall
still counts against the VM's pool of resources.  This allows an
attacker to target a port they are certain will not be permitted.
Such an attack goes undetected by the owner of the VM, as the packets
never reach the kernel and so are not visible to tcpdump.


Impact
---------

A remote, unauthenticated user is able to trivially DoS GCE VMs.


Exploit
---------

#1:  TCP SYN flood on an open port.


A simple TCP SYN flood on an open port allows the attacker to consume
the resources available to the VM, causing a DoS.  However, this
traffic is visible from within the OS via tcpdump.


#2:  TCP SYN flood on a blocked port.


This is identical to #1 above, except that the user of the VM is not
able to detect the flood of traffic and will only see its effects:
poor performance or a DoS.


#3:  UDP flood.


UDP floods have different behaviour than the TCP SYN flood.  We are
able to use iperf to simulate this attack:


iperf -c <gce.ip.addr> -u -i1 -b150M -t1 -p<blocked UDP port>


Sending 1 second of high volume UDP traffic will cause extreme latency
on the GCE VM.  In testing, we were able to observe upwards of 50
seconds of latency incurred on subsequent traffic.  This appears to be
some sort of buffer mechanism, where the traffic is buffered and
processed sequentially.  No lost packets were observed.

Note:  See the Response section below.  This particular 1-second issue
has been resolved by Google; regular UDP flooding still works.
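
Because traffic to a blocked port never reaches the kernel, added
latency is the main symptom visible to the VM's owner.  The following
is a minimal sketch of how that latency could be watched from an
outside host, assuming the VM exposes at least one open TCP port.  The
function names, the 1-second threshold and the address in the usage
note are our own illustrative choices, not part of any GCE tooling.

```python
import socket
import time
from typing import Iterable, Optional


def tcp_connect_latency(host: str, port: int,
                        timeout: float = 60.0) -> Optional[float]:
    """Return the seconds taken to complete a TCP handshake.

    Returns None if the connection fails or times out.  The default
    timeout is an assumption, chosen to exceed the ~50 seconds of
    latency observed in testing.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return None


def is_degraded(samples: Iterable[Optional[float]],
                threshold_s: float = 1.0) -> bool:
    """True if any probe failed or took longer than threshold_s."""
    return any(s is None or s > threshold_s for s in samples)
```

For example, calling tcp_connect_latency("203.0.113.10", 22) a few
times (a documentation address standing in for the VM) and passing the
results to is_degraded() gives a crude signal that something upstream
of the kernel is delaying or dropping traffic.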


Mitigation
------------

There are no significant avenues of mitigation for this vulnerability,
as it appears to be by design.  The best option for concerned GCE
users is to use RFC1918 addresses for all of their VMs, route outbound
traffic through a single source-NAT VM (or a pool of them), and use
Google Cloud Load Balancing (GCLB) for inbound traffic.


GCLB has no intrinsic ability to block this DoS; rather, it spreads
traffic evenly (like ECMP) across a group of VMs.  If each VM's pool
overflows at 300K PPS and the target has 5 instances, an attacker need
only generate 300K * 5 = 1.5M PPS to knock the systems offline.
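
The scaling argument above can be written out as a short calculation.
This is only a sketch under the same assumptions as the text: the load
balancer spreads packets perfectly evenly, and every backend has the
same per-VM limit.  The function names are ours, and the 300K PPS
figure is the example value used above.

```python
def required_attack_pps(per_vm_limit_pps: int, backend_count: int) -> int:
    """Aggregate packet rate needed to overflow every backend VM,
    assuming traffic is spread evenly across all backends."""
    if per_vm_limit_pps <= 0 or backend_count <= 0:
        raise ValueError("limit and backend count must be positive")
    return per_vm_limit_pps * backend_count


def per_vm_pps(total_pps: int, backend_count: int) -> float:
    """Packet rate each backend sees under an even spread."""
    if backend_count <= 0:
        raise ValueError("backend count must be positive")
    return total_pps / backend_count


total = required_attack_pps(300_000, 5)
print(total)                 # 1500000
print(per_vm_pps(total, 5))  # 300000.0
```

The required rate grows only linearly with the number of backends, so
adding instances raises the attacker's cost but does not remove the
exposure.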


However, GCLB does provide additional protection by dropping packets
that are not expressly permitted, rather than counting them against
the GCE VM’s pool of resources.  Thus, the attack vector through GCLB
is limited to open ports.


Response
--------------

After we reported these issues to Google, they appear to have
addressed the 1-second UDP flood.


However, it is still possible to perform a typical TCP or UDP flood
against any Google Compute Engine VM, on a port blocked by the GCE
firewall, and either knock it offline or cause increased latency and
packet loss.