
Linux unmap_mapping_range() Race Condition

Posted Aug 30, 2022
Authored by Jann Horn, Google Security Research

For VM_PFNMAP VMAs, there is a race between unmap_mapping_range() and munmap() that can lead to a page being freed by a device driver while the page still has stale TLB entries.

tags | advisory
SHA-256 | 0c343119926cb622181935b2b8688c9dde2b0e898e81a4a44edd9820611241df

Linux: unmap_mapping_range() race with munmap() on VM_PFNMAP mappings leads to stale TLB entry

For VM_PFNMAP VMAs, there is a race between unmap_mapping_range() and
munmap() that can lead to a page being freed by a device driver while
the page still has stale TLB entries.


There are drivers (in particular GPU drivers) that create
VM_PFNMAP VMAs containing PTEs that point to normal pages
from the page allocator. VM_PFNMAP means that the core kernel
won't track this using the page mapcounts; instead, the driver
is responsible for holding references to the page as long as
it is mapped into userspace.
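
As a minimal, hypothetical sketch (the my_obj/my_drv_* names are made up for illustration and do not correspond to any specific driver), such a mapping might be set up like this:

/* Hypothetical driver object; the driver itself holds the reference to
 * the page it allocated from the page allocator. */
struct my_obj {
	struct page *page;	/* e.g. from alloc_page(GFP_KERNEL) */
};

static vm_fault_t my_drv_fault(struct vm_fault *vmf)
{
	struct my_obj *obj = vmf->vma->vm_private_data;

	/* Insert the PFN directly; with VM_PFNMAP the core MM does not
	 * take a mapcount on this page. */
	return vmf_insert_pfn(vmf->vma, vmf->address, page_to_pfn(obj->page));
}

static const struct vm_operations_struct my_drv_vm_ops = {
	.fault = my_drv_fault,
};

static int my_drv_mmap(struct file *file, struct vm_area_struct *vma)
{
	/* VM_PFNMAP: the core kernel won't track these PTEs via mapcounts. */
	vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
	vma->vm_ops = &my_drv_vm_ops;
	vma->vm_private_data = file->private_data;
	return 0;
}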

Some of these drivers have codepaths that can remove userspace
mappings of such pages using unmap_mapping_range(), then give these
pages back to the page allocator.
For example, i915 has a shrinker callback i915_gem_shrink() that does
this.
To make this driver behavior correct, it is necessary that by the time
unmap_mapping_range() returns, all the PTEs in the specified range have
been removed and the corresponding TLB flushes have been executed.
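
Schematically, the driver-side teardown this report is concerned with looks roughly like the following (again with made-up names; i915_gem_shrink() is an in-tree example of the pattern, but the code below is not taken from it):

static void my_drv_evict(struct my_obj *obj, struct address_space *mapping,
			 pgoff_t pgoff)
{
	/* Zap all userspace PTEs that map this object. For this to be
	 * safe, the TLB must have been flushed for every removed PTE by
	 * the time unmap_mapping_range() returns. */
	unmap_mapping_range(mapping, (loff_t)pgoff << PAGE_SHIFT,
			    PAGE_SIZE, /* even_cows */ 1);

	/* Only then is it safe to hand the page back to the allocator.
	 * A surviving stale TLB entry would let userspace keep touching
	 * this page after it has been freed. */
	__free_page(obj->page);
	obj->page = NULL;
}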

However, munmap() ends up in unmap_region(), which does this:


struct mmu_gather tlb;

lru_add_drain();
tlb_gather_mmu(&tlb, mm);
update_hiwater_rss(mm);
unmap_vmas(&tlb, vma, start, end);
free_pgtables(&tlb, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
	      next ? next->vm_start : USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb);


unmap_vmas() removes all PTEs in the range, but does not necessarily
perform a TLB flush yet.
free_pgtables() then removes the VMA from the mapping's rbtree
(unlink_file_vma()) before tearing down page tables in the range:


void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
		unsigned long floor, unsigned long ceiling)
{
	while (vma) {
		struct vm_area_struct *next = vma->vm_next;
		unsigned long addr = vma->vm_start;

		/*
		 * Hide vma from rmap and truncate_pagecache before freeing
		 * pgtables
		 */
		unlink_anon_vmas(vma);
		unlink_file_vma(vma);

		if (is_vm_hugetlb_page(vma)) {
			[...]
		} else {
			[... irrelevant optimization ...]
			free_pgd_range(tlb, addr, vma->vm_end,
				floor, next ? next->vm_start : ceiling);
		}
		vma = next;
	}
}


The TLB flush corresponding to the PTEs that were removed in
unmap_vmas() might only happen afterwards, in tlb_finish_mmu().


This is bad because, starting at unlink_file_vma(), the VMA is no longer
visible to unmap_mapping_range(). If the driver calls
unmap_mapping_range() right after munmap() has called
unlink_file_vma(), unmap_mapping_range() won't notice the existence of
this VMA and might return while there are still stale TLB entries
pointing to the page; the driver could then free the page while
userspace can still read and write it through the stale TLB entry.
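
Laid out as an interleaving (schematic only, not code from the kernel):

/*
 * task A: munmap()                       task B: driver
 * ----------------                       --------------
 * unmap_vmas()
 *   PTEs cleared; TLB flush may still
 *   be pending in the mmu_gather
 * free_pgtables()
 *   unlink_file_vma()
 *     VMA no longer visible to
 *     unmap_mapping_range()
 *                                        unmap_mapping_range()
 *                                          finds no VMA for the range,
 *                                          returns without flushing the
 *                                          stale TLB entry
 *                                        frees the page back to the
 *                                        page allocator
 * tlb_finish_mmu()
 *   deferred TLB flush finally happens,
 *   but userspace may already have
 *   accessed the freed page through the
 *   stale TLB entry
 */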


It would be a pain to actually hit this bug through the i915
driver though, since the only time it ever uses
unmap_mapping_range() like this is in the i915_gem_shrink()
shrinker callback. Instead, I wrote a reproducer against some
out-of-tree GPU driver where the unmap_mapping_range() path
can be triggered directly from userspace, and on a system
with CONFIG_PAGE_POISONING, I managed to read PAGE_POISON
(0xaa) out of the stale PTE from userspace after a few
iterations. So sadly I don't have a nice reproducer for this
issue that works upstream.


I guess if we want to avoid having extra TLB flushes for
non-PFNMAP/MIXEDMAP VMAs, a possible fix would be to add
a new bit in struct mmu_gather to track the existence of
PTEs without struct page, and then conditionally flush
before free_pgtables() if either that bit is set or
mm_tlb_flush_nested() is true?
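
A rough sketch of that idea (the has_pfnmap_ptes bit is a name made up here for illustration; this is a sketch of the suggestion above, not an actual patch):

/* In the zap path, where the VMA's flags are visible (e.g. in
 * unmap_single_vma()), record that PTEs without struct page tracking
 * are being removed: */
if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
	tlb->has_pfnmap_ptes = 1;	/* hypothetical new mmu_gather bit */

/* In unmap_region(), force the deferred flush before the VMA becomes
 * invisible to unmap_mapping_range() in free_pgtables(): */
unmap_vmas(&tlb, vma, start, end);
if (tlb.has_pfnmap_ptes || mm_tlb_flush_nested(mm))
	tlb_flush_mmu(&tlb);
free_pgtables(&tlb, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
	      next ? next->vm_start : USER_PGTABLES_CEILING);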


This bug is subject to a 90-day disclosure deadline. If a fix for this
issue is made available to users before the end of the 90-day deadline,
this bug report will become public 30 days after the fix was made
available. Otherwise, this bug report will become public at the deadline.
The scheduled deadline is 2022-10-04.




Found by: jannh@google.com
