April 28, 2014

Order of Memory Reads of Intel's String Instructions

Neither the Intel Manual nor Kip R. Irvine's assembly book discusses the behavior I'm describing about x86 string instructions in this post.

Given the following instruction that compares the byte at ESI to the byte at EDI.

cmps    byte ptr [esi],byte ptr es:[edi]

To perform comparison the instruction must read the bytes first. The question is whether byte at ESI or byte at EDI is read first?

Intel Manual says:
Compares the byte, word, doubleword, or quadword specified with the first source operand with the byte, word, doubleword, or quadword specified with the second source operand
Kip R. Irvine's book titled Assembly Language for Intel-Based Computers (5th edition) says:
The CMPSB, CMPSW, and CMPSD instructions each compare a memory operand pointed to by ESI to a memory operand pointed to by EDI
Both of the descriptions explain what the instructions do but none of them says how. So I needed to do some experiments in Windbg to find the answer to the question.

The first experiment was not a good one. Initially, I thought I'd put processor breakpoint (aka memory breakpoint) at ESI and another one at EDI. I also thought to execute CMPS and let the debugger to break-in on either of the processor breakpoints. And here it goes why it was a bad idea. The execution of CMPS has to complete for debugger break-in. And, by the time the CMPS completes it hits both of the breakpoints.

The other experiment I came up with is like this. Set both ESI and EDI to point two distinct memory addresses that are unmapped. The assumption is when CMPS is executed it raises an exception when trying to read memory. By looking at the exception record we can tell the address the instruction tries to read from. Given that, we can tell if that value was assigned to ESI, or to EDI, and so we can tell whether byte at ESI or byte at EDI is read first.

Here is how I did the experiment in Windbg.

I opened an executable test file in Windbg. I assembled CMPS to be placed in the memory at EIP.

0:000> a
011113be cmpsb
cmpsb

I changed ESI to point to invalid memory. And I did the same with EDI.

0:000> resi=51515151
0:000> redi=d1d1d1d1

Here is the disassembly of CMPS. Also, you know from the highlighted text that both ESI and EDI point to unmapped memory addresses.

0:000> r
eax=cccccccc ebx=7efde000 ecx=00000000 edx=00000001 esi=51515151 edi=d1d1d1d1
eip=011113be esp=0022fb70 ebp=0022fc3c iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
test!wmain+0x1e:
011113be a6              cmps    byte ptr [esi],byte ptr es:[edi] ds:002b:51515151=?? es:002b:d1d1d1d1=??

I executed the process leading to an expected access violation triggered by CMPS instruction.

0:000> g
(5468.1eec8): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=cccccccc ebx=7efde000 ecx=00000000 edx=00000001 esi=51515151 edi=d1d1d1d1
eip=011113be esp=0022fb70 ebp=0022fc3c iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010202
test!wmain+0x1e:
011113be a6              cmps    byte ptr [esi],byte ptr es:[edi] ds:002b:51515151=?? es:002b:d1d1d1d1=?? 

I got the details of the exception like below.

0:000> .exr -1
ExceptionAddress: 011113be (test!wmain+0x0000001e)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 00000000
   Parameter[1]: d1d1d1d1
Attempt to read from address d1d1d1d1

As you can see the access violation was occurred due to an attempt to read from d1d1d1d1 that is the value of EDI. Therefore, to answer the question at the beginning of the article, the byte at EDI is read first.

To give more to the fun, you may try how emulators handle this - that is the order of memory reads of CMPS instructions.

If you like this you may click to read more debugging posts.

UPDATE 28/April/2014 Of course I don't encourage people to rely on undocumented behavior when developing software. Stephen Canon says Intel may optimize the microcode for operands from time to time, and so we don't have a good reason to believe this behavior to be stable.

UPDATE 29/April/2014 Here is a Windows C++ program to test this behavior on your architecture: cmps-probe.cpp

  This blog is written and maintained by Attila Suszter. Read in Feed Reader.