Reversing a Mystery Function
One of my hobbies is reverse-engineering old binaries. I even wrote a custom decompiler to help me do it (here). Reverse-engineering is something like doing a crossword puzzle. You start with only a few small hints and each hint you solve gives you more information about the other hints. Eventually, the entire picture comes together. Here we have an excellent example of the process. And the result makes for a fun discovery.
The machine code
We start with this weird-looking x86-16 machine code.
Any bets on what it does?
0000:03b2: 59 pop cx
0000:03b3: 0e push cs
0000:03b4: 51 push cx
0000:03b5: 33 c9 xor cx,cx
0000:03b7: eb 16 jmp 0000:03cf
0000:03b9: 59 pop cx
0000:03ba: 0e push cs
0000:03bb: 51 push cx
0000:03bc: b9 01 00 mov cx,0x1
0000:03bf: eb 0e jmp 0000:03cf
0000:03c1: 59 pop cx
0000:03c2: 0e push cs
0000:03c3: 51 push cx
0000:03c4: b9 02 00 mov cx,0x2
0000:03c7: eb 06 jmp 0000:03cf
0000:03c9: 59 pop cx
0000:03ca: 0e push cs
0000:03cb: 51 push cx
0000:03cc: b9 03 00 mov cx,0x3
0000:03cf: 55 push bp
0000:03d0: 56 push si
0000:03d1: 57 push di
0000:03d2: 8b ec mov bp,sp
0000:03d4: 8b f9 mov di,cx
0000:03d6: 8b 46 0a mov ax,WORD PTR ss:[bp+0xa]
0000:03d9: 8b 56 0c mov dx,WORD PTR ss:[bp+0xc]
0000:03dc: 8b 5e 0e mov bx,WORD PTR ss:[bp+0xe]
0000:03df: 8b 4e 10 mov cx,WORD PTR ss:[bp+0x10]
0000:03e2: 0b c9 or cx,cx
0000:03e4: 75 08 jne 0000:03ee
0000:03e6: 0b d2 or dx,dx
0000:03e8: 74 69 je 0000:0453
0000:03ea: 0b db or bx,bx
0000:03ec: 74 65 je 0000:0453
0000:03ee: f7 c7 01 00 test di,0x1
0000:03f2: 75 1c jne 0000:0410
0000:03f4: 0b d2 or dx,dx
0000:03f6: 79 0a jns 0000:0402
0000:03f8: f7 da neg dx
0000:03fa: f7 d8 neg ax
0000:03fc: 83 da 00 sbb dx,0x0
0000:03ff: 83 cf 0c or di,0xc
0000:0402: 0b c9 or cx,cx
0000:0404: 79 0a jns 0000:0410
0000:0406: f7 d9 neg cx
0000:0408: f7 db neg bx
0000:040a: 83 d9 00 sbb cx,0x0
0000:040d: 83 f7 04 xor di,0x4
0000:0410: 8b e9 mov bp,cx
0000:0412: b9 20 00 mov cx,0x20
0000:0415: 57 push di
0000:0416: 33 ff xor di,di
0000:0418: 33 f6 xor si,si
0000:041a: d1 e0 shl ax,0x1
0000:041c: d1 d2 rcl dx,0x1
0000:041e: d1 d6 rcl si,0x1
0000:0420: d1 d7 rcl di,0x1
0000:0422: 3b fd cmp di,bp
0000:0424: 72 0b jb 0000:0431
0000:0426: 77 04 ja 0000:042c
0000:0428: 3b f3 cmp si,bx
0000:042a: 72 05 jb 0000:0431
0000:042c: 2b f3 sub si,bx
0000:042e: 1b fd sbb di,bp
0000:0430: 40 inc ax
0000:0431: e2 e7 loop 0000:041a
0000:0433: 5b pop bx
0000:0434: f7 c3 02 00 test bx,0x2
0000:0438: 74 06 je 0000:0440
0000:043a: 8b c6 mov ax,si
0000:043c: 8b d7 mov dx,di
0000:043e: d1 eb shr bx,0x1
0000:0440: f7 c3 04 00 test bx,0x4
0000:0444: 74 07 je 0000:044d
0000:0446: f7 da neg dx
0000:0448: f7 d8 neg ax
0000:044a: 83 da 00 sbb dx,0x0
0000:044d: 5f pop di
0000:044e: 5e pop si
0000:044f: 5d pop bp
0000:0450: ca 08 00 retf 0x8
0000:0453: f7 f3 div dx,ax,bx
0000:0455: f7 c7 02 00 test di,0x2
0000:0459: 74 01 je 0000:045c
0000:045b: 92 xchg dx,ax
0000:045c: 33 d2 xor dx,dx
0000:045e: eb ed jmp 0000:044d
Entry stubs
A first observation is that we have a series of "entry stubs" of the form:
0000:03b2: 59 pop cx
0000:03b3: 0e push cs
0000:03b4: 51 push cx
0000:03b5: <....> <set cx to some value>
0000:03b7: eb 16 jmp 0000:03cf
In addition, elsewhere in the binary we have all the following calls:
call 0000:03b2
callf 0000:03b5
call 0000:03b9
callf 0000:03bc
call 0000:03c1
callf 0000:03c4
call 0000:03c9
callf 0000:03cc
But no calls to 0000:03cf
directly. Instead these 8 stubs are different entry points for the core
routine implemented from 0000:03cf
.
This part of each stub is a "near call entry" for a "far call function":
0000:03b2: 59 pop cx
0000:03b3: 0e push cs
0000:03b4: 51 push cx
Basically: (1) save near-call return offset to CX, (2) push the code-segment CS, (3) push offset back to stack
This sequence allows the caller to perform a near call (call
) and the implementation function to return with a
far return (retf
).
If this is confusing, just ignore it. Near and far calls are a special beast specific only to x86-16.
So, we have 4 actual "versions" of the call:
Mode (CX) | Near Call | Far Call |
---|---|---|
0 | 0000:03b2 |
0000:03b5 |
1 | 0000:03b9 |
0000:03bc |
2 | 0000:03c1 |
0000:03c4 |
3 | 0000:03c9 |
0000:03cc |
Core Implementation Inspection
The core routine is implemented from 0000:03cf
to 0000:0460
. We turn our attention to that section now.
After further inspection, it seems that this is probably a hand-coded or hand-optimized assembly routine:
- Passing an argument in CX is not part of the normal calling-conventions
- Typical function prologue is
push BP; mov SP,BP
with no other stack operations between. This causes the first parameter to be loaded fromSS:BP+0xa
rather than the more typicalSS:BP+0x6
- Almost halfway through the function, it starts using the frame-pointer
BP
as an ordinary register withmov BP,CX
. Emitting a frame-pointer is a well-known optimization, but clobbering the frame-pointer halfway through is.. umm.. lesser known.
Here's an interesting sequence:
0000:041a: d1 e0 shl ax,0x1
0000:041c: d1 d2 rcl dx,0x1
0000:041e: d1 d6 rcl si,0x1
0000:0420: d1 d7 rcl di,0x1
The rcl
instruction rotates left including the Carry Flag (CF
). So this is shifting most-significant bits out of
one register and into the least-significant register. Curious.
Finally, we see these checks early in the function:
0000:03e6: 0b d2 or dx,dx
0000:03e8: 74 69 je 0000:0453
0000:03ea: 0b db or bx,bx
0000:03ec: 74 65 je 0000:0453
Jumping to:
0000:0453: f7 f3 div dx,ax,bx
0000:0455: f7 c7 02 00 test di,0x2
0000:0459: 74 01 je 0000:045c
0000:045b: 92 xchg dx,ax
0000:045c: 33 d2 xor dx,dx
0000:045e: eb ed jmp 0000:044d
Which just returns:
0000:044d: 5f pop di
0000:044e: 5e pop si
0000:044f: 5d pop bp
0000:0450: ca 08 00 retf 0x8
So, sometimes with just do a simple division and return. Curious.
Decompiling
Let's decompile it with dis86 into C code. After doing a little manual cleanup, we have:
HYDRA_FUNC(mystery_function)
{
#define _param_0006 *PTR_16(SS, SP + 0x4)
#define _param_0008 *PTR_16(SS, SP + 0x6)
#define _param_000a *PTR_16(SS, SP + 0x8)
#define _param_000c *PTR_16(SS, SP + 0xa)
u16 ax_2, dx_2, bx_2, cx_2, dx_25, ax_19, di_18, bx_17, dx_24, ax_18, di_8, bp_3, cx_14, di_10, si_4, dx_10, ax_5, ax_6, dx_11, si_5, di_11, di_19, si_14, ax_20, cx_12, ax_21, dx_28, bx_12, dx_16, ax_12, ax_22;
u16 tmp_0, tmp_1;
u32 tmp_u32;
u64 tmp_u64;
ax_2 = _param_0006;
dx_2 = _param_0008;
bx_2 = _param_000a;
cx_2 = _param_000c;
if (cx_2 != 0) {
goto addr_03ee;
}
if (dx_2 == 0 || bx_2 == 0) {
tmp_0 = MAKE_32(dx_2, ax_2) / bx_2;
tmp_1 = MAKE_32(dx_2, ax_2) % bx_2;
if ((CX & 2) == 0) {
ax_22 = tmp_0;
} else {
ax_22 = tmp_1;
}
dx_16 = 0;
ax_12 = ax_22;
goto ret;
} else {
goto addr_03ee;
}
addr_03ee:;
if ((CX & 1) != 0) {
bx_17 = bx_2;
dx_24 = dx_2;
ax_18 = ax_2;
di_8 = CX;
bp_3 = cx_2;
} else {
if (((dx_2 >> 15) != 0) == 0) {
dx_25 = dx_2;
ax_19 = ax_2;
di_18 = CX;
} else {
/* MANUALLY IMPLEMENTED: ... 32-bit negation impl ....
0000:03f8: f7 da neg dx
0000:03fa: f7 d8 neg ax
0000:03fc: 83 da 00 sbb dx,0x0
*/
tmp_u32 = -MAKE_32(dx_2, ax_2);
dx_25 = (u16)(tmp_u32 >> 16);
ax_19 = (u16)tmp_u32;
di_18 = CX | 12;
}
if (((cx_2 >> 15) != 0) == 0) {
bx_17 = bx_2;
dx_24 = dx_25;
ax_18 = ax_19;
di_8 = di_18;
bp_3 = cx_2;
} else {
/* MANUALLY IMPLEMENTED: ... 32-bit negation impl ....
0000:0406: f7 d9 neg cx
0000:0408: f7 db neg bx
0000:040a: 83 d9 00 sbb cx,0x0
0000:0410: 8b e9 mov bp,cx
*/
tmp_u32 = -MAKE_32(cx_2, bx_2);
bp_3 = (u16)(tmp_u32 >> 16);
bx_17 = (u16)tmp_u32;
dx_24 = dx_25;
ax_18 = ax_19;
di_8 = di_18 ^ 4;
}
}
cx_14 = 32;
di_10 = 0;
si_4 = 0;
dx_10 = dx_24;
ax_5 = ax_18;
while (1) {
addr_041a:;
/* MANUALLY IMPLEMENTED: ... 64-bit left-shifting ...
0000:041a: d1 e0 shl ax,0x1
0000:041c: d1 d2 rcl dx,0x1
0000:041e: d1 d6 rcl si,0x1
0000:0420: d1 d7 rcl di,0x1
*/
tmp_u64 = MAKE_64(MAKE_32(di_10, si_4), MAKE_32(dx_10, ax_5));
tmp_u64 <<= 1;
// unpack back to 16-bit registers
ax_6 = (u16)(tmp_u64 >> 0);
dx_11 = (u16)(tmp_u64 >> 16);
si_5 = (u16)(tmp_u64 >> 32);
di_11 = (u16)(tmp_u64 >> 48);
if (di_11 < bp_3) {
di_19 = di_11;
si_14 = si_5;
ax_20 = ax_6;
goto addr_0431;
}
if (di_11 > bp_3) {
goto addr_042c;
}
if (si_5 < bx_17) {
di_19 = di_11;
si_14 = si_5;
ax_20 = ax_6;
goto addr_0431;
}
addr_042c:;
/* MANUALLY IMPLEMENTED: ... 32-bit subtraction ... (DI:SI) -= (BP:BX)
0000:042c: 2b f3 sub si,bx
0000:042e: 1b fd sbb di,bp
*/
tmp_u32 = MAKE_32(di_11, si_5) - MAKE_32(bp_3, bx_17);
di_19 = (u16)(tmp_u32 >> 8);
si_14 = (u16)(tmp_u32 >> 0);
ax_20 = ax_6 + 1;
addr_0431:;
cx_12 = cx_14 - 1;
if (cx_12 == 0) {
goto addr_0433;
}
cx_14 = cx_12;
di_10 = di_19;
si_4 = si_14;
dx_10 = dx_11;
ax_5 = ax_20;
goto addr_041a;
}
addr_0433:;
if ((di_8 & 2) == 0) {
ax_21 = ax_20;
dx_28 = dx_11;
bx_12 = di_8;
goto addr_0440;
}
ax_21 = si_14;
dx_28 = di_19;
bx_12 = di_8 >> 1;
addr_0440:;
if ((bx_12 & 4) == 0) {
dx_16 = dx_28;
ax_12 = ax_21;
} else {
/* MANUALLY IMPLEMENTED ... 32-bit negation impl .... -(DX:AX)
0000:0446: f7 da neg dx
0000:0448: f7 d8 neg ax
0000:044a: 83 da 00 sbb dx,0x0
*/
tmp_u32 = -MAKE_32(dx_28, ax_21);
dx_16 = (u16)(tmp_u32 >> 16);
ax_12 = (u16)tmp_u32;
}
ret:;
DX = dx_16;
AX = ax_12;
RETURN_FAR();
#undef _param_0006
#undef _param_0008
#undef _param_000a
#undef _param_000c
}
NOTE: Dis86 can be a little verbose at times due to it's internal SSA form that creates a lot of unique variables as a side-effect. This can be a little jarring, but it's actually very helpful because it prevents variable aliasing that can easily thwart manual refactorings. Dis86 is a relatively new decompiler (as of April 2024), so this should improve a lot with more time.
Refactoring
With more cleanup we can reach the following version. Note that I tried to retain the connection to registers and the overall machine-level state-transfers.
From here, you should be able to understand what the function does:
#define DIVMOD(quot, rem, dividend, divider) do { \
quot = dividend / divider; \
rem = dividend % divider; \
} while(0)
#define SHUFFLE_LEFT(high_32, low_32) do { \
u64 tmp_u64 = MAKE_64(high_32, low_32); \
tmp_u64 <<= 1; \
high_32 = (u32)(tmp_u64 >> 32); \
low_32 = (u32)(tmp_u64 >> 0); \
} while (0)
HYDRA_FUNC(mystery_function)
{
// Mode param is passed in CX
u16 cx_mode = CX;
// Move mode to DI (to make room for params)
u16 di_mode = cx_mode;
// Load parameters
u16 ax_param_1 = *PTR_16(SS, SP + 0x4); // Typically in AX
u16 dx_param_2 = *PTR_16(SS, SP + 0x6); // Typically in DX
u16 bx_param_3 = *PTR_16(SS, SP + 0x8); // Typically in BX
u16 cx_param_4 = *PTR_16(SS, SP + 0xa); // Typically in CX
// Number to divide (dividend) in DX:AX
u32 dx_ax = MAKE_32(dx_param_2, ax_param_1);
// Divisor in CX:BX
u32 cx_bx = MAKE_32(cx_param_4, bx_param_3);
// Special case: cx_param_4 == 0 implies that the divisor fits in 16-bits.
// If dx_param_2 == 0, then both fit in 16-bits and we can just use the normal
// 8086 divide instruction. If bx_param_3 == 0 then we have a divisor that is 0
// and we want to execute the divide instruction to trigger a divide by zero fault.
// Note: In both of these cases, using the unsigned division instruction is perfectly acceptable.
if (cx_param_4 == 0 && (dx_param_2 == 0 || bx_param_3 == 0)) {
// The DIV instruction computes both the quotient and remainder at once
u16 ax_quot, dx_rem;
DIVMOD(ax_quot, dx_rem, dx_ax, bx_param_3);
// mode & 0x2 => return the remainder
if (di_mode & 0x2) {
dx_ax = (u32)dx_rem;
} else {
dx_ax = (u32)ax_quot;
}
goto ret;
}
// !(mode & 0x1) => perform signed division
if (!(di_mode & 0x1)) {
// DX:AX (num) is negative?
if ((i32)dx_ax < 0) {
// Negate it so we can treated it as unsigned
dx_ax = -dx_ax;
// Record in the mode so we can fix it up at the end
di_mode |= 0xc; // negate both a quot and rem result
}
// DX:AX (div) is negative?
if ((i32)cx_bx < 0) {
// Negate it so we can treated it as unsigned
cx_bx = -cx_bx;
// Record in the mode so we can fix it up at the end
di_mode ^= 0x4; // negate a quot result only if both num & div are not negative (xor cancels the negations)
}
}
// The loop need to use the CX register for the LOOP instruction, so
// we need to shuffle the contents elsewhere. The frame-pointer (BP) was selected.
// This an interesting choice and a strong hint the routine was hand-coded because
// typically it's an off-limits register (especially as this function is totally using
// it as a proper frame-pointer during param loading!)
u32 bp_bx = cx_bx;
// DI is also shuffled to the stack for the duration of the loop
// DI:SI is used a scratch register that we slowly shift bits into
u32 di_si = 0;
for (u16 cx_loop_cnt = 32; cx_loop_cnt != 0; cx_loop_cnt--) {
/* ORIGINAL ... 64-bit left-shifting via rcl chaining through the Carry Flag
0000:041a: d1 e0 shl ax,0x1
0000:041c: d1 d2 rcl dx,0x1
0000:041e: d1 d6 rcl si,0x1
0000:0420: d1 d7 rcl di,0x1
*/
// Shuffle bits in DI:SI and DX:AX left
// The MSB that shifts out of DX:AX will shift into DI:SI
SHUFFLE_LEFT(di_si, dx_ax);
// Value os too small to divide it out.. keep shifting
if (di_si < bp_bx) {
continue;
}
// Great! We can subtract off a divisor and update the quotient in the newly available bit in DX:AX
di_si -= bp_bx;
dx_ax++;
}
// At this point, we have the quotient in DX:AX and the remainder in DI:SI
// Notice that we no longer need BP:BX (the divisor)
// Before the loop, the mode is pushed from DI, afterwards its popped into BX
u16 bx_mode = di_mode;
// mode & 0x2 => return the remainder
if (bx_mode & 0x2) {
dx_ax = di_si;
bx_mode >>= 1;
}
// mode & 0x8 => negate remainder result
// mode & 0x4 => negate quotient result
if (bx_mode & 0x4) {
/* ORIGINAL: ... 32-bit negation impl ....
0000:0446: f7 da neg dx
0000:0448: f7 d8 neg ax
0000:044a: 83 da 00 sbb dx,0x0
*/
dx_ax = -dx_ax;
}
ret:;
/* Return 32-bits in DX:AX */
DX = UPPER_16(dx_ax);
AX = LOWER_16(dx_ax);
RETURN_FAR();
}
Explanation
The mystery function implements divmod for 32-bits on an x86-16
machine!
The machine has a native DIV
instruction:
Opcode | Mnemonic | Description |
---|---|---|
F7 /6 | DIV r/m16 | Unsigned divide DX:AX by r/m16, with result stored in AX = Quotient, DX = Remainder. |
The instruction takes a 32-bit dividend in DX:AX
and a 16-bit divider as r/m16
, so this alone isn't sufficient to
implement division between two 32-bit values as required by ordinary C code.
The core of this routine is the loop of 32, shifting one bit at a time. This is clever way to implement divmod.
Example:
Let's demonstrate the division algorithm by example, implementing 8-bit divisions.
let
num = 43
div = 3
we want to compute
num / div = 14 rem 1
Convert to 8-bit binary:
num = 00101011 (43)
div = 00000011 (3)
Form a 16-bit <0,num>
scratch value:
scratch = 00000000_00101011
Step 1a: Shift up scratch until the upper 8-bits are greater than or equal to div
:
scratch = 00000101_01100000 (5 shifts)
Step 1b: Subtract div = 00000011
off upper and add 1 to lower:
scratch = 00000010_01100000 + 1
Step 2a: Shift up scratch until the upper 8-bits are greater than or equal to div
:
scratch = 00000100_11000010 (1 shift)
Step 2b: Subtract div = 00000011
off upper and add 1 to lower:
scratch = 00000001_11000010 + 1
Step 3a: Shift up scratch until the upper 8-bits are greater than or equal to div
:
scratch = 00000011_10000110 (1 shift)
Step 3b: Subtract div = 00000011
off upper and add 1 to lower:
scratch = 00000000_10000110 + 1
Step 4: Shift up scratch until done
scratch = 00000001_00001110 (shift 1)
Finally we unpack the scratch as <rem,quot>
:
rem = 00000001 (1)
quot = 00001110 (14)
And we have the desired result.
To see this another way, we are forming the quotient as a decomposition of shifts/powers-of-two:
quot = (shift 5 and add 1) + (shift 1 and add 1) + (shift 1 and add 1)
= 2**(8-5) + 2**(8-5-1) + 2**(8-5-1-1)
= 2**3 + 2**2 + 2**1
= 8 + 4 + 2
= 14
Or in a more traditional lens:
num_0, quot_0 = 43, 0
num_1, quot_1 = 43, 0 (try to subtract 3*(2**7) = 3*128 = 384)
num_2, quot_2 = 43, 0 (try to subtract 3*(2**6) = 3*64 = 192)
num_3, quot_3 = 43, 0 (try to subtract 3*(2**5) = 3*32 = 96)
num_4, quot_4 = 43, 0 (try to subtract 3*(2**4) = 3*16 = 48)
num_5, quot_5 = 19, 8 (try to subtract 3*(2**3) = 3*8 = 24)
num_6, quot_6 = 7, 12 (try to subtract 3*(2**2) = 3*4 = 12)
num_7, quot_7 = 1, 14 (try to subtract 3*(2**1) = 3*2 = 6)
num_8, quot_8 = 1, 14 (try to subtract 3*(2**0) = 3*1 = 3)
Try running through some examples yourself if you want to fully understand.
Core Algorithm
Our function implements the same algorithm as the example, but over 32-bits rather than 8-bits. This means it needs a 64-bit shift on a 16-bit machine :-)
The core algorithm is here:
// DI:SI is used a scratch register that we slowly shift bits into
u32 di_si = 0;
for (u16 cx_loop_cnt = 32; cx_loop_cnt != 0; cx_loop_cnt--) {
// The MSB that shifts out of DX:AX will shift into DI:SI
SHUFFLE_LEFT(di_si, dx_ax);
// Value to small to divide it out.. keep shifting
if (di_si < bp_bx) {
continue;
}
// Great! We can subtract off a divisor and update the quotient in the newly available bit in DX:AX
di_si -= bp_bx;
dx_ax++;
}
Special Case
Let's return to the special case with the DIV
instruction:
if (cx_param_4 == 0 && (dx_param_2 == 0 || bx_param_3 == 0)) {
// The DIV instruction computes both the quotient and remainder at once
u16 ax_quot, dx_rem;
DIVMOD(ax_quot, dx_rem, dx_ax, bx_param_3);
// ... SNIP ....
goto ret;
}
This is actually two cases:
Condition | Reason for using DIV |
---|---|
(CX:BX) == 0 |
Intentionally trigger a divide-by-zero exception |
(DX:AX) and (CX:BX) both fit 16-bits |
Quotient and Remainder will both also be 16-bits |
Modes
Finally we need to discuss the "modes"
There is a prologue section:
// !(mode & 0x1) => perform signed division
if (!(di_mode & 0x1)) {
// DX:AX (num) is negative?
if ((i32)dx_ax < 0) {
// Negate it so we can treated it as unsigned
dx_ax = -dx_ax;
// Record in the mode so we can fix it up at the end
di_mode |= 0xc; // negate both a quot and rem result
}
// DX:AX (div) is negative?
if ((i32)cx_bx < 0) {
// Negate it so we can treated it as unsigned
cx_bx = -cx_bx;
// Record in the mode so we can fix it up at the end
di_mode ^= 0x4; // negate a quot result only if both num & div are not negative (xor cancels the negations)
}
}
And there is an epilogue section:
// mode & 0x2 => return the remainder
if (bx_mode & 0x2) {
dx_ax = di_si;
bx_mode >>= 1;
}
// mode & 0x8 => negate remainder result
// mode & 0x4 => negate quotient result
if (bx_mode & 0x4) {
/* ORIGINAL: ... 32-bit negation impl ....
0000:0446: f7 da neg dx
0000:0448: f7 d8 neg ax
0000:044a: 83 da 00 sbb dx,0x0
*/
dx_ax = -dx_ax;
}
Here's the meaning of the Mode bits:
Bit Value | Meaning |
---|---|
1 = 1<<0 |
Perform Unsigned Division |
2 = 1<<1 |
Return Remainder instead of Quotient |
4 = 1<<2 |
Negate Quotient after computation |
8 = 1<<3 |
Negate Remainder after computation |
With this we can now also understand the entry stubs:
Mode (CX) | Near Call | Far Call | Meaning |
---|---|---|---|
0 | 0000:03b2 |
0000:03b5 |
Compute Signed Division |
1 | 0000:03b9 |
0000:03bc |
Compute Signed Remainder |
2 | 0000:03c1 |
0000:03c4 |
Compute Unsigned Division |
3 | 0000:03c9 |
0000:03cc |
Compute Unsigned Remainder |
Interestingly, there's no mode for divmod
. And amusingly, scanning my binary, I found at least one function
that called both "Compute Unsigned Division" and "Compute Unsigned Remainder" for the same values. I suppose
the issue is that C doesn't have an operator for unified divmod
and the compiler here didn't implement an optimization
for it.
Conclusion
And with that, we have a full understanding of the function. Who guessed it was a division implementation from the assembly listing? Let me know!.
I had a lot of fun dissecting this function. I'm always a little impressed by the old tricks that are hanging out in these ancient binaries: long forgotten. It's fun to dredge them up and repopularize them!
If you enjoyed this kind of thing, you can stay tuned to this blog via the rss feed (here). You can also buy me a coffee (here).
Have a grand day, and try not to get lost in the Binary Mines!