Hydra: A hybrid runtime for x86-16 and Aarch64

Today I'm publicly releasing a hybrid runtime I've been building alongside dis86 for doing reverse-engineering work. Source code is on GitHub: here.

But.. why?

As mentioned in my article on dis86, I've been working part-time on a reverse-engineering and reimplementation project for about a year now. The next problem after getting C code is validating whether that code actually performs the same computation as the original. When you're reversing a large binary, you can't simply wait for all the code to be reversed before starting a debugging cycle. Instead, you have to find ways to validate any changes iteratively. If you thought getting normal code well-tested was hard, try doing it on mystery machine code from a long-dead ISA.

A standard approach here is to recompile the code back to the original platform and link the recompiled code into the binary in place of the original machine code. Along this path, folks start to value "recompilation accuracy", i.e. can you decompile and then recompile to the same machine code? Or, can it be similar? Can we use a binary diff to validate the correctness of a decompile?

These aren't easy questions to answer, and on the x86-16 MS-DOS platform, it's harder still.

8086 Real-mode Address-space

For starters, the address-space is small and weird. Unlike modern machines that use a flat-address space, x86-16 used segmented-addressing so they could build a 16-bit machine with an address-space larger than 16-bits. Wikipedia has a nice primer here.

The upshot is that, despite addresses being specified fully as abcd:efgh (with 32-bits), they only actually address roughly 1 MB of space (technically 1.06 MB, lol). And then after that, many regions are reserved for memory-mapped hardware, BIOS routines, MS-DOS routines, etc.
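To make the arithmetic concrete, here is a minimal sketch of real-mode address translation. The function names are my own for illustration and are not taken from Hydra or dosbox-x.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode translation: linear = segment * 16 + offset. */
static uint32_t linear_address(uint16_t segment, uint16_t offset)
{
  return ((uint32_t)segment << 4) + (uint32_t)offset;
}

/* Highest reachable address, FFFF:FFFF. */
static uint32_t max_linear_address(void)
{
  uint32_t top = linear_address(0xFFFF, 0xFFFF);
  assert(top == 0x10FFEF); /* 1 MiB + 64 KiB - 16 bytes - 1 */
  return top;
}
```

Two consequences fall out of this: many different segment:offset pairs alias the same byte (0399:0123 and 03AB:0003 both land on 0x3AB3), and the top of the range sits just under 64 KiB past the 1 MiB mark, which is where the "1.06 MB" comes from.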

Thus, the total free application address-space was 640KB. To add some context, the IBM PC RGB VGA screen with 320x240 resolution would require 225KB alone to store each pixel with 3 color bytes (1/3 of the address-space). Fortunately, it did not use this representation.
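The framebuffer figure is easy to check. A small sketch of the arithmetic, with constant names of my own choosing:

```c
#include <assert.h>
#include <stdint.h>

enum { WIDTH = 320, HEIGHT = 240, BYTES_PER_PIXEL = 3 };
enum { CONVENTIONAL_MEMORY_KB = 640 };

/* Bytes needed to store every pixel with 3 color bytes. */
static uint32_t framebuffer_bytes(void)
{
  return (uint32_t)WIDTH * HEIGHT * BYTES_PER_PIXEL;
}

/* The same size in KB; it happens to divide evenly. */
static uint32_t framebuffer_kb(void)
{
  uint32_t bytes = framebuffer_bytes();
  assert(bytes % 1024 == 0);
  return bytes / 1024;
}

/* Share of the 640KB application space, as a whole percentage. */
static uint32_t framebuffer_percent(void)
{
  return framebuffer_kb() * 100 / CONVENTIONAL_MEMORY_KB;
}
```

That works out to 230,400 bytes, or 225KB, which is about 35% of the 640KB available to an application.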

There were two different call types: NEAR and FAR. The distinction depended on whether a function would only be called from the same code segment or not. A FAR function could never call a NEAR function in a different segment. In modern parlance, we might call this a function-coloring problem.
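The reason the two call types don't mix comes down to what each one leaves on the stack. Here is a toy model, not real emulator code: the Cpu struct, the word-indexed stack, and all function names are invented for illustration, and ip is assumed to already hold the return address when a call is made.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 16 };

typedef struct {
  uint16_t cs, ip;              /* current code segment and offset */
  uint16_t sp;                  /* toy stack pointer, counted in words */
  uint16_t stack[STACK_WORDS];  /* grows downward from STACK_WORDS */
} Cpu;

static void push16(Cpu *c, uint16_t v)
{
  assert(c->sp > 0);
  c->stack[--c->sp] = v;
}

static uint16_t pop16(Cpu *c)
{
  assert(c->sp < STACK_WORDS);
  return c->stack[c->sp++];
}

/* NEAR call: only the return offset is saved; CS is unchanged. */
static void call_near(Cpu *c, uint16_t off)
{
  push16(c, c->ip);
  c->ip = off;
}

/* FAR call: return segment first, then return offset. */
static void call_far(Cpu *c, uint16_t seg, uint16_t off)
{
  push16(c, c->cs);
  push16(c, c->ip);
  c->cs = seg;
  c->ip = off;
}

/* NEAR return pops one word, FAR return pops two. */
static void ret_near(Cpu *c) { c->ip = pop16(c); }
static void ret_far(Cpu *c)  { c->ip = pop16(c); c->cs = pop16(c); }
```

A function that ends in a far return but was entered with a near call pops one word too many and loads whatever the caller had on the stack into CS. That mismatch is why every function had to be compiled as one kind or the other.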

Engineers at the time got very clever with tricks. One example is the use of Overlays, also known as "poor man's virtual memory". Different code segments would be paged in and out of RAM at runtime using a segment remapping trick and a little bit of self-modifying code.

Suffice to say that the x86-16 address space was heavily used and abused. Jamming more tricks into the mix in 2024 seems ill-advised.

Compilers

A number of different compilers existed at the time: Borland Turbo C++, Watcom, Datalight C, Visual C++, etc. Most of these are difficult to get a copy of these days. And, to my knowledge, most were never open-sourced (Watcom, later released as Open Watcom, is the exception I'm aware of). In my project's case, the binary appears to have been compiled by Borland C++ - Copyright 1991 Borland Intl.

Obtaining an old Borland compiler, getting it running on a modern machine, getting it to re-compile my C code, and linking into the existing binary with its address-space restrictions seemed far too complicated.

X86-16 Emulation and Hooks

Hydra is a hybrid runtime that can execute an application that is partially x86-16 machine code and partially Aarch64 machine code. It accomplishes this by using the dosbox-x emulator to execute the x86-16 MS-DOS parts of the binary. To support native Aarch64 code, dosbox-x has been forked and patched to capture machine-state and provide hooks for Hydra to interrupt its execution at any x86-16 instruction address.

Function hooks

The main mechanism for integrating native Aarch64 code is by defining a function hook:

HYDRA_FUNC(H_my_function)
{
  FRAME_ENTER(2);

  u16 arg = ARG_16(0x6);

  u16 result = F_some_other_function(m, arg);
  if (result > 1) {
    AX = 4;
  } else {
    AX = 5;
  }

  FRAME_LEAVE();
  RETURN_FAR();
}

void hook_init()
{
  HYDRA_REGISTER_ADDR(H_my_function, 0x0399, 0x0123);
}

When the x86-16 emulator reaches address 0399:0123, Hydra will interrupt the execution and call the H_my_function routine above (running on Aarch64).
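Conceptually, the interruption amounts to a lookup performed before each emulated instruction. The sketch below is not the actual dosbox-x patch: the Machine struct, the linear-scan table, and every name in it are assumptions made for illustration, and matching on the exact segment:offset pair (rather than the linear address) is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint16_t cs, ip, ax; } Machine;  /* hypothetical machine state */
typedef void (*HookFn)(Machine *m);
typedef struct { uint16_t seg, off; HookFn fn; } Hook;

enum { MAX_HOOKS = 64 };
static Hook hooks[MAX_HOOKS];
static size_t num_hooks = 0;

/* Associate a native function with an x86-16 address. */
static void register_hook(HookFn fn, uint16_t seg, uint16_t off)
{
  assert(num_hooks < MAX_HOOKS);
  hooks[num_hooks++] = (Hook){ seg, off, fn };
}

/* Called before emulating the instruction at CS:IP.
 * Returns 1 if a native hook ran, 0 if the emulator should
 * interpret the instruction as usual. */
static int dispatch(Machine *m)
{
  for (size_t i = 0; i < num_hooks; i++) {
    if (hooks[i].seg == m->cs && hooks[i].off == m->ip) {
      hooks[i].fn(m);
      return 1;
    }
  }
  return 0;
}

/* A trivial native hook: set AX, as H_my_function does. */
static void example_hook(Machine *m) { m->ax = 4; }
```

A real implementation would want something faster than a linear scan on the per-instruction path, but the control flow is the same: check the address, and either hand off to native code or keep emulating.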

This function can do pretty much anything to the x86-16 machine state:

Modify x86-16 registers
Modify x86-16 memory
Call other x86-16 functions
Return into arbitrary addresses
Trigger an interrupt
Read/write to an I/O port
Make BIOS calls
... etc etc etc ...

The call to F_some_other_function is an example of calling an arbitrary function. This function may be x86-16 machine code or may itself be a hooked Hydra function compiled to native Aarch64. When the function reaches RETURN_FAR(), the Hydra runtime returns into the emulator using the equivalent of a retf instruction.
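The offset 0x6 in ARG_16(0x6) lines up with the conventional 16-bit stack frame for a FAR function. The sketch below assumes, without confirmation from the Hydra source, that ARG_16(off) reads the word at SS:BP+off and that the argument to FRAME_ENTER is the number of bytes reserved for locals. The Frame struct and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_BYTES = 64 };

typedef struct {
  uint16_t sp, bp;
  uint8_t  stack[STACK_BYTES];  /* toy stack segment, offsets 0..63 */
} Frame;

static void stack_push16(Frame *f, uint16_t v)
{
  assert(f->sp >= 2);
  f->sp -= 2;
  f->stack[f->sp]     = (uint8_t)(v & 0xFF);  /* little-endian */
  f->stack[f->sp + 1] = (uint8_t)(v >> 8);
}

/* Hypothetical stand-in for ARG_16(off): read the word at BP + off. */
static uint16_t arg16(const Frame *f, uint16_t off)
{
  uint16_t at = (uint16_t)(f->bp + off);
  assert(at + 1 < STACK_BYTES);
  return (uint16_t)(f->stack[at] | (f->stack[at + 1] << 8));
}

/* Build the frame that a far call with one 16-bit argument leaves behind. */
static void far_call_prologue(Frame *f, uint16_t arg,
                              uint16_t ret_cs, uint16_t ret_ip,
                              uint16_t locals_bytes)
{
  stack_push16(f, arg);     /* caller pushes the argument      -> BP+6 */
  stack_push16(f, ret_cs);  /* far call pushes return segment  -> BP+4 */
  stack_push16(f, ret_ip);  /* ... then the return offset      -> BP+2 */
  stack_push16(f, f->bp);   /* callee: push bp                 -> BP+0 */
  f->bp = f->sp;            /* callee: mov bp, sp */
  assert(f->sp >= locals_bytes);
  f->sp -= locals_bytes;    /* callee: reserve space for locals */
}
```

Under these assumptions the first argument of a FAR function sits at BP+6, since the saved BP and the 4-byte return address come before it. In a NEAR function, with a 2-byte return address, it would sit at BP+4 instead.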

Annotations system

Hydra also provides an extensive annotations metadata system that supports defining:

Function names
Global variables
Jump Tables in the text section
Callstack configuration data
(and eventually) struct definitions

For example, one can access global variables (e.g. G_some_global) that map to the same memory as the x86-16:

HYDRA_FUNC(H_my_function_2)
{
  FRAME_ENTER(0);

  G_some_global = 42;

  FRAME_LEAVE();
  RETURN_FAR();
}
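One way to picture a global that "maps to the same memory" is as a fixed segment:offset inside the emulator's guest RAM. The sketch below is an assumption-laden illustration: guest_mem, the accessor functions, and the address 1234:0010 are all made up. The assignment syntax in the example above suggests Hydra exposes globals as lvalues; this sketch uses explicit byte-wise accessors instead to stay portable.

```c
#include <assert.h>
#include <stdint.h>

enum { MEM_BYTES = 0x110000 };  /* large enough for any real-mode seg:off */
static uint8_t guest_mem[MEM_BYTES];

static uint32_t linear(uint16_t seg, uint16_t off)
{
  return ((uint32_t)seg << 4) + off;
}

/* Read a little-endian 16-bit word from guest memory. */
static uint16_t guest_read16(uint16_t seg, uint16_t off)
{
  uint32_t a = linear(seg, off);
  assert(a + 1 < MEM_BYTES);
  return (uint16_t)(guest_mem[a] | (guest_mem[a + 1] << 8));
}

/* Write a little-endian 16-bit word to guest memory. */
static void guest_write16(uint16_t seg, uint16_t off, uint16_t v)
{
  uint32_t a = linear(seg, off);
  assert(a + 1 < MEM_BYTES);
  guest_mem[a]     = (uint8_t)(v & 0xFF);
  guest_mem[a + 1] = (uint8_t)(v >> 8);
}

/* Hypothetical annotation: a global living at 1234:0010. */
enum { G_SOME_GLOBAL_SEG = 0x1234, G_SOME_GLOBAL_OFF = 0x0010 };

static uint16_t get_some_global(void)
{
  return guest_read16(G_SOME_GLOBAL_SEG, G_SOME_GLOBAL_OFF);
}

static void set_some_global(uint16_t v)
{
  guest_write16(G_SOME_GLOBAL_SEG, G_SOME_GLOBAL_OFF, v);
}
```

Because both sides touch the same bytes, a write from native code is immediately visible to x86-16 code reading that address, and vice versa.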

Callstack tracking

Because Hydra fully controls the execution of the hybrid machine and has extensive symbol metadata (e.g. function names), it can produce quality stack-traces at runtime:

Call Stack:
  0  0000:0149 => 02e0:0b38 | (null) => main
  1  02e0:0b82 => 02e0:000f | (null) => F_navigator
  2  f7dc:0000 => 02e0:0619 | (null) => F_warehouse_run
  3  f7dc:0001 => 0834:0ae9 | (null) => F_ent_update
  4  0834:ff02 => 0834:01b6 | (null) => F_ent_next

These are very handy for finding and annotating interesting functions.
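A trace like the one above can be produced from a shadow call stack that is pushed on every call and popped on every return. This is a minimal sketch of that idea and not Hydra's implementation: the struct layout, the function names, and the assumption that each column is a call site followed by its target are mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { uint16_t seg, off; } Addr;

typedef struct {
  Addr src, dst;          /* call site and call target */
  const char *src_name;   /* may be NULL when no annotation exists */
  const char *dst_name;
} CallEntry;

enum { MAX_DEPTH = 128 };
static CallEntry calls[MAX_DEPTH];
static int depth = 0;

/* Record a call as it happens. */
static void on_call(Addr src, Addr dst, const char *src_name, const char *dst_name)
{
  assert(depth < MAX_DEPTH);
  calls[depth++] = (CallEntry){ src, dst, src_name, dst_name };
}

/* Drop the innermost entry when the callee returns. */
static void on_return(void)
{
  assert(depth > 0);
  depth--;
}

/* Format entry i into buf; returns what snprintf returns. */
static int format_entry(char *buf, size_t n, int i)
{
  assert(i >= 0 && i < depth);
  const CallEntry *e = &calls[i];
  return snprintf(buf, n, "%3d  %04x:%04x => %04x:%04x | %s => %s", i,
                  (unsigned)e->src.seg, (unsigned)e->src.off,
                  (unsigned)e->dst.seg, (unsigned)e->dst.off,
                  e->src_name ? e->src_name : "(null)",
                  e->dst_name ? e->dst_name : "(null)");
}
```

Since the runtime sees every call and return on both the emulated and native sides, the shadow stack stays accurate even when x86-16 code plays games with the real stack.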

Limitations

Although this system is fairly capable, there are a few limitations:

Hydra functions on Aarch64 each use their own native stack, so there is no way to pass the address of a local variable to x86-16 code
Dosbox-x is unable to interrupt Aarch64 code: native code must cooperate to allow opportunities for interrupts
Hydra function execution looks like a CPU suspend to the x86-16 side, so any time-tracking via cycle counts will under-report elapsed time

Conclusion

So far this approach has worked fairly well for the project's needs. As reusable software, it's unclear how flexible it is for other projects, but I think it's a fun technique that I wanted to share.

If you'd like to learn more about this project, you can stay tuned to this blog via the rss feed (here). And, feel free to check out the source code yourself on GitHub (here). If you find this interesting, you can also buy me a coffee (here).

xorvoid