Building & programming with “diet” engine
This documentation introduces how to build Capstone for X86 architecture to minimize the libraries for embedding purpose.
Later part presents the APIs related to this feature and recommends the areas programmers should to pay attention to in their code.
1. Building “diet” engine
Typically, we use Capstone for usual applications, where the library weight does not really matter. Indeed, as of version 2.1-RC1, the whole engine is only 1.9 MB including all architectures, and this size raises no issue to most people.
However, there are cases when we want to embed Capstone into special environments, such as OS kernel driver or firmware, where its size should be as small as possible due to space restriction. While we can always compile only selected architectures to make the libraries more compact, we still want to slim them down further.
Towards this object, since version 2.1, Capstone supports diet mode, in which some non-critical data are removed, thus making the engine size at least 40% smaller.
By default, Capstone is built in standard mode. To build diet engine, do: (demonstration is on *nix systems)
If we only build selected architectures, the engine is even smaller. Find below the size for each individual architecture compiled in diet mode.
Architecture | Library | Standard binary | “Diet” binary | Reduced size |
---|---|---|---|---|
Arm | libcapstone.a libcapstone.dylib |
730 KB 599 KB |
603 KB 491 KB |
18% 19% |
Arm64 | libcapstone.a libcapstone.dylib |
519 KB 398 KB |
386 KB 273 KB |
26% 32% |
Mips | libcapstone.a libcapstone.dylib |
206 KB 164 KB |
136 KB 95 KB |
34% 43% |
PowerPC | libcapstone.a libcapstone.dylib |
140 KB 103 KB |
69 KB 50 KB |
51% 52% |
X86 | libcapstone.a libcapstone.dylib |
809 KB 728 KB |
486 KB 452 KB |
40% 38% |
Combine all archs | libcapstone.a libcapstone.dylib |
2.3 MB 1.9 MB |
1.6 MB 1.3 MB |
31% 32% |
(Above statistics were collected as of version 2.1-RC1, built on Mac OSX 10.9.1 with clang-500.2.79)
2. Programming with “diet” engine
2.1 Irrelevant data fields with “diet” engine
To significantly reduce the engine size, some internal data has to be sacrificed. Specifically, the following data fields in cs_insn struct become irrelevant.
Data field | Meaning | Replaced with |
---|---|---|
@mnemonic | Mnemonic of instruction | @id |
@op_str | Operand string of instruction | @detail->operands |
@detail->regs_read @detail->regs_read_count |
Registers implicitly read by instruction | No |
@detail->regs_write @detail->regs_write_count |
Registers implicitly written by instruction | No |
@detail->groups @detail->groups_count |
Semantic groups instruction belong to | No |
While these information is missing, fortunately we can still work out some critical information with the remaining data fields of cs_insn struct.
-
@mnemonic
Without mnemonic information, we can rely on @id field of cs_insn struct.
For example, instruction “ADD EAX, EBX” would have @id as X86_INS_ADD.
-
@op_str
Without operand string, we can still extract equivalent information out of @detail->operands, which contains all details about operands of instruction.
For example, instruction “ADD EAX, EBX” would have 2 operands of register type X86_OP_REG, with register IDs of X86_REG_EAX & X86_REG_EBX.
Besides, all the details in architecture-dependent structures such as cs_arm, cs_arm64, cs_mips, cs_ppc & cs_x86 is still there for us to work out all the information needed, even without the missing fields.
2.2 Irrelevant APIs with “diet” engine
While most Capstone APIs are still function exactly the same, due to these absent data fields, the following APIs become irrelevant.
-
cs_reg_name()
Given a register ID (like X86_REG_EAX), we cannot retrieve its register name anymore.
-
cs_insn_name()
Given an instruction ID (like X86_INS_SUB), we cannot retrieve its instruction name anymore.
-
cs_insn_group()
We no longer have group information, so we cannot check if an instruction belongs to a particular group.
-
cs_reg_read()
We no longer have information about registers implicitly read by instructions, so we cannot tell if an instruction read a particular register.
-
cs_write_read()
We no longer have information about registers implicitly read by instructions, so we cannot tell if an instruction modify a particular register.
By irrelevant, we mean above APIs would return undefined value. Therefore, programmers have been warned not to use these APIs in diet mode.
2.3 Checking engine for “diet” status
Capstone allows us to check if the engine was compiled in diet mode with cs_support() API, as follows - sample code in C.
With Python, we can either check the diet mode via the function cs_support of capstone module, as follows.
Or we can also use the diet getter of Cs class for the same purpose, as follows.