The PS3 virtual machine is driven by two interpreters: PPU and SPU. As an optimization, a basic block in both PPU and SPU code can be rewritten directly to x86 instructions.
A basic block is a series of instructions with a single entry point and a single exit point. The only instruction in a basic block that may modify the control flow is the last one.
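The definition above can be illustrated with a minimal sketch of splitting a run of fixed-size instructions into basic blocks. The `Insn` and `BasicBlock` types and the `splitIntoBlocks` function are hypothetical names made up for this example and are not taken from ps3emu; the sketch also assumes the instructions are contiguous and 4 bytes each.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified instruction record (not the ps3emu representation).
struct Insn {
    uint32_t address;  // address of the instruction
    bool isBranch;     // true if the instruction may change control flow
    bool hasTarget;    // true if the branch target is statically known
    uint32_t target;   // branch target, valid only if hasTarget is set
};

struct BasicBlock {
    uint32_t start;  // address of the first instruction
    uint32_t end;    // address just past the last instruction
};

// Split a contiguous run of 4-byte instructions into basic blocks.
// A new block starts after every branch and at every known branch target.
std::vector<BasicBlock> splitIntoBlocks(const std::vector<Insn>& code) {
    std::vector<BasicBlock> blocks;
    if (code.empty()) return blocks;

    std::set<uint32_t> leaders;
    leaders.insert(code.front().address);
    for (const Insn& insn : code) {
        if (!insn.isBranch) continue;
        leaders.insert(insn.address + 4);                 // fall-through path
        if (insn.hasTarget) leaders.insert(insn.target);  // taken path
    }

    uint32_t start = code.front().address;
    for (std::size_t k = 0; k < code.size(); ++k) {
        uint32_t next = code[k].address + 4;
        bool isLast = (k + 1 == code.size());
        if (isLast || leaders.count(next) != 0) {
            blocks.push_back({start, next});
            start = next;
        }
    }
    return blocks;
}
```

With this scheme a branch can only ever be the last instruction of a block, which is the property the rewriter relies on.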
The rewriter in ps3emu is very conservative. It doesn’t try to understand the call stack or predict where a particular function should return. Instead, every return instruction is treated as a relative branch and control is transferred back to the interpreter, which decides where the next instruction is. If the next instruction is at the beginning of a rewritten basic block, control is transferred to x86 code again.
PPU code might be either position dependent (the main binary) or position independent (dynamically loaded libraries). The simple shortcut is to fix the base addresses of all the dynamically loaded libraries beforehand. This allows treating all PPU code as position dependent.
A memory map is calculated from the list of system libraries, with a region reserved for each library. At runtime, no matter the order in which libraries are loaded, each one has a fixed base address.
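A minimal sketch of such a memory map follows. It assumes the map is built by walking the library list in a fixed order and rounding each reserved region up to an alignment boundary; the `Library` type, the `assignBases` function, the starting address and the alignment are all hypothetical and chosen only for illustration.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of a system library: a name and its image size.
struct Library {
    std::string name;
    uint32_t size;
};

// Reserve one region per library, in list order, starting at firstBase.
// Each region begins on an `alignment` boundary. Because the result depends
// only on the list, the base of a library does not depend on load order.
std::map<std::string, uint32_t> assignBases(const std::vector<Library>& libs,
                                            uint32_t firstBase,
                                            uint32_t alignment) {
    std::map<std::string, uint32_t> bases;
    uint32_t next = firstBase;
    for (const Library& lib : libs) {
        bases[lib.name] = next;
        uint32_t end = next + lib.size;
        // Round the end of this region up to the next alignment boundary.
        next = (end + alignment - 1) / alignment * alignment;
    }
    return bases;
}
```

A loader built on this idea would look up the base for a library by name when it is loaded, rather than picking the next free address.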
At runtime a map is created that marks every rewritten basic block. When encountering an instruction the interpreter first checks if a corresponding rewritten basic block exists.
A separate map means that the PPU code may be rewritten very aggressively. Even if the rewriter isn’t sure if a sequence of bytes is code, it can still attempt to generate x86 code based on the sequence. If the sequence was indeed data, at runtime the corresponding address will never be consulted in the map, and as such won’t present any problems.
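The lookup described above can be sketched as a small dispatch loop. This is an illustration under assumptions, not the ps3emu implementation: `Cpu`, `BlockMap`, `run` and the `interpretOne` callback are hypothetical names, and the CPU state is reduced to the next instruction address.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical minimal CPU state: only the next instruction address.
struct Cpu {
    uint32_t nip;
};

// A rewritten block runs natively and leaves the next address in cpu.nip.
using RewrittenBlock = std::function<void(Cpu&)>;

// Map from the start address of a basic block to its rewritten version.
using BlockMap = std::unordered_map<uint32_t, RewrittenBlock>;

// Run until cpu.nip reaches `stop`. `interpretOne` stands in for the
// interpreter executing a single instruction and updating cpu.nip.
void run(Cpu& cpu, const BlockMap& blocks,
         const std::function<void(Cpu&)>& interpretOne, uint32_t stop) {
    while (cpu.nip != stop) {
        auto it = blocks.find(cpu.nip);
        if (it != blocks.end()) {
            it->second(cpu);    // a rewritten block starts here: run it
        } else {
            interpretOne(cpu);  // otherwise fall back to the interpreter
        }
    }
}
```

An entry generated from bytes that were actually data is harmless in this scheme: execution never reaches its address, so the entry is never looked up.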
Consider the following basic block:

    19b14: srwi r9,r7,8
    19b18: addi r10,r0,0
    19b1c: cmpwi cr7,r9,0
    19b20: addi r11,r0,1
    19b24: beq cr7,19b8c
Rewritten, it will look like the following C++ code:

    _RLWINM(0x18u,0x7u,0x8u,0x1fu,0x9u,0x0u);
    _ADDI(0x0u,0,0xau);
    _CMPI(0x9u,0x0u,0,0x7u);
    _ADDI(0x0u,1,0xbu);
    _BC(0x0u,0x1u,0x1u,0x0u,0x1eu,0x0u,0x19b8cu,0x19b24u);
After compilation it will become:

    rorx eax, [rbx+198h], 8
    and eax, 0FFFFFFh
    mov esi, eax
    mov [rbx+1A8h], rsi
    mov qword ptr [rbx+1B0h], 0
    setnz al
    movzx eax, al
    lea eax, [rax+rax+2]
    mov edx, [rbx+574h]
    and edx, 0FFFFFFF1h
    or eax, edx
    mov [rbx+574h], eax
    mov qword ptr [rbx+1B8h], 1
    test al, 2
    jnz _0x19b8cu
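To see why the first instruction turns into a rotate followed by a mask, note that `srwi r9,r7,8` is the PowerPC extended mnemonic for `rlwinm r9,r7,24,8,31`, which appears to match the `0x18`, `0x8` and `0x1f` arguments of `_RLWINM` above. The sketch below models that semantic on 32-bit values; the function names are hypothetical and this is not the body of the `_RLWINM` macro, which is not shown in this document.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value left by n bits (n in 0..31).
constexpr uint32_t rotl32(uint32_t x, unsigned n) {
    return n == 0 ? x : (x << n) | (x >> (32 - n));
}

// PowerPC-style mask with ones from bit mb through bit me, where bit 0 is
// the most significant bit. Only the non-wrapping case mb <= me is handled.
uint32_t ppcMask(unsigned mb, unsigned me) {
    assert(mb <= me && me < 32);
    uint32_t fromMb = 0xFFFFFFFFu >> mb;            // ones in bits mb..31
    uint32_t throughMe = 0xFFFFFFFFu << (31 - me);  // ones in bits 0..me
    return fromMb & throughMe;
}

// Rotate left by sh, then AND with the mask, on the low 32 bits of rs.
uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me) {
    return rotl32(rs, sh) & ppcMask(mb, me);
}
```

Rotating left by 24 is the same as rotating right by 8, and the mask for bits 8 through 31 is `0x00FFFFFF`, which corresponds to the `rorx ..., 8` and `and eax, 0FFFFFFh` pair in the compiled output.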
Even without further optimizations, the resulting x86 block has similar performance to the initial PowerPC block.
The load/store instructions are problematic because of the way atomics are implemented on PowerPC. For more information see Memory model.
SPU code is mostly position dependent, with one notable exception: SPURS Jobs. Those are not guaranteed to be loaded at a particular base address. Even though there is a way to predict their placement for a particular version of the SPURS library, right now the position independent code is rewritten using a different (slower) technique.
The rewriter takes advantage of the fact that code and data are always separate and never interleave in an SPU binary.
Assuming the PS3 PPU binary is called a.elf, first rewrite it into C++:

    ps3tool rewrite --elf a.elf --cpp a.cpp

Then compile it using the generated a.cpp.ninja build file:

    ninja -f a.cpp.ninja

Then pass the generated a.cpp.x86.so to ps3run.
The process for SPU is similar, just add the --spu argument to ps3tool rewrite. Every embedded SPU binary will be discovered and rewritten.