Miniretro: testing emulators at scale

Published: August 2021

Last year I got involved in libretro/RetroArch development after buying an Odroid Go Advance. Since then I've mainly been working on gpsp, porting it to new devices and so on.

One of the main issues when working on software is testing, and as you can probably imagine, most emulators have no tests at all. This is partly due to bad practices and partly because emulators are hard to test. That's why I came up with miniretro: a libretro frontend designed for headless operation, so that it can be used for end-to-end/integration testing. The frontend runs a core and a ROM with fake inputs and captures the output.
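
In its simplest form, an end-to-end test then boils down to "run the core for N frames and check that the captured output still matches a known-good reference". A minimal sketch of that last step, assuming the captured output (e.g. screenshots) lands as files in a directory; `record_golden`, `verify_golden` and the directory layout are hypothetical helpers, not part of miniretro:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of miniretro): keep a manifest of
# SHA-256 hashes for a known-good run and re-check later runs against it.

# record_golden DIR MANIFEST: hash every file under DIR into MANIFEST.
record_golden() {
  local dir="$1" manifest="$2"
  # Sorted so the manifest is stable and diff-friendly; -r avoids running
  # sha256sum on stdin when DIR is empty.
  find "$dir" -type f -print0 | sort -z | xargs -0 -r sha256sum > "$manifest"
}

# verify_golden MANIFEST: re-hash the files listed in MANIFEST.
# Returns 0 only if every file still matches; with --quiet, only the
# mismatching files are reported ("<file>: FAILED").
verify_golden() {
  sha256sum --quiet -c "$1"
}
```

Hashes in a manifest are cheap to store in a repository, which is handy when the reference outputs themselves are large.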

Cross platform testing

Some of these emulators (like gpsp) feature a dynamic recompiler (aka dynarec), which has platform-specific (CPU, OS and/or device specific) backends. These translate the original console instructions into your device's instruction set (for speed). Since this code is device specific, it's hard to write, debug and test: one needs a toolchain for the platform and a physical device to test it on. Or at least that's the theory!

With miniretro it gets easier to test other platforms, such as ARM and MIPS devices. Since it is simple (almost no dependencies) and just a Linux binary, we can use Qemu userspace emulation to run our tests! This way there's no need for a physical device, nor for manual testing.
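
For reference, Qemu's user-mode emulators are invoked as `qemu-<arch>` (e.g. `qemu-arm`, `qemu-mipsel`), and `-L` tells them where the target sysroot lives so the foreign dynamic loader and libc can be found. A small sketch wrapping that; the helper names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helpers around Qemu user-mode emulation.

# qemu_cmd ARCH SYSROOT BINARY [ARGS...]: print the command line as a
# single string (the same shape as the --driver string used later on).
qemu_cmd() {
  local arch="$1" sysroot="$2"
  shift 2
  printf '%s\n' "qemu-${arch} -L ${sysroot} $*"
}

# run_under_qemu ARCH SYSROOT BINARY [ARGS...]: actually execute it,
# keeping each argument properly quoted.
run_under_qemu() {
  local arch="$1" sysroot="$2"
  shift 2
  "qemu-${arch}" -L "$sysroot" "$@"
}
```

Setting the `QEMU_LD_PREFIX` environment variable is an alternative to passing `-L` on every invocation.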

The following diagram shows how it works: miniretro and the libretro core are built for the specific device/architecture and run under Qemu. Qemu takes care of translating the program's syscalls into host syscalls, so there's no need to boot a whole OS to run it. Miniretro can open pipes and other IPC channels to input/output any data.
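
Because the syscalls end up on the host kernel, a named pipe created on the host is just another file to the emulated program. A sketch of that idea, assuming a FIFO in a temporary directory; `fake_writer` is a hypothetical stand-in for a cross-built program running under qemu-user:

```shell
#!/usr/bin/env bash
# Sketch only: stream data from a writer process to the host through a
# named pipe (FIFO) and count the bytes received.
stream_through_fifo() {
  local writer="$1" dir fifo pid bytes
  dir=$(mktemp -d)
  fifo="$dir/frames.fifo"
  mkfifo "$fifo"
  # Start the writer in the background; its open() of the FIFO blocks
  # until a reader shows up.
  "$writer" "$fifo" &
  pid=$!
  # Read until the writer closes its end (EOF); arithmetic expansion
  # strips any padding wc might add.
  bytes=$(( $(wc -c < "$fifo") ))
  wait "$pid"
  rm -rf "$dir"
  echo "$bytes"
}

# Hypothetical stand-in writer: emits 4 bytes, as a frontend might emit
# frame data.
fake_writer() { printf 'RGBA' > "$1"; }
```

Calling `stream_through_fifo fake_writer` prints the number of bytes that crossed the pipe.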

Diagram of Miniretro running under Qemu

Just as an example, let's build picodrive, gpsp and pcsx_rearmed for armv6 and mips32. To do this you will need a toolchain. In my case I've been using some generic Linux toolchains for a variety of platforms, such as ARM, MIPS, x86 and PowerPC, freshly built using Buildroot (you can find them in my Copr if you use Fedora, see this repo). This, coupled with the regular Qemu userspace emulators (available in most Linux distros), lets us test our emulators on a bunch of platforms effortlessly.

  # Build the three emus for arm and mips
  git clone --recurse-submodules && cd picodrive
  make platform=armv CC=/opt/buildroot-armv6el-eabi-uclibc/bin/arm-linux-gcc -j10 -f Makefile.libretro all && \
    mv && make -f Makefile.libretro clean
  make platform=unix CC=/opt/buildroot-mipsel32-o32-uclibc/bin/mipsel-linux-gcc -j10 -f Makefile.libretro all && \
    mv && make -f Makefile.libretro clean

  git clone && cd gpsp
  make platform=armv CC=/opt/buildroot-armv6el-eabi-uclibc/bin/arm-linux-gcc -j10 all && \
    mv && make platform=armv clean
  make platform=mips32 CC=/opt/buildroot-mipsel32-o32-uclibc/bin/mipsel-linux-gcc -j10 all && \
    mv && make platform=mips32 clean

  git clone && cd pcsx_rearmed
  make platform=armv CC=/opt/buildroot-armv6el-eabi-uclibc/bin/arm-linux-gcc -j10 -f Makefile.libretro all && \
    mv && make -f Makefile.libretro clean
  make platform=unix CC=/opt/buildroot-mipsel32-o32-uclibc/bin/mipsel-linux-gcc -j10 -f Makefile.libretro all && \
    mv && make -f Makefile.libretro clean

  # Build miniretro for arm and mips too
  git clone && cd miniretro
  PREFIX=/opt/buildroot-mipsel32-o32-uclibc/bin/mipsel-linux- make && mv miniretro miniretro.mipsel && make clean
  PREFIX=/opt/buildroot-armv6el-eabi-uclibc/bin/arm-linux- make && mv miniretro miniretro.arm && make clean
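
With this many cross builds it's easy to end up with an artifact that was silently built by the host compiler. A cheap sanity check is `file -b`, which describes an ELF binary as e.g. "ELF 32-bit LSB shared object, ARM, EABI5 version 1 (SYSV), ...". A sketch, where the helper names and the arch-to-machine table are assumptions covering only the two targets used here:

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for cross-built cores and binaries.

# expected_machine ARCH: machine name as printed by `file` for ARCH.
expected_machine() {
  case "$1" in
    arm)    echo "ARM" ;;
    mipsel) echo "MIPS" ;;
    *)      return 1 ;;   # unknown arch
  esac
}

# description_matches DESCRIPTION ARCH: pure string check on `file` output.
description_matches() {
  local want
  want=$(expected_machine "$2") || return 1
  [[ "$1" == *", ${want},"* ]]
}

# check_artifact FILE ARCH: run `file` on FILE and verify its machine type.
check_artifact() {
  description_matches "$(file -b "$1")" "$2"
}
```

Note that big-endian MIPS is also reported as "MIPS", so a stricter check for mipsel would additionally look for "LSB".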

Finally, let's run a full matrix of tests (choose your favourite ROMs, of course!):

  export ARCHS="arm mipsel" EMUS="pcsx_rearmed picodrive gpsp"
  export FRAMES="10000" OUTPUT="output/" SYSTEM="$HOME/.config/retroarch/system/"
  declare -A SYSROOTS=(["arm"]="/opt/buildroot-armv6el-eabi-uclibc/arm-buildroot-linux-uclibcgnueabi/sysroot/" \
  declare -A ROMDIR=(["gpsp"]="gbaroms/path/" ["picodrive"]="mdroms/path")

  mkdir -p ${OUTPUT}

  for emu in $EMUS; do
    for arch in $ARCHS; do
      echo "Running $emu for $arch"
      # Run this core on the ROMs in ROMDIR, with miniretro executed under qemu-user
      ./ --core "../${emu}/${emu}_libretro_${arch}.so" --system "${SYSTEM}" \
        --input "${ROMDIR[${emu}]}" --output "${OUTPUT}/${emu}-${arch}" --threads="$(nproc)" --frames="${FRAMES}" \
        --driver "qemu-${arch} -L ${SYSROOTS[${arch}]} ./miniretro.${arch}"
    done
    # Merge the per-arch results of this emulator into one HTML report
    ./ compare --results "${OUTPUT}/${emu}"-* --output "${emu}-report.html"
  done

This will produce a comparison report for every emulator, where you can see a screenshot (plus some info) for each ROM and each platform. Screenshots that differ between platforms are highlighted with a red background. This is very useful for comparing devices, but it can also be used to compare different versions of the emulator (say, on a new commit or PR).
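
The core of such a comparison is simple: for each ROM, check whether every platform produced a byte-identical screenshot. A text-only sketch, assuming a hypothetical layout of one directory per platform (e.g. `output/gpsp-arm`, `output/gpsp-mipsel`) containing one `<rom>.bmp` per ROM; it is not how the actual report tool is implemented:

```shell
#!/usr/bin/env bash
# compare_runs REF_DIR OTHER_DIR...: REF_DIR is the reference run; each
# other run is compared against it, screenshot by screenshot.
# Prints "DIFF ..." per mismatch and returns 1 if any were found.
compare_runs() {
  local first="$1" shot name run status=0
  shift
  for shot in "$first"/*.bmp; do
    [[ -e "$shot" ]] || continue      # glob matched nothing
    name=$(basename "$shot")
    for run in "$@"; do
      # cmp -s is silent; non-zero means "differs" or "missing".
      if ! cmp -s "$shot" "$run/$name"; then
        echo "DIFF $name ($first vs $run)"
        status=1
      fi
    done
  done
  return "$status"
}
```

Only ROMs present in the reference run are checked, so running it once per platform as the reference catches ROMs missing on either side.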

Reports: HTML and video!

It is also possible to produce video output directly with miniretro (if ffmpeg is installed on the system). Miniretro feeds BMP frames and raw PCM audio to ffmpeg, which encodes them as video and audio tracks. Unfortunately, a bug in ffmpeg prevents feeding data through two pipes at once, so we are forced to produce separate streams (which can later be muxed easily without re-encoding).
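
Muxing the two separately encoded streams is a single ffmpeg call: `-map` picks the video track from the first input and the audio track from the second, and `-c copy` copies the encoded packets as-is instead of re-encoding. A sketch, where `mux_av`, the `DRY_RUN` switch and the file names are hypothetical:

```shell
#!/usr/bin/env bash
# mux_av VIDEO AUDIO OUT: combine a video-only and an audio-only file
# into OUT without re-encoding. Set DRY_RUN=1 to print the command.
mux_av() {
  local video="$1" audio="$2" out="$3"
  local cmd=(ffmpeg -y -i "$video" -i "$audio" -map 0:v:0 -map 1:a:0 -c copy "$out")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

The output container must accept both codecs as they are; Matroska (`.mkv`) is a permissive choice for that.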

I ran the above example with gpsp on ARM, MIPS and x86 (plus the interpreter on x64) and got the following video (after a bit of editing with ffmpeg :P):

Likewise, I ran a full test for picodrive, running 32X games:

Bonus content: dualretro

Tooling beats debugging, for sure. I'd rather spend an hour creating a tool than manually debugging something and digging too deep. Perhaps it's part of becoming an adult :) While making a small code change I found a weird bug in gpsp that didn't make any sense. Instead of debugging it, I tried to compare the emulator before and after my changes. Usually this involves finding the smallest change that triggers the bug and then comparing both versions. Thanks to miniretro I managed to find a couple of games that triggered the error, and a small test case that caused it.

The next step was to create a tool to compare emulator cores: let me introduce dualretro. This simple tool takes two cores and a ROM and runs them side by side in lockstep. On each frame it creates a savestate for each core and compares them, letting you know when it finds a difference. This allowed me to find the bug in a matter of minutes.
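
The comparison logic itself is easy to illustrate offline. Libretro cores produce savestates through `retro_serialize`; dualretro does the comparison in-process, but if both cores' savestates were dumped to disk once per frame, finding the first divergent frame would look like this. The file naming (`frame-000001.state`, ...) and the `first_divergence` helper are hypothetical:

```shell
#!/usr/bin/env bash
# first_divergence DIR_A DIR_B: walk per-frame savestate dumps in frame
# order and print the name of the first frame whose states differ.
# Returns 1 if no divergence is found.
first_divergence() {
  local a="$1" b="$2" state name
  # Zero-padded frame numbers make the glob expand in frame order.
  for state in "$a"/frame-*.state; do
    [[ -e "$state" ]] || continue
    name=$(basename "$state")
    # A missing counterpart also counts as a divergence (cmp returns 2).
    if ! cmp -s "$state" "$b/$name"; then
      echo "$name"
      return 0
    fi
  done
  return 1
}
```

Once the frame is known, running `cmp` without `-s` on that pair reports the first differing byte offset, which can then be mapped to the part of the savestate it belongs to.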

Diagram of Dualretro running under Qemu

This concept can be generalized to compare frames, memory regions, audio, etc. libretro's API exposes a lot of this information in a standardized way.
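
For memory regions, the libretro API offers `retro_get_memory_data`/`retro_get_memory_size` (e.g. for `RETRO_MEMORY_SYSTEM_RAM`). Once such a region or a savestate is dumped to a file, comparing only one byte range narrows a difference down further. A sketch with a hypothetical `region_equal` helper; which offsets correspond to which region depends entirely on the core:

```shell
#!/usr/bin/env bash
# region_equal FILE_A FILE_B OFFSET LENGTH: return 0 if both files hold
# identical bytes in [OFFSET, OFFSET+LENGTH). OFFSET is 0-based.
region_equal() {
  local a="$1" b="$2" offset="$3" length="$4"
  # tail -c +K starts output at byte K (1-based); head -c N keeps N bytes.
  cmp -s <(tail -c +"$((offset + 1))" "$a" | head -c "$length") \
         <(tail -c +"$((offset + 1))" "$b" | head -c "$length")
}
```

Process substitution (`<(...)`) makes this bash-specific; GNU `cmp -i SKIP -n LIMIT` is an alternative where it is available.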