SuperFW: An open source Supercard firmware


Published: April 2025

Over the past year I've been working on and off on an open source firmware for the SuperCard (a cheap GBA flash cart). For some background, check out this other article, where I give an overview of the Supercard, how patching works and whatnot.

Official website and documentation: superfw.davidgf.net

Github repo: github.com/davidgfnet/superfw

SuperFW overview

SuperFW is an open firmware that ships quite a few features; the most interesting ones are:

  • Nice GUI menus with multi-language support and proper-ish Unicode font rendering.
  • Modern filesystem support, including FAT16/32 and exFAT.
  • Automatic fast ROM patching for most commercial games.
  • Good savegame support, including RTC emulation.
  • In-game menu with a nice GUI and many features (including cheats and savestates!)
  • Native support for certain addons, like emulators.

The firmware also supports running in NDS mode (that is, using the cart to load and play NDS games), albeit in a limited way.

During development I tried to squeeze as much as possible out of the Supercard, to get the most features out of it. The hardware is still quite bad, but with SuperFW it gains some nice features otherwise only available in more expensive carts. The cart is quite simple, with only a Flash chip, an SRAM chip and an SDRAM chip (acting as ROM), which limits what's possible.

SuperFW info screen

How SuperFW works

The main firmware component is the menu, which allows browsing for ROMs and launching them. This program runs from EWRAM/IWRAM, so it can freely operate on the ROM (SDRAM) and the SD card without worrying too much about the memory mapping: code running from the ROM space itself would disappear whenever that mapping changes.

The Supercard has a magic register that selects whether the Flash or the SDRAM is mapped to the ROM bus, and that can also disable writes to the ROM (aka the "read only bit"). An extra bit enables the SD interface, which is mapped to the upper 16MB of the ROM space; whenever it's enabled, the upper bank of the SDRAM cannot be read or written. As with many other firmwares, most of the SD and filesystem logic is implemented in the firmware, since the hardware is very bare-bones. On the plus side, this made it possible to support SDHC cards, which the original firmware could not use.
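To make the register description concrete, here is a minimal C sketch of how firmware code might drive it. The register address 0x09FFFFFE comes from the SRAM section further down; the unlock sequence (writing 0xA55A twice, then the mode value twice) and the exact bit assignments are assumptions taken from community Supercard drivers, not from this article, and all constant and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names. Bit meanings follow community Supercard drivers
 * and should be checked against the SuperFW sources. */
#define SC_MODE_REG_ADDR 0x09FFFFFEu /* control ("magic") register          */
#define SC_BIT_SDRAM     (1u << 0)   /* 1 = SDRAM on the ROM bus, 0 = Flash */
#define SC_BIT_SD_IO     (1u << 1)   /* 1 = SD interface in upper 16 MiB    */
#define SC_BIT_WRITE_EN  (1u << 2)   /* 1 = ROM space writable              */

/* Build a mode value from the three switches described above. */
static uint16_t sc_make_mode(bool sdram, bool sd_io, bool writable) {
  uint16_t mode = 0;
  if (sdram)    mode |= SC_BIT_SDRAM;
  if (sd_io)    mode |= SC_BIT_SD_IO;
  if (writable) mode |= SC_BIT_WRITE_EN;
  return mode;
}

/* Assumed unlock sequence: 0xA55A twice, then the mode twice.
 * On hardware: sc_set_mode((volatile uint16_t *)SC_MODE_REG_ADDR, m); */
static void sc_set_mode(volatile uint16_t *reg, uint16_t mode) {
  *reg = 0xA55A;
  *reg = 0xA55A;
  *reg = mode;
  *reg = mode;
}

/* With the SD interface enabled, ROM addresses in the upper 16 MiB
 * (0x09000000-0x09FFFFFF) no longer reach the SDRAM. */
static bool sc_addr_shadowed_by_sd(uint32_t addr, uint16_t mode) {
  return (mode & SC_BIT_SD_IO) && addr >= 0x09000000u && addr <= 0x09FFFFFFu;
}
```

On real hardware this would be called from code running in EWRAM/IWRAM, for the reason given at the start of this section.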

Patching and savegames

SuperFW manages savegames (loading and storing them) and also patches ROMs so that they work correctly on the Supercard. There are several types of patches:

  • WaitCNT patches (aka whitescreen patches), which prevent games from using faster ROM timings that don't work on the Supercard.
  • Save patches, which enable Flash/EEPROM based games to work (since the Supercard lacks emulation for those save types).
  • IRQ handler patches, used to enable the "In-Game menu" (more on that later).
  • RTC patches, which allow for some limited RTC emulation.
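As an illustration of an IRQ handler patch, the C sketch below shows its most naive form: scanning a ROM image for word-aligned literals equal to 0x03007FFC (the BIOS IRQ handler pointer) and rewriting them to 0x03007FF4, the alternate address described in the in-game menu section. This is an assumed approach, not SuperFW's PatchEngine or database logic: a real patcher would have to confirm that each match is a literal actually used by the game's IRQ setup code, since unrelated data can contain the same bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define IRQ_VECTOR_ORIG 0x03007FFCu /* BIOS jumps through this pointer     */
#define IRQ_VECTOR_NEW  0x03007FF4u /* unused slot games get redirected to */

/* Endian-independent little-endian helpers (the GBA is little-endian). */
static uint32_t rd32le(const uint8_t *p) {
  return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
         (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void wr32le(uint8_t *p, uint32_t v) {
  p[0] = (uint8_t)v;
  p[1] = (uint8_t)(v >> 8);
  p[2] = (uint8_t)(v >> 16);
  p[3] = (uint8_t)(v >> 24);
}

/* Naive sketch: rewrite every word-aligned literal equal to the BIOS
 * vector. Returns how many words were patched. */
static size_t patch_irq_vector_literals(uint8_t *rom, size_t size) {
  size_t patched = 0;
  for (size_t off = 0; off + 4 <= size; off += 4) {
    if (rd32le(rom + off) == IRQ_VECTOR_ORIG) {
      wr32le(rom + off, IRQ_VECTOR_NEW);
      patched++;
    }
  }
  return patched;
}
```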

Save patches contain the addresses of the SDK saving routines, so that SuperFW can patch them with the appropriate emulation routines. It offers SRAM-conversion (that is, using the built-in SRAM to emulate the Flash/EEPROM device) and DirectSaving (which directly saves data to SD).
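SRAM-conversion can be pictured with a small C sketch of replacement EEPROM routines. GBA EEPROM is accessed in 8-byte (64-bit) blocks, so an emulated read or write reduces to copying one block to or from the SRAM. The linear block-to-offset layout, the function names and their signatures are hypothetical: the real SDK routines take different arguments, and SuperFW's conversion may lay the data out differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EEPROM_BLOCK_SIZE 8u /* GBA EEPROM is addressed in 64-bit blocks */

/* Hypothetical replacements for the game's EEPROM routines, backed by the
 * cart SRAM. On hardware `sram` would point at 0x0E000000; SRAM sits on an
 * 8-bit bus, so it is accessed one byte at a time. */
static bool eeprom_emu_read(const volatile uint8_t *sram, size_t sram_size,
                            uint16_t block, uint8_t out[EEPROM_BLOCK_SIZE]) {
  size_t base = (size_t)block * EEPROM_BLOCK_SIZE;
  if (base + EEPROM_BLOCK_SIZE > sram_size)
    return false; /* block outside the emulated EEPROM */
  for (size_t i = 0; i < EEPROM_BLOCK_SIZE; i++)
    out[i] = sram[base + i];
  return true;
}

static bool eeprom_emu_write(volatile uint8_t *sram, size_t sram_size,
                             uint16_t block, const uint8_t in[EEPROM_BLOCK_SIZE]) {
  size_t base = (size_t)block * EEPROM_BLOCK_SIZE;
  if (base + EEPROM_BLOCK_SIZE > sram_size)
    return false;
  for (size_t i = 0; i < EEPROM_BLOCK_SIZE; i++)
    sram[base + i] = in[i];
  return true;
}
```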

SuperFW features a patching algorithm (called PatchEngine) that generates patches for any ROM. Since the PatchEngine is quite slow, pre-generated patches for most commercial games are shipped in a built-in database to improve the user experience. This database is generated off-device using a more mature patching mechanism, which usually outperforms the PatchEngine. Unfortunately, the database tends to fail with ROM hacks and other modified games.
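A database lookup like this can key on fields from the standard GBA cartridge header: the four-character game code at offset 0xAC and the version byte at offset 0xBC. The C sketch below assumes that keying and a made-up entry layout; SuperFW's actual database format and matching criteria may differ. It also shows why hacks are tricky: a ROM hack that keeps the original header would match the original game's entry even though its code may have moved.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical database entry; SuperFW's real format is not shown here. */
struct patch_entry {
  char game_code[4]; /* ROM header offset 0xAC */
  uint8_t version;   /* ROM header offset 0xBC */
  uint32_t patch_id; /* index into a (hypothetical) patch blob */
};

/* Return the matching entry, or NULL so the caller can fall back to the
 * on-device PatchEngine. */
static const struct patch_entry *patchdb_lookup(const struct patch_entry *db, size_t n,
                                                const uint8_t *rom, size_t rom_size) {
  if (rom_size < 0xC0)
    return NULL; /* the GBA cartridge header is 0xC0 bytes long */
  for (size_t i = 0; i < n; i++) {
    if (memcmp(db[i].game_code, rom + 0xAC, 4) == 0 && db[i].version == rom[0xBC])
      return &db[i];
  }
  return NULL;
}
```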

The patch generator and database can be found in the GitHub repo. I also played around with the Pyodide framework and came up with an online patch generator for those who don't want to run the Python scripts (which I totally understand!).

SuperFW architecture

SuperFW is meant to be flashed onto the Supercard internal flash, although it can also be chainloaded from another firmware. During its boot process, the firmware will:

  • Detect whether it's running in GBA or NDS mode.
  • Detect where it's running from (Flash vs SDRAM).
  • Unpack its main payload (aka. the "menu") to EWRAM.
  • Unpack other necessary assets (for instance the font pack) to the SDRAM.
  • Boot the main payload/menu.

Once the firmware menu is booted, it tries to mount the SD card filesystem. It can then load files such as settings files, pending operations, etc.

SuperFW packed structure

The firmware file contains the aforementioned main payload (menu) and some other assets, usually compressed, along with an assembly-written bootloader. The menu contains the main app (code and data) but it also contains some other programs packed as data, for instance the In-game menu, the DLDI driver and the direct-saving driver.

The in-game menu

Since most games do not use the full 32MiB of available SDRAM (aka. the ROM space), we can use some of that space for an in-game menu. Pressing a key combo interrupts the currently running game and displays this menu, which offers nice features such as cheats and savestates.

Main IGM menu

The trigger mechanism for this menu is based on intercepting the IRQ handler routine that games use. The GBA BIOS handles IRQs and passes execution to the routine whose address is stored at 0x03007FFC. SuperFW uses its patches to redirect games to the 0x03007FF4 address instead (which happens to be an unused memory area) and installs its own handler. This handler detects the menu trigger (when the key combo is pressed) as well as V-Blank interrupts, on which cheats are processed.
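The decision the interposed handler has to make on each interrupt can be sketched in C. The KEYINPUT register (0x04000130, active-low, one bit per key) and the V-Blank bit in IE/IF are standard GBA hardware; the action enum, the function names and the check order (key combo before V-Blank) are hypothetical. After acting, a real handler would still have to jump to the game's own handler, whose pointer the patched game now stores at 0x03007FF4.

```c
#include <stdbool.h>
#include <stdint.h>

/* KEYINPUT (0x04000130) bit layout; a key reads as 0 while held. */
enum {
  KEY_A = 1 << 0, KEY_B = 1 << 1, KEY_SELECT = 1 << 2, KEY_START = 1 << 3,
  KEY_RIGHT = 1 << 4, KEY_LEFT = 1 << 5, KEY_UP = 1 << 6, KEY_DOWN = 1 << 7,
  KEY_R = 1 << 8, KEY_L = 1 << 9,
};
#define IRQ_VBLANK (1u << 0) /* bit 0 of IE/IF */

/* True if every key in `combo` is held (other keys may be held too). */
static bool combo_held(uint16_t keyinput, uint16_t combo) {
  uint16_t pressed = (uint16_t)(~keyinput & 0x03FF); /* invert active-low */
  return (pressed & combo) == combo;
}

enum igm_action { IGM_NONE, IGM_APPLY_CHEATS, IGM_OPEN_MENU };

/* Hypothetical decision step for the interposed IRQ handler. */
static enum igm_action igm_on_irq(uint16_t irq_flags, uint16_t keyinput,
                                  uint16_t combo) {
  if (combo_held(keyinput, combo))
    return IGM_OPEN_MENU;
  if (irq_flags & IRQ_VBLANK)
    return IGM_APPLY_CHEATS;
  return IGM_NONE;
}
```

For example, with an illustrative L+R+Select combo, the handler would call `igm_on_irq(flags, keyinput, KEY_L | KEY_R | KEY_SELECT)`; the combo SuperFW actually uses is not specified here.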

The menu is an independent payload (although it reuses a lot of code) that is linked to run from EWRAM/IWRAM. During gameplay the menu is stored in the SDRAM, and whenever the key combo is triggered, the menu is loaded and executed. Before the menu can be loaded, the contents of EWRAM/IWRAM/VRAM must be swapped out to the SDRAM. Therefore, using the In-game menu requires a few kilobytes of free SDRAM space (as swap area), so certain games (those that leave too little of the 32MiB unused) might not be able to use it.
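Whether a given game leaves room for the menu is a simple layout calculation. The C sketch below assumes a hypothetical layout (game ROM first, then the menu payload, then the swap area, with 4 KiB alignment after the ROM) and takes the menu and swap sizes as parameters, since exact figures are not given here.

```c
#include <stdbool.h>
#include <stdint.h>

#define SDRAM_SIZE (32u * 1024u * 1024u) /* 32 MiB ROM space */

/* Hypothetical placement of the IGM payload and its swap area. */
struct igm_layout {
  uint32_t menu_off; /* where the menu payload is stored */
  uint32_t swap_off; /* where EWRAM/IWRAM/VRAM contents get swapped to */
  uint32_t end;      /* first byte past the swap area */
};

static bool igm_plan(uint32_t rom_size, uint32_t menu_size, uint32_t swap_size,
                     struct igm_layout *out) {
  /* 64-bit math so oversized inputs cannot wrap around. */
  uint64_t menu_off = ((uint64_t)rom_size + 0xFFF) & ~(uint64_t)0xFFF;
  uint64_t swap_off = menu_off + menu_size;
  uint64_t end = swap_off + swap_size;
  if (end > SDRAM_SIZE)
    return false; /* game too large: the in-game menu is unavailable */
  out->menu_off = (uint32_t)menu_off;
  out->swap_off = (uint32_t)swap_off;
  out->end = (uint32_t)end;
  return true;
}
```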

The DLDI driver

When booting in NDS mode, SuperFW will not attempt to display a menu like it does in GBA mode, but rather chain-load an .NDS file from the SD card. There are many interesting open source projects that support homebrew and commercial game loading on the NDS from a variety of carts, so there's no need to re-invent the wheel here (TWL++ and akmenu-next to name a few).

One nice thing in the NDS space is the concept of a DLDI driver. This is a standard driver interface that allows homebrew to access an SD card (or any other block device, for that matter) in a way that hides most of the underlying details from the homebrew app. The driver is a small relocatable payload with a specific header and code to perform read/write operations. During NDS boot, the /BOOT.NDS file is loaded to RAM and patched with our DLDI driver. If the launched app is itself a launcher, it can in turn reuse this driver.

DLDI drivers need to be relocatable, and the DLDI header provides some support for relocations through its fields. In our case we rely on LTO linking to avoid generating GOT entries, which simplifies DLDI loading a bit.
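The core of DLDI-style relocation is an address fix-up pass: words that point into the driver's link-time address range are shifted to wherever the driver actually ended up. The generic C function below illustrates that idea, assuming the caller supplies the words to fix and the old and new base addresses; it does not parse the real DLDI header, and its name is hypothetical. Because it relies purely on value ranges, it can misfire on data that merely looks like an address, so the fewer sections need this treatment, the better.

```c
#include <stddef.h>
#include <stdint.h>

/* Shift every word whose value lies in [old_start, old_end) so that it
 * points to the same offset relative to new_start. Returns the number
 * of words that were adjusted. */
static size_t reloc_fixup(uint32_t *words, size_t count,
                          uint32_t old_start, uint32_t old_end,
                          uint32_t new_start) {
  size_t fixed = 0;
  for (size_t i = 0; i < count; i++) {
    if (words[i] >= old_start && words[i] < old_end) {
      words[i] = words[i] - old_start + new_start;
      fixed++;
    }
  }
  return fixed;
}
```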

The Direct-Saving payload

SuperFW features a nice saving mechanism that allows games to save their data directly to the SD card. Unlike other firmwares, which usually patch games to use the Supercard SRAM chip instead, we patch the Flash/EEPROM SDK routines to read and write data from/to the SD card directly. For this we need to patch the ROM, but also inject a small payload responsible for this functionality.

The DirectSaving payload is quite similar to the DLDI driver, with the caveat that it is specifically written to work while a game is running. This is an important difference, since a running game will usually perform DMA transfers, handle interrupts and do other things that can interfere with SD access. The Supercard maps the SD card interface to the upper 16MiB of the ROM address space. This means that during SD read/write operations the upper 16MiB of ROM are not available, and any access to that space could corrupt the ongoing operation.

Many bootleg cartridges patch these routines to do similar tricks, but they usually disable IRQs to avoid all sorts of breakage. We do not want to do that, since it would break many games: games such as Pokemon keep playing music during the saving process, so unconditionally disabling interrupts could be problematic. For this reason we use a clever trick.

How Direct-Saving works

To simplify Direct-Saving and allow it to work on most games, its payload does not include any kind of filesystem support. All accesses are performed at block level, so when launching a game we follow this process:

  • Check if the save file is contiguous, otherwise create a new contiguous file and copy its data over.
  • Load the game ROM into SDRAM. Load the Direct-Saving payload right after it (or in a ROM "hole").
  • Patch the ROM Flash/EEPROM SDK functions with our Direct-Saving trampolines.
  • Patch the Direct-Saving payload with information such as the LBA address of the save file on the SD card.
  • Launch the ROM as usual.
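The first step (the contiguity check) and the LBA that gets patched into the payload in the fourth step can be sketched with standard FAT/exFAT arithmetic, where data clusters are numbered from 2. The function names are hypothetical, and reading the cluster chain from the FAT itself is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* `chain` lists the file's clusters in order, as read from the FAT.
 * The file is contiguous if each cluster directly follows the previous. */
static bool chain_is_contiguous(const uint32_t *chain, size_t n) {
  for (size_t i = 1; i < n; i++)
    if (chain[i] != chain[i - 1] + 1)
      return false;
  return true;
}

/* First sector of a cluster; data clusters are numbered from 2. */
static uint32_t cluster_to_lba(uint32_t data_start_lba,
                               uint32_t sectors_per_cluster, uint32_t cluster) {
  return data_start_lba + (cluster - 2) * sectors_per_cluster;
}
```

Once the file is known to be contiguous, the payload only needs `cluster_to_lba(...)` of the first cluster: every later save offset is a fixed distance from it.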

During a read or write operation, the payload function runs and accesses the SD card: it operates on a full SD card block, whose address is computed by adding the block offset within the save to the save file's base LBA. To do this safely, the SD driver performs its low-level operations as follows:

  • A small routine template is loaded onto the stack (~60 bytes).
  • The routine is patched with the desired operation (i.e. load/store/load-multiple...).
  • Whenever the driver performs a low-level I/O port access, the code calls the routine on the stack.
  • The routine disables IRQs (using the global IME register).
  • The SD interface is mapped into the ROM address space.
  • The access is performed (register read or write).
  • The SD interface is unmapped, IRQs are re-enabled and the routine returns.
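The block addressing described above (base LBA plus the block offset within the save file) combined with a read-modify-write can be sketched in C. The callback types and the small RAM disk standing in for the SD card are hypothetical, there only so the sketch runs on a PC; the real payload performs each access through the stack-based routine listed above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SD_BLOCK 512u

/* Hypothetical block-device hooks standing in for the SD driver. */
typedef bool (*blk_read_fn)(uint32_t lba, uint8_t buf[SD_BLOCK]);
typedef bool (*blk_write_fn)(uint32_t lba, const uint8_t buf[SD_BLOCK]);

/* Write `len` save bytes at `offset` into a contiguous save file that
 * starts at `base_lba`, one read-modify-write block at a time. */
static bool save_write(uint32_t base_lba, uint32_t offset, const uint8_t *src,
                       uint32_t len, blk_read_fn rd, blk_write_fn wr) {
  uint8_t blk[SD_BLOCK];
  while (len) {
    uint32_t lba = base_lba + offset / SD_BLOCK; /* which block */
    uint32_t in_blk = offset % SD_BLOCK;         /* where inside it */
    uint32_t chunk = SD_BLOCK - in_blk;
    if (chunk > len)
      chunk = len;
    if (!rd(lba, blk))
      return false;
    memcpy(blk + in_blk, src, chunk); /* merge new bytes into old block */
    if (!wr(lba, blk))
      return false;
    offset += chunk;
    src += chunk;
    len -= chunk;
  }
  return true;
}

/* Tiny RAM disk so the sketch can run on a PC. */
static uint8_t ramdisk[8][SD_BLOCK];
static unsigned ramdisk_writes;

static bool ramdisk_read(uint32_t lba, uint8_t buf[SD_BLOCK]) {
  if (lba >= 8)
    return false;
  memcpy(buf, ramdisk[lba], SD_BLOCK);
  return true;
}

static bool ramdisk_write(uint32_t lba, const uint8_t buf[SD_BLOCK]) {
  if (lba >= 8)
    return false;
  memcpy(ramdisk[lba], buf, SD_BLOCK);
  ramdisk_writes++;
  return true;
}
```

A full-block write could skip the read step; it is kept unconditional here for simplicity.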

Since the stack always lives in RAM, we can be sure that remapping the ROM address space won't affect it. Even if the DirectSaving payload itself gets unmapped (which happens when it is placed in the upper 16MiB area), the stack routine can still execute without issues. To make sure no IRQ fires during this time window (which could end up running code from the unmapped ROM region, or cause other bad side effects), we disable IRQs. This is usually fine, since they are only disabled for a few CPU cycles, which should not break most games.
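The ordering of that critical section can be modelled in plain C with host-side stand-ins for the IME register (0x04000208 on real hardware) and the Supercard control register. This only illustrates the sequence; it does not reproduce the trick of copying a patched stub to the stack, which is what keeps the code reachable while the SD interface is mapped. The names, the assumed SD-enable bit and the initial mode value are hypothetical, and on hardware each mode change would also need the register's unlock writes.

```c
#include <stdint.h>

/* Host-side stand-ins; on hardware these are memory-mapped registers. */
static volatile uint16_t fake_ime = 1;       /* IRQ master enable */
static volatile uint16_t fake_sc_mode = 0x5; /* hypothetical: SDRAM mapped, writable */

#define SC_BIT_SD_IO (1u << 1) /* assumed SD-interface enable bit */

typedef uint16_t (*io_op_fn)(volatile uint16_t *port, uint16_t value);

/* IRQs off, SD mapped, one access, SD unmapped, IRQs restored. */
static uint16_t sd_io_access(io_op_fn op, volatile uint16_t *port, uint16_t value) {
  uint16_t saved_ime = fake_ime;
  uint16_t saved_mode = fake_sc_mode;
  fake_ime = 0;                                          /* 1. disable IRQs */
  fake_sc_mode = (uint16_t)(saved_mode | SC_BIT_SD_IO);  /* 2. map SD I/O   */
  uint16_t r = op(port, value);                          /* 3. the access   */
  fake_sc_mode = saved_mode;                             /* 4. unmap SD I/O */
  fake_ime = saved_ime;                                  /* 5. restore IRQs */
  return r;
}

/* Example operation: a register read that records the state it saw. */
static uint16_t seen_ime, seen_mode;
static uint16_t probe_read(volatile uint16_t *port, uint16_t unused) {
  (void)unused;
  seen_ime = fake_ime;
  seen_mode = fake_sc_mode;
  return *port;
}
```

Restoring the saved IME value rather than forcing it to 1 keeps the sketch correct even when the caller had already disabled interrupts.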

Using the stack-based routine approach results in slow SD access, since every small operation (i.e. sending or receiving a byte to/from the SD card) requires a handful of instructions (roughly a couple dozen). To speed up accesses a bit, we use LDM/STM instructions when appropriate, so that multiple bytes can be processed at a time.

Further reverse engineering

Some of the known limitations of the Supercard can be a bit frustrating and seem arbitrary. For this reason I always wanted to fully understand how the CPLD on the cart works. This is no easy task but could lead to some interesting information:

  • Is there a way to map the full 128KB of SRAM? Most boards ship 128KB or more!
  • Could we make the SDRAM accesses faster somehow?
  • What is going on with the internal flash address mapping?

For some of these questions we could just look at the board and run some tests; for others it might be necessary to actually reverse engineer the CPLD.

Odd flash mapping

One early oddity, which went unnoticed for a while, is that the internal flash is not mapped one-to-one as you would expect. I noticed that the flashing routines were using some weird addresses instead of the usual JEDEC ones. To understand what was going on I did some soldering and probing, and figured out that the address lines are not wired straight through but permuted, like this:

// Gamepak interface side
// A17 A16 A15 A14 A13 A12 A11 A10  A9  A8  A7  A6  A5  A4  A3  A2  A1  A0
//  |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
//  |   |   |   |   |   |   |   |   |   |   |   \---|---|---|---|---|---|---;
//  |   |   |   |   |   |   |   |   |   |   |       |   |   |   |   |   |   |
//  |   |   |   |   |   |   |   |   |   \---|---\   |   |   |   |   |   |   |
//  |   |   |   |   |   |   |   |   |       |   |   |   |   |   |   |   |   |
//  |   |   |   |   |   |   |   |   |   /---|---|---|---|---/   |   |   |   |
//  |   |   |   |   |   |   |   |   |   |   |   |   |   |       |   |   |   |
//  |   |   |   |   |   |   |   |   |   |   \---|---|---|---\   |   |   |   |
//  |   |   |   |   |   |   |   |   |   |       |   |   |   |   |   |   |   |
//  |   |   |   |   |   |   |   |   |   |   /---|---|---/   |   |   |   |   |
//  |   |   |   |   |   |   |   |   |   |   |   |   |       |   |   |   |   |
//  |   |   |   |   |   |   |   |   |   |   |   |   |   /---|---|---|---/   |
//  |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |       |
//  |   |   |   |   |   |   |   |   |   |   |   |   |   |   \---|---|---\   |
//  |   |   |   |   |   |   |   |   |   |   |   |   |   |       |   |   |   |
//  |   |   |   |   |   |   |   |   |   |   |   |   \---|---\   |   |   |   |
//  |   |   |   |   |   |   |   |   |   |   |   |       |   |   |   |   |   |
//  |   |   |   |   |   |   |   |   |   |   |   |   /---|---|---/   |   |   |
//  |   |   |   |   |   |   |   |   |   |   |   |   |   |   |       |   |   |
//  |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   /---|---|---/
//  |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
// A17 A16 A15 A14 A13 A12 A11 A10  A9  A8  A7  A6  A5  A4  A3  A2  A1  A0
// Flash IC interface side

It doesn't matter what the mapping looks like most of the time, except for data flashing, since the flashing operations use some magic addresses to perform write and erase commands.
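Because of this permutation, a JEDEC-style command address (such as 0x555 or 0x2AA) has to be translated into the gamepak-side address that actually produces it on the flash chip's pins. A minimal C sketch, assuming a table `perm[g]` that gives the flash-side line driven by gamepak-side line g: the real table has to be read off the wiring diagram above or taken from the SuperFW sources, so it is a parameter here, and the names are hypothetical. Converting the result into a CPU address depends on the bus width and is left out.

```c
#include <stdint.h>

#define FLASH_ADDR_BITS 18 /* A0..A17, as in the diagram */

/* perm[g] = flash-side address bit driven by gamepak-side bit g.
 * Returns the gamepak-side address that presents `flash_addr` to the chip. */
static uint32_t flash_to_bus_addr(uint32_t flash_addr,
                                  const uint8_t perm[FLASH_ADDR_BITS]) {
  uint32_t bus = 0;
  for (int g = 0; g < FLASH_ADDR_BITS; g++)
    if (flash_addr & (1u << perm[g]))
      bus |= 1u << g;
  return bus;
}
```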

Full SRAM support

In my initial inspection I determined that the SRAM chip used was always at least 128KB, so the 64KB limitation seemed a bit odd. Upon PCB investigation, I could determine that the A16 pin was indeed connected to the CPLD. I assumed that the banking bit could be hidden in the Supercard control register (mapped at 0x9FFFFFE) so I tried to sweep all bits and see what happened. Funnily enough I was right and yet at the same time my test did not work (probably some bug, or I did not sweep the full range, can't recall).

In the end we determined that bit number 2 is responsible for the SRAM bank mapping. This bit had earlier been identified as the read-enable bit for the SDRAM chip; it turns out it serves both purposes. It took me a while to get here, and it was necessary to reverse engineer the CPLD in order to understand how it all works.
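With bit 2 acting as the bank select, addressing the full 128KB reduces to splitting a save offset into a bank (driving SRAM A16) and a 16-bit offset inside the 64KB cartridge SRAM window at 0x0E000000. The C sketch assumes that setting the bit selects the upper bank (the polarity is not stated here); the names are hypothetical, and because the same bit also affects the SDRAM, a real implementation has to preserve the rest of the mode when switching banks.

```c
#include <stdbool.h>
#include <stdint.h>

#define GBA_SRAM_BASE    0x0E000000u /* 64 KiB cartridge SRAM window */
#define SC_BIT_SRAM_BANK (1u << 2)   /* bit 2; polarity assumed */

/* Split an offset into a 128 KiB SRAM into bank + CPU address. */
static bool sram_locate(uint32_t offset, bool *bank1, uint32_t *cpu_addr) {
  if (offset >= 0x20000u)
    return false;                  /* beyond 128 KiB */
  *bank1 = (offset >> 16) & 1u;    /* becomes SRAM A16 */
  *cpu_addr = GBA_SRAM_BASE + (offset & 0xFFFFu);
  return true;
}

/* Change only the bank bit, keeping the other mode bits intact. */
static uint16_t sc_mode_with_bank(uint16_t mode, bool bank1) {
  return bank1 ? (uint16_t)(mode | SC_BIT_SRAM_BANK)
               : (uint16_t)(mode & ~SC_BIT_SRAM_BANK);
}
```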

CPLD reverse engineering

We attempted to dump the CPLD firmware for a while, which was tricky since many carts are protected (there's a bit that disables read-out). At some point I got lucky: one of my carts hadn't been properly protected, so I was able to dump its bitstream. You can find it here. I then tried to make sense of this bitstream, but the software stack for this platform is very old (Windows 2000 kind of old!).

Thanks to Ben Crist, who's an expert in ispMACH CPLDs, we finally managed to reverse engineer the Supercard's ispMACH4128. He created a toolchain that can read and write ispMACH bitstreams, and he fully reverse engineered how the chips work internally. Thanks to lazr1026 I also managed to reverse engineer the PCB (a 4-layer board!) to understand the wire routing. You can find interesting data in these repos:

The CPLD design is quite interesting: it is fairly simple but contains the necessary logic (most of it already known), such as:

  • SD-RAM driving logic (CAS and RAS generation)
  • SD card input/output logic and clock logic
  • Magic register write control (including some newly found internal bits)
  • Flash address mapping (with some funny quirks)
  • SRAM banking mechanism
  • Address generation and increment for both SDRAM and Flash

There's almost no free logic left. We could delete some of the unnecessary stuff (i.e. the weird flash mapping control bits) to gain some gates back, but adding more functionality seems unlikely. It could be possible to speed up SDRAM access, but that might require a new oscillator, which is not ideal.

Conclusions and future work

While there's still some work to be done to further reverse and understand the CPLD, most of the goals for that have been accomplished. At this point it is a matter of seeing if it could be possible to improve these carts. If I can find any improvements, we could leverage Ben's toolchain to create a new bitstream without having to deal with the old Windows-only toolchain.

On the firmware side of things I also think most of the job is done. If anything, I will spend some time adding compatibility with other carts like the Supercard Lite and any other flavour I can get my hands on (the Supercard SD is by far the most popular one). Most of the improvements I could add have more to do with patching than with the firmware itself. On that front, I would like to create a compatibility list that can somehow be crowdsourced.