Blog Needs a Name


Debugging C With Cosmopolitan Libc

Cosmopolitan Libc provides a suite of debugging features that enhance the C development experience: function call tracing, gdb integration, an ASAN/UBSAN runtime, and more! A lot of fast and critical code is written in C – If you’re using software written in C, interfacing with C libraries, fixing bugs in C code, or even rewriting C software in some other language, it helps to understand what your C code is doing. Debugging isn’t just a diaspora of printf statements – in this blog post, we’ll look at how Cosmopolitan Libc helps with debugging C, true and properly, using this example repo.

An example program

Consider hex16.c: the user specifies a file at the command line, the program opens the file, prints (up to) the first 16 bytes as hexadecimal values, and then prints the bytes read as an ASCII string. You can find the entire code here.

#include <stdio.h>
#define NUM_CHARS 16

void hex16(const char *filename) {
    FILE *fp = fopen(filename, "r");
    char res[NUM_CHARS];
    int i, c;

    printf("hex16.c -- reading file %s\n", filename);
    for (i = 0; i < NUM_CHARS; i++) {
        c = fgetc(fp);
        if (feof(fp) || c == -1) break;
        printf("0x%02x ", c);
        res[i] = (char)c;
    }
    res[i] = '\0';
    fclose(fp);
    printf("\n%s\n", res);
}

int main(int argc, char **argv) {
    hex16(argv[1]);
    return 0;
}

Let’s build the above program with make and test it on a sample file:

make hex16.com
./hex16.com ./sample1.txt
# hex16.c -- reading file ./sample1.txt
# 0x63 0x6f 0x73 0x6d 0x6f 0x20 0x6c 0x69 0x62 0x63 0x0a 
# cosmo libc

But what happens if you don’t provide a file or provide a nonexistent file?

./hex16.com
# Segmentation fault
./hex16.com ./missing.txt
# Segmentation fault

Hmm, Segmentation fault is not quite informative – some part of the program is causing a crash, but where?

gdb and backtraces

With Cosmopolitan Libc, you can have gdb integration and detailed backtraces for your C program in just one line: add ShowCrashReports(); at the start of the main function. Let’s build hex16-backtrace.c which is just that:

make hex16-backtrace.com
# try with no arguments
./hex16-backtrace.com
# or try with a nonexistent file
./hex16-backtrace.com ./missing.txt

Now if you have gdb available on $PATH, you would get a TUI (terminal user interace) showing the register contents and the backtrace of the crash, like the image below:

a gdb tui showing register info and the location of the crash

Type bt and press Enter to view the backtrace in gdb, and exit by pressing Ctrl+D. You can set up a config for gdb from the Cosmopolitan Libc README here.

If gdb is not available, or if you’re running the program as part of a test script, you would get the backtrace in text, showing the register contents and the backtrace of the crash:

error: Uncaught SIGSEGV (SEGV_MAPERR) on X550LD pid 256891 tid 256891
  ./hex16-backtrace.com
  EUNKNOWN/0/No error information
  Linux #1 SMP Debian 5.10.106-1 (2022-03-17) X550LD 5.10.0-13-amd64

RAX 0000000000000000 RBX 00007ffe5c5c9a70 RDI 000000000042dd48 ST(0) 0.0
RCX 0000000000000000 RDX 0000000000000000 RSI 0000000000000240 ST(1) 0.0
RBP 00007000007fff80 RSP 00007000007fff60 RIP 000000000040b698 ST(2) 0.0
 R8 0000000000000000  R9 0000000000000001 R10 0000000000000004 ST(3) 0.0
R11 0000000000000293 R12 0000000000000000 R13 000000000042dd46 ST(4) 0.0
R14 00007ffe5c5c9a80 R15 00007ffe5c5c9b90 VF PF ZF IF

XMM0  00000000000000000000000000000000 XMM8  00000000000000000000000000000000
XMM1  7865682f656c706d6178652d6f6d736f XMM9  00000000000000000000000000000000
XMM2  632d67756265642f6f6d736f632f6666 XMM10 00000000000000000000000000000000
XMM3  7574732f6d6168747561672f656d6f68 XMM11 00000000000000000000000000000000
XMM4  6f632e65636172746b6361622d363178 XMM12 00000000000000000000000000000000
XMM5  65682f656c706d6178652d6f6d736f63 XMM13 00000000000000000000000000000000
XMM6  2d67756265642f6f6d736f632f666675 XMM14 00000000000000000000000000000000
XMM7  74732f6d6168747561672f656d6f682f XMM15 00000000000000000000000000000000

0x000000000040b697: fixpathname at /home/jart/cosmo/libc/stdio/fopen.c:26
 (inlined by) fopen at /home/jart/cosmo/libc/stdio/fopen.c:62
0x00000000004098d0: hex16 at /home/gautham/stuff/cosmo/debug-cosmo-example/hex16-backtrace.c:5
0x000000000040291e: main at /home/gautham/stuff/cosmo/debug-cosmo-example/hex16-backtrace.c:22
0x00000000004029cb: cosmo at /home/jart/cosmo/libc/runtime/cosmo.S:77
0x0000000000402503: _start at /home/jart/cosmo/libc/crt/crt.S:103

10008004-10008004 rw-pa-   1x automap 64kB w/ 7872kB hole
10008080-100080ff rw-pa- 128x automap 8192kB w/ 96tB hole
6fe00004-6fe00004 rw-paF   1x g_fds 64kB
70000000-7000007f rw-Sa- 128x stack 8192kB
# 16mB total mapped memory
./hex16-backtrace.com 

From the backtrace, you now know that not providing a file causes a crash involving fopen, and so update the program to check if the first parameter of fopen (ie argv[1]) is NULL or not. Note that backtrace generation depends on the hex16-backtrace.com.dbg file being located in the same directory as hex16-backtrace.com, as it contains necessary debugging information.

Just knowing the backtrace of a crash helps reduce the time needed to fix an error: when porting make to Cosmopolitan Libc, adding a ShowCrashReports showed that make had some large alloca calls which were causing a crash. The fix was to add a STATIC_STACK_SIZE call in the main function, and then make could build the entire Cosmopolitan Libc monorepo.

function call tracing with --ftrace

Sometimes a backtrace of the crash alone is not sufficient. Cosmopolitan Libc allows you to log every function call over the program’s execution – just pass --ftrace at the end of your program, like this:

./hex16.com ./missing.txt --ftrace
# truncated example
FUN 257012     1'567'864        48 &main
FUN 257012     1'573'031       112   &hex16
FUN 257012     1'578'356       160     &fopen
FUN 257012     1'612'346       336     &printf
FUN 257012     1'720'829       144     &fgetc
FUN 257012     1'726'479       176       &fgetc_unlocked
Segmentation Fault

The ftrace output gets printed to stderr by default; it is useful to redirect it to a file for later analysis. ftrace does not require ShowCrashReports() at the start of the main function, though it still requires the hex16.com.dbg for hex16.com. You can read more about how ftrace works here.

ftrace also provides insight into program performance. I wrote a Python wrapper for ftrace so I could examine how the CPython3.6 runtime behaves. For example, here’s how many calls it takes to add two values in Python3.6 built with Cosmopolitan Libc:

import cosmo
a = 1
b = 2
with cosmo.ftrace():
    c = a + b
&meth_dealloc
&PyFrame_BlockSetup 76
&object_dealloc 265
  &PyObject_Free 82
    &_PyObject_Free.isra.0 52
&PyDict_GetItem 333
  &lookdict_unicode_nodummy 157
&PyDict_GetItem 166
  &lookdict_unicode_nodummy 34
&PyNumber_Add 273
  &binary_op1 69
    &long_add 178
      &PyLong_FromLong 135
&PyDict_SetItem 273
  &insertdict 164
    &lookdict_unicode_nodummy 53
&PyFrame_BlockPop 320
&PyObject_CallFunctionObjArgs 249
  &object_vacall 40
    &_PyObject_FastCallDict 95
      &_PyCFunction_FastCallDict 53
        &_PyMethodDef_RawFastCallDict 86
          &_PyStack_AsTuple 61
            &PyTuple_New 121
          &FtracerObject_exit 199

The numbers on the right are an approximate measure of time each function takes, which is pretty useful for a “slow” language like Python3.61. ftrace helped write a custom sys.meta_path importer for Python3.6 in Cosmopolitan Libc – I just moved code from Python into C until the size of the ftrace logs stopped decreasing.

system call tracing with --strace

Sometimes tracing function calls is too much information, and you just want to narrow down the error region from some logging information. If your program uses a bunch of system calls, you can log them like functions, by passing --strace at the end of your program invocation, like this:

./hex16.com ./missing.txt --strace
SYS 257304  43'627 bell system five system call support 171 magnums loaded on gnu/systemd
SYS 257304  99'854 mmap(0x700000000000, 8'388'608, PROT_READ|PROT_WRITE, MAP_STACK|MAP_ANONYMOUS, -1, 0) → 0x700000000000 (8'388'608 bytes total)
SYS 257304 892'097 getenv("TERM") → "xterm-256color"
SYS 257304 908'170 openat(AT_FDCWD, "./missing.txt", 0, 0) → -1 errno= 2
SYS 257304 932'876 write(1, u"hex16.c -- reading file ./missing.txt◙", 38) → 38 errno= 2

Unlike ShowCrashReports and ftrace, the --strace flag does not require the hex16.com.dbg file to be present. Now you see that the openat resulted in an -1 errno 2 if the file doesn’t exist, so let’s update the code to check the FILE * pointer before calling fgetc. Thus you get the fixed program hex16-fixed.c below:

#include <stdio.h>
#define NUM_CHARS 16

void hex16(const char *filename) {
    FILE *fp = fopen(filename, "r");
    char res[NUM_CHARS];
    int i, c;

    if (!fp) {
        printf("unable to read %s!\n", filename);
        return;
    }
    printf("hex16.c -- reading file %s\n", filename);
    for (i = 0; i < NUM_CHARS; i++) {
        c = fgetc(fp);
        if (feof(fp) || c == -1) break;
        printf("0x%02x ", c);
        res[i] = (char)c;
    }
    res[i] = '\0';
    fclose(fp);
    printf("\n%s\n", res);
}

int main(int argc, char **argv) {
    if (argc != 2) {
        printf("USAGE:\n\t%s FILENAME\n", argv[0]);
        return 1;
    }
    hex16(argv[1]);
    return 0;
}

ASAN/UBSAN

The two errors in hex16.com have been fixed, so it’s time to try another testcase to see if there are any more. What happens if the input file contains 16 or more characters?

cat ./sample2.txt
# cosmopolitan libc
./hex16.com ./sample2.txt
# hex16.c -- reading file ./sample2.txt
# 0x63 0x6f 0x73 0x6d 0x6f 0x70 0x6f 0x6c 0x69 0x74 0x61 0x6e 0x20 0x6c 0x69 0x62 
# cosmopolitan lib2
./hex16.com ./sample2.txt
# hex16.c -- reading file ./sample2.txt
# 0x63 0x6f 0x73 0x6d 0x6f 0x70 0x6f 0x6c 0x69 0x74 0x61 0x6e 0x20 0x6c 0x69 0x62 
# cosmopolitan lib3

hex16.com has inconsistent behavior with sample2.txt – sometimes it prints extra garbage characters. But there is a res[i] = '\0' after the loop to terminate the string, so what’s going on? Let’s build with the Cosmopolitan Libc’s ASAN/UBSAN runtime and find out:

make clean
make MODE=dbg hex16-backtrace.com
./hex16-backtrace.com ./sample2.txt
hex16-backtrace.c:16: ubsan error: 'int' index 16 into 'char [16]' out of bounds (tid 257624)
0x0000000000480c00: __ubsan_handle_out_of_bounds at /home/jart/cosmo/libc/intrin/ubsan.c:289
0x0000000000421658: hex16 at /home/gautham/stuff/cosmo/debug-cosmo-example/hex16-backtrace.c:16
0x0000000000402bf1: main at /home/gautham/stuff/cosmo/debug-cosmo-example/hex16-backtrace.c:22
0x0000000000402cfd: cosmo at /home/jart/cosmo/libc/runtime/cosmo.S:77
0x0000000000402543: _start at /home/jart/cosmo/libc/crt/crt.S:103

Ah – when the file has more than 16 characters, the null terminator is written outside the buffer, causing a buffer overflow when the string is printed. Now you can change the buffer to have one extra character, to handle this case, leading to the now again-fixed hex16-ubsan.c.

Cosmopolitan’s ASAN/UBSAN runtime adds a magical improvement to your C development workflow2, for a tiny cost. Your build times increase only marginally, you avoid any new arguments with the compiler, you still write your tests as usual, and your binaries have almost the same performance. And of course, your binaries will now have runtime memory safety.

In case of emergency (or lack of patience)

Sometimes you find an annoying bug that occurs only during a specific sequence of events, and you just haven’t figured it out. It’s not related to ASAN/UBSAN – the runtime tells you some memory access is invalid, which is good to know, but you don’t know where it all started. The system calls are not shining light into the issue, because it isn’t related to how your program interacts with the OS. The function call sequences are too many, and mixed in due to multithreading, so it’s difficult for you to make sense of what’s going on. You start gdb and setup a dozen breakpoints before running the program, but it’s too stop-start, and you don’t have enough experience or patience to poke at the right spots. In frustration, you reach for the good old printf statements, but they don’t work either! You have no tools, because your tools haven’t set up enough for your tools, like that bug where Python hasn’t set up stdin because the encodings module hasn’t been loaded3 or the one where Rust’s thread-local storage makes some seemingly weird memory requests from the libc on startup4.

You’re backed into a corner. How do you debug this seemingly impervious heisenbug?

  1. You imagine yourself as an x86_64 chip (complete with power supply, RAM, and peripherals) and start rubberducking each assembly instruction to hapless passerby until the end of time or until the bug is found, whichever is earlier.

  2. You use Cosmopolitan Libc’s kprintf.

kprintf is the final tool in the Cosmopolitan Libc debugging arsenal. Look at the kprintf source code – It’s a lean implementation of printf designed to work everywhere. It’s what Cosmopolitan Libc uses when the other functions want to write logs, even after the program has crashed. So it has to work at all times. This is what happens if you try to print some weird memory location using kprintf:

kprintf("%s\n", 31415);
//!!7ab7

You can even clone the Cosmopolitan Libc monorepo, add kprintf calls in Cosmopolitan Libc’s internal functions, and test your code with your own debug-customized libc! When all else fails, kprintf can get the job done, provided you use it judiciously to print the value of every variable after every statement in every function that your program has to call.

bell curve meme with printf and kprintf

Closing Notes

Nobody writes perfect code in one try. Many eyes make bugs scarce. With Cosmopolitan Libc, debugging becomes a game of Scotland Yard, instead of blind-man’s buff – instead of stumbling around and stubbing your toes, you have five different angles to understand what your program is doing and determine where it starts going off the rails:

Underneath all the libraries, syntax, abstractions, and optimizations, we still need to have some idea of what’s going on. Writing programs is a lot more fun when we can understand what the computer is actually doing. Detailed feedback helps understand programs better, and quick feedback helps develop programs faster. Cosmopolitan Libc provides a set of debugging tools we can use to enrich our understanding of the programs we write.


  1. Python is undergoing a lot of performance improvements nowadays – On 2022-08-05, CPython released 3.11.0rc1, and I ported that to Cosmopolitan Libc here. The 3.11 port doesn’t have ftrace or the other goodies yet; it’ll be interesting to see how ftrace gives insight into CPython’s new “JIT-like” optimization tricks. ↩︎

  2. A beautiful thing happens after a while developing with MODE=dbg: you’re still as fast as before, but now you have a sixth sense, an intuition, or a discipline of programming in C. Because the safety of UBSAN enforces only lenient bounds, you start writing UBSAN-tolerant code, but you also gain the confidence to gamble with performance tricks, knowing that UBSAN will save you if things go awry. I had never written a JSON parser before, but I wrote one in C, with NaN-boxing and pointer manipulation, in a few afternoons. It passes around 90% of the most comprehensive JSON Test suite, on MODE=dbg too. ↩︎

  3. This was a nasty bug that happened if you didn’t provide a proper folder location for Python 3 to import its standard library. Here’s my limited recall: Python 3 is Unicode-by-default. But it’s not really Unicode when it starts up, it’s still mostly ASCII-ish. Python does some locale checking before attempting to import encodings.py, and then sets up almost everything else. So if encodings.py is not found, Python gets stuck in this limbo and aborts. I could not connect the error messages to encodings.py, so I just kprintf’d every call in the import sequence until I found the fix. ↩︎

  4. Don’t panic! I said seemingly. There wasn’t anything weird with how Rust initialized thread-local storage, because the Rust examples I wrote still worked on glibc. We eventually narrowed the bug down to a pointer initialization in pthread_create in Cosmopolitan Libc, and the fix was a single line. Cosmopolitan Libc does have pthreads support now, and it can run multithreaded Python 3.11 scripts and Rust executables↩︎