Crafting Interpreters: Getting started with exercises

ยท

6 min read

Table of contents

Hooray, chapter one is down.

In this chapter, Robert explains reasons for learning PL(programming language) design and even tells us what to expect later in the book, i.e. the organization and the content. One of the things to expect is exercises at the end of the chapters to help the reader better understand the concepts explained.

The exercise at the end of chapter one is meant to give us a glimpse of using C, and specifically pointers ๐Ÿ˜…๐Ÿ˜…. The task is to define a doubly linked list of heap-allocated strings and functions to insert, find and delete items from it.

You can get a doubly linked list implementation in C from various sites. What I would like to highlight are the tools that helped me deal with the memory management and debugging.

GDB

Before I got things running correctly, I ran into couple of segmentation fault errors. This is where gdb(GNU Project debugger) comes in. Gdb is a great debugger for C/C++ code, it helps in setting breakpoints and stepping through the code. There are couple of tutorials on how to use gdb from various sites e.g. this.

For my case, I had not initialized the list head and tail nodes to NULL when initializing the list. So when inserting items into the list, and trying to check if the next node is not null to know where to insert the new node, I ran into the segmentation fault(segfault) error. Pointers just showing who is the boss ๐Ÿฅฒ. Segfault always happens when a program tries accessing memory that it should not. Stepping through the code with GDB showed where I got the segfault, so I was able to solve it. The output from gdb when looking for the segfault source location was this:

Program received signal SIGSEGV, Segmentation fault.
0x00005555555552ca in insert_str_node (list=0x7fffffffdb00, new_content=0x555555556007 "Mojo World") at ex1/dllist.h:28

If you look closely at the output above, you see gdb showed the line number and the name of the file of the location producing the segfault, i.e. file ex1/dlist.h and line number 28. When we look at that line in the file we get this code:

str_node_t *curr = list->head; //Line 27
while(curr->next != NULL){ // Line 28 <- the error line
    curr = curr->next;
}

As discussed above, we get the segfault error because the head node has not be initialised, thus the next node pointer is some random address in the system which my system detected did not belong my program.

Valgrind

After the segmentation fault errors, I wanted to make sure there are no memory leaks, since C has no garbage collection thus memory management is done manually (If only we had smart pointers ๐Ÿ˜ฉ) . Valgrind has served me well. Valgrind gives a report of memory management of the program that can be detailed, depending with the options you pass while calling it.

For this case, I used the following command to get memory report of the program:

valgrind --leak-check=full \
         --show-leak-kinds=all \
         --track-origins=yes \
         ./bins/ex1

I included the options --leak-check=full and --show-leak-kinds=all to get the full report on all types of memory leaks; --track-origins=yes option will help in tracking the origin of uninitialized data being used. So before freeing any memory we see this output from the program.

==23733== HEAP SUMMARY:
==23733==     in use at exit: 200 bytes in 11 blocks
==23733==   total heap usage: 12 allocs, 1 frees, 1,224 bytes allocated
==23733== 
==23733== 6 bytes in 1 blocks are indirectly lost in loss record 1 of 11
==23733==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23733==    by 0x109248: insert_str_node (dllist.h:19)
==23733==    by 0x109531: main (main.c:10)
==23733== 
==23733== 11 bytes in 1 blocks are indirectly lost in loss record 2 of 11
==23733==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23733==    by 0x109248: insert_str_node (dllist.h:19)
==23733==    by 0x10951E: main (main.c:9)
==23733== 
==23733== 11 bytes in 1 blocks are indirectly lost in loss record 3 of 11
==23733==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23733==    by 0x109248: insert_str_node (dllist.h:19)
==23733==    by 0x10956A: main (main.c:13)
==23733== 
==23733== 13 bytes in 1 blocks are indirectly lost in loss record 4 of 11
==23733==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23733==    by 0x109248: insert_str_node (dllist.h:19)
==23733==    by 0x109557: main (main.c:12)
==23733== 
==23733== 23 bytes in 1 blocks are indirectly lost in loss record 5 of 11
==23733==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23733==    by 0x109248: insert_str_node (dllist.h:19)
==23733==    by 0x109544: main (main.c:11)
==23733== 
==23733== 24 bytes in 1 blocks are indirectly lost in loss record 6 of 11
==23733==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23733==    by 0x10922C: insert_str_node (dllist.h:17)
==23733==    by 0x10951E: main (main.c:9)
==23733== 
==23733== 24 bytes in 1 blocks are indirectly lost in loss record 7 of 11
==23733==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23733==    by 0x10922C: insert_str_node (dllist.h:17)
==23733==    by 0x109531: main (main.c:10)
==23733== 
==23733== 24 bytes in 1 blocks are indirectly lost in loss record 8 of 11
==23733==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23733==    by 0x10922C: insert_str_node (dllist.h:17)
==23733==    by 0x109544: main (main.c:11)
==23733== 
==23733== 24 bytes in 1 blocks are indirectly lost in loss record 9 of 11
==23733==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23733==    by 0x10922C: insert_str_node (dllist.h:17)
==23733==    by 0x109557: main (main.c:12)
==23733== 
==23733== 24 bytes in 1 blocks are indirectly lost in loss record 10 of 11
==23733==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23733==    by 0x10922C: insert_str_node (dllist.h:17)
==23733==    by 0x10956A: main (main.c:13)
==23733== 
==23733== 200 (16 direct, 184 indirect) bytes in 1 blocks are definitely lost in loss record 11 of 11
==23733==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23733==    by 0x1094F0: main (main.c:6)
==23733== 
==23733== LEAK SUMMARY:
==23733==    definitely lost: 16 bytes in 1 blocks
==23733==    indirectly lost: 184 bytes in 10 blocks
==23733==      possibly lost: 0 bytes in 0 blocks
==23733==    still reachable: 0 bytes in 0 blocks
==23733==         suppressed: 0 bytes in 0 blocks

With first two options provided to valgrind, we are able to see where we are allocating memory and leaking it. Leaking memory just means not deleting/freeing memory that your program has allocated, thus the computer's available memory reduces. Without the options we just have the LEAK SUMMARY section without the prior details e.g. where memory is allocated. If we observe the details we see that three line numbers seem to be the culprits of memory allocation that is not released i.e. from file dlist.h, lines 17 and 19; and from main.c, line 6. Looking at the code located at the mentioned lines:

//dlist.h
str_node_t *node = (struct StringNode*)malloc(sizeof(struct StringNode)); // Line 17
node->next, node->prev = NULL, NULL;
node->content = (char*)malloc(sizeof(char) * str_len); // Line 19
//main.c
str_list_t *mylist = malloc(sizeof(struct StringList)); //6
mylist->head = NULL;
mylist->tail = NULL;

Interesting ๐Ÿค”, for all the lines, we see that memory allocation is happening from the heap. For the lines in dlist.h, they are allocating memory to the nodes in the list; in main.c the program is allocating memory to the list. So this makes us know what to free when our program about to exit. From the report we see that valgrind correctly labelled the memory in the heap summary.

So we just need to clean all the nodes we allocated and the list itself. The function I used was this:

void free_mem(str_list_t *list) {
    if(list != NULL) {
        str_node_t * node = list->head;
        if(node != NULL) {
            while(node->next != NULL) {
                free(node->content);
                node = node->next;
                free(node->prev);
            }
            free(node->content);
            free(node);
            free(list);
        }
    }
}

So the output from valgrind report would look like this:

==25560== HEAP SUMMARY:
==25560==     in use at exit: 0 bytes in 0 blocks
==25560==   total heap usage: 12 allocs, 12 frees, 1,224 bytes allocated
==25560== 
==25560== All heap blocks were freed -- no leaks are possible

Hooray ๐ŸŽ‰๐ŸŽ‰, no memory leaks. I have left out the report on uninitialized values, because I am still sorting it out and trying to see its implication on the program.

Well, this was an exciting exercise and it even makes us appreciate languages like Python, Go, Java, which have garbage collection ๐Ÿ˜‚๐Ÿ˜‚๐Ÿ˜‚.

The code can be found here.

Kwaheri.