Brain Dump

C

Is a low level programming language and the standard for serious systems programming due to most kernels exposing their API through C.

Table 1: Complementary programs to assist with C development (some may require building with -g).
Program | Description
gdb | An interactive debugger with support for breakpoints and introspection
valgrind | A suite of tools for debugging and profiling (e.g. memory leak detection)
ltrace | Monitors and prints the library calls (and optionally the system calls) made by a program
strace | Like ltrace but for system calls and signals; it can also tamper with system calls (e.g. inject faults) to assist in debugging

Language

Pre-processor

The pre-processor is a program that processes c-code before the compiler and linker can get to it. This is used to add macro evaluations and dynamic extensions to the language.

#define

Allows you to define a method or symbol that's substituted verbatim when called in c.

/* Defining and using a symbol. */
#define MAX_LENGTH 10
char buffer[MAX_LENGTH];

/* Creating a function that's really an expression */
#define min(a,b) a < b ? a : b
void foo() {
    int x = 4;
    if (min(x++, 5))
	printf("%d is six", x);
}

Warn: The C pre-processor performs simple textual substitution. Using the min macro defined above, min(x++, 5) is expanded to x++ < 5 ? x++ : 5. There are two occurrences of x++, so x is incremented twice and ends up as 6. With a regular function the argument would be evaluated once and its value passed in, leading to only one increment, but because this is a macro it happens twice.
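
A common way to soften the problem described above is to parenthesise every argument and the whole macro body, or to use a real function instead. The sketch below contrasts the three approaches; MIN_UNSAFE, MIN_SAFE, min_int and the two *_double_min helpers are illustrative names, not part of any library.

```c
/* Same shape as the macro above: no parentheses at all. */
#define MIN_UNSAFE(a, b) a < b ? a : b

/* Parenthesised version: safe inside larger expressions, but an
 * argument with side effects is still evaluated twice. */
#define MIN_SAFE(a, b) ((a) < (b) ? (a) : (b))

/* A real function evaluates each argument exactly once. */
int min_int(int a, int b) {
    return a < b ? a : b;
}

/* Expands to (2 * a < b) ? a : b, which is not twice the minimum. */
int unsafe_double_min(int a, int b) {
    return 2 * MIN_UNSAFE(a, b);
}

/* Expands to 2 * ((a) < (b) ? (a) : (b)), as intended. */
int safe_double_min(int a, int b) {
    return 2 * MIN_SAFE(a, b);
}
```

With a = 3 and b = 5 the unparenthesised version yields 5 rather than 6, because the multiplication binds tighter than the comparison inside the expanded text.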

#include

The include directive lets you include the contents of other files at the current point. This comes in 2 varieties differentiated by delimiters. A file name delimited by angle brackets <stdio.h> references a system library. Quotes "foo.h" are used to look from the local directory and then the library if a local file is not found.

Note: By convention the .h suffix is omitted when referencing C++ standard library headers. For example #include <iostream> includes the iostream C++ library, but to include stdio.h you must suffix it with .h because it's a C library.

#include <stdio.h>
#include "./my-file.h"

The include directive is extremely basic. For one it doesn't avoid including the same file more than once, which could happen because of transitive dependencies between includes. To get around this most header files are wrapped in header-guards. These are conditionals that check whether an identifier (based on the filename) is defined and only expand the contents if it's not yet defined, defining it in the process.

Warn: This include mechanism is part of why compiling large c projects is so slow. Each include directive is replaced in place by the contents of the file and the nested ifndef etc. are evaluated after substitution leading to drains in compilation speed as the number of dependencies increases.

#ifndef FOO_H
#define FOO_H

void foo();

#endif /* FOO_H */
  • CONVENTION Include Dependencies

    You may end up in a situation where a function in your header file has a dependency on a struct defined in another file (such as, as a parameter). The classic way to solve this is to include the dependency into the header file.

    The convention regarding includes is:

    • If your dependency is required by the header file, place it in the header file AND in the .c file.
    • If your dependency is required by the .c file include it in the .c file.

TODO Conditionals

Functions

c is the standard through which java and many other derived languages define many syntax constructs. This includes methods. A c method is prefixed by its return type then name, then parameter list and lastly body.

int main(void);

int main(void) {
    /* Body of the main function */
}

In the previous example you may notice that main appears twice. The first time we declared that a function called main, returning int and taking no arguments, exists. The second time we defined the function, giving it a concrete implementation.

This separation between declaration and definition is tied to the include and linking mechanism of the C language. Declarations are packed into header files and included by the .c source files that depend on them. For example, you may have a linked list struct that's needed by a few files, but those files only need its interface/API to compile properly (ensuring the referenced fields exist, the proper procedures are called, etc.). They don't need to include and compile the definitions of those procedures, only the guarantee that they exist and can be called at runtime with specific types. C source files are compiled into intermediate object files and then linked together. It's at the linking stage that we ensure everything that was declared (and used) has an implementation, which could come from C, a library or some other object representation. This independence between what something is written in and how it can be called is part of the longevity of the C language.

Note: The void parameter shown above is kind of an outlier. It indicates that the function takes no arguments. Omitting the void (example: int main()) declares a function that takes an unspecified number of parameters of unknown types (this changed in C23, where an empty list also means no arguments). Warn: In C++ an empty parameter list is the same as saying the function takes no arguments.

Inlined

Is a way to make the C compiler replace function calls with a series of expressions equivalent to calling the function, but without the overhead of an extra stack-frame and the other consequences of a function call.

static inline int max(int a, int b) {
    return a > b ? a : b;
}

Now any call to max(x, y) can be replaced with x > y ? x : y in the calling function. This is much like the earlier min macro except the arguments are only evaluated once. Therefore max(x++, y) is equivalent to tmp = x; x = x + 1; tmp > y ? tmp : y.

Note: inline is only a hint. You generally don't need to do this manually, the compiler is usually smart enough to detect when inlining is worthwhile and does so automatically.

To inline a function, its definition must be available at compile time. This is why most inline functions are both declared and defined in header files (usually as static inline, so every file that includes the header gets its own copy), and why some devs prefer header-only libraries.

Variadic Arguments

C has tangential support for variadic arguments through the stdarg.h library. After including this library you can set a parameter for a function to ... and can then supply an arbitrary number of arguments.

double average(int count, ...) {
    va_list ap;
    int j;
    double sum = 0;

    va_start(ap, count); /* Requires the last fixed parameter (to get the address) */
    for (j = 0; j < count; j++) {
	sum += va_arg(ap, int); /* Increments ap to the next argument. */
    }
    va_end(ap);

    return sum / count;
}

Function Pointers

Are pointers to functions. They can be passed as arguments/parameters and called as variables.

// A pointer, fp, to a function returning an int and taking an int argument.
int (* fp)(int);

With a function pointer, we can call the pointer to call the pointed to function.

void foo(const char *str) { printf("Foo: %s\n", str); }

int main() {
    void (*my_func)(const char *);
    // Assign our function pointer to a value.
    my_func = foo;
    // Call our function through the pointer.
    my_func("Hello"); // => Foo: Hello
}
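
Passing a function pointer as an argument is how the standard library's qsort lets the caller decide the ordering. A minimal sketch, where compare_ints and sort_ints are hypothetical helper names:

```c
#include <stdlib.h>

/* Comparison callback in the shape qsort expects. */
int compare_ints(const void *lhs, const void *rhs) {
    int a = *(const int *)lhs;
    int b = *(const int *)rhs;
    return (a > b) - (a < b); /* avoids the overflow a - b could cause */
}

/* Hypothetical wrapper: sorts count ints in ascending order. */
void sort_ints(int *values, size_t count) {
    qsort(values, count, sizeof values[0], compare_ints);
}
```

The name compare_ints is passed without parentheses; like an array, a function name used as a value decays to a pointer to that function.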

Strings

Strings in C differ from a lot of modern languages in that they're null-terminated instead of length-prefixed. In practice this means to store the 5 character string "Hello" you have to allocate 6 bytes (including the trailing null character '\0').

Whenever you define a string literal (of the form char *str = "constant") the string is stored in the data section of the executable and is read-only. This means in most cases when you define the same string literal more than once, it'll actually point to the same underlying address in memory.

const char *foo = "hello world";
const char *bar = "hello world";
printf("%p == %p", foo, bar); // 0x55c8ceab3004 == 0x55c8ceab3004

You can also declare strings to be in the writable data segment or stack by using character-arrays. These contain literal values that have been copied from the code segment into either the stack or static memory.

char foo[] = "hello world";
char bar[] = "hello world";
printf("%p == %p", foo, bar); // 0x7ffee2fb8e68 == 0x7ffee2fb8e58
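
The extra byte for the terminator matters most when copying strings by hand. The sketch below duplicates a string on the heap; copy_string is a hypothetical helper (POSIX systems already offer strdup for this).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src, or NULL on failure.
 * The caller owns the result and must free it. */
char *copy_string(const char *src) {
    size_t len = strlen(src);    /* length excluding the terminator */
    char *dst = malloc(len + 1); /* +1 byte for the trailing '\0' */
    if (dst == NULL) {
        return NULL;
    }
    memcpy(dst, src, len + 1);   /* copies the terminator as well */
    return dst;
}
```

Note that sizeof("Hello") is 6 while strlen("Hello") is 5: sizeof counts the terminator, strlen does not.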

Conditionals

c has basic if/else if/else constructs and more expressive switch/case statements.

if (SOME_CONDITION) {
    /* Some logic/expression */
} else if (ANOTHER_CONDITION) {
    /* Some other logic/expression */
} else {
    /* Fallback logic for when nothing else matched */
}

Switch cases take a single argument and then compare it against a bunch of cases running the body of the matching case. With c-style switches if the body doesn't end with a break statement control seeps down to the next case and so on and so forth until it is broken or the expression ends.

switch (1) {
case 1:
    puts("1");
    // Control falls through to the next case because there's no break
case 2:
    puts("2");
    break;
default:
    puts("fallthrough");
    break;
}

Note: The fall-through feature of case statements, while often chided, can result in some highly elegant algorithms. Case in point: Duff's device.

Loops

c has for/while/do-while loops.

The for loop follows the same style you're most likely accustomed to, with the header containing (one or more initialisation terms; a loop condition; an update expression).

for (int i = 0; i < 10; i++) {
    // Do 10 times.
}

Note: Each of the sections of the loop can be omitted. When a condition is omitted the loop just runs forever (much like while (true)) until something in the body terminates it.

for (;;) {
    // Done until we break.
}

While loops run while the condition in the parentheses is true. This is equivalent to our previous for loop example, combining the increment and condition steps into one expression.

int i = 0;
while (i++ < 10) {
    // Also done 10 times.
}

A do-while loop is identical to a while loop except it always runs the body at least once. You can think of it as running the body once and then a while loop with the same body and condition. This is convenient when the cause of termination for the loop is part of the body of the loop. For example:

int i;
do {
    i = 10;
    // Done once: the condition is already false after the first pass,
    // but the do-while loop always runs the body at least once.
} while (i++ < 10);

The break keyword lets you terminate out of the inner-most loop immediately.

while (1) {
    while (2) {
	break; /* Breaks out of while(2) */
    } /* Jumps here */
    break; /* Breaks out of while(1) */
} /* Continues here */

continue is similar to break except instead of ending the loop, it jumps forward to the next iteration of it. For example the following prints all the odd numbers between 0 and 10 inclusive.

for (int i = 0; i <= 10; i++) {
    if (i % 2 == 0)
	// Skip all the even numbers.
	continue;
    printf("%d\n", i);
}

Type Modifiers

While c lacks the same level of control as c++ and java (public, private, protected, etc.) it does have a few modifiers which control how we can interact with a variable or argument.

const

The const modifier is a language level construct that tells the compiler that this data should remain constant. Attempts to modify a const variable lead to a compile error.

The semantics of const differ depending on whether it comes before or after the type and whether the type is a pointer or not.

const int i = 0; // Same as "int const i = 0"
char *str = ...; // Mutable pointer to a mutable string
const char *const_str = ...; // Mutable pointer to a constant string
char const *const_str2 = ...; // Same as above
const char *const const_ptr_str = ...; // Constant pointer to a constant string

A mutable pointer is a pointer that can be made to point somewhere else, however if it points to something constant (example: a const int) that data cannot be changed through it. The general idea is if const precedes * then the pointer is mutable but the value isn't, and if const comes after * the value is mutable but the pointer isn't. When const both precedes and follows *, neither can be changed.

Note: When you define a constant pointer you must assign it, because it can't be changed later.

It might help to visualise the meaning here:

int a = 5, b = 10, c = 15;

/* ---------------------------
|  pointer to constant int.  |
--------------------------- */
const int* foo = &a;

// *foo = 6;        // the value of a can't be changed through the pointer.
foo = &b;           // the pointer foo can be changed (point somewhere else).

/* ---------------------------
|  constant pointer to int.  |
--------------------------- */
int *const bar = &c;

*bar = 16;            // the value of c can be changed through the pointer.
// bar = &a;          // not possible because bar is a constant pointer.

extern

Is a keyword telling the compiler that a variable is defined in another object file or a library. Use this when the variable is needed but not exposed through a header file or some other mechanism.

// file1.c
extern int panic;

void foo() {
    printf("%d\n", panic);
}

// file2.c
int panic = 1;

restrict

Hints to the compiler that for the lifetime of the pointer only the pointer itself or a value derived directly from it (example: pointer + 1) will be used to access the object to which it points. Essentially it declares that this particular memory region doesn't overlap with the regions accessed through any other pointer. It also tells users of the function that it is undefined behaviour if the memory regions overlap.

An example of this is memcpy:

void* memcpy(void * restrict dest, const void * restrict src, size_t bytes);

signed/unsigned

Select whether an integer type is signed or unsigned. By default integers are signed and you must use unsigned to declare them otherwise, but the signed keyword can still be useful where you want to be explicit, such as below (a bare signed means signed int, and whether plain char is signed is up to the implementation).

Note: unsigned only applies to integer-like primitives such as char, int or long.

int count_bits_and_sign(signed representation) {
    // ...
}

static

Has different meanings depending on whether it's used on a variable within a function or on a variable or function in the global scope. In the former case it gives the variable static storage duration, meaning it's allocated and initialised once at program startup and its value persists across function calls until the end of the program. In the latter case it means that the visibility of the variable or function is limited to the current file.

// visible to this file only
static int i = 0;

static int _perform_calculation(void) {
    // ...
}

char *print_time(void) {
    // Shared across every call to this function
    static char buffer[200];
    // ...
}
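
The function-local meaning of static is easiest to see with a counter that survives between calls. A minimal sketch, where next_id is a hypothetical name:

```c
/* Returns 1 on the first call, 2 on the second, and so on.
 * counter is initialised once and keeps its value between calls. */
int next_id(void) {
    static int counter = 0;
    counter += 1;
    return counter;
}
```

Because the state is shared by every caller, functions written this way are not re-entrant or thread-safe without extra care.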

volatile

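Tells the compiler that a value may change in ways it cannot see, so every read and write of the variable must actually be performed rather than optimised out.

For example in the following snippet the compiler may optimise the while loop into while (1) because nothing in the loop body changes flag, even though whoever received the pointer in pass_flag (a signal handler, say) may modify it later.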

int flag = 1;
pass_flag(&flag);
while (flag) {
    // Do things unrelated to flag
}

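Declaring volatile int flag = 1; forces the compiler to keep the variable and re-read it on every check. This is useful for memory-mapped hardware registers and for flags set by signal handlers. Warn: volatile doesn't make accesses atomic or ordered between threads, for multi-threaded programs prefer the C11 <stdatomic.h> types or a lock.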
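
A flag shared with a signal handler is one of the classic uses. The sketch below uses raise as a stand-in for a real Ctrl-C so that it terminates on its own; keep_running, on_sigint and count_until_interrupted are hypothetical names.

```c
#include <signal.h>

/* volatile sig_atomic_t is the type the C standard allows a signal
 * handler to write safely. */
static volatile sig_atomic_t keep_running = 1;

static void on_sigint(int signo) {
    (void)signo;
    keep_running = 0;
}

/* Loops until SIGINT arrives; raises it itself after limit iterations.
 * Returns the number of iterations, or -1 if the handler can't be set. */
int count_until_interrupted(int limit) {
    int iterations = 0;

    keep_running = 1;
    if (signal(SIGINT, on_sigint) == SIG_ERR) {
        return -1;
    }
    while (keep_running) { /* re-read on every pass thanks to volatile */
        iterations++;
        if (iterations == limit) {
            raise(SIGINT); /* stand-in for an external Ctrl-C */
        }
    }
    return iterations;
}
```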

Enumerations

Is a type that can take on one of a finite set of named values. Use of an enumeration brings some compile time safety when referencing values of that enumeration (such as in case statements).

enum day {
    monday,
    tuesday,
    wednesday,
    thursday,
    friday,
    saturday,
    sunday
};

You can explicitly assign values to enum members (otherwise numbering starts at 0 and each member is one more than the previous, so reordering the members silently renumbers them). Different members may even share the same value.

enum day {
    monday = 0,
    tuesday = 0,
    wednesday = 0,
    thursday = 1,
    friday = 10,
    saturday = 10,
    sunday = 0
};

You reference enum types like structs, by explicitly prefixing the enum name with enum.

enum day someDay = monday;
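
Enumerations pair naturally with switch statements. A small sketch using the first enum day definition from above (repeated here so the block stands alone); is_weekend is a hypothetical helper:

```c
enum day { monday, tuesday, wednesday, thursday, friday, saturday, sunday };

/* Returns 1 for saturday and sunday, 0 for every other day. */
int is_weekend(enum day d) {
    switch (d) {
    case saturday: /* no break: shares the body of the next case */
    case sunday:
        return 1;
    default:
        return 0;
    }
}
```

If the default branch is dropped, compilers such as GCC and Clang can warn about enum members the switch forgot to handle.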

Unions

Is a construct similar in concept to a struct except, whereas a struct allocates enough memory to hold all of its members, a union allocates enough memory to hold only its largest member. At any point the union holds one valid value.

A common use case for a union is specifying one or more optional types that might be available (alongside an enum to declare which one is currently in use) or to provide an alternative interface for the data in a struct.

typedef struct {
    union {
	struct {
	    double x;
	    double y;
	    double z;
	};
	double raw[3];
    };
} vec3d_t;

vec3d_t v;
v.x = 4.0;
v.raw[1] = 3.0; // Equivalent to v.y = 3.0
v.z = 2.0;

union pixel {
    struct values {
	char red;
	char blue;
	char green;
	char alpha;
    } values;
    // Each char is 8 bits, with 4 of them it
    // takes up 32 bits which is equivalent to the
    // sizeof the values struct.
    uint32_t encoded;
};

union pixel a;

// When modifying or reading
a.values.red;
a.values.blue = 0x0;

// When writing to a file
fprintf(picture, "%" PRIu32, a.encoded); // PRIu32 comes from <inttypes.h>
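
The first use case mentioned above, a union alongside an enum that records which member is in use, is often called a tagged union. A minimal sketch; value_kind, value and value_to_double are hypothetical names:

```c
/* The tag records which union member currently holds a valid value. */
enum value_kind { VALUE_INT, VALUE_DOUBLE };

struct value {
    enum value_kind kind;
    union {
        int i;
        double d;
    } as;
};

/* Reads whichever member the tag says is active. */
double value_to_double(struct value v) {
    switch (v.kind) {
    case VALUE_INT:
        return (double)v.as.i;
    case VALUE_DOUBLE:
        return v.as.d;
    }
    return 0.0; /* unreachable for a valid tag */
}
```

The language doesn't enforce the pairing, so it's up to the code to update kind whenever it writes a different member.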

Pointers

Is a value that points to some address in memory. If you think of memory as a continuous tape of data then the idea of a pointer being a number (offset) from the start of that tape might be easier to rationalise.

In C we access the address of things using the & operator and we dereference a pointer to its pointed to value using the * operator.

int foo = 5;
int *foo_ptr = &foo; // Create a reference
int foo_val = *foo_ptr; // Dereference the value
*foo_ptr = 1; // Update through the pointer

Pointers are how C supports pass-by-reference, with arguments being pass-by-value by default. At a high level this means that if you supply a value to a function it's copied for use by that function, and modifying the copy doesn't do anything to the original value. To allow functions to modify their arguments they must be passed as pointers.

void pass_by_val(int val) {
    val = 2;
}

void pass_by_ptr(int *ptr) {
    // Dereference and then assign the value pointed to by ptr
    *ptr = 2;
}

int num = 1;
printf("%d\n", num); // 1
pass_by_val(num);
printf("%d\n", num); // 1
pass_by_ptr(&num);
printf("%d\n", num); // 2

Pointer Arithmetic

The type of a pointer doesn't affect its size in memory, instead it determines how much to increment a pointer by. Incrementing a pointer moves the address pointed to forward by a set number of bytes (the sizeof the type of the pointer).

For example you can iterate through the characters of a string by moving a pointer through it until you reach the null character. This approach is so commonly used that it's the basis for terminating C strings with null. Otherwise developers would always have to know and supply the length of the string (as many library functions do with arrays).

char *ptr = "hello";
for (int i = 0; *ptr != '\0'; ptr++, i++) {
    printf("char %d: %c\n", i, *ptr);
}

// char 0: h
// char 1: e
// char 2: l
// char 3: l
// char 4: o

Void Pointers

A special case in the use of pointers is the void*. This is a pointer with an unassociated type that can be type-cast to and from any other pointer type.

C will implicitly convert void* to and from any other object pointer type without a cast. This is the return type of the malloc function and is why you can assign the result of malloc to any pointer type (see memory allocation).

Null Pointers

Is a pointer that points nowhere. In C this is defined as NULL.

Arrays

Arrays represent a contiguous run of elements in memory, all of the same type. In most expressions the name of an array behaves like a pointer to its first element, and the subscript syntax is sugar for pointer arithmetic on that address.

Note: The syntax arr[x] is equivalent to *(arr+x).

int arr[] = {1, 2, 3};
int *ptr = arr;
printf("%p = %p\n", (void *)arr, (void *)ptr); // 0x7ffed20c9498 = 0x7ffed20c9498
printf("%d = %d\n", arr[0], *ptr);     //   1 = 1
printf("%d = %d\n", arr[1], *(ptr+1)); //   2 = 2
printf("%d = %d\n", arr[2], *(ptr+2)); //   3 = 3

Note: Arrays automatically decay to a pointer to their first element. In the example above we assign arr to ptr without any issue.

Sizeof Arrays

The sizeof an array is the size of all the elements taken up by the array.

#define SIZE 10
int* ip = malloc(sizeof(int) * SIZE);
int ia[SIZE];

sizeof(ip); // 8 bytes (64 bit pointer, this is always hardware dependent)
sizeof(ia); // 40 bytes (sum of the sizes of each element in array (4 bytes per int * 10 ints))

This gives us a rudimentary approach for counting the number of elements in an array. Since each element has a fixed size and we can calculate the size of all elements, we simply divide one by the other.

int size = sizeof(ia) / sizeof(ia[0]);
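
This trick only works while the array type is still visible. Once an array is passed to a function it has decayed to a pointer and sizeof reports the pointer size instead. A sketch, with ARRAY_LEN and size_inside_function as hypothetical names:

```c
#include <stddef.h>

/* Element count of a true array (not of a pointer). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* The parameter is a pointer, so this is sizeof(int *),
 * no matter how large the caller's array was. */
size_t size_inside_function(int *arr) {
    return sizeof(arr);
}
```

This is why functions that take arrays usually take the element count as a separate parameter.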

Array Reassignment

Another distinction between arrays and pointers is that arrays cannot be reassigned or made to point elsewhere like pointers. Once an array is declared it must always point to the same location.

int arr1[] = {1, 2, 3};
int arr2[] = {1, 2, 3};
int *ptr = arr1;

arr1 = arr2; // Compiler error array type 'int [3]' is not assignable
ptr = arr2;

This means that pointers are more flexible than arrays, even though the name of an array decays to a pointer to its starting address.

Structs

Are the construct used to group multiple values of different types together into a new type.

In C a struct is a contiguous region of memory whose members can each be accessed as if they were separate variables.

struct hostname {
    const char *port;
    const char *name;
    const char *resource;
};

You declare a variable of a struct type by explicitly specifying its type and then assigning each struct member separately. Newer versions of c allow static initialisation of each field alongside the variable declaration.

// Assign each individually
struct hostname facebook;
facebook.port = "80";
facebook.name = "www.facebook.com";
facebook.resource = "/";

// You can use static initialization in later versions of c
struct hostname google = {"80", "www.google.com", "/"};

Padding

While structs take up a contiguous region of memory, it might not be possible to pack the members back to back, so that one member ends exactly where the next starts. There might be padding between members (and after the last one) to ensure each member is memory aligned (starts at an address that's a multiple of its type's alignment requirement).
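
You can observe padding with sizeof and offsetof. The sketch below compares two orderings of the same members; the struct and function names are hypothetical and the exact sizes are implementation-defined.

```c
#include <stddef.h>
#include <stdio.h>

/* char, int, char: padding is typically needed before b and after c. */
struct padded {
    char a;
    int b;
    char c;
};

/* Same members, widest first: the two chars can share one slot. */
struct reordered {
    int b;
    char a;
    char c;
};

/* Prints the sizes and the offset of b so the padding is visible. */
void print_layout(void) {
    printf("padded:    size=%zu, offset of b=%zu\n",
           sizeof(struct padded), offsetof(struct padded, b));
    printf("reordered: size=%zu\n", sizeof(struct reordered));
}
```

On a typical x86-64 build this reports 12 bytes for padded and 8 for reordered, but other platforms may differ.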

Sizeof

Is an operator, evaluated at compile-time (except for variable-length arrays), that returns the number of bytes that a type or expression occupies. This could calculate the size of a struct or a builtin. The operand itself is not evaluated, only its type is inspected.

char a = 0;
printf("%zu", sizeof(a++)); // Prints 1, and a is still 0 afterwards

This operator requires the complete definition of the type at compile-time, not link-time, because the compiler has to replace the expression with a constant while compiling that file. This is commonly why complete struct definitions are placed into header-files.

Labels & Goto

Is a keyword allowing unconditional jumps. First you set an assembly style label (example: label:) and then at any point in the same function you can jump forwards or backwards to it.

void setup(void) {
    Doe *deer;
    Ray *drop;
    Mi *myself;

    if (!setupdoe(deer)) {
	goto finish;
    }
    if (!setupdrop(drop)) {
	goto cleanupdoe;
    }
    if (!setupmi(myself)) {
	goto cleanupray;
    }

    perform_action(deer, drop, myself);
    cleanup(myself); /* Only reached when all three setups succeeded */

cleanupray:
    cleanup(drop);
cleanupdoe:
    cleanup(deer);
finish:
    return;
}

Type Aliases

Is a way to declare an alias for a type.

For example you can refer to a float as real.

typedef float real;
real gravity = 10;

This is most commonly used with structs or enums to avoid having to repeatedly prefix the type with struct or enum.

typedef struct link_t {
    char* url;
    int port;
} link_t;

// You can use either the style with struct or without.
struct link_t lk1;
link_t lk2;

Note: You don't need to specify a name for the struct. Instead you could typedef an anonymous struct to make sure it can only be accessed without prefixing it with struct.

typedef struct {
    char* url;
    int port;
} link_t;

link_t lk;

Operators

Table 2: List of available operators, roughly in order of precedence (highest first).
Operator | Name | Description
[] | Subscript | Retrieves the nth element of an array, a[n] == *(a + n)
-> | Structure Dereference | Access a struct member x through a pointer p as p->x
. | Structure Reference | Access the member x of object a as a.x
+a, -a | Unary Plus, Unary Minus | Keep or negate the sign of the integer or float a
*p | Dereference | Access the element located at the address pointed to by p
&a | Address-Of | Return the address of the element a
++ | Increment | Can be used postfix or prefix to increment a variable by 1
-- | Decrement | Same as the increment operator except it decrements the variable
! | Unary Logical Negation | Inverts the boolean expression to its right
~ | Unary Bitwise Negation | If a bit is set in the input it's not set in the output
sizeof | Sizeof | Fetch the size of a type or expression
+, -, *, %, / | Arithmetic Binary | Add, subtract, multiply, mod, divide an element by another (*, % and / bind tighter than + and -)
<<, >> | Bit Shift | Left or right shift the bits in an integer by n bits
>, < | Greater/Less Than | True if the left operand is greater/less than the right
==, != | Equal To, Not Equal To | True if the operands are equal/not equal
& | Bitwise AND | If a bit is set in both the operands it's set in the output
| | Bitwise OR | If a bit is set in either of the operands it's set in the output
&& | Logical And | True if both the left and right operands are true
|| | Logical Or | True if either the left or right operand is true
x ? y : z | Ternary | If x then return y else return z
a, b, ..., z | Comma | Evaluates a, then b, etc. and returns the last entry z

Note: bit shift handles signs uniquely. Left shifting always introduces zeros on the right (left shifting a negative value is undefined behaviour). If the left operand is signed and negative then the result of right shifting is implementation-defined, most compilers introduce ones on the left (an arithmetic shift). In all other cases zeroes are introduced.
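
Sticking to unsigned types sidesteps these sign rules entirely. The sketch below combines the shift and bitwise operators from the table to manipulate single bits; set_bit, clear_bit and test_bit are hypothetical helper names and n is assumed to be below 32.

```c
#include <stdint.h>

/* Returns value with bit n set. */
uint32_t set_bit(uint32_t value, unsigned n) {
    return value | (UINT32_C(1) << n);
}

/* Returns value with bit n cleared. */
uint32_t clear_bit(uint32_t value, unsigned n) {
    return value & ~(UINT32_C(1) << n);
}

/* Returns 1 if bit n of value is set, otherwise 0. */
int test_bit(uint32_t value, unsigned n) {
    return (int)((value >> n) & 1u);
}
```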

Sequence Points

Are points in code where the spec guarantees that all side effects of previous evaluations have been performed and no side effects of subsequent evaluations have happened yet.

This is relevant in expressions like foo(x++, x-- - ++y) where the order of evaluation is unspecified and x is modified twice with no sequence point in between, which is undefined behaviour.

TODO: describe.

Data Types

Table 3: A list of the standard C-types and their expected size. Note: An alignment of x means the address of the data-type must be divisible by x. Alignment is implementation-defined, the values below are typical.
Type | [Minimum] Guaranteed Size | Typical Alignment (Byte) | Description
char | 1 byte | 1 | The number of bits in a byte might vary but the remaining types assume 8-bits = 1-byte
short (short int) | >= 2 bytes | 2 |
int | >= 2 bytes, normally 4-bytes | normally 4 |
long (long int) | >= 4 bytes, sometimes 8-bytes | 4 or 8 |
long long | >= 8 bytes | normally 8 |
float | normally 4-bytes | normally 4 | IEEE-754 single precision floating point number
double | normally 8-bytes | normally 8 | IEEE-754 double precision floating point number

Note: If you want a fixed width integer type you can use the data types defined in stdint.h which are of the form [u]int<width>_t where the optional u prefix makes the type unsigned and width can be any of 8, 16, 32, 64.
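
Fixed-width types don't have a fixed printf specifier, because the underlying type differs between platforms. The inttypes.h header supplies matching macros such as PRIu64. A small sketch, where format_u64 is a hypothetical helper:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Writes value in decimal into buf (at most size bytes including the
 * terminator) and returns the number of characters it needed. */
int format_u64(char *buf, size_t size, uint64_t value) {
    return snprintf(buf, size, "%" PRIu64, value);
}
```

The macro expands to a string literal (such as "lu" or "llu") that is concatenated with the "%" in front of it at compile time.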

TODO Bit Fields

See bit field.

Memory Allocation

In C you have more control over memory allocation than in most other languages. Ordinarily you'd declare variables in a function and they'd be placed on the stack and released automatically when the function returns. Dynamic memory allocation instead has us manually allocating memory on the heap using malloc and then later freeing that memory using free.

Warn: malloc may return a NULL pointer when it can't allocate the amount of memory requested.

Warn: malloc doesn't zero-out the allocated memory for performance reasons; in most cases you'll immediately initialise the result (for example by filling in a struct), so zeroing would be wasted work. If you'd like the memory zeroed-out, use calloc.

For example here's a common hack that creates a string type with a length field. The c_str array at the end of the struct is a flexible array member and takes up no space of its own (as in sizeof(string) = sizeof(string.length)), but we can allocate extra memory after the struct to store the actual string and then access it like a regular string.

typedef struct {
    int length;
    char c_str[]; /* Flexible array member (C99), must come last */
} string;

const char* to_convert = "person";
int length = strlen(to_convert);

// Let's convert to a length prefixed string.
string* person;
person = malloc(sizeof(string) + length+1);
person->length = length;
strcpy(person->c_str, to_convert);
// ...
free(person); // Done with person, free it now.

Warn: Don't free the same pointer twice. This leads to undefined behaviour. As good practice you should set any pointers to freed memory to NULL instead.
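
The two warnings above (uninitialised memory and double frees) can be wrapped in small helpers. This is only a sketch; alloc_zeroed_ints and free_and_clear are hypothetical names.

```c
#include <stdlib.h>

/* Allocates count ints, all initialised to zero. May return NULL. */
int *alloc_zeroed_ints(size_t count) {
    return calloc(count, sizeof(int));
}

/* Frees *ptr and resets it to NULL so a second call is harmless
 * (free(NULL) is defined to do nothing). */
void free_and_clear(int **ptr) {
    if (ptr != NULL) {
        free(*ptr);
        *ptr = NULL;
    }
}
```

Resetting the pointer only protects that one variable; any other copies of the old address are still dangling.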

Reallocation

You can resize an existing malloc allocation on the heap through the realloc function. This either expands the existing allocation in place or allocates some new memory, copies over the contents of the existing allocation and frees the old one before returning to the user. This is most commonly done to resize memory used to hold an array of values.

Warn: realloc may return a different pointer to the one returned by the original malloc and it may fail just like malloc.

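int *array = malloc(sizeof(int) * 2);
array[0] = 10;
array[1] = 20;
// Oops need a bigger array - so use realloc..
// Keep the old pointer until realloc succeeds, otherwise a failure leaks it.
int *bigger = realloc(array, 3 * sizeof(int));
if (bigger != NULL) {
    array = bigger;
    array[2] = 30;
}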
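
The resize-then-store pattern is usually wrapped in a helper so the failure handling lives in one place. A minimal sketch, assuming the caller tracks the element count itself; append_int is a hypothetical name.

```c
#include <stdlib.h>

/* Appends value to a heap array holding *count ints.
 * Returns 0 on success, or -1 on failure with the array untouched. */
int append_int(int **array, size_t *count, int value) {
    int *bigger = realloc(*array, (*count + 1) * sizeof **array);
    if (bigger == NULL) {
        return -1; /* *array is still valid and owned by the caller */
    }
    bigger[*count] = value;
    *array = bigger;
    *count += 1;
    return 0;
}
```

Starting from a NULL array with a count of 0 works because realloc(NULL, n) behaves like malloc(n). Growing by one element per call is simple but slow, real implementations usually grow the capacity geometrically.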

Reading Declarations

To grok a declaration like int (*fp)(int):

  1. Start from the identifier (fp).
  2. Look to the right, if there is nothing OR a closing parenthesis ), then goto 4.
  3. We're either at an array (opening square-bracket) [ or a function (opening parenthesis) (. Read it as "array of" or "function taking ... and returning", skip past its closing bracket and goto 2.
  4. Return to the starting position and look left. If there is nothing OR an opening parenthesis goto step 6.
  5. You are now positioned on a pointer descriptor, "*". Read it as "pointer to" and keep looking left (step 4).
  6. At this point you have either a parenthesized expression or the complete declarator. If you have a parenthesized expression, consider it as your new starting point and return to step 2. Otherwise finish by reading the type on the far left.
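
Applying the steps above to a few declarations, with the reading written out as a comment above each. The names twice, negate, table and pick are made up for illustration.

```c
int twice(int x) { return 2 * x; }
int negate(int x) { return -x; }

/* fp: pointer to a function taking (int) and returning int. */
int (*fp)(int) = twice;

/* table: array of 2 pointers to functions taking (int) and returning int. */
int (*table[2])(int) = { twice, negate };

/* pick: function taking (int) and returning a pointer to a function
 * taking (int) and returning int. */
int (*pick(int index))(int) {
    return table[index];
}
```

A typedef such as typedef int (*int_fn)(int); makes the last two far easier to read: int_fn table[2]; and int_fn pick(int index);.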

Bizarre Declarations

Warn: This will probably traumatise you.

// A function taking a 3D array (example arr[5][7][3]) and returning
// a pointer to the first element of the array while maintaining the
// dimension information for the 2nd and 3rd dimensions.
//
// In memory arr[5][7][3] is packed into sizeof(double)*5*7*3 and the
// address of the array is a pointer to the first element. Returning
// the array while discarding the first dimension, references the same
// memory, but forgets that at the end of the first arr[0][7][3] there's
// 4 more arrays of the same size.
double (*fn(double a[][7][3]))[7][3] { return a; }

Standard Library

The POSIX standard specifies a set of portable conventions that you can rely on being present on any POSIX compliant system. Note: Many of the C functions described in this section are actually wrappers that call the underlying system call based on the current platform.

One core aspect of this standard is the idea that everything is a file. While somewhat outdated, and strictly speaking wrong, this convention is still in use today and essentially means that everything is represented by a file descriptor (an integer) referencing an open file handle.

int file_fd = open(...);
int network_fd = socket(...);
int kernel_fd = epoll_create1(...);

Internally a file descriptor is just an index into the file descriptor table the operating system keeps for each process; entries can be allocated, opened, closed, etc. You open and interact with these objects through the API specified by system-calls and library functions which take the file descriptor as input. Note: Each running program normally starts with file descriptors 0 through 2 set to stdin, stdout and stderr.
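
Because every such object is just a descriptor, the same read, write and close calls work on all of them. The sketch below pushes a message through a POSIX pipe; pipe_round_trip is a hypothetical helper and the message is assumed to be small enough to fit in the pipe buffer.

```c
#include <unistd.h>

/* Writes len bytes of msg into a pipe and reads them back into out.
 * Returns the number of bytes read, or -1 on error. */
ssize_t pipe_round_trip(const char *msg, size_t len, char *out, size_t out_size) {
    int fds[2]; /* fds[0] is the read end, fds[1] is the write end */
    ssize_t written;
    ssize_t got = -1;

    if (pipe(fds) == -1) {
        return -1;
    }
    written = write(fds[1], msg, len);
    close(fds[1]); /* done writing, the reader will see end-of-file after the data */
    if (written >= 0) {
        got = read(fds[0], out, out_size);
    }
    close(fds[0]);
    return got;
}
```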

errno.h

Is the error handling library for C and exposes the errno variable which is set by system calls and some library functions in the event of an error to indicate what went wrong.

Note: errno is thread-safe. Each thread gets its own copy of errno (typically kept in thread-local storage).

Usage of errno involves calling a function that could fail and, if that function reports an error (i.e. -1 for most system-calls, and -1 or NULL from most library functions), checking errno and handling the error as appropriate. When you don't know what an errno value means you can use perror to print a human-readable description of it.
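
A minimal sketch of that check-then-inspect pattern, using open as the failing call. try_open is a hypothetical helper; errno is copied straight away because later calls (even fprintf) may overwrite it.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Tries to open path read-only.
 * Returns 0 on success, or the errno value describing the failure. */
int try_open(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        int saved = errno; /* copy before anything else can change it */
        fprintf(stderr, "open %s: %s\n", path, strerror(saved));
        return saved;
    }
    close(fd);
    return 0;
}
```

strerror returns the same text perror would print, which is handy when the message needs to go somewhere other than stderr.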

stdio.h

Outputting

There are 3 main write functions exposed by POSIX and the C standard library.

  • printf formats its arguments (as sprintf does) and writes the result to standard output, which can be buffered (see Buffering Concepts). In short, stderr is normally unbuffered, while stdin and stdout are line-buffered when the stream is connected to a terminal and fully buffered otherwise.
  • puts and putchar allow you to print strings and single characters verbatim.
  • write is the low-level system call that writes n bytes from a buffer to the file associated with a file descriptor.

The stdio functions default to writing to stdout; however, there are variants prefixed with f (for example fprintf, fputs and fputc) that let you specify the stream (a FILE *) to write to. write always takes an explicit file descriptor.
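A minimal sketch of the three layers side by side follows. The helper name emit_three_ways is an assumption for illustration; the calls themselves are the standard stdio and POSIX ones.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: emit the same message through three layers.
 * Returns the number of bytes handed to write(), or -1 on error. */
ssize_t emit_three_ways(const char *msg) {
    printf("%s\n", msg);  /* formatted and buffered, goes to stdout */
    fputs(msg, stderr);   /* verbatim, goes to an explicit stream */
    fputc('\n', stderr);
    fflush(stdout);       /* drain stdio before using the raw descriptor */
    return write(STDOUT_FILENO, msg, strlen(msg));
}
```

The fflush call matters when mixing layers: without it, buffered printf output may appear after the bytes sent through write.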

Table 4: List of format specifiers supported by the printf function.
Format  Meaning  Description
%s      String   Keep printing characters until the NUL character ('\0') is reached
%d      Integer  Print the argument as a signed decimal integer
%p      Pointer  Print the argument (a void *) as a memory address
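The specifiers in the table can be tried out with snprintf, which formats into a buffer instead of printing. The helper names describe and describe_ptr are assumptions for this sketch.

```c
#include <stdio.h>

/* Hypothetical helper: format a name and a count into `out` with %s and %d.
 * Returns what snprintf returns: the length the full text needs. */
int describe(char *out, size_t size, const char *name, int count) {
    return snprintf(out, size, "%s has %d items", name, count);
}

/* %p expects a void *, so other pointer types are cast before passing. */
int describe_ptr(char *out, size_t size, const int *p) {
    return snprintf(out, size, "%p", (void *)p);
}
```

Calling describe(buf, sizeof buf, "cart", 3) fills buf with "cart has 3 items". The text produced by %p is implementation-defined, so it's best treated as debugging output only.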

Reading

  • fgets takes a buffer, the buffer's size n, and a file stream. It copies at most n - 1 bytes from the file into the buffer, stopping after a newline, and terminates the result with a NUL character.

    Warn: The older gets function was once widely used for this, but it was deprecated and then removed in C11 because there's no way to limit the length of what is read, which makes buffer overflows easy.

  • getline is a variant that automatically allocates and reallocates a buffer on the heap of sufficient size. This is why it takes a pointer to a char *: the string may be reallocated and the pointer made to point elsewhere if the buffer isn't large enough. You are responsible for freeing the buffer.

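  • scanf lets you use printf style format strings to parse different items from standard input (sscanf and fscanf do the same for a string or a stream). It returns the number of items successfully parsed.

    Warn: Each conversion requires a valid pointer to writable memory of the matching type.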
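The reading functions above can be combined as in the following sketch. The helper names read_pair and read_fixed, and the "word number" line format, are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L /* for getline */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one line of `fp` with getline and parse
 * "<word> <number>" from it with sscanf. Returns the number of items
 * parsed (2 on success), or -1 if no line could be read. */
int read_pair(FILE *fp, char word[32], int *number) {
    char *line = NULL; /* getline allocates (and may reallocate) this */
    size_t cap = 0;
    int parsed = -1;

    if (getline(&line, &cap, fp) != -1)
        parsed = sscanf(line, "%31s %d", word, number);

    free(line);        /* the caller of getline owns the buffer */
    return parsed;
}

/* fgets variant: the caller supplies the buffer and its size. Returns 1
 * if a line was read, 0 on end-of-file or error. */
int read_fixed(FILE *fp, char *buf, int size) {
    return fgets(buf, size, fp) != NULL;
}
```

The width in %31s caps how much sscanf writes into the 32-byte word buffer, which is the same protection fgets gets from its size argument.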

string.h

Provides a series of functions for manipulating and inspecting pieces of memory, mostly c-style strings.

Table 5: List of basic functions exposed through string.h.
Function         Description
strlen           Return the length of a string (excluding the terminating NUL)
strcmp           Return an integer expressing the lexicographical order of 2 strings
strcpy           Copy the string at a source pointer into a destination pointer
strcat           Append the string at a source pointer to the end of the destination string
strdup           Return a malloc'd copy of an input string
strchr           Return a pointer to the first occurrence of a character in a string
strstr           Like strchr but finding the first occurrence of a sub-string
strtok           Split a string into tokens separated by any of a set of delimiter characters
strtol, strtoll  Convert a string to a long int or long long int
memcpy           Copy n bytes from a source address to a destination address
memmove          Like memcpy except it can properly handle overlapping memory regions
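A short sketch using a few of the functions from the table follows. The helper names sum_csv and drop_prefix are assumptions for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: sum the integers in a comma-separated list such as
 * "4,8,15". strtok modifies its input, so the text must be writable. */
long sum_csv(char *text) {
    long total = 0;
    for (char *tok = strtok(text, ","); tok != NULL; tok = strtok(NULL, ","))
        total += strtol(tok, NULL, 10);
    return total;
}

/* Hypothetical helper: remove the first `n` characters of `s` in place.
 * Source and destination overlap, so memmove is used rather than memcpy. */
void drop_prefix(char *s, size_t n) {
    size_t len = strlen(s);
    if (n > len)
        n = len;
    memmove(s, s + n, len - n + 1); /* +1 copies the terminating NUL */
}
```

Note that sum_csv must be given a writable array (for example char list[] = "4,8,15";) rather than a string literal, because strtok overwrites each delimiter with a NUL.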

Creating Libraries

Libraries for C come in two varieties:

  • Static (.a): libraries containing object code that you can link against at build time.
  • Dynamic (.so): dynamically linked shared object libraries. These come in 2 varieties:

    • Dynamically linked at runtime: The libraries must be available during the compile/link phase. The shared objects aren't copied into the executable but must be present whenever it runs.
    • Dynamically loaded/unloaded and linked during program execution using the dynamic linking loader system functions. This can be useful for scripting languages that want to reference C code provided by users or interface with external libraries.

    Note: You can view the libraries a program is linked with using ldd. You can also view the symbols in an object file using nm or readelf.

By convention libraries have a lib prefix, so the zip library would be in a file named libzip.so. We may also suffix the library file with the library version, so libzip.so.1.0.0 is the v1.0.0 release of the zip library. Most UNIX systems simply symlink the major-version and unversioned names to it, so libzip.so.1 and libzip.so both link to libzip.so.1.0.0.

When you link a library into an executable you specify the name of the library without the extension suffix and lib prefix.

gcc src-file.c -lm -lpthread

For example, the command above links the m and pthread libraries: the linker looks for the files libm.{a,so} and libpthread.{a,so} in its library search path (system directories such as /usr/lib, plus any directories given with -L). Warn: Programs linked with dynamic libraries will fail at execution time if the dynamic loader can't find a library they depend on in its default directories or in LD_LIBRARY_PATH.

Note: Libraries contain compiled code, not the interfaces for them. That means you have to supply the interfaces (header files) for a library alongside the library itself. A rudimentary workaround is to declare the prototypes of the library's functions directly in your code and leave the linker to throw errors when they don't match a symbol; however, the standard approach is to supply header files (stored at /usr/include), located through one of a myriad of path variables such as CPATH (see gcc).

void ctest1(int *i) {
    *i = 5;
}
Code Snippet 1: ctest1.c
void ctest2(int *i) {
    *i = 100;
}
Code Snippet 2: ctest2.c
#ifndef CTEST_H
#define CTEST_H

#ifdef __cplusplus
extern "C" {
#endif // __cplusplus

void ctest1(int *);
void ctest2(int *);

#ifdef __cplusplus
}
#endif // __cplusplus

#endif /* CTEST_H */
Code Snippet 3: ctest.h
#include <stdio.h>
#include "ctest.h"

int main() {
    int x;
    ctest1(&x);
    printf("Val x=%d\n", x);

    return 0;
}
Code Snippet 4: main.c

Static Libraries

Static libraries are literal archives of object code made using the ar utility. To create a library you first compile the files into object files and then archive them together.

# Compile the library src files into objects.
gcc -c ctest1.c ctest2.c
# Combine the objects into a library archive.
ar -cvq libctest.a ctest1.o ctest2.o

# Build and compile the main executable.
gcc -c main.c
gcc -o main main.o -L. -lctest
# Run the executable
./main
Code Snippet 5: Example of creating and linking a static library into an executable.

Dynamic Libraries

Shared Object

# Compile the library src files into objects.
# Note: Dynamic libraries require PIC to be enabled.
gcc -fPIC -c ctest1.c ctest2.c
gcc -shared -Wl,-soname,libctest.so.1 -o libctest.so.1.0 ctest1.o ctest2.o
ln -sv libctest.so.1.0 ./libctest.so.1
ln -sv libctest.so.1.0 ./libctest.so

# Build and compile the main executable.
gcc -c main.c
## You can also use the shortform -lNAME when version doesn't matter.
gcc -o main main.o -L. -l:libctest.so.1.0
# Run the executable.
LD_LIBRARY_PATH=.:"$LD_LIBRARY_PATH" ./main

The commands above show the options used to create a shared object dynamic library. The -shared option makes the compiler produce a shared object. The -Wl option makes the compiler pass the comma-separated arguments that follow it to the linker (ld). In this case we pass the soname option, which sets a field in the library that the linker also stores in any executable linked against it. If the library is upgraded and its soname has changed, any executables built with the older library version will fail to run. This gives us a rudimentary level of version checking. In most cases a minor change in the library results in a new release but doesn't change the soname.

Dynamic Loading

This uses libdl to manually load a library at runtime through the dlfcn header.

#include <stdlib.h>
#include <stdio.h>
#include <dlfcn.h>
#include "ctest.h"

int main(int argc, char **argv) {
    void *lib_handle;
    void (*fn)(int *);  /* matches the prototype of ctest1 in ctest.h */
    int x;
    char *error;

    lib_handle = dlopen("./libctest.so", RTLD_LAZY);
    if (!lib_handle) {
        fprintf(stderr, "%s\n", dlerror());
        exit(1);
    }

    dlerror();  /* clear any stale error before looking up the symbol */
    fn = dlsym(lib_handle, "ctest1");
    if ((error = dlerror()) != NULL) {
        fprintf(stderr, "%s\n", error);
        exit(1);
    }

    (*fn)(&x);
    printf("Val x=%d\n", x);
    dlclose(lib_handle);
    return 0;
}
Code Snippet 6: main2.c

To dynamically load a library we must first create the shared object library as before, then load it into our program, and then look up each symbol as we need it.

# Build and compile the main executable.
gcc -rdynamic -o main2 main2.c -ldl
# Run the executable.
./main2

TODO Windows DLLs

http://www.yolinux.com/TUTORIALS/LibraryArchives-StaticAndDynamic.html

Programs

nm

nm lists the symbols in an object file, both those it defines and those it references.

gcc

The GNU Compiler Collection. The recommended FOSS compiler for C.

Important flags:

  • -g will include debugging information in compiled executables.
  • -v -Wl,--verbose will give verbose linker output.
  • -E makes the compiler output translation units (results of pre-processing).

Sanitizers

Are extended checkers that perform the job of other tools such as valgrind at run time. To enable one, pass one or more -fsanitize=X arguments, where X can be address, leak, or numerous other options, when both compiling and linking; gcc then links the matching runtime library (for example libasan for the address sanitizer) for you. If you name the library yourself the flag is -lasan, without the lib prefix. Then when you run the program any sanitizer errors will be printed out as they're encountered.

Warn: sanitizer libraries must be the first libraries you link against for them to work as expected.
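To see a sanitizer in action, a deliberately buggy function such as the sketch below can be built with gcc -g -fsanitize=address. The helper name nth_square is an assumption for illustration; calling it with an index of 4 or more should make the address sanitizer report a heap-buffer-overflow, while in-range calls run normally.

```c
#include <stdlib.h>

/* Hypothetical helper with a deliberate bug: `index` is never checked
 * against the allocation size, so index >= 4 reads past the heap block. */
int nth_square(size_t index) {
    int *squares = malloc(4 * sizeof *squares);
    if (squares == NULL)
        return -1;
    for (size_t i = 0; i < 4; i++)
        squares[i] = (int)(i * i);
    int value = squares[index]; /* out of bounds when index >= 4 */
    free(squares);
    return value;
}
```

Without the sanitizer the out-of-range read is undefined behaviour and will often go unnoticed, which is the kind of error these tools exist to surface.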

gdb

An interactive debugger with support for breakpoints and introspection. Designed to be used with gcc.

valgrind

A suite of tools for debugging and profiling (eg: memory leak detection).

ltrace

A program to monitor and output any library calls in a program.

strace

A more widely available alternative to ltrace that traces system calls, rather than library calls, by default.
