C
- Tags
- language
Is a low-level programming language and the standard for serious systems programming, in large part because most kernels expose their APIs through C.
Program | Description |
---|---|
gdb | An interactive debugger with support for breakpoints and introspection |
valgrind | A suite of tools for debugging and profiling (e.g. memory leak detection) |
ltrace | Traces the dynamic library calls a program makes (and, with -S, its system calls) |
strace | Traces the system calls a program makes and the signals it receives |
Language
Pre-processor
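The pre-processor is a program that processes C source code before the compiler proper sees it. It carries out directives such as #define (macro substitution), #include (file inclusion) and #if/#ifdef (conditional compilation), extending the language with a simple text-substitution layer.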
#define
Defines a macro: a symbol, or a function-like macro taking arguments, whose body is substituted verbatim wherever it's used in the C source.
/* Defining and using a symbol. */
#define MAX_LENGTH 10
char buffer[MAX_LENGTH];
/* Creating a function that's really an expression */
#define min(a,b) a < b ? a : b
void foo() {
int x = 4;
if (min(x++, 5))
printf("%d is six", x);
}
Warn: The C pre-processor performs simple textual substitution. The call min(x++, 5) above expands to x++ < 5 ? x++ : 5. There are two occurrences of x++, so x is incremented twice and ends up as 6. With a regular function the arguments would be evaluated once before the call, leading to only one increment, but because min is a macro the argument expression is pasted in, and evaluated, twice.
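The sketch below shows the two classic macro pitfalls side by side, using made-up macro names (MIN_RAW, MIN_PAREN) for illustration: an unparenthesised body binds unexpectedly to the surrounding operators, and even a fully parenthesised macro still evaluates the argument it returns twice.

```c
/* Unparenthesised, like the min above: a, b and the result are pasted raw. */
#define MIN_RAW(a, b) a < b ? a : b
/* Parenthesised: immune to precedence surprises, but still evaluates
 * whichever argument it returns twice. */
#define MIN_PAREN(a, b) ((a) < (b) ? (a) : (b))

/* 2 * MIN_RAW(3, 4) expands to 2 * 3 < 4 ? 3 : 4, i.e. (6 < 4) ? 3 : 4. */
int raw_result(void) { return 2 * MIN_RAW(3, 4); }     /* 4, not 6 */

/* 2 * MIN_PAREN(3, 4) expands to 2 * ((3) < (4) ? (3) : (4)). */
int paren_result(void) { return 2 * MIN_PAREN(3, 4); } /* 6 */

/* Parentheses don't help with side effects: x++ still runs twice. */
int x_after_min(void) {
    int x = 4;
    (void)MIN_PAREN(x++, 5);
    return x;                                           /* 6 */
}
```

Parenthesising every argument and the whole body is the usual defensive style; avoiding double evaluation needs a real function instead.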
#include
The include directive inserts the contents of another file at the current point.
This comes in 2 varieties, differentiated by delimiters. A file name delimited by angle brackets, <stdio.h>, is searched for in the system include directories. A file name in quotes, "foo.h", is typically searched for first relative to the current file's directory, and then in the system directories if no local file is found.
Note: By convention the C++ standard library headers have no .h suffix. For example #include <iostream> includes the iostream C++ library, but to include stdio.h you must write the .h suffix because it's a C header (C++ also offers suffix-less versions of the C headers, such as <cstdio>).
#include <stdio.h>
#include "./my-file.h"
The include directive is extremely basic. For one, it doesn't prevent the same file from being included more than once, which can easily happen through transitive dependencies between headers. To get around this most header files are wrapped in header guards: conditionals that check whether an identifier (based on the filename) is already defined, and only expand the contents (defining the identifier in the process) if it's not.
Warn: This include mechanism is part of why compiling large C projects is so slow. Each include directive is replaced in place by the contents of the file, and every translation unit that includes a header re-processes it from scratch (header guards only prevent repeats within a single translation unit), so compile times grow with the number of transitive dependencies.
#ifndef FOO_H
#define FOO_H
void foo();
#endif /* FOO_H */
CONVENTION Include Dependencies
You may end up in a situation where a function in your header file depends on a struct defined in another file (for example, as a parameter type). The classic way to solve this is to include the dependency's header in your header file.
The convention regarding includes is:
- If your dependency is required by the header file, include it in the header file AND in the .c file.
- If your dependency is only required by the .c file, include it in the .c file.
Functions
C is the language from which Java and many other derived languages take much of their syntax, including the shape of functions. A C function is written as its return type, then its name, then its parameter list, and lastly its body.
int main(void);
int main(void) {
/* Body of the main function */
}
In the previous example you may notice that we've written main twice. The first line declares that a function called main, which returns int and takes no arguments, exists. The second defines it, giving it a concrete implementation.
This separation between declaration and definition is tied to C's include and linking mechanism: declarations are placed in header files, which are included by the .c source files that depend on them. For example, you may have a linked list struct that's needed by a few files, but those files only need the interface/API of that struct to compile properly (to check that the fields referenced exist, that the right procedures are called, etc.). They don't need to include and compile the definitions of those procedures, only the guarantee that they exist and can be called at runtime with specific types.
C source files are compiled into intermediate object files, which are then linked together. It's at the linking stage that we ensure everything that was declared (and called) has an implementation, which could come from C, a library, or some other object representation. This independence between what something is written in and how it can be called is part of the longevity of the C language.
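A minimal single-file sketch of the split, with a made-up square function: the prototype at the top is all the compiler needs to check the call in print_square, and the definition could just as well live in another .c file compiled to its own object file.

```c
#include <stdio.h>

/* Declaration (prototype): enough for the compiler to check calls below.
 * In a real project this line would usually live in a header. */
int square(int x);

/* Uses square before its definition appears. This compiles because of the
 * declaration above; the linker connects the call to the body. */
void print_square(int x) {
    printf("%d squared is %d\n", x, square(x));
}

/* Definition: the concrete implementation. It could equally live in
 * another .c file compiled to a separate object file. */
int square(int x) {
    return x * x;
}
```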
Note: The void parameter shown above is a bit of an outlier: it indicates that the function takes no arguments. Omitting the void (example: int foo()) declares, before C23, a function taking an unspecified number of arguments of unspecified types.
Warn: In C++ (and in C since C23) an empty parameter list means the same as (void): the function takes no arguments.
Inlined
Is a hint asking the C compiler to replace calls to a function with the function's body, avoiding the overhead of an extra stack frame and the other costs of a function call. The compiler is free to ignore the hint.
static inline int max(int a, int b) {
return a > b ? a : b;
}
Now a call such as max(x, y) may be replaced with the equivalent of x > y ? x : y in the calling function. This resembles the earlier min macro, except that, as with any function call, each argument is evaluated exactly once. Therefore max(x++, y) behaves like int tmp = x++; tmp > y ? tmp : y, incrementing x only once.
Note: You generally don't need to do this manually; with optimisations enabled the compiler is usually smart enough to decide when inlining is worthwhile and do it automatically.
To inline a function, its definition must be available at compile time. This is why most inline functions are both declared and defined in header files, and why some developers prefer header-only libraries.
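A sketch of the header-friendly form, using a hypothetical max_int helper. Marking it static inline gives each file that includes the header its own internal copy, which sidesteps the "undefined reference" link errors a plain inline definition can cause in C99 when the compiler decides not to inline a call. The second function checks that the argument is evaluated only once.

```c
/* Would normally sit in a header; static inline gives every including
 * file its own copy, so no separate external definition is needed. */
static inline int max_int(int a, int b) {
    return a > b ? a : b;
}

/* Unlike with a macro, the argument x++ is evaluated exactly once. */
int increments_after_max(void) {
    int x = 4;
    int m = max_int(x++, 3);   /* m == 4 (old value of x), x == 5 */
    return x * 10 + m;         /* 54 */
}
```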
Variadic Arguments
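C supports variadic functions through the stdarg.h header. A function whose parameter list ends in ... can be called with an arbitrary number of extra arguments, which it reads using the va_start, va_arg and va_end macros. The function has no built-in way of knowing how many extra arguments were passed or what their types are, so callers usually pass a count (as below) or a format string (as printf does).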
#include <stdarg.h>
double average(int count, ...) {
va_list ap;
int j;
double sum = 0;
va_start(ap, count); /* Requires the last named parameter */
for (j = 0; j < count; j++) {
sum += va_arg(ap, int); /* Fetches the next argument as an int and advances ap. */
}
va_end(ap);
return sum / count;
}
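Besides a count or a format string, another common convention is a sentinel value marking the end of the arguments. This is a sketch with a hypothetical total_length helper that stops at a NULL pointer.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: sums the lengths of a NULL-terminated list of
 * strings. The NULL sentinel, rather than a count, marks the last argument. */
size_t total_length(const char *first, ...) {
    size_t total = 0;
    va_list ap;
    va_start(ap, first);
    for (const char *s = first; s != NULL; s = va_arg(ap, char *)) {
        total += strlen(s);
    }
    va_end(ap);
    return total;
}
```

Call it as total_length("ab", "cde", (char *)NULL). The cast matters: NULL may be defined as a plain 0, which isn't guaranteed to be passed as a pointer through the ... part of the parameter list.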
Function Pointers
Are pointers to functions. They can be passed as arguments, stored in variables, and called just like the functions they point to.
// A pointer, fp, to a function returning an int and taking an int argument.
int (* fp)(int);
Calling through a function pointer calls the function it points to.
#include <stdio.h>
void foo(const char *str) { printf("Foo: %s\n", str); }
int main(void) {
void (*my_func)(const char *);
// Assign our function pointer to a value.
my_func = foo;
// Call our function through the pointer.
my_func("Hello"); // => Foo: Hello
return 0;
}
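The most common place you'll pass a function pointer as an argument is a standard library callback such as qsort, which calls the comparator you hand it for every comparison. A sketch, where compare_ints and sort_ints are illustrative names:

```c
#include <stdlib.h>

/* Comparator with the signature qsort expects: it receives pointers to two
 * elements (as const void *) and returns <0, 0 or >0 like strcmp. */
static int compare_ints(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* avoids the overflow risk of x - y */
}

/* Sorts ascending by passing compare_ints to qsort as a function pointer. */
void sort_ints(int *values, size_t count) {
    qsort(values, count, sizeof values[0], compare_ints);
}
```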
Strings
Strings in C differ from a lot of modern languages in that they're null-terminated (ended by a '\0' character) instead of length-prefixed. In practice this means that to store the 5 character string "Hello" you have to allocate 6 bytes (including the trailing null).
Whenever you use a string literal (of the form char *str = "constant") the string is typically stored in a read-only data section of the executable, and modifying it is undefined behaviour. Because of this, identical literals may share storage, so in most cases when you use the same string literal more than once both uses point to the same address in memory (the standard allows this but doesn't guarantee it).
const char *foo = "hello world";
const char *bar = "hello world";
printf("%p == %p", (void *)foo, (void *)bar); // 0x55c8ceab3004 == 0x55c8ceab3004
You can also place strings in writable memory by using character arrays. An array initialised from a literal receives its own copy of the literal's characters, stored on the stack for a local array or in the writable data segment for a static or global one, so each array has a distinct address.
char foo[] = "hello world";
char bar[] = "hello world";
printf("%p == %p", (void *)foo, (void *)bar); // 0x7ffee2fb8e68 == 0x7ffee2fb8e58
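A short sketch (capitalise_hello is a made-up name) showing both points: a char array holds its own writable copy of the literal, and its size includes the terminating '\0'.

```c
#include <string.h>

/* Copies "hello" into a local char array, modifies the copy and writes the
 * result (including the terminating '\0') into out, which must have room
 * for at least 6 chars. Returns the size of the array. */
size_t capitalise_hello(char *out) {
    char buf[] = "hello";   /* array of 6 chars: 5 letters + '\0' */
    buf[0] = 'H';           /* fine: buf is our own writable copy */
    /* By contrast, char *p = "hello"; p[0] = 'H'; would be undefined
     * behaviour, since the literal itself may be read-only. */
    memcpy(out, buf, sizeof buf);
    return sizeof buf;      /* 6, whereas strlen(buf) is 5 */
}
```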
Conditionals
C has basic if/else if/else constructs and more expressive switch/case statements.
if (SOME_CONDITION) {
/* Some logic/expression */
} else if (ANOTHER_CONDITION) {
/* Some other logic/expression */
} else {
/* Fallback logic for when nothing else matched */
}
Switch statements evaluate a single integer expression and compare it against a series of case labels, running the body of the matching case (or of default if none match). With C-style switches, if a case's body doesn't end with a break statement, control falls through to the next case, and so on, until a break is reached or the switch ends.
switch (1) {
case 1:
puts("1");
// Control falls through to the next case because there's no break
case 2:
puts("2");
break;
default:
puts("default");
break;
}
// Output: 1 then 2
Note: The fall-through behaviour of case statements, while often chided, can result in some highly elegant algorithms. Case in point: Duff's device.
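For reference, here's a sketch of Duff's device in its memory-to-memory form (Duff's original wrote every value to a single output register instead). The switch jumps into the middle of a 4x unrolled do-while, so the first pass copies count % 4 elements (or 4 if that's 0) and each later pass copies 4. The name duff_copy is illustrative.

```c
#include <stddef.h>

/* Copies count ints from 'from' to 'to' using Duff's device. */
void duff_copy(int *to, const int *from, size_t count) {
    if (count == 0)
        return;                    /* the classic form mishandles 0 */
    size_t passes = (count + 3) / 4;
    switch (count % 4) {
    case 0: do { *to++ = *from++;  /* falls through */
    case 3:      *to++ = *from++;  /* falls through */
    case 2:      *to++ = *from++;  /* falls through */
    case 1:      *to++ = *from++;
            } while (--passes > 0);
    }
}
```

Modern compilers unroll loops themselves, so this is mostly of historical and educational interest.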
Loops
C has for/while/do-while loops.
The for loop follows the same style you're most likely accustomed to, with its header containing (one or more initialisation terms; a condition checked before each iteration; an update expression run after each iteration).
for (int i = 0; i < 10; i++) {
// Do 10 times.
}
Note: Each of the sections of the loop header can be omitted. When the condition is omitted the loop just runs forever (much like while (1)) until something in the body terminates it.
for (;;) {
// Done until we break.
}
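While loops run the body as long as the condition in the parentheses is true. The following also runs its body 10 times, combining the increment and condition steps into one expression; because the increment happens in the condition, i ranges from 1 to 10 inside the body rather than 0 to 9.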
int i = 0;
while (i++ < 10) {
// Also done 10 times.
}
A do-while loop is identical to a while loop except that it always runs the body at least once. You can think of it as running the body once and then a while loop with the same body and condition. This is convenient when the loop's termination condition depends on something computed in the body. For example:
int i;
do {
i = 10;
// Done once because the condition is already satisfied
// but the do-while loop is always run at least once.
} while (i++ < 10);
The break keyword lets you exit the innermost enclosing loop (or switch) immediately.
while (1) {
while (2) {
break; /* Breaks out of while(2) */
} /* Jumps here */
break; /* Breaks out of while(1) */
} /* Continues here */
continue is similar to break, except that instead of ending the loop it skips the rest of the current iteration and moves on to the next one (in a for loop the update expression still runs).
For example the following prints all the odd numbers between 0 and 10 inclusive.
for (int i = 0; i <= 10; i++) {
if (i % 2 == 0)
// Skip all the even numbers.
continue;
printf("%d\n", i);
}
Type Modifiers
While C lacks the access control of C++ and Java (public, private, protected, etc.), it does have a few modifiers which control how we can interact with a variable or argument.
const
The const modifier is a language-level construct that tells the compiler that this data should remain constant. Attempts to modify a const variable lead to a compile-time error.
The semantics of const depend on where it appears relative to the * in a pointer type. For a non-pointer type its position doesn't matter: const int and int const are the same.
const int i = 0; // Same as "int const i = 0"
char *str = ...; // Mutable pointer to a mutable string
const char *const_str = ...; // Mutable pointer to a constant string
char const *const_str2 = ...; // Same as above
const char *const const_ptr_str = ...; // Constant pointer to a constant string
A mutable pointer is a pointer that can be made to point somewhere else; however, if it points to something constant (example: a const int) that data cannot be changed through it.
The general idea is that if const precedes * then the pointer is mutable but the value isn't, and if const comes after * the value is mutable but the pointer isn't. When const both precedes and follows *, both are constant.
Note: When you define a constant pointer you must initialise it, because it can't be changed later.
It might help to visualise the meaning here:
int a = 5, b = 10, c = 15;
/* ---------------------------
| pointer to constant int. |
--------------------------- */
const int* foo = &a;
// *foo = 6; // the value of a can't be changed through the pointer.
foo = &b; // the pointer foo can be changed (point somewhere else).
/* ---------------------------
| constant pointer to int. |
--------------------------- */
int *const bar = &c;
*bar = 16; // the value of c can be changed through the pointer.
// bar = &a; // not possible because bar is a constant pointer.
extern
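Is a keyword that declares a variable without defining it, telling the compiler that the variable is defined in another object file or a library; the linker resolves the reference. Use this when a file needs a variable defined elsewhere. Commonly the extern declaration lives in a header so every file that needs it can include it, while exactly one .c file provides the definition.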
// file1.c
#include <stdio.h>
extern int panic;
void foo(void) {
printf("%d\n", panic);
}
// file2.c
int panic = 1;
restrict
Hints to the compiler that, for the lifetime of the pointer, only the pointer itself or a value derived directly from it (example: pointer + 1) will be used to access the object to which it points.
Essentially it declares that this particular memory region doesn't overlap with the regions accessed through any other pointer, which lets the compiler optimise more aggressively. It also documents for callers of the function that passing overlapping memory regions is undefined behaviour.
An example of this is memcpy:
void* memcpy(void * restrict dest, const void * restrict src, size_t bytes);
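A sketch of restrict on a user-defined function (add_arrays is a made-up name). Because the three pointers are promised not to overlap, the compiler doesn't have to assume a store to out[i] could change a or b, which leaves it free to vectorise the loop.

```c
#include <stddef.h>

/* Element-wise sum of a and b into out. Passing overlapping arrays would
 * break the restrict promise and be undefined behaviour. */
void add_arrays(int *restrict out, const int *restrict a,
                const int *restrict b, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```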
signed/unsigned
Selects whether an integer type is signed or unsigned. Integer types other than char are signed by default, so you must use unsigned to declare them as unsigned; whether a plain char is signed is implementation-defined, which is where writing signed char explicitly matters. On its own, signed (or unsigned) is shorthand for signed int (or unsigned int), as in the parameter below.
Note: signed and unsigned only apply to integer types such as char, int or long, not to float or double.
int count_bits_and_sign(signed representation) {
// ...
}
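Two consequences of unsignedness are easy to trip over, sketched here with made-up function names: unsigned arithmetic wraps around modulo 2^N instead of overflowing, and when an int meets an unsigned int in a comparison the int is converted to unsigned, so -1 behaves like UINT_MAX.

```c
#include <limits.h>

/* Unsigned arithmetic wraps: UINT_MAX + 1 is 0. */
unsigned int wrap_around(void) {
    unsigned int u = UINT_MAX;
    return u + 1u;          /* 0 */
}

/* -1 is converted to unsigned (UINT_MAX) before comparing with 0u. */
int minus_one_less_than_zero_u(void) {
    return -1 < 0u;         /* 0, i.e. false */
}
```

Compilers warn about the second case with -Wsign-compare (part of -Wextra in C).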
static
Has different meanings depending on whether it's used on a variable inside a function or on a variable or function at file (global) scope. In the former case it gives the variable static storage duration: it's initialised once, before the program starts running, and keeps its value across function calls until the end of the program. In the latter case it gives the variable or function internal linkage, meaning it's only visible within the current file.
// visible to this file only
static int i = 0;
static int _perform_calculation(void) {
// ...
}
char *print_time(void) {
// Allocated once and shared by every call to print_time
static char buffer[200];
// ...
}
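A runnable sketch of both meanings, with illustrative names: the counter inside next_id survives between calls, and doubled is invisible outside this file.

```c
/* A static local keeps its value between calls: it's initialised once,
 * not every time the function runs. */
int next_id(void) {
    static int counter = 0;
    return ++counter;
}

/* File-scope static: internal linkage, so other .c files can't call this
 * helper and its name can't clash with theirs. */
static int doubled(int x) {
    return 2 * x;
}

int next_even_id(void) {
    return doubled(next_id());
}
```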
volatile
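Tells the compiler that a value may change in ways it can't see, so every read and write must actually be performed rather than cached or optimised out.
For example, in the following code the compiler may optimise the while loop into while (1), because nothing in the loop body touches flag, even though pass_flag may have handed flag's address to something else (such as a signal handler) that modifies it.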
int flag = 1;
pass_flag(&flag);
while (flag) {
// Do things unrelated to flag
}
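Declaring volatile int flag = 1; forces the compiler to re-read flag from memory on every check. This is useful for memory-mapped hardware registers and for flags set by signal handlers. Note that volatile doesn't make accesses atomic or order them between threads, so it isn't a substitute for synchronisation in multi-threaded programs; use C11 atomics (stdatomic.h) or locks instead.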
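The standard-sanctioned case is a flag written by a signal handler, which should be a volatile sig_atomic_t. A sketch with illustrative names; it raises the signal itself so the example runs deterministically, whereas a real program would poll the flag in a loop like the one above.

```c
#include <signal.h>

/* Written from a signal handler, so it's volatile (reads aren't optimised
 * away) and sig_atomic_t (a type handlers may safely write). */
static volatile sig_atomic_t got_signal = 0;

static void on_sigint(int signo) {
    (void)signo;
    got_signal = 1;
}

/* Installs the handler, raises SIGINT synchronously (raise returns after the
 * handler has run) and reports whether the handler set the flag. */
int demo_signal_flag(void) {
    signal(SIGINT, on_sigint);
    raise(SIGINT);
    return got_signal;
}
```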
Enumerations
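Is a type whose values come from a finite set of named integer constants. Using an enumeration brings some compile-time checking when referencing its values; for example, GCC and Clang can warn (-Wswitch, enabled by -Wall) when a switch over an enum doesn't handle every member.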
enum day {
monday,
tuesday,
wednesday,
thursday,
friday,
saturday,
sunday
};
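Members are numbered from 0 by default, each one greater than the previous. You can also explicitly assign values to enum members, which is worthwhile when the numbers matter outside your code (for example in a file format or protocol). You can even assign enum values to either be different or the same.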
enum day {
monday = 0,
tuesday = 0,
wednesday = 0,
thursday = 1,
friday = 10,
saturday = 10,
sunday = 0
};
As with structs, you reference an enum type by prefixing its name with the enum keyword.
enum day someDay = monday;
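A sketch mixing implicit and explicit numbering, with a switch that covers every member (color and color_name are illustrative names):

```c
/* RED gets 0 by default, GREEN is set explicitly, and BLUE continues
 * counting from the previous member. */
enum color { RED, GREEN = 5, BLUE };   /* RED == 0, GREEN == 5, BLUE == 6 */

const char *color_name(enum color c) {
    switch (c) {           /* -Wswitch warns here if a member is missing */
    case RED:   return "red";
    case GREEN: return "green";
    case BLUE:  return "blue";
    }
    return "unknown";      /* an enum variable can still hold other values */
}
```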
Unions
Is a construct similar in concept to a struct, except that whereas a struct allocates enough memory to hold all of its members, a union allocates only enough to hold its largest member: all members share the same memory, starting at the same address. At any point the union holds one valid value.
A common use case for a union is storing one of several alternative types (alongside an enum that records which one is currently in use) or providing an alternative interface to the data in a struct.
typedef struct {
union {
struct {
double x;
double y;
double z;
};
double raw[3];
};
} vec3d_t;
vec3d_t v;
v.x = 4.0;
v.raw[1] = 3.0; // Equivalent to v.y = 3.0
v.z = 2.0;
union pixel {
struct values {
uint8_t red;
uint8_t green;
uint8_t blue;
uint8_t alpha;
} values;
// Each channel is 8 bits, so the four of them take up 32 bits
// (4 bytes), the same size as the encoded field.
uint32_t encoded; // uint8_t/uint32_t come from <stdint.h>
};
union pixel a;
// When modifying or reading individual channels
a.values.red = 0xff;
a.values.blue = 0x0;
// When writing to a file (PRIu32 comes from <inttypes.h>)
fprintf(picture, "%" PRIu32, a.encoded);
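The first use case above, a tagged union, pairs an enum with a union so code can check which member is currently valid before reading it. A sketch with illustrative names (value_kind, struct value, value_format):

```c
#include <stdio.h>

/* 'kind' records which member of 'as' currently holds a valid value. */
enum value_kind { VALUE_INT, VALUE_DOUBLE };

struct value {
    enum value_kind kind;
    union {
        int i;
        double d;
    } as;
};

/* Formats v into buf, reading only the member that 'kind' says is valid.
 * Returns snprintf's result, or -1 for an unknown kind. */
int value_format(const struct value *v, char *buf, size_t size) {
    switch (v->kind) {
    case VALUE_INT:    return snprintf(buf, size, "%d", v->as.i);
    case VALUE_DOUBLE: return snprintf(buf, size, "%.2f", v->as.d);
    }
    return -1;
}
```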
Pointers
Is a value that points to some address in memory. If you think of memory as a continuous tape of data then the idea of a pointer being a number (offset) from the start of that tape might be easier to rationalise.
In C we take the address of something using the & operator, and we dereference a pointer to get the value it points to using the * operator.
int foo = 5;
int *foo_ptr = &foo; // Create a reference
int foo_val = *foo_ptr; // Dereference the value
*foo_ptr = 1; // Update through the pointer
C passes all arguments by value; pointers are how we emulate pass-by-reference. At a high level this means that if you supply a value to a function it's copied for use by that function, and modifying the copy doesn't affect the original. To allow functions to modify arguments they must be passed as pointers.
void pass_by_val(int val) {
val = 2;
}
void pass_by_ptr(int *ptr) {
// Dereference and then assign the value pointed to by ptr
*ptr = 2;
}
int num = 1;
printf("%d\n", num); // 1
pass_by_val(num);
printf("%d\n", num); // 1
pass_by_ptr(&num);
printf("%d\n", num); // 2
Pointer Arithmetic
The type of a pointer doesn't affect its size in memory (on common platforms all object pointers are the same size); instead it determines how far the pointer moves when incremented. Incrementing a pointer moves the address it points to forward by sizeof the pointed-to type.
For example you can iterate through the characters of a string by moving a pointer along it until you reach the null character. This is the idiom null termination is built around: the terminating '\0' marks the end, so developers don't have to know and supply the length of the string (as many library functions require for arrays).
const char *ptr = "hello";
for (int i = 0; *ptr != '\0'; ptr++, i++) {
printf("char %d: %c\n", i, *ptr);
}
// char 0: h
// char 1: e
// char 2: l
// char 3: l
// char 4: o
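The scaling by sizeof is easy to check directly. In this sketch (function names are illustrative), p + 1 on an int pointer lands sizeof(int) bytes further on, and an array is summed by walking a pointer up to one-past-the-end.

```c
#include <stddef.h>

/* How many bytes p + 1 is past p for an int pointer: arithmetic on an
 * int * is scaled by sizeof(int). */
size_t int_pointer_stride(void) {
    int values[2] = {0, 0};
    int *p = values;
    int *q = p + 1;                          /* next int, not next byte */
    return (size_t)((char *)q - (char *)p);  /* == sizeof(int) */
}

/* Sums an array by moving a pointer from start to one-past-the-end. */
long sum_ints(const int *start, size_t count) {
    long total = 0;
    for (const int *p = start; p != start + count; p++)
        total += *p;
    return total;
}
```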
Void Pointers
A special case in the use of pointers is the void*. This is a pointer with no associated type that can be converted to and from any other object pointer type. In C, void* is converted implicitly to the appropriate type on assignment, without a cast (unlike C++). This is the return type of the malloc function and is why you can assign the result of malloc to any pointer type (see memory allocation). Because the pointed-to type is unknown, a void* can't be dereferenced directly, and standard C doesn't allow arithmetic on it.
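void* is what lets a function work on objects of any type, as long as it only moves raw bytes around. A sketch of a generic swap (swap_bytes and its 64-byte limit are assumptions made for this example):

```c
#include <string.h>

/* Swaps two objects of any type, each 'size' bytes long, by copying bytes
 * through void pointers. Returns -1 if size exceeds the 64-byte scratch
 * buffer chosen for this sketch, else 0. */
int swap_bytes(void *a, void *b, size_t size) {
    unsigned char tmp[64];
    if (size > sizeof tmp)
        return -1;
    memcpy(tmp, a, size);
    memcpy(a, b, size);
    memcpy(b, tmp, size);
    return 0;
}
```

Calls such as swap_bytes(&x, &y, sizeof x) need no casts: the int * (or double *, etc.) arguments convert to void * implicitly.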
Null Pointers
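Is a pointer that doesn't point to any object. In C it's written as NULL (a macro defined in stddef.h, stdio.h, stdlib.h and other standard headers), and dereferencing it is undefined behaviour.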
Arrays
An array represents a contiguous block of elements in memory, all of the same type. An array isn't itself a pointer, but in most expressions its name decays to a pointer to its first element, and indexing is defined in terms of pointer arithmetic on that pointer.
Note: The syntax arr[x] is equivalent to *(arr + x).
int arr[] = {1, 2, 3};
int *ptr = arr;
printf("%p = %p\n", (void *)arr, (void *)ptr); // 0x7ffed20c9498 = 0x7ffed20c9498
printf("%d = %d\n", arr[0], *ptr); // 1 = 1
printf("%d = %d\n", arr[1], *(ptr+1)); // 2 = 2
printf("%d = %d\n", arr[2], *(ptr+2)); // 3 = 3
Note: In most expressions an array automatically decays to a pointer to its first element (the main exceptions being when it's the operand of sizeof or &). That's why in the example above we can assign arr to ptr without any issue.
Sizeof Arrays
The sizeof an array is the total size of all its elements, whereas the sizeof a pointer is just the size of the pointer itself.
#define SIZE 10
int* ip = malloc(sizeof(int) * SIZE);
int ia[SIZE];
sizeof(ip); // 8 bytes (a 64-bit pointer; pointer size is platform dependent)
sizeof(ia); // 40 bytes (the sum of the sizes of each element: 4 bytes per int * 10 ints, assuming a 4-byte int)
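This gives us a rudimentary approach for counting the number of elements in an array. Since each element has a fixed size and we can calculate the size of all the elements, we simply divide one by the other. This only works on an actual array: applied to a pointer (including a function parameter declared with array syntax) it divides the pointer's size instead.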
int size = sizeof(ia) / sizeof(ia[0]);
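The division is often wrapped in a macro. A sketch (ARRAY_LEN and sum_array are illustrative names) that also shows why functions taking arrays are normally given the length separately:

```c
#include <stddef.h>

/* Number of elements in a true array. Applied to a pointer it would give
 * sizeof(pointer) / sizeof(element) instead, which is meaningless. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Inside a function, arr is really a pointer (const int *), so
 * ARRAY_LEN(arr) would be wrong here: the length is passed explicitly. */
int sum_array(const int arr[], size_t len) {
    int total = 0;
    for (size_t i = 0; i < len; i++)
        total += arr[i];
    return total;
}
```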
Array Reassignment
Another distinction between arrays and pointers is that arrays cannot be reassigned or made to point elsewhere like pointers. An array always refers to the same storage: you can copy new contents into it (for example with memcpy), but you can't assign another array to it.
int arr1[] = {1, 2, 3};
int arr2[] = {1, 2, 3};
int *ptr = arr1;
arr1 = arr2; // Compiler error array type 'int [3]' is not assignable
ptr = arr2;
This means that pointers are more flexible than arrays, even though an array's name decays to a pointer to its first element.
Structs
Are the construct used to group multiple values, possibly of different types, into a new type.
In C a struct occupies a contiguous region of memory, and each member can be accessed by name as if it were a separate variable.
struct hostname {
const char *port;
const char *name;
const char *resource;
};
You declare a variable of a struct type by writing struct followed by the struct's name. You can then assign each member separately, or initialise the members alongside the declaration with a brace-enclosed list (C99 added designated initialisers, which name each field).
// Assign each individually
struct hostname facebook;
facebook.port = "80";
facebook.name = "www.facebook.com";
facebook.resource = "/";
// Or initialise the members (in declaration order) alongside the declaration
struct hostname google = {"80", "www.google.com", "/"};
// Designated initialisers (C99) name each field, so the order doesn't matter
struct hostname example = {.name = "www.example.com", .port = "443", .resource = "/"};
Padding
Although a struct takes up a contiguous region of memory, its members aren't necessarily packed one directly after another. The compiler may insert padding between members (and after the last one) so that each member is memory aligned, i.e. starts at an address that's a multiple of its type's alignment requirement (for builtin types this is often the type's size).
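A sketch that makes padding visible with offsetof. The byte counts in the comments assume a typical platform with a 4-byte, 4-aligned int; the standard only guarantees that each member is suitably aligned.

```c
#include <stddef.h>

/* Usually 12 bytes: 3 padding bytes after 'a' so 'n' is aligned, plus 3
 * trailing bytes so 'n' stays aligned in an array of these structs. */
struct padded {
    char a;
    int  n;
    char b;
};

/* The same members ordered largest first: usually 8 bytes. */
struct reordered {
    int  n;
    char a;
    char b;
};

/* offsetof gives a member's byte offset; anything beyond the end of the
 * previous member is padding. */
size_t padding_before_n(void) {
    return offsetof(struct padded, n) - sizeof(char);
}
```

Ordering members from largest to smallest alignment is a common way to reduce wasted space.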
Sizeof
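Is an operator that returns the number of bytes occupied by a type, or by the type of an expression; this could be the size of a struct or a builtin. Except for variable-length arrays it's evaluated at compile time, and its operand isn't evaluated at runtime (in the example below, a++ never runs).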
char a = 0;
printf("%zu", sizeof(a++)); // Prints 1; a++ is never executed, so a is still 0
sizeof requires the complete definition of the type at compile time, not link time, because it's the compiler (not the preprocessor or the linker) that computes the size. A struct that's only been forward-declared (an incomplete type) can't be passed to sizeof. This is commonly why complete struct definitions are placed into header files.
Labels & Goto
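Is a keyword allowing unconditional jumps. First you set an assembly-style label (example: label:) and then at some point in the same function you can jump forward or backward to it. A common use is the cleanup pattern below, where each failure jumps to the label that releases only what was already acquired.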
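// Doe, Ray, Mi and the setup/cleanup functions are placeholders.
void setup(void) {
Doe *deer = NULL;
Ray *drop = NULL;
Mi *myself = NULL;
if (!setupdoe(&deer)) {
goto finish;
}
if (!setupdrop(&drop)) {
goto cleanupdoe;
}
if (!setupmi(&myself)) {
goto cleanupray;
}
perform_action(deer, drop, myself);
cleanup(myself);
// Resources are released in the reverse order they were acquired.
cleanupray:
cleanup(drop);
cleanupdoe:
cleanup(deer);
finish:
return;
}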
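The same pattern with real allocations, as a runnable sketch (use_three_buffers is an illustrative name, and the memset calls stand in for real work): each failure jumps to the label that frees only what has already been allocated.

```c
#include <stdlib.h>
#include <string.h>

/* Allocates and fills three n-byte buffers, then frees them.
 * Returns 0 on success, -1 if any allocation fails. */
int use_three_buffers(size_t n) {
    int status = -1;
    char *a = NULL, *b = NULL, *c = NULL;

    if ((a = malloc(n)) == NULL) goto out;
    if ((b = malloc(n)) == NULL) goto free_a;
    if ((c = malloc(n)) == NULL) goto free_b;

    memset(a, 'a', n);   /* stand-in for real work */
    memset(b, 'b', n);
    memset(c, 'c', n);
    status = 0;

    free(c);
free_b:
    free(b);
free_a:
    free(a);
out:
    return status;
}
```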
Type Aliases
Is a way to declare an alias for a type.
For example you can refer to a float as real.
typedef float real;
real gravity = 10;
This is most commonly used with structs or enums to avoid having to repeatedly prefix the type with struct or enum.
typedef struct link_t {
char* url;
int port;
} link_t;
// You can use either the style with struct or without.
struct link_t lk1;
link_t lk2;
Note: You don't need to specify a name for the struct. Instead you could typedef an anonymous struct to make sure it can only be accessed without prefixing it with struct.
typedef struct {
char* url;
int port;
} link_t;
link_t lk;
Operators
Operator | Name | Description |
---|---|---|
[] | Subscript | Retrieves the nth element of an array: a[n] == *(a + n) |
-> | Structure Dereference | Access a struct member x through a pointer p as p->x (same as (*p).x) |
. | Structure Reference | Access the member x of object a as a.x |
+a , -a | Unary Plus, Unary Minus | Keep or negate the sign of the arithmetic operand a |
*p | Dereference | Access the object that p points to |
&a | Address-Of | Return the address of a |
++ | Increment | Can be used postfix or prefix to increment a variable by 1 |
-- | Decrement | Same as the increment operator except decrements the variable |
sizeof | Sizeof | Fetch the size in bytes of a type or expression |
+ , - , * , % , / | Arithmetic Binary | Add, subtract, multiply, take the remainder of, or divide one operand by another |
<< , >> | Bit Shift | Left or right shift the bits in an integer by n bits |
> , < , >= , <= | Greater/Less Than (or Equal) | Compare two values, yielding 1 if the relation holds and 0 otherwise |
== , != | Equal To, Not Equal To | Yield 1 if the operands are (not) equal and 0 otherwise |
&& | Logical And | True if both operands are true; the right operand is only evaluated if the left is true |
\|\| | Logical Or | True if either operand is true; the right operand is only evaluated if the left is false |
! | Unary Logical Negation | Inverts the boolean expression to its right |
& | Bitwise AND | If a bit is set in both operands it's set in the output |
\| | Bitwise OR | If a bit is set in either operand it's set in the output |
^ | Bitwise XOR | If a bit is set in exactly one operand it's set in the output |
~ | Unary Bitwise Negation | If a bit is set in the input it's not set in the output, and vice versa |
x ? y : z | Ternary | If x then return y else return z |
a, b, ..., z | Comma | Evaluates a , then b , etc. and returns the last entry z |
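Note: bit shifts handle signs specially. Left shifts always introduce zeros on the right, but left-shifting a negative value, or shifting a 1 into or past the sign bit of a signed type, is undefined behaviour. Right-shifting an unsigned or non-negative value introduces zeros on the left; right-shifting a negative signed value is implementation-defined, and most compilers perform an arithmetic shift that introduces ones. Shifting by a negative amount, or by at least the width of the operand, is undefined.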
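Bit flags are where shifts and the bitwise operators usually meet. A sketch of the standard set/clear/toggle/test idioms (helper names are illustrative), using an unsigned type so none of the signed-shift pitfalls apply; bit must be less than 32.

```c
#include <stdint.h>

uint32_t set_bit(uint32_t flags, unsigned bit)    { return flags | (UINT32_C(1) << bit); }
uint32_t clear_bit(uint32_t flags, unsigned bit)  { return flags & ~(UINT32_C(1) << bit); }
uint32_t toggle_bit(uint32_t flags, unsigned bit) { return flags ^ (UINT32_C(1) << bit); }
int      test_bit(uint32_t flags, unsigned bit)   { return (flags >> bit) & 1u; }
```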
Sequence Points
Are points in code at which the standard guarantees that all side effects of previous evaluations have been performed and no side effects of subsequent evaluations have happened yet.
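This matters in expressions like foo(x++, x-- - ++y): the order in which the arguments are evaluated is unspecified, and because x is modified twice with no sequence point between the modifications, the behaviour is undefined.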
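The main sequence points are at the end of each full expression (such as at a ;), after the left operand of &&, || and the comma operator, after the condition of ?:, and after a function's arguments have been evaluated but before the function is called.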
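A sketch contrasting well-defined uses of those sequence points with the undefined pattern (function names are illustrative):

```c
/* Well-defined: && has a sequence point after its left operand, so the
 * increment of i is complete before i is read on the right. */
int and_is_sequenced(void) {
    int i = 0;
    return (i++ == 0) && (i == 1);   /* 1 */
}

/* Well-defined: the comma operator sequences its operands left to right. */
int comma_is_sequenced(void) {
    int i = 0;
    int r = (i++, i * 10);           /* r == 10 */
    return r;
}

/* Undefined, so not shown as code: i = i++ + 1; modifies i twice with no
 * sequence point between the two modifications. */
```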
Data Types
Type | Size (guaranteed minimum / typical) | Typical Alignment (bytes) | Description |
---|---|---|---|
char | 1 byte | 1 | A byte has at least 8 bits (CHAR_BIT); the remaining sizes assume 8-bit bytes |
short (short int ) | >= 2 bytes | normally 2 | |
int | >= 2 bytes, normally 4 bytes | normally 4 | |
long (long int ) | >= 4 bytes, often 8 bytes on 64-bit Unix-like systems | normally equal to its size | |
long long | >= 8 bytes | normally 8 (4 on some 32-bit platforms) | |
float | normally 4 bytes | normally 4 | Normally an IEEE-754 single precision floating point number |
double | normally 8 bytes | normally 8 (4 on some 32-bit platforms) | Normally an IEEE-754 double precision floating point number |
Note: If you want a fixed-width integer type you can use the types defined in stdint.h, which are of the form [u]intN_t, where the optional u makes the type unsigned and N can be any of 8, 16, 32 or 64 (exact-width types are only provided on platforms that support them).
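Printing these types portably needs care, because int64_t may be long on one platform and long long on another. A sketch (format_i64 is an illustrative name) using the PRId64 macro from inttypes.h, which expands to the right printf conversion for the platform:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Formats an int64_t without guessing between "%ld" and "%lld". */
int format_i64(char *buf, size_t size, int64_t value) {
    return snprintf(buf, size, "%" PRId64, value);
}
```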
TODO Bit Fields
See bit field.
Memory Allocation
In C you have more control over memory allocation than in many other languages. Ordinarily, variables you declare in a function have automatic storage: they're placed on the stack and released automatically when the function returns. Dynamic memory allocation instead has us manually allocating memory on the heap using malloc and then later freeing that memory using free.
Warn: malloc may return a NULL pointer when it can't allocate the amount of memory requested, so check its result before using it.
Warn: malloc doesn't zero out the allocated memory, for performance reasons: in most cases you'll assign to the memory right away (for example, by filling in a struct), so zeroing it first would be wasted work. If you want zeroed memory, use calloc instead.
For example here's a common hack that creates a string type with a length field. The character array at the end of the struct takes up no space (so sizeof(string) is just the size of the length field, plus any padding), but we can allocate extra memory after the struct to store the actual string and then access it like a regular string. Older code used a zero-length array (c_str[0]), a GNU extension; C99 standardised this as a flexible array member, c_str[].
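#include <stdlib.h>
#include <string.h>
typedef struct {
int length;
char c_str[]; // Flexible array member: takes up no space in sizeof(string)
} string;
const char* to_convert = "person";
int length = strlen(to_convert);
// Let's convert to a length prefixed string.
string* person;
person = malloc(sizeof(string) + length + 1); // +1 for the trailing '\0'
// (Real code should check person for NULL here.)
person->length = length;
strcpy(person->c_str, to_convert);
// ...
free(person); // Done with person, free it now.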
Warn: Don't free the same pointer twice; doing so is undefined behaviour. As good practice, set pointers to freed memory to NULL: free(NULL) does nothing, so an accidental second free becomes harmless.
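A sketch combining calloc with the set-to-NULL habit. FREE_AND_NULL and make_zeroed_ints are illustrative names; the do/while (0) wrapper makes the macro behave like a single statement.

```c
#include <stdlib.h>

/* Frees p and sets it to NULL, so repeating it is harmless. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Returns n zeroed ints (calloc zeroes the memory, unlike malloc), or
 * NULL if the allocation fails. The caller must free the result. */
int *make_zeroed_ints(size_t n) {
    return calloc(n, sizeof(int));
}
```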
Reallocation
You can resize an existing malloc allocation on the heap with the realloc function. It either expands the existing allocation in place, or allocates a new block, copies over the contents of the existing allocation and frees the old one, before returning a pointer to the (possibly new) block. This is most commonly done to resize memory used to hold an array of values.
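Warn: realloc may return a different pointer to the one returned by the original malloc, and it may fail just like malloc. On failure it returns NULL and leaves the original allocation untouched, so assign its result to a temporary pointer first rather than straight back to the original.
int *array = malloc(sizeof(int) * 2);
array[0] = 10;
array[1] = 20;
// Oops, need a bigger array - so use realloc. Assign to a temporary first:
// on failure realloc returns NULL but the old block is still allocated.
int *bigger = realloc(array, 3 * sizeof(int));
if (bigger != NULL) {
array = bigger;
array[2] = 30;
} // else: array is unchanged and still needs to be freed.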
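The typical realloc use is a growable array. A minimal sketch with a hypothetical int_vec type that doubles its capacity whenever it runs out of room, so the cost of copying is amortised over many appends:

```c
#include <stdlib.h>

/* Hypothetical growable array of ints; start from {NULL, 0, 0}. */
typedef struct {
    int *data;
    size_t len;
    size_t cap;
} int_vec;

/* Appends value, doubling the capacity when full. Returns 0 on success or
 * -1 if allocation fails, in which case v is left unchanged. */
int int_vec_push(int_vec *v, int value) {
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        /* realloc(NULL, n) behaves like malloc(n). */
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = value;
    return 0;
}
```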
Reading Declarations
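To grok a declaration like int (*fp)(int);, follow these steps:
1. Start from the identifier (fp).
2. Look to the right. If there is nothing OR a closing parenthesis ), goto step 4.
3. You're either at an array (an opening square bracket [, read as "array of") or at a function (an opening parenthesis, read as "function taking ... returning"). Read it, then go back to step 2.
4. Return to the starting position and look left. If there is nothing OR an opening parenthesis, goto step 6.
5. You are now positioned on a pointer descriptor, "*". Read it as "pointer to" and keep looking left, repeating the check from step 4.
6. At this point you have either a parenthesized expression or the complete declarator. If you have a parenthesized expression, consider it as your new starting point and return to step 2.
Applied to int (*fp)(int): to the right of fp is ), so look left and read * as "pointer to". Next is (, so the parenthesized expression (*fp) becomes the new starting point. To its right is (int), read as "function taking an int returning", and to its left there's nothing but the base type int. So fp is a pointer to a function taking an int and returning an int.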
Bizarre Declarations
Warn: This will probably traumatise you.
// A function taking a 3D array (example arr[5][7][3]) and returning
// a pointer to the first element of the array while maintaining the
// dimension information for the 2nd and 3rd dimensions.
//
// In memory arr[5][7][3] is packed into sizeof(double)*5*7*3 and the
// address of the array is a pointer to the first element. Returning
// the array while discarding the first dimension, references the same
// memory, but forgets that at the end of the first arr[0][7][3] there's
// 4 more arrays of the same size.
double (*fn(double a[][7][3]))[7][3] { return a; }
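The same function becomes far easier to read with a typedef (see Type Aliases). A sketch where plane_t and fn_readable are illustrative names:

```c
/* plane_t names "array of 7 arrays of 3 doubles", i.e. one arr[i]. */
typedef double plane_t[7][3];

/* Equivalent to fn above: the parameter plane_t a[] adjusts to plane_t *a,
 * and plane_t * is exactly double (*)[7][3]. */
plane_t *fn_readable(plane_t a[]) {
    return a;
}
```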
Standard Library
The POSIX standard specifies a set of portable conventions and interfaces that you can rely on being present on any POSIX-compliant system. Note: Many of the C functions described in this section are actually wrappers that call the underlying system call for the current platform.
One core aspect of this standard is the idea that everything is a file. While somewhat outdated, and not strictly true, this convention is still used today and essentially means that most resources are accessed through a file descriptor (an integer) referencing an open file handle.
int file_fd = open(...);
int network_fd = socket(...);
int kernel_fd = epoll_create1(...); // Linux-specific, not POSIX
Internally a file descriptor is an index into the process's file descriptor table, whose entries refer to open files (and other resources) managed by the kernel; descriptors can be allocated, duplicated, closed, opened, etc. You open and interact with these objects through the API specified by system calls and library functions that take the file descriptor as input. Note: By convention each process starts with file descriptors 0 through 2 open as stdin, stdout and stderr.
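A sketch that uses nothing but file descriptors and the read, write, pipe and close system calls: it writes a message into one end of a pipe and reads it back from the other. The name pipe_round_trip is illustrative, and it assumes the message is small enough to fit in the pipe's buffer so write doesn't block.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <unistd.h>

/* Returns the number of bytes read back, or -1 on error. */
ssize_t pipe_round_trip(const char *msg, char *out, size_t out_size) {
    int fds[2];                      /* fds[0] is the read end, fds[1] the write end */
    if (pipe(fds) == -1)
        return -1;
    size_t len = strlen(msg);
    ssize_t n = -1;
    if (write(fds[1], msg, len) == (ssize_t)len)
        n = read(fds[0], out, out_size);
    close(fds[0]);
    close(fds[1]);
    return n;
}
```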
errno.h
Is the error-handling header for C. It exposes errno, which system calls and some library functions set in the event of an error to indicate what went wrong.
Note: errno is thread-local: each thread has its own copy of errno, so an error in one thread doesn't overwrite another thread's errno.
Usage of errno involves calling a function that could fail and, if that function reports an error (i.e. -1 for most system calls, and -1 or NULL from most library functions), checking errno and handling the error as appropriate. errno is only meaningful right after a reported failure: successful calls may leave an old value in place.
When you don't know what an errno value means, you can use perror (or strerror) to print a human-readable English description of it.
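A sketch of the full check-errno-then-describe pattern around strtol, which reports overflow by setting errno to ERANGE. Because successful calls don't reset errno, it's cleared before the call. parse_long is an illustrative name, and EINVAL (a POSIX error code, not one the C standard defines) is reused here to mean "not a number".

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parses the whole of text as a decimal long. Returns 0 on success,
 * otherwise an errno value. */
int parse_long(const char *text, long *out) {
    char *end;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (errno != 0) {
        int err = errno;             /* save it: fprintf may change errno */
        fprintf(stderr, "parse_long(\"%s\"): %s\n", text, strerror(err));
        return err;
    }
    if (end == text || *end != '\0')
        return EINVAL;               /* no digits, or trailing characters */
    *out = value;
    return 0;
}
```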
stdio.h
Outputting
There are 3 main write functions exposed by POSIX and the C standard library.
- printf writes to standard output, first passing the arguments through sprintf, which can be buffered (see buffering concepts Buffering Concepts\")"), essentially note that stderr will normally be unbuffered and stdin and stdout will be line-buffered or fully buffered depending on whether the stream is connected to an terminal or not).
- puts and putchar print a string (followed by a newline) and a single character respectively, verbatim.
- write is the low-level system call that writes up to n bytes from a buffer to the file referred to by a file descriptor.
printf, puts and putchar write to stdout, however there are variants prefixed with f (for example fprintf) that let you specify the FILE * stream to write to. write always takes an explicit file descriptor.
Format | Meaning | Description |
---|---|---|
%s | String | Keep printing characters until the null character ('\0') is reached |
%d | Integer | Print the argument as a signed decimal integer |
%p | Pointer | Print the argument (a void *) as a memory address |
Reading
fgets reads a line from a file stream into a buffer. You must pass the buffer's size, and fgets stops after a newline, at end of file, or after size - 1 bytes, always null-terminating the result.
Warn: The older gets function was once widely used for this, but it was deprecated and then removed in C11 because there's no way to limit the length of what is read, which makes buffer overflows easy.
getline is a variant that automatically allocates, and reallocates as needed, a heap buffer large enough for the whole line. This is why it takes a pointer to a char * (along with a pointer to the buffer's size): the buffer may be reallocated and the pointer made to point elsewhere if it isn't large enough. The caller must free the buffer when done.
scanf lets you use printf style format strings to parse different items from input. It returns the number of items successfully matched and assigned, or EOF if the input ends before the first conversion.
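Warn: Every argument after the format string must be a valid, writable pointer of the type the conversion expects (e.g. &x for %d), and %s should be given a maximum field width (e.g. %15s for a 16-byte buffer) to avoid buffer overflows.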
string.h
Provides a series of functions for manipulating and inspecting pieces of memory, mostly null-terminated C strings.
Function | Description |
---|---|
strlen | Return the length of a string, excluding the terminating null character |
strcmp | Compare 2 strings, returning a negative, zero or positive integer according to their lexicographical order |
strcpy | Copy the string at a source pointer into a destination pointer |
strcat | Append the source string to the end of the destination string |
strdup | Return a malloc'd copy of an input string |
strchr | Return a pointer to the first occurrence of a character in a string |
strstr | Like strchr but finding the first occurrence of a sub-string |
strtok | Split a string into tokens separated by any of a set of delimiter characters (modifies the string) |
strtol, strtoll | Convert a string to a long int or long long int (declared in stdlib.h, not string.h) |
memcpy | Copy n bytes from a source address to a destination address (the regions must not overlap) |
memmove | Like memcpy except it can properly handle overlapping memory regions |
Creating Libraries
Libraries for C come in two varieties:
- Static (.a): archives of object code that are linked into the executable at compile time.
- Dynamic (.so): dynamically linked shared object libraries. These come in 2 varieties:
  - Dynamically linked at runtime: the libraries must be available during the compile/link phase. The shared objects aren't copied into the executable but are loaded alongside it each time it runs.
  - Dynamically loaded/unloaded and linked during program execution using the dynamic linking loader system functions. This can be useful for scripting languages that want to reference C code provided by users or interface with external libraries.
Note: You can view the libraries a program is linked with using ldd. You can also view the symbols in an object file using nm or readelf.
By convention libraries have a lib prefix, so the zip library would be in a file named libzip.so. We may also suffix the library file with the library version, so libzip.so.1.0.0 is the v1.0.0 release of the zip library (most UNIX systems also provide symlinks with the major-version and unversioned names, so libzip.so and libzip.so.1 both link to libzip.so.1.0.0).
When you link a library into an executable you specify the name of the library without the extension suffix and lib prefix.
gcc src-file.c -lm -lpthread
For example in figure 1 the linker links the m and pthread libraries, looking for the files libm.{so,a} and libpthread.{so,a} in the directories passed with -L, those listed in LIBRARY_PATH, and the standard system library directories. These are commonly placed at /usr/lib.
Warn: Programs linked with dynamic libraries will fail at execution time if the dynamic loader can't find a library they depend on. Besides LD_LIBRARY_PATH, it searches the executable's rpath and the system library directories.
Note: Libraries contain compiled code, not the interfaces for it. That means you have to supply the interfaces (header files) for a library alongside the library itself. A rudimentary workaround is to declare the prototypes of the functions you use directly in your code and let the linker report an error when a symbol is missing, however the standard approach is to supply header files (typically stored at /usr/include) whose location is configured with -I or one of a myriad of path variables such as CPATH (see gcc).
/* ctest1.c */
void ctest1(int *i) {
*i = 5;
}
/* ctest2.c */
void ctest2(int *i) {
*i = 100;
}
/* ctest.h */
#ifndef CTEST_H
#define CTEST_H
#ifdef __cplusplus
extern "C" {
#endif // __cplusplus
void ctest1(int *);
void ctest2(int *);
#ifdef __cplusplus
}
#endif // __cplusplus
#endif /* CTEST_H */
/* main.c */
#include <stdio.h>
#include "ctest.h"
int main() {
int x;
ctest1(&x);
printf("Val x=%d\n", x);
return 0;
}
Static Libraries
Static libraries are literal archives of object code made using the ar utility. To create a library you first compile the files into object files and then archive them together.
# Compile the library src files into objects.
gcc -c ctest1.c ctest2.c
# Combine the objects into a library archive.
ar -cvq libctest.a ctest1.o ctest2.o
# Build and compile the main executable.
gcc -c main.c
gcc -o main main.o -L. -lctest
# Run the executable
./main
Dynamic Libraries
Shared Object
# Compile the library src files into objects.
# Note: Dynamic libraries require PIC to be enabled.
gcc -fPIC -c ctest1.c ctest2.c
gcc -shared -Wl,-soname,libctest.so.1 -o libctest.so.1.0 ctest1.o ctest2.o
ln -sv libctest.so.1.0 ./libctest.so.1
ln -sv libctest.so.1.0 ./libctest.so
# Build and compile the main executable.
gcc -c main.c
## You can also use the shortform -lNAME when version doesn't matter.
gcc -o main main.o -L. -l:libctest.so.1.0
# Run the executable.
LD_LIBRARY_PATH=.:"$LD_LIBRARY_PATH" ./main
In figure shared-object-lib-xample you can see the options used to create a shared object dynamic library. The -shared option makes the compiler produce a shared object. The -Wl option makes the compiler pass the comma-separated arguments that follow it to the linker (ld). In this case we pass the soname option, which sets a field in the library that the linker also records in any executable linked against it. At run time the loader looks for a library with that recorded soname, so if the library is upgraded and its soname changes, executables built with the older library version will fail to run unless the old version is still installed. This gives us a rudimentary level of version checking. In most cases a minor change in the library results in a new release without changing the soname.
Dynamic Loading
This uses libdl
to manually load a library at runtime through the dlfcn header.
#include <stdlib.h>
#include <stdio.h>
#include <dlfcn.h>
#include "ctest.h"
int main(int argc, char **argv) {
void *lib_handle;
void (*fn)(int *); /* must match ctest1's signature: void ctest1(int *) */
int x;
char *error;
lib_handle = dlopen("./libctest.so", RTLD_LAZY);
if (!lib_handle) {
fprintf(stderr, "%s\n", dlerror());
exit(1);
}
dlerror(); /* clear any earlier error so the check below is reliable */
fn = dlsym(lib_handle, "ctest1");
if ((error = dlerror()) != NULL) {
fprintf(stderr, "%s\n", error);
exit(1);
}
(*fn)(&x);
printf("Val x=%d\n", x);
dlclose(lib_handle);
return 0;
}
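To dynamically load a library we first build the shared object library as before, then load it into our program with dlopen and look up each symbol with dlsym as we need it. The -rdynamic flag exports the executable's own symbols to the dynamic symbol table, and -ldl links the dynamic loading library.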
# Build and compile the main executable.
gcc -rdynamic -o main2 main2.c -ldl
# Run the executable.
./main2
TODO Windows DLLs
http://www.yolinux.com/TUTORIALS/LibraryArchives-StaticAndDynamic.html
Programs
nm
nm lists the symbols in an object file, both those it defines and the undefined ones it references.
gcc
The GNU Compiler Collection. The recommended FOSS compiler for C.
Important flags:
- -g includes debugging information in compiled executables.
- -v -Wl,--verbose gives verbose compiler and linker output.
- -E stops after pre-processing and outputs the resulting translation units.
Sanitizers
Are compiler-inserted run-time checks that do the job of tools such as valgrind while the program runs. To enable them, pass one or more -fsanitize=X options when both compiling and linking, where X can be address, leak, undefined or numerous other options; gcc then links the matching runtime library (for example libasan for the address sanitizer) automatically. When you run the program any sanitizer errors are printed out as they are encountered.
Warn: If you link a sanitizer runtime library explicitly, it must be the first library you link against for it to work as expected.
gdb
An interactive debugger with support for breakpoints and introspection. It is commonly used with gcc; compile with -g so gdb can show source lines and variable names.
valgrind
A suite of tools for debugging and profiling (eg: memory leak detection).
ltrace
A program to monitor and output the dynamic library calls made by a program.
strace
Traces the system calls (and signals) made by a program. Where ltrace shows library calls, strace shows the requests that actually reach the kernel.