Stjepan Groš

Programming Linux Kernel

Disclaimer and stuff like that

Note 1: These examples were tested on Fedora Core with Linux kernel version 2.6.18-1.2849. The kernel API changes between versions, so the examples given here might not work on other kernels. Also, they might break, burn, or do whatever with your computer and the data on it! So, be careful! I'm not responsible for any damage.

Note 2: I don't claim to be original! Just the opposite, all the code given here has been collected from different places on the Internet! I recommend that you look at the kernel page on LWN. Also, the Linux Device Drivers book might be very helpful. Finally, you should definitely have the kernel sources available.

Note 3: This is incomplete, and probably will stay so for some time in the future. Anyway, I'll try to expand this into something useful.

Note 4: You should know how to use a C compiler and the make utility, and also know what the tools for manipulating modules are and how they are used, i.e. insmod, rmmod, modprobe and modinfo. Each of those commands has a good manual page, so go read them to get an impression of what they do.

What are modules and what we are going to do

The Linux kernel, like any other modern operating system kernel, is modularized and allows functionality to be added and removed at runtime. In other words, it's not necessary to reboot just to add a USB stick. It should be clear that not all parts of the kernel can be turned into modules. For example, memory management and scheduling are at the core of the kernel, and removing them would make the computer unusable.

Note that newcomers often confuse the terms modularized and microkernel. On the surface they seem the same, i.e. both denote that the kernel is in some way divided into separate, removable parts that communicate via predefined interfaces. But the key difference between the two is in the privileges of the modules. In a modularized kernel all the code runs with the same privileges, usually in a special CPU mode that allows almost unrestricted access to all the computer's resources. In a microkernel-based system, on the other hand, only a small part of the kernel runs in the unrestricted CPU mode and it doesn't do much; most importantly, it handles communication services. All the other "modules" are simple processes, no different than any other application process on any other OS. Well, almost. :) Anyway, the Linux kernel is not microkernel based.

When learning how to add or change some functionality in the kernel, or when developing new functionality, it is preferable to have a quick development cycle. This is generally true, i.e. not specific to kernel development. By development cycle I mean the following steps: writing code, compiling, testing, and debugging. What is specific to the kernel is that a slight mistake might crash your computer. This is unlike a bug in application software, where the most catastrophic consequence is the famous 'Segmentation fault'. One other point is important with respect to kernel development. To test a new feature, or a corrected version of some functionality, that is built into the kernel itself, you first have to compile the kernel and then reboot the computer so that the new kernel is loaded and executed.

Fortunately, we will stick to functionality that can be implemented in modules, which can be loaded and unloaded without rebooting. But the possibility of crashing your computer is still there. So be careful.

The idea is as follows. We'll start with the simplest possible module and then add functionality and grow it in complexity.

Hello World, the simplest version

As I said, this is the simplest module. Upon loading into the kernel, it just prints "Hello world". From that point on, the module doesn't do anything else; it just waits to be removed from the kernel. When unloaded, i.e. removed from the kernel, it prints "Goodbye world".

This module is composed of a single C file and an appropriate Makefile. The makefile is for convenience only; it transforms the C file into a kernel module ready to be inserted into the running kernel.

So, download the given files, put them into a separate directory, and compile by running make with makefile1 as its argument. After compilation is finished you'll get several files in the current directory, but the most important one is the module itself. It has the .ko extension. Insert this module into the kernel using the insmod utility. Watch the system log (it can also be viewed with dmesg) after loading the module. You should notice that a Hello world line has appeared. Then, remove the module using the rmmod utility. Again, by watching the system log, you'll notice that the module printed its goodbye message.

Ok, now some more details about this simple module. Here is the listing:

 1: #include <linux/module.h>
 2: #include <linux/kernel.h>
 3: 
 4: static int __init hw_init(void)      
 5: { 
 6: 	printk(KERN_INFO "Hello world\n"); 
 7: 	return 0; 
 8: }
 9: 
10: static void __exit hw_cleanup(void)
11: { 
12: 	printk(KERN_INFO "Goodbye world\n"); 
13: }
14:
15: module_init(hw_init);
16: module_exit(hw_cleanup);

First of all, you should forget the printf(3) function and all the similar functions you used as an application programmer. This is natural, since nothing from user space, or even from disks, is available during the kernel boot process. If the kernel depended on the glibc library to boot, we would have a chicken-and-egg problem: no user space libraries are available since the kernel hasn't booted, and the kernel cannot boot since there are no user space libraries.

So, to resolve this problem, the Linux kernel implements all the functions necessary to boot and run the system. One obvious example is the printk function called in lines 6 and 12. printk is the kernel's replacement for the printf library function. The KERN_INFO prefix sets the log level of the message.

The other thing to note is that modules have initialization and deinitialization functions. These are called when the module is loaded or unloaded, respectively. Those functions used to have predefined names, just like main is the predefined entry function in every application program written in C. Now they can be called anything, and with the help of two macros, module_init and module_exit, you name the functions that should be called immediately after the module is loaded and just before it is unloaded.

Hello World, introducing timers

Ok, now we'll introduce more functionality into our bare bones hello world module. What we are going to do is make it periodically print Hello world. It will do so a fixed number of times and then wait to be unloaded from the kernel.

Basically, we'll use timers to achieve this functionality. Timers are a basic building block of almost anything in the kernel and thus they are of utmost importance. The usage of timers is quite straightforward. But first, let us try this module to see the results of its execution.

Source code is here. Also, you can get Makefile that will compile this module. Now, go compile and load this module as was explained in the previous section.

If you look into syslog (usually /var/log/messages) after you've loaded the module you'll see something similar to the following:

Dec 25 18:12:46 localhost kernel: Hello, world
Dec 25 18:12:47 localhost kernel: Hello world, again with 5
Dec 25 18:12:48 localhost kernel: Hello world, again with 4
Dec 25 18:12:49 localhost kernel: Hello world, again with 3
Dec 25 18:12:50 localhost kernel: Hello world, again with 2
Dec 25 18:12:51 localhost kernel: Hello world, again with 1
Dec 25 18:12:52 localhost kernel: Hello world, again with 0
Dec 25 18:13:01 localhost kernel: Goodbye cruel world

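Basically, what this module does is send a message to syslog once a second, counting down from 5 to 0, which gives six messages in total. The delay is given in jiffies, so 1000 corresponds to one second only on a kernel built with HZ set to 1000. Then it stops and waits to be unloaded. When unloaded, it sends a goodbye message to syslog.

Now, let us see what the code looks like: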

 1: #include <linux/module.h>
 2: #include <linux/kernel.h>
 3: #include <linux/timer.h>
 4: 
 5: #define DELAY           1000
 6: #define ITERATIONS      5
 7: 
 8: struct timer_list tl;
 9: 
10: void timer_func(unsigned long data)
11: {
12:         if (tl.data--)
13:                 mod_timer(&tl, jiffies + DELAY);
14:
15:         printk(KERN_INFO "Hello world, again with %lu\n", data);
16: }
17: 
18: int init_module(void)
19: {
20:         printk(KERN_INFO "Hello, world\n");
21: 
22:         init_timer(&tl);
23:         tl.expires = jiffies + DELAY;
24:         tl.function = timer_func;
25:         tl.data = ITERATIONS;
26:         add_timer(&tl);
27: 
28:         return 0;
29: }
30: 
31: void cleanup_module(void)
32: {
33:         del_timer_sync(&tl);
34:         printk(KERN_INFO "Goodbye cruel world\n");
35: }

The first change we made is that we introduced a new header file, linux/timer.h, in line 3. This file is necessary in order to use timers.

Then, we defined a global variable named tl in line 8. We will have only a single timer at any time, so there is no need for complex logic to handle the more general case, i.e. no need for dynamic memory allocation and locking mechanisms.

In the struct timer_list structure three fields are important for our purpose: expires, the time (in jiffies) at which the timer fires; function, the function called when the timer expires; and data, an unsigned long value that is passed to that function as its argument.

We'll skip the timer_func function for now and proceed to the module initialization function. As last time, there is a printk call to mark the point where the module initialization function starts (line 20), but this time there is some more stuff we have to do. First, the timer structure has to be initialized (line 22). Afterwards, we place the relevant data in the structure: when the timer will expire (line 23), the function that will be called upon timer expiration (timer_func, line 24), and the number of iterations (line 25). Finally, the timer is activated with a call to add_timer (line 26). This concludes the initialization part.

When the timer expires the function timer_func is called, which brings us back to lines 10-16, which we skipped. In this function the current number of remaining iterations is first checked and then decreased (line 12). This ensures that the function will be called only a given number of times. The mod_timer call in line 13 reschedules the timer. Finally, in the last line of the function (line 15) a message is sent to the system log.

At the end of this example, note that we don't have lines starting with module_init and module_exit. The reason is that this time we gave the functions for module initialization and removal their standard names, i.e. init_module and cleanup_module. The kernel expects functions with those names by default, so the module_init and module_exit directives are not needed here.

Upon a request to remove the module we only delete the timer (line 33), while the printk is the same as in the previous example.

Finally, there are two things you should never do in production code, at least if you don't fully understand all the implications. The first NO is using global variables, like tl in the previous example. Global variables eat up memory even when the variable is not used. In other words, if the tl structure is used only sporadically, it's better to define a pointer to the structure and allocate memory on demand. The second NO is related to how data modification is handled. For example, in line 12 we decrease the variable's value and act according to its value, but in real-world scenarios it might happen that several threads try to modify and/or use the same variable, and that will almost certainly cause very strange and extremely hard to debug bugs.

Hello World with module information and parameters

It is good practice to include in each module some information that will help the user identify the purpose and the usage of the module. To obtain information about a module you use the modinfo command. For example, running this command on the pcspkr module will output something like this:

$ modinfo pcspkr
filename:       /lib/modules/2.6.27.9-159.fc10.x86_64/kernel/drivers/input/misc/pcspkr.ko
alias:          platform:pcspkr
license:        GPL
description:    PC Speaker beeper driver
author:         Vojtech Pavlik <vojtech@ucw.cz>
srcversion:     5757E9C81E627525BA9D165
depends:        
vermagic:       2.6.27.9-159.fc10.x86_64 SMP mod_unload 

We found out the full filename of this module along with its alias, and also the licence, a short description and the author. This module doesn't have any parameters, so none were displayed. To see how it looks when a module has parameters we can look at e100:

$ modinfo e100
filename:       /lib/modules/2.6.27.9-159.fc10.x86_64/kernel/drivers/net/e100.ko
version:        3.5.23-k4-NAPI
license:        GPL
author:         Copyright(c) 1999-2006 Intel Corporation
description:    Intel(R) PRO/100 Network Driver
srcversion:     8F267BF9B220FD90B162A50
depends:        mii
vermagic:       2.6.27.9-159.fc10.x86_64 SMP mod_unload 
parm:           debug:Debug level (0=none,...,16=all) (int)
parm:           eeprom_bad_csum_allow:Allow bad eeprom checksums (int)
parm:           use_io:Force use of i/o access mode (int)

The output is slightly edited to remove aliases, which only clutter the output and don't give us useful information for our current purpose.

So, we see that there are three parameters. The first one is integer (int) debug that determines debug level. The last two are booleans (although implemented as integer types).

What we are going to do now is add information about the hello world module and also give it two parameters: one that determines how frequently it outputs a string to syslog, and another that determines how many times it does that before it stops.

As usual, you can download hello_world3.c and the appropriate makefile3 and compile the module. Don't insert it yet. Now, if you run modinfo on the module you'll get something like this:

$ modinfo ./hello_world3.ko 
filename:       ./hello_world3.ko
version:        1.0
license:        GPL
author:         Copyleft 2008 by Stjepan Gros
description:    Simple Hello World module with periodic message output
srcversion:     A7E8A5D961F137238BCC638
depends:        
vermagic:       2.6.27.9-159.fc10.x86_64 SMP mod_unload 
parm:           delay:Delay between two messages in jiffies (int)
parm:           iterations:Number of iterations (int)

Now, let me explain how it is implemented. Actually, it's very easy, and the main change is at the beginning of the file, so I'll list only that part:

 1: MODULE_DESCRIPTION("Simple Hello World module with periodic message output");
 2: MODULE_AUTHOR("Copyleft 2008 by Stjepan Gros");
 3: MODULE_LICENSE("GPL");
 4: MODULE_VERSION("1.0");
 5: 
 6: static int delay = 1000;
 7: static int iterations = 5;
 8: module_param(delay, int, 0);
 9: module_param(iterations, int, 0);
10: MODULE_PARM_DESC(delay, "Delay between two messages in jiffies");
11: MODULE_PARM_DESC(iterations, "Number of iterations");

For the module description, author, licence and version things are obvious, i.e. you just use them. Note that the licence has predefined values you have to use. In other words, if you use a value that is not recognized as GPL-compatible, the kernel will be marked as tainted when the module is loaded, i.e. the module doesn't obey the kernel's licence (GPL). The version string likewise has a recommended format. There are very good comments in the module.h header file, so it's best that you look there for details.

The mechanism for defining a parameter is conceptually simple. First, you define a static global variable (e.g. static int delay), which you also initialize to its default value. Then you declare it to be a module parameter (module_param), and finally you add a description for the parameter (MODULE_PARM_DESC). And that's it. Note that module_param takes three arguments. The first one is the name of the global variable, the second one is its type, and the third one is the permission bitmask. A permission of 0, as used here, means the parameter is not exposed through sysfs.

There are a few more changes in the code. The first ones are necessary in order for the parameters to work. That is, lines 13, 23 and 25 are changed to take the values of the parameters instead of the constants. This is a mandatory change. One change made for safety reasons is a check that both parameters have nonzero values; if either of them is zero, the hello world module behaves like the first version, i.e. it only prints messages on loading and unloading.

To load a module with non-default value of some parameter just add parameter=value to the insmod or modprobe commands:

# insmod ./hello_world3.ko iterations=2

As a final note in this part I suggest that you take a look into moduleparam.h header file where macros for handling module parameters are defined.

Hello World messaging through the sysfs

The sysfs and procfs are used to dynamically read and set parameters in the kernel. The sysfs interface is newer and was meant to replace procfs, but that probably will not happen any time soon. To show you how to use the sysfs interface, we will modify our hello world module so that the messages are read through sysfs instead of being sent to syslog.

Hello World, dynamically allocating memory

In this part we'll introduce dynamic allocation and manipulation of memory inside the Linux kernel. The goal is to allocate the struct timer_list structure at run time instead of statically allocating it during the module compilation phase. The routine for memory allocation is similar to the user space malloc and is called kmalloc. Its prototype is:

void *kmalloc(size_t size, gfp_t flags);

Hello World with locking