[linux] Kernel Hacking functions to remember
taken from
http://netfilter.samba.org/unreliable-guides/kernel-hacking/lk-hacking-guide.html
---------------------------------------------------------------------------
printk(KERN_INFO "i = %u\n", i);
---------------------------------------------------------------------------
For printing an IP address, use:
__u32 ipaddress;
printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
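
NIPQUAD() hides a byte-order assumption that is worth seeing spelled out.
What follows is a userspace sketch, not kernel code. QUAD_BYTES,
format_ipv4() and make_ipv4() are made-up names, and the macro body is only
my assumption of how NIPQUAD() behaves: it hands printk the four bytes of
the address in memory order, so the address should be in network byte order.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for NIPQUAD(): expands to the four
 * bytes of addr in memory order, so addr is assumed to be stored in
 * network byte order. */
#define QUAD_BYTES(addr) \
	((const unsigned char *)&(addr))[0], \
	((const unsigned char *)&(addr))[1], \
	((const unsigned char *)&(addr))[2], \
	((const unsigned char *)&(addr))[3]

/* Format a network-byte-order IPv4 address as a dotted quad into buf.
 * Returns what snprintf() returns (length of the formatted string). */
static int format_ipv4(uint32_t addr_be, char *buf, size_t len)
{
	return snprintf(buf, len, "%d.%d.%d.%d", QUAD_BYTES(addr_be));
}

/* Demo helper: build a network-byte-order address from four octets. */
static uint32_t make_ipv4(unsigned char a, unsigned char b,
			  unsigned char c, unsigned char d)
{
	unsigned char bytes[4] = { a, b, c, d };
	uint32_t addr;

	memcpy(&addr, bytes, sizeof(addr));
	return addr;
}
```

Passing a host-order value on a little-endian machine would print the
octets reversed, which is the usual way this goes wrong.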
---------------------------------------------------------------------------
put_user() and get_user() are used to get and put single values (such as
an int, char, or long) from and to userspace. A pointer into userspace
should never be simply dereferenced: data should be copied using these
routines. Both return -EFAULT or 0.

copy_to_user() and copy_from_user() are more general: they copy an
arbitrary amount of data to and from userspace.
---------------------------------------------------------------------------
kmalloc()/kfree()
include/linux/slab.h
These routines are used to dynamically request pointer-aligned chunks of
memory, like malloc and free do in userspace, but kmalloc() takes an extra
flag word. Important values:

GFP_KERNEL
    May sleep and swap to free memory. Only allowed in user context, but is
    the most reliable way to allocate memory.

GFP_ATOMIC
    Don't sleep. Less reliable than GFP_KERNEL, but may be called from
    interrupt context. You should really have a good out-of-memory
    error-handling strategy.

GFP_DMA
    Allocate ISA DMA lower than 16MB. If you don't know what that is you
    don't need it. Very unreliable.

If you see a "kmem_grow: Called nonatomically from int" warning message,
you called a memory allocation function from interrupt context without
GFP_ATOMIC. You should really fix that. Run, don't walk.

If you are allocating at least PAGE_SIZE (include/asm/page.h) bytes,
consider using __get_free_pages() (include/linux/mm.h). It takes an order
argument (0 for page sized, 1 for double page, 2 for four pages etc.) and
the same memory priority flag word as above.
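
The order argument is just a power-of-two exponent, which a few lines of
plain C make concrete. This is a userspace sketch: order_for_size() and
SKETCH_PAGE_SIZE are made-up names, and the 4096-byte page is an assumption
(the real PAGE_SIZE depends on the architecture).

```c
#include <stddef.h>

/* Assumed page size for this sketch only. */
#define SKETCH_PAGE_SIZE 4096UL

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= bytes. */
static unsigned int order_for_size(size_t bytes)
{
	unsigned int order = 0;
	size_t span = SKETCH_PAGE_SIZE;

	while (span < bytes) {
		span <<= 1;	/* each order doubles the block */
		order++;
	}
	return order;
}
```

Note the rounding: a request for three pages needs order 2, so a quarter of
the block goes unused.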

If you are allocating more than a page worth of bytes you can use
vmalloc(). It'll allocate virtual memory in the kernel map. This block is
not contiguous in physical memory, but the MMU makes it look like it is for
you (so it'll only look contiguous to the CPUs, not to external device
drivers). If you really need large physically contiguous memory for some
weird device, you have a problem: it is poorly supported in Linux because
after some time memory fragmentation in a running kernel makes it hard. The
best way is to allocate the block early in the boot process via the
alloc_bootmem() routine.

Before inventing your own cache of often-used objects consider using a
slab cache in include/linux/slab.h.
---------------------------------------------------------------------------
udelay()/mdelay()
include/asm/delay.h include/linux/delay.h
The udelay() function can be used for small pauses. Do not use large
values with udelay() as you risk overflow - the helper function mdelay() is
useful here, or even consider schedule_timeout().
---------------------------------------------------------------------------
local_irq_save()/local_irq_restore()
include/asm/system.h
These routines disable hard interrupts on the local CPU, and restore them.
They are reentrant, saving the previous state in their one unsigned long
flags argument. If you know that interrupts are enabled, you can simply use
local_irq_disable() and local_irq_enable().
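
Why the saved flags matter is easiest to see in a toy model. The code below
is plain userspace C, not kernel code: a single int stands in for the CPU's
interrupt-enable bit and every model_* and inner_* name is made up. The real
local_irq_save() is a macro that writes into its flags argument rather than
returning a value.

```c
/* Toy model: 1 means "interrupts enabled", 0 means "disabled". */
static int model_irqs_enabled = 1;

/* Like local_irq_save(): remember the old state, then disable. */
static unsigned long model_irq_save(void)
{
	unsigned long flags = (unsigned long)model_irqs_enabled;

	model_irqs_enabled = 0;
	return flags;
}

/* Like local_irq_restore(): put back whatever state was saved. */
static void model_irq_restore(unsigned long flags)
{
	model_irqs_enabled = (int)flags;
}

/* Like local_irq_disable()/local_irq_enable(): unconditional. */
static void model_irq_disable(void) { model_irqs_enabled = 0; }
static void model_irq_enable(void)  { model_irqs_enabled = 1; }

/* Safe whatever the caller's state was. Returns the state afterwards. */
static int inner_with_save(void)
{
	unsigned long flags = model_irq_save();
	/* ... touch data shared with an interrupt handler ... */
	model_irq_restore(flags);
	return model_irqs_enabled;
}

/* Only correct if the caller had interrupts enabled. */
static int inner_with_enable(void)
{
	model_irq_disable();
	/* ... touch data shared with an interrupt handler ... */
	model_irq_enable();	/* re-enables even if the caller disabled */
	return model_irqs_enabled;
}
```

Called from inside an outer model_irq_save() section, inner_with_save()
leaves interrupts off as the caller expects, while inner_with_enable()
turns them back on behind the caller's back.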
---------------------------------------------------------------------------
local_bh_disable()/local_bh_enable()
include/asm/softirq.h
These routines disable soft interrupts on the local CPU, and restore them.
They are reentrant; if soft interrupts were disabled before, they will
still be disabled after this pair of functions has been called. They
prevent softirqs, tasklets and bottom halves from running on the current
CPU.
---------------------------------------------------------------------------
smp_processor_id()/cpu_[number/logical]_map()
include/asm/smp.h
smp_processor_id() returns the current processor number, between 0 and
NR_CPUS (the maximum number of CPUs supported by Linux, currently 32).
These values are not necessarily contiguous: to get a number between 0 and
smp_num_cpus() (the number of actual processors in this machine), the
cpu_number_map() function is used to map the processor id to a logical
number. cpu_logical_map() does the reverse.
---------------------------------------------------------------------------
__init/__exit/__initdata
include/linux/init.h
After boot, the kernel frees up a special section; functions marked with
__init and data structures marked with __initdata are dropped after boot is
complete (within modules this directive is currently ignored). __exit is
used to declare a function which is only required on exit: the function
will be dropped if this file is not compiled as a module. See the header
file for use. Note that it makes no sense for a function marked with __init
to be exported to modules with EXPORT_SYMBOL() - this will break.

Static data structures marked as __initdata must be initialised (as
opposed to ordinary static data which is zeroed BSS) and cannot be const.
---------------------------------------------------------------------------
__initcall()/module_init()
include/linux/init.h
Many parts of the kernel are well served as a module (dynamically-loadable
parts of the kernel). Using the module_init() and module_exit() macros it
is easy to write code without #ifdefs which can operate both as a module or
built into the kernel.

The module_init() macro defines which function is to be called at module
insertion time (if the file is compiled as a module), or at boot time: if
the file is not compiled as a module the module_init() macro becomes
equivalent to __initcall(), which through linker magic ensures that the
function is called on boot.

The function can return a negative error number to cause module loading to
fail (unfortunately, this has no effect if the module is compiled into the
kernel). For modules, this is called in user context, with interrupts
enabled, and the kernel lock held, so it can sleep.
---------------------------------------------------------------------------
module_exit()
include/linux/init.h
This macro defines the function to be called at module removal time (or
never, in the case of the file compiled into the kernel). It will only be
called if the module usage count has reached zero. This function can also
sleep, but cannot fail: everything must be cleaned up by the time it
returns.
---------------------------------------------------------------------------
MOD_INC_USE_COUNT/MOD_DEC_USE_COUNT
include/linux/module.h
These manipulate the module usage count, to protect against removal (a
module also can't be removed if another module uses one of its exported
symbols: see below). Every reference to the module from user context should
be reflected by this counter.

You can often avoid having to deal with these problems by using the owner
field of the file_operations structure. Set this field as the macro
THIS_MODULE.

For more complicated module unload locking requirements, you can set the
can_unload function pointer to your own routine, which should return 0 if
the module is unloadable, or -EBUSY otherwise.
---------------------------------------------------------------------------
Wait Queues
include/linux/wait.h
A wait queue is used to wait for someone to wake you up when a certain
condition is true. They must be used carefully to ensure there is no race
condition. You declare a wait_queue_head_t, and then processes which want
to wait for that condition declare a wait_queue_t referring to themselves,
and place that in the queue.
---------------------------------------------------------------------------
Declaring Wait Queues
You declare a wait_queue_head_t using the DECLARE_WAIT_QUEUE_HEAD() macro,
or using the init_waitqueue_head() routine in your initialization code.
---------------------------------------------------------------------------
Queuing
Placing yourself in the waitqueue is fairly complex, because you must put
yourself in the queue before checking the condition. There is a macro to do
this: wait_event_interruptible() (include/linux/sched.h). The first
argument is the wait queue head, and the second is an expression which is
evaluated; the macro returns 0 when this expression is true, or
-ERESTARTSYS if a signal is received. The wait_event() version ignores
signals.

Do not use the sleep_on() function family - it is very easy to accidentally
introduce races; almost certainly one of the wait_event() family will do,
or a loop around schedule_timeout(). If you choose to loop around
schedule_timeout() remember you must set the task state (with
set_current_state()) on each iteration to avoid busy-looping.
---------------------------------------------------------------------------
Waking Up Queued Tasks
Call wake_up() (include/linux/sched.h), which will wake up every process in
the queue. The exception is if one has TASK_EXCLUSIVE set, in which case
the remainder of the queue will not be woken.
---------------------------------------------------------------------------
Atomic Operations
Certain operations are guaranteed atomic on all platforms. The first class
of operations work on atomic_t (include/asm/atomic.h); this contains a
signed integer (at least 24 bits long), and you must use these functions to
manipulate or read atomic_t variables. atomic_read() and atomic_set() get
and set the counter; atomic_add(), atomic_sub(), atomic_inc(), atomic_dec()
and atomic_dec_and_test() modify it (the last returns true if it was
decremented to zero).

Yes. It returns true (i.e. != 0) if the atomic variable is zero.
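
The classic use of atomic_dec_and_test() is dropping a reference count. The
sketch below shows the same pattern in userspace, with C11 <stdatomic.h>
standing in for atomic_t; the sketch_* names are made up and nothing here
is the kernel's own implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object; atomic_int stands in for atomic_t. */
struct sketch_obj {
	atomic_int refcnt;
};

static void sketch_init(struct sketch_obj *obj, int initial)
{
	atomic_init(&obj->refcnt, initial);
}

/* Take a reference, like atomic_inc(). */
static void sketch_get(struct sketch_obj *obj)
{
	atomic_fetch_add(&obj->refcnt, 1);
}

/* Drop a reference, like atomic_dec_and_test(): returns true only for
 * the caller whose decrement took the counter to zero.
 * atomic_fetch_sub() returns the value from before the subtraction,
 * hence the comparison with 1. */
static bool sketch_put(struct sketch_obj *obj)
{
	return atomic_fetch_sub(&obj->refcnt, 1) == 1;
}
```

Whoever gets true back from sketch_put() held the last reference and is
the one who should free the object.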

Note that these functions are slower than normal arithmetic, and so should
not be used unnecessarily. On some platforms they are much slower, like
32-bit Sparc where they use a spinlock.

The second class of atomic operations is atomic bit operations on a long,
defined in include/asm/bitops.h. These operations generally take a pointer
to the bit pattern, and a bit number: 0 is the least significant bit.
set_bit(), clear_bit() and change_bit() set, clear, and flip the given bit.
test_and_set_bit(), test_and_clear_bit() and test_and_change_bit() do the
same thing, except return true if the bit was previously set; these are
particularly useful for very simple locking.
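
"Very simple locking" here means a try-lock built on the old value that
test_and_set_bit() hands back. The following userspace sketch imitates that
with C11 atomics; the sketch_* names and the choice of bit 0 are
assumptions for illustration, not kernel code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Set bit nr and report whether it was already set.
 * nr must be smaller than the number of bits in an unsigned long. */
static bool sketch_test_and_set_bit(unsigned int nr, atomic_ulong *word)
{
	unsigned long mask = 1UL << nr;

	/* atomic_fetch_or() returns the old value of the word */
	return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Clear bit nr. */
static void sketch_clear_bit(unsigned int nr, atomic_ulong *word)
{
	atomic_fetch_and(word, ~(1UL << nr));
}

#define SKETCH_BUSY_BIT 0

static atomic_ulong sketch_flags;	/* static storage: starts as zero */

/* Returns true if we got the lock, false if someone else holds it. */
static bool sketch_trylock(void)
{
	return !sketch_test_and_set_bit(SKETCH_BUSY_BIT, &sketch_flags);
}

static void sketch_unlock(void)
{
	sketch_clear_bit(SKETCH_BUSY_BIT, &sketch_flags);
}
```

Only the caller that saw the bit go from clear to set owns the lock;
everyone else gets false and has to back off or retry.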

It is possible to call these operations with bit indices greater than
BITS_PER_LONG. The resulting behavior is strange on big-endian platforms
though so it is a good idea not to do this.

Note that the order of bits depends on the architecture, and in
particular, the bitfield passed to these operations must be at least as
large as a long.
---------------------------------------------------------------------------
Symbols
Within the kernel proper, the normal linking rules apply (ie. unless a
symbol is declared to be file scope with the static keyword, it can be used
anywhere in the kernel). However, for modules, a special exported symbol
table is kept which limits the entry points to the kernel proper. Modules
can also export symbols.
---------------------------------------------------------------------------
EXPORT_SYMBOL()
include/linux/module.h
This is the classic method of exporting a symbol, and it works for both
modules and non-modules. In the kernel all these declarations are often
bundled into a single file to help genksyms (which searches source files
for these declarations). See the comment on genksyms and Makefiles below.
---------------------------------------------------------------------------
EXPORT_SYMTAB
For convenience, a module usually exports all non-file-scope symbols (ie.
all those not declared static). If this is defined before
include/linux/module.h is included, then only symbols explicitly exported
with EXPORT_SYMBOL() will be exported.
---------------------------------------------------------------------------
Double-linked lists
include/linux/list.h
There are three sets of linked-list routines in the kernel headers, but
this one seems to be winning out (and Linus has used it). If you don't have
some particular pressing need for a single list, it's a good choice. In
fact, I don't care whether it's a good choice or not, just use it so we can
get rid of the others.
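
For anyone who has not met this style of list before: the link node lives
inside the object, and a macro gets you from the node back to the object.
Below is a minimal userspace sketch of that idea. It is written in the
spirit of list.h, but every sk_* name is made up and it is not the header's
actual code.

```c
#include <stddef.h>

/* Circular doubly-linked list; the node is embedded in the object. */
struct sk_list {
	struct sk_list *next, *prev;
};

/* An empty list is a head that points at itself. */
static void sk_list_init(struct sk_list *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert node just before head, i.e. at the tail of the list. */
static void sk_list_add_tail(struct sk_list *node, struct sk_list *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Unlink node and leave it pointing at itself. */
static void sk_list_del(struct sk_list *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = node->prev = node;
}

/* Recover the containing object from a pointer to its embedded node. */
#define sk_list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct sk_item {
	int value;
	struct sk_list link;
};

/* Sum the values of every sk_item on the list. */
static int sk_sum(struct sk_list *head)
{
	int sum = 0;
	struct sk_list *pos;

	for (pos = head->next; pos != head; pos = pos->next)
		sum += sk_list_entry(pos, struct sk_item, link)->value;
	return sum;
}
```

Because the head is itself a node, insertion and removal need no special
case for an empty list.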
---------------------------------------------------------------------------
Initializing structure members
The preferred method of initializing structures is to use the gcc Labeled
Elements extension, eg:
static struct block_device_operations opt_fops = {
        open:               opt_open,
        release:            opt_release,
        ioctl:              opt_ioctl,
        check_media_change: opt_media_change,
};
This makes it easy to grep for, and makes it clear which structure fields
are set. You should do this because it looks cool.
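
The "field:" spelling above is the old GCC form. C99 later standardised the
same idea as ".field = value", which GCC also accepts. Here is a small
self-contained sketch of the C99 spelling; struct sketch_ops and its
functions are made up for illustration and are not the real
block_device_operations.

```c
#include <stddef.h>

/* Hypothetical operations table, loosely modelled on the example above. */
struct sketch_ops {
	int (*open)(void);
	int (*release)(void);
	int (*ioctl)(unsigned int cmd);
	int (*check_media_change)(void);
};

static int sketch_open(void)    { return 0; }
static int sketch_release(void) { return 0; }

/* C99 designated initializers: order does not matter, and members that
 * are not named (ioctl, check_media_change) are set to NULL. */
static struct sketch_ops sketch_fops = {
	.release = sketch_release,
	.open    = sketch_open,
};
```

The practical win is that adding or reordering members in the structure
definition does not silently shift your function pointers into the wrong
slots.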
---------------------------------------------------------------------------
GNU Extensions
GNU Extensions are explicitly allowed in the Linux kernel. Note that some
of the more complex ones are not very well supported, due to lack of
general use, but the following are considered standard (see the GCC info
page section "C Extensions" for more details - Yes, really the info page,
the man page is only a short summary of the stuff in info):

    Inline functions
    Statement expressions (ie. the ({ and }) constructs).
    Declaring attributes of a function / variable / type (__attribute__)
    Labeled elements
    typeof
    Zero length arrays
    Macro varargs
    Arithmetic on void pointers
    Non-Constant initializers
    Assembler Instructions (not outside arch/ and include/asm/)
    Function names as strings (__FUNCTION__)
    __builtin_constant_p()

Be wary when using long long in the kernel: the code gcc generates for it
is horrible and, worse, division and multiplication do not work on i386
because the GCC runtime functions for it are missing from the kernel
environment.
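
Two of the extensions in that list, statement expressions and typeof, are
usually seen together. A sketch of the typical use follows: a max macro
that evaluates each argument only once. It needs gcc or a compatible
compiler, and sketch_max() and bump_and_max() are made-up names.

```c
/* GNU C only. The ({ ... }) block is an expression whose value is its
 * last statement; __typeof__ gives the temporaries the right type.
 * Each argument is evaluated exactly once, unlike the naive
 * ((a) > (b) ? (a) : (b)). */
#define sketch_max(a, b) ({ \
	__typeof__(a) _a = (a); \
	__typeof__(b) _b = (b); \
	_a > _b ? _a : _b; \
})

/* Returns the larger of the old *counter and limit. The post-increment
 * is a deliberate side effect to show the single evaluation. */
static int bump_and_max(int *counter, int limit)
{
	return sketch_max((*counter)++, limit);
}
```

With the naive ternary macro the increment would run twice whenever the
first argument wins.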
---------------------------------------------------------------------------
Putting Your Stuff in the Kernel
In order to get your stuff into shape for official inclusion, or even to
make a neat patch, there's administrative work to be done:

Figure out whose pond you've been pissing in. Look at the top of the source
files, inside the MAINTAINERS file, and last of all in the CREDITS file.
You should coordinate with this person to make sure you're not duplicating
effort, or trying something that's already been rejected.

Make sure you put your name and EMail address at the top of any files you
create or mangle significantly. This is the first place people will look
when they find a bug, or when they want to make a change.

Usually you want a configuration option for your kernel hack. Edit
Config.in in the appropriate directory (but under arch/ it's called
config.in). The Config Language used is not bash, even though it looks like
bash; the safe way is to use only the constructs that you already see in
Config.in files (see Documentation/kbuild/config-language.txt). It's good
to run "make xconfig" at least once to test (because it's the only one with
a static parser).

Variables which can be Y or N use bool followed by a tagline and the config
define name (which must start with CONFIG_). The tristate function is the
same, but allows the answer M (which defines CONFIG_FOO_MODULE in your
source, instead of CONFIG_FOO) if CONFIG_MODULES is enabled.

You may well want to make your CONFIG option only visible if
CONFIG_EXPERIMENTAL is enabled: this serves as a warning to users. There
are many other fancy things you can do: see the various Config.in files for
ideas.
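
On the C side, the Y/M/N answer of a tristate option shows up as which
define is present. A small sketch, assuming a hypothetical option called
FOO; the #define at the top only simulates an answer of M, since in a real
build the define comes from the kernel configuration and never from your
source file.

```c
/* Simulate "M" for the hypothetical tristate option FOO. */
#define CONFIG_FOO_MODULE 1

/* Report how the hypothetical FOO feature was configured. */
static const char *foo_build_mode(void)
{
#if defined(CONFIG_FOO)
	return "built-in";	/* answered Y */
#elif defined(CONFIG_FOO_MODULE)
	return "module";	/* answered M */
#else
	return "disabled";	/* answered N */
#endif
}
```

Code that only tests CONFIG_FOO will therefore be compiled out when the
option is built as a module, which is easy to overlook.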

Edit the Makefile: the CONFIG variables are exported here so you can
conditionalize compilation with `ifeq'. If your file exports symbols then
add the names to MX_OBJS or OX_OBJS instead of M_OBJS or O_OBJS, so that
genksyms will find them.

Document your option in Documentation/Configure.help. Mention
incompatibilities and issues here. Definitely end your description with
"if in doubt, say N" (or, occasionally, `Y'); this is for people who have
no idea what you are talking about.

Put yourself in CREDITS if you've done something noteworthy, usually beyond
a single file (your name should be at the top of the source files anyway).
MAINTAINERS means you want to be consulted when changes are made to a
subsystem, and hear about bugs; it implies a more-than-passing commitment
to some part of the code.

Finally, don't forget to read Documentation/SubmittingPatches and possibly
Documentation/SubmittingDrivers.
---------------------------------------------------------------------------