"Process are like human beings: they are generated, they have a more or less significant life, they optionally generate one or more child processes, and eventually they die."
If you already searched at least a little bit about programming languages certainly you find some buzzwords like: multi threading, concurrency, parallelism, single-thread application, etc. And maybe you was a little confused about all this concepts, but don't worry, the first step to learning all this is understanding process.
Well, process are a abstract thing. Following the Kernel book definition: "a process is an instance of a program in execution", from the kernel's perspective, the process is like a entity to which system resources are allocated. To abstract? Let's see a real example:
To manage all this processes the kernel must know what each process is doing and what is your current state, to achieve this each process have a process descriptor which is basically a complex c struct that points to other structs, one of most important fields on this struct is the process state, that can assume three values:
- TASK_RUNNING: self-explanatory, the process is running;
- TASK_INTERRUPTIBLE: the process is sleeping, that is, waiting some condition.
- TASK_UNINTERRUPTIBLE: like the above state, except that delivering a signal to the sleeping process leaves its state unchanged.
- TASK_STOPPED: process execution has been stopped, that happens after the process receive a SIGSTOP (Ctrl+C), SIGTSTP (Ctrl+Z), SIGTTIN or SIGTTOU signal.
- TASK_TRACED: process execution has been stopped by a debugger(!).
- EXIT_ZOMBIE: process execution is terminated, but the parent process has not yet issued a wait4() or waitpid() syscall to return information about the dead process.
- EXIT_DEAD: the final state, the process is being removed by the system because the parent process has just issued a wait4() or waitpid() syscall for it.
Is important to notice that only one process can run at a time on a single CPU.
Process Types
- User Process: process that are initiated by a user and runs in user space (userland)
- Daemon Process: a application that is designed to run in the background, typically managing some kind of ongoing service.
- Kernel Process: similar to daemon process, but run only in kernel space and so, have full access to kernel data structures.
Process ID (PID)
In Unix-like operating systems, and so Linux, the processes are identified by a process id, which is stored in the pid field of the process descriptor. PIDs are numbered sequentially: the PID of a newly created process is normally the PID of the previously created process increased by one. For sure there is a upper limit on the PID values (PID_MAX_DEFAULT - 1), you can check this constant by running:
$ cat /proc/sys/kernel/pid_max
-- acrescentar explicação da diferença entre PID, TGID, etc. ref
Double linked list
The Linux kernel stores the pointer to each process_descriptor in a circular double-linked list. That's kinda nice thing my professor don't tell on the class :(
When looking for a new process to run on a CPU, the kernel has to consider only the runnable processes (that is, the processes in the TASK_RUNNING state).
Earlier Linux versions put all runnable processes in the same list called runqueue, so, for the scheduler find the "best" runnable process it were compelled to scan the whole list. Since Linux 2.6 things has changed. The trick was split the runqueue in many lists of runnable processes, one list per process priority. Each task_struct descriptor includes a run_list field of type list_head. If the process priority is equal to k (a value ranging between 0 and 139), the run_list field links the process descriptor into the list of runnable processes having priority k. Furthermore, on a multiprocessor system, each CPU has its own runqueue, that is, its own set of lists of processes. So, to achieve better performance in scheduler operations the runqueues list has been split into 140 different lists, per CPU!
The pidhash Table and chained lists
Imagine if the kernel wants to find a particularly process by its PID, as they are in a double linked list it must to parse to each element to find the PID on position n, thats seems O(n) complexity right? No too bad, but not too great. To speed up this process four hash tables have been introduced, one for each type of IDs: PID, TGID, PGID, SID. These tables are dynamically allocated during the kernel initialization phase. I don't want to show too much code (probably its outdated), but that is a interesting case:
unsigned long hash_long(unsigned long val, unsigned int bits)
{
unsigned long hash = val * 0x9e370001UL;
return hash >> (32 - bits);
}
A more detailed explanation can be found here, but I like to show this particularly function for the "Magic constant": 0x9e370001UL (=2,654,404,609). Just because is a prime number easily multiplied by additions and bit shifts :)
You may be thinking: "so far so good, but what about hash collisions?". If you don't, shame on you, take this notes aside and go study basic data structures! You come back? Good. So, Linux uses chaining to handle colliding PIDs; each table entry is the head of a double linked list of colliding process descriptors.
How process Are Organized
The runqueue lists group all process in a TASK_RUNNING state. To group processes in other states, the Linux kernel uses different strategies:
- TASK_STOPPED, EXIT_ZOMBIE or EXIT_DEAD states: are not linked in specific lists. They are accessed only via PID.
- TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE states: as the process state does not provide enough information to retrieve the process quickly it is necessary to introduce additional lists of processes. They are called wait queues.
Wait queues
They have several uses in the kernel, particularly for interrupt handling, process synchronization, and timing. As processes must often wait for some event to occur, such as for a disk operation to terminate, a system resource to be released, or a fixed interval of time to elapse. Wait queues implement conditional waits on events: a process wishing to wait for a specific event places itself in the proper wait queue and relinquishes control. Therefore, a wait queue represents a set of sleeping processes, which are woken up by the kernel when some condition becomes true. And they are implemented as, guess what? Yup, double linked lists.
Because wait queues are modified by interrupt handlers as well as by major kernel functions, the doubly linked lists must be protected from concurrent accesses. Synchronization is achieved by the lock spin lock in the wait queue head. Each element in the the queue represents a sleeping process. However, it is not always convenient to wake up all sleeping processes in a wait queue. For instance, it two or more processes are waiting for exclusive access to some resource to be released, it makes sense to wake up just one process in the wait queue. This process takes the resource, while the other processes continue to sleep. This avoids a problem know as "thundering herd"
Thus, there are two kind of process: exclusive processes are selectively woken up by the kernel, while nonexclusive processes are always woken up by the kernel when the event occurs.
Process Switch
To control the execution of processes, the kernel must be able to suspend the execution of the process running on the CPU and resume the execution of some other process previously suspended. This can be called: process switch, task switch, or context switch.
Hardware Switch
While each process as its own address space, all them have to share the CPU registers. So before resuming the execution of a process, the kernel must ensure that each such register is loaded with the value it had when the process was suspended.
The hardware context is the set of data that must be loaded into the registers before the process resumes its execution on CPU. It is a subset of the process execution context, which includes all information needed for the process execution. In Linux, a part of the hardware context of a process is stored in the process descriptor, while the remaining part is saved in the Kernel Mode stack. As context switch occur quite often, it is important to minimize the time spent in saving and loading hardware contexts.
Process switching occurs only in Kernel Mode. The contents of all registers used by a process in User Mode have already been saved on the Kernel Mode Stack (KMS) before performing process switching.
Essentially, every process switch consists of two steps:
- Switching the Page Global Directory to install a new address space;
- Switching the KMS and the hardware context, which provides all the information needed by the kernel to execute the new process, including the CPU registers.
Creating Processes
Unix operating system rely heavily on process creation to satisfy user requests. For example, the shell creates a new process that executes another copy of the shell whenever the user enters a command. Unix treat all processes in the same way: resources owned by the parent process are duplicated in the child process. This approach makes process creation very slow and inefficient. Modern Unix kernels solve this problem by three different mechanisms:
- Copy on write COW;
- Lightweight processes allow both the parent and the child to share many per-process kernel data structures.
- The vfork() syscall creates a process that shares the memory address space of its parent. To avoid the parent from overwriting data needed by the child, the parent's execution is blocked until the child exits or executes a new program.