0xAX 9 years ago
parent
commit
fbf8f6e58a
1 changed file with 415 additions and 0 deletions
+ 415 - 0
Timers/timers-5.md

@@ -0,0 +1,415 @@
+Timers and time management in the Linux kernel. Part 5.
+================================================================================
+
+Introduction to the `clockevents` framework
+--------------------------------------------------------------------------------
+
+This is the fifth part of the [chapter](https://0xax.gitbooks.io/linux-insides/content/Timers/index.html) which describes timers and time management related stuff in the Linux kernel. As you might have noted from the title of this part, the `clockevents` framework will be discussed. We already saw one framework in the [second](https://0xax.gitbooks.io/linux-insides/content/Timers/timers-2.html) part of this chapter. It was the `clocksource` framework. Both of these frameworks represent timekeeping abstractions in the Linux kernel.
+
+First, let's refresh our memory and recall what the `clocksource` framework is and what its purpose is. The main goal of the `clocksource` framework is to provide a `timeline`. As described in the [documentation](https://github.com/0xAX/linux/blob/master/Documentation/timers/timekeeping.txt):
+
+> For example issuing the command 'date' on a Linux system will eventually read the clock source to determine exactly what time it is.
+
+The Linux kernel supports many different clock sources. You can find some of them in the [drivers/clocksource](https://github.com/torvalds/linux/tree/master/drivers/clocksource) directory. For example, the good old [Intel 8253](https://en.wikipedia.org/wiki/Intel_8253) - [programmable interval timer](https://en.wikipedia.org/wiki/Programmable_interval_timer) with a frequency of `1193182` Hz, or the [ACPI PM](http://uefi.org/sites/default/files/resources/ACPI_5.pdf) timer with a frequency of `3579545` Hz. Besides the [drivers/clocksource](https://github.com/torvalds/linux/tree/master/drivers/clocksource) directory, each architecture may provide its own architecture-specific clock sources. For example, the [x86](https://en.wikipedia.org/wiki/X86) architecture provides the [High Precision Event Timer](https://en.wikipedia.org/wiki/High_Precision_Event_Timer), and [powerpc](https://en.wikipedia.org/wiki/PowerPC) provides access to the processor timer through the `timebase` register.
+
+Each clock source provides a monotonic atomic counter. As I already wrote, the Linux kernel supports a huge set of different clock sources and each clock source has its own parameters like [frequency](https://en.wikipedia.org/wiki/Frequency). The main goal of the `clocksource` framework is to provide an [API](https://en.wikipedia.org/wiki/Application_programming_interface) to select the best available clock source in the system, i.e. a clock source with the highest frequency. An additional goal of the `clocksource` framework is to represent the atomic counter provided by a clock source in human units. At this time, nanoseconds are the favorite choice for the time value units of a given clock source in the Linux kernel.
+
+The `clocksource` framework is represented by the `clocksource` structure, which is defined in the [include/linux/clocksource.h](https://github.com/torvalds/linux/blob/master/include/linux/clocksource.h) header file. It contains the `name` of a clock source, the `rating` of a certain clock source in the system (a clock source with a higher frequency has a bigger rating), the `list` of all registered clock sources in the system, the `enable` and `disable` fields to enable and disable a clock source, a pointer to the `read` function which must return the atomic counter of a clock source, and so on.
+
+Additionally, the `clocksource` structure provides two fields, `mult` and `shift`, which are needed to translate the atomic counter provided by a certain clock source to human units, i.e. [nanoseconds](https://en.wikipedia.org/wiki/Nanosecond). The translation occurs via the following formula:
+
+```
+ns ~= (clocksource * mult) >> shift
+```
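To make the formula a bit more concrete, here is a minimal user-space sketch. It assumes a hypothetical 1 MHz counter and a `shift` of 10. The helper names `sketch_hz2mult` and `sketch_cyc2ns` are made up for this illustration and are not kernel API; the rounding step is just one reasonable way to build a fixed-point scale factor:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not kernel API): build a fixed-point scale factor
 * so that ns ~= (cycles * mult) >> shift for a counter running at `hz`.
 */
uint32_t sketch_hz2mult(uint32_t hz, uint32_t shift)
{
	uint64_t tmp = (uint64_t)1000000000 << shift;	/* nanoseconds per second, scaled */

	tmp += hz / 2;	/* round to nearest instead of truncating */
	return (uint32_t)(tmp / hz);
}

/* Hypothetical helper: apply the formula from the text to a raw counter value. */
uint64_t sketch_cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}
```

With these assumed numbers, `sketch_hz2mult(1000000, 10)` gives `1024000`, so a single cycle is reported as `(1 * 1024000) >> 10 = 1000` nanoseconds.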
+
+As we already know, besides the `clocksource` structure, the `clocksource` framework provides an API for registration of a clock source with different frequency scale factors:
+
+```C
+static inline int clocksource_register_hz(struct clocksource *cs, u32 hz)
+static inline int clocksource_register_khz(struct clocksource *cs, u32 khz)
+```
+
+A clock source unregistration:
+
+```C
+int clocksource_unregister(struct clocksource *cs)
+```
+
+and so on.
+
+In addition to the `clocksource` framework, the Linux kernel provides the `clockevents` framework. As described in the [documentation](https://github.com/0xAX/linux/blob/master/Documentation/timers/timekeeping.txt):
+
+> Clock events are the conceptual reverse of clock sources
+
+The main goal of the `clockevents` framework is to manage clock event devices, or in other words, devices that allow us to register an event ([interrupt](https://en.wikipedia.org/wiki/Interrupt)) that is going to happen at a defined point of time in the future.
+
+Now that we know a little about the `clockevents` framework in the Linux kernel, it is time to look at its [API](https://en.wikipedia.org/wiki/Application_programming_interface).
+
+API of `clockevents` framework
+-------------------------------------------------------------------------------
+
+The main structure which describes a clock event device is the `clock_event_device` structure. This structure is defined in the [include/linux/clockchips.h](https://github.com/torvalds/linux/blob/master/include/linux/clockchips.h) header file and contains a huge set of fields. Like the `clocksource` structure, it has a `name` field which contains the human readable name of a clock event device, for example the [local APIC](https://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller) timer:
+
+```C
+static struct clock_event_device lapic_clockevent = {
+    .name                   = "lapic",
+    ...
+    ...
+    ...
+};
+```
+
+The structure also contains the `event_handler` and `set_next_event` callbacks and the `next_event` field of a certain clock event device, which are an [interrupt handler](https://en.wikipedia.org/wiki/Interrupt_handler), the setter of the next event and local storage for the next event respectively. Yet another field of the `clock_event_device` structure is the `features` field. Its value may be one of the following generic features:
+
+```C
+#define CLOCK_EVT_FEAT_PERIODIC	0x000001
+#define CLOCK_EVT_FEAT_ONESHOT		0x000002
+```
+
+Where `CLOCK_EVT_FEAT_PERIODIC` represents a device which may be programmed to generate events periodically, and `CLOCK_EVT_FEAT_ONESHOT` represents a device which may generate an event only once. Besides these two features, there are also architecture-specific features. For example, [x86_64](https://en.wikipedia.org/wiki/X86-64) supports additional features such as:
+
+```C
+#define CLOCK_EVT_FEAT_C3STOP		0x000008
+```
+
+The `CLOCK_EVT_FEAT_C3STOP` feature means that a clock event device will be stopped in the [C3](https://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface#Device_states) state. Additionally, the `clock_event_device` structure has `mult` and `shift` fields, just like the `clocksource` structure. The `clock_event_device` structure also contains other fields, but we will consider them later.
+
+Now that we have considered part of the `clock_event_device` structure, it is time to look at the `API` of the `clockevents` framework. To work with a clock event device, first of all we need to initialize a `clock_event_device` structure and register the clock event device. The `clockevents` framework provides the following `API` for registration of clock event devices:
+
+```C
+void clockevents_register_device(struct clock_event_device *dev)
+{
+   ...
+   ...
+   ...
+}
+```
+
+This function is defined in the [kernel/time/clockevents.c](https://github.com/torvalds/linux/blob/master/kernel/time/clockevents.c) source code file and, as we may see, the `clockevents_register_device` function takes only one parameter:
+
+* address of a `clock_event_device` structure which represents a clock event device.
+
+So, to register a clock event device, we first need to initialize a `clock_event_device` structure with the parameters of a certain clock event device. Let's take a look at one random clock event device in the Linux kernel source code. We can find one in the [drivers/clocksource](https://github.com/torvalds/linux/tree/master/drivers/clocksource) directory or take a look at an architecture-specific clock event device. Let's take for example the [Periodic Interval Timer (PIT) for at91sam926x](http://www.atmel.com/Images/doc6062.pdf). You can find its implementation in [drivers/clocksource/timer-atmel-pit.c](https://github.com/torvalds/linux/tree/master/drivers/clocksource/timer-atmel-pit.c).
+
+First of all, let's look at the initialization of the `clock_event_device` structure. This occurs in the `at91sam926x_pit_common_init` function:
+
+```C
+struct pit_data {
+    ...
+    ...
+    struct clock_event_device       clkevt;
+    ...
+    ...
+};
+
+static void __init at91sam926x_pit_common_init(struct pit_data *data)
+{
+    ...
+    ...
+    ...
+    data->clkevt.name = "pit";
+    data->clkevt.features = CLOCK_EVT_FEAT_PERIODIC;
+    data->clkevt.shift = 32;
+    data->clkevt.mult = div_sc(pit_rate, NSEC_PER_SEC, data->clkevt.shift);
+    data->clkevt.rating = 100;
+    data->clkevt.cpumask = cpumask_of(0);
+
+    data->clkevt.set_state_shutdown = pit_clkevt_shutdown;
+    data->clkevt.set_state_periodic = pit_clkevt_set_periodic;
+    data->clkevt.resume = at91sam926x_pit_resume;
+    data->clkevt.suspend = at91sam926x_pit_suspend;
+    ...
+}
+```
+
+Here we can see that `at91sam926x_pit_common_init` takes one parameter, a pointer to the `pit_data` structure, which contains the `clock_event_device` structure that holds the clock event related information of the `at91sam926x` [Periodic Interval Timer](https://en.wikipedia.org/wiki/Programmable_interval_timer). At the start we fill in the `name` of the timer device and its `features`. In our case we deal with a periodic timer which, as we already know, may be programmed to generate events periodically.
+
+The next two fields, `shift` and `mult`, are familiar to us. For a clock event device they are used to convert a time delta in nanoseconds into cycles of our timer. After this we set the rating of the timer to `100`. This means that if there are no timers with a higher rating in the system, this timer will be used for timekeeping. The next field, `cpumask`, indicates for which processors in the system the device will work. In our case, the device will work for the first processor. The `cpumask_of` macro is defined in the [include/linux/cpumask.h](https://github.com/torvalds/linux/tree/master/include/linux/cpumask.h) header file and just expands to a call of `get_cpu_mask`:
+
+```C
+#define cpumask_of(cpu) (get_cpu_mask(cpu))
+```
+
+Where `get_cpu_mask` returns the cpumask containing just the given `cpu` number. You may read more about the `cpumasks` concept in the [CPU masks in the Linux kernel](https://0xax.gitbooks.io/linux-insides/content/Concepts/cpumask.html) part. In the last four lines of code we set the callbacks for clock event device suspend/resume, device shutdown and update of the clock event device state.
+
+After we have finished the initialization of the `at91sam926x` periodic timer, we can register it by calling the following function:
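To illustrate the `shift` and `mult` values filled in during the initialization above, here is a small user-space sketch. My understanding is that `div_sc` computes `(ticks << shift) / nsec`, and that a clock event device uses the resulting `mult` to turn a nanosecond delta into device cycles. The sketch re-implements this under that assumption with a made-up 500 MHz rate; the `sketch_` names are hypothetical and not part of the kernel:

```c
#include <stdint.h>

/* Assumed equivalent of div_sc(): scale `ticks` per `nsec` by 2^shift. */
uint64_t sketch_div_sc(uint64_t ticks, uint64_t nsec, unsigned int shift)
{
	return (ticks << shift) / nsec;
}

/* Hypothetical helper: convert a nanosecond delta into device cycles. */
uint64_t sketch_ns_to_cycles(uint64_t delta_ns, uint64_t mult,
			     unsigned int shift)
{
	return (delta_ns * mult) >> shift;
}
```

For the assumed 500 MHz rate and a `shift` of `32`, `sketch_div_sc(500000000, 1000000000, 32)` is `2147483648`, and a delta of `1000` nanoseconds becomes `500` cycles.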
+
+```C
+clockevents_register_device(&data->clkevt);
+```
+
+Now we can consider the implementation of the `clockevents_register_device` function. As I already wrote above, this function is defined in the [kernel/time/clockevents.c](https://github.com/torvalds/linux/blob/master/kernel/time/clockevents.c) source code file and starts by setting the initial state of the event device:
+
+```C
+clockevent_set_state(dev, CLOCK_EVT_STATE_DETACHED);
+```
+
+Actually, an event device may be in one of these states:
+
+```C
+enum clock_event_state {
+	CLOCK_EVT_STATE_DETACHED,
+	CLOCK_EVT_STATE_SHUTDOWN,
+	CLOCK_EVT_STATE_PERIODIC,
+	CLOCK_EVT_STATE_ONESHOT,
+	CLOCK_EVT_STATE_ONESHOT_STOPPED,
+};
+```
+
+Where:
+
+* `CLOCK_EVT_STATE_DETACHED` - a clock event device is not used by the `clockevents` framework. Actually it is the initial state of all clock event devices;
+* `CLOCK_EVT_STATE_SHUTDOWN` - a clock event device is powered-off;
+* `CLOCK_EVT_STATE_PERIODIC` - a clock event device is programmed to generate events periodically;
+* `CLOCK_EVT_STATE_ONESHOT`  - a clock event device is programmed to generate an event only once;
+* `CLOCK_EVT_STATE_ONESHOT_STOPPED` - a clock event device was programmed to generate an event only once and now it is temporarily stopped.
+
+The implementation of the `clockevent_set_state` function is pretty easy:
+
+```C
+static inline void clockevent_set_state(struct clock_event_device *dev,
+					enum clock_event_state state)
+{
+	dev->state_use_accessors = state;
+}
+```
+
+As we can see, it just fills the `state_use_accessors` field of the given `clock_event_device` structure with the given value, which in our case is `CLOCK_EVT_STATE_DETACHED`. Actually all clock event devices have this initial state during registration. The `state_use_accessors` field of the `clock_event_device` structure holds the `current` state of the clock event device.
+
+After we have set the initial state of the given `clock_event_device` structure, we check that the `cpumask` of the given clock event device is not zero:
+
+```C
+if (!dev->cpumask) {
+	WARN_ON(num_possible_cpus() > 1);
+	dev->cpumask = cpumask_of(smp_processor_id());
+}
+```
+
+Remember that we have set the `cpumask` of the `at91sam926x` periodic timer to the first processor. If the `cpumask` field is zero, we check the number of possible processors in the system and print a warning message if there is more than one. Additionally we set the `cpumask` of the given clock event device to the current processor. If you are interested in how the `smp_processor_id` macro is implemented, you can read more about it in the fourth [part](https://0xax.gitbooks.io/linux-insides/content/Initialization/linux-initialization-4.html) of the Linux kernel initialization process chapter.
+
+After this check we protect the actual code of the clock event device registration with a lock by calling the following macros:
+
+```C
+raw_spin_lock_irqsave(&clockevents_lock, flags);
+...
+...
+...
+raw_spin_unlock_irqrestore(&clockevents_lock, flags);
+```
+
+The `raw_spin_lock_irqsave` and `raw_spin_unlock_irqrestore` macros additionally disable and restore local interrupts, although interrupts on other processors may still occur. We need to do this to prevent a potential [deadlock](https://en.wikipedia.org/wiki/Deadlock) if we are adding a new clock event device to the list of clock event devices and an interrupt occurs from another clock event device.
+
+We can see the following code of clock event device registration between the `raw_spin_lock_irqsave` and `raw_spin_unlock_irqrestore` macros:
+
+```C
+list_add(&dev->list, &clockevent_devices);
+tick_check_new_device(dev);
+clockevents_notify_released();
+```
+
+First of all we add the given clock event device to the list of clock event devices which is represented by the `clockevent_devices`:
+
+```C
+static LIST_HEAD(clockevent_devices);
+```
+
+At the next step we call the `tick_check_new_device` function, which is defined in the [kernel/time/tick-common.c](https://github.com/torvalds/linux/blob/master/kernel/time/tick-common.c) source code file and checks whether the newly registered clock event device should be used or not. The `tick_check_new_device` function takes the given `clock_event_device`, gets the currently registered tick device, which is represented by the `tick_device` structure, and compares their ratings and features. Actually, devices with the `CLOCK_EVT_FEAT_ONESHOT` feature are preferred:
+
+```C
+static bool tick_check_preferred(struct clock_event_device *curdev,
+				 struct clock_event_device *newdev)
+{
+	if (!(newdev->features & CLOCK_EVT_FEAT_ONESHOT)) {
+		if (curdev && (curdev->features & CLOCK_EVT_FEAT_ONESHOT))
+			return false;
+		if (tick_oneshot_mode_active())
+			return false;
+	}
+
+	return !curdev ||
+		newdev->rating > curdev->rating ||
+	       !cpumask_equal(curdev->cpumask, newdev->cpumask);
+}
+```
+
+If the newly registered clock event device is preferred over the old tick device, we exchange the old and the newly registered devices and install the new device:
+
+```C
+clockevents_exchange_device(curdev, newdev);
+tick_setup_device(td, newdev, cpu, cpumask_of(cpu));
+```
+
+The `clockevents_exchange_device` function releases, or in other words deletes, the old clock event device from the `clockevent_devices` list. The next function, `tick_setup_device`, as we may understand from its name, sets up the new tick device. This function checks the mode of the newly registered clock event device and calls the `tick_setup_periodic` function or the `tick_setup_oneshot` function depending on the tick device mode:
+
+```C
+if (td->mode == TICKDEV_MODE_PERIODIC)
+	tick_setup_periodic(newdev, 0);
+else
+	tick_setup_oneshot(newdev, handler, next_event);
+```
+
+Both of these functions call `clockevents_switch_state` to change the state of the clock event device and `clockevents_program_event` to set the next event of the clock event device, based on the delta between the current time and the time of the next event, kept within the minimum and maximum deltas supported by the device. The `tick_setup_periodic`:
+
+```C
+clockevents_switch_state(dev, CLOCK_EVT_STATE_PERIODIC);
+clockevents_program_event(dev, next, false);
+```
+
+and the `tick_setup_oneshot`:
+
+```C
+clockevents_switch_state(newdev, CLOCK_EVT_STATE_ONESHOT);
+clockevents_program_event(newdev, next_event, true);
+```
+
+The `clockevents_switch_state` function checks that the clock event device is not in the given state and calls the `__clockevents_switch_state` function from the same source code file:
+
+```C
+if (clockevent_get_state(dev) != state) {
+	if (__clockevents_switch_state(dev, state))
+		return;
+```
+
+The `__clockevents_switch_state` function just calls a certain callback depending on the given state:
+
+```C
+static int __clockevents_switch_state(struct clock_event_device *dev,
+				      enum clock_event_state state)
+{
+	if (dev->features & CLOCK_EVT_FEAT_DUMMY)
+		return 0;
+
+	switch (state) {
+	case CLOCK_EVT_STATE_DETACHED:
+	case CLOCK_EVT_STATE_SHUTDOWN:
+		if (dev->set_state_shutdown)
+			return dev->set_state_shutdown(dev);
+		return 0;
+
+	case CLOCK_EVT_STATE_PERIODIC:
+		if (!(dev->features & CLOCK_EVT_FEAT_PERIODIC))
+			return -ENOSYS;
+		if (dev->set_state_periodic)
+			return dev->set_state_periodic(dev);
+		return 0;
+    ...
+    ...
+    ...
+```
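The dispatch pattern above can be tried out in user space with a heavily simplified model. Everything named `sketch_` or `SKETCH_` below is a hypothetical stand-in for the kernel definitions, and the model covers only the shutdown and periodic cases shown in the excerpt:

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel definitions; all names are hypothetical. */
#define SKETCH_FEAT_PERIODIC 0x1u

enum sketch_state { SKETCH_DETACHED, SKETCH_SHUTDOWN, SKETCH_PERIODIC };

struct sketch_dev {
	unsigned int features;
	int (*set_state_shutdown)(struct sketch_dev *dev);
	int (*set_state_periodic)(struct sketch_dev *dev);
	int periodic_calls;	/* counts callback invocations for the demo */
};

/* Example callback: a real driver would program the hardware here. */
int sketch_set_periodic(struct sketch_dev *dev)
{
	dev->periodic_calls++;
	return 0;
}

/* Call the callback matching the requested state, if the device has one. */
int sketch_switch_state(struct sketch_dev *dev, enum sketch_state state)
{
	switch (state) {
	case SKETCH_DETACHED:
	case SKETCH_SHUTDOWN:
		return dev->set_state_shutdown ? dev->set_state_shutdown(dev) : 0;
	case SKETCH_PERIODIC:
		/* A device without the periodic feature cannot enter this state. */
		if (!(dev->features & SKETCH_FEAT_PERIODIC))
			return -ENOSYS;
		return dev->set_state_periodic ? dev->set_state_periodic(dev) : 0;
	}
	return -ENOSYS;
}
```

In this model a device that sets `SKETCH_FEAT_PERIODIC` and provides `sketch_set_periodic` gets its callback invoked once per switch to `SKETCH_PERIODIC`, while a device without the feature gets `-ENOSYS` back.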
+
+In our case, the `at91sam926x` periodic timer has the `CLOCK_EVT_FEAT_PERIODIC` feature and provides a `set_state_periodic` callback:
+
+```C
+data->clkevt.features = CLOCK_EVT_FEAT_PERIODIC;
+data->clkevt.set_state_periodic = pit_clkevt_set_periodic;
+```
+
+So, the `pit_clkevt_set_periodic` callback will be called. If we read the documentation of the [Periodic Interval Timer (PIT) for at91sam926x](http://www.atmel.com/Images/doc6062.pdf), we will see that there is a `Periodic Interval Timer Mode Register` which allows us to control the periodic interval timer.
+
+It looks like:
+
+```
+31                                                   25        24
++---------------------------------------------------------------+
+|                                          |  PITIEN  |  PITEN  |
++---------------------------------------------------------------+
+23                            19                               16
++---------------------------------------------------------------+
+|                             |               PIV               |
++---------------------------------------------------------------+
+15                                                              8
++---------------------------------------------------------------+
+|                            PIV                                |
++---------------------------------------------------------------+
+7                                                               0
++---------------------------------------------------------------+
+|                            PIV                                |
++---------------------------------------------------------------+
+```
+
+Where `PIV` or `Periodic Interval Value` defines the value compared with the primary `20-bit` counter of the Periodic Interval Timer. The `PITEN` or `Periodic Interval Timer Enabled` bit enables the timer when it is `1`, and the `PITIEN` or `Periodic Interval Timer Interrupt Enable` bit enables the timer interrupt when it is `1`. So, to set periodic mode, we need to set bits `24` and `25` in the `Periodic Interval Timer Mode Register`. And we are doing it in the `pit_clkevt_set_periodic` function:
+
+```C
+static int pit_clkevt_set_periodic(struct clock_event_device *dev)
+{
+        struct pit_data *data = clkevt_to_pit_data(dev);
+        ...
+        ...
+        ...
+        pit_write(data->base, AT91_PIT_MR,
+                  (data->cycle - 1) | AT91_PIT_PITEN | AT91_PIT_PITIEN);
+
+        return 0;
+}
+```
+
+Where `AT91_PIT_MR`, `AT91_PIT_PITEN` and `AT91_PIT_PITIEN` are declared as:
+
+```C
+#define AT91_PIT_MR             0x00
+#define AT91_PIT_PITIEN       BIT(25)
+#define AT91_PIT_PITEN        BIT(24)
+```
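As a quick check of how these bits combine with the interval value, here is a small user-space sketch. The `SKETCH_` macros mirror the definitions above under different names, and masking the interval to 20 bits is my own assumption based on the `20-bit` `PIV` field described earlier, not something taken from the driver:

```c
#include <stdint.h>

#define SKETCH_BIT(n)		(1u << (n))
#define SKETCH_PIT_PITEN	SKETCH_BIT(24)	/* timer enable */
#define SKETCH_PIT_PITIEN	SKETCH_BIT(25)	/* interrupt enable */
#define SKETCH_PIT_PIV_MASK	0xFFFFFu	/* 20-bit interval value (assumed mask) */

/* Hypothetical helper: build the mode register value for periodic mode. */
uint32_t sketch_pit_mr_periodic(uint32_t cycle)
{
	return ((cycle - 1) & SKETCH_PIT_PIV_MASK) |
	       SKETCH_PIT_PITEN | SKETCH_PIT_PITIEN;
}
```

For an assumed `cycle` of `1000`, the interval part is `999` (`0x3E7`) and the resulting register value is `0x030003E7`.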
+
+After the setup of the new clock event device is finished, we can return to the `clockevents_register_device` function. The last function called in the `clockevents_register_device` function is:
+
+```C
+clockevents_notify_released();
+```
+
+This function checks the `clockevents_released` list which contains released clock event devices (remember that they may appear after the call of the `clockevents_exchange_device` function). If this list is not empty, we go through the clock event devices in the `clockevents_released` list, remove each of them from it, add it back to the `clockevent_devices` list and check it again with `tick_check_new_device`:
+
+```C
+static void clockevents_notify_released(void)
+{
+	struct clock_event_device *dev;
+
+	while (!list_empty(&clockevents_released)) {
+		dev = list_entry(clockevents_released.next,
+				 struct clock_event_device, list);
+		list_del(&dev->list);
+		list_add(&dev->list, &clockevent_devices);
+		tick_check_new_device(dev);
+	}
+}
+```
+
+That's all. From this moment we have registered a new clock event device. So the usage of the `clockevents` framework is simple and clear. Architectures register their clock event devices in the clock events core. Users of the clockevents core can get clock event devices for their use. The `clockevents` framework provides notification mechanisms for various clock related management events, like a clock event device being registered or unregistered, a processor going offline in a system which supports [CPU hotplug](https://www.kernel.org/doc/Documentation/cpu-hotplug.txt), and so on.
+
+We saw the implementation of only the `clockevents_register_device` function. But generally, the clock event layer [API](https://en.wikipedia.org/wiki/Application_programming_interface) is small. Besides the `API` for clock event device registration, the `clockevents` framework provides functions to schedule the next event interrupt, a clock event device notification service and support for suspend and resume of clock event devices.
+
+If you want to know more about the `clockevents` API, you can start by exploring the following source code and header files: [kernel/time/tick-common.c](https://github.com/torvalds/linux/blob/master/kernel/time/tick-common.c), [kernel/time/clockevents.c](https://github.com/torvalds/linux/blob/master/kernel/time/clockevents.c) and [include/linux/clockchips.h](https://github.com/torvalds/linux/blob/master/include/linux/clockchips.h).
+
+That's all.
+
+Conclusion
+-------------------------------------------------------------------------------
+
+This is the end of the fifth part of the [chapter](https://0xax.gitbooks.io/linux-insides/content/Timers/index.html) that describes timers and timer management related stuff in the Linux kernel. In the previous part we got acquainted with the `timers` concept. In this part we continued to learn time management related stuff in the Linux kernel and saw a little about yet another framework - `clockevents`.
+
+If you have questions or suggestions, feel free to ping me on twitter [0xAX](https://twitter.com/0xAX), drop me an [email](anotherworldofworld@gmail.com) or just create an [issue](https://github.com/0xAX/linux-insides/issues/new).
+
+**Please note that English is not my first language and I am really sorry for any inconvenience. If you found any mistakes please send me PR to [linux-insides](https://github.com/0xAX/linux-insides).**
+
+Links
+-------------------------------------------------------------------------------
+
+* [timekeeping documentation](https://github.com/0xAX/linux/blob/master/Documentation/timers/timekeeping.txt)
+* [Intel 8253](https://en.wikipedia.org/wiki/Intel_8253)
+* [programmable interval timer](https://en.wikipedia.org/wiki/Programmable_interval_timer)
+* [ACPI pdf](http://uefi.org/sites/default/files/resources/ACPI_5.pdf)
+* [x86](https://en.wikipedia.org/wiki/X86)
+* [High Precision Event Timer](https://en.wikipedia.org/wiki/High_Precision_Event_Timer)
+* [powerpc](https://en.wikipedia.org/wiki/PowerPC)
+* [frequency](https://en.wikipedia.org/wiki/Frequency)
+* [API](https://en.wikipedia.org/wiki/Application_programming_interface)
+* [nanoseconds](https://en.wikipedia.org/wiki/Nanosecond)
+* [interrupt](https://en.wikipedia.org/wiki/Interrupt)
+* [interrupt handler](https://en.wikipedia.org/wiki/Interrupt_handler)
+* [local APIC](https://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller)
+* [C3 state](https://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface#Device_states) 
+* [Periodic Interval Timer (PIT) for at91sam926x](http://www.atmel.com/Images/doc6062.pdf)
+* [CPU masks in the Linux kernel](https://0xax.gitbooks.io/linux-insides/content/Concepts/cpumask.html)
+* [deadlock](https://en.wikipedia.org/wiki/Deadlock)
+* [CPU hotplug](https://www.kernel.org/doc/Documentation/cpu-hotplug.txt)
+* [previous part](https://0xax.gitbooks.io/linux-insides/content/Timers/timers-3.html)