Linux Power Management: The Suspend-to-Idle (s2idle) Flow
Contents
I. Common Suspend Modes
1. Suspend-to-idle
2. Standby
3. Suspend-to-RAM
4. Hibernation
II. Code Analysis
1. state_store
2. suspend_enter
3. s2idle_loop
4. idle loop
5. psci_enter_idle_state
6. cpu_suspend
7. The resume flow begins
8. Re-enabling IRQs
9. pm_system_wakeup
III. Summary
Linux version: linux-6.1
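I. Common Suspend Modes
1. Suspend-to-idle
Suspend-to-idle, also called S2I or S2Idle, freezes user space, suspends timekeeping and puts all I/O devices into low-power states, so that the CPUs can stay in their deepest idle state while the system is suspended. The system is woken from this state by in-band interrupts, so in theory any device that can generate an interrupt in the working state can serve as an S2Idle wakeup source.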
How to enter it:
echo s2idle > /sys/power/mem_sleep
echo mem > /sys/power/state
or
echo freeze > /sys/power/state
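2. Standby
Standby (shallow sleep) does everything S2Idle does and, in addition, takes the non-boot CPUs offline. This saves more energy, but resuming takes longer. Compared with S2Idle, fewer devices can wake the system, and wakeup may have to rely on platform support.
How to enter it:
echo shallow > /sys/power/mem_sleep
echo mem > /sys/power/state
or
echo standby > /sys/power/state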
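3. Suspend-to-RAM
Suspend-to-RAM, also called STR or S2RAM, keeps the system state in RAM. Every component except memory is put into a low-power state, and in most cases all peripheral buses lose power. Fewer devices can wake the system from S2RAM, and the wakeup functionality has to be implemented by the platform.
How to enter it:
echo deep > /sys/power/mem_sleep
echo mem > /sys/power/state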
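4. Hibernation
Hibernation, also called Suspend-to-disk or STD, offers the greatest energy savings, but it needs low-level architecture code to restore the system. When hibernation is triggered, the kernel stops all system activity, creates a snapshot image of memory and writes it to persistent storage (such as a disk). Once the image has been saved, the system enters a low-power state in which almost all hardware components, including RAM, are powered off, except for a small number of wakeup devices.
How to enter it:
echo disk > /sys/power/state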
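As a small how-to, the sketch below reads /sys/power/mem_sleep, which lists the supported "mem" variants with the currently selected one in square brackets (for example `s2idle [deep]`), and extracts the selected mode, i.e. what `echo mem > /sys/power/state` would actually do. The helper names are hypothetical, and the file only exists on kernels built with CONFIG_SUSPEND.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the bracketed (currently selected) mode
 * from the contents of /sys/power/mem_sleep, e.g. "s2idle [deep]\n" -> "deep".
 * Returns 0 on success, -1 if there is no bracketed entry or out is too small.
 */
int current_mem_sleep_mode(const char *contents, char *out, size_t out_len)
{
    const char *open = strchr(contents, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    size_t len;

    if (!open || !close)
        return -1;
    len = (size_t)(close - open - 1);
    if (len + 1 > out_len)
        return -1;
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}

/* Hypothetical wrapper: read the real sysfs file and print the selected mode. */
int print_current_mem_sleep_mode(void)
{
    char buf[128] = {0};
    char mode[32];
    FILE *f = fopen("/sys/power/mem_sleep", "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    if (current_mem_sleep_mode(buf, mode, sizeof(mode)))
        return -1;
    printf("\"mem\" currently means: %s\n", mode);
    return 0;
}
```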
II. Code Analysis
This article analyses the Suspend-to-idle (s2idle) flow on top of the cpuidle framework. Readers are advised to get familiar with the cpuidle framework first, because the later stages of suspend are carried out through it.
1. state_store
The analysis starts in state_store:
// kernel/power/main.c
static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
const char *buf, size_t n)
{
suspend_state_t state;
int error;
error = pm_autosleep_lock();
if (error)
return error;
if (pm_autosleep_state() > PM_SUSPEND_ON) {
error = -EBUSY;
goto out;
}
state = decode_state(buf, n);
if (state < PM_SUSPEND_MAX) {
if (state == PM_SUSPEND_MEM)
state = mem_sleep_current;
error = pm_suspend(state);
} else if (state == PM_SUSPEND_MAX) {
error = hibernate();
trace_android_vh_hibernate_state(error);
} else {
error = -EINVAL;
}
out:
pm_autosleep_unlock();
return error ? error : n;
}
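state_store uses the string written by the user to decide which suspend mode to use. Writing "freeze" decodes directly to PM_SUSPEND_TO_IDLE; writing "mem" decodes to PM_SUSPEND_MEM, which is then replaced by mem_sleep_current, i.e. PM_SUSPEND_TO_IDLE when /sys/power/mem_sleep is set to s2idle. Either way the state is PM_SUSPEND_TO_IDLE and pm_suspend is called. To keep the analysis focused, the other code is skipped and only the key calls are listed: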
pm_suspend
  -> enter_state
    -> s2idle_begin ======> s2idle_state = S2IDLE_STATE_NONE
    -> suspend_prepare ======> suspend_freeze_processes()
    -> suspend_devices_and_enter
      -> suspend_console
      -> dpm_suspend_start
      -> suspend_test_finish
      -> suspend_enter
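The state selection performed by state_store can be re-created in user space to see which suspend_state_t a given write ends up as. The sketch below is modeled on decode_state() in kernel/power/main.c plus the PM_SUSPEND_MEM remapping shown above; the names decode_state_sketch and resolve_state are hypothetical, and the enum values mirror include/linux/suspend.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same ordering as suspend_state_t in include/linux/suspend.h. */
typedef enum {
    PM_SUSPEND_ON = 0,
    PM_SUSPEND_TO_IDLE = 1,
    PM_SUSPEND_STANDBY = 2,
    PM_SUSPEND_MEM = 3,
    PM_SUSPEND_MAX = 4,
} suspend_state_t;

/* Labels accepted by /sys/power/state, indexed by state. */
static const char *const pm_labels[PM_SUSPEND_MAX] = {
    [PM_SUSPEND_TO_IDLE] = "freeze",
    [PM_SUSPEND_STANDBY] = "standby",
    [PM_SUSPEND_MEM] = "mem",
};

/* "disk" -> PM_SUSPEND_MAX (hibernate); unknown strings -> PM_SUSPEND_ON. */
suspend_state_t decode_state_sketch(const char *buf, size_t n)
{
    const char *nl = memchr(buf, '\n', n);
    size_t len = nl ? (size_t)(nl - buf) : n;

    if (len == 4 && strncmp(buf, "disk", 4) == 0)
        return PM_SUSPEND_MAX;
    for (int s = PM_SUSPEND_TO_IDLE; s < PM_SUSPEND_MAX; s++) {
        const char *label = pm_labels[s];

        if (label && len == strlen(label) && strncmp(buf, label, len) == 0)
            return (suspend_state_t)s;
    }
    return PM_SUSPEND_ON;
}

/* The remapping in state_store(): "mem" follows mem_sleep_current. */
suspend_state_t resolve_state(suspend_state_t decoded,
                              suspend_state_t mem_sleep_current)
{
    return decoded == PM_SUSPEND_MEM ? mem_sleep_current : decoded;
}
```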
2. suspend_enter
suspend_enter looks like this:
// kernel/power/suspend.c
/**
* suspend_enter - Make the system enter the given sleep state.
* @state: System sleep state to enter.
* @wakeup: Returns information that the sleep state should not be re-entered.
*
* This function should be called after devices have been suspended.
*/
static int suspend_enter(suspend_state_t state, bool *wakeup)
{
int error, last_dev;
error = platform_suspend_prepare(state);
if (error)
goto Platform_finish;
error = dpm_suspend_late(PMSG_SUSPEND);
if (error) {
last_dev = suspend_stats.last_failed_dev + REC_FAILED_NUM - 1;
last_dev %= REC_FAILED_NUM;
pr_err("late suspend of devices failed\n");
log_suspend_abort_reason("late suspend of %s device failed",
suspend_stats.failed_devs[last_dev]);
goto Platform_finish;
}
error = platform_suspend_prepare_late(state);
if (error)
goto Devices_early_resume;
error = dpm_suspend_noirq(PMSG_SUSPEND);
if (error) {
last_dev = suspend_stats.last_failed_dev + REC_FAILED_NUM - 1;
last_dev %= REC_FAILED_NUM;
pr_err("noirq suspend of devices failed\n");
log_suspend_abort_reason("noirq suspend of %s device failed",
suspend_stats.failed_devs[last_dev]);
goto Platform_early_resume;
}
error = platform_suspend_prepare_noirq(state);
if (error)
goto Platform_wake;
if (suspend_test(TEST_PLATFORM))
goto Platform_wake;
if (state == PM_SUSPEND_TO_IDLE) {
s2idle_loop();
goto Platform_wake;
}
error = pm_sleep_disable_secondary_cpus();
if (error || suspend_test(TEST_CPUS)) {
log_suspend_abort_reason("Disabling non-boot cpus failed");
goto Enable_cpus;
}
arch_suspend_disable_irqs();
BUG_ON(!irqs_disabled());
system_state = SYSTEM_SUSPEND;
error = syscore_suspend();
if (!error) {
*wakeup = pm_wakeup_pending();
if (!(suspend_test(TEST_CORE) || *wakeup)) {
trace_suspend_resume(TPS("machine_suspend"),
state, true);
error = suspend_ops->enter(state);
trace_suspend_resume(TPS("machine_suspend"),
state, false);
trace_android_vh_early_resume_begin(NULL);
} else if (*wakeup) {
error = -EBUSY;
}
syscore_resume();
}
system_state = SYSTEM_RUNNING;
arch_suspend_enable_irqs();
BUG_ON(irqs_disabled());
Enable_cpus:
pm_sleep_enable_secondary_cpus();
Platform_wake:
platform_resume_noirq(state);
dpm_resume_noirq(PMSG_RESUME);
Platform_early_resume:
platform_resume_early(state);
Devices_early_resume:
dpm_resume_early(PMSG_RESUME);
Platform_finish:
platform_resume_finish(state);
return error;
}
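The job of suspend_enter is to put the system into the requested sleep state. Functions such as platform_suspend_prepare, dpm_suspend_late, platform_suspend_prepare_late and dpm_suspend_noirq make sure the devices in the system enter their suspend state correctly, because each phase of suspend invokes a different device callback. For each device the PM core looks for the callback in dev->pm_domain->ops, dev->type->pm, dev->class->pm and dev->bus->pm, in that order of priority, and falls back to dev->driver->pm only when none of them provides one (subsystem-level callbacks usually call into the driver's callback themselves).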
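Because this is suspend-to-idle, the state == PM_SUSPEND_TO_IDLE check succeeds and s2idle_loop is called; it is one of the most important functions in the suspend-to-idle path. The code after the s2idle_loop call is only used by the other sleep states: it takes the non-boot CPUs offline (pm_sleep_disable_secondary_cpus), runs the syscore suspend callbacks, and finally lets the platform put the boot CPU and the rest of the system to sleep (suspend_ops->enter(state)). That part is outside the scope of this article.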
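The callback lookup order described above can be illustrated with a simplified model. The structures below are hypothetical stand-ins for struct dev_pm_ops and the device's subsystem pointers (a NULL pointer means "this subsystem has no PM ops"); the real PM core in drivers/base/power/main.c also handles legacy bus callbacks and differs slightly between the suspend, late and noirq phases.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for struct dev_pm_ops and friends. */
typedef int (*pm_cb_t)(void);

struct pm_ops_sketch { pm_cb_t suspend; };

struct device_sketch {
    const struct pm_ops_sketch *pm_domain;
    const struct pm_ops_sketch *type;
    const struct pm_ops_sketch *class_;
    const struct pm_ops_sketch *bus;
    const struct pm_ops_sketch *driver;
};

/*
 * The first subsystem that is present wins (pm_domain, type, class, bus);
 * the driver's callback is used only if no subsystem supplied one.
 */
pm_cb_t pick_suspend_callback(const struct device_sketch *dev)
{
    pm_cb_t cb = NULL;

    if (dev->pm_domain)
        cb = dev->pm_domain->suspend;
    else if (dev->type)
        cb = dev->type->suspend;
    else if (dev->class_)
        cb = dev->class_->suspend;
    else if (dev->bus)
        cb = dev->bus->suspend;

    if (!cb && dev->driver)
        cb = dev->driver->suspend;
    return cb;
}

/* Demo callbacks so the selected one can be identified by its return value. */
int domain_suspend(void) { return 1; }
int driver_suspend(void) { return 2; }
```

Note the second case in the usage below: a PM domain without a suspend callback still shadows the bus, so the driver callback is the one that runs.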
3. s2idle_loop
Here is s2idle_loop:
// kernel/power/suspend.c
static void s2idle_loop(void)
{
pm_pr_dbg("suspend-to-idle\n");
/*
* Suspend-to-idle equals:
* frozen processes + suspended devices + idle processors.
* Thus s2idle_enter() should be called right after all devices have
* been suspended.
*
* Wakeups during the noirq suspend of devices may be spurious, so try
* to avoid them upfront.
*/
for (;;) {
if (s2idle_ops && s2idle_ops->wake) {
if (s2idle_ops->wake())
break;
} else if (pm_wakeup_pending()) {
break;
}
clear_wakeup_reasons();
if (s2idle_ops && s2idle_ops->check)
s2idle_ops->check();
s2idle_enter();
}
}
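The control flow of this loop can be modeled in a few lines. In the sketch below each call to fake_s2idle_enter() stands for one return from s2idle_enter(), and a wakeup source firing on a chosen iteration plays the role of pm_system_wakeup() incrementing pm_abort_suspend. All names are hypothetical and nothing actually enters idle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simulation state. */
struct sim {
    int iterations;     /* how many times s2idle_enter() was "called" */
    int abort_count;    /* stands in for pm_abort_suspend */
    int real_wakeup_at; /* iteration on which a wakeup source fires */
};

static bool fake_wakeup_pending(const struct sim *s)
{
    return s->abort_count > 0;
}

static void fake_s2idle_enter(struct sim *s)
{
    s->iterations++;
    /* Returns that are not real wakeups simply lead to another iteration. */
    if (s->iterations == s->real_wakeup_at)
        s->abort_count++; /* what pm_system_wakeup() would do */
}

/* Same shape as s2idle_loop(): check for a wakeup first, then go idle again. */
int simulate_s2idle_loop(int real_wakeup_at)
{
    struct sim s = { 0, 0, real_wakeup_at };

    for (;;) {
        if (fake_wakeup_pending(&s))
            break;
        fake_s2idle_enter(&s);
    }
    return s.iterations;
}
```

For example, simulate_s2idle_loop(3) returns 3: the loop keeps re-entering "idle" until the iteration on which the wakeup is recorded, and the check at the top of the loop then breaks out.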
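s2idle_loop is an infinite loop: as long as no wakeup event is registered, the system keeps going back into suspend-to-idle. First, look at the code that checks for a wakeup: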
if (s2idle_ops && s2idle_ops->wake) {
if (s2idle_ops->wake())
break;
} else if (pm_wakeup_pending()) {
break;
}
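If the platform provides s2idle_ops->wake, it is called to decide whether a wakeup event occurred; otherwise pm_wakeup_pending is used. If either reports a wakeup, the loop exits. pm_wakeup_pending is shown below: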
// drivers/base/power/wakeup.c
/**
* pm_wakeup_pending - Check if power transition in progress should be aborted.
*
* Compare the current number of registered wakeup events with its preserved
* value from the past and return true if new wakeup events have been registered
* since the old value was stored. Also return true if the current number of
* wakeup events being processed is different from zero.
*/
bool pm_wakeup_pending(void)
{
unsigned long flags;
bool ret = false;
char suspend_abort[MAX_SUSPEND_ABORT_LEN];
raw_spin_lock_irqsave(&events_lock, flags);
if (events_check_enabled) {
unsigned int cnt, inpr;
split_counters(&cnt, &inpr);
ret = (cnt != saved_count || inpr > 0);
events_check_enabled = !ret;
}
raw_spin_unlock_irqrestore(&events_lock, flags);
......
return ret || atomic_read(&pm_abort_suspend) > 0;
}
EXPORT_SYMBOL_GPL(pm_wakeup_pending);
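The ret computation relies on split_counters(). In drivers/base/power/wakeup.c the two counters are packed into one atomic int, combined_event_count: the upper half counts registered wakeup events and the lower half counts events still in progress. The sketch below re-creates that packing and the final decision of pm_wakeup_pending() without the locking and without updating events_check_enabled; the function names are hypothetical, and events_check_enabled, saved_count and pm_abort_suspend are passed in as plain parameters.

```c
#include <assert.h>
#include <stdbool.h>

/* Same split as wakeup.c: half of the bits of an int for each counter. */
#define IN_PROGRESS_BITS (sizeof(int) * 4)
#define MAX_IN_PROGRESS ((1u << IN_PROGRESS_BITS) - 1)

static void split_counters_sketch(unsigned int comb,
                                  unsigned int *cnt, unsigned int *inpr)
{
    *cnt = comb >> IN_PROGRESS_BITS;  /* registered wakeup events */
    *inpr = comb & MAX_IN_PROGRESS;   /* events still being processed */
}

/* Decision logic of pm_wakeup_pending(), minus locking and side effects. */
bool wakeup_pending_sketch(bool check_enabled, unsigned int comb,
                           unsigned int saved_count, int abort_suspend)
{
    bool ret = false;

    if (check_enabled) {
        unsigned int cnt, inpr;

        split_counters_sketch(comb, &cnt, &inpr);
        ret = (cnt != saved_count || inpr > 0);
    }
    return ret || abort_suspend > 0;
}
```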
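pm_wakeup_pending returns true if ret is true (new wakeup events were registered, or some are still being processed) or if the atomic counter pm_abort_suspend is greater than 0; where pm_abort_suspend gets set is covered later. Back in s2idle_loop, once it is clear that no wakeup is pending, clear_wakeup_reasons clears the recorded wakeup reasons, the platform's s2idle_ops->check callback runs if present, and s2idle_enter is called: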
// kernel/power/suspend.c
static void s2idle_enter(void)
{
trace_suspend_resume(TPS("machine_suspend"), PM_SUSPEND_TO_IDLE, true);
raw_spin_lock_irq(&s2idle_lock);
if (pm_wakeup_pending())
goto out;
s2idle_state = S2IDLE_STATE_ENTER;
raw_spin_unlock_irq(&s2idle_lock);
cpus_read_lock();
/* Push all the CPUs into the idle loop. */
wake_up_all_idle_cpus();
/* Make the current CPU wait so it can enter the idle loop too. */
swait_event_exclusive(s2idle_wait_head,
s2idle_state == S2IDLE_STATE_WAKE);
/*
* Kick all CPUs to ensure that they resume their timers and restore
* consistent system state.
*/
wake_up_all_idle_cpus();
cpus_read_unlock();
raw_spin_lock_irq(&s2idle_lock);
out:
s2idle_state = S2IDLE_STATE_NONE;
raw_spin_unlock_irq(&s2idle_lock);
trace_suspend_resume(TPS("machine_suspend"), PM_SUSPEND_TO_IDLE, false);
}
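s2idle_enter is the last function executed before the CPU enters idle. It first checks for a pending wakeup event and, if there is one, skips suspending. Otherwise it runs the key code: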
s2idle_state = S2IDLE_STATE_ENTER;
raw_spin_unlock_irq(&s2idle_lock);
cpus_read_lock();
/* Push all the CPUs into the idle loop. */
wake_up_all_idle_cpus();
/* Make the current CPU wait so it can enter the idle loop too. */
swait_event_exclusive(s2idle_wait_head,
s2idle_state == S2IDLE_STATE_WAKE);
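The global variable s2idle_state is first set to S2IDLE_STATE_ENTER to signal that s2idle is starting; the idle loop checks this flag. wake_up_all_idle_cpus then kicks every other idle CPU so that it goes back through the idle loop and observes the new state. Finally, swait_event_exclusive puts the current task to sleep until s2idle_state == S2IDLE_STATE_WAKE; with nothing else to run, the current CPU also switches to the idle task and enters the idle loop.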
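The handshake around s2idle_state can be shown as a tiny state machine. This single-threaded sketch uses the same three values as enum s2idle_states in include/linux/suspend.h, but the sketch_* functions are hypothetical and leave out s2idle_lock and the swait queue that make the real transitions safe across CPUs and interrupt context.

```c
#include <assert.h>
#include <stdbool.h>

/* Same ordering as enum s2idle_states in include/linux/suspend.h. */
enum s2idle_states_sketch {
    S2IDLE_STATE_NONE,  /* not suspended / suspending */
    S2IDLE_STATE_ENTER, /* enter suspend-to-idle */
    S2IDLE_STATE_WAKE,  /* wake up from suspend-to-idle */
};

static enum s2idle_states_sketch sketch_state = S2IDLE_STATE_NONE;

/* First part of s2idle_enter(): publish ENTER unless a wakeup is pending. */
bool sketch_enter(bool wakeup_pending)
{
    if (wakeup_pending)
        return false;
    sketch_state = S2IDLE_STATE_ENTER;
    return true;
}

/* What each CPU's idle loop checks (idle_should_enter_s2idle()). */
bool sketch_should_enter_s2idle(void)
{
    return sketch_state == S2IDLE_STATE_ENTER;
}

/* s2idle_wake(): only moves to WAKE if a suspend is actually in progress. */
void sketch_wake(void)
{
    if (sketch_state > S2IDLE_STATE_NONE)
        sketch_state = S2IDLE_STATE_WAKE;
}

/* The condition the waiter in s2idle_enter() sleeps on. */
bool sketch_waiter_may_continue(void)
{
    return sketch_state == S2IDLE_STATE_WAKE;
}

/* Tail of s2idle_enter(): back to NONE. */
void sketch_finish(void)
{
    sketch_state = S2IDLE_STATE_NONE;
}
```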
4. idle loop
Since the CPUs have been pushed into the idle loop, we move on to the idle code:
do_idle
  -> local_irq_disable
  -> cpuidle_idle_call
cpuidle_idle_call:
// kernel/sched/idle.c
/**
* cpuidle_idle_call - the main idle function
*
* NOTE: no locks or semaphores should be used here
*
* On architectures that support TIF_POLLING_NRFLAG, is called with polling
* set, and it returns with polling set. If it ever stops polling, it
* must clear the polling bit.
*/
static void cpuidle_idle_call(void)
{
struct cpuidle_device *dev = cpuidle_get_device();
struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
int next_state, entered_state;
/*
* Check if the idle task must be rescheduled. If it is the
* case, exit the function after re-enabling the local irq.
*/
if (need_resched()) {
local_irq_enable();
return;
}
/*
* The RCU framework needs to be told that we are entering an idle
* section, so no more rcu read side critical sections and one more
* step to the grace period
*/
// Check whether cpuidle is usable, i.e. whether a cpuidle driver/device is registered;
// if not, fall back to the default idle path
if (cpuidle_not_available(drv, dev)) {
tick_nohz_idle_stop_tick();
default_idle_call();
goto exit_idle;
}
/*
* Suspend-to-idle ("s2idle") is a system state in which all user space
* has been frozen, all I/O devices have been suspended and the only
* activity happens here and in interrupts (if any). In that case bypass
* the cpuidle governor and go straight for the deepest idle state
* available. Possibly also suspend the local tick and the entire
* timekeeping to prevent timer interrupts from kicking us out of idle
* until a proper wakeup interrupt happens.
*/
if (idle_should_enter_s2idle() || dev->forced_idle_latency_limit_ns) {
u64 max_latency_ns;
if (idle_should_enter_s2idle()) {
entered_state = call_cpuidle_s2idle(drv, dev);
if (entered_state > 0) {
goto exit_idle;
}
max_latency_ns = U64_MAX;
} else {
max_latency_ns = dev->forced_idle_latency_limit_ns;
}
tick_nohz_idle_stop_tick();
next_state = cpuidle_find_deepest_state(drv, dev, max_latency_ns);
call_cpuidle(drv, dev, next_state);
} else {
bool stop_tick = true;
/*
* Ask the cpuidle framework to choose a convenient idle state.
*/
next_state = cpuidle_select(drv, dev, &stop_tick); // ask the cpuidle governor to choose an idle state
if (stop_tick || tick_nohz_tick_stopped())
tick_nohz_idle_stop_tick();
else
tick_nohz_idle_retain_tick();
entered_state = call_cpuidle(drv, dev, next_state); // enter the selected idle state
/*
* Give the governor an opportunity to reflect on the outcome
*/
cpuidle_reflect(dev, entered_state); // let the governor update its statistics
}
exit_idle:
__current_set_polling();
/*
* It is up to the idle functions to reenable local interrupts
*/
if (WARN_ON_ONCE(irqs_disabled()))
local_irq_enable(); // enable IRQs so pending interrupts get handled
}
cpuidle_idle_call is the main idle function, and local IRQs are already disabled when it is entered. It first calls cpuidle_not_available to check whether the cpuidle framework is usable on this CPU; if not, default_idle_call runs the default idle path. Otherwise the if/else branch is taken: the if branch is the s2idle path, the main subject of this article, while the else branch is ordinary CPU idle through the cpuidle framework. There, cpuidle_select asks the cpuidle governor to choose a suitable idle state, call_cpuidle enters that state, and once the CPU returns from it, cpuidle_reflect updates the governor's state.
Since this article is about s2idle, we follow the if branch. idle_should_enter_s2idle looks like this:
static inline bool idle_should_enter_s2idle(void)
{
return unlikely(s2idle_state == S2IDLE_STATE_ENTER);
}
idle_should_enter_s2idle simply checks whether s2idle_state equals S2IDLE_STATE_ENTER and returns the result. s2idle_enter has already set s2idle_state to S2IDLE_STATE_ENTER, so the if branch is taken and call_cpuidle_s2idle is called; this is where the CPU enters the s2idle state. Omitting the other code, the call chain is:
call_cpuidle_s2idle
  -> cpuidle_enter_s2idle
    -> find_deepest_state
    -> enter_s2idle_proper
      -> target_state->enter_s2idle
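find_deepest_state looks up an index into the cpuidle_driver's states array, i.e. an idle state. On this path it only considers indices from 1 upwards that have an enter_s2idle callback, and it returns 0 if none qualifies, so a usable result is always greater than 0. Once the state is found, enter_s2idle_proper invokes its enter_s2idle callback, which suspends the CPU.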
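The selection done by find_deepest_state can be sketched as follows. It is modeled on the loop in drivers/cpuidle/cpuidle.c (start at index 1 and keep the usable state with the largest exit latency that does not exceed max_latency_ns), but it drops the forbidden_flags check, and struct idle_state_sketch is a hypothetical simplification of struct cpuidle_state plus its per-device usage data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical minimal view of one cpuidle state. */
struct idle_state_sketch {
    uint64_t exit_latency_ns;
    bool disabled;          /* per-device "disable" knob */
    bool has_enter_s2idle;  /* does the state provide ->enter_s2idle? */
};

/*
 * Index 0 (the shallow WFI-style state) is never returned as a match;
 * a result of 0 therefore means "no suitable state found".
 */
int find_deepest_state_sketch(const struct idle_state_sketch *states, int count,
                              uint64_t max_latency_ns, bool s2idle)
{
    uint64_t latency_req = 0;
    int ret = 0;

    for (int i = 1; i < count; i++) {
        const struct idle_state_sketch *s = &states[i];

        if (s->disabled || s->exit_latency_ns <= latency_req ||
            s->exit_latency_ns > max_latency_ns ||
            (s2idle && !s->has_enter_s2idle))
            continue;
        latency_req = s->exit_latency_ns;
        ret = i;
    }
    return ret;
}
```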
5. psci_enter_idle_state
Which function does the enter_s2idle pointer refer to? The index is greater than 0 here, and from the cpuidle framework we know that such idle states are initialised from the DTS. In init_state_node, both idle_state->enter and idle_state->enter_s2idle are set to the match data, which for the PSCI cpuidle driver is psci_enter_idle_state, so target_state->enter_s2idle ends up calling psci_enter_idle_state:
// drivers/cpuidle/cpuidle-psci.c
static int psci_enter_idle_state(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int idx)
{
u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);
return psci_enter_state(idx, state[idx]);
}
psci_enter_state:
// drivers/cpuidle/cpuidle-psci.c
static inline int psci_enter_state(int idx, u32 state)
{
// idx == 0: cpu_do_idle, i.e. the default WFI
// idx > 0 (as on the suspend path): psci_cpu_suspend_enter
return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter, idx, state);
}
CPU_PM_CPU_IDLE_ENTER_PARAM
  -> __CPU_PM_CPU_IDLE_ENTER
// include/linux/cpuidle.h
#define __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, \
idx, \
state, \
is_retention) \
({ \
int __ret = 0; \
\
if (!idx) { \
cpu_do_idle(); \
return idx; \
} \
\
if (!is_retention) \
__ret = cpu_pm_enter(); \
if (!__ret) { \
__ret = low_level_idle_enter(state); \
if (!is_retention) \
cpu_pm_exit(); \
} \
\
__ret ? -1 : idx; \
})
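Written out as a function, the macro behaves like the sketch below. The fake_* helpers only count calls and stand in for cpu_do_idle(), cpu_pm_enter() and cpu_pm_exit(); demo_enter_ok and demo_enter_fail are hypothetical low-level entry functions in place of psci_cpu_suspend_enter(). One difference from the real macro: its `return idx;` returns from the function that expands it (psci_enter_state), while here the sketch simply returns from itself.

```c
#include <assert.h>

/* Hypothetical call counters standing in for the real side effects. */
static int wfi_calls, pm_enter_calls, pm_exit_calls;

static void fake_cpu_do_idle(void) { wfi_calls++; }
static int fake_cpu_pm_enter(void) { pm_enter_calls++; return 0; }
static void fake_cpu_pm_exit(void) { pm_exit_calls++; }

/*
 * Function form of __CPU_PM_CPU_IDLE_ENTER(): index 0 is plain WFI; deeper
 * states notify CPU PM listeners (unless it is a retention state), call the
 * low-level entry, and report -1 on failure or the entered index on success.
 */
int cpu_pm_idle_enter_sketch(int (*low_level_idle_enter)(unsigned int),
                             int idx, unsigned int state, int is_retention)
{
    int ret = 0;

    if (!idx) {
        fake_cpu_do_idle();
        return idx;
    }
    if (!is_retention)
        ret = fake_cpu_pm_enter();
    if (!ret) {
        ret = low_level_idle_enter(state);
        if (!is_retention)
            fake_cpu_pm_exit();
    }
    return ret ? -1 : idx;
}

/* Demo low-level entries: one succeeds, one fails like a firmware error. */
int demo_enter_ok(unsigned int state) { (void)state; return 0; }
int demo_enter_fail(unsigned int state) { (void)state; return -1; }
```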
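When idx is 0, i.e. states[0] of the cpuidle driver, the default idle function cpu_do_idle is called and the macro returns immediately. For any other idx, cpu_pm_enter notifies the CPU PM listeners and then psci_cpu_suspend_enter is called: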
// drivers/firmware/psci/psci.c
int psci_cpu_suspend_enter(u32 state)
{
int ret;
......
ret = cpu_suspend(state, psci_suspend_finisher);
......
return ret;
}
6. cpu_suspend
cpu_suspend is shown below:
// arch/arm64/kernel/suspend.c
/*
* cpu_suspend
*
* arg: argument to pass to the finisher function
* fn: finisher function pointer
*
*/
int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
{
int ret = 0;
unsigned long flags;
struct sleep_stack_data state;
struct arm_cpuidle_irq_context context;
/* Report any MTE async fault before going to suspend */
mte_suspend_enter();
/*
* From this point debug exceptions are disabled to prevent
* updates to mdscr register (saved and restored along with
* general purpose registers) from kernel debuggers.
*/
flags = local_daif_save();
/*
* Function graph tracer state gets inconsistent when the kernel
* calls functions that never return (aka suspend finishers) hence
* disable graph tracing during their execution.
*/
pause_graph_tracing();
/*
* Switch to using DAIF.IF instead of PMR in order to reliably
* resume if we're using pseudo-NMIs.
*/
arm_cpuidle_save_irq_context(&context);
if (__cpu_suspend_enter(&state)) {
/* Call the suspend finisher */
ret = fn(arg);
/*
* Never gets here, unless the suspend finisher fails.
* Successful cpu_suspend() should return from cpu_resume(),
* returning through this code path is considered an error
* If the return value is set to 0 force ret = -EOPNOTSUPP
* to make sure a proper error condition is propagated
*/
if (!ret)
ret = -EOPNOTSUPP;
} else {
RCU_NONIDLE(__cpu_suspend_exit());
}
arm_cpuidle_restore_irq_context(&context);
unpause_graph_tracing();
/*
* Restore pstate flags. OS lock and mdscr have been already
* restored, so from this point onwards, debugging is fully
* reenabled if it was enabled when core started shutdown.
*/
local_daif_restore(flags);
return ret;
}
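cpu_suspend uses __cpu_suspend_enter to save the current CPU state in preparation for resume. __cpu_suspend_enter effectively returns twice: on the suspend path it returns a non-zero value (1), so the suspend finisher psci_suspend_finisher is called; when the CPU later resumes, execution comes back through cpu_resume and __cpu_suspend_enter appears to return a second time, now with 0, which starts the resume path analysed later. Here is __cpu_suspend_enter: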
// arch/arm64/kernel/sleep.S
/*
* Save CPU state in the provided sleep_stack_data area, and publish its
* location for cpu_resume()'s use in sleep_save_stash.
*
* cpu_resume() will restore this saved state, and return. Because the
* link-register is saved and restored, it will appear to return from this
* function. So that the caller can tell the suspend/resume paths apart,
* __cpu_suspend_enter() will always return a non-zero value, whereas the
* path through cpu_resume() will return 0.
*
* x0 = struct sleep_stack_data area
*/
SYM_FUNC_START(__cpu_suspend_enter)
stp x29, lr, [x0, #SLEEP_STACK_DATA_CALLEE_REGS]
stp x19, x20, [x0,#SLEEP_STACK_DATA_CALLEE_REGS+16]
stp x21, x22, [x0,#SLEEP_STACK_DATA_CALLEE_REGS+32]
stp x23, x24, [x0,#SLEEP_STACK_DATA_CALLEE_REGS+48]
stp x25, x26, [x0,#SLEEP_STACK_DATA_CALLEE_REGS+64]
stp x27, x28, [x0,#SLEEP_STACK_DATA_CALLEE_REGS+80]
/* save the sp in cpu_suspend_ctx */
mov x2, sp
str x2, [x0, #SLEEP_STACK_DATA_SYSTEM_REGS + CPU_CTX_SP]
/* find the mpidr_hash */
ldr_l x1, sleep_save_stash
mrs x7, mpidr_el1
adr_l x9, mpidr_hash
ldr x10, [x9, #MPIDR_HASH_MASK]
/*
* Following code relies on the struct mpidr_hash
* members size.
*/
ldp w3, w4, [x9, #MPIDR_HASH_SHIFTS]
ldp w5, w6, [x9, #(MPIDR_HASH_SHIFTS + 8)]
compute_mpidr_hash x8, x3, x4, x5, x6, x7, x10
add x1, x1, x8, lsl #3
str x0, [x1]
add x0, x0, #SLEEP_STACK_DATA_SYSTEM_REGS
stp x29, lr, [sp, #-16]!
bl cpu_do_suspend
ldp x29, lr, [sp], #16
mov x0, #1
ret
SYM_FUNC_END(__cpu_suspend_enter)
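The compute_mpidr_hash macro used above (and again in _cpu_resume) turns MPIDR_EL1 into a dense per-CPU index into sleep_save_stash, which is why the assembly then adds `x8, lsl #3` (8 bytes per pointer). A C rendering of the macro's bit manipulation is sketched below. The mask and shift values in the usage are made up for a hypothetical system with 2 clusters of 4 cores; the kernel computes the real ones at boot and stores them in struct mpidr_hash.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mask MPIDR down to the affinity bits that actually vary, then shift each
 * affinity field (Aff0..Aff3) so that the used bits end up packed together.
 */
uint64_t compute_mpidr_hash_sketch(uint64_t mpidr, uint64_t mask,
                                   unsigned s0, unsigned s1,
                                   unsigned s2, unsigned s3)
{
    uint64_t m = mpidr & mask;

    return ((m & 0xffULL) >> s0) |         /* Aff0 */
           ((m & 0xff00ULL) >> s1) |       /* Aff1 */
           ((m & 0xff0000ULL) >> s2) |     /* Aff2 */
           ((m & 0xff00000000ULL) >> s3);  /* Aff3 */
}
```

With mask 0x103 (Aff0 uses bits [1:0], Aff1 uses bit 8) and shifts 0 and 6, the eight CPUs map to indices 0 to 7, and bits outside the mask, such as the RES1 bit 31, are ignored.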
psci_suspend_finisher:
static noinstr int psci_suspend_finisher(unsigned long state)
{
u32 power_state = state;
phys_addr_t pa_cpu_resume;
pa_cpu_resume = __pa_symbol_nodebug((unsigned long)cpu_resume);
return psci_ops.cpu_suspend(power_state, pa_cpu_resume);
}
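__pa_symbol_nodebug first converts the address of cpu_resume to a physical address, because the CPU comes back from firmware with the MMU off; then psci_ops.cpu_suspend is called. psci_ops is initialised as follows: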
psci_1_0_init
-> psci_0_2_init
-> psci_probe
-> psci_0_2_set_functions
psci_0_2_set_functions:
// drivers/firmware/psci/psci.c
static void __init psci_0_2_set_functions(void)
{
pr_info("Using standard PSCI v0.2 function IDs\n");
psci_ops = (struct psci_operations){
.get_version = psci_0_2_get_version,
.cpu_suspend = psci_0_2_cpu_suspend,
.cpu_off = psci_0_2_cpu_off,
.cpu_on = psci_0_2_cpu_on,
.migrate = psci_0_2_migrate,
.affinity_info = psci_affinity_info,
.migrate_info_type = psci_migrate_info_type,
};
register_restart_handler(&psci_sys_reset_nb);
pm_power_off = psci_sys_poweroff;
}
So psci_ops.cpu_suspend is psci_0_2_cpu_suspend, which receives the state and the resume address:
// drivers/firmware/psci/psci.c
static __always_inline int
psci_0_2_cpu_suspend(u32 state, unsigned long entry_point)
{
return __psci_cpu_suspend(PSCI_FN_NATIVE(0_2, CPU_SUSPEND),
state, entry_point);
}
Going further down:
__psci_cpu_suspend
-> invoke_psci_fn
-> __invoke_psci_fn_smc
-> arm_smccc_smc
-> finally the smc instruction enters ATF (Arm Trusted Firmware)
At this point the CPU has been suspended.
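To see what the smc call actually carries, the sketch below computes the PSCI CPU_SUSPEND function ID and packs a power_state value. The ID macros mirror include/uapi/linux/psci.h (on arm64, PSCI_FN_NATIVE picks the 64-bit SMC variant). psci_power_state_sketch is a hypothetical helper using the original PSCI power_state layout, and the example state ID is made up: real values are platform specific and normally come from the DT property arm,psci-suspend-param.

```c
#include <assert.h>
#include <stdint.h>

/* Function IDs as in include/uapi/linux/psci.h. */
#define PSCI_0_2_FN_BASE   0x84000000u
#define PSCI_0_2_64BIT     0x40000000u
#define PSCI_0_2_FN(n)     (PSCI_0_2_FN_BASE + (n))
#define PSCI_0_2_FN64(n)   (PSCI_0_2_FN_BASE + PSCI_0_2_64BIT + (n))

#define PSCI_0_2_FN_CPU_SUSPEND    PSCI_0_2_FN(1)   /* SMC32 */
#define PSCI_0_2_FN64_CPU_SUSPEND  PSCI_0_2_FN64(1) /* SMC64, used on arm64 */

/*
 * Original power_state format: StateID in bits [15:0], StateType in bit 16
 * (1 = power down, 0 = standby/retention), PowerLevel in bits [25:24].
 */
uint32_t psci_power_state_sketch(uint32_t state_id, int power_down, uint32_t level)
{
    return (state_id & 0xffffu) |
           ((uint32_t)(power_down ? 1 : 0) << 16) |
           ((level & 0x3u) << 24);
}
```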
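7. The resume flow begins
When an interrupt wakes the system up, the CPU returns from ATF. But where does it return to? Recall the two arguments passed at suspend time: the first is the state, and the second is the address the CPU executes on resume, namely cpu_resume: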
// arch/arm64/kernel/sleep.S
SYM_CODE_START(cpu_resume)
bl init_kernel_el
bl finalise_el2
#if VA_BITS > 48
ldr_l x0, vabits_actual
#endif
bl __cpu_setup
/* enable the MMU early - so we can access sleep_save_stash by va */
adrp x1, swapper_pg_dir
adrp x2, idmap_pg_dir
bl __enable_mmu
ldr x8, =_cpu_resume
br x8
SYM_CODE_END(cpu_resume)
.ltorg
.popsection
SYM_FUNC_START(_cpu_resume)
mrs x1, mpidr_el1
adr_l x8, mpidr_hash // x8 = struct mpidr_hash virt address
/* retrieve mpidr_hash members to compute the hash */
ldr x2, [x8, #MPIDR_HASH_MASK]
ldp w3, w4, [x8, #MPIDR_HASH_SHIFTS]
ldp w5, w6, [x8, #(MPIDR_HASH_SHIFTS + 8)]
compute_mpidr_hash x7, x3, x4, x5, x6, x1, x2
/* x7 contains hash index, let's use it to grab context pointer */
ldr_l x0, sleep_save_stash
ldr x0, [x0, x7, lsl #3]
add x29, x0, #SLEEP_STACK_DATA_CALLEE_REGS
add x0, x0, #SLEEP_STACK_DATA_SYSTEM_REGS
/* load sp from context */
ldr x2, [x0, #CPU_CTX_SP]
mov sp, x2
/*
* cpu_do_resume expects x0 to contain context address pointer
*/
bl cpu_do_resume
#if defined(CONFIG_KASAN) && defined(CONFIG_KASAN_STACK)
mov x0, sp
bl kasan_unpoison_task_stack_below
#endif
ldp x19, x20, [x29, #16]
ldp x21, x22, [x29, #32]
ldp x23, x24, [x29, #48]
ldp x25, x26, [x29, #64]
ldp x27, x28, [x29, #80]
ldp x29, lr, [x29]
mov x0, #0
ret
SYM_FUNC_END(_cpu_resume)
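cpu_resume, via _cpu_resume, restores the state saved before suspend, i.e. the registers stored by __cpu_suspend_enter. Because the link register is restored as well, when _cpu_resume returns 0 execution appears to return from __cpu_suspend_enter a second time. The if (__cpu_suspend_enter(&state)) check therefore takes the else branch, RCU_NONIDLE(__cpu_suspend_exit()), and the system's resume path begins.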
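The "returns twice" trick has a familiar user-space analogue in setjmp()/longjmp(). The sketch below only illustrates the control flow: setjmp() stands in for __cpu_suspend_enter(), longjmp() for the firmware jumping back through cpu_resume, and all fake_* names are hypothetical. The return-value convention is reversed, though: setjmp() returns 0 first and non-zero after longjmp(), whereas __cpu_suspend_enter() returns non-zero first and 0 on the resume path.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf saved_context; /* plays the role of struct sleep_stack_data */

static void fake_firmware_resume(void)
{
    /* "Firmware" jumps back to the saved context, like cpu_resume. */
    longjmp(saved_context, 1);
}

int fake_cpu_suspend(void)
{
    /* volatile: modified after setjmp() and read after longjmp() */
    volatile int finisher_calls = 0;

    if (setjmp(saved_context) == 0) {
        /* First return: context saved, run the "suspend finisher". */
        finisher_calls++;
        fake_firmware_resume(); /* does not return */
        return -1;              /* unreachable, like the -EOPNOTSUPP path */
    }
    /* Second return: the resume branch (__cpu_suspend_exit in the kernel). */
    return finisher_calls;
}
```

fake_cpu_suspend() returns 1: the finisher ran once on the way down, and the second "return" from the saved context continued on the resume branch.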
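8. Re-enabling IRQs
IRQs were disabled when suspend started, so even though the system is now resuming, it still cannot handle the interrupt; that only becomes possible once IRQs are enabled again. When does that happen? Back to cpuidle_enter_s2idle: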
// drivers/cpuidle/cpuidle.c
int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev)
{
int index;
/*
* Find the deepest state with ->enter_s2idle present, which guarantees
* that interrupts won't be enabled when it exits and allows the tick to
* be frozen safely.
*/
index = find_deepest_state(drv, dev, U64_MAX, 0, true); // index = 1
if (index > 0) {
enter_s2idle_proper(drv, dev, index);
local_irq_enable();
}
return index;
}
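The CPU keeps running through the resume path, and only after enter_s2idle_proper returns does cpuidle_enter_s2idle call local_irq_enable. From then on the CPU can take the pending interrupt and run its handler.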
9. pm_system_wakeup
This leads into the interrupt handling path:
gic_handle_irq
-> ......
-> __handle_irq_event_percpu
-> wakeup_interrupt_handler
-> pm_system_wakeup
pm_system_wakeup:
// drivers/base/power/wakeup.c
void pm_system_wakeup(void)
{
atomic_inc(&pm_abort_suspend);
s2idle_wake();
}
EXPORT_SYMBOL_GPL(pm_system_wakeup);
The function first increments the atomic counter pm_abort_suspend. Where have we seen it before? In s2idle_loop, pm_wakeup_pending checks whether pm_abort_suspend is greater than 0, so atomic_inc(&pm_abort_suspend) is exactly what sets the exit condition for s2idle_loop.
s2idle_wake:
// kernel/power/suspend.c
void s2idle_wake(void)
{
unsigned long flags;
raw_spin_lock_irqsave(&s2idle_lock, flags);
if (s2idle_state > S2IDLE_STATE_NONE) {
s2idle_state = S2IDLE_STATE_WAKE;
swake_up_one(&s2idle_wait_head);
}
raw_spin_unlock_irqrestore(&s2idle_lock, flags);
}
EXPORT_SYMBOL_GPL(s2idle_wake);
Note the line s2idle_state = S2IDLE_STATE_WAKE in s2idle_wake. This variable appeared earlier, in s2idle_enter:
swait_event_exclusive(s2idle_wait_head,
s2idle_state == S2IDLE_STATE_WAKE);
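The waiting task sleeps until s2idle_state == S2IDLE_STATE_WAKE. s2idle_wake sets s2idle_state and wakes it with swake_up_one, so the task continues and the rest of the resume flow runs. Back in s2idle_loop, pm_wakeup_pending now returns true, so the loop exits.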
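III. Summary
That concludes the analysis of the suspend-to-idle (s2idle) flow. The whole article is summarised in one diagram; because the image is too large to upload directly, it is shared through Baidu Netdisk:
Link: https://pan.baidu.com/s/1pvhZVbRwauhWUzJ5nL_3rw?pwd=5j9f Extraction code: 5j9f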