Linux Power Management: The Suspend-to-Idle (s2idle) Flow

Contents

I. Common Suspend Modes

1. Suspend-to-idle

2. Standby

3. Suspend-to-RAM

4. Hibernation

II. Code Analysis

1. state_store

2. suspend_enter

3. s2idle_loop

4. idle loop

5. psci_enter_idle_state

6. cpu_suspend

7. Starting the resume flow

8. Enabling IRQs

9. pm_system_wakeup

III. Summary



Linux version: linux-6.1

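I. Common Suspend Modes

1. Suspend-to-idle

        Suspend-to-idle, also called S2I or S2Idle, freezes user space, suspends timekeeping, and puts all I/O devices into low-power states, so that the CPUs can stay in their deepest idle state while the system is suspended. The system is woken by in-band interrupts, so in theory any device that can raise an interrupt in the working state can serve as an S2Idle wakeup source.

How to enter:

echo s2idle > /sys/power/mem_sleep
echo mem > /sys/power/state

or directly:

echo freeze > /sys/power/state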

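2. Standby

        Standby does everything S2Idle does and additionally takes the non-boot CPUs offline. This saves more power than S2Idle, but resuming takes longer. Fewer devices can wake the system than in S2Idle, and wakeup may depend on platform support.

How to enter:

echo shallow > /sys/power/mem_sleep
echo mem > /sys/power/state

or directly:

echo standby > /sys/power/state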

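3. Suspend-to-RAM

        Suspend-to-RAM, also called STR or S2RAM, keeps the system state in RAM. Everything except memory is put into a low-power state, and in most cases the peripheral buses lose power. Even fewer devices can wake the system from S2RAM, and the platform has to implement the wakeup path.

How to enter:

echo deep > /sys/power/mem_sleep
echo mem > /sys/power/state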
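
Before writing mem to /sys/power/state it can be useful to check which variant it currently resolves to. Reading /sys/power/mem_sleep lists the supported variants with the active one enclosed in square brackets, for example "s2idle [deep]". The following is a minimal sketch of extracting that bracketed entry; mem_sleep_current_mode is a hypothetical helper name, not a kernel or libc API, and the sample strings in the note below are illustrative.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed entry of a mem_sleep-style line (for example
 * "s2idle [deep]") into out. Returns 0 on success, -1 if no complete
 * bracketed entry is found or it does not fit into out.
 * Hypothetical helper written for this article.
 */
int mem_sleep_current_mode(const char *line, char *out, size_t out_len)
{
	const char *lbr = strchr(line, '[');
	const char *rbr = lbr ? strchr(lbr, ']') : NULL;
	size_t len;

	if (!lbr || !rbr)
		return -1;

	/* Number of characters strictly between '[' and ']'. */
	len = (size_t)(rbr - lbr - 1);
	if (len == 0 || len >= out_len)
		return -1;

	memcpy(out, lbr + 1, len);
	out[len] = '\0';
	return 0;
}
```

In practice the line would come from fgets() on /sys/power/mem_sleep; passing "s2idle [deep]" yields "deep", and a line without brackets is rejected.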

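4. Hibernation

        Hibernation, also called suspend-to-disk or STD, saves the most power, but it needs low-level code to restore the system. When hibernation is triggered, the kernel stops all system activity, creates a snapshot image of memory, and writes it to persistent storage such as a disk. Once the image is saved the system enters a low-power state in which almost all hardware, including RAM, is powered off, apart from a small set of wakeup devices.

How to enter: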

echo disk > /sys/power/state

II. Code Analysis

        This article walks through the suspend-to-idle (s2idle) flow on top of the cpuidle framework. It helps to be familiar with cpuidle first, because the later part of the suspend path is implemented through it; the cpuidle framework itself is covered in a separate article.

1. state_store

        The code walk starts at state_store:

//  kernel/power/main.c
static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
			   const char *buf, size_t n)
{
	suspend_state_t state;
	int error;

	error = pm_autosleep_lock();
	if (error)
		return error;

	if (pm_autosleep_state() > PM_SUSPEND_ON) {
		error = -EBUSY;
		goto out;
	}

	state = decode_state(buf, n);
	if (state < PM_SUSPEND_MAX) {
		if (state == PM_SUSPEND_MEM)
			state = mem_sleep_current;

		error = pm_suspend(state);
	} else if (state == PM_SUSPEND_MAX) {
		error = hibernate();
		trace_android_vh_hibernate_state(error);
	} else {
		error = -EINVAL;
	}

 out:
	pm_autosleep_unlock();
	return error ? error : n;
}

        state_store maps the string written by the user to a suspend state. For suspend-to-idle, state = PM_SUSPEND_TO_IDLE and pm_suspend is called. Skipping the less relevant code, the path to the key function is:

pm_suspend

        -> enter_state

                -> s2idle_begin          ======>   s2idle_state = S2IDLE_STATE_NONE 

                -> suspend_prepare  ======>   suspend_freeze_processes()

                -> suspend_devices_and_enter

                        -> suspend_console

                        -> dpm_suspend_start

                        -> suspend_test_finish

                        -> suspend_enter
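
The decoding step at the top of state_store can be sketched in user space as follows. This is a simplified model, not the kernel's decode_state: the MODEL_ names are local to the example, and the numeric values and the labels "freeze", "standby", and "mem" are assumptions based on my reading of the kernel sources. It shows how "mem" is redirected to whatever mem_sleep_current holds, and how "disk" is signalled with the MAX value that state_store treats as hibernation.

```c
#include <string.h>

/* Assumed to mirror the suspend_state_t constants used by state_store(). */
enum model_state {
	MODEL_SUSPEND_ON = 0,
	MODEL_SUSPEND_TO_IDLE = 1,
	MODEL_SUSPEND_STANDBY = 2,
	MODEL_SUSPEND_MEM = 3,
	MODEL_SUSPEND_MAX = 4,	/* state_store() treats this as "disk" */
};

/*
 * Return the state that would finally be acted on for the text in buf.
 * mem_sleep_current stands in for the kernel variable of the same name.
 */
int model_resolve_state(const char *buf, int mem_sleep_current)
{
	static const char *const labels[] = {
		[MODEL_SUSPEND_TO_IDLE] = "freeze",
		[MODEL_SUSPEND_STANDBY] = "standby",
		[MODEL_SUSPEND_MEM] = "mem",
	};
	size_t len = strcspn(buf, "\n");	/* ignore the newline echo appends */
	int state;

	/* Hibernation is checked first. */
	if (len == 4 && strncmp(buf, "disk", 4) == 0)
		return MODEL_SUSPEND_MAX;

	for (state = MODEL_SUSPEND_TO_IDLE; state < MODEL_SUSPEND_MAX; state++) {
		if (len == strlen(labels[state]) &&
		    strncmp(buf, labels[state], len) == 0)
			return state == MODEL_SUSPEND_MEM ? mem_sleep_current : state;
	}

	/* Unknown input: the kernel would go on to reject this state. */
	return MODEL_SUSPEND_ON;
}
```

With mem_sleep_current set to the s2idle value, both "freeze" and "mem" resolve to MODEL_SUSPEND_TO_IDLE, which matches the two ways of entering s2idle listed in section I.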

2. suspend_enter

suspend_enter is shown below:

//  kernel/power/suspend.c
/**
 * suspend_enter - Make the system enter the given sleep state.
 * @state: System sleep state to enter.
 * @wakeup: Returns information that the sleep state should not be re-entered.
 *
 * This function should be called after devices have been suspended.
 */
static int suspend_enter(suspend_state_t state, bool *wakeup)
{
	int error, last_dev;

	error = platform_suspend_prepare(state);
	if (error)
		goto Platform_finish;

	error = dpm_suspend_late(PMSG_SUSPEND);
	if (error) {
		last_dev = suspend_stats.last_failed_dev + REC_FAILED_NUM - 1;
		last_dev %= REC_FAILED_NUM;
		pr_err("late suspend of devices failed\n");
		log_suspend_abort_reason("late suspend of %s device failed",
					 suspend_stats.failed_devs[last_dev]);
		goto Platform_finish;
	}
	error = platform_suspend_prepare_late(state);
	if (error)
		goto Devices_early_resume;

	error = dpm_suspend_noirq(PMSG_SUSPEND);
	if (error) {
		last_dev = suspend_stats.last_failed_dev + REC_FAILED_NUM - 1;
		last_dev %= REC_FAILED_NUM;
		pr_err("noirq suspend of devices failed\n");
		log_suspend_abort_reason("noirq suspend of %s device failed",
					 suspend_stats.failed_devs[last_dev]);
		goto Platform_early_resume;
	}
	error = platform_suspend_prepare_noirq(state);
	if (error)
		goto Platform_wake;

	if (suspend_test(TEST_PLATFORM))
		goto Platform_wake;

	if (state == PM_SUSPEND_TO_IDLE) {
		s2idle_loop();
		goto Platform_wake;
	}

	error = pm_sleep_disable_secondary_cpus();
	if (error || suspend_test(TEST_CPUS)) {
		log_suspend_abort_reason("Disabling non-boot cpus failed");
		goto Enable_cpus;
	}

	arch_suspend_disable_irqs();
	BUG_ON(!irqs_disabled());

	system_state = SYSTEM_SUSPEND;

	error = syscore_suspend();
	if (!error) {
		*wakeup = pm_wakeup_pending();
		if (!(suspend_test(TEST_CORE) || *wakeup)) {
			trace_suspend_resume(TPS("machine_suspend"),
				state, true);
			error = suspend_ops->enter(state);
			trace_suspend_resume(TPS("machine_suspend"),
				state, false);
			trace_android_vh_early_resume_begin(NULL);
		} else if (*wakeup) {
			error = -EBUSY;
		}
		syscore_resume();
	}

	system_state = SYSTEM_RUNNING;

	arch_suspend_enable_irqs();
	BUG_ON(irqs_disabled());

 Enable_cpus:
	pm_sleep_enable_secondary_cpus();

 Platform_wake:
	platform_resume_noirq(state);
	dpm_resume_noirq(PMSG_RESUME);

 Platform_early_resume:
	platform_resume_early(state);

 Devices_early_resume:
	dpm_resume_early(PMSG_RESUME);

 Platform_finish:
	platform_resume_finish(state);
	return error;
}

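        suspend_enter puts the system into the requested suspend state. platform_suspend_prepare, dpm_suspend_late, platform_suspend_prepare_late, and dpm_suspend_noirq take the platform and the devices through the late and noirq suspend phases. In each phase the PM core runs one callback per device: it takes the first of dev->pm_domain->ops, dev->type->pm, dev->class->pm, and dev->bus->pm that exists, and falls back to dev->driver->pm if that layer has no callback for the phase.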
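
The callback lookup order can be sketched as below. This is a simplified model based on my reading of drivers/base/power/main.c, not the kernel code itself: struct model_dev, struct model_layer, and the sample_* callbacks are hypothetical names, and details such as skipping already-suspended devices are left out.

```c
#include <stddef.h>

typedef int (*pm_cb_t)(void);

/* One PM layer: whether it exists, and its callback for the current phase. */
struct model_layer {
	int present;
	pm_cb_t cb;	/* NULL if the layer has no callback for this phase */
};

struct model_dev {
	struct model_layer pm_domain, type, cls, bus, driver;
};

/* Pick the callback to run for one device in one suspend phase. */
pm_cb_t model_pick_callback(const struct model_dev *dev)
{
	pm_cb_t cb = NULL;

	/* Only the first layer that exists is consulted. */
	if (dev->pm_domain.present)
		cb = dev->pm_domain.cb;
	else if (dev->type.present)
		cb = dev->type.cb;
	else if (dev->cls.present)
		cb = dev->cls.cb;
	else if (dev->bus.present)
		cb = dev->bus.cb;

	/* The driver's callback is the fallback. */
	if (!cb && dev->driver.present)
		cb = dev->driver.cb;

	return cb;
}

/* Sample callbacks, used only to tell the layers apart. */
int sample_bus_cb(void) { return 0; }
int sample_driver_cb(void) { return 0; }
```

The point of the sketch is that the layers are alternatives rather than a chain: a bus callback, when present, is expected to call into the driver itself if the driver needs to do work.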

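        Because this is suspend-to-idle, state == PM_SUSPEND_TO_IDLE and s2idle_loop is called; it is one of the key functions of the s2idle path. The code after the s2idle_loop branch is used by the other suspend states only: it takes the non-boot CPUs offline (pm_sleep_disable_secondary_cpus) and then suspends the boot CPU through suspend_ops->enter(state). That part of the code is analyzed in a separate article.

3. s2idle_loop

s2idle_loop is shown below: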

//  kernel/power/suspend.c
static void s2idle_loop(void)
{
	pm_pr_dbg("suspend-to-idle\n");

	/*
	 * Suspend-to-idle equals:
	 * frozen processes + suspended devices + idle processors.
	 * Thus s2idle_enter() should be called right after all devices have
	 * been suspended.
	 *
	 * Wakeups during the noirq suspend of devices may be spurious, so try
	 * to avoid them upfront.
	 */
	for (;;) {
		if (s2idle_ops && s2idle_ops->wake) {
			if (s2idle_ops->wake())
				break;
		} else if (pm_wakeup_pending()) {
			break;
		}

		clear_wakeup_reasons();

		if (s2idle_ops && s2idle_ops->check)
			s2idle_ops->check();

		s2idle_enter();
	}
}

        The loop is unbounded: as long as no wakeup event arrives, the system stays suspended. The wakeup check at the top of the loop is:

if (s2idle_ops && s2idle_ops->wake) {
	if (s2idle_ops->wake())
		break;
} else if (pm_wakeup_pending()) {
	break;
}

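        If the platform provides s2idle_ops->wake(), that callback decides whether a wakeup event has occurred; otherwise pm_wakeup_pending is used. Either way the loop exits when a wakeup is pending. pm_wakeup_pending is shown below: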

//  drivers/base/power/wakeup.c
/**
 * pm_wakeup_pending - Check if power transition in progress should be aborted.
 *
 * Compare the current number of registered wakeup events with its preserved
 * value from the past and return true if new wakeup events have been registered
 * since the old value was stored.  Also return true if the current number of
 * wakeup events being processed is different from zero.
 */
bool pm_wakeup_pending(void)
{
	unsigned long flags;
	bool ret = false;
	char suspend_abort[MAX_SUSPEND_ABORT_LEN];

	raw_spin_lock_irqsave(&events_lock, flags);
	if (events_check_enabled) {
		unsigned int cnt, inpr;

		split_counters(&cnt, &inpr);
		ret = (cnt != saved_count || inpr > 0);
		events_check_enabled = !ret;
	}
	raw_spin_unlock_irqrestore(&events_lock, flags);    
    ......
	return ret || atomic_read(&pm_abort_suspend) > 0;
}
EXPORT_SYMBOL_GPL(pm_wakeup_pending);
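
The return logic of pm_wakeup_pending can be modeled in user space as follows. This is a sketch under simplifying assumptions: struct model_wakeup and its fields are names made up for the example, the locking is omitted, and cnt and inpr are kept as plain fields instead of being obtained through split_counters.

```c
#include <stdbool.h>

struct model_wakeup {
	bool events_check_enabled;
	unsigned int saved_count;	/* snapshot taken when suspend started */
	unsigned int cnt;		/* wakeup events registered so far */
	unsigned int inpr;		/* wakeup events still being processed */
	int abort_suspend;		/* stands in for the pm_abort_suspend atomic */
};

/* Returns true when the suspend in progress should be aborted. */
bool model_wakeup_pending(struct model_wakeup *w)
{
	bool ret = false;

	if (w->events_check_enabled) {
		ret = (w->cnt != w->saved_count || w->inpr > 0);
		/* Once a wakeup is detected the counter check is switched off. */
		w->events_check_enabled = !ret;
	}

	return ret || w->abort_suspend > 0;
}
```

Two independent conditions can end the suspend: the wakeup-event counters moving since the snapshot, or the abort counter being raised, which is what pm_system_wakeup does later in this article.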

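        pm_wakeup_pending returns true if ret is set or if the atomic counter pm_abort_suspend is greater than zero; where that counter gets set is covered later. Back in s2idle_loop, once the check finds no wakeup event, clear_wakeup_reasons clears the recorded wakeup reasons and s2idle_enter is called: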

//  kernel/power/suspend.c
static void s2idle_enter(void)
{
	trace_suspend_resume(TPS("machine_suspend"), PM_SUSPEND_TO_IDLE, true);

	raw_spin_lock_irq(&s2idle_lock);
	if (pm_wakeup_pending())
		goto out;

	s2idle_state = S2IDLE_STATE_ENTER;
	raw_spin_unlock_irq(&s2idle_lock);

	cpus_read_lock();
	/* Push all the CPUs into the idle loop. */
	wake_up_all_idle_cpus();
	/* Make the current CPU wait so it can enter the idle loop too. */
	swait_event_exclusive(s2idle_wait_head,
		    s2idle_state == S2IDLE_STATE_WAKE);
	/*
	 * Kick all CPUs to ensure that they resume their timers and restore
	 * consistent system state.
	 */
	wake_up_all_idle_cpus();
	cpus_read_unlock();

	raw_spin_lock_irq(&s2idle_lock);
 out:
	s2idle_state = S2IDLE_STATE_NONE;
	raw_spin_unlock_irq(&s2idle_lock);

	trace_suspend_resume(TPS("machine_suspend"), PM_SUSPEND_TO_IDLE, false);
}

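        s2idle_enter is the last function to run before the CPUs go idle. It first re-checks for a pending wakeup event and bails out if there is one; otherwise it runs the key part: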

s2idle_state = S2IDLE_STATE_ENTER;
raw_spin_unlock_irq(&s2idle_lock);

cpus_read_lock();
/* Push all the CPUs into the idle loop. */
wake_up_all_idle_cpus();
/* Make the current CPU wait so it can enter the idle loop too. */
swait_event_exclusive(s2idle_wait_head,
		   s2idle_state == S2IDLE_STATE_WAKE);

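        The global s2idle_state is first set to S2IDLE_STATE_ENTER to mark that s2idle is being entered; the idle loop checks this flag, as the next section shows. wake_up_all_idle_cpus then kicks every idle CPU so that it goes through the idle loop again and sees the new state. Finally swait_event_exclusive blocks the current task until s2idle_state == S2IDLE_STATE_WAKE, which lets the current CPU enter the idle loop as well.

4. idle loop

With the CPUs pushed into the idle loop, the idle code takes over: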

do_idle

        -> local_irq_disable

        -> cpuidle_idle_call

 cpuidle_idle_call:

//  kernel/sched/idle.c
/**
 * cpuidle_idle_call - the main idle function
 *
 * NOTE: no locks or semaphores should be used here
 *
 * On architectures that support TIF_POLLING_NRFLAG, is called with polling
 * set, and it returns with polling set.  If it ever stops polling, it
 * must clear the polling bit.
 */
static void cpuidle_idle_call(void)
{
	struct cpuidle_device *dev = cpuidle_get_device();
	struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
	int next_state, entered_state;

	/*
	 * Check if the idle task must be rescheduled. If it is the
	 * case, exit the function after re-enabling the local irq.
	 */
	if (need_resched()) {
		local_irq_enable();
		return;
	}

	/*
	 * The RCU framework needs to be told that we are entering an idle
	 * section, so no more rcu read side critical sections and one more
	 * step to the grace period
	 */

	// Check whether cpuidle is usable, i.e. a cpuidle driver and device are registered.
	// If not, take the default idle path.
	if (cpuidle_not_available(drv, dev)) {
		tick_nohz_idle_stop_tick();

		default_idle_call();
		goto exit_idle;
	}

	/*
	 * Suspend-to-idle ("s2idle") is a system state in which all user space
	 * has been frozen, all I/O devices have been suspended and the only
	 * activity happens here and in interrupts (if any). In that case bypass
	 * the cpuidle governor and go straight for the deepest idle state
	 * available.  Possibly also suspend the local tick and the entire
	 * timekeeping to prevent timer interrupts from kicking us out of idle
	 * until a proper wakeup interrupt happens.
	 */

	if (idle_should_enter_s2idle() || dev->forced_idle_latency_limit_ns) {
		u64 max_latency_ns;

		if (idle_should_enter_s2idle()) {

			entered_state = call_cpuidle_s2idle(drv, dev);
			if (entered_state > 0) {
				goto exit_idle;
			}

			max_latency_ns = U64_MAX;
		} else {
			max_latency_ns = dev->forced_idle_latency_limit_ns;
		}

		tick_nohz_idle_stop_tick();

		next_state = cpuidle_find_deepest_state(drv, dev, max_latency_ns);
		call_cpuidle(drv, dev, next_state);
	} else {
		bool stop_tick = true;

		/*
		 * Ask the cpuidle framework to choose a convenient idle state.
		 */
		next_state = cpuidle_select(drv, dev, &stop_tick);  // ask the cpuidle governor to pick an idle state

		if (stop_tick || tick_nohz_tick_stopped())
			tick_nohz_idle_stop_tick();
		else
			tick_nohz_idle_retain_tick();

		entered_state = call_cpuidle(drv, dev, next_state);  // enter the selected idle state
		/*
		 * Give the governor an opportunity to reflect on the outcome
		 */
		cpuidle_reflect(dev, entered_state);  // let the cpuidle governor update its state
	}

exit_idle:
	__current_set_polling();

	/*
	 * It is up to the idle functions to reenable local interrupts
	 */
	if (WARN_ON_ONCE(irqs_disabled()))
		local_irq_enable();  // re-enable interrupts so that pending ones can be handled
}

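        cpuidle_idle_call is the main idle function, and local IRQs are already disabled when it is entered. It first calls cpuidle_not_available to check whether a cpuidle driver and device are available; if not, default_idle_call takes the default idle path. Otherwise the if/else that follows decides what to do. The if branch is the s2idle path, the subject of this article (it also handles forced idle with a latency limit). The else branch is the regular cpuidle path: cpuidle_select asks the cpuidle governor for a suitable idle state, call_cpuidle enters it, and once the CPU returns from that state cpuidle_reflect updates the governor.

        Since this article is about s2idle, the if branch is the one that matters. idle_should_enter_s2idle is: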

static inline bool idle_should_enter_s2idle(void)
{
	return unlikely(s2idle_state == S2IDLE_STATE_ENTER);
}

        idle_should_enter_s2idle simply reports whether s2idle_state equals S2IDLE_STATE_ENTER. s2idle_enter has already set it to that value, so the if branch is taken and call_cpuidle_s2idle is called. That function is the entry point for putting the CPU into the s2idle state. Leaving out the rest, the call chain is:

call_cpuidle_s2idle

        -> cpuidle_enter_s2idle

                -> find_deepest_state

                -> enter_s2idle_proper

                        -> target_state->enter_s2idle

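        find_deepest_state returns the index of the deepest usable entry in the cpuidle_driver states array, that is, the idle state to enter. For s2idle the index has to be greater than 0. With the state chosen, enter_s2idle_proper invokes its enter_s2idle callback, which completes the CPU suspend.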
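
The selection rule can be sketched in user space as below. This is a simplified model written from my reading of drivers/cpuidle/cpuidle.c, not the kernel function: struct model_idle_state and its fields are hypothetical, and the forbidden-flags check is omitted. "Deepest" is decided by exit latency, index 0 is never considered, and in s2idle mode only states with an enter_s2idle callback qualify.

```c
#include <stdbool.h>
#include <stdint.h>

struct model_idle_state {
	uint64_t exit_latency_ns;
	bool disabled;
	bool has_enter_s2idle;	/* stands in for a non-NULL ->enter_s2idle */
};

/* Return the index of the deepest usable state, or 0 if there is none. */
int model_find_deepest_state(const struct model_idle_state *states, int count,
			     uint64_t max_latency_ns, bool s2idle)
{
	uint64_t latency_req = 0;
	int i, ret = 0;

	for (i = 1; i < count; i++) {
		const struct model_idle_state *s = &states[i];

		if (s->disabled ||
		    s->exit_latency_ns <= latency_req ||
		    s->exit_latency_ns > max_latency_ns ||
		    (s2idle && !s->has_enter_s2idle))
			continue;

		/* A usable state with a higher exit latency counts as deeper. */
		latency_req = s->exit_latency_ns;
		ret = i;
	}

	return ret;
}
```

A return value of 0 means no suitable state was found, which is why cpuidle_enter_s2idle only proceeds when the index is greater than 0.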

5. psci_enter_idle_state

        Which function does the enter_s2idle pointer refer to? The index here is greater than 0, and from the cpuidle framework we know that such idle states are initialized from the DTS. init_state_node sets idle_state->enter_s2idle and idle_state->enter to the same function, psci_enter_idle_state, so target_state->enter_s2idle ends up calling psci_enter_idle_state:

//  drivers/cpuidle/cpuidle-psci.c
static int psci_enter_idle_state(struct cpuidle_device *dev,
				struct cpuidle_driver *drv, int idx)
{
	u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);

	return psci_enter_state(idx, state[idx]);
}

psci_enter_state:

//  drivers/cpuidle/cpuidle-psci.c
static inline int psci_enter_state(int idx, u32 state)
{
	//  With idx == 0 this calls cpu_do_idle, i.e. the default WFI.
	//  For deeper states, as in the suspend path, it calls psci_cpu_suspend_enter.
	return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter, idx, state);
}

CPU_PM_CPU_IDLE_ENTER_PARAM

        -> __CPU_PM_CPU_IDLE_ENTER

//  include/linux/cpuidle.h
#define __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter,			\
				idx,					\
				state,					\
				is_retention)				\
({									\
	int __ret = 0;							\
									\
	if (!idx) {							\
		cpu_do_idle();						\
		return idx;						\
	}								\
									\
	if (!is_retention)						\
		__ret =  cpu_pm_enter();				\
	if (!__ret) {							\
		__ret = low_level_idle_enter(state);			\
		if (!is_retention)					\
			cpu_pm_exit();					\
	}								\
									\
	__ret ? -1 : idx;						\
})

        When idx is 0, i.e. cpuidle_state states[0], the default idle function cpu_do_idle is called and the macro returns immediately. For any other idx, psci_cpu_suspend_enter is called:

//  drivers/firmware/psci/psci.c
int psci_cpu_suspend_enter(u32 state)
{
	int ret;

    ......
	ret = cpu_suspend(state, psci_suspend_finisher);
    ......

	return ret;
}
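
The control flow of the __CPU_PM_CPU_IDLE_ENTER macro shown earlier in this section can be restated as an ordinary function. The sketch below is a user-space model: struct model_idle_hooks, the trace buffer, and the sample_* hooks are hypothetical stand-ins for cpu_do_idle, cpu_pm_enter, cpu_pm_exit, and the low-level enter function, so that the order of calls can be observed.

```c
#include <string.h>

/* Records which hooks ran, one letter each, so the order can be inspected. */
char model_trace[16];

void model_trace_reset(void)
{
	memset(model_trace, 0, sizeof(model_trace));
}

void model_trace_add(char c)
{
	size_t len = strlen(model_trace);

	if (len + 1 < sizeof(model_trace))
		model_trace[len] = c;
}

struct model_idle_hooks {
	void (*cpu_do_idle)(void);		/* plain WFI path */
	int (*cpu_pm_enter)(void);		/* CPU PM notification before power-down */
	void (*cpu_pm_exit)(void);		/* CPU PM notification after return */
	int (*low_level_enter)(unsigned int state);
};

/* Same decisions as the macro: returns idx on success, -1 on failure. */
int model_cpu_idle_enter(const struct model_idle_hooks *h, int idx,
			 unsigned int state, int is_retention)
{
	int ret = 0;

	if (!idx) {
		h->cpu_do_idle();
		return idx;
	}

	if (!is_retention)
		ret = h->cpu_pm_enter();
	if (!ret) {
		ret = h->low_level_enter(state);
		if (!is_retention)
			h->cpu_pm_exit();
	}

	return ret ? -1 : idx;
}

/* Sample hooks for experimenting with the model. */
void sample_wfi(void) { model_trace_add('w'); }
int sample_pm_enter(void) { model_trace_add('e'); return 0; }
void sample_pm_exit(void) { model_trace_add('x'); }
int sample_suspend(unsigned int state) { (void)state; model_trace_add('s'); return 0; }
int sample_suspend_fail(unsigned int state) { (void)state; model_trace_add('s'); return -1; }
```

For a non-retention state the hooks run in the order enter, suspend, exit, and cpu_pm_exit still runs when the low-level call fails. As far as I can tell, CPU_PM_CPU_IDLE_ENTER_PARAM passes 0 for is_retention, so this is the path the PSCI driver takes.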

6. cpu_suspend

cpu_suspend is shown below:

//  arch/arm64/kernel/suspend.c
/*
 * cpu_suspend
 *
 * arg: argument to pass to the finisher function
 * fn: finisher function pointer
 *
 */
int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
{
	int ret = 0;
	unsigned long flags;
	struct sleep_stack_data state;
	struct arm_cpuidle_irq_context context;

	/* Report any MTE async fault before going to suspend */
	mte_suspend_enter();

	/*
	 * From this point debug exceptions are disabled to prevent
	 * updates to mdscr register (saved and restored along with
	 * general purpose registers) from kernel debuggers.
	 */
	flags = local_daif_save();

	/*
	 * Function graph tracer state gets inconsistent when the kernel
	 * calls functions that never return (aka suspend finishers) hence
	 * disable graph tracing during their execution.
	 */
	pause_graph_tracing();

	/*
	 * Switch to using DAIF.IF instead of PMR in order to reliably
	 * resume if we're using pseudo-NMIs.
	 */
	arm_cpuidle_save_irq_context(&context);

	if (__cpu_suspend_enter(&state)) {
		/* Call the suspend finisher */
		ret = fn(arg);

		/*
		 * Never gets here, unless the suspend finisher fails.
		 * Successful cpu_suspend() should return from cpu_resume(),
		 * returning through this code path is considered an error
		 * If the return value is set to 0 force ret = -EOPNOTSUPP
		 * to make sure a proper error condition is propagated
		 */
		if (!ret)
			ret = -EOPNOTSUPP;
	} else {
		RCU_NONIDLE(__cpu_suspend_exit());
	}

	arm_cpuidle_restore_irq_context(&context);

	unpause_graph_tracing();

	/*
	 * Restore pstate flags. OS lock and mdscr have been already
	 * restored, so from this point onwards, debugging is fully
	 * reenabled if it was enabled when core started shutdown.
	 */
	local_daif_restore(flags);

	return ret;
}

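        cpu_suspend calls __cpu_suspend_enter to save the current CPU state in preparation for resume. __cpu_suspend_enter returns twice. The first time it returns a non-zero value, and cpu_suspend goes on to call the finisher, psci_suspend_finisher. When the CPU resumes, execution appears to return from __cpu_suspend_enter a second time, now with 0, and the resume path starts; the details follow later. __cpu_suspend_enter is: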

//  arch/arm64/kernel/sleep.S
/*
 * Save CPU state in the provided sleep_stack_data area, and publish its
 * location for cpu_resume()'s use in sleep_save_stash.
 *
 * cpu_resume() will restore this saved state, and return. Because the
 * link-register is saved and restored, it will appear to return from this
 * function. So that the caller can tell the suspend/resume paths apart,
 * __cpu_suspend_enter() will always return a non-zero value, whereas the
 * path through cpu_resume() will return 0.
 *
 *  x0 = struct sleep_stack_data area
 */
SYM_FUNC_START(__cpu_suspend_enter)
	stp	x29, lr, [x0, #SLEEP_STACK_DATA_CALLEE_REGS]
	stp	x19, x20, [x0,#SLEEP_STACK_DATA_CALLEE_REGS+16]
	stp	x21, x22, [x0,#SLEEP_STACK_DATA_CALLEE_REGS+32]
	stp	x23, x24, [x0,#SLEEP_STACK_DATA_CALLEE_REGS+48]
	stp	x25, x26, [x0,#SLEEP_STACK_DATA_CALLEE_REGS+64]
	stp	x27, x28, [x0,#SLEEP_STACK_DATA_CALLEE_REGS+80]

	/* save the sp in cpu_suspend_ctx */
	mov	x2, sp
	str	x2, [x0, #SLEEP_STACK_DATA_SYSTEM_REGS + CPU_CTX_SP]

	/* find the mpidr_hash */
	ldr_l	x1, sleep_save_stash
	mrs	x7, mpidr_el1
	adr_l	x9, mpidr_hash
	ldr	x10, [x9, #MPIDR_HASH_MASK]
	/*
	 * Following code relies on the struct mpidr_hash
	 * members size.
	 */
	ldp	w3, w4, [x9, #MPIDR_HASH_SHIFTS]
	ldp	w5, w6, [x9, #(MPIDR_HASH_SHIFTS + 8)]
	compute_mpidr_hash x8, x3, x4, x5, x6, x7, x10
	add	x1, x1, x8, lsl #3

	str	x0, [x1]
	add	x0, x0, #SLEEP_STACK_DATA_SYSTEM_REGS
	stp	x29, lr, [sp, #-16]!
	bl	cpu_do_suspend
	ldp	x29, lr, [sp], #16
	mov	x0, #1
	ret
SYM_FUNC_END(__cpu_suspend_enter)

psci_suspend_finisher is shown below:

static noinstr int psci_suspend_finisher(unsigned long state)
{
	u32 power_state = state;
	phys_addr_t pa_cpu_resume;

	pa_cpu_resume = __pa_symbol_nodebug((unsigned long)cpu_resume);

	return psci_ops.cpu_suspend(power_state, pa_cpu_resume);
}

        __pa_symbol_nodebug first converts the address of cpu_resume into a physical address, in preparation for the CPU resume. psci_ops.cpu_suspend is then called; it is set up through the following chain:

psci_1_0_init

        -> psci_0_2_init

                -> psci_probe

                        -> psci_0_2_set_functions

psci_0_2_set_functions:

// drivers/firmware/psci/psci.c
static void __init psci_0_2_set_functions(void)
{
	pr_info("Using standard PSCI v0.2 function IDs\n");

	psci_ops = (struct psci_operations){
		.get_version = psci_0_2_get_version,
		.cpu_suspend = psci_0_2_cpu_suspend,
		.cpu_off = psci_0_2_cpu_off,
		.cpu_on = psci_0_2_cpu_on,
		.migrate = psci_0_2_migrate,
		.affinity_info = psci_affinity_info,
		.migrate_info_type = psci_migrate_info_type,
	};

	register_restart_handler(&psci_sys_reset_nb);

	pm_power_off = psci_sys_poweroff;
}

        So psci_ops.cpu_suspend ends up in psci_0_2_cpu_suspend, which receives the state and the resume address:

//  drivers/firmware/psci/psci.c
static __always_inline int
psci_0_2_cpu_suspend(u32 state, unsigned long entry_point)
{
	return __psci_cpu_suspend(PSCI_FN_NATIVE(0_2, CPU_SUSPEND),
				  state, entry_point);
}

Continuing down:

__psci_cpu_suspend

        -> invoke_psci_fn

                -> __invoke_psci_fn_smc

                        -> arm_smccc_smc

                                -> finally enters ATF through the smc instruction

        At this point the CPU is suspended.
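
The function ID that travels with the SMC can be worked out by hand. To my understanding of the PSCI specification, PSCI 0.2 function IDs start at 0x84000000, the 64-bit (SMC64) variants add 0x40000000, and CPU_SUSPEND is function number 1; PSCI_FN_NATIVE selects the 64-bit variant on arm64. The sketch below only reproduces that arithmetic, and the MODEL_ macros and function names are local to the example.

```c
#include <stdint.h>

/* Assumed values, taken from the PSCI function ID layout. */
#define MODEL_PSCI_0_2_FN_BASE	0x84000000u
#define MODEL_PSCI_0_2_64BIT	0x40000000u
#define MODEL_PSCI_CPU_SUSPEND	1u

/* Build the function ID for PSCI 0.2 function number n. */
uint32_t model_psci_fn_id(uint32_t n, int is_64bit)
{
	uint32_t id = MODEL_PSCI_0_2_FN_BASE + n;

	return is_64bit ? id + MODEL_PSCI_0_2_64BIT : id;
}

/* Function ID used for CPU_SUSPEND. */
uint32_t model_psci_cpu_suspend_id(int is_64bit)
{
	return model_psci_fn_id(MODEL_PSCI_CPU_SUSPEND, is_64bit);
}
```

On arm64 this gives 0xC4000001, which is passed to the firmware together with the power state and the physical address of cpu_resume.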

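7. Starting the resume flow

        When an interrupt wakes the system, the CPU comes back out of ATF. Where does it resume? Recall the two arguments passed down on suspend: the first is the state, and the second is the address the CPU starts executing at on resume, namely cpu_resume: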

// arch/arm64/kernel/sleep.S
SYM_CODE_START(cpu_resume)
	bl	init_kernel_el
	bl	finalise_el2
#if VA_BITS > 48
	ldr_l	x0, vabits_actual
#endif
	bl	__cpu_setup
	/* enable the MMU early - so we can access sleep_save_stash by va */
	adrp	x1, swapper_pg_dir
	adrp	x2, idmap_pg_dir
	bl	__enable_mmu
	ldr	x8, =_cpu_resume
	br	x8
SYM_CODE_END(cpu_resume)
	.ltorg
	.popsection

SYM_FUNC_START(_cpu_resume)
	mrs	x1, mpidr_el1
	adr_l	x8, mpidr_hash		// x8 = struct mpidr_hash virt address

	/* retrieve mpidr_hash members to compute the hash */
	ldr	x2, [x8, #MPIDR_HASH_MASK]
	ldp	w3, w4, [x8, #MPIDR_HASH_SHIFTS]
	ldp	w5, w6, [x8, #(MPIDR_HASH_SHIFTS + 8)]
	compute_mpidr_hash x7, x3, x4, x5, x6, x1, x2

	/* x7 contains hash index, let's use it to grab context pointer */
	ldr_l	x0, sleep_save_stash
	ldr	x0, [x0, x7, lsl #3]
	add	x29, x0, #SLEEP_STACK_DATA_CALLEE_REGS
	add	x0, x0, #SLEEP_STACK_DATA_SYSTEM_REGS
	/* load sp from context */
	ldr	x2, [x0, #CPU_CTX_SP]
	mov	sp, x2
	/*
	 * cpu_do_resume expects x0 to contain context address pointer
	 */
	bl	cpu_do_resume

#if defined(CONFIG_KASAN) && defined(CONFIG_KASAN_STACK)
	mov	x0, sp
	bl	kasan_unpoison_task_stack_below
#endif

	ldp	x19, x20, [x29, #16]
	ldp	x21, x22, [x29, #32]
	ldp	x23, x24, [x29, #48]
	ldp	x25, x26, [x29, #64]
	ldp	x27, x28, [x29, #80]
	ldp	x29, lr, [x29]
	mov	x0, #0
	ret
SYM_FUNC_END(_cpu_resume)

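        cpu_resume uses _cpu_resume to restore the state from before the suspend, essentially the registers that __cpu_suspend_enter saved. Because the saved link register is restored, _cpu_resume returning 0 looks like a second return from __cpu_suspend_enter. The else branch of if (__cpu_suspend_enter(&state)) is therefore taken, RCU_NONIDLE(__cpu_suspend_exit()) runs, and the resume flow begins.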
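
The "returns twice" behavior of __cpu_suspend_enter is easiest to picture through the standard C setjmp/longjmp pair, which works on the same idea of saving a context and later resuming at the save point. The sketch below is only an analogy and not how the kernel implements it: all model_ names are hypothetical, and the polarity is reversed, since setjmp returns 0 on the first return and non-zero on the second, while __cpu_suspend_enter returns non-zero first and 0 on resume.

```c
#include <setjmp.h>

jmp_buf model_saved_ctx;

/* A finisher that "powers down": control comes back through the saved context. */
int model_finisher_wakes(void)
{
	longjmp(model_saved_ctx, 1);
	return 0;	/* not reached */
}

/* A finisher that returns normally, which cpu_suspend() treats as a failure. */
int model_finisher_fails(void)
{
	return 0;
}

/*
 * Returns 0 when the resume path was taken. If the finisher returns,
 * its error is passed on, or -1 if it reported 0 (standing in for the
 * -EOPNOTSUPP that cpu_suspend() forces in that case).
 */
int model_cpu_suspend(int (*finisher)(void))
{
	int ret;

	if (setjmp(model_saved_ctx) == 0) {
		/* First return: context saved, now call the finisher. */
		ret = finisher();
		return ret ? ret : -1;
	}

	/* Second return: reached through the saved context, i.e. "resume". */
	return 0;
}
```

Calling model_cpu_suspend(model_finisher_wakes) returns 0 through the second return, while model_cpu_suspend(model_finisher_fails) returns -1, matching the two outcomes described for cpu_suspend.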

8. Enabling IRQs

        IRQs were disabled when the suspend started, so although the CPU is now resuming it cannot handle interrupts yet; that has to wait until IRQs are enabled again. When does that happen? Back in cpuidle_enter_s2idle:

//  drivers/cpuidle/cpuidle.c
int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev)
{
	int index;

	/*
	 * Find the deepest state with ->enter_s2idle present, which guarantees
	 * that interrupts won't be enabled when it exits and allows the tick to
	 * be frozen safely.
	 */
	index = find_deepest_state(drv, dev, U64_MAX, 0, true);  // index = 1
	if (index > 0) {
		enter_s2idle_proper(drv, dev, index);
		local_irq_enable();
	}
	return index;
}

        The CPU keeps unwinding the resume path, and only when enter_s2idle_proper returns does local_irq_enable turn interrupts back on. From then on the pending interrupt can be handled.

9. pm_system_wakeup

The interrupt handler is reached through:

gic_handle_irq

        -> ......

                -> __handle_irq_event_percpu

                        -> wakeup_interrupt_handler

                                -> pm_system_wakeup

pm_system_wakeup is shown below:

//  drivers/base/power/wakeup.c
void pm_system_wakeup(void)
{
	atomic_inc(&pm_abort_suspend);
	s2idle_wake();
}
EXPORT_SYMBOL_GPL(pm_system_wakeup);

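        The function first increments the atomic counter pm_abort_suspend. This is the counter seen earlier: pm_wakeup_pending, called from s2idle_loop, checks whether pm_abort_suspend is greater than 0. In other words, atomic_inc(&pm_abort_suspend) sets the condition for leaving s2idle_loop.

s2idle_wake is shown below: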

//  kernel/power/suspend.c
void s2idle_wake(void)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&s2idle_lock, flags);
	if (s2idle_state > S2IDLE_STATE_NONE) {
		s2idle_state = S2IDLE_STATE_WAKE;
		swake_up_one(&s2idle_wait_head);
	}
	raw_spin_unlock_irqrestore(&s2idle_lock, flags);
}
EXPORT_SYMBOL_GPL(s2idle_wake);

        Note the assignment s2idle_state = S2IDLE_STATE_WAKE in s2idle_wake. The same value appeared earlier in s2idle_enter, in this snippet:

swait_event_exclusive(s2idle_wait_head,
		   s2idle_state == S2IDLE_STATE_WAKE);

        The task that called s2idle_enter waits until s2idle_state == S2IDLE_STATE_WAKE. s2idle_wake sets s2idle_state to that value and wakes the waiter, so execution continues with the rest of the resume path.
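
The interplay between s2idle_enter, pm_system_wakeup, and s2idle_loop can be summarized as a small state machine. The sketch below is a single-threaded model without the spinlock and the wait queue; struct model_s2idle and the model_ functions are hypothetical names that mirror the kernel functions discussed above.

```c
enum model_s2idle_state {
	MODEL_S2IDLE_NONE,
	MODEL_S2IDLE_ENTER,
	MODEL_S2IDLE_WAKE,
};

struct model_s2idle {
	enum model_s2idle_state state;
	int abort_suspend;	/* stands in for pm_abort_suspend */
	int waiter_woken;	/* stands in for swake_up_one() having run */
};

/* s2idle_enter(): announce that the CPUs should go into s2idle. */
void model_s2idle_enter_begin(struct model_s2idle *m)
{
	m->state = MODEL_S2IDLE_ENTER;
	m->waiter_woken = 0;
}

/* pm_system_wakeup(): raise the abort counter, then s2idle_wake(). */
void model_pm_system_wakeup(struct model_s2idle *m)
{
	m->abort_suspend++;
	if (m->state > MODEL_S2IDLE_NONE) {
		m->state = MODEL_S2IDLE_WAKE;
		m->waiter_woken = 1;
	}
}

/* Condition the waiter in s2idle_enter() is blocked on. */
int model_waiter_may_continue(const struct model_s2idle *m)
{
	return m->state == MODEL_S2IDLE_WAKE;
}

/* Tail of s2idle_enter(): the state goes back to NONE. */
void model_s2idle_enter_finish(struct model_s2idle *m)
{
	m->state = MODEL_S2IDLE_NONE;
}

/* Part of the s2idle_loop() exit check that depends on the abort counter. */
int model_loop_should_exit(const struct model_s2idle *m)
{
	return m->abort_suspend > 0;
}
```

The wakeup interrupt therefore does two things at once: it releases the waiter inside s2idle_enter, and it leaves the abort counter raised so that the next check in s2idle_loop breaks out of the loop instead of suspending again.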

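III. Summary

        That concludes the walk through the suspend-to-idle (s2idle) flow. A diagram summarizing the whole article is too large to embed, so it is shared through a network drive:

Link: https://pan.baidu.com/s/1pvhZVbRwauhWUzJ5nL_3rw?pwd=5j9f  Access code: 5j9f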


