
Prophet Time Series Algorithm: Summary and Python Example

article 2024/11/30 12:17:02

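1. Prophet theory overview

Prophet is an open-source time series forecasting algorithm from Facebook[^1][2]. It is designed for time series that show periodicity and trend changes and that contain missing values and outliers. It is suited to data at daily (or higher) frequency, and its design reflects the characteristics of business time series, such as seasonal variation, holiday effects and trend changes. Its core idea is to decompose a time series into three parts: trend, seasonality and holiday effects.
Prophet detects the trend and seasonality in the data automatically and combines them to produce a forecast. It is based on an additive model that decomposes the series into a trend term, a periodic term, a holiday/special-event term and a residual term. Prophet also provides visualization tools that show the separate contributions of the trend, the different periods and the different holidays or special events, which makes the model easy to interpret[^3].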

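Advantages

Works well for time series with seasonality and trend changes.
Robust to missing values and outliers.
Easy to use, including for non-specialists.

Disadvantages

Fitting can become slow on very large datasets.
Its handling of non-stationary data is fairly simple and may not capture complex non-stationary behavior.

Application scenarios

Suitable for a wide range of data with strong seasonality and trend[^4]

Prophet can use either an additive model or a multiplicative model.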

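Additive model

y(t) = g(t) + s(t) + h(t) + e(t)
g(t) is the trend of the series; it fits the non-periodic changes.
s(t) is the seasonality of the series.
h(t) is the holiday effect: changes caused by holidays and other special events.
e(t) is the error term, representing random fluctuations that cannot be predicted.

Use case: the additive model generally fits series whose trend and seasonality do not depend on the scale of the data, such as temperature and rainfall.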
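To make the additive decomposition concrete, here is a minimal sketch that builds a series from four hand-made components and sums them. The component shapes (a linear trend, a weekly sine wave, a spike every 30th day, Gaussian noise) and the function name are illustrative assumptions, not anything Prophet itself produces.

```python
import math
import random


def additive_series(n_days, seed=0):
    """Build y(t) = g(t) + s(t) + h(t) + e(t) from hand-made components."""
    rng = random.Random(seed)
    rows = []
    for t in range(n_days):
        g = 10.0 + 0.1 * t                       # trend: steady linear growth
        s = 3.0 * math.sin(2 * math.pi * t / 7)  # seasonality: weekly cycle
        h = 5.0 if t % 30 == 0 else 0.0          # holiday effect: hypothetical spike every 30th day
        e = rng.gauss(0.0, 0.5)                  # error: random noise
        rows.append({"t": t, "g": g, "s": s, "h": h, "e": e, "y": g + s + h + e})
    return rows


rows = additive_series(60)
print(rows[30])
```

Because the components are simply added, the weekly swing stays the same size no matter how high the trend climbs.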

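Multiplicative model

In the multiplicative model, the forecast is the product of the trend, seasonality and holiday effects:
y(t) = g(t) * s(t) * h(t) * e(t)
In Prophet itself, setting seasonality_mode='multiplicative' applies the seasonal and holiday terms as a relative adjustment to the trend, so when all components are multiplicative the prediction takes the form g(t) * (1 + s(t) + h(t)).

Use case: series whose trend and seasonality scale with the level of the data, such as product sales and stock prices.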
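The difference between the two modes can be seen by measuring the seasonal swing at two trend levels. The sketch below assumes the multiplicative term is expressed as a fraction of the trend, i.e. trend * (1 + s(t)), which is the form Prophet's prediction step uses for multiplicative components; the amplitudes and the function name are hypothetical.

```python
import math


def seasonal_swing(trend_level, mode, rel_amp=0.2, abs_amp=4.0):
    """Peak-to-trough swing of one weekly cycle at a fixed trend level."""
    values = []
    for t in range(7):
        s = math.sin(2 * math.pi * t / 7)
        if mode == "additive":
            values.append(trend_level + abs_amp * s)        # fixed-size swing
        elif mode == "multiplicative":
            values.append(trend_level * (1 + rel_amp * s))  # swing proportional to trend
        else:
            raise ValueError("mode must be 'additive' or 'multiplicative'")
    return max(values) - min(values)


for level in (20.0, 200.0):
    print(level, seasonal_swing(level, "additive"), seasonal_swing(level, "multiplicative"))
```

The additive swing is identical at both levels, while the multiplicative swing grows in proportion to the trend, which is the pattern typically seen in sales data.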

2. Importing the module in Python

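When importing the module in a script, the following error kept appearing even though repeated checks showed the package was installed[^6]:
ModuleNotFoundError: No module named 'Prophet'

After several attempts and some searching, the cause turned out to be the module name. Python module names are case-sensitive: the installed module is the lowercase prophet, while Prophet (capitalized) is the class inside it, so a statement such as import Prophet fails. In addition, the package was renamed from fbprophet to prophet starting with version 1.0, so on current versions import fbprophet fails as well.

The correct import is:
from prophet import Prophet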
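When it is unclear which package name is present in an environment, a quick check with the standard library can help. This is only a sketch: the helper name and the returned hint strings are made up for illustration, and it relies on importlib.util.find_spec returning None for a missing top-level module.

```python
import importlib.util


def prophet_import_hint():
    """Return the import line that should work here, or an install hint."""
    if importlib.util.find_spec("prophet") is not None:
        return "from prophet import Prophet"      # current package name
    if importlib.util.find_spec("fbprophet") is not None:
        return "from fbprophet import Prophet"    # legacy package name (before 1.0)
    return "not installed: try `pip install prophet`"


print(prophet_import_hint())
```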

3. Python example

3.1 Help information

Calling help(Prophet) in Python prints the help text below, which lists the constructor's parameters and explains what each one means.

Prophet(
    growth='linear',
    changepoints=None,
    n_changepoints=25,
    changepoint_range=0.8,
    yearly_seasonality='auto',
    weekly_seasonality='auto',
    daily_seasonality='auto',
    holidays=None,
    seasonality_mode='additive',
    seasonality_prior_scale=10.0,
    holidays_prior_scale=10.0,
    changepoint_prior_scale=0.05,
    mcmc_samples=0,
    interval_width=0.8,
    uncertainty_samples=1000,
    stan_backend=None,
    scaling: str = 'absmax',
    holidays_mode=None,
)
Docstring:     
Prophet forecaster.

Parameters
----------
growth: String 'linear', 'logistic' or 'flat' to specify a linear, logistic or
    flat trend.
changepoints: List of dates at which to include potential changepoints. If
    not specified, potential changepoints are selected automatically.
n_changepoints: Number of potential changepoints to include. Not used
    if input `changepoints` is supplied. If `changepoints` is not supplied,
    then n_changepoints potential changepoints are selected uniformly from
    the first `changepoint_range` proportion of the history.
changepoint_range: Proportion of history in which trend changepoints will
    be estimated. Defaults to 0.8 for the first 80%. Not used if
    `changepoints` is specified.
yearly_seasonality: Fit yearly seasonality.
    Can be 'auto', True, False, or a number of Fourier terms to generate.
weekly_seasonality: Fit weekly seasonality.
    Can be 'auto', True, False, or a number of Fourier terms to generate.
daily_seasonality: Fit daily seasonality.
    Can be 'auto', True, False, or a number of Fourier terms to generate.
holidays: pd.DataFrame with columns holiday (string) and ds (date type)
    and optionally columns lower_window and upper_window which specify a
    range of days around the date to be included as holidays.
    lower_window=-2 will include 2 days prior to the date as holidays. Also
    optionally can have a column prior_scale specifying the prior scale for
    that holiday.
seasonality_mode: 'additive' (default) or 'multiplicative'.
seasonality_prior_scale: Parameter modulating the strength of the
    seasonality model. Larger values allow the model to fit larger seasonal
    fluctuations, smaller values dampen the seasonality. Can be specified
    for individual seasonalities using add_seasonality.
holidays_prior_scale: Parameter modulating the strength of the holiday
    components model, unless overridden in the holidays input.
changepoint_prior_scale: Parameter modulating the flexibility of the
    automatic changepoint selection. Large values will allow many
    changepoints, small values will allow few changepoints.
mcmc_samples: Integer, if greater than 0, will do full Bayesian inference
    with the specified number of MCMC samples. If 0, will do MAP
    estimation.
interval_width: Float, width of the uncertainty intervals provided
    for the forecast. If mcmc_samples=0, this will be only the uncertainty
    in the trend using the MAP estimate of the extrapolated generative
    model. If mcmc.samples>0, this will be integrated over all model
    parameters, which will include uncertainty in seasonality.
uncertainty_samples: Number of simulated draws used to estimate
    uncertainty intervals. Settings this value to 0 or False will disable
    uncertainty estimation and speed up the calculation.
stan_backend: str as defined in StanBackendEnum default: None - will try to
    iterate over all available backends and find the working one
holidays_mode: 'additive' or 'multiplicative'. Defaults to seasonality_mode.
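The docstring says that, when changepoints is not supplied, n_changepoints candidates are spread uniformly over the first changepoint_range proportion of the history. The sketch below follows that description in plain Python. It is an approximation written from the docstring, not Prophet's source code, and the function name is hypothetical.

```python
def candidate_changepoint_indexes(n_obs, n_changepoints=25, changepoint_range=0.8):
    """Row indexes of evenly spaced candidate changepoints in the early history."""
    hist_size = int(n_obs * changepoint_range)   # rows eligible for changepoints
    if n_changepoints + 1 > hist_size:
        n_changepoints = hist_size - 1           # not enough rows: use fewer candidates
    if n_changepoints <= 0:
        return []
    step = (hist_size - 1) / n_changepoints
    # Start at 1 so the very first observation is never a changepoint.
    return [round(i * step) for i in range(1, n_changepoints + 1)]


idx = candidate_changepoint_indexes(365)
print(len(idx), idx[0], idx[-1])
```

With 365 daily observations and the defaults, every candidate falls within the first 292 rows, so the last 20% of the history is left free of changepoints.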

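3.2 Example

The script below shows a complete run. In practice, prepare the data as two columns; the overall workflow (fit, then predict) is similar to other machine learning models. One point to note: the two columns must be named ds and y, so rename the columns after preprocessing.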
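As a minimal sketch of that renaming step, the helper below converts an arbitrary two-column table into the ds / y layout. The raw column names "date" and "sales", the sample values and the helper name are all hypothetical.

```python
import pandas as pd

# Hypothetical raw data; the column names are illustrative only.
raw = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-02", "2023-01-03"],
    "sales": [120.0, 135.5, 128.0],
})


def to_prophet_frame(frame, date_col, value_col):
    """Keep the two needed columns, rename them to ds / y, and parse the dates."""
    out = frame[[date_col, value_col]].rename(columns={date_col: "ds", value_col: "y"})
    return out.assign(ds=pd.to_datetime(out["ds"]))


df = to_prophet_frame(raw, "date", "sales")
print(df.dtypes)
```

Parsing ds into a datetime column up front avoids relying on string dates being interpreted later.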

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from prophet import Prophet  # the package is named prophet (formerly fbprophet)

# Generate sample data: a linear trend plus a sinusoidal cycle and Gaussian noise
np.random.seed(1024)
dates = pd.date_range('2023-01-01', periods=365)
data = np.linspace(10, 50, 365) + 10 * np.sin(np.linspace(0, 10 * np.pi, 365)) + np.random.randn(365) * 5

# Build the DataFrame with the required column names ds and y
df = pd.DataFrame({'ds': dates, 'y': data})

# Fit the Prophet model
model = Prophet(yearly_seasonality=True)
model.fit(df)

# Forecast the next 30 days
future = model.make_future_dataframe(periods=30)
forecast = model.predict(future)

# Plot the history and the forecast
fig = model.plot(forecast)
plt.title('Prophet Model Demo')
plt.xlabel('Date')
plt.ylabel('Value')
plt.show()