[MDM 2024]Spatial-Temporal Large Language Model for Traffic Prediction

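Paper: [2401.10134] Spatial-Temporal Large Language Model for Traffic Prediction

Code: GitHub - ChenxiLiu-HNU/ST-LLM: Official implementation of the paper "Spatial-Temporal Large Language Model for Traffic Prediction"

The English in this post was typed by hand, summarizing and paraphrasing the original paper. Spelling and grammar mistakes may have slipped through; corrections in the comments are welcome. This post is closer to personal notes than a polished article, so read it with caution.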

Contents

1. Thoughts

2. Close Reading of the Paper

2.1. Abstract

2.2. Introduction

2.3. Related Work

2.3.1. Large Language Models for Time Series Analysis

2.3.2. Traffic Prediction

2.4. Problem Definition

2.5. Methodology

2.5.1. Overview

2.5.2. Spatial-Temporal Embedding and Fusion

2.5.3. Partially Frozen Attention (PFA) LLM

2.6. Experiments

2.6.1. Datasets

2.6.2. Baselines

2.6.3. Implementations

2.6.4. Evaluation Metrics

2.6.5. Main Results

2.6.6. Performance of ST-LLM and Ablation Studies

2.6.7. Parameter Analysis

2.6.8. Inference Time Analysis

2.6.9. Few-Shot Prediction

2.6.10. Zero-Shot Prediction

2.7. Conclusion

3. Reference


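1. Thoughts

(1) The paper I have to submit in a few days is not even started, yet here I am munching biscuits and writing reading notes. Sigh. Everyone moves too fast these days.

(2) Compared with math-heavy papers, LLM papers go well with a cup of milk tea: easy and pleasant from start to finish. This one boils down to three separate embedding branches → fuse them together → LLM (with some modules unfrozen) → done.

2. Close Reading of the Paper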

2.1. Abstract

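        ①They propose the Spatial-Temporal Large Language Model (ST-LLM) for traffic prediction. (Nothing else stands out, so I will not write more: the abstract introduces the method and says earlier approaches are not accurate enough. See the methodology notes below for the details.)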

2.2. Introduction

        ①Traditional CNNs and RNNs cannot capture complex or long-range spatial and temporal dependencies. GNNs are prone to overfitting, so researchers mainly use attention mechanisms.

        ②Existing LLM-based traffic prediction methods mainly focus on temporal features rather than spatial ones

        ③For better long-term prediction, they propose partially frozen attention (PFA)

2.3. Related Work

2.3.1. Large Language Models for Time Series Analysis

        ①They list TEMPO-GPT, TIME-LLM, OFA, TEST, and LLM-TIME, all of which use temporal features only. GATGPT introduces spatial features but ignores temporal dependencies.

imputation  n. in time series analysis, filling in missing values (the general dictionary sense is attributing blame or a cause)

2.3.2. Traffic Prediction

        ①Filtering is a common, classic way to process traffic data

        ②Irregular urban road networks make it hard to apply CNNs or to extract spatial features with them

2.4. Problem Definition

        ①Input traffic data: \mathbf{X}\in\mathbb{R}^{T\times N\times C}, where T denotes the number of timesteps, N denotes the number of spatial stations, and C denotes the number of features

        ②Task: given only the historical traffic data \mathbf{X}_{P}=\{\mathbf{X}_{t-P+1},\mathbf{X}_{t-P+2},\ldots,\mathbf{X}_{t}\}\in\mathbb{R}^{P\times N\times C} of P timesteps, learn a function f\left ( \cdot \right ) with parameters \theta that predicts the next S timesteps \mathbf{Y}_{S}=\{\mathbf{Y}_{t+1},\mathbf{Y}_{t+2},\ldots,\mathbf{Y}_{t+S}\}\in\mathbb{R}^{S\times N\times C}:

[\mathbf{X}_{t-P+1},\mathbf{X}_{t-P+2},\ldots,\mathbf{X}_{t}]\xrightarrow{f(\cdot)}[\mathbf{Y}_{t+1},\mathbf{Y}_{t+2},\ldots,\mathbf{Y}_{t+S}]
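To make the input and output shapes concrete, here is a minimal sketch of how (\mathbf{X}_{P}, \mathbf{Y}_{S}) pairs could be cut from a series with a sliding window. The function name `make_windows` and the toy data are illustrative assumptions, not the authors' preprocessing code.

```python
def make_windows(series, p, s):
    """Cut a length-T sequence into (history, future) pairs.

    series: list of T per-timestep observations (each item could itself be
            an N x C nested list); p: history length P; s: horizon S.
    """
    pairs = []
    # The last valid window starts at index T - p - s.
    for start in range(len(series) - p - s + 1):
        history = series[start:start + p]            # X_{t-P+1}, ..., X_t
        future = series[start + p:start + p + s]     # Y_{t+1}, ..., Y_{t+S}
        pairs.append((history, future))
    return pairs


# Toy example: T = 10 scalar timesteps, P = 3, S = 2.
toy = list(range(10))
windows = make_windows(toy, p=3, s=2)
print(len(windows))   # 6
print(windows[0])     # ([0, 1, 2], [3, 4])
```

Under this scheme a series of T timesteps yields T - P - S + 1 pairs, so T = 4,368 with P = S = 12 would give 4,345 windows.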

2.5. Methodology

2.5.1. Overview

        ①Overall framework of ST-LLM:

where the Spatial-Temporal Embedding layer extracts the temporal embedding \mathbf{E}_{T}\in\mathbb{R}^{N\times D}, the spatial embedding \mathbf{E}_{S}\in\mathbb{R}^{N\times D}, and the token embedding \mathbf{E}_{P}\in\mathbb{R}^{N\times D} of the historical P timesteps. The three are then fused into \mathbf{H}_{F}\in\mathbb{R}^{N\times3D}. The PFA LLM keeps its first F layers frozen and unfreezes its last U layers, producing the output \mathbf{H}^{L}\in\mathbb{R}^{N\times3D} (with L=F+U). Lastly, a regression convolution converts it to \widehat{\mathbf{Y}}_{S}\in\mathbb{R}^{S\times N\times C}.

2.5.2. Spatial-Temporal Embedding and Fusion

        ①They get tokens by pointwise convolution:

\mathbf{E}_{P}=PConv(\mathbf{X}_{P};\theta_{p})

        ②They apply linear layers to encode the time-of-day input \mathbf{X}_{day}\in\mathbb{R}^{N\times T_{d}} and the day-of-week input \mathbf{X}_{week}\in\mathbb{R}^{N\times T_{w}}, both derived from the input \mathbf{X}_P\in\mathbb{R}^{P\times N\times C}:

\mathbf{E}_T^d = \mathbf{W}_{day}(\mathbf{X}_{day}), \\ \mathbf{E}_T^w = \mathbf{W}_{week}(\mathbf{X}_{week}), \\ \mathbf{E}_T = \mathbf{E}_T^d + \mathbf{E}_T^w.

where \mathbf{W}_{day}\in\mathbb{R}^{T_{d}\times D} and \mathbf{W}_{week}\in\mathbb{R}^{T_{w}\times D} are learnable parameters and the output is \mathbf{E}_{T}\in\mathbb{R}^{N\times D}
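As a rough illustration of the temporal embedding, the sketch below maps a timestep index to a slot-of-day index (out of T_d = 48 half-hour slots) and a day-of-week index (out of T_w = 7), one-hot encodes both, and sums the two linear projections. Reading \mathbf{X}_{day} and \mathbf{X}_{week} as one-hot vectors, the `start_weekday` convention, and all helper names and toy weights are assumptions made for illustration; they are not taken from the paper or its code.

```python
def time_indices(step, slots_per_day=48, days_per_week=7, start_weekday=0):
    """Map a timestep index to (slot of day, day of week)."""
    slot_of_day = step % slots_per_day
    day_of_week = (start_weekday + step // slots_per_day) % days_per_week
    return slot_of_day, day_of_week


def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def linear(vec, weight):
    """Compute vec @ weight for a (size x D) nested-list weight matrix."""
    dim = len(weight[0])
    return [sum(v * row[j] for v, row in zip(vec, weight)) for j in range(dim)]


# Toy weights with D = 2; real weights would be learned.
W_day = [[float(i), 0.0] for i in range(48)]    # T_d x D
W_week = [[0.0, float(i)] for i in range(7)]    # T_w x D

slot, weekday = time_indices(50)                # third slot of the second day
e_day = linear(one_hot(slot, 48), W_day)        # E_T^d for one station
e_week = linear(one_hot(weekday, 7), W_week)    # E_T^w for one station
e_t = [a + b for a, b in zip(e_day, e_week)]    # E_T = E_T^d + E_T^w
print(slot, weekday, e_t)                       # 2 1 [2.0, 1.0]
```

With one-hot inputs each linear layer reduces to selecting one row of its weight matrix, which is why such layers are often implemented as embedding lookups.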

        ③They extract spatial correlations by:

\mathbf{E}_S=\sigma(\mathbf{W}_s\cdot\mathbf{X}_{P}+\mathbf{b}_s)

        ④Fusion convolution:

\mathbf{H}_F=FConv(\mathbf{E}_P||\mathbf{E}_S||\mathbf{E}_T;\theta_f)

where \mathbf{H}_{F}\in\mathbb{R}^{N\times3D}

2.5.3. Partially Frozen Attention (PFA) LLM

        ①They freeze the first F layers (including the multi-head attention and feed-forward layers), which contain important information:

\mathbf{\bar{H}}^{i}=MHA\left(LN\left(\mathbf{H}^{i}\right)\right)+\mathbf{H}^{i},\\\mathbf{H}^{i+1}=FFN\left(LN\left(\mathbf{\bar{H}}^{i}\right)\right)+\mathbf{\bar{H}}^{i},

where i \in \{1,\ldots,F-1\}, \mathbf{H}^{1}=[\mathbf{H}_{F}+\mathbf{PE}], \mathbf{PE} denotes the learnable positional encoding, \mathbf{\bar{H}}^{i} is the intermediate representation of the i-th layer after the first unfrozen layer normalization (LN) and the frozen multi-head attention (MHA), \mathbf{H}^{i+1} is the layer's final representation after the second unfrozen LN and the frozen feed-forward network (FFN), and:

LN\left(\mathbf{H}^{i}\right)=\gamma\odot\frac{\mathbf{H}^{i}-\mu}{\sigma}+\beta,\\ MHA(\tilde{\mathbf{H}}^{i})=\mathbf{W}^{O}(\mathrm{head}_{1}^{i}\|\cdots\|\mathrm{head}_{h}^{i}),\\ \mathrm{head}_{k}^{i}=Attention(\mathbf{W}_{q}^{k}\tilde{\mathbf{H}}^{i},\mathbf{W}_{k}^{k}\tilde{\mathbf{H}}^{i},\mathbf{W}_{v}^{k}\tilde{\mathbf{H}}^{i}),\\ Attention(\tilde{\mathbf{H}}^{i})=\operatorname{softmax}\left(\frac{\tilde{\mathbf{H}}^{i}\tilde{\mathbf{H}}^{iT}}{\sqrt{d_{k}}}\right)\tilde{\mathbf{H}}^{i},\\ FFN(\tilde{\mathbf{H}}^{i})=\max\left(0,\mathbf{W}_{1}\tilde{\mathbf{H}}^{i+1}+\mathbf{b}_{1}\right)\mathbf{W}_{2}+\mathbf{b}_{2}
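The pre-LN residual structure of one layer can be sketched in a few lines of plain Python. This is only a toy on a single feature vector: the small `eps` added for numerical stability, the helper names, and the stand-in sublayers are assumptions for illustration, not the paper's implementation.

```python
import math


def layer_norm(h, gamma, beta, eps=1e-5):
    """LN(h) = gamma * (h - mu) / sigma + beta over one feature vector."""
    mu = sum(h) / len(h)
    var = sum((x - mu) ** 2 for x in h) / len(h)
    sigma = math.sqrt(var + eps)   # eps is an assumed stability constant
    return [g * (x - mu) / sigma + b for x, g, b in zip(h, gamma, beta)]


def pre_ln_block(h, attn, ffn, ln1, ln2):
    """H_bar = MHA(LN(H)) + H, then H_next = FFN(LN(H_bar)) + H_bar."""
    h_bar = [a + x for a, x in zip(attn(ln1(h)), h)]
    return [f + x for f, x in zip(ffn(ln2(h_bar)), h_bar)]


h = [1.0, 2.0, 3.0]
ones, zeros = [1.0] * 3, [0.0] * 3
normed = layer_norm(h, ones, zeros)
print(normed)   # roughly [-1.22, 0.0, 1.22]

# Stand-in sublayers that output zeros: the residual path returns h unchanged.
ln = lambda v: layer_norm(v, ones, zeros)
zero_sublayer = lambda v: [0.0] * len(v)
print(pre_ln_block(h, zero_sublayer, zero_sublayer, ln, ln))   # [1.0, 2.0, 3.0]
```

The second call shows why the residual connections matter: even if a sublayer contributes nothing, the input still passes through the block intact.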

        ②Unfreezing the last U layers:

\mathbf{\bar{H}^{F+U-1}}=MHA\left(LN\left(\mathbf{H^{F+U-1}}\right)\right)+\mathbf{H^{F+U-1}},\\\mathbf{H^{F+U}}=FFN\left(LN\left(\mathbf{\bar{H}^{F+U-1}}\right)\right)+\mathbf{\bar{H}^{F+U-1}},
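A small sketch of the freezing rule may help. It decides, per parameter, whether gradients should flow, assuming Hugging Face GPT-2 style names inside each block (`ln_1`, `ln_2`, `attn.*`, `mlp.*`). Treating the feed-forward weights of the last U layers as still frozen is my reading of the "partially frozen attention" name, and U = 1 is only an example value; neither is confirmed by these notes.

```python
def is_trainable(layer_index, param_name, num_layers, unfrozen_u):
    """Decide whether a block parameter should receive gradients.

    param_name is the name inside one transformer block, for example
    'ln_1.weight', 'attn.c_attn.weight' or 'mlp.c_fc.weight'.
    """
    in_last_u = layer_index >= num_layers - unfrozen_u
    if param_name.startswith("ln_"):
        return True          # layer norms stay trainable in every layer
    if param_name.startswith("attn."):
        return in_last_u     # attention is unfrozen only in the last U layers
    return False             # feed-forward weights stay frozen (assumption)


NUM_LAYERS, U = 6, 1   # 6 GPT-2 layers as in the notes; U = 1 is an example
names = ["ln_1.weight", "attn.c_attn.weight", "mlp.c_fc.weight"]
for layer in (0, NUM_LAYERS - 1):
    flags = {n: is_trainable(layer, n, NUM_LAYERS, U) for n in names}
    print(layer, flags)
```

In PyTorch, a rule like this would typically be applied by looping over each block's `named_parameters()` and setting `param.requires_grad` to the returned flag.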

        ③The final regression convolution (RConv):

\hat{\mathbf{Y}}_{S}=RConv(\mathbf{H}^{F+U};\theta_{r})

        ④Loss function:

\mathcal{L}=\left\|\widehat{\mathbf{Y}}_{S}-\mathbf{Y}_{S}\right\|+\lambda\cdot L_{\mathrm{reg}}

where \mathbf{Y}_{S} is the ground truth, L_{\mathrm{reg}} is a regularization term, and \lambda is its weight

        ⑤Algorithm:

2.6. Experiments

2.6.1. Datasets

        ①Statistics of datasets:

        ②NYCTaxi: includes 266 virtual stations and 4,368 timesteps (each timestep spans half an hour)

        ③CHBike: includes 250 sites and 4,368 timesteps (also 30 minutes each)

2.6.2. Baselines

        ①GNN-based baselines: DCRNN, STGCN, GWN, AGCRN, STGNCDE, DGCRN

        ②Attention-based baselines: ASTGCN, GMAN, ASTGNN

        ③LLM-based baselines: OFA, GATGPT, GCNGPT, LLAMA2

2.6.3. Implementations

        ①Data split: 6:2:2

        ②Historical and future timesteps: P=12,S=12

        ③T_w=7,T_d=48

        ④Learning rate and optimizer: 0.001 with Ranger21 for the LLM-based models; 0.001 with Adam for the GCN-based and attention-based models

        ⑤LLM: GPT2 and LLAMA2 7B

        ⑥Layers: 6 for GPT2 and 8 for LLAMA2

        ⑦Epoch: 100

        ⑧Batch size: 64
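For reference, a 6:2:2 split could look like the sketch below. It assumes the split is chronological (no shuffling) and is applied to sliding-window samples, which is common for traffic forecasting but not stated in these notes; the function name and the sample count are illustrative.

```python
def chronological_split(samples, parts=(6, 2, 2)):
    """Split an ordered list into train/val/test without shuffling."""
    total = sum(parts)
    n_train = len(samples) * parts[0] // total   # integer math avoids float rounding
    n_val = len(samples) * parts[1] // total
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]             # the remainder goes to test
    return train, val, test


# 4,345 samples would result from 4,368 timesteps with P = S = 12 windows.
samples = list(range(4345))
train, val, test = chronological_split(samples)
print(len(train), len(val), len(test))   # 2607 869 869
```

Keeping the order intact means the test set always lies in the future relative to the training set, which avoids leaking later traffic patterns into training.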

2.6.4. Evaluation Metrics

        ①Metrics: Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Squared Error (RMSE), and Weighted Absolute Percentage Error (WAPE)
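The four metrics follow their standard definitions, sketched here in plain Python on flat lists. Skipping zero targets in MAPE is an assumption for this toy version; the paper's exact masking rule is not given in these notes.

```python
import math


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    # Zero targets are skipped to avoid division by zero (assumed convention).
    terms = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred) if t != 0]
    return sum(terms) / len(terms)


def wape(y_true, y_pred):
    # Total absolute error divided by total absolute demand.
    total_error = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    return total_error / sum(abs(t) for t in y_true)


y_true = [10.0, 20.0, 40.0]
y_pred = [12.0, 18.0, 44.0]
print(round(mae(y_true, y_pred), 4))    # 2.6667
print(round(rmse(y_true, y_pred), 4))   # 2.8284
print(round(mape(y_true, y_pred), 4))   # 0.1333
print(round(wape(y_true, y_pred), 4))   # 0.1143
```

WAPE weights every error by the overall demand instead of each point's own value, so it is less sensitive than MAPE to stations or time slots with very small counts.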

2.6.5. Main Results

        ①Performance table:

2.6.6. Performance of ST-LLM and Ablation Studies

        ①Module ablation:

        ②Frozen ablation:

2.6.7. Parameter Analysis

        ①Hyperparameter U ablation:

2.6.8. Inference Time Analysis

        ①Inference time table:

2.6.9. Few-Shot Prediction

        ①Few-shot learning with 10% of the samples:

2.6.10. Zero-Shot Prediction

        ①Performance:

2.7. Conclusion

        ~

3. Reference

@inproceedings{liu2024spatial,
  title={Spatial-Temporal Large Language Model for Traffic Prediction},
  author={Liu, Chenxi and Yang, Sun and Xu, Qianxiong and Li, Zhishuai and Long, Cheng and Li, Ziyue and Zhao, Rui},
  booktitle={MDM},
  year={2024}
}

