
Large Language Models (LLMs) Concepts

article 2024/11/15 13:01:12

1、Introduction to Large Language Models (LLMs)

1.1、Definition of LLMs

Large: Refers to the scale of the training data and computing resources.
Language: Processes and generates human-like text.
Models: Learn complex patterns from text data.

LLMs are considered a defining moment in the history of AI.

Some applications:

Sentiment analysis
Identifying themes
Translating text or speech
Generating code
Next-word prediction

1.2、Real-world application

Transforming finance industry:

[Investment outlook] | [Annual reports] | [News articles] | [Social media posts]

--> LLM

[Market analysis] | [Portfolio management] | [Investment opportunities]

Revolutionizing healthcare sector:

- Analyze patient data to offer personalized recommendations.

- Must adhere to privacy laws.

Education:

- Personalized coaching and feedback.

- Interactive learning experience.

- AI-powered tutor:
  - Ask questions.
  - Receive guidance.
  - Discuss ideas.

Visual question answering:

Defining multimodal:

Multimodal:
- Many types of processing or generation (for example, text and images)

Non-multimodal:
- One type of processing or generation



Visual question answering:
- Answers to questions about visual content
- Object identification & relationships
- Scene description

1.3、Challenges of language modeling

Sequence matters
Context modeling
Long-range dependency
Single-task learning

2、Building Blocks of LLMs

2.1、Novelty of LLMs

Overcome data's unstructured nature
Outperform traditional models
Understand linguistic subtleties

The building blocks are outlined in the sections below:

2.2、Generalized overview of NLP

2.2.1、Text Pre-processing

These steps are independent, so they can be applied in a different order.

Tokenization: Splits text into individual words, or tokens.
Stop word removal: Removes common words, such as "the" or "and", that add little meaning.
Lemmatization: Groups slightly different words with similar meaning by reducing them to their base form. For example, "talking" and "talked" both map to the root word "talk".
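The three pre-processing steps above can be sketched in a few lines of plain Python. This is only a minimal illustration: the `STOP_WORDS` set and the `LEMMAS` lookup table are tiny hand-made assumptions, whereas real pipelines rely on much larger resources from an NLP library.

```python
import re

# Tiny illustrative resources (assumptions, not a real stop word list or lemmatizer).
STOP_WORDS = {"the", "a", "an", "is", "are", "on", "and", "of", "to"}
LEMMAS = {"cats": "cat", "mats": "mat", "sitting": "sit",
          "talked": "talk", "talking": "talk"}

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word set."""
    return [t for t in tokens if t not in STOP_WORDS]

def lemmatize(tokens):
    """Map each token to its base form when the lookup table knows it."""
    return [LEMMAS.get(t, t) for t in tokens]

def preprocess(text):
    """Tokenize, then remove stop words, then lemmatize."""
    return lemmatize(remove_stop_words(tokenize(text)))

print(preprocess("The cats are sitting on the mats"))
```

In this sketch, swapping stop word removal and lemmatization gives the same result, which matches the note that the steps are independent.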

2.2.2、Text Representation

Converts text data into numerical form.

Bag-of-words:

Limitation:

- Does not capture the order or context.

- Does not capture the semantics between the words.
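The word-order limitation is easy to see in a small sketch. Here a bag-of-words is modeled as a plain word-count dictionary using `collections.Counter`; the helper name `bag_of_words` and the two sentences are illustrative assumptions.

```python
from collections import Counter

def bag_of_words(text):
    """Count how often each lowercase word occurs, ignoring order."""
    return Counter(text.lower().split())

a = bag_of_words("the dog chased the cat")
b = bag_of_words("the cat chased the dog")

# Both sentences produce identical counts although their meanings differ.
print(a)
print(a == b)
```

Because only counts are stored, who chased whom is lost in the representation.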

Word embeddings:
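Word embeddings represent each word as a vector, so that words with related meanings end up close together. A minimal sketch of that idea follows; the three-dimensional vectors in `EMBEDDINGS` are made up purely for illustration (real embeddings are learned from data and have far more dimensions).

```python
import math

# Hand-made toy vectors (an assumption for illustration, not learned embeddings).
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; values near 1 mean similar direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

cat_dog = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["dog"])
cat_car = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["car"])
print(cat_dog > cat_car)
```

With these toy vectors, "cat" is closer to "dog" than to "car", which is the kind of semantic relationship bag-of-words cannot express.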

2.3、Fine-tuning

Fine-tuning:
- Addresses some of these challenges.
- Adapts a pre-trained model.


Pre-trained model:
- Learned from general-purpose datasets.
- Not optimized for specific tasks.
- Can be fine-tuned for a specific problem.

2.4、Learning techniques

N-shot learning: zero-shot, few-shot, and multi-shot.

2.4.1、Zero-shot learning

No explicit training.
Uses language understanding and context.
Generalizes without any prior examples.

2.4.2、Few-shot learning

Learn a new task with a few examples.

2.4.3、Multi-shot learning

Requires more examples than few-shot.
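One way to picture the difference between these settings is the number of worked examples placed in front of the model. The sketch below assumes the examples are supplied inside a prompt; the helper `build_prompt`, the task wording, and the sample texts are all hypothetical.

```python
def build_prompt(task, query, examples=()):
    """Build a prompt with zero or more worked examples before the query."""
    parts = [task]
    for text, label in examples:
        parts.append(f"Text: {text}\nLabel: {label}")
    parts.append(f"Text: {query}\nLabel:")
    return "\n\n".join(parts)

task = "Classify the sentiment of the text as positive or negative."
query = "The battery died after a day."

# Zero-shot: no examples at all.
zero_shot = build_prompt(task, query)

# Few-shot: a couple of examples; multi-shot would simply pass more of them.
few_shot = build_prompt(task, query, examples=[
    ("I love this phone.", "positive"),
    ("The screen cracked immediately.", "negative"),
])

print(few_shot)
```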

3、Training Methodology and Techniques

3.1、Building blocks to train LLMs

3.1.1、Generative pre-training

Trained using generative pre-training

- Input data of text tokens.

- Trained to predict the tokens within the dataset.



Types:

- Next word prediction.

- Masked language modeling.

3.1.2、Next word prediction

Supervised learning technique; the labels come from the text itself, so it is often described as self-supervised.
Predicts the next word and generates coherent text.
Captures the dependencies between words.
Training data consists of pairs of input and output examples.
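The input and output pairs can be derived directly from raw text. A minimal sketch, assuming whitespace tokenization and a hypothetical helper named `next_word_pairs`:

```python
def next_word_pairs(sentence):
    """Return (context, next_word) training pairs for every position in the sentence."""
    tokens = sentence.split()
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = next_word_pairs("the quick brown fox")
for context, target in pairs:
    print(context, "->", target)
```

Each prefix of the sentence becomes an input, and the word that follows it becomes the expected output.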

3.1.3、Masked language modeling

Hides a selected word.
Trained model predicts the masked word.
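A training example for masked language modeling can be built by hiding one token and keeping it as the target. The sketch below is illustrative only; the helper `mask_token` is hypothetical and the `[MASK]` placeholder string is an assumption borrowed from common practice.

```python
def mask_token(tokens, index, mask="[MASK]"):
    """Return a copy of tokens with one position hidden, plus the hidden word."""
    masked = list(tokens)
    target = masked[index]
    masked[index] = mask
    return masked, target

masked, target = mask_token(["the", "cat", "sat", "down"], 1)
print(masked, "->", target)
```

The model sees the masked sequence and is trained to recover the target from the words on both sides.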

3.2、Introducing the transformer

3.2.1、Transformer architecture

Models the relationships between words.
Components: Pre-processing, Positional Encoding, Encoders, and Decoders.

3.2.2、Inside the transformer

(1) Text pre-processing and representation:

Text preprocessing: tokenization, stop word removal, lemmatization.
Text representation: word embedding.

(2) Positional encoding:

Adds information on the position of each word.
Helps the model relate distant words.

(3) Encoders:

Attention mechanism: directs attention to specific words and relationships.
Neural network: processes specific features.

(4) Decoders:

Includes attention and neural networks.
Generates the output.
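The positional encoding step in (2) can be made concrete with a small sketch. The notes do not say which scheme is used, so this assumes the sinusoidal encoding from the original Transformer design; the function name and the sizes are illustrative.

```python
import math

def positional_encoding(num_positions, d_model):
    """Sinusoidal encoding: sine on even dimensions, cosine on odd dimensions."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k+1) shares one wavelength.
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

pe = positional_encoding(num_positions=4, d_model=6)
for row in pe:
    print([round(x, 3) for x in row])
```

Each position gets a distinct vector, which is added to the word embedding so the model can tell where a word sits in the sequence.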

3.2.3、Transformers and long-range dependencies

Initial challenge: long-range dependency.
Attention: focus on different parts of the input.

3.2.4、Processes multiple parts simultaneously

Limitation of traditional language models: Sequential - one word at a time.
Transformers: Process multiple parts simultaneously (Faster processing).

3.3、Attention mechanisms

3.3.1、Attention mechanisms

Understand complex structures.
Focus on important words.

3.3.2、Two primary types: Self-attention and multi-head attention

For example:
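A minimal sketch of self-attention in plain Python, assuming scaled dot-product attention in which the queries, keys, and values are the input vectors themselves (real models first apply learned projections, which are omitted here). The toy vectors are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = vectors."""
    d = len(vectors[0])
    outputs, weights = [], []
    for q in vectors:
        # Score the current word against every word in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        w = softmax(scores)
        weights.append(w)
        # The output is a weighted mix of all value vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, vectors)) for i in range(d)])
    return outputs, weights

tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy 2-d word vectors
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Multi-head attention runs several such attention computations in parallel, each with its own learned projections, and combines their outputs.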

3.4、Advanced fine-tuning

3.4.1、LLM training three steps:

Pre-training:
Fine-tuning:
RLHF:
(1) Why RLHF?

(2) Starts with the need to fine-tune

3.4.2、Simplifying RLHF

Model output is reviewed by a human.
The model is updated based on the feedback.

Step1:

Receives a prompt.
Generates multiple responses.

Step2:

Human expert checks these responses.
Ranks the responses based on quality: accuracy, relevance, coherence.

Step3:

Learns from expert's ranking.
Aligns its future responses with their preferences.

And it goes on:

Continues to generate responses.
Receives expert's rankings.
Adjusts the learning.
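The notes do not say how a ranking becomes a training signal. One common approach, assumed here, is to break each ranking into pairwise comparisons of the form (preferred, rejected), which can then be used to train a model of the expert's preferences. The helper name and the response labels below are hypothetical.

```python
from itertools import combinations

def ranking_to_preferences(ranked_responses):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations() keeps input order, so the first item of each pair ranks higher.
    return list(combinations(ranked_responses, 2))

ranking = ["response_b", "response_a", "response_c"]  # expert's order, best first
preferences = ranking_to_preferences(ranking)
for preferred, rejected in preferences:
    print(preferred, ">", rejected)
```

A ranking of n responses yields n(n-1)/2 comparisons, so even a few rankings provide a fair amount of feedback.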

3.4.3、Recap

4、Concerns and Considerations

4.1、Data concerns and considerations

Data volume and compute power.
Data quality.
Labeling.
Bias.
Privacy.

4.1.1、Data volume and compute power

LLMs need a lot of data.
Extensive computing power.
Can cost millions of dollars.

4.1.2、Data quality

Quality data is essential.

4.1.3、Labeled data

Requires correct data labels.
Labor-intensive.
Incorrect labels impact model performance.
Address errors: identify >>> analyze >>> iterate.

4.1.4、Data bias

Influenced by societal stereotypes.
Lack of diversity in training data.
Discrimination and unfair outcomes.

Spot and deal with the biased data:

Evaluate data imbalances.
Promote diversity.
Bias mitigation techniques: more diverse examples.
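Evaluating data imbalances can start with something as simple as counting how each group is represented. In the sketch below, the language labels, the helper names, and the 20% threshold are illustrative assumptions, not recommended values.

```python
from collections import Counter

def label_shares(labels):
    """Return the fraction of the dataset that each label accounts for."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

def underrepresented(labels, min_share=0.2):
    """List labels whose share of the data falls below min_share."""
    return sorted(l for l, share in label_shares(labels).items() if share < min_share)

# Hypothetical dataset: 8 English, 1 Spanish, 1 French example.
labels = ["en"] * 8 + ["es"] + ["fr"]
print(label_shares(labels))
print(underrepresented(labels))
```

Groups flagged this way are candidates for adding more diverse examples.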

4.1.5、Data privacy

Compliance with data protection and privacy regulations.
Sensitive or personally identifiable information (PII).
Privacy is a concern.
Get permission.
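One small part of handling PII is masking obvious identifiers before text is used for training. The sketch below uses two simple regular expressions for email addresses and phone numbers; both patterns are rough assumptions that will miss many real-world formats, and they are no substitute for a proper compliance process.

```python
import re

# Deliberately simple patterns (assumptions); real PII detection needs far more care.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_pii(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

print(redact_pii("Contact jane.doe@example.com or +1 555-123-4567."))
```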

4.2、Ethical and environmental concerns

4.2.1、Ethical concerns

Transparency risk - Challenging to understand the output.
Accountability risk - Responsibility for LLMs' actions.
Information hazards - Disseminating harmful information.

4.2.2、Environmental concerns

Ecological footprint of LLMs.
Substantial energy resources to train.
Impact through carbon emissions.

4.3、Where are LLMs heading?

Model explainability.
Efficiency.
Unsupervised bias handling.
Enhanced creativity.

http://www.kler.cn/a/293113.html
