
Large Language Models (LLMs) Concepts

1、Introduction to Large Language Models (LLMs)

1.1、Definition of LLMs

  • Large: Training data and resources.
  • Language: Human-like text.
  • Models: Learn complex patterns using text data.

LLMs are widely considered a defining moment in the history of AI.

Some applications:

  • Sentiment analysis
  • Identifying themes
  • Translating text or speech
  • Generating code
  • Next-word prediction

1.2、Real-world application

  • Transforming finance industry: 
    [Investment outlook] | [Annual reports] | [News articles] | [Social media posts]
    
    --> LLM
    
    [Market analysis] | [Portfolio management] | [Investment opportunities]

  • Revolutionizing healthcare sector:
    - Analyze patient data to offer personalized recommendations.
    
    - Must adhere to privacy laws.

  • Education:
    - Personalized coaching and feedback.
    
    - Interactive learning experience.
    
    - AI-powered tutor:
      - Ask questions.
      - Receive guidance.
      - Discuss ideas.

  • Visual question answering:
    Defining multimodal:
    
    Multimodal:
    - Many types of processing or generation
    
    Non-multimodal:
    - One type of processing or generation
    
    
    
    Visual question answering:
    - Answers to questions about visual content
    - Object identification & relationships
    - Scene description

1.3、Challenges of language modeling

  • Sequence matters
  • Context modeling
  • Long-range dependency
  • Single-task learning

2、Building Blocks of LLMs

2.1、Novelty of LLMs

  • Overcome data's unstructured nature
  • Outperform traditional models
  • Understand linguistic subtleties

The building blocks are shown below:

2.2、Generalized overview of NLP

2.2.1、Text Pre-processing

These steps can be performed in any order because they are independent of one another.

  • Tokenization: Splits text into individual words, or tokens.

  • Stop word removal: Removes common words (such as "the" or "is") that add little meaning.

  • Lemmatization: Groups slightly different words with similar meanings by reducing them to their base form. For example, "talked" and "talking" both map to the root word "talk".
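
The three pre-processing steps can be sketched in a few lines of plain Python. This is only a minimal illustration: the stop word set and the lemma dictionary below are tiny, hypothetical stand-ins for the much larger resources a real NLP library would provide, and the function names are made up for this example.

```python
import re

# Hypothetical, deliberately tiny resources used for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "were", "on", "in", "of", "and"}
LEMMAS = {
    "cats": "cat",
    "mats": "mat",
    "sitting": "sit",
    "talked": "talk",
    "talking": "talk",
}


def tokenize(text):
    """Split text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word set."""
    return [t for t in tokens if t not in STOP_WORDS]


def lemmatize(tokens):
    """Map each token to its base form when the toy dictionary knows it."""
    return [LEMMAS.get(t, t) for t in tokens]


def preprocess(text):
    """Run tokenization, stop word removal, and lemmatization in sequence."""
    return lemmatize(remove_stop_words(tokenize(text)))


print(preprocess("The cats were sitting on the mats."))
# → ['cat', 'sit', 'mat']
```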

2.2.2、Text Representation

  • Converts text data into numerical form.
  • Bag-of-words:

     
    Limitations:
    
    - Does not capture word order or context.
    
    - Does not capture the semantic relationships between words.

  • Word embeddings:
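
To make the contrast between the two representations concrete, here is a small sketch. The first half builds bag-of-words count vectors and shows that word order is lost. The second half compares word vectors with cosine similarity. The three-dimensional embedding values are invented for illustration and do not come from any trained model.

```python
import math
from collections import Counter


def bag_of_words(sentence, vocabulary):
    """Return a count vector with one entry per vocabulary word."""
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]


vocabulary = ["bites", "dog", "man"]
v1 = bag_of_words("dog bites man", vocabulary)
v2 = bag_of_words("man bites dog", vocabulary)
print(v1, v2, v1 == v2)  # identical vectors although the meanings differ

# Hypothetical embeddings: values are made up, not learned.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["kitten"]))
print(cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["car"]))
```

In this toy setup "cat" scores closer to "kitten" than to "car", which is the kind of semantic relationship that bag-of-words cannot express.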

2.3、Fine-tuning

Fine-tuning:
- Addresses some of these challenges.
- Adapts a pre-trained model.


Pre-trained model:
- Learned from general-purpose datasets.
- Not optimized for specific tasks.
- Can be fine-tuned for a specific problem.

2.4、Learning techniques

N-shot learning: zero-shot, few-shot, and multi-shot.

2.4.1、Zero-shot learning

  • No explicit training.
  • Uses language understanding and context.
  • Generalizes without any prior examples.

2.4.2、Few-shot learning

  • Learn a new task with a few examples.

2.4.3、Multi-shot learning

  • Requires more examples than few-shot.
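
One common way to see the difference between these settings is in how a prompt is assembled: the only thing that changes is how many worked examples are included. The sketch below assumes a simple sentiment task. The `build_prompt` helper and the "Text:" / "Label:" layout are hypothetical choices for this example, not a fixed standard.

```python
def build_prompt(instruction, examples, query):
    """Assemble an n-shot prompt, where n is simply len(examples)."""
    parts = [instruction]
    for text, label in examples:
        parts.append(f"Text: {text}\nLabel: {label}")
    # The final entry leaves the label blank for the model to complete.
    parts.append(f"Text: {query}\nLabel:")
    return "\n\n".join(parts)


instruction = "Classify the sentiment of the text as positive or negative."
query = "The battery lasts all day."

# Zero-shot: no examples at all.
zero_shot = build_prompt(instruction, [], query)

# Few-shot: a handful of labeled examples precede the query.
few_shot = build_prompt(
    instruction,
    [
        ("I love this phone.", "positive"),
        ("The screen broke in a week.", "negative"),
    ],
    query,
)

print(few_shot)
```

A multi-shot prompt would use the same helper with a longer list of examples.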

3、Training Methodology and Techniques

3.1、Building blocks to train LLMs

3.1.1、Generative pre-training

LLMs are trained using generative pre-training:

- Input data consists of text tokens.

- The model is trained to predict the tokens within the dataset.



Types:

- Next word prediction.

- Masked language modeling.

3.1.2、Next word prediction

  • Supervised learning technique (the labels come from the text itself, so no manual labeling is needed).
  • Predicts the next word and generates coherent text.
  • Captures the dependencies between words.
  • Training data consists of pairs of input and output examples.
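
As a minimal sketch of how such input and output pairs can be derived from raw text, the function below slides over a sentence and pairs each prefix with the word that follows it. Whitespace splitting and the function name are simplifying assumptions, since real systems use subword tokenizers.

```python
def next_word_pairs(text):
    """Build (context, next word) training pairs from one piece of text."""
    tokens = text.split()
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


pairs = next_word_pairs("the quick brown fox")
for context, target in pairs:
    print(" ".join(context), "->", target)
# the -> quick
# the quick -> brown
# the quick brown -> fox
```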

3.1.3、Masked language modeling

  • Hides selected words in the input.
  • Trained model predicts the masked word.
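
The masking step can be sketched as follows. The `[MASK]` placeholder and the default masking probability of 0.15 are assumptions borrowed from common practice and are not stated in these notes. The returned dictionary holds the hidden words that a model would be trained to recover.

```python
import random

MASK = "[MASK]"  # assumed placeholder token


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Randomly hide tokens; return the masked sequence and the hidden targets."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, token in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = token  # position -> original word to predict
        else:
            masked.append(token)
    return masked, targets


tokens = "the cat sat on the mat".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3, seed=0)
print(masked)
print(targets)
```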

3.2、Introducing the transformer

3.2.1、Transformer architecture

  • Captures the relationships between words.
  • Components: Pre-processing, Positional Encoding, Encoders, and Decoders.

3.2.2、Inside the transformer

(1) Text pre-processing and representation:

  • Text preprocessing: tokenization, stop word removal, lemmatization.
  • Text representation: word embedding.

(2) Positional encoding:

  • Adds information on the position of each word.
  • Helps the model relate distant words.
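
These notes do not say which positional encoding scheme is meant. As one well-known possibility, the sketch below assumes the sinusoidal scheme, in which even dimensions use a sine and odd dimensions use a cosine of the position scaled by a dimension-dependent frequency. The function name and the sizes are illustrative.

```python
import math


def positional_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of sinusoidal encodings."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k+1) shares one frequency.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


pe = positional_encoding(num_positions=4, d_model=4)
for row in pe:
    print([round(x, 3) for x in row])
```

Each row would be added to the embedding of the word at that position, so identical words at different positions receive different inputs.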

(3) Encoders:

  • Attention mechanism: directs attention to specific words and relationships.
  • Neural network: processes specific features.

(4) Decoders:

  • Includes attention and neural networks.
  • Generates the output.

3.2.3、Transformers and long-range dependencies

  • Initial challenge: long-range dependency.
  • Attention: focus on different parts of the input.

3.2.4、Processes multiple parts simultaneously

  • Limitation of traditional language models: Sequential - one word at a time.
  • Transformers: Process multiple parts simultaneously (Faster processing).

3.3、Attention mechanisms

3.3.1、Attention mechanisms

  • Understand complex structures.
  • Focus on important words.

3.3.2、Two primary types: Self-attention and multi-head attention

For example:
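
As a rough illustration of self-attention, the sketch below implements scaled dot-product attention in plain Python. It makes one large simplification: the token vectors are used directly as queries, keys, and values, whereas a real transformer first passes them through learned projection matrices. The toy vectors are invented for this example.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        output = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy vectors for three words
outputs, weights = self_attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Multi-head attention would run several such computations in parallel, each with its own projections, and combine the results.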

3.4、Advanced fine-tuning

3.4.1、LLM training three steps:

  • Pre-training:
  • Fine-tuning:
  • RLHF:
    (1)Why RLHF?

    (2)Starts with the need to fine-tune

3.4.2、Simplifying RLHF

  • Model output reviewed by human.
  • Updates model based on the feedback.

Step1:

  • Receives a prompt.
  • Generates multiple responses.

Step2:

  • Human expert checks these responses.
  • Ranks the responses based on quality: accuracy, relevance, and coherence.

Step3:

  • Learns from the expert's ranking.
  • Aligns its future responses with the expert's preferences.

And it goes on:

  • Continues to generate responses.
  • Receives expert's rankings.
  • Adjusts the learning.
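
These notes do not describe how the rankings are fed back into the model. One commonly used approach, assumed here, is to turn each ranking into pairwise preferences of the form (preferred, rejected), which can then serve as training signal. The function name and the sample responses are hypothetical.

```python
from itertools import combinations


def ranking_to_preferences(ranked_responses):
    """Convert a best-first ranking into (preferred, rejected) pairs."""
    # combinations() keeps the input order, so the first item of each
    # pair is always the one ranked higher by the expert.
    return list(combinations(ranked_responses, 2))


ranked = ["Response B", "Response A", "Response C"]  # best first
for preferred, rejected in ranking_to_preferences(ranked):
    print(f"{preferred} > {rejected}")
# Response B > Response A
# Response B > Response C
# Response A > Response C
```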

3.4.3、Recap

4、Concerns and Considerations

4.1、Data concerns and considerations

  • Data volume and compute power.
  • Data quality.
  • Labeling.
  • Bias.
  • Privacy.

4.1.1、Data volume and compute power

  • LLMs need a lot of data.
  • Extensive computing power.
  • Can cost millions of dollars.

4.1.2、Data quality

  • Quality data is essential.

4.1.3、Labeled data

  • Requires correctly labeled data.
  • Labor-intensive.
  • Incorrect labels impact model performance.
  • Address errors: identify >>> analyze >>> iterate.

4.1.4、Data bias

  • Influenced by societal stereotypes.
  • Lack of diversity in training data.
  • Discrimination and unfair outcomes.

Spot and deal with the biased data:

  • Evaluate data imbalances.
  • Promote diversity.
  • Bias mitigation techniques: more diverse examples.
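
Evaluating data imbalance can start with something as simple as counting how often each group or label appears. The sketch below is a minimal example under that assumption. The helper names and the sample labels are invented, and a real audit would look at many more dimensions than a single label.

```python
from collections import Counter


def label_distribution(labels):
    """Return the share of the dataset that each label accounts for."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(labels):
    """Ratio between the most and least frequent label (1.0 means balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


labels = ["group_a"] * 8 + ["group_b"] * 2  # made-up sample
print(label_distribution(labels))
print(imbalance_ratio(labels))
# → {'group_a': 0.8, 'group_b': 0.2}
# → 4.0
```

A high ratio suggests that more diverse examples may be needed for the underrepresented group.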

4.1.5、Data privacy

  • Compliance with data protection and privacy regulations.
  • Sensitive or personally identifiable information (PII).
  • Privacy is a concern.
  • Get permission.
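
As one small, illustrative step toward handling PII, text can be scanned for obvious identifiers before it is used for training. The sketch below only redacts e-mail addresses with a simple regular expression. The pattern and the `[EMAIL]` placeholder are assumptions for this example, and real compliance work requires far more than pattern matching.

```python
import re

# Simplified pattern: it will not match every valid e-mail address.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_emails(text, placeholder="[EMAIL]"):
    """Replace anything that looks like an e-mail address with a placeholder."""
    return EMAIL_PATTERN.sub(placeholder, text)


print(redact_emails("Contact jane.doe@example.com today"))
# → Contact [EMAIL] today
```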

4.2、Ethical and environmental concerns

4.2.1、Ethical concerns

  • Transparency risk - Challenging to understand the output.
  • Accountability risk - Responsibility for LLMs' actions.
  • Information hazards - Disseminating harmful information.

4.2.2、Environmental concerns

  • Ecological footprint of LLMs.
  • Substantial energy resources to train.
  • Impact through carbon emissions.

4.3、Where are LLMs heading?

  • Model explainability.
  • Efficiency.
  • Unsupervised bias handling.
  • Enhanced creativity.


http://www.kler.cn/news/293113.html
