
Linux Process Control (Part 4): Process Program Replacement

article 2025/3/24 16:23:43

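Contents

- Process program replacement
  - Single-process program replacement
  - How replacement works
  - Multi-process program replacement
  - The replacement functions
  - How the functions behave
    - Background note
  - Understanding the names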

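Process program replacement

To make a child process run code that is completely different from its parent's, the child has to perform a program replacement.

Single-process program replacement

Running an executable file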

makefile

mycommand:mycommand.c
    gcc -o $@ $^ -std=c99
.PHONY:clean
clean:
    rm -rf mycommand

mycommand.c

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

int main()
{
    printf("before,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
    // standard form for this family of calls
    execl("/usr/bin/ls","ls","-a","-l",NULL); // the argument list must end with NULL
    printf("After,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
    return 0;
}

Observed behavior: the "After" line never prints and ls -a -l runs instead; the pid that is printed differs from run to run.

mycommand.c

#include <stdio.h>  
#include <unistd.h>  
#include <stdlib.h>  
  
int main()  
{  
    printf("before,I am a process,pid:%d,ppid:%d\n",getpid(),getppid()); 
    // standard form for this family of calls
    //execl("/usr/bin/ls","ls","-a","-l",NULL); // the argument list must end with NULL
    execl("/usr/bin/top","top",NULL); // the argument list must end with NULL
    printf("After,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());  
    return 0;                                               
}

A program can wrap a system command: once the program becomes a process, it runs that command.

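How replacement works

After fork creates a child, the child runs the same program as the parent (although it may take a different branch of the code),

and the child usually calls one of the exec functions to run another program.

When a process calls an exec function, its user-space code and data are completely replaced by the new program, and execution starts from the new program's startup routine.

Calling exec does not create a new process, so the process id is the same before and after the call.

Code that is running is a process in the system.

Creating a new process always means creating a PCB (task_struct), an address space, page tables and so on,

loading the code and data into memory and mapping them through the page tables,

after which the CPU uses the address space and page tables reachable from the PCB to find the code and data and execute them.

Take ls as an example. When execl loads ls into memory,

the code of ls directly replaces the original code,

and the data of ls directly replaces the original data. (No new process is created!)

(If the new image needs more room than the old one, more space is allocated; if it needs less, the surplus is released.)

The physical addresses on the right-hand side of the page table are then remapped to match.

The newly loaded program starts from the beginning: its startup routine runs and then calls main.

That is program replacement!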

Multi-process program replacement

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <sys/types.h>
int main()
{
    pid_t id=fork();
    if(id==0)
    {
        printf("before,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
        sleep(5);
        // standard form for this family of calls
        execl("/usr/bin/ls","ls","-a","-l",NULL); // the argument list must end with NULL
        printf("After,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
        exit(0);        
    }                   
    pid_t ret=waitpid(id,NULL,0);
    if(ret>0)                                                                         
    {                                                                                 
        printf("wait successfully,father pid:%d,ret:%d\n",getpid(),ret);              
    }
    sleep(5);
    return 0;
}

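While the child is in sleep(5), the parent blocks in waitpid until the child exits.

The child's call to execl does not affect the parent.

Reason: processes are independent of each other, and copy-on-write keeps them so.

Before execl is called, the child simply shares the parent's code and data.

When the child calls execl and the code and data of ls are brought in,

the child stops sharing and gets its own private image; this applies to code as well as data.

Code is not absolutely unwritable: a write attempted directly by user code is intercepted by the operating system and the process crashes,

but with execl it is the operating system itself doing the writing.

Program replacement does not create a new process; it only replaces the code and data.

The pids of the parent and the child never change.

In one run, the child's pid was 2254 before the replacement and was still 2254 afterwards.

So the task_struct is not released or recreated (some of its fields are updated). On Linux the kernel does build a fresh address-space description (mm_struct) for the new image, but it is attached to the same task_struct.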

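The replacement functions

There are six functions whose names begin with exec, collectively called the exec functions. Five of them are library functions built on top of the one system call: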

int execve(const char *path, char *const argv[], char *const envp[]);

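How the functions behave

The "After" line and everything following it never ran.

Reason: that code comes after the execl call,

and the original code is replaced, so "After" and everything following it has been overwritten by the code of ls

and can no longer be executed.

Once a program replacement succeeds, the code after exec* is never executed.

Only when the replacement fails can the following code run.

If these functions succeed, the new program is loaded and starts from its startup code, and the call never returns.

If they fail, they return -1 and set errno.

So exec* functions have a return value only for failure, never for success.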

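Background note

How does the CPU know the program's entry address?

Executables produced on Linux have a format (ELF). Every executable begins with a header, and the entry address of the program is stored in that header!

Besides the header, the executable contains the code segment, the data segment, the read-only data area and so on, and the addresses of these segments are recorded through the header.

When an executable is loaded, the code and data need not be loaded immediately,

but the header must be read first: the start address of each region comes from the header.

When we replace the process image with a new program, that program also has a header, so the entry point of the executable can be read from it.

The entry address is fixed when the executable is built (by default the address of the _start symbol, stored in the e_entry field of the ELF header),

written into the header, and read by the operating system at load time so that the CPU can be pointed at it.

On the one hand the header is used to initialise the address space, page tables and so on; on the other hand it tells the CPU where the program's entry point is.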

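Understanding the names

l (list): arguments are passed as a list

v (vector): arguments are passed as an array

p (path): the PATH environment variable is searched automatically

e (env): the caller supplies the environment itself

#include <unistd.h>

int execl(const char *path, const char *arg, ...); // library function

int execlp(const char *file, const char *arg, ...);

int execle(const char *path, const char *arg, ..., char *const envp[]);

int execv(const char *path, char *const argv[]); // also a library function; execve is the system call

int execvp(const char *file, char *const argv[]);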

execl:

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <sys/types.h>
int main()
{
    pid_t id=fork();
    if(id==0)
    {
        printf("before,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
        execl("/usr/bin/ls","ls","-a","-l",NULL); // the argument list must end with NULL
        printf("After,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
        exit(0);                                                   
    }                                                              
    pid_t ret=waitpid(id,NULL,0);                                  
    if(ret>0)                                                           
    {                                                                   
        printf("wait successfully,father pid:%d,ret:%d\n",getpid(),ret);
    }                                                              
    return 0;                                                      
}

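Arguments are passed one at a time, starting from the second parameter, and the last one must be NULL.

Functions with 'l' in the name take a variable argument list, and the arguments are passed one by one.

Starting from the second parameter, pass the arguments exactly as you would type them on the command line:

replace the spaces with commas and add NULL at the end.

The first step in running a program is finding it.

The first parameter of every exec* function (const char *path or const char *file) decides how the program is found.

Without 'p' in the name, it must be a path to the file (absolute or relative); PATH is not searched.

The first parameter answers: where is the program?

The remaining parameters answer: how should it be run, and with which options?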

execlp:

PATH: execlp searches the directories listed in the PATH environment variable.

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <sys/types.h>
int main()
{
    pid_t id=fork();
    if(id==0)
    {
        printf("before,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
        //execlp("/usr/bin/ls","ls","-a","-l",NULL);
        execlp("ls","ls","-a","-l",NULL);             
        printf("After,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
        exit(0);                                                   
    }                                                              
    pid_t ret=waitpid(id,NULL,0);                                  
    if(ret>0)                                                           
    {                                                                   
        printf("wait successfully,father pid:%d,ret:%d\n",getpid(),ret);
    }                                                              
    return 0;                                                      
}

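The first argument may or may not include a path.

With 'p' in the name, the path can be omitted: execlp searches PATH automatically (a name that contains a slash is used as a path and PATH is not searched).

Every child process inherits its parent's environment variables.

The current process got all of its environment variables from bash;

bash already has PATH, so it is inherited by all child processes. (Environment variables have global scope.)

ls lives in /usr/bin; it is found there and executed.

There are two "ls" arguments because:

the first one is there to locate the program: PATH supplies the directory,

but execlp still has to be told which program to run.

The first "ls" says what to run; the second "ls" is the start of how to run it (it becomes argv[0]).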

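execv:

char *const argv[] -> an array of pointers to strings

First parameter: how to find the program.

Second parameter: how to run the program.

It simply turns the variable-argument form into an array of pointers.

The const in the middle means:

once an element is set, the address it holds cannot be changed. (The pointers themselves are const.)

The type does not make the pointed-to characters const, but elements such as "-a" here are string literals, which must not be modified anyway.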

  #include <stdio.h>
  #include <unistd.h>
  #include <stdlib.h>
  #include <sys/wait.h>
  #include <sys/types.h>
  int main()
  {
      pid_t id=fork();
      if(id==0)
      {
          char *const myargv[]={
              "ls",
              "-a",
              "-l",
              NULL
          };
          printf("before,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
          execv("/usr/bin/ls",myargv);                           
          printf("After,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
          exit(0);                                                      
      }                                                          
      pid_t ret=waitpid(id,NULL,0);                                     
      if(ret>0)                                                         
      {                                                                 
          printf("wait successfully,father pid:%d,ret:%d\n",getpid(),ret);
      }                                                    
      return 0;                                            
  }

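ls is a program with a main function, main takes command-line arguments, and those arguments come from the second parameter of execv.

A variable argument list is also turned into an array of pointers in the end and passed to the main function of ls.

execv hands the arguments to the execve system call, and the system passes them on to the main function of ls.

In Linux, every process is some other process's child;

on the command line, every process is a child of bash.

So every process is started by means of one of the exec* functions.

Within a single process, program replacement loads the code and data of the target executable into memory:

space is set up for the current process and the new program's code and data are loaded into it.

So the exec* functions act as a loader. (A loader at the code level.)

They load an executable from disk into memory,

so exec* involves actions such as memory allocation and device access.

exec brings the executable into memory and has the command-line arguments in hand,

so when the main function of ls is called, the argv passed to execv can be handed to the program.

Function arguments are passed on the stack: before main is called, a simple stack frame is set up,

the address of argv is placed in it, and with that calling context in place argv reaches main.

That is how command-line arguments can be passed to an executable.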

execvp:

  #include <stdio.h>
  #include <unistd.h>
  #include <stdlib.h>
  #include <sys/wait.h>
  #include <sys/types.h>
  int main()
  {
      pid_t id=fork();
      if(id==0)
      {
          char *const myargv[]={
              "ls",
              "-a",
              "-l",
              NULL
          };
          printf("before,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
          execvp("ls",myargv);                           
          printf("After,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
          exit(0);                                                      
      }                                                          
      pid_t ret=waitpid(id,NULL,0);                                     
      if(ret>0)                                                         
      {                                                                 
          printf("wait successfully,father pid:%d,ret:%d\n",getpid(),ret);
      }                                                    
      return 0;                                            
  }

exec* can run system commands, so it can run our own programs as well.

Here a C program calls a C++ program: two separate executables.

#include <stdio.h>  
#include <unistd.h>  
#include <stdlib.h>  
#include <sys/wait.h>  
#include <sys/types.h>  
int main()  
{  
    pid_t id=fork();  
    if(id==0)  
    {
        printf("before,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());  
        execl("./otherexe","otherexe",NULL);
        printf("After,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
        exit(0);
    }
    pid_t ret=waitpid(id,NULL,0);
    if(ret>0)
    {
        printf("wait successfully,father pid:%d,ret:%d\n",getpid(),ret);
    }
    return 0;
}

The first argument says where the file to execute is, i.e. which file to run.

The second argument says how to run it.

execl("./otherexe","otherexe",NULL);
execl("./otherexe","./otherexe",NULL);

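Both of these work and give the same result.

On the command line "./" is needed to tell bash where the executable is.

The first argument of execl already gives the path of the executable, so the second argument does not need "./".

Calling other languages from C

Calling sh from C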

test.sh

#!/usr/bin/bash

function myfun()
{
    cnt=1
    while [ $cnt -le 10 ]
    do
        echo "Hello $cnt"
        let cnt++
    done
}
echo "Hello Linux!"
echo "Hello Linux!"
echo "Hello Linux!"

ls -a -l

myfun

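A script normally starts with "#!" followed by the path of the interpreter for its language.

The script itself is not what runs; the interpreter runs and interprets it line by line.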

mycommand.c

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <sys/types.h>
int main()
{
    pid_t id=fork();
    if(id==0)
    {
        printf("before,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
        execl("/usr/bin/bash","bash","test.sh",NULL);
        printf("After,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
        exit(0);                                                    
    }                                                               
    pid_t ret=waitpid(id,NULL,0);                                   
    if(ret>0)                                                       
    {                                                                   
        printf("wait successfully,father pid:%d,ret:%d\n",getpid(),ret);
    }                                                               
    return 0;                                                       
}

On the command line, the executable that actually runs is not the script file but the script's interpreter.

Calling Python from C

test.py

#!/usr/bin/python3

print("Hello py!")

mycommand.c

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <sys/types.h>
int main()
{
    pid_t id=fork();
    if(id==0)
    {
        printf("before,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
        execl("/usr/bin/python3","python3","test.py",NULL);
        printf("After,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
        exit(0);                                                
    }                                                           
    pid_t ret=waitpid(id,NULL,0);                               
    if(ret>0)                                                   
    {                                                                   
        printf("wait successfully,father pid:%d,ret:%d\n",getpid(),ret);
    }                                                           
    return 0;                                                   
}

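Why can executables and scripts alike be called across languages?

Every language, once running, is just a process!

Anything that is a process can be started this way.

Practically every language offers an interface like execl.

Addendum:

C++ source file suffixes include .cc, .cpp and .cxx.

otherexe.cpp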

#include <iostream>
using namespace std;
int main()
{
    cout<<"Hello C++ Linux!"<<endl;
    cout<<"Hello C++ Linux!"<<endl;
    cout<<"Hello C++ Linux!"<<endl;
    return 0;
}

makefile

mycommand:mycommand.c
    gcc -o $@ $^ -std=c99
otherexe:otherexe.cpp
    g++ -o $@ $^ -std=c++11
.PHONY:clean                     
clean:                       
    rm -rf mycommand otherexe

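Why is only mycommand built?

make scans the makefile from top to bottom, and the first target it meets becomes the default target, so only that target's recipe is run.

Whichever target comes first is the one whose recipe runs.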

.PHONY:all
all:otherexe mycommand

mycommand:mycommand.c      
    gcc -o $@ $^ -std=c99  
otherexe:otherexe.cpp      
    g++ -o $@ $^ -std=c++11  
.PHONY:clean                 
clean:                       
    rm -rf mycommand otherexe

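Scanning this makefile from the top, the first target make meets is the phony target all.

all depends on otherexe and mycommand,

so otherexe is built first and then mycommand.

all has no recipe of its own, so once its dependencies are resolved there is nothing more to run.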

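execle:

In our own code we can read both the command-line arguments and the environment variables!

We can verify that mycommand passes command-line arguments and environment variables to otherexe.

How do the environment variables of one program, mycommand, reach another program, otherexe?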

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <sys/types.h> 
int main()
  {
      pid_t id=fork();
      if(id==0)
      {
          printf("before,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
          char*const myargv[]={       
              "otherexe",
              "-a",
              "-b",
              "-c",
              NULL
          };
          execv("./otherexe",myargv);
          printf("After,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
          exit(0);
      }
      pid_t ret=waitpid(id,NULL,0);
      if(ret>0)
      {
          printf("wait successfully,father pid:%d,ret:%d\n",getpid(),ret);
      }
      return 0;
  }

myargv is passed to otherexe as its arguments, so otherexe receives them.

  #include <iostream>
  using namespace std;
  int main(int argc,char *argv[])
  {
      cout<<argv[0]<<" begin running"<<endl;
      for(int i=0;argv[i];i++)
      {
          cout<<i<<" : "<<argv[i]<<endl;
      }
      cout<<argv[0]<<" stop running"<<endl;
      return 0;                           
  }

So the arguments given to exec have been passed in.

  #include <iostream>
  using namespace std;
  int main(int argc,char *argv[],char*env[])
  {
      cout<<argv[0]<<" begin running"<<endl;
      cout<<"Command-line arguments:"<<endl;
      for(int i=0;argv[i];i++)
      {
          cout<<i<<" : "<<argv[i]<<endl;
      }
      cout<<"Environment variables:"<<endl;
      for(int i=0;env[i];i++)
      {
          cout<<i<<" : "<<env[i]<<endl;
      }
      cout<<argv[0]<<" stop running"<<endl;
      return 0;
  }

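Both the command-line arguments and the environment variables are there!

By default, even though no environment was passed explicitly, the child obtains (inherits) the environment variables automatically.

Environment variables are data too: the address space holds both the command-line arguments and the environment list,

and when the child is created it has already inherited the environment variables.

extern char **environ is a global variable that points directly at the process's environment.

The parent has already initialised it to point at its own environment table,

and when the process is copied the child inherits it as well.

Even without receiving them as parameters, a program can reach its environment variables and command-line arguments through its address space.

Because the child inherits the parent's address space, page tables and so on, the command-line arguments and environment variables are inherited with them.

Program replacement only replaces the code and data; unless a new environment is passed explicitly, the environment variables are carried over!

How can environment variables be passed to a child process?

1. Adding environment variables

Export a new environment variable in the parent and the child will inherit it.

Adding an environment variable in the shell

Environment variables are not replaced when the process image is replaced; they are simply handed down from process to process.

bash -> mycommand -> otherexe: environment variables have global scope.

But what if we only want to add a variable in the parent mycommand and pass it on to otherexe?

putenv adds an environment variable to the environment of the calling process.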

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <sys/types.h> 
int main()
  {
      putenv("MY_ENV=6666666666666");
      pid_t id=fork();
      if(id==0)
      {
          printf("before,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());           
          char*const myargv[]={
            "otherexe",
              "-a",
              "-b",
              "-c",
              NULL
          };
          execv("./otherexe",myargv);
          printf("After,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
          exit(0);
      }
      pid_t ret=waitpid(id,NULL,0);
      if(ret>0)
      {
          printf("wait successfully,father pid:%d,ret:%d\n",getpid(),ret);
      }
      return 0;
}

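putenv adds environment variables that belong to the process itself and its children.

So a variable added by mycommand has nothing to do with bash (its parent)!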

If you insist on passing the environment explicitly:

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <sys/types.h> 
int main()
  {
      extern char** environ;
      pid_t id=fork();
      if(id==0)
      {
          printf("before,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());           
          execle("./otherexe","otherexe","-a","-w",NULL,environ);
          printf("After,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
          exit(0);
      }
      pid_t ret=waitpid(id,NULL,0);
      if(ret>0)
      {
          printf("wait successfully,father pid:%d,ret:%d\n",getpid(),ret);
      }
      return 0;
  }

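If environment variables can be handed to a child, so can integers and strings.

They can be passed by letting the child inherit the parent's data unmodified, which is passing by sharing,

or they can be passed through environment variables.

2. Complete replacement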

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <sys/types.h> 
int main()
{
      pid_t id=fork();
      if(id==0)
      {          
			printf("before,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
          char *const myenv[]={
              "MYVAL=123666",
              "MYlll=567999",
              NULL
          };
          execle("./otherexe","otherexe","-a","-w",NULL,myenv);
          printf("After,I am a process,pid:%d,ppid:%d\n",getpid(),getppid());
          exit(0);
      }
      pid_t ret=waitpid(id,NULL,0);
      if(ret>0)
      {
          printf("wait successfully,father pid:%d,ret:%d\n",getpid(),ret);
      }
      return 0;
  }