
Using Python filestream for File Stream Reading

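In Python, file stream (filestream) operations go through the built-in open() function, which returns a file object that supports reading, writing, and stream control (moving the read/write position). Common file modes include: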

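  • r: read-only (the default).
  • w: write (truncates existing content; creates the file if it does not exist).
  • a: append (writes go to the end of the file).
  • r+: read and write (the file must already exist; its content is not truncated).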
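Here is a minimal sketch of the four modes above, run against a scratch file in a temporary directory. The helper demo_modes and the file name demo.txt are illustrative only and do not come from the original article:

```python
import os
import tempfile


def demo_modes(path):
    """Exercise the w, a, r and r+ modes on one scratch file."""
    # "w": creates the file, or truncates it if it already exists
    with open(path, "w", encoding="utf-8") as f:
        f.write("first\n")
    # "a": every write goes to the end; existing content is kept
    with open(path, "a", encoding="utf-8") as f:
        f.write("second\n")
    # "r": read-only (the default mode)
    with open(path, encoding="utf-8") as f:
        after_append = f.read()
    # "r+": read/write without truncation; the position starts at 0,
    # so this write overwrites the first five characters in place
    with open(path, "r+", encoding="utf-8") as f:
        f.write("FIRST")
    with open(path, encoding="utf-8") as f:
        after_update = f.read()
    return after_append, after_update


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_modes(os.path.join(tmp, "demo.txt")))
        # ('first\nsecond\n', 'FIRST\nsecond\n')
```

Because r+ starts at position 0, the write replaces existing characters rather than inserting new ones.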

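The following shows how to use file streams for basic file operations and how to control the way a stream is read (line by line, in chunks, and so on).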
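The two reading styles can be sketched as follows. count_lines (line by line) and iter_chunks (chunked) are hypothetical helper names, and the 64 KiB default chunk size is an arbitrary choice:

```python
import os
import tempfile


def count_lines(path):
    """Line by line: iterating over a file object yields one line at a time."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for _line in f:          # only the current line is held in memory
            count += 1
    return count


def iter_chunks(path, chunk_size=64 * 1024):
    """Chunked: yield at most chunk_size characters per read() call."""
    with open(path, encoding="utf-8") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:        # read() returns "" at end of file
                return
            yield chunk


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("line1\nline2\nline3\n")
        print(count_lines(path))             # 3
        print(list(iter_chunks(path, 8)))    # ['line1\nli', 'ne2\nline', '3\n']
```

Using a generator means the caller processes one chunk at a time, so memory use stays bounded by chunk_size, not by the size of the file.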


1. Problem Background

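While writing a compiler, I need to read a file's contents one character at a time. If a "/" is followed by another "/", the rest of the line is treated as a comment. I use file.read(1) to read each character. But if the "/" is followed by a character other than "/", is there a way to move the file stream back by one character so that this character is not lost?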

Here is the relevant code:

def tokenType(self):
    # PAGE 108
    if self.current in {'{', '}', '(', ')', '[', ']', '.', ',', ';',
                        '-', '*', '/', '&', '|', '<', '>', '=', '~'}:
        if self.current == '/':
            next_char = self.file.read(1)
            if next_char == '/':
                # line comment: skip to the end of the line (or EOF)
                while next_char not in ("\n", ""):
                    next_char = self.file.read(1)
                return "IGNORE"
            if next_char == '*':
                # block comment: skip until "*/" (or EOF)
                prev = ""
                while True:
                    next_char = self.file.read(1)
                    if next_char == "" or (prev == '*' and next_char == '/'):
                        break
                    prev = next_char
                return "IGNORE"
            # next_char has already been consumed here: in "10/5" the 5 is lost
            return "SYMBOL"
        return "SYMBOL"
    elif self.current in (" ", "\n"):
        return "IGNORE"
    elif self.current == "'":
        next_char = self.file.read(1)
        while next_char not in ("'", ""):
            self.current += next_char
            next_char = self.file.read(1)
        return "STRING_CONST"
    elif self.current.isdigit():
        # read() returns a str, so test for digits rather than type int
        next_char = self.file.read(1)
        while next_char not in (" ", ""):
            self.current += next_char
            next_char = self.file.read(1)
        return "INT_CONST"
    else:
        next_char = self.file.read(1)
        while next_char not in (" ", ""):
            self.current += next_char
            next_char = self.file.read(1)
        if self.current in {'class', 'constructor', 'function', 'method',
                            'field', 'static', 'var', 'int', 'char',
                            'boolean', 'void', 'true', 'false', 'null',
                            'this', 'let', 'do', 'if', 'else', 'while',
                            'return'}:
            return "KEYWORD"
        return "IDENTIFIER"

My problem shows up with input like 10/5: my program reads the character after "/" to check whether it is another "/". On the next pass through my character-interpreting function, the 5 is gone, because it was already consumed during the comment check.
So is there a way to get a character from a file stream without removing it from the stream, or to move the stream back one character in a case like this?
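The lost character is easy to reproduce in isolation. This sketch uses io.StringIO as a stand-in for the source file, and naive_chars is a hypothetical helper that only handles "//" comments:

```python
import io


def naive_chars(stream):
    """Collect characters, treating "//" as the start of a line comment."""
    result = []
    while True:
        ch = stream.read(1)
        if ch == "":                      # end of stream
            return result
        if ch == "/":
            nxt = stream.read(1)          # "peek" at the next character...
            if nxt == "/":
                stream.readline()         # skip the rest of the comment line
                continue
            # ...but nxt has now been consumed and is never added to result
        result.append(ch)


if __name__ == "__main__":
    print(naive_chars(io.StringIO("10/5")))   # ['1', '0', '/']  (the 5 is lost)
```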

2. Solutions

  • Method 1: use file.seek() to adjust the stream position

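    file.seek() moves the stream position to a given offset. After peeking at a character, seek back so that the next read returns that character again. For files opened in text mode (the default), Python 3 only allows seeking to a position previously returned by file.tell() (or to the start or end of the file); a relative seek such as seek(-1, 1) raises io.UnsupportedOperation. So save the position with tell() before reading the character you want to peek at, and pass that value back to seek(). In binary mode ('rb'), seek(-1, 1) does work.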

    def tokenType(self):
        # PAGE 108
        if self.current in {'{', '}', '(', ')', '[', ']', '.', ',', ';',
                            '-', '*', '/', '&', '|', '<', '>', '=', '~'}:
            if self.current == '/':
                pos = self.file.tell()  # remember where the next character starts
                next_char = self.file.read(1)
                if next_char == '/':
                    # line comment: skip to the end of the line (or EOF)
                    while next_char not in ("\n", ""):
                        next_char = self.file.read(1)
                    return "IGNORE"
                if next_char == '*':
                    # block comment: skip until "*/" (or EOF)
                    prev = ""
                    while True:
                        next_char = self.file.read(1)
                        if next_char == "" or (prev == '*' and next_char == '/'):
                            break
                        prev = next_char
                    return "IGNORE"
                # Not a comment: move the stream back so the next read returns
                # next_char again. Text-mode files reject seek(-1, 1), so seek
                # back to the position saved by tell() instead.
                self.file.seek(pos)
                return "SYMBOL"
            return "SYMBOL"
        elif self.current in (" ", "\n"):
            return "IGNORE"
        elif self.current == "'":
            next_char = self.file.read(1)
            while next_char not in ("'", ""):
                self.current += next_char
                next_char = self.file.read(1)
            return "STRING_CONST"
        elif self.current.isdigit():
            # read() returns a str, so test for digits rather than type int
            next_char = self.file.read(1)
            while next_char not in (" ", ""):
                self.current += next_char
                next_char = self.file.read(1)
            return "INT_CONST"
        else:
            next_char = self.file.read(1)
            while next_char not in (" ", ""):
                self.current += next_char
                next_char = self.file.read(1)
            if self.current in {'class', 'constructor', 'function', 'method',
                                'field', 'static', 'var', 'int', 'char',
                                'boolean', 'void', 'true', 'false', 'null',
                                'this', 'let', 'do', 'if', 'else', 'while',
                                'return'}:
                return "KEYWORD"
            return "IDENTIFIER"
    
    
  • Method 2: use Python's io.StringIO

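    io.StringIO creates an in-memory file object whose contents come from a string, so the whole source can be handled as a stream. To move the pointer back, call the StringIO object's seek() method. StringIO also rejects nonzero relative seeks such as seek(-1, 1), but its tell() returns a plain character index, so seek(tell() - 1) steps back one character. Because the entire file is loaded into memory, this approach suits source files of moderate size.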

    import io

    def tokenType(self):
        # Assumes self.stream (a hypothetical attribute) is created once,
        # e.g. in __init__:  self.stream = io.StringIO(self.file.read())
        # Building it inside tokenType would re-read the file on every call,
        # and after the first call the file is already exhausted.
        char = self.stream.read(1)
        if char in {'{', '}', '(', ')', '[', ']', '.', ',', ';',
                    '-', '*', '/', '&', '|', '<', '>', '=', '~'}:
            if char == '/':
                next_char = self.stream.read(1)
                if next_char == '/':
                    while next_char not in ("\n", ""):
                        next_char = self.stream.read(1)
                    return "IGNORE"
                if next_char == '*':
                    prev = ""
                    while True:
                        next_char = self.stream.read(1)
                        if next_char == "" or (prev == '*' and next_char == '/'):
                            break
                        prev = next_char
                    return "IGNORE"
                if next_char != "":
                    # StringIO rejects seek(-1, 1), but its tell() is a plain
                    # character index, so step back one character explicitly
                    self.stream.seek(self.stream.tell() - 1)
                return "SYMBOL"
            return "SYMBOL"
        elif char == " " or char == "\n":
            return "IGNORE"
        # (strings, integers, keywords and identifiers are omitted in this excerpt)
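Both fixes can be checked in a short, self-contained sketch. peek_char (tell()/seek() on a text-mode file), unread_char (stepping an io.StringIO back by one character) and relative_seek_rejected are hypothetical helper names; the scratch file expr.txt holds the 10/5 example from the question:

```python
import io
import os
import tempfile


def peek_char(f):
    """Return the next character of a text-mode file without consuming it."""
    pos = f.tell()        # opaque position cookie; valid to pass back to seek()
    ch = f.read(1)
    f.seek(pos)           # rewind to where the peeked character starts
    return ch


def unread_char(s):
    """Move an io.StringIO back by one character after a read(1)."""
    # seek(-1, 1) would raise, but StringIO.tell() is a plain character index
    s.seek(s.tell() - 1)


def relative_seek_rejected(f):
    """True if the stream refuses a nonzero cur-relative seek."""
    try:
        f.seek(-1, 1)
    except OSError:       # io.UnsupportedOperation is an OSError subclass
        return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "expr.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("10/5")
        with open(path, encoding="utf-8") as f:
            f.read(3)                          # consume "10/"
            print(relative_seek_rejected(f))   # True in text mode
            print(peek_char(f))                # 5
            print(f.read(1))                   # 5 (still available)

    s = io.StringIO("10/5")
    s.read(3)                                  # consume "10/"
    s.read(1)                                  # read the "5" ...
    unread_char(s)                             # ... and push it back
    print(s.read(1))                           # 5
```

Saving the position before the read works even at end of file, whereas stepping back by one after an empty read would land on the "/" itself. This is why the StringIO version above only steps back when next_char is not empty.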
    

Summary

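  • Line-by-line reading: iterate over the file object to process large files one line at a time.
  • Chunked reading: call read(size) in a loop for memory-sensitive work, especially with very large files.
  • Position control: seek() and tell() provide random access and stream control (in text mode, seek only to positions returned by tell(), or to the start or end).
  • Safe file handling: the with statement, together with exception handling, ensures files are opened and closed correctly.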
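A small sketch of the last point, combining with and a try/except around open(). The helper name read_text_or_none is made up for illustration:

```python
import os
import tempfile


def read_text_or_none(path):
    """Return the file's contents, or None if it cannot be opened."""
    try:
        # "with" closes the file when the block exits, even if read() raises
        with open(path, encoding="utf-8") as f:
            return f.read()
    except OSError:
        # FileNotFoundError and PermissionError are both OSError subclasses
        return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "notes.txt")
        print(read_text_or_none(path))      # None (file does not exist yet)
        with open(path, "w", encoding="utf-8") as f:
            f.write("hello")
        print(f.closed)                     # True: closed on leaving the with block
        print(read_text_or_none(path))      # hello
```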

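These techniques help you control and process file streams efficiently. With large files in particular, reading line by line or in chunks keeps memory use low because the whole file is never loaded at once.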


http://www.kler.cn/news/350753.html
