Deleting Elasticsearch index documents older than one month with Python

article 2025/1/24 17:52:36

First check the MySQL data to see which column holds the date.
In this case it is the edittime column.

The Python script is as follows

vim delete_old_noticeresult.py

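import datetime
from elasticsearch import Elasticsearch, RequestError
import logging

# Configure logging
logging.basicConfig(filename='/var/log/es-index/delete_old_noticeresult.log', level=logging.INFO, format='%(asctime)s - %(message)s')

def delete_old_documents():
    # Current date and time (local time)
    now = datetime.datetime.now()
    
    # Cutoff: one month (30 days) ago
    one_month_ago = now - datetime.timedelta(days=30)
    
    # Create the Elasticsearch connection
    es = Elasticsearch(['http://127.0.0.1:9200'])
    
    # Build the delete query
    delete_query = {
        "query": {
            "range": {
                "edittime.raw": {  # field name; .raw is the keyword sub-field used for exact matching
                    "lt": one_month_ago.strftime("%Y-%m-%dT%H:%M:%SZ")  # timestamp with hours, minutes and seconds; must match the layout of the stored values
                    #"lt": one_month_ago.strftime("%Y-%m-%d")  # date-only alternative
                }
            }
        }
    }
    
    try:
        # Send the delete request and wait for it to finish
        response = es.delete_by_query(index='noticeresult', body=delete_query, wait_for_completion=True)
        logging.info("Deleted documents: %s", response.get('deleted', 'Unknown'))
    except RequestError as e:
        logging.error("Error deleting documents: %s", e)

if __name__ == "__main__":
    delete_old_documents()

# Install the module
pip install elasticsearch
# Create the directory that holds the log file
mkdir -p /var/log/es-index/

1. In the delete query, edittime is a text field, so a range query directly on edittime may return inaccurate results. Use the edittime.raw keyword sub-field for the range query instead.
2. Date format: make sure the cutoff string has the same format as the values actually stored in edittime.raw in your index. If your field has no .raw sub-field, remove the suffix.
3. Index name: confirm that noticeresult is the index you want to delete documents from.
4. Timestamp format: one_month_ago.strftime("%Y-%m-%dT%H:%M:%SZ") produces values such as 2024-12-25T17:52:36Z. Because edittime.raw is a keyword, the range comparison is a string comparison, so this is only correct when the stored values use the same layout. Note also that datetime.datetime.now() returns local time, while the trailing Z denotes UTC.
5. Log path: make sure the directory for /var/log/es-index/delete_old_noticeresult.log exists and that the script has write permission.
6. Elasticsearch configuration: if Elasticsearch requires authentication or other settings, pass the corresponding parameters when creating the Elasticsearch instance.
7. Exception handling: other exceptions, such as connection failures, can also be caught and handled.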
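The notes above on date format and exception handling can be sketched as follows. This is a minimal sketch, not the article's script: build_delete_query and purge_old_documents are hypothetical helper names, the dry_run flag is an added assumption, and the client is passed in as a parameter so that any object with count() and delete_by_query() methods (such as the elasticsearch-py client) can be used. Counting the matches first gives a chance to check the query before anything is deleted.

```python
import datetime
import logging


def build_delete_query(now, days=30, field="edittime.raw"):
    """Return a range query matching documents older than `days` days.

    Hypothetical helper: the field name and timestamp layout are taken
    from the script above and must match what is stored in the index.
    """
    cutoff = now - datetime.timedelta(days=days)
    return {"query": {"range": {field: {"lt": cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")}}}}


def purge_old_documents(es, index, query, dry_run=True):
    """Count matching documents; delete them only when dry_run is False.

    `es` is any object exposing count() and delete_by_query() in the
    style of the elasticsearch-py client. Returns the number of
    documents counted or deleted, or None if the request failed.
    """
    try:
        if dry_run:
            matched = es.count(index=index, body=query)["count"]
            logging.info("Dry run: %s documents would be deleted", matched)
            return matched
        response = es.delete_by_query(index=index, body=query, wait_for_completion=True)
        logging.info("Deleted documents: %s", response.get("deleted", "Unknown"))
        return response.get("deleted")
    except Exception as exc:  # deliberately broad: also covers connection failures
        logging.error("Request against %s failed: %s", index, exc)
        return None
```

With a real client this would be called as purge_old_documents(Elasticsearch(['http://127.0.0.1:9200']), 'noticeresult', build_delete_query(datetime.datetime.now())), and again with dry_run=False once the count looks right. If the cluster requires authentication, credentials are passed to the Elasticsearch constructor (basic_auth in the 8.x client, http_auth in 7.x).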

Index creation command

PUT /noticeresult
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "htmlStripAnalyzer": {
          "filter": ["lowercase", "classic", "trim"],
          "char_filter": ["html_strip"],
          "type": "custom",
          "tokenizer": "standard"
        },
        "chinese_analyzer": {
          "type": "custom",
          "tokenizer": "ik_max_word"  // use the IK tokenizer for Chinese word segmentation
        }
      },
      "char_filter": {
        "html_strip": {
          "type": "html_strip"
        }
      },
      "tokenizer": {
        "ik_max_word": {
          "type": "ik_max_word"
        }
      }
    }
  },
  "mappings": {
    "dynamic": "true",
    "_source": {
      "excludes": [
        "fujcontent",
        "projdetail"
      ]
    },
    "date_detection": false,
    "numeric_detection": false,  
    "properties": {
      "results_id": {
        "type": "integer",
        "fields": {
          "raw": {
            "type": "keyword",
            "null_value": "NULL",
            "ignore_above": 256
          }
        }
      },
      "notice_num": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword",
            "null_value": "NULL",
            "ignore_above": 256
          }
        }
      },
      "organ": { "type": "text", "analyzer": "htmlStripAnalyzer" },
   	....
   	....
      "editip": { "type": "text", "analyzer": "htmlStripAnalyzer" },  // uses the HTML-stripping analyzer
      "editname": { "type": "keyword" },
      "putip": { "type": "keyword" },
      "edittime": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword",
            "null_value": "NULL",
            "ignore_above": 256
          }
        }
      },
  ....
  ....
      "urlhost": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword",
            "null_value": "null",
            "ignore_above": 256
          }
        }
      },
      "attachment_info": { "type": "integer" }
    }
  }
}

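The mapping used when creating the index shows that edittime is a text field with a raw sub-field of type keyword.
This means queries can use edittime.raw for exact-match queries.
This is the exact matching used in the Python script above.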
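Because edittime.raw is a keyword, a range query on it compares values as strings rather than as dates. The sketch below illustrates what that means for the cutoff, using plain Python string comparison as a stand-in for the keyword comparison. The helper name is_before_cutoff and the sample timestamps are made up for illustration and are not taken from the index.

```python
def is_before_cutoff(stored, cutoff):
    """Mimic a keyword range "lt" comparison: plain string ordering."""
    return stored < cutoff


cutoff = "2024-12-25T00:00:00Z"

# Same layout as the cutoff: string order matches chronological order.
same_layout_old = is_before_cutoff("2024-12-24T23:59:59Z", cutoff)
same_layout_new = is_before_cutoff("2024-12-25T08:00:00Z", cutoff)

# Different layout (space separator, no Z): a value from later that day
# still sorts before the cutoff, because " " orders before "T".
mixed_layout = is_before_cutoff("2024-12-25 08:00:00", cutoff)

print(same_layout_old, same_layout_new, mixed_layout)
```

If the values stored in edittime use a different layout from the cutoff string, the cutoff should be formatted the same way as the stored values, or the field should be mapped as a date type so that the comparison is done on dates.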

http://www.kler.cn/a/302218.html
