当前位置: 首页 > article >正文

python爬虫题目

网站
https://project-iprj6705f17ebcfad66461658c5c-8000.preview.node01.inscode.run/

第一道题爬取api并且保存

import requests,re
import json
url = "https://project-iprj6705f17ebcfad66461658c5c-8000.preview.node01.inscode.run/tasks/api/"
headers= {

'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'
}

res = requests.get(url,headers=headers).json()
with open('1.json','w') as f:
    f.write(json.dumps(res,ensure_ascii=False))

第二道爬取所有图片

from urllib.parse import urljoin
import requests,re
from urllib.parse import urlparse
import json
url = "https://project-iprj6705f17ebcfad66461658c5c-8000.preview.node01.inscode.run/tasks/api/"
headers= {

'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'
}

res = requests.get(url,headers=headers).json()
list1 = res['articles']
list2=[]
for i in list1:
    list2.append(i['image'])
base_url ="https://"+urlparse(url).netloc

for image in list2:
    image_url = urljoin(base_url,image)
    img = requests.get(image_url).content
    img_name = image.split("/")[-1]
    with open(img_name,'wb') as f:
        f.write(img)

第三道 爬取题目和摘要

import requests,csv
from lxml import etree
with open("data.csv","w",newline='',encoding='gbk') as f:
    writer = csv.writer(f)
    writer.writerow(["题目","再要"])
url = "https://project-iprj6705f17ebcfad66461658c5c-8000.preview.node01.inscode.run/tasks/article/list/"
headers= {

'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'
}

res = requests.get(url,headers=headers)
html = etree.HTML(res.text)
wen_zhang = html.xpath('//div[@class="lab-block"]//a//@href')
with open("data.csv","w",newline='',encoding='gbk') as f:
    writer = csv.writer(f)
    writer.writerow(["ti","zai"])



for i in wen_zhang:
    url_l = "https://project-iprj6705f17ebcfad66461658c5c-8000.preview.node01.inscode.run/"+i
    result = requests.get(url_l,headers=headers)
    select = etree.HTML(result.text)
    timu = select.xpath('//h2/text()')[0]
    zaiyao = select.xpath('//p//text()')
    result = "".join(zaiyao)
    with open("data.csv", "a", newline='',encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow([timu, result])


http://www.kler.cn/news/341194.html

相关文章:

  • Pymysql cur.fetchall() 返回 None
  • 哪个编程工具让你的工作效率翻倍?
  • 前端页面模块修改成可动态生成数据模块——大部分数据为GPT生成(仅供学习参考)
  • Oceanbase学习之—手工搭建oceanbaes测试
  • Pikachu-目录遍历
  • Mac上功能全面,免费好用的解压缩工具
  • Elasticsearch Suggester
  • 算法:位运算
  • 【C语言】数组(下)
  • 【CSS】让元素消失的方式
  • ceph基础
  • C++入门基础知识106—【关于C++continue 语句】
  • 操作系统中的并发控制——使用条件变量同步
  • linux-二进制工具
  • 深度学习基础—交并比与非极大值抑制
  • Java如何对接云闪付支付接口及示例
  • 基于开元鸿蒙(OpenHarmony)的【智能药房与药品管理综合应用系统】
  • 无感升级有三种常见的可行性方案:蓝绿部署、灰度发布、和滚动更新
  • Python Django ORM 的工作原理
  • arcgis for js点位渲染与实际坐标不一致且popupTemplate偏移