First Steps with Web Scraping, Part 4
1. Understand proxy IPs, forward proxies and reverse proxies: a forward proxy sits in front of the client and forwards its requests outward (the target server sees the proxy's IP, not yours), while a reverse proxy sits in front of the server and forwards incoming requests to it (the client does not know which backend actually responds).
2. Proxy IPs classified by anonymity level: transparent (the server still sees your real IP), anonymous (the server knows a proxy is in use but not your IP), and high-anonymity/elite (the server sees neither).
3. To avoid having your IP banned for sending frequent requests to the same domain, use proxy IPs.
# -*- coding: utf-8 -*-
import requests

url = 'https://www.baidu.com'
# Route the request through a proxy server; the scheme key selects
# which proxy handles http vs. https traffic
proxies = {
    'http': 'http://47.122.65.254:8080',
    # 'https': 'https://47.122.65.254:8080'
}
response = requests.get(url, proxies=proxies)
print(response.content)
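Free proxies are often dead or slow, so in practice it helps to wrap the request in a timeout and error handling before trusting a proxy. A minimal sketch (the `check_proxy` helper and the 3-second timeout are my own choices, not part of the notes above):

```python
import requests


def check_proxy(proxy_url: str, test_url: str = 'https://www.baidu.com') -> bool:
    """Return True if the proxy can fetch test_url within the timeout."""
    proxies = {'http': proxy_url, 'https': proxy_url}
    try:
        response = requests.get(test_url, proxies=proxies, timeout=3)
        return response.status_code == 200
    except requests.exceptions.RequestException:
        # Covers ProxyError, ConnectTimeout, ConnectionError, etc.
        return False
```

A crawler can run candidate proxies through this check and keep only the ones that respond, rotating among them on each request.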
4. CA certificates: when a site's certificate cannot be verified, requests raises an SSLError; passing verify=False skips certificate verification (fine for testing, unsafe in production).
# -*- coding: utf-8 -*-
import requests

url = 'https://www.baidu.com'
# verify=False disables SSL certificate verification
response = requests.get(url, verify=False)
print(response.content)
5. A simple crawler: scraping iciba (Kingsoft) translation
import requests

# Grab the translation request's URL from the browser's network panel
# and strip the extra protective query parameters:
# https://ifanyi.iciba.com/index.php?c=trans&m=fy&client=6&auth_user=key_web_new_fanyi&sign=9X%2BHAviAKqteMMuVvr%2B0X9RriqVIAJSQ%2BxmfU0q7dIE%3D
url = 'https://ifanyi.iciba.com/index.php?c=trans'
# Build the request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',
    'Referer': 'https://www.iciba.com/',
    'Host': 'ifanyi.iciba.com'
}
while True:
    # Read the user's input
    content = input('Enter the text to translate (type "exit" to quit): ')
    # Check whether to exit
    if content.lower() == 'exit':
        break
    # Build the form-data dictionary
    post_data = {
        'from': 'auto',
        'to': 'auto',
        'q': content,
    }
    # Send the request
    res = requests.post(url, headers=headers, data=post_data)
    # Parse the JSON response; res.json() is safer than eval() because
    # JSON literals such as null/true/false are not valid Python names
    print(res.json()['out'])
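A note on parsing: evaluating the raw response text with eval() breaks as soon as the JSON contains literals like null, true or false, which are not Python names; json.loads (what res.json() calls under the hood) handles them correctly. A small offline illustration with a made-up payload:

```python
import json

# A hypothetical JSON body of the kind a translation API might return
payload = '{"out": "hello", "err_no": 0, "extra": null}'

data = json.loads(payload)  # parses null/true/false correctly
print(data['out'])
# eval(payload) would raise NameError here: Python has no name 'null'
```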