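Learning Python is pretty fun. There's a real sense of achievement whenever a site scrape succeeds!
Working out a page's structure takes a lot of patience: after each bit of analysis I print the result to check that the right data is being captured.
Today I practised scraping page data and saving it to CSV, and got a lot out of it!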
OK, here's the code, shared mainly for learning and discussion! (The site URL is still masked.)

import requests
import parsel
import csv
import re
from threading import Thread
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36'
}
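

def respon(first_url):  # request helper: fetch a page and return its HTML text
    # timeout so a stalled connection does not hang the thread forever
    resp = requests.get(url=first_url, headers=headers, timeout=10).text
    return resp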


def get_url(first_url):  # collect the second-level (detail page) links
    resp = respon(first_url)
    select = parsel.Selector(resp)
    href = select.css('.house_item h1 a::attr(href)').getall()
    href_list = []
    for item in href:
        # skip broker and community links; keep only listing detail pages
        if 'broker' in item or 'community' in item:
            continue
        href_list.append(item)
    return href_list


def get_data(second_url):  # extract the fields from one detail page
    resp = respon(second_url)
    select = parsel.Selector(resp)
    title = select.css('.house_info h1::attr(title)').get()
    info = select.css('.other span::text').getall()
    price = select.css('.price_left::text').get()
    phone = select.css('.other1 div:nth-child(4) span::text').get()
    # pad or trim to five entries so a page with fewer spans does not raise IndexError
    info = (info + [''] * 5)[:5]
    infor = [title, *info, price, phone]
    return infor
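

def save(first_url):  # save one list page's listings to CSV
    list_url = get_url(first_url)
    # columns: name, layout, attributes, area, floor, decoration, price, phone
    title = ['名称', '房型', '属性', '平方', '楼层', '装修', '价格', '电话']
    with open('二手房.csv', mode='a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        if f.tell() == 0:  # write the header only while the file is still empty
            writer.writerow(title)
        for link in list_url:
            list_data = get_data(link)
            writer.writerow(list_data)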


def main():  # main function: walk list pages 1 to 10
    for i in range(1, 11):
        # print(i)
        url = f'http://www.*****.com/house/second/f_-page_{i}.html'
        save(url)


if __name__ == '__main__':  # script entry point
    t = Thread(target=main)
    t1 = Thread(target=main)
    t2 = Thread(target=main)
    t3 = Thread(target=main)
    t.start()
    t1.start()
    t2.start()
    t3.start()
    main()

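Also, a question for the experts here: I'm not sure whether this multithreading is done correctly. It doesn't seem to make much difference!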
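
On the multithreading question: in the code above every thread is started with the same `main()`, so each thread walks pages 1 to 10 on its own (and the final `main()` call does it once more) instead of the pages being divided between them. That would explain why it does not feel faster. One possible way to share the work is a thread pool where each task handles a single page. A minimal sketch, assuming a hypothetical `scrape_page()` that stands in for the real per-page work; it makes no network request here and only returns a fake row so the structure can be run as is. `scrape_all()` and the `workers` parameter are also made up for illustration:

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_page(page):
    """Hypothetical stand-in for the real per-page work (get_url + get_data)."""
    # The real script would request the list page and its detail pages here.
    return [f'row from page {page}']


def scrape_all(pages, workers=4):
    """Run scrape_page once per page, sharing the pages among worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands each page to exactly one worker and returns results in input order.
        results = list(pool.map(scrape_page, pages))
    # Flatten the per-page lists into one list of rows.
    return [row for page_rows in results for row in page_rows]


if __name__ == '__main__':
    rows = scrape_all(range(1, 11))
    print(len(rows))
```

Since the time is mostly spent waiting on HTTP responses, threads can help here, but only when each page is fetched once. Collecting the rows and writing them from the main thread also keeps the file writing in one place.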
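
When several threads append to the same CSV file, two things are easy to get wrong: the header row ending up in the file more than once, and rows from different threads interleaving. A minimal sketch of one way to guard against both, assuming a hypothetical `write_rows()` helper and a module-level `write_lock` (neither is part of the script above); the demo writes to a temporary file with made-up rows:

```python
import csv
import os
import tempfile
import threading

write_lock = threading.Lock()  # hypothetical: shared by every writer thread


def write_rows(path, header, rows):
    """Append rows to a CSV file, writing the header only if the file is new or empty."""
    with write_lock:  # only one thread touches the file at a time
        need_header = not os.path.exists(path) or os.path.getsize(path) == 0
        with open(path, mode='a', newline='', encoding='utf-8') as f:
            writer = csv.writer(f)
            if need_header:
                writer.writerow(header)
            writer.writerows(rows)


if __name__ == '__main__':
    demo_path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
    threads = [
        threading.Thread(
            target=write_rows,
            args=(demo_path, ['title', 'price'], [[f'house {i}', i * 100]]),
        )
        for i in range(4)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with open(demo_path, newline='', encoding='utf-8') as f:
        print(list(csv.reader(f)))
```

The lock makes the "is the file empty?" check and the write happen as one step, so two threads cannot both decide to write the header.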