Python web scraper example
Published: 2019-06-09


import re

import requests
from bs4 import BeautifulSoup


# Main entry point
def main():
    # Send a request header that mimics the Chrome browser
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36'}
    page_max = 100
    for i in range(1, page_max + 1):
        # The first listing page has no "pg" suffix; later pages use pg2, pg3, ...
        if i == 1:
            house = 'https://cc.lianjia.com/ershoufang/erdaoqu/'
        else:
            house = 'https://cc.lianjia.com/ershoufang/erdaoqu/pg' + str(i)
        res = requests.get(house, headers=headers)
        soup = BeautifulSoup(res.text, 'html.parser')
        li_max = soup.find('ul', class_='sellListContent').find_all('li')
        for li in li_max:
            try:
                house_param = {}
                # Basic info: fields are separated by "|"
                content = li.find('div', class_='houseInfo').text
                content = content.split("|")
                house_param['housing_estate'] = content[0]
                house_param['square_metre'] = re.findall(r'-?\d+\.?\d*e?-?\d*?', content[2])[0]
                # Location
                position = li.find('div', class_='positionInfo').find('a').text
                house_param['position'] = position
                # Total price and unit price: keep digits only
                totalprice = li.find('div', class_='totalPrice').text
                house_param['total_price'] = re.sub(r"\D", "", totalprice)
                unitprice = li.find('div', class_='unitPrice').text
                house_param['unit_price'] = re.sub(r"\D", "", unitprice)
                # Follower count and viewing count, separated by "/"
                follow = li.find('div', class_='followInfo').text
                follow = follow.split("/")
                house_param['follow'] = re.sub(r"\D", "", follow[0])
                house_param['take_look'] = re.sub(r"\D", "", follow[1])
                # Detail page link; the digits in the URL are stored as the listing ID
                title_src = li.find('div', class_='title').find('a').attrs['href']
                house_param['url'] = re.sub(r"\D", "", title_src)
                # Fetch the detail page (separate names so the list-page soup is not overwritten)
                detail_res = requests.get(title_src, headers=headers)
                detail_soup = BeautifulSoup(detail_res.text, 'html.parser')
                # Listing date from the transaction section of the detail page
                pub_date = detail_soup.find('div', class_='transaction').find_all('li')[0].find_all('span')[1].text
                house_param['pub_date'] = pub_date
                print(house_param)
            except Exception as e:
                # Skip list items that lack the expected fields and report the error
                print(e)


if __name__ == '__main__':
    main()
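The string handling in the scraper can be tried without any network access. The sketch below pulls those steps out into small functions. The helper names (`page_url`, `digits_only`, `parse_house_info`) and the sample strings are hypothetical and do not appear in the original script. Trimming whitespace around each field is also an addition, and the area pattern is a simplified version of the original one.

```python
import re


def page_url(page, base='https://cc.lianjia.com/ershoufang/erdaoqu/'):
    """Return the listing URL for a 1-based page number (page 1 has no suffix)."""
    return base if page == 1 else f'{base}pg{page}'


def digits_only(text):
    """Remove every non-digit character, as the scraper does for prices and counts."""
    return re.sub(r'\D', '', text)


def parse_house_info(text):
    """Split a houseInfo string on '|' and extract the estate name and the area.

    Assumes the area is the third field, as in the original script.
    """
    parts = [p.strip() for p in text.split('|')]  # strip() is not in the original
    area = re.findall(r'\d+\.?\d*', parts[2])[0]   # simplified numeric pattern
    return {'housing_estate': parts[0], 'square_metre': area}


if __name__ == '__main__':
    # Hypothetical sample strings, not real listing data
    print(page_url(1))
    print(page_url(3))
    print(digits_only('185万'))
    print(parse_house_info('Example Estate | 2室1厅 | 89.5平米'))
```

Note that stripping non-digits also removes a decimal point, so a price such as 185.5 would come out as 1855. If decimal prices can occur, a numeric pattern like the one used for the area is probably the safer choice.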

 

Reposted from: https://www.cnblogs.com/kgdxpr/p/10072285.html
