Scraping Uniform Buyer-Show Girl Photos with Python
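Some expert posted this site in 水漫金山 about a month ago, and I wrote a crawler for it that same day. Today I had nothing to do, so I ran it again to see whether the site had updated, and it came back empty: the site's content had changed.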
So I just rewrote it as a whole-site crawler with scrapy, but I'm still not posting that one, so you all don't crawl the site to death.
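Instead I copied out the relevant part and turned it into a single-category crawler. If you want to crawl the rest, change it yourself!!!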

# from ip_proxy import ips
import requests, os, re, random
from lxml import etree

# ip_add = random.choice(ips())

# Create the output folder if it does not exist yet
if not os.path.exists('./zhifu'):
    os.mkdir('./zhifu')

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
}

# Walk list pages 1 to 3 of category id=3
for i in range(1, 4):
    url = 'https://www.ikmjx.com/index.php?g=portal&m=list&a=index&id=3&p=' + str(i)
    r = requests.get(url=url, headers=headers).text
    tree = etree.HTML(r)
    # Skip the first and last div, which are not post entries
    div_list = tree.xpath('/html/body/main/div/div[2]/div')[1:-1]
    for li in div_list:
        a = 0  # image counter for the current post
        src = 'https://www.ikmjx.com' + li.xpath('./div[2]/a/@href')[0]
        titles = li.xpath('./div[2]/a/@title')[0]
        # '?' is not allowed in Windows file names
        title = titles.replace('?', '')
        req = requests.get(url=src, headers=headers).text
        tree1 = etree.HTML(req)
        div1_list = tree1.xpath('/html/body/main/div/div/div/div[3]/p[2]')
        for p in div1_list:
            src_path = p.xpath('./img/@src')
            # print(src_path)
            for img in src_path:
                a = a + 1
                img_data = requests.get(url=img, headers=headers).content
                img_path = './zhifu/' + title + '_' + str(a) + '.jpg'
                with open(img_path, 'wb') as fp:
                    fp.write(img_data)
                    # print(img_data, 'download complete!!!')
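One thing to watch out for: the script only strips '?' from the title before using it as a file name, so a title containing something like '/' or ':' would probably still make open() fail, at least on Windows. Here is a rough sketch of a more thorough cleanup. The names safe_filename and image_path are my own hypothetical helpers, not part of the original script, and the character list is an assumption based on what Windows rejects in file names.

```python
import re

# Assumption: characters Windows rejects in file names, plus control characters.
_ILLEGAL_CHARS = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(title, fallback='untitled'):
    """Hypothetical helper: drop characters that are unsafe in file names."""
    # Remove illegal characters, then trim spaces and dots from both ends
    cleaned = _ILLEGAL_CHARS.sub('', title).strip(' .')
    # Fall back to a placeholder if nothing usable is left
    return cleaned or fallback


def image_path(title, index, folder='./zhifu'):
    """Hypothetical helper: build the same path layout the script uses."""
    return '{}/{}_{}.jpg'.format(folder, safe_filename(title), index)
```

If you want to try it, swapping titles.replace('?', '') for safe_filename(titles) in the loop should be enough; everything else can stay the same.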