Scraping Pictures of Beautiful Women (Modified Version)

Posted by admin · June 5, 2022

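Updated 2021-06-09: brand-new version
Original thread: https://www.52pojie.cn/thread-1394757-1-1.html
@culprit
Unlike the modified version, this version first saves all article links to a local file and then iterates over them to download the images. Each approach has its advantages; I find this one handy, so I'm sharing it with everyone.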
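
The link-collection step itself is not shown in this post. A minimal sketch of it, assuming the article links on a listing page are relative paths ending in `.html`, might look like the following. The names `LinkCollector` and `collect_links` and the sample HTML are hypothetical and not taken from the original script.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags that end in .html (assumed link format)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        if not href or not href.endswith('.html'):
            return
        # Store links without a leading slash, once each, in first-seen order
        href = href.lstrip('/')
        if href not in self.links:
            self.links.append(href)


def collect_links(html_text):
    """Return the relative .html links found in html_text."""
    parser = LinkCollector()
    parser.feed(html_text)
    return parser.links


# Hypothetical listing-page fragment, for illustration only
sample = (
    '<a href="/12345.html">A</a>'
    '<a href="/about">B</a>'
    '<a href="/12345.html">A again</a>'
    '<a href="67890.html">C</a>'
)
links = collect_links(sample)
```

Writing `'\n'.join(links)` to `datas.txt` would give the one-link-per-line file that the script below reads.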

The images on 唯美图库 are genuinely high quality, sharp enough to use as wallpapers.

[Python]

import os
import re

import requests
from bs4 import BeautifulSoup
 
# Browser-like headers; the referer points at the site itself
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36',
    'referer': 'https://www.vmgirls.com/'
}

def loadDatas(datas):
    for data in datas:
        url = "https://www.vmgirls.com/" + data
        print(url)
        print('-----------------------------------------------')
        Down_Image(url)
        print('-----------------------------------------------')
 
def Down_Image(url):
    response = requests.get(url, headers=headers).text
    soup = BeautifulSoup(response, 'html.parser')

    # The post title is used as the output directory name
    title = soup.find(class_='post-title h1')
    if title is None or not title.string:
        print('No post title found, skipping: ' + url)
        return
    dir_name = title.string.strip()

    for data in soup.find_all('img'):
        url_data = data.get('src')
        if not url_data:
            # Skip <img> tags that have no src attribute
            continue
        image_type = url_data.split('.')[-1]
        if image_type in ('jpg', 'jpeg', 'png'):
            # Create the directory only once there is an image to save
            os.makedirs(dir_name, exist_ok=True)
            # Work around the request error: prepend a scheme when src does not start with http
            str_url_data = str(url_data)
            if not re.match(r'^http', str_url_data):
                str_url_data = "https:" + str_url_data
            image = requests.get(str_url_data, headers=headers).content
            file_name = url_data.split('/')[-1]
            with open(dir_name + '/' + file_name, 'wb') as f:
                print('Writing -----> ' + dir_name + '/' + file_name)
                f.write(image)
 
 
if __name__ == '__main__':
    print(' ---------------------------------------------------------------------')
    print('|                                                                     |')
    print('|               Author:culprit --- 52pojie                            |')
    print('|               Modified by panpanpan(1277936431) --- 52pojie         |')
    print('|                                                                     |')
    print(' ---------------------------------------------------------------------')
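    # datas.txt holds one article link per line
    with open(r'datas.txt') as f:
        content = f.read()
    # Drop blank lines so a trailing newline does not produce an empty link
    datas = [line.strip() for line in content.split('\n') if line.strip()]
    input('Press Enter to start!')
    loadDatas(datas)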

*** Take it if you need it; if you don't, please don't maliciously tie up the site's resources! ***
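
One way to go easier on the site is to pause between requests and skip files that are already on disk. The sketch below is only an illustration of that idea: the function `download_all`, its `fetch` parameter, and the one-second default delay are assumptions, not part of the original script.

```python
import os
import time


def download_all(urls, fetch, out_dir, delay_seconds=1.0):
    """Fetch each URL at most once, pausing between requests.

    fetch is any callable that takes a URL and returns bytes, for example
    lambda u: requests.get(u, headers=headers).content
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for url in urls:
        path = os.path.join(out_dir, url.split('/')[-1])
        if os.path.exists(path):
            # Already downloaded on an earlier run, so do not request it again
            continue
        data = fetch(url)
        with open(path, 'wb') as f:
            f.write(data)
        saved.append(path)
        # Pause so that requests are spread out over time
        time.sleep(delay_seconds)
    return saved
```

Passing the fetch function in as a parameter keeps the throttling logic separate from `requests`, so it can be tried out with a stand-in function before pointing it at the real site.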

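Below are the links added as of June 9. Append them to your existing link list, or overwrite it, and then run the script again.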
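
Appending new links to an existing list can introduce duplicates. A minimal sketch of an order-preserving merge, assuming one link per line, is shown here. The function `merge_links` and the sample link names are hypothetical.

```python
def merge_links(existing, new):
    """Append links from new that are not already in existing, keeping order."""
    seen = set()
    merged = []
    for link in list(existing) + list(new):
        link = link.strip()
        # Ignore blank lines and anything already collected
        if link and link not in seen:
            seen.add(link)
            merged.append(link)
    return merged


# Hypothetical link names, for illustration only
old_links = ['a.html', 'b.html', '']
new_links = ['b.html', 'c.html']
merged = merge_links(old_links, new_links)
```

Writing the result back with `'\n'.join(merged)` keeps the one-link-per-line format.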

[57 links, one per line, each ending in .html; the link paths are not preserved in this copy]

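2021-06-09:
I haven't had time to visit the forum lately. I found that the code was scraping nothing but blank characters, so I reworked it, and it now runs essentially without problems.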
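
The post does not say what caused the blank output. One way to catch it early, sketched under the assumption that valid downloads are JPEG or PNG files, is to check the first bytes of each response before writing it to disk. The helper `looks_like_image` is hypothetical and not part of the original script.

```python
def looks_like_image(data):
    """Return True if data starts with a JPEG or PNG file signature."""
    jpeg_magic = b'\xff\xd8\xff'
    png_magic = b'\x89PNG\r\n\x1a\n'
    return data.startswith(jpeg_magic) or data.startswith(png_magic)
```

A download loop could call this on the response bytes and skip, or log, any payload that fails the check instead of saving it.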

If you find it useful, please leave a rating to show your support. I do wonder why there are more favorites than ratings.

Data + code link: https://pan.baidu.com/s/1yGCpMFIi1yDuVs8_7bjgbQ
Extraction code: 52pj
