丝域's Archiver

Icewolf001 posted on 2022-1-28 12:39

[2022-01-28] Scraping ARTOFGLOSS's free preview images with Python (currently nearly ten thousand images)

# coding=utf-8

import time
import wget
import requests as req
import re

def get_images(_imgnumber):
    '''
    Find the address of an image and download it.
    '''
    requestUrl = "http://www.artofgloss.net/preview/displayimage.php?album=lastup&cat=0&pos=" + str(_imgnumber)
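    attempts = 0
    resp = None
    while attempts < 3 and resp is None:
        try:
            # The timeout keeps a stalled connection from hanging forever
            resp = req.get(requestUrl, timeout=30)
        except req.RequestException:
            # Wait a little before retrying the failed request
            time.sleep(5)
            attempts += 1
    if resp is None:
        # All three attempts failed, so there is no page to parse
        print("\nFailed to fetch the page")
        return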

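    # Raw strings with an escaped dot, so "." only matches a literal period
    result = re.search(r'src="(albums/\S*\.jpg)"', resp.text)
    if result is None or "thumb" in result.group(1):
        # The first .jpg is only a thumbnail (or missing), so look for a .gif
        result = re.search(r'src="(albums/\S*\.gif)"', resp.text)
    if result is None:
        print("\nNo image address found")
        return
    fullhttp = "http://www.artofgloss.net/preview/" + result.group(1)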
    newfullhttp = fullhttp.replace("normal_", "")
    print("\n",_imgnumber)
    print(newfullhttp)
    wget.download(newfullhttp, 'G:/temp')
    if ("00.jpg" in newfullhttp):
        collecthttp = newfullhttp.replace("00.jpg", "02.jpg")
        print("\n", collecthttp)
        try:
            wget.download(collecthttp, 'G:/temp')
        except Exception:
            print("\nFile does not exist")

if __name__ == "__main__":
    for imgnumber in range(0, 100):
        get_images(imgnumber)
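
The address handling inside get_images can be tried without touching the network if it is pulled out into a function of its own. What follows is only a sketch: extract_image_urls and BASE_URL are hypothetical names that do not appear in the script above, and the sample HTML is made up for illustration rather than copied from the site.

```python
import re

BASE_URL = "http://www.artofgloss.net/preview/"


def extract_image_urls(page_html):
    """Return the download URLs found in one displayimage.php page.

    Hypothetical helper: it repeats the steps of get_images, but only
    builds the addresses and leaves the downloading to the caller.
    """
    match = re.search(r'src="(albums/\S*\.jpg)"', page_html)
    if match is None or "thumb" in match.group(1):
        # Only a thumbnail .jpg (or none at all): fall back to a .gif
        match = re.search(r'src="(albums/\S*\.gif)"', page_html)
    if match is None:
        return []
    # Dropping the "normal_" prefix gives the address of the full-size file
    full_url = (BASE_URL + match.group(1)).replace("normal_", "")
    urls = [full_url]
    if "00.jpg" in full_url:
        # The script also tries the matching "02.jpg" file
        urls.append(full_url.replace("00.jpg", "02.jpg"))
    return urls


# Made-up HTML fragment, for illustration only
sample = '<img src="albums/set1/normal_pic00.jpg" class="image">'
print(extract_image_urls(sample))
```

Keeping the address logic separate like this means the regular expressions can be checked against saved pages before any download is started.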

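Notes: 1. Files are saved to the G:/temp directory by default; 2. By default the program downloads the images numbered 0 through 99 (range(0, 100) leaves out the end value); you can change the start and end numbers in the statement for imgnumber in range(0, 100) yourself. At present the usable numbers go up to a little over 6000.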
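
Instead of editing the range(0, 100) statement and the G:/temp path by hand, the start number, end number and download directory could be read from the command line. This is a sketch only: the option names --start, --end and --out and the helper parse_options are assumptions made for illustration, not part of the original script.

```python
import argparse


def parse_options(argv=None):
    """Read the image-number range and download directory (hypothetical options)."""
    parser = argparse.ArgumentParser(description="Download ARTOFGLOSS preview images")
    parser.add_argument("--start", type=int, default=0, help="first image number")
    parser.add_argument("--end", type=int, default=100, help="stop before this image number")
    parser.add_argument("--out", default="G:/temp", help="download directory")
    options = parser.parse_args(argv)
    if options.start < 0 or options.end < options.start:
        parser.error("need 0 <= start <= end")
    return options


# Example: image numbers 200 up to, but not including, 300
options = parse_options(["--start", "200", "--end", "300"])
for imgnumber in range(options.start, options.end):
    # get_images(imgnumber) from the post would be called here
    pass
print(options.start, options.end, options.out)
```

Called with no arguments, parse_options falls back to the same defaults as the post: numbers 0 to 99 and the G:/temp directory.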
