Kimi generated.
duitang
Introduction
This Python script downloads images from the Duitang website based on a user-provided search query. It uses the argparse library to parse command-line arguments, BeautifulSoup for parsing HTML content, selenium for browser automation, and tqdm for progress display.
Parameters
-q, --query: The search query string to use on Duitang. If the query contains spaces, enclose it in quotation marks. Defaults to an empty string.
-n, --num-pages: The number of result pages from which to download images. Defaults to 5.
-o, --output: The directory where the downloaded images will be saved. Defaults to the current directory.
-t, --type: The file type of the images to download, either 'jpg' or 'avif'. Defaults to 'jpg'.
-g, -gui: A flag that runs Selenium in GUI mode, with a visible browser window. Defaults to False (headless mode).
-u, --undetected-chromedriver: A flag to use an undetected ChromeDriver with Selenium, which can help avoid bot detection by some websites. Defaults to False.
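The options above map naturally onto an argparse parser. The sketch below is a hypothetical reconstruction built only from the documented flags and defaults, not the script's actual source. The -gui spelling is copied from the list above, and the attribute name gui is an assumption.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented duitang options."""
    parser = argparse.ArgumentParser(prog="duitang")
    parser.add_argument("-q", "--query", type=str, default="",
                        help="search query; wrap it in quotes if it contains spaces")
    parser.add_argument("-n", "--num-pages", type=int, default=5,
                        help="number of result pages to download from")
    parser.add_argument("-o", "--output", type=str, default=".",
                        help="directory where images are saved")
    parser.add_argument("-t", "--type", type=str, default="jpg", choices=["jpg", "avif"],
                        help="image file type to download")
    # "-gui" is spelled as documented. dest is set explicitly because argparse would
    # otherwise derive the attribute name "g" from the first option string.
    parser.add_argument("-g", "-gui", dest="gui", action="store_true",
                        help="show the browser window instead of running headless")
    parser.add_argument("-u", "--undetected-chromedriver", action="store_true",
                        help="use an undetected ChromeDriver with Selenium")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["-q", "search term", "-n", "10", "-t", "avif"])
    print(vars(args))
```

Dashes in long option names become underscores in attribute names, so --num-pages is read back as args.num_pages.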
Behavior
- The script sets up command line argument parsing and defines the expected arguments.
- It initializes a Selenium Browser object with the specified options, including whether to run in headless mode and whether to use an undetected ChromeDriver.
- It uses the Duitang base URL to start a search for the provided query.
- It scrolls through each page, extracts image URLs, and initiates downloads for each image.
- The script uses a coroutine pool to asynchronously download images, which can improve download speed.
- It saves a record of downloaded URLs to avoid re-downloading the same images in future runs.
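The concurrent download step can be illustrated with plain asyncio. This is a minimal sketch that assumes a semaphore-bounded set of coroutines is a fair stand-in for the script's coroutine pool (mbapy's own pool may be implemented differently). fetch_image and download_all are hypothetical names, and fetch_image only simulates a network call.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def fetch_image(url: str) -> bytes:
    """Hypothetical stand-in for a real HTTP download; sleeps instead of using the network."""
    await asyncio.sleep(0.01)
    return url.encode("utf-8")


async def download_all(
    urls: List[str],
    max_workers: int = 4,
    fetch: Callable[[str], Awaitable[bytes]] = fetch_image,
) -> Dict[str, bytes]:
    """Download every URL while keeping at most max_workers downloads in flight."""
    semaphore = asyncio.Semaphore(max_workers)
    results: Dict[str, bytes] = {}

    async def worker(url: str) -> None:
        async with semaphore:  # waits here once max_workers downloads are running
            results[url] = await fetch(url)

    await asyncio.gather(*(worker(u) for u in urls))
    return results


if __name__ == "__main__":
    demo_urls = [f"https://example.com/img_{i}.jpg" for i in range(10)]
    images = asyncio.run(download_all(demo_urls, max_workers=3))
    print(f"downloaded {len(images)} images")
```

Bounding concurrency keeps the number of simultaneous requests small, which is gentler on the server than starting every download at once while still overlapping network waits.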
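The download record can be as simple as a JSON list of URLs stored next to the images. The file name downloaded_urls.json and the helper functions below are hypothetical; the script's actual record format is not documented here.

```python
import json
import os
from typing import Iterable, List, Set

RECORD_FILE = "downloaded_urls.json"  # hypothetical record file name


def load_record(output_dir: str) -> Set[str]:
    """Return the URLs recorded as downloaded in output_dir (empty if no record exists yet)."""
    path = os.path.join(output_dir, RECORD_FILE)
    if not os.path.exists(path):
        return set()
    with open(path, "r", encoding="utf-8") as f:
        return set(json.load(f))


def save_record(output_dir: str, urls: Iterable[str]) -> None:
    """Write the downloaded URLs to the record file as a sorted JSON list."""
    path = os.path.join(output_dir, RECORD_FILE)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sorted(set(urls)), f, ensure_ascii=False, indent=2)


def filter_new(urls: Iterable[str], seen: Set[str]) -> List[str]:
    """Keep URLs not in seen, preserving order and dropping repeats within this batch."""
    batch: Set[str] = set()
    new: List[str] = []
    for url in urls:
        if url not in seen and url not in batch:
            batch.add(url)
            new.append(url)
    return new
```

A run would call load_record, pass the scraped URLs through filter_new, download the remaining ones, and then call save_record with the union of the old record and the URLs that downloaded successfully.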
Notes
- The script requires the selenium, beautifulsoup4, tqdm, and pillow_heif (for AVIF support) libraries to be installed.
- Configs.web.request_header is used as the User-Agent string for HTTP requests; it should be defined within the mbapy library or the script's configuration.
- The script handles and logs errors through the put_err function from the mbapy library.
- The clean_path function is used to sanitize the output directory path.
- The show_args function displays the provided arguments in a formatted manner.
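Sending a custom User-Agent with each image request looks roughly like this with the standard library. The document does not say which HTTP client the script uses, so urllib is an assumption here, and USER_AGENT is a placeholder for the value of Configs.web.request_header.

```python
import urllib.request

# Hypothetical placeholder; the script reads the real value from Configs.web.request_header.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_image_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Create a GET request that carries the configured User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download_image(url: str, path: str, timeout: float = 30.0) -> None:
    """Fetch url with the custom User-Agent and write the raw bytes to path."""
    with urllib.request.urlopen(build_image_request(url), timeout=timeout) as resp:
        data = resp.read()
    with open(path, "wb") as f:
        f.write(data)
```

Note that urllib stores header names with str.capitalize(), so the header is read back as "User-agent".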
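The exact rules of clean_path are not described here. As a rough, hypothetical illustration of what sanitizing an output directory path can involve, clean_output_path (not mbapy's function) strips stray whitespace and quotes, expands ~, and normalizes the result to an absolute path.

```python
import os


def clean_output_path(raw: str) -> str:
    """Hypothetical sanitizer in the spirit of clean_path; the real function may differ."""
    # Drop surrounding whitespace and quotes that often come from copy-pasted paths.
    path = raw.strip().strip('"').strip("'")
    # Expand "~" to the user's home directory.
    path = os.path.expanduser(path)
    # abspath joins with the current directory and collapses "..", "." and repeated separators.
    return os.path.abspath(path)
```

With the default output value ".", this resolves to the current working directory.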
Examples
To run the script with a search query and download 10 pages of images in AVIF format to a specified directory:
mbapy-cli duitang -q "search term" -n 10 -o "/path/to/output" -t avif
To run the script in GUI mode, with a visible browser window instead of headless mode:
mbapy-cli duitang -q "search term" -g
To run the script with an undetected ChromeDriver:
mbapy-cli duitang -q "search term" -u