mbapy.sci_utils.paper_download
_get_available_scihub_urls -> list
Finds available Sci-Hub URLs via http://tool.yovisun.com/scihub/.
Params
- proxies (dict, optional): A dictionary of proxies to be used for the HTTP request. Defaults to None.
Returns
- list: A list of available SciHub URLs.
Notes
- This function sends a GET request to 'http://tool.yovisun.com/scihub/' to find available SciHub URLs.
Example
available_urls = _get_available_scihub_urls(proxies={'http': 'http://proxy.example.com:8080'})
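The layout of the mirror-list page is not documented here, so the following is only a sketch of the parsing step under one assumption: that mirrors appear as <a href=...> links whose URL contains "sci-hub". In the real function the HTML would come from a GET request (for example requests.get(url, proxies=proxies)); to keep the sketch offline it parses a literal string. The names _MirrorLinkParser and extract_scihub_urls are hypothetical and not part of mbapy.

```python
from html.parser import HTMLParser
from typing import List


class _MirrorLinkParser(HTMLParser):
    """Hypothetical parser that collects href values that look like Sci-Hub mirrors."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Assumption: mirror links contain "sci-hub" somewhere in the URL.
            if name == "href" and value and "sci-hub" in value:
                if value not in self.urls:  # keep first-seen order, drop duplicates
                    self.urls.append(value)


def extract_scihub_urls(html: str) -> List[str]:
    """Return the mirror URLs found in the given HTML (hypothetical helper)."""
    parser = _MirrorLinkParser()
    parser.feed(html)
    parser.close()  # flush any buffered input
    return parser.urls


sample = '<a href="https://sci-hub.se">a</a><a href="https://example.com">b</a>'
print(extract_scihub_urls(sample))  # ['https://sci-hub.se']
```

Using the standard-library parser rather than a regular expression keeps attribute quoting and entity handling out of the sketch's concerns.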
_update_available_scihub_urls -> list
Updates the list of available SciHub URLs.
Returns
- list: The updated list of available SciHub URLs.
Notes
- This function updates the global variable available_scihub_urls by calling the _get_available_scihub_urls() function if available_scihub_urls is None. Otherwise, it returns the current value of available_scihub_urls.
Example
updated_urls = _update_available_scihub_urls()
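The Notes describe a lazy, module-level cache: the mirror list is fetched once and reused afterwards. A minimal sketch of that pattern follows; the fetch argument is a hypothetical stand-in for _get_available_scihub_urls, injected so the example runs without network access, and update_available_urls is an illustrative name rather than mbapy's function.

```python
from typing import Callable, List, Optional

# Module-level cache, mirroring the documented global available_scihub_urls.
available_scihub_urls: Optional[List[str]] = None


def update_available_urls(fetch: Callable[[], List[str]]) -> List[str]:
    """Fetch mirrors only on first use; afterwards return the cached list.

    fetch is a hypothetical stand-in for _get_available_scihub_urls.
    """
    global available_scihub_urls
    if available_scihub_urls is None:
        available_scihub_urls = fetch()
    return available_scihub_urls


calls = []


def fake_fetch() -> List[str]:
    calls.append(1)  # count how many times the "network" is hit
    return ["https://sci-hub.se"]


update_available_urls(fake_fetch)
update_available_urls(fake_fetch)
print(len(calls))  # 1
```

The second call never reaches fake_fetch, which is the point of caching a list that is slow to obtain.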
get_clean_doi -> str
Cleans and returns the DOI string.
Params
- doi (str): The DOI string to be cleaned.
Returns
- str: The cleaned DOI string.
Example
cleaned_doi = get_clean_doi('10.123/abc.456')
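What "cleaning" covers is not specified above. As an illustration only, the sketch below assumes it means trimming whitespace and removing a leading resolver prefix (https://doi.org/, https://dx.doi.org/) or a "doi:" label; clean_doi is a hypothetical name, and mbapy's get_clean_doi may handle a different set of cases.

```python
import re

# Assumed prefixes; the real get_clean_doi may strip more or fewer forms.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def clean_doi(doi: str) -> str:
    """Strip surrounding whitespace and a leading resolver URL or 'doi:' label."""
    return _DOI_PREFIX.sub("", doi.strip())


print(clean_doi("  https://doi.org/10.123/abc.456 "))  # 10.123/abc.456
print(clean_doi("doi:10.123/abc.456"))                 # 10.123/abc.456
```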
_get_scihub_valid_download_link -> str
Generates the valid Sci-Hub download link for the given input link.
Params
- link (str): The input link for which the valid Sci-Hub download link needs to be generated.
Returns
- str: The valid Sci-Hub download link.
Example
valid_link = _get_scihub_valid_download_link('http://example.com/paper.pdf')
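Links scraped from a page are often protocol-relative ("//host/file.pdf") or root-relative ("/downloads/file.pdf") and may carry a viewer fragment such as "#view=FitH". A minimal sketch of turning such a link into an absolute download URL follows, assuming that is roughly what "valid" means here; normalize_download_link and the base mirror are hypothetical.

```python
from urllib.parse import urldefrag, urljoin


def normalize_download_link(link: str, base: str) -> str:
    """Make a scraped link absolute and drop any '#...' fragment (hypothetical helper)."""
    # urljoin resolves '//host/path' and '/path' against the base mirror URL,
    # and leaves already-absolute URLs unchanged.
    absolute = urljoin(base, link.strip())
    url, _fragment = urldefrag(absolute)
    return url


print(normalize_download_link("//example.com/paper.pdf#view=FitH", "https://mirror.example"))
# https://example.com/paper.pdf
```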
_download_from_scihub_webpage -> dict
Downloads a file from the SciHub webpage.
Params
- webpage (requests.Response): The response object of the SciHub webpage.
- proxies (dict, optional): The proxies to be used for the request. Defaults to None.
- try_times (int): The number of times to attempt the download.
Returns
- dict: A dictionary containing the title, DOI, and the response object of the download request.
Notes
- This function attempts to download a file from the SciHub webpage and returns the result as a dictionary.
Example
download_result = _download_from_scihub_webpage(webpage, proxies={'http': 'http://proxy.example.com:8080'}, try_times=3)
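The try_times parameter implies a bounded retry loop. A generic sketch of that idea is shown below; retry is a hypothetical helper, not mbapy's code, and returning None after the last failed attempt is an assumption chosen to match the "dict or None" convention used by the public functions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], try_times: int = 3, delay: float = 0.0) -> Optional[T]:
    """Call action up to try_times times; return its result, or None if every attempt fails."""
    for attempt in range(1, try_times + 1):
        try:
            return action()
        except Exception as exc:  # broad on purpose: network errors come in many types
            print(f"attempt {attempt}/{try_times} failed: {exc}")
            if attempt < try_times and delay:
                time.sleep(delay)  # optional back-off between attempts
    return None


state = {"n": 0}


def flaky() -> str:
    state["n"] += 1
    if state["n"] < 3:
        raise ConnectionError("temporary")
    return "ok"


print(retry(flaky, try_times=3))  # two failure messages, then: ok
```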
download_from_scihub_by_doi -> dict or None
Downloads a file from the Sci-Hub database using the DOI.
Params
- doi (str): The DOI of the file to download.
- proxies (dict): A dictionary of proxies to use for the request.
- try_times (int): The number of times to attempt the download.
Returns
- dict or None: A dictionary containing the title, DOI, and the response object of the download request. Returns None if an error occurs.
Raises
- Exception: If the DOI does not exist or if there is an error fetching the file from Sci-Hub.
Example
download_result = download_from_scihub_by_doi('10.123/abc.456', proxies={'http': 'http://proxy.example.com:8080'}, try_times=3)
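One plausible way a DOI lookup can work is to try each known mirror in turn and stop at the first that responds. The sketch below assumes the request URL has the form <mirror>/<doi>, which is a common Sci-Hub convention but is not confirmed by this reference; fetch_by_doi, its fetch argument, and the result keys are hypothetical, and fetch stands in for an HTTP call so the example runs offline.

```python
from typing import Callable, Iterable, Optional


def fetch_by_doi(doi: str, mirrors: Iterable[str],
                 fetch: Callable[[str], bytes]) -> Optional[dict]:
    """Try each mirror in turn; return a result dict, or None if all fail."""
    for mirror in mirrors:
        url = f"{mirror.rstrip('/')}/{doi}"  # assumed URL scheme: <mirror>/<doi>
        try:
            content = fetch(url)
        except Exception:
            continue  # this mirror failed; try the next one
        return {"doi": doi, "url": url, "content": content}
    return None


def fake_fetch(url: str) -> bytes:
    if "bad" in url:
        raise ConnectionError("mirror down")
    return b"%PDF"


print(fetch_by_doi("10.123/abc.456", ["https://bad.example", "https://good.example/"], fake_fetch))
```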
download_from_scihub_by_title -> dict or None
Downloads a document from Sci-Hub by title.
Params
- title (str): The title of the document to be downloaded.
- proxies (dict, optional): A dictionary of proxies to be used for the HTTP request.
- try_times (int): The number of times to attempt the download.
Returns
- dict or None: A dictionary containing the title, DOI, and the response object of the download request. Returns None if an error occurs.
Raises
- Exception: If the document with the given title does not exist on Sci-Hub.
Example
download_result = download_from_scihub_by_title('Sample Paper Title', proxies={'http': 'http://proxy.example.com:8080'}, try_times=3)
download_by_scihub -> dict or None
Downloads a paper from Sci-Hub using its DOI or title.
Params
- dir (str): The directory where the downloaded file will be saved.
- doi (str): The DOI (Digital Object Identifier) of the paper.
- title (str): The title of the document to be downloaded.
- file_full_name (str, optional): The name of the downloaded file, including the file extension (.pdf). Defaults to None.
- use_title_as_name (bool, optional): Whether to use the paper's title as the file name. Defaults to True.
- valid_path_chr (str, optional): The character used to replace invalid characters in the file name. Defaults to '_'.
- try_times (int): The number of times to attempt the download.
Returns
- dict or None: If successful, returns a dictionary containing information about the downloaded paper. If unsuccessful, returns None.
Notes
- If doi is None and the DOI cannot be retrieved from the Sci-Hub webpage, doi is set to the current timestamp in the format %Y%m%d.%H%M%S.
Example
download_result = download_by_scihub(dir='path/to/save', doi='10.123/abc.456', try_times=3)
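The naming rules above (use the title when use_title_as_name is True, replace invalid characters with valid_path_chr, fall back to a %Y%m%d.%H%M%S timestamp when no DOI is available) can be sketched as follows. This is an illustration under assumptions: build_file_name is hypothetical, the set of invalid characters is a conservative guess based on Windows file-name rules, and the sketch skips the Sci-Hub webpage lookup as well as the dir and file_full_name handling.

```python
import re
from datetime import datetime
from typing import Optional

# Characters rejected in Windows file names; an assumed, conservative set.
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def build_file_name(title: Optional[str], doi: Optional[str],
                    use_title_as_name: bool = True,
                    valid_path_chr: str = "_") -> str:
    """Pick a PDF file name from the title, the DOI, or a timestamp (hypothetical helper)."""
    if doi is None:
        # Fallback described in the Notes: a %Y%m%d.%H%M%S timestamp.
        doi = datetime.now().strftime("%Y%m%d.%H%M%S")
    stem = title if (use_title_as_name and title) else doi
    return _INVALID_CHARS.sub(valid_path_chr, stem) + ".pdf"


print(build_file_name("A/B: study?", "10.123/abc.456"))  # A_B_ study_.pdf
print(build_file_name(None, "10.123/abc.456"))           # 10.123_abc.456.pdf
```

Note that a DOI also contains "/", so it needs the same character replacement as a title before it can serve as a file name.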