mbapy.sci_utils.paper_parse
_flatten_pdf_bookmarks -> List[Any]
This function takes a variable number of bookmark lists and returns a flattened list of all bookmarks.
Params
*bookmarks (List[Any])
: A variable number of bookmark lists.
Returns
List[Any]
: A flattened list of all bookmarks.
Notes
None
Example
bookmarks = [
['Chapter 1', 'Section 1.1', 'Section 1.2'],
['Chapter 2', 'Section 2.1', 'Section 2.2'],
['Chapter 3', 'Section 3.1', 'Section 3.2']
]
flattened_bookmarks = _flatten_pdf_bookmarks(*bookmarks)
print(flattened_bookmarks)
# Output: ['Chapter 1', 'Section 1.1', 'Section 1.2', 'Chapter 2', 'Section 2.1', 'Section 2.2', 'Chapter 3', 'Section 3.1', 'Section 3.2']
has_sci_bookmarks
Checks if a PDF document has bookmarks for scientific sections.
Params
pdf_obj
: The PDF object(Being opened!). Defaults to None.pdf_path (str)
: The path to the PDF document. Defaults to None.section_names (list[str])
: A list of section names to check for bookmarks. Defaults to an empty list.
Returns
list[str]
orbool
: list of section names if the PDF has bookmarks, False otherwise.
Notes
None
Example
pdf_path = 'path/to/pdf/document.pdf'
section_names = ['Abstract', 'Introduction', 'Materials', 'Methods', 'Results', 'Discussion', 'References']
result = has_sci_bookmarks(pdf_path, section_names)
print(result)
# Output: ['Abstract', 'Introduction', 'Materials', 'Methods', 'Results', 'Discussion', 'References']
get_sci_bookmarks_from_pdf -> List[str]
Returns a list of section names from a scientific PDF.
Params
- pdf_path (str): The path to the PDF file. Default is None.
- pdf_obj: The PDF object. Default is None.
- section_names (List[str]): A list of section names to search for. If None, all sections include 'Abstract', 'Introduction', 'Materials', 'Methods', 'Results', 'Conclusions, 'Discussion', 'References' will be searched.
Returns
- List[str]: A list of section names found in the PDF.
Notes
None
Example
pdf_path = 'example.pdf'
section_names = ['Abstract', 'Introduction', 'Methods']
result = get_sci_bookmarks_from_pdf(pdf_path, section_names)
print(result)
# Output: ['Abstract', 'Introduction', 'Methods']
get_section_bookmarks -> List[str]
Returns a list of titles of bookmark sections in a PDF.
Params
- pdf_path (str): The path to the PDF file. Defaults to None.
- pdf_obj: The PDF object(Being opened!). Defaults to None.
Returns
- list: A list of titles of bookmark sections in the PDF. Returns None if there are no bookmark sections or if the PDF file does not exist.
Notes
None
Example
pdf_path = 'example.pdf'
result = get_section_bookmarks(pdf_path)
print(result)
# Output: ['Abstract', 'Introduction', 'Methods']
get_english_part_of_bookmarks -> list[str]
Retrieves the English part of the given list of bookmarks.
Params
- bookmarks (list[str]): A list of bookmarks.
Returns
- list[str]: A list containing only the English part of the bookmarks.
Notes
None
Example
bookmarks = ['Introduction', '方法', 'Results', 'Discussion']
result = get_english_part_of_bookmarks(bookmarks)
print(result)
# Output: ['Introduction', 'Results', 'Discussion']
get_section_from_paper -> str
Extracts a section of a science paper by key.
Params
- paper (str): A science paper.
- key (str): One of the sections in the paper. Can be 'Title', 'Authors', 'Abstract', 'Keywords', 'Introduction', 'Materials & Methods', 'Results', 'Discussion', 'References'
- keys (List[str], optional): A list of keys to extract. Defaults to ['Title', 'Authors', 'Abstract', 'Keywords', 'Introduction', 'Materials & Methods', 'Results', 'Discussion', 'References'].
Returns
- str: The extracted section of the paper.
Notes
- The function searches for the specified key in the paper and returns the corresponding section.
- If the key is not found, an error message is returned.
Example
paper = "This is a science paper. It has a Title, Authors, Abstract, Introduction, and References."
section = get_section_from_paper(paper, "Abstract")
print(section)
# Output: "This is the abstract of the paper."
format_paper_from_txt -> dict
Formats a science paper from plain text into a structured dictionary.
Params
- content (str): The content of the paper in plain text.
- struct (List[str], optional): A list of section names in the desired structure. Defaults to ['Title', 'Authors', 'Abstract', 'Keywords', 'Introduction', 'Materials & Methods', 'Results', 'Discussion', 'References'].
Returns
- dict: A dictionary containing the formatted sections of the paper.
Notes
- The function uses the
get_section_from_paper
function to extract each section from the plain text content. - The sections are stored in a dictionary with the section names as keys.
Example
content = "This is a science paper. It has a Title, Authors, Abstract, Introduction, and References."
paper = format_paper_from_txt(content)
print(paper['Abstract'])
# Output: "This is the abstract of the paper."