mbapy.sci_utils.paper_parse

_flatten_pdf_bookmarks -> List[Any]

This function takes a variable number of bookmark lists and returns a flattened list of all bookmarks.

Params

  • *bookmarks (List[Any]): A variable number of bookmark lists.

Returns

  • List[Any]: A flattened list of all bookmarks.

Notes

None

Example

bookmarks = [
    ['Chapter 1', 'Section 1.1', 'Section 1.2'],
    ['Chapter 2', 'Section 2.1', 'Section 2.2'],
    ['Chapter 3', 'Section 3.1', 'Section 3.2']
]
flattened_bookmarks = _flatten_pdf_bookmarks(*bookmarks)
print(flattened_bookmarks)
# Output: ['Chapter 1', 'Section 1.1', 'Section 1.2', 'Chapter 2', 'Section 2.1', 'Section 2.2', 'Chapter 3', 'Section 3.1', 'Section 3.2']
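
The flattening described above can be illustrated with a short recursive sketch. It is not the library's implementation: `flatten_nested` is a hypothetical helper, and it assumes that nested lists are expanded depth-first while every non-list item is kept unchanged.

```python
from typing import Any, List


def flatten_nested(*bookmarks: Any) -> List[Any]:
    """Hypothetical helper: flatten arbitrarily nested lists depth-first."""
    flat: List[Any] = []
    for item in bookmarks:
        if isinstance(item, list):
            # Recurse into a nested list, unpacking its items as arguments.
            flat.extend(flatten_nested(*item))
        else:
            # Anything that is not a list is treated as a single bookmark.
            flat.append(item)
    return flat


nested = [["Chapter 1", ["Section 1.1", "Section 1.2"]], ["Chapter 2"]]
flat = flatten_nested(*nested)
print(flat)
```

Unpacking with `*` mirrors the variadic signature documented above, so a list of lists has to be passed as `flatten_nested(*nested)` rather than `flatten_nested(nested)`.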

has_sci_bookmarks

Checks if a PDF document has bookmarks for scientific sections.

Params

  • pdf_obj: A PDF object that is already open. Defaults to None.
  • pdf_path (str): The path to the PDF document. Defaults to None.
  • section_names (list[str]): A list of section names to check for bookmarks. Defaults to an empty list.

Returns

  • list[str] or bool: The list of section names if the PDF has bookmarks, False otherwise.

Notes

None

Example

pdf_path = 'path/to/pdf/document.pdf'
section_names = ['Abstract', 'Introduction', 'Materials', 'Methods', 'Results', 'Discussion', 'References']
result = has_sci_bookmarks(pdf_path=pdf_path, section_names=section_names)
print(result)
# Output: ['Abstract', 'Introduction', 'Materials', 'Methods', 'Results', 'Discussion', 'References']
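
The check itself can be pictured as matching the requested section names against the bookmark titles of the PDF. The sketch below works on a plain list of titles and does not open a PDF. `match_section_names` is a hypothetical helper, and the case-insensitive substring matching is an assumption, not the library's documented rule.

```python
from typing import List, Union


def match_section_names(
    titles: List[str], section_names: List[str]
) -> Union[List[str], bool]:
    """Hypothetical helper: return the section names that occur in any title."""
    lowered = [title.lower() for title in titles]
    # Assumption: a name matches when it is a substring of a bookmark title.
    found = [
        name
        for name in section_names
        if any(name.lower() in title for title in lowered)
    ]
    # Mirror the documented return type: a list on success, False otherwise.
    return found if found else False


titles = ["1. Introduction", "2. Materials and Methods", "3. Results"]
matched = match_section_names(
    titles, ["Abstract", "Introduction", "Methods", "Results"]
)
print(matched)
```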

get_sci_bookmarks_from_pdf -> List[str]

Returns a list of section names from a scientific PDF.

Params

  • pdf_path (str): The path to the PDF file. Default is None.
  • pdf_obj: The PDF object. Default is None.
  • section_names (List[str]): A list of section names to search for. If None, the default sections 'Abstract', 'Introduction', 'Materials', 'Methods', 'Results', 'Conclusions', 'Discussion', and 'References' are searched.

Returns

  • List[str]: A list of section names found in the PDF.

Notes

None

Example

pdf_path = 'example.pdf'
section_names = ['Abstract', 'Introduction', 'Methods']
result = get_sci_bookmarks_from_pdf(pdf_path=pdf_path, section_names=section_names)
print(result)
# Output: ['Abstract', 'Introduction', 'Methods']

get_section_bookmarks -> List[str]

Returns a list of titles of bookmark sections in a PDF.

Params

  • pdf_path (str): The path to the PDF file. Defaults to None.
  • pdf_obj: A PDF object that is already open. Defaults to None.

Returns

  • List[str] or None: A list of titles of bookmark sections in the PDF, or None if there are no bookmark sections or the PDF file does not exist.

Notes

None

Example

pdf_path = 'example.pdf'
result = get_section_bookmarks(pdf_path)
print(result)
# Output: ['Abstract', 'Introduction', 'Methods']

get_english_part_of_bookmarks -> list[str]

Retrieves the English part of the given list of bookmarks.

Params

  • bookmarks (list[str]): A list of bookmarks.

Returns

  • list[str]: A list containing only the English part of the bookmarks.

Notes

None

Example

bookmarks = ['Introduction', '方法', 'Results', 'Discussion']
result = get_english_part_of_bookmarks(bookmarks)
print(result)
# Output: ['Introduction', 'Results', 'Discussion']
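
One simple way to obtain this kind of filtering is to keep only the bookmarks made up entirely of ASCII characters. This is a sketch under that assumption. `keep_ascii_bookmarks` is a hypothetical helper, and the library may define "English" differently (for example, by extracting the English portion of a mixed-language title).

```python
from typing import List


def keep_ascii_bookmarks(bookmarks: List[str]) -> List[str]:
    """Hypothetical helper: keep bookmarks that contain only ASCII characters."""
    # str.isascii() is available from Python 3.7 onwards.
    return [bookmark for bookmark in bookmarks if bookmark.isascii()]


english = keep_ascii_bookmarks(["Introduction", "方法", "Results", "Discussion"])
print(english)
```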

get_section_from_paper -> str

Extracts a section of a science paper by key.

Params

  • paper (str): A science paper.
  • key (str): The section to extract. One of 'Title', 'Authors', 'Abstract', 'Keywords', 'Introduction', 'Materials & Methods', 'Results', 'Discussion', 'References'.
  • keys (List[str], optional): A list of keys to extract. Defaults to ['Title', 'Authors', 'Abstract', 'Keywords', 'Introduction', 'Materials & Methods', 'Results', 'Discussion', 'References'].

Returns

  • str: The extracted section of the paper.

Notes

  • The function searches for the specified key in the paper and returns the corresponding section.
  • If the key is not found, an error message is returned.

Example

paper = "This is a science paper. It has a Title, Authors, Abstract, Introduction, and References."
section = get_section_from_paper(paper, "Abstract")
print(section)
# Output: the text of the 'Abstract' section, or an error message if the key is not found
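
The key-based lookup described in the notes can be sketched with plain string searching. This is an illustration only: `extract_section`, `DEFAULT_KEYS`, and the wording of the error message are hypothetical, and the sketch assumes that each section heading appears verbatim in the text.

```python
from typing import Sequence

# Hypothetical, shortened key list used only for this sketch.
DEFAULT_KEYS = ("Abstract", "Introduction", "Results", "References")


def extract_section(
    paper: str, key: str, keys: Sequence[str] = DEFAULT_KEYS
) -> str:
    """Hypothetical helper: return the text between `key` and the next heading."""
    start = paper.find(key)
    if start == -1:
        # Assumed message; the real function's error text is not documented here.
        return f"Section '{key}' not found"
    body_start = start + len(key)
    # The section ends where the next known heading begins, or at the end.
    positions = [paper.find(k, body_start) for k in keys if k != key]
    later = [pos for pos in positions if pos != -1]
    end = min(later) if later else len(paper)
    return paper[body_start:end].strip()


text = "Abstract\nWe study X.\nIntroduction\nX matters.\nReferences\n[1] A."
abstract = extract_section(text, "Abstract")
print(abstract)
```

Plain substring search also matches a heading word that merely occurs inside running text, so a real implementation would likely need a stricter rule for recognising headings.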

format_paper_from_txt -> dict

Formats a science paper from plain text into a structured dictionary.

Params

  • content (str): The content of the paper in plain text.
  • struct (List[str], optional): A list of section names in the desired structure. Defaults to ['Title', 'Authors', 'Abstract', 'Keywords', 'Introduction', 'Materials & Methods', 'Results', 'Discussion', 'References'].

Returns

  • dict: A dictionary containing the formatted sections of the paper.

Notes

  • The function uses the get_section_from_paper function to extract each section from the plain text content.
  • The sections are stored in a dictionary with the section names as keys.

Example

content = "This is a science paper. It has a Title, Authors, Abstract, Introduction, and References."
paper = format_paper_from_txt(content)
print(paper['Abstract'])
# Output: the text extracted for the 'Abstract' section
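
The resulting structure, a dictionary keyed by section name, can be sketched as follows. The library is described as calling get_section_from_paper for each section. This sketch swaps in a single regular-expression split instead, assumes each heading sits on its own line, and uses a hypothetical helper name, `split_into_sections`.

```python
import re
from typing import Dict, Sequence


def split_into_sections(content: str, struct: Sequence[str]) -> Dict[str, str]:
    """Hypothetical helper: split plain text into {heading: body} pairs."""
    if not struct:
        return {}
    # Match any known heading that occupies a whole line.
    names = "|".join(re.escape(name) for name in struct)
    pattern = "^(" + names + r")[ \t]*$"
    parts = re.split(pattern, content, flags=re.MULTILINE)
    # parts is [preamble, heading1, body1, heading2, body2, ...].
    return {head: body.strip() for head, body in zip(parts[1::2], parts[2::2])}


content = "Abstract\nWe study X.\nIntroduction\nX matters.\nReferences\n[1] A."
sections = split_into_sections(content, ["Abstract", "Introduction", "References"])
print(sections["Abstract"])
```

Sections listed in `struct` but absent from the text are simply missing from the dictionary in this sketch. How the library represents missing sections is not stated above.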