mbapy.file

This module provides utility functions for file operations, including reading and writing files, working with different file formats, and handling file paths.

Functions

get_paths_with_extension -> List[str]

Returns a list of file paths within a given folder that have a specified extension.

Params

  • folder_path (str): The path of the folder to search for files.
  • file_extensions (List[str]): A list of file extensions to filter the search by.

Returns

  • List[str]: A list of file paths that match the specified file extensions.

Notes

None

Example

folder_path = '/path/to/folder'
file_extensions = ['.txt', '.csv']
file_paths = get_paths_with_extension(folder_path, file_extensions)
print(file_paths)

extract_files_from_dir

Move all files in subdirectories to the root directory and add the subdirectory name as a prefix to the file name.

Params

  • root (str): The root directory path.
  • file_extensions (list[str]): specific file types string (without '.'), if None, means all types.
  • extract_sub_dir (bool, optional): Whether to recursively extract files from subdirectories. If set to False, only files in the immediate subdirectories will be extracted. Defaults to True.
  • join_str (str): string for link prefix and the file name.

Returns

None

Notes

None

Example

root = '/path/to/root'
file_extensions = ['.txt', '.csv']
extract_files_from_dir(root, file_extensions, extract_sub_dir=True, join_str='_')

replace_invalid_path_chr -> str

Replaces any invalid characters in a given path with a specified valid character.

Params

  • path (str): The path string to be checked for invalid characters.
  • valid_chrs (str, optional): The valid characters that will replace any invalid characters in the path. Defaults to '_'.

Returns

  • str: The path string with all invalid characters replaced by the valid character.

Notes

None

Example

path = '/path/with/invalid?characters'
valid_path = replace_invalid_path_chr(path, valid_chrs='_')
print(valid_path)

get_valid_file_path -> str

Returns a valid file path by replacing any invalid characters in the given path with a specified valid character and truncating the path to a specified length.

Params

  • path (str): The path string to be checked for invalid characters.
  • valid_chrs (str, optional): The valid characters that will replace any invalid characters in the path. Defaults to '_'.
  • valid_len (int, optional): The maximum length of the valid file path. Defaults to 250.

Returns

  • str: The valid file path.

Notes

None

Example

path = '/path/with/invalid?characters'
valid_path = get_valid_file_path(path, valid_chrs='_', valid_len=100)
print(valid_path)

opts_file

A function that reads or writes data to a file based on the provided options.

Params

  • path (str): The path to the file.
  • mode (str, optional): The mode in which the file should be opened. Defaults to 'r'.
  • encoding (str, optional): The encoding of the file. Defaults to 'utf-8'.
  • way (str, optional): The way in which the data should be read or written. Defaults to 'lines'.
  • data (Any, optional): The data to be written to the file. Only applicable in write mode. Defaults to None.

Returns

  • list or str or dict or None: The data read from the file, or None if the file was opened in write mode and no data was provided.

Notes

None

Example

path = '/path/to/file.txt'
data = ['line 1', 'line 2', 'line 3']
read_data = opts_file(path, mode='w', data=data)
print(read_data)

read_bits -> bytes

Reads a file in binary mode and returns the content as bytes.

Params

  • path (str): The path to the file.

Returns

  • bytes: The content of the file as bytes.

Notes

None

Example

path = '/path/to/file.bin'
content = read_bits(path)
print(content)

read_text -> str or List[str]

Reads a file in text mode and returns the content as a string or a list of lines.

Params

  • path (str): The path to the file.
  • decode (str, optional): The encoding of the file. Defaults to 'utf-8'.
  • way (str, optional): The way in which the data should be read. Defaults to 'lines'.

Returns

  • str or List[str]: The content of the file as a string or a list of lines.

Notes

None

Example

path = '/path/to/file.txt'
content = read_text(path, decode='utf-8', way='lines')
print(content)

detect_byte_coding(bits:bytes) -> str

Detects the byte coding of a given byte array.

Parameters:
- bits (bytes): The byte array to be analyzed.

Returns:
- str: The detected byte coding of the input sequence.

Example:

detect_byte_coding(b'\xe4\xb8\xad\xe6\x96\x87')

decode_bits_to_str(bits:bytes) -> str

Decodes a bytes object to a string using either GB2312 or utf-8 encoding.

Parameters:
- bits (bytes): The bytes object to decode.

Returns:
- str: The decoded string.

Example:

decode_bits_to_str(b'\xe4\xb8\xad\xe6\x96\x87')

save_json(path:str, obj, encoding:str = 'utf-8', forceUpdate = True) -> None

Saves an object as a JSON file at the specified path.

Parameters:
- path (str): The path where the JSON file will be saved.
- obj: The object to be saved as JSON.
- encoding (str): The encoding of the JSON file. Default is 'utf-8'.
- forceUpdate (bool): Determines whether to overwrite an existing file at the specified path. Default is True.

Returns:
- None

Example:

data = {'name': 'John', 'age': 30}
save_json('data.json', data)

read_json(path:str, encoding:str = 'utf-8', invalidPathReturn = None) -> Union[dict, Any]

Reads a JSON file from the given path and returns the parsed JSON data.

Parameters:
- path (str): The path to the JSON file.
- encoding (str, optional): The encoding of the file. Defaults to 'utf-8'.
- invalidPathReturn (any, optional): The value to return if the path is invalid. Defaults to None.

Returns:
- dict: The parsed JSON data.
- invalidPathReturn (any): The value passed as invalidPathReturn if the path is invalid.

Example:

read_json('data.json')

save_excel(path:str, obj:List[List[str]], columns:List[str], encoding:str = 'utf-8', forceUpdate = True) -> bool

Save a list of lists as an Excel file.

Parameters:
- path (str): The path where the Excel file will be saved.
- obj (List[List[str]]): The list of lists to be saved as an Excel file.
- columns (List[str]): The column names for the Excel file.
- encoding (str, optional): The encoding of the Excel file. Defaults to 'utf-8'.
- forceUpdate (bool, optional): If True, the file will be saved even if it already exists. Defaults to True.

Returns:
- bool: True if the file was successfully saved, False otherwise.

Example:

data = [['Name', 'Age'], ['John', '30'], ['Jane', '25']]
columns = ['Name', 'Age']
save_excel('data.xlsx', data, columns)

read_excel(path:str, sheet_name:str = None, ignore_head:bool = True, ignore_first_col:bool = True, invalid_path_return = None) -> Union[pandas.DataFrame, Any]

Reads an Excel file and returns a pandas DataFrame.

Parameters:
- path (str): The path to the Excel file.
- sheet_name (str, optional): The name of the sheet to read. Defaults to None.
- ignore_head (bool, optional): Whether to ignore the first row (header) of the sheet. Defaults to True.
- ignore_first_col (bool, optional): Whether to ignore the first column of the sheet. Defaults to True.
- invalid_path_return (Any, optional): The value to return if the path is invalid. Defaults to None.

Returns:
- pandas.DataFrame: The DataFrame containing the data from the Excel file.
- invalid_path_return (Any): The value specified if the path is invalid.

Example:

read_excel('data.xlsx')

write_sheets(path:str, sheets:Dict[str, pd.DataFrame]) -> None

Write multiple sheets to an Excel file.

Parameters:
- path (str): The path to the Excel file.
- sheets (Dict[str, pd.DataFrame]): A dictionary mapping sheet names to dataframes.

Returns:
- None

Example:

data1 = pd.DataFrame({'Name': ['John', 'Jane'], 'Age': [30, 25]})
data2 = pd.DataFrame({'City': ['New York', 'Los Angeles'], 'Country': ['USA', 'USA']})
sheets = {'Sheet1': data1, 'Sheet2': data2}
write_sheets('data.xlsx', sheets)

update_excel(path:str, sheets:Dict[str, pd.DataFrame] = None) -> Union[Dict[str, pd.DataFrame], None]

Updates an Excel file with the given path by adding or modifying sheets.

Parameters:
- path (str): The path of the Excel file.
- sheets (Dict[str, pd.DataFrame], optional): A dictionary of sheets to add or modify. The keys are sheet names and the values are pandas DataFrame objects. Defaults to None.

Returns:
- Union[Dict[str, pd.DataFrame], None]: If the Excel file exists and sheets is None, returns a dictionary containing all the sheets in the Excel file. Otherwise, returns None.

Raises:
- None

Example:

data1 = pd.DataFrame({'Name': ['John', 'Jane'], 'Age': [30, 25]})
data2 = pd.DataFrame({'City': ['New York', 'Los Angeles'], 'Country': ['USA', 'USA']})
sheets = {'Sheet1': data1, 'Sheet2': data2}
update_excel('data.xlsx', sheets)

convert_pdf_to_txt(path: str, backend = 'PyPDF2') -> str

Convert a PDF file to a text file.

Parameters:
- path: The path to the PDF file.
- backend: The backend library to use for PDF conversion. Defaults to 'PyPDF2'.

Returns:
- The extracted text from the PDF file as a string.

Raises:
- NotImplementedError: If the specified backend is not supported.

Example:

convert_pdf_to_txt('document.pdf')

is_jsonable -> bool

This function checks if the given data is JSON serializable.

Params

  • data (any): The data to be checked.

Returns

  • bool: True if the data is JSON serializable, False otherwise.

Notes

  • The function checks if the data is of type str, int, float, bool, or None. These types are JSON serializable.
  • If the data is a mapping (e.g. dict), the function recursively checks if all values in the mapping are JSON serializable.
  • If the data is a sequence (e.g. list, tuple), the function recursively checks if all items in the sequence are JSON serializable.
  • If the data is of any other type, it is not JSON serializable.

Example

data1 = "Hello"
print(is_jsonable(data1))  # Output: True

data2 = {"name": "John", "age": 30}
print(is_jsonable(data2))  # Output: True

data3 = [1, 2, 3, {"name": "John"}]
print(is_jsonable(data3))  # Output: True

data4 = {"name": "John", "age": datetime.datetime.now()}
print(is_jsonable(data4))  # Output: False

convert_pdf_to_txt -> str

Convert a PDF file to a text file.

Params

  • path: The path to the PDF file.
  • backend: The backend library to use for PDF conversion.
    • 'PyPDF2' is the default.
    • 'pdfminer'.

Returns

The extracted text from the PDF file as a string.

Raises

  • NotImplementedError: If the specified backend is not supported.

Example

text = convert_pdf_to_txt('path/to/pdf/file.pdf')
print(text)
text = convert_pdf_to_txt('path/to/pdf/file.pdf', backend='pdfminer')
print(text)