kitcar_ml.utils.data package

Subpackages

Submodules

kitcar_ml.utils.data.analyse_bbox_dataset module

Classes:

AnalyseBBoxDataset(attributes, classes, ...)

class AnalyseBBoxDataset(attributes: Optional[Sequence[str]] = None, classes: Dict[int, str] = <factory>, labels: Dict[str, List[Sequence[str]]] = <factory>, _base_path: Optional[str] = None)[source]

Bases: BBoxDataset

Methods:

`statistical_height`([class_names])	Get the average and standard deviation of the height of all bounding boxes.
`statistical_width`([class_names])	Get the average and standard deviation of the width of all bounding boxes.
`statistical_aspect_ratio`([class_names])	Get the average and standard deviation of the aspect ratio of all bounding boxes.
`bboxes_by_class`([class_names])	List bounding boxes.
`draw_class_distribution`([output_path, show])	Draw class distribution.
`scatter_plot`([class_names, output_path, show])	Draw scatter plot of bounding box center points.
`heatmap_plot`([class_names, output_path, show])	Draw heatmap of bounding boxes.
`basic_info`()	Get some basic information about the dataset.
`report`([output_folder])	Collect information in a report.

Attributes:

img_size

Get the image size.

statistical_height(class_names: Optional[Union[str, List[str]]] = None) → Tuple[float, float][source]

Get the average and standard deviation of the height of all bounding boxes.

Parameters: class_names – Filter bounding boxes by class names. If None, use all bounding boxes.

statistical_width(class_names: Optional[Union[str, List[str]]] = None) → Tuple[float, float][source]

Get the average and standard deviation of the width of all bounding boxes.

Parameters: class_names – Filter bounding boxes by class names. If None, use all bounding boxes.

statistical_aspect_ratio(class_names: Optional[Union[str, List[str]]] = None) → Tuple[float, float][source]

Get the average and standard deviation of the aspect ratio of all bounding boxes.

Parameters: class_names – Filter bounding boxes by class names. If None, use all bounding boxes.
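
The three statistical methods above each return a (mean, standard deviation) pair. A minimal sketch of that kind of computation follows. It assumes boxes are (x1, y1, x2, y2) tuples, that the aspect ratio is width divided by height, and that the population standard deviation is meant; none of this is confirmed by the documentation, and `mean_and_std` is a hypothetical helper, not part of kitcar_ml.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def mean_and_std(values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, population standard deviation) of the values."""
    return fmean(values), pstdev(values)


boxes: List[Box] = [(0, 0, 10, 20), (5, 5, 25, 45), (0, 0, 30, 30)]

heights = [y2 - y1 for _, y1, _, y2 in boxes]
widths = [x2 - x1 for x1, _, x2, _ in boxes]
# Assumption: aspect ratio is width / height.
aspect_ratios = [w / h for w, h in zip(widths, heights)]

height_stats = mean_and_std(heights)
width_stats = mean_and_std(widths)
aspect_stats = mean_and_std(aspect_ratios)
```

Filtering by class name would simply restrict `boxes` to the matching classes before the statistics are taken.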

property img_size: Tuple[int, int]

Get the image size.

Assumes that all images have the same size.

bboxes_by_class(class_names: Optional[Union[str, List[str]]] = None) → List[BoundingBox][source]

List bounding boxes.

Parameters: class_names – Filter bounding boxes by class names. If None, use all bounding boxes.

draw_class_distribution(output_path: Optional[str] = None, show: bool = False)[source]

Draw class distribution.

Parameters

output_path – Path for output figure
show – Show the resulting diagram

scatter_plot(class_names: Optional[Union[str, List[str]]] = None, output_path: Optional[str] = None, show: bool = False)[source]

Draw scatter plot of bounding box center points.

Parameters

class_names – Filter bounding boxes by class names. If None, use all bounding boxes.
output_path – Path for output figure
show – Show the resulting diagram

heatmap_plot(class_names: Optional[Union[str, List[str]]] = None, output_path: Optional[str] = None, show: bool = False)[source]

Draw heatmap of bounding boxes.

Parameters

class_names – Filter bounding boxes by class names. If None, use all bounding boxes.
output_path – Path for output figure
show – Show the resulting diagram
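
As an illustration of the data behind a bounding box heatmap, the sketch below counts how many boxes cover each pixel of an image. It is not the kitcar_ml implementation; the (x1, y1, x2, y2) box format with exclusive right and bottom edges and the helper name `coverage_grid` are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2), right/bottom edge exclusive.
Box = Tuple[int, int, int, int]


def coverage_grid(boxes: List[Box], width: int, height: int) -> List[List[int]]:
    """Count, for every pixel, how many boxes cover it."""
    grid = [[0] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        # Clip each box to the image before accumulating.
        for y in range(max(y1, 0), min(y2, height)):
            for x in range(max(x1, 0), min(x2, width)):
                grid[y][x] += 1
    return grid


grid = coverage_grid([(0, 0, 2, 2), (1, 1, 3, 3)], width=4, height=3)
```

A grid like this can be handed to any plotting library to be rendered as a color map.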

basic_info() → str[source]

Get some basic information about the dataset.

Returns: Information about the dataset: path, number of classes, number of bounding boxes, number of images, and image size.
Return type: str

report(output_folder: Optional[str] = 'analysis')[source]

Collect information in a report.

Parameters: output_folder – Path to store the report

classes: Dict[int, str]: Description of what the class ids represent.

labels: Dict[str, List[Sequence[str]]]: Collection of all labels structured as a dictionary.

kitcar_ml.utils.data.bbox_dataset module

Classes:

BBoxDataset(attributes, classes, ...)

class BBoxDataset(attributes: Optional[Sequence[str]] = None, classes: Dict[int, str] = <factory>, labels: Dict[str, List[Sequence[str]]] = <factory>, _base_path: Optional[str] = None)[source]

Bases: LabeledDataset

Methods:

prune_small_boxes(min_area)

Keep only boxes whose area is at least min_area.

Attributes:

prune_small_boxes(min_area: int)[source]: Keep only boxes whose area is at least min_area.
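
The filtering idea can be sketched as follows. This is only an illustration: the (x1, y1, x2, y2) box format, the inclusive threshold, and keeping images whose boxes were all removed are assumptions, and `prune_small` is a hypothetical stand-in rather than the method itself.

```python
from typing import Dict, List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[int, int, int, int]


def prune_small(labels: Dict[str, List[Box]], min_area: int) -> Dict[str, List[Box]]:
    """Return a copy of labels without boxes smaller than min_area."""
    pruned = {}
    for key, boxes in labels.items():
        # Assumption: a box with exactly min_area pixels is kept.
        pruned[key] = [
            b for b in boxes if (b[2] - b[0]) * (b[3] - b[1]) >= min_area
        ]
    return pruned


labels = {"img1": [(0, 0, 10, 10), (0, 0, 2, 2)], "img2": [(5, 5, 6, 6)]}
pruned = prune_small(labels, min_area=5)
```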

classes: Dict[int, str]: Description of what the class ids represent.

labels: Dict[str, List[Sequence[str]]]: Collection of all labels structured as a dictionary.

kitcar_ml.utils.data.image_folder module

A modified image folder class.

We modify the official PyTorch image folder (https://github.com/pytorch/vision/blob/master/torchvision/datasets/folder.py) so that this class can load images from both the current directory and its subdirectories.

Functions:

`is_image_file`(filename)	Check if a file is an image.
`find_images`(dir[, max_dataset_size])	Recursively search for images in the given directory.

is_image_file(filename)[source]

Check if a file is an image.

Parameters: filename – the file name to check

find_images(dir: str, max_dataset_size: Optional[int] = None) → List[str][source]

Recursively search for images in the given directory.

Parameters

dir – the directory of the dataset
max_dataset_size – the maximum number of images to load; None means no limit
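
A recursive image search of this kind can be sketched with `os.walk`. The extension list and the names `is_image` and `find_image_paths` are assumptions for this example; the actual module may recognize a different set of extensions.

```python
import os
import tempfile
from typing import List, Optional

# Assumed set of image extensions; the real module may use a different list.
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def is_image(filename: str) -> bool:
    """Check the file extension, ignoring case."""
    return filename.lower().endswith(IMG_EXTENSIONS)


def find_image_paths(root: str, max_size: Optional[int] = None) -> List[str]:
    """Collect image paths below root, optionally capped at max_size."""
    paths = []
    for dirpath, _, filenames in sorted(os.walk(root)):
        for name in sorted(filenames):
            if is_image(name):
                paths.append(os.path.join(dirpath, name))
    return paths if max_size is None else paths[:max_size]


# Small demonstration on a temporary directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.png", "notes.txt", os.path.join("sub", "b.JPG")):
        open(os.path.join(root, rel), "w").close()
    found = [os.path.relpath(p, root) for p in find_image_paths(root)]
    limited = find_image_paths(root, max_size=1)
```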

kitcar_ml.utils.data.import_kitcar_xml_labels module

Functions:

import_kitcar_xml_labels(input_dir, output_dir)

Convert an XML-based dataset to our LabeledDataset format.

import_kitcar_xml_labels(input_dir: str, output_dir: str)[source]

Convert an XML-based dataset to our LabeledDataset format.

See: https://doc.kitcar-team.de/kitcar-machine-learning/tutorials/datasets.html

kitcar_ml.utils.data.import_label_studio_labels module

Functions:

`download_image_from_s3`(s3, bucket, filepath, ...)
`import_label_studio_labels`(annotations_dir, ...)	Convert a Label Studio JSON-based dataset to our LabeledDataset format.

download_image_from_s3(s3, bucket: str, filepath: str, output_path: str) → PIL.Image[source]

import_label_studio_labels(annotations_dir: str, output_dir: str, force_download: bool = False)[source]

Convert a Label Studio JSON-based dataset to our LabeledDataset format.

See: https://doc.kitcar-team.de/kitcar-machine-learning/tutorials/datasets.html

kitcar_ml.utils.data.labeled_dataset module

Classes:

LabeledDataset(attributes, classes, ...)

Dataset of images with labels.

class LabeledDataset(attributes: Optional[Sequence[str]] = None, classes: Dict[int, str] = <factory>, labels: Dict[str, List[Sequence[str]]] = <factory>, _base_path: Optional[str] = None)[source]

Bases: Dataset, SaveOptions, InitOptions

Dataset of images with labels.

Attributes:

`attributes`	Description of what each label means.
`classes`	Description of what the class ids represent.
`labels`	Collection of all labels structured as a dictionary.
`_base_path`	Path to the root of the dataset.
`_cache`	Helper variable for dataset caching.
`available_files`

Methods:

`cache`([disable_tqdm])	Cache this dataset.
`filter_labels`()	Remove labels that have no corresponding image.
`append_label`(key, label)	Add a new label to the dataset.
`save_as_yaml`(file_path)	Save the dataset to a yaml file.
`make_ids_continuous`()	Reformat dataset to have continuous class ids.
`replace_id`(search_id, replace_id)	Replace id (search) with another id (replace) in the whole dataset.
`split`(fractions[, shuffle])	Split this dataset into multiple datasets.
`from_yaml`(file_path)	Load a Labeled Dataset from a yaml file.
`split_file`(file, parts[, shuffle])	Split a dataset file into multiple datasets.
`filter_file`(file)	Filter broken file dependencies of a yaml file.
`adjust_classes`(other_classes[, class_attribute])	Change/add classes and class ids to the dataset.
`prune_classes`(classes)	Keep only labels of given classes.
`adjust_attributes`(other_attributes)	Change the order/add attributes to the dataset.
`merge_datasets`(*datasets)	Merge multiple labeled datasets together into one.

attributes: Optional[Sequence[str]] = None

Description of what each label means.

Similar to headers in a table.

classes: Dict[int, str]: Description of what the class ids represent.

labels: Dict[str, List[Sequence[str]]]: Collection of all labels structured as a dictionary.

_base_path: Optional[str] = None

Path to the root of the dataset.

Only needs to be set if the dataset is used to load data.

_cache = None: Helper variable for dataset caching.

property available_files: List[str]

cache(disable_tqdm: bool = False)[source]: Cache this dataset.

filter_labels()[source]: Remove labels that have no corresponding image.

append_label(key: str, label: List[Sequence[str]])[source]

Add a new label to the dataset.

A single image (or any abstract object) can have many labels.

save_as_yaml(file_path: str)[source]

Save the dataset to a YAML file. Overrides the default method to temporarily remove _base_path so that it is not written to the YAML file.

Parameters: file_path – The output file.

make_ids_continuous()[source]: Reformat dataset to have continuous class ids.

replace_id(search_id: int, replace_id: int)[source]

Replace id (search) with another id (replace) in the whole dataset.

Parameters

search_id – The id being searched for.
replace_id – The id that replaces every occurrence of search_id.
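
The relationship between replacing a single id and making all ids continuous can be sketched as below. The sketch assumes the class id is stored in the first column of every label and that continuous ids start at 0 in ascending order of the old ids; both are assumptions, and the function names are hypothetical stand-ins for the methods.

```python
from typing import Dict, List

Labels = Dict[str, List[list]]


def replace_class_id(labels: Labels, search_id: int, new_id: int, id_index: int = 0) -> None:
    """Replace search_id with new_id in every label, in place."""
    for rows in labels.values():
        for row in rows:
            if row[id_index] == search_id:
                row[id_index] = new_id


def make_continuous(classes: Dict[int, str], labels: Labels, id_index: int = 0) -> Dict[int, str]:
    """Renumber class ids to 0..n-1 and update the labels in place."""
    mapping = {old: new for new, old in enumerate(sorted(classes))}
    # Apply the whole mapping in one pass so that ids cannot collide.
    for rows in labels.values():
        for row in rows:
            row[id_index] = mapping[row[id_index]]
    return {mapping[old]: name for old, name in classes.items()}


classes = {0: "car", 3: "sign", 7: "pedestrian"}
labels = {"img1": [[3, 10, 20], [7, 1, 2]], "img2": [[0, 5, 5]]}

new_classes = make_continuous(classes, labels)
labels_after_renumbering = {k: [list(r) for r in v] for k, v in labels.items()}

replace_class_id(labels, search_id=2, new_id=1)
```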

split(fractions: List[float], shuffle: bool = True) → List[LabeledDataset][source]: Split this dataset into multiple datasets.
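
A fraction-based split can be sketched on a plain labels dictionary as follows. The rounding of cumulative fractions and the `seed` parameter are choices made for this example only; how the real method rounds or seeds its shuffle is not documented here, and `split_labels` is a hypothetical helper.

```python
import random
from typing import Dict, List, Sequence


def split_labels(
    labels: Dict[str, list],
    fractions: Sequence[float],
    shuffle: bool = True,
    seed: int = 0,  # hypothetical parameter, for a reproducible example
) -> List[Dict[str, list]]:
    """Split labels by key into consecutive parts sized by fractions."""
    keys = list(labels)
    if shuffle:
        random.Random(seed).shuffle(keys)
    parts = []
    start = 0
    cumulative = 0.0
    for fraction in fractions:
        cumulative += fraction
        # Round the cumulative boundary so the parts never overlap.
        end = round(cumulative * len(keys))
        parts.append({k: labels[k] for k in keys[start:end]})
        start = end
    return parts


labels = {f"img{i}": [] for i in range(10)}
train, val = split_labels(labels, [0.8, 0.2])
first_half, second_half = split_labels(labels, [0.5, 0.5], shuffle=False)
```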

classmethod from_yaml(file_path: str) → LabeledDataset[source]

Load a Labeled Dataset from a yaml file.

Parameters: file_path – The path to the yaml file to load

classmethod split_file(file: str, parts: Dict[str, float], shuffle: bool = True) → List[LabeledDataset][source]

Split a dataset file into multiple datasets.

Parameters

file – The path to the yaml file which gets split
parts – A dict of names and fractions
shuffle – Split the labels randomly

classmethod filter_file(file: str) → LabeledDataset[source]

Filter broken file dependencies of a yaml file.

Parameters: file – The path to the yaml file to filter

adjust_classes(other_classes: Dict[int, str], class_attribute='class_id')[source]

Change/add classes and class ids to the dataset.

Parameters

other_classes – New mapping of classes.
class_attribute – Optional name of the class attribute.

prune_classes(classes: List[str])[source]: Keep only labels of given classes.

adjust_attributes(other_attributes: List[str])[source]

Change the order/add attributes to the dataset.

Example

>>> from kitcar_ml.utils.data.labeled_dataset import LabeledDataset
>>> ds = LabeledDataset()
>>> ds.attributes = ["class_name", "x", "y"]
>>> ds.append_label("img1", ["cat", 2, 4])
>>> ds.append_label("img2", ["dog", 4, 6])
>>> ds
LabeledDataset(attributes=['class_name', 'x', 'y'], classes={}, labels={'img1': [['cat', 2, 4]], 'img2': [['dog', 4, 6]]}, _base_path=None)
>>> ds.adjust_attributes(["class_name","color", "x", "y"])
>>> ds
LabeledDataset(attributes=['class_name', 'color', 'x', 'y'], classes={}, labels={'img1': [['cat', None, 2, 4]], 'img2': [['dog', None, 4, 6]]}, _base_path=None)

Parameters: other_attributes – New list of attributes.
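
The reordering shown in the example above can be reproduced on plain lists. This sketch only mirrors the documented output, where missing attributes are filled with None; dropping attributes that are absent from the new list is an assumption, and `reorder_label` is a hypothetical helper.

```python
from typing import Any, List, Sequence


def reorder_label(
    label: Sequence[Any],
    old_attributes: Sequence[str],
    new_attributes: Sequence[str],
) -> List[Any]:
    """Rearrange one label to the new attribute order, filling gaps with None."""
    by_name = dict(zip(old_attributes, label))
    return [by_name.get(name) for name in new_attributes]


old = ["class_name", "x", "y"]
new = ["class_name", "color", "x", "y"]
labels = {"img1": [["cat", 2, 4]], "img2": [["dog", 4, 6]]}

adjusted = {
    key: [reorder_label(row, old, new) for row in rows]
    for key, rows in labels.items()
}
```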

classmethod merge_datasets(*datasets: LabeledDataset) → LabeledDataset[source]

Merge multiple labeled datasets together into one.

If you have folders with images and multiple yaml files declaring the datasets, this method can merge them into a single dataset.

This doesn’t copy or move any images!

Parameters: datasets – Sequence of datasets that should be merged together.
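
One way to picture a merge that leaves the images where they are is to re-key every label relative to a common base directory. The sketch below does that for (base path, labels) pairs. It is an assumption about how such a merge could work, not a description of the actual implementation, and `merge_label_dicts` is a hypothetical helper.

```python
import os
from typing import Dict, List, Tuple


def merge_label_dicts(
    datasets: List[Tuple[str, Dict[str, list]]]
) -> Tuple[str, Dict[str, list]]:
    """Merge (base_path, labels) pairs under their common base directory."""
    base = os.path.commonpath([path for path, _ in datasets])
    merged: Dict[str, list] = {}
    for path, labels in datasets:
        # Prefix each key with the dataset's path relative to the new base.
        prefix = os.path.relpath(path, base)
        for key, rows in labels.items():
            new_key = os.path.normpath(os.path.join(prefix, key))
            merged.setdefault(new_key, []).extend(rows)
    return base, merged


datasets = [
    (os.path.join("data", "a"), {"img1.png": [[0, 1, 2]]}),
    (os.path.join("data", "b"), {"img1.png": [[1, 3, 4]]}),
]
base, merged = merge_label_dicts(datasets)
```

Only the keys change; no file on disk is touched, which matches the note that images are neither copied nor moved.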

kitcar_ml.utils.data.unlabeled_dataset module

Classes:

UnlabeledDataset(folder_path, ...)

This dataset class can load a set of unlabeled data.

class UnlabeledDataset(folder_path: Union[str, List[str]] = <factory>)[source]

Bases: Dataset

This dataset class can load a set of unlabeled data.

Attributes:

folder_path

Path[s] to folders that contain the data.

Methods:

load_file_paths()

File paths to all data.

folder_path: Union[str, List[str]]: Path[s] to folders that contain the data.

load_file_paths() → List[str][source]: File paths to all data.

kitcar_ml.utils.data.visual_labeled_dataset module

Classes:

VisualLabeledDataset(attributes, classes, ...)

Utilities for visualizing a labeled dataset.

class VisualLabeledDataset(attributes: Optional[Sequence[str]] = None, classes: Dict[int, str] = <factory>, labels: Dict[str, List[Sequence[str]]] = <factory>, _base_path: Optional[str] = None)[source]

Bases: LabeledDataset

Utilities for visualizing a labeled dataset.

Methods:

create_debug_images([output_dir, sample_size])

Add rectangles for all labels and images in dataset.

Attributes:

create_debug_images(output_dir: str = 'debug', sample_size: Optional[int] = None)[source]: Add rectangles for all labels and images in dataset.

classes: Dict[int, str]: Description of what the class ids represent.

labels: Dict[str, List[Sequence[str]]]: Collection of all labels structured as a dictionary.
