kitcar_ml.utils.data package

Subpackages

Submodules

kitcar_ml.utils.data.analyse_bbox_dataset module

Classes:

AnalyseBBoxDataset(attributes, classes, ...)

class AnalyseBBoxDataset(attributes: Optional[Sequence[str]] = None, classes: Dict[int, str] = <factory>, labels: Dict[str, List[Sequence[str]]] = <factory>, _base_path: Optional[str] = None)

Bases: BBoxDataset

Methods:

statistical_height([class_names])

Get the average and standard deviation of the heights of all bounding boxes.

statistical_width([class_names])

Get the average and standard deviation of the widths of all bounding boxes.

statistical_aspect_ratio([class_names])

Get the average and standard deviation of the aspect ratios of all bounding boxes.

bboxes_by_class([class_names])

List bounding boxes.

draw_class_distribution([output_path, show])

Draw class distribution.

scatter_plot([class_names, output_path, show])

Draw scatter plot of bounding box center points.

heatmap_plot([class_names, output_path, show])

Draw heatmap of bounding boxes.

basic_info()

Get some basic information about the dataset.

report([output_folder])

Collect information in a report.

Attributes:

img_size

Get the image size.

statistical_height(class_names: Optional[Union[str, List[str]]] = None) → Tuple[float, float]

Get the average and standard deviation of the heights of all bounding boxes.

Parameters

class_names – Filter bounding boxes by class names. If None, use all bounding boxes.

statistical_width(class_names: Optional[Union[str, List[str]]] = None) → Tuple[float, float]

Get the average and standard deviation of the widths of all bounding boxes.

Parameters

class_names – Filter bounding boxes by class names. If None, use all bounding boxes.

statistical_aspect_ratio(class_names: Optional[Union[str, List[str]]] = None) → Tuple[float, float]

Get the average and standard deviation of the aspect ratios of all bounding boxes.

Parameters

class_names – Filter bounding boxes by class names. If None, use all bounding boxes.
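Each statistical_* method returns a pair of floats: the average and the standard deviation. A minimal standalone sketch of that computation follows. It assumes a hypothetical box format (class_name, x1, y1, x2, y2) in pixels, the population standard deviation, and an aspect ratio defined as width / height. The library's BoundingBox type and exact formulas may differ.

```python
import statistics
from typing import List, Optional, Sequence, Tuple, Union

# Hypothetical box format (assumption): (class_name, x1, y1, x2, y2) in pixels.
Box = Tuple[str, float, float, float, float]
ClassFilter = Optional[Union[str, List[str]]]


def filter_boxes(boxes: Sequence[Box], class_names: ClassFilter = None) -> List[Box]:
    """Keep boxes of the given class name(s); None keeps every box."""
    if class_names is None:
        return list(boxes)
    if isinstance(class_names, str):
        class_names = [class_names]
    return [box for box in boxes if box[0] in class_names]


def mean_std(values: List[float]) -> Tuple[float, float]:
    """Average and population standard deviation (assumed estimator)."""
    return statistics.mean(values), statistics.pstdev(values)


def statistical_width(boxes: Sequence[Box], class_names: ClassFilter = None) -> Tuple[float, float]:
    return mean_std([x2 - x1 for _, x1, _, x2, _ in filter_boxes(boxes, class_names)])


def statistical_height(boxes: Sequence[Box], class_names: ClassFilter = None) -> Tuple[float, float]:
    return mean_std([y2 - y1 for _, _, y1, _, y2 in filter_boxes(boxes, class_names)])


def statistical_aspect_ratio(boxes: Sequence[Box], class_names: ClassFilter = None) -> Tuple[float, float]:
    # Aspect ratio taken as width / height (assumption).
    return mean_std(
        [(x2 - x1) / (y2 - y1) for _, x1, y1, x2, y2 in filter_boxes(boxes, class_names)]
    )


boxes = [("car", 0, 0, 40, 20), ("car", 10, 10, 70, 40), ("sign", 5, 5, 15, 25)]
print(statistical_width(boxes, "car"), statistical_aspect_ratio(boxes))
```

In this sketch, if the filter matches no boxes, statistics.mean raises StatisticsError. The real methods may handle that case differently.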

property img_size: Tuple[int, int]

Get the image size.

Assumption: all images have the same size.

bboxes_by_class(class_names: Optional[Union[str, List[str]]] = None) → List[BoundingBox]

List bounding boxes.

Parameters

class_names – Filter bounding boxes by class names. If None, use all bounding boxes.

draw_class_distribution(output_path: Optional[str] = None, show: bool = False)

Draw class distribution.

Parameters
  • output_path – Path for output figure

  • show – Show the resulting diagram

scatter_plot(class_names: Optional[Union[str, List[str]]] = None, output_path: Optional[str] = None, show: bool = False)

Draw scatter plot of bounding box center points.

Parameters
  • class_names – Filter bounding boxes by class names. If None, use all bounding boxes.

  • output_path – Path for output figure

  • show – Show the resulting diagram

heatmap_plot(class_names: Optional[Union[str, List[str]]] = None, output_path: Optional[str] = None, show: bool = False)

Draw heatmap of bounding boxes.

Parameters
  • class_names – Filter bounding boxes by class names. If None, use all bounding boxes.

  • output_path – Path for output figure

  • show – Show the resulting diagram
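Before plotting, heatmap_plot presumably counts how often each image region is covered by a box. The following minimal sketch covers only that accumulation step. It assumes integer pixel boxes (x1, y1, x2, y2) with exclusive x2/y2 and an img_size given as (width, height); both conventions are assumptions. The plotting step is left out and could be done with, for example, matplotlib.pyplot.imshow.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format (assumption): (x1, y1, x2, y2) in pixels, x2/y2 exclusive.
Box = Tuple[int, int, int, int]


def coverage_heatmap(boxes: Sequence[Box], img_size: Tuple[int, int]) -> List[List[int]]:
    """Count for every pixel how many boxes cover it.

    img_size is (width, height); the result is indexed as heatmap[y][x].
    """
    width, height = img_size
    heatmap = [[0] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        # Clip each box to the image borders before accumulating.
        for y in range(max(0, y1), min(height, y2)):
            row = heatmap[y]
            for x in range(max(0, x1), min(width, x2)):
                row[x] += 1
    return heatmap


heatmap = coverage_heatmap([(0, 0, 2, 2), (1, 1, 3, 3)], img_size=(4, 3))
for row in heatmap:
    print(row)
```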

basic_info() → str

Get some basic information about the dataset.

Returns

Information about the path, number of classes, number of bounding boxes, number of images and image size.

Return type

str

report(output_folder: Optional[str] = 'analysis')

Collect information in a report.

Parameters

output_folder – Path to store the report

classes: Dict[int, str]

Description of what the class ids represent.

labels: Dict[str, List[Sequence[str]]]

Collection of all labels structured as a dictionary.

kitcar_ml.utils.data.bbox_dataset module

Classes:

BBoxDataset(attributes, classes, ...)

class BBoxDataset(attributes: Optional[Sequence[str]] = None, classes: Dict[int, str] = <factory>, labels: Dict[str, List[Sequence[str]]] = <factory>, _base_path: Optional[str] = None)

Bases: LabeledDataset

Methods:

prune_small_boxes(min_area)

Keep only boxes that have at least the given minimum area.

Attributes:

prune_small_boxes(min_area: int)

Keep only boxes that have at least the given minimum area.

Parameters

min_area – Minimum area a box must have to be kept.
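The following minimal sketch shows the pruning rule on a plain labels dictionary. It assumes hypothetical attribute names x1, y1, x2, y2 and an inclusive threshold (area >= min_area). It also assumes that an image whose boxes are all removed keeps an empty label list. BBoxDataset's actual attribute names and edge-case handling may differ.

```python
from typing import Dict, List, Sequence

Labels = Dict[str, List[Sequence]]


def prune_small_boxes(labels: Labels, attributes: Sequence[str], min_area: float) -> Labels:
    """Return a copy of labels without boxes whose area is below min_area."""
    # Hypothetical attribute names for the box corners (assumption).
    ix1, iy1, ix2, iy2 = (attributes.index(name) for name in ("x1", "y1", "x2", "y2"))

    def area(label: Sequence) -> float:
        return (float(label[ix2]) - float(label[ix1])) * (float(label[iy2]) - float(label[iy1]))

    return {
        key: [label for label in label_list if area(label) >= min_area]
        for key, label_list in labels.items()
    }


attributes = ["class_name", "x1", "y1", "x2", "y2"]
labels = {
    "img1": [["car", 0, 0, 10, 10], ["sign", 0, 0, 2, 3]],
    "img2": [["sign", 5, 5, 6, 6]],
}
print(prune_small_boxes(labels, attributes, min_area=50))
```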

classes: Dict[int, str]

Description of what the class ids represent.

labels: Dict[str, List[Sequence[str]]]

Collection of all labels structured as a dictionary.

kitcar_ml.utils.data.image_folder module

A modified image folder class.

We modify the official PyTorch image folder (https://github.com/pytorch/vision/blob/master/torchvision/datasets/folder.py) so that it can load images from both the current directory and its subdirectories.

Functions:

is_image_file(filename)

Check if a file is an image.

find_images(dir[, max_dataset_size])

Recursively search for images in the given directory.

is_image_file(filename)

Check if a file is an image.

Parameters

filename – the file name to check

find_images(dir: str, max_dataset_size: Optional[int] = None) → List[str]

Recursively search for images in the given directory.

Parameters
  • dir – the directory of the dataset

  • max_dataset_size – the maximum number of images to load; None means no limit
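The module docstring says these helpers are adapted from PyTorch's image folder. The following minimal sketch shows the same idea. It assumes that images are recognised by a hypothetical, case-insensitive extension list and that results are sorted for reproducibility. The module's real extension list and ordering may differ.

```python
import os
import tempfile
from typing import List, Optional

# Hypothetical extension list (assumption); the module's own list may differ.
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".tif", ".tiff")


def is_image_file(filename: str) -> bool:
    """Check the file extension case-insensitively."""
    return filename.lower().endswith(IMG_EXTENSIONS)


def find_images(dir: str, max_dataset_size: Optional[int] = None) -> List[str]:
    """Recursively collect image paths below dir, sorted for reproducibility."""
    if not os.path.isdir(dir):
        raise ValueError(f"{dir} is not a valid directory")
    images = []
    for root, _, fnames in sorted(os.walk(dir)):
        for fname in sorted(fnames):
            if is_image_file(fname):
                images.append(os.path.join(root, fname))
    # None means "no limit"; otherwise truncate to max_dataset_size images.
    return images if max_dataset_size is None else images[:max_dataset_size]


# Usage: build a small temporary tree and search it.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for name in ("a.png", "notes.txt", os.path.join("sub", "b.JPG")):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.relpath(path, tmp) for path in find_images(tmp)]
    first_only = len(find_images(tmp, max_dataset_size=1))
print(found)
```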

kitcar_ml.utils.data.import_kitcar_xml_labels module

Functions:

import_kitcar_xml_labels(input_dir, output_dir)

Convert an XML-based dataset to our LabeledDataset format.

import_kitcar_xml_labels(input_dir: str, output_dir: str)

Convert an XML-based dataset to our LabeledDataset format.

See: https://doc.kitcar-team.de/kitcar-machine-learning/tutorials/datasets.html

kitcar_ml.utils.data.import_label_studio_labels module

Functions:

download_image_from_s3(s3, bucket, filepath, ...)

import_label_studio_labels(annotations_dir, ...)

Convert a Label Studio JSON-based dataset to our LabeledDataset format.

download_image_from_s3(s3, bucket: str, filepath: str, output_path: str) → PIL.Image

import_label_studio_labels(annotations_dir: str, output_dir: str, force_download: bool = False)

Convert a Label Studio JSON-based dataset to our LabeledDataset format.

See: https://doc.kitcar-team.de/kitcar-machine-learning/tutorials/datasets.html

kitcar_ml.utils.data.labeled_dataset module

Classes:

LabeledDataset(attributes, classes, ...)

Dataset of images with labels.

class LabeledDataset(attributes: Optional[Sequence[str]] = None, classes: Dict[int, str] = <factory>, labels: Dict[str, List[Sequence[str]]] = <factory>, _base_path: Optional[str] = None)

Bases: Dataset, SaveOptions, InitOptions

Dataset of images with labels.

Attributes:

attributes

Description of what each label means.

classes

Description of what the class ids represent.

labels

Collection of all labels structured as a dictionary.

_base_path

Path to the root of the dataset.

_cache

Helper variable for dataset caching.

available_files

Methods:

cache([disable_tqdm])

Cache this dataset.

filter_labels()

Remove labels that have no corresponding image.

append_label(key, label)

Add a new label to the dataset.

save_as_yaml(file_path)

Save the dataset to a yaml file.

make_ids_continuous()

Reformat dataset to have continuous class ids.

replace_id(search_id, replace_id)

Replace all occurrences of an id (search_id) with another id (replace_id) in the whole dataset.

split(fractions[, shuffle])

Split this dataset into multiple datasets.

from_yaml(file_path)

Load a Labeled Dataset from a yaml file.

split_file(file, parts[, shuffle])

Split a dataset file into multiple datasets.

filter_file(file)

Filter broken file dependencies of a yaml file.

adjust_classes(other_classes[, class_attribute])

Change/add classes and class ids to the dataset.

prune_classes(classes)

Keep only labels of given classes.

adjust_attributes(other_attributes)

Reorder attributes or add new ones to the dataset.

merge_datasets(*datasets)

Merge multiple labeled datasets together into one.

attributes: Optional[Sequence[str]] = None

Description of what each label means.

Similar to headers in a table.

classes: Dict[int, str]

Description of what the class ids represent.

labels: Dict[str, List[Sequence[str]]]

Collection of all labels structured as a dictionary.

_base_path: Optional[str] = None

Path to the root of the dataset.

Only needs to be set if the dataset is used to load data.

_cache = None

Helper variable for dataset caching.

property available_files: List[str]

cache(disable_tqdm: bool = False)

Cache this dataset.

filter_labels()

Remove labels that have no corresponding image.
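The following minimal sketch shows filter_labels applied to a plain dictionary. It assumes that label keys are image paths relative to the dataset root (_base_path). The method itself works on the dataset's own fields, so the standalone function and its base_path argument are illustrative only.

```python
import os
import tempfile
from typing import Dict, List, Sequence

Labels = Dict[str, List[Sequence]]


def filter_labels(labels: Labels, base_path: str) -> Labels:
    """Drop every label whose key does not point to an existing file below base_path."""
    return {
        key: label_list
        for key, label_list in labels.items()
        if os.path.isfile(os.path.join(base_path, key))
    }


# Usage: only img1.png exists on disk, so the label for missing.png is dropped.
with tempfile.TemporaryDirectory() as base_path:
    open(os.path.join(base_path, "img1.png"), "w").close()
    labels = {"img1.png": [["cat", 2, 4]], "missing.png": [["dog", 4, 6]]}
    filtered = filter_labels(labels, base_path)
print(filtered)
```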

append_label(key: str, label: List[Sequence[str]])

Add a new label to the dataset.

A single image (or any abstract object) can have many labels.

save_as_yaml(file_path: str)

Save the dataset to a yaml file. This overrides the default method to temporarily remove _base_path so that it is not written to the yaml file.

Parameters

file_path – The output file.

make_ids_continuous()

Reformat dataset to have continuous class ids.
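The following minimal sketch shows one way to renumber class ids. It assumes that the class id is stored in a class_id attribute (the default class_attribute of adjust_classes). It also assumes that the new ids start at 0 and follow the ascending order of the old ids. Both the start value and the ordering are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Labels = Dict[str, List[Sequence]]


def make_ids_continuous(
    classes: Dict[int, str],
    labels: Labels,
    attributes: Sequence[str],
    class_attribute: str = "class_id",
) -> Tuple[Dict[int, str], Labels]:
    """Renumber class ids to 0..n-1, keeping the order of the old ids."""
    col = attributes.index(class_attribute)
    # Map old ids, in ascending order, to 0, 1, 2, ...
    mapping = {old: new for new, old in enumerate(sorted(classes))}
    new_classes = {mapping[old]: name for old, name in classes.items()}
    new_labels = {
        key: [[*label[:col], mapping[int(label[col])], *label[col + 1:]] for label in label_list]
        for key, label_list in labels.items()
    }
    return new_classes, new_labels


classes = {2: "car", 7: "sign", 3: "person"}
labels = {"img1": [[7, 1, 2], [2, 3, 4]], "img2": [[3, 0, 0]]}
print(make_ids_continuous(classes, labels, ["class_id", "x", "y"]))
```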

replace_id(search_id: int, replace_id: int)

Replace all occurrences of an id (search_id) with another id (replace_id) in the whole dataset.

Parameters
  • search_id – The id being searched for.

  • replace_id – The id that replaces every occurrence of search_id.

split(fractions: List[float], shuffle: bool = True) → List[LabeledDataset]

Split this dataset into multiple datasets.
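The following minimal sketch shows splitting by fractions. It assumes that the split happens per label key, so all labels of one image stay together, and that fractions are normalised by their sum. The seed parameter is hypothetical and only makes shuffling reproducible. None of these details are confirmed by the signature above.

```python
import random
from typing import Dict, List, Optional, Sequence

Labels = Dict[str, List[Sequence]]


def split_labels(
    labels: Labels, fractions: List[float], shuffle: bool = True, seed: Optional[int] = None
) -> List[Labels]:
    """Split the label keys into len(fractions) disjoint parts."""
    keys = list(labels)
    if shuffle:
        random.Random(seed).shuffle(keys)  # hypothetical seed for reproducibility
    total = sum(fractions)
    parts, start, cumulative = [], 0, 0.0
    for fraction in fractions:
        cumulative += fraction
        # Rounding cumulative boundaries keeps the parts disjoint and, for
        # non-negative fractions, lets them cover all keys.
        end = round(cumulative / total * len(keys))
        parts.append({key: labels[key] for key in keys[start:end]})
        start = end
    return parts


labels = {f"img{i}": [["cat", i, i]] for i in range(10)}
train, val = split_labels(labels, [0.8, 0.2], shuffle=False)
print(sorted(train), sorted(val))
```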

classmethod from_yaml(file_path: str) → LabeledDataset

Load a Labeled Dataset from a yaml file.

Parameters

file_path – The path to the yaml file to load

classmethod split_file(file: str, parts: Dict[str, float], shuffle: bool = True) → List[LabeledDataset]

Split a dataset file into multiple datasets.

Parameters
  • file – The path to the yaml file which gets split

  • parts – A dict mapping names to fractions

  • shuffle – Whether to split the labels randomly

classmethod filter_file(file: str) → LabeledDataset

Filter broken file dependencies of a yaml file.

Parameters

file – The path to the yaml file to filter

adjust_classes(other_classes: Dict[int, str], class_attribute='class_id')

Change/add classes and class ids to the dataset.

Parameters
  • other_classes – New mapping of classes.

  • class_attribute – Optional name of the class attribute.

prune_classes(classes: List[str])

Keep only labels of given classes.
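The following minimal sketch shows class pruning. It assumes that class ids are stored in a class_id attribute, as in the default class_attribute of adjust_classes. It also assumes that the classes mapping is pruned without renumbering and that images with no remaining labels keep an empty list. The function name and its keep parameter are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

Labels = Dict[str, List[Sequence]]


def prune_classes(
    classes: Dict[int, str],
    labels: Labels,
    attributes: Sequence[str],
    keep: List[str],
    class_attribute: str = "class_id",
) -> Tuple[Dict[int, str], Labels]:
    """Keep only the classes named in keep and the labels that belong to them."""
    col = attributes.index(class_attribute)
    keep_ids = {class_id for class_id, name in classes.items() if name in keep}
    new_classes = {class_id: classes[class_id] for class_id in sorted(keep_ids)}
    new_labels = {
        key: [label for label in label_list if int(label[col]) in keep_ids]
        for key, label_list in labels.items()
    }
    return new_classes, new_labels


classes = {0: "car", 1: "person", 2: "sign"}
labels = {"img1": [[0, 5, 5], [2, 1, 1]], "img2": [[1, 3, 3]]}
print(prune_classes(classes, labels, ["class_id", "x", "y"], keep=["car", "sign"]))
```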

adjust_attributes(other_attributes: List[str])

Reorder attributes or add new ones to the dataset.

Example

>>> from kitcar_ml.utils.data.labeled_dataset import LabeledDataset
>>> ds = LabeledDataset()
>>> ds.attributes = ["class_name", "x", "y"]
>>> ds.append_label("img1", ["cat", 2, 4])
>>> ds.append_label("img2", ["dog", 4, 6])
>>> ds
LabeledDataset(attributes=['class_name', 'x', 'y'], classes={}, labels={'img1': [['cat', 2, 4]], 'img2': [['dog', 4, 6]]}, _base_path=None)
>>> ds.adjust_attributes(["class_name","color", "x", "y"])
>>> ds
LabeledDataset(attributes=['class_name', 'color', 'x', 'y'], classes={}, labels={'img1': [['cat', None, 2, 4]], 'img2': [['dog', None, 4, 6]]}, _base_path=None)

Parameters

other_attributes – New list of attributes.

classmethod merge_datasets(*datasets: LabeledDataset) → LabeledDataset

Merge multiple labeled datasets together into one.

If you have folders with images and multiple yaml files declaring the datasets, this method can merge them into a single dataset.

This doesn’t copy or move any images!

Parameters

datasets – Sequence of datasets that should be merged together.
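The following minimal sketch shows merging the labels and classes of several datasets. It assumes that all datasets share the same attributes and that the label lists of identical keys are concatenated. It also assumes that a class id naming two different classes is an error. This page does not document how merge_datasets actually reconciles attributes, classes and base paths.

```python
from typing import Dict, List, Sequence

Labels = Dict[str, List[Sequence]]


def merge_labels(*label_dicts: Labels) -> Labels:
    """Concatenate the label lists of all datasets key by key."""
    merged: Labels = {}
    for labels in label_dicts:
        for key, label_list in labels.items():
            # setdefault creates a fresh list, so the input lists are not modified.
            merged.setdefault(key, []).extend(label_list)
    return merged


def merge_classes(*class_dicts: Dict[int, str]) -> Dict[int, str]:
    """Union of class mappings; the same id must not name different classes."""
    merged: Dict[int, str] = {}
    for classes in class_dicts:
        for class_id, name in classes.items():
            if merged.setdefault(class_id, name) != name:
                raise ValueError(f"class id {class_id} is both {merged[class_id]!r} and {name!r}")
    return merged


labels = merge_labels({"img1": [["cat", 2, 4]]}, {"img1": [["dog", 1, 1]], "img2": [["dog", 4, 6]]})
classes = merge_classes({0: "cat"}, {0: "cat", 1: "dog"})
print(labels, classes)
```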

kitcar_ml.utils.data.unlabeled_dataset module

Classes:

UnlabeledDataset(folder_path, ...)

This dataset class can load a set of unlabeled data.

class UnlabeledDataset(folder_path: Union[str, List[str]] = <factory>)

Bases: Dataset

This dataset class can load a set of unlabeled data.

Attributes:

folder_path

Path[s] to folders that contain the data.

Methods:

load_file_paths()

List[str]: File paths to all data.

folder_path: Union[str, List[str]]

Path[s] to folders that contain the data.

load_file_paths() → List[str]

List[str]: File paths to all data.

kitcar_ml.utils.data.visual_labeled_dataset module

Classes:

VisualLabeledDataset(attributes, classes, ...)

Utilities for visualizing a labeled dataset.

class VisualLabeledDataset(attributes: Optional[Sequence[str]] = None, classes: Dict[int, str] = <factory>, labels: Dict[str, List[Sequence[str]]] = <factory>, _base_path: Optional[str] = None)

Bases: LabeledDataset

Utilities for visualizing a labeled dataset.

Methods:

create_debug_images([output_dir, sample_size])

Draw rectangles for all labels on the images in the dataset.

Attributes:

create_debug_images(output_dir: str = 'debug', sample_size: Optional[int] = None)

Draw rectangles for all labels on the images in the dataset.

classes: Dict[int, str]

Description of what the class ids represent.

labels: Dict[str, List[Sequence[str]]]

Collection of all labels structured as a dictionary.

Module contents