kitcar_ml.utils.data package
Subpackages
- kitcar_ml.utils.data.data_loader package
- Subpackages
- Submodules
- kitcar_ml.utils.data.data_loader.base_data_loader module
- kitcar_ml.utils.data.data_loader.bbox_data_loader module
  - BBoxDataLoader
  - BBoxDataLoader.super_sampling_factor
  - BBoxDataLoader.complete_transform
  - BBoxDataLoader.prepare_batch()
  - BBoxDataLoader.dataset
  - BBoxDataLoader.batch_size
  - BBoxDataLoader.num_workers
  - BBoxDataLoader.pin_memory
  - BBoxDataLoader.drop_last
  - BBoxDataLoader.timeout
  - BBoxDataLoader.sampler
  - BBoxDataLoader.pin_memory_device
  - BBoxDataLoader.prefetch_factor
  - BBoxDataLoader._iterator
- kitcar_ml.utils.data.data_loader.example module
- kitcar_ml.utils.data.data_loader.utils module
- Module contents
- kitcar_ml.utils.data.test package
Submodules
kitcar_ml.utils.data.analyse_bbox_dataset module
Classes:
- class AnalyseBBoxDataset(attributes: Optional[Sequence[str]] = None, classes: Dict[int, str] = <factory>, labels: Dict[str, List[Sequence[str]]] = <factory>, _base_path: Optional[str] = None)
Bases: BBoxDataset
Methods:
statistical_height([class_names]) – Get the average height of all bounding boxes.
statistical_width([class_names]) – Get the average and standard deviation width of all bounding boxes.
statistical_aspect_ratio([class_names]) – Get the average and standard deviation aspect ratio of all bounding boxes.
bboxes_by_class([class_names]) – List bounding boxes.
draw_class_distribution([output_path, show]) – Draw class distribution.
scatter_plot([class_names, output_path, show]) – Draw scatter plot of bounding box center points.
heatmap_plot([class_names, output_path, show]) – Draw heatmap of bounding boxes.
basic_info() – Get some basic information about the dataset.
report([output_folder]) – Collect information in a report.
Attributes:
img_size – Get image size.
- statistical_height(class_names: Optional[Union[str, List[str]]] = None) → Tuple[float, float]
Get the average height of all bounding boxes.
- Parameters
class_names – Filter bounding boxes by class names. If None, use all bounding boxes.
- statistical_width(class_names: Optional[Union[str, List[str]]] = None) → Tuple[float, float]
Get the average and standard deviation width of all bounding boxes.
- Parameters
class_names – Filter bounding boxes by class names. If None, use all bounding boxes.
- statistical_aspect_ratio(class_names: Optional[Union[str, List[str]]] = None) → Tuple[float, float]
Get the average and standard deviation aspect ratio of all bounding boxes.
- Parameters
class_names – Filter bounding boxes by class names. If None, use all bounding boxes.
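The three statistical_* methods are annotated as returning Tuple[float, float], which the width and aspect-ratio descriptions identify as an (average, standard deviation) pair. The following is a minimal sketch of that kind of computation, not the library's implementation. The (x1, y1, x2, y2) box layout, the helper names, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple

# Hypothetical box layout (x1, y1, x2, y2); the real BoundingBox class may differ.
Box = Tuple[float, float, float, float]


def width(box: Box) -> float:
    return box[2] - box[0]


def height(box: Box) -> float:
    return box[3] - box[1]


def mean_and_std(values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, population standard deviation) of the given values."""
    return mean(values), pstdev(values)


boxes = [(0, 0, 10, 20), (5, 5, 25, 15), (0, 0, 30, 30)]

width_stats = mean_and_std([width(b) for b in boxes])
height_stats = mean_and_std([height(b) for b in boxes])
# Aspect ratio is taken as width / height here; the library may define it differently.
ratio_stats = mean_and_std([width(b) / height(b) for b in boxes])
```

Filtering by class_names would happen before this step, by selecting only the boxes whose class matches.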
- property img_size: Tuple[int, int]
Get image size.
Assumption: all images have the same size.
- bboxes_by_class(class_names: Optional[Union[str, List[str]]] = None) → List[BoundingBox]
List bounding boxes.
- Parameters
class_names – Filter bounding boxes by class names. If None, use all bounding boxes.
- draw_class_distribution(output_path: Optional[str] = None, show: bool = False)
Draw class distribution.
- Parameters
output_path – Path for output figure
show – Show the resulting diagram
- scatter_plot(class_names: Optional[Union[str, List[str]]] = None, output_path: Optional[str] = None, show: bool = False)
Draw scatter plot of bounding box center points.
- Parameters
class_names – Filter bounding boxes by class names. If None, use all bounding boxes.
output_path – Path for output figure
show – Show the resulting diagram
- heatmap_plot(class_names: Optional[Union[str, List[str]]] = None, output_path: Optional[str] = None, show: bool = False)
Draw heatmap of bounding boxes.
- Parameters
class_names – Filter bounding boxes by class names. If None, use all bounding boxes.
output_path – Path for output figure
show – Show the resulting diagram
- basic_info() → str
Get some basic information about the dataset.
- Returns
Information about the path, number of classes, number of bounding boxes, number of images, and image size
- Return type
str
- report(output_folder: Optional[str] = 'analysis')
Collect information in a report.
- Parameters
output_folder – Path to store the report
- classes: Dict[int, str]
Description of what the class ids represent.
- labels: Dict[str, List[Sequence[str]]]
Collection of all labels structured as a dictionary.
kitcar_ml.utils.data.bbox_dataset module
Classes:
- class BBoxDataset(attributes: Optional[Sequence[str]] = None, classes: Dict[int, str] = <factory>, labels: Dict[str, List[Sequence[str]]] = <factory>, _base_path: Optional[str] = None)
Bases: LabeledDataset
Methods:
prune_small_boxes(min_area) – Keep only boxes with a minimal area.
Attributes:
- classes: Dict[int, str]
Description of what the class ids represent.
- labels: Dict[str, List[Sequence[str]]]
Collection of all labels structured as a dictionary.
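To illustrate what prune_small_boxes(min_area) describes, here is a minimal sketch that filters a labels dictionary by box area. It is not the library's code: the [class_id, x1, y1, x2, y2] label layout, the inclusive threshold, and keeping images whose boxes were all removed are assumptions.

```python
from typing import Dict, List, Sequence

# Assumed label layout: [class_id, x1, y1, x2, y2]. The real attribute order is
# defined by the dataset's `attributes` field and may differ.
Label = Sequence[float]


def box_area(label: Label) -> float:
    _, x1, y1, x2, y2 = label
    return (x2 - x1) * (y2 - y1)


def prune_small_boxes(
    labels: Dict[str, List[Label]], min_area: float
) -> Dict[str, List[Label]]:
    """Return a copy of `labels` without boxes smaller than `min_area`."""
    return {
        key: [label for label in boxes if box_area(label) >= min_area]
        for key, boxes in labels.items()
    }


labels = {
    "img1.png": [[0, 0, 0, 10, 10], [1, 0, 0, 2, 2]],
    "img2.png": [[0, 5, 5, 6, 6]],
}
pruned = prune_small_boxes(labels, min_area=50)
```

In this sketch an image with no remaining boxes keeps an empty list; the actual method may drop such entries instead.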
kitcar_ml.utils.data.image_folder module
A modified image folder class.
We modify the official PyTorch image folder (https://github.com/pytorch/vision/blob/master/torchvision/datasets/folder.py) so that this class can load images from both the current directory and its subdirectories.
Functions:
Check if a file is an image.
Recursively search for images in the given directory.
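The two helpers summarized above can be sketched with the standard library. The function names, the extension list, and the sorted traversal order below are assumptions for illustration and are not taken from the module itself.

```python
import os
from typing import List

# Assumed set of extensions; the module's actual list may differ.
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff")


def is_image_file(filename: str) -> bool:
    """Check the file extension, ignoring case."""
    return filename.lower().endswith(IMG_EXTENSIONS)


def find_images(directory: str) -> List[str]:
    """Collect image paths from `directory` and all of its subdirectories."""
    paths = []
    for root, _, files in sorted(os.walk(directory)):
        for name in sorted(files):
            if is_image_file(name):
                paths.append(os.path.join(root, name))
    return paths
```

Sorting the traversal keeps the result deterministic across file systems, which helps when the list is later split into training and validation sets.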
kitcar_ml.utils.data.import_kitcar_xml_labels module
Functions:
import_kitcar_xml_labels(input_dir, output_dir) – Convert an XML-based dataset to our LabeledDataset format.
- import_kitcar_xml_labels(input_dir: str, output_dir: str)
Convert an XML-based dataset to our LabeledDataset format.
See: https://doc.kitcar-team.de/kitcar-machine-learning/tutorials/datasets.html
kitcar_ml.utils.data.import_label_studio_labels module
Functions:
import_label_studio_labels(annotations_dir, output_dir[, force_download]) – Convert a Label Studio JSON-based dataset to our LabeledDataset format.
- download_image_from_s3(s3, bucket: str, filepath: str, output_path: str) → PIL.Image
- import_label_studio_labels(annotations_dir: str, output_dir: str, force_download: bool = False)
Convert a Label Studio JSON-based dataset to our LabeledDataset format.
See: https://doc.kitcar-team.de/kitcar-machine-learning/tutorials/datasets.html
kitcar_ml.utils.data.labeled_dataset module
Classes:
LabeledDataset – Dataset of images with labels.
- class LabeledDataset(attributes: Optional[Sequence[str]] = None, classes: Dict[int, str] = <factory>, labels: Dict[str, List[Sequence[str]]] = <factory>, _base_path: Optional[str] = None)
Bases: Dataset, SaveOptions, InitOptions
Dataset of images with labels.
Attributes:
attributes – Description of what each label means.
classes – Description of what the class ids represent.
labels – Collection of all labels structured as a dictionary.
_base_path – Path to the root of the dataset.
_cache – Helper variable for dataset caching.
Methods:
cache([disable_tqdm]) – Cache this dataset.
Remove labels that have no corresponding image.
append_label(key, label) – Add a new label to the dataset.
save_as_yaml(file_path) – Save the dataset to a yaml file.
Reformat dataset to have continuous class ids.
replace_id(search_id, replace_id) – Replace id (search) with another id (replace) in the whole dataset.
split(fractions[, shuffle]) – Split this dataset into multiple datasets.
from_yaml(file_path) – Load a Labeled Dataset from a yaml file.
split_file(file, parts[, shuffle]) – Split a dataset file into multiple datasets.
filter_file(file) – Filter broken file dependencies of a yaml file.
adjust_classes(other_classes[, class_attribute]) – Change/add classes and class ids to the dataset.
prune_classes(classes) – Keep only labels of given classes.
adjust_attributes(other_attributes) – Change the order/add attributes to the dataset.
merge_datasets(*datasets) – Merge multiple labeled datasets together into one.
- attributes: Optional[Sequence[str]] = None
Description of what each label means.
Similar to headers in a table.
- classes: Dict[int, str]
Description of what the class ids represent.
- labels: Dict[str, List[Sequence[str]]]
Collection of all labels structured as a dictionary.
- _base_path: Optional[str] = None
Path to the root of the dataset.
Only needs to be set if the dataset is used to load data.
- _cache = None
Helper variable for dataset caching.
- property available_files: List[str]
- append_label(key: str, label: List[Sequence[str]])
Add a new label to the dataset.
A single image (or any abstract object) can have many labels.
- save_as_yaml(file_path: str)
Save the dataset to a yaml file. This overrides the default method to temporarily remove base_path so that it is not written to the yaml file.
- Parameters
file_path – The output file.
- replace_id(search_id: int, replace_id: int)
Replace id (search) with another id (replace) in the whole dataset.
- Parameters
search_id – The id being searched for.
replace_id – The id that replaces every occurrence of search_id.
- split(fractions: List[float], shuffle: bool = True) → List[LabeledDataset]
Split this dataset into multiple datasets.
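A fraction-based split such as split(fractions, shuffle) can be sketched on a plain labels dictionary as follows. This is an illustration under assumptions, not the library's implementation: the function name split_labels, the seed parameter, and the rounding of cumulative fractions are hypothetical choices.

```python
import random
from typing import Dict, List, Optional, Sequence

Labels = Dict[str, List[Sequence]]


def split_labels(
    labels: Labels,
    fractions: List[float],
    shuffle: bool = True,
    seed: Optional[int] = None,
) -> List[Labels]:
    """Split `labels` by key into len(fractions) parts of the given relative sizes."""
    keys = list(labels)
    if shuffle:
        random.Random(seed).shuffle(keys)

    parts = []
    start = 0
    cumulative = 0.0
    for fraction in fractions:
        # Round the cumulative boundary so that rounding errors do not add up.
        cumulative += fraction
        end = round(cumulative * len(keys))
        parts.append({key: labels[key] for key in keys[start:end]})
        start = end
    return parts


labels = {f"img{i}.png": [[0, i, i]] for i in range(10)}
train, val = split_labels(labels, [0.8, 0.2], shuffle=False)
```

Splitting by key keeps all labels of one image in the same part. If the fractions sum to less than 1, the remaining keys are left out of every part in this sketch.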
- classmethod from_yaml(file_path: str) → LabeledDataset
Load a Labeled Dataset from a yaml file.
- Parameters
file_path – The path to the yaml file to load
- classmethod split_file(file: str, parts: Dict[str, float], shuffle: bool = True) → List[LabeledDataset]
Split a dataset file into multiple datasets.
- Parameters
file – The path to the yaml file which gets split
parts – A dict of names and fractions
shuffle – Split the labels randomly
- classmethod filter_file(file: str) → LabeledDataset
Filter broken file dependencies of a yaml file.
- Parameters
file – The path to the yaml file to filter
- adjust_classes(other_classes: Dict[int, str], class_attribute='class_id')
Change/add classes and class ids to the dataset.
- Parameters
other_classes – New mapping of classes.
class_attribute – Optional name of the class attribute.
- adjust_attributes(other_attributes: List[str])
Change the order/add attributes to the dataset.
Example
>>> from kitcar_ml.utils.data.labeled_dataset import LabeledDataset
>>> ds = LabeledDataset()
>>> ds.attributes = ["class_name", "x", "y"]
>>> ds.append_label("img1", ["cat", 2, 4])
>>> ds.append_label("img2", ["dog", 4, 6])
>>> ds
LabeledDataset(attributes=['class_name', 'x', 'y'], classes={}, labels={'img1': [['cat', 2, 4]], 'img2': [['dog', 4, 6]]}, _base_path=None)
>>> ds.adjust_attributes(["class_name", "color", "x", "y"])
>>> ds
LabeledDataset(attributes=['class_name', 'color', 'x', 'y'], classes={}, labels={'img1': [['cat', None, 2, 4]], 'img2': [['dog', None, 4, 6]]}, _base_path=None)
- Parameters
other_attributes – New list of attributes.
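The behaviour shown in the example for adjust_attributes (existing values are moved to their new positions and new attributes are filled with None) can be reproduced on plain data structures. The sketch below is a standalone illustration, not the method's actual code; the function name and the choice to return new objects instead of modifying the dataset in place are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Labels = Dict[str, List[Sequence]]


def reorder_attributes(
    attributes: Sequence[str], labels: Labels, other_attributes: Sequence[str]
) -> Tuple[List[str], Labels]:
    """Rearrange every label to follow `other_attributes`, filling gaps with None."""
    # Position of each attribute in the current label layout.
    index = {name: i for i, name in enumerate(attributes)}
    new_labels = {
        key: [
            [label[index[name]] if name in index else None for name in other_attributes]
            for label in image_labels
        ]
        for key, image_labels in labels.items()
    }
    return list(other_attributes), new_labels


attributes = ["class_name", "x", "y"]
labels = {"img1": [["cat", 2, 4]], "img2": [["dog", 4, 6]]}
new_attributes, new_labels = reorder_attributes(
    attributes, labels, ["class_name", "color", "x", "y"]
)
```

In this sketch, attributes that are missing from other_attributes are dropped; the source does not say whether the real method does the same.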
- classmethod merge_datasets(*datasets: LabeledDataset) → LabeledDataset
Merge multiple labeled datasets together into one.
If you have folders with images and multiple yaml files declaring the datasets, this method can merge them into a single dataset.
This doesn’t copy or move any images!
- Parameters
datasets – Sequence of datasets that should be merged together.
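At the level of the labels dictionaries, merging could look like the sketch below. It is a simplified illustration only: the real method presumably also has to reconcile classes, attributes, and base paths, none of which is handled here, and the function name merge_labels and the rule of concatenating labels for identical keys are assumptions.

```python
from typing import Dict, List, Sequence

Labels = Dict[str, List[Sequence]]


def merge_labels(*label_dicts: Labels) -> Labels:
    """Combine several labels dictionaries; labels for the same key are concatenated."""
    merged: Labels = {}
    for labels in label_dicts:
        for key, image_labels in labels.items():
            # A fresh list per key, so the input dictionaries stay unchanged.
            merged.setdefault(key, []).extend(image_labels)
    return merged


train = {"a.png": [[0, 1, 2]]}
extra = {"a.png": [[1, 3, 4]], "b.png": [[0, 5, 6]]}
merged = merge_labels(train, extra)
```

As in the method described above, only label entries are combined; no image files are touched.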
kitcar_ml.utils.data.unlabeled_dataset module
Classes:
UnlabeledDataset – This dataset class can load a set of unlabeled data.
- class UnlabeledDataset(folder_path: Union[str, List[str]] = <factory>)
Bases: Dataset
This dataset class can load a set of unlabeled data.
Attributes:
folder_path – Path[s] to folders that contain the data.
Methods:
List[str]: File paths to all data.
- folder_path: Union[str, List[str]]
Path[s] to folders that contain the data.
kitcar_ml.utils.data.visual_labeled_dataset module
Classes:
VisualLabeledDataset – Utilities for visualizing a labeled dataset.
- class VisualLabeledDataset(attributes: Optional[Sequence[str]] = None, classes: Dict[int, str] = <factory>, labels: Dict[str, List[Sequence[str]]] = <factory>, _base_path: Optional[str] = None)
Bases: LabeledDataset
Utilities for visualizing a labeled dataset.
Methods:
create_debug_images([output_dir, sample_size]) – Add rectangles for all labels and images in dataset.
Attributes:
- create_debug_images(output_dir: str = 'debug', sample_size: Optional[int] = None)
Add rectangles for all labels and images in dataset.
- classes: Dict[int, str]
Description of what the class ids represent.
- labels: Dict[str, List[Sequence[str]]]
Collection of all labels structured as a dictionary.