Evaluation

We use the evaluators for the evaluation of our models. Several evaluators can be used for the evaluation. The superclass :py:mod`kitcar_ml.utils.evaluation.evaluator` provides the interface for all evaluators. The evaluators can be called with the detections and groundtruths.

InterpolationEvaluator

The kitcar_ml.utils.evaluation.tutorial demonstrates how to use the evaluators. First, we fake a dataset and the detections to demonstrate the evaluator, with the following two functions. The following code demonstrates how to use an evaluator.

from kitcar_ml.utils.bounding_box import BoundingBox
from kitcar_ml.utils.evaluation.interpolation_evaluator import InterpolationEvaluator


def fake_dataset():
    """Simulate the dataset and create the groundtruth."""

    # Create 10 images with one bounding box each.
    bb = [[BoundingBox(x1=0, y1=0, x2=10, y2=10, class_label="1", confidence=1)]] * 10
    return bb


def fake_prediction():
    """Simulate a model and create predictions."""
    groundtruth = fake_dataset()

    # Only change the confidence
    groundtruth[0] = [BoundingBox(0, 0, 10, 10, "1", 0.8)]
    # Create a large bounding box
    groundtruth[1] = [BoundingBox(0, 0, 16, 14, "1", 0.7)]
    # Create two slightly large bounding boxes.
    groundtruth[2] = [BoundingBox(0, 0, 12, 12, "1", 0.7)]
    groundtruth[3] = [BoundingBox(0, 0, 8, 8, "1", 0.7)]
    return groundtruth


if __name__ == "__main__":
    # Initialize the evaluator.
    iou_thresholds = (0.3, 0.5, 0.8)
    interpolation_evaluator = InterpolationEvaluator(
        iou_thresholds, use_every_point_interpolation=False
    )

    # Load the dataset (for this tutorial we just fake them)
    groundtruths = fake_dataset()
    detections = fake_prediction()

    # Call the evaluator with the groundtruths and the detections.
    interpolation_evaluator(groundtruths, detections)

    # Output a summary of the evaluator.
    print(interpolation_evaluator)

    # Plot the calculated graphics. This can be different for every evaluator.
    interpolation_evaluator.plot_precision_recall_curves()

We initialize the evaluator with custom iou thresholds. The evaluator calculates the evaluation with the groundtruths and detections. Then a summary is printed and looks like this:

--------------
IoU 0.3
mAP: 1.0
--------------
IoU 0.5
mAP: 0.88
--------------
IoU 0.8
mAP: 0.7
--------------

The mean average precision(mAP) is calculated with the true and false positives. A detection is accepted, if the intersection over union ratio is higher than the threshold. That is why the mAP decreases with a higher iou threshold. We can see that the bounding box 1 and 2 have a higher iou than 0.5 and the bounding boxes 3 and 4 have a higher iou than 0.8.

We can also show the calculated interpolation in a plot. That creates these plots:

../_images/plot_05_iou.png ../_images/plot_08_iou.png

An interpolation is calculated for every iou threshold. Therefore this creates a plot for every iou threshold.

An explanation for IoU can be found at :ref: https://en.wikipedia.org/wiki/Jaccard_index

A larger example that generates more bounding boxes and changes the boxes more can be found in kitcar_ml.utils.evaluation.example.