Data Loader

piqture.data_loader.mnist_data_loader module

This module provides a load_mnist_dataset function that simplifies loading the MNIST dataset for machine learning and deep learning experiments. It supports custom batch sizes, label selection, image resizing, and normalization options.

Data Loader for MNIST images

piqture.data_loader.mnist_data_loader.collate_fn(batch, labels: list, new_batch: list)

Batches the images with respect to the provided labels.
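As a rough illustration of label-based batching, the sketch below keeps only the samples whose label appears in a requested list. It is not piqture's implementation: `filter_batch_by_labels` is a hypothetical helper, and plain nested lists stand in for image tensors so the snippet runs without PyTorch.

```python
def filter_batch_by_labels(batch, labels):
    """Keep only the (image, label) pairs whose label is in `labels`.

    Hypothetical helper for illustration; not part of piqture.
    """
    wanted = set(labels)
    images, kept_labels = [], []
    for image, label in batch:
        if label in wanted:
            images.append(image)
            kept_labels.append(label)
    return images, kept_labels


# Toy batch: 2x2 "images" (nested lists standing in for tensors) with digit labels.
batch = [
    ([[0, 1], [1, 0]], 0),
    ([[1, 1], [0, 0]], 7),
    ([[0, 0], [1, 1]], 1),
]
images, kept = filter_batch_by_labels(batch, labels=[0, 1])
print(kept)  # [0, 1]
```

With real tensors, the filtered images would typically be stacked into a single batch tensor before being returned.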

piqture.data_loader.mnist_data_loader.load_mnist_dataset(img_size: int | tuple[int, int] = 28, batch_size: int = None, labels: list = None, normalize_min: float = None, normalize_max: float = None)

Loads MNIST dataset from PyTorch using DataLoader.

Args:

img_size (int or tuple[int, int], optional): Size to which images will be resized. Defaults to 28.

If an integer, images are resized to a square of that size. If a tuple, images are resized to the specified height and width.

batch_size (int, optional): Batch size for the dataset.

labels (list): List of desired labels.

normalize_min (float, optional): Minimum value for normalization.

normalize_max (float, optional): Maximum value for normalization.

Returns:

Train and Test DataLoader objects.
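The img_size rule described in the arguments (an integer means a square, a tuple means height and width) can be sketched as follows. `resolve_img_size` is a hypothetical name used only for illustration; the library may handle this differently internally.

```python
def resolve_img_size(img_size=28):
    """Return (height, width) for an int or a 2-tuple of ints.

    Hypothetical helper for illustration; not part of piqture.
    """
    if isinstance(img_size, int):
        # A single integer describes a square image.
        return (img_size, img_size)
    if (
        isinstance(img_size, tuple)
        and len(img_size) == 2
        and all(isinstance(side, int) for side in img_size)
    ):
        return img_size
    raise TypeError("img_size must be an int or a tuple of two ints")


print(resolve_img_size())          # (28, 28)
print(resolve_img_size(32))        # (32, 32)
print(resolve_img_size((16, 32)))  # (16, 32)
```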

Overview

The load_mnist_dataset function loads and prepares the MNIST dataset for image-based machine learning models, particularly quantum machine learning and custom image processing workflows.

Features

  • Supports custom image resizing to specified dimensions.

  • Optionally filters specific labels from the MNIST dataset.

  • Integrates custom normalization using MinMaxNormalization.

  • Provides separate training and testing DataLoaders.

Note

Make sure that the torch and torchvision libraries are installed, as these are used internally for dataset handling and transformations.
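One way to confirm that both packages are importable before calling the loader is a small check like the one below. This is a generic standard-library sketch; `missing_packages` is a hypothetical helper, not something piqture provides.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["torch", "torchvision"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("torch and torchvision are available.")
```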

Function Documentation

`load_mnist_dataset`

Usage Example

Here’s an example of how to use the load_mnist_dataset function to load the MNIST dataset and apply custom configurations:

from piqture.data_loader import mnist_data_loader

# Load MNIST dataset with custom configurations
train_loader, test_loader = mnist_data_loader.load_mnist_dataset(
    img_size=(32, 32),          # Resize images to 32x32
    batch_size=64,              # Set batch size to 64
    labels=[0, 1, 2],           # Include only labels 0, 1, and 2
    normalize_min=0.0,          # Normalize minimum value to 0.0
    normalize_max=1.0           # Normalize maximum value to 1.0
)

# Print some batch information
for images, labels in train_loader:
    print(f"Batch image shape: {images.shape}")
    print(f"Batch labels: {labels}")
    break

Parameters

  • `img_size` (int or tuple[int, int], optional): Size to which MNIST images will be resized. If an integer, images will be resized to a square of that size. If a tuple, it should specify (height, width) for the images. Default: 28 (images are resized to 28x28 pixels).

  • `batch_size` (int, optional): Specifies the number of samples per batch for training and testing DataLoaders. If not specified, the batch size defaults to 1.

  • `labels` (list[int], optional): A list of integers representing the labels to include in the dataset. For example, setting labels=[0, 1] will include images of digits 0 and 1 only.

  • `normalize_min` (float, optional): Minimum value for pixel normalization. Default: None (no normalization).

  • `normalize_max` (float, optional): Maximum value for pixel normalization. Default: None (no normalization).
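For intuition about normalize_min and normalize_max, a common form of min-max scaling maps pixel values linearly so that the smallest value becomes normalize_min and the largest becomes normalize_max. The sketch below applies that formula to a flat list of pixel values. `min_max_scale` is a hypothetical helper, and whether piqture.transforms.MinMaxNormalization computes exactly this is an assumption to check against the source.

```python
def min_max_scale(pixels, normalize_min=0.0, normalize_max=1.0):
    """Linearly rescale `pixels` so they span [normalize_min, normalize_max].

    Hypothetical helper for illustration; not part of piqture.
    """
    low, high = min(pixels), max(pixels)
    if high == low:
        # A constant image has no range to stretch; map every pixel to the lower bound.
        return [normalize_min for _ in pixels]
    scale = (normalize_max - normalize_min) / (high - low)
    return [normalize_min + (p - low) * scale for p in pixels]


# Raw 8-bit pixel values rescaled to the default range [0.0, 1.0].
print(min_max_scale([0, 64, 128, 255]))

# The same idea with a symmetric target range.
print(min_max_scale([0.0, 0.5, 1.0], normalize_min=-1.0, normalize_max=1.0))  # [-1.0, 0.0, 1.0]
```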

Returns

  • `Tuple[torch.utils.data.DataLoader, torch.utils.data.DataLoader]`: A tuple containing:

    • Training DataLoader: A PyTorch DataLoader for training data.

    • Testing DataLoader: A PyTorch DataLoader for testing data.

Dependencies

  • torch: Required for creating PyTorch DataLoaders.

  • torchvision: Required for dataset loading and transformations.

  • piqture.transforms.MinMaxNormalization: Custom normalization transform available in the piqture.transforms module.

Handling Edge Cases

The function performs type checking and validation to ensure that the input parameters are valid:

  • `img_size`: Raises a TypeError if the value is not of type int or tuple[int, int].

  • `batch_size`: Raises a TypeError if the value is not an integer.

  • `labels`: Raises a TypeError if the value is not a list.
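The checks listed above can be pictured with a short sketch. `check_loader_args` is a hypothetical helper written for illustration, and the error messages are invented. Treating None as acceptable for batch_size and labels is an assumption based on the defaults in the signature.

```python
def check_loader_args(img_size=28, batch_size=None, labels=None):
    """Raise TypeError for argument types the documentation lists as invalid.

    Hypothetical helper for illustration; not part of piqture. Accepting None
    for batch_size and labels is an assumption based on the signature defaults.
    """
    if not isinstance(img_size, (int, tuple)):
        raise TypeError("img_size must be an int or a tuple of ints")
    if batch_size is not None and not isinstance(batch_size, int):
        raise TypeError("batch_size must be an int")
    if labels is not None and not isinstance(labels, list):
        raise TypeError("labels must be a list")


# Valid arguments pass silently.
check_loader_args(img_size=(32, 32), batch_size=64, labels=[0, 1, 2])

# A string batch size is rejected.
try:
    check_loader_args(batch_size="64")
except TypeError as err:
    print(f"Rejected: {err}")  # Rejected: batch_size must be an int
```

Wrapping calls to the real loader in a similar try/except block is one way to surface a bad argument early in an experiment script.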

Refer to the source code for additional implementation details and advanced configurations.