Data Loader

piqture.data_loader.mnist_data_loader module

This module provides a load_mnist_dataset function that simplifies loading the MNIST dataset for machine learning and deep learning experiments. It supports custom batch sizes, label selection, image resizing, and normalization options.

Overview

load_mnist_dataset loads the MNIST dataset and prepares it for image-based machine learning models, particularly quantum machine learning and custom image processing workflows. It returns ready-to-iterate PyTorch DataLoaders, so resizing, label selection, and normalization are all configured in a single call.

Features

  • Supports custom image resizing to specified dimensions.

  • Optionally restricts the dataset to a chosen subset of labels.

  • Integrates custom normalization using MinMaxNormalization.

  • Provides separate training and testing DataLoaders.

Note

Make sure that the torch and torchvision libraries are installed, as these are used internally for dataset handling and transformations.

Function Documentation

`load_mnist_dataset`

Usage Example

Here’s an example of how to use the load_mnist_dataset function to load the MNIST dataset and apply custom configurations:

from piqture.data_loader import mnist_data_loader

# Load MNIST dataset with custom configurations
train_loader, test_loader = mnist_data_loader.load_mnist_dataset(
    img_size=(32, 32),          # Resize images to 32x32
    batch_size=64,              # Set batch size to 64
    labels=[0, 1, 2],           # Include only labels 0, 1, and 2
    normalize_min=0.0,          # Normalize minimum value to 0.0
    normalize_max=1.0           # Normalize maximum value to 1.0
)

# Print some batch information
for images, labels in train_loader:
    print(f"Batch image shape: {images.shape}")
    print(f"Batch labels: {labels}")
    break

Parameters

  • `img_size` (int or tuple[int, int], optional): Size to which MNIST images are resized. An integer resizes images to a square of that size; a tuple specifies (height, width). Default: 28 (images are resized to 28x28 pixels).

  • `batch_size` (int, optional): Number of samples per batch for both the training and testing DataLoaders. If not specified, the batch size defaults to 1.

  • `labels` (list[int], optional): List of integer labels to include in the dataset. For example, labels=[0, 1] includes only images of the digits 0 and 1.

  • `normalize_min` (float, optional): Minimum value for pixel normalization. Default: None (no normalization).

  • `normalize_max` (float, optional): Maximum value for pixel normalization. Default: None (no normalization).
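To make the `img_size` and `labels` semantics concrete, here is a minimal pure-Python sketch of how such arguments are commonly interpreted. The helper names `resolve_img_size` and `indices_for_labels` are hypothetical and are not part of piqture; the library's actual implementation may differ.

```python
from typing import List, Sequence, Tuple, Union


def resolve_img_size(img_size: Union[int, Tuple[int, int]]) -> Tuple[int, int]:
    """Hypothetical helper: expand an int into a square (height, width) pair."""
    if isinstance(img_size, int):
        return (img_size, img_size)
    return (img_size[0], img_size[1])


def indices_for_labels(targets: Sequence[int], labels: Sequence[int]) -> List[int]:
    """Hypothetical helper: positions of the samples whose label is in `labels`."""
    allowed = set(labels)
    return [i for i, target in enumerate(targets) if target in allowed]


print(resolve_img_size(28))        # (28, 28)
print(resolve_img_size((32, 16)))  # (32, 16)

# Toy label sequence standing in for the dataset's targets.
print(indices_for_labels([5, 0, 4, 1, 9, 2, 1], labels=[0, 1]))  # [1, 3, 6]
```

An index list like this is the kind of input that `torch.utils.data.Subset` accepts, which is one common way to restrict a dataset to selected classes.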

Returns

  • `Tuple[torch.utils.data.DataLoader, torch.utils.data.DataLoader]`: A tuple containing:

    • Training DataLoader: A PyTorch DataLoader for training data.

    • Testing DataLoader: A PyTorch DataLoader for testing data.

Dependencies

  • torch: Required for creating PyTorch DataLoaders.

  • torchvision: Required for dataset loading and transformations.

  • piqture.transforms.MinMaxNormalization: Custom normalization transform available in the piqture.transforms module.
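The behavior of MinMaxNormalization is not spelled out on this page. As a rough illustration only, the sketch below shows standard min-max rescaling, which maps the smallest input value to a target minimum and the largest to a target maximum. The function name `min_max_rescale` and its handling of constant inputs are assumptions, not piqture's implementation.

```python
from typing import List, Sequence


def min_max_rescale(
    pixels: Sequence[float], new_min: float = 0.0, new_max: float = 1.0
) -> List[float]:
    """Illustrative min-max rescaling of a flat sequence of pixel values."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # Constant input: no range to stretch, so map everything to new_min
        # (an assumption made for this sketch).
        return [float(new_min) for _ in pixels]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (p - lo) * scale for p in pixels]


# Rescale raw 8-bit intensities into the range [0.0, 1.0].
rescaled = min_max_rescale([0, 64, 128, 255], new_min=0.0, new_max=1.0)
print(rescaled)
```

With this formula the output always spans exactly [new_min, new_max] whenever the input contains at least two distinct values.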

Handling Edge Cases

The function validates the types of its input parameters and raises an error when a check fails:

  • `img_size`: Raises a TypeError if the value is not of type int or tuple[int, int].

  • `batch_size`: Raises a TypeError if the value is not an integer.

  • `labels`: Raises a TypeError if the value is not a list.
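The checks listed above can be sketched as a standalone validator. `check_loader_args` is a hypothetical stand-in written for illustration; the error messages and the treatment of None defaults are assumptions and may not match the library.

```python
def check_loader_args(img_size=28, batch_size=None, labels=None):
    """Hypothetical validator mirroring the documented type checks."""
    is_int_pair = (
        isinstance(img_size, tuple)
        and len(img_size) == 2
        and all(isinstance(v, int) for v in img_size)
    )
    if not (isinstance(img_size, int) or is_int_pair):
        raise TypeError("img_size must be an int or a tuple[int, int].")
    # None is treated as "use the default" (assumption for this sketch).
    if batch_size is not None and not isinstance(batch_size, int):
        raise TypeError("batch_size must be an integer.")
    if labels is not None and not isinstance(labels, list):
        raise TypeError("labels must be a list.")


# Valid arguments pass silently.
check_loader_args(img_size=(32, 32), batch_size=64, labels=[0, 1, 2])

# Each of these violates one documented check.
for bad_kwargs in ({"img_size": "32"}, {"batch_size": 2.5}, {"labels": (0, 1)}):
    try:
        check_loader_args(**bad_kwargs)
    except TypeError as err:
        print(f"{bad_kwargs} -> TypeError: {err}")
```

Wrapping a call to the loader in a similar try/except block is a simple way to surface misconfigured arguments early in an experiment script.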

Refer to the source code for additional implementation details and advanced configurations.