Deep Learning Approach to Binary Image Classification

Binary Image Classification: Binary image classification is the task of categorizing images into one of two classes or categories. The goal is to develop a model that can automatically determine the class of an input image based on its features.

Many deep learning models and frameworks are available for binary image classification. I prefer a CNN built with PyTorch. Convolutional Neural Networks (CNNs) are widely used in image classification because they capture spatial dependencies in images and extract hierarchical features, and PyTorch is a popular deep learning framework that provides efficient tools for building and training such models. The rest of this article outlines how to implement binary image classification with a CNN in PyTorch.


Many datasets are available on Kaggle. Here I chose a car vs. bike dataset. You can access it at:

The dataset has two directories, one for training and one for testing. Our goal is to create a model, train it on the training data, and then measure its accuracy on the test data. We will use the PyTorch framework, which is popular for its simple API for building and training models. As a first step, let's import the main libraries and modules.

Processing the Dataset:

Before sending the image data to the model, you often need to process it. For instance, images of different sizes (common in large datasets) cannot be stacked into a batch, and a model with fully connected layers expects a fixed input size. Therefore, we need to resize the images and also consider scaling the pixel values. In addition, we can augment the data by applying different random transforms.

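By using the ImageFolder class from torchvision.datasets and providing the root directory and the transform, we can load the image dataset for further processing, such as training a model for car vs. bike classification. ImageFolder expects one sub-directory per class and derives the labels from the sub-directory names.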

Data loaders are commonly used in machine learning frameworks to efficiently load and process data in batches during the training and testing phases.

batch_size: specifies the number of samples in each batch.

shuffle: when set to True, the samples in the training set are reshuffled before each epoch. Shuffling keeps the batches from following the order in which the files are stored (for example, all images of one class followed by all images of the other), which would otherwise bias training.

The shuffle parameter is not specified for the test set. By default, it is set to False, meaning that the samples in the test set will not be shuffled.
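To illustrate these parameters, the sketch below builds a training and a test loader. A small TensorDataset of random tensors stands in for the image datasets so that the example runs on its own, and the batch size of 4 is an assumed value.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the image datasets: 10 fake RGB images with 0/1 labels.
images = torch.rand(10, 3, 32, 32)
labels = torch.arange(10) % 2
train_data = TensorDataset(images, labels)
test_data = TensorDataset(images, labels)

BATCH_SIZE = 4  # assumed value

train_loader = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(test_data, batch_size=BATCH_SIZE)  # shuffle defaults to False

# 10 samples with batch_size=4 give batches of 4, 4 and 2 samples.
for batch_images, batch_labels in train_loader:
    print(batch_images.shape, batch_labels.tolist())
```

Iterating over `test_loader` always yields the samples in their stored order, while `train_loader` yields a new random order on every pass.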


Normalization is a common preprocessing technique used in machine learning to scale and standardize the input data. It ensures that the features or variables have similar ranges and distributions, which can help improve the performance of machine learning models.

In the context of car-bike classification, normalization can be applied to the input features (such as image pixel values or numerical attributes) to bring them to a similar scale. This is important because different features can have different scales, and some machine learning algorithms are sensitive to the scale of the input data.

Here we can take an example from our car-bike dataset.

Before normalization:

After Normalization:

Here we use a normalization technique called standardization: subtract the mean from each value and divide the result by the standard deviation. The mean and standard deviation are passed as arguments to the Normalize transform.

The mean and standard deviation arguments hold one value per image channel. These values are typically precomputed on a large dataset or taken from commonly used values. In this case, the values [-0.485/0.229, -0.456/0.224, -0.406/0.225] are the negated channel means divided by the respective standard deviations for the red, green, and blue channels. Values of this form are what the mean argument looks like when a normalization with mean [0.485, 0.456, 0.406] and standard deviation [0.229, 0.224, 0.225] (the commonly used ImageNet statistics) is inverted, for example to display a normalized image with its original colors.

Normalizing the pixel values lets the model focus on learning meaningful patterns in the images rather than being influenced by variations in pixel intensity.

CNN Model:

Here, we build a convolutional neural network (CNN) with two convolutional layers and three fully connected layers. The forward pass applies convolution and ReLU activation, followed by max pooling to reduce the spatial dimensions. The output is then flattened and passed through fully connected layers with ReLU activation. Finally, the last fully connected layer produces one score per class, and log-softmax converts these scores into log-probabilities over the classes (exponentiating them gives a probability distribution). The convolution and pooling operations capture hierarchical features in the input data, and the fully connected layers perform the classification.
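The architecture described above could look roughly like the following. The class name `CarBikeCNN`, the channel counts, kernel sizes, hidden layer widths, and the 224x224 input size are all assumptions made for this sketch; the article only fixes the number of layers and the use of ReLU, max pooling, and log-softmax.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CarBikeCNN(nn.Module):
    """Two conv layers and three fully connected layers (sizes are assumptions)."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, kernel_size=5)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.pool = nn.MaxPool2d(2, 2)
        # For 224x224 inputs the side length goes 224 -> 220 -> 110 -> 106 -> 53.
        self.fc1 = nn.Linear(16 * 53 * 53, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # conv -> ReLU -> max pool
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, start_dim=1)      # keep the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return F.log_softmax(self.fc3(x), dim=1)  # log-probabilities per class

model = CarBikeCNN()
batch = torch.rand(4, 3, 224, 224)   # fake batch of four RGB images
log_probs = model(batch)
print(log_probs.shape)               # torch.Size([4, 2])
print(log_probs.exp().sum(dim=1))    # each row sums to 1, up to rounding
```

Because the model returns log-probabilities, it pairs naturally with the negative log-likelihood loss (nn.NLLLoss) during training.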

Training the Model:

The training code measures the training and testing accuracy, keeps track of the training and testing losses, and calculates the total duration of the training process. The training loop iterates over the dataset, performs forward and backward passes, updates the model's weights, and prints the training progress. After each epoch, it evaluates the model on the testing dataset and records the testing accuracy and loss. The total duration is measured with the time module. Training optimizes the model's weights to minimize the loss on the training data; the testing dataset is used only to check how well the model generalizes and is never used to update the weights.
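The loop described above can be sketched as follows. To keep the example self-contained and fast, it uses random data and a tiny linear model in place of the real dataset and CNN; the helper `run_epoch`, the Adam optimizer, the learning rate, and the number of epochs are assumptions, not the author's settings.

```python
import time

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-ins (assumptions): random data and a tiny model that outputs log-probabilities.
x = torch.rand(32, 3, 16, 16)
y = torch.randint(0, 2, (32,))
train_loader = DataLoader(TensorDataset(x, y), batch_size=8, shuffle=True)
test_loader = DataLoader(TensorDataset(x, y), batch_size=8)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 2), nn.LogSoftmax(dim=1))

criterion = nn.NLLLoss()  # expects log-probabilities as input
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def run_epoch(loader, train):
    """Run one pass over the loader and return (mean loss, accuracy)."""
    model.train(train)  # train=False puts the model in evaluation mode
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(train):
        for inputs, targets in loader:
            outputs = model(inputs)              # forward pass
            loss = criterion(outputs, targets)
            if train:
                optimizer.zero_grad()
                loss.backward()                  # backward pass
                optimizer.step()                 # weight update
            total_loss += loss.item() * inputs.size(0)
            correct += (outputs.argmax(dim=1) == targets).sum().item()
            seen += inputs.size(0)
    return total_loss / seen, correct / seen

EPOCHS = 3
train_losses, test_losses = [], []
start = time.time()
for epoch in range(EPOCHS):
    train_loss, train_acc = run_epoch(train_loader, train=True)
    test_loss, test_acc = run_epoch(test_loader, train=False)
    train_losses.append(train_loss)
    test_losses.append(test_loss)
    print(f"epoch {epoch + 1}: train loss {train_loss:.4f}, train acc {train_acc:.2f}, "
          f"test loss {test_loss:.4f}, test acc {test_acc:.2f}")
duration = time.time() - start
print(f"Total training time: {duration:.2f}s")
```

Gradients are disabled during evaluation, so the testing data never influences the weights.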

