What is this project?

This was my intern interview project. I was asked to classify images from a large dataset that was given to me. The data contained a lot of garbage images, which I had to clean out first.

Tools used in this project

cv2 (OpenCV), NumPy, PIL (Pillow), Matplotlib, Haar cascades, etc.

About the project

During my intern interview, I was given an exciting project to work on: classifying images from a large dataset that had been provided to me. At the time I had a 2017 MacBook Air with a 110 GB SSD. It is a decent laptop, but the storage was not adequate: macOS had gradually filled up most of the drive, leaving only 8 GB free, while the dataset was more than 10 GB. I had to deal with that first.

The dataset also came with its own set of challenges. I quickly realized that there were a lot of garbage images that needed to be dealt with before I could make any progress. As an aspiring data scientist, I knew that cleaning the data is a crucial first step in any project, so I worked carefully to remove all the unwanted images from the dataset. It wasn’t an easy task, but it was one that I enjoyed and learned a lot from.

Ultimately, I was able to classify the remaining images, and my hard work paid off. This project was a valuable experience that taught me the importance of data cleaning and prepared me for future challenges in the field of data science.
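How the garbage images were actually identified is not recorded here. As a minimal sketch of one possible first pass, assuming that some of the garbage consisted of files that were not valid images at all (empty, truncated, or mislabeled), the files can be screened by their leading bytes using only the Python standard library. The names `looks_like_image` and `split_dataset` are hypothetical and chosen for illustration; a fuller check with PIL, cv2, or a Haar cascade would come after this step.

```python
from __future__ import annotations

from pathlib import Path

# Leading bytes ("magic numbers") of a few common image formats.
# Assumption: the dataset uses only these formats; extend as needed.
IMAGE_SIGNATURES = (
    b"\xff\xd8\xff",       # JPEG
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"GIF87a",             # GIF (1987 version)
    b"GIF89a",             # GIF (1989 version)
    b"BM",                 # BMP
)


def looks_like_image(path: Path) -> bool:
    """Return True if the file starts with a known image signature."""
    try:
        with path.open("rb") as f:
            header = f.read(8)  # the longest signature above is 8 bytes
    except OSError:
        return False  # unreadable files are treated as garbage
    return header.startswith(IMAGE_SIGNATURES)


def split_dataset(root: Path) -> tuple[list[Path], list[Path]]:
    """Sort every file under root into (keep, garbage) lists."""
    keep: list[Path] = []
    garbage: list[Path] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # skip directories
        (keep if looks_like_image(path) else garbage).append(path)
    return keep, garbage
```

Calling `split_dataset(Path("dataset"))`, where `dataset` is a placeholder folder name, returns the two lists without deleting anything, so the garbage list can be reviewed before any file is removed. This only rejects files whose headers are wrong; images that are valid but irrelevant still need a content-based check.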