This is our project for Introduction to Information Technology.
Our team's members:
Download Miniconda at https://docs.conda.io/en/latest/miniconda.html
Download Visual Studio Code at https://code.visualstudio.com/Download/
pip install numpy
pip install matplotlib
pip install opencv-python
Download the MNIST database from http://yann.lecun.com/exdb/mnist/ and DO NOT UNZIP THE FILES.
The MNIST database contains 60,000 images, called train, which the algorithm compares input digits against, and 10,000 images, called test, which are used to check how accurate the algorithm is. Every image has a label giving the digit written in it.
The four compressed MNIST files are in the data subfolder.
Run the test_MNIST.py file to make sure the MNIST database is installed and set up correctly.
Step 1: Vectorize all the images in the train database and the input image.
Step 2: Find the distance between the input image and each image in train.
Step 3: Sort all the distances in increasing order.
Step 4: Choose the k smallest distances; the corresponding train images are the k nearest neighbours (KNN). k can be 50, 100, 500, etc. You can choose any value for it.
Step 5: Count the k labels and find which one occurs most often. That is the number the algorithm guesses.
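The five steps can be sketched in plain Python as follows. This is only an illustration, not the project's guess() function: the names flatten, squared_distance and knn_guess are hypothetical, and the squared Euclidean distance is an assumption, since the steps do not say which distance is used.

```python
from collections import Counter


def flatten(image):
    """Step 1: turn a 2-D image (a list of rows) into a 1-D vector."""
    return [pixel for row in image for pixel in row]


def squared_distance(a, b):
    """Step 2: squared Euclidean distance (assumed metric) between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_guess(train_vectors, train_labels, input_vector, k):
    """Return the most frequent label among the k nearest train vectors."""
    # Step 2: pair every distance with the label of its train image.
    distances = [
        (squared_distance(vector, input_vector), label)
        for vector, label in zip(train_vectors, train_labels)
    ]
    # Step 3: sort by distance, smallest first.
    distances.sort(key=lambda pair: pair[0])
    # Step 4: keep the labels of the k nearest neighbours.
    nearest_labels = [label for _, label in distances[:k]]
    # Step 5: the label with the largest frequency is the guess.
    return Counter(nearest_labels).most_common(1)[0][0]
```

Squaring the distance instead of taking the square root does not change the sorted order, so the chosen neighbours are the same. With NumPy, which this project installs, the distance computation could be done on whole arrays at once rather than in a Python loop.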
Run the main.py file.
Run it with this command: python main.py
Use C++ code to increase speed.
Get the lib.hpp and lib.cpp files.
Run these commands (I use GNU GCC):
Or compile them with Visual Studio.
You now have a lib.so file. Keep this file and the main_optimize.py file in the same directory.
If you don't want to edit the library or you don't have a compiler, use my lib.so file instead of building it yourself.
Run the main_optimize.py file instead of the main.py file.
The only difference between these files is that main.py runs the guess() function in Python, whereas in main_optimize.py the guess() function calls the guess_optimize() function, written in C++, from lib.so.
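For readers unfamiliar with calling a shared library from Python, here is a minimal sketch of one common way to do it, using the standard ctypes module. Everything specific in it is an assumption rather than a description of this project: that main_optimize.py uses ctypes at all, that guess_optimize() is exported with C linkage (extern "C"), and the argument list shown. Check lib.hpp for the real signature.

```python
import ctypes


def load_guess_optimize(lib_path="./lib.so"):
    """Load the shared library and return its guess_optimize function.

    Hypothetical signature, for illustration only:
        int guess_optimize(const unsigned char *pixels, int n_pixels, int k);
    """
    lib = ctypes.CDLL(lib_path)  # raises OSError if the file cannot be loaded
    func = lib.guess_optimize
    # Declaring the argument and return types lets ctypes convert values safely.
    func.argtypes = [ctypes.POINTER(ctypes.c_ubyte), ctypes.c_int, ctypes.c_int]
    func.restype = ctypes.c_int
    return func


def to_c_array(pixels):
    """Copy bytes or a list of 0-255 integers into a C unsigned char array."""
    return (ctypes.c_ubyte * len(pixels))(*pixels)
```

Under these assumptions, a call would look like load_guess_optimize()(to_c_array(pixels), len(pixels), 50), where pixels holds the vectorized input image and 50 is the chosen k.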
Source: Thang Nguyen