How to find the md5 hash of a file in python

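An MD5 hash, or MD5 checksum, is a 128-bit value computed from the contents of a file and usually written as 32 hexadecimal characters. Two different files are very unlikely to have the same MD5 value by accident, so if the MD5 values of two files match, the files are almost certainly identical. During transmission of a file, the sender can pass the MD5 value along with the file, and the receiver can use that value to check whether the file was received correctly or got corrupted. Even a small change in the file produces a different checksum. Note that different files with the same MD5 can be constructed deliberately, so MD5 is suitable for detecting accidental corruption but not for security purposes.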
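
The transmission check described above can be sketched in a few lines. This is only a minimal illustration that uses in-memory bytes instead of a real file; the names sent_data, received_ok and received_corrupted are made up for the example.

```python
import hashlib

# Hypothetical payload; in practice this would be the file's contents.
sent_data = b"The quick brown fox jumps over the lazy dog"
sent_md5 = hashlib.md5(sent_data).hexdigest()  # value sent along with the data

# Two hypothetical outcomes on the receiving side.
received_ok = b"The quick brown fox jumps over the lazy dog"
received_corrupted = b"The quick brown fox jumps over the lazy cog"  # one byte changed

ok_matches = hashlib.md5(received_ok).hexdigest() == sent_md5
corrupted_matches = hashlib.md5(received_corrupted).hexdigest() == sent_md5

print(len(sent_md5))      # 32 hexadecimal characters
print(ok_matches)         # True: the data arrived unchanged
print(corrupted_matches)  # False: a single changed byte alters the checksum
```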

Python's standard library includes the hashlib module, which is the usual way to compute hashes in python. In this post, we will learn how to use it to find the md5 hash value of a file.

Finding the md5 hash of a file:

To find the md5 hash of a file, we need to read the file as bytes, i.e. open it in binary mode. We can then pass those bytes (data below) to the following call to get the md5 hash as a hexadecimal string:

hashlib.md5(data).hexdigest()
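
As a quick illustration of that call, the sketch below hashes a short value held in memory. It also shows that hashlib.md5 accepts only bytes-like input, so a str has to be encoded first. The sample text is arbitrary.

```python
import hashlib

text = 'hello'

# A str must be encoded to bytes before hashing.
data = text.encode('utf-8')
hash_value = hashlib.md5(data).hexdigest()
print(hash_value)  # 5d41402abc4b2a76b9719d911017c592

# Passing the str itself raises a TypeError.
try:
    hashlib.md5(text)
    str_rejected = False
except TypeError:
    str_rejected = True
print(str_rejected)  # True
```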

The script below finds the md5 hash of a given file:

import hashlib

if __name__ == '__main__':
    file_name = 'inputfile.txt'
    # Open in binary mode ('rb') so that we get raw bytes
    with open(file_name, 'rb') as f:
        data = f.read()  # reads the whole file into memory
    hash_value = hashlib.md5(data).hexdigest()
    print(hash_value)

It prints the md5 hash of the file inputfile.txt, which is expected to be in the current working directory.

You can also create another text file in the same folder with exactly the same content as inputfile.txt and run the script on it. It will print the same md5 value.
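
That experiment can also be scripted. The sketch below writes the same content to two temporary files and compares their hashes. The helper md5_of_file and the file name copy.txt are hypothetical names used only for this example.

```python
import hashlib
import os
import tempfile


def md5_of_file(path):
    """Return the MD5 hex digest of a file (hypothetical helper; reads it fully into memory)."""
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()


# Write identical content to two files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    first = os.path.join(folder, 'inputfile.txt')
    second = os.path.join(folder, 'copy.txt')
    for path in (first, second):
        with open(path, 'wb') as f:
            f.write(b'hello world\n')
    first_md5 = md5_of_file(first)
    second_md5 = md5_of_file(second)

print(first_md5 == second_md5)  # True: same content, same md5
```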

Finding the md5 hash of a large file:

The above program reads the whole file into memory at once. That is fine for small files, but for very large files it can use a lot of memory or even fail with a MemoryError. To avoid that, we need to read the file content in small parts. Below is the complete program that will work with a large file:

import hashlib

# Number of bytes to read from the file at a time
MAX_BYTE_SIZE = 1024

if __name__ == '__main__':
    file_name = 'song.mp3'
    hash_value = hashlib.md5()

    with open(file_name, 'rb') as f:
        # Read the first chunk; f.read() returns b'' at the end of the file
        current_bytes = f.read(MAX_BYTE_SIZE)

        while current_bytes != b'':
            hash_value.update(current_bytes)
            current_bytes = f.read(MAX_BYTE_SIZE)

    print('Current hash md5 : {}'.format(hash_value.hexdigest()))

Here,

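  • Using a while loop, we read the data in chunks of MAX_BYTE_SIZE bytes and feed each chunk to the hash object with its update method. The loop stops when f.read returns an empty bytes object, which marks the end of the file. hexdigest then returns the md5 hash of the whole file.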
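
The same chunked approach can be wrapped in a reusable function. This is a sketch rather than the only way to do it: the function name md5_of_large_file, the file name sample.bin and the 64 KB default chunk size are assumptions made for the example. The demo checks that hashing in chunks gives the same result as hashing all the bytes at once.

```python
import hashlib
import os
import tempfile


def md5_of_large_file(path, chunk_size=64 * 1024):
    """Return the MD5 hex digest of a file, reading it chunk by chunk."""
    hash_value = hashlib.md5()
    with open(path, 'rb') as f:
        # iter() keeps calling f.read(chunk_size) until it returns b'' (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b''):
            hash_value.update(chunk)
    return hash_value.hexdigest()


# Demo on a temporary file that is larger than one chunk.
content = os.urandom(200 * 1024)
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sample.bin')
    with open(path, 'wb') as f:
        f.write(content)
    chunked_md5 = md5_of_large_file(path)

one_shot_md5 = hashlib.md5(content).hexdigest()
print(chunked_md5 == one_shot_md5)  # True
```

On Python 3.11 and later, hashlib.file_digest(f, 'md5') offers a built-in way to hash an open binary file without loading it all into memory.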
