What Is a Hash Function? Checksums, Passwords, and Fingerprints

Published 2026-05-19

Hash functions are one of the quiet workhorses of computing. They verify your downloads, identify your git commits, detect duplicate files, and protect stored passwords — all without most people ever noticing. This article explains what a hash function actually does, the differences between the common algorithms, and the single most important thing to understand about them.

What a hash function does

A hash function takes input of any size — a word, a file, a multi-gigabyte disk image — and produces a fixed-length string called a hash or digest. The same input always produces the same output, but the output looks nothing like the input:

md5("hello")  = 5d41402abc4b2a76b9719d911017c592
md5("Hello")  = 8b1a9953c4611296a827abf8c47804d7

Notice how changing the case of a single letter completely transformed the result. This is called the avalanche effect, and it is a defining property of a good hash function: a tiny change in the input flips roughly half of the output bits, so the two digests look unrelated.
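The avalanche effect can be measured directly. Here is a minimal Python sketch using the standard hashlib module, with MD5 chosen purely for illustration. The helper names md5_hex and differing_bits are made up for this example.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text (UTF-8 encoded) as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many bits differ between two equal-length hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


if __name__ == "__main__":
    a = md5_hex("hello")
    b = md5_hex("Hello")
    print(a)
    print(b)
    # An MD5 digest has 32 hex characters, i.e. 128 bits.
    print(differing_bits(a, b), "of", len(a) * 4, "bits differ")
```

For a well-behaved hash function the count should land somewhere near half of the 128 bits, even though the inputs differ in only one character.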

The key idea: hashing is one-way

This is the part people most often get wrong. Hashing is not encryption. Encryption is reversible: with the key, you can recover the original. Hashing is deliberately one-way. There is no key, and a fixed-length digest cannot hold all the information in an arbitrarily large input, so no function can turn a hash back into the original. The digest is a fingerprint, not a locked box.

That one-way property is exactly what makes hashing useful for the jobs below. One-way does not mean hidden, though. Identical inputs always produce identical hashes, so anyone can hash common values in advance and look up a match in a precomputed table. Never think of a plain hash as a way to "hide" data.
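To see why a plain hash does not hide a predictable value, here is a small Python sketch of a precomputed lookup table. The word list and the names sha256_hex, LOOKUP, and guess_input are hypothetical; real tables are far larger, but the idea is the same.

```python
import hashlib
from typing import Optional


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text (UTF-8 encoded) as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical list of common values; hashed once, ahead of time.
COMMON_VALUES = ["123456", "password", "qwerty", "letmein"]
LOOKUP = {sha256_hex(value): value for value in COMMON_VALUES}


def guess_input(digest: str) -> Optional[str]:
    """Return the original value if its digest is in the table, else None."""
    return LOOKUP.get(digest)


if __name__ == "__main__":
    leaked = sha256_hex("password")
    print(guess_input(leaked))                      # found in the table
    print(guess_input(sha256_hex("x9!rT#unusual")))  # not in the table
```

Nothing here reverses the hash function. The lookup succeeds only because the input was guessable and its digest was computed in advance.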

What hashes are used for

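Verifying downloads. A software vendor publishes the SHA-256 of a file. You hash your copy and compare. If the values match exactly, your copy is identical to the file the vendor hashed. This protects against tampering only if the published hash itself comes from a source you trust.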
Detecting duplicates. Two files with the same hash are (practically certainly) identical, so deduplication systems compare hashes instead of comparing every byte.
Storing passwords. Servers store the hash of your password, not the password itself. When you log in, they hash what you typed and compare. A breach leaks hashes, not plain passwords — provided the right algorithm was used.
Data integrity. Git identifies every commit by a hash of its contents, which is how it detects corruption and links history together.
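The download-verification and duplicate-detection uses above can be sketched in a few lines of Python. This is an illustrative outline, not a hardened tool. The function names sha256_file, matches_published, and group_duplicates are invented for this example, and the 1 MiB chunk size is an arbitrary choice.

```python
import hashlib
from collections import defaultdict


def sha256_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files never have to fit in memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Compare a file's SHA-256 with a published hex checksum."""
    return sha256_file(path) == published_hex.strip().lower()


def group_duplicates(paths):
    """Map each digest to the paths sharing it, keeping only real duplicates."""
    by_digest = defaultdict(list)
    for path in paths:
        by_digest[sha256_file(path)].append(path)
    return {d: ps for d, ps in by_digest.items() if len(ps) > 1}
```

A typical call would be matches_published("installer.iso", checksum_from_vendor_page), where both the file name and the checksum variable are placeholders.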

MD5, SHA-1, SHA-256: which to use

Not all hash functions are equal, and some are now broken for security purposes:

MD5 — fast but cryptographically broken. Attackers can deliberately create two different inputs with the same MD5 (a "collision"). Never use it for security. It is still fine for non-security tasks like detecting accidental file corruption.
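SHA-1 — also broken for security. A practical collision was publicly demonstrated in 2017, and it is being phased out of certificates, signatures, and other security-sensitive uses.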
SHA-256 — part of the SHA-2 family and the current general-purpose standard. Fast, widely supported, and with no known practical collisions. This is the right default.
SHA-512 — a larger sibling of SHA-256, useful when you want an even bigger digest.
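One concrete difference between these algorithms is digest length. The short Python sketch below reads it from the standard hashlib module. The helper name digest_bits is made up for this example.

```python
import hashlib


def digest_bits(name: str) -> int:
    """Return the digest length, in bits, of a named hashlib algorithm."""
    return hashlib.new(name).digest_size * 8


if __name__ == "__main__":
    for name in ("md5", "sha1", "sha256", "sha512"):
        sample = hashlib.new(name, b"hello").hexdigest()
        # Print the first 16 hex characters only, to keep lines short.
        print(f"{name:7} {digest_bits(name):4d} bits  {sample[:16]}...")
```

A longer digest makes accidental collisions less likely, but length alone does not make an algorithm secure: MD5 and SHA-1 are broken because of structural weaknesses, not only because their digests are short.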

A crucial note on password hashing

For passwords specifically, even SHA-256 is not enough on its own — it is too fast, which lets attackers try billions of guesses per second. Real password storage uses deliberately slow algorithms designed for the job, such as bcrypt, scrypt, or Argon2, combined with a unique random "salt" per password. If you are building authentication, use one of those, never a plain hash.
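As a rough illustration of salted, deliberately slow password hashing, the Python sketch below uses scrypt from the standard hashlib module (available when Python is built against a recent OpenSSL). The cost parameters, the 16-byte salt length, and the function names hash_password and verify_password are assumptions made for this example, not recommendations from this article. A production system would more likely rely on a maintained password-hashing library.

```python
import hashlib
import hmac
import os

# Assumed cost parameters for illustration; tune them for your own hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str):
    """Return (salt, key) for a new password, using a fresh random salt."""
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected_key)


if __name__ == "__main__":
    salt, key = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", salt, key))
    print(verify_password("wrong guess", salt, key))
```

Because every password gets its own salt, two users with the same password end up with different stored keys, which defeats a single precomputed table.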

Want to see hashing in action? The generator below computes several common hashes of any text instantly and locally, which is handy for verifying a checksum or checking whether two pieces of text are identical.

Related tool: Hash Generator — Generate SHA-1, SHA-256, SHA-384, and SHA-512 hashes from text.