What Is a Hash Function? Checksums, Passwords, and Fingerprints

Published 2026-05-19

Hash functions are one of the quiet workhorses of computing. They verify your downloads, identify your git commits, detect duplicate files, and protect stored passwords — all without most people ever noticing. This article explains what a hash function actually does, the differences between the common algorithms, and the single most important thing to understand about them.

What a hash function does

A hash function takes input of any size — a word, a file, a multi-gigabyte disk image — and produces a fixed-length string called a hash or digest. The same input always produces the same output, but the output looks nothing like the input:

md5("hello")  = 5d41402abc4b2a76b9719d911017c592
md5("Hello")  = 8b1a9953c4611296a827abf8c47804d7

Notice how changing the case of a single letter completely transformed the result. This is called the avalanche effect, and it is a defining property of a good hash function: a tiny change in the input produces a wildly different output. (The digests shown here are MD5 values; other algorithms behave the same way but produce longer output.)
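The avalanche effect can be measured rather than just eyeballed. The sketch below uses Python's standard hashlib module with SHA-256 and counts how many output bits differ between two inputs; bit_difference is a helper name invented for this illustration. For a well-designed hash, roughly half of the bits would be expected to flip.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count how many of the 256 SHA-256 output bits differ between two inputs."""
    digest_a = hashlib.sha256(a).digest()
    digest_b = hashlib.sha256(b).digest()
    # XOR each pair of bytes, then count the 1 bits in the result.
    return sum(bin(x ^ y).count("1") for x, y in zip(digest_a, digest_b))


if __name__ == "__main__":
    print(hashlib.sha256(b"hello").hexdigest())
    print(hashlib.sha256(b"Hello").hexdigest())
    print(bit_difference(b"hello", b"Hello"), "of 256 bits differ")
```

Identical inputs give a difference of zero, which is the "same input, same output" property from the previous paragraph.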

The key idea: hashing is one-way

This is the part people most often get wrong. Hashing is not encryption. Encryption is reversible — with the key, you can recover the original. Hashing is deliberately one-way: there is no key, and there is no function that turns a hash back into the original input. The digest is a fingerprint, not a locked box.

That one-way property is exactly what makes hashing useful for the jobs below. But one-way does not mean secret. Hashing is deterministic: identical inputs always produce identical hashes, so the hashes of common values (such as popular passwords) can be looked up in precomputed tables. Never think of a hash as a way to "hide" data.
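To make the precomputed-table point concrete, here is a minimal sketch of such a lookup. The word list, the LOOKUP table, and the function names are all made up for illustration; real tables are vastly larger, but the idea is the same.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hex SHA-256 digest of a text string (UTF-8 encoded)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical list of common values someone might precompute.
COMMON_VALUES = ["123456", "password", "qwerty", "letmein"]

# Built once, then reusable against any number of hashes.
LOOKUP = {sha256_hex(value): value for value in COMMON_VALUES}


def reverse_lookup(digest_hex: str):
    """Return the original value if its hash is in the table, else None."""
    return LOOKUP.get(digest_hex.lower())


if __name__ == "__main__":
    leaked = sha256_hex("password")
    print(reverse_lookup(leaked))
```

Nothing here inverts the hash function. The table only works because the input was guessable in advance, which is why an unsalted hash of a predictable value offers little protection.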

What hashes are used for

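  • Verifying downloads. A software vendor publishes the SHA-256 of a file. You hash your copy and compare: if the values match exactly, your copy is byte-for-byte identical to the file the vendor hashed. That rules out corruption, and it rules out tampering as long as the published hash itself came from a trusted source.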
  • Detecting duplicates. Two files with the same hash are (practically certainly) identical, so deduplication systems compare hashes instead of comparing every byte.
  • Storing passwords. Servers store the hash of your password, not the password itself. When you log in, they hash what you typed and compare. A breach leaks hashes, not plain passwords — provided the right algorithm was used.
  • Data integrity. Git identifies every commit by a hash of its contents, which is how it detects corruption and links history together.
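The download check and the duplicate detection described in the list above can be sketched with Python's standard library. The function names (sha256_file, matches_published, find_duplicates) are invented for this example, and the chunk size is an arbitrary choice.

```python
import hashlib
import hmac
from collections import defaultdict


def sha256_file(path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path, published_hex: str) -> bool:
    """Compare a file's SHA-256 with a published hex digest."""
    # Normalise whitespace and letter case before comparing.
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_file(path), expected)


def find_duplicates(paths):
    """Group paths by content hash and return the groups with more than one file."""
    groups = defaultdict(list)
    for path in paths:
        groups[sha256_file(path)].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

A call such as matches_published("installer.iso", "<digest from the vendor's site>") would return True only when every byte matches; the file name here is a placeholder.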

MD5, SHA-1, SHA-256: which to use

Not all hash functions are equal, and some are now broken for security purposes:

  • MD5 — fast but cryptographically broken. Attackers can deliberately create two different inputs with the same MD5 (a "collision"). Never use it for security. It is still fine for non-security tasks like detecting accidental file corruption.
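  • SHA-1 — also broken for security: the first practical collision was published in 2017. Major browsers and certificate authorities have already dropped it, and it should not be used in new designs.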
  • SHA-256 — part of the SHA-2 family and the current general-purpose standard. Fast, widely supported, and with no known practical collisions. This is the right default.
  • SHA-512 — a larger sibling of SHA-256, useful when you want an even bigger digest.
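One visible difference between these algorithms is the size of the digest: 128 bits for MD5, 160 for SHA-1, 256 for SHA-256, and 512 for SHA-512. The short sketch below prints each one using Python's hashlib; digest_bits is a helper name made up for this example, and it assumes the local Python build has all four algorithms enabled (some locked-down builds disable MD5).

```python
import hashlib


def digest_bits(name: str) -> int:
    """Return the output size, in bits, of a hashlib algorithm."""
    return hashlib.new(name).digest_size * 8


if __name__ == "__main__":
    for name in ("md5", "sha1", "sha256", "sha512"):
        hex_digest = hashlib.new(name, b"hello").hexdigest()
        print(f"{name:<7} {digest_bits(name):>3} bits  {hex_digest}")
```

Each hex character encodes 4 bits, so a SHA-256 digest is 64 characters long and a SHA-512 digest is 128.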

A crucial note on password hashing

For passwords specifically, even SHA-256 is not enough on its own — it is too fast, which lets attackers try billions of guesses per second. Real password storage uses deliberately slow algorithms designed for the job, such as bcrypt, scrypt, or Argon2, combined with a unique random "salt" per password. If you are building authentication, use one of those, never a plain hash.
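As a rough sketch of what salted, deliberately slow password hashing looks like, the example below uses scrypt, which ships in Python's hashlib when the underlying OpenSSL supports it (bcrypt and Argon2 need third-party packages). The cost parameters and function names are illustrative assumptions, not tuned recommendations; in production, a maintained authentication library is usually the safer route.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters (assumptions for this sketch, not tuned advice).
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1


def _derive(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=SCRYPT_N,
        r=SCRYPT_R,
        p=SCRYPT_P,
        dklen=32,
    )


def hash_password(password: str):
    """Return (salt, key) to store; a fresh random salt is used per password."""
    salt = os.urandom(16)
    return salt, _derive(password, salt)


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    return hmac.compare_digest(_derive(password, salt), expected_key)
```

Because each password gets its own salt, two users with the same password end up with different stored values, which defeats precomputed lookup tables.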

Want to see hashing in action? The generator below computes SHA-1, SHA-256, SHA-384, and SHA-512 of any text instantly and locally, which is handy for verifying a checksum or checking whether two pieces of text are identical.

Related tool: Hash Generator — Generate SHA-1, SHA-256, SHA-384, and SHA-512 hashes from text.