How Bitcoin Works: Hashing

CertiK | Sept 12, 2019

Article's Poster

Series written by Maxwell Foley, Software Engineer at CertiK

A hashing function is a function that can only be calculated in one direction — this singular flow is what gives the Blockchain world both clarity and safety.

This means that if we are given an input to the function, we can calculate the output, but given the output, there is no possible way to work backwards and figure out the input. (In that sense, it’s like getting a public key from a private key.)

We won’t go into the math behind this or how it works right now. All that’s necessary is that you understand what it does. Again, just think of it like magic, even though it’s math.

Furthermore, we can choose what range we want the output of the hashing function to take up. For instance we could use all the numbers from 0 to 9 (so only single digit numbers), 0 to 99, 0 to one million, or even something arbitrary like 0 to 894.

To show how this works, let’s imagine that our hash function has the range of 0 to 9. Anything we put in will give one of those numbers: if we give it an emoji, it might give us a 5. If we put in 7859, it might give back a 3. If we entered the entire text of Moby Dick, and it might give us back the number 7.

It will always give us a number — but we don’t know why.

Now imagine we feed it the entire text of Moby Dick, but with one extra word added. You may think that since this input is so similar to the last one we put in, which returned the number 7, this one is quite likely to return a 7 as well — but you would be wrong.

Despite the inputs being so similar, there is no predictable correlation in the outputs. That absolute cloud of unknowing is what prevents hacks — knowing the output tells you nothing about what the input might have been.

Outside of cryptocurrency, hash functions are used for storing passwords on centralized web servers. But web databases get hacked fairly often, and if the hacker was able to just read everybody’s password after a successful hack, this would be bad news for users.

To make things safer for users, almost all modern websites will hash the users’ passwords before storing them in the database. Therefore there is no way a hacker could figure out what the users’ passwords were just from reading the database, since this would mean going from output to input of a hash function — a deeply complex code that can’t be backwards engineered, meaning what the hacker reads looks just like a bunch of random numbers.

But, every time the user logs in, the server can properly verify that they are giving the true password — the server takes the exact password and can hash it once more and see that it matches the already-hashed result that is stored on the database.

In that way, the password itself is the key — but getting the “hashed” password is useless for a hacker as it can’t go backwards.

The Hash Guessing Game

On Bitcoin, we choose to reward the rights to publish a block to the miner who can run the block they are proposing through a hash function and have the output be a number in a certain range.

The miner then has to format the data they are publishing in an orderly, formatted way — but there is a spot where they are allowed to put a small amount of useless junk data (called a “nonce.”) This is like if we were hosting a competition where we tell people to add one or two random words of gobbledegook to the text of Moby Dick and run it through a hash function to try to match its output to some specific magic number.

Essentially, we are trying to go in reverse — given a hash output, the miner needs to find the proper input.

But wait! Isn’t that impossible?

Sort of: what’s impossible is to somehow use a method to deduce the input from the output. What is possible is to guess it randomly.

Let’s go back to our example of a hash function that spits out numbers between 0 and 9. Imagine that we are playing a Moby Dick Hashing Challenge where you can get a reward if your edit of Moby Dick with a few words of nonsense hashes to 3. This will be very easy, because someone’s chance of getting 3 from any given input is 1 out of 10. They will only have to try an average of five different inputs to win the prize.

A standard CPU on a MacBook going as fast as possible can plug 88,000 guesses into the hash function per second, so it will actually only take a fraction of a fraction of a fraction of a second to get the answer.

If, say, there were fifty people all using MacBooks to compete in this challenge, and you wanted the competition to last roughly an hour, you could multiply 88,000 by 60 seconds times 60 minutes to get the number of guesses someone can do in an hour, and then multiply that by fifty for the number of people playing.

So your hash function would have to have a much larger range: anywhere from 0 to (88000 * 60 * 60 * 50), which is a range from 0 to 15840000000.

To get the reward, you have to guess right.

See, Bitcoin uses very large numbers for the range of the hash function so that it still takes an average of ten minutes to run this guessing game for all the players — even as players grow. As soon as someone wins the game, the new block they wrote is considered “published.”

As more and more electricity has been thrown into mining, the Bitcoin network calibrates alongside it and raises the “difficulty” of each block dynamically based on how long it took to calculate previous blocks. By doing this, we obviously don’t mean that a centralized source is deciding this and telling everybody else what to do. Rather, each miner is calculating it themselves.

The more valuable Bitcoin is, and the more people want it, the higher the range goes — ensuring the “game” stays equal and wide even as more players join.

Now, Bitcoin does not actually adjust the range of the hashing function to make the challenge easier or harder, but rather uses a fixed range: 2 to the power of 256. However in Bitcoin’s version of the challenge miners are not racing to match a specific number, but rather the output of their hash function simply has to be under a certain cutoff.

Let’s explain.

Adjusting this cutoff reduces the challenge. For example if we have a hash function with a range of one to ten million, we can set our cutoff to “2” to make the odds of a match one in ten million — the output has to equal “1” for the miner to win. Or we can set the cutoff to five million, making the odds 50% that someone will get it on their first try.

On the Bitcoin network, the odds that someone will solve the challenge on their first try are always enormously small — much, much smaller than one in ten million — but given the amount of computing power directed towards solving the challenge all across the world, someone will still get it in around ten minutes.

Think about it: with enough scratch tickets, your personal odds of winning are low, even if you buy hundreds of them. But, with a winning ticket out there, somebody is bound to win — and that assurance and randomness replaces a central authority.

Putting the “Chain” in Blockchain

Once a miner has successfully solved the puzzle, what happens? How do they tell the rest of the network? How do all users come to accept the new block as canonical?

In short: with this competition, how do we end up creating one definitive history to ensure fairness and clarity?

Here we will explain a very elegant aspect of the design which ensures that miners will end up preventing double-spends simply for the sake of their own profit.

The brilliant idea that makes this possible is:

  1. Each block must include the hash of the previous block
  2. Users are told to always accept the longest valid chain of blocks as the canonical one

Together, these conditions make it so that as soon as a new, valid block is published on the network, miners have to pick up that new block and put it in their chain — start mining a new block completely from scratch.

Why?

The fact that each block must include the hash of the previous block ensures that the Blockchain is constantly updated.

Let’s say that the chain is five blocks long at the moment (this would be fifty minutes after Bitcoin was created — ten minutes for a block, remember?) That would mean we are trying to make block number six, and part of block number six is the hash of block number five.

But let’s say someone beats us to it, and publishes block number six before we do — now we are trying to make block number seven, and we must include the hash of the other person’s block number six, which we would have had no way of knowing before it was published. We can’t keep trying to make block number six, because we are wasting our time — people have already accepted the chain with six blocks as the longest — and soon there will be a seventh if we don’t hurry.

Yikes! That means must start from scratch our goal of creating the next block if we want for it to be accepted and get a reward.

The key point is that a block only is valid in reference to another block that came before it. We speak of mining “on top of” another block to ensure stacking to a cohesive future.

What if two blocks are published at the same time? Miners will pick one or the other to mine on top of, and for a little bit, there will be two competing chains. But! Within a few blocks, one will very quickly become longer than the other, and that chain will become canonical while the other is abandoned.

For this reason, users are recommended not to treat a payment as finalized until it is at least a few blocks deep.

Now we can see how it’s impossible to double-spend: after several blocks have been published “on top of” a given block, the transactions in that block are effectively “buried” — impossible to revert and locked into payment history.

There’s no way to undo them without undoing the entire blockchain. Sorry!

So, how would someone undo the entire blockchain? That would be quite hard to do. In fact, you could only outpower the other miners if you are spending more electricity than everyone else combined. This is called a 51% attack, since we would need to control 51% of the electricity going into the system in order to do this.

This is Bitcoin’s weakness, if it has one. And, as long as this scenario doesn’t happen, the system is secure.

(This is also how a government could potentially destroy the Bitcoin network. If an actor constantly 51% attacks to reverse history and replace it with a bunch of junk transactions, then no one’s payments ever end up being recorded, and it is like nothing is going on in the Bitcoin network at all — heads up if you’re paranoid!)

But this problem isn’t much of a problem of all: in order to upset the ecosystem, an actor (or government) would still need to spend as much electricity as the entire army of unsolicited volunteers from all around the world mining Bitcoin as fast as they can for a profit.

And, assuming anyone else in the world wants to stop that (think of other powerful, competing actors) they can’t get away with it.

That one actor would need 51% of all electricity in the Blockchain network — that could be hard.

***

Stay tuned for our conclusion of the series next week!