Crypto 101

What Is "Blockchain" And How Does It Work?

 This is both a video and text post, where I walk through the technical fundamentals of blockchain technology, focusing on Bitcoin's blockchain.

Blockchains explained! I’m going to approach this from a somewhat technical perspective, and then make another video in the future about the impact blockchains have had and will have.

The blockchain can be thought of as a distributed ledger system. The key terms here are distributed and ledger, distributed being the opposite of centralized or being in one location; and ledger being a continuous recording of events, usually meaning transactions. You could also think of recording students coming and leaving school for example. I’m going to start with an overview and then go into detail.

Blockchains consist of data or information organized in succeeding blocks, one after the other in chronological order, to form a chain. A block is just a collection of information recorded in a format that is standardized across all the blocks. The blockchain is non-physical and can be thought of as a database: you can expect to find the same fields, for example date or amount, within each block, but the data attached to those fields changes, capturing what happens from one moment to the next.
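To make the "same fields, different data" idea concrete, here is a minimal sketch in Python. The field names (`index`, `timestamp`, `data`) and the sample entries are made up for illustration; a real blockchain defines its own format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    index: int       # position of the block in the chain
    timestamp: str   # when the block was recorded (ISO-style text sorts correctly)
    data: str        # the payload, e.g. a description of a transaction

# Hypothetical sample chain: every block has the same fields, only the values differ.
chain = [
    Block(0, "2018-05-09T16:00", "Alice pays Bob 5"),
    Block(1, "2018-05-09T16:10", "Bob pays Carol 2"),
]

def is_chronological(blocks):
    """True if blocks follow one another in index and time order."""
    return all(
        a.index + 1 == b.index and a.timestamp <= b.timestamp
        for a, b in zip(blocks, blocks[1:])
    )
```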

Before getting into what is actually in the blocks and what happens when a new block is added, let’s look at why it’s distributed.

It is distributed because, by design, the information is not owned or updated by one person or one central group. This is called distributed consensus. There is a network of people participating, that can be joined by anyone through a connection point called a node, usually a computer interface. The information can be updated by anyone on the network who has the right resources to do so and can be verified by all the nodes. There are ways to verify that the information recorded is accurate and make it almost impossible for those updating the blockchain to change it in their favor, much more so than the centralized ledgers that have existed in human history.

One thing to note is that the blockchain, as it is information, is written in programming languages. But the specific language can change depending on the blockchain, as can the kind of content stored within it. The blockchain itself just specifies what kind of information is recorded and how it should be written; the programming language used to do it can vary. So Bitcoin, arguably the most popular use of a blockchain, is just a blockchain being used as a cryptocurrency. Bitcoin’s blockchain doesn’t have the same content as another blockchain’s, say Ethereum’s, is not necessarily written in the same language, and does not serve the same purpose. Regardless, there can be commonalities between different blockchains, such as the basic structure and the methods used to secure the information on it.

That’s really all you need to know for a basic understanding: a blockchain is decentralized information stored in blocks that is continually updated over time. But I’m going to go into more detail, using Bitcoin as the example.

Bitcoin is a peer-to-peer or distributed monetary system, or a “peer-to-peer electronic cash system” as coined by its mysterious creator Satoshi Nakamoto. It’s a way to define and store value, move that value from person to person, accurately track that movement, make sure the right person is doing the movement, as well as give the ability to mine, or add to, the already existing amount. The blockchain itself is the mechanism by which this is done. There are no physical coins, just information, but for lack of a better term, from the moment a coin is first created, one can track its location as belonging to one person, and then its movement as bits of it or all of it are moved from person to person. So the ledger or blockchain is actually tracking who spends how much, with whom, and when. When there is a transaction, inputs and outputs are created to say how much was received and how much was spent. The unspent amount is still seen as remaining at the address, because it can be calculated that a certain amount was received and a certain amount of it has not been spent, and so can still be spent.
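As a rough illustration of how unspent amounts can be worked out from history alone, here is a small Python sketch. It is a simplification: Bitcoin tracks individual unspent outputs rather than running balances, and the addresses, amounts and the `COINBASE` marker for newly created coins are invented for the example.

```python
from collections import defaultdict

# Hypothetical ledger entries: (sender, receiver, amount).
# "COINBASE" marks coins that were newly created rather than sent by someone.
ledger = [
    ("COINBASE", "addr_A", 50),
    ("addr_A", "addr_B", 20),
    ("addr_B", "addr_C", 5),
]

def unspent_balances(transactions):
    """Replay every transaction and return what each address can still spend."""
    balances = defaultdict(int)
    for sender, receiver, amount in transactions:
        if sender != "COINBASE":
            balances[sender] -= amount   # spent by the sender
        balances[receiver] += amount     # received by the receiver
    return dict(balances)

balances = unspent_balances(ledger)
```

Here `addr_A` received 50 and spent 20, so 30 remains spendable at that address.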

Instead of a person, an address is used, in this case a string of characters, which may or may not be able to be linked to an individual. The address is derived from a key (strictly, it is a hash of a key, so calling it a key is a simplification), one half of a key pair, something used in public-key or asymmetric cryptography. The address that is recorded on the blockchain as having sent or received funds is the public address that someone can share, but only the owner of the other key in the pair, the private key, has the ability to spend the funds once received. For more information on public-key cryptography, watch my video ‘What is Cryptography?’ or read up some more on your own.

The blockchain just records how the funds are moving. When someone unlocks their wallet with their private key and decides to send funds to someone else’s address, that is, their public key, this transaction information is broadcast to the rest of the network. There are special nodes on the network, called miners, who are able to write that transaction to the blockchain. The transaction is not complete until this is done. The first step is to verify that the sender has the funds to be spent; this is possible because the history of all transactions on the blockchain can be checked, and so unspent amounts calculated. The second step is to record the new transaction, along with others broadcast around the same time, onto the blockchain by compiling them into a block and then adding that block to the blockchain. Miners do both of these steps.

Since anyone can decide to be a miner and participate in the blockchain, there has to be a system to decide which miner gets to write the next block. First of all, at this point, Bitcoin miners have to have specialized computers called ASICs (Application-Specific Integrated Circuits), built specifically for mining crypto, that allow them to compete to write to the blockchain. Back in the day, one could have used a regular old CPU, but things got more difficult over time, as I’ll explain. This is because miners have to compete to figure out who gets the right to add the next set of transactions to the blockchain. They do this by performing a calculation that takes a very long time: finding a hash that begins with a certain number of zeros. A hash is the output of a function that turns an arbitrarily sized string of characters into a string of a specific length. Blockchains use a cryptographic hash function, which, among other features, means one can’t figure out the original information from looking at the hash, and changing even one character in the original information drastically changes the resulting hash. Bitcoin specifically uses SHA-256 (Secure Hash Algorithm, 256-bit). Different mining computers have different hash rates, but the idea is that since many miners are working to solve this problem at the same time, the probability of the correct hash being found can be tuned so that only one miner at a time is likely to find it and so be able to write the new block. It also decreases the chances of one miner being able to do it successively and so write information in their favor. Miners are basically just adding arbitrary characters to the new set of transactions to be put in the new block, that is, searching for a nonce (a number used once), until the right combination is found that results in a hash with the correct number of zeros at the beginning.
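The nonce search described above can be sketched in a few lines of Python. This is a toy version under stated assumptions: it hashes a text string once and counts leading zeros in the hexadecimal digest, whereas Bitcoin hashes a binary block header twice with SHA-256 and compares the result against a numeric target. The block contents and the difficulty of 4 are made up.

```python
import hashlib

def pow_hash(block_data: str, nonce: int) -> str:
    """SHA-256 of the block data with the candidate nonce appended."""
    return hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()

def mine(block_data: str, difficulty: int):
    """Try nonces 0, 1, 2, ... until the hash starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = pow_hash(block_data, nonce)
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

# Hypothetical block contents: previous block hash plus one transaction.
block_data = "prev_hash|Alice pays Bob 5|"
nonce, digest = mine(block_data, difficulty=4)
```

Finding the nonce takes tens of thousands of attempts on average at this difficulty, but checking it takes a single call to `pow_hash`, which is why other miners can verify the work quickly.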

Each block generally has multiple transactions in it. The information in one block is actually a hash of the previous block in its header, the current transactions being written, and some extra characters (the nonce) that are combined with those two in order to come up with a hash that has the required number of leading zeros. The time it takes for all the different miners on the network to compete, and for one miner to win, is about 10 minutes, so a new block is added roughly every 10 minutes. The difficulty, that is, how many leading zeros are needed, is adjusted periodically in order to keep that time at about 10 minutes. This is necessary because more miners with faster computers join the network over time, so the difficulty of the hashing problem has to increase as well to maintain that 10-minute time frame. This time and method of competing for the right to add a block is specific to Bitcoin, although other blockchains can have it too. Earning the right to add a block by calculating hashes is called proof-of-work mining. Even though it takes a very long time to find the hash, it does not take long for other miners to verify that the hash is correct. This continuous process of each block being hashed, and that hash being combined with new transaction information to form the next block’s hash, results in a chain of hashes. Within each block, the transactions themselves are summarized in what is called a Merkle tree, where hashes are paired and hashed again until a single root hash remains. Although one can look back in time to view every transaction that ever occurred on the blockchain, the continuous compiling of the previous hash into the next results in data integrity, where one can quickly check that the most recent block information is true. Because each new hash depends on all previous hashes, one cannot lie about what happened before, like saying you never spent money you did spend, as it would change the whole blockchain.
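Here is a rough Python sketch of how the difficulty can be nudged to hold the 10-minute pace. It follows the general shape of Bitcoin's rule (adjust every 2016 blocks, limit each change to a factor of four), but it treats difficulty as a plain number, and the function name and sample values are my own.

```python
TARGET_BLOCK_SECONDS = 600   # ten minutes per block
RETARGET_INTERVAL = 2016     # blocks between adjustments in Bitcoin

def retarget(old_difficulty: float, actual_seconds: float) -> float:
    """Scale difficulty by how much faster or slower the last interval was."""
    expected_seconds = TARGET_BLOCK_SECONDS * RETARGET_INTERVAL
    ratio = expected_seconds / actual_seconds
    # Each adjustment is limited to a factor of four in either direction.
    ratio = max(0.25, min(4.0, ratio))
    return old_difficulty * ratio

# If blocks arrived every 5 minutes instead of 10, difficulty doubles.
doubled = retarget(100.0, 300 * RETARGET_INTERVAL)
```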

Continuing our explanation of how this works in Bitcoin’s blockchain, I mentioned that the block itself contains a hash of the previous block, the new transactions, and then the extra information to find the right hash. Because each new block has the last block’s hash in it, and the hash changes if anything in the original data is changed, any tampering with an earlier block shows up immediately, which prevents anyone from quietly altering the blockchain.
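A small Python sketch shows why changing an old block is noticed. The block layout (a dictionary with `prev_hash`, `transactions` and `hash`) and the sample transactions are simplified assumptions, and the nonce is left out to keep the example short.

```python
import hashlib

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(prev_hash: str, transactions: str) -> str:
    return hashlib.sha256(f"{prev_hash}|{transactions}".encode()).hexdigest()

def build_chain(tx_batches):
    """Link each batch of transactions to the hash of the block before it."""
    chain, prev = [], GENESIS_PREV
    for txs in tx_batches:
        h = block_hash(prev, txs)
        chain.append({"prev_hash": prev, "transactions": txs, "hash": h})
        prev = h
    return chain

def is_valid(chain) -> bool:
    """Recompute every hash and check that each block points at the one before."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        if block_hash(block["prev_hash"], block["transactions"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True

chain = build_chain(["A pays B 5", "B pays C 2", "C pays D 1"])

# Rewrite history in the first block without redoing any hashes.
tampered = [dict(block) for block in chain]
tampered[0]["transactions"] = "A pays B 500"
```

`is_valid(chain)` holds, while `is_valid(tampered)` fails, because the stored hash of the first block no longer matches its contents.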

The important thing to note here is that having miners compete for the right to add the next block of data with new transaction information prevents what is called a double-spend attack, that is, someone spending the same funds twice and trying to lie to the network. Because all the miners receive new transaction broadcasts, they all begin to compile new blocks whenever they receive those transactions, and then take some time to figure out the right nonce to add. The first one to win the proof of work broadcasts their version of the blockchain with their new block, and that is accepted because other miners can see that the transactions in it were valid, by checking what’s unspent at each address, and they can see that the right nonce was found. At this point, miners begin to compile the next block using this newly accepted blockchain, referencing the hash of this newly accepted block. If two miners somehow solve this at the same time, a low-probability event, two versions of the blockchain are created with different transaction ordering, and both are broadcast. Each miner begins working to add the next block based on whichever one is received first. The longest version of the blockchain is the one that is accepted as valid.
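The "longest chain wins" rule can be sketched very simply in Python. This is a simplification: chains are just lists of made-up block labels, and Bitcoin in practice compares the total proof of work behind each chain rather than the raw block count.

```python
def choose_chain(candidates):
    """Return the longest chain; on a tie, keep the one that was received first."""
    # max() returns the first maximal item, which models "whichever arrived first".
    return max(candidates, key=len)

# Two miners solved the same block at about the same time (hypothetical labels).
fork_a = ["genesis", "block1", "block2a"]
fork_b = ["genesis", "block1", "block2b"]
tied_choice = choose_chain([fork_a, fork_b])

# The next block lands on fork_b, so it becomes the accepted version.
fork_b_extended = fork_b + ["block3b"]
resolved_choice = choose_chain([fork_a, fork_b_extended])
```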

If someone were to attempt to double-spend by sending a transaction and then trying to spend the same funds again, they would write a new block spending the already-spent funds elsewhere, replacing the block that holds the initial spend; the payment dropped from the chain would then be rejected by other miners, which could hurt its receiver. But this dishonest person would have to do this faster than other miners can write the blockchain. This would be very difficult to do, because they would need the computing power necessary to outcompete the other miners with a version of the blockchain that is longer than everyone else’s. They have to write all the new blocks based on their deceitful block, since all blocks reference the one prior, and their chain has to be longer than everyone else’s to be accepted by the rest of the network. Because other miners were already working on writing the new block when the first transaction was sent to the first receiver, and they have been adding to it, they have the hash from the block with the correct information included in all subsequent blocks, and they are, statistically speaking, faster at writing blocks than the dishonest person could ever be. That person can’t just slip in a second transaction that robs the initial receiver, because they have to write that block, plus all the following blocks, until their chain is longer than the one the other miners on the network are already building. The odds of this dishonest person or miner being able to do this, and write the second transaction into the longest chain faster than everyone else, are very low, as worked out in the original idea for Bitcoin.
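The original Bitcoin paper puts a number on those odds: if the attacker controls a share q of the hashing power and the honest miners control p = 1 - q, the chance of ever catching up from z blocks behind is (q/p)^z when q is smaller than p, and 1 otherwise. A minimal Python sketch of that formula, with function and parameter names of my own choosing:

```python
def catch_up_probability(attacker_share: float, blocks_behind: int) -> float:
    """Chance that an attacker ever catches up from `blocks_behind` blocks back."""
    q = attacker_share
    p = 1.0 - q
    if q >= p:
        return 1.0   # with half or more of the hashing power, catching up is certain
    return (q / p) ** blocks_behind

# An attacker with 10% of the hashing power, at increasing block depths.
odds_by_depth = [catch_up_probability(0.10, z) for z in range(1, 7)]
```

The values in `odds_by_depth` shrink with every extra block, which is the reasoning behind waiting for several confirmations.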

This is why it’s suggested that after sending or receiving funds, one wait for a certain number of confirmations (as in new blocks being added) to make sure that enough time has passed that a dishonest person’s probability of catching up and creating a new chain where your funds are spent again is negligibly low. There is the issue of mining pools being able to combine computing power and so having the ability to carry out double-spend attacks. The benchmark is controlling more than half of the hashing power on the network, commonly called a 51% attack. At this point in Bitcoin, this is only solved by mining pools deciding to limit themselves. Another method of attack is to cripple other nodes and so shift the balance of mining power to the remaining miners, who may then hold the majority of it and so be able to keep writing a chain that is likely to end up being the longest one. Other blockchains, such as Particl, or Ethereum with its planned Casper upgrade, run on proof of stake instead to provide distributed consensus.

Apart from verifying that the transaction is valid, and ordering the transactions, miners also create new bitcoins in the first transaction that is added to the block. This is called the block reward and is an incentive for miners. This block reward is set to decrease every few years until it is no more. In the future, when there are no more block rewards, miners will be able to receive transaction fees based on simple economics, deferring to the participants willing to pay the higher fees for their transactions. 
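The shrinking block reward can be sketched in Python as follows. The figures are Bitcoin's (50 bitcoins at the start, halved every 210,000 blocks, counted in whole satoshis), while the function name is just for illustration.

```python
SATOSHIS_PER_BITCOIN = 100_000_000
INITIAL_REWARD = 50 * SATOSHIS_PER_BITCOIN   # reward for the earliest blocks
HALVING_INTERVAL = 210_000                   # blocks between halvings, roughly four years

def block_reward(height: int) -> int:
    """Reward in satoshis for the block at the given height."""
    halvings = height // HALVING_INTERVAL
    # Shifting right by one bit halves the value and rounds down,
    # so after enough halvings the reward reaches zero.
    return INITIAL_REWARD >> halvings

first_era = block_reward(0) / SATOSHIS_PER_BITCOIN          # 50.0
third_halving = block_reward(630_000) / SATOSHIS_PER_BITCOIN  # 6.25
```

Because the reward is counted in whole satoshis, it eventually rounds down to nothing, at which point transaction fees are the only income left for miners.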

As a side note, this hashing to write blocks, and the use of the public and private key pair for sender verification, is the only cryptography that actually exists in Bitcoin, and data is not actually encrypted on the blockchain. One cannot undo hashes to recover the data that went into them, but the actual transactions are recorded in each block in the clear, so anyone can look back at the transaction history; none of it is hidden. The “cryptographic” security of the Bitcoin blockchain is in the fact that only the person with the right private key can move their funds, that the address is not necessarily linked to a person, that distributed consensus is needed to verify transactions, and that the data (linked together by hashes) cannot be altered after it is written. Other blockchains are emerging, such as privacy coins like Particl, that provide much more security, and there are also other blockchain solutions that encrypt the data itself before it is written to the blockchain.

To recap, a blockchain acts as a public ledger, recording information (transactions, in the case of Bitcoin) in a time-based manner, using a decentralized network to update it. It is immutable, meaning once the data is there it cannot be changed. It is publicly verifiable and doesn’t rely on one institution to update or validate it. It is secure in that it uses cryptography in the form of a public and private key system to ensure that only the right persons can move funds. The major issue it solves is that it removes the need for trust in, and discourages fraud by, a centralized institution. It does not only have to be used as an electronic payment system as in cryptocurrencies; any database with records can use a blockchain as its underlying technology.

I have to add that this is not a perfect explanation. Covering all the intricacies would take a long time, because adding a detail means explaining it. I hope this was useful. Leave any comments or questions below; I’d love to read them. I’ll be talking about the implications of blockchain technology in the future.

Learn more: https://bitcoin.org/bitcoin.pdf How Bitcoin Works Under the Hood by Curious Inventor https://www.youtube.com/watch?v=Lx9zgZCMqXE http://www.michaelnielsen.org/ddi/how-the-bitcoin-protocol-actually-works/

Proof of Stake vs. Proof of Work | Who Will Win?!

Why Proof of Stake Wins. The battle between proof of work aka POW and proof of stake aka POS is raging. I'm placing my bets.

Is proof of stake better than proof of work? In my opinion, yes. Here’s why:  

Blockchains are essentially distributed ledgers created for the storage of data. In cryptocurrencies, they are used to store transaction information, verifying their accuracy and ordering them chronologically. Because the blockchain is distributed with many participants on the network, there has to be a way of deciding who gets to write the next set of transactions, so that there is only one unique blockchain. There has to be what is called distributed consensus. When the first cryptocurrencies were created, proof of work was this method of creating distributed consensus, by having special nodes called miners compete to solve a cryptographic problem. This solved the problem of needing honest nodes to validate transactions, because there was a method of competition to select who writes the next block, and then the rest of the network could also verify that the recorded transactions were true after the work had been completed. Incentive was also provided to the miners in the form of a block reward, or creation of a new token/coin on the blockchain, when a new block was written.

This proof of work method of distributed consensus has some disadvantages that are increasingly becoming a problem in the cryptocurrency world. These include a concentration of mining power, which defeats the goal of decentralization, as well as the environmental impact, which is still in its early stages and will grow if true cryptocurrency adoption emerges in the future. Centralization comes in the form of mining equipment manufacturers being limited to a few companies, the fact that only certain people can afford mining equipment, that mining is concentrated geographically, and that mining pools can now overtake the network and write transactions in their favor if they choose to, or deny service to others. The high electricity costs resulting from proof of work mining are only expected to increase as cryptocurrency adoption grows, and will still add a hefty weight to transaction fees when there are no block rewards in the future. In the case of Bitcoin, where block rewards will eventually be reduced to zero, there is also the issue of less incentive for miners to remain loyal to the network when mining another cryptocurrency may produce greater profits. Loyal and dedicated nodes are necessary to secure the blockchain and provide distributed consensus.

These disadvantages of proof of work mining have been known for a long time, but many of the problems were not immediate before the scaling of cryptocurrency networks to what they are today. It is important to note that proof of work has its advantages in that it solved the problem it was created for, namely that of getting honest nodes to validate and record transactions. Due to some of its disadvantages however, another alternative called proof of stake has arrived that can provide distributed consensus just as well, if not better.

For proof of stake cryptocurrencies, instead of having miners compete through solving a cryptographic problem, the next node to write the block is chosen depending on their proof of ownership or proof of stake in the network. There is some variety in how exactly this is determined, but the amount of stake is generally dependent on the amount of coins a holder has as well as the length of time they have been participating in the network. So instead of the probability of being chosen to write the next block depending on mining power, the probability depends on the holder’s ‘stake’ or investment, meaning amount and time in the network. These nodes are called stakers or forgers, and new coins are ‘minted’ rather than ‘mined’. The effect of this on solving the centralization and environmental issues of proof of work coins like Bitcoin is significant. Many proof of stake coins began as proof of work coins and then decided to switch to proof of stake. Examples of proof of stake coins include Peercoin, Lisk, Nxt and Particl. Ethereum is also on its way to becoming a proof of stake coin. There are also delegated proof of stake coins, which are not to be confused with regular proof of stake coins; those have a slightly different system, which I will not get into here.
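A stake-weighted draw can be sketched in Python like this. Everything here is an assumption for illustration: the holders, the numbers, and the weight formula (coins multiplied by days in the network) are made up, and each real proof of stake coin defines its own rule.

```python
import random

# Hypothetical holders: name -> (coins held, days participating in the network).
holders = {"alice": (900, 30), "bob": (100, 30), "carol": (100, 300)}

def stake_weight(coins: int, days: int) -> int:
    """Toy stake measure: more coins and more time both raise the weight."""
    return coins * days

def pick_staker(holders: dict, rng: random.Random) -> str:
    """Choose who writes the next block, with probability proportional to stake."""
    names = list(holders)
    weights = [stake_weight(*holders[name]) for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(42)   # seeded so the run is repeatable
picks = [pick_staker(holders, rng) for _ in range(10_000)]
```

Over many draws, `bob` is picked far less often than `alice` (who holds more coins) or `carol` (who has been around longer), without anyone spending electricity to compete.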

The first obvious issue that a proof of stake system of distributed consensus solves is that of reducing electricity costs. Proof of stake blockchains do not need their validators to purchase and keep updating expensive mining equipment. Proof of stake also requires more loyalty on the part of the stakers than proof of work does from its miners. Proof of stake can also give rise to the monopoly issue, created through wealth disparities or pooling, as large holders have greater chances of earning more. However, it is more difficult for someone to own 51% of the coins on a network, due to prohibitive costs, than for someone to have 51% of the mining power, and thus become a dishonest node. This scenario of enough mining power being concentrated for an attack to occur has already been reached, and its negative effect has only been mitigated by the choices of mining pools, which requires trust. The cost to buy 50% of Bitcoin’s market cap, even without assuming the price will go up as someone buys that much, is far greater than the cost to buy the mining equipment needed to achieve 51% of the mining power. It is also more likely for an individual with concentrated power on the network to use it benevolently in the case of proof of stake, because their major investment is the coin itself, and reducing trust by double spending or denying service would negatively impact their own capital. There are also variations on how proof of stake can be implemented to ensure some distribution in how often a staker gets to write to the blockchain, based on how recently they did it. And the likelihood of a node being chosen also depends on its time invested, not only the amount. Other advantages of proof of stake include lower transaction fees due to lower hardware and software costs to keep the network running, faster validation times, and a smaller chance of honest nodes leaving as miner rewards are reduced over time.

There is also a lower likelihood of over-reaching governments being able to create prohibitive barriers to entry, such as needing a license to mine, since only running software is less conspicuous than running specialized mining equipment.

Understanding that the power and promise of blockchain technology lies in its decentralized nature, as opposed to the centralized institutions of today, methods of decreasing centralization through proof of stake are more likely to succeed in the long run than only relying on proof of work as it exists today.

What is Cryptography?

This is both a video and a text post, walking through the fundamentals of cryptography and how they apply to blockchain technology and cryptocurrencies.

I am going to mull over some of the ideas and information I learned while doing research on cryptography. Cryptography arose out of the need for people communicating to ensure that the message is received by the correct person and also received accurately, meaning not having been tampered with. It’s a way to secure communication.

Cryptography is the term most commonly used but it’s actually just one half of the equation, the other being cryptanalysis, and both coming under the term cryptology which is the study of both cryptography and cryptanalysis. Cryptography does not only involve the encryption of data but means a way of scrambling information, or protecting information from unwanted third parties, or sometimes all parties but while maintaining data integrity. I speak more about this at the end of this article.

So cryptography specifically is the process of applying a formula or algorithm to a message so that it is indecipherable to everyone except the intended recipients. Cryptography, in the form of encryption, uses an encryption algorithm, an algorithm being a set of rules that define a process applied to a given input to get a given output. Cipher is the term used to describe the encryption algorithm, and the cipher-text is the resulting information that is produced after applying the formula or algorithm to the original message, or plain text. Cryptanalysis is the method of deciphering the encrypted data without the key, by figuring out the pattern in the cipher-text that gives a clue as to what the original message was, so you could work backwards to figure out the original message. Decryption happens at the recipient’s end when intended, but the plain text can also be recovered by an unwanted third party or eavesdropper who is able to analyze the cipher-text or somehow get the key. I’ll explain what a key is in a second.

Encryption algorithms used to involve letter substitution, and these evolved over time. More modern methods involve multiple alphabets and converting between letters and numbers as well. In the past, people would come up with encryption algorithms that they kept secret, as they assumed keeping the method of encryption secret made the channel very secure. But, counterintuitively, making the encryption algorithm public is the best way to ensure it is secure. Through elimination, cryptographers can find the most secure algorithms: the ones that remain unbroken as others keep trying to break them over time. Thinking that a code was secure just because the people working on it thought it was secure has even resulted in wars being lost (the Enigma machine). Nowadays, there are standardized algorithms that have not yet been broken.

But because the encryption formulas are standardized so they are less likely to be broken, there has to be a way to make sure that not anyone who knows the formula can get the message. The way this is solved is through the use of keys. Even though there is an encryption formula, there are different keys that define what the cipher-text is when applied to the plain text or what the plain text is when applied to the cipher-text, that is encryption and decryption keys, respectively. So the encryption algorithm works with keys. Keys are the “key” to deciphering the text and the crux of the security. So the key has to be sent over a secure channel, say in person. There is symmetric cryptography where the same key is used to encrypt and decrypt the message and there is asymmetric or public-key encryption, where there is both a public and private key. 
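To see a public algorithm working with a secret key, here is a toy symmetric cipher in Python: a simple letter shift, where the shift amount is the key. It is trivially breakable and only meant to show that the same key that encrypts also decrypts; the message and the key value are made up.

```python
import string

ALPHABET = string.ascii_lowercase

def shift_cipher(text: str, key: int) -> str:
    """Shift each lowercase letter by `key` places; leave other characters alone."""
    shifted = []
    for ch in text:
        if ch in ALPHABET:
            shifted.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            shifted.append(ch)
    return "".join(shifted)

key = 3                                          # the shared secret
ciphertext = shift_cipher("meet at noon", key)   # encrypt with the key
plaintext = shift_cipher(ciphertext, -key)       # decrypt by reversing the same key
wrong_guess = shift_cipher(ciphertext, -5)       # a wrong key gives nonsense
```

Everyone can know that the algorithm is "shift the letters", but without the key the recipient of `ciphertext` still has to guess, which is the idea that real, much stronger ciphers build on.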

In asymmetric cryptography, public and private keys work as a pair where someone could publish their public key so anyone can send a message to them specifically, but only that person with the private key is able to decrypt and read the message. Using a public and private key pair, the private key need not be shared with anyone. Many wallets can also generate multiple key pairs, and so multiple public addresses, for the same owner, which adds another layer of security. A private key can also be used for authentication purposes as a digital signature, because a message can be signed with a private key as it is sent, and then the receiver can verify that it was sent by the right person using that person’s public key.
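Both uses of a key pair can be tried out with a toy RSA example in Python. The numbers are the classic textbook ones (built from the primes 61 and 53) and are far too small to be secure; Bitcoin itself uses a different scheme (ECDSA), so this only illustrates the public/private idea.

```python
# Toy RSA key pair: N = 61 * 53. (E, N) is public, D is private.
N = 3233
E = 17      # public exponent
D = 2753    # private exponent, chosen so that (E * D) % 3120 == 1

def encrypt(message: int) -> int:
    """Anyone can encrypt with the public key."""
    return pow(message, E, N)

def decrypt(ciphertext: int) -> int:
    """Only the private key holder can decrypt."""
    return pow(ciphertext, D, N)

def sign(message: int) -> int:
    """Only the private key holder can produce this signature."""
    return pow(message, D, N)

def verify(message: int, signature: int) -> bool:
    """Anyone can check a signature with the public key."""
    return pow(signature, E, N) == message

secret = decrypt(encrypt(65))
signature = sign(65)
```

`verify(65, signature)` succeeds, while `verify(66, signature)` fails, so a signature made for one message cannot be reused for another.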

It’s important to note the use of hash functions in cryptography, where a string of data of any size is reduced to a ‘hash’ of a specified length. When signing with the private key, it’s not the whole message that is signed but a hash of the message. The hash, or short version of the message, will change with even a slight change in the original message, and the signature will not verify if it was produced by a sender who does not have the right private key. So the receiver can use the public key to verify that the right private key signed the message by checking the signature against the hash of that message.
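The two properties of a hash mentioned here, fixed length and sensitivity to small changes, are easy to check in Python with SHA-256 from the standard library. The messages are made up.

```python
import hashlib

def sha256_hex(message: str) -> str:
    return hashlib.sha256(message.encode()).hexdigest()

short_hash = sha256_hex("Pay Bob 5 coins")
changed_hash = sha256_hex("Pay Bob 6 coins")        # one character differs
long_hash = sha256_hex("Pay Bob 5 coins " * 1000)   # a much longer message

# Count how many of the 64 hex characters differ between the first two hashes.
differing_positions = sum(a != b for a, b in zip(short_hash, changed_hash))
```

All three hashes are 64 hexadecimal characters long regardless of the input size, and changing a single character of the message produces a hash that differs in most positions.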

I hope this is not too confusing when I talk about signing a message with a private key and the person with the public key being able to verify that the message is from the right person. Remember that the key is what is used to encrypt or decrypt. So a message, though it may use the same encryption algorithm as another’s, will not produce the same cipher-text, and so the same plaintext, if it uses another person’s key. As a side note, each encryption algorithm has a set number of possible keys, and the security of the encryption algorithm is dependent on this, as a hacker could attempt to decipher a message by going through all possible keys. But the algorithms used by blockchain technologies such as Bitcoin have such a large number of possible keys that it would take a really, really long time, basically infinite in human terms, to go through them all with current-day computer processing power.
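To get a feel for "basically infinite in human terms", here is a back-of-the-envelope calculation in Python. It assumes a 256-bit key, which is the size of a Bitcoin private key, and a guessing rate of one trillion keys per second, which is a number I picked for illustration rather than a measured figure.

```python
KEY_BITS = 256
KEYSPACE = 2 ** KEY_BITS              # number of possible keys
GUESSES_PER_SECOND = 10 ** 12         # assumption: one trillion guesses per second
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

# Years needed to try every key at that rate (integer division, rounded down).
years_to_try_all = KEYSPACE // (GUESSES_PER_SECOND * SECONDS_PER_YEAR)
digits = len(str(years_to_try_all))   # how many digits that number of years has
```

The result has 58 digits, so even guessing a million times faster would barely dent it.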

Bitcoin and other cryptocurrencies, as you will figure out once you begin to use them, use public-key cryptography, where a wallet, which is simply a store of data, is created with both a public key and a private key. The individual with the correct private key has the right to send ‘messages’ or ‘funds’ from their wallet, while anyone with their public key can send funds to them. Each party can verify who is sending and receiving, and whether or not the transaction information can be recorded to the blockchain. If this is all too confusing, you basically have to keep your private key very safe, only using it to access your wallet, but you can share your public key in order to receive funds.

I really like to think of cryptography as the solution to a problem. If I wanted to send a message to a specific person using the video format on YouTube, I would be aware that everyone watching, including the people on YouTube who are not the intended recipients of the message, is able to listen. So, I am communicating across an open channel. Just thinking off the top of my head, maybe I could somehow scramble my message, maybe by speaking in reverse. What I do can’t be random, because it has to follow a formula so that the original message can be heard accurately. But it also has to be a bit more secure than being played in reverse, so it would change slightly depending on who the recipient is, and they would be the only one able to actually play the original message, even if the basic algorithm is that it is played in reverse. Other real-world applications of securing an open channel include uploading or downloading packets of data over the internet, or using cell phone networks. We can encrypt data on our hard drives or phones using software. There are also messaging apps like WhatsApp and Signal that focus on encrypted communication. But data security goes beyond this to preventing social engineering attacks, such as phishing emails that collect the information, like passwords, needed to unlock or decrypt data. So the whole system has to be looked at, and security cannot rely solely on cryptography. Cryptography solves the problem of communication over an insecure channel.

EDIT: After writing and sharing this video, a reddit user clarified for me that encryption is only one of the branches of cryptography. Other branches include signing, commitment schemes, private information retrieval, offline digital cash, etc., which also fall under cryptography.

Learn more:  https://media.ccc.de/v/SHA2017-494-cryptography_beyond_encryption_and_signatures