Blockchain analysis, or why the mixer broke?
Based on materials from my report at the “Digital Transformation” conference in Moscow on April 16, 2018
I'm interested in how blockchain works, and not only in which algorithms, cryptography, platforms and cryptocurrencies it involves. For me, blockchain is not just a technology but a new form of life, a new universe. If you doubt this, take a look at this graph of the Aragon token sale:
[Figure: Aragon token sale graph (https://habrastorage.org/webt/3f/ea/1i/3fea1izwdoaypv4s-j61shhxrl8.png)]
All these addresses, smart contracts, tokens constantly interact with each other, and behind them are the actions of people, organizations and robots. Without this interaction, blockchain and cryptocurrencies would have no meaning or value.
These questions, how businesses operate on the blockchain and what people and robots do there, are what prompted me to start researching it.
Problem and solutions
The blockchain network (and we are talking specifically about public blockchain networks) is completely open. You can read absolutely any information about blocks, addresses and transactions. Programmers have APIs for this (for example, Web3 [1]), and mere mortals have blockchain explorers such as Etherscan [2]. In addition, every full blockchain node downloads to its local disk all the blocks since the very beginning, with complete information inside, because it needs them to verify transactions and, of course, for mining. In other words, every full node is a complete copy of the blockchain, and it even comes with access interfaces and detailed documentation.
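For illustration, here is a minimal sketch of what "reading a block through the API" looks like at the lowest level: a request to the node's standard JSON-RPC method `eth_getBlockByNumber`, which Web3 client libraries wrap. It assumes a full node running locally with its HTTP JSON-RPC interface enabled; `NODE_URL` and the helper names are hypothetical.

```python
import json
import urllib.request

# Hypothetical endpoint: assumes a local full node with HTTP JSON-RPC enabled on port 8545.
NODE_URL = "http://localhost:8545"


def block_request(number: int, full_transactions: bool = True, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 payload for the standard eth_getBlockByNumber method."""
    return {
        "jsonrpc": "2.0",
        "method": "eth_getBlockByNumber",
        # Block numbers are passed as hex strings; True asks for full transaction objects.
        "params": [hex(number), full_transactions],
        "id": request_id,
    }


def fetch_block(number: int, url: str = NODE_URL) -> dict:
    """POST the request to the node and return the decoded block (needs a running node)."""
    body = json.dumps(block_request(number)).encode("utf-8")
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["result"]
```

Against a live node, `fetch_block(5_000_000)` would return the block as a dict whose numeric fields (`number`, `timestamp`, each transaction's `value`) are hex strings.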
It would seem that everything needed for analysis is there, but that is not the case. Recall what the word blockchain literally means: a chain of blocks. Blocks store transaction records plus meta information that guarantees integrity and consistency. To find anything in the blockchain, you need to know the block number, the transaction hash, or at least an address. The node offers no indexes beyond these.
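To make the indexing problem concrete: with lookups only by block number or transaction hash, a question like "which transactions involve this address?" can only be answered by scanning every block. Below is a sketch of the inverted index an analytics system has to build for itself, assuming the blocks have already been fetched and simplified into plain dicts (`number` as an int, transactions with `hash`, `from` and `to`); the function name is hypothetical.

```python
from collections import defaultdict


def build_address_index(blocks):
    """Map each address to the (block number, tx hash) pairs it takes part in.

    The node itself cannot answer "everything about address X", so the only way
    is to scan every block once and build an index like this one.
    """
    index = defaultdict(list)
    for block in blocks:
        for tx in block["transactions"]:
            # "to" is None for contract-creation transactions; a set avoids
            # recording self-transfers twice.
            participants = {a.lower() for a in (tx["from"], tx.get("to")) if a}
            for addr in participants:
                index[addr].append((block["number"], tx["hash"]))
    return index
```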
Etherscan is not much better. It shows the same data that is available through the API, only as web pages. Here too, to find something you must know in advance an address, a transaction hash, or a block number. You see the blockchain through a narrow window limited to these entities. It's like studying the universe through a microscope: the existing tools are completely unsuitable for analysis “in the big picture”.
To get philosophical, I even drew this diagram, which shows the essence of the problem:
[Figure: diagram illustrating the problem (https://habrastorage.org/webt/il/t6/ea/ilt6eahszfh2ss5uachuvwehtzw.png)]
Everything is more or less clear with cryptocurrencies: their analysis relies on long-established methods and tools from exchange trading. Reliable and objective information about every parameter of a cryptocurrency is available on many websites.
The same cannot yet be said about the blockchain. The available information is either purely technical, for those who understand it (like Etherscan), or narrative writing about ICO [3] and DAO [4] projects, which has an obvious subjective bias and cannot be verified by mathematical methods.
As a whole, the blockchain is not transparent, even though all of its information is publicly available. That is what we are going to work on!
Technical tools for blockchain analytics
Let's first understand the scale of the problem. There are many blockchain networks and many different platforms on which they are built. You have to start somewhere, and I chose the Ethereum network for several reasons:
- Many participants
- The total capitalization of the network's currencies, including tokens, is perhaps the largest of any network
- Smart contracts [5] and DAOs [4], which broaden the possible analysis and make it much more meaningful and useful
Even with a single network, there is quite a lot of data (as of June 15, 2018):
| Metric | Value |
|---|---|
| Number of cryptocurrency transfers, total | 267 million |
| Cryptocurrency transfers per day, average | 750 thousand |
| Number of valid addresses | 44 million |
| Number of smart contracts | 6.8 million |
| Number of issued tokens | 48 thousand |
| Smart contract calls per day, average | 690 thousand |
| Approximate size of compressed data for a full node | 117 GB |
From the start, I wanted the analysis to reflect the actual state of the network as closely as possible, that is, in real time. This has two technical aspects:
- Information from the blockchain must reach the database as soon as a new block is created. We want to see current information, not an archive;
- Reports must come back quickly, within a second or less, so that we don't lose the urge to keep asking questions.
The choice fell on ClickHouse [6], an open-source database from Yandex. I had not used it before, and the team at Altinity [7] helped me get up to speed, for which special thanks to them.
The general structure of the system is as follows:
[Figure: general structure of the system (https://habrastorage.org/webt/jg/bs/mm/jgbsmmdxs4m32db_8ut5l-xr-oy.png)]
The raw data is read from a full Ethereum node by an ETL (Extract, Transform, Load) process, which parses the data in each block and writes it to several tables in the ClickHouse database. The process starts as soon as a new block reaches the node and runs continuously.
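A minimal sketch of the Transform step, assuming the block comes from `eth_getBlockByNumber` with full transaction objects. The row layout and column names are hypothetical, not the actual schema used here, and the Load step (a batch insert into ClickHouse) is left out.

```python
from datetime import datetime, timezone


def block_to_transfer_rows(block):
    """Flatten one eth_getBlockByNumber result (with full transactions) into table rows.

    Column names are hypothetical; they only illustrate the Transform step of the ETL.
    """
    # JSON-RPC returns numeric fields as hex strings.
    number = int(block["number"], 16)
    block_time = datetime.fromtimestamp(int(block["timestamp"], 16), tz=timezone.utc)
    rows = []
    for tx in block["transactions"]:
        rows.append({
            "block_number": number,
            "block_time": block_time,
            "tx_hash": tx["hash"],
            "sender": tx["from"].lower(),
            # "to" is null for contract creation; store an empty string instead.
            "receiver": (tx.get("to") or "").lower(),
            "value_wei": int(tx["value"], 16),
        })
    return rows
```

In a continuous pipeline, rows from each new block would be collected and inserted in batches, which suits ClickHouse better than inserting rows one at a time.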
On the right side of the diagram are the current consumers of the data:
- SuperSet analytics tool [8]. With it, you can make cool diagrams and quickly combine queries to get answers to analysis questions;
- Python with Jupyter [9], for deeper analysis using machine learning tools and statistical algorithms;
- The Bloxy website and API [10], which make the information publicly available.
Indexing the Ethereum database took some time: there are already almost 6 million blocks, and each one has to be read from the node and processed. But this work is behind us, and we can finally enjoy the full power of the analytical database, especially since the data is simply mmm, delicious!
Tokens
Let's start with tokens, since they are the most popular application of smart contracts on the Ethereum network and, one might say, the very purpose of its creation. Tokens are cryptocurrencies that anyone can issue using a certain type of smart contract. The main token standard is ERC20 [11], but as we will see, it is not the only one.
Now, with our analytics database and SuperSet in place, we can see which tokens are being issued, how they are used, and what is popular right now:
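Both ERC20 and ERC721 define a `Transfer(address,address,uint256)` event with the same signature, so the same log topic identifies transfers for both. They differ in what is indexed: ERC20 puts the amount in the log's data field (three topics), while ERC721 indexes the token id (four topics). Here is a sketch of a classifier built on that heuristic, assuming logs in the shape returned by the JSON-RPC `eth_getLogs` call or transaction receipts; contracts that declare the event without `indexed` parameters (some early ones did) would need extra handling.

```python
# keccak256("Transfer(address,address,uint256)"), shared by ERC20 and ERC721.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def _topic_to_address(topic: str) -> str:
    # Indexed addresses are left-padded to 32 bytes; the address is the last 20 bytes.
    return "0x" + topic[-40:].lower()


def decode_transfer(log: dict):
    """Classify and decode a Transfer event log, or return None if it is not one."""
    topics = log["topics"]
    if not topics or topics[0].lower() != TRANSFER_TOPIC:
        return None
    if len(topics) == 3:
        # ERC20: the amount is not indexed, so it sits in the data field.
        standard, amount = "ERC20", int(log["data"], 16)
    elif len(topics) == 4:
        # ERC721: the tokenId is indexed, so it is the fourth topic.
        standard, amount = "ERC721", int(topics[3], 16)
    else:
        return None
    return {
        "standard": standard,
        "token": log["address"].lower(),
        "from": _topic_to_address(topics[1]),
        "to": _topic_to_address(topics[2]),
        "value_or_token_id": amount,
    }
```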
[Figure: token types and token transfer activity over time (https://habrastorage.org/webt/-p/mo/ec/-pmoec7ziylhi2yyf8av4ubthsi.png)]
The data covers the entire existence of Ethereum. The pie chart shows that ERC20 tokens are overwhelmingly more widespread than other types. The number of tokens actively used in transfers has so far been growing steadily, which means that enthusiasm for ICOs is not subsiding but, if anything, increasing. In fact, sometimes several hundred new tokens (that is, new cryptocurrencies) are created in a single day, but the chart includes only those that are actively used.
The graph below, showing the number of token transfer transactions per day, does not grow nearly as fast. Around the spring of 2018, it leveled off at roughly 400 thousand transactions per day and has not grown since. Since the number of tokens keeps rising while total transfers stay flat, each new token accounts for significantly fewer transfers than tokens did before.
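Counting "actively used" tokens reduces to a group-by over decoded transfers. A sketch, assuming the transfers have already been reduced to (day, token address) pairs; the article does not say what qualifies a token as active, so the `min_transfers` threshold and the function name are hypothetical.

```python
from collections import defaultdict


def active_tokens_per_day(transfers, min_transfers=1):
    """Count distinct tokens per day with at least `min_transfers` transfers that day.

    `transfers` is an iterable of (day, token_address) pairs.
    """
    counts = defaultdict(lambda: defaultdict(int))  # day -> token -> transfer count
    for day, token in transfers:
        counts[day][token] += 1
    return {
        day: sum(1 for n in per_token.values() if n >= min_transfers)
        for day, per_token in sorted(counts.items())
    }
```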
There are two anomalies in this graph: the peak of ERC20 token transfers in November 2017 and a less pronounced “hump” of growth in ERC721 token transfers in December.
The November peak is associated with the InsPromo token, which was handed out for free to almost a million addresses in an “airdrop” advertising campaign [12]. This method of attracting ICO participants has been used many times before and since, but the scale of this one-day giveaway of free “coins” is a record!
The December interest in ERC721 tokens was entirely due to the CryptoKitties game, in which people enthusiastically bought and bred digital kitties. The graph shows a rapid increase in CryptoKitties turnover and a drop in transactions of other tokens; apparently people forgot that other tokens existed.
Crypto-Beasts and more
ERC721 tokens [14] appeared, in fact, together with CryptoKitties [13], although their potential uses are much wider. If the ERC20 standard gave everyone the ability to issue a cryptocurrency measured in quantities, ERC721 gave everyone the ability to record ownership of any object, whether virtual, physical, or even intellectual.
Technically, each ERC721 token is identified by an ID that is unique within its smart contract. This ID could represent a crypto-kitty, a golden sword, a plot of land, or a patent for an invention. Ownership of the ID is recorded on the blockchain. Because there is a standard interface for ERC721 tokens, they can be displayed in a wallet, traded on an exchange, and used in other common operations.
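To illustrate the ownership model just described, here is a toy, in-memory sketch. It is not the ERC721 interface and not a smart contract: there are no approvals, events or safe transfers. The class and method names are hypothetical, although `owner_of` and `balance_of` loosely mirror the standard's `ownerOf` and `balanceOf`.

```python
class ToyNFTRegistry:
    """Toy model of ERC721-style ownership: each token ID maps to exactly one owner."""

    def __init__(self):
        self._owner_of = {}  # token_id -> owner address

    def mint(self, token_id: int, owner: str) -> None:
        # IDs are unique within one contract, so minting an existing ID is an error.
        if token_id in self._owner_of:
            raise ValueError(f"token {token_id} already exists")
        self._owner_of[token_id] = owner

    def owner_of(self, token_id: int) -> str:
        return self._owner_of[token_id]

    def transfer(self, sender: str, receiver: str, token_id: int) -> None:
        # Only the current owner may move the token.
        if self._owner_of.get(token_id) != sender:
            raise PermissionError(f"{sender} does not own token {token_id}")
        self._owner_of[token_id] = receiver

    def balance_of(self, owner: str) -> int:
        # Number of tokens held by `owner` (a linear scan is fine for a toy model).
        return sum(1 for o in self._owner_of.values() if o == owner)
```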
[Figure: ERC721 token transactions and number of ERC721 tokens in circulation (https://habrastorage.org/webt/v3/i0/pm/v3i0pmtkyp5i4sx-rz5z-bewbnw.png)]
The top graph shows the number of transactions across all ERC721 tokens. The big increase in December 2017 came entirely from CryptoKitties. Interest in the game lasted throughout December and then gradually subsided.
The bottom graph shows the number of distinct ERC721 tokens in circulation, that is, the number of projects built on this technology. In December there was only CryptoKitties, but by February there were already several dozen. Token names are shown on the left; the more transactions, the larger the font. So far, the kitties with the CK symbol are in first place.
Why do we need a mixer?
Analyzing the entire blockchain makes it possible to find patterns and anomalies that are not visible at the micro level of transactions, addresses and blocks. One striking example is a “mixer” of thousands of bots running on the Ethereum network.
Let's start by looking for an anomaly in the distribution of addresses by the number of recipients and senders of cryptocurrencies:
[Figure: addresses by number of senders and recipients, December 2016 vs. December 2017 (https://habrastorage.org/webt/4l/z8/f-/4lz8f-qef2vrkz9shfumgj2o_da.png)]
The horizontal axis shows the number of distinct addresses that sent money to a given address; the vertical axis shows the number of distinct addresses to which that address sent money. The size of a circle is the number of addresses with that combination.
The left chart is for December 2016. The largest circle corresponds to addresses with one sender and one recipient, followed closely by addresses with one sender and no recipients. This makes sense: most addresses receive currency from one source and either spend it in one place or do not spend it at all, keeping it instead.
But by December 2017, the circle for two senders and three recipients had grown abnormally large, and there are several million such addresses! To understand what is going on, let's pick one address from that circle and build its transfer graph:
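The chart boils down to counting, for every address, how many distinct addresses sent it money and how many distinct addresses it sent money to, then counting how many addresses share each pair. A sketch, assuming the transfers for the chosen period are available as (from, to) pairs; the function name is hypothetical.

```python
from collections import Counter, defaultdict


def sender_recipient_histogram(transfers):
    """Count addresses by (distinct senders, distinct recipients).

    `transfers` is an iterable of (from_address, to_address) pairs. The result maps
    (n_senders, n_recipients) -> number of addresses, i.e. one circle per key.
    """
    senders = defaultdict(set)     # address -> addresses that sent funds to it
    recipients = defaultdict(set)  # address -> addresses it sent funds to
    for src, dst in transfers:
        senders[dst].add(src)
        recipients[src].add(dst)
    addresses = set(senders) | set(recipients)
    return Counter(
        (len(senders.get(a, ())), len(recipients.get(a, ()))) for a in addresses
    )
```

Each key of the returned `Counter` corresponds to one circle (horizontal, vertical) and its count to the circle's size, so an anomaly like the one described below would show up as an unusually large count for `(2, 3)`.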
[Figure: transfer graph of an address from the anomalous circle (https://habrastorage.org/webt/si/1o/g9/si1og922-6jklgi2ny1yp1dowog.png)]
All these addresses are connected into a giant mixer that moves money around within itself. Since each address has, on average, more recipients than senders, funds from an original sender fan out to a huge number of recipients within a few steps. Of course, this is done not by people but by robots: there are more than 4 million such addresses, and they operate smoothly and very quickly, passing money on within minutes.
We estimated the scale of this huge robot's activity by separating the transactions of these addresses from all other transactions in the network:
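The fan-out effect is a geometric progression: if each address pays r fresh addresses, k steps reach up to r + r^2 + ... + r^k addresses. Below is a sketch of measuring this with a breadth-first search over the transfer graph. `synthetic_fanout_graph` generates hypothetical toy data with a fixed fan-out and no overlapping paths, unlike the real mixer, where addresses also have several senders.

```python
from collections import deque


def reachable_within(graph, start, max_hops):
    """Addresses reachable from `start` in at most `max_hops` transfers (BFS).

    `graph` maps an address to the addresses it sent funds to.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        addr, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(addr, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    seen.discard(start)
    return seen


def synthetic_fanout_graph(fanout, depth):
    """Hypothetical toy data: a tree in which every address pays `fanout` fresh addresses."""
    graph, level = {}, [()]
    for _ in range(depth):
        next_level = []
        for node in level:
            children = [node + (i,) for i in range(fanout)]
            graph[node] = children
            next_level.extend(children)
        level = next_level
    return graph
```

For example, with a fan-out of 3, `len(reachable_within(synthetic_fanout_graph(3, 4), (), 3))` is 3 + 9 + 27 = 39.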
[Figure: monthly transfer volume, mixer vs. the rest of the network (https://habrastorage.org/webt/km/xk/ua/kmxkuajh3ziym_sz_5xkkkwaopo.png)]
In terms of transfer volume, the mixer (orange columns) in some months exceeds all other transfers in the network (green columns) several times over. Keep in mind, of course, that it moves currency within itself, and its external turnover is much smaller: no more than 17 million ether (about $10 billion at today's prices).
For many months, mixer transactions consumed a significant share of the Ethereum network's capacity. Its activity peaked at the beginning of 2018, when every fourth Ethereum transfer transaction was initiated by this robot, as the blue graph of the mixer's share of all transactions shows:
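Separating the mixer's internal turnover from its external turnover amounts to classifying each transfer by whether its endpoints belong to the mixer. A sketch, assuming the set of mixer addresses has already been identified; the article does not describe its exact method, so the four categories and the function name are assumptions.

```python
def split_mixer_volume(transfers, mixer):
    """Split transfer volume by whether the endpoints belong to the mixer.

    `transfers` is an iterable of (from_address, to_address, value) tuples and
    `mixer` is a set of addresses already attributed to the mixer.
    """
    totals = {"internal": 0, "into_mixer": 0, "out_of_mixer": 0, "other": 0}
    for src, dst, value in transfers:
        src_in, dst_in = src in mixer, dst in mixer
        if src_in and dst_in:
            totals["internal"] += value      # money circulating inside the mixer
        elif dst_in:
            totals["into_mixer"] += value    # external money entering the mixer
        elif src_in:
            totals["out_of_mixer"] += value  # money leaving the mixer
        else:
            totals["other"] += value         # ordinary network traffic
    return totals
```

Only `into_mixer` and `out_of_mixer` reflect money crossing the mixer's boundary; `internal` is what inflates its apparent volume.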
[Figure: share of mixer transactions in total Ethereum transfers (https://habrastorage.org/webt/hd/2z/4x/hd2z4xkprso30vi4amj8iabwkbe.png)]
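The share in the blue graph is the number of transfers initiated by mixer addresses divided by all transfers in the same month. A sketch, assuming transactions are given as (month, sender) pairs and the mixer's address set is known; the names are hypothetical.

```python
from collections import Counter


def mixer_share_by_month(transactions, mixer):
    """Fraction of transfer transactions per month whose sender is a mixer address.

    `transactions` is an iterable of (month, from_address) pairs, e.g. ("2018-01", "0x...").
    """
    total, by_mixer = Counter(), Counter()
    for month, sender in transactions:
        total[month] += 1
        if sender in mixer:
            by_mixer[month] += 1
    return {month: by_mixer[month] / total[month] for month in sorted(total)}
```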
But at the end of February 2018 it suddenly stopped working. Since we do not know why it was used, we can only guess at the reasons for its life and sudden death. Or perhaps it did not die, but changed its algorithm and simply dropped off our radar?
I believe in blockchain
I believe in blockchain. Businesses, people and communities benefit from using it. To use it well, you need to understand how it works as a phenomenon: by what laws does it develop, and what internal anomalies, trends, downturns and upswings does it show?
A more transparent blockchain will allow businesses to operate efficiently with their eyes open. Ordinary users will better understand what exactly they are doing, what they are participating in, be more secure and happier.
At the end of the day, blockchain is not so much about networks, platforms, blocks and transactions as it is about people and communities. The success of the development of this technology depends entirely on the public's acceptance of it, and transparency is important in this process.
Literature
[1] Web 3: A platform for decentralized apps
[2] Etherscan
[3] ICO
[4] DAO
[5] Smart contracts of the Ethereum network
[6] ClickHouse
[7] Altinity
[8] SuperSet
[9] Python Jupyter
[10] Bloxy
[11] ERC20
[12] WTF is an Airdrop? A Detailed Guide to Free Cryptocurrency
[13] CryptoKitties
[14] ERC721 standard
 
 