Toms Blog

Where I talk about Bitcoin and Technology

How Flowee managed to make the wallet rescan speed 10x

2018-02-12 bitcoin cash flowee

In Bitcoin (BCH) a wallet doesn’t store a certain amount, instead it stores all the incoming payments you received before. Much like cash where your wallet stores the bills you received.

A side-effect of this design is that if you want to check how much a backed-up paper wallet is worth, the wallet application is required to check for matching transactions in the block-chain. A typical needle in a haystack. The worst case scenario is when you have to rescan the entire history of Bitcoin Cash. Some 160 GB of block data holding 247,570,933 transactions.

A week or two ago I managed to import and sell some Bitcoin Gold coins, and I was again reminded of how slow this process was. It took almost an hour (53 minutes from the logs) to scan the whole chain.

For Flowee the Hub, I spent the time to fix that and now a full rescan on Bitcoin Cash took me 5 minutes. 

The rest of this post is dedicated to explaining how I managed to speed up parsing ten times so a sub-1000 USD machine can finish the entire 9 years of history in 5 minutes. I’ll try to be only moderately technical.

There are two huge problems in the Satoshi code for handling of blocks. First of all, it ends up reading every block and all its transactions into internal memory buffers. Lots of lists of lists of things. I researched this before and noticed that a 100 MB block ends up taking 400 MB of allocations, still holding approx 300 MB at the finish of loading.

So the first problem can be solved easy in Flowee because this is something I attacked some weeks ago by introducing memory-mapped blocks. Loading any block of any size takes (near) zero bytes of memory and only one allocation (malloc).

A lot of code in Flowee doesn’t take full advantage of this yet because basic access to transaction data was missing. The work I did last week was to allow me to access each transaction and each little part of each transaction from that one memory-mapped block..

I wrote an iterator to allow this;

FastBlock block = Blocks::DB::instance()->loadBlock(
          blockindex->GetBlockPos());
Tx::Iterator iter(block);
bool addTx = false;
while (true) {
  Tx::Component type = iter.next();
  if (type == Tx::End) { // end of transaction
    // on addTx==true use iter.prevTx();
    if (oneEnd) // the second end means end of block
       break;
  }
  oneEnd = type == Tx::End;
  if (!addTx && type == Tx::OutputScript) {
    CScript script(iter.byteData());
    addTx = ::IsMine(*m_wallet, script);
  }
}

The beauty of this solution is that we can end up iterating over the entire blockchain without ever copying the data into additional buffers. Even CScript just uses the external memory-mapped data to do the comparison.

This solves the first part, large memory consumption and slowness due to copying of block data.


I said there were two huge problems, the second was that the original code only used 1 core. Which effectively means the CPU never raises above 12 12 percent of busyness for my 8 core machine.

Just making N blocks be checked at the same time on N cores would be the obvious solution, but that would be wrong. The problem may be that we end up checking a block without having the full information.

For instance, you receive a payment in block 100, and you update the database with that info. At the same time 7 other blocks were being checked for incoming transactions and those blocks will not know about this tx I just received. This is relevant because there may be another transaction that spends the first, in just the next block. So you’d miss that one.

The solution I chose is based on the knowledge that no wallet has transactions in each of the 510,000 blocks. Most blocks are “empty” to a wallet..

As such, I optimistically check all blocks in parallel until such a moment that I find a transaction. In that case I just rescan the blocks that were already being scanned in parallel. Doing a little extra work to allow the vast majority of work in parallel.

Running a full rescan now ends up taking about 90% CPU time, across all cores. A substantial part of that (about 40%) is waiting for the SSD drive, but I guess that was to be expected.


This blog opens with the time benefit, and that is a really important part of this. It goes to show that massive adoption on-chain really is possible. But the main reason this is important is memory usage. If we reach gigabyte blocks any Satoshi based client will need huge amounts of memory to process them. Especially when we want to do them in parallel. The solution in Flowee completely eliminates this need.

Please visit our project page; http://FloweeTheHub.com