I used stylometric methods to look for the identity of Bitcoin’s founder, Satoshi Nakamoto, and found that Gavin Andresen’s writing style is similar to Satoshi’s.
The identity of Bitcoin’s founder, Satoshi Nakamoto, has still not been uncovered. Satoshi released the Bitcoin whitepaper, which laid out the design of the Bitcoin system, and then disappeared around 2011. Many people have wondered whether it is possible to identify Satoshi using methods of textual analysis.
Stylometry is a method of identifying an author based on statistical patterns in writing style. There are many stylometric methods, and a popular one is Burrows’ Delta. Maciej Eder and his colleagues created an R library called stylo, which allows for easy comparison of stylometric methods, including Burrows’ Delta, the Argamon and Eder variants of Delta, and Craig’s Zeta.
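To make the idea behind Burrows’ Delta concrete, here is a minimal sketch in plain Python. It is not the stylo implementation and not the code behind the results in this post. The function names, the simple tokenizer, and the choice to rank the most frequent words by raw corpus counts are all assumptions made for illustration. The core idea is the standard one: take the most frequent words, turn each word’s relative frequency into a z-score across texts, and measure the distance between two texts as the mean absolute difference of their z-scores.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase a text and split it into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def burrows_delta(texts, n_mfw=60):
    """Return (vocab, delta), where delta[(a, b)] is the Delta distance between texts a and b.

    texts: dict mapping a label (e.g. an author name) to the raw text of one sample.
    """
    tokenized = {label: tokenize(t) for label, t in texts.items()}

    # Most frequent words, ranked by raw counts over the whole corpus (an assumption).
    corpus_counts = Counter(tok for toks in tokenized.values() for tok in toks)
    vocab = [w for w, _ in corpus_counts.most_common(n_mfw)]
    if not vocab:
        raise ValueError("no words found in the corpus")

    # Relative frequency of each vocabulary word in each text.
    freqs = {}
    for label, toks in tokenized.items():
        counts = Counter(toks)
        total = len(toks) or 1  # avoid dividing by zero for an empty text
        freqs[label] = [counts[w] / total for w in vocab]

    # z-score every word's frequency across the texts.
    labels = list(freqs)
    n = len(labels)
    z = {label: [] for label in labels}
    for i in range(len(vocab)):
        column = [freqs[label][i] for label in labels]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / (n - 1) if n > 1 else 0.0
        std = math.sqrt(var)
        for label in labels:
            z[label].append((freqs[label][i] - mean) / std if std > 0 else 0.0)

    # Delta is the mean absolute difference between two texts' z-scores.
    delta = {}
    for a in labels:
        for b in labels:
            delta[(a, b)] = sum(abs(x - y) for x, y in zip(z[a], z[b])) / len(vocab)
    return vocab, delta
```

With a dictionary of samples, `burrows_delta(samples)[1][("satoshi", "andresen")]` would give the distance between the two samples stored under those (hypothetical) labels. Smaller values mean more similar use of the most frequent words.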
The possible suspects for Satoshi’s identity include Nick Szabo, Wei Dai, Gavin Andresen, Craig Wright, and Hal Finney. I collected their writings online so that I could run them through Eder’s stylometry library. Eder suggests that a bare minimum of 3,000 words per sample is needed for good results.
For this reason, I made sure my samples were above 3,000 words and combined smaller samples to reach this threshold. Eder also suggests that random sampling, where words are drawn at random from a text as a bag of words, can improve the results, so I employed this technique too.
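The pooling and random sampling steps can be sketched as follows. This is an illustrative Python version, not the stylo code. The function names are hypothetical, and drawing words without replacement is an assumption about how the sampling is done.

```python
import random
import re


def pool_tokens(documents):
    """Combine several short documents by one author into a single list of word tokens."""
    tokens = []
    for doc in documents:
        tokens.extend(re.findall(r"[a-z']+", doc.lower()))
    return tokens


def random_samples(tokens, sample_size=3000, n_samples=1, seed=0):
    """Draw bag-of-words samples: words picked at random, ignoring their original order.

    Each sample is drawn without replacement from the pooled tokens (an assumption).
    """
    if len(tokens) < sample_size:
        raise ValueError(
            f"only {len(tokens)} words available, need at least {sample_size}"
        )
    rng = random.Random(seed)  # fixed seed so the samples are reproducible
    return [rng.sample(tokens, sample_size) for _ in range(n_samples)]
```

For example, `random_samples(pool_tokens(tweets), sample_size=3000)` would turn a list of short posts (here a hypothetical `tweets` list) into one 3,000-word sample, and raise an error if the pooled text is too short.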
I used the 60 Most Frequent Words (MFW) because Burrows has suggested that this is an optimal selection of features for analysis. I also limited the number of intruder texts, as accuracy decreases as the number of intruders increases when using Burrows’ Principal Component Analysis (PCA) technique.
I also wanted to limit the number of MFW so that the analysis relied mostly on function words (connecting words) rather than topic words. It has been suggested that Wei Dai and Hal Finney cluster close together because they both wrote on a forum called LessWrong, where similar philosophy words were used. I wanted to account for this possible confound as much as possible.
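One way to check for this kind of topic confound is to look at which of the top MFW are not function words. The sketch below is a rough illustration in Python. The `FUNCTION_WORDS` set is a small hand-picked list I am assuming for the example, not a standard resource, and the function names are hypothetical.

```python
from collections import Counter
import re

# A small, hand-picked set of English function words (illustrative only, far from complete).
FUNCTION_WORDS = {
    "the", "of", "and", "to", "a", "in", "is", "it", "that", "be", "this",
    "for", "on", "with", "as", "by", "at", "or", "an", "are", "was", "not",
    "but", "if", "from", "have", "has", "will", "would", "can", "you", "i",
}


def top_words(texts, n=60):
    """Return the n most frequent words across a list of texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return [word for word, _ in counts.most_common(n)]


def content_words_in_mfw(texts, n=60):
    """List the top-n words that are NOT in the function word set.

    Topic words that show up here (for example forum-specific vocabulary)
    are candidates for pulling two authors together for the wrong reason.
    """
    return [word for word in top_words(texts, n) if word not in FUNCTION_WORDS]
```

If the returned list is long, lowering `n` is one simple way to push the feature set back toward function words.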
I found that Gavin Andresen’s writing style resembled Satoshi’s most of all and consistently clustered with Satoshi’s across different techniques. Nick Szabo consistently clustered separately from Satoshi, and this came as a surprise because many people have suggested he has a similar writing style.
Szabo seemed like a good fit for Satoshi because he proposed Bit Gold, an earlier digital currency design, and was comfortable writing academic papers. Wei Dai clustered separately too, and this came as a surprise to me because I had noticed similar patterns between Dai and Satoshi. Dai, for example, talked a lot about game theory in his forum posts, and this was a focus in Satoshi’s whitepaper.
Gavin Andresen’s samples clustered close to Satoshi’s quite consistently, even when I tried different techniques. I found it fascinating that even his Twitter page, which I copy-pasted arduously till I reached the 3,000-word mark, clustered with the Satoshi paper! I also used his GitHub documents and blog posts as a corpus.
I also tested the corpus with 5,000 character 4-gram features and random sampling to see if I would get similar results. A character 4-gram is a sequence of 4 characters and is a more fine-grained measure than word frequencies.
Eder suggests that n-grams can be a good way to get reliable results from noisy datasets, such as OCR datasets that contain textual errors. I also removed the Craig Wright dataset to simplify the results. I continued to find a definite cluster between Gavin Andresen and Satoshi Nakamoto.
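As a small illustration of what a character 4-gram feature looks like, here is a sketch in Python. It is not what stylo does internally. The function names and the whitespace normalisation are assumptions made for the example.

```python
from collections import Counter


def char_ngrams(text, n=4):
    """Return every overlapping sequence of n characters in the text.

    The text is lowercased and runs of whitespace are collapsed to single
    spaces first, so spaces count as characters inside an n-gram.
    """
    text = " ".join(text.lower().split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def top_ngram_freqs(text, n=4, top=5000):
    """Relative frequencies of the `top` most common character n-grams in a text."""
    grams = char_ngrams(text, n)
    if not grams:
        return {}
    counts = Counter(grams)
    total = len(grams)
    return {gram: count / total for gram, count in counts.most_common(top)}
```

For example, `char_ngrams("Bitcoin")` gives `["bitc", "itco", "tcoi", "coin"]`. Because the n-grams overlap, a single misspelt or misread character only disturbs a few neighbouring features rather than a whole word.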
Key words that distinguished Gavin from the other contenders included his use of the words ‘of’, ‘in’, ‘be’, and ‘this’. I put together a simple heatmap of the top MFW so that you can visualise how Gavin’s writing style fits like a glove with Satoshi’s. Gavin’s comparison to Satoshi is the first block on the left.
Andresen became a lead developer for Bitcoin around 2010 and has been involved in cryptocurrencies ever since. In 2011 Satoshi himself designated him as lead successor of the project, and he continued to develop Bitcoin Core. He also created the Bitcoin Foundation, which started around 2012.
In May 2016 Andresen backed Craig Wright, an Australian businessman, in his claim to be the real Satoshi. Andresen was also heavily featured in the Netflix documentary Banking on Bitcoin. He is still heavily involved in cryptocurrencies and takes part in seminars and workshops.