In my last blog I gave a brief introduction to the scalability challenges we face in securing the IoT. Today I shall dive deeper, explaining how we address those challenges from a systems design and implementation perspective.
We operate in secure data center environments (à la Mission Impossible, though far less glamorous looking) in order to prevent mismanagement of key material, which could compromise the trust chain with disastrous consequences for the internet. Our limited real estate puts a firm upper bound on the number of systems we can run our platform on, so our only alternative is to scale along two axes, combining computing approaches from the big-iron days with modern, distributed architectures.
With hardware, the rule of thumb is to get the most out of a single node before adding a second one. If, for instance, we hit a bottleneck with CPU, memory or I/O, we first address it with a hardware upgrade of that node. If we later hit a different bottleneck, or if a second upgrade along the same lines would not bring a sizeable benefit for its cost, we add a second node to spread the load.
Building platforms with limited sets of technologies from a handful of "traditional" vendors is common practice in many industries, including ours. For example, we have often seen database systems smashed to pieces by orthogonal, resource-intensive workloads, leaving DBAs with little if any choice besides expensive hardware upgrades. We have avoided shoehorning every kind of data into the same store and instead opted for solutions specialized in time series, text search, queuing and event logging, respectively. Going one level below that, we use, for instance, on-the-fly filesystem compression at the database tier, where we have plenty of CPU to spare; this considerably lowers our storage and I/O requirements, but it is only available in specific OS/filesystem combinations, which we extensively benchmarked and tested for failure. We insist on using the right tool for each job despite the additional complexity cost, which we offset with complete systems automation.
As for our in-house software, we simply write the best code we can, using system resources conservatively. Our whole platform is asynchronous and non-blocking, avoiding common wasteful situations such as recurring retry logic or large numbers of threads context-switching in and out while they mostly sleep, waiting for synchronous calls to other nodes to complete. Since we started from scratch, we were also able to go shopping with a list of features we wanted from our systems development language:
- High speed (statically typed, ideally compiled ahead-of-time)
- Good concurrency model
- Agility (fast compilation, good tools, simple dependency management)
- Portability (at least on POSIX operating systems)
- Reasonable complexity (good learning curve and not many obscure features)
- Maturity (stable language definition with backwards compatibility)
- Good community support (documentation, wide adoption, libraries)
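To make the asynchronous, non-blocking style mentioned above a little more concrete, here is a minimal sketch (not our production code) of fanning out calls to several backend nodes with goroutines instead of tying up a thread per synchronous call; the endpoint URLs and the timeout are purely illustrative.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"sync"
	"time"
)

func main() {
	// Hypothetical backend nodes; these URLs are placeholders.
	nodes := []string{
		"https://node1.example.internal/health",
		"https://node2.example.internal/health",
		"https://node3.example.internal/health",
	}

	// Bound the whole fan-out with a single deadline.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	var wg sync.WaitGroup
	results := make(chan string, len(nodes))

	for _, url := range nodes {
		wg.Add(1)
		go func(url string) {
			defer wg.Done()
			req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
			if err != nil {
				results <- fmt.Sprintf("%s: %v", url, err)
				return
			}
			resp, err := http.DefaultClient.Do(req)
			if err != nil {
				results <- fmt.Sprintf("%s: %v", url, err)
				return
			}
			defer resp.Body.Close()
			io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
			results <- fmt.Sprintf("%s: %s", url, resp.Status)
		}(url)
	}

	wg.Wait()
	close(results)

	for r := range results {
		fmt.Println(r)
	}
}
```

Each in-flight call costs only a goroutine, so the runtime can park thousands of them cheaply while the network round trips complete, rather than dedicating a sleeping OS thread to each one.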
Go vs. C, C++, D, Java, Scala and Rust
I have summarized the output of our selection process in the table below, skipping the full story that would have taken days to unfold:
|             | C/C++ | D    | Java  | Scala | Rust | Go    |
|-------------|-------|------|-------|-------|------|-------|
| Speed       | ***** | **** | ****  | ****  | **** | ****  |
| Concurrency | **    | **** | **    | ****  | **** | ***** |
| Agility     | *     | **   | **    | **    | **   | ****  |
| Portability | ****  | ***  | ***** | ***** | **** | ***** |
| Complexity  | **    | ***  | ***   | **    | **   | ***** |
| Maturity    | ***** | **   | ***** | ***   | *    | ***** |
| Community   | ***** | **   | ***** | ****  | ***  | ***** |
Why We Chose GoLang
After careful consideration, we settled on Go, which, despite lacking some features we liked in other languages, has provided a few strategic advantages:
- The concurrency model addresses the vast majority of our pressure points. The runtime handles I/O and multiprocessing under the hood in a manner that suits us very well, so we have been a lot more productive by doing away with boilerplate code and focusing on application logic (see the short sketch after this list).
- The gentle learning curve makes it much easier to hire people with relevant skills. We can confidently say that any competent software engineer who has worked on sufficiently sophisticated C/C++/Java backend projects should be able to pick up Go and become just as productive in this ecosystem within days to weeks.
- The relatively low complexity of the language and the high quality of the standard library make it fun to use. We can advance from proof-of-concept to production grade code through simple refactoring, as opposed to doing the proof-of-concept in a high-level language such as Ruby or Python, then rewriting that in a low-level language for speed.
- The community bias works in our favor, at least for now and the near future. Other adopters of the language build systems and solve problems quite similar to ours. Whatever problem we run into, someone else is bound to have hit it already and put some thought into it, which is not always the case with more general-purpose languages. It also means that if we find a bug in Go itself (and we have), there is enough push to get it fixed in a future release.
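As a small illustration of that first point about the runtime handling I/O and multiprocessing for us, the sketch below (with an illustrative port and handler path) shows how little code a concurrent network server needs: net/http serves each request on its own goroutine, and code written in a plain blocking style is multiplexed onto the runtime's network poller instead of holding an OS thread.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

func main() {
	http.HandleFunc("/status", func(w http.ResponseWriter, r *http.Request) {
		// Straight-line, seemingly blocking code; the scheduler parks this
		// goroutine instead of tying up an OS thread while it waits.
		time.Sleep(50 * time.Millisecond) // stand-in for a call to another node
		fmt.Fprintln(w, "ok")
	})

	// Every incoming connection is served concurrently with no extra code.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```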
Will We Use GoLang for Everything at GlobalSign?
Most definitely not, as there are other areas of interest where we are better off with a more expressive language. However, for our core platform, which needs to run like clockwork, we are happy overall with Go, even if we have to change our mindset from time to time.