please use static analysis

31 Dec, 2023

Running a static analyzer against your codebase is a very quick way to spot bugs. I'd guess that over half of all non-trivial codebases that don't use static analyzers contain serious bugs that could immediately be spotted by running one.

Here's a >2,000-star repo I grabbed off of GitHub's trending Go repos for today: https://github.com/francoismichel/ssh3. It describes itself as:

SSH3: faster and rich secure shell using HTTP/3
SSH3 is a complete revisit of the SSH protocol, mapping its semantics on top of the HTTP mechanisms. In a nutshell, SSH3 uses QUIC+TLS1.3 for secure channel establishment and the HTTP Authorization mechanisms for user authentication.

...

🔒 SSH3 is secure
While SSHv2 defines its own protocols for user authentication and secure channel establishment, SSH3 relies on the robust and time-tested mechanisms of TLS 1.3, QUIC and HTTP. These protocols are already extensively used to secure security-critical applications on the Internet such as e-commerce and Internet banking.

Here's what happens if you run Go's built-in static analyzer (no need to even install anything!) on the repo:

$ go vet ./...
# github.com/francoismichel/ssh3/util/unix_util
util/unix_util/non_password_auth_user.go:23:3: (*github.com/rs/zerolog.Event).Msgf format %s reads arg #1, but call has 0 args
util/unix_util/non_password_auth_user.go:28:3: (*github.com/rs/zerolog.Event).Msgf format %s reads arg #1, but call has 0 args

Lines 23 and 28 are both missing an argument, which means invalid IDs won't be logged:

// util/unix_util/non_password_auth_user.go
uid, err := strconv.Atoi(u.Uid)
if err != nil {
	log.Error().Msgf("could not convert uid %s into an int") // L23
	return nil, err
}
gid, err := strconv.Atoi(u.Uid) // L26
if err != nil {
	log.Error().Msgf("could not convert gid %s into an int") // L28
	return nil, err
}

A PR to fix this bug would probably draw attention to the much more serious bug on line 26, where a Unix user's UID (User Identifier) is mistakenly treated as their GID (Group Identifier) too. Skimming the repo, it seems like that wrong GID could end up in at least a syscall.Credential struct and an os.Chown call. Not great!

Someone might complain that go vet ./... didn't actually spot the GID bug — or, even more strongly, that static analyzers can't really infer design intent, making them unsuitable for spotting that class of bug anyway.¹ I think that's a fair criticism. However, I also don't think it's a coincidence that, out of thousands of lines of Go code, the only go vet errors were immediately next to a serious security mistake. A big chunk of static analyzers' value (when run in yet-to-be-analyzed repos) comes from pointing the developer towards "areas" of code that are under-tested, under-maintained, or poorly written. The file flagged above is nested under multiple util folders and only compiled if specific build flags — unix && (!linux || disable_password_auth) — are set. Developers' or maintainers' interactions with the codebase usually focus on particular design components much more than others (due to limited resources, project complexity, etc.), but static analyzers don't hold those preconceptions and can trivially scale in a way a human or team of humans can't.

If you work with a non-trivial codebase (if you're not sure whether it's trivial or not, it isn't) and don't use static analyzers yet, please start! Pick a static analyzer, run it in your repo, and fix each warning. For well-written code in a medium-size repo, this shouldn't take more than an hour, and you'll almost certainly get decent returns.

Appendix

Some good static analyzers² for Go code are:

analyzer	checks for	false positives
staticcheck	common mistakes	few
nilaway	potential nil panics	very many
gosec	security issues	very many
deadcode	unreachable code	none

I don't have enough experience with other languages' static analyzers to outright recommend any for production code (although I've really enjoyed Rust's Clippy), but it should be easy enough to Google a few. If you're considering using them at work, asking someone from your company's Security team is also a great idea.

There's a third complaint here that goes something like, "using separate types for UIDs and GIDs (rather than shared int) makes it hard to confuse them and obviates the need for static analysis". If that's appropriate for the problem and language you're working with, I think that's a great technique; however, there are definitely other static analysis checks that can't be eliminated via type systems alone.↩
I omitted golangci-lint — probably the most popular Go meta-analyzer — because I personally haven't found it very useful. It's agonizingly slow on giant (read: shitty enterprise) codebases and doesn't contain all the analyzers I typically intend to use anyway.↩

#code