For every user who has wondered “did someone backdoor this?”, there should be a developer ensuring that the code they released can be verified, and tampering detected. Package maintainers and users must also exercise diligence in order to avoid running untrusted code. This article walks through each step in the chain of custody between a hypothetical open source software project’s developers and users, covering vulnerabilities that can occur at each step, as well as some ways to mitigate them. It is targeted at moderately to highly experienced Linux users who may never have been deeply involved in distributing code or binary packages in the way experienced developers and package maintainers are.
Susan Sons is a Senior Systems Analyst at the CACR and serves as the staff director of the Internet Civil Engineering Institute, a nonprofit dedicated to supporting and securing the common software infrastructure we all depend on. In December 2015, Sons began penning a semi-monthly article for Linux Journal. Her first in this series, “Chain of Custody,” discussed points of vulnerability in the path software takes from developer to consumer, and how those points may be secured. It is our pleasure to feature it in The CACR Supplement.
There’s a great deal to be said for secure coding practices. However, if the program the user receives is not the one the developer created, complete and unchanged, those secure coding practices may not matter. In this article, we’ll follow the paths that a hypothetical piece of software, foobard, may take from its development team to its users, describing how that path can be exploited and how it can be protected.
Alice and Bob are great at coding. They maintain a robust test suite and only accept patches that pass all tests. They regularly fuzz test the application as a whole, and use static analysis tools to alert them to potential flaws in their code. Their architecture is extremely well thought-out, and their choices in dependencies are sane. Throughout these examples, we’ll assume that foobard, as written by Alice and Bob, does not present any unknown security risks. Unfortunately, there are many places that this can fall apart before foobard reaches the user.
Alice and Bob are using CVS to maintain foobard. After all, it’s not a huge project and CVS is what they have always used. The server that hosts the foobard CVS repo, however, was compromised and began serving up spyware tarballs on one of its web pages. Alice and Bob don’t know exactly what access the attacker achieved, or how long ago the compromise happened, so they can’t trust their server backups to have an unmodified copy of the repo. CVS offers no built-in integrity checking mechanism for the code itself, and modifying CVS history is trivial. Alice and Bob can try to cobble together—from whatever is stored on their laptops or in other theoretically-safe locations—enough data to spot-check the foobard repo and ensure that none of their code has been changed by the attacker. However, spot-checking provides little guarantee about the code’s overall integrity and won’t help reconstruct a full, known-good history.
If that sounds far-fetched, consider that even the server owner may be the attacker. You may remember that popular open source host SourceForge was caught changing a hosted project’s installer to install malware in addition to the requested software.
This could have been prevented if Alice and Bob were using a modern source code management tool such as git or mercurial, both of which use hashes to identify commits, and both of which allow code signing: in git, you gpg sign a tag, and in mercurial you gpg sign the manifest key in a changelog entry. In either case, that signature can be used not just to verify the integrity of one commit, but of that commit and all of its ancestors. This doesn’t mean there is no way to corrupt the authoritative repository on the server, but when best practices are used, it becomes astronomically difficult for an attacker to hide that corruption, requiring a timed compromise of multiple machines.
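In git, that workflow looks roughly like the following sketch. It uses a throwaway repository and a passphrase-less demonstration key so that it is self-contained; the names, e-mail addresses, and paths are illustrative, and a real signing key should of course carry a strong passphrase.

```shell
set -e
# Throwaway keyring and passphrase-less demo key (illustration only;
# real signing keys need strong passphrases).
export GNUPGHOME=/tmp/foobard-gnupg
rm -rf "$GNUPGHOME" && mkdir -p "$GNUPGHOME" && chmod 700 "$GNUPGHOME"
gpg --batch --pinentry-mode loopback --passphrase '' \
    --quick-generate-key "Alice <alice@example.com>" default default never

# A stand-in foobard repository with one commit.
rm -rf /tmp/foobard && mkdir /tmp/foobard && cd /tmp/foobard
git init -q
git config user.name "Alice"
git config user.email "alice@example.com"
git config user.signingkey "alice@example.com"
echo 'int main(void) { return 0; }' > foobard.c
git add foobard.c
git commit -q -m "initial import"

# Alice signs the release tag. The signature covers the tag object, which
# names a commit hash, which in turn hashes the commit's entire ancestry.
git tag -s v1.0 -m "foobard 1.0"

# Carol, having imported Alice's public key, verifies before building:
git verify-tag v1.0
```

In mercurial, the bundled gpg extension’s `hg sign` command plays an analogous role, signing the manifest as described above.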
This protection, of course, relies in part on the secrecy of the private GPG key(s) used for signing tags (or manifests). If Alice or Bob loses a copy of such a private key, it must be revoked and replaced as soon as possible, before an attacker has had time to brute-force the key’s passphrase.
Now that we have that sorted, with Alice and Bob migrating to git and tagging releases with GPG-signed tags, we’ve increased the security of one link in the chain. I’ll go so far as to assume that, having learned this lesson, Alice and Bob also learned to sign any release tarballs they offer. By changing these two practices, Alice and Bob have also mitigated some risks from unreliable DNS (when one can verify the code itself, one need not care if it came from the expected URL), and potential SSL issues (for the same reason: we’re checking the code itself, not trusting its origin). Another member of the open source community, Carol, can now get a known-good copy of the foobard source. Of course, before she can use foobard, Carol needs to build it.
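Signing a release tarball is a matter of publishing a detached signature next to it. A minimal sketch, again using a throwaway passphrase-less key and a stand-in file (all names and paths are illustrative):

```shell
set -e
# Throwaway keyring and demo key (illustration only).
export GNUPGHOME=/tmp/foobard-gnupg2
rm -rf "$GNUPGHOME" && mkdir -p "$GNUPGHOME" && chmod 700 "$GNUPGHOME"
gpg --batch --pinentry-mode loopback --passphrase '' \
    --quick-generate-key "Alice <alice@example.com>" default default never

# Stand-in for the real release tarball.
printf 'stand-in for release contents\n' > /tmp/foobard-1.0.tar.gz

# Alice publishes a detached, ASCII-armored signature alongside the
# tarball (this produces /tmp/foobard-1.0.tar.gz.asc):
gpg --batch --yes --armor --detach-sign /tmp/foobard-1.0.tar.gz

# Carol, with Alice's public key imported, verifies before building:
gpg --verify /tmp/foobard-1.0.tar.gz.asc /tmp/foobard-1.0.tar.gz
```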
The build scripts for foobard include checking for, and if need be retrieving, several dependencies. While these dependencies were well-chosen, foobard’s build system will blindly retrieve and build these packages without checking their integrity at all. This is arbitrary code execution with the permissions of whatever user ran foobard’s build script. Users’ ISPs are already injecting ads into websites using their position between users and the internet, so there is no reason to believe that they (or a state actor, or a DNS registrar, or a router manufacturer, or a server compromise) will never cause you to grab something other than the dependencies you expected.
To solve this, Alice and Bob have two choices:
- Ensure that the build script exits with an explanatory error when a dependency is not found locally so that Carol can get dependencies in her usual, probably sane, way.
- Ensure that the build script does appropriate integrity checking of any dependencies it downloads, AND that any dependencies’ build scripts do the same, all the way down the dependency tree.
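For comparison, option two amounts to pinning a known-good hash for every file the build script downloads, all the way down the tree. A minimal sketch of one such check (the downloaded dependency is simulated by an empty stand-in file, so the pinned hash shown is the SHA-256 of empty input; the library name is illustrative):

```shell
set -e
# Pinned, known-good hash for the dependency tarball. This value is the
# SHA-256 of an empty file, matching the stand-in created below.
expected="e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

# Stand-in for the tarball the build script just downloaded.
: > /tmp/libfoo-2.1.tar.gz

# Refuse to build anything whose hash does not match the pinned value.
actual="$(sha256sum /tmp/libfoo-2.1.tar.gz | cut -d' ' -f1)"
if [ "$actual" != "$expected" ]; then
    echo "error: checksum mismatch for libfoo-2.1.tar.gz; refusing to build" >&2
    exit 1
fi
echo "libfoo-2.1.tar.gz verified"
```

Doing this correctly for every transitive dependency is exactly the labor that makes option one attractive.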
Let’s assume that Alice and Bob chose option one, as it’s by far the least laborious. Now, in theory, Carol can get a known-good copy of foobard and build it without running or installing software of unknown origin on her machine. This is good, because once the machine doing the compiling is compromised, the binary cannot be trusted (nor can anything else on that system). We are depending on either Carol or some tool she runs to check the signatures on the code she downloads.
Carol, it turns out, is a package maintainer for a binary Linux distribution. It doesn’t matter which one for our purposes. Now that she has gotten a known-good copy of foobard, gotten known-good copies of all relevant dependencies, and built foobard, Carol is packaging it up for a repository that will provide the prebuilt binary to thousands of users. She should, in turn, ensure that the packages she generates are signed before being passed on to package mirrors.
The state of things at the time of this writing (mid-September 2015) is that binary Linux distributions vary in how they check the integrity of the software that they package. Major distributions such as Red Hat, Fedora, and Debian, for example, do cryptographically sign official packages, and their package managers reject packages with bad signatures. Gentoo uses a git-backed package management strategy which signs commit hashes rather than individual packages, achieving the same general effect plus protection of the package metadata and prevention of metadata replay attacks. However, the source code those ebuilds retrieve is not checked as far as I can tell.
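What those package signatures buy can be demonstrated with plain GPG: once a package is altered, verification against its detached signature fails, and a signature-checking package manager refuses it. A sketch with a throwaway key and a stand-in “package” file (this is not any real distribution’s tooling):

```shell
set -e
# Throwaway keyring and demo signing key (illustration only).
export GNUPGHOME=/tmp/foobard-gnupg3
rm -rf "$GNUPGHOME" && mkdir -p "$GNUPGHOME" && chmod 700 "$GNUPGHOME"
gpg --batch --pinentry-mode loopback --passphrase '' \
    --quick-generate-key "Repo <repo@example.com>" default default never

# Stand-in package, signed as a repository would sign it.
printf 'pretend package contents\n' > /tmp/foobard-1.0.pkg
gpg --batch --yes --detach-sign /tmp/foobard-1.0.pkg

# Unmodified package: the signature checks out.
gpg --verify /tmp/foobard-1.0.pkg.sig /tmp/foobard-1.0.pkg

# An attacker alters the package in transit; verification now fails, so a
# signature-checking package manager would reject it.
printf 'evil payload\n' >> /tmp/foobard-1.0.pkg
if gpg --verify /tmp/foobard-1.0.pkg.sig /tmp/foobard-1.0.pkg 2>/dev/null; then
    echo "unexpected: tampered package accepted" >&2
    exit 1
fi
echo "tampered package rejected"
```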
None of these Linux distributions have published policies that I can find which would bar the signing and distribution of packages for code that was not signed by its developer, or that pulls in unsigned code or binaries at build time. In short, most package managers are verifying the authenticity of packages, but package management teams don’t seem to be differentiating between packages made from known-good code and packages made from code they cannot verify the integrity of.
To the best of my knowledge, current package managers still consider “valid code signing key” to be a binary property. That is, a code signing key is either considered valid by your package manager for signing any package, or not considered valid at all. As such, someone who maintains a portage overlay (or deb/rpm repository) with your favorite game in it could sign (or their compromised key could sign) binutils or sudo. So, a package maintainer who thinks their packages’ importance is not high enough to merit a diligent approach to information security may cause your system to replace crucial system utilities typically run as root, or capable of mediating root access.
Linux and other open source software is used around the world: in medical care, the power grid, the internet, and countless other bits of infrastructure that we rely on every day. Luckily, it’s possible to make the kinds of software supply chain attacks described here incredibly difficult to pull off. Doing so will take concerted effort by developers, distribution maintainers (both packagers and maintainers of the packaging systems), as well as users.
Developers Should:
- Use a source control system with integrated integrity checking, such as git or mercurial, for managing all projects.
- Cryptographically sign each release in the source control system (via tag or equivalent), and each release tarball.
- Carefully safeguard their private keys: both code signing keys, and the SSH keys used to commit code.
- Rapidly revoke and replace keys that may be compromised. Remember: new GPG/SSH keys are free; the damage to your project’s reputation if compromised code goes out with your valid signature is irreversible.
- Ensure that the build system generates errors for missing dependencies, rather than blindly downloading and building them without integrity checking.
- Get their GPG keys signed by other developers, and in turn sign those developers’ keys, so that users have a better idea of which GPG keys to trust.
- Choose dependencies with similarly good distribution practices, and file bugs with dependencies that are not following these recommendations.
Linux Distributions Should:
- Use caution in obtaining source code for generating packages, checking that the code is signed by a trusted key and not building against any untrusted code such as something downloaded by the build system without integrity checking.
- Make contact with upstream developers when public git/mercurial history changes, to ensure that the change was expected and not a sign of tampering.
- File bugs with upstream developers who do not use modern source control systems and/or don’t cryptographically sign releases.
- Never accept packages that are not cryptographically signed by the package maintainer.
- Set a date to stop packaging code that was not signed by its development team, communicate that date upstream, and stick to it.
- Ensure that the package manager checks signatures on all packages it retrieves, and that it checks for revocation of package signing keys.
- Check the cryptographic signatures of any additional files that a package may download.
- Ensure that the package manager warns the user if a package’s integrity cannot be verified, either due to a failed signature check, or to the package relying on some resource (such as a proprietary blob from a third-party site) that is not signed at all.
- Design package management tools that allow a particular package signing key to be valid only for certain packages.
Users Should:
- Be suspicious of any program not signed by its developer (or package maintainer), whether that software is open source or being distributed as a compiled binary. Ideally, one would never run unsigned code at all. However, in applications that are not life-critical, one may need to compromise by minimizing the amount of unsigned code in use and not running unsigned code as root.
- Exercise due diligence in obtaining source code to compile: check that the code is signed by a reasonably trusted key, and does not download anything at build time without authenticating it.
- File bugs with developers who do not use modern source control systems and/or do not cryptographically sign releases.
- Not enable package repositories if those repositories’ maintainers are not signing packages or if the maintainers’ keys can’t be verified.
Some of these things are being done most of the time, and the overall picture is improving. Running software inevitably involves trust, as no one has both the time and the skill to audit every piece of code running on their system(s). We can do a better job of making sure that we only trust code that came from the people we think it came from.
Susan Sons, Center for Applied Cybersecurity Research, Indiana University
Originally published in the December 2015 issue of Linux Journal.