How secret are your secrets?
You have a bunch of machines and a bunch of credentials they (and no one else) need access to, so you use a secrets store to hold the credentials and distribute them as needed.
Now only the machines that need a specific secret can access it, so you know the credentials are safe — or are they?
The whole concept of secrets in software is weird. We run systems composed of millions of lines of code that we've never looked at, expect to be attacked and hope to be resilient against the attackers, and generally can't trust the machines that run our code to be secure. How can we expect to keep anything important secret in this environment?
I'd argue that we can't, and that most secrets stores are little better than distributing text files containing credentials to machines. As an industry, we've couched our practices around secrets in a set of security jargon (audit log, encryption, identity-based access, etc) that allows us to ignore the glaring holes rather than raise our standards. I think we can do better, and will discuss some places where current approaches go wrong as well as a few improvements and alternatives we should be using more broadly.
Secrets in a low-trust environment
If you have a secure system where you know what code is running and don't allow humans (including attackers) to do arbitrary things, our current approaches to secrets work pretty well. We've moved to identity-based access, ensuring machines (or services on them) provide a proof of identity when accessing secrets and allowing us to restrict secrets access to the minimal set of services that need them. We can keep an audit log of every time a secret is accessed, allowing us to automatically remove access permissions from services that stop needing them, as well as detect behavioural anomalies in access patterns. We even use encryption, both in transit and at rest, to make it harder for attackers to intercept our secrets. Unfortunately, our original assumption rarely holds.
We're typically writing code that runs in a significantly lower-trust environment than hoped for. We use copious amounts of dependencies, an operating system that is wildly insecure[1] (and probably always will be), and are constantly under attack from well-funded attackers. Once an attacker can execute code on our machines, they can swiftly pivot to root, and all bets around secrecy of our secrets are off.
Let's reexamine the three capabilities we just credited to our secrets stores, but this time in the context of an attacker with root access to a machine. The attacker can prove their identity to the secrets store — they can obviously prove to be the machine, but can also pretend to be a specific service if needed. This lets the attacker access their desired secrets in a way that's likely indistinguishable from the machine's standard behaviour, sidestepping the fancy encryption controls along the way. The audit log is more interesting — depending on which service the attackers are impersonating, we may be able to detect anomalies in their access patterns that reveal their presence.
Luckily (for them, at least), the attackers probably don't even need to hit our secrets store! Because they have root access, they can peek directly into our service's memory and extract the secret without leaving any trace. Our services could mitigate this somewhat by dropping their copies of secrets as soon as they're done using them, and memfd_secret() gives us some tools to make this harder, but neither of these approaches is foolproof or always applicable.
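For a sense of what the memfd_secret() route looks like, here's a rough sketch in Go — assuming a Linux 5.14+ kernel with secretmem enabled and the MemfdSecret wrapper from golang.org/x/sys/unix:

// Rough sketch: keep a credential in a memfd_secret() region. These
// pages are removed from the kernel's direct map, so even a root
// attacker can't read them via /proc/<pid>/mem — though anyone who can
// make *this* process run code can still get at them.
package main

import (
	"log"

	"golang.org/x/sys/unix"
)

func main() {
	secret := []byte("api-key-goes-here") // illustrative; in real use you'd
	// receive the secret directly into the protected region below.

	fd, err := unix.MemfdSecret(0)
	if err != nil {
		log.Fatal("memfd_secret: ", err) // ENOSYS before Linux 5.14
	}
	defer unix.Close(fd)

	if err := unix.Ftruncate(fd, int64(len(secret))); err != nil {
		log.Fatal(err)
	}
	buf, err := unix.Mmap(fd, 0, len(secret), unix.PROT_READ|unix.PROT_WRITE, unix.MAP_SHARED)
	if err != nil {
		log.Fatal(err)
	}
	copy(buf, secret)

	// ... use buf wherever the credential is needed ...

	// Zero and unmap our copy as soon as we're done with it.
	for i := range buf {
		buf[i] = 0
	}
	unix.Munmap(buf)
}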
Overall, it's impossible to ensure things are kept secret in an environment where attackers have root permissions, and most of the industry runs their code in such an environment. Without revamping all of our underlying systems, what can we do about this?
Smaller scopes
A general philosophy in security is to reduce the scope of systems. If a machine is hacked, it's much worse for it to have access to all databases than to just one of them. If a customer support agent is compromised, the fallout will be much smaller if they can only operate against a few customers or types of data rather than everything. However, this principle often goes out the window with secrets. We'll talk about using the principle of least privilege to give a Github secret only to the services that need it, then turn around and create said Github secret with full admin credentials. This is made even worse by some vendors with insane permission models — because you want the ability to perform one innocuous action, the vendor requires you to have permission for all actions.
That being said, probably the easiest way to reduce the impact of compromised secrets is to reduce what the secrets can do. When possible, you should create secrets with the minimal set of permissions they need. It's a lot cheaper to create a handful of similar secrets with varying permissions than it is to recover from someone abusing an over-permissioned one. When this isn't possible, we should push the problematic vendors (cough Okta, Hashicorp, Github, etc) to improve the granularity of their access control models, at least for programmatic actors.
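To make "a handful of similar secrets with varying permissions" concrete, here's one way to do it with AWS today: attach a session policy when assuming a role, which intersects the role's permissions down to exactly what's needed. A sketch using aws-sdk-go-v2; the role ARN, bucket, and policy are illustrative:

// Sketch: mint a narrowly scoped, short-lived credential from a broader
// role by attaching a session policy at assumption time.
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/sts"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// The resulting credentials can do only this one thing, no matter
	// how broad the underlying role is.
	policy := `{"Version":"2012-10-17","Statement":[{"Effect":"Allow",
		"Action":"s3:GetObject","Resource":"arn:aws:s3:::example-bucket/app-config.json"}]}`

	out, err := sts.NewFromConfig(cfg).AssumeRole(ctx, &sts.AssumeRoleInput{
		RoleArn:         aws.String("arn:aws:iam::123456789012:role/app-role"), // hypothetical
		RoleSessionName: aws.String("scoped-reader"),
		Policy:          aws.String(policy),
		DurationSeconds: aws.Int32(900), // expires in 15 minutes
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Println("scoped access key:", aws.ToString(out.Credentials.AccessKeyId))
}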
Rotate early, rotate often
A common scenario is for a secret to contain an API key to some other service. An attacker who gains access to said secret will want to exfiltrate it to a machine they own, then abuse your access in some fashion. Using the extracted secret from their own machine makes it easier for them to experiment with it, and makes it less likely that they'll be discovered inside your infrastructure.
An easy way to mitigate this is to have any credentials stored in secrets expire and be rotated frequently. If a secret expires after 15 minutes, an attacker is practically forced to stay present — and visible — on your infrastructure in order to use your credentials. AWS Secrets Manager exposes a pretty convenient system for doing this, and whichever secrets store you're using probably does too. Unfortunately, while it's feasible to implement this today, it can be pretty annoying, and it feels like too much work to foist onto arbitrary developers who just want to store credentials securely.
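One caveat: rotation only bites if consumers pick up the new value, so clients should fetch the secret at use time rather than caching it at boot. A minimal sketch of that consuming side with aws-sdk-go-v2, reusing the secret name from the example further below:

// Sketch of the consuming side under frequent rotation: fetch the
// current secret value at use time instead of caching it at startup.
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/secretsmanager"
)

func currentToken(ctx context.Context) (string, error) {
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return "", err
	}
	out, err := secretsmanager.NewFromConfig(cfg).GetSecretValue(ctx,
		&secretsmanager.GetSecretValueInput{
			SecretId: aws.String("fakevendor-admin-creds"),
		})
	if err != nil {
		return "", err
	}
	return aws.ToString(out.SecretString), nil
}

func main() {
	token, err := currentToken(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	log.Println("fetched a token of length", len(token)) // never log the token itself!
}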
I'd like to see the industry progress here — vendors should make it easy to rotate credentials (and hard to not rotate them!). All vendors should expose an API that takes some seed credential (acquired once by a human), which is then used to generate actual credentials. If this was standardized enough, secrets stores could store only the seed credential, then automatically fetch temporary credentials whenever secrets are requested by a client. In the meantime, consider implementing something like this for credentials to services that are either important or prolific with their credentials.
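Lacking a standard, a hand-rolled version of that exchange might look like the sketch below — the /v1/token endpoint, request body, and response shape are all invented for illustration:

// Hypothetical seed-credential exchange: the secrets store keeps only a
// long-lived seed, and short-lived tokens are minted on demand.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"os"
	"strings"
	"time"
)

type tempCred struct {
	Token     string    `json:"token"`
	ExpiresAt time.Time `json:"expires_at"`
}

func fetchTempCred(seed string) (*tempCred, error) {
	req, err := http.NewRequest(http.MethodPost,
		"https://api.fakevendor.com/v1/token", // hypothetical endpoint
		strings.NewReader(`{"ttl_seconds": 900}`))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+seed)
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("token exchange failed: %s", resp.Status)
	}

	cred := &tempCred{}
	if err := json.NewDecoder(resp.Body).Decode(cred); err != nil {
		return nil, err
	}
	return cred, nil
}

func main() {
	cred, err := fetchTempCred(os.Getenv("FAKEVENDOR_SEED")) // the one human-acquired secret
	if err != nil {
		log.Fatal(err)
	}
	log.Println("temporary token expires at", cred.ExpiresAt)
}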
Building high-trust environments
We've already talked about how we can't trust most of our machines, but maybe we can trust some of them. If we had machines capable of running small amounts of code that we trust, we could safely use our secrets store from those machines. Luckily, we have options here — for example, a dedicated, locked-down machine running a minimal software stack, or hardware-backed isolation like an HSM or a confidential-computing enclave. This doesn't directly solve our problem — we don't want small amounts of trusted code to access secrets, we want our services to! However, it gives us a base we can build a more secure approach on top of.
In the common scenario for secrets, we're storing API keys for some service. Machines fetch a secret containing those credentials, then build an HTTP request that presents them as some form of authentication — probably via the dubiously named Authorization header. This might look something like the following:
$ TOKEN=$(aws secretsmanager get-secret-value --secret-id fakevendor-admin-creds | jq -r .SecretString)
$ curl -H "Authorization: Bearer ${TOKEN}" https://api.fakevendor.com/whoami
{"user": "admin"}
Rather than having the machine make this request directly using the credentials, let's create a proxy service that's able to inject credentials into requests[2]. This allows us to give only the proxy service access to the credentials, rather than the machine that previously had access. From our machine, a request under this new approach is pretty simple, since the proxy handles all of the interaction with secrets:
$ curl -H "Authorization: INTENTIONALLY_LEFT_BLANK" https://fakevendor.secretsproxy.internal/whoami
# The proxy intercepts this request, validates the sending machine,
# overwrites the Authorization header, and forwards it to the original API.
{"user": "admin"}
To the machine making the request, everything functions the same. However, attackers on our main machines can no longer extract secrets, since they never see them in the first place. We've moved secrets access from a machine we don't really trust that's running huge amounts of code to a small proxy service that can be implemented in a few hundred lines of code. While the proxy service may have vulnerabilities, there's both a lot less code to exploit and a much smaller surface area to do it with, making it far harder for malicious actors to steal secrets.
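To make this concrete, here's a minimal sketch of such a proxy in Go. The FAKEVENDOR_TOKEN environment variable stands in for however the proxy host fetches the credential from the secrets store, and client authentication (e.g. mTLS) is left out for brevity:

// Minimal credential-injecting reverse proxy sketch.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"
)

func main() {
	upstream, err := url.Parse("https://api.fakevendor.com")
	if err != nil {
		log.Fatal(err)
	}
	token := os.Getenv("FAKEVENDOR_TOKEN") // only the proxy host holds this

	proxy := httputil.NewSingleHostReverseProxy(upstream)
	director := proxy.Director
	proxy.Director = func(r *http.Request) {
		director(r) // rewrite scheme/host to point at the upstream API
		r.Host = upstream.Host
		// Discard whatever the client sent and inject the real credential.
		r.Header.Set("Authorization", "Bearer "+token)
	}

	// cert.pem/key.pem: the proxy's own TLS certificate for
	// fakevendor.secretsproxy.internal.
	log.Fatal(http.ListenAndServeTLS(":443", "cert.pem", "key.pem", proxy))
}

With this in place, the curl invocation above works unchanged, and the vendor credential only ever exists on the proxy host.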
This approach has a few other notable benefits.
Auditability: Now that secrets can't be silently extracted via a malicious root user or side channels, we can trust our audit log. Additionally, we can record information about each request that was made, giving us the ability to see not just that a secret was used but also how it was used.
Smaller scopes: This pairs well with the Smaller Scopes proposal above. If your credentials are more powerful than the operations you need, the secrets proxy can be expanded to validate requests. For example, you could turn a read+write admin credential into a read-only one by allowing only GET requests (see the sketch after this list), or restrict it to certain operations by whitelisting specific API paths.
Network isolation: If you use some form of network isolation, where your machines can only talk to a whitelisted set of services, the secrets proxy can provide better isolation guarantees. For example, if github.com was whitelisted to allow machines to work with your codebase, nothing stops an attacker from using Github with their own credentials to exfiltrate important data from your environment by committing it to their repository. With the secrets proxy in place, your machines can only access Github with your credentials, rather than allowing the attacker to bring their own.
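As an example of that validation, here's a hypothetical wrapper around the proxy sketch above that turns a read+write credential into a read-only one by letting only safe methods through:

// Hypothetical extension of the earlier proxy sketch: wrap its handler
// so a read+write credential behaves as read-only.
func readOnly(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet && r.Method != http.MethodHead {
			http.Error(w, "read-only proxy: write methods are blocked", http.StatusMethodNotAllowed)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// In the earlier sketch's main(), serve readOnly(proxy) instead of proxy:
//   log.Fatal(http.ListenAndServeTLS(":443", "cert.pem", "key.pem", readOnly(proxy)))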
Combining the above
If we could combine the above approaches, I think the industry would be in a much better position around secrets. Rather than closing our eyes and pretending everything is secret, we'd be able to confidently say that for many of our secrets, only the machines that need access can ever use them. For secrets where a proxying approach is infeasible, we'd at least be able to say that attackers need to be active on our machines to use them due to their short lifetimes.
None of these ideas are novel or even particularly difficult to implement. However, making these approaches scale across all credentials will require vendors to support some new features, like standardized credential rotation and smaller scopes. If you're in a position where you're implementing an API with credentials, please keep these ideas in mind!
Footnotes
[1] You know something has gone wrong when it costs more to buy a car than a local privilege escalation exploit that'll work against billions of machines.
[2] I don't see any published prior art for a secrets proxy like this, so credit for showing its viability goes to Eli Skeggs, who got the original idea from Trey Tacon.