Cloud Security: The Challenges with Key Management in the Cloud (and everywhere else)

This is part of a series of blog posts on cloud security by Carlos Cardenas, our Director of Solutions Engineering. Carlos is a security expert who came to Joyent from The Institute for Cyber Security (ICS) at the University of Texas, San Antonio. While at ICS, Carlos worked under Ravi Sandhu, PhD, one of the leading security experts in the world.

In my post The Four Keys of Cloud Security, Confidentiality, I mentioned some challenges of key management in the cloud. In this post, I talk more about key management, its issues in general, and how it affects cloud computing.

Background

Key management is the system or act of managing cryptographic keys in a system, which includes generation, distribution, use, replacement, and destruction. This is different from key scheduling: key management happens between the user and the system, whereas key scheduling happens between internal components within the cryptosystem.

Key management is hard. In fact, it’s considered the hardest part of a cryptosystem to get right, as it encompasses not only the various protocols but also system policy, user training, and humans (the most flawed part of the equation).

Issues with Key Management

While on the subject of key management complications, let’s briefly talk about the main issues that arise with key management:

General security issues
- Keys being stolen
- Keys being vulnerable to attack or compromise
Management of all keys
- Single point of failure
- Needs to scale linearly to handle lots of keys
Availability
- Allowing authorized users access to their data
Governance
- The policy that defines proper access, usage, and protection
- This is probably the hardest portion to get right

How it relates to Cloud Computing

OK, so how does this affect you as a cloud computing user? Regardless of which provider you use, you have at a bare minimum a username/password combination that allows access to the user portal, and either a username/password or keys (typically ssh) to access your resources such as compute instances, data storage, API access, and so on.

In the following, we are using a traditional username/password for portal access and ssh keys for API and resource access (this is how Joyent performs authentication and authorization).

In order to use the user portal, you provide your credentials (the password is salted and hashed into a form of cryptographic key), which are passed to the authentication system for validation. If validated, the credentials are then presented, together with the requested action, to the authorization system to ensure that the user is permitted to perform that action. This entire process is known as PEI (Policy, Enforcement, and Implementation) in security research circles.
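As a rough illustration of the salting and hashing step, here is a minimal sketch using PBKDF2 from the Python standard library. The function names, iteration count, and salt size are assumptions chosen for the example; they are not a description of how any particular provider's portal stores passwords.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed parameters for illustration only; not any provider's real settings.
ITERATIONS = 200_000
SALT_BYTES = 16


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key). A fresh random salt is made if none is given."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)
```

Only the salt and the derived key would be stored; the password itself never needs to be kept. Because each account gets its own salt, two users with the same password end up with different stored values.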

To use the API or access a compute resource, ssh keys must be used. For API access, the request is signed using the private key, and the signature is sent along with the request. The signature is then validated using the public key of the account (the authentication step). The requested action is then checked (the authorization step) before it is carried out. For ssh access, the standard ssh protocol (RFC 4253) is used as is, except when it comes to key verification; for this step, an ssh gateway is used which takes into account the IP address of the resource. The IP address of the compute instance is indexed to the user’s account, which has a list of authorized ssh public keys to be verified against.
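The gateway lookup described above can be sketched as two dictionary lookups: IP address to account, then account to its set of authorized keys. This is a simplified model, not the actual gateway; the class name SshGateway, its methods, and the placeholder_key helper are all hypothetical, and the key lines in the example are fake blobs rather than real public keys.

```python
import base64
import hashlib
from typing import Dict, Set


def fingerprint(public_key_line: str) -> str:
    """OpenSSH-style SHA-256 fingerprint of a "type base64-blob [comment]" line."""
    blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


class SshGateway:
    """Hypothetical model: instance IP -> account -> authorized key fingerprints."""

    def __init__(self) -> None:
        self.account_by_ip: Dict[str, str] = {}
        self.keys_by_account: Dict[str, Set[str]] = {}

    def add_instance(self, ip: str, account: str) -> None:
        self.account_by_ip[ip] = account

    def add_key(self, account: str, public_key_line: str) -> None:
        self.keys_by_account.setdefault(account, set()).add(fingerprint(public_key_line))

    def is_authorized(self, ip: str, public_key_line: str) -> bool:
        account = self.account_by_ip.get(ip)
        if account is None:
            return False  # unknown instance: reject
        return fingerprint(public_key_line) in self.keys_by_account.get(account, set())


def placeholder_key(label: str) -> str:
    """Build a well-formed but fake key line, for demonstration only."""
    return "ssh-ed25519 " + base64.b64encode(label.encode("utf-8")).decode("ascii") + " demo"
```

Both lookups are hash-table accesses, so the check does not slow down as the number of accounts or keys per account grows.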

What’s wrong with the scenario

Well, let’s review this setup using the Issues of Key Management as the foundation.

General security issues
- If you are using the same ssh keypair on all machines that you use, it only takes the compromise of one of those machines to compromise all of your cloud resources. On top of that, you have no way of knowing which machine made a request, since the same keypair is used everywhere.
- Using a weak passphrase to protect your portal account carries the same level of risk as using a weak passphrase on any other online site. Once the username is known, the attacker can perform a brute force attack, trying every password generated from their dictionaries or rules.
Management of all keys
- Each account could have N ssh public keys, and indexing into the proper account and its key list needs to be fast.
Availability
- If the system goes offline, how does a customer ssh into their compute instances? Is there a key cache somewhere?
Governance
- If an account fails to successfully authenticate within X attempts, does the account become locked? (This prevents the brute force attack described above.)
- If the passphrase for an account is approaching a certain age, is the account restricted until the user changes their passphrase?
- If the same ssh public key is in X different accounts, does this trigger an account lockdown or an extra validation step? (This prevents fraudsters from creating bogus accounts.)
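
To make the governance questions concrete, here is a small sketch of how such rules might be expressed in code. The thresholds, the Account class, and the KeyRegistry class are hypothetical; they stand in for the "X" values above and do not describe any provider's actual policy.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Set

# Assumed thresholds, standing in for the "X" values in the questions above.
MAX_FAILED_ATTEMPTS = 5
MAX_PASSPHRASE_AGE_DAYS = 90
MAX_ACCOUNTS_PER_KEY = 3


@dataclass
class Account:
    name: str
    failed_attempts: int = 0
    passphrase_age_days: int = 0

    @property
    def locked(self) -> bool:
        return self.failed_attempts >= MAX_FAILED_ATTEMPTS

    @property
    def must_change_passphrase(self) -> bool:
        return self.passphrase_age_days >= MAX_PASSPHRASE_AGE_DAYS

    def record_login(self, success: bool) -> None:
        if self.locked:
            return  # stays locked until reset through some other channel
        self.failed_attempts = 0 if success else self.failed_attempts + 1


class KeyRegistry:
    """Tracks which accounts registered each public key fingerprint."""

    def __init__(self) -> None:
        self._accounts_by_key: Dict[str, Set[str]] = defaultdict(set)

    def register(self, account_name: str, key_fingerprint: str) -> bool:
        """Record the key; return True if it now needs extra validation."""
        accounts = self._accounts_by_key[key_fingerprint]
        accounts.add(account_name)
        return len(accounts) >= MAX_ACCOUNTS_PER_KEY
```

The code is the easy part. Deciding the thresholds, and what happens to a legitimate user who trips one, is the policy work.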

Given these issues, what can be done?

There are a couple of ways to handle this situation: one is policy based and the other is technology based.

Ever wonder why an employer or the DMV requires two forms of identification (the exception being a passport, which is itself based on multiple forms of identification)? It’s because two are better than one. It’s hard to fake a driver’s license with a matching social security card, as the two independent systems need to corroborate each other to provide verification.

The same is true for computer systems.

There’s an existing solution called 2-Factor Authentication (2FA). Using this system, the standard authentication is used in conjunction with another, independent system: a One-Time Passphrase (OTP), a challenge response (OAUTH token, smart card, or other hardware token), or verification by phone through either a voice call or SMS. The 2FA system we use here at Joyent is DuoSecurity. (Disclaimer: Duo is also a partner of Joyent.) Using Duo, we can use a mobile phone as the hardware validator using Mobile Push (Duo’s core differentiator against all others in this area), a voice call, SMS, OTP, or a hardware token using the venerable Yubikey.
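To show what an OTP second factor looks like under the hood, here is a minimal sketch of the standard time-based one-time password algorithm (TOTP, RFC 6238, built on HOTP from RFC 4226). This is the generic public algorithm, not a description of how Duo implements its service; the function names are chosen for the example.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based OTP (RFC 6238): HOTP over the number of elapsed time steps."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify_totp(secret: bytes, submitted: str, at: Optional[float] = None) -> bool:
    """Compare the submitted code with the expected one in constant time."""
    return hmac.compare_digest(totp(secret, at), submitted)
```

The server and the user's device share the secret once at enrollment. After that, each side computes the code independently from the current time, so a stolen password alone is not enough to log in. A production verifier would typically also accept the adjacent time step to tolerate clock drift, and would rate-limit attempts.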

This technical solution combined with policy makes up a hardened solution that protects not only infrastructure but data as well.

For my next post, I’ll be talking about privacy and what is around the corner to provide next generation services while taking a hard line on leaking data.

Post written by Carlos Cardenas