Overhauling AWS account access with Terraform, Granted, and GitOps

As part of our engagements with clients, we need to access their AWS accounts to see what’s what. When we first started our work in 2019, we followed AWS’s recommendation of using role assumption, which worked great, but we had a ludicrously broad access scope on the policy attached: the AWS-managed ReadOnlyAccess policy plus a few specific grants that weren’t in the AWS-managed policy. We later scoped that down to the AWS-managed ViewOnlyAccess policy after some customer feedback about the scope, which worked just fine for us.

From time to time, some of our more security-conscious clients would scope our access down even more. I’ve always thought their changes were great and often implemented them into our standard setup. Lately, that got me thinking about how we could redo our access permissions entirely so that we had the absolute least access needed to do the job.

Internally, we started discussing a new problem we were facing: as we brought in more specialized contractors for one-off consultations on our client accounts, we needed a way to grant a person access only to the client accounts they were working on. Our existing setup worked well for our staff, who had access to all clients, but it didn’t really support limiting access on a per-client basis. We eventually cobbled together an MVP that required the manual creation of roles and policies, but the checklist to set up a new contractor was two pages long! Definitely not a good solution.

To make matters even more complicated, we had recently set up AWS SSO, but we were still using the built-in AWS identity store. And, on top of that, we found ourselves having to pass AWS CLI config files back and forth constantly.

Our method of managing access to client accounts really left a lot to be desired.

A new vision for the future

We ultimately decided to take a step back and really think through exactly what we wanted from a new approach. We arrived at these key needs:

Employee access and one-time contractor access should be managed the same way.
Authentication and authorization to AWS should be governed by a single source: Duckbill’s Google user/group directory.
Use Granted.dev for assuming roles, and have a programmatic generation of the Granted config. No config files should ever be passed around again.
Read client access details (account ID, external ID, internal name) from a central database. These aren’t secrets and shouldn’t be treated as such.
Granting/revoking access should be easily done and traceable/auditable.
A user should only be able to access clients for which they have been explicitly granted access.

One look at this list and we quickly realized we were going to need some expert help.

An aside about confidentiality

One key thing we realized we had to decide on was the level of confidentiality of the data involved. We’ve always treated client data as the crown jewels, but different people on the team had different ideas about what constituted client data and what didn’t. We finally decided to properly define our levels of confidentiality, resulting in three levels: Public, Confidential, and Client Data.

Public is exactly as the name implies: the information is public information. This mainly applies to documents that contain content for a blog after it’s been published. For example, this document will be Confidential during editing and Public after it’s approved for release.

Confidential is a normal level of confidentiality. We don’t take steps to hide it from team members who aren’t working on clients, but we do expect it to be treated as non-public, not-to-be-shared information. For example, metadata about a client’s relationship with Duckbill is Confidential. Metadata includes bits of information like account IDs, external IDs, the name of the client, and so on.

Client Data is a distinct category of confidentiality, requiring need-to-know access. For example, the details of a client’s AWS spend are considered Client Data.

The way we talk about it in our client security packet is basically: “The existence of a meeting and the attendees is Confidential, while the content of the meeting is Client Data.”

Making this decision was key to implementing a solution here. Treating even the name of a client as Client Data would mean obfuscating everything, which could lead to a scenario where we’re accidentally granting the wrong access or assuming the wrong role, and all for very questionable upside. The potential downside from poor UX outweighed the upside of increasing the confidentiality. Remember, folks, security is always a bunch of tradeoffs.

Enter Chris Farris, IAM wrangler extraordinaire

We, of course, did the obvious thing and hired Chris Farris to design and implement a solution for us. I’ll let Chris tell the rest of this story!

Everything Mike and Corey sought fit well within the best practices I’ve implemented for my employers and clients: centralize your identities, use least-privilege roles, and grant access to humans only as necessary. The solution for Duckbill is three-fold.

Part one is a custom Terraform module and pipeline to create the required AWS Identity Center (hereafter referred to as SSO) elements for each client.

The second part is a simple way to ensure all the cloud economists are working from the same AWS config file and that the necessary account IDs and external IDs are communicated. Both part one and part two were implemented as part of an automated CI/CD pipeline.

Part three is right-sizing the permissions to ensure that the security issues of ReadOnlyAccess and limitations of ViewOnlyAccess are addressed.

Part one: Terraform

With AWS SSO, access is granted by combining three elements: an identity (a user or group) and a Permission Set are assigned together to an account.

Each Duckbill client config boils down to two input elements: the internal client name and the Duckbill users who should have access to the client. A Lambda function fetches the other critical details from Duckbill’s CRM, such as external_id and account_id.

When deployed, the Terraform module will:

Invoke a Lambda function to look up the name, payer_account_id, and external_id from the CRM based on the provided client_id.
Create a new client_access group in AWS SSO.
Add the provided users to the client_access group.
Create the Identity Center Permission Set. This includes creating the role policy allowing the cloud economist to assume the DuckbillGroupRole if they pass the client’s unique external_id.
Assign the Permission Set to the SSO Group in the Duckbill Client Access AWS account.
Generate the Granted config based on the data from the CRM and commit that to the Duckbill Group’s Granted Registry in GitHub.
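
The lookup in the first step might look roughly like the Lambda handler below. This is a minimal sketch, not the actual Duckbill function: the post doesn't describe the CRM's API, so the `CRM_RECORDS` dictionary stands in for it, and the event shape, the client code, and the account ID are all invented for illustration.

```python
"""Hypothetical sketch of the CRM lookup Lambda (not the actual Duckbill code)."""

# Stand-in for the CRM; a real function would query the CRM's API instead.
CRM_RECORDS = {
    "acme": {
        "name": "Acme Corp",
        "payer_account_id": "111122223333",
        "external_id": "d61ea58c-foo",
    },
}


def handler(event, context):
    """Return the client details the Terraform module needs for one client_id."""
    client_id = event["client_id"]
    record = CRM_RECORDS.get(client_id)
    if record is None:
        # Fail loudly so a typo in the client config stops the plan early.
        raise KeyError(f"unknown client_id: {client_id}")
    return {
        "name": record["name"],
        "payer_account_id": record["payer_account_id"],
        "external_id": record["external_id"],
    }
```

One way to consume this from Terraform would be the `aws_lambda_invocation` data source, decoding its `result` attribute with `jsondecode`.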

The AWS Identity Store is a managed user directory that offers a basic Active Directory-like feature set. As the boto3 documentation for the identity store shows, every API call requires obscure identifiers. Doing anything via the command line is painful, so leveraging Terraform for all this is much more straightforward than scripting.
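
To give a feel for the identifier juggling, here is a rough sketch of what adding one user to one group looks like when scripted against the Identity Store API. It assumes `client` behaves like `boto3.client("identitystore")`. The function name is ours, and the attribute paths (`userName`, `displayName`) are the ones we'd expect to use for the lookups.

```python
"""Sketch: adding a user to a group through the Identity Store API."""


def add_user_to_group(client, identity_store_id, user_name, group_name):
    """Resolve both opaque IDs, then create the membership.

    `client` is expected to behave like boto3.client("identitystore").
    """
    # Neither the username nor the group name can be used directly; each one
    # must first be exchanged for an opaque ID.
    user_id = client.get_user_id(
        IdentityStoreId=identity_store_id,
        AlternateIdentifier={
            "UniqueAttribute": {
                "AttributePath": "userName",
                "AttributeValue": user_name,
            }
        },
    )["UserId"]
    group_id = client.get_group_id(
        IdentityStoreId=identity_store_id,
        AlternateIdentifier={
            "UniqueAttribute": {
                "AttributePath": "displayName",
                "AttributeValue": group_name,
            }
        },
    )["GroupId"]
    response = client.create_group_membership(
        IdentityStoreId=identity_store_id,
        GroupId=group_id,
        MemberId={"UserId": user_id},
    )
    return response["MembershipId"]
```

That is three API calls and three opaque IDs for a single membership, which the `aws_identitystore_group_membership` resource expresses in a few lines.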

The Terraform code boils down to:

# Create the Group
resource "aws_identitystore_group" "client_access_group" {
  display_name      = var.customer_code
  description       = local.customer_name
  identity_store_id = var.identity_store_id
}

# Add members to the group
resource "aws_identitystore_group_membership" "members" {
  identity_store_id = var.identity_store_id
  count             = length(var.users)
  group_id          = aws_identitystore_group.client_access_group.group_id
  member_id         = data.aws_identitystore_user.users[count.index].user_id
}

# Create the permission set
resource "aws_ssoadmin_permission_set" "client_permission_set" {
  name             = var.customer_code
  description      = local.customer_name
  instance_arn     = var.instance_arn
  relay_state      = "https://s3.console.aws.amazon.com/s3/home?region=us-east-1#"
  session_duration = "PT6H"
}

# Add inline policy to the permission set
resource "aws_ssoadmin_permission_set_inline_policy" "client_assume_role_policy" {
  inline_policy      = data.aws_iam_policy_document.client_assume_role_policy.json
  instance_arn       = var.instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.client_permission_set.arn
}

# Assign the Permission Set and Group to the Account
resource "aws_ssoadmin_account_assignment" "client_assignment" {
  depends_on         = [aws_identitystore_group.client_access_group]
  instance_arn       = var.instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.client_permission_set.arn

  principal_id   = aws_identitystore_group.client_access_group.group_id
  principal_type = "GROUP"

  target_id   = var.duckbill_clientaccess_account_id
  target_type = "AWS_ACCOUNT"
}

Managing access

But now we’re left with a problem. We only know the payer account_id and not the account_ids of all the client’s non-payer accounts, which we need to run Duckbill’s analysis tooling. Since we don’t want to have to update the AWS SSO Permission Set each time a new client account is discovered, we can limit access in the cloud economist’s identity policy via the external ID like so:

{
    "Action": "sts:AssumeRole",
    "Condition": {
        "StringEquals": {"sts:ExternalId": "d61ea58c-foo" }
    },
    "Effect": "Allow",
    "Resource": "arn:aws:iam::*:role/DuckbillGroupRole"
}

In the above policy, the cloud economist can assume any DuckbillGroupRole, but only when passing the external_id that belongs to the client. If they pass another client’s external_id, their identity policy will not allow the action. If they pass their authorized client’s external_id while trying to assume a different customer’s role, the identity policy permits the call, but that customer’s trust policy, which expects a different external_id, denies it. One advantage of this approach is that we don’t need to treat the external ID as a secret. As long as permission to assume DuckbillGroupRole is locked down in the trusted Duckbill Client Access account, a cloud economist who knows another client’s external ID still cannot assume a role in a client account they’re not authorized for.
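
The interplay between the two policies is easier to see in a toy model. The sketch below is not how IAM evaluates policies internally. It only mimics the two checks described above: the economist-side identity policy and the client-side trust policy. The account IDs, the second client, and its external ID are made up.

```python
"""Toy model of the two checks on an AssumeRole call (not real IAM evaluation)."""
from fnmatch import fnmatchcase

# Hypothetical ID for the Duckbill Client Access account.
DUCKBILL_ACCOUNT = "999999999999"

# The economist-side statement from the post, for the client they may access.
IDENTITY_STATEMENT = {
    "Action": "sts:AssumeRole",
    "Condition": {"StringEquals": {"sts:ExternalId": "d61ea58c-foo"}},
    "Effect": "Allow",
    "Resource": "arn:aws:iam::*:role/DuckbillGroupRole",
}

# External ID each client's trust policy expects. The account IDs and the
# second external ID are invented for illustration.
CLIENT_TRUST = {
    "111122223333": "d61ea58c-foo",  # client A (authorized)
    "444455556666": "9b2f41c7-bar",  # client B (not authorized)
}


def identity_policy_allows(statement, role_arn, external_id):
    """Economist side: the role ARN must match and the external ID must be equal."""
    expected = statement["Condition"]["StringEquals"]["sts:ExternalId"]
    return (
        statement["Effect"] == "Allow"
        and fnmatchcase(role_arn, statement["Resource"])
        and external_id == expected
    )


def trust_policy_allows(target_account_id, caller_account_id, external_id):
    """Client side: trust only Duckbill's account presenting this client's ID."""
    expected = CLIENT_TRUST.get(target_account_id)
    return (
        caller_account_id == DUCKBILL_ACCOUNT
        and expected is not None
        and external_id == expected
    )


def can_assume(target_account_id, external_id, caller_account_id=DUCKBILL_ACCOUNT):
    """Both sides must agree before the role assumption succeeds."""
    role_arn = f"arn:aws:iam::{target_account_id}:role/DuckbillGroupRole"
    return identity_policy_allows(
        IDENTITY_STATEMENT, role_arn, external_id
    ) and trust_policy_allows(target_account_id, caller_account_id, external_id)
```

In this model only client A's account with client A's external ID succeeds. Client B's ID fails the identity policy, and client A's ID presented to client B's account fails the trust policy.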

CodePipeline

GitOps is all the rage these days, but delegating these sensitive IAM Permissions outside the AWS account introduces an additional risk factor we wanted to avoid. When pushing files to S3 or deploying a Lambda, you can tightly scope the policies granted to your GitHub action. For our purposes, however, we are delegating the permissions to decide who has permissions, so extending the trust boundary beyond AWS into GitHub isn’t ideal for this scenario. Luckily, AWS has an underrated service that, while not as slick as GitHub Actions, does the job and keeps the scope of trust limited to just the AWS Identity Center account: CodePipeline.

The basic pattern is to create a CodePipeline with four stages. Stage one downloads the source, and stage two calls CodeBuild to run a Terraform plan. At stage three, the pipeline pauses and requires a human to review and approve the plan before executing the final stage: terraform apply.
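
As a rough illustration of that layout, the sketch below builds the `stages` portion of a CodePipeline definition as plain Python data. It is only a fragment under our own assumptions: the source action is supplied by the caller because the post doesn't say which source provider is used, the stage, action, and artifact names are invented, and the plan build is assumed to package the working directory together with the saved plan file.

```python
"""Sketch of the four-stage layout as a CodePipeline `stages` fragment."""


def build_stages(source_action, plan_project, apply_project):
    """Return the stage list: source, plan, manual approval, apply."""

    def codebuild_action(name, project, input_artifact, output_artifact=None):
        action = {
            "name": name,
            "actionTypeId": {
                "category": "Build",
                "owner": "AWS",
                "provider": "CodeBuild",
                "version": "1",
            },
            "configuration": {"ProjectName": project},
            "inputArtifacts": [{"name": input_artifact}],
        }
        if output_artifact:
            action["outputArtifacts"] = [{"name": output_artifact}]
        return action

    # The pipeline pauses here until a human approves or rejects the plan.
    approval_action = {
        "name": "ReviewPlan",
        "actionTypeId": {
            "category": "Approval",
            "owner": "AWS",
            "provider": "Manual",
            "version": "1",
        },
    }

    return [
        {"name": "Source", "actions": [source_action]},
        # Assumes the source action emits an artifact named "source".
        {"name": "Plan", "actions": [
            codebuild_action("TerraformPlan", plan_project, "source", "plan"),
        ]},
        {"name": "Approve", "actions": [approval_action]},
        # Apply consumes the reviewed plan artifact rather than re-planning.
        {"name": "Apply", "actions": [
            codebuild_action("TerraformApply", apply_project, "plan"),
        ]},
    ]
```

Feeding the saved plan into the apply stage means the change that gets applied is the one the reviewer actually approved.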

Part two: Account IDs and external IDs

Granted

Granted is a tool that simplifies access to AWS accounts, allowing a cloud economist to be logged into multiple AWS accounts in the same browser window thanks to browser containers. It supports SSO, chained roles, and much more. (Mike: we used to use aws-vault for this same purpose, but Granted is so much easier and more feature-rich for our use cases.) Like aws-vault, Granted lets a cloud economist either access an account in the CLI, by setting the right session variables via STS, or log into the account in the browser, with a single CLI command for either. That’s certainly not something the standard AWS CLI supports on its own.

Rather than let each cloud economist roll their own configuration solution, Duckbill is leveraging Granted’s Profile Registries. The custom Terraform module centralizes the creation of config files for each client, and each person has a personalized config file they leverage when setting up the registry. We don’t keep the full set of client profiles in git, but rather just the basic configuration—the Granted profile registry sets up all the profiles dynamically based on what AWS SSO grants the cloud economist access to.
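
For a sense of what generated profiles could look like, here is a minimal sketch that renders a pair of standard AWS config profiles for one client: an SSO profile that lands in the Client Access account, and a chained profile that assumes DuckbillGroupRole in the client's payer account. The function name, the `-sso` suffix, and all sample values are ours. The real module's output may differ, and Granted-specific registry settings are not shown.

```python
"""Sketch: rendering AWS config profiles for one client (values illustrative)."""
import configparser
import io


def render_client_profiles(client, sso_start_url, sso_region, access_account_id):
    """Return AWS config text with an SSO profile and a chained client profile."""
    # Interpolation is disabled so '%' in a value is never treated specially.
    config = configparser.ConfigParser(interpolation=None)
    sso_profile = f"{client['code']}-sso"

    # Profile 1: sign in through SSO using the per-client Permission Set,
    # which the Terraform above names after the customer code.
    config[f"profile {sso_profile}"] = {
        "sso_start_url": sso_start_url,
        "sso_region": sso_region,
        "sso_account_id": access_account_id,
        "sso_role_name": client["code"],
    }
    # Profile 2: hop from there into the client's payer account.
    config[f"profile {client['code']}"] = {
        "source_profile": sso_profile,
        "role_arn": (
            f"arn:aws:iam::{client['payer_account_id']}:role/DuckbillGroupRole"
        ),
        "external_id": client["external_id"],
    }

    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()
```

Because the external ID sits in a plain config file, this only works with the decision described earlier to treat it as Confidential metadata and not as a secret.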

Google Workspace and AWS Identity Center

The Duckbill Group’s business is two-fold: cloud finance consulting and the media properties, each with separate staff. While the above solution is focused on the security of their consulting clients, the media side also requires certain staff and contractors to access AWS for specific uses that aren’t client related. The “traditional” method manages these by assigning users to groups inside the Google Workspace console. We needed to enable SCIM provisioning from Google Workspace to AWS SSO to provide this capability.

Sadly, AWS’s integration with Google’s identity store left something to be desired. While it was reasonably straightforward to configure the AWS SSO redirect to Google for authentication, creating users and groups in AWS SSO requires using the ssosync Lambda function from the AWS Labs GitHub.

Halfway through the project, AWS & Google released official SCIM support. Unfortunately, this turned out to be a half-baked integration. The Google-managed SCIM provisioning doesn’t support Google Workspace Groups! When SCIM is enabled, AWS console management of users and groups is mostly disabled. Under this new method, you can’t manage groups in Google or AWS unless you use the convoluted AWS IdentityStore APIs. That would be one hell of a yak shave, so hopefully AWS finishes their SCIM support in the future.

Part three: Properly scoping Duckbill’s permissions

Lastly, we needed to scope down the permissions of the role that gets deployed on the client side. Here, Duckbill was at the mercy of the AWS teams that manage the pre-canned AWS policies. ReadOnlyAccess is clearly over-permissive, and customers were right to ask for a more limited set of permissions. However, the AWS-recommended alternative, the ViewOnlyAccess policy, has the opposite problem: it is too limited.

Of the 261 AWS Services referenced in ReadOnlyAccess, only 150 are referenced in ViewOnlyAccess. There are over 110 services in ReadOnlyAccess that ViewOnlyAccess does not cover, and ReadOnlyAccess does not cover nine services present in ViewOnlyAccess.
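
If the first figure means that 150 of ReadOnlyAccess's 261 services also appear in ViewOnlyAccess, then 261 − 150 = 111 services appear only in ReadOnlyAccess, which lines up with the "over 110" above. A comparison like this can be reproduced with a few lines of set arithmetic over the policy documents. The sketch below is one way to do it, not necessarily how these numbers were produced, and it assumes both policies are already loaded as parsed JSON.

```python
"""Sketch: comparing which service prefixes two IAM policies reference."""


def service_prefixes(policy_document):
    """Collect the service prefix (the part before ':') of every action."""
    prefixes = set()
    statements = policy_document["Statement"]
    if isinstance(statements, dict):  # a lone statement may be a bare object
        statements = [statements]
    for statement in statements:
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a lone action may be a bare string
            actions = [actions]
        for action in actions:
            if ":" in action:
                prefixes.add(action.split(":", 1)[0].lower())
    return prefixes


def compare_policies(read_only, view_only):
    """Return the services unique to each policy and the ones they share."""
    ro = service_prefixes(read_only)
    vo = service_prefixes(view_only)
    return {
        "only_in_read_only": ro - vo,
        "only_in_view_only": vo - ro,
        "shared": ro & vo,
    }
```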

As a third party auditing a client, you need to see everything about the account and its resources but not read any data inside those resources. For example, knowing how many objects are in an S3 bucket, their age, their access patterns, and their storage tier is important for analysis purposes, but we need to make sure Duckbill can’t view the contents of those objects. Unfortunately, AWS makes no distinction between data and metadata in its Get, List, and Describe API calls, so we had to identify the calls that return data ourselves.

Rather than enumerate the IAM Actions a cloud economist would need (a likely never-ending task), I decided to identify which IAM Actions provide access to customer data and credentials and then explicitly deny those. As part of this, I created the Sensitive IAM Actions collection to give the cloud security community a shared source that defines which actions provide access to data, expose credentials, or permit privilege escalation. Ian McKay, Kinnard McQuade, and Scott Piper had already done a lot of work to identify permissions that lead to privilege escalation, credential exposure, and resource exposure, so I built on their work to generate a list of the permissions that allow access to data.

Some IAM Actions are in a gray area: lambda:GetFunction is required to show a function’s runtime, memory, and duration, but that call also returns a pre-signed URL with access to the code zip file. The first three are critical for a cloud economist, but the code zip could be considered client-sensitive data. Denying access to that Action meant not being able to advise on Lambda, which would mean not being able to optimize Lambda, so we kept that action.

We’ve also carved out an exception in S3: a cloud economist needs access to the Cost and Usage Reports (CUR), which are just S3 objects and therefore an exception to the rule above. We customized the policies with an Effect=Deny statement whose NotResource is the CUR bucket, so the DuckbillGroupRole can read objects only from that bucket. (For my fellow security nerds, this pattern would work quite well for CloudTrail event logs, too!)
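
To make the deny-list idea concrete, here is a hedged sketch of how such a policy could be assembled, plus a toy check of its effect. The denied actions in the usage line are just a few real IAM actions picked for illustration, not the actual Sensitive IAM Actions list. The bucket name and Sids are invented, and `is_denied` is a simplification that ignores most of IAM's real evaluation rules.

```python
"""Sketch: deny data-access actions, except object reads from the CUR bucket."""
from fnmatch import fnmatchcase


def build_deny_policy(denied_actions, cur_bucket):
    """Build a deny policy with an S3 carve-out for the CUR bucket."""
    # s3:GetObject gets its own statement; a blanket deny would defeat the carve-out.
    blanket = sorted(set(denied_actions) - {"s3:GetObject"})
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyDataAccess",
                "Effect": "Deny",
                "Action": blanket,
                "Resource": "*",
            },
            {
                "Sid": "DenyObjectReadsOutsideCur",
                "Effect": "Deny",
                "Action": ["s3:GetObject"],
                # NotResource: the deny covers every object EXCEPT these.
                "NotResource": f"arn:aws:s3:::{cur_bucket}/*",
            },
        ],
    }


def is_denied(policy, action, resource):
    """Toy check: does any deny statement cover this action and resource?"""
    for statement in policy["Statement"]:
        if action not in statement["Action"]:
            continue
        if "NotResource" in statement:
            if not fnmatchcase(resource, statement["NotResource"]):
                return True
        elif fnmatchcase(resource, statement["Resource"]):
            return True
    return False


# Illustrative usage; lambda:GetFunction is deliberately left off the list.
policy = build_deny_policy(
    ["dynamodb:GetItem", "sqs:ReceiveMessage", "s3:GetObject"], "acme-cur"
)
```

Since an explicit deny always wins over an allow, a policy like this can sit alongside a broad read-only grant and still keep object contents outside the CUR bucket out of reach.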

What’s next

We’re pretty thrilled with our new setup and certainly have a lot more confidence in our security around client access now. That said, there’s one big thing we’d like to do for the next iteration: auto-discovery and configuration of non-payer accounts. Our new setup only configures payer accounts, which handles 80% of what we need, but we do still need non-payer accounts for our automated tooling. We’re not sure yet exactly how we want to solve this, so for now, it’s a manual configuration.

Last but certainly not least, a huge thanks to Chris Farris for helping us sort out this mess. We had some pretty complicated requirements, but I don’t think we stumped Chris even once. If you’re looking for help on AWS security, Chris is great. Check out his services at https://primeharbor.com.