Apr 9, 2024

We discovered an AWS access vulnerability

It’s not AWS.

There’s no way it’s AWS.

It was AWS.

We use AWS IAM extensively throughout our codebase. Last year, we extended our use of IAM to build and enforce role-based access control (RBAC) for our customers using AWS Security Token Service (STS), an IAM service you can use to provide temporary access to AWS resources. Along the way, we discovered a vulnerability in STS that caused role trust policy statements to be evaluated incorrectly.

Yes, you read that right – during the development process, we found an edge case that would have allowed certain users to gain unauthorized access to their AWS accounts.

We caught it before rolling out our RBAC product, and AWS has since corrected the issue and notified all affected users, so you don’t need to hit the panic button. However, we wanted to share how we discovered this vulnerability, our disclosure process with AWS, and what we learned from the experience.

How Stedi uses IAM and STS

To understand how we found the bug, you need to know a bit about Stedi’s architecture.

Behind the scenes, we assign a dedicated AWS account per tenant – that is, each customer account in the Stedi platform is attached to its own separate AWS account that contains Stedi resources, such as transactions and trading partner configurations. Our customers usually aren’t even aware that the underlying AWS resources exist or that they have a dedicated AWS account assigned to them, but using a dedicated AWS account as the tenancy boundary helps ensure data isolation (which is important for regulated industries like healthcare) and also eliminates potential noisy neighbor problems (which is important for high-volume customers).

When a customer takes an action in their account, it triggers a call to a resource, either through a Stedi API or by calling the underlying AWS resource directly. One example is filtering processed transactions on the Stedi dashboard – when a customer applies a filter, the browser makes a direct request to an AWS database that contains the customer’s transaction data. This approach significantly reduces the code we need to write and maintain (since we don’t need to rebuild existing AWS APIs) and allows us to focus on shipping features and fixes faster.

To facilitate these requests, Stedi uses AWS STS to provide temporary access to AWS IAM policies, allowing the user’s browser session to access their corresponding AWS account. Specifically, we use the STS AssumeRoleWithWebIdentity operation, which allows federated users to temporarily assume an IAM role in their AWS account with a specific set of permissions.
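As a rough illustration of the operation described above, here is a minimal sketch of how an AssumeRoleWithWebIdentity request could be assembled. It only builds the form-encoded request for the public STS query API and does not send it. The role ARN, session name, and token placeholder are hypothetical values, not ones taken from Stedi's setup.

```python
from urllib.parse import urlencode

STS_ENDPOINT = "https://sts.amazonaws.com/"  # global STS endpoint


def build_assume_role_request(role_arn, session_name, web_identity_token,
                              duration_seconds=900):
    """Return (url, form-encoded body) for an AssumeRoleWithWebIdentity call.

    This operation is authenticated by the web identity token itself,
    so the request does not need to be signed with AWS credentials.
    """
    params = {
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": web_identity_token,
        "DurationSeconds": str(duration_seconds),
    }
    return STS_ENDPOINT, urlencode(params).encode("ascii")


# Hypothetical values, for illustration only.
url, body = build_assume_role_request(
    role_arn="arn:aws:iam::123456789012:role/example-readonly",
    session_name="browser-session",
    web_identity_token="<JWT issued by the identity provider>",
)
```

If the role's trust policy allows the request, STS responds with temporary credentials scoped to that role's permissions.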

IAM tags

Our IAM role trust policies use tags to control who can view and interact with resources.

A tag is a custom attribute label (a key:value pair) you can add to an AWS resource. There are three tag types you can use to control access in IAM policies:

Request tag: A tag added to a resource during an operation. You can use the aws:RequestTag/key-name condition key to specify which tags can be added to, changed on, or removed from an IAM user or role.
Resource tag: An existing tag on an AWS resource, such as a tag describing a resource’s environment (“environment: production”). You can use the aws:ResourceTag/key-name condition key to specify which tag key-value pair must be attached to the resource to perform an operation.
Principal tag: A tag on a user or role performing an operation. You can use the aws:PrincipalTag/key-name condition key to specify which tags must be attached to the user or role before the operation is allowed.

Assuming roles

Here’s how we set up RBAC for Stedi accounts.

We give Stedi users a JSON Web Token (JWT) containing the following AWS-specific principal tags:

"https://aws.amazon.com/tags": {
  "principal_tags": {
    "StediAccountId": [
      "39b2f40d-dc59-4j0c-a5e9-37df5d1e6417"
    ],
    "MemberRole": [
      "stedi:readonly"
    ]
  }
}
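To make the claim structure above concrete, the following sketch reads the principal tags out of a JWT payload using only the Python standard library. It deliberately does not verify the signature, so it is suitable for inspection and debugging only. The function names and the unsigned example token are illustrative assumptions.

```python
import base64
import json


def b64url_decode(segment):
    """Decode a base64url segment, restoring the padding JWTs strip off."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def b64url_encode(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def principal_tags_from_jwt(token):
    """Read principal tags from a JWT payload WITHOUT verifying the signature."""
    _header, payload, _signature = token.split(".")
    claims = json.loads(b64url_decode(payload))
    tags = claims.get("https://aws.amazon.com/tags", {}).get("principal_tags", {})
    # In the claim shown above, each tag value is a single-element list.
    return {key: values[0] for key, values in tags.items()}


# Build an unsigned example token carrying the claim shown above.
example_claims = {
    "https://aws.amazon.com/tags": {
        "principal_tags": {
            "StediAccountId": ["39b2f40d-dc59-4j0c-a5e9-37df5d1e6417"],
            "MemberRole": ["stedi:readonly"],
        }
    }
}
example_token = ".".join([
    b64url_encode(b'{"alg":"none"}'),
    b64url_encode(json.dumps(example_claims).encode("utf-8")),
    "",  # empty signature segment: this token is for illustration only
])

tags = principal_tags_from_jwt(example_token)
```

Here `tags` maps each tag key to its single value, which is the shape STS works with when it turns the claim into session tags.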

The user can assume a role (specifically, they’re granted a time-bound role session) in their assigned AWS account if the following conditions are true:

The token is issued by the referenced federation service and has the appropriate audience set.
The role has a trust relationship granting access to the specified StediAccountId.
The role has a trust relationship granting access to the specified MemberRole.

The following snippet from our role trust policy evaluates these requirements in the Condition object. For example, we check whether the StediAccountId tag in the JWT token is equal to the MappedStediAccountId tag on the AWS account.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["sts:AssumeRoleWithWebIdentity", "sts:TagSession"],
      "Principal": {
        "Federated": {
          "Ref": "ProdOidcProvider"
        }
      },
      "Condition": {
        "StringEquals": {
          "tokens.saas.stedi.com/v1:aud": "tenants",
          "aws:RequestTag/StediAccountId": "${iam:ResourceTag/MappedStediAccountId}",
          "aws:RequestTag/MemberRole": "${iam:ResourceTag/MemberRole}"
        }
      }
    }
  ]
}

Caption: If the IAM role's MappedStediAccountId and MemberRole resource tags match the StediAccountId and MemberRole request tags (the principal tags from the JWT token), the user can access this role. Otherwise, role access is denied.

When assuming a role from a JWT token (or with SAML), STS reads the token claims under the principal_tags object and adds them to the role session as principal tags.

However, during the AssumeRoleWithWebIdentity operation (within the policy logic), you must reference the principal tags from the JWT token as request tags. The IAM principal isn’t the one making the request; instead, the tags are being added to a resource. Existing tags on the role are referenced as resource tags because they are tags on the subject of the operation.

These naming conventions are a bit confusing – more on that later.
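One way to keep the request-tag and resource-tag roles straight is to model the intended comparison in a few lines. The sketch below is a toy model of how the StringEquals block in the trust policy is meant to behave, not a description of how IAM is implemented. The helper names and the "stedi:admin" role value are assumptions made for the example.

```python
import re
from typing import Optional

# Matches a value that is exactly one resource-tag variable, with either prefix.
RESOURCE_TAG_VARIABLE = re.compile(r"\$\{(?:aws|iam):ResourceTag/([^}]+)\}")


def resolve(value: str, resource_tags: dict) -> Optional[str]:
    """Replace a resource-tag variable with the tag on the role, if present."""
    match = RESOURCE_TAG_VARIABLE.fullmatch(value)
    if match is None:
        return value  # literal value such as "tenants"
    return resource_tags.get(match.group(1))  # None when the role lacks the tag


def string_equals_allows(conditions: dict, request_context: dict,
                         resource_tags: dict) -> bool:
    """Every condition key must be present and equal to the resolved value."""
    for key, policy_value in conditions.items():
        actual = request_context.get(key)
        expected = resolve(policy_value, resource_tags)
        if actual is None or expected is None or actual != expected:
            return False
    return True


conditions = {
    "tokens.saas.stedi.com/v1:aud": "tenants",
    "aws:RequestTag/StediAccountId": "${iam:ResourceTag/MappedStediAccountId}",
    "aws:RequestTag/MemberRole": "${iam:ResourceTag/MemberRole}",
}
account_id = "39b2f40d-dc59-4j0c-a5e9-37df5d1e6417"
request_context = {  # values taken from the JWT at role assumption time
    "tokens.saas.stedi.com/v1:aud": "tenants",
    "aws:RequestTag/StediAccountId": account_id,
    "aws:RequestTag/MemberRole": "stedi:readonly",
}
readonly_role = {"MappedStediAccountId": account_id, "MemberRole": "stedi:readonly"}
admin_role = {"MappedStediAccountId": account_id, "MemberRole": "stedi:admin"}

allowed_on_readonly = string_equals_allows(conditions, request_context, readonly_role)  # True
allowed_on_admin = string_equals_allows(conditions, request_context, admin_role)  # False
```

In this model, a read-only token can assume the role tagged stedi:readonly but not the one tagged stedi:admin, which is the behavior the trust policy is written to enforce.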

Discovering the vulnerability

We set up our role trust policy based on this AWS tutorial, using JWT tokens instead of SAML. Another difference from the tutorial is that our policy uses variables to reference tags instead of hardcoding the values into the condition statements.

For example, we use "aws:RequestTag/StediAccountId": "${iam:ResourceTag/MappedStediAccountId}" instead of "aws:RequestTag/StediAccountId": "39b2f40d-dc59-4j0c-a5e9-37df5d1e6417".

During development, we began testing to determine whether our fine-grained access controls were working as expected. They were not.

Finding the bug

Again and again, our tests gained access to roles above their designated authorization level.

We scoured the documentation to find the source of the error. The different tag types, IAM statement templating, and different (aws vs. iam) prefixes caused extra confusion, and we kept thinking we weren’t reading the instructions correctly. We attempted to use the IAM policy simulator but found it lacked support for evaluating role trust policies.

Eventually, we resorted to systematically experimenting with dozens of configuration changes. For every update, we had to wait minutes for our changes to propagate due to the eventual consistency of IAM. Four team members worked for several hours until we finally made a surprising discovery – the tag variable names affected whether trust policy conditions were evaluated correctly.

If the request tag referenced a principal tag called MemberRole in the JWT token, and the IAM role referenced a resource tag with the same variable name, the condition was always evaluated as true, regardless of whether the tags' values actually matched. This is how test users with stedi:readonly permissions in Stedi gained unauthorized admin access to their AWS accounts.

Changing one of the tag variable names appeared to fix the issue. For example, the snippet below changes the resource tag variable name to MemberRole2. The policy only functioned properly when the variable names for the request and resource tags were different.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["sts:AssumeRoleWithWebIdentity", "sts:TagSession"],
      "Principal": {
        "Federated": {
          "Ref": "ProdOidcProvider"
        }
      },
      "Condition": {
        "StringEquals": {
          "tokens.saas.stedi.com/v1:aud": "tenants",
          "aws:RequestTag/StediAccountId": "${iam:ResourceTag/MappedStediAccountId}",
          "aws:RequestTag/MemberRole": "${iam:ResourceTag/MemberRole2}"
        }
      }
    }
  ]
}

Caption: Initial IAM vulnerability workaround – ensuring request tag and resource tag names did not match.
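The pattern that triggered the bug can be detected mechanically. The sketch below scans a policy document for conditions that compare aws:RequestTag/X against a resource-tag variable with the same name X. It is an illustrative check, not a tool from AWS or Stedi, and the case-insensitive name comparison is a conservative assumption. The `trust_policy` helper simply rebuilds the Condition block from the two policies above with a chosen resource tag name.

```python
import re

REQUEST_TAG_KEY = re.compile(r"^aws:RequestTag/(.+)$")
RESOURCE_TAG_VARIABLE = re.compile(r"^\$\{(?:aws|iam):ResourceTag/(.+)\}$")


def find_same_name_comparisons(policy):
    """Return tag names where a request tag is compared to a same-named resource tag."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    for statement in statements:
        for operator_block in statement.get("Condition", {}).values():
            for key, value in operator_block.items():
                key_match = REQUEST_TAG_KEY.match(key)
                if key_match is None:
                    continue
                values = value if isinstance(value, list) else [value]
                for candidate in values:
                    if not isinstance(candidate, str):
                        continue
                    var_match = RESOURCE_TAG_VARIABLE.match(candidate)
                    if var_match and var_match.group(1).lower() == key_match.group(1).lower():
                        findings.append(key_match.group(1))
    return findings


def trust_policy(member_role_tag_name):
    """Condition block from the policies above, with a chosen resource tag name."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["sts:AssumeRoleWithWebIdentity", "sts:TagSession"],
            "Condition": {"StringEquals": {
                "tokens.saas.stedi.com/v1:aud": "tenants",
                "aws:RequestTag/StediAccountId": "${iam:ResourceTag/MappedStediAccountId}",
                "aws:RequestTag/MemberRole": "${iam:ResourceTag/" + member_role_tag_name + "}",
            }},
        }],
    }


original_findings = find_same_name_comparisons(trust_policy("MemberRole"))     # ["MemberRole"]
workaround_findings = find_same_name_comparisons(trust_policy("MemberRole2"))  # []
```

The original policy is flagged for MemberRole, while the workaround policy with MemberRole2 produces no findings.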

Alerting AWS

We used the documentation to construct a model of the role assumption process and contacted AWS Support and AWS Security on June 20, 2023 with our findings. We also contacted Chris Munns, Tech Lead for the AWS Startups team, who engaged directly with AWS Security and escalated the issue internally.

AWS was initially skeptical that the problem was with STS/IAM, which is understandable – we were too. They first suggested that we used the wrong prefixes in our condition statements (aws vs. iam), but we confirmed the issue occurred with both prefixes. Then, they suggested that the tag types in our condition statements were incorrect. After some back and forth, we ruled that out as well, once again noting that the tag naming conventions for the AssumeRoleWithWebIdentity operation are confusing.

In the following days, we investigated the issue further and found we could trigger the bug with STS AssumeRole calls, meaning the vulnerability was not limited to assuming roles with web identity or SAML. We also found that hard-coding one of the tag values in the policy statement did not expose the vulnerability. Only role trust policies that used a variable substitution for both the request tag and the resource tag in the policy statement resulted in the policy evaluating incorrectly.

We implemented a workaround (changing one of the variable names), confirmed our tests passed, and kept building.

Resolution

On July 6th, we received an email from AWS stating that their engineering team had reproduced the bug and was working on a fix. On October 30th, STS AssumeRole operations for all new IAM roles used an updated tag handling implementation, which provided the corrected tag input values into the logic to fix the role evaluation issue. This same change was then deployed for existing roles on January 9, 2024. AWS typically rolls out changes in this manner to avoid unexpectedly breaking customer workflows.

AWS also discovered the issue was not limited to role trust policies, which are just resource policies for IAM roles (as a resource). It also extended to statements within IAM permissions boundary policies and service control policies (SCPs) that contained the same pattern of STS role assumption with tag-based conditions.

AWS notified customers with existing problematic roles, SCP trust policies, and boundary policies that had usage in the past 30 days. They also displayed a list of affected resources in each customer’s AWS Health Dashboard.

Timeline

2023-06-20 - Role access issue discovered, AWS alerted
2023-06-21 - Minimal reproduction steps provided using STS AssumeRole; AWS acknowledges the report, and the issue is picked up by an engineer
2023-07-06 - AWS acknowledges issue and determines root cause
2023-10-30 - STS tag handling implementation updated for new IAM roles
2024-01-09 - STS tag handling implementation updated for IAM roles for customers impacted in a 30-day window

What we learned

After we implemented our workaround, we conducted a retrospective. Here are our key takeaways:

Even the most established software has bugs.

This might seem obvious, but we think it’s an important reminder. We spent a lot of time second-guessing ourselves when discovering and diagnosing this bug. We were well aware of IAM’s provable security via automated reasoning, and the documentation is so comprehensive (and intimidating at times) that we were sure it had to be our fault. Of course, you should do your due diligence before reporting issues, but no system is infallible. Sometimes, it is AWS.

Glossaries and indexes are underrated.

Defining service-specific terminology in a single location can be game-changing for users onboarding to a new product and can dramatically speed up the debugging process.

We struggled to understand the difference between global condition keys with the “aws:” namespace and service-specific keys with the “iam:” namespace. We were further confused by how these keys can overlap: “iam:ResourceTag” and “aws:ResourceTag” resolve to the same value. Finally, it was hard to keep track of the lifecycle of a JWT principal tag, which becomes a request tag before finally being a resource tag.

The AWS documentation provides all this information, but we lacked the proper vocabulary to search for it. A comprehensive glossary would have saved us significant time and effort. We’re now adding one to the Stedi docs to better serve our own users.

We need better tools for testing IAM policies.

The IAM policy simulator does not support role trust policy evaluation. Proving the security of a system that grants federated identities access to IAM roles still relies on both positive and negative end-to-end tests with long test cycles. Developing more mature tooling would massively improve the developer experience, and we hope AWS will consider investing in this area moving forward.
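A positive and negative test matrix of this kind can be sketched as follows. This is an assumed structure rather than Stedi's actual test suite: `try_assume` stands in for whatever performs the real role assumption and reports whether credentials were issued, and "stedi:admin" is a hypothetical role value. The two fake implementations exist only to show how a failure would surface.

```python
from itertools import product

ROLES = ["stedi:readonly", "stedi:admin"]  # "stedi:admin" is a hypothetical value


def build_cases(roles):
    """Every (token role, role tag) pair; access is expected only on a match."""
    return [(token_role, role_tag, token_role == role_tag)
            for token_role, role_tag in product(roles, repeat=2)]


def run_matrix(try_assume, roles):
    """Return the cases where the observed result differs from the expectation."""
    failures = []
    for token_role, role_tag, expected in build_cases(roles):
        actual = try_assume(token_role, role_tag)
        if actual != expected:
            failures.append((token_role, role_tag, expected, actual))
    return failures


def correct_assume(token_role, role_tag):
    """Fake that behaves as the trust policy intends."""
    return token_role == role_tag


def always_allow_assume(token_role, role_tag):
    """Fake that mimics a condition that always evaluates as true."""
    return True


correct_failures = run_matrix(correct_assume, ROLES)     # no failures
buggy_failures = run_matrix(always_allow_assume, ROLES)  # the two mismatched pairs
```

The negative cases are the ones that matter here: a suite containing only the matching pairs would pass against both fakes.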

Thank you to all the Stedi team members who contributed to uncovering this issue and the AWS team for working with us to find a solution.
