Security

Infrastructure Security Audit: What to Expect

What happens during an infrastructure security audit, how to prepare, and what the deliverables look like. A guide for engineering leaders at growing SaaS companies.

Illicus Team · 9 min read

A security audit is one of those things that sounds straightforward until you’re the one being asked to schedule one. What exactly will the auditors look at? What do you need to prepare? And when the report lands, how do you turn a list of 40 findings into a prioritized remediation plan?

This guide covers what to expect from an infrastructure security audit: scope, process, common findings, and how to act on the output. If you want the broader infrastructure checklist that also covers reliability, cost, and operations, see The Infrastructure Audit Checklist.

What an infrastructure security audit covers

An infrastructure security audit is not a penetration test. A pentest tries to exploit weaknesses to demonstrate impact. A security audit assesses your controls, configurations, and practices against a defined standard or risk model.

Scope typically includes:

  • Identity and access management: Who can do what, under what conditions, with what audit trail
  • Network controls: Segmentation, ingress/egress rules, public exposure inventory
  • Data protection: Encryption at rest and in transit, key management, backup integrity
  • Secrets and credentials: How secrets are stored, rotated, and scoped
  • Logging and monitoring: What is captured, where it goes, how long it’s retained, and whether anyone reads it
  • Change management: How infrastructure changes are proposed, reviewed, and applied
  • Dependency and supply chain posture: Base image hygiene, container scanning, third-party library risk

What it does not cover (unless explicitly scoped): application-layer vulnerabilities, social engineering, physical access, or red team exercises.

If you’re preparing for SOC 2 Type II or a customer security review, a security audit is usually the right starting point. It gives you a documented picture of your current posture before an auditor or prospect asks for one.

Pre-audit preparation

The difference between an audit that takes two weeks and one that drags on for six often comes down to preparation. Show up ready with:

Architecture diagrams: Current-state cloud diagrams showing VPCs, subnets, services, and trust boundaries. Stale diagrams are fine as a starting point (auditors expect drift); having something beats building from scratch during the engagement.

Access credentials and read permissions: Auditors typically need read-only access to your cloud accounts (IAM, Security Hub, Config, CloudTrail) and, where applicable, your identity provider. Work with your security team to provision this before the kickoff call.

Runbooks and operational docs: Your incident response runbook, on-call rotation setup, and any documented break-glass procedures. If these don’t exist yet, note it: that gap will likely appear in the findings anyway.

Incident history: A log of significant incidents over the past 12 months, including what happened, how long it took to detect and contain, and what changed afterward. Auditors use this to calibrate detection and response maturity.

Current tooling inventory: What security tools are you running? GuardDuty, Security Hub, Falco, Snyk, Dependabot? Which are alerting to humans versus writing to a log no one reads?

You don’t need all of this to be polished. The audit will surface gaps. What matters is that you’ve gathered what exists so the auditors aren’t blocked waiting on access or context.

The audit process

Most infrastructure security audits follow a similar arc across four phases:

1) Discovery

Auditors map what’s actually running. This includes automated scanning (cloud configuration scanners like Prowler or ScoutSuite, dependency scanners, Dockerfile linting) and manual review of architecture and access patterns. The goal is to understand the real state of your environment, not just what the diagrams say.

This phase typically turns up the most surprises. Teams often discover forgotten environments, services with overly broad IAM roles that were “temporary” two years ago, or logging pipelines that are configured but writing to a bucket no one has looked at.

2) Assessment

Auditors evaluate findings against a risk model: often CIS Benchmarks, AWS Foundational Security Best Practices, or a custom framework aligned to your compliance requirements. Each finding is mapped to a control, given a severity rating, and documented with evidence.

This is also where context matters. An unencrypted development database that holds only synthetic test data is a different risk from an unencrypted production database holding PII. Good auditors ask about business context before assigning severity.

3) Analysis

Raw findings get filtered, deduplicated, and prioritized. A misconfigured S3 bucket that’s already behind a private VPC endpoint is a different risk profile from one that’s publicly accessible with no bucket policy. This analysis pass converts a list of issues into a risk-ordered set of recommendations.
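The dedupe-and-prioritize pass can be sketched as a small script. This is an illustrative sketch, not any auditor's actual tooling; the finding fields (resource, rule, severity) are assumed, not a real scanner's schema.

```python
# Hypothetical analysis pass: deduplicate raw findings by (resource, rule),
# keep the most severe instance of each, then sort risk-first.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def analyze(raw_findings):
    deduped = {}
    for f in raw_findings:
        key = (f["resource"], f["rule"])
        # Keep the most severe instance of each (resource, rule) pair.
        if key not in deduped or (
            SEVERITY_RANK[f["severity"]] < SEVERITY_RANK[deduped[key]["severity"]]
        ):
            deduped[key] = f
    # Risk-ordered output: critical first.
    return sorted(deduped.values(), key=lambda f: SEVERITY_RANK[f["severity"]])

findings = [
    {"resource": "s3://assets", "rule": "public-read", "severity": "high"},
    {"resource": "s3://assets", "rule": "public-read", "severity": "medium"},
    {"resource": "rds/prod", "rule": "no-encryption", "severity": "critical"},
]
for f in analyze(findings):
    print(f["severity"], f["resource"], f["rule"])
```

Three raw findings collapse to two recommendations, ordered critical first, which is the shape the report's remediation section is built from.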

4) Reporting

The final deliverable is a written report with an executive summary (suitable for sharing with a board or a customer security questionnaire), a detailed findings section, and a remediation guidance section. Good reports include evidence (screenshots, API responses, config excerpts) and reproducible steps so your engineers can verify fixes independently.

Common findings

After reviewing infrastructure across dozens of SaaS teams, certain findings appear consistently. Not because teams are careless, but because these patterns accumulate over years of iterative growth.

Overly permissive IAM roles: An EC2 instance that needs to write to one S3 bucket has s3:* on *. A Lambda function that reads from DynamoDB has AdministratorAccess because it was quicker at the time. These get found in every audit. The fix is almost always straightforward; the work is doing it systematically.
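Detecting that pattern systematically is mechanical enough to automate. A minimal sketch, operating on the standard AWS IAM policy JSON shape (in practice you would pull policies via the IAM API or lint them in CI):

```python
# Flag Allow statements that grant wildcard actions (s3:*, *) or apply
# to every resource. Input is a parsed IAM policy document.
def wildcard_statements(policy):
    flagged = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        # Both fields may be a single string or a list in IAM JSON.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        if any(a == "*" or a.endswith(":*") for a in actions) or "*" in resources:
            flagged.append(stmt)
    return flagged

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        {"Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::app-uploads/*"},
    ],
}
print(len(wildcard_statements(policy)))  # 1: only the s3:* statement
```

The scoped-down second statement is what the first one should become: the specific action the workload needs, on the specific bucket it touches.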

Unencrypted data at rest: RDS instances with encryption disabled, S3 buckets without default encryption, EBS volumes attached to instances that predate the team’s encryption policy. Often inherited from before encryption was default-on in AWS.
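Finding these is a simple filter over the API's own metadata. A sketch over EBS volume descriptions (the dict shape mirrors the Encrypted flag from ec2 describe-volumes; a real check would feed this from boto3):

```python
# Return IDs of EBS volumes without encryption enabled. A missing
# Encrypted flag is treated as unencrypted, the safe assumption.
def unencrypted_volumes(volumes):
    return [v["VolumeId"] for v in volumes if not v.get("Encrypted", False)]

volumes = [
    {"VolumeId": "vol-0aaa", "Encrypted": True},
    {"VolumeId": "vol-0bbb", "Encrypted": False},  # predates encryption policy
    {"VolumeId": "vol-0ccc"},                      # flag absent entirely
]
print(unencrypted_volumes(volumes))  # ['vol-0bbb', 'vol-0ccc']
```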

Missing or incomplete logging: CloudTrail enabled in one region but not all. S3 access logging turned off. VPC Flow Logs disabled. GuardDuty running but alerts routing to an email alias nobody monitors. The logging architecture exists but has gaps that would make incident investigation painful.
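The one-region CloudTrail gap in particular is easy to check for. A sketch over trail descriptions (field name mirrors IsMultiRegionTrail from the CloudTrail API):

```python
# An account should have at least one multi-region trail; a single-region
# trail leaves management events in other regions uncaptured.
def has_multi_region_trail(trails):
    return any(t.get("IsMultiRegionTrail", False) for t in trails)

trails = [
    {"Name": "main-trail", "HomeRegion": "us-east-1",
     "IsMultiRegionTrail": False},
]
print(has_multi_region_trail(trails))  # False: the classic audit finding
```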

Public S3 buckets: Usually one of two failure modes: either a bucket that was intentionally public for static assets but with Block Public Access disabled at the account level (which creates risk for every future bucket), or a bucket that was made public for debugging and never locked back down.
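The account-level guardrail is four flags, and checking them is trivial. A sketch, with the dict shape mirroring the S3 PublicAccessBlockConfiguration:

```python
# All four Block Public Access flags should be True unless a bucket is
# deliberately public (and then the exception belongs at the bucket level).
REQUIRED = ("BlockPublicAcls", "IgnorePublicAcls",
            "BlockPublicPolicy", "RestrictPublicBuckets")

def bpa_gaps(config):
    return [flag for flag in REQUIRED if not config.get(flag, False)]

config = {"BlockPublicAcls": True, "IgnorePublicAcls": True,
          "BlockPublicPolicy": False, "RestrictPublicBuckets": False}
print(bpa_gaps(config))  # ['BlockPublicPolicy', 'RestrictPublicBuckets']
```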

Outdated base images: Docker images built from ubuntu:18.04 or node:14 that haven’t been updated in 18 months. The base image CVE count is often startling when first surfaced. The fix requires a build pipeline that pulls fresh base images on a schedule, not on a “when someone notices” cadence.
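A first step toward that pipeline is a lint that fails builds on end-of-life bases. A minimal sketch; the deny list is illustrative, and a real pipeline would use a scanner such as Trivy rather than string matching:

```python
# Scan a Dockerfile's FROM lines against a deny list of end-of-life
# base images. The EOL_BASES set here is an assumed example list.
EOL_BASES = {"ubuntu:18.04", "node:14", "python:3.7"}

def outdated_bases(dockerfile_text):
    flagged = []
    for line in dockerfile_text.splitlines():
        parts = line.strip().split()
        # FROM <image> [AS <stage>] in single- or multi-stage builds.
        if parts and parts[0].upper() == "FROM" and len(parts) > 1:
            if parts[1] in EOL_BASES:
                flagged.append(parts[1])
    return flagged

dockerfile = """\
FROM node:14 AS build
FROM nginx:1.27
"""
print(outdated_bases(dockerfile))  # ['node:14']
```

Run in CI, this turns "when someone notices" into "every build notices."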

No secrets rotation: Database passwords set at launch, never rotated. API keys for third-party services stored in environment variables and unchanged for years. Secrets Manager or Vault is often already in the environment for some secrets, but coverage is partial.
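Measuring coverage is the first step: list every secret and its last rotation date. A sketch, with metadata loosely shaped like Secrets Manager's LastRotatedDate field and a 90-day cutoff as an assumed policy:

```python
# Flag secrets whose last rotation is older than the cutoff. A secret
# with no rotation date at all (set at launch, never touched) is always flagged.
from datetime import datetime, timedelta, timezone

def stale_secrets(secrets, max_age_days=90, now=None):
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [s["Name"] for s in secrets
            if s.get("LastRotatedDate") is None
            or s["LastRotatedDate"] < cutoff]

now = datetime(2025, 6, 1, tzinfo=timezone.utc)
secrets = [
    {"Name": "db-password", "LastRotatedDate": None},  # never rotated
    {"Name": "api-key",
     "LastRotatedDate": datetime(2025, 5, 20, tzinfo=timezone.utc)},
]
print(stale_secrets(secrets, now=now))  # ['db-password']
```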

Reading the report

When the report lands, the first instinct is often to start on the longest list. Resist that. The severity ratings are there for a reason.

Critical: Requires immediate action. A publicly accessible database with no authentication, an exposed AWS key with admin access in a public GitHub repo, an S3 bucket with customer data open to the internet. Stop what you’re doing.

High: Address within two to four weeks. These are real risks with real likelihood of exploitation or impact, but not actively on fire. Overly permissive IAM roles that could be used for privilege escalation. Missing encryption on a production data store. No MFA on accounts with production access.

Medium: Schedule into the next sprint or quarter. These are meaningful but lower-probability issues: a security group with an overly broad ingress rule on a port that isn’t actually in use, or logging gaps that slow investigation but don’t create direct exposure.

Low / Informational: Track but don’t rush. Configuration drift, missing documentation, recommended practices that aren’t blocking anything today.

One term that trips people up: “risk accepted.” This means a finding has been reviewed and a decision was made that the risk is acceptable given business context or compensating controls. It is not the same as “we haven’t gotten to it yet.” If you mark something risk accepted, document who made that decision, why, and when to revisit it. Undocumented “risk accepted” is just technical debt with extra steps.

A practical approach: take the critical and high findings, assign an owner and a target date for each, and put them in your issue tracker before the end of the week the report arrives. Medium and low findings go into a tracked backlog. Don’t let the report become a PDF that ages in someone’s Downloads folder.
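That triage step can even be scripted against your tracker's API. A sketch of the mapping from severity to owner and due date; the SLA windows and ticket shape are illustrative, matching the timelines above:

```python
# Turn triaged findings into owner/due-date tickets. SLA windows follow
# the severity guidance above; low/informational gets no date (backlog).
from datetime import date, timedelta

SLA_DAYS = {"critical": 1, "high": 28, "medium": 90}

def to_tickets(findings, owner, today):
    tickets = []
    for f in findings:
        days = SLA_DAYS.get(f["severity"])  # None for low/informational
        tickets.append({
            "title": f["title"],
            "owner": owner,
            "due": today + timedelta(days=days) if days else None,
        })
    return tickets

today = date(2025, 6, 2)
findings = [
    {"title": "Public customer-data bucket", "severity": "critical"},
    {"title": "IAM role allows s3:* on *", "severity": "high"},
]
for t in to_tickets(findings, owner="alice", today=today):
    print(t["title"], "->", t["due"])
```

Every critical and high finding leaves the meeting with a name and a date attached, which is exactly what keeps the report out of the Downloads folder.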

After the audit

The report is not the end. What you do with it determines whether the engagement was worth the time.

Remediation planning: For each high-severity finding, create a ticket with the evidence from the report, the remediation steps, and a definition of done. This lets engineers work independently without going back to the report repeatedly.

Retesting: Most engagements include a retest window, typically 30 to 60 days after the initial report, in which auditors verify that critical and high findings have been addressed. Take this seriously. Remediating 90% of criticals and then closing the engagement is a different outcome than addressing them all.

Building ongoing practices: The most durable outcome of a security audit isn't the fixes you make immediately; it's the practices that prevent the same findings from reappearing. Automated IAM policy linting in CI. A quarterly review of S3 bucket policies. A rotation schedule for long-lived credentials. A base image update pipeline that runs on a cadence.

Teams we’ve worked with often find the audit most valuable not for what it surfaces (most findings aren’t surprising in hindsight) but for the prioritization: now you have a defensible, evidence-backed list of what to fix in what order. For engineering leaders trying to justify security investment to a board or a CFO, that documentation matters.

One real example: a team building on Kubernetes adopted structured container scanning and base image update automation as a direct result of their audit findings, going from ad-hoc updates to a repeatable process. See how that played out in the Kubernetes adoption case study.


If you’re preparing for an infrastructure security audit or want to understand your current posture before a customer security review, our infrastructure audit service covers security alongside reliability, cost, and operational practices in a single engagement. Get in touch to talk through what’s right for your situation.
