Kenneth Kasuba
Director of Security, AI Research
When I took over application security at my previous company, the state of affairs was familiar: thousands of microservices, zero SAST coverage, a vulnerability backlog that engineering had learned to ignore, and a CISO who needed board-ready metrics by end of quarter. No AppSec team. No budget yet approved. Just a mandate and a blank whiteboard.
I've built AppSec programs from zero at three organizations now, ranging from a 200-engineer fintech to a 2,000-engineer enterprise SaaS platform. The playbook I'm sharing here is distilled from those experiences, the failures that taught me the most, and the frameworks that actually survived contact with reality.
Here's the operational playbook: specific tools, timelines, and the political navigation that no conference talk covers.
Weeks 1-2: Discovery Is Not Optional
The mistake I see most leaders make is jumping straight to tool procurement. They read a Gartner report, pick a SAST vendor, and start scanning everything. Three months later, they have 40,000 findings, zero credibility with engineering, and a pipeline that developers route around with every commit.
Discovery comes first. Always.
The Asset Inventory Nobody Has
In my experience, no organization I've joined has had a complete picture of their application portfolio. The first thing I do is build what I call the Application Security Surface Map. This isn't an architecture diagram. It's a living document that answers four questions for every service:
- What data does it touch? PII, PCI, PHI, credentials, or none of the above. This drives your risk tiering.
- Who owns it? Not the team name in a wiki that has not been updated since 2021. The actual humans who deploy it.
- What is the tech stack? Language, framework, dependencies, and build system. This determines which tools even work.
- What is the deployment model? Containers, serverless, VMs, third-party managed. This shapes your runtime security strategy.
I pull this from a combination of sources: CI/CD metadata, cloud resource tags, service mesh configurations, and a lot of conversations with engineering leads over coffee. If you are inheriting an environment with more than 50 services, plan for this to take the full first two weeks.
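To make the surface map concrete, here is a minimal sketch of what one entry can look like as a record with the four fields above plus a derived risk tier. The field names, sensitivity scores, and tier thresholds are my own illustrative assumptions, not a standard:

```python
from dataclasses import dataclass

# Illustrative sensitivity scores per data class; calibrate to your own
# risk model. Highest score drives the tier.
SENSITIVITY = {"PHI": 3, "PCI": 3, "credentials": 3, "PII": 2, "none": 0}

@dataclass
class ServiceEntry:
    name: str
    data_classes: list   # e.g. ["PII", "credentials"] -- drives risk tiering
    owner: str           # the actual deploying humans, not a stale wiki team
    stack: str           # language/framework -- determines which tools work
    deployment: str      # containers, serverless, VMs, third-party managed

    def risk_tier(self) -> str:
        score = max((SENSITIVITY.get(d, 0) for d in self.data_classes), default=0)
        return {3: "Tier 1", 2: "Tier 2"}.get(score, "Tier 3")

billing = ServiceEntry("billing-api", ["PCI", "PII"], "payments-oncall",
                       "Java/Spring", "containers")
docs = ServiceEntry("docs-site", ["none"], "web-team", "Node/Express", "serverless")

print(billing.risk_tier())  # Tier 1
print(docs.risk_tier())     # Tier 3
```

Even a spreadsheet works for the first pass; the point is that every service gets a tier you can sort the rollout by.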
The OWASP SAMM (Software Assurance Maturity Model) is invaluable here. I run a lightweight SAMM assessment across three dimensions, Governance, Design, and Implementation, not as a compliance exercise, but as a calibration tool. It tells me where the organization thinks it is versus where it actually is. That gap is your roadmap.
Stakeholder Mapping: The Political Landscape
Before you write a single policy, you need to understand the political terrain. I map every stakeholder into four categories:
- Champions: Engineering leaders who already care about security. These become your force multipliers.
- Neutrals: They will support you if you don't slow them down. Most engineering managers live here.
- Skeptics: They have been burned by security teams before. Usually the most senior engineers. Win them over and your program succeeds.
- Blockers: Rare, but they exist. Often tied to legacy systems that can't be changed without significant investment.
Your first 30 days should convert at least two skeptics into neutrals. The way you do this is simple: fix something for them. Find a real vulnerability in their code, write the patch yourself, and hand it to them as a pull request. Not a Jira ticket. A working fix.
Weeks 2-4: Foundation (Tools, but the Right Ones)
Now you pick tools. But you do it with the context from discovery, not from a vendor slide deck.
SAST: Start with Developer Experience, Not Coverage
Here is a controversial position: I no longer start with traditional SAST. In every program I've built recently, I start with Semgrep.
The reason is simple. Traditional SAST tools like Checkmarx are powerful (Checkmarx in particular has deep dataflow analysis and excellent taint tracking for Java and .NET), but they generate enormous volumes of findings, many of which are false positives in modern frameworks. When you are building trust with engineering in your first 90 days, a 15% false positive rate is a program killer.
Semgrep lets me start with high-confidence, low-noise rules that target the patterns I actually care about. I can write custom rules in an afternoon that match our specific framework patterns. Here is a real example: a rule I wrote to catch a common authorization bypass pattern in Express.js applications where developers forget middleware on sensitive routes:
```yaml
rules:
  - id: missing-auth-middleware
    patterns:
      - pattern: |
          app.$METHOD($PATH, (req, res) => {
            ...
          })
      - metavariable-regex:
          metavariable: $PATH
          regex: .*(admin|internal|management|api/v[0-9]+/private).*
      - pattern-not: |
          app.$METHOD($PATH, requireAuth, ...)
      - pattern-not: |
          app.$METHOD($PATH, authenticate, ...)
      - pattern-not: |
          app.$METHOD($PATH, authMiddleware, ...)
    message: |
      Sensitive route '$PATH' lacks authentication middleware.
      All admin/internal routes must use requireAuth or equivalent.
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      category: security
      subcategory:
        - vuln
      confidence: HIGH
      impact: HIGH
      cwe: "CWE-862: Missing Authorization"
```

This is the kind of rule that finds real bugs, not theoretical ones. In my last deployment, rules like this caught 23 genuine authorization gaps in the first week of scanning, issues that had been in production for months.
GitHub Advanced Security with CodeQL is my second-tier recommendation for organizations already on GitHub Enterprise. CodeQL's query language is more complex than Semgrep's, but its interprocedural analysis is genuinely superior for certain vulnerability classes, particularly SQL injection and path traversal in compiled languages. The trade-off is that CodeQL scans are slower and the rule-writing learning curve is steeper.
SonarQube occupies a different niche. I position it as a code quality tool that happens to have security rules, not a security tool. It's excellent for enforcing coding standards and catching maintainability issues, but I wouldn't rely on it as a primary SAST engine for a mature AppSec program. Its security rule coverage is broad but shallow.
SCA: The Fastest Win You Will Ever Get
If you do nothing else in your first 90 days, deploy Software Composition Analysis. In my experience, starting with SCA reduces mean-time-to-remediate by 40% in the first quarter, purely because dependency vulnerabilities have the clearest fix path: update the package.
Snyk is my default recommendation for commercial SCA. Its developer experience is best-in-class: the IDE integrations are genuinely good, the fix PRs are usually accurate, and the vulnerability database is well-curated. The licensing model can get expensive at scale, but for your first year, the speed of deployment justifies the cost.
Dependabot is free and already integrated if you are on GitHub. For startups and smaller teams, it is often sufficient. The limitation is that Dependabot's vulnerability database is less comprehensive than Snyk's, and it lacks the reachability analysis that tells you whether a vulnerable function is actually called in your code. That reachability signal is the difference between 200 SCA findings and 15 actionable ones.
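That reachability signal is easy to illustrate. A hedged sketch, with entirely hypothetical finding IDs and a simplified `reachable` flag standing in for what a tool's reachability analysis would report:

```python
# Hypothetical SCA findings. The "reachable" flag represents what
# reachability analysis adds: whether the vulnerable function is
# actually called from application code.
findings = [
    {"id": "VULN-LODASH-1",   "severity": "high",     "reachable": True},
    {"id": "VULN-MOMENT-2",   "severity": "high",     "reachable": False},
    {"id": "VULN-AXIOS-3",    "severity": "critical", "reachable": True},
    {"id": "VULN-MINIMIST-4", "severity": "medium",   "reachable": False},
]

# Reachability collapses the triage queue: only reachable critical/high
# findings need immediate attention.
actionable = [f for f in findings if f["reachable"]
              and f["severity"] in ("critical", "high")]

print(f"{len(findings)} raw findings -> {len(actionable)} actionable")
```

The same filtering at scale is how 200 raw SCA findings become 15 a team will actually fix.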
Grype (from Anchore) is my pick for container image scanning specifically. It is open source, fast, and integrates cleanly into CI pipelines. I pair it with Syft for SBOM generation: a combination that gives you both vulnerability detection and software supply chain visibility without a commercial license.
DAST and IAST: Phase Two Tools
I deliberately don't deploy DAST or IAST in the first 90 days. This is not because they lack value (they are critical) but because they require a level of infrastructure maturity (stable staging environments, consistent test data, runtime instrumentation) that most organizations don't have when you first arrive.
Contrast Security for IAST is the tool I've seen produce the best signal-to-noise ratio in runtime analysis. Unlike DAST, which tests from the outside, IAST instruments the application and observes actual execution paths. The result is dramatically fewer false positives. The trade-off is deployment complexity: you are adding an agent to every application, which means working with every team's deployment pipeline.
Plan DAST and IAST for month 4-6, after you have established trust and have pipeline integration patterns that teams can copy.
Weeks 4-8: The CI/CD Security Pipeline
This is where your program becomes real. Everything up to this point has been preparation. Now you are putting security checks into the path that code takes to production.
The Golden Rule of Pipeline Security
I learned this the hard way at my second AppSec build: never ship a blocking gate on day one. Your first pipeline integration must be audit-mode only: scan, report, but don't block. Give teams 30 days to see the findings, understand the noise level, and build confidence that you're not going to break their Friday deployments.
After 30 days of audit mode, you enable blocking, but only for Critical and High severity, and only for new findings (not existing technical debt). This is the pattern I call progressive enforcement, and it is the single most important political decision you will make.
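The progressive-enforcement decision reduces to a small gate function. This is a sketch, assuming you capture a baseline of pre-existing finding fingerprints during the audit window; the field names are illustrative:

```python
def should_block(finding: dict, baseline: set, enforcing: bool) -> bool:
    """Progressive enforcement: block only when enforcement is on,
    the finding is Critical/High, and it is new, i.e. not in the
    baseline of pre-existing technical debt."""
    if not enforcing:                      # first 30 days: audit mode only
        return False
    if finding["severity"] not in ("critical", "high"):
        return False
    return finding["fingerprint"] not in baseline

baseline = {"abc123"}  # fingerprints captured during the audit window
old = {"fingerprint": "abc123", "severity": "critical"}
new = {"fingerprint": "def456", "severity": "critical"}
low = {"fingerprint": "ghi789", "severity": "medium"}

assert not should_block(new, baseline, enforcing=False)  # audit mode
assert should_block(new, baseline, enforcing=True)       # new critical blocks
assert not should_block(old, baseline, enforcing=True)   # existing debt passes
assert not should_block(low, baseline, enforcing=True)   # medium never blocks
```

The baseline set is the political escape hatch: teams are accountable for new code, not for years of inherited debt.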
A Real Pipeline: GitHub Actions Security Workflow
Here is the actual GitHub Actions workflow I use as a starting template. This is not a toy example: this is production-grade, with caching, failure handling, and SARIF upload for GitHub Security tab integration:
```yaml
name: Security Scan Pipeline

on:
  pull_request:
    branches: [main, develop]
  push:
    branches: [main]

permissions:
  contents: read
  security-events: write
  pull-requests: write

jobs:
  sast-scan:
    name: SAST - Semgrep
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep:latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Semgrep
        continue-on-error: true  # Audit mode - remove after 30 days
        run: |
          semgrep scan \
            --config "p/default" \
            --config "p/owasp-top-ten" \
            --config ".semgrep/" \
            --sarif \
            --output semgrep-results.sarif \
            --error \
            --severity ERROR
        env:
          SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}
      - name: Upload SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: semgrep-results.sarif
          category: semgrep

  sca-scan:
    name: SCA - Snyk
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Snyk
        uses: snyk/actions/node@master
        continue-on-error: true  # Audit mode - remove after 30 days
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
        with:
          args: --severity-threshold=high --sarif-file-output=snyk.sarif
      - name: Upload SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: snyk.sarif
          category: snyk-sca

  secrets-scan:
    name: Secrets Detection
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Run Gitleaks
        uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GITLEAKS_LICENSE: ${{ secrets.GITLEAKS_LICENSE }}

  container-scan:
    name: Container - Grype
    runs-on: ubuntu-latest
    if: github.event_name == 'push'
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t app:${{ github.sha }} .
      - name: Generate SBOM
        uses: anchore/sbom-action@v0
        with:
          image: app:${{ github.sha }}
          output-file: sbom.spdx.json
      - name: Scan with Grype
        uses: anchore/scan-action@v4
        with:
          image: app:${{ github.sha }}
          severity-cutoff: high
          fail-build: false  # Audit mode
          output-format: sarif
      - name: Upload SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif
          category: grype-container
```

Notice the `continue-on-error: true` and `fail-build: false` flags. That is audit mode. When you are ready to enforce, you remove those flags and add a PR comment step that explains to the developer exactly what failed and how to fix it. The developer experience of the failure message matters more than the scan itself.
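What such a PR comment step can look like: a sketch that walks SARIF's standard `runs -> results -> message/locations` structure and renders an actionable markdown comment. The `REMEDIATION` mapping is my own illustrative addition, not part of any tool:

```python
# Build an actionable PR comment from a SARIF result set. SARIF's
# runs/results/locations structure is standard; the remediation-hint
# lookup below is a hypothetical, team-maintained mapping.
REMEDIATION = {
    "missing-auth-middleware": "Add requireAuth (or equivalent) to this route.",
}

def pr_comment(sarif: dict) -> str:
    lines = ["### Security scan results", ""]
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            rule = result.get("ruleId", "unknown-rule")
            msg = result["message"]["text"].strip().splitlines()[0]
            loc = result["locations"][0]["physicalLocation"]
            path = loc["artifactLocation"]["uri"]
            line = loc["region"]["startLine"]
            fix = REMEDIATION.get(rule, "See the rule documentation.")
            lines.append(f"- **{rule}** at `{path}:{line}`: {msg} Fix: {fix}")
    return "\n".join(lines)

sarif = {"runs": [{"results": [{
    "ruleId": "missing-auth-middleware",
    "message": {"text": "Sensitive route '/admin/users' lacks authentication middleware."},
    "locations": [{"physicalLocation": {
        "artifactLocation": {"uri": "src/routes/admin.js"},
        "region": {"startLine": 42}}}],
}]}]}

print(pr_comment(sarif))
```

A comment that names the file, the line, and the fix gets acted on; a bare "vulnerability found" gets muted.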
Month 2-3: Threat Modeling and the Security Champion Program
Threat Modeling That Engineers Actually Do
I'm going to be direct: most threat modeling programs fail because they are built by security people who think every engineer wants to draw DFDs on a whiteboard for two hours. They don't.
The approach I use is heavily influenced by Adam Shostack's *Threat Modeling: Designing for Security*, still the definitive reference on the subject, but adapted for velocity. I call it Lightweight Threat Assessment, and it works like this:
- Trigger: Any PR that introduces a new service, a new external integration, a new authentication flow, or a change to data classification triggers a threat review. This is automated via PR labels and a bot that checks for infrastructure-as-code changes.
- Template: The developer fills out a one-page template: What are we building? What data does it touch? What can go wrong? (Using STRIDE categories but without calling them STRIDE: engineers respond better to "What if an attacker can..." than to "Consider spoofing threats.")
- Review: A security champion or AppSec engineer reviews asynchronously in the PR. Ninety percent of reviews complete in under 30 minutes.
- Artifacts: Threats become issues in the backlog, tagged and tracked. No separate threat model document that rots in Confluence.
This approach generates roughly one-fifth the overhead of formal threat modeling while catching the architectural issues that actually matter. In my last program, we conducted 340 lightweight threat assessments in year one and identified 47 design-level issues that no scanner would have found, including a token-forwarding pattern that would have allowed horizontal privilege escalation across tenant boundaries.
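The automated trigger from step one can be as simple as a path-pattern check the bot runs against a PR's changed files. A sketch, with trigger patterns that are purely illustrative and would need tuning to your repository layout:

```python
import re

# Illustrative trigger patterns: a PR touching any of these paths gets a
# "threat-review" label from the bot. Tune to your own repo conventions.
TRIGGER_PATTERNS = [
    r"(^|/)terraform/",      # infrastructure-as-code changes
    r"(^|/)auth",            # authentication flows
    r"Dockerfile$",          # new service packaging
    r"(^|/)integrations/",   # new external integrations
]

def needs_threat_review(changed_files: list) -> bool:
    return any(re.search(p, f) for p in TRIGGER_PATTERNS for f in changed_files)

assert needs_threat_review(["services/payments/auth_flow.py"])
assert needs_threat_review(["terraform/vpc.tf", "README.md"])
assert not needs_threat_review(["docs/runbook.md", "src/utils/dates.py"])
```

The cheapness of the check is the point: the review is triggered by the diff, not by someone remembering to schedule a meeting.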
The Security Champion Program
Your AppSec team will never scale to cover every team. At the ratios I've seen work (one AppSec engineer per 50-80 developers), you need force multipliers. That is what security champions are.
Here is the structure I deploy:
| Tier | Role | Time Commitment | Requirements | Benefits |
|---|---|---|---|---|
| Bronze | Security Aware Developer | 2 hrs/month | Complete secure coding training, attend monthly briefing | Recognition, security swag, conference budget consideration |
| Silver | Team Security Champion | 4-6 hrs/month | Bronze + conduct peer code reviews for security, triage team findings | All Bronze + dedicated training budget, security tool early access |
| Gold | Security Guild Lead | 8-10 hrs/month | Silver + write custom security rules, lead threat modeling, mentor Bronze champions | All Silver + conference speaking opportunities, career path to security engineering |
The critical success factor is management buy-in for the time commitment. I negotiate this directly with engineering directors: security champion work counts toward performance reviews, not as extracurricular activity. If it is volunteer-only, it dies within two quarters.
Start with one champion per team of 8-10 developers. In my experience, you will get genuine volunteers from about 30% of teams. For the remaining teams, the engineering manager nominates someone, and surprisingly, those nominated champions often become your most engaged participants once they see the training and the tooling access they get.
Metrics: What You Measure and Who You Measure It For
The BSIMM (Building Security In Maturity Model) is the best external benchmark for understanding where your program sits relative to peers. I use BSIMM annually for board-level maturity reporting. But BSIMM is a lagging indicator. You need leading indicators for day-to-day management.
I maintain two metric dashboards. They share some data but tell very different stories, because the board and your engineering partners need different things.
Board-Level Metrics (Monthly/Quarterly)
Executives and board members care about risk posture and trend direction, not tool output. These are the metrics I present:
- Mean Time to Remediate (MTTR) by severity: Critical: target <72 hours. High: target <30 days. This is the single most important metric because it measures organizational response capability, not just detection.
- Vulnerability Density Trend: New critical/high findings per 1,000 lines of code, tracked monthly. The absolute number matters less than the trend. A decreasing trend means your preventive controls are working.
- Security Debt Ratio: Open vulnerability count weighted by severity and age, expressed as a percentage of total backlog. Boards understand debt ratios from financial contexts.
- Coverage: Percentage of production applications with active SAST, SCA, and DAST scanning. Target 80% by end of year one, 95% by end of year two.
- Third-Party Risk Posture: Critical findings in supply chain dependencies, SBOM completeness percentage, mean time to patch known-exploited vulnerabilities (referencing CISA KEV catalog).
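The Security Debt Ratio is the least self-explanatory of these, so here is one way to compute the underlying score. The severity weights and the age multiplier are my own illustrative assumptions; calibrate them to your risk model before putting the number in front of a board:

```python
from datetime import date

# Illustrative severity weights; the age multiplier grows debt linearly
# with how long a finding has been left open.
WEIGHT = {"critical": 10, "high": 5, "medium": 2, "low": 1}

def debt_score(findings: list, today: date) -> float:
    """Open vulnerability count weighted by severity and age."""
    total = 0.0
    for f in findings:
        age_days = (today - f["opened"]).days
        age_multiplier = 1 + age_days / 90   # a 90-day-old finding counts double
        total += WEIGHT[f["severity"]] * age_multiplier
    return total

today = date(2025, 6, 1)
findings = [
    {"severity": "critical", "opened": date(2025, 5, 29)},  # 3 days old
    {"severity": "high",     "opened": date(2025, 3, 3)},   # 90 days old
]
print(round(debt_score(findings, today), 1))
```

Divide that score by the same computation over the total backlog and you have a ratio that trends meaningfully quarter over quarter.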
Engineering Metrics (Weekly/Sprint)
Engineers care about signal quality and friction. These metrics keep you honest:
- False Positive Rate: Percentage of findings marked "not applicable" or "false positive" by developers. If this exceeds 20%, your tooling is eroding trust. In my programs, I target under 10% by month 6 through aggressive rule tuning.
- Mean Time to Triage: How long from scan finding to human decision (true positive, false positive, risk accepted). Target: under 48 hours for Critical, under 1 week for High.
- Pipeline Impact: Additional time security scans add to CI/CD. If you are adding more than 5 minutes to a PR pipeline, you need to optimize. Semgrep typically adds 30-90 seconds. Snyk adds 45-120 seconds. These are acceptable.
- Fix PR Acceptance Rate: For automated fix PRs (from Snyk, Dependabot), what percentage are merged without modification? This measures the quality of your automation. I've seen this range from 40% to 85% depending on the ecosystem: npm is typically higher, Java/Maven lower due to breaking changes.
- Security Champion Engagement: Active champions as a percentage of total, measured by participation in reviews, training completion, and rule contributions. Below 60% active means your program needs re-energizing.
The NIST Secure Software Development Framework (SSDF) provides an excellent mapping between these operational metrics and compliance requirements. If your organization needs to demonstrate SSDF alignment (increasingly common for government contract work), your AppSec metrics can do double duty.
The First 90 Days: A Week-by-Week Execution Plan
Let me synthesize everything above into the actual execution cadence. This is roughly what my calendar looks like for the first three months:
Weeks 1-2: Listen and Map
- Complete the application security surface map
- Run the OWASP SAMM baseline assessment
- Meet every engineering director and VP individually
- Identify your first three security champion candidates
- Audit existing CI/CD pipelines: understand what you are integrating into
Weeks 3-4: Quick Wins
- Deploy SCA (Snyk or Dependabot) in audit mode across top 10 repositories by risk
- Fix 5-10 real vulnerabilities yourself: hand the PRs to teams as introductions
- Present initial findings to CISO/CTO: frame as "here is what we found and here is the plan"
- Draft the AppSec policy (keep it under 5 pages or nobody reads it)
Weeks 5-6: SAST Rollout
- Deploy Semgrep in audit mode across the same top 10 repositories
- Write 5-10 custom rules targeting patterns specific to your tech stack
- Tune aggressively: review every finding from the first week, disable noisy rules
- Launch security champion Bronze tier: first training session
Weeks 7-8: Pipeline Integration
- Ship the GitHub Actions security workflow (or equivalent for your CI) as a reusable template
- Enable PR comments for security findings: make them actionable, not just "vulnerability found"
- Begin secrets scanning with Gitleaks or GitHub secret scanning
- First security champion office hours session
Weeks 9-10: Enforcement Begins
- Enable blocking for Critical findings on new code (not backlog)
- Publish the vulnerability SLA: Critical = 72 hours, High = 30 days, Medium = 90 days
- Set up the vulnerability management dashboard (I use Defect Dojo or a custom Grafana setup)
- Expand scanning to top 25 repositories
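The published SLA maps directly onto computable deadlines, which is what makes the MTTR dashboard automatic rather than a manual spreadsheet. A minimal sketch, with function names of my own choosing:

```python
from datetime import datetime, timedelta

# SLA windows from the published policy:
# Critical = 72 hours, High = 30 days, Medium = 90 days.
SLA = {
    "critical": timedelta(hours=72),
    "high": timedelta(days=30),
    "medium": timedelta(days=90),
}

def sla_deadline(severity: str, triaged_at: datetime) -> datetime:
    return triaged_at + SLA[severity]

def is_breached(severity: str, triaged_at: datetime, now: datetime) -> bool:
    return now > sla_deadline(severity, triaged_at)

t0 = datetime(2025, 6, 1, 9, 0)
assert is_breached("critical", t0, datetime(2025, 6, 5, 9, 0))   # 96h > 72h
assert not is_breached("high", t0, datetime(2025, 6, 20, 9, 0))  # 19d < 30d
```

Wire this into whatever tracks your findings (Defect Dojo exposes the dates you need) and SLA breaches surface themselves.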
Weeks 11-12: Measurement and Reporting
- Deliver the first board-level AppSec metrics report
- Conduct retrospective with engineering leads: what is working, what is generating noise
- Plan phase two: DAST/IAST deployment, threat modeling rollout, full repository coverage
- Promote your first Silver-tier security champions
Common Failure Modes
I will close with the patterns I've seen kill AppSec programs. Avoid these and your odds of success increase dramatically:
The Big Bang Rollout
Deploying SAST, SCA, DAST, secrets scanning, and container scanning across every repository in month one. Engineering teams get flooded with thousands of findings, lose trust in the tooling, and start ignoring every alert. I have seen this pattern kill three AppSec programs. The fix is simple: start with 10 repositories, prove value, expand from there.
The Compliance-First Trap
Building the entire program around audit requirements instead of actual risk reduction. You end up with a program that can produce SOC 2 evidence but cannot catch a SQL injection. Compliance is a byproduct of good security, not the other way around. Lead with engineering value, and compliance evidence will follow naturally from the telemetry you are already collecting.
The Tool-Without-Tuning Problem
Buying an enterprise SAST platform with 2,000 default rules enabled and wondering why developers revolt. Every security tool ships with rules designed for maximum coverage, not maximum signal. Your first two weeks with any tool should be spent disabling rules that do not apply to your stack, writing custom rules that do, and tuning severity thresholds until the false positive rate drops below 15%. If developers do not trust the findings, the tool is useless regardless of its detection capabilities.
The Ivory Tower AppSec Team
Filing Jira tickets from a separate floor and expecting engineering teams to fix them. If you are not submitting pull requests, pair-programming fixes with developers, and attending sprint planning meetings, you are an auditor, not a partner. The most effective AppSec engineers I have worked with spend at least 30% of their time writing code alongside development teams. That is how you build the relationships that make a security program actually work.
The Bottom Line
Building an AppSec program from zero is not primarily a technical challenge. The tooling exists, the frameworks are mature, and the integration patterns are well-documented. The hard part is organizational: earning engineering trust, demonstrating value before demanding compliance, and building a feedback loop where security findings translate into better code rather than slower deployments.
The 90-day framework I have outlined here is not theoretical. It is the playbook I have executed across multiple organizations, refined through the failures and successes of each engagement. The specific tools will evolve, the CI/CD platforms will change, but the principles remain constant: start small, prove value fast, measure everything, and never forget that your customers are the engineering teams who have to live with your decisions every day.
If your organization is at zero today, you can be at a functioning, measured, developer-trusted AppSec program in 90 days. Not perfect. Not complete. But real, operational, and producing the kind of risk reduction that justifies continued investment.
That is the foundation everything else builds on.