Stack Evaluation Scorecard: The 12 Questions I Ask Every SaaS Vendor in 2026
Quick Answer
The vendor evaluation form most companies still use was designed in 2017. None of its criteria predict whether a tool works in an AI-augmented operation. Below is the 12-question scorecard I actually use at Gardenpatch and The Cooling Co.
The vendor evaluation form most companies still use was designed in 2017. It scores features, demo quality, integration list, support response time, and price. Almost none of those criteria predict whether a tool will work in an AI-augmented operation. The scorecard needs a rewrite.
This post is the practical companion to the Tech Stack flagship. That piece names the six shifts; this one gives you the twelve-question scorecard I use to evaluate every vendor at both Gardenpatch and The Cooling Co.
What you can stop scoring
Before the new questions: the things that used to matter and matter less now.
UI quality. Still matters for the human exception layer, but the team isn't in the UI most of the time. Stop scoring polish; start scoring API surface area.
Onboarding experience. The vendor's onboarding flow was designed for a human user. Your team's onboarding to the tool is mostly agent configuration now. Demos and onboarding sessions are misleading proxies for real fit.
Customer support quality. Still important, but for a different audience — your agents won't be calling support. Score support quality as a tiebreaker, not a top-five criterion.
Integration list. The 2019-era evaluation gushed about how many other tools the vendor integrated with. That list is now mostly fluff. Any vendor that connects to Zapier can nominally reach everything else; the question is the depth of each integration, not the breadth.
The 12 questions
1. Is every operation in the UI also in the API?
This is the single most important question. If a tool has features the API can't reach, those features are unusable to your agent layer. A vendor that proudly demos a polished UI but ships a 60% API is shipping you 60% of a useful product. The answer to this question lives in the API documentation — not in the vendor's marketing copy.
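A rough way to put a number on this is to list the operations you can see in the UI and the ones the API docs actually expose, then compare the two. The sketch below assumes you build both lists by hand; the operation names are hypothetical placeholders, not any real vendor's API.

```python
def api_coverage(ui_operations, api_operations):
    """Return (coverage ratio, UI-only operations) for a vendor."""
    ui = set(ui_operations)
    missing = sorted(ui - set(api_operations))
    # An empty UI list counts as fully covered to avoid dividing by zero.
    ratio = 1.0 if not ui else (len(ui) - len(missing)) / len(ui)
    return ratio, missing


# Hypothetical operation names gathered from a demo and from the API docs.
ui_ops = ["create_invoice", "refund_invoice", "export_report",
          "merge_contacts", "bulk_tag"]
api_ops = ["create_invoice", "refund_invoice", "export_report"]

ratio, missing = api_coverage(ui_ops, api_ops)
print(f"{ratio:.0%} of UI operations are reachable via the API")
print("UI-only operations:", missing)
```

The UI-only list is usually more useful than the ratio: it tells you exactly which workflows an agent will never be able to touch.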
2. Does the auth model allow per-agent credentials with minimum-scope permissions?
If your only option is a single team-wide API key with everything-or-nothing access, you can't safely give your billing agent access without also giving it access to customer records, exports, admin operations, and so on. Modern tools support fine-grained, scope-limited credentials (per-resource read/write, per-action permissions). Tools that don't are a structural risk.
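To make minimum scope concrete, here is a small sketch that compares what each agent credential has been granted against what the agent actually needs. The scope strings and agent names are made up for illustration; real vendors name their scopes differently.

```python
def excess_scopes(granted, needed):
    """Map each agent to the scopes it holds but does not need."""
    report = {}
    for agent, scopes in granted.items():
        extra = scopes - needed.get(agent, set())
        if extra:
            report[agent] = sorted(extra)
    return report


# Hypothetical scope names; check the vendor's docs for the real ones.
GRANTED = {
    "billing-agent": {"invoices:read", "invoices:write",
                      "customers:read", "admin:*"},
    "support-agent": {"tickets:read", "tickets:write"},
}
NEEDED = {
    "billing-agent": {"invoices:read", "invoices:write"},
    "support-agent": {"tickets:read", "tickets:write"},
}

print(excess_scopes(GRANTED, NEEDED))
```

With a single team-wide key, every agent's excess list is effectively "everything", which is the structural risk described above.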
3. Are events emitted as webhooks, or only available via polling?
Polling stacks up across many agents and burns rate limits. Webhooks let your agents stay quiet until something actually happens. Webhook-first tools are first-class citizens; polling-only tools force expensive architecture compromises.
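The arithmetic behind "polling stacks up" is worth doing once. This sketch assumes every agent polls on a fixed interval around the clock; the agent count and interval are illustrative numbers, not measurements from my stacks.

```python
SECONDS_PER_DAY = 86_400


def polling_requests_per_day(agents, interval_seconds, endpoints_per_agent=1):
    """Requests per day generated purely by polling, before any real work."""
    return agents * endpoints_per_agent * SECONDS_PER_DAY // interval_seconds


# Illustrative: 10 agents, each polling one endpoint every 30 seconds.
daily = polling_requests_per_day(agents=10, interval_seconds=30)
print(daily, "requests per day just to ask whether anything changed")
```

Every one of those requests counts against the rate limit whether or not anything happened, which is the cost a webhook avoids.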
4. What are the rate limits — and are they per-credential or shared?
Rate limits that are shared across the whole tenant mean one heavy agent can starve every other agent. Per-credential limits let you isolate workloads. If the vendor can't tell you the rate limit cleanly, that's a sign their infrastructure isn't built for the agent era.
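A toy simulation shows the starvation effect. It models one rate-limit window as a simple counter, keyed either by tenant or by credential; the limit of 100 and the traffic mix are assumptions chosen for the example, and real vendors often use sliding windows or token buckets instead.

```python
from collections import Counter


def admitted(requests, limit, per_credential):
    """Simulate one rate-limit window over an ordered list of credential names.

    Returns how many requests each credential got through.
    """
    used = Counter()
    ok = Counter()
    for cred in requests:
        # Shared limits count everything against one tenant-wide bucket.
        key = cred if per_credential else "tenant"
        if used[key] < limit:
            used[key] += 1
            ok[cred] += 1
    return dict(ok)


# A heavy agent fires 100 requests, then a light agent tries 5.
traffic = ["heavy"] * 100 + ["light"] * 5

shared = admitted(traffic, limit=100, per_credential=False)
isolated = admitted(traffic, limit=100, per_credential=True)
print("shared limit:        ", shared)
print("per-credential limit:", isolated)
```

Under the shared limit the light agent gets nothing through in that window; with per-credential limits its five requests are unaffected by the heavy agent.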
5. Does the data model expose audit logs your agent can read?
When something goes wrong (and it will), you need to know what the agent did, what state it found, and why it made the choice it made. Tools that keep their audit trail in a dashboard instead of exposing it through the API make incident investigation expensive. Audit-log-via-API is a structural feature.
6. What's the read-write latency on the API?
Eventual consistency that took 90 seconds was fine in 2019 because humans were slow. Agents loop in milliseconds; a 90-second write delay creates race conditions and double-actions. Find out the latency of "I wrote this, can I read it back?" before you commit.
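Measuring "I wrote this, can I read it back?" can be as simple as writing a marker value and polling until it appears. The sketch below takes the write and read steps as plain callables so it stays vendor-neutral; `LaggyStore` is a hypothetical stand-in for a real API client, used here only so the example runs without a network.

```python
import time


def read_after_write_lag(write, read, value, timeout=5.0, interval=0.01):
    """Write a value, then poll until a read returns it.

    Returns the seconds elapsed, or None if the value never showed up.
    """
    write(value)
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if read() == value:
            return time.monotonic() - start
        time.sleep(interval)
    return None


class LaggyStore:
    """Stand-in for a vendor API: a write becomes visible after a few reads."""

    def __init__(self, reads_until_visible=3):
        self._pending = None
        self._visible = None
        self._reads_left = reads_until_visible

    def write(self, value):
        self._pending = value

    def read(self):
        self._reads_left -= 1
        if self._reads_left <= 0:
            self._visible = self._pending
        return self._visible


store = LaggyStore()
lag = read_after_write_lag(store.write, store.read, "marker-123")
print("read-after-write lag (seconds):", lag)
```

Against a real vendor you would swap in the actual write and read calls and run the probe many times, since a single sample says little about the worst case.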
7. Are the error responses machine-readable with retryability hints?
"500 Internal Server Error" with no body tells the agent nothing. "429 rate-limited, retry in 30 seconds" or "409 conflict, fetch and re-apply" tells the agent how to recover. Tools that ship cryptic errors are tools that produce agent failures in production.
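Here is roughly what an agent can do when the error tells it something. The sketch maps a response to a recovery action; it assumes the `Retry-After` header carries a number of seconds (the header can also carry an HTTP date), and the `retryable` body field is a hypothetical convention, not something every vendor ships.

```python
def recovery_action(status, headers=None, body=None):
    """Decide how an agent should react to an API error response.

    Returns (action, wait_seconds).
    """
    headers = headers or {}
    body = body or {}
    if status == 429:
        # Assumes Retry-After is given in seconds, defaulting to 1.
        return ("retry_after", int(headers.get("Retry-After", 1)))
    if status == 409:
        return ("refetch_and_reapply", 0)
    if status >= 500 and body.get("retryable"):
        # Hypothetical hint: the vendor says the failure is transient.
        return ("retry_after", 5)
    # Nothing machine-readable to act on, so a human has to look.
    return ("escalate_to_human", 0)


print(recovery_action(429, {"Retry-After": "30"}))
print(recovery_action(409))
print(recovery_action(500))
```

The last case is the point: a bare 500 with no body leaves the agent with nothing to do except hand the problem to a person.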
8. Does the SDK or API support batch operations?
An agent that wants to update 50 records in one logical operation should be able to send one batch call, not 50 individual calls. Tools without batch support force agents to be either slow (50 sequential calls) or fragile (50 separate calls with no transactional guarantee across them).
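On the agent side, batching usually means splitting the work into chunks no larger than the vendor's batch limit. This is a minimal sketch; the limit of 25 records per call is an assumption for illustration, and the actual batch request is left out because it depends on the vendor.

```python
def chunked(records, size):
    """Yield consecutive slices of `records`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]


# 50 hypothetical record updates, with an assumed limit of 25 per batch call.
records = [{"id": n, "status": "archived"} for n in range(50)]
batches = list(chunked(records, 25))
print(len(batches), "batch calls instead of", len(records), "individual calls")
```

Whether each batch succeeds or fails as a unit is a separate question to ask the vendor; chunking only reduces the call count.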
9. What's the price model — per-seat, per-API-call, or hybrid?
Per-seat pricing made sense when humans were the users. With agents replacing seats, per-seat pricing either becomes very cheap (one operator seat handles what 8 used to) or very expensive (some vendors are charging per agent as a separate seat type). Per-API-call pricing aligns better with agent usage but can produce surprise bills. Hybrid is most common in 2026. Score what fits your usage pattern.
10. Can you export your full data via the API at any time?
Lock-in via "data hostage" is the oldest SaaS trick. In the agent era, it's also a deal-breaker because your agent rules and prompts are deeply coupled to the vendor's data shape. If you can't export, you can't safely commit. "Yes, we have a CSV export from the admin UI" doesn't count — you need programmatic data extraction.
11. What's the vendor's stance on agent / AI access?
Some vendors are agent-friendly (clear policies, supported, documented). Some are agent-hostile (terms of service that forbid automated access above a certain volume, captchas everywhere, IP blocking). Find out before you buy. A vendor that bans agent access is selling you a 2019 product even if the brochure says 2026.
12. Is there a public roadmap, and does it mention agent-specific features?
Vendors who are explicitly building for agents (event emission, fine-grained scopes, agent identities, observability primitives, batch endpoints) are betting on the same future you're operating in. Vendors who aren't will either be displaced or have to scramble. A public roadmap is the cheapest signal of strategic alignment.
How to score
Each question is scored 0, 1, or 2. Zero is "no/missing/hostile." One is "partial/workable with effort." Two is "yes/excellent/aligned." Maximum score is 24.
My internal threshold at both companies: any tool below 16/24 is structurally unfit and not worth integrating no matter how good the demo. Tools at 16–19 are workable with explicit gap mitigation. Tools at 20+ become candidates for deep integration.
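The scoring rules and thresholds above fit in a few lines of code. This is a sketch of the arithmetic only, using the 16 and 20 cutoffs from this post; the function name and tier labels are mine, and the example scores are invented.

```python
NUM_QUESTIONS = 12


def verdict(scores):
    """Total a 12-question scorecard (each answer 0, 1, or 2) and name its tier."""
    if len(scores) != NUM_QUESTIONS:
        raise ValueError(f"expected {NUM_QUESTIONS} scores, got {len(scores)}")
    if any(s not in (0, 1, 2) for s in scores):
        raise ValueError("each score must be 0, 1, or 2")
    total = sum(scores)
    if total < 16:
        tier = "structurally unfit"
    elif total < 20:
        tier = "workable with explicit gap mitigation"
    else:
        tier = "candidate for deep integration"
    return total, tier


# Invented example: strong on four questions, partial on the rest.
example_scores = [2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1]
total, tier = verdict(example_scores)
print(f"{total}/24: {tier}")
```

Note how unforgiving the cutoff is: a tool that is merely "partial" on every question totals 12 and lands well below the 16-point line.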
This scoring is harsh. Many popular SaaS tools score 12–14 today because they were architected before the agent era and bolted on an API afterward. That's fine for some workflows; it's a deal-breaker for any tool that becomes structural to your operation.
The vendors I've migrated away from since 2024
Without naming names: about 40% of the 2023 SaaS stack at The Cooling Co. got replaced over 18 months. The tools were fine in 2023; they were replaced because they scored 10/24 on the new scorecard. The replacement tools usually weren't as polished in the UI, but they were leagues better in API surface, auth granularity, and event coverage.
The replacement was expensive in switching cost (see the Tech Stack flagship on switching cost vs migration cost), but the agents we built on top of the new tools couldn't have been built on the old ones. That capability lift paid for the migration within a year.
The vendors we kept were the ones that aggressively shipped API improvements during the migration window. They saw what was happening and adapted. The vendors we churned didn't.
What this looks like in practice
Every new vendor evaluation at Gardenpatch or The Cooling Co now runs through this scorecard before any demo gets scheduled. Most demos don't happen because the API documentation alone tells me the tool isn't going to score well. That saves a lot of calendar time on both sides.
The vendors that pass the scorecard then get a demo, but the demo is half about the human exception layer (the small part of work humans still touch in the UI) and half about API examples (running real calls live, not reading documentation). A vendor that can't run real API calls in their own demo is signaling something.
Where to start
If your stack is full of 2019-shaped tools and you're not sure where to begin replacing them: take the 90-second AI-Era Operator Audit first — Tech Strategy is one of the six scored disciplines. If your tech score is low, that's where the highest leverage hides.
If you know tech strategy is the gap, the Tech Strategy in the AI Era playbook is the full 27-module version. It includes the complete vendor scorecard above, build-vs-buy decision frameworks, migration planning templates, observability blueprints, and the security architecture for agent-era operations. It costs $27, includes a free 30-minute strategy call with me, and comes with a 30-day money-back guarantee.
If you'd rather see the broader thesis first, the AI-Era Operator Manifesto lays out the nine beliefs underneath every playbook. Free, no email gate.
And if your stack is going to need work in multiple disciplines at once — the Complete Bundle is $99 for all six playbooks (saves $63 vs buying individually).
The vendor scorecard is the single highest-leverage hour of work most operators can do this month. Find out which of your tools wouldn't get re-bought. Then plan the replacements. The frameworks are here.