Selected builds, systems, and tradeoffs. Raintree product pages live at raintree.technology.
Selected projects
17 projects
Agent Safety & Evals(4)
Apps(4)
Context & Developer Tools(3)
Data & Intelligence Systems(1)
Product SaaS(2)
Systems & Low-Level(1)
Music & Creative Systems(2)
Refund Agent
RA
Agent safety / personal case study
Policy engine controls payouts
What I built
Policy-bound AI support demo where code, not the model, makes binding refund decisions.
hard gate
Policy engine
observability
Trace UI
idempotency
Stripe
policy tests
Vitest
Stack
Next.js 16
React 19
TypeScript
Vercel AI SDK
Claude
Zod
Stripe
Vitest
Runtime
Customer chat posts to a Next.js API route that runs a raw function-calling agent loop with typed tools.
The only payout path is `issue_refund`, which re-runs the policy engine before touching payment or in-memory order state.
Admin dashboard reads trace API output to inspect every step, tool call, and binding decision.
Data model
Synthetic CRM includes 15 customers, order histories, policy branches, threshold cases, already-refunded orders, digital goods, final-sale items, and delivery-window cases.
Written refund policy is mirrored into typed policy thresholds and deterministic eligibility results.
Trace records capture assistant text, tool inputs, tool outputs, blocked calls, token usage, latency, retries, finish reason, and final decision.
Techniques
Model instructions are treated as soft guidance while policy-engine checks are the hard guarantee.
Tool-layer ownership checks prevent a customer from seeing or refunding another customer's order.
Stripe refunds use idempotency keys when real payment credentials exist; otherwise the demo falls back to simulated payouts.
Admin auth, per-IP rate limiting, hashed user identifiers, and email scrubbing guard the demo surface.
Verification
Prompt-injection scenarios still fail because invalid refunds are blocked at tool execution.
Scenario matrix covers approved, escalated, denied, already refunded, not delivered, expired, final-sale, digital, gift-card, and unknown-order paths.
Vitest policy-engine tests validate the binding decision layer.
Trace UI exposes every tool input/output, token cost, latency, retry, blocked call, and binding decision instead of hiding the agent loop behind a chat transcript.