AI Agents and Applications

You ship a multi-step agent with tool use, retrieval, observability, and a safety story, running on your stack of choice.

AI Agents and Applications is coming soon.

Leave your email and we will tell you the moment it opens. You will also get the newsletter in the meantime: new lessons and the occasional essay, sent rarely.

What is inside.

The shape of the course, module by module. Open any module to read what it covers and the lessons inside it.

Foundations fast track

This module is a fast, friendly pass back through the Individuals and Organisations material, written for engineers. You set up your environment and pick up the mental model that carries the whole course: prompts are software, and evals are tests.

01
Individuals course in 8 minutes
You will move quickly through the core mental model, the way grounding keeps hallucinations in check, and a simple taxonomy of tools, all framed for an engineer who wants the gist fast.
02
Organisations course in 12 minutes
You will get the business context, the baseline of Australian regulation that shapes what you build, and the six configuration layers, all seen through an engineering lens.
03
Setting up your environment
You will set up access to the Anthropic API, sort out key management and billing, lay down a minimal repository structure, and run your first hello world call against Claude.
04
Prompts as software, evals as tests
You will adopt the mental model that frames the rest of the course, treating prompts as software and evals as tests, and you will see why writing prompts on vibes alone tends to produce fragile agents.
05
What this course will ship
You will get a preview of the capstone you will build, a module-by-module map of where the course is heading, and a look at the optional Office add-in branch, and you will commit to seeing the capstone through.

Programming and computer science foundations

Here you will pick up the mental models behind code that make every later module easier: how programs actually run, the data structures and algorithms that genuinely matter, concurrency, the systems underneath it all, and how to read code well in an age when a great deal of it is written for you.

01
How code actually runs
You will build a clear picture of the runtime, the regions of memory, the event loop, and the boundaries a program runs against, along with a four-question mental model you can apply to any program you read.
02
Data structures that matter
You will get to know arrays, hash tables, sets, stacks, queues, heaps, vectors, and trees, focusing on the small set of structures that genuinely earn their place in production AI code.
03
Algorithms in plain English
You will work through big O notation, sorting, binary search, hashing, recursion, greedy approaches, dynamic programming, and approximate algorithms, keeping to the patterns that actually show up in production.
04
Concurrency, async, and what blocks what
You will learn the difference between concurrency and parallelism, how async/await works alongside Promise.all and the event loop, and how streaming, race conditions, and cancellation play out in practice.
05
Systems thinking, memory, networks, files
You will build an intuition for the hierarchy of memory, disk, and network, the ways networks fail, filesystems that do not persist, clocks, processes, and what the CAP theorem really tells you.
06
Reading code well
You will pick up a three-pass approach to reading any codebase, learn to lean on names, comments, and tests as the contract, and get comfortable reading code that an AI has written.

The Claude API end to end

You will get to genuine fluency with the Anthropic SDK, working through messages, system prompts, streaming, structured outputs, prompt caching, the batch APIs, and inputs like vision and PDFs.

01
Anthropic SDK: messages, system prompts, parameters
You will build multi-turn conversations with full control over the parameters, getting hands-on with model selection, temperature, the maximum number of tokens, and how to hold conversation state.
02
Streaming responses
You will work through how responses stream over server-sent events, how to parse partial JSON as it arrives, what streaming does to the feel of an interface, and the cases where streaming is not worth the added complexity.
03
Structured outputs and JSON mode
You will use tool calls as a way to get structured output, validate the results with Zod and Pydantic, and put in place patterns that retry cleanly when the model returns something invalid.
04
Prompt caching and batch APIs
You will learn how to cut token cost by somewhere between fifty and ninety percent on the right kinds of workload, using cache markers and the discount the batch API offers, as a first taste of engineering for cost.
05
Vision and PDFs
You will send image inputs and process PDFs, using the model in place of a separate OCR step, and you will weigh up the cost so you know when to reach for vision rather than a service like Textract.

Evals as a discipline

Placed early in the course, in the spirit of Anthropic Academy and Andrew Ng, this module treats evals as a real discipline. It takes you through error analysis, custom judges that catch the problems that actually occur, trajectory evaluation for agents, multi-turn evals, and eval-driven development as a working habit.

01
Error analysis: the workflow nobody teaches
You will learn the error analysis workflow that Hamel Husain teaches, reading around fifty traces, labelling them, sorting them into categories, and prioritising the fixes, and you will see why this beats almost every other way of improving a system.
02
Custom judges that catch real problems
You will see why a generic helpfulness judge catches almost nothing, learn to turn your real error categories into judge prompts, and keep your judges under version control as they evolve.
03
Trajectory evaluation for agents
You will learn why judging the final output alone misses so many agent bugs, and you will write rubrics that score each step and then build up to scoring the whole trajectory.
04
Multi-turn and long-horizon evals
You will see why an eval that looks at a single turn misses sycophancy that compounds and goals that drift, and you will build canned adversaries and regression suites that span many turns.
05
Eval-driven development as a discipline
You will put the whole loop together, wire your evals into continuous integration, run them quietly in the background as shadow evals, and adopt the rule that nothing ships without an eval behind it.

Tools, MCP, and Skills

This module treats tool design as a craft and walks you through tools, Model Context Protocol servers, Anthropic Skills, and subagents, so that you learn with confidence when to reach for each of these four primitives.

01
Tool design as craft
You will treat tool design as a craft, writing clear descriptions, keeping scope narrow, making tools idempotent, and returning structured errors, all measured against whether a junior intern could read the tool and use it correctly.
02
Model Context Protocol (MCP): connecting agents to systems
You will build and host a Model Context Protocol server in both Python and TypeScript, get to know its three primitives (tools, resources, and prompts), and think carefully about the security that comes with connecting agents to real systems.
03
Anthropic Skills: procedural know-how as artefacts
You will learn how Anthropic Skills differ from system prompts and from MCP, write a Skill of your own, and come to appreciate what may be the most underappreciated primitive to arrive in 2025.
04
Subagents and delegation
You will learn when subagents genuinely help and when they do not, how delegation works as a primitive, what it costs, and the ways it tends to fail.
05
When to use which: Tools vs MCP vs Skills vs Subagents
You will follow a clear decision tree for choosing between tools, MCP, Skills, and subagents, working through real examples drawn from a CRM, a document workflow, code review, and a research agent.

Agent design patterns

You will go deep on workflow agents, autonomous agents, and the multi-agent approach. You will build reflection, tool use, planning, parallelisation, chaining, and routing in working code, put the six configuration layers into practice, and reach for frameworks last rather than first.

01
Workflow vs autonomous vs multi-agent (deep)
You will implement workflow agents, autonomous agents, and the multi-agent approach from scratch, and feel the trade-offs between cost, capability, and reliability in your own code.
02
Reflection, tool use, planning (Andrew Ng canon)
You will implement the four canonical patterns that Andrew Ng describes (reflection, tool use, planning, and multi-agent collaboration) and learn to tell when each one earns its keep and when it only adds cost.
03
Parallelisation, chaining, routing
You will compose agents into richer workflows by combining patterns, ending with a routed workflow that fans work out in parallel and brings the results back together.
04
The six configuration layers in working code
You will put the six configuration layers into working code, covering task scoping, whitelisting tools, structured input and output, cost budgets, wiring in an eval harness, and observability, and then audit your own agent against them.
05
Frameworks: when to use which (and why)
You will get to know LangGraph, CrewAI, the OpenAI Agents SDK, Pydantic AI, and the Claude Agent SDK, and come to see a framework as a way of compressing patterns you already understand rather than a shortcut around understanding them.

Retrieval and context engineering

This module frames context engineering as the wider picture and then takes you through the fundamentals of retrieval, hybrid search and re-ranking, memory architectures, and a gentle introduction to knowledge graphs for retrieval.

01
Context engineering: the wider frame
You will take on context engineering as the wider frame, using Andrej Karpathy's picture of the language model as an operating system and the context window as its working memory, and you will see why trying to stuff everything in falls apart at scale.
02
RAG fundamentals (the proper way)
You will learn to do retrieval properly, working through chunking strategies, embeddings, vector databases, and retrieval methods, starting from a naive baseline and seeing clearly why it is not enough on its own.
03
Hybrid search and re-ranking
You will see why pure vector search stumbles on names and codes, combine BM25 with vector search, add a re-ranking step, and rewrite queries to retrieve more reliably.
04
Memory architectures
You will work through working, episodic, and semantic memory, implement memory in a system built on Claude, and give your agent a real sense of what it remembers.
05
Knowledge graphs for RAG (intro)
You will learn where pure vector retrieval breaks down on relational questions, and you will get an introduction to GraphRAG and HybridRAG so you can decide between them and the naive approach.

Production: cost, observability, safety

Here you will work on what it takes to run in production: token economics and cost engineering, observability and tracing, the OWASP LLM Top 10, direct and indirect prompt injection, the Agents Rule of Two, and the reliability of long-horizon agents with a human in the loop.

01
Token economics and cost engineering
You will model cost per task, route work to the right model, take caching strategy to real depth, and learn to reason about the unit economics of cost per resolution.
02
Observability and tracing
You will get hands-on with Langfuse, Helicone, and Arize, attribute cost to each individual trace, keep track of prompt versions inside your traces, and internalise that without observability there is no improvement.
03
OWASP LLM Top 10: the security curriculum
You will walk through each of the ten risks in the OWASP LLM Top 10 with a defence pattern for every one, and then run an OWASP audit against your own agent.
04
Prompt injection: direct, indirect, and defence
You will study real stories behind Anthropic security disclosures, learn the Agents Rule of Two, gate capabilities deliberately, and require a person to confirm any action that cannot be undone.
05
Long-horizon reliability and human-in-the-loop
You will work through the maths of how failure probability compounds over a long run, watch for goal drift, add checkpoints, verify sub-goals along the way, and design approvals that ask for real judgement rather than a rubber stamp.

Shipping

This closing module is about shipping. It covers deployment on Vercel and Supabase, Claude through Bedrock in the AWS Sydney region for Australian data residency, Azure in the Australia East region as an alternative, Office add-ins as an optional branch, and a final checklist that ties your evals, your safety work, and your Australian compliance together.

01
Deployment on Vercel + Supabase (the Resonance reference stack)
You will deploy on the Resonance reference stack, using the Next.js App Router for streaming, choosing between Vercel functions and the edge, and leaning on Supabase for authentication, the database, and pgvector.
02
AWS Bedrock Sydney for AU compliance
You will deploy Claude through Bedrock in the AWS Sydney region for Australian data residency, use PrivateLink to keep traffic inside your virtual private cloud, and weigh the cost against calling Anthropic directly.
03
Azure OpenAI in Australia East (as an alternative)
You will learn when Azure is the better fit than Bedrock, such as an existing Microsoft estate, Australian government work, or a large enterprise, and you will see how the deployment differs.
04
Office add-ins (Outlook, Word, Excel) — optional branch
You will learn the architecture of an Office add-in, including the manifest and the task pane, and look closely at the prompt injection threat surface that Outlook add-ins in particular bring.
05
The ship checklist: AU compliance + production readiness
You will run the final checklist before you ship, confirming your evals are green, your OWASP audit is done, your Australian regulatory mapping is in place, your cost budget is set, and your kill switches, observability, and incident response are all ready.

What you leave with.

Nine modules from programming foundations to a production agent, with tool use, retrieval, evaluations, and the operational discipline that separates a demo from a system.

Modules: 9
Lessons: 46
Capstone: 1
Certificate: 1

Ship a production AI agent

Ship a deployed, working agent. Code in a repo, deployment URL, evidence of evals, observability, and safety review. Multi-step. Tool-using. With a safety story you would put in front of an Australian customer or regulator.

The capstone is read and graded against a rubric, with the certificate issued the moment you pass.