Coaching Context — Stuart Frisby

Expertise made queryable.

How I encoded fifteen years of design leadership thinking into a portable knowledge base that AI can retrieve and reason from.

Experiment (in progress) · Model Context Protocol · Design Leadership

at://mrfrisby.com/coach

01 Why this exists

New chat · no context loaded

How do design leaders build trust?

What makes a good design critique?

cold start · no context · averaged knowledge

mrfrisby / coach — get_frameworks

coach.json / get_frameworks

CALIBRATION DRIFT

The gap between what a design leader believes is happening on the ground and what is actually happening. Accumulates slowly; often invisible until something breaks.

THE LATE PRESENTATION PROBLEM

When design is brought into a discussion after the substance of a decision has already been made in other rooms. The critique session becomes performance, not input.

canonical · authored · versioned · owned

AI conversations start cold. Your chosen model knows nothing about who you are, what you believe, or how you make decisions. For most people that's fine — tasks are discrete and context is implicit. For me it became a persistent friction: the same base context re-entered over and over, profile additions ignored or stale, every new conversation starting from zero.

I write and think about design stuff a lot. My newsletter has 70+ posts covering topics which span all of the work of design leadership. They have named frameworks, stated positions, views on taste and ethics, and are all written in a particular voice. Every time I started a new conversation to draft something, pressure-test an idea, or get a second opinion, I was spending the first part of the conversation re-establishing who I was. The model was always meeting me for the first time.

When an AI reasons about design stuff, it synthesises conference talks, blog posts, Medium essays of variable quality, generic management stuff, things it thinks are about design, but are actually about topiary, etc. The output is averaged and flattened. What I wanted was a tool that reasons from my corpus. Owned input, not inherited noise. I want AI tools to treat my content like a knowledge base rather than a prompt — structured, queryable domains that a model retrieves selectively depending on what it's actually doing.

This is an experiment without a conclusion. I'm not sure it works in the ways I hope, and I haven't fully tested the limits. But that's the point. My view on AI tools comes from use, and this use is novel (to me). Building this has taught me lots about what an MCP server is and isn't, what context actually means in practice, and where the interesting problems are.

02 The data

zsh — coach.json

coach@mrfrisby.com ~ % tree coach --group-by-layer
coach/
├── active/
│   └── critique_design_decision   calls LLM · returns structured critique
├── corpus/
│   ├── get_dlm_writing            [70,341 chars]
│   └── get_dlm_links              [6,094 chars]
├── knowledge/
│   ├── get_design_worldview       [4,840 chars]
│   ├── get_design_philosophy      [2,319 chars]
│   ├── get_stated_positions       [3,494 chars]
│   ├── get_frameworks             [2,799 chars]
│   └── get_taste                  [1,647 chars]
├── voice/
│   ├── get_writing_style          [4,553 chars]
│   └── get_voice_anti_patterns    [998 chars]
└── context/
    ├── get_career                 [2,253 chars]
    ├── get_working_style          [2,839 chars]
    ├── get_personal_context       [958 chars]
    └── get_ai_instructions        [1,286 chars]

13 tools · 1 active · 105,806 chars total

The knowledge base is a flat JSON file with thirteen tools. Each one is a named domain of expertise. The model calls the tool, gets the string, uses it as context. The tools split into three broad categories: content, action, and instruction.
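The lookup described above can be sketched in a few lines. This is a minimal illustration, assuming the file is a single JSON object mapping tool names to stored strings; the real layout of coach.json isn't shown here, and `SAMPLE_JSON`, `KNOWLEDGE`, `call_tool`, and `total_chars` are hypothetical names.

```python
import json

# Hypothetical stand-in for coach.json: one flat object, tool name -> stored text.
# The real file's layout and contents are assumptions, not shown on this page.
SAMPLE_JSON = """
{
  "get_frameworks": "CALIBRATION DRIFT: the gap between what a design leader believes is happening on the ground and what is actually happening.",
  "get_taste": "(stored text about taste)"
}
"""

KNOWLEDGE = json.loads(SAMPLE_JSON)


def call_tool(name: str) -> str:
    """Return the stored string for a data tool, or raise if the tool is unknown."""
    try:
        return KNOWLEDGE[name]
    except KeyError:
        raise ValueError(f"unknown tool: {name}") from None


def total_chars(knowledge: dict) -> int:
    """Sum the character counts of every stored domain."""
    return sum(len(text) for text in knowledge.values())
```

Keeping each domain as a plain string means a tool call is just a dictionary lookup: nothing is parsed or ranked on the server side, and the model decides what to do with the text it gets back.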

The two DLM content tools are the largest, and they exist as a pair for a specific reason. get_dlm_links gives the model a compressed view of what I've written about; get_dlm_writing gives it the actual text — around 70,000 characters of published posts. I kept both because they serve different retrieval needs. Synthesis tasks want the summary. Voice and position tasks want the source. The unedited writing is the most direct signal of how my thinking has actually developed, rather than how I would characterise it if asked.
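The retrieval split can be written down as an explicit rule. In practice the model presumably makes this choice itself from the tool descriptions; the sketch below only spells the rule out, and the task labels, `CORPUS_TOOL_FOR_TASK`, and `pick_corpus_tool` are hypothetical. Defaulting to the summary for unknown tasks is also an assumption, chosen because it is the cheaper of the two.

```python
# Hypothetical task labels mapped to the corpus tool that suits them.
# The tool names come from the post; the mapping itself is an illustration.
CORPUS_TOOL_FOR_TASK = {
    "synthesis": "get_dlm_links",   # compressed view: titles, URLs, dates, summaries
    "voice": "get_dlm_writing",     # full published text
    "position": "get_dlm_writing",  # full published text
}


def pick_corpus_tool(task: str) -> str:
    """Pick the corpus tool for a task type, falling back to the smaller summary."""
    return CORPUS_TOOL_FOR_TASK.get(task, "get_dlm_links")
```

For example, `pick_corpus_tool("voice")` selects the full text, while an unrecognised task type falls back to the summary tool.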

There are also content tools that act as highly specific caches for my more philosophical positions: my views on taste, my 'design worldview', and so on. These give models bounds to reason within. Without them, a model might fall back on what Mark Zuckerberg thinks about design, and no one needs to internalise that, not even a robot.

Instruction tools tell the model how to behave. They cover my writing style, or at least what I think my writing style is, and the anti-patterns that tell the model what not to write: those telltale AI constructions which make everyone wince.

— Tools

coach.json

// 13 tools — 12 data, 1 active

get_writing_style

Voice, register, vocabulary, sentence structure, argumentative shape, editorial instincts

get_voice_anti_patterns

Rhetorical habits that signal generated commentary — what not to do

get_design_worldview

Core beliefs: design as argument, taste as judgment, system as design, craft vs process

get_design_philosophy

Applied positions: problem framing, design reviews, when and how to critique

get_taste

What taste is, how it erodes, its relationship to ethics, simplicity, and hiring

get_stated_positions

Working views on design systems, AI, growth design, vision, hiring, org leadership

get_frameworks

Named mental models: calibration drift, the late presentation problem, force multiplier

get_career

Professional history, areas of expertise, writing and speaking record

get_working_style

Decision defaults, communication style, what energises and drains, people management

get_personal_context

Background, interests, values, and motivating factors

get_ai_instructions

Defaults: format preferences, what to escalate, tone calibration by context

get_dlm_writing

Full text of all published Design Leadership Memos — primary source of voice and position

get_dlm_links

Titles, URLs, dates, and summaries of all published DLM posts

03 The active tool

Design file: Checkout Flow, Design Review. Layers: Checkout › Frame 1 (Nav, Form, CTA, Loading State); Onboarding. Canvas: Frame 1, Checkout, with a "Pay now" button.

critique_design_decision

We're presenting the redesigned checkout flow to stakeholders Thursday. Engineering has been building against the wireframes for six weeks...

FRAMEWORKS THAT APPLY

This is the late presentation problem in its clearest form. A stakeholder review after six weeks of engineering work isn't a review — it's a reveal. The window for feedback to change anything meaningful has already closed.

WHERE THE REASONING HOLDS

The review still has value for surface-level issues — copy, visual polish, edge cases that can change without re-architecting the build.

WHERE I'D PUSH BACK

If the goal is genuine input on direction, the meeting is too late. The fix isn't a better presentation — it's moving the review to before engineering commits. That's a process problem, not a design problem.

critique_design_decision is the only tool that actually creates something — the other twelve return stored text and let the model work with it; this one calls a language model. You give it a description of a design decision or rationale, and it returns a structured critique: which of my frameworks apply, where the reasoning holds, and where I'd push back.

The frameworks invocation is what makes it useful. The verdict rarely surprises me — I usually have a sense of where I'd land. What I don't always have is the vocabulary for which mental model applies, surfaced immediately. Naming "the late presentation problem" before the deliberation starts is different from naming it after. The frame is in the room before the argument begins; it shapes what gets said and what doesn't. A thinking accelerant, not an oracle.

The implementation is simple by design. The critique prompt embeds my worldview, philosophy, frameworks, positions, and taste in full. The three-section structure is enforced in the prompt — instructed, not parsed. A smaller, faster model handles this reliably: the prompt is explicit enough that instruction-following matters more than raw capability. If this is being called frequently by an agent working through a stack of decisions, the cost difference adds up.
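A prompt assembled along these lines might look like the sketch below. It is an illustration under stated assumptions, not the actual implementation: the five tool names and the three section headings come from this page, but the prompt wording, `EMBEDDED_TOOLS`, `SECTIONS`, and `build_critique_prompt` are hypothetical, and the call to the language model is left out entirely.

```python
from typing import Mapping

# The five domains the post says are embedded in full (names from the tool list).
EMBEDDED_TOOLS = (
    "get_design_worldview",
    "get_design_philosophy",
    "get_frameworks",
    "get_stated_positions",
    "get_taste",
)

# The three-section structure, as it appears in the example critique.
SECTIONS = (
    "FRAMEWORKS THAT APPLY",
    "WHERE THE REASONING HOLDS",
    "WHERE I'D PUSH BACK",
)


def build_critique_prompt(decision: str, knowledge: Mapping[str, str]) -> str:
    """Embed the stored domains and instruct (not parse) the output structure."""
    context = "\n\n".join(f"[{name}]\n{knowledge[name]}" for name in EMBEDDED_TOOLS)
    headings = "\n".join(SECTIONS)
    # Hypothetical wording: the structure is requested in plain instructions,
    # so nothing downstream has to parse or validate the response.
    return (
        "Critique the design decision below using only the context provided.\n"
        "Respond with exactly these three sections, in this order, "
        "each introduced by its heading on its own line:\n"
        f"{headings}\n\n"
        f"CONTEXT\n{context}\n\n"
        f"DECISION\n{decision.strip()}\n"
    )
```

Because the structure lives in the instructions rather than in a parser, swapping in a smaller model only requires that it follows the headings reliably.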

There's a public version at /ask: a chatbot interface that queries the same server directly. The intended use is agents and tools, not conversation, but it is a quick way of validating that the MCP server works and returns at least superficially useful output.

04 What it can't do

What AI tools do with the MCP server is approximation. Claude arrives somewhere more specific and actionable using my MCP server than it does reasoning from whatever it absorbed from the open web. That gap is the point — it's why this is worth doing, and why I think a personal MCP server is an interesting avenue to explore quite aside from the broad debate about the usefulness of LLMs.

The main limitation is the gap between what I've published and what I think. Most thoughts don't warrant a newsletter (or a podcast, please pass this fact on to the millennial white men in your life). So the model gets the map but not the territory — polished posts that represent concluded positions, not the live friction. The half-formed ideas, the things I'm uncertain about, the positions I've quietly reversed — none of that makes it in. The knowledge base is always slightly behind, and slightly simplified. Perhaps I need a way to capture that unrefined stuff too.

The harder limit is what declared positions can represent at all. A lot of how good design leaders think lives in the application, not the declaration — in the work, in the specific context, in pattern recognition that doesn't reduce to a framework. That part doesn't compress into JSON. And obviously you can't replace a highly experienced person with a synthesis of their views and expect it to do the same job. But I do know some things — about how good design leadership works, where it fails, and how those lessons were earned. Making that knowledge available to tools that are already in the room when design decisions get made feels like a worthy experiment.

AI is increasingly part of how design decisions get made, how work gets reviewed, and how design thinking gets communicated. The only way to form a useful view on what that means for design leadership is to build with these tools, not just observe them. This is how I do that.

Looking for actual coaching?

This is the experiment.
Outpost is the work.

Strategic design consultancy, fractional design leadership, and coaching for design leaders. If you found this page looking for the real thing, it's at outpost.works.

Visit outpost.works ↗
