Alan Boyce - Sprout Social
What Happens When the Invisible Contract Breaks? Rebuilding Engineering for a Non-Deterministic World
The Moment the Playbook Stopped Working
The product was called Trellis. The goal was straightforward enough on paper: build a unified framework and studio for agents and AI that would let any role inside a customer’s company, not just social media managers, unlock insights from four petabytes of social data. Two billion messages processed per day. Thirty thousand customers across every vertical from retail to fintech.
The technology was not the hard part.
What kept coming back to Alan Boyce was a simpler, more uncomfortable realization: the entire operating agreement between product and engineering, the one his teams had relied on for 15 years, had quietly become obsolete. Traditional roadmaps were broken, and the teams could not even agree on a definition of “done.” The deterministic contract that product and engineering had always made with each other, the shared assumption that a spec produces a predictable output on a predictable timeline, could not hold in a world where the product itself behaves non-deterministically.
“The question isn’t what model do we use to build Trellis. It’s how do we restructure teams around a product that behaves non-deterministically. And so that non-determinism breaks the contract between product and engineering that we’ve always had with each other.”
That realization is the throughline of everything Sprout Social is building right now, and it shapes how Alan thinks about data architecture, org design, hiring, and the future of the CTO role itself.
The Inverted Stack
Alan joined Sprout Social as its first backend engineer roughly 15 years ago, when the company was pre-revenue and still explaining to customers what social media was. His first assignment was Facebook data ingestion, and it was a fast education in engineering under chaos. Facebook engineers were changing their JSON on an hourly basis.
“It really got me into thinking about data pipelines, data governance, risk, trying to figure out when fields got added, fields got removed, mapping in states of high chaos and it made me a better engineer at doing everything because of that.”
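The field-tracking problem Alan describes can be illustrated with a small sketch. This is not Sprout's pipeline; it is a minimal example, assuming two JSON payloads from the same upstream endpoint, of how added and removed fields might be detected. The function names and the sample payloads are hypothetical.

```python
import json


def field_paths(value, prefix=""):
    """Return the set of dotted field paths present in a decoded JSON value."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= field_paths(child, path)
    elif isinstance(value, list):
        # List items share one path segment so array length does not count as drift.
        for item in value:
            paths |= field_paths(item, f"{prefix}[]")
    return paths


def schema_drift(previous_payload, current_payload):
    """Compare two JSON strings and report which field paths appeared or vanished."""
    before = field_paths(json.loads(previous_payload))
    after = field_paths(json.loads(current_payload))
    return {"added": sorted(after - before), "removed": sorted(before - after)}


# Hypothetical payloads: one field renamed upstream, one nested field added.
old = '{"id": "1", "from": {"name": "A"}, "message": "hi"}'
new = '{"id": "1", "from": {"name": "A", "id": "9"}, "story": "hi"}'
drift = schema_drift(old, new)
print(drift)
```

A check like this only sees field names, not type or meaning changes, so in practice it would be one signal among several.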
That early experience with high-volume, unreliable data at the edges of someone else’s platform shaped how Sprout’s infrastructure was built over the next decade and a half. Today, that infrastructure looks nothing like a conventional stack.
“Our stack is completely inverted from typical stacks. A typical stack, you’ve got a couple of databases, and you scale them with web heads and a bunch of caching. We have a bunch of database servers and a few web heads.”
The reason is the nature of the customer Sprout serves. A Fortune 10 company’s social media manager is one of a very small number of users receiving millions of inbound messages per day. The data flow runs toward a narrow human surface, not away from it. Processing that volume, ranging from 25,000 to 45,000 messages per second, requires thinking through dead letter queues, enrichment pipeline failures, and what happens when one of 50 model enrichments goes offline mid-stream. Getting the data in reliably and safely, before applying any AI to it, consumed years of what Alan calls “unglamorous engineering.”
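One way to picture the failure handling described above is a pipeline that still delivers a message when a single enrichment is offline and parks the failed step in a dead letter queue for later replay. The sketch below is illustrative only: the two enrichers, the `EnrichmentError` type, and the choice to deliver partially enriched messages are assumptions, not a description of Sprout's system.

```python
from collections import deque


class EnrichmentError(Exception):
    """Raised by a (hypothetical) enrichment step that cannot produce a result."""


def enrich_sentiment(message):
    # Stand-in for a model call; the rule here is purely illustrative.
    return {"sentiment": "negative" if "broken" in message["text"] else "neutral"}


def enrich_language(message):
    # Simulates an enrichment going offline mid-stream for one message.
    if message["id"] == 2:
        raise EnrichmentError("language model unavailable")
    return {"language": "en"}


ENRICHERS = {"sentiment": enrich_sentiment, "language": enrich_language}


def process(messages, enrichers=ENRICHERS):
    """Enrich each message; record failed steps in a dead letter queue."""
    delivered = []
    dead_letters = deque()
    for message in messages:
        enriched = dict(message)
        failed = []
        for name, enricher in enrichers.items():
            try:
                enriched.update(enricher(message))
            except EnrichmentError as exc:
                failed.append(name)
                dead_letters.append(
                    {"message_id": message["id"], "enricher": name, "error": str(exc)}
                )
        # Assumption: a partially enriched message is still delivered, with gaps flagged.
        enriched["missing_enrichments"] = failed
        delivered.append(enriched)
    return delivered, dead_letters


messages = [{"id": 1, "text": "my order is broken"}, {"id": 2, "text": "hello"}]
delivered, dead_letters = process(messages)
```

The design choice worth noticing is that one failed enrichment does not block the message itself, which matters when a human is waiting on the inbound stream.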
“My favorite definition of big data is when the size of your data becomes part of the problem. Simple-sounding features end up being a pain. But it also provides you the ability to tell customers what’s the most important thing they should be focused on.”
Fifteen years of structured, governed, production-grade social signal built across 30,000 customers and every major vertical is what makes Sprout’s AI work something other than a thin layer on top of a commodity model. Competitors cannot replicate that asset by standing up a demo in a week.
Two Models, Two Completely Different Systems
Alan draws a firm line between two operating models for AI inside an established company. In the first model, AI is a feature. It augments an existing product and workflow. The data architecture underneath it is built for analytics and reporting, and AI features query it. This is where most companies start, and there is nothing wrong with it as an entry point.
The second model is fundamentally different. In it, AI is the value delivery mechanism. The product’s job is to produce inference-driven output, and everything else (data pipelines, pricing, support, user experience) is organized around making that inference reliable, fast, and trustworthy at scale.
“In model two, your data architecture is built for inference, which means it’s optimized for retrieval latency and freshness and lineage and feedback loops. Our data team’s job is to change from serving dashboards to serving production inference.”
The more consequential question is when those architectural choices get made, because locking into one model or the other happens earlier than most teams realize. Unit of work, pricing, packaging, enablement: all of it flows from that foundational decision. And most companies do not realize they have made it until their go-to-market team is still thinking in features, still building three-month roadmaps around feature one and feature two, while engineering has already moved into orchestration layers, partial failure states, and cost accumulation across multi-step agent chains.
“Until your groups internally realize that shift and that they need a different system, you’re going to still be building the old-fashioned way: Crystal balls and 12-month timelines and your enablement teams.”
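The engineering concerns named above, partial failure states and cost accumulation across multi-step agent chains, are easier to reason about with a concrete shape. The following is a minimal sketch under stated assumptions: each step is a hypothetical callable returning a success flag and a cost in cents, and the three status labels are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StepResult:
    name: str
    ok: bool
    cost_cents: int


@dataclass
class ChainRun:
    budget_cents: int
    status: str = "running"
    results: list = field(default_factory=list)

    @property
    def total_cost_cents(self):
        # Failed steps still cost money, so every recorded step counts.
        return sum(result.cost_cents for result in self.results)


def run_chain(steps, budget_cents):
    """Run (name, callable) steps in order, stopping on failure or budget overrun."""
    run = ChainRun(budget_cents=budget_cents)
    for name, step in steps:
        ok, cost_cents = step()
        run.results.append(StepResult(name, ok, cost_cents))
        if not ok:
            run.status = "partial_failure"
            break
        if run.total_cost_cents > budget_cents:
            run.status = "over_budget"
            break
    else:
        run.status = "complete"
    return run


# Hypothetical three-step chain in which the second step fails.
steps = [
    ("detect", lambda: (True, 4)),
    ("summarize", lambda: (False, 7)),
    ("publish", lambda: (True, 1)),
]
run = run_chain(steps, budget_cents=50)
print(run.status, run.total_cost_cents)
```

Here the chain ends in a state that is neither "done" nor "broken" in the traditional sense, and it has already spent money, which is the kind of outcome a feature-based roadmap has no vocabulary for.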
Sprout is mid-transition into model two. Trellis is the product expression of that shift, a framework for agentic workflows that extends from crisis detection through the NewsWhip integration, all the way to content publishing and an agent builder where customers can see how all of the pieces connect.
The Decision Layer Nobody Talks About
Building Trellis required more than a new data architecture. It required Sprout to rebuild how decisions get made across the entire research and development (R&D) organization.
Alan identified a problem that rarely makes it into engineering blog posts: every function in a company carries an implicit, unreconciled standard for what “good” looks like. Engineers have a bar. Product has a bar. Legal has a bar. Go-to-market has a bar. In a deterministic product, those gaps stay manageable. In a non-deterministic product, they create invisible fault lines that surface at critical moments.
His solution was to make three things explicit that most organizations leave implicit. The first is decision ownership. Someone has to be clearly identified as the person who decides when something is good enough, not by committee, not by consensus drift, but by name. The second is reversibility classification. Sprout uses a classification framework Alan describes as hats, haircuts, and tattoos. A hat is quick to change. A haircut takes time to grow back. A tattoo may be permanent.
“Most AI product decisions are hats and we need to treat them that way.”
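The first two mechanisms, a named owner and a reversibility class, can be captured in a very small record. The sketch below is an assumption about how such a log entry might look, not Sprout's tooling; the review paths attached to each class and the sample decision are hypothetical, and only the hat, haircut, and tattoo labels come from Alan's framework.

```python
from dataclasses import dataclass
from enum import Enum


class Reversibility(Enum):
    HAT = "hat"          # quick to change
    HAIRCUT = "haircut"  # takes time to grow back
    TATTOO = "tattoo"    # may be permanent


# Hypothetical review paths; the source does not specify what each class triggers.
REVIEW_PATH = {
    Reversibility.HAT: "owner decides and ships; revisit after launch",
    Reversibility.HAIRCUT: "owner decides after a short written review",
    Reversibility.TATTOO: "owner decides after cross-functional sign-off",
}


@dataclass(frozen=True)
class Decision:
    title: str
    owner: str
    reversibility: Reversibility

    def __post_init__(self):
        # Ownership must be a named person, not a committee or an empty field.
        if not self.owner.strip():
            raise ValueError("every decision needs a named owner")


def review_path(decision):
    """Look up how much process a decision gets based on how reversible it is."""
    return REVIEW_PATH[decision.reversibility]


decision = Decision("Default tone for suggested replies", "Priya", Reversibility.HAT)
print(decision.owner, "->", review_path(decision))
```

Making the owner a required field is the point of the sketch: the record cannot exist without a name attached to it.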
The third mechanism is the decision post-mortem, and it starts with a question Alan calls the hardest one to answer in AI product development: where are the seams? What triggers the AI, what does it hand back, and who acts on the result? Every AI workflow has points where the system and the human have to exchange control, and those handoff points are where most product failures actually originate. Getting the seam wrong might not show up as a system error. It shows up as a user who stops trusting the product before they have even properly tested it.
That is exactly what the decision post-mortem is designed to catch. It is distinct from the incident post-mortem most engineering teams already run. “These are where the decision didn’t go right even though the tech went well,” Alan said. The tech shipped cleanly. The customers did not trust it. They would not click the button because they were terrified of what it might do. Sprout has found that teams almost always either overshoot or undershoot the automation boundary, and when the seam is placed wrong, customer trust degrades before the model has even been properly evaluated. Those failures live in the decision layer, not the code layer, and they require their own review process.
Building the Org Before the Technology Moves Again
On hiring, Alan has already started shifting the criteria. When Sprout’s recruiting team came to him a year ago and said they wanted to block candidates from using AI on coding assessments, he reversed the policy. If AI usage is a skill the team needs to grow, it should be something the interview process evaluates, not penalizes.
He has also noticed a split inside his existing engineering teams. Some engineers are outcome-driven: the code is the implementation detail they have to work through on the way to the customer problem. Others are attached to the craft of writing code itself, and those engineers are struggling in a world where AI handles more of the writing. Alan is not framing this as a judgment on either type. He is framing it as a signal for where the hiring criteria need to go: curiosity first, willingness to experiment, and comfort with a role that changes faster than any particular technology.
Deployments at Sprout have tripled in the past three months. That acceleration is putting pressure on everything downstream: code review culture, GitHub Actions pipelines, enablement teams, and the legal and governance functions. Solving one bottleneck revealed the next one. The engineering team has gotten faster. The rest of the organization is catching up.
Alan sees this as the core job of the CTO right now, and he is precise about where the role has shifted.
“The CTO’s job used to be making the right technical bets. I now think it’s our job to build organizations that can adapt faster than the technology changes. The technology will keep moving. Your org won’t unless you rebuild it now.”
For a public company managing four petabytes of social data, 2 billion daily messages, and a customer base spanning every major industry, the stakes of getting the org wrong are concrete. The model will change in six months. The org has to be ready before it does.