What Breaks, What Scales, What Lasts: My Replay 2026 Agenda

Replay 2026 is shaping up to be one of the most relevant events of the year for teams building durable AI systems and long-running workflows. Temporal is positioning it around building and operating reliable AI applications.

That framing resonates with us at Xgrid because client conversations have shifted. The question is no longer whether AI workflows are possible, but whether they can survive production reality: retries, crashes, deploys, human approvals, partial failures, and the operational drag that shows up once the demo works.

Closing that gap is where we spend our time. As a Certified Temporal Cloud Partner, Xgrid’s Forward-Deployed Engineers work closely with clients to design, ship, and stabilize production-grade Temporal systems, whether that means modernizing legacy orchestration, migrating to Temporal Cloud, or hardening agentic workflows before they become expensive incidents.

So when I look at the Replay 2026 schedule, these are the five sessions I’m most interested in.

1) Durable Agents Need Durable Execution

Replay session: “Durable Agents: Long-running AI workflows in a flaky world” — Samuel Colvin, Pydantic

This is probably the session closest to where the market is heading.

Samuel Colvin’s talk promises to explore how durable execution breaks AI agents out of the short-lived “chat paradigm,” using Pydantic AI’s Temporal integration to show what long-running, resilient agents can actually do.

The reason it matters is simple: most agentic systems do not fail because the model is not smart enough. They fail because the execution layer is brittle. State disappears, tool calls hang, approvals wait forever, and non-deterministic workflow logic breaks replay.
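To make the replay point concrete, here is a minimal sketch in Python. It is a toy model we wrote for illustration, not the Temporal SDK: `run_workflow`, `replay`, and `fake_llm` are hypothetical names. It shows the core idea that a durable-execution engine rebuilds state by re-running workflow code against recorded history, so code that calls a model directly inside the workflow can take a different branch on replay. In Temporal, the usual fix is to move that call into an Activity so its result is recorded in history.

```python
import itertools
from typing import Callable, List

# Toy model, not the Temporal SDK: a "workflow" is a function returning the
# commands it issues, and "history" is what the first execution recorded.
Workflow = Callable[[], List[str]]

_llm_calls = itertools.count()


def fake_llm() -> str:
    """Hypothetical model call whose answer changes between invocations."""
    return "search" if next(_llm_calls) % 2 == 0 else "browse"


def run_workflow(workflow: Workflow) -> List[str]:
    """First execution: the emitted commands become the recorded history."""
    return workflow()


def replay(workflow: Workflow, history: List[str]) -> None:
    """Re-run the workflow code and require it to reproduce the history exactly."""
    replayed = workflow()
    for step, (expected, actual) in enumerate(zip(history, replayed)):
        if expected != actual:
            raise RuntimeError(
                f"non-determinism at step {step}: history={expected!r}, code={actual!r}"
            )
    if len(replayed) != len(history):
        raise RuntimeError(
            f"non-determinism: history has {len(history)} commands, code produced {len(replayed)}"
        )


def deterministic_agent() -> List[str]:
    # Same inputs, same commands, on every replay.
    return ["call_tool:search", "call_tool:summarize", "wait_signal:approval"]


def nondeterministic_agent() -> List[str]:
    # Calls the "model" inside workflow code, so replay can choose a different tool.
    return [f"call_tool:{fake_llm()}", "wait_signal:approval"]
```

The deterministic agent replays cleanly. The agent that asks the "model" inside workflow code records `call_tool:search` on its first run, produces `call_tool:browse` on replay, and `replay` raises.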

At Xgrid, we have documented 11 recurring production failure patterns in Temporal-based AI agents, which is why our engineers focus so heavily on retries, signals, heartbeats, and observability from day one.
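Retries and heartbeats are mostly configuration, but the arithmetic is worth seeing once. The sketch below is a standalone Python model, assuming exponential backoff where retry n waits `initial_interval * backoff_coefficient ** (n - 1)`, capped at `maximum_interval`, which is how we understand Temporal's retry policy to behave. The field names echo Temporal's retry options, but `RetryPolicy` here is our own dataclass with illustrative defaults, and `heartbeat_overdue` is a hypothetical helper showing how a heartbeat timeout surfaces a hung tool call.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass(frozen=True)
class RetryPolicy:
    # Standalone model: field names echo Temporal's retry options, but this is
    # not the SDK class, and these defaults are illustrative, not Temporal's.
    initial_interval: timedelta = timedelta(seconds=1)
    backoff_coefficient: float = 2.0
    maximum_interval: timedelta = timedelta(seconds=100)
    maximum_attempts: int = 5  # total attempts, counting the first one


def retry_delays(policy: RetryPolicy) -> List[timedelta]:
    """Wait before each retry, i.e. before attempts 2..maximum_attempts."""
    delays = []
    for retry_number in range(1, policy.maximum_attempts):
        raw = policy.initial_interval.total_seconds() * (
            policy.backoff_coefficient ** (retry_number - 1)
        )
        # Cap the exponential growth so a long outage does not push waits to hours.
        capped = min(raw, policy.maximum_interval.total_seconds())
        delays.append(timedelta(seconds=capped))
    return delays


def heartbeat_overdue(last_heartbeat_s: float, now_s: float, heartbeat_timeout: timedelta) -> bool:
    """Hypothetical helper: True once an activity has been silent longer than its timeout.

    A tool call that hangs stops heartbeating, so it is detected after
    heartbeat_timeout instead of after a much longer overall timeout.
    """
    return now_s - last_heartbeat_s > heartbeat_timeout.total_seconds()
```

With the defaults above, `retry_delays(RetryPolicy())` yields waits of 1, 2, 4, and 8 seconds; lowering `maximum_interval` to 5 seconds and allowing 6 attempts yields 1, 2, 4, 5, and 5 seconds.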

2) The Best Teams Study Mistakes Before They Hit Them

Replay session: “100 Temporal Mistakes (and how to avoid them)” — Jacob LeGrone, Datadog

Some talks are interesting because they are visionary. Others are valuable because they compress years of pain into 45 minutes.

This one looks like the latter, and that is exactly why I want it on my calendar.

Datadog’s session focuses on the common pitfalls teams hit when using Temporal to build production systems, and that matters because production failures in Temporal are usually subtle. A workflow can still be “running” while the business process is stalled. Standard APM traces break at async boundaries. Replay can duplicate logs and distort metrics if you are not instrumenting carefully.
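The duplicate-log problem is easiest to see in a toy model. In the sketch below (plain Python, not the Temporal SDK; `ReplaySafeLogger`, `agent_workflow`, and `resume_after_crash` are hypothetical names), a worker crashes after two steps, a new worker replays those steps from history, and only the logger's replay flag keeps them from being logged twice. As we understand it, Temporal's SDKs expose a comparable replay flag and replay-aware workflow loggers; this class only illustrates the idea.

```python
from typing import List


class ReplaySafeLogger:
    """Hypothetical logger that drops lines emitted while history is being replayed."""

    def __init__(self) -> None:
        self.lines: List[str] = []
        self.is_replaying = False

    def info(self, msg: str) -> None:
        # During replay the workflow re-executes steps that already ran and were
        # already logged, so emitting again would double-count them.
        if not self.is_replaying:
            self.lines.append(msg)


def agent_workflow(log: ReplaySafeLogger, steps: List[str]) -> None:
    for step in steps:
        log.info(f"completed {step}")


def resume_after_crash(log: ReplaySafeLogger, recorded: List[str], remaining: List[str]) -> None:
    # A new worker first replays the recorded steps, then continues with new work.
    log.is_replaying = True
    agent_workflow(log, recorded)
    log.is_replaying = False
    agent_workflow(log, remaining)
```

After one crash and resume, the log holds three lines (plan, search, summarize). A logger without the replay check would hold five, and any metric derived from it would be inflated the same way.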

That is why Xgrid emphasizes Search Attributes, visibility queries, debugging patterns, and replay-safe telemetry: clients do not need more workflows; they need fewer blind spots.

3) Migration is Never Just Infrastructure

Replay session: “LinkedIn’s reliable migration of 3 million cpu cores of compute to Kubernetes by using Temporal” — Ankit Rajesh Methwani, LinkedIn

This is the kind of headline that makes infrastructure leaders stop scrolling.

LinkedIn’s talk is about using Temporal workflows to migrate more than 3 million CPU cores across 2,000+ microservices, while sharing the design decisions, integrations, and lessons learned from that effort.

Migration often gets framed as a lift-and-shift problem, but long-running workflows turn it into a state-management problem: in-flight executions cannot simply be exported, paused, and resumed elsewhere.

That is why Xgrid keeps pushing traffic-shifting, dual-running, graceful draining, and rollback-first cutovers. We have published that playbook, and in one public case study we helped a scale-up move production AI workflows to Temporal Cloud with zero downtime and 99.99% availability.
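Because in-flight executions cannot be moved, a cutover mostly controls where new executions start. The sketch below models that in plain Python under our own assumptions: `CutoverRouter`, the "source" and "target" labels, and the hash bucketing are hypothetical, not a Temporal API. New starts are shifted by percentage, running workflows stay pinned to the environment they started on, rollback only redirects new starts, and the source counts as drained when nothing is pinned there.

```python
import hashlib
from typing import Dict


class CutoverRouter:
    """Hypothetical router for a rollback-first cutover between two environments."""

    def __init__(self) -> None:
        self.percent_to_target = 0
        self.pinned: Dict[str, str] = {}  # workflow_id -> environment it started on

    def shift(self, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self.percent_to_target = percent

    def rollback(self) -> None:
        # Redirects new starts only; workflows already running on the target stay there.
        self.shift(0)

    @staticmethod
    def _bucket(workflow_id: str) -> int:
        # Stable 0..99 bucket, so the same ID always makes the same routing decision.
        digest = hashlib.sha256(workflow_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:2], "big") % 100

    def route(self, workflow_id: str) -> str:
        # A running workflow's history lives where it started; never move it.
        if workflow_id in self.pinned:
            return self.pinned[workflow_id]
        env = "target" if self._bucket(workflow_id) < self.percent_to_target else "source"
        self.pinned[workflow_id] = env
        return env

    def complete(self, workflow_id: str) -> None:
        self.pinned.pop(workflow_id, None)

    def source_drained(self) -> bool:
        return "source" not in self.pinned.values()
```

Dual-running falls out of this model: during the shift both environments have live executions, and the source can be decommissioned only once `source_drained()` returns True. A real cutover would persist the pinning rather than keep it in memory.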

4) Scale is as Much Organizational as Technical

Replay session: “The Path to Temporal General Availability at Netflix” — Rob Zienert, Netflix

If LinkedIn’s session is about migration at scale, Netflix’s is about adoption at scale.

Rob Zienert’s talk promises a candid account of how Temporal became generally available in Netflix’s internal paved road after years as a critical technology, including the organizational challenges and lessons learned in driving change inside a large enterprise.

This is a session I am especially curious about because every platform story eventually becomes an organizational story. A few experts knowing Temporal well is not the same as an organization trusting it broadly. The hard problems become standards, enablement, versioning discipline, and operational ownership.
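Versioning discipline is the clearest example of a standard that has to be shared across teams. Temporal's SDKs let you guard changed workflow logic with a patch marker (in the Python SDK, `workflow.patched`), so executions that started before a deploy keep replaying the old path. The sketch below is a toy model of that idea, not the SDK: `ToyWorkflowContext` and `approval_workflow` are hypothetical, and real patch semantics cover more cases, such as deprecating a patch once old executions have finished.

```python
from typing import List, Set


class ToyWorkflowContext:
    """Hypothetical stand-in for a workflow context that records patch markers."""

    def __init__(self, history_markers: Set[str], replaying: bool) -> None:
        self.history_markers = history_markers
        self.replaying = replaying
        self.recorded: Set[str] = set()

    def patched(self, patch_id: str) -> bool:
        if self.replaying:
            # An old execution takes the new branch only if its history says it did.
            return patch_id in self.history_markers
        # A fresh execution takes the new branch and records a marker for future replays.
        self.recorded.add(patch_id)
        return True


def approval_workflow(ctx: ToyWorkflowContext) -> List[str]:
    steps = ["draft"]
    if ctx.patched("add-policy-check"):
        steps.append("policy_check")  # step introduced by a later deploy
    steps.append("human_approval")
    return steps
```

Without the guard, an execution started before the deploy would reach `policy_check` on replay, a command its history never recorded, and fail as non-deterministic.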

We see the same pattern in client work: getting one workflow into production matters, but creating a repeatable operating model for many teams is where durable execution compounds.

5) Self-Service is the Next Maturity Curve

Replay session: “From Bottlenecks to Self-Service: How Duolingo Built Workflow-as-a-Service with Temporal Nexus” — Zhihao Wang, Duolingo

Duolingo’s talk describes how it built a self-service platform on Temporal Nexus, with cross-namespace secure workflows, human-in-the-loop controls, and a path from one team’s tool to an organization-wide workflow-as-a-service platform.

Once a team proves Temporal on a few high-value use cases, the next question is no longer “Does it work?” It is “How do we make this usable, governed, and scalable across the company?”

That is where workflow architecture turns into platform architecture.

The conversation moves from individual workflows to standards, namespace design, observability, governance, and the guardrails multiple teams need to build safely. Duolingo’s story looks directly relevant to that next stage of maturity.
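One concrete guardrail in that list is controlling who may call what across namespaces. As we understand Temporal Nexus, a team exposes operations through an endpoint that targets its namespace, and Temporal Cloud lets the endpoint owner restrict which caller namespaces may use it. The sketch below models only that allowlist check in plain Python; `Endpoint`, `EndpointRegistry`, and the example namespaces are hypothetical, are not Nexus APIs, and do not describe Duolingo's design.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical model of a cross-namespace endpoint and who may call it."""
    name: str
    target_namespace: str
    operations: FrozenSet[str]
    allowed_callers: FrozenSet[str]


class EndpointRegistry:
    def __init__(self) -> None:
        self._endpoints: Dict[str, Endpoint] = {}

    def register(self, endpoint: Endpoint) -> None:
        # One owner per endpoint name keeps the contract unambiguous.
        if endpoint.name in self._endpoints:
            raise ValueError(f"endpoint {endpoint.name!r} already registered")
        self._endpoints[endpoint.name] = endpoint

    def authorize(self, caller_namespace: str, endpoint_name: str, operation: str) -> str:
        """Return the target namespace if the call is allowed, else raise PermissionError."""
        ep = self._endpoints.get(endpoint_name)
        if ep is None:
            raise PermissionError(f"unknown endpoint {endpoint_name!r}")
        if caller_namespace not in ep.allowed_callers:
            raise PermissionError(f"{caller_namespace!r} may not call {endpoint_name!r}")
        if operation not in ep.operations:
            raise PermissionError(f"{endpoint_name!r} does not expose {operation!r}")
        return ep.target_namespace
```

The design choice worth noting is that callers never name the target namespace directly; they name an endpoint, and the owning team decides both what it exposes and who may reach it.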

Why This Matters Beyond the Sessions

What matters now is not just what’s new. It’s what breaks, what scales, and what makes durable execution possible inside production organizations.

That is the lens I’ll be bringing to Replay.

I’m especially looking forward to conversations around agentic AI reliability, workflow observability, zero-downtime migration, and what it really takes to build systems that remain correct long after the demo ends.

If you’ll be at Replay, let’s connect in San Francisco. I’d be glad to discuss your first production workflow, a migration you can’t afford to get wrong, or the agentic AI challenges that are starting to look a lot more like distributed systems problems.

About The Author(s)

Abdullah Shah | CEO / CTO

Established in 2012, Xgrid has a history of delivering a wide range of intelligent and secure cloud infrastructure, user interface, and user experience solutions. Our strength lies in our team and its ability to deliver end-to-end solutions using cutting-edge technologies.
