Preserving Workflow History During a Platform Migration: Patterns That Work in Production
What happens to your audit trail, debugging capability, and compliance records when you move Temporal clusters — and the concrete strategies that ensure none of it disappears
TL;DR: When you migrate a Temporal cluster — from self-hosted to Temporal Cloud, across regions, or between infrastructure providers — your closed workflow history does not move with you. There is no native export-import mechanism for workflow event histories between clusters. This post covers what history preservation actually means in practice, the three strategies that work in production (archival, read-only cluster retention, and external indexing), when each is appropriate, and the compliance and debugging tradeoffs that should drive your decision.
The History Problem Nobody Plans For
Platform migrations are planned carefully. Worker deployment strategies, namespace provisioning, traffic cutover sequencing, schedule migration — teams that do these well spend weeks on the mechanics. What almost never makes it into the migration plan is a clear answer to a simple question: what happens to our closed workflow history?
The answer, in most migrations, is: it stays on the old cluster until the retention period expires, then it is gone.
For many teams, that is acceptable. For teams in regulated industries, or teams where workflow history is the audit trail for financial transactions, multi-party approvals, or data processing records, losing that history on a cluster decommission schedule is not acceptable. And for any team that has ever debugged a production incident by replaying a workflow’s event history, losing months of closed workflow data at migration time represents a genuine operational risk.
The problem is compounded by timing. Migrations tend to happen when teams are growing, infrastructure costs are rising, or a strategic platform shift is underway. The decision to migrate and the decision to decommission the old cluster often happen in the same planning cycle. By the time someone asks “what happens to our old workflow histories?”, the decommission timeline is already set.
The Fundamental Constraint (Again)
Temporal has no native mechanism to export a closed workflow's event history from one cluster and import it into another. Closed workflow histories are stored in the cluster's persistence layer (Cassandra or PostgreSQL for self-hosted; Temporal's managed storage for Cloud). They are not portable artifacts. Whatever strategy you choose for history preservation must work around this constraint, not through it.
What “Workflow History” Actually Contains and Why It Matters
Before choosing a preservation strategy, it is worth being precise about what is in a workflow’s event history and which parts of it different stakeholders actually need.
A Temporal workflow’s event history is an append-only log of every command the workflow issued and every event the system recorded: activity scheduled, activity started, activity completed (with the return value), signals received (with their payloads), timers fired, child workflows started, and so on. It is a complete, deterministic record of everything that happened during the workflow’s execution.
Different stakeholders need different parts of this for different reasons:
| Stakeholder | What They Need From History | How Long They Need It |
| --- | --- | --- |
| On-call engineer | Full event sequence, activity inputs/outputs, retry timeline | 30–90 days post-incident |
| Support team | Workflow status, key business identifiers, outcome | As long as customer disputes exist |
| Compliance / audit | Proof that a process ran, when, with what outcome | 1–7 years depending on regulation |
| Product / BI | Workflow counts, durations, failure rates by type | Aggregated; raw history not needed |
| Legal / e-discovery | Full record of specific workflows on demand | Indefinite for certain transaction types |
This table matters because the right preservation strategy depends heavily on which stakeholder needs are non-negotiable for your organization. A team that only needs post-incident debugging has very different requirements from a team that needs to produce a full workflow execution record for a financial regulator two years after the fact.
Most teams, when pressed, discover that they actually need different strategies for different workflow types. High-value financial workflows need long-term archival. Background processing workflows need 30 days for debugging. Analytics workflows need aggregated metrics, not raw history. Treating all workflow history with a single preservation strategy is the source of both over-spending and under-protecting.
Strategy 1: Temporal’s Native Archival
Temporal has a built-in archival system designed exactly for this use case. When archival is enabled on a namespace, Temporal automatically exports closed workflow histories to an external storage backend — typically S3 or GCS — when the workflow closes. The exported history is stored as a structured JSON file that can be retrieved and replayed independently of the cluster.
How Archival Works
Archival operates at the namespace level. You configure an archival URI (an S3 bucket path or GCS bucket path) and Temporal’s Archival service writes the workflow event history JSON to that URI when each workflow closes. The filename is deterministic: it encodes the namespace, workflow ID, and run ID, making individual histories retrievable without a search index.
Critically, archival happens when the workflow closes, not when the cluster is decommissioned. This means that if archival is enabled before your migration begins, every workflow that closes during the migration window — on either the old or the new cluster — will have its history preserved automatically. The archival bucket outlives both clusters.
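For self-hosted clusters, archival is configured in two places in the server's static configuration: a cluster-level archival provider and a per-namespace default. The sketch below assumes an S3 backend; the bucket name and region are placeholders, and the exact keys should be verified against the Temporal server configuration reference for your version:

```yaml
# Cluster-level archival provider (server static config)
archival:
  history:
    state: "enabled"
    enableRead: true
    provider:
      s3store:
        region: "us-east-1"

# Default archival settings applied to newly created namespaces
namespaceDefaults:
  archival:
    history:
      state: "enabled"
      URI: "s3://your-bucket/temporal-archival"
```

Note that `namespaceDefaults` only affects namespaces created after the change; existing namespaces must have archival enabled individually.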
Retrieving Archived Histories
Once archived, a workflow history can be retrieved using the Temporal CLI or the workflow describe API, which will transparently fetch from the archival backend if the workflow is no longer in the cluster’s active storage:
Archival Limitations
Archival solves the long-term preservation problem cleanly, but it has limitations worth understanding before committing to it as your sole strategy:
- Archival must be enabled before workflows close. You cannot retroactively archive histories of workflows that closed before archival was turned on. This means if you are planning a migration and archival is currently disabled, enabling it today only protects workflows that close from that point forward. Workflows that closed yesterday are not covered.
- Archival on Temporal Cloud has specific constraints. As of the time of writing, Temporal Cloud’s archival support and configuration options differ from self-hosted. Verify the current state against Temporal’s documentation before designing a Cloud-to-Cloud migration strategy that depends on archival.
- Archived histories are not searchable by default. You can retrieve a specific history if you know the workflow ID and run ID. You cannot query “all workflows of type X that closed between date A and date B” against the archival backend without a separate visibility index. This is where Strategy 3 (external indexing) becomes relevant.
Strategy 2: Read-Only Cluster Retention
The simplest strategy for history preservation during migration is also the most operationally conservative: keep the old cluster running in a read-only capacity after the migration cutover, long enough to satisfy your retention and debugging requirements, then decommission it on a schedule driven by business need rather than infrastructure cost pressure.
What Read-Only Retention Means in Practice
After you have completed the migration cutover — all new workflow starts are on the new cluster, the old cluster’s task queues have drained to zero open workflows — you stop the old cluster’s worker pools but keep the Temporal Frontend service running. This gives you:
- Full query access to any closed workflow history on the old cluster via the Temporal UI or CLI
- The ability to retrieve specific event histories for support, audit, or incident investigation
- Temporal visibility queries against closed workflow data (workflow lists, filtered by status, type, or custom search attributes)
- No risk of history loss from an unexpected decommission while engineers are still referencing old histories
Cost of Read-Only Retention
The honest tradeoff of read-only cluster retention is cost. Keeping an old cluster’s Frontend service and persistence layer running costs money. For a self-hosted cluster on EKS, the minimal read-only footprint — one Frontend pod and the RDS instance — runs roughly $300–$500 per month. For a Temporal Cloud namespace, you continue paying retained storage charges on all closed workflows until your configured retention period expires.
The question is whether that cost is justified by the business value of the history it protects. For most teams, the answer is yes for 30–90 days post-migration and no for longer than that. The read-only retention period should be explicitly defined in your migration plan as a business decision, not left open-ended.
When to Use Read-Only Retention
- You have no archival configured and cannot enable it before the migration window closes
- Your team’s primary need is debugging access for the 30–90 days immediately post-migration
- Your workflow history sizes are modest and the retained storage cost is acceptable for the retention window you need
- You are in a regulated environment where you need to demonstrate that history was accessible and queryable for a defined period before decommission
Strategy 3: External History Indexing
For teams that need long-term queryable access to workflow history — not just point-in-time retrieval of a known workflow ID — neither archival alone nor read-only cluster retention is sufficient. Archival gives you retrieval by ID. The old cluster gives you Temporal’s visibility queries while it runs. Neither gives you “give me all payment approval workflows for merchant X between January and March 2024” two years after the cluster was decommissioned.
External history indexing solves this by writing workflow execution metadata — and optionally the full event history — to an external store that outlives any cluster. The two most common approaches are an event-driven export pipeline and a completion-time activity hook.
Approach A: Event-Driven Export via Temporal’s Event Exporter
Temporal supports streaming workflow lifecycle events to an external sink. On self-hosted clusters, this is typically done by connecting a consumer to Temporal’s Kafka integration or using a custom visibility store plugin. On Temporal Cloud, the Export feature allows you to stream workflow closed events to an S3 bucket in a structured format.
The exported payload for each closed workflow includes: namespace, workflow type, workflow ID, run ID, start time, close time, close status, and optionally the full event history. Once in S3, you can index this data using Athena, OpenSearch, or a purpose-built data warehouse table for long-term queryability.
```sql
-- Example: Athena table over exported workflow close events
-- Assumes Temporal Cloud Export writes JSONL to s3://your-bucket/temporal-exports/
CREATE EXTERNAL TABLE temporal_workflow_history (
  namespace STRING,
  workflow_type STRING,
  workflow_id STRING,
  run_id STRING,
  start_time TIMESTAMP,
  close_time TIMESTAMP,
  close_status STRING,
  task_queue STRING,
  search_attributes MAP<STRING, STRING>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://your-bucket/temporal-exports/'
TBLPROPERTIES ('has_encrypted_data'='false');

-- Query: all failed payment workflows for a merchant in Q1 2024
SELECT workflow_id, run_id, start_time, close_time, close_status
FROM temporal_workflow_history
WHERE workflow_type = 'PaymentSagaWorkflow'
  AND close_status = 'Failed'
  AND search_attributes['MerchantID'] = 'merchant-abc'
  AND close_time BETWEEN TIMESTAMP '2024-01-01' AND TIMESTAMP '2024-03-31'
ORDER BY close_time DESC;
```
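If you consume these export records outside Athena — for example, in a Go backfill or indexing job — a small struct mirroring the fields above is enough to parse each JSONL line. The JSON field names below are assumptions matching the hypothetical Athena schema, not Temporal's exact Export format; check a real export object before relying on them:

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// ExportRecord mirrors the hypothetical JSONL schema used in the Athena
// table sketch. Field names are assumptions -- verify them against an
// actual export file before production use.
type ExportRecord struct {
	Namespace        string            `json:"namespace"`
	WorkflowType     string            `json:"workflow_type"`
	WorkflowID       string            `json:"workflow_id"`
	RunID            string            `json:"run_id"`
	StartTime        time.Time         `json:"start_time"`
	CloseTime        time.Time         `json:"close_time"`
	CloseStatus      string            `json:"close_status"`
	TaskQueue        string            `json:"task_queue"`
	SearchAttributes map[string]string `json:"search_attributes"`
}

// parseExportRecord decodes one JSONL line into an ExportRecord.
func parseExportRecord(line []byte) (ExportRecord, error) {
	var r ExportRecord
	if err := json.Unmarshal(line, &r); err != nil {
		return ExportRecord{}, fmt.Errorf("decoding export record: %w", err)
	}
	return r, nil
}

// sampleLine is an illustrative export record for demonstration.
const sampleLine = `{"namespace":"payments","workflow_type":"PaymentSagaWorkflow",` +
	`"workflow_id":"payment-wf-abc123","run_id":"run-1",` +
	`"start_time":"2024-01-15T10:00:00Z","close_time":"2024-01-15T10:05:00Z",` +
	`"close_status":"Failed","task_queue":"payments-tq",` +
	`"search_attributes":{"MerchantID":"merchant-abc"}}`

func main() {
	r, err := parseExportRecord([]byte(sampleLine))
	if err != nil {
		panic(err)
	}
	fmt.Println(r.WorkflowType, r.CloseStatus, r.SearchAttributes["MerchantID"])
}
```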
Approach B: Completion-Time History Export Activity
For teams that do not have access to Temporal’s event export feature, or who need the export to happen synchronously as part of the workflow’s own completion, a history export activity at the end of each workflow is a practical alternative.
The pattern: add a final activity to each critical workflow type that serializes the workflow’s own event history and writes it to your external store before the workflow closes.
```go
import (
	"context"
	"encoding/json"
	"fmt"
	"time"

	enums "go.temporal.io/api/enums/v1"
	historypb "go.temporal.io/api/history/v1"
	"go.temporal.io/sdk/temporal"
	"go.temporal.io/sdk/workflow"
)

// History export activity.
// Called as the final step of high-value workflow types.
func (a *AuditActivities) ExportWorkflowHistory(
	ctx context.Context,
	req HistoryExportRequest,
) error {
	// Fetch the workflow's own history via the Temporal client.
	iter := a.temporalClient.GetWorkflowHistory(
		ctx,
		req.WorkflowID,
		req.RunID,
		false,
		enums.HISTORY_EVENT_FILTER_TYPE_ALL_EVENT,
	)

	var events []*historypb.HistoryEvent
	for iter.HasNext() {
		event, err := iter.Next()
		if err != nil {
			return fmt.Errorf("reading history: %w", err)
		}
		events = append(events, event)
	}

	// Serialize and write to your external store.
	payload, err := json.Marshal(HistoryExport{
		WorkflowID:   req.WorkflowID,
		RunID:        req.RunID,
		WorkflowType: req.WorkflowType,
		ExportedAt:   time.Now().UTC(),
		Events:       events,
	})
	if err != nil {
		return fmt.Errorf("serializing history: %w", err)
	}

	key := fmt.Sprintf(
		"temporal-history/%s/%s/%s.json",
		req.WorkflowType,
		req.WorkflowID,
		req.RunID,
	)
	return a.s3Client.PutObject(ctx, a.bucket, key, payload)
}

// In the workflow: call export as the final activity.
func PaymentSagaWorkflow(ctx workflow.Context, req PaymentRequest) error {
	// ... all business logic activities ...

	// Final step: export history for long-term retention.
	exportCtx := workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: 60 * time.Second,
		RetryPolicy:         &temporal.RetryPolicy{MaximumAttempts: 5},
	})
	info := workflow.GetInfo(ctx)
	// Reference the activity by its registered name so the workflow does
	// not need an activity-struct instance in scope.
	_ = workflow.ExecuteActivity(exportCtx, "ExportWorkflowHistory", HistoryExportRequest{
		WorkflowID:   info.WorkflowExecution.ID,
		RunID:        info.WorkflowExecution.RunID,
		WorkflowType: info.WorkflowType.Name,
	}).Get(exportCtx, nil) // Best-effort: failure here does not fail the workflow
	return nil
}
```
Important: This Activity Reads Its Own In-Progress History
The export activity fetches the workflow history while the workflow is still open (before the final workflow completion event is written). This means the export will not include the final WorkflowExecutionCompleted event. For compliance use cases where the terminal event matters, supplement this with a separate post-close export step, or use Temporal's native archival, which runs after closure.
Choosing the Right Strategy for Your Migration
The three strategies are not mutually exclusive. Most production migrations that take history preservation seriously use a combination. Here is a decision framework:
| Requirement | Archival | Read-Only Retention | External Indexing |
| --- | --- | --- | --- |
| Retrieve a specific workflow by ID | ✓ Yes | ✓ Yes | ✓ Yes |
| Query across workflows by type/date/attribute | ✗ No (without extra tooling) | ✓ Yes (while cluster runs) | ✓ Yes (indefinitely) |
| Survives cluster decommission | ✓ Yes | ✗ No | ✓ Yes |
| Covers workflows closed before migration starts | ✗ Only if enabled before close | ✓ Yes | ✗ Only if indexed before close |
| Works for Temporal Cloud namespaces | ✗ Use Export feature | ✓ Yes | ✓ Yes |
| Replay for determinism testing | ✓ Yes (full event history) | ✓ Yes | Partial (depends on export format) |
| Compliance / long-term audit trail | ✓ Yes (structured JSON) | ✗ Time-limited | ✓ Yes (if full history exported) |
| Engineering effort to implement | Low (config change) | None (operational decision) | Medium–High |
The practical recommendation for most migrations:
- Enable archival immediately on all namespaces, before the migration starts. This is a configuration change with low effort and covers all workflows that close from that point forward. It is the baseline that every migration should have.
- Keep the old cluster in read-only mode for 30–90 days post-migration for debugging access. Define the decommission date upfront and communicate it to the support and on-call teams.
- For high-value workflow types (payments, approvals, compliance-sensitive processes), add the external indexing export activity. This is the only strategy that gives you long-term queryable access after the old cluster is gone and independent of Temporal’s archival availability.
The Failure Modes That Catch Teams Off Guard
1. Assuming Archival Was Already On
The single most common history preservation failure in migrations: a team assumes archival was enabled when the namespace was first provisioned. It was not. Archival is disabled by default on both self-hosted and Temporal Cloud namespaces. By the time someone checks, the migration window has opened, workflows have been closing for months without archival, and the histories from those months are unrecoverable once the old cluster is decommissioned.
Mitigation: add “verify archival status on all namespaces” as step one of every migration planning checklist, before any other migration work begins.
2. Decommissioning Before the Compliance Window Closes
Infrastructure teams and finance teams both want old clusters decommissioned quickly after migration. Engineering teams, under pressure, set a decommission date without checking against the compliance retention requirement for the workflows that ran on that cluster. The cluster goes down. Three months later, a regulator requests records for a transaction that completed two months before the migration. The history is gone.
Mitigation: for any workflow type subject to regulatory retention requirements, the old cluster’s decommission date must be approved by compliance, not just engineering. If the compliance window is two years, either the archival strategy or the external indexing strategy must be in place before decommission.
3. Exporting History Without the Codec
Many production Temporal deployments use a custom Data Converter to encode workflow payloads — for encryption, compression, or custom serialization. When you export a workflow history from such an environment, the event payloads in the export are encoded. Without the matching codec server or Data Converter implementation, the exported JSON is unreadable.
Mitigation: before finalizing your archival or export strategy, verify that your exported histories are readable using your codec server. Store the codec server configuration and keys alongside the archived histories, or ensure your codec server is itself a long-lived service that outlives the Temporal cluster.
```shell
# Verifying an archived history is readable with your codec
# Using the Temporal CLI with a codec server endpoint
temporal workflow show \
  --namespace your-namespace \
  --workflow-id payment-wf-abc123 \
  --codec-endpoint http://your-codec-server:8080

# If the output shows readable payload fields (not base64 blobs),
# your codec is correctly configured and the archive is usable.
#
# If the output shows encoded blobs like:
#   Input: [base64encodedpayload==]
# your codec server is not reachable or not configured correctly.
# Do not decommission until this is resolved.
```
4. Not Testing History Retrieval Before Decommission
Teams validate that the migration is complete by confirming zero open workflows on the old cluster. They do not validate that their preservation strategy actually works by retrieving a representative sample of closed workflow histories through their chosen strategy. The first time they try to retrieve a history is when an incident or audit request comes in after decommission — at which point discovering that archival was misconfigured is a significant problem.
Mitigation: treat history retrieval as a migration exit criterion alongside workflow drain confirmation. Before decommissioning the old cluster, retrieve and verify at least one workflow history per critical workflow type through every retrieval path you plan to use post-decommission.
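Part of that exit criterion can be automated offline. The sketch below checks that an exported history file is well-formed, non-empty, and ends in a terminal event — a cheap sanity check to run over a sample of exports before decommission. The JSON field names (`workflowId`, `events`, `eventType`) are assumptions and must be matched to your actual export format:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// historyEvent is a minimal view of one event in an exported history
// file; only the event type is needed for this check. Field names are
// assumptions -- adjust them to your export format.
type historyEvent struct {
	EventType string `json:"eventType"`
}

type exportedHistory struct {
	WorkflowID string         `json:"workflowId"`
	Events     []historyEvent `json:"events"`
}

// terminalEvents are the event types that close a workflow execution.
var terminalEvents = map[string]bool{
	"WorkflowExecutionCompleted":      true,
	"WorkflowExecutionFailed":         true,
	"WorkflowExecutionCanceled":       true,
	"WorkflowExecutionTerminated":     true,
	"WorkflowExecutionTimedOut":       true,
	"WorkflowExecutionContinuedAsNew": true,
}

// validateExport returns an error if the exported history fails to
// parse, has no events, or does not end in a terminal event.
func validateExport(data []byte) error {
	var h exportedHistory
	if err := json.Unmarshal(data, &h); err != nil {
		return fmt.Errorf("parsing export: %w", err)
	}
	if len(h.Events) == 0 {
		return fmt.Errorf("export for %s has no events", h.WorkflowID)
	}
	last := h.Events[len(h.Events)-1].EventType
	if !terminalEvents[last] {
		return fmt.Errorf("export for %s ends in %s, not a terminal event", h.WorkflowID, last)
	}
	return nil
}

func main() {
	sample := []byte(`{"workflowId":"payment-wf-abc123","events":[` +
		`{"eventType":"WorkflowExecutionStarted"},` +
		`{"eventType":"WorkflowExecutionCompleted"}]}`)
	fmt.Println(validateExport(sample))
}
```

Note that histories produced by the completion-time export activity in Strategy 3 will fail the terminal-event check by design, as discussed in the callout above; apply that check only to exports taken after closure.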
Pre-Migration History Preservation Checklist
| # | Check | Why It Matters |
| --- | --- | --- |
| 1 | Verify archival status on every namespace before migration starts | Archival is off by default; enables protection from day one |
| 2 | Identify workflow types with regulatory or compliance retention requirements | Drives decommission timeline; cannot be decided post-migration |
| 3 | Confirm compliance-approved decommission date before migration begins | Prevents early decommission before audit windows close |
| 4 | Enable archival on all namespaces at least 30 days before cutover | Ensures full coverage for workflows closing during the migration window |
| 5 | Verify archival is writing to the storage backend by checking the bucket after a test workflow closes | Configuration errors are silent until you try to retrieve |
| 6 | For encoded payloads: confirm codec server is reachable and returns readable histories | Unreadable archives are functionally equivalent to no archives |
| 7 | For high-value workflow types: implement and deploy external indexing export activity | Provides queryable long-term access after cluster decommission |
| 8 | Define read-only cluster retention window (30 / 60 / 90 days) and communicate to support and on-call | Prevents accidental early decommission while teams still need access |
| 9 | Retrieve and verify at least one closed workflow history per critical type through each planned retrieval path | Validates preservation before decommission, not after |
| 10 | Document where archived histories live and how to retrieve them in your runbooks | On-call engineers need this at 3 AM, not after a ticket to the platform team |
Closing Thoughts
Workflow history preservation is not a glamorous part of a platform migration. It does not show up on the migration success dashboard, nobody cheers when the archival bucket starts filling up, and the teams that do it right are never noticed — because the histories are simply there when someone needs them.
The teams that do not do it right are noticed very specifically: when a support escalation cannot be closed because the workflow records are gone, when a regulator requests transaction history that no longer exists, or when a production incident cannot be root-caused because the event history from six months ago was on a cluster that was decommissioned last quarter.
The three strategies in this post cover the full range of preservation requirements from “30 days of debugging access” to “queryable audit trail for seven years.” The right combination depends on your workflow types, your regulatory environment, and how much engineering investment is justified by the value of the history you are protecting. The checklist above is where to start that conversation, and it should happen before the migration plan is finalized, not after the cutover date is set.
Is your migration plan missing a history preservation strategy?
If the answer to “what happens to our closed workflow histories after we decommission the old cluster?” is still unclear — that is a gap worth closing before the cutover date is set, not after.
Xgrid offers two entry points depending on where you are:
- Temporal Launch Readiness Review — if you are planning a migration and want a structured review of your workflow inventory, history preservation strategy, and migration plan before you start. Delivered as a fixed-scope, 2-week engagement with a concrete go / no-go scorecard.
- Temporal Reliability Partner — for teams that want a named Temporal expert embedded through the migration window and beyond, owning the history preservation layer and ensuring your archival and indexing strategies are validated before the old cluster goes dark.
Both are fixed-scope. No open-ended retainer required to get started.

