Salesforce Data Migration Strategy — A Practical Playbook (2026)

Published 12 June 2026 · 13 min read · Advanced

Salesforce data migration is where implementation projects quietly succeed or visibly fail. The configuration can be flawless, but if the cutover loses relationships, double-loads records, or times out on lock contention at 2am, the project is remembered as the migration that went wrong. This playbook is the field-tested sequence (load order, keys, tooling, performance levers, and reconciliation) for moving real data volumes into Salesforce. Current to Summer ’26.

This is the data side of a build; pair it with the platform-shaping decisions in the architecture decision framework, since data model choices made there dictate everything below.

Phase the migration — never one big push

A migration is a project with phases, not a button:

1. Profile the source: counts per object, relationship cardinality, data quality, field mappings. You cannot plan a load you have not measured.
2. Design the mapping: source field → target field, with transformation rules and a documented external-ID strategy per object.
3. Build and rehearse in a full-copy sandbox at production volume.
4. Reconcile the dry run: fix the mapping, performance, and ordering issues it surfaces.
5. Cut over in production with a tight runbook and a rollback plan.
6. Reconcile production and re-enable automation.

The teams that skip straight to step 5 are the teams that discover lock contention exists.

Load order follows the dependency graph

Records must be loaded so that every parent exists before the children referencing it. From the relationship diagram, the typical order:

Accounts → Contacts → Opportunities → Opportunity Line Items, with all lookup/master-detail parents loaded ahead of dependents, and junction objects after both of their parents. Map this explicitly. A load that inserts Contacts before Accounts either fails on the required lookup or orphans the records — and orphan cleanup mid-cutover burns the time you do not have.
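One way to make the load order explicit is to write the dependency map down and let a topological sort derive the sequence. The sketch below is a minimal illustration using Python's standard `graphlib`; the `DEPENDS_ON` map is a simplified, hypothetical example (price book objects are omitted), and the real map should come from your own relationship diagram.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified dependency map: each object lists the parents
# it references. Derive the real one from your relationship diagram.
DEPENDS_ON = {
    "Account": set(),
    "Contact": {"Account"},
    "Opportunity": {"Account"},
    "OpportunityLineItem": {"Opportunity"},
    # Junction-style object: must load after both of its parents.
    "OpportunityContactRole": {"Opportunity", "Contact"},
}


def load_order(depends_on):
    """Return object names so that every parent precedes its dependents.

    Raises graphlib.CycleError when the relationships are circular, which
    usually means one lookup has to be loaded blank and back-filled later.
    """
    return list(TopologicalSorter(depends_on).static_order())


order = load_order(DEPENDS_ON)
print(order)
```

The relative order of unrelated objects is not significant; only the parent-before-child guarantee is.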

External IDs make relationships sane

The naive approach loads parents, captures every new Salesforce ID, maps them back to the source, then loads children against the mapped IDs — a fragile two-pass dance. The disciplined approach: put an external ID field on each object holding the source-system key, then upsert children against the parent’s external ID directly.

Account.Legacy_Id__c           = "ACC-10293"   (external ID)
Contact.Account.Legacy_Id__c   = "ACC-10293"   (resolves the standard Account lookup by key; a custom lookup would use its __r relationship name)

No Salesforce-ID capture, no remapping pass, and the load becomes idempotent — re-running it updates rather than duplicates. External IDs are the single highest-leverage decision in a migration; make them mandatory on every migrated object. (For how IDs and key prefixes work underneath, see the object key prefixes reference.)
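As a rough illustration of the external-ID approach, the sketch below builds a Contact upsert file in which the parent is referenced by its legacy key rather than by a Salesforce ID. It assumes the Bulk API CSV convention of a `RelationshipName.ExternalIdField` column header; `Legacy_Id__c`, the source row shape, and the sample names are assumptions made for the example.

```python
import csv
import io

# Hypothetical source rows; keys and values are illustrative only.
source_contacts = [
    {"id": "CON-501", "last_name": "Rivera", "account_id": "ACC-10293"},
    {"id": "CON-502", "last_name": "Okafor", "account_id": "ACC-10293"},
]


def contacts_to_upsert_csv(rows):
    """Build a CSV whose parent column points at the Account by external ID."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # First column: the Contact's own external ID (the upsert key).
    # Last column: relationship name + the parent's external ID field.
    writer.writerow(["Legacy_Id__c", "LastName", "Account.Legacy_Id__c"])
    for row in rows:
        writer.writerow([row["id"], row["last_name"], row["account_id"]])
    return buf.getvalue()


csv_text = contacts_to_upsert_csv(source_contacts)
print(csv_text)
```

Because the file carries only source-system keys, the same file can be re-run after a partial failure without a remapping step.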

Bulk API 2.0 vs Data Loader

Data Loader can run on the same Bulk API engine, so this is a tooling-and-scale decision, not a capability one:

Data Loader: desktop, GUI or CLI. It uses the SOAP API by default and switches to the Bulk API when that option is enabled in its settings, which is the mode to use for large loads. Right for one-off loads up to a few million records and for analysts who aren’t writing code.
Bulk API 2.0 directly: for very high volumes, repeatable pipelines, programmatic transformation, and orchestration. Right when the migration is a process, not an event.

With the Bulk option enabled, both process records in parallel batches and both are subject to the org's Bulk API limits. Choose Data Loader for a single cutover, Bulk API 2.0 for an ongoing or scripted pipeline.
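For the scripted route, a Bulk API 2.0 ingest job comes down to three REST calls: create the job, upload the CSV, and mark the upload complete. The sketch below only describes those calls and sends nothing; authentication, the instance URL, and status polling are left out, and `API_VERSION` and the `<jobId>` placeholder are assumptions to replace with real values.

```python
import json

API_VERSION = "v60.0"  # assumption: substitute a version your org supports


def ingest_job_requests(sobject, external_id_field, job_id="<jobId>"):
    """Describe the REST calls of a Bulk API 2.0 upsert job as tuples.

    Each tuple is (HTTP method, path, body). Nothing is sent here.
    """
    base = f"/services/data/{API_VERSION}/jobs/ingest"
    create_body = {
        "object": sobject,
        "operation": "upsert",
        "externalIdFieldName": external_id_field,
        "contentType": "CSV",
        "lineEnding": "LF",
    }
    return [
        # 1. Create the job; the response carries the real job id.
        ("POST", base, json.dumps(create_body)),
        # 2. Upload the CSV data for that job.
        ("PUT", f"{base}/{job_id}/batches", "<csv data>"),
        # 3. Signal that the upload is finished so processing can start.
        ("PATCH", f"{base}/{job_id}", json.dumps({"state": "UploadComplete"})),
    ]


for method, path, _body in ingest_job_requests("Contact", "Legacy_Id__c"):
    print(method, path)
```

Keeping the request plan separate from the HTTP client makes it easy to log or dry-run a pipeline before pointing it at an org.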

Performance levers for large volumes

At large data volumes the load itself becomes the engineering problem:

Bypass automation. Triggers, flows, validation rules, and workflow rules firing on millions of inserts multiply runtime and limit exposure. Gate automation behind a bypass switch (a custom setting or custom permission) that each piece of automation checks, flip it for the load, and re-enable it afterward. If the source data has already been validated against the target rules, re-validating it record-by-record buys nothing.
Defer sharing calculation. A large insert into objects with complex sharing recalculates sharing per record. Deferred sharing calculation suspends that during the load and runs one recalculation afterward — often the largest single speedup available.
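Right-size and order batches against lock contention. UNABLE_TO_LOCK_ROW is the signature large-load failure: parallel batches updating children of the same parent contend for the parent lock. Group records so one parent’s children sit in one batch, or run serial mode for the contended object. Serial mode is a setting of the original Bulk API and of Data Loader's Bulk API mode; Bulk API 2.0 creates its own batches, so there the lever is sorting or splitting the data by parent. Serial is slower but finishes; parallel-with-contention fails and restarts.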
Plan duplicate rules. Active alert/block duplicate rules can reject legitimately pre-deduplicated source data and stall the load. Decide deliberately — usually relax or bypass during the load (data is clean at source) and enforce going forward.
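The grouping lever for lock contention can be sketched as a small packing routine: keep every parent's children together and start a new batch only at a parent boundary. This is an illustrative helper, not a Salesforce API; `batch_by_parent`, the `parent` key, and the batch size are assumptions, and it applies where the tool lets you control batch or job boundaries (elsewhere, sorting the file by parent is the nearest equivalent).

```python
from collections import defaultdict


def batch_by_parent(records, parent_key, max_batch_size):
    """Pack records into batches without splitting any parent's children.

    A parent with more children than max_batch_size gets a batch of its
    own that exceeds the limit; such skewed parents are candidates for a
    serial load instead.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[parent_key]].append(record)

    batches, current = [], []
    for children in groups.values():
        # Close the current batch rather than split this parent across two.
        if current and len(current) + len(children) > max_batch_size:
            batches.append(current)
            current = []
        current.extend(children)
    if current:
        batches.append(current)
    return batches


# Six child rows under three hypothetical parents A, B, and C.
records = [{"parent": p, "n": i} for i, p in enumerate("AAABBC")]
batches = batch_by_parent(records, "parent", max_batch_size=4)
print([len(b) for b in batches])
```

No parent appears in more than one batch, so two batches running in parallel never compete for the same parent lock.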

Reconciliation — the step that earns trust

A load that reports “success” has proven the job ran, not that the data is correct. Reconcile:

Counts per object, source vs target — the first and cheapest check
Field fidelity on a representative sample — dates, currencies, picklist mappings, encoding
Relationship integrity — children resolved to the right parents, no orphans
Rollups and reports — summary fields and key reports match expected totals

Do all of this in the sandbox dry run first. The dry run exists to surface the lock, limit, and timing problems that only appear at production volume — discovering them during the real cutover is the failure mode the whole playbook is designed to prevent.
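The two cheapest checks, counts and orphans, are easy to script once source and target extracts are available. The sketch below assumes both sides have already been reduced to plain Python structures; the function names, field names, and sample figures are illustrative, not taken from any real load.

```python
def reconcile_counts(source_counts, target_counts):
    """Return {object: (source, target)} for every object whose counts differ."""
    objects = sorted(set(source_counts) | set(target_counts))
    return {
        name: (source_counts.get(name, 0), target_counts.get(name, 0))
        for name in objects
        if source_counts.get(name, 0) != target_counts.get(name, 0)
    }


def find_orphans(children, parent_keys, parent_field):
    """Return child rows whose parent key is missing from the loaded parents."""
    known = set(parent_keys)
    return [row for row in children if row.get(parent_field) not in known]


# Illustrative figures only.
mismatches = reconcile_counts(
    {"Account": 1200, "Contact": 5400},
    {"Account": 1200, "Contact": 5397},
)
orphans = find_orphans(
    [
        {"Legacy_Id__c": "CON-501", "Account_Legacy_Id": "ACC-10293"},
        {"Legacy_Id__c": "CON-777", "Account_Legacy_Id": "ACC-99999"},
    ],
    parent_keys={"ACC-10293"},
    parent_field="Account_Legacy_Id",
)
print(mismatches)  # {'Contact': (5400, 5397)}
print([row["Legacy_Id__c"] for row in orphans])  # ['CON-777']
```

An empty result from both functions is the condition to record in the dry-run sign-off; field fidelity and rollup checks still need a sampled comparison on top.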

The cutover runbook

The production cutover is a timed, ordered, rehearsed sequence: freeze source writes, run loads in dependency order with automation bypassed, recalculate sharing, re-enable automation, reconcile, then open the org. Every step has an owner, an expected duration from the dry run, and a rollback trigger. The migration that has been rehearsed end-to-end in a full sandbox is boring on cutover night — and boring is exactly the goal.
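A runbook of this shape can be kept as data so that planned durations and rollback triggers are checked rather than remembered. The sketch below is one possible representation; the owners and minute values are placeholders, and the overrun-based rollback rule (`rollback_factor`) is an assumption made for illustration, not a rule from this playbook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunbookStep:
    name: str
    owner: str                    # placeholder role names below
    expected_minutes: int         # placeholder; use the dry-run measurement
    rollback_factor: float = 1.5  # hypothetical trigger: overrun beyond 150%

    def should_roll_back(self, elapsed_minutes: float) -> bool:
        """True once the step has overrun its rehearsed duration by the factor."""
        return elapsed_minutes > self.expected_minutes * self.rollback_factor


# Step order follows the cutover sequence described above.
RUNBOOK = [
    RunbookStep("Freeze source writes", "Source system owner", 10),
    RunbookStep("Run loads in dependency order, automation bypassed", "Migration lead", 180),
    RunbookStep("Recalculate sharing", "Platform admin", 60),
    RunbookStep("Re-enable automation", "Platform admin", 10),
    RunbookStep("Reconcile", "Data lead", 45),
    RunbookStep("Open the org", "Project owner", 5),
]

planned_minutes = sum(step.expected_minutes for step in RUNBOOK)
print(f"Planned cutover window: {planned_minutes} minutes")
```

Comparing elapsed time against the rehearsed figure during the cutover turns the rollback decision into a check agreed in advance instead of a debate at 2am.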

Frequently asked questions

What order should I load objects in during a Salesforce data migration?

Load in dependency order: parents before children. Typically Accounts before Contacts before Opportunities, with all lookup and master-detail parents loaded before the records that reference them. Map this from your relationship diagram before writing a single load job.

What is an external ID and why does it matter for migration?

An external ID is a field marked to hold a record's identifier from the source system. It lets you upsert — insert or update by that key — and resolve relationships without Salesforce IDs, so child records can reference parents by their legacy keys instead of requiring a two-pass ID mapping.

Should I use Bulk API 2.0 or Data Loader for migration?

Data Loader can use the Bulk API under the hood (it is a setting; the default is the SOAP API) and is fine for one-off loads up to a few million records. For very high volumes, repeatable pipelines, or programmatic control, call Bulk API 2.0 directly. The decision is about automation and scale, not capability: with the Bulk option enabled, both run on the same engine.

How do I avoid record lock errors during a large data load?

Group records by their parent so the same parent is not updated by parallel batches, or run the job in serial mode. Lock contention on shared parents (UNABLE_TO_LOCK_ROW) is the most common cause of large-load failures.

Should I disable triggers and flows during data migration?

Usually yes, behind a controlled bypass switch such as a custom setting or custom permission checked by your automation. Migrated data is already validated at source, and running full automation on millions of records multiplies runtime and limit risk. Re-enable and reconcile afterward.

How do I validate a Salesforce data migration?

Reconcile record counts per object between source and target, spot-check field-level fidelity on a sample, verify relationships resolved correctly, and confirm key rollups and reports match expected values. Do this in a full sandbox dry run before the production cutover.