Salesforce data migration is where implementation projects quietly succeed or visibly fail. The configuration can be flawless, but if the cutover loses relationships, double-loads records, or times out on lock contention at 2am, the project is remembered as the migration that went wrong. This playbook is the field-tested sequence (load order, keys, tooling, performance levers, and reconciliation) for moving real data volumes into Salesforce. Current to Summer ’26.
This is the data side of a build; pair it with the platform-shaping decisions in the architecture decision framework, since data model choices made there dictate everything below.
Phase the migration — never one big push
A migration is a project with phases, not a button:
1. Profile the source: counts per object, relationship cardinality, data quality, field mappings. You cannot plan a load you have not measured.
2. Design the mapping: source field → target field, with transformation rules and a documented external-ID strategy per object.
3. Build and rehearse in a full-copy sandbox at production volume.
4. Reconcile the dry run: fix the mapping, performance, and ordering issues it surfaces.
5. Cut over in production with a tight runbook and a rollback plan.
6. Reconcile production and re-enable automation.
The teams that skip straight to step 5 are the teams that discover lock contention exists.
Load order follows the dependency graph
Records must be loaded so that every parent exists before the children referencing it. From the relationship diagram, the typical order:
Accounts → Contacts → Opportunities → Opportunity Line Items, with all lookup/master-detail parents loaded ahead of dependents, and junction objects after both of their parents. Map this explicitly. A load that inserts Contacts before Accounts either fails on the required lookup or orphans the records — and orphan cleanup mid-cutover burns the time you do not have.
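One way to make the ordering explicit is to write the dependency graph down and let a topological sort produce the load sequence. The sketch below uses Python's standard `graphlib`; the `DEPENDS_ON` map is a hypothetical, simplified example and should be replaced with the lookups and master-detail relationships from your own relationship diagram.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each object lists the parents it references.
DEPENDS_ON = {
    "Account": set(),
    "Contact": {"Account"},
    "Opportunity": {"Account"},
    "Product2": set(),
    "Pricebook2": set(),
    "PricebookEntry": {"Product2", "Pricebook2"},
    "OpportunityLineItem": {"Opportunity", "PricebookEntry"},
    # Junction-style object: loads only after both of its parents.
    "OpportunityContactRole": {"Opportunity", "Contact"},
}


def load_order(depends_on):
    """Return objects ordered so that every parent precedes its children."""
    # TopologicalSorter treats each mapping value as the node's predecessors
    # and raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(depends_on).static_order())


order = load_order(DEPENDS_ON)
```

A cycle error here is useful information: it usually means a self-lookup or mutual lookup that has to be loaded in two passes (insert first, then update the reference).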
External IDs make relationships sane
The naive approach loads parents, captures every new Salesforce ID, maps them back to the source, then loads children against the mapped IDs — a fragile two-pass dance. The disciplined approach: put an external ID field on each object holding the source-system key, then upsert children against the parent’s external ID directly.
Account.Legacy_Id__c = "ACC-10293" (external ID)
Contact.Account.Legacy_Id__c = "ACC-10293" (resolves the standard Account lookup by key; a custom lookup field would use its __r relationship name instead)
No Salesforce-ID capture, no remapping pass, and the load becomes idempotent — re-running it updates rather than duplicates. External IDs are the single highest-leverage decision in a migration; make them mandatory on every migrated object. (For how IDs and key prefixes work underneath, see the object key prefixes reference.)
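As a minimal sketch of what this looks like in a load file: in a CSV destined for an upsert, the parent is referenced with a column named `RelationshipName.ExternalIdField`. For the standard Contact-to-Account lookup the relationship name is `Account`; a custom lookup would use its `__r` name. The source-row keys (`contact_key`, `account_key`, and so on) and the sample values are hypothetical.

```python
import csv
import io


def contact_upsert_csv(source_rows):
    """Render source contacts as CSV with the parent resolved by external ID."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # The last column points at the parent Account via its external ID,
    # so no Salesforce record IDs are needed in the file.
    writer.writerow(["Legacy_Id__c", "LastName", "Email", "Account.Legacy_Id__c"])
    for row in source_rows:
        writer.writerow(
            [row["contact_key"], row["last_name"], row["email"], row["account_key"]]
        )
    return buf.getvalue()


csv_text = contact_upsert_csv(
    [
        {
            "contact_key": "CON-55821",  # hypothetical source-system key
            "last_name": "Rivera",
            "email": "rivera@example.com",
            "account_key": "ACC-10293",
        }
    ]
)
```

Because the file is generated purely from source keys, the same file can be re-run after a partial failure without a remapping step.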
Bulk API 2.0 vs Data Loader
Both can run on the same Bulk engine, so this is a tooling-and-scale decision, not a capability one:
- Data Loader: desktop, GUI or CLI. It uses the SOAP-based API by default and switches to the Bulk API when that option is enabled in its settings, which is what you want at migration volumes. Right for one-off loads up to a few million records and for analysts who aren’t writing code.
- Bulk API 2.0 directly: for very high volumes, repeatable pipelines, programmatic transformation, and orchestration. Right when the migration is a process, not an event.
Both load in parallelizable jobs; both respect the same limits. Choose Data Loader for a single cutover, Bulk API 2.0 for an ongoing or scripted pipeline.
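For the scripted route, a Bulk API 2.0 ingest job is opened with a small JSON request. The sketch below only builds that request and does not send it; authentication and the HTTP client are left out. The instance URL is a placeholder and `API_VERSION` is an assumption to replace with a version your org supports.

```python
import json

API_VERSION = "v60.0"  # assumption: substitute the API version your org supports


def ingest_job_request(instance_url, sobject, external_id_field):
    """Build (but do not send) the request that opens a Bulk API 2.0 upsert job."""
    url = f"{instance_url}/services/data/{API_VERSION}/jobs/ingest"
    body = {
        "object": sobject,
        "operation": "upsert",
        "externalIdFieldName": external_id_field,
        "contentType": "CSV",
        "lineEnding": "LF",
    }
    return url, json.dumps(body)


# Placeholder instance URL; POST the payload with an OAuth bearer token.
url, payload = ingest_job_request(
    "https://example.my.salesforce.com", "Contact", "Legacy_Id__c"
)
```

After the job is created, the CSV is uploaded to the job's `batches` resource and the job state is patched to `UploadComplete` to start processing; results are then read from the job's successful and failed results resources.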
Performance levers for large volumes
At large data volumes the load itself becomes the engineering problem:
- Bypass automation. Triggers, flows, validation rules, and workflow on millions of inserts multiply runtime and limit exposure. Gate automation behind a bypass switch (custom setting or custom permission) the automation checks, flip it for the load, re-enable after. The source data is already validated — re-validating it record-by-record buys nothing.
- Defer sharing calculation. A large insert into objects with complex sharing recalculates sharing per record. Deferred sharing calculation suspends that during the load and runs one recalculation afterward — often the largest single speedup available.
- Right-size and order batches against lock contention. UNABLE_TO_LOCK_ROW is the signature large-load failure: parallel batches updating children of the same parent contend for the parent lock. Group records so one parent’s children sit in one batch, or run serial mode for the contended object. Serial is slower but finishes; parallel-with-contention fails and restarts.
- Plan duplicate rules. Active alert/block duplicate rules can reject legitimately pre-deduplicated source data and stall the load. Decide deliberately: usually relax or bypass during the load (data is clean at source) and enforce going forward.
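The grouping advice for lock contention can be sketched as a small packing routine: collect children by parent key, then fill batches without ever splitting one parent's children across two. The function name, the `account_key` field, and the batch size are hypothetical; this is an illustration of the idea, not a tuned implementation.

```python
from collections import defaultdict


def batch_by_parent(records, parent_key, max_batch_size):
    """Pack records into batches so all children of one parent share a batch."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[parent_key]].append(rec)

    batches, current = [], []
    for parent in sorted(groups):
        children = groups[parent]
        # Close the current batch rather than split a parent's children.
        if current and len(current) + len(children) > max_batch_size:
            batches.append(current)
            current = []
        # Note: a parent with more children than max_batch_size still lands
        # in a single (oversized) batch; handle such parents separately.
        current.extend(children)
    if current:
        batches.append(current)
    return batches


contacts = [
    {"id": 1, "account_key": "ACC-1"},
    {"id": 2, "account_key": "ACC-2"},
    {"id": 3, "account_key": "ACC-1"},
    {"id": 4, "account_key": "ACC-3"},
    {"id": 5, "account_key": "ACC-2"},
]
batches = batch_by_parent(contacts, "account_key", max_batch_size=3)
```

Where the tool chunks the file for you, sorting the rows by parent key before upload is the nearest equivalent, since it keeps siblings adjacent.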
Reconciliation — the step that earns trust
A load that reports “success” has proven the job ran, not that the data is correct. Reconcile:
- Counts per object, source vs target — the first and cheapest check
- Field fidelity on a representative sample — dates, currencies, picklist mappings, encoding
- Relationship integrity — children resolved to the right parents, no orphans
- Rollups and reports — summary fields and key reports match expected totals
Do all of this in the sandbox dry run first. The dry run exists to surface the lock, limit, and timing problems that only appear at production volume — discovering them during the real cutover is the failure mode the whole playbook is designed to prevent.
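The first and third checks lend themselves to a simple script. A minimal sketch, assuming the per-object counts and the child and parent keys have already been exported from source and target (for example via count queries and external-ID extracts); the function names and sample figures are hypothetical.

```python
def reconcile_counts(source_counts, target_counts):
    """Return {object: (source, target)} for every object whose counts differ."""
    mismatches = {}
    for obj in sorted(set(source_counts) | set(target_counts)):
        src = source_counts.get(obj, 0)
        tgt = target_counts.get(obj, 0)
        if src != tgt:
            mismatches[obj] = (src, tgt)
    return mismatches


def find_orphans(children, parent_keys, parent_field):
    """Return child records whose parent key is absent from the loaded parents."""
    known = set(parent_keys)
    return [child for child in children if child[parent_field] not in known]


# Illustrative figures only.
mismatches = reconcile_counts(
    {"Account": 1000, "Contact": 4200, "Opportunity": 800},
    {"Account": 1000, "Contact": 4197},
)
orphans = find_orphans(
    [
        {"key": "CON-1", "account_key": "ACC-1"},
        {"key": "CON-2", "account_key": "ACC-9"},
    ],
    parent_keys=["ACC-1", "ACC-2"],
    parent_field="account_key",
)
```

An empty `mismatches` dict and an empty `orphans` list are necessary but not sufficient: field fidelity and rollup checks still need sampling against the source.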
The cutover runbook
The production cutover is a timed, ordered, rehearsed sequence: freeze source writes, run loads in dependency order with automation bypassed, recalculate sharing, re-enable automation, reconcile, then open the org. Every step has an owner, an expected duration from the dry run, and a rollback trigger. The migration that has been rehearsed end-to-end in a full sandbox is boring on cutover night — and boring is exactly the goal.
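A runbook of this shape can be kept as data so that timings from the dry run sit next to each step. The sketch below is one hypothetical representation: the owner names and durations are placeholders, and treating a 150% overrun as the rollback trigger is an assumption, not a rule from this playbook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    owner: str                  # placeholder role names below
    expected_minutes: int       # measured during the dry run
    overrun_limit: float = 1.5  # assumption: review rollback at 150% of expected

    def triggers_rollback(self, actual_minutes: float) -> bool:
        """True when the step has overrun its rehearsed duration past the limit."""
        return actual_minutes > self.expected_minutes * self.overrun_limit


# Placeholder durations; use the figures recorded in your own rehearsal.
RUNBOOK = [
    Step("Freeze source writes", "source-owner", 10),
    Step("Load in dependency order, automation bypassed", "migration-lead", 180),
    Step("Recalculate sharing", "platform-admin", 60),
    Step("Re-enable automation", "platform-admin", 15),
    Step("Reconcile", "data-analyst", 45),
    Step("Open the org", "release-manager", 5),
]

total_expected = sum(step.expected_minutes for step in RUNBOOK)
```

Summing the expected durations gives the minimum cutover window to request; the rollback check gives each owner an unambiguous point at which to escalate.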