eCommerce Architecture

eCommerce platform architecture and integration

Scaling Magento for Black Friday: Architecture Decisions That Matter

Magento — Adobe Commerce — is capable of handling significant traffic volume when the infrastructure is configured correctly. The qualifier “when configured correctly” does significant work in that sentence.

The issues that cause Magento performance to degrade under peak traffic are well-known in principle and routinely underestimated in practice. Database lock contention during concurrent add-to-cart operations. Session backend failures under unexpected concurrency. PHP process exhaustion when full-page cache miss rates spike. Elasticsearch query latency amplified by the category landing page architecture.

Most of these issues do not appear in load testing because load testing scenarios do not replicate the behavioral patterns of real Black Friday traffic: the spike shape, the specific pages that receive disproportionate traffic, the burst of concurrent checkout operations when a promotional event ends.

This post covers the architectural changes that actually reduce risk on peak traffic days — based on having designed AWS infrastructure for a high-volume Magento deployment and studied what failed and what held.

The Failure Modes Most Teams Underestimate

The two failure modes that bring down Magento deployments on peak traffic days are database lock contention and PHP process exhaustion. Both are predictable. Both are preventable. And both are consistently underestimated because they do not manifest in standard load testing.

Database lock contention arises from Magento’s quote (cart) management. When a large number of customers add items to their carts simultaneously, Magento acquires row-level locks on the quote table. Under concurrent load, these locks queue and wait. The wait times accumulate. PHP processes block waiting for database locks. The queue grows. Eventually, the PHP-FPM process pool is exhausted and new requests time out at the web server.

PHP process exhaustion happens when per-request PHP execution time increases (due to database waits, Elasticsearch latency, or a rising cache miss rate) and the PHP-FPM pool runs out of workers. At that point, every new request queues at the nginx upstream. If the queue fills, or queued requests wait past the upstream timeout, nginx returns 502 or 504 errors. The site appears to be down.
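The relationship between request latency and worker demand can be estimated with Little's law (concurrent requests equal arrival rate multiplied by time in the system). The sketch below is a back-of-the-envelope calculation only; the function name and the traffic figures are hypothetical and not taken from any real deployment.

```python
import math

def required_php_workers(requests_per_second: float, avg_request_seconds: float) -> int:
    """Little's law: busy workers = arrival rate x average time per request."""
    return math.ceil(requests_per_second * avg_request_seconds)

# Hypothetical figures: 200 uncached requests per second reach PHP-FPM.
normal = required_php_workers(200, 0.25)   # 250 ms per request
degraded = required_php_workers(200, 1.5)  # lock waits push latency to 1.5 s
print(normal, degraded)
```

With the request rate unchanged, a sixfold increase in latency needs six times as many workers, which is why a pool sized for normal latency empties so quickly once the database slows down.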

Database Architecture: Why the Default Configuration Does Not Scale

Magento’s default database configuration is designed for development environments, not production traffic. The key settings that need tuning for peak load are: innodb_buffer_pool_size (should be 70-80% of available RAM on a dedicated database host), innodb_log_file_size (determines how much write activity can be buffered before a checkpoint), and the maximum connection count.

For the quote table lock contention problem, the architectural solution is to move cart state to Redis or another fast key-value store rather than the database. Confirm how your Magento version supports this before committing to it: depending on the version, it may require a custom quote storage implementation or an extension rather than a configuration change. Removing cart writes from the primary database eliminates the lock contention that is the most common cause of Black Friday database failures.

Read replicas help with read-heavy traffic patterns — category pages, product listing pages, search results — but do not help with write contention during checkout. The checkout flow writes to the orders table, the quote table, and the inventory tables in a single transaction. This cannot be distributed across replicas.

Full-Page Cache Architecture and the Cache Invalidation Problem

Magento’s full-page cache — whether built-in or Varnish — is your first line of defense against peak traffic. A cached page response served by Varnish consumes near-zero PHP or database resources. The cache miss rate is the single most important metric to monitor during a peak event.

The cache invalidation problem: Magento invalidates full-page cache entries aggressively when catalog or inventory data changes. A bulk price change for a promotional event can invalidate the cache for every product page and category page simultaneously, causing a cache stampede — every invalidated page is requested at the same time, and all requests hit PHP and the database concurrently.

The mitigation is staggered cache warming. Before your promotional event, run a crawler across your entire catalog to pre-warm the cache. After a bulk invalidation, use a queue-based cache warmer rather than allowing organic traffic to trigger the cache rebuild. This converts a cache stampede into a controlled warm-up.
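A queue-based warmer can be as simple as draining a list of URLs at a fixed pace. The following is a minimal sketch, assuming a caller-supplied `fetch` function that requests a URL and returns the HTTP status code; `warm_cache`, `fetch`, and `max_per_second` are illustrative names, not part of Magento or Varnish.

```python
import time
from collections import deque
from typing import Callable, Iterable

def warm_cache(urls: Iterable[str],
               fetch: Callable[[str], int],
               max_per_second: float,
               sleep: Callable[[float], None] = time.sleep) -> list[str]:
    """Request each URL at a bounded rate; return the URLs that did not warm."""
    queue = deque(urls)
    failed = []
    interval = 1.0 / max_per_second
    while queue:
        url = queue.popleft()
        try:
            status = fetch(url)
        except Exception:
            status = 0  # treat network errors as a failed warm
        if status != 200:
            failed.append(url)
        sleep(interval)  # pacing keeps the rebuild load on PHP and MySQL bounded
    return failed
```

The pacing is the point: the rebuild cost is the same as in a stampede, but it is spread over time at a rate the backend can absorb. Failed URLs are returned so they can be retried or investigated.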

Session Backend Selection and Configuration

Magento stores session data for every visitor — authenticated or not. The default session backend (files on disk) does not scale beyond a single application server. For multi-server deployments, sessions must be stored in a shared backend — Redis is the standard choice.

The Redis session backend configuration that matters for peak load: disable_locking set to true for session reads, and appropriate connection pool sizing. Magento’s default Redis session implementation acquires a lock on every session write. Under high concurrency, this can queue session writes and contribute to PHP process blocking for authenticated users.

Use separate Redis instances for session storage and full-page cache. Under peak load, session and cache traffic compete for Redis connections and memory. Running them on separate instances prevents cache eviction from affecting session availability and prevents session traffic spikes from affecting cache performance.

Queue Architecture for Deferred Processing

Magento’s message queue framework (RabbitMQ or database-backed) allows certain operations to be processed asynchronously. The operations that benefit most from async processing during peak load are: inventory reservation, order status notifications, and stock alert emails.

Inventory reservation — the process of decrementing available stock when an order is placed — is a synchronous operation by default. Under concurrent checkout load, this creates database contention. Switching to asynchronous inventory reservation (available in Magento’s inventory management configuration) moves the decrement operation out of the checkout critical path, reducing checkout latency and database lock contention during concurrent orders.

Ensure your queue consumers are running and scaled appropriately before the event. A queue consumer that falls behind under load will cause order processing delays that persist after the traffic peak passes — you will be working through a backlog of unprocessed inventory operations for hours after the event ends.

AWS Auto-Scaling for Magento

Magento can run on AWS Auto Scaling Groups, but the stateful nature of a Magento deployment requires some architectural care. The application tier is stateless if sessions are in Redis and media assets are on S3 (via a Magento S3 storage module). New application server instances can be added to the Auto Scaling Group without manual configuration.

The database tier is not horizontally scalable in the same way. RDS vertical scaling (instance type change) requires a maintenance window. For Black Friday, the right approach is to be on the correct instance size before the event, not to rely on auto-scaling to get you there during it. Over-provision your database tier and scale back down after the event — the cost difference is small relative to the cost of a Black Friday outage.

Set your Auto Scaling scale-out policy to trigger early. A policy that triggers when CPU reaches 80% will start new instances when you are already under load — the instances take three to five minutes to come up, and you will have been degraded for the entire time. Trigger at 50-60% and accept some over-provisioning in exchange for headroom during the ramp.
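The reasoning behind an early trigger can be made concrete with a rough calculation. This sketch assumes CPU utilization rises in proportion to traffic and that traffic grows at a steady compound rate during the ramp; both are simplifying assumptions, and the function name and figures are hypothetical.

```python
def scale_out_threshold(cpu_ceiling: float,
                        growth_per_minute: float,
                        boot_minutes: float) -> float:
    """CPU level at which to trigger so new capacity arrives before the ceiling."""
    return cpu_ceiling / (1.0 + growth_per_minute) ** boot_minutes

# Hypothetical ramp: traffic grows 10% per minute, instances need 5 minutes,
# and we never want the existing fleet above 80% CPU.
threshold = scale_out_threshold(80.0, 0.10, 5)
print(round(threshold, 1))
```

Under these assumptions the trigger has to sit just below 50% to keep the existing fleet under 80% while new instances boot. A faster ramp or a slower boot pushes the threshold lower still.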

Pre-Event Validation

Two weeks before the event: run a full cache warm on your production catalog. Verify that your Redis cluster is sized correctly for expected session volume. Exercise your alerting paths end to end to confirm that alerts fire and reach the people who need them. Verify your CDN cache hit rate for static assets.

One week before: run a synthetic load test from a staging environment against production-like infrastructure. The goal is not to simulate the exact Black Friday traffic pattern — it is to confirm that your instrumentation catches the failure modes you have mitigated for, and to identify any configuration changes from the past month that may have introduced regressions.

Day of: review your deployment freeze compliance (no code changes, no configuration changes), verify all queue consumers are running, check Redis memory utilization, and have your DBA available to review slow query logs if database latency increases.

Incident Response: What to Have Ready

Define your runbook before the event. The runbook should cover: how to scale PHP-FPM worker counts without a deployment, how to disable non-essential Magento modules (recommendations, loyalty lookups, third-party analytics) to reduce PHP execution time per request, how to put the site in maintenance mode if a critical failure occurs, and who has authority to make that call.

The most important runbook entry is the escalation decision tree. Who decides to disable a feature to keep the site up? Who decides to roll back a promotional price change if it caused a cache stampede? These decisions need clear ownership before the event, not during it.

Stripe Integration Pitfalls: What We Have Learned from 20+ Implementations

Stripe is, by a significant margin, the best-documented payment API available for eCommerce developers. The documentation is accurate, the test environment is realistic, and the dashboard tooling is genuinely useful.

Stripe integrations still fail. Not because the documentation is wrong, but because the edge cases that cause production failures are not the ones you encounter while reading the guide.

After building and reviewing Stripe integrations across more than twenty eCommerce implementations — on Magento, WooCommerce, Shopify, and custom platforms — the failure patterns are consistent enough to document. This post is a summary of what we have seen and how to avoid it.

Webhook Delivery: Why Idempotency Is Not Optional

Stripe’s webhook delivery guarantees at-least-once delivery, not exactly-once. The same event can be sent multiple times if your webhook endpoint does not respond with a 2xx status within Stripe’s timeout window, or if Stripe retries due to a delivery failure on their side. This is documented. It is also the source of the most common production bugs we see in Stripe integrations.

If your webhook handler for payment_intent.succeeded triggers order fulfillment, and the same event is delivered twice, you may attempt to fulfill the same order twice. For physical goods, this creates a duplicate shipment. For digital goods, a duplicate delivery. For subscription activations, a duplicate account creation.

Every webhook handler must be idempotent. Before processing an event, check whether you have already processed it — by storing processed event IDs in your database and returning 200 immediately if the event ID is already recorded. This is not an optimization. It is a correctness requirement.
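A minimal sketch of the event ID check, using SQLite as a stand-in for your application database. The table name, `handle_event`, and the `fulfill` callback are hypothetical; signature verification and HTTP handling are omitted. The event dictionary follows the shape of a Stripe event (`id`, `type`, `data.object`).

```python
import sqlite3

def make_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
    return conn

def handle_event(conn: sqlite3.Connection, event: dict, fulfill) -> int:
    """Process an event at most once; return the HTTP status to send back."""
    try:
        with conn:  # one transaction: the ID is only recorded if the work succeeds
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)",
                (event["id"],),
            )
            if event["type"] == "payment_intent.succeeded":
                fulfill(event["data"]["object"]["id"])
    except sqlite3.IntegrityError:
        return 200  # duplicate delivery: already processed, acknowledge and stop
    return 200
```

The primary key does the deduplication, so two concurrent deliveries of the same event cannot both pass the check. If `fulfill` raises, the insert is rolled back and a later retry will process the event again. Side effects that live outside the database still need their own safeguards.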

Event Ordering: Why You Cannot Assume Events Arrive in Sequence

Stripe does not guarantee that events arrive in chronological order. A charge.succeeded event may arrive after the charge.refunded event for the same charge. A customer.subscription.updated event may arrive before the customer.subscription.created event that preceded it.

Designing webhook handlers that assume events arrive in order — processing a refund only if the original charge is already recorded, for example — will fail intermittently and in ways that are difficult to reproduce. Instead, design handlers to be order-independent: record each event’s data into your system, then derive the current state from the totality of events received rather than from an assumed sequence.

Alternatively, when an event references an object whose current state matters (a charge, a subscription, a payment intent), fetch the current state from the Stripe API at processing time rather than relying on the event payload — which reflects the object state at event emission time, which may have already changed by the time the event arrives.
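The order-independent approach can be sketched as a ledger that records which event types have been seen for a charge and derives the state from that set. This is a simplified illustration: `ChargeLedger` and the state names are hypothetical, and it ignores partial refunds and other details a real integration would need.

```python
def charge_state(event_types: set[str]) -> str:
    """Derive state from everything seen so far, not from arrival order."""
    if "charge.refunded" in event_types:
        return "refunded"
    if "charge.succeeded" in event_types:
        return "paid"
    if "charge.failed" in event_types:
        return "failed"
    return "unknown"

class ChargeLedger:
    def __init__(self) -> None:
        self._seen: dict[str, set[str]] = {}

    def record(self, charge_id: str, event_type: str) -> str:
        """Store the event type and return the charge's derived state."""
        self._seen.setdefault(charge_id, set()).add(event_type)
        return charge_state(self._seen[charge_id])

ledger = ChargeLedger()
first = ledger.record("ch_1", "charge.refunded")    # refund arrives first
second = ledger.record("ch_1", "charge.succeeded")  # success arrives late
print(first, second)
```

Because the state is a function of the set of events rather than the sequence, a late `charge.succeeded` cannot overwrite a refund that was already recorded.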

PaymentIntent vs Charge and When Each Applies

The PaymentIntents API is Stripe’s current recommendation for most payment flows, and it handles the full lifecycle of a payment including authentication (3DS), authorization, and capture. The older Charges API does not handle authentication flows and is no longer recommended for new integrations.

The nuance that causes confusion: PaymentIntents are not one-to-one with charges. A single PaymentIntent can involve multiple charge attempts — for example, if the first attempt fails authentication and the customer retries with a different card. Your order management logic should be keyed to the PaymentIntent ID, not the Charge ID, to correctly handle multi-attempt payment flows.

For multi-step payment flows (authorize now, capture on shipment), setting the PaymentIntent capture_method to manual holds the authorization until you explicitly capture it. Authorization holds expire, typically after about seven days for online card payments, although the exact window varies by card network and payment method. If your fulfillment SLA exceeds the hold window, you need to re-authorize the payment before capturing, or the capture will fail.

Partial Capture and the Reconciliation Problems It Creates

Stripe allows partial capture — capturing less than the authorized amount when the final order total differs from the authorized total. This is useful when items are removed from an order between authorization and fulfillment. It is also a reliable source of reconciliation discrepancies.

The issue: the authorized amount and the captured amount are different. Your accounting integration, your settlement reporting, and your order management system all need to handle this correctly. Systems that assume captured amount equals authorized amount will produce incorrect financial reporting for any order with a partial capture.

If your platform supports order modifications after authorization, explicitly design for partial capture in your accounting and reconciliation flows. Log both the authorized amount and the captured amount. Reconcile against captured amounts, not authorized amounts.

Refund Architecture: Do Not Depend on Automatic Refunds

Stripe processes refunds promptly and reliably under normal conditions. “Under normal conditions” excludes high dispute rate periods, bank processing delays, and the edge case where a refund fails because the original charge was disputed and the funds are held pending dispute resolution.

Integrations that assume refunds always succeed immediately — updating order state to “refunded” before confirming the refund succeeded via the charge.refunded webhook — will occasionally show customers a refunded order that has not actually been refunded. This creates customer service problems and potential chargeback exposure.

Initiate the refund, set the order state to “refund pending,” and confirm the refund completion via the webhook event before updating the order state to “refunded.” This is more complex state management, but it accurately reflects what is actually happening.
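The state handling described above can be sketched as a small transition table. The state names, `request_refund`, and `on_refund_result` are hypothetical, and `create_refund` stands in for whatever call your integration makes to the payment API.

```python
ALLOWED_TRANSITIONS = {
    "paid": {"refund_pending"},
    "refund_pending": {"refunded", "refund_failed"},
    "refund_failed": {"refund_pending"},  # a failed refund may be retried
}

def transition(order: dict, new_state: str) -> None:
    current = order["state"]
    if new_state not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new_state}")
    order["state"] = new_state

def request_refund(order: dict, create_refund) -> None:
    """Initiate the refund; the order is only pending until the webhook confirms."""
    create_refund(order["id"])
    transition(order, "refund_pending")

def on_refund_result(order: dict, succeeded: bool) -> None:
    """Called from the webhook handler once the outcome is known."""
    transition(order, "refunded" if succeeded else "refund_failed")
```

Making "paid" to "refunded" an illegal transition is the design choice that matters: no code path can mark an order refunded without passing through the pending state and a confirmed outcome.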

Subscription Edge Cases: Trials, Proration, and Dunning

Stripe Billing is powerful, but subscription state management has edge cases that are not apparent from the basic documentation. Trial-to-paid conversion timing, proration behavior when customers change plans mid-cycle, and dunning configuration interact in ways that produce unexpected behavior if not explicitly tested.

Trial end behavior: when a trial ends and the first invoice is generated, the customer.subscription.updated event fires before the invoice is paid. If your activation logic depends on invoice payment, not trial status change, you need to handle the transition between these two events correctly. A customer whose trial converts but whose first payment fails should have a different account state than a customer who was never on a trial.

Dunning: Stripe’s Smart Retries use machine learning to retry failed charges at times with higher success probability. This is generally beneficial, but if your platform notifies customers about failed charges immediately on failure, you may be sending failed payment notifications for charges that Stripe would have successfully retried a few hours later. Coordinate your customer notification timing with Stripe’s retry schedule.

Rate Limiting Under Promotional Traffic Spikes

Stripe’s API rate limits are generous under normal operating conditions and become relevant primarily during promotional events — flash sales, Black Friday, product launches — when API call volume spikes sharply.

Common patterns that generate disproportionate API call volume: polling payment status on the client side rather than using webhooks, loading Stripe customer and payment method data on every page load rather than caching it, and creating a new PaymentIntent for every checkout session initiation rather than reusing existing intents for incomplete payments.

Review your API call patterns before a high-traffic event. Implement exponential backoff on 429 responses. Cache Stripe customer and payment method data on your side rather than fetching it from Stripe on every request. In our experience, these changes can reduce API call volume by roughly 60-70% under typical eCommerce traffic patterns.
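Exponential backoff on 429 responses can be sketched generically. `RateLimited` here is a hypothetical exception that your own API wrapper would raise when it sees a 429 (map your client library's rate-limit error to it), and the delay values are illustrative defaults rather than Stripe recommendations.

```python
import random
import time

class RateLimited(Exception):
    """Raised by the caller's API wrapper when the server answers HTTP 429."""

def call_with_backoff(call, max_attempts: int = 5, base_delay: float = 0.5,
                      max_delay: float = 8.0, sleep=time.sleep):
    """Retry `call` on RateLimited, doubling the delay cap after each attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # jitter avoids synchronized retries
```

The random jitter matters during promotional spikes: without it, many workers that were throttled at the same moment retry at the same moment and are throttled again.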

Testing Stripe in Staging: What the Test Environment Does Not Replicate

Stripe’s test environment replicates the API surface accurately. It does not replicate production-level latency, production-level rate limiting behavior, or the specific failure modes that occur under high concurrent payment volume.

Test your error handling with Stripe’s test card numbers for specific failure modes: 4000000000000002 for card declined, 4000000000009995 for insufficient funds, 4000002760003184 for authentication required. Test your webhook handler idempotency by delivering the same event multiple times to your test endpoint. Test your retry logic by simulating API timeouts.

What you cannot test in staging: the latency distribution of real production payment authorizations, the behavior of specific card issuer banks under 3DS authentication, and the edge cases in Stripe’s fraud detection that only manifest with real cardholder data. Production monitoring — specifically, tracking authorization rates by card type and geography — will surface these issues faster than any staging environment.

Event-Driven Architecture for High-Volume eCommerce

Event-driven architecture gets recommended frequently in eCommerce contexts, often for reasons that are either vague (“it’s more scalable”) or technically correct but contextually wrong for the system in question. The result is either systems that avoided event-driven architecture when it would have helped, or systems that adopted it and discovered the operational complexity was not worth the benefits they actually needed.

The question is not whether event-driven architecture is good. It is: good for what, and at what cost?

In eCommerce, there are specific scenarios where event-driven approaches produce meaningfully better outcomes — and specific scenarios where they add complexity without commensurate benefit. This post covers how to tell the difference, and what production event-driven eCommerce systems actually require to operate.

The Actual Value Proposition: What Events Buy You in eCommerce

Event-driven architecture buys you decoupling. When system A emits an event rather than calling system B directly, A does not need to know that B exists, does not wait for B to respond, and does not fail when B is unavailable. This is genuinely valuable in eCommerce systems where the order lifecycle touches many downstream systems — inventory management, fulfillment, accounting, marketing automation, loyalty — and where those systems have different availability, scaling, and ownership characteristics.

It also buys you replayability. An event log is a record of what happened. If a downstream consumer fails — or if you need to add a new downstream system after the fact — you can replay events and reconstruct state. For order processing, this is a significant operational benefit: a new accounting integration can be onboarded by replaying the order event history rather than requiring a data migration.

What events do not buy you: simplicity, low latency for synchronous operations, or easy debugging. An event-driven system has more moving parts than a synchronous one, and failures manifest differently — a message in a dead-letter queue is harder to notice than an exception in a request log.

Order Lifecycle Events: Where Event-Driven Helps

The order lifecycle is the canonical use case for event-driven architecture in eCommerce. An order moves through states — placed, paid, allocated, fulfilled, shipped, delivered, returned — and each state transition should trigger actions in multiple downstream systems. Inventory reservation on placement. Fulfillment queue entry on payment confirmation. Carrier integration on fulfillment. Customer notification on each significant state change.

Modeling this as direct synchronous calls creates tight coupling between the order service and every downstream system. When the fulfillment system is slow, the checkout flow slows with it. When a new downstream integration is added, the order service needs to be modified.

Modeling this as events — OrderPlaced, OrderPaid, OrderFulfilled, OrderShipped — allows each downstream system to consume the events it cares about independently. The order service does not know or care how many systems are listening. New integrations subscribe to existing events without modifying the producer.

Inventory Events: Synchronization Without Polling

Multi-channel inventory (a single product catalog distributed across a website, Amazon, a wholesale portal, and potentially a physical POS) has a synchronization problem. Oversell on any channel is a customer service failure. Polling each channel’s inventory API at regular intervals introduces lag and consumes API quota.

Event-driven inventory synchronization publishes an InventoryAdjusted event whenever stock levels change — on purchase, on return, on warehouse receipt, on manual adjustment. Every channel subscribes and updates its local availability accordingly. The lag between a stock event and channel availability update is a function of event processing time, not polling interval.

The critical requirement: inventory events need to be processed in order per SKU, or the final inventory position can be wrong. If two events for the same SKU arrive out of order — a decrement before an increment, where the increment happened first — the resulting inventory state is incorrect. Partition your event stream by SKU so that events for the same product are processed sequentially.
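Partitioning by SKU usually comes down to a stable hash of the key. The sketch below shows the idea without a broker; `partition_for` and `route` are hypothetical helpers, and most brokers with keyed partitioning do the equivalent for you when the SKU is used as the message key.

```python
import hashlib
from collections import defaultdict

def partition_for(sku: str, partitions: int) -> int:
    """Stable hash: the same SKU maps to the same partition in every process."""
    digest = hashlib.sha256(sku.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions

def route(events: list[dict], partitions: int) -> dict[int, list[dict]]:
    """Group events by partition, preserving arrival order within each one."""
    lanes: dict[int, list[dict]] = defaultdict(list)
    for event in events:
        lanes[partition_for(event["sku"], partitions)].append(event)
    return dict(lanes)
```

A cryptographic hash is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would send the same SKU to different partitions on different producers.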

Marketplace and Fulfillment Event Flows

Marketplace integrations — Amazon, TikTok Shop, Walmart — generate events from the marketplace side that need to flow into your internal systems: new orders, shipment confirmations, returns initiated, ASIN status changes. These are naturally event-shaped data: something happened on the marketplace, and your systems need to react.

The integration pattern is an inbound event queue that receives marketplace webhooks or polling results, normalizes them to your internal event schema, and publishes them to your internal event stream. Downstream consumers — your OMS, your inventory system, your accounting integration — process marketplace events the same way they process events from your own platform.

Normalization at the inbound boundary is important. Amazon’s order schema and Walmart’s order schema differ. Your downstream systems should not need to know which marketplace an order came from to process it correctly. The normalization layer absorbs the marketplace-specific schema differences so the rest of your system deals with a consistent internal model.

The Outbox Pattern: Why It Matters for eCommerce

The dual-write problem: you update your database and publish an event. If the database write succeeds but the event publish fails, your system state is inconsistent with the event stream. If the event publishes but the database write fails, you have an event for a state change that did not actually happen.

The outbox pattern solves this. Rather than writing to the database and publishing to the message broker in two separate operations, write both the business state change and the outgoing event to the database in a single transaction. A separate process polls the outbox table and publishes pending events to the broker, marking them as published on success.

For eCommerce order processing, this is not optional — it is a correctness requirement. An OrderPaid event published for a payment that did not actually commit to the database will trigger fulfillment, shipping label generation, and customer notification for an order that does not exist in a paid state. The downstream consequences are expensive to unwind.

Saga Orchestration for Multi-Step Transactions

eCommerce checkout involves multiple operations that need to succeed together: payment authorization, inventory reservation, order record creation. In a distributed system, these operations span system boundaries — you cannot wrap them in a single database transaction. If payment succeeds but inventory reservation fails, you have charged the customer for an item you cannot fulfill.

The saga pattern models this as a sequence of local transactions with compensating transactions for rollback. Payment authorized → reserve inventory → create order record. If inventory reservation fails, execute the compensating transaction: void the payment authorization. If order creation fails, execute compensating transactions for both inventory and payment.

Orchestrated sagas use a central coordinator (a saga orchestrator) that drives the sequence and handles failures. Choreographed sagas use events — each step publishes an event that triggers the next step, and failure events trigger compensating steps. Orchestrated sagas are easier to reason about for complex flows; choreographed sagas distribute the logic across services but are harder to trace.
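An orchestrated saga can be sketched as a coordinator that runs each step and, on failure, runs the compensations for the completed steps in reverse order. This is an in-process illustration only; `run_saga` is a hypothetical name, and a production orchestrator would also persist saga state so it can resume after a crash.

```python
from typing import Callable

Step = tuple[Callable[[], None], Callable[[], None]]  # (action, compensation)

def run_saga(steps: list[Step]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse."""
    compensations: list[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(compensations):
                undo()
            return False
        compensations.append(compensate)
    return True

log: list[str] = []

def fail_reservation() -> None:
    raise RuntimeError("out of stock")

ok = run_saga([
    (lambda: log.append("authorize_payment"), lambda: log.append("void_payment")),
    (fail_reservation, lambda: log.append("release_inventory")),
    (lambda: log.append("create_order"), lambda: log.append("cancel_order")),
])
print(ok, log)
```

The failed step's own compensation is not run, since that step never completed. Only the payment authorization is voided.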

Consumer Group Design and Failure Isolation

When multiple downstream systems consume from the same event stream, consumer group design determines how events are distributed and how failures in one consumer affect others. Each consumer group maintains its own offset — its position in the event stream. A failure in one consumer group does not affect other consumer groups’ progress. This is the isolation property that makes event-driven architecture robust for multi-consumer eCommerce systems.

Design consumer groups around ownership, not function. The fulfillment system owns its consumer group. The accounting system owns its. If the accounting system is down for maintenance, the fulfillment consumer group processes events uninterrupted. When accounting comes back up, it resumes from where it left off without any events being lost.
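The isolation property comes from each group tracking its own offset into a shared log. The in-memory sketch below illustrates the idea; `EventLog` and its methods are hypothetical and only loosely mirror how a log-based broker such as Kafka behaves.

```python
class EventLog:
    """Append-only log with an independent read offset per consumer group."""

    def __init__(self) -> None:
        self._events: list[str] = []
        self._offsets: dict[str, int] = {}

    def append(self, event: str) -> None:
        self._events.append(event)

    def poll(self, group: str, max_events: int = 10) -> list[str]:
        start = self._offsets.get(group, 0)
        return self._events[start:start + max_events]

    def commit(self, group: str, count: int) -> None:
        """Advance the group's offset after it has processed `count` events."""
        self._offsets[group] = self._offsets.get(group, 0) + count

log = EventLog()
log.append("OrderPlaced:1")
log.append("OrderPaid:1")

batch = log.poll("fulfillment")      # fulfillment keeps working
log.commit("fulfillment", len(batch))

log.append("OrderShipped:1")         # accounting is still down for maintenance
backlog = log.poll("accounting")     # on return, it resumes from its own offset
print(batch, backlog)
```

Accounting sees all three events even though fulfillment has already moved past the first two, because neither group's progress is visible to the other.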

What Event-Driven Architecture Costs Operationally

A production event-driven system requires: a message broker (Kafka, RabbitMQ, AWS SQS/SNS, or a managed equivalent) with appropriate retention, replication, and monitoring. Dead-letter queue handling for messages that fail processing after retries. Distributed tracing that can correlate an event across multiple consumers — without it, debugging production failures becomes very difficult. Schema management for event payloads — uncontrolled schema changes break consumers.

The operational cost is real. Teams that adopt event-driven architecture for a two-service system and then discover they need all of the above tooling in place to operate it reliably often conclude that a simpler architecture would have served them better. Event-driven architecture is the right choice when the decoupling benefits justify the operational investment — which, for a high-volume eCommerce system with multiple downstream integrations, they usually do. For a system with two services and low event volume, they usually do not.

Amazon SP-API Integration for Magento and WooCommerce: A Practical Guide

Amazon’s Selling Partner API replaced the older MWS (Marketplace Web Service) platform and brought significant changes to how merchants connect their eCommerce systems to Amazon’s marketplace infrastructure. The SP-API surface is broader, the authentication model is different, and the migration from MWS left a lot of existing integrations in a broken state.

For merchants running Magento or WooCommerce, integrating with Amazon SP-API directly means dealing with OAuth authentication, LWA (Login with Amazon) credential management, rate limiting, and the specific quirks of Amazon’s feed-based data synchronization model. Off-the-shelf plugins exist, but most handle the basic cases and fail under the edge cases that volume merchants encounter daily.

This guide covers what a production Amazon SP-API integration actually requires — specifically for Magento and WooCommerce — including the inventory sync architecture, order routing logic, label generation workflow, and the edge cases that are not covered in Amazon’s documentation.

SP-API vs MWS: What Changed and What It Means for Integrations

The core architectural shift from MWS to SP-API is the move from AWS Signature Version 2 to LWA (Login with Amazon) OAuth 2.0. This is not a minor update. It means every integration built on MWS credentials needs a complete re-authentication implementation. Refresh token management, access token rotation, and the SP-API role assignment model all need to be handled correctly for the integration to function at all.

Beyond authentication, SP-API introduced a rate limiting model based on a token bucket algorithm rather than the simpler request quotas MWS used. Each API operation has its own burst and restore rate. Hitting these limits in production, especially during peak sync windows, causes silent failures if your error handling does not account for 429 responses correctly.
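A client-side limiter that mirrors the burst-and-restore model lets you stay under the limit instead of discovering it through 429 responses. The following is a sketch under stated assumptions: `TokenBucket` is a hypothetical class, and the rate and burst values in the example are made up, not the published limits of any SP-API operation.

```python
import time

class TokenBucket:
    """Allows bursts up to `burst`, then refills at `rate_per_second`."""

    def __init__(self, rate_per_second: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_second
        self.burst = burst
        self.tokens = float(burst)
        self.clock = clock
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the caller must wait."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Illustrative values only: 0.5 requests per second with a burst of 2.
bucket = TokenBucket(rate_per_second=0.5, burst=2)
print(bucket.try_acquire(), bucket.try_acquire())
```

One bucket per API operation matches the per-operation limits described above. A 429 handler is still needed as a backstop, since other clients sharing the same credentials draw from the same server-side allowance.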

Authentication: LWA, OAuth, and Credential Rotation

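SP-API authentication has historically required three credential sets: the LWA client ID and secret (registered in your developer application), the refresh token (granted when a seller authorizes your application), and the AWS IAM credentials for signing requests. Amazon has since announced the removal of the IAM signing requirement, so confirm against the current SP-API documentation which credentials your integration still needs. Every credential that is required must be present and valid for every API call.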

The access token derived from the refresh token expires in one hour. Your integration needs to cache it for the duration of its validity and refresh it proactively before it expires — not reactively after a 401. Reactive refresh causes race conditions in multi-threaded sync processes that are difficult to debug under production load.
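Proactive refresh can be sketched as a small cache that renews the token a fixed margin before expiry and serializes the refresh behind a lock. `AccessTokenCache` and `fetch_token` are hypothetical; `fetch_token` stands in for your call to the LWA token endpoint and is assumed to return the token and its lifetime in seconds.

```python
import threading
import time

class AccessTokenCache:
    def __init__(self, fetch_token, refresh_margin: float = 300.0,
                 clock=time.monotonic):
        self._fetch = fetch_token      # returns (token, expires_in_seconds)
        self._margin = refresh_margin  # refresh this many seconds before expiry
        self._clock = clock
        self._lock = threading.Lock()
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a valid token, refreshing ahead of expiry when needed."""
        with self._lock:  # only one thread refreshes; the others wait and reuse
            if self._token is None or self._clock() >= self._expires_at - self._margin:
                token, expires_in = self._fetch()
                self._token = token
                self._expires_at = self._clock() + expires_in
            return self._token
```

Holding the lock during the refresh is the simple choice: sync threads briefly wait instead of each requesting a new token. With a one-hour token and a five-minute margin, the refresh happens at roughly the 55-minute mark.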

Store refresh tokens encrypted at rest. They are long-lived credentials that grant ongoing access to the seller account. Treat them with the same security discipline as a private key.

Inventory Sync Architecture: Polling vs. Notifications

For inventory pushes from your system to Amazon, the SP-API Feeds API is the standard path. Submit a JSON or XML feed with your inventory quantities, poll the feed status endpoint until processing completes, then check for processing errors in the result document. The submission call returns quickly, but processing is asynchronous: feed processing times under normal conditions range from under a minute to over ten minutes depending on catalog size and Amazon system load.

For order pulls from Amazon into your system, the SP-API Notifications API (delivered via SQS) is preferable to polling the Orders API. Polling at five-minute intervals consumes more API quota than notifications do and adds up to five minutes of latency to your order processing pipeline. Set up an SQS destination, subscribe to ORDER_CHANGE notifications, and process new orders in near-real-time.

The practical caveat: SQS notification delivery is not guaranteed to be in order. Two notifications for the same order may arrive out of sequence. Your order processing logic needs to handle this — typically by always fetching the current order state from the Orders API when processing a notification rather than trusting the notification payload alone.
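A minimal sketch of that pattern: the notification is treated only as a hint that something changed, and the stored record is overwritten only when the freshly fetched order is newer. The `fetch_order` callable and the dict-based `store` are hypothetical stand-ins for an Orders API client and your own persistence layer. `LastUpdateDate` is the timestamp field on the Orders API order object.

```python
from datetime import datetime
from typing import Callable, Dict


def _parse(ts: str) -> datetime:
    # Accept a trailing "Z" on Python versions whose fromisoformat rejects it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def handle_notification(order_id: str,
                        fetch_order: Callable[[str], dict],
                        store: Dict[str, dict]) -> bool:
    """Fetch current order state and apply it only if newer than what is stored.

    Returns True when the stored record was created or updated.
    """
    current = fetch_order(order_id)  # hypothetical Orders API wrapper
    saved = store.get(order_id)
    if saved is not None and _parse(current["LastUpdateDate"]) <= _parse(saved["LastUpdateDate"]):
        return False  # stale or duplicate notification: nothing to do
    store[order_id] = current
    return True
```

Because the handler ignores the payload's own status, two notifications arriving in the wrong order converge on the same stored state.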

Order Routing: FBA vs MFN and Multi-Warehouse Logic

Orders on Amazon arrive as either FBA (Fulfilled by Amazon) or MFN (Merchant Fulfilled Network) orders. FBA orders do not require you to generate shipping labels — Amazon handles fulfillment. But you still need to pull them for reconciliation, accounting, and inventory adjustment purposes.

MFN orders require you to confirm shipment through SP-API (the Orders API shipment confirmation operation) within the handling time window you committed to. Miss that window and the late shipment counts against your seller metrics. Your order routing logic needs to identify MFN orders, route them to the appropriate warehouse, generate a label, and confirm shipment, all within the handling time window.

For multi-warehouse merchants, the routing logic needs to consider inventory position at each location, carrier availability, and shipment cost. Amazon does not impose routing logic on you, but your seller metrics — specifically, on-time delivery rate and pre-fulfillment cancellation rate — will reflect the outcomes of whatever routing decisions you make.
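A simplified routing sketch under stated assumptions: each warehouse exposes a stock map, a precomputed shipping cost to the destination, and a carrier availability flag, and an order must ship complete from a single location. The `Warehouse` type and these fields are hypothetical. The channel check relies on the Orders API reporting FBA orders with a `FulfillmentChannel` of `AFN` and merchant-fulfilled orders as `MFN`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Warehouse:
    code: str
    stock: Dict[str, int]        # SKU -> units on hand
    shipping_cost: float         # hypothetical precomputed cost to the destination
    carrier_available: bool = True


def route_order(fulfillment_channel: str, lines: Dict[str, int],
                warehouses: List[Warehouse]) -> Optional[str]:
    """Return the warehouse code for an MFN order, or None for Amazon-fulfilled orders."""
    if fulfillment_channel != "MFN":
        return None  # AFN (FBA): Amazon ships; import for reconciliation only
    candidates = [
        w for w in warehouses
        if w.carrier_available
        and all(w.stock.get(sku, 0) >= qty for sku, qty in lines.items())
    ]
    if not candidates:
        raise LookupError("no single warehouse can fulfil the whole order")
    # Cheapest eligible location wins; real logic may also weigh transit time.
    return min(candidates, key=lambda w: w.shipping_cost).code
```

Split shipments and transit-time constraints are left out deliberately. They change the selection rule, not the overall shape.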

Label Generation and Shipment Confirmation Workflow

Generating labels via the SP-API Merchant Fulfillment API is straightforward for standard shipments. The complexity emerges in the edge cases: orders with multiple packages, dangerous goods restrictions on certain product categories, and the mismatch between Amazon’s available carrier options and your negotiated carrier rates.

For merchants with volume carrier contracts, the economics usually favor generating labels outside of Amazon’s Merchant Fulfillment API (using your own carrier integration) and confirming the shipment via the Orders API with your own tracking data. Amazon’s Merchant Fulfillment rates are competitive for small parcels but rarely beat volume-contracted rates for heavier shipments.

Shipment confirmation must include a valid tracking number and carrier code that Amazon recognizes. Submit invalid carrier codes and the shipment confirmation is accepted but the tracking data does not populate in the buyer-facing order detail — generating “where is my order” contacts that affect your customer satisfaction metrics.
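Since an unrecognized carrier code is accepted without an error, a guard on your side is the only place to catch it. The sketch below validates against an allowlist before anything is submitted. The returned dictionary is a hypothetical internal record, not the SP-API request payload, and the allowlist should be populated from Amazon's published carrier list rather than from the placeholder values in the usage example.

```python
from typing import Dict, Set


def build_confirmation(order_id: str, carrier_code: str, tracking_number: str,
                       allowed_carriers: Set[str]) -> Dict[str, str]:
    """Reject confirmations that Amazon would accept but fail to display to the buyer."""
    tracking = tracking_number.strip()
    if carrier_code not in allowed_carriers:
        raise ValueError(f"unrecognized carrier code: {carrier_code!r}")
    if not tracking:
        raise ValueError("tracking number is empty")
    # Hypothetical internal shape; map it to the real request body in your client.
    return {
        "order_id": order_id,
        "carrier_code": carrier_code,
        "tracking_number": tracking,
    }


# Usage with placeholder carrier codes (assumed for illustration only):
record = build_confirmation("ORDER-1", "UPS", " 1Z999 ", {"UPS", "USPS"})
```

Failing loudly here turns a silent buyer-facing problem into an exception your fulfillment queue can retry or escalate.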

Reconciliation: Catching Discrepancies Before Amazon Does

Amazon’s reconciliation tooling catches discrepancies — between what you reported shipping and what the carrier confirms, between inventory you claim to have and what Amazon’s systems register. When Amazon catches a discrepancy, the resolution process is slow and the consequences for seller metrics can be significant.

Build reconciliation into your integration rather than relying on Amazon to surface problems. Daily reconciliation jobs that compare your OMS order state against SP-API order state, your inventory feed submissions against Amazon’s current inventory levels, and your shipment confirmations against carrier tracking events will catch most discrepancies within 24 hours — when they are easy to resolve.
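The comparison at the heart of such a job can be sketched as a diff of two snapshots keyed by order ID (or by SKU for inventory). How the snapshots are produced is left out. The function name and the shape of the report are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def diff_states(local: Dict[str, str], remote: Dict[str, str]) -> Dict[str, list]:
    """Compare local (OMS) state against remote (SP-API) state, keyed by identifier."""
    local_keys, remote_keys = set(local), set(remote)
    missing_remote: List[str] = sorted(local_keys - remote_keys)
    missing_local: List[str] = sorted(remote_keys - local_keys)
    mismatched: List[Tuple[str, str, str]] = sorted(
        (key, local[key], remote[key])
        for key in local_keys & remote_keys
        if local[key] != remote[key]
    )
    return {
        "missing_remote": missing_remote,  # we have it, Amazon does not
        "missing_local": missing_local,    # Amazon has it, we do not
        "mismatched": mismatched,          # both have it, values differ
    }
```

The same function works for the inventory comparison if the values are quantities converted to strings, or if the type hints are loosened to accept integers.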

Magento-Specific Implementation Notes

Magento’s inventory management — particularly MSI (Multi Source Inventory) — adds complexity to Amazon sync because the saleable quantity calculation involves source allocations that are not always directly mappable to the per-SKU quantity Amazon expects. Make sure your inventory sync reads the saleable quantity for the appropriate stock scope, not the aggregate source-level quantity.

Order import from Amazon into Magento should create orders in a custom state (not the standard pending state) to prevent Magento’s standard order processing workflows from triggering prematurely. Amazon orders need to flow through a separate fulfillment workflow that interacts with SP-API for label generation and shipment confirmation before updating order state in Magento.

WooCommerce-Specific Implementation Notes

WooCommerce’s inventory model is simpler than Magento MSI, which makes the sync implementation more straightforward — but variable products with multiple attribute combinations require careful SKU-to-variation mapping. Amazon’s catalog treats each variation as a separate ASIN or child ASIN. Your mapping table needs to handle the relationship between WooCommerce variation IDs and Amazon ASIN/seller SKU pairs.
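A minimal in-memory sketch of such a mapping table, assuming one seller SKU per WooCommerce variation. The class and field names are hypothetical, and a real integration would back this with a database table carrying the same uniqueness constraints.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class VariationMapping:
    variation_id: int   # WooCommerce variation ID
    seller_sku: str     # SKU as listed on Amazon
    asin: str           # child ASIN for the variation


class MappingTable:
    """Bidirectional lookup that rejects ambiguous mappings at load time."""

    def __init__(self, rows: Iterable[VariationMapping]):
        self._by_variation: Dict[int, VariationMapping] = {}
        self._by_sku: Dict[str, VariationMapping] = {}
        for row in rows:
            if row.variation_id in self._by_variation or row.seller_sku in self._by_sku:
                raise ValueError(f"duplicate mapping: {row}")
            self._by_variation[row.variation_id] = row
            self._by_sku[row.seller_sku] = row

    def for_variation(self, variation_id: int) -> Optional[VariationMapping]:
        return self._by_variation.get(variation_id)

    def for_sku(self, seller_sku: str) -> Optional[VariationMapping]:
        return self._by_sku.get(seller_sku)
```

Inventory pushes look up by variation ID and order imports look up by seller SKU, which is why both directions are indexed.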

WooCommerce’s order creation hooks make it relatively easy to import Amazon orders and have them appear in the standard WooCommerce order management interface. The integration complexity is on the fulfillment side. Specifically, you need to prevent WooCommerce’s standard email notifications from firing on Amazon orders, and you need to ensure the WooCommerce stock decrement happens at the right point in the Amazon fulfillment workflow, not at order import time.