Amazon Order ETL & FTP Upload

Overview

ShipmentBot automatically exports Amazon customer order data, converts it to the Seller Central tab-delimited CSV format, and delivers it to your organisation (or a 3PL) via FTP/SFTP every day. The end-to-end flow is implemented with three background jobs and a handful of configurable business/supplier settings.

```mermaid
flowchart TD
    A[Amazon SP-API<br/>GET_FLAT_FILE_ALL_ORDERS_DATA_BY_LAST_UPDATE_GENERAL] -->|30-day window| B(Extract Job)
    B --> C(Order Report ActiveStorage Blob)
    C --> D(Translate Job)
    D --> E[Filtered & de-duplicated CSV]
    E --> F(Load Job)
    F --> G[FTP/SFTP<br/>seller-central-orders/*.csv]
    F --> H[Email delivery summary]
```

Jobs:
• Extraction job – pulls the raw report from Amazon.
• Transformation job – filters, de-duplicates & stages the file.
• Loading job – uploads the final CSV to FTP/SFTP and emails a receipt.


1. Extraction (background job)

| Behaviour | Source code reference |
| --- | --- |
| Trailing 30-day window – yesterday − 29 days → yesterday | `start_date = (yesterday - 29.days).beginning_of_day`; `end_date = yesterday.end_of_day` |
| Retries until the report is DONE | Job re-queues itself with a linearly increasing delay: `run_at: (1.minutes * attempt).from_now` |
| Stores the raw tab-delimited file as an order_reports attachment | `current_business.order_reports.attach(blob)` |
| Kicks off the Transformation Job once the blob is persisted | Debounced to avoid duplicate runs |
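The date-window and retry-delay arithmetic above can be sketched in plain Ruby. This is a minimal illustration, not the actual job code: `report_window` and `retry_delay_seconds` are hypothetical helper names, and `1.minutes * attempt` from the table is expressed here without ActiveSupport.

```ruby
require "date"

# Hypothetical helpers mirroring the behaviour described in the table
# above; the method names are illustrative, not the job's real API.
def report_window(today = Date.today)
  yesterday  = today - 1
  start_date = yesterday - 29   # trailing 30-day window ending yesterday
  [start_date, yesterday]
end

def retry_delay_seconds(attempt)
  60 * attempt                  # plain-Ruby equivalent of 1.minutes * attempt
end

start_date, end_date = report_window(Date.new(2025, 6, 16))
# start_date => 2025-05-17, end_date => 2025-06-15
```

In the real job the window bounds are further expanded with `beginning_of_day` / `end_of_day` before being sent to the SP-API report request.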

2. Transformation (background job)

  1. Reads the raw report into memory (CSV.parse … liberal_parsing: true).
  2. Applies business/supplier settings:
    • exclude_pending, exclude_cancelled, exclude_return_orders.
    • Full FTP transfer can be toggled via ftp_transfer.
  3. De-duplication:
    • For every row we identify the matching internal order item records
    • If any item in the scope already has a flat_file_<status>_at timestamp the row has already been shipped and is skipped (row_has_already_been_sent?).
  4. Status-driven tracking: rows that are kept are grouped by status (Pending, Shipped, Cancelled,…) and the matching order-items receive the corresponding flat_file_<status>_at = Date.today stamp inside a DB transaction.
  5. Generates a clean CSV (backup) and attaches it to transferred_order_reports.
  6. Enqueues the Load Job inside the same DB transaction so we are guaranteed to either upload all stamped rows or none.
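The de-duplication rule in step 3 can be sketched as a pure-Ruby predicate. This is an illustration of the logic only: `row_already_sent?` stands in for `row_has_already_been_sent?`, and the hashes stand in for the internal order-item records.

```ruby
# A row is skipped when ANY matching order item already carries a
# flat_file_<status>_at timestamp (i.e. it was exported before).
def row_already_sent?(items, status)
  key = :"flat_file_#{status.downcase}_at"
  items.any? { |item| item[key] }
end

items = [
  { flat_file_shipped_at: nil },
  { flat_file_shipped_at: "2025-06-10" } # stamped on a previous run
]
row_already_sent?(items, "Shipped") # => true, so the row is skipped
```

Equivalently (as stated under "Deduplication Guarantees"), a row is emitted only when every matching item is missing the timestamp.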

3. Loading (background job)

| Behaviour | Details |
| --- | --- |
| Protocol | Chooses SFTP (net-sftp) or FTP (net/ftp) based on the ftp_uri scheme. |
| Remote directory | Always seller-central-orders/ – created if missing. |
| File naming | <Business-Name>-<timestamp>.csv (parameterised). |
| Duplicate safety | Uploaded files are listed after transfer and logged via ColumnUpdate.push!(…, :ftp_uploaded_at) so the same attachment is never sent twice. |
| Notification | Sends an HTML email via Resend with the upload summary and the CSV attached. |
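The protocol choice and file-naming convention can be sketched with stdlib Ruby. The method names are hypothetical, and the name slugging is a plain-Ruby approximation of Rails' `parameterize`:

```ruby
require "uri"

# Pick the transfer protocol from the ftp_uri scheme, as described above.
def transfer_protocol(ftp_uri)
  case URI.parse(ftp_uri).scheme
  when "sftp" then :sftp
  when "ftp"  then :ftp
  else raise ArgumentError, "unsupported scheme in #{ftp_uri}"
  end
end

# <Business-Name>-<timestamp>.csv, with the name slugged (parameterised).
def remote_filename(business_name, time)
  slug = business_name.downcase.gsub(/\s+/, "-")
  "#{slug}-#{time.strftime('%Y%m%d%H%M%S')}.csv"
end

transfer_protocol("sftp://user@warehouse.example.com") # => :sftp
```

In the real job the `:sftp` branch would open a `Net::SFTP` session and the `:ftp` branch a `Net::FTP` one, both targeting `seller-central-orders/`.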

Business & Supplier Configuration

| Setting | Scope | Effect |
| --- | --- | --- |
| active | Business / Supplier | Turns the entire pipeline on/off per entity. |
| ftp_transfer | Business | If false, we still build the CSV and send the email, but do not upload to FTP. |
| exclude_pending | Business | Skip orders with status Pending, PartiallyShipped, Shipping. |
| exclude_cancelled | Business | Skip orders with status Cancelled. |
| exclude_return_orders | Business | Skip orders whose Amazon Order ID starts with R00… (returns). |
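The three exclusion settings combine into a single keep/skip decision per row. Here is an illustrative predicate; the setting names match the table, while `keep_row?` and the row-hash keys (taken from the file-format cheat-sheet below) are assumptions about shape, not the actual implementation:

```ruby
# Statuses treated as "pending" by the exclude_pending setting.
PENDING_STATUSES = %w[Pending PartiallyShipped Shipping].freeze

def keep_row?(row, settings)
  status = row["order-status"]
  return false if settings[:exclude_pending]       && PENDING_STATUSES.include?(status)
  return false if settings[:exclude_cancelled]     && status == "Cancelled"
  return false if settings[:exclude_return_orders] && row["amazon-order-id"].start_with?("R00")
  true
end

keep_row?({ "order-status" => "Cancelled", "amazon-order-id" => "111-1" },
          { exclude_cancelled: true }) # => false
```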

Supplier-level jobs are identical to the business flow but scoped to a single supplier (using a dedicated transformation job).


Deduplication Guarantees

  1. Row level: A row is emitted only if every matching internal order item record is missing a flat_file_<status>_at timestamp.
  2. File level: ActiveStorage MD5 checksums prevent attaching the exact same blob twice.
  3. FTP level: The Load Job queries the remote directory after upload; filenames that already exist are not re-sent.
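The file-level guard (2) rests on ActiveStorage storing an MD5 checksum per blob. A minimal stdlib sketch of the comparison, with `already_attached?` as a hypothetical helper:

```ruby
require "digest"

# Returns true when a blob with the same MD5 checksum was attached
# before, mirroring ActiveStorage's per-blob checksum (base64 MD5).
def already_attached?(existing_checksums, csv_body)
  existing_checksums.include?(Digest::MD5.base64digest(csv_body))
end
```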

Monitoring & Alerts

  • Job metrics – each job logs to Rails; failures bubble up to the default Delayed::Job error queue.
  • Staleness guard – if order_reports haven’t been generated for >30 h, Extract Job emails admins (BaseMailer.send_email_admins).
  • Email receipts – every successful load includes a row/skip breakdown and historical shipment counts to facilitate cost monitoring.

File Format Cheat-Sheet

Amazon delivers a tab-delimited file that contains >180 columns. The pipeline relies on only a handful:

| Index | Header | Purpose |
| --- | --- | --- |
| 0 | amazon-order-id | Primary key for the order. |
| 4 | order-status | Pending / Shipped / Cancelled, etc. |
| 10 | sku | Merchant SKU. |
| 14 | quantity | Items in this row. |
| 16 | item-price | Converted to unit price downstream if required. |

The Translate Job asserts the header positions to fail fast if Amazon modifies the report layout (assert_row_index).
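The fail-fast header check can be sketched as follows; the indices come from the cheat-sheet above, while the exact body of `assert_row_index` in the Translate Job is an assumption:

```ruby
# Expected column positions in Amazon's tab-delimited report.
EXPECTED_HEADERS = {
  0  => "amazon-order-id",
  4  => "order-status",
  10 => "sku",
  14 => "quantity",
  16 => "item-price"
}.freeze

# Raise immediately if Amazon has moved or renamed a column,
# instead of silently mapping the wrong data.
def assert_row_index(header_row)
  EXPECTED_HEADERS.each do |index, name|
    next if header_row[index] == name
    raise "Report layout changed: expected #{name} at column #{index}, " \
          "got #{header_row[index].inspect}"
  end
end
```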


Extending the Pipeline

  • Additional filters – add new predicates in the transformation job before the row is written.
  • Alternative destinations – hook into the blob attachment (inside the DB transaction) to push to S3, BigQuery, etc.
  • Custom pricing – the job already collects monetary columns; enrich rows inside the pricing conversion step.

Changelog

| Date | Change |
| --- | --- |
| 2025-06-16 | Initial spec drafted. |

Copyright © ShipmentBot. All rights reserved.

Page last modified: Jun 16 2025 at 12:00 AM.