Amazon Order ETL & FTP Upload

Overview

ShipmentBot automatically exports Amazon customer order data, converts it to the Seller Central tab-delimited CSV format, and delivers it to your organisation (or a 3PL) via FTP/SFTP every day. The end-to-end flow is implemented with three background jobs and a handful of configurable business/supplier settings.

```mermaid
flowchart TD
    A[Amazon SP-API<br/>GET_FLAT_FILE_ALL_ORDERS_DATA_BY_LAST_UPDATE_GENERAL] -->|30-day window| B(Extract Job)
    B --> C(Order Report ActiveStorage Blob)
    C --> D(Translate Job)
    D --> E[Filtered & de-duplicated CSV]
    E --> F(Load Job)
    F --> G[FTP/SFTP<br/>seller-central-orders/*.csv]
    F --> H[Email delivery summary]
```

Jobs:
• Extraction job – pulls the raw report from Amazon.
• Transformation job – filters, de-duplicates & stages the file.
• Loading job – uploads the final CSV to FTP/SFTP and emails a receipt.


1. Extraction (background job)

| Behaviour | Source code reference |
| --- | --- |
| Trailing 30-day window – yesterday − 29 days → yesterday | `start_date = (yesterday - 29.days).beginning_of_day`; `end_date = yesterday.end_of_day` |
| Retries until the report is DONE | Job re-queues itself with a linearly increasing delay: `run_at: (1.minutes * attempt).from_now` |
| Stores the raw tab-delimited file as an order_reports attachment | `current_business.order_reports.attach(blob)` |
| Kicks off the Transformation Job once the blob is persisted | Debounced to avoid duplicate runs |
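The date-window and retry-delay arithmetic above can be sketched in plain Ruby. This is a minimal illustration, not the actual job code: `report_window` and `retry_delay_seconds` are hypothetical helper names, and `1.minutes * attempt` from the table is expressed here without ActiveSupport.

```ruby
require "date"

# Hypothetical helpers mirroring the behaviour described in the table
# above; the method names are illustrative, not the job's real API.
def report_window(today = Date.today)
  yesterday  = today - 1
  start_date = yesterday - 29   # trailing 30-day window ending yesterday
  [start_date, yesterday]
end

def retry_delay_seconds(attempt)
  60 * attempt                  # plain-Ruby equivalent of 1.minutes * attempt
end

start_date, end_date = report_window(Date.new(2025, 6, 16))
# start_date => 2025-05-17, end_date => 2025-06-15
```

In the real job the window bounds are further expanded with `beginning_of_day` / `end_of_day` before being sent to the SP-API report request.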

2. Transformation (background job)

  1. Reads the raw report into memory (CSV.parse … liberal_parsing: true).
  2. Applies business/supplier settings:
    • exclude_pending, exclude_cancelled, exclude_return_orders.
    • Full FTP transfer can be toggled via ftp_transfer.
  3. De-duplication:
    • For every row we identify the matching internal order item records
    • If any item in the scope already has a flat_file_<status>_at timestamp the row has already been shipped and is skipped (row_has_already_been_sent?).
  4. Status-driven tracking: rows that are kept are grouped by status (Pending, Shipped, Cancelled,…) and the matching order-items receive the corresponding flat_file_<status>_at = Date.today stamp inside a DB transaction.
  5. Generates a clean CSV (backup) and attaches it to transferred_order_reports.
  6. Enqueues the Load Job inside the same DB transaction so we are guaranteed to either upload all stamped rows or none.
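The de-duplication rule in step 3 can be sketched as a pure-Ruby predicate. This is an illustration of the logic only: `row_already_sent?` stands in for `row_has_already_been_sent?`, and the hashes stand in for the internal order-item records.

```ruby
# A row is skipped when ANY matching order item already carries a
# flat_file_<status>_at timestamp (i.e. it was exported before).
def row_already_sent?(items, status)
  key = :"flat_file_#{status.downcase}_at"
  items.any? { |item| item[key] }
end

items = [
  { flat_file_shipped_at: nil },
  { flat_file_shipped_at: "2025-06-10" } # stamped on a previous run
]
row_already_sent?(items, "Shipped") # => true, so the row is skipped
```

Equivalently (as stated under "Deduplication Guarantees"), a row is emitted only when every matching item is missing the timestamp.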

3. Loading (background job)

| Behaviour | Details |
| --- | --- |
| Protocol | Chooses SFTP (net-sftp) or FTP (net/ftp) based on the ftp_uri scheme. |
| Remote directory | Always seller-central-orders/ – created if missing. |
| File naming | <Business-Name>-<timestamp>.csv (parameterised). |
| Duplicate safety | Uploaded files are listed after transfer and logged via ColumnUpdate.push!(…, :ftp_uploaded_at) so the same attachment is never sent twice. |
| Notification | Sends an HTML email via Resend with the upload summary and the CSV attached. |
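The protocol choice and file-naming convention can be sketched with stdlib Ruby. The method names are hypothetical, and the name slugging is a plain-Ruby approximation of Rails' `parameterize`:

```ruby
require "uri"

# Pick the transfer protocol from the ftp_uri scheme, as described above.
def transfer_protocol(ftp_uri)
  case URI.parse(ftp_uri).scheme
  when "sftp" then :sftp
  when "ftp"  then :ftp
  else raise ArgumentError, "unsupported scheme in #{ftp_uri}"
  end
end

# <Business-Name>-<timestamp>.csv, with the name slugged (parameterised).
def remote_filename(business_name, time)
  slug = business_name.downcase.gsub(/\s+/, "-")
  "#{slug}-#{time.strftime('%Y%m%d%H%M%S')}.csv"
end

transfer_protocol("sftp://user@warehouse.example.com") # => :sftp
```

In the real job the `:sftp` branch would open a `Net::SFTP` session and the `:ftp` branch a `Net::FTP` one, both targeting `seller-central-orders/`.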

Business & Supplier Configuration

| Setting | Scope | Effect |
| --- | --- | --- |
| active | Business / Supplier | Turns the entire pipeline on/off per entity. |
| ftp_transfer | Business | If false, we still build the CSV and send the email, but do not upload to FTP. |
| exclude_pending | Business | Skip orders with status Pending, PartiallyShipped, Shipping. |
| exclude_cancelled | Business | Skip orders with status Cancelled. |
| exclude_return_orders | Business | Skip orders whose Amazon Order ID starts with R00… (returns). |
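The three exclusion settings combine into a single keep/skip decision per row. Here is an illustrative predicate; the setting names match the table, while `keep_row?` and the row-hash keys (taken from the file-format cheat-sheet below) are assumptions about shape, not the actual implementation:

```ruby
# Statuses treated as "pending" by the exclude_pending setting.
PENDING_STATUSES = %w[Pending PartiallyShipped Shipping].freeze

def keep_row?(row, settings)
  status = row["order-status"]
  return false if settings[:exclude_pending]       && PENDING_STATUSES.include?(status)
  return false if settings[:exclude_cancelled]     && status == "Cancelled"
  return false if settings[:exclude_return_orders] && row["amazon-order-id"].start_with?("R00")
  true
end

keep_row?({ "order-status" => "Cancelled", "amazon-order-id" => "111-1" },
          { exclude_cancelled: true }) # => false
```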

Supplier-level jobs are identical to the business flow but scoped to a single supplier (using a dedicated transformation job).


Deduplication Guarantees

  1. Row level: A row is emitted only if every matching internal order item record is missing a flat_file_<status>_at timestamp.
  2. File level: ActiveStorage MD5 checksums prevent attaching the exact same blob twice.
  3. FTP level: The Load Job queries the remote directory after upload; filenames that already exist are not re-sent.
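The file-level guard (2) rests on ActiveStorage storing an MD5 checksum per blob. A minimal stdlib sketch of the comparison, with `already_attached?` as a hypothetical helper:

```ruby
require "digest"

# Returns true when a blob with the same MD5 checksum was attached
# before, mirroring ActiveStorage's per-blob checksum (base64 MD5).
def already_attached?(existing_checksums, csv_body)
  existing_checksums.include?(Digest::MD5.base64digest(csv_body))
end
```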

Monitoring & Alerts

  • Job metrics – each job logs to Rails; failures bubble up to the default Delayed::Job error queue.
  • Staleness guard – if order_reports haven’t been generated for >30 h, Extract Job emails admins (BaseMailer.send_email_admins).
  • Email receipts – every successful load includes a row/skip breakdown and historical shipment counts to facilitate cost monitoring.

File Format Cheat-Sheet

Amazon delivers a tab-delimited file that contains >180 columns. The pipeline relies on only a handful:

| Index | Header | Purpose |
| --- | --- | --- |
| 0 | amazon-order-id | Primary key for the order. |
| 4 | order-status | Pending / Shipped / Cancelled, etc. |
| 10 | sku | Merchant SKU. |
| 14 | quantity | Items in this row. |
| 16 | item-price | Converted to unit price downstream if required. |

The Translate Job asserts the header positions to fail fast if Amazon modifies the report layout (assert_row_index).
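The fail-fast header check can be sketched as follows; the indices come from the cheat-sheet above, while the exact body of `assert_row_index` in the Translate Job is an assumption:

```ruby
# Expected column positions in Amazon's tab-delimited report.
EXPECTED_HEADERS = {
  0  => "amazon-order-id",
  4  => "order-status",
  10 => "sku",
  14 => "quantity",
  16 => "item-price"
}.freeze

# Raise immediately if Amazon has moved or renamed a column,
# instead of silently mapping the wrong data.
def assert_row_index(header_row)
  EXPECTED_HEADERS.each do |index, name|
    next if header_row[index] == name
    raise "Report layout changed: expected #{name} at column #{index}, " \
          "got #{header_row[index].inspect}"
  end
end
```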


Extending the Pipeline

  • Additional filters – add new predicates in the transformation job before the row is written.
  • Alternative destinations – hook into the blob attachment (inside the DB transaction) to push to S3, BigQuery, etc.
  • Custom pricing – the job already collects monetary columns; enrich rows inside the pricing conversion step.

Changelog

| Date | Change |
| --- | --- |
| 2025-06-16 | Initial spec drafted. |

Copyright © ShipmentBot. All rights reserved.

Page last modified: Jun 16 2025 at 12:00 AM.