# Amazon Order ETL & FTP Upload

## Overview
ShipmentBot automatically exports Amazon customer order data, converts it to the Seller Central tab-delimited CSV format, and delivers it to your organisation (or a 3PL) via FTP/SFTP every day. The end-to-end flow is implemented with three background jobs and a handful of configurable business/supplier settings.
```mermaid
flowchart TD
    A[Amazon SP-API<br/>GET_FLAT_FILE_ALL_ORDERS_DATA_BY_LAST_UPDATE_GENERAL] -->|30-day window| B(Extract Job)
    B --> C(Order Report ActiveStorage Blob)
    C --> D(Translate Job)
    D --> E[Filtered & de-duplicated CSV]
    E --> F(Load Job)
    F --> G[FTP/SFTP<br/>seller-central-orders/*.csv]
    F --> H[Email delivery summary]
```
Jobs:

- **Extraction job** – pulls the raw report from Amazon.
- **Transformation job** – filters, de-duplicates & stages the file.
- **Loading job** – uploads the final CSV to FTP/SFTP and emails a receipt.
## 1. Extraction (background job)
| Behaviour | Source code reference |
|---|---|
| Trailing 30-day window – `yesterday - 29 days` through `yesterday` | `start_date = (yesterday - 29.days).beginning_of_day`<br/>`end_date = yesterday.end_of_day` |
| Retries until the report is `DONE` | Job re-queues itself with a linearly increasing delay: `run_at: (1.minutes * attempt).from_now` |
| Stores the raw tab-delimited file as an `order_reports` attachment | `current_business.order_reports.attach(blob)` |
| Kicks off the Transformation Job once the blob is persisted | Debounced to avoid duplicate runs |
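The trailing 30-day window can be expressed in plain Ruby without the Rails `29.days` / `beginning_of_day` helpers. A minimal sketch (the method name `report_window` is illustrative, not from the codebase):

```ruby
require "date"

# Compute the trailing 30-day report window:
# (yesterday - 29 days) .. yesterday, matching the job's
# start_date/end_date calculation at day granularity.
def report_window(today = Date.today)
  yesterday = today - 1
  start_date = yesterday - 29
  [start_date, yesterday]
end
```

The real job additionally widens the bounds to `beginning_of_day` / `end_of_day` before passing them to the SP-API report request.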
## 2. Transformation (background job)
- Reads the raw report into memory (`CSV.parse … liberal_parsing: true`).
- Applies business/supplier settings: `exclude_pending`, `exclude_cancelled`, `exclude_return_orders`.
- Full FTP transfer can be toggled via `ftp_transfer`.
- De-duplication:
  - For every row we identify the matching internal order-item records.
  - If any item in the scope already has a `flat_file_<status>_at` timestamp, the row has already been shipped and is skipped (`row_has_already_been_sent?`).
- Status-driven tracking: rows that are kept are grouped by status (Pending, Shipped, Cancelled, …) and the matching order items receive the corresponding `flat_file_<status>_at = Date.today` stamp inside a DB transaction.
- Generates a clean CSV (`backup`) and attaches it to `transferred_order_reports`.
- Enqueues the Load Job inside the same DB transaction, so we are guaranteed to either upload all stamped rows or none.
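The row-level duplicate check can be sketched in isolation. This is an assumption-laden simplification: the real `row_has_already_been_sent?` works against ActiveRecord order-item models, whereas here each item is a plain Hash whose `flat_file_<status>_at` keys hold timestamps (or `nil`):

```ruby
# Returns true when any matching order item already carries the
# flat_file_<status>_at stamp for this row's status, i.e. the row
# was included in a previous export and must be skipped.
def row_already_sent?(order_items, status)
  key = "flat_file_#{status.downcase}_at"
  order_items.any? { |item| !item[key].nil? }
end
```

Stamping kept rows and enqueuing the Load Job inside one DB transaction is what makes this check safe: a crash mid-run rolls back the stamps, so no row is marked as sent without actually being uploaded.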
## 3. Loading (background job)
| Behaviour | Details |
|---|---|
| Protocol | Chooses SFTP (`net-sftp`) or FTP (`net/ftp`) based on the `ftp_uri` scheme. |
| Remote directory | Always `seller-central-orders/` – created if missing. |
| File naming | `<Business-Name>-<timestamp>.csv` (parameterised). |
| Duplicate safety | Uploaded files are listed after transfer and logged via `ColumnUpdate.push!(…, :ftp_uploaded_at)` so the same attachment is never sent twice. |
| Notification | Sends an HTML email via Resend with the upload summary and the CSV attached. |
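The scheme-based protocol selection can be sketched with Ruby's standard `URI` library. The method name `protocol_for` is illustrative; the real job then dispatches to the `net-sftp` or `net/ftp` client accordingly:

```ruby
require "uri"

# Decide the transfer protocol from the configured ftp_uri.
# Unknown schemes fail loudly rather than defaulting to plain FTP.
def protocol_for(ftp_uri)
  case URI.parse(ftp_uri).scheme
  when "sftp" then :sftp
  when "ftp"  then :ftp
  else raise ArgumentError, "unsupported scheme in #{ftp_uri}"
  end
end
```

Failing on unknown schemes is deliberate: silently falling back to unencrypted FTP for a mistyped `sfpt://` URI would leak order data.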
## Business & Supplier Configuration
| Setting | Scope | Effect |
|---|---|---|
| `active` | Business / Supplier | Turns the entire pipeline on/off per entity. |
| `ftp_transfer` | Business | If `false`, we still build the CSV and send the email, but do not upload to FTP. |
| `exclude_pending` | Business | Skip orders with status Pending, PartiallyShipped, Shipping. |
| `exclude_cancelled` | Business | Skip orders with status Cancelled. |
| `exclude_return_orders` | Business | Skip orders whose Amazon Order ID starts with `R00…` (returns). |
Supplier-level jobs are identical to the business flow but scoped to a single supplier (using a dedicated transformation job).
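Combining the three exclusion settings into a single row predicate might look like the sketch below. The helper name `keep_row?` and the `PENDING_STATUSES` constant are assumptions; the statuses and the `R00` return-order prefix come from the table above:

```ruby
# Statuses treated as "pending" when exclude_pending is set
# (per the configuration table above).
PENDING_STATUSES = %w[Pending PartiallyShipped Shipping].freeze

# Returns true when the row survives all configured exclusion filters.
# `row` is a Hash keyed by the report's column headers; `settings` is a
# Hash of the business-level boolean flags.
def keep_row?(row, settings)
  status = row["order-status"]
  return false if settings[:exclude_pending] && PENDING_STATUSES.include?(status)
  return false if settings[:exclude_cancelled] && status == "Cancelled"
  return false if settings[:exclude_return_orders] && row["amazon-order-id"].start_with?("R00")
  true
end
```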
## Deduplication Guarantees
- Row level: a row is emitted only if every matching internal order-item record is missing a `flat_file_<status>_at` timestamp.
- File level: ActiveStorage MD5 checksums prevent attaching the exact same blob twice.
- FTP level: the Load Job queries the remote directory after upload; filenames that already exist are not re-sent.
## Monitoring & Alerts
- Job metrics – each job logs to Rails; failures bubble up to the default Delayed::Job error queue.
- Staleness guard – if `order_reports` haven't been generated for >30 h, the Extract Job emails admins (`BaseMailer.send_email_admins`).
- Email receipts – every successful load includes a row/skip breakdown and historical shipment counts to facilitate cost monitoring.
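The 30-hour staleness check reduces to a simple time comparison. A minimal sketch, assuming the method name `reports_stale?` (the real guard then notifies admins via `BaseMailer.send_email_admins`):

```ruby
# Alert threshold from the monitoring rules above.
STALENESS_THRESHOLD_HOURS = 30

# True when the most recent order report is older than the threshold.
# Time subtraction in Ruby yields elapsed seconds.
def reports_stale?(last_generated_at, now = Time.now)
  (now - last_generated_at) > STALENESS_THRESHOLD_HOURS * 3600
end
```

A 30-hour threshold (rather than 24) leaves headroom for the daily job to slip by a few hours before paging anyone.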
## File Format Cheat-Sheet
Amazon delivers a tab-delimited file that contains >180 columns. The pipeline relies on only a handful:
| Index | Header | Purpose |
|---|---|---|
| 0 | `amazon-order-id` | Primary key for the order. |
| 4 | `order-status` | Pending / Shipped / Cancelled etc. |
| 10 | `sku` | Merchant SKU. |
| 14 | `quantity` | Items in this row. |
| 16 | `item-price` | Converted to unit price downstream if required. |
The Translate Job asserts the header positions to fail fast if Amazon modifies the report layout (`assert_row_index`).
## Extending the Pipeline
- Additional filters – add new predicates in the transformation job before the row is written.
- Alternative destinations – hook into the blob attachment (inside the DB transaction) to push to S3, BigQuery, etc.
- Custom pricing – the job already collects monetary columns; enrich rows inside the pricing conversion step.
## Changelog
| Date | Change |
|---|---|
| 2025-06-16 | Initial spec drafted. |