AWS Document Processing Pipeline
A production-style asynchronous document processing pipeline built entirely on AWS. Users upload PDF, CSV, or image files via a FastAPI REST endpoint. Each file is stored in S3 immediately and a processing job is enqueued in SQS; the endpoint then returns 202 Accepted without waiting for processing to finish. An EC2 worker in a private subnet polls the queue, processes the document, writes results back to S3, tracks job status in PostgreSQL (RDS), and sends an SNS email notification on completion.
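The upload path described above can be sketched as a small function. This is a minimal illustration, not the project's actual handler: the function name `accept_upload`, the `uploads/<job_id>/<filename>` key layout, and the message fields are assumptions. The `s3` and `sqs` arguments are expected to behave like boto3 clients, and only `put_object` and `send_message` are called on them.

```python
import json
import uuid


def accept_upload(s3, sqs, bucket, queue_url, filename, data):
    """Store the file in S3, enqueue a job in SQS, and return a 202 payload.

    `s3` and `sqs` are assumed to be boto3-style clients, e.g.
    boto3.client("s3") and boto3.client("sqs").
    """
    job_id = str(uuid.uuid4())
    key = f"uploads/{job_id}/{filename}"  # hypothetical key layout

    # 1. Persist the raw bytes first so the worker can always find them.
    s3.put_object(Bucket=bucket, Key=key, Body=data)

    # 2. Enqueue a small pointer message rather than the file itself.
    message = {"job_id": job_id, "bucket": bucket, "key": key}
    sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(message))

    # 3. Return right away; the actual processing happens in the worker.
    return 202, {"job_id": job_id, "status": "queued"}
```

In a FastAPI route, the returned status code and body would map onto the HTTP response, and the client could later use `job_id` to look up the job's status.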
Key engineering decisions: zero credentials stored on servers (EC2 uses IAM roles exclusively), full VPC isolation with public and private subnets, a Dead Letter Queue (DLQ) that preserves messages that still fail after 3 retries, structured JSON logging that is queryable in CloudWatch, and horizontal scaling by running multiple worker containers against the same SQS queue.
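The worker and Dead Letter Queue behavior can be illustrated with a short sketch. The names `redrive_policy`, `poll_once`, and `handler` are hypothetical, and mapping "3 retries" to `maxReceiveCount=3` is an assumption about how the queue is configured. The `sqs` argument is expected to behave like a boto3 SQS client; only `receive_message` and `delete_message` are called on it.

```python
import json


def redrive_policy(dlq_arn, max_receive_count=3):
    """Build the value of the SQS RedrivePolicy queue attribute.

    Assumes the "3 retries" rule is expressed as maxReceiveCount=3.
    """
    return json.dumps(
        {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
    )


def poll_once(sqs, queue_url, handler):
    """Receive one batch, process each message, and delete only on success.

    Returns the number of messages that were processed successfully.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling: wait up to 20 s for messages
    )
    processed = 0
    for message in response.get("Messages", []):
        try:
            handler(json.loads(message["Body"]))
        except Exception:
            # Do not delete: the message becomes visible again after the
            # visibility timeout, and SQS moves it to the DLQ once the
            # receive count exceeds maxReceiveCount.
            continue
        sqs.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
        processed += 1
    return processed
```

Because a message is deleted only after its handler succeeds, several workers can run `poll_once` in a loop against the same queue: SQS hides an in-flight message from other consumers for the duration of its visibility timeout, which is what makes horizontal scaling by adding worker containers possible.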
System Architecture