AWS architecture blueprint for a scalable file-processing pipeline

Create an extendable AWS architecture blueprint for a scalable file-processing pipeline, capable of handling periodic CSV file uploads for processing and notifications.

Prompt

Create an extendable AWS architecture blueprint using AWS CDK for a scalable file-processing pipeline, capable of handling periodic CSV file uploads for processing and notifications.

Blueprint Requirements:

Architecture Components:
- Storage Layer:
        Use Amazon S3 as the primary storage for incoming files, with configurable event triggers to initiate the pipeline on file uploads (see the CDK sketch after this component's variables).
        Variables:
            {{s3_bucket_name}}: Name of the bucket to store incoming files.
            {{s3_event_trigger}}: Define event triggers (e.g., 'PUT' events for specific file extensions or prefixes).
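
A minimal CDK sketch (TypeScript) of the storage layer, assuming `this` is the stack scope; the bucket name is a placeholder for {{s3_bucket_name}}, and the PUT-event trigger ({{s3_event_trigger}}) is attached in the queueing sketch further down, where its destination queue is defined.

```typescript
import * as cdk from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';

// Inside the stack's constructor; `this` is the Stack scope.
// 'my-incoming-files' is a placeholder for {{s3_bucket_name}}.
const incomingBucket = new s3.Bucket(this, 'IncomingFiles', {
  bucketName: 'my-incoming-files',
  encryption: s3.BucketEncryption.S3_MANAGED, // baseline encryption at rest
  removalPolicy: cdk.RemovalPolicy.RETAIN,    // keep uploads if the stack is torn down
});
```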

- Processing Layer:
        Create an AWS Lambda function using CDK to parse and process CSV files, with configurable concurrency and timeout settings (see the sketch below).
        Define standard output metadata (e.g., file name, processing timestamp, and status) for easy downstream consumption.
        Variables:
            {{lambda_memory}}: Memory allocation for optimal processing.
            {{lambda_timeout}}: Timeout setting to handle variable file sizes.
            {{lambda_output_structure}}: Define output metadata format for integration with downstream services.
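
A sketch of the processing layer, continuing the storage sketch above; the runtime, asset path, and concrete values are assumptions standing in for {{lambda_memory}}, {{lambda_timeout}}, and {{lambda_output_structure}}.

```typescript
import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';

const processor = new lambda.Function(this, 'CsvProcessor', {
  runtime: lambda.Runtime.NODEJS_18_X,   // runtime choice is an assumption
  handler: 'index.handler',
  code: lambda.Code.fromAsset('lambda'), // assumes handler code under ./lambda
  memorySize: 1024,                      // {{lambda_memory}}, in MB
  timeout: cdk.Duration.minutes(5),      // {{lambda_timeout}}
  reservedConcurrentExecutions: 10,      // cap concurrency to protect downstream services
  environment: {
    // {{lambda_output_structure}}: metadata keys the handler emits for downstream consumers
    OUTPUT_FIELDS: 'fileName,processedAt,status',
  },
});
```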

- Error Handling and Queueing:
        Implement an SQS queue with a Dead Letter Queue (DLQ) for robust error handling and retries of failed events (wired up in the sketch below).
        Variables:
            {{sqs_queue_name}}: Name of the primary queue.
            {{dlq_name}}: Name of the Dead Letter Queue for unprocessed items.
            {{sqs_retry_limit}}: Maximum retries before moving to the DLQ.
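
A sketch of the queueing layer that also wires the S3 trigger and the Lambda event source, reusing `incomingBucket` and `processor` from the sketches above; the queue names and retry limit stand in for {{sqs_queue_name}}, {{dlq_name}}, and {{sqs_retry_limit}}.

```typescript
import * as cdk from 'aws-cdk-lib';
import * as sqs from 'aws-cdk-lib/aws-sqs';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as s3n from 'aws-cdk-lib/aws-s3-notifications';
import { SqsEventSource } from 'aws-cdk-lib/aws-lambda-event-sources';

const dlq = new sqs.Queue(this, 'ProcessingDlq', {
  queueName: 'csv-processing-dlq',        // {{dlq_name}}
  retentionPeriod: cdk.Duration.days(14), // keep failures long enough to inspect
});

const intakeQueue = new sqs.Queue(this, 'IntakeQueue', {
  queueName: 'csv-intake-queue',              // {{sqs_queue_name}}
  visibilityTimeout: cdk.Duration.minutes(6), // should exceed the Lambda timeout
  deadLetterQueue: { queue: dlq, maxReceiveCount: 3 }, // {{sqs_retry_limit}}
});

// {{s3_event_trigger}}: PUT events for *.csv objects feed the queue.
incomingBucket.addEventNotification(
  s3.EventType.OBJECT_CREATED_PUT,
  new s3n.SqsDestination(intakeQueue),
  { suffix: '.csv' },
);

// The processor drains the queue in batches; failed messages are retried
// and eventually land in the DLQ once the receive count is exhausted.
processor.addEventSource(new SqsEventSource(intakeQueue, { batchSize: 10 }));
```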

- Notification and Alerting Layer:
        Set up an SNS topic for notifications on successful or failed processing outcomes, with a customizable message format (see the sketch below).
        Variables:
            {{sns_topic_name}}: Topic for success/failure notifications.
            {{sns_message_format}}: Define message structure to include details like file name, processing status, and processing time.
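
A sketch of the notification layer, again reusing `processor`; the topic name stands in for {{sns_topic_name}}, the payload shape in the comment for {{sns_message_format}}, and the email endpoint is purely illustrative.

```typescript
import * as sns from 'aws-cdk-lib/aws-sns';
import * as subs from 'aws-cdk-lib/aws-sns-subscriptions';

const notifications = new sns.Topic(this, 'ProcessingNotifications', {
  topicName: 'csv-processing-notifications', // {{sns_topic_name}}
});

// Illustrative subscriber; swap in HTTP, SMS, or chat integrations as needed.
notifications.addSubscription(new subs.EmailSubscription('ops@example.com'));

// Let the processor publish results; its handler would send a
// {{sns_message_format}}-shaped payload, e.g. { fileName, status, processingTimeMs }.
notifications.grantPublish(processor);

// Expose the topic ARN so the handler knows where to publish.
processor.addEnvironment('NOTIFICATION_TOPIC_ARN', notifications.topicArn);
```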

- Extendability and Customization:
  - Data Store Options: Design for integration with a metadata storage solution, such as DynamoDB or RDS, to track file-processing history and results (sketched after this list).
  - Monitoring and Observability: Include CloudWatch metrics and logs with hooks for integrating custom observability tools, allowing expansion for specific monitoring requirements.
  - Security and Compliance: Use CDK to apply IAM roles and KMS encryption, with options for additional security measures (e.g., network isolation) as needed.
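
One possible extension along these lines, assuming a DynamoDB history table, a CloudWatch error alarm, and an optional customer-managed KMS key; the key schema, alarm threshold, and names are placeholders. Note that the grant calls in these sketches generate scoped IAM policies automatically.

```typescript
import * as cdk from 'aws-cdk-lib';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import * as kms from 'aws-cdk-lib/aws-kms';

// Metadata store: one item per processed file.
const history = new dynamodb.Table(this, 'ProcessingHistory', {
  partitionKey: { name: 'fileName', type: dynamodb.AttributeType.STRING },
  sortKey: { name: 'processedAt', type: dynamodb.AttributeType.STRING },
  billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
});
history.grantReadWriteData(processor); // scoped IAM policy via CDK grants

// Observability hook: alarm whenever the processor reports errors.
new cloudwatch.Alarm(this, 'ProcessorErrorAlarm', {
  metric: processor.metricErrors({ period: cdk.Duration.minutes(5) }),
  threshold: 1,
  evaluationPeriods: 1,
});

// Optional customer-managed key for encrypting the bucket, queues, or table.
const pipelineKey = new kms.Key(this, 'PipelineKey', { enableKeyRotation: true });
```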

- Documentation:
  - README: Provide a comprehensive README with CDK setup instructions, configuration options for each component, and usage examples that guide customization and deployment of the blueprint (a context-driven entrypoint is sketched below).
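
As a usage illustration for the README, a hypothetical app entrypoint could read the blueprint variables from CDK context, e.g. `cdk deploy -c s3_bucket_name=my-incoming-files -c sqs_retry_limit=5`; `FileProcessingStack` and its props are assumptions, not part of the blueprint.

```typescript
#!/usr/bin/env node
import * as cdk from 'aws-cdk-lib';
// Hypothetical stack that assembles the constructs from the sketches above.
import { FileProcessingStack } from '../lib/file-processing-stack';

const app = new cdk.App();

new FileProcessingStack(app, 'FileProcessingStack', {
  // Context keys mirror the blueprint variables; defaults are placeholders.
  s3BucketName: app.node.tryGetContext('s3_bucket_name') ?? 'my-incoming-files',
  sqsRetryLimit: Number(app.node.tryGetContext('sqs_retry_limit') ?? 3),
});
```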

- Purpose:
        This CDK-based blueprint provides a flexible foundation for a real-time or batch file-processing pipeline on AWS, with room to scale and to integrate additional services over time.