Responsibilities
- Architect and manage AWS cloud infrastructure spanning multiple geographic regions
- Manage complex networking across multiple Virtual Private Clouds (VPCs)
- Lead infrastructure-as-code implementation using CloudFormation, with selective use of AWS CDK in Python where it enhances maintainability
- Provision and sustain environments for development, testing, staging, and production
- Administer EC2 instances, SQL Server on RDS, Redis caches, and Elasticsearch clusters
- Set up and manage secure connections via VPN and integrate with Active Directory for authentication and access control
- Ensure systems are highly available, resilient, and ready for disaster recovery through multi-Availability Zone deployments
- Maintain and enhance Jenkins-based CI/CD pipelines integrated with Bitbucket repositories
- Enable automated deployments with pre-deployment change validation and reliable rollback mechanisms
- Automate deployment processes with Bash scripts
- Coordinate release activities across various environments and regions
- Collaborate with engineering teams to improve build speed, test coverage, and deployment confidence
- Drive consistency across environments to eliminate configuration drift and local setup issues
- Enforce security best practices throughout cloud platforms and application layers
- Manage Web Application Firewall (WAF) rules and enable logging at both regional and global levels
- Support adherence to regulatory standards such as PCI-DSS, PSD2, and data privacy requirements
- Securely manage secrets and configuration data using AWS Systems Manager Parameter Store
- Oversee SSL/TLS certificate lifecycle using AWS Certificate Manager
- Configure and enforce network security through security groups, network ACLs, and segmentation policies
- Partner with engineering and leadership on risk assessments, audits, and incident response planning
- Operate and enhance the ELK stack (Elasticsearch, Logstash, Kibana, Filebeat) for centralized logging and monitoring
- Define and maintain CloudWatch alarms, performance metrics, and operational dashboards
- Identify and act on opportunities to improve system performance, reliability, and cost efficiency
- Lead incident response efforts, conduct root cause analysis, and implement corrective actions
- Monitor and optimize content delivery via CloudFront, S3 storage usage, and data transfer expenses


