Sathvik Reddy Anumasu
+1-646-***-**** ************@*****.*** LinkedIn GitHub
PROFESSIONAL SUMMARY
Data Engineer with 5 years of experience designing and optimizing cloud-native data platforms across AWS and Azure. Skilled in building scalable ETL/ELT pipelines, real-time streaming (Kafka, Kinesis), and orchestration with Airflow. Experienced in SQL and NoSQL databases, data warehousing, and data lakes, with a strong focus on data lineage, metadata management, and catalogs. Proficient in CI/CD, Docker, and Kubernetes for deploying data services and API integrations. Adept at Python, Spark, and MLOps (MLflow, Azure ML), with a track record of improving cost efficiency, governance (GDPR, HIPAA), and time-to-insight in analytics and ML-driven solutions.
TECHNICAL SKILLS
Programming Languages: Python, SQL, Scala, Java, .NET, R, Bash
Warehouse: Snowflake, Synapse, Redshift, BigQuery, Delta Lake
Data Processing: Spark, Databricks, Airflow, Glue, ADF, dbt, Hadoop, Hive
DevOps & IaC: Git, Jenkins, GitHub, Terraform, Docker, Kubernetes
Streaming: Kafka, Event Hubs, Kinesis Firehose, SNS, SQS
MLOps: MLflow, Azure ML, Feature Stores, Batch/Real-Time Inference
Cloud Platforms & Services: AWS (S3, Redshift, Glue, Lambda, Kinesis), Azure (ADF, Synapse, Fabric, ADLS, Blob Storage, Event Hubs), GCP (BigQuery, Dataflow, Dataproc, GCS)
Visualization & Monitoring: Power BI, Tableau, Grafana, CloudWatch, Azure Monitor, ELK Stack
Databases: SQL (PostgreSQL, MySQL, SQL Server, Oracle), NoSQL (DynamoDB, Cosmos DB, MongoDB)
Modeling & Governance: Star, Snowflake, Data Vault, Data Lineage, Dimensional, Metadata Management, Purview, GDPR, HIPAA
AI/ML: OpenAI, TensorFlow, Pandas, NumPy, LangChain, Hugging Face
Operating Systems: Windows, Linux, Unix
WORK EXPERIENCE
Vanguard Data Engineer July 2024 – Present
Stack: AWS (S3, Glue, Lambda, Step Functions, EMR, Redshift, DMS, RDS, DynamoDB, EC2, ELB, Athena, QuickSight, CloudWatch), Snowflake, Matillion, Terraform, Jenkins, Git, Docker, Kubernetes, Python (PySpark, Pandas, boto3, MLflow), Java, SQL, Power BI, Tableau
Designed and deployed ETL/ELT pipelines using AWS Glue, S3, Lambda, and Step Functions, enabling ingestion, transformation, and reporting with Athena and QuickSight.
Built batch and streaming data workflows on EMR, Redshift, and AWS DMS, supporting cross-platform sync and change data capture.
Optimized Matillion ETL workflows and SQL queries in Redshift, applying partitioning and distribution strategies to improve reporting efficiency and reduce costs.
Developed and managed Snowflake ELT pipelines using dbt and Fivetran, integrating external sources into AWS for analytics and reporting.
Automated RDS/DynamoDB operations (provisioning, backup, monitoring) using Python and boto3, reducing manual tasks and improving system reliability.
Implemented CI/CD pipelines with Terraform, Jenkins, and GitLab, and deployed workloads on S3, EC2, and Elastic Load Balancer.
Securian Financial Data Engineer Aug 2023 – July 2024
Stack: AWS (S3, Glue, Lambda, EMR, Kinesis, Redshift, DMS, Snowball, Lake Formation, IAM, Athena, QuickSight, CloudWatch), Snowflake, Airflow, Terraform, Jenkins, Git, Docker, Kubernetes, Python (PySpark, Pandas, boto3), SQL
Engineered ETL/ELT pipelines with AWS Glue, Lambda, EMR, and S3, ingesting data from Oracle, SQL Server, and mainframes into a layered lake (Parquet/ORC/CSV) for downstream analytics in Athena/QuickSight.
Built streaming workflows using Kinesis, orchestrated with Airflow, delivering curated data to Snowflake/Redshift for near-real-time reporting.
Tuned Redshift workloads (distribution/sort keys, query optimization, workload management) and leveraged Auto Scaling to improve performance and resource efficiency.
Managed migrations to AWS with DMS and Snowball, standardizing patterns for reliable cutovers from on-prem systems.
Implemented governance and compliance with Lake Formation and IAM (RBAC, fine-grained access, masking) plus automated data validation/QA to raise pipeline reliability across regulated data sets (insurance/health).
University of Missouri Kansas City Graduate Teaching Assistant Jan 2023 – Dec 2023
Stack: Java, Python, SQL, Tableau, Power BI, AWS EC2, AWS S3, AWS Glue, AWS Redshift
Developed a cloud-based content management system and hosted an interactive quiz application on AWS S3, achieving 99.9% uptime by deploying it on an AWS EC2 instance and improving usability and maintainability.
Mentored 50 students, offering in-depth feedback on assignments and facilitating project walkthroughs that helped them build scalable and efficient web applications.
Collaborated with university IT teams to troubleshoot and optimize software applications, ensuring seamless operation across distributed systems.
Team-Up! Python Developer Mar 2021 – Apr 2022
Stack: Python, SQL, Power BI, Excel, Azure Data Factory, Databricks
Analyzed customer data using SQL and Power BI to generate actionable insights that improved customer experience.
Created predictive models using regression analysis and clustering techniques to segment customers and optimize marketing strategies.
Designed and maintained ETL workflows using Azure Data Factory, automating data ingestion from multiple sources.
EDUCATION
University of Missouri Kansas City
Master of Science, Data Science
Coursework: Data Science, Machine Learning Algorithms, Artificial Intelligence, Introduction to Statistical Learning, Human Computer Interaction,
Big Data Technologies, Cloud Computing, Data Structures and Algorithms, Data Warehouse Mining, Internet of Things
Sathyabama Institute of Science and Technology
Bachelor of Science, Computer Science
CERTIFICATIONS
AWS Certified Solutions Architect - Associate
Certified Associate Python Programmer - Python Institute
Microsoft Certified: Power BI Data Analyst Associate