Name: Sravani
*******.*****@*****.***
Phone: 980-***-****
PROFESSIONAL SUMMARY
Data Engineer with 11 years of experience designing, building, and optimizing large-scale, cloud-native data platforms across Google Cloud (GCP), AWS, and Azure. Proven expertise in architecting data pipelines, data lakes, and data warehouses to support enterprise analytics, business intelligence, and real-time decision-making.
Skilled in developing end-to-end ETL/ELT workflows using Apache Spark, Hadoop, Kafka, Airflow, Cloud Dataflow, and Azure Data Factory. Strong hands-on experience with data modeling, data migration, and performance tuning, ensuring reliability, scalability, and cost optimization of data platforms.
Proficient in Python and SQL for data engineering, transformations, and automation. Experienced in managing both structured and unstructured data across platforms such as BigQuery, Snowflake, Redshift, Oracle, MySQL, PostgreSQL, SQL Server, and Cosmos DB.
Expertise in implementing real-time streaming solutions with Kafka, Kinesis, and Pub/Sub, enabling low-latency analytics and business insights. Demonstrated success in leading cloud migration projects from on-premises systems (e.g., Teradata) to modern cloud data warehouses.
Strong background in data governance, quality, and compliance, leveraging Informatica, DataFlux, and Alation to meet GDPR and CCPA requirements. Skilled in containerization and orchestration using Docker and Kubernetes (AKS, EKS, GKE), as well as automating infrastructure with Terraform, Ansible, and CloudFormation.
Experienced in CI/CD pipeline implementation using Jenkins, GitLab CI, Git, and AWS/Azure DevOps to streamline deployments and reduce downtime. Adept at delivering dashboards and reports with Power BI, Tableau, and Looker to provide actionable insights for business stakeholders.
Designed and deployed end-to-end ML pipelines by integrating Dataflow / Spark with Vertex AI / SageMaker / Azure ML for training, validation, and deployment of predictive models.
Built predictive models (regression, classification, time-series forecasting) to support business use cases such as demand forecasting, churn prediction, and anomaly detection.
Supported ML engineers by building real-time streaming pipelines with Kafka + Redshift, enabling anomaly detection in claims processing.
Implemented MLlib models in PySpark on Databricks and EMR, enabling scalable model training over large datasets (a minimal PySpark sketch follows this summary).
Recognized for the ability to work with cross-functional teams, gather requirements, and translate business needs into scalable, data-driven solutions that improve efficiency, performance, and cost savings.
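The sketch below illustrates the kind of PySpark MLlib training job referenced above. It is a minimal example only: the dataset path, column names, and label are hypothetical placeholders, not artifacts of any specific engagement.

# Minimal PySpark MLlib training sketch (paths and columns are illustrative assumptions)
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml import Pipeline

spark = SparkSession.builder.appName("claims-anomaly-training").getOrCreate()

# Load historical claims data (placeholder path)
claims = spark.read.parquet("s3://example-bucket/claims/history/")

# Assemble numeric features into the single vector column MLlib expects
assembler = VectorAssembler(
    inputCols=["claim_amount", "provider_visits", "days_to_file"],
    outputCol="features",
)
lr = LogisticRegression(featuresCol="features", labelCol="is_anomalous")

pipeline = Pipeline(stages=[assembler, lr])
model = pipeline.fit(claims)

# Persist the trained model for later batch or streaming scoring
model.write().overwrite().save("s3://example-bucket/models/claims_anomaly/")

The fitted pipeline model can then be reloaded for batch scoring on EMR or Databricks, or reused inside a streaming job to flag anomalies in near real time.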
TECHNICAL SKILLS
Programming: Python, SQL, Scala, Java
Big Data / ETL: Apache Spark, Hadoop, Kafka, Airflow, Databricks, AWS Glue, Azure Data Factory, GCP Dataflow
Cloud Platforms: AWS (S3, Redshift, Glue, EMR, EC2, Lambda); GCP (BigQuery, Pub/Sub, Dataflow, Composer); Azure (Synapse, Data Lake, Databricks, Data Factory)
Databases / Warehouses: Snowflake, BigQuery, Redshift, SQL Server, Oracle, PostgreSQL, MongoDB, Cassandra
DevOps & Infrastructure: Terraform, Docker, Kubernetes, Jenkins, Git, GitLab CI/CD, Ansible, CloudFormation
Monitoring & Logging: Prometheus, ELK Stack, Splunk, CloudWatch, Stackdriver (Cloud Monitoring)
BI & Reporting: Power BI, Tableau, Looker
ML/DL Frameworks: TensorFlow, PyTorch, Keras, Scikit-learn, XGBoost, LightGBM, CatBoost
CERTIFICATIONS
●AWS Certified Data Engineer – Associate
●Microsoft Certified: Azure Data Engineer Associate
●Google Cloud Certified: Professional Data Engineer
EDUCATIONAL DETAILS
●Bachelor of Technology in Computer Science, May 2010 to May 2014
PROFESSIONAL EXPERIENCE
Client: CVS, Dallas, TX July 2024 to Present Role: Sr. Data Engineer
Responsibilities:
●Engineered scalable big data pipelines using Apache Spark, Apache Beam, and Cloud Dataflow to process and transform large datasets for analytics and reporting (an illustrative Beam sketch follows this section).
●Designed and maintained high-performance data warehouses in BigQuery using partitioning, clustering, and materialized views, reducing query costs and latency by 40%.
●Developed and automated ETL/ELT workflows with Cloud Composer (Airflow), improving pipeline reliability, scheduling, and orchestration.
●Led cloud migration from Teradata to GCP, modernizing data infrastructure with BigQuery, Cloud Storage, and Dataflow, reducing operational costs by 30%.
●Built proactive monitoring solutions using Cloud Monitoring (Stackdriver), Prometheus, and custom logging to ensure pipeline health, performance, and cost optimization.
●Integrated Snowflake, PostgreSQL, BigQuery, and Firestore to support structured and semi-structured data processing use cases.
●Designed and deployed fault-tolerant, scalable data lakes on Cloud Storage and BigQuery with Dataflow for both batch and streaming workloads.
●Conducted performance tuning for BigQuery queries, Spark jobs, and pipelines to improve execution time and optimize cost.
●Automated deployments with Terraform, GitLab CI, and Cloud Build, enabling reliable and repeatable infrastructure provisioning.
●Mentored junior engineers, led design and code reviews, and guided large-scale cloud migration strategies.
●Collaborated with product managers, business stakeholders, and data science teams to deliver scalable, data-driven solutions aligned with business goals.
Environment: BigQuery, Cloud Storage, Dataflow (Apache Beam), Cloud Composer (Airflow), Pub/Sub, Spark on Dataproc, Firestore, PostgreSQL, Snowflake, Cloud Monitoring (Stackdriver), Prometheus, Terraform, GitLab CI, Cloud Build, Docker, Python, Pandas, NumPy, SQL, Apache Kafka.
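A minimal Apache Beam (Python SDK) batch pipeline of the kind referenced in the first bullet above is sketched below. The GCS bucket, project, dataset, and schema are illustrative assumptions, not client assets; the same code runs locally with the DirectRunner or on Dataflow with the DataflowRunner.

# Illustrative Beam pipeline: read JSON lines from GCS, parse, and append to BigQuery
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_record(line):
    """Parse one JSON line into a dict matching the BigQuery schema below."""
    record = json.loads(line)
    return {
        "order_id": record["order_id"],
        "amount": float(record["amount"]),
        "event_ts": record["event_ts"],
    }


def run():
    options = PipelineOptions(
        runner="DataflowRunner",        # use "DirectRunner" for local testing
        project="example-project",      # placeholder project
        region="us-central1",
        temp_location="gs://example-bucket/tmp",
    )
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("gs://example-bucket/raw/orders-*.json")
            | "Parse" >> beam.Map(parse_record)
            | "Write" >> beam.io.WriteToBigQuery(
                "example-project:analytics.orders",
                schema="order_id:STRING,amount:FLOAT,event_ts:TIMESTAMP",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == "__main__":
    run()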
Client: Macy's, New York, NY October 2022 to July 2024 Role: Sr. Data Engineer
Responsibilities:
●Designed and implemented scalable data pipelines on GCP using Apache Beam, Dataflow, and Cloud Composer (Airflow) to process batch and streaming data.
●Acquired and integrated data from BigQuery, Firestore, and Cloud Storage, building structured datasets to support analytics and business reporting.
●Developed and automated ETL/ELT workflows to ensure reliable ingestion and transformation of data from multiple sources (an illustrative Composer DAG sketch follows this section).
●Built and optimized data warehouses in BigQuery using partitioning, clustering, and optimized schemas for high-performance queries.
●Created and maintained data lakes on Cloud Storage integrated with BigQuery for scalable and cost-effective storage.
●Partnered with business stakeholders to define requirements and translate them into scalable data models for enterprise reporting.
●Designed and deployed CI/CD pipelines using GitLab CI and Terraform to automate infrastructure provisioning and deployments.
●Developed interactive Power BI and Looker dashboards to visualize KPIs, supply chain metrics, and sales performance.
●Conducted performance tuning for BigQuery queries and Spark jobs on Dataproc, reducing processing time and optimizing resource costs.
●Implemented monitoring and logging solutions with Cloud Monitoring, Prometheus, and custom alerts to ensure pipeline reliability and cost visibility.
●Optimized BigQuery and Spark pipelines, reducing cost by 30% and improving processing speed by 25%.
●Collaborated cross-functionally with analytics teams to build dashboards, accelerating decision-making for supply chain operations.
Environment: GCP (BigQuery, Cloud Storage, Dataflow, Pub/Sub, Cloud Composer, Dataproc), Snowflake (on GCP), Firestore, PostgreSQL, Power BI, Looker, Terraform, GitLab CI, Docker, Python, SQL, Pandas, NumPy, Apache Spark, Kafka.
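Below is a hedged sketch of a Cloud Composer (Airflow) DAG of the kind referenced above for ELT inside BigQuery. The DAG id, dataset, table, and SQL are hypothetical placeholders; BigQueryInsertJobOperator is one common choice from the Google provider package.

# Illustrative Composer/Airflow DAG: daily ELT step materializing a reporting table in BigQuery
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="daily_sales_elt",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",   # newer Airflow versions also accept `schedule=`
    catchup=False,
) as dag:
    # Transform raw events into a reporting table inside BigQuery (ELT pattern)
    build_sales_summary = BigQueryInsertJobOperator(
        task_id="build_sales_summary",
        configuration={
            "query": {
                "query": """
                    CREATE OR REPLACE TABLE analytics.sales_daily AS
                    SELECT store_id, DATE(event_ts) AS sale_date, SUM(amount) AS revenue
                    FROM raw.sales_events
                    GROUP BY store_id, sale_date
                """,
                "useLegacySql": False,
            }
        },
    )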
Client: Homesite Insurance, Boston, MA April 2019 to September 2022 Role: Data Engineer
Responsibilities:
●Engineered and deployed high-performance data architectures using AWS EC2, S3, EBS, RDS, and DynamoDB, delivering scalable, fault-tolerant solutions for large-scale data systems.
●Designed and implemented ETL pipelines using Apache Spark and AWS Glue, significantly improving real-time analytics and batch processing efficiency across distributed platforms.
●Led seamless migrations of on-premises data warehouses to Snowflake and AWS Redshift, enhancing data scalability and reducing query response times for analytical workloads.
●Developed and deployed custom Airflow DAGs to automate complex workflows, integrating with AWS Glue and Apache Spark for efficient orchestration of data operations.
●Applied Python with libraries such as Pandas and NumPy for data cleansing and validation, reducing data errors by 30% and improving data reliability (a cleansing sketch follows this section).
●Established robust data governance frameworks by defining schemas, classifications, and access policies using AWS Glue Data Catalog, ensuring regulatory compliance and data integrity.
●Configured advanced monitoring and alerting using AWS CloudWatch, ELK Stack, and Splunk, enabling real-time system analysis and proactive issue resolution.
●Automated infrastructure provisioning and configuration using AWS CloudFormation and Ansible, reducing setup time by 50% and minimizing manual configuration errors.
●Integrated Snowflake with AWS S3, and BI tools such as Tableau and Power BI, streamlining data access and visualization for business users.
●Optimized performance of NoSQL databases like MongoDB and Cassandra, enhancing query speed and system scalability for high-throughput applications.
●Built and maintained RESTful APIs using Flask and Django, enabling seamless integration between data services and external platforms.
●Designed and maintained robust CI/CD pipelines using tools like Jenkins and Bamboo, improving deployment frequency and software delivery reliability.
●Implemented messaging and notification systems using AWS SNS and SQS, supporting reliable communication in distributed microservices architecture.
●Migrated legacy data pipelines to the Cloudera ecosystem using Apache Hive, Impala, and Spark, enhancing performance and scalability of big data operations.
●Created disaster recovery strategies using EBS snapshots and AWS S3 versioning, safeguarding critical data assets and ensuring business continuity.
●Documented cloud infrastructure and data architectures in Confluence, supporting knowledge sharing, onboarding, and effective cross-functional collaboration.
●Automated data quality checks and ETL job triggers using Python scripts integrated with Apache Airflow and Cloud Composer.
●Developed auto-remediation scripts using Shell and PowerShell for monitoring alerts and infrastructure events.
Environment: AWS EC2, S3, EBS, RDS, DynamoDB, AWS Glue, Apache Spark, Snowflake, AWS Redshift, Airflow, Python, Pandas, NumPy, AWS Glue Data Catalog, AWS CloudWatch, ELK Stack, Splunk, CloudFormation, Ansible, Azure Blob Storage, Tableau, Power BI, MongoDB, Cassandra, Flask, Django, RESTful APIs, Jenkins, Bamboo, AWS SNS, AWS SQS, Cloudera, Apache Hive, Impala, Confluence, Shell scripting, Kafka, Scala, Git, Bitbucket.
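The following is a small illustrative Pandas/NumPy cleansing-and-validation step of the sort referenced above; the file path, column names, and validation rules are assumptions made only for the example.

# Illustrative cleansing step: normalize, deduplicate, validate, and band a policy feed
import numpy as np
import pandas as pd


def clean_policies(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)

    # Normalize identifiers and drop exact duplicates
    df["policy_id"] = df["policy_id"].str.strip().str.upper()
    df = df.drop_duplicates(subset=["policy_id"])

    # Coerce numeric fields; unparseable values become NaN
    df["premium"] = pd.to_numeric(df["premium"], errors="coerce")

    # Route invalid rows to a rejects file for later review
    valid = df["premium"].notna() & (df["premium"] > 0)
    df.loc[~valid].to_csv("rejects.csv", index=False)

    # Derive a simple banding column used downstream
    df = df.loc[valid].copy()
    df["premium_band"] = np.where(df["premium"] >= 1000, "high", "standard")
    return df.reset_index(drop=True)

A step like this typically runs as a task inside an Airflow DAG, with the rejects file feeding a data-quality report.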
Client: AgFirst Columbia, SC June 2017 to March 2019 Role: Data Engineer
Responsibilities:
●Designed and implemented large-scale data storage solutions using Azure Data Lake Storage and Delta Lake, enhancing scalability and performance for agricultural and business analytics.
●Developed and maintained complex Azure Data Factory pipelines for seamless ETL/ELT operations, ensuring efficient data integration from various sources.
●Utilized Azure Databricks and PySpark to process large datasets, implement custom UDFs, and build MLlib-based predictive models to drive data insights (a PySpark sketch follows this section).
●Automated data workflows using Apache Airflow and Databricks Jobs, significantly improving pipeline reliability and reducing manual intervention.
●Designed and optimized Azure Synapse Analytics solutions to support large-scale data warehousing and advanced analytics workloads.
●Built and published interactive dashboards and real-time reports in Power BI, including the creation of custom visuals tailored to specific business needs.
●Deployed and maintained Azure SQL databases with optimized indexing, partitioning, and query performance for high-volume transaction environments.
●Integrated Azure Service Bus and Event Hub for real-time data ingestion and event-driven processing to enable responsive business operations.
●Streamlined infrastructure deployment using Terraform, Bicep, and ARM templates, facilitating scalable and repeatable cloud resource provisioning.
●Implemented robust CI/CD pipelines with Azure DevOps, enhancing team collaboration, code quality, and deployment efficiency.
●Ensured data security and compliance by configuring Azure Key Vault, Log Analytics, and implementing governance policies for data protection.
●Provisioned and managed various Azure services such as Function Apps, Storage Accounts, and Cosmos DB, supporting end-to-end cloud solutions.
●Worked closely with cross-functional teams and business stakeholders to gather data requirements and deliver customized, high-impact data solutions.
●Applied SQL to manage Lakehouses, Delta Tables, and Data Lakes, supporting unified and flexible data architectures.
●Conducted performance tuning and optimization for SQL queries, Spark job configurations, and data models to maximize system throughput and efficiency.
Environment: Azure Data Lake Storage, Azure Synapse Analytics, Azure Databricks, Azure Data Factory, Power BI, Azure SQL Database, Azure Service Bus, Azure Event Hub, Apache Airflow, Databricks Jobs, PySpark, MLlib, Microsoft Fabric, Delta Lake, Azure Key Vault, Azure Log Analytics, Azure DevOps, Terraform, Bicep, ARM Templates, Azure Function Apps, Azure Storage Accounts, Cosmos DB, SQL, CI/CD, ETL/ELT Pipelines.
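A minimal PySpark-on-Databricks sketch of the UDF and Delta Lake patterns referenced above appears below; the ADLS paths, column names, and bucketing rule are placeholders, not client objects.

# Illustrative Databricks job: apply a custom UDF and write a partitioned Delta table
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("field-yield-enrichment").getOrCreate()


# Example UDF: bucket farms by acreage (placeholder business rule)
@F.udf(returnType=StringType())
def acreage_band(acres):
    if acres is None:
        return "unknown"
    return "large" if acres >= 500 else "small"


raw = spark.read.format("delta").load("abfss://raw@exampleaccount.dfs.core.windows.net/farms")

enriched = (
    raw.withColumn("acreage_band", acreage_band(F.col("acres")))
       .withColumn("load_date", F.current_date())
)

# Write as a partitioned Delta table for downstream Synapse / Power BI consumption
(enriched.write.format("delta")
         .mode("overwrite")
         .partitionBy("load_date")
         .save("abfss://curated@exampleaccount.dfs.core.windows.net/farms_enriched"))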
Client: Jarus Technologies July 2014 to February 2017 Role: Data Engineer
Responsibilities:
●Analyzed and interpreted complex data sets using Informatica 8.1, identifying significant patterns and trends that supported strategic business decision-making processes.
●Utilized Data Flux tools to thoroughly cleanse and standardize incoming data, ensuring high levels of data quality and integrity for accurate and reliable reporting.
●Extracted, transformed, and loaded large volumes of data from Oracle 9i databases, creating comprehensive and structured datasets for in-depth analysis.
●Conducted rigorous data validation and testing using Quality Center 8.2, verifying the accuracy and consistency of data and analytical outputs before delivery (a reconciliation-check sketch follows this section).
●Developed and optimized complex SQL queries to efficiently retrieve and manipulate data, directly supporting various business intelligence and reporting needs.
●Managed and analyzed database objects with TOAD, improving database performance and ensuring fast, efficient data retrieval across multiple platforms.
●Designed, implemented, and maintained PL/SQL scripts to automate ETL processes, enhancing the accuracy and speed of data extraction, transformation, and loading workflows.
●Integrated data from diverse sources, including flat files, into the data warehouse, facilitating seamless data flow and accessibility for analytical and reporting purposes.
●Analyzed large datasets stored in Teradata to generate actionable business insights, driving operational improvements and increased efficiencies.
●Created comprehensive reports and dynamic dashboards using analytical tools, enabling stakeholders to make well-informed, timely decisions based on accurate data insights.
●Collaborated with cross-functional teams to understand business requirements and translate them into effective data solutions leveraging ETL processes and database management.
●Implemented best practices in data governance and quality assurance, significantly reducing data inconsistencies and improving overall data reliability.
●Provided ongoing support and troubleshooting for ETL jobs and data pipelines, ensuring continuous data availability and system stability.
Environment: Informatica 8.1, Data Flux, Oracle 9i, Quality Center 8.2, SQL, TOAD, PL/SQL, Flat Files, Teradata, ETL Processes, Data Warehousing, Data Validation, Data Cleansing, Data Integration, Reporting and Dashboard Tools, Data Governance.
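As a sketch of the automated validation work referenced above, the snippet below runs a simple source-to-target row-count reconciliation through Python's cx_Oracle driver. The DSN, credentials, and table names are hypothetical; the same check can equally be expressed directly in SQL or PL/SQL.

# Illustrative reconciliation check: compare staging and warehouse row counts after an ETL run
import cx_Oracle


def reconcile_counts(dsn: str, user: str, password: str) -> bool:
    with cx_Oracle.connect(user=user, password=password, dsn=dsn) as conn:
        cur = conn.cursor()

        # Row count in the staging table loaded from flat files (placeholder table)
        cur.execute("SELECT COUNT(*) FROM stg_claims")
        staged = cur.fetchone()[0]

        # Row count in the warehouse target for today's load (placeholder table)
        cur.execute("SELECT COUNT(*) FROM dw_claims WHERE load_dt = TRUNC(SYSDATE)")
        loaded = cur.fetchone()[0]

    # A mismatch flags the run for investigation
    return staged == loaded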