
Data Engineer Machine Learning

Location: Houston, TX
Posted: September 10, 2025


Resume:

SRIKAVYA KADIVETI

+1-832-***-**** *****************@*****.***

PROFESSIONAL SUMMARY

Experienced Data Engineer with over 5 years of expertise in designing and delivering scalable, cloud-native data solutions across Azure, GCP, and AWS. Skilled in building and optimizing data pipelines and architectures with BigQuery, Dataflow, Snowflake, Kafka, Spark, and Airflow for both real-time and batch processing of enterprise data. Proficient in Python, Scala, SQL, and Hadoop for analytics and transformation, with hands-on experience in CI/CD (Jenkins), Kubernetes, Docker, and Terraform for automation and deployment. Adept at multi-cloud migration, big data integration, and implementing machine learning models that deliver predictive insights, reduce costs, and accelerate decision-making.

TECHNICAL SKILLS

• Data Analysis Libraries: Pandas, NumPy, SciPy, scikit-learn, statsmodels, NLTK, Plotly, Matplotlib, MLlib

• Big Data Technologies: HDFS, Hive, Spark, Spark Streaming, PySpark, YARN, Sqoop, Flume, Oozie, Zookeeper, HBase, Amazon Redshift, Kafka, Data Lake

• Programming Languages: Python, R, SQL, Scala, T-SQL, PL/SQL, C#, HiveQL, Visual Basic, Perl, Lua, UNIX Shell Scripting

• ETL & Data Tools: Informatica PowerCenter, SSIS, SAP Crystal Reports, SQL Developer, TOAD, Teradata SQL Assistant, SQL Workbench, PuTTY

• Data Modeling Tools: Toad Data Modeler, SQL Server Management Studio (SSMS), MS Visio, SAP PowerDesigner, Erwin Data Modeler (v9.x)

• Databases: SQL Server (2014/2016), MySQL, PostgreSQL, Oracle (12c/11g/10g/9i), Teradata, Amazon Redshift, Azure SQL Database, MS Access, Hive

• Reporting Tools & BI Tools: Tableau, Power BI, SSRS, Crystal Reports (Xi/2013), SAP Business Objects, SAP Business Intelligence, Birst, Alteryx

• Cloud Technologies: AWS, Microsoft Azure, GCP, Amazon EC2

• Analytics & Visualization: Alteryx, Tableau, Power BI, MS Excel (Advanced)

• Project Execution Methodologies: Agile, Scrum, Lean Six Sigma, Ralph Kimball and Bill Inmon DW Methodologies

• Operating Systems: Windows Server 2012 R2/2016, UNIX, CentOS

• Soft Skills: Communication, business acumen, attention to detail, mentoring, adaptability, leadership, problem solving, self-motivation, initiative, critical thinking, customer-facing experience, issue resolution, efficiency improvement, creativity, curiosity, commitment

WORK HISTORY

NTT DATA Jan 2025 - Current

AWS Data Engineer Frisco, TX

• Increased data pipeline efficiency by 35% by developing, deploying, and managing scalable pipelines using AWS Glue, Lambda, and Kinesis for real-time ingestion and transformation, ensuring adherence to data modeling best practices.

• Built secure, scalable data lakes using Amazon S3, integrating with Snowflake and data warehousing techniques to enhance structured and unstructured data storage, analytics, and scalability.

• Improved real-time data availability by integrating AWS DynamoDB with Lambda to store and back up item values instantly, supporting marketing analytics, digital media campaigns, and domain-specific data needs.

• Reduced manual data transfer overhead by automating ingestion from BDW Oracle and Teradata into HDFS using Sqoop, applying automation and workflow optimization to streamline big data access.

• Enabled real-time analytics by configuring Spark Streaming to ingest Kafka data into DBFS, supporting up-to-the-minute operational monitoring, troubleshooting, and performance tuning (a minimal sketch of this flow follows this role's bullets).

• Accelerated big data processing using Spark Core, SQL, Streaming, PySpark, and Spark-Scala for efficient batch, stream, and interactive analysis in healthcare, advertising, and fundraising domains.

• Shortened deployment cycles by 60% by automating CI/CD workflows with Jenkins, Maven, GitHub, Chef, Terraform, and AWS, incorporating version control and source code management best practices.

• Enabled faster decision-making by querying datasets from Amazon S3 via AWS Athena, developing dbt models, and building business insights through AWS QuickSight dashboards.

• Enhanced analytics capabilities by integrating Snowflake with machine learning models and scripting in Python for predictive insights and optimized operations on Google Cloud and Azure platforms.

• Increased stakeholder engagement by creating interactive dashboards using Tableau and Power BI, visualizing KPIs from pipelines across multiple database technologies.

• Streamlined ETL processes by automating and scheduling data workflows with Apache Airflow, ensuring consistent data pipeline performance and integration across multi-cloud systems.
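
The Kafka-to-DBFS streaming ingestion referenced above can be outlined roughly as follows. This is a minimal PySpark Structured Streaming sketch, not project code: the broker address, topic name, event schema, and DBFS paths are illustrative assumptions.

```python
# A minimal sketch (not project code) of a Kafka -> Spark Structured Streaming -> DBFS
# ingestion path. Broker, topic, schema, and paths below are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-to-dbfs").getOrCreate()

# Hypothetical event schema for the JSON payload carried on the topic.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
       .option("subscribe", "ops_events")                 # placeholder topic
       .option("startingOffsets", "latest")
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Append the parsed stream to DBFS; the checkpoint makes restarts safe.
query = (events.writeStream
         .format("parquet")
         .option("path", "dbfs:/mnt/ops/events")                # placeholder path
         .option("checkpointLocation", "dbfs:/mnt/ops/_chk/events")
         .outputMode("append")
         .start())
query.awaitTermination()
```

In a production deployment the schema, topic, and paths would come from the pipeline's configuration rather than being hard-coded; the checkpoint location is what lets the file sink recover cleanly after restarts.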

Western Alliance Bank Jul 2024 - Dec 2024

Azure Data Engineer Dallas, TX

• Enabled data-driven decision-making by developing a modern data solution using Azure PaaS services, which improved visualization and enhanced business process understanding in alignment with data governance principles.

• Reduced integration delays by 40% through the implementation of Azure Data Factory to ingest and transform structured and unstructured data, facilitating improved data quality and compliance.

• Enhanced real-time analytics by leveraging Kafka and Spark Streaming to build scalable streaming solutions, resulting in more responsive data processing and faster insights.

• Accelerated performance by 35% by optimizing data transformation in Spark and storing results efficiently in HDFS and Snowflake, enhancing query responsiveness and reliability (see the sketch after this role's Environment line).

• Improved big data workflow efficiency through end-to-end implementation of big data technologies such as Hadoop, Solr, PySpark, Kafka, Storm, and webMethods in large-scale analytics projects.

• Reduced manual provisioning time by 60% by designing infrastructure as code with Terraform to automate deployment and scaling of Azure cloud resources.

• Enabled high-performance global data storage by managing Azure Cosmos DB for low-latency, globally distributed applications, ensuring fast data retrieval and consistency.

• Increased predictive accuracy by 25% by integrating Azure Machine Learning models into existing pipelines, driving data-driven predictive analytics.

• Streamlined deployment process by implementing CI/CD pipelines with Jenkins, reducing delivery times and minimizing build/test issues.

• Enhanced scalability and reliability by containerizing data pipelines with Docker and orchestrating deployments using Kubernetes across multiple environments.

• Optimized ETL workflows by utilizing Informatica and Talend, streamlining data integration and transformation processes for high-volume enterprise data.

• Environment: Apache Airflow, Apache Beam, Azure Cosmos DB, Azure Data Factory, Azure PaaS, BigQuery, CI/CD, Cloud Pub/Sub, Cloud Storage, Data Lake, DataProc, Dataflow, Docker, GCP PaaS, Git, Hadoop, indexes, Informatica, Jenkins, Kubernetes, Maven, PySpark, S3, Snowflake, Spark, SQL Database, Solr, Tableau, Terraform, T-SQL scripting, AWS
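
The Spark transform-and-store step noted above (results landed in HDFS and Snowflake) might look roughly like the following PySpark sketch. The source path, column names, and Snowflake connection options are placeholders, and the Spark-Snowflake connector is shown only as one common way to load curated output.

```python
# A hedged PySpark sketch of a transform-and-store step of the kind described above.
# Source path, column names, and Snowflake connection options are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-txn-aggregation").getOrCreate()

# Hypothetical raw transaction data already landed in HDFS.
txns = spark.read.parquet("hdfs:///data/raw/transactions")

daily = (txns
         .withColumn("txn_date", F.to_date("txn_ts"))
         .groupBy("txn_date", "branch_id")
         .agg(F.sum("amount").alias("total_amount"),
              F.count("*").alias("txn_count")))

# Persist the curated result back to HDFS, partitioned for downstream queries...
(daily.write.mode("overwrite")
 .partitionBy("txn_date")
 .parquet("hdfs:///data/curated/daily_txns"))

# ...and load the same result into Snowflake via the Spark-Snowflake connector.
sf_options = {
    "sfURL": "account.snowflakecomputing.com",  # placeholder account URL
    "sfUser": "etl_user",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "CURATED",
    "sfWarehouse": "ETL_WH",
}
(daily.write
 .format("net.snowflake.spark.snowflake")
 .options(**sf_options)
 .option("dbtable", "DAILY_TXNS")
 .mode("overwrite")
 .save())
```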

Standard Chartered Bank Jan 2022 - Jul 2023

GCP Data Engineer Bengaluru

• Increased data accuracy by 25% and boosted confidence in analytics outcomes by implementing robust validation and cleansing processes to resolve missing values, duplicates, and outliers.

• Reduced query execution times by 40% by monitoring data pipeline performance, diagnosing bottlenecks, and optimizing workflows for faster, more reliable data flows.

• Cut CI/CD deployment times by 50% by designing and implementing Jenkins jobs to automate build, testing, and deployment workflows at Standard Chartered Bank.

• Enabled real-time and batch financial data processing by developing scalable pipelines using GCP Dataflow, Apache Beam, and Cloud Pub/Sub, supporting mission-critical analytics (sketched after this role's Environment line).

• Accelerated cloud adoption by 30% by assisting in the migration of on-prem Hadoop systems to GCP, ensuring seamless integration with cloud-native services.

• Processed 2TB+ daily financial data by leveraging PySpark for complex transformations using DataFrames and Spark functions, improving ETL throughput.

• Enhanced multi-cloud data operations by integrating AWS S3, GCP Cloud Storage, DataProc, and BigQuery for scalable, cost-efficient storage and processing.

• Unified data architecture across teams by integrating Data Lakes, SQL databases, and Logic Apps, enabling faster analytics and BI reporting.

• Improved performance and reduced costs by 20% through multi-cloud migration strategies using GCP PaaS and AWS to modernize legacy systems.

• Automated 90% of recurring workflows by orchestrating ingestion, transformation, and reporting tasks using Apache Airflow.

• Increased deployment scalability by 3x by containerizing applications with Docker and orchestrating them with Kubernetes across hybrid-cloud environments.

• Eliminated manual infrastructure provisioning by implementing Terraform scripts to automate cloud resource creation and management on GCP.

• Environment: Jenkins, CI/CD, Dataflow, Apache Beam, Cloud Pub/Sub, Hadoop, PySpark, Cloud Storage, DataProc, BigQuery, AWS S3, Data Lake, SQL Database, GCP PaaS, Apache Airflow, Docker, Kubernetes, Terraform
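
The Dataflow/Apache Beam/Cloud Pub/Sub pipelines referenced above could be skeletonized as the following Apache Beam (Python SDK) example. The project, topic, table, schema, and field names are hypothetical; the production pipelines processed far richer financial records.

```python
# A minimal Apache Beam (Python SDK) sketch of a Pub/Sub -> Dataflow -> BigQuery
# streaming pipeline. Project, topic, table, schema, and fields are hypothetical.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_txn(message: bytes) -> dict:
    """Decode a JSON transaction message into a BigQuery-ready row."""
    record = json.loads(message.decode("utf-8"))
    return {
        "txn_id": record["txn_id"],
        "amount": float(record["amount"]),
        "event_ts": record["event_ts"],
    }


options = PipelineOptions(streaming=True)  # run with the DataflowRunner in practice

with beam.Pipeline(options=options) as p:
    (p
     | "ReadPubSub" >> beam.io.ReadFromPubSub(topic="projects/demo-project/topics/txns")
     | "Parse" >> beam.Map(parse_txn)
     | "WriteBigQuery" >> beam.io.WriteToBigQuery(
           "demo-project:finance.transactions",
           schema="txn_id:STRING,amount:FLOAT,event_ts:TIMESTAMP",
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```

Run with streaming enabled this shape covers the real-time path; swapping ReadFromPubSub for a bounded source gives the batch path over the same transforms.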

ICICI Prudential Life Insurance Jun 2019 - Nov 2021

Data Engineer Bengaluru

• Improved data access efficiency by monitoring pipeline performance and optimizing data storage and retrieval mechanisms, reducing latency in insurance analytics.

• Ensured smooth ETL operations by building scalable pipelines using Python, PySpark, Hive SQL, and Presto to handle structured and unstructured data from Cassandra and HDFS.

• Enhanced operational efficiency by automating data ingestion, transformation, and validation processes with Hive SQL, Presto SQL, and Spark SQL for large-scale insurance datasets.

• Enabled actionable business insights by developing Tableau dashboards connected to BigQuery and Presto via ODBC, visualizing trends in policyholder behavior and claims.

• Improved deployment reliability by implementing CI/CD pipelines with Docker, Kubernetes, and Jenkins, facilitating rapid iteration and stable delivery of applications.

• Reduced manual infrastructure overhead by designing infrastructure as code with Terraform templates for AWS, allowing automated provisioning and dynamic scaling of cloud resources.

• Enabled real-time analytics and faster claims processing by developing data streaming pipelines with Apache Kafka and Spark Streaming for live insurance transactions.

• Strengthened underwriting and risk assessment by integrating machine learning models in Python and scikit-learn to predict customer churn and forecast insurance claims (a sketch follows this role's Environment line).

• Improved cross-platform data synchronization by using RESTful APIs and Flask to connect third-party insurance data sources, supporting seamless policyholder data sharing.

• Enhanced BI capabilities by designing and deploying scalable data warehouses using Snowflake and Google BigQuery, supporting large-scale analytics and streamlined reporting.

• Environment: Apache Kafka, AWS Terraform, Cassandra, Docker, Git, Hive, Hive SQL, HDFS, Kubernetes, Pandas, Presto, Python, PySpark, RESTful APIs, SAP, scikit-learn, Snowflake, Spark, Tableau, data architecture, Linux
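
The churn-prediction modeling mentioned above reduces, in its simplest form, to a scikit-learn workflow like the sketch below. The feature file, column names, and model choice are assumptions for illustration; the actual models were trained on policyholder data extracted via Hive and Presto.

```python
# A simplified scikit-learn sketch of the churn-prediction step mentioned above.
# The feature file, column names, and model choice are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical policyholder feature table exported from Hive/Presto.
df = pd.read_parquet("policyholder_features.parquet")
X = df.drop(columns=["churned"])
y = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = GradientBoostingClassifier()
model.fit(X_train, y_train)

# Evaluate ranking quality on the held-out set.
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```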

EDUCATION

University of Houston, Houston, TX

Master of Science in Engineering Data Science

CERTIFICATION

• ITIL Foundation Certificate in IT Service Management

PUBLICATION

• Identifying Parkinson's Ailment Through the Classification of Audio Recording Data. JCT Journal – LINK


