Rasagna Konduri
Data Engineer
Richardson, TX | ****************@*****.*** | 945-***-**** | LinkedIn

SUMMARY
Data Engineer with 5+ years of experience designing, building, and optimizing scalable data pipelines and analytics platforms across technology, finance, and consulting sectors. Expertise in Python, SQL, cloud platforms (AWS, Azure), big data technologies (PySpark, Kafka, Hadoop, Flink), data warehousing (Snowflake, Redshift), and modern ETL tools (dbt, Airflow, Databricks). Adept at implementing CI/CD, infrastructure-as-code
(Terraform), and MLOps pipelines (MLflow, Kubeflow) to accelerate machine learning workflows and enhance operational efficiency. Proven ability to collaborate effectively in Agile environments, consistently delivering measurable improvements in data quality, processing efficiency, and system performance.
EXPERIENCE
NVIDIA, CA | Data Engineer | September 2024 – Present
• Designed and deployed data pipelines for ML model training and advanced analytics using AWS services (S3, EC2, Glue) and on-premises systems to process high-throughput sensor data from autonomous vehicle simulations, improving data processing performance by 30%.
• Developed high-performance data processing frameworks using Python, PySpark, and NumPy to transform raw LiDAR and radar signals into structured features, reducing feature engineering time by 40% (see the PySpark sketch below).
• Standardized 30+ transformation pipelines by implementing dbt Core on AWS Databricks Lakehouse, enabling automated lineage tracking, improving feature consistency by 35%, and cutting redundant storage costs by 20%.
• Built end-to-end MLOps pipelines using MLflow and Kubeflow to automate perception model training, deployment, and monitoring, reducing model iteration time by 50% and enabling continuous retraining on new driving scenario data (see the MLflow sketch below).
• Integrated NVIDIA NIM inference microservices with LLMs (Llama 3, GPT-4) to automate driving scenario generation and log analysis, reducing manual labeling effort by 60% while increasing synthetic training data diversity by 40%.
• Collaborated with cross-functional Agile teams, including perception engineers and researchers, to streamline data workflows and implement sprint reviews, reducing feature development delays by 20%.
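A minimal sketch of the kind of PySpark feature-extraction step described in the LiDAR/radar bullet above; the schema (sensor_id, ts, range_m, intensity) and S3 paths are illustrative assumptions, not the production pipeline.

```python
# Sketch: aggregate raw radar/LiDAR returns into per-sensor, per-minute features.
# Column names and paths are hypothetical stand-ins.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sensor-features").getOrCreate()

raw = spark.read.parquet("s3://bucket/raw/radar/")  # illustrative path

features = (
    raw
    .withColumn("minute", F.date_trunc("minute", F.col("ts")))
    .groupBy("sensor_id", "minute")
    .agg(
        F.mean("range_m").alias("mean_range_m"),
        F.stddev("range_m").alias("std_range_m"),
        F.max("intensity").alias("peak_intensity"),
        F.count("*").alias("n_returns"),
    )
)
features.write.mode("overwrite").parquet("s3://bucket/features/radar/")
```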
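A minimal sketch of MLflow run tracking of the sort the MLOps bullet above describes, assuming a stand-in scikit-learn classifier; the experiment name, parameter, and metric are hypothetical, not the perception models themselves.

```python
# Sketch: track one training run (params, metric, model artifact) with MLflow.
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("perception-retraining")  # illustrative experiment name
with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    mlflow.log_param("n_estimators", 200)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_metric("accuracy", acc)
    mlflow.sklearn.log_model(model, "model")  # artifact picked up by deployment
```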
JPMorgan Chase, India | Data Engineer | June 2021 – July 2023
• Developed and deployed data transformation workflows using dbt Core and Snowflake, streamlining ETL processes and improving data quality by 25%, which enabled faster and more reliable reporting across the organization.
• Built modular Python frameworks using Pandas and PySpark to automate financial data validation and reconciliation for over 20 daily regulatory reports, achieving 99.9% accuracy and reducing manual review time by 60%.
• Optimized Spark jobs by implementing adaptive query execution, dynamic partition pruning, and broadcast joins, reducing runtime by 45% for capital markets analytics workloads processing 8TB+ of daily trade data (see the configuration sketch below).
• Architected real-time data pipelines for fraud detection and transaction monitoring using Apache Kafka, Apache Flink, and Spark Structured Streaming, unifying batch and stream processing with low end-to-end latency (see the streaming sketch below).
• Built event-driven, scalable cloud data pipelines with AWS services (SQS, SNS, Lambda, Glue with PySpark), improving data processing efficiency by 40%, and orchestrated workflows with Apache Airflow (see the DAG sketch below).
• Designed and optimized SQL Server data warehousing solutions, leveraging clustered and non-clustered indexes, table partitioning, complex queries (joins, CTEs, window functions), and stored procedures, functions, and triggers to improve query performance and code reusability.
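A minimal sketch of the Spark session settings behind the tuning bullet above (adaptive query execution, dynamic partition pruning, and a broadcast join); the table names and paths are illustrative assumptions.

```python
# Sketch: enable AQE and dynamic partition pruning, then broadcast the small
# dimension table so the join avoids shuffling the large trade fact table.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("trade-analytics")
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")
    .getOrCreate()
)

trades = spark.read.parquet("s3://bucket/trades/")    # large fact table (illustrative)
ref = spark.read.parquet("s3://bucket/instruments/")  # small dimension table

enriched = trades.join(F.broadcast(ref), on="instrument_id", how="left")
enriched.write.mode("overwrite").parquet("s3://bucket/enriched/")
```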
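A minimal sketch of a Spark Structured Streaming consumer reading from Kafka, as in the transaction-monitoring bullet; the topic, schema, and threshold rule are hypothetical stand-ins for the actual fraud logic.

```python
# Sketch: consume a Kafka topic, parse JSON payloads, flag large transactions.
# Requires the spark-sql-kafka connector package on the cluster.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("txn-monitoring").getOrCreate()

schema = StructType([
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
])

txns = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # illustrative broker
    .option("subscribe", "transactions")                # illustrative topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("t"))
    .select("t.*")
)

alerts = txns.filter(F.col("amount") > 10000)  # placeholder rule, not the real model

query = (
    alerts.writeStream.format("console")
    .option("checkpointLocation", "/tmp/checkpoints/txn-monitoring")
    .start()
)
query.awaitTermination()
```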
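A minimal sketch of an Airflow DAG like those used for orchestration above; the DAG id, schedule, and task callables are placeholders, not the production jobs.

```python
# Sketch: a three-task extract -> transform -> load DAG on a daily schedule.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(): ...    # placeholder task bodies
def transform(): ...
def load(): ...

with DAG(
    dag_id="daily_trade_etl",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2 >> t3  # linear dependency chain
```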
Accenture, India | Data Engineer | March 2019 – June 2021
• Engineered and implemented scalable data integration workflows using Azure Data Factory, Azure Databricks, and SSIS, streamlining end-to-end ETL processes and reducing data pipeline latency by 30% on Azure-based platforms.
• Automated CI/CD pipelines for data solutions using Azure DevOps and GitHub Actions, implementing infrastructure-as-code (Terraform) to enable zero-downtime deployments of 20+ ETL jobs, reducing production incidents by 40%.
• Enhanced big data processing performance by 35% in a distributed Hadoop environment, using Oozie for orchestration, Hive for querying, Pig for transformations, and MapReduce for batch processing.
• Developed interactive data visualizations and probabilistic forecasting dashboards using Power BI, Plotly, and PyMC3 (Bayesian modeling), improving strategic decision-making with data-driven insights (see the PyMC3 sketch below).
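A minimal sketch of a PyMC3 Bayesian model of the kind behind the forecasting dashboards above; the synthetic series and priors are illustrative assumptions, not the production model.

```python
# Sketch: fit a simple Gaussian model to a demand series and summarize the
# posterior; in practice the posterior feeds forecast bands in the dashboards.
import numpy as np
import pymc3 as pm

demand = np.random.normal(100, 10, size=120)  # stand-in for a monthly series

with pm.Model():
    mu = pm.Normal("mu", mu=100, sigma=25)        # illustrative priors
    sigma = pm.HalfNormal("sigma", sigma=20)
    pm.Normal("obs", mu=mu, sigma=sigma, observed=demand)
    trace = pm.sample(1000, tune=1000, return_inferencedata=True)

print(pm.summary(trace))  # posterior means and credible intervals
```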
SKILLS
Programming Languages: Python, Scala, SQL, PL/SQL, UNIX Shell Scripting
Big Data & Distributed Processing: Apache Spark (PySpark, Spark SQL), Hadoop (HDFS, MapReduce, Hive, Pig)
Cloud Platforms & Services: AWS (EC2, S3, Redshift, Glue, Lambda, Kinesis, DynamoDB), Azure (Data Factory, Databricks, Data Lake, Synapse)
ETL & Data Integration Tools: Apache Airflow, dbt, SSIS, Informatica PowerCenter, Talend, Azure Data Factory, AWS Glue, Oozie
Databases & Warehousing: MySQL, PostgreSQL, SQL Server, Oracle, MongoDB, DynamoDB, Snowflake, Redshift, Data Lake Architecture
Streaming & Event-Driven Technologies: Apache Kafka, Apache Flink, Spark Structured Streaming, Amazon Kinesis
MLOps & DevOps / Containerization: Docker, Kubernetes, MLflow, Kubeflow, Jenkins, GitHub Actions, Terraform, CI/CD
Machine Learning & Statistical Libraries: TensorFlow, PyTorch, Scikit-learn, Keras, Pandas, NumPy, SciPy, NLTK, PyMC3, LLMs (Llama 3, GPT-4)
Data Visualization & Reporting: Power BI, Tableau, Microsoft Excel, Plotly, SSRS, Amazon QuickSight
Software & Data Practices: Agile/Scrum, DataOps, Git (version control), code reviews, data pipeline architecture
EDUCATION
Master’s in Information Technology and Management, The University of Texas at Dallas, Richardson, TX, USA
Bachelor’s in Information Technology, Vignana Bharathi Institute of Technology, Hyderabad, India