Mani Vaibhav Ruhanth Koliparthi
*******@***********.*** +1-856-***-**** Pitman, NJ LinkedIn GitHub
Summary
Data Engineer with around 4 years of experience in designing and implementing scalable data pipelines, ETL workflows, and big data solutions using SQL, Python, Scala, Apache Flink, Spark, and Hadoop. Skilled in real-time and batch data processing, data integration, and data quality management. Proficient in building machine learning, deep learning, and NLP models to enable data-driven insights. Experienced with Tableau, Power BI, and cloud platforms like AWS and Azure. Strong background in data governance, metadata management, and cross-functional collaboration in Agile environments.
Technical Skills
Programming & Scripting: Python, SQL, R, Scala, NoSQL, Shell Scripting, Bash, HTML, CSS
Big Data, ETL & Frameworks: Apache Spark, Apache Flink, PySpark, Apache Kafka, Apache Airflow, Hadoop, Snowflake, Databricks, Informatica, Sterling Integrator, Data Modeling, Data Mapping, Data Mining, Data Extraction
Databases & Storage: MS SQL Server, MySQL, PostgreSQL, Oracle, MongoDB, Cassandra, Redshift, BigQuery, Snowflake, Spark SQL, Delta Lake, Azure Synapse Analytics, HDFS
Cloud Platforms & DevOps: AWS (S3, EMR, EC2, Lambda, Redshift), Azure (Data Factory, Synapse, Blob Storage), GCP (BigQuery, Dataflow), Git, GitHub, Docker, Jenkins, CI/CD, Linux
Data Analysis, ML & Visualization: NumPy, Pandas, Scikit-learn, TensorFlow, Keras, Flask, Django, Machine Learning, Statistics, SAS, ggplot2, Tableau, Power BI, Looker, Matplotlib, Seaborn, Google Analytics, MS Excel
Data Management & Methodologies: Data Governance, Data Quality Management, Metadata Management, Data Catalogs, Master Data Management (MDM), Agile, Scrum, SDLC, Extraction, Transformation & Loading (ETL)
Professional Experience
Data Engineer, JP Morgan 08/2024 – Present Remote, USA
Engineered scalable, real-time data pipelines using Python (Pandas, NumPy), SQL, Apache Kafka, AWS Glue, and Apache Spark to ingest, process, and stream investment data into Amazon Redshift and S3 data lakes, improving data accuracy and accessibility by 40%.
Automated complex ETL workflows leveraging AWS Lambda, Step Functions, and advanced Python scripting, reducing data latency and boosting reporting efficiency by 30%.
Developed interactive Power BI dashboards integrating data from cloud warehouses and lakes, reducing manual reporting by 50% and enabling faster investment performance insights, supported by ad hoc data analysis in MS Excel.
Implemented robust data quality monitoring and validation frameworks using Hadoop, Apache Airflow, Python libraries, and SQL-based data audits, achieving a 98% data accuracy rate across pipelines.
Collaborated closely with analysts and stakeholders to translate business requirements into scalable data engineering solutions, incorporating predictive analytics and aligning with strategic investment goals.
Data Engineer, Razorpay Inc. 03/2020 – 12/2022 Bangalore, India
Designed and implemented a real-time ETL pipeline using Apache Kafka and PySpark, integrating data from 15+ payment channels into Razorpay’s AWS Redshift, enabling low-latency analytics and supporting critical business reporting needs.
Engineered and optimized 100+ complex SQL and PySpark workflows to process 5+ TB of payment and merchant data, achieving a 30% improvement in query performance by utilizing partitioning, indexing, and resource management.
Developed predictive machine learning models in Python using Pandas, NumPy, and Scikit-learn to forecast transaction volumes, payment failures, and merchant trends, significantly enhancing inventory management.
Leveraged AWS S3 for storage, Lambda for ETL orchestration, and Redshift for analytics, maintaining 90%+ data pipeline uptime, ensuring data availability and reliability for real-time business intelligence across Razorpay’s payment ecosystem.
Built interactive Power BI dashboards and automated Excel reports for 100+ users, visualizing payment trends, merchant metrics, and predictive analytics, empowering business stakeholders at Razorpay to make timely, data-driven decisions.
Collaborated with data analysts, product managers, and business leaders to gather requirements and deliver tailored data engineering solutions, directly supporting Razorpay’s initiatives and fostering a data-driven culture across the organization.
Education
Master of Science in Computer Science, Rowan University 01/2023 – 12/2024 NJ, USA
Bachelor of Technology in Computer Science and Engineering, Presidency University 08/2018 – 06/2022 Bangalore, India
Projects
Real-Time Data Streaming with Kafka, Spark, and AWS (Kafka, PySpark, S3, Athena) May 2024
Built a real-time data streaming pipeline using Kafka producers to ingest logs, ensuring fault tolerance with topic partitioning and replication.
Implemented Spark Structured Streaming for low-latency processing, achieving a 5x speedup over traditional batch methods.
Integrated AWS S3 and Athena to store and analyze processed logs in real time, significantly reducing query response times.
Optimized Kafka consumer groups and PySpark processing for scalability, enabling real-time log monitoring and anomaly detection.