Mani Vaibhav Ruhanth Koliparthi
*******@***********.*** +1-856-***-**** Pitman, NJ LinkedIn GitHub
Summary
Data Engineer with around 4 years of experience in designing and implementing scalable data pipelines, ETL workflows, and big data solutions using SQL, Python, Scala, Apache Flink, Spark, and Hadoop. Skilled in real-time and batch data processing, data integration, and data quality management. Proficient in building machine learning, deep learning, and NLP models to enable data-driven insights. Experienced with Tableau, Power BI, and cloud platforms like AWS and Azure. Strong background in data governance, metadata management, and cross-functional collaboration in Agile environments.
Technical Skills
Programming & Scripting: Python, SQL, R, Scala, NoSQL, Shell Scripting, Bash, HTML, CSS
Big Data, ETL & Frameworks: Apache Spark, Apache Flink, PySpark, Apache Kafka, Apache Airflow, Hadoop, Snowflake, Databricks, Informatica, Sterling Integrator, Data Modeling, Data Mapping, Data Mining, Data Extraction
Databases & Storage: MS SQL Server, MySQL, PostgreSQL, Oracle, MongoDB, Cassandra, Redshift, BigQuery, Snowflake, Spark SQL, Delta Lake, Azure Synapse Analytics, HDFS
Cloud Platforms & DevOps: AWS (S3, EMR, EC2, Lambda, Redshift), Azure (Data Factory, Synapse, Blob Storage), GCP (BigQuery, Dataflow), Git, GitHub, Docker, Jenkins, CI/CD, Linux
Data Analysis, ML & Visualization: NumPy, Pandas, Scikit-learn, TensorFlow, Keras, Flask, Django, Machine Learning, Statistics, SAS, ggplot2, Tableau, Power BI, Looker, Matplotlib, Seaborn, Google Analytics, MS Excel
Data Management & Methodologies: Data Governance, Data Quality Management, Metadata Management, Data Catalogs, Master Data Management (MDM), Agile, Scrum, SDLC, Extraction, Transformation & Loading (ETL)
Professional Experience
Data Engineer, JP Morgan 08/2024 – Present Remote, USA
Engineered scalable, real-time data pipelines using Python (Pandas, NumPy), SQL, Apache Kafka, AWS Glue, and Apache Spark to ingest, process, and stream investment data into Amazon Redshift and S3 data lakes, improving data accuracy and accessibility by 40%.
Automated complex ETL workflows leveraging AWS Lambda, Step Functions, and advanced Python scripting, reducing data latency and boosting reporting efficiency by 30%.
Developed interactive Power BI dashboards integrating data from cloud warehouses and lakes, reducing manual reporting by 50% and enabling faster investment performance insights, supported by ad hoc data analysis in MS Excel.
Implemented robust data quality monitoring and validation frameworks using Hadoop, Apache Airflow, Python libraries, and SQL-based data audits, achieving a 98% data accuracy rate across pipelines.
Collaborated closely with analysts and stakeholders to translate business requirements into scalable data engineering solutions, incorporating predictive analytics and aligning with strategic investment goals.
Data Engineer, Razorpay Inc. 03/2020 – 12/2022 Bangalore, India
Designed and implemented a real-time ETL pipeline using Apache Kafka and PySpark, integrating data from 15+ payment channels into Razorpay’s AWS Redshift, enabling low-latency analytics and supporting critical business reporting needs.
Engineered and optimized 100+ complex SQL and PySpark workflows to process 5+ TB of payment and merchant data, achieving a 30% improvement in query performance by utilizing partitioning, indexing, and resource management.
Developed predictive machine learning models in Python using Pandas, NumPy, and Scikit-learn to forecast transaction volumes, payment failures, and merchant trends, significantly enhancing inventory management.
Leveraged AWS S3 for storage, Lambda for ETL orchestration, and Redshift for analytics, maintaining 90%+ data pipeline uptime, ensuring data availability and reliability for real-time business intelligence across Razorpay’s payment ecosystem.
Built interactive Power BI dashboards and automated Excel reports for 100+ users, visualizing payment trends, merchant metrics, and predictive analytics, empowering business stakeholders at Razorpay to make timely, data-driven decisions.
Collaborated with data analysts, product managers, and business leaders to gather requirements and deliver tailored data engineering solutions, directly supporting Razorpay’s initiatives and fostering a data-driven culture across the organization.
Education
Master of Science in Computer Science, Rowan University 01/2023 – 12/2024 NJ, USA
Bachelor of Technology in Computer Science and Engineering, Presidency University 08/2018 – 06/2022 Bangalore, India
Projects
Real-Time Data Streaming with Kafka, Spark, and AWS (Kafka, PySpark, S3, Athena) May 2024
Built a real-time data streaming pipeline using Kafka producers to ingest logs, ensuring fault tolerance with topic partitioning and replication.
Implemented Spark Structured Streaming for low-latency processing, achieving a 5x speedup over traditional batch methods.
Integrated AWS S3 and Athena to store and analyze processed logs in real time, significantly reducing query response times.
Optimized Kafka consumer groups and PySpark processing for scalability, enabling real-time log monitoring and anomaly detection.