Vikas Reddy
Data Engineer
*****@***********.*** | 260-***-**** | USA | LinkedIn
Summary
Results-driven Data Engineer with 3+ years of experience building scalable data pipelines, real-time data processing systems, and cloud-based data warehouses. Proficient in designing robust ETL workflows, implementing data quality frameworks, and optimizing complex SQL queries for high-performance analytics. Skilled at collaborating with cross-functional teams using Agile methodologies to deliver reliable, actionable data solutions. Experienced with a broad technology stack including Apache Kafka, AWS, Spark, Python, and data governance tooling, ensuring data integrity and operational efficiency.
Technical Skills
Data Streaming & Ingestion: Apache Kafka, AWS Kinesis, Apache NiFi, Sqoop, Spark Streaming, AWS Lambda
Cloud Platforms & Data Warehousing: Amazon Redshift, Apache Druid, Snowflake, HDFS, Hive, Azure Data Factory, Databricks
Programming & Scripting: Python (Pandas, NumPy, Marshmallow), PySpark, FastAPI, PyKafka, SQL (CTEs, Window Functions)
ETL & Data Integration: Talend, Informatica MDM, PySpark ETL workflows
Data Quality & Governance: Great Expectations, OpenLineage, Apache Atlas
Data Modeling & Analytics: Customer Data Platforms (CDP), Data Modeling, Complex SQL Query Optimization
Methodologies & Collaboration: Agile Development, Cross-functional Team Collaboration
Professional Experience
Data Engineer, Prosper Marketplace | Remote, USA | 10/2024 – Present
Designed and implemented robust ETL pipelines using Apache Kafka, AWS Lambda, and Python to efficiently extract, transform, and load large volumes of loan and transaction data, ensuring data accuracy and reducing processing time by 35%.
Built scalable streaming pipelines using Apache Kafka, AWS Kinesis, and Lambda to ingest loan and transaction data, cutting ingestion latency by 40% and enabling the near-real-time analytics essential for loan risk and portfolio management.
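A minimal sketch of this ingestion path, assuming a Kinesis-triggered Lambda; field names (loan_id, amount) are illustrative and the downstream loader is a stub:

import base64
import json

def handler(event, context):
    """AWS Lambda entry point triggered by a Kinesis stream.

    Decodes each record, applies a light transform, and hands the
    batch to a downstream loader (stubbed here as load_batch).
    """
    rows = []
    for record in event["Records"]:
        # Kinesis delivers the payload base64-encoded
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        rows.append({
            "loan_id": payload["loan_id"],                 # illustrative fields
            "amount": round(float(payload["amount"]), 2),
        })
    load_batch(rows)
    return {"processed": len(rows)}

def load_batch(rows):
    # placeholder: in practice this stage would write to S3/Redshift
    print(f"loaded {len(rows)} rows")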
Partnered with product managers, data scientists, and engineers in Agile teams to develop integrated platforms for loan performance, customer segmentation, and fraud detection, improving cross-team collaboration by 30%.
Developed Python stream consumers with FastAPI and PyKafka to power real-time dashboards and anomaly alerts for transaction data, enhancing issue detection and troubleshooting by 45%, improving data trustworthiness.
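A condensed sketch of that consumer/dashboard pattern, assuming a local broker, a transactions topic, and a toy threshold rule; all names are illustrative:

import json
import threading
from collections import deque

from fastapi import FastAPI
from pykafka import KafkaClient

app = FastAPI()
recent_anomalies = deque(maxlen=100)  # ring buffer backing the dashboard

def consume():
    # Broker address, topic, and the amount threshold are placeholders.
    client = KafkaClient(hosts="localhost:9092")
    consumer = client.topics[b"transactions"].get_simple_consumer(
        consumer_group=b"anomaly-dashboard")
    for message in consumer:
        if message is None:
            continue
        txn = json.loads(message.value)
        if txn.get("amount", 0) > 10_000:  # toy anomaly rule
            recent_anomalies.append(txn)

@app.on_event("startup")
def start_consumer():
    # Consume in the background so API requests are never blocked
    threading.Thread(target=consume, daemon=True).start()

@app.get("/anomalies")
def anomalies():
    return list(recent_anomalies)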
Engineered Amazon Redshift and Apache Druid storage solutions supporting sub-second queries and complex time-series aggregations to analyze borrower behavior and loan payment patterns, ensuring 99.5% uptime during peak hours.
Created optimized SQL transformations using Window Functions and CTEs to calculate loan default probabilities, credit utilization, and payment delinquency metrics, improving query performance by 60% for faster underwriting insights.
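An illustrative query in this style, run here via psycopg2 against Redshift; connection details, table, and column names are all placeholders:

import psycopg2

conn = psycopg2.connect(host="redshift-host", dbname="analytics",
                        user="etl", password="...", port=5439)

DELINQUENCY_SQL = """
WITH payments AS (                          -- CTE: normalize raw payments
    SELECT loan_id, due_date,
           DATEDIFF(day, due_date, paid_date) AS days_late
    FROM loan_payments
)
SELECT loan_id,
       due_date,
       AVG(days_late) OVER (PARTITION BY loan_id) AS avg_days_late,
       SUM(CASE WHEN days_late > 30 THEN 1 ELSE 0 END)
           OVER (PARTITION BY loan_id ORDER BY due_date
                 ROWS UNBOUNDED PRECEDING)  -- running delinquency count
           AS rolling_delinquencies
FROM payments
"""

with conn.cursor() as cur:
    cur.execute(DELINQUENCY_SQL)
    for loan_id, due_date, avg_late, delinq in cur.fetchmany(5):
        print(loan_id, avg_late, delinq)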
Implemented data quality checks with Great Expectations and OpenLineage, using Python with Pandas and Marshmallow to detect schema changes and null anomalies, reducing data incidents by 50% and supporting regulatory compliance.
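A small sketch of these checks, assuming the classic great_expectations pandas API (ge.from_pandas); the Marshmallow schema and record fields are illustrative:

import great_expectations as ge
import pandas as pd
from marshmallow import Schema, fields

class LoanSchema(Schema):
    # Illustrative fields; real schemas covered the full loan record
    loan_id = fields.Str(required=True)
    amount = fields.Float(required=True)

records = [{"loan_id": "L-1", "amount": 1200.0},
           {"loan_id": "L-2", "amount": None}]

# Row-level schema validation with Marshmallow
schema = LoanSchema()
for rec in records:
    errors = schema.validate(rec)
    if errors:
        print("schema drift / bad record:", rec, errors)

# Batch-level expectations with Great Expectations (classic pandas API)
df = ge.from_pandas(pd.DataFrame(records))
result = df.expect_column_values_to_not_be_null("amount")
print("null check passed:", result.success)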
Data Engineer, KPMG India | Hyderabad, India | 01/2021 – 07/2023
Directed the integration of a centralized Customer Data Platform (CDP) consolidating user interactions, CRM records, and digital touchpoints, collaborating in Agile teams with marketing analysts, CRM specialists, and product managers to improve customer insights and cross-channel campaign effectiveness by 40%.
Designed and deployed scalable ingestion pipelines using Apache NiFi and Sqoop to extract customer and engagement data from legacy CRM systems and third-party APIs, storing it in HDFS and Hive, reducing manual data integration efforts by 60%.
Utilized Talend and Informatica MDM to cleanse, deduplicate, and standardize customer profiles, resulting in golden records that boosted marketing segmentation accuracy and personalization effectiveness by 55%.
Developed PySpark ETL workflows on Databricks to process clickstream and engagement data, improving batch processing performance and reducing data latency by 70%, while using Pandas and NumPy for profiling and validation.
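A representative batch job in this style; paths, column names, and the sessionization window are illustrative:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("clickstream-etl").getOrCreate()

raw = spark.read.json("/mnt/raw/clickstream/")   # semi-structured events

sessions = (raw
    .filter(F.col("event_type").isNotNull())
    .withColumn("event_ts", F.to_timestamp("event_time"))
    # Roll events into 30-minute activity windows per user
    .groupBy("user_id", F.window("event_ts", "30 minutes"))
    .agg(F.count("*").alias("events"),
         F.countDistinct("page").alias("pages_viewed")))

# Write curated output for downstream analytics
sessions.write.mode("overwrite").parquet("/mnt/curated/sessions/")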
Implemented real-time customer behavior tracking with Kafka and Spark Streaming, enabling dynamic segmentation and personalized content delivery with 85% improvement in responsiveness.
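A sketch of the streaming side, written here with Spark Structured Streaming rather than the legacy DStream API; broker, topic, schema, and the console sink are placeholders:

from pyspark.sql import SparkSession, functions as F, types as T

spark = SparkSession.builder.appName("behavior-stream").getOrCreate()

event_schema = T.StructType([
    T.StructField("user_id", T.StringType()),
    T.StructField("action", T.StringType()),
    T.StructField("ts", T.TimestampType()),
])

events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "behavior-events")
    .load()
    # Kafka values arrive as bytes; parse JSON into typed columns
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*"))

# 5-minute activity counts per user drive dynamic segmentation
counts = (events
    .withWatermark("ts", "10 minutes")
    .groupBy(F.window("ts", "5 minutes"), "user_id")
    .agg(F.count("*").alias("actions")))

query = (counts.writeStream
    .outputMode("update")
    .format("console")   # placeholder sink
    .start())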
Automated data quality checks, lineage documentation, and audit compliance using Great Expectations, Apache Atlas, and Azure Data Factory, aligning with data governance policies and boosting audit scores by 50%.
Designed customer-centric data models in Snowflake and partnered with analytics teams to build complex SQL queries using CTEs and Window Functions, reducing dashboard load times and accelerating customer insight reporting by 65%.
Education
Master of Science, Information Systems | Indiana Institute of Technology | Fort Wayne, IN, USA | 10/2023 – 05/2025
Bachelor of Technology, Electrical Engineering | Jawaharlal Nehru Technological University | Karimnagar, India | 06/2018 – 07/2022