Data Engineer Real-Time

Location:

Farmington Hills, MI

Salary:

120000

Posted:

October 28, 2025

Resume:

Kiran Kumar Yerrolla

248-***-**** **************@*****.*** Detroit, MI 48335

LinkedIn: https://www.linkedin.com/in/kiran-kumar-30795820b/

Data Engineer with 5+ years of experience building and optimizing cloud-based data pipelines using AWS (Glue, Lambda, Step Functions, Redshift), Kafka, Flink, Databricks, and SQL. Skilled in scalable ETL/ELT workflows, real-time streaming, and data modeling, with a proven record of improving performance, reducing costs, and enabling real-time analytics. Currently pursuing a Doctor of Business Administration (DBA) in Business Intelligence & Analytics (expected 2029) to combine technical expertise with advanced business and leadership skills.

RELEVANT EXPERIENCE

Rocket Mortgage Detroit, MI AWS Data Engineer November 2023 – Current

●Designed and optimized ETL pipelines using AWS Glue, Lambda, and Step Functions to process and transform terabytes of data from RDS, S3, and Redshift, ensuring high scalability and reliability.

●Built real-time streaming applications with Apache Flink to consume and process data from Kafka topics, writing results into RDS and integrating with Zero-ETL pipelines for seamless replication into Redshift, improving data freshness by 90%.

●Served in 24/7 on-call rotations to monitor mortgage data jobs and production pipelines; promptly diagnosed and resolved outages, communicated data delays, and restored pipeline integrity.

●Improved Redshift cluster performance by 25% through distribution key optimization, vacuuming, and query tuning, significantly reducing query latency for analytics workloads.

●Designed and implemented event-driven architectures using AWS SQS and SNS for asynchronous messaging and notifications, ensuring reliable communication across distributed systems.

●Developed automated ingestion pipelines with AWS Lambda and Kinesis to move data from APIs, databases, and file systems into S3, reducing manual interventions and improving scalability.

●Developed data quality checks with AWS Glue DataBrew and implemented CloudWatch monitoring and alarms, reducing pipeline failures and cutting on-call tickets by 25%.

●Migrated data to cost-effective storage tiers in Amazon S3 (e.g., Glacier, Intelligent-Tiering) with lifecycle policies, reducing costs by 30%.

●Used AWS EMR to transform and migrate large datasets between S3, DynamoDB, and other AWS stores, supporting diverse analytical workloads.

●Collaborated with cross-functional teams (data scientists, analysts, engineers, product managers) to deliver business-aligned, high-performance data solutions supporting enterprise analytics.

CVS Health Detroit, MI Data Engineer May 2023 – October 2023

●Developed Spark applications in Databricks using Python and Spark SQL to extract, transform, and aggregate data from diverse file formats, uncovering customer usage insights and enabling accurate reporting.

●Architected scalable data lakes on AWS S3 with Databricks, Glue, and Athena, supporting enterprise-wide analytics and cross-functional collaboration.

●Designed continuous ingestion pipelines with Databricks Structured Streaming, AWS Kinesis, and Lambda, reducing data latency and enabling near real-time analytics.

●Integrated Databricks with AWS Redshift to accelerate ETL workflows, improving query performance and enhancing data accessibility for business users.

●Optimized Spark jobs in Databricks, reducing cluster costs and cutting processing times by 20% for large datasets.

●Led migration of on-premises data warehouses to AWS S3 + Databricks, streamlining legacy ETL workflows and modernizing data platforms as part of the enterprise cloud strategy.

●Implemented Delta Lake for ACID transactions, schema enforcement, and reliable data versioning, ensuring data quality and consistency across reporting pipelines.

●Built automated ETL pipelines for data cleansing, transformation, and integration into cloud-based data lakes and warehouses, supporting BI and compliance reporting across healthcare business units.

Infosys Hartford, CT Data Engineer Sep 2022 – April 2023

●Developed and optimized data pipelines in Apache Airflow using Python, processing 1M+ records daily and improving ingestion speed by 35%, directly impacting real-time analytics.

●Automated Power BI dashboards for customer segmentation, enabling stakeholders to identify ticket types, target high-potential leads, and resolve venue issues 20% faster, leading to a 15% increase in upsell opportunities.

●Operationalized a CLV (Customer Lifetime Value) model by building scalable Prefect pipelines, automating workflows, and reducing manual effort by 30%, enhancing retention strategies by 25%.

●Optimized SQL queries with efficient indexing and tuning, cutting execution times by 45% and accelerating reporting cycles by 50%.

●Developed CI/CD workflows in CircleCI to automate deployment of configuration files to AWS S3, reducing manual errors and streamlining deployments.

●Documented architectures and ETL workflows in Lucidchart and Confluence, improving cross-team collaboration and onboarding efficiency.

●Mentored a team of 4 junior data engineers in Prefect ETL design and SQL best practices, boosting team productivity by 35% and strengthening expertise in big data technologies.

●Collaborated with technical and non-technical stakeholders to translate business problems into technical solutions, delivering 5 key data products that drove a 25% increase in user engagement.

Amazon Seattle, WA Summer Intern May 2022 – Aug 2022

●Contributed to developing scalable ETL pipelines using Hadoop, Hive, Spark (PySpark & SparkSQL), and AWS EMR to process multi-terabyte datasets with high performance and fault tolerance.

●Assisted in automating real-time data ingestion with AWS Glue, Kinesis, Firehose, and Lambda, integrating Step Functions for workflow orchestration.

●Worked on data warehousing solutions using Redshift, DynamoDB, and Snowflake (via dbt), supporting analytics and reporting workloads.

●Developed monitoring dashboards with Tableau and AWS QuickSight, integrated with CloudWatch for system health and performance tracking.

●Collaborated with cross-functional teams to deliver data engineering solutions, gaining exposure to enterprise-scale cloud practices and agile project management.

Cholamandalam Investment and Finance Company Hyderabad, IND Associate Data Engineer Jan 2019 – Dec 2020

●Developed custom web crawlers to extract provider data from license verification sites, reducing manual verification time by 40%.

●Orchestrated and automated ETL pipelines using AWS Glue for data cataloging, with Python scripts for downstream loading, reducing report retrieval time by 20%.

●Designed and developed T-SQL stored procedures, functions, and views in SQL Server to support Finance applications, including complex billing calculations per provider, location, and payer.

●Enhanced ETL workflows in SSIS, optimizing transformations (CASE, UNION, complex joins, CTEs, Derived Columns) to decrease execution time for 100+ providers by 20%.

●Contributed to cloud migration projects, migrating SQL Server workloads to AWS using DMS, Lambda, Step Functions, Glue, and S3.

●Modernized legacy feed processes by migrating to Python + AWS pipelines, reducing processing time by 30% and improving scalability.

SKILLS

Databases & Warehousing: Amazon Redshift, PostgreSQL, SQL Server, MySQL, DynamoDB

Big Data & Processing: Apache Spark (PySpark, SparkSQL), Databricks, Hadoop, Hive, Apache Flink, Kafka

Cloud Platforms:

○ AWS: S3, Glue, EMR, Lambda, Kinesis, Firehose, Step Functions, RDS, EventBridge, IAM, SNS, SQS

○ Azure: Data Factory, Synapse, Data Lake Storage, Functions, Logic Apps

ETL & Orchestration: dbt, Apache Airflow, AWS Glue, Informatica, Talend

BI & Analytics Tools: Tableau, Power BI (DAX, Power Query M), AWS QuickSight, Alteryx, MS Excel (Advanced)

Programming & Scripting: Python, SQL, Scala, R, Bash, Shell Scripting

Machine Learning (Applied): Scikit-Learn, Regression, Clustering, Decision Trees, SVM, Time-Series (ARIMA)

DevOps & CI/CD: GitHub, GitLab, Jenkins, Docker, Kubernetes

Monitoring & Automation: AWS CloudWatch, Ansible, PowerShell

Other Tools: JIRA, Confluence, JSON, REST APIs, Parquet

EDUCATION

Texas A&M University Kingsville, TX

Master of Science in Computer Science Jan 2021 – May 2022

Jawaharlal Nehru Technological University Hyderabad, IND

Bachelor's in Electronics and Communication Engineering Aug 2015 – June 2019

Belhaven University Jackson, MS

Doctor of Business Administration in Business Intelligence and Analytics June 2025 – June 2029

CERTIFICATIONS

Databricks Certified Data Engineer Professional (2025)

AWS Certified Solutions Architect – Associate (2023)

Microsoft Certified: Azure Data Engineer Associate (2023)

Microsoft Certified: Power BI Data Analyst Associate (2023)
