Name: Omkar Reddy Lakkireddy
Phone: +1-940-***-****
Email: ***********@*****.***
https://www.linkedin.com/in/omkar-reddy-lakkireddy/
PROFESSIONAL SUMMARY
•Over 5 years of experience as a Data Engineer across the banking, healthcare, and IT services domains.
•Strong expertise in Azure Data Factory, Azure Synapse, and Azure Blob Storage, delivering HIPAA-compliant healthcare data pipelines.
•Built and optimized ETL/ELT pipelines using Python, SQL, Apache Spark, Apache Beam, and PySpark to process large-scale datasets.
•Designed and implemented real-time streaming solutions using Apache Kafka and Azure Event Hubs.
•Skilled in data validation, reconciliation, and data governance.
•Automated CI/CD pipelines using Azure DevOps, Jenkins, GitHub Actions, and Terraform to streamline deployments and infrastructure provisioning.
•Created interactive Power BI dashboards and reports to deliver insights.
•Collaborative team player with experience in Agile Scrum, delivering secure, scalable, and business-driven data solutions across multi-cloud environments (Azure, AWS).
TECHNICAL SKILLS
•Cloud Platforms: Azure (Data Factory, Synapse, Blob Storage), AWS (S3, Glue, Redshift, EMR, Kinesis, Lambda)
•Big Data Technologies: Apache Spark, PySpark, Kafka, Flink, Airflow, NiFi, Hadoop (HDFS, Hive, MapReduce)
•Programming & Scripting: Python, SQL, Scala, Shell Scripting, Java
•Databases: Oracle, SQL Server, MySQL, MongoDB, Cassandra, DB2
•ETL Tools: Informatica IDMC, Talend, AWS Glue, Azure Data Factory
•Data Warehousing: Redshift, BigQuery, Azure Synapse, Teradata
•Data Visualization: Power BI, Tableau
•DevOps & IaC: Docker, Kubernetes, Terraform, Jenkins, Azure DevOps, GitHub Actions
•Messaging Systems: Kafka, AWS Kinesis, Pub/Sub
•Data Formats: JSON, Parquet, Avro, ORC, CSV
EDUCATION
•Master’s in Information Systems and Technologies
University of North Texas, Denton, TX
•Bachelor of Technology in Electronics and Communication Engineering
Lakireddy Balireddy College of Engineering, Mylavaram
CERTIFICATIONS
•Microsoft Certified: Fabric Data Engineer Associate
•Microsoft Certified: Azure Fundamentals
PROFESSIONAL EXPERIENCE
Client: Wells Fargo, Charlotte, NC Nov 2023 – Present
Role: Data Engineer I
Responsibilities:
•Developed end-to-end data ingestion pipelines using AWS Glue, AWS Lambda, and Apache Airflow, enabling automated ETL processing across Amazon S3, Amazon Redshift, and Amazon RDS.
•Implemented real-time data streaming pipelines using Amazon Kinesis Data Streams, Kinesis Data Firehose, and Lambda, allowing continuous ingestion and transformation into Amazon Redshift and Amazon OpenSearch.
•Built modular PySpark jobs to process high-volume batch data from Amazon S3, applying business logic and loading the results into Redshift and S3 data lakes.
•Created ETL frameworks using AWS Glue Studio and the Glue Data Catalog, enabling scalable, reusable jobs and consistent schema management.
•Ensured compliance with PCI-DSS by automating audits and encryption enforcement in AWS Glue and Redshift workflows.
•Automated the deployment of Airflow DAGs using Docker, Amazon ECS, and CI/CD tools such as Jenkins and AWS CodePipeline, streamlining job execution and updates.
•Used AWS Step Functions to orchestrate complex multi-stage workflows involving Lambda, Glue, and S3 interactions, improving traceability and error handling.
•Integrated Amazon SQS and SNS for distributed messaging and event-driven triggers across ingestion and transformation processes.
•Performed data validation, reconciliation, and transformation using SQL, Python, and Spark SQL, ensuring high data quality during batch and real-time operations.
•Implemented monitoring and alerting for ETL jobs and streaming pipelines using Amazon CloudWatch, with custom metrics, dashboards, and automated recovery steps.
•Migrated structured and semi-structured datasets from on-premises and third-party platforms into AWS data lake architectures, ensuring data integrity, lineage, and compliance.
•Developed distributed data processing applications using Apache Spark on Amazon EMR, performing large-scale data aggregations, joins, and transformations.
•Worked with Hadoop Distributed File System (HDFS) for ingesting large datasets and querying them using Apache Hive, including creating external tables, partitions, and bucketed tables for performance optimization.
Client: Accenture, India Jan 2022 – Jul 2023
Role: Big Data Engineer
Responsibilities:
•Designed and developed distributed healthcare data pipelines on Azure and Hadoop ecosystems (HDFS, YARN, MapReduce) to process EHR, claims, and clinical data securely and efficiently.
•Built and optimized ETL workflows using Azure Data Factory, dbt, and Snowflake, integrating data from Epic Caboodle and FHIR APIs for analytics and reporting.
•Implemented real-time and batch processing with Apache Spark on Azure Databricks and Event Hubs/Kafka, enabling streaming of patient events and operational metrics.
•Developed and deployed AI/ML models in Azure Machine Learning to predict readmission risks, patient flow, and ROI of healthcare initiatives using regression and gradient boosting.
•Ensured HIPAA-compliant data governance with Azure Key Vault, ADLS Gen2 ACLs, and role-based access for secure data management.
•Optimized pipeline performance and resource utilization using YARN tuning, Databricks job optimization, and Delta Lake partitioning strategies.
•Automated deployment and monitoring through Azure DevOps CI/CD pipelines and Azure Monitor, improving reliability and reducing downtime.
•Delivered interactive Power BI dashboards leveraging Snowflake and ADLS data, providing real-time insights into clinical quality, patient outcomes, and operations.
Client: Metamorph IT Systems Pvt Ltd, India Apr 2020 – Dec 2021
Role: ETL Tester
Responsibilities:
•Tested ETL jobs in Informatica and Talend to ensure accurate data loading from multiple sources.
•Executed end-to-end ETL pipeline testing in Azure Data Factory, ensuring seamless data flow and improving overall data reliability.
•Worked with CSV/JSON files and verified data loading into relational databases.
•Collaborated with cross-functional teams to troubleshoot and resolve issues, optimizing ETL processes and enhancing performance.
•Monitored and validated ETL workflows, ensuring timely data loads into SQL Server and handling exceptions effectively.
•Enhanced data warehouse testing and implemented performance optimizations, resulting in better accuracy and efficiency.