Rohith Garre Data Engineer
FL, USA (Open to Relocate) +1-561-***-**** ******.*******@*****.*** LinkedIn
SUMMARY
Data Engineer with 6+ years of experience building scalable data pipelines, real-time streaming systems, and cloud-native platforms. Skilled in Python, SQL, Spark, Kafka, Flink, and Hadoop, with expertise across AWS, Azure, and GCP. Strong background in data warehousing (Snowflake, Redshift, BigQuery, Synapse), ETL orchestration (Airflow, Glue, Step Functions), and data modeling (Star, Snowflake, Data Vault, Lakehouse). Proficient with Docker, Kubernetes, and CI/CD pipelines, and experienced in maintaining HIPAA, GDPR, and CCPA compliance while delivering reliable, high-quality data solutions.
SKILLS
Languages & Scripting: Python (Pandas, NumPy, PySpark, Scikit-learn for EDA), SQL (T-SQL, PL/SQL, PostgreSQL, MySQL, Spark SQL), Scala, Java, Shell
Big Data & Streaming: Apache Spark (Batch & Streaming), Hadoop (HDFS, MapReduce), Hive, HBase, Kafka, Flink, Apache NiFi
Cloud Services: AWS (S3, Redshift, EMR, CloudWatch, SQS), Azure Data Factory, GCP Functions
ETL & Orchestration: Apache Airflow, AWS Glue, Lambda, Step Functions, Luigi, Control-M
Data Warehousing & Databases: Snowflake, Redshift, BigQuery, Azure Synapse, Delta Lake, PostgreSQL, MySQL
Data Modeling & Architecture: Dimensional Modeling, Star Schema, Snowflake Schema, Data Vault, Medallion Architecture, Data Lake/Lakehouse, Event-Driven Pipelines, SCD1/SCD2, Partitioning & Bucketing
DevOps & Containers: Docker, Kubernetes (EKS/AKS), AWS EC2, CI/CD (GitHub Actions, Jenkins, GitLab), Terraform
Data Governance & Compliance: Data Lineage, Metadata Management, AWS Glue Data Catalog, Collibra, HIPAA, GDPR, CCPA
Analytics & BI Tools: Power BI, Tableau, Looker, Amazon QuickSight, KPI Development, Performance Dashboards
Monitoring & Validation: Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, PyDeequ, CloudWatch Alerts, Custom Data Quality Checks
Other Tools: Faker (synthetic data generation), Seaborn (basic visualization), Statistical Analysis (EDA)
Version Control & Collaboration: Git, GitHub, GitLab, Bitbucket, Agile/Scrum
PROFESSIONAL EXPERIENCE
Humana Data Engineer FL, USA June 2024 – Present
Built PySpark pipelines on AWS EMR/S3 to process 2 TB/day of claims and eligibility data, improving regulatory reporting (HIPAA, CMS) efficiency by 42%.
Developed real-time ingestion workflows with Kafka, Flink, and NiFi to monitor patient admissions and readmissions across 12M+ records, enabling faster care interventions.
Designed Snowflake/Redshift warehouses for cost-of-care and provider network analytics, improving query performance by 60% and cutting reporting cycles from 3 days to 8 hours.
Automated ETL orchestration (Airflow, Glue, Step Functions) for compliance and utilization reports, reducing manual effort by 30+ hours/month.
Delivered Power BI/Tableau dashboards tracking utilization rates, chronic disease management outcomes, and network performance, improving executive decision-making speed by 65%.
Integrated data lineage and metadata tracking with Collibra and AWS Glue Data Catalog, improving audit readiness by 70% and ensuring full compliance with HIPAA and CMS reporting standards.
Cybage Software Data Engineer India May 2021 – Apr 2023
Engineered batch and streaming pipelines (Spark, Kafka, Hadoop) to process 1.5 TB/day of e-commerce transactions, reducing data availability lag by 48%.
Built Snowflake/BigQuery warehouses supporting customer behavior analytics and personalization models, improving query response time by 55%.
Migrated legacy ETL to Azure Data Factory, enabling predictive inventory forecasting and dynamic pricing, while cutting infrastructure costs by 33%.
Automated Airflow and Glue workflows powering retail sales and campaign performance dashboards, saving 25 hours/month in manual monitoring.
Designed Looker/Tableau dashboards for conversion funnel, retention, and campaign ROI, reducing reporting turnaround from 2 days to 4 hours.
Implemented data quality validation with PyDeequ and ELK Stack to monitor e-commerce transaction pipelines, reducing critical data errors by 41% and improving SLA compliance for client reporting.
Sigma Tech Solution Data Engineer India Apr 2019 – Apr 2021
Developed Spark/Hive pipelines on Hadoop to process 800 GB/day of financial transactions, accelerating fraud detection reporting by 37%.
Built Redshift and Delta Lake warehouses to support AML compliance checks and risk analytics, boosting query performance by 45%.
Orchestrated 120+ ETL jobs in Control-M and Airflow for daily regulatory and audit reporting, achieving 99.7% SLA adherence.
Integrated Kafka streams with PostgreSQL systems for real-time payment monitoring, maintaining <2s latency across 5M+ records.
Implemented data lineage and metadata tracking in Glue Data Catalog, improving compliance audit readiness by 65%.
Genius SoftTech Data Analyst India May 2018 – Feb 2019
Collected, cleaned, and transformed 250K+ customer and sales records using Python and SQL for sales and marketing analytics, reducing prep time by 40%.
Built statistical churn models (Scikit-learn) that improved customer retention by 12% through targeted engagement campaigns.
Automated Excel-to-SQL pipelines supporting monthly sales reporting, saving the operations team 15 hours/week.
Delivered Power BI/Tableau dashboards for sales targets, lead conversion, and support metrics, reducing reporting cycles from 3 days to 6 hours.
Conducted A/B testing on marketing campaigns, driving an 18% increase in email conversion rates.
EDUCATION
Florida Atlantic University – Boca Raton, FL, USA May 2023 – Dec 2024 Master of Science in Computer Science
Gandhi Institute of Technology and Management (GITAM University) – Visakhapatnam, India Jul 2015 – Apr 2019 Bachelor of Science in Computer Science