SREEKAR JENEPALLI
United States +1-913-***-**** ****************@*****.*** linkedin github
Summary
Cloud Data Engineer with 5 years of experience designing scalable ETL pipelines and cloud data architectures. Adept at implementing Change Data Capture and optimizing Apache Spark ETL jobs, including streaming and batch processing. Proven track record in transforming raw data into queryable insights while enhancing performance and compliance in enterprise healthcare environments.
Skills
• Cloud Platforms: Snowflake, Apache Airflow, AWS Glue, AWS Lambda, AWS EMR, AWS S3, Redshift, Athena, Kinesis, DMS, Step Functions, CloudFormation, Lake Formation, CloudWatch, Azure, Azure Data Factory, Azure Synapse Analytics, Microsoft Azure, AWS Cloud Services, dbt, Cloud Data Services, Microsoft Fabric, AWS Skillset
• Data Engineering: ETL Pipelines, Data Integration, PySpark, Spark SQL, SQL, AWS Glue Jobs, AWS DMS, Kafka, APIs, Event-driven Pipelines, Data Pipelines, Databricks, Data Architecture, Cloud Development, Spark, Change Data Capture, Scala, Apache Hudi, Apache Griffin, AWS Deequ
• Data Warehousing And Modeling: Dimensional Modeling, Star and Snowflake Schema, Data Vault, Delta Lake, Lakehouse Architecture, Redshift Spectrum, S3-based Analytics, Partitioning, Indexing, Data Modeling, Relational Databases, NoSQL Databases
• Software Engineering: Design Patterns, Git, CI/CD Pipelines, Code Reviews, Unit Testing, Agile, Scrum, Python Scripting, Analytical Skills, Java
• Infrastructure As Code And DevOps: Terraform, AWS CloudFormation, AWS CDK, Jenkins, GitLab CI/CD, Automated Deployments, Infrastructure Automation, Spot Optimization, Azure DevOps, SSIS, Star Schema Design
• Monitoring And Governance: AWS CloudWatch, AWS X-Ray, Glue Data Catalog, Lake Formation, Row-Level Security, Data Validation, Lineage, HIPAA Compliance, Cost Tagging, Real-time Data Processing, Data Governance
• Analytics And Visualization: Power BI, Tableau, DAX, Redshift and S3 Dashboards, Athena, Business Intelligence Reporting, Data Quality Assurance
• Industry Experience: Healthcare, Cloud Modernization, Operational and Clinical Analytics, Financial Analytics Experience
Johnson & Johnson Aug 2024 - Present
Cloud Data Engineer New Jersey, USA
• Designed and developed event-driven ETL pipelines with AWS Step Functions, Glue, and Lambda to deliver scalable data ingestion and near real-time analytics for enterprise healthcare workloads.
• Optimized PySpark workloads on EMR using Apache Spark DataFrames to process large-volume datasets, achieving a 30% increase in transformation throughput.
• Automated infrastructure provisioning using Terraform and CloudFormation, reducing environment setup time by 40%.
• Instituted validation and lineage frameworks with Glue Data Catalog, Lake Formation, and CloudWatch to enhance data governance and auditability.
• Modeled analytical layers in Redshift using dimensional and Data Vault techniques, boosting BI query performance by 25%.
• Implemented HIPAA-compliant security controls including Row-Level Security, encryption keys, and IAM-based access policies to strengthen data protection.
• Developed and maintained CI/CD pipelines in Jenkins and GitLab CI/CD, reducing release failures by 20% and accelerating deployment cycles.
• Delivered compliance and performance dashboards for 200+ stakeholders, shortening audit turnaround and improving executive visibility.
Optum Oct 2020 - Aug 2023
ETL Developer Hyderabad, India
• Engineered automated data pipelines integrating Revenue Cycle Management (RCM), claims, and financial datasets using AWS Glue, Python, and SQL to improve data accuracy and operational efficiency by 35%.
• Migrated on-premises workflows to AWS Redshift and S3 via DMS, enabling scalable and secure healthcare data warehousing for analytics and compliance reporting.
• Developed PySpark transformation frameworks on EMR for processing high-volume healthcare transactions, optimizing compute utilization and reducing batch latency by 25%.
• Designed data warehouse schemas (Star and Snowflake) supporting financial, claims, and operational dashboards for clinical and business teams.
• Automated error handling, logging, and alerting with CloudWatch, SNS, and Python utilities, reducing pipeline incidents by 40%.
• Collaborated on DevOps automation with Terraform and Jenkins to standardize infrastructure deployments for analytics workloads.
• Partnered with business analysts to translate reporting requirements into reusable and scalable data models, improving report delivery speed by 25%.
Novartis Jan 2019 - Oct 2020
Data Engineer Intern Hyderabad, India
• Supported the development of ETL pipelines in AWS Glue and Python to integrate healthcare and clinical datasets for downstream analytics.
• Assisted in building data ingestion workflows to populate Redshift and S3 environments, ensuring consistency and accessibility for analysts.
• Wrote SQL queries and validation scripts to verify record accuracy and completeness across multiple data sources.
• Contributed to the creation of introductory dashboards in Tableau and Power BI to visualize operational and clinical performance metrics.
• Documented data flow diagrams, schema mappings, and process steps to support future automation and onboarding efforts.
Projects
Azure Data Lakehouse for Retail Analytics
• Ingested 10M+ structured and semi-structured retail records into Azure Data Lake using Data Factory pipelines; built curated layers in Synapse Analytics with partitioning and indexing to reduce query latency by 35%.
• Automated transformation workflows using Data Factory triggers and Power BI refresh pipelines, ensuring consistent global reporting.
• Delivered Power BI dashboards for sales forecasting, churn prediction, and retention analysis that simulated a 10% uplift in customer engagement.
IoT Streaming Analytics on AWS
• Built a real-time IoT pipeline processing 3M+ sensor events per day using AWS Kinesis, Glue, and EMR with Spark Streaming to enable continuous monitoring.
• Deployed anomaly detection models in Python to predict device failures, reducing manual maintenance by 30%.
• Stored enriched streams in Redshift and visualized performance metrics in Power BI dashboards, improving response time by 15%.
Healthcare Claims Data Platform on Azure
• Integrated millions of healthcare claims and RCM records into Azure Synapse using Data Factory pipelines and applied data masking for PHI protection.
• Implemented Key Vault encryption and role-based access controls to comply with HIPAA and GDPR standards.
• Applied ML models in Synapse ML to identify fraudulent claims, reducing anomaly detection time by 20%.
• Developed Power BI dashboards for compliance and finance teams, improving audit visibility and cutting report turnaround by 25%.
Education
University of Missouri – Kansas City (UMKC) Aug 2023 - May 2025
Master of Science, Computer Science Kansas City, Missouri, United States
• GPA: 3.5
• Coursework: Cloud Computing, Data Science, Business Analytics, Deep Learning, IoT, Blockchain, Statistical Learning, Advanced Operating Systems, Information Security Assurance
Jawaharlal Nehru Technological University (JNTUH) May 2018 - Aug 2022
Bachelor of Technology, Computer Science and Engineering Hyderabad, Telangana, India
• GPA: 7.0
• Achievements: Graduated with First Class Distinction; undertook academic projects on data integration, analytics pipelines, and visualization systems.
Certifications
• AWS Certified Solutions Architect – Associate (SAA-C03)
• Databricks Certified Data Engineer Professional
• Microsoft Certified: Azure Data Engineer Associate (DP-203)
• Microsoft Certified: Azure Fundamentals (AZ-900)