P JEEVANA CHOWDARY
Cincinnati, OH | 513-***-**** | ****************@*****.***
SUMMARY:
Data Engineer with 5+ years of experience designing and building scalable, secure data pipelines across AWS, Azure, and GCP. Proficient in Apache Spark, PySpark, Kafka, Hadoop, and SQL for processing structured and semi-structured data. Developed ETL/ELT workflows using AWS Glue, Azure Data Factory, and Apache Airflow, integrating diverse sources into scalable data infrastructure built on data lakes, warehouses, and the Databricks lakehouse. Strong in data modeling with star and snowflake schemas on Snowflake, BigQuery, Redshift, and Synapse, using modeling tools for schema design and performance tuning. Implemented CDC with Kafka Connect and Debezium, and automated CI/CD with Terraform, GitHub Actions, and Jenkins. Enforced governance with dbt, Great Expectations, and custom validation logic to meet HIPAA and GDPR requirements. Experienced with PostgreSQL, SQL Server, MongoDB, DynamoDB, and Cassandra; delivered insights via Power BI, Tableau, QuickSight, and Looker. Strong academic foundation in Computer Science, Physics, and Mathematics, with applied expertise in data architecture, data wrangling, and AI/ML pipelines within SDLC frameworks, supporting business applications and data-driven product management initiatives.
WORK EXPERIENCE:
Data Engineer, Cigna Health Care, United States Jan 2024-Present
• Built ETL workflows with AWS Glue and Lambda, ingesting CSV and API data into Amazon S3 and transforming and loading it into Redshift for downstream analytics, aligning with SDLC best practices and modern data infrastructure standards while improving data acquisition processes.
• Created Python-based transformations to standardize clinical and claims data structures, applying domain knowledge and data wrangling techniques, and implemented pytest unit tests to enforce quality checks, support anomaly detection, and enhance data integrity, improving data reliability by 35%.
• Modeled datasets in Redshift and PostgreSQL using normalized and dimensional schemas, supported by ER diagrams, enabling data exploration and statistical summarization using advanced SQL queries including window functions, CTEs, and aggregation.
• Optimized slow-running SQL by refactoring queries and introducing materialized views and indexing strategies, improving Redshift query performance in line with SQL optimization best practices.
• Implemented data profiling and quality validation using Great Expectations, integrating checks into CI/CD pipelines with GitHub Actions and Step Functions, ensuring adherence to internal data governance, regulatory requirements, and compliance standards.
• Built secure data workflows by enforcing encryption with AWS KMS, managing access with IAM and Lake Formation, and documenting architecture in Confluence to maintain traceability and enhance data management while safeguarding confidential data in alignment with AWS regulatory guidelines.
• Delivered exploratory and ad-hoc analysis using Power BI, QuickSight, and Python, providing clinical, actuarial, customer service, and sales teams with statistical insights from hypothesis testing, distributions, and probability-based evaluations, supporting business intelligence solutions, public health reporting, forecasting, and performance trend analysis.
• Integrated clinical data sources following CDISC principles, mapping records into structured ADaM-ready formats to support downstream health analytics in a scalable, cloud-native architecture.
• Deployed pipeline infrastructure using CloudFormation, managed secrets via Secrets Manager, and leveraged GitHub and AWS CodeBuild to maintain reproducibility and traceability.
• Developed exploratory ML models using Python (scikit-learn, XGBoost) to predict claim denials and patient churn trends, contributing to data-driven healthcare strategy, research, and operational planning.
• Applied CI/CD practices for pipeline versioning and production deployment across dev/test/prod environments, emphasizing rollback readiness and consistent environment promotion in line with agile development and continuous integration principles.
• Interfaced with cross-functional teams, supporting multiple priorities and documenting pipeline logic and retry patterns in team wikis to ensure knowledge transfer, operational resilience, and effective root cause analysis during incident triage.
Data Engineer, Cognizant Technology Solutions, India Jan 2020-Dec 2022
• Designed scalable ETL pipelines using Azure Data Factory and Data Lake Storage to ingest, transform, and load datasets (including XML formats) into Azure Synapse, supporting SDLC-aligned delivery of business applications and data solutions such as customer profiling, financial transaction modeling, and reusable data architecture for analytics.
• Developed real-time fraud detection pipelines using Apache Kafka and Spark Structured Streaming, integrated with Synapse and Azure SQL, enhancing finance risk analytics and anomaly detection capabilities in high-throughput transaction pipelines.
• Modeled dimensional warehouse structures using star and snowflake schemas in Synapse and Snowflake, supporting BI reporting, ad-hoc analytics, and regulatory compliance with optimized performance and compute for enterprise data warehouse architecture.
• Created reusable Python modules for data validation, exception handling, and log tracking to ensure consistency, traceability, and reliability across Azure Data Factory workflows and Azure DevOps, following best practices in software design and automated testing.
• Deployed infrastructure as code using Terraform and Bicep for provisioning ADF pipelines, Synapse pools, Key Vault secrets, and monitoring configurations, supporting reproducible, version-controlled cloud environments.
• Orchestrated batch and streaming jobs using Apache Airflow and ADF triggers with dynamic retry logic, SLA enforcement, and alerting for enhanced pipeline resilience and operational awareness.
• Built CI/CD pipelines with Azure DevOps and GitHub Actions to deploy SQL scripts, PySpark code, and data pipeline templates for reusable data products, reducing manual deployment effort and improving release stability through continuous integration workflows.
• Enforced data protection using Azure Key Vault, row-level security, and dynamic data masking to comply with HIPAA, GDPR, SOC 2, and other regulatory standards across Synapse and Azure SQL.
• Designed and published Power BI dashboards connected to Synapse and Azure SQL using DirectQuery, enabling live reporting, operational KPIs, and self-service analytics for cross-functional stakeholders.
SKILLS
• Cloud Platforms & Services: AWS Services (S3, Redshift, Glue, Lambda, RDS, Athena, SNS), Azure (ADF, Synapse, Data Lake, DevOps), GCP (BigQuery, Dataflow, Cloud Composer, Cloud Spanner), Snowflake, Databricks, Google Kubernetes Engine (GKE), Terraform, Docker, Kubernetes
• Data Engineering & Pipelines: Apache Airflow, AWS Glue, Azure Data Factory, Prefect, dbt (Data Build Tool), Fivetran, Informatica, Talend, Apache NiFi, Data Fusion, SSIS, Autosys, Apache Flume, Apache Sqoop
• Big Data & Hadoop Ecosystem: Apache Spark (PySpark, Scala), Delta Lake, Apache Kafka, HDFS, MapReduce, Hive, HBase, Pig, YARN, Cloudera, Hortonworks
• Databases (SQL & NoSQL): PostgreSQL, MySQL, Oracle, SQL Server, MongoDB, DynamoDB, Cassandra, Amazon Redshift, Snowflake, BigQuery, Cloud Spanner, Teradata, DB2, HBase
• Programming Languages & Scripting: Python, SQL (queries, CTEs, window functions, math functions), Java, Scala, Shell Scripting, R, PL/SQL, JavaScript
• Data Modeling & Warehousing: Dimensional Modeling, Star/Snowflake Schema, Medallion Architecture, Data Vault, ETL/ELT
• Monitoring, CI/CD & DevOps: Git, GitHub Actions, Bitbucket, Jenkins, Terraform, Prometheus, Grafana, Confluence, JIRA
• Data Quality, Testing & Governance: Great Expectations, SodaSQL, Monte Carlo, Data Contracts, OpenMetadata
• Visualization & Reporting Tools: Power BI, Tableau, Looker, SSRS, Crystal Reports, Matplotlib, Seaborn, NumPy, Pandas, Excel
• AI/ML & Data Science: scikit-learn, TensorFlow, Keras, XGBoost, CNN, RNN, SpaCy, NLTK, LangChain, Time Series Analysis
• Development & IDEs: PyCharm, Jupyter Notebook, IntelliJ IDEA, Eclipse, Visual Studio
• Version Control & OS: Git, GitHub, Bitbucket, JIRA, Confluence, Windows, Linux, Unix, Mac OS
• Other Tools & Utilities: Postman, JSON, SaaS, PaaS, Google Analytics, Spring Boot, Microsoft Office, GTM, Artifactory, PuTTY
EDUCATION
Master of Science, Information Technology Jan 2023-Apr 2024
University of Cincinnati, Cincinnati, OH
• Relevant Coursework: Cloud Technology (AWS, Azure), DBMS, Machine Learning and Data Mining, Data Analytics, Data Architecture, Software Development Lifecycle (SDLC)
Bachelor of Technology in Computer Science and Engineering Jun 2017-Jul 2021
Jawaharlal Nehru Technology University, Kadapa, AP
PROJECTS
Chatbot Using LLM
• Architected an AI chatbot using OpenAI and LangChain, effectively handling over 500 user queries with improved accuracy.
• Created a Streamlit web app that facilitated natural language interactions, enhancing user engagement with the AI chatbot.
Analysis on YouTube Data
• Implemented a web app on AWS with Cognito and DynamoDB, reducing user management overhead by 25%.
• Optimized data transformation using AWS Glue and PySpark, increasing efficiency by 40% and integrating QuickSight for improved query performance.
CERTIFICATES
• Microsoft Azure Data Engineering Associate (DP-203)