Nivruthi P
Phone: +1-512-***-**** | Email: ***********@*****.*** | LinkedIn
PROFESSIONAL SUMMARY:
● Data Engineer with 5 years of experience in designing, building, and optimizing scalable data pipelines and analytics solutions across healthcare and enterprise domains.
● Strong expertise in Azure Data Factory (ADF), Databricks, Azure Synapse, Azure Data Lake, Delta Lake, PySpark, and SQL for end-to-end data integration, transformation, and automation.
● Hands-on experience developing ETL pipelines and workflow automation for high-volume healthcare and financial datasets, ensuring data accuracy, quality, and reliability.
● Proficient in designing and optimizing data models (Star & Snowflake Schema, Fact/Dimension Tables) in Azure Synapse and Snowflake for analytical and BI reporting.
● Experienced in data ingestion from APIs, SQL databases, and flat files, and performing data migration into Azure Data Lake and AWS Glue environments.
● Collaborated with Data Analysts and Data Scientists to support AI/ML model development, feature engineering, and model deployment using Python, TensorFlow, and Pandas.
● Skilled in implementing data validation, data cleaning, and data governance frameworks ensuring compliance with GDPR and HIPAA for secure healthcare data.
● Adept in using Azure DevOps, Git, and Jenkins for CI/CD, version control, and collaborative workflow management.
● Experienced in Power BI integration, Looker dashboards, and cross-functional collaboration to enable data-driven decision-making.
● Proven ability to work in Agile/Scrum environments, translating business requirements into efficient, production-ready data engineering solutions.
TECHNICAL SKILLS:
● Cloud & Big Data: Azure Data Factory, Databricks, Azure Synapse, Azure Data Lake, Delta Lake, AWS (S3, EC2, RDS, Lambda, ECS, QuickSight, Kinesis), GCP (BigQuery, Dataflow, Google Cloud Storage)
● Programming & Scripting: Python, SQL, PySpark, Pandas, NumPy, Matplotlib, TensorFlow, C, Java, R, Scala, Apex, Shell Scripting (Bash), Data Structures
● ETL & Data Engineering: Data Pipelines, Apache Spark, Databricks, Kafka, Airflow, Informatica, Snowflake, AWS Glue, ADS, Presto, Flink, DBT, Hadoop, Hive, API Integration, Data Migration, Workflow Automation, Data Visualization
● Data Modeling & Warehousing: Star & Snowflake Schema, Fact/Dimension Tables, SQL Server, Data Governance, Data Transformation, Data Modeling, Data Architecture, Data Warehouse
● Data Quality & Governance: Data Validation, Data Cleaning, GDPR, HIPAA Compliance, Data Security, Data Encryption, Role-Based Access Control (RBAC)
● DevOps & Version Control: Azure DevOps, Git, Jenkins, CI/CD, Docker, Kubernetes
● Reporting & Collaboration: Power BI Integration, Looker, Data Support for Dashboards, Cross-functional Collaboration
● Project Management: Agile, Scrum, Sprint Planning, JIRA
● Soft Skills: Effective Communication, Problem-Solving, Optimization Techniques, Collaboration, Teamwork, Continuous Learning, Innovative Thinking
CERTIFICATIONS:
● AWS Certified Data Engineer Associate
● AWS Certified Cloud Practitioner
● Google Data Analytics
PROFESSIONAL EXPERIENCE:
CVS Health, Data Engineer – Texas Jan 2023 – Present
● Built and maintained ETL pipelines using Azure Data Factory (ADF), AWS Glue, and Databricks to process large-scale EHR, claims, and pharmacy data across healthcare domains.
● Designed and implemented data ingestion to AWS services such as S3, RDS, Redshift, and Glue, integrating with Azure Data Lake and Synapse for hybrid cloud analytics.
● Developed and orchestrated ETL workflows in ADF and AWS Glue, leveraging connections, crawlers, and triggers to automate data extraction, transformation, and loading from SQL Server, APIs, and flat files.
● Created automated event-driven data ingestion pipelines using AWS Lambda and ADF Web Activities, improving near real-time data availability in Amazon Redshift and Azure Synapse.
● Designed and optimized data models (Star & Snowflake Schemas, Fact/Dimension Tables) in Azure Synapse and Snowflake, supporting clinical analytics and Power BI reporting.
● Implemented incremental data loads using Delta Lake, improving data refresh performance and reducing compute cost by 35%.
● Applied PySpark, SQL, and Python for data cleaning, transformation, and aggregation of patient, claims, and pharmacy datasets.
● Built data validation frameworks to ensure accuracy and reliability across source systems; implemented data governance and reconciliation checks for regulatory compliance.
● Implemented HIPAA, GDPR, and ISO/IEC 27001 standards for data handling; used data encryption, masking, and Role-Based Access Control (RBAC) for PHI security.
● Managed Terraform deployments for automating infrastructure across AWS Glue, S3, Lambda, and IAM roles, standardizing data pipeline provisioning.
● Developed Power BI dashboards and Looker reports by connecting to Azure Synapse and AWS Redshift, enabling healthcare leaders to track KPIs and patient risk scores.
● Deployed containerized data services using Docker and orchestrated with Amazon EKS, enabling fault-tolerant clinical data processing.
● Used Azure DevOps, Git, and Jenkins for CI/CD automation, code versioning, and workflow monitoring.
● Created comprehensive technical documentation detailing data sources, transformations, and dependencies.
● Collaborated in Agile/Scrum cycles, participating in sprint planning, code reviews, and process improvement initiatives.
● Supported UAT and QA validation for pipeline deployments, ensuring production readiness and data accuracy.
Cognizant, Data Engineer – India May 2020 – May 2022
● Designed and developed ETL pipelines using Azure Data Factory, Azure Databricks, and Azure Functions to ingest, transform, and orchestrate high-volume banking data from core systems, credit bureaus, and payment processors.
● Utilized Azure Blob Storage, ADLS Gen2, and Delta Lake for staging and transformation; improved ETL efficiency by 40% using PySpark and adaptive query optimization in Databricks.
● Built real-time and batch data ingestion workflows integrating APIs via ADF Web Activities, Azure Event Hubs, and Kafka, ensuring near real-time processing of credit card and fraud data.
● Developed SQL-based validation scripts and DBT transformation logic integrated with Azure Synapse SQL and Data Lake Storage, maintaining high data quality and lineage.
● Implemented Delta Lake for ACID-compliant data storage with audit trails and incremental updates, reducing reconciliation effort by 60%.
● Performed data migration from on-prem SQL environments to Azure Synapse and Data Lake, ensuring schema consistency and business continuity.
● Designed Power BI dashboards connected to Synapse, Snowflake, and SQL Server to visualize financial KPIs such as loan performance, segmentation, and fraud metrics.
● Integrated AI/ML and predictive analytics using Python, TensorFlow, and statistical methods for early anomaly detection and credit risk prediction.
● Deployed CI/CD pipelines with Azure DevOps, Jenkins, and Docker, reducing release cycles by 60%; used GitHub for source control and peer-reviewed code merges.
● Implemented RBAC and encryption via Azure Key Vault for API and database access, ensuring compliance with PCI-DSS, GDPR, and e-commerce security standards.
● Orchestrated batch and streaming workflows in Apache Airflow with dependency tracking, retry mechanisms, and alerting for high-availability pipelines.
● Provisioned cloud infrastructure using Terraform to support scalable Azure data platform components.
● Collaborated with Data Scientists and BI teams to model data for Power BI Integration and self-service analytics.
● Contributed to data governance frameworks using Azure Purview, maintaining audit logs, role assignments, and compliance documentation.
● Used JIRA for issue tracking, sprint management, and project visibility within Agile development cycles.
● Authored detailed technical documentation for SQL logic, ETL workflows, and deployment steps, ensuring audit-readiness and smooth handovers.
EDUCATION:
Master's in Management Information Systems – Auburn University at Montgomery, December 2023