Avinash Parasa
DATA ENGINEER
Texas, USA | 551-***-**** | *************.****@*****.*** | LinkedIn
SUMMARY
• Over 4 years of experience as a Data Engineer in both the healthcare and finance sectors, specializing in data pipeline optimization, real-time analytics, and data integration for large-scale datasets.
• Expert in designing and implementing ETL workflows using tools like Apache Kafka, Apache Spark, AWS Glue, and Apache Airflow, ensuring efficient data transformation and processing.
• Proficient in cloud platforms such as AWS (EC2, S3, Lambda, Redshift) and GCP, utilizing cloud technologies to build scalable and cost-effective data solutions.
• Strong background in data security, implementing GDPR, PCI DSS, and HIPAA compliance protocols for sensitive data protection and maintaining high standards of data governance.
• Experience with data modeling and optimizing data storage using AWS Redshift, PostgreSQL, and MySQL, enhancing query performance and reducing storage costs.
• Skilled in real-time analytics, having built platforms using Apache Kafka and AWS services to monitor credit card transactions for fraud detection and provide real-time insights into customer behavior.
• Created automated dashboards and visualizations in Power BI and Tableau, enabling business stakeholders to gain actionable insights into transaction trends, patient outcomes, and key performance metrics.

SKILLS
Methodologies: SDLC, Agile, Waterfall
Programming Languages: Python, SQL, R, Scala
Packages: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, TensorFlow, Seaborn
Visualization Tools: Tableau, Power BI, Excel (Pivot Tables, VLOOKUP), Amazon QuickSight
IDEs: Visual Studio Code, PyCharm, Jupyter Notebook
Cloud Platforms: Amazon Web Services (IAM, S3, VPC, EC2, Athena, AWS Glue, Lambda, EMR, Redshift, SageMaker), Google Cloud Platform, Microsoft Azure
Databases: MySQL, Oracle SQL, PL/SQL, PostgreSQL, MongoDB
Data Engineering Concepts: Apache Spark, Apache Hadoop, Apache Kafka, Apache Beam, dbt, ETL/ELT, PySQL, PySpark
Other Technical Skills: DAX, SAS, JIRA, SAP, SSIS, SSRS, Machine Learning Algorithms, Mathematics, Probability Distributions, Confidence Intervals, Hypothesis Testing, Regression Analysis, Linear Algebra, Advanced Analytics, Data Mining, Data Visualization, Data Warehousing, Data Transformation, Data Storytelling, Association Rules, Clustering, Classification, Regression, A/B Testing, Forecasting & Modelling, Data Cleaning, Data Wrangling, Process Mapping, Solution Oriented, Ad Hoc Analysis, Project Management, Data Presentation, Requirement Gathering, Root Cause Analysis, Data Sets, Data Modules, Quantitative Analytics, Docker, Big Data & AI Integration, PCI DSS, Data Lineage & Masking, RBAC, CI/CD Pipelines
Version Control Tools: Git, GitHub
Operating Systems: Windows, Linux, macOS
EXPERIENCE
Data Engineer | HCA Healthcare, TX | May 2024 – Present
• Contributed to the development of a HIPAA-compliant multi-cloud healthcare analytics platform on GCP integrated with AWS, enabling ingestion, processing, and analysis of multi-terabyte patient health data from EHR, clinical, and billing systems to support clinical decision-making and operational improvements.
• Developed ETL/ELT pipelines in Databricks (PySpark) to process structured and semi-structured healthcare data, applying quality checks before storing it in a centralized Google Cloud Storage (GCS) data lake; a representative sketch of this step follows this role's bullets.
• Built Cloud Data Fusion workflows and Apache Spark jobs on Dataproc to ingest and consolidate data from 12+ disparate sources into GCS, supporting batch and streaming workloads.
• Created dbt transformation models in Snowflake and BigQuery using dimensional modeling (star schema, SCD Type 2) for consistent reporting and improved data discoverability.
• Assisted in building workflows with Apache Airflow and Google Cloud Pub/Sub for real-time ingestion from AWS S3 and on-prem systems into GCP, reducing data latency from hours to near real-time.
• Utilized AWS Lambda and AWS Glue to extract and transform millions of records daily from external vendor systems before loading into GCP datasets.
• Designed ER diagrams and pipeline flow documentation in Figma to align development with data governance guidelines.
• Implemented PHI/PII masking and encryption (KMS) to secure sensitive data at rest and in transit, ensuring HIPAA compliance across 100% of datasets.
• Performed automated data quality checks using Great Expectations and dbt tests, reducing downstream data errors by 30%.
• Optimized SQL queries in Snowflake, BigQuery, and Spark SQL with partitioning, clustering, and materialized views, improving performance by 25%.
• Collaborated with BI teams to develop Looker and Power BI dashboards visualizing patient outcomes, readmission rates, and treatment metrics for 200+ clinicians.
• Created PySpark-based feature sets for predictive modeling, enhancing readmission risk prediction accuracy by 15%.
• Worked within an Agile environment, using Jira and Confluence for sprint tracking and documentation.
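The Databricks quality-check step noted above can be illustrated with a short PySpark sketch; the bucket, table, and column names here are hypothetical placeholders, not actual pipeline objects:

    # Illustrative PySpark quality-check step; all bucket, table, and column
    # names are hypothetical placeholders, not the production pipeline's.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("ehr-quality-check").getOrCreate()

    # Read semi-structured EHR extracts from a hypothetical landing bucket.
    raw = spark.read.json("gs://example-landing/ehr/")

    # Quality gates: required keys present, admit dates parseable, no duplicates.
    clean = (
        raw.filter(F.col("patient_id").isNotNull())
           .withColumn("admit_date", F.to_date("admit_date", "yyyy-MM-dd"))
           .filter(F.col("admit_date").isNotNull())
           .dropDuplicates(["patient_id", "encounter_id"])
    )

    # Quarantine records that failed the gates, then land curated output in GCS.
    raw.join(clean, ["patient_id", "encounter_id"], "left_anti") \
       .write.mode("append").parquet("gs://example-landing/quarantine/ehr/")
    clean.write.mode("overwrite").partitionBy("admit_date") \
         .parquet("gs://example-lake/curated/ehr/")

Data Engineer | Capgemini, India | May 2020 – Aug 2023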
• Developed Azure-based data engineering solutions for a large retail and financial services client, building ETL/ELT pipelines to process structured and unstructured data from transactional, customer feedback, and market data sources.
• Designed ingestion workflows in Azure Data Factory to integrate on-prem SQL Server, APIs, and external feeds into Azure Data Lake Storage and Azure Synapse Analytics.
• Built and optimized PySpark jobs in Azure Databricks for large-scale transformations, improving processing times by 30%.
• Applied dimensional modeling (star schema, snowflake schema, SCD Types) in Azure Synapse to support advanced analytics and high-performance reporting.
• Implemented Kafka Connect for real-time ingestion of transactional and streaming data into the Azure ecosystem.
• Orchestrated workflows using Apache Airflow, enabling consistent scheduling and monitoring across multiple environments.
• Developed PyTest-based unit tests for transformation scripts, ensuring data pipeline reliability; a minimal example follows this role's bullets.
• Configured Role-Based Access Control (RBAC) and data encryption to secure sensitive financial data in compliance with PCI DSS standards.
• Integrated curated datasets into Power BI and Tableau dashboards (in collaboration with BI teams) for customer segmentation, sales performance, and inventory forecasting.
• Leveraged Azure DevOps and Terraform for CI/CD automation and infrastructure provisioning, ensuring reproducible deployments.
• Created data validation scripts in Python and SQL, reducing reporting errors by 25%.
• Participated in Agile ceremonies, using Jira for sprint tracking and Confluence for technical documentation.
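The PyTest approach mentioned above can be sketched as follows; the transformation function and its column names are hypothetical stand-ins for the client's scripts:

    # Illustrative PyTest unit test for a PySpark transformation; the function
    # and column names are hypothetical stand-ins, not the client's code.
    import pytest
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    def add_order_total(df):
        # Transformation under test: derive order_total = unit_price * quantity.
        return df.withColumn("order_total", F.col("unit_price") * F.col("quantity"))

    @pytest.fixture(scope="session")
    def spark():
        # Local Spark session so the test runs without a cluster.
        return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()

    def test_add_order_total(spark):
        df = spark.createDataFrame(
            [("A1", 2, 10.0), ("A2", 3, 5.0)],
            ["order_id", "quantity", "unit_price"],
        )
        totals = {r["order_id"]: r["order_total"] for r in add_order_total(df).collect()}
        assert totals == {"A1": 20.0, "A2": 15.0}

EDUCATION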
Master of Science in Computer Science – University of North Texas, USA
Bachelor of Technology in Electronics & Communication Engineering – K L University, India