Mahevish Husseni
Data Engineer
Location: California | Phone: +1-669-***-**** | Email: **********@*****.*** | LinkedIn
SUMMARY
Accomplished Data Engineer with over 5 years of experience designing and delivering cloud-based data solutions using AWS, Azure, Databricks, and Snowflake across the healthcare, finance, and enterprise sectors.
Expertise in orchestrating ETL pipelines with tools like AWS Glue, Apache Airflow, and Azure Data Factory, resulting in improved data accessibility and scalability across various environments.
Proficient in big data technologies including AWS Redshift, Hadoop, MySQL, and MongoDB, driving large-scale data processing and enabling high-performance business intelligence and real-time analytics.
Skilled in optimizing OLAP cubes using SSAS, SQL, and DAX, enhancing query performance and improving data accessibility for cross-functional teams.
Experienced in Python, R, SQL, and Scala for data transformation and analysis, integrating machine learning models to enhance predictive analytics and reduce operational risks.
Hands-on expertise in real-time data processing and streaming platforms such as Apache Kafka and AWS Kinesis, enabling immediate data availability for decision-making.
Cloud-native and serverless architecture experience with tools like AWS Lambda and Azure Functions to drive scalable, cost-effective data solutions.
Designed robust identity and access management (IAM) systems, maintaining more than two years of HIPAA and GDPR compliance without a single violation.
Developed interactive Power BI, Looker, and Tableau dashboards, empowering stakeholders to make informed decisions in areas like revenue forecasting and capital allocation.
Proficient in CI/CD pipeline automation using Terraform, Jenkins, and AWS CodePipeline, ensuring seamless deployments and improved system availability.
Strong understanding of data governance and compliance, with expertise in handling regulated data for industries like healthcare and finance, ensuring adherence to global standards like GDPR and CCPA.
EDUCATION
International Technological University, San Jose, CA
Master of Computer Science, Jan 2022 - Apr 2023
L.J. Institute of Engineering and Technology, Ahmedabad, India
Bachelor of Information Technology, Jun 2008 - Jun 2014
TECHNICAL SKILLS
Languages: SQL, T-SQL, PL/SQL, Python, R, Java, C/C++, Scala, Node.js, JavaScript
Databases: SQL Server, Oracle, MongoDB, DB2, Teradata, BigQuery, PostgreSQL, MySQL, NoSQL, DynamoDB, Redshift, HBase, Presto
Big Data Technologies: Apache Kafka, Apache Airflow, Apache Hadoop, Databricks, Spark, HDFS, Hive, MapReduce, Pig, PySpark
Cloud Platforms: AWS (S3, EC2, Lambda, Redshift, CloudFormation, API Gateway, IAM, Glue, RDS, SageMaker), GCP, Azure (Data Lake, Synapse Analytics, Azure SQL Database, Data Factory, Blob Storage), Snowflake
ETL & Data Engineering Tools: SSIS, SSAS, Informatica PowerCenter, Talend, Snowflake, Data Warehousing, Data Pipelines, Data Governance
Reporting & Visualization: Power BI, SSRS, SAP Business Objects, Tableau, Looker, MS Excel
Development Tools: SQL Query Analyzer, ERWIN, SSMS, SWIFT Tool
Libraries & Frameworks: NumPy, Pandas, Matplotlib, Flask, TensorFlow, Scikit-learn, Keras, Redis
DevOps Tools: Docker, Kubernetes, Terraform, Azure DevOps, Jenkins, Git, GitHub, GitLab, Ansible, CloudWatch, Grafana
Methodologies: Logical and Physical Database Design, UML, Agile, Waterfall, Clustering, Data Mining, Experimental Design
Compliance & Security: GDPR, Role-Based Access Control (RBAC), Encryption
WORK EXPERIENCE
Molina Healthcare, CA
Data Engineer Jun 2024 - Current
Contributed to designing data lakes using AWS S3, Redshift, and Databricks, enhancing data accessibility and improving collaboration across clinical and operational teams.
Implemented and optimized Spark SQL queries and data processing tasks within the Spark ecosystem, ensuring scalability and performance for processing and analyzing large-scale datasets.
Assisted in migrating on-premises databases to Amazon RDS, AWS Glue, and Databricks, reducing infrastructure costs by 25% and supporting future scalability.
Played a pivotal role in designing and deploying cloud-based infrastructure solutions for data pipelines in Python and PySpark, leveraging AWS services such as CloudFormation, RDS, EC2, Lambda, Athena, S3, DynamoDB, SQS, Redshift, SNS, AWS Glue, CloudWatch, and IAM.
Established robust data quality monitoring processes and maintained comprehensive documentation to enhance the user experience, leveraging AWS cloud services for both batch and stream processing.
Implemented, maintained, and managed large-scale data warehousing solutions, ensuring scalability, performance, and reliability.
Spearheaded the development of an end-to-end data engineering pipeline, from data source identification to schema development, API integration, and data transformation, utilizing AWS services like Amazon S3, RDS, and Glue, and GCP services such as BigQuery and Cloud Storage.
Automated ETL pipelines in Python, SQL, and Scala with Databricks and Docker on AWS and GCP, resulting in a 20% improvement in data accuracy and reduced processing time for healthcare data.
Incorporated unit testing frameworks such as PyTest and JUnit to validate data pipeline components, ensuring reliability and reducing the risk of downstream data errors.
Supported the implementation of graph-based analytics with Hadoop, Amazon Neptune, and GCP graph tooling, providing insights that assisted clinical decision-making.
Optimized data ingestion processes with AWS Glue, Apache Beam, and GCP Dataflow, reducing manual input errors and streamlining data workflows.
Collaborated on building CI/CD pipelines with Terraform, Docker, and AWS CodePipeline, ensuring system uptime and smooth updates.
Assisted in integrating machine learning models into AWS infrastructure via SageMaker, improving predictive analytics for patient care.
Enhanced data security using RBAC, IAM, and encryption protocols, ensuring compliance with HIPAA and GDPR.
Blue Cross Blue Shield (BCBS), CA
Data Engineer Jun 2023 – May 2024
Developed and automated ETL packages using SSIS, enabling seamless data integration from multiple sources and improving data reliability.
Administered and optimized OLAP cubes with SSAS, reducing report generation times by 30%.
Developed and optimized data processing workflows using Hadoop ecosystem components such as HDFS, MapReduce, and Hive, enabling batch processing and analytics on massive datasets.
Led the development of end-to-end data pipelines in Azure, incorporating services like Azure Data Factory and Azure Databricks, using PySpark and Python.
Streamlined data processing by automating ETL workflows in Informatica, DataStage, and Ab Initio, resulting in a 30% reduction in manual handling and improved data processing speed. This initiative strengthened data quality and integrity within Snowflake, and Informatica was further integrated with Airflow for better workflow management.
Leveraged AWS Lambda to automate event-driven ETL processes, ensuring near real-time data ingestion and transformation for dynamic reporting and analytics. Lambda was integrated with S3, Glue, and Redshift, providing a scalable, serverless architecture to improve data processing efficiency.
Developed backend applications to integrate real-time data and automate data loading processes using Oracle SQL, PL/SQL, T-SQL, and Unix Shell scripting. Engaged in SQL performance tuning, troubleshooting, and database administration to ensure optimal database performance.
Designed, optimized, and secured CI/CD pipelines using Apache Airflow, Docker, Kubernetes, and SQL, collaborating cross-functionally for efficient data processing and analysis.
Configured and managed Apache Kafka clusters to ensure high availability, fault tolerance, and scalability of data streaming infrastructure, handling terabytes of data per day.
Utilized Snowflake's architecture for external data sharing, supporting data monetization strategies, and collaborated on end-to-end data pipelines, optimizing ETL job performance and modernizing infrastructure.
Transitioned legacy reports from SSRS to Power BI, enhancing flexibility and enabling self-service analytics for business users.
Deployed real-time data synchronization using Power BI gateways, reducing latency in reporting and improving data visibility for leadership.
Collaborated with data science teams to integrate machine learning models into Power BI, improving predictive analytics and decision-making.
Optimized database query performance by restructuring and indexing, achieving a 20% increase in data retrieval speed.
KPMG, India
Data Engineer Feb 2019 – Dec 2021
Developed and optimized ETL processes in Snowflake, leveraging SQL, Snowpark, and native Snowflake features to ensure seamless data extraction, transformation, and loading.
Executed ETL workflows with AWS Glue, Apache Airflow, and GCP Dataflow, reducing manual intervention and improving the accuracy of financial reporting.
Designed, implemented, and managed a robust data warehousing solution using Snowflake and GCP BigQuery, enabling the organization to efficiently store and analyze terabytes of data.
Created new data validation processes using Python, PySpark, and GCP Cloud Data Fusion to ensure data quality and integrity, conducting data profiling and cleansing activities.
Improved query performance by 30% through in-depth performance optimization, including use of Snowflake's advanced SQL features.
Designed, built, tested, and maintained data management systems using Apache Spark and Hadoop ecosystem tools such as Hive and HDFS, along with GCP Dataproc, enabling efficient data processing and analytics.
Implemented real-time data ingestion pipelines using Snowpipe, enabling near-real-time access to critical data for analytics and reporting, and worked on disaster recovery and high availability.
Enhanced data management efficiency and accuracy through proficient use of Snowflake features including data retention, Time Travel, clustering, and materialized views, along with GCP BigQuery; implemented dynamic data masking and fail-safe mechanisms to ensure data integrity and streamline workflows.
Optimized decision-making processes by leveraging Snowflake functionalities such as data sharing, task scheduling, and dynamic data masking, as well as GCP Looker Studio for improved data visualization.
Enhanced performance of AWS Redshift for high-volume financial data, accelerating data analysis and report generation by 10%.
Engineered cloud-based data solutions on AWS and Snowflake, simplifying financial model storage and improving system efficiency.
Developed interactive Power BI dashboards to support real-time financial metrics, driving more informed capital planning and cost management decisions.
Trinity Technolabs, India
Data Analyst Nov 2017 - Jan 2019
Designed and automated Tableau dashboards, cutting report generation time by 5% and providing real-time insights for key stakeholders.
Processed and normalized large datasets using Python and SQL, improving data quality and enabling data-driven business decisions.
Constructed ETL pipelines with Azure Data Factory, increasing data accessibility and streamlining reporting processes for cross-functional teams.