Sandesh Phuyal
Senior Data Engineer
***************@*****.***
Professional Summary
Experienced Data Engineer with 7+ years of hands-on expertise in building and optimizing data systems, focusing on big data and cloud technologies.
Proficient in leveraging Google Cloud Platform (GCP), AWS, and Azure to develop efficient, high-performance data architectures and pipelines.
Expert in designing and executing complex ETL pipelines using Apache Airflow, enhancing data transformation and integration processes.
Well-versed in big data technologies such as Hadoop, Spark, and Kafka for managing large datasets and enabling real-time data processing.
Skilled in implementing GCP services like BigQuery, Dataflow, Pub/Sub, and Cloud Storage for large-scale, real-time data processing and advanced analytics.
Advanced proficiency in programming with Python, Java, SQL, and Scala, developing solutions that improve data processing and analytics.
Demonstrated ability to deploy machine learning models into production, utilizing data to drive decision-making and business insights.
Strong background in managing both SQL and NoSQL databases including MySQL, PostgreSQL, MongoDB, and Snowflake.
Experienced with continuous integration and deployment using Jenkins, Git, and Terraform, ensuring efficient and error-free code deployments.
Capable of performing advanced data analysis and statistical modeling to identify trends and make data-driven recommendations to stakeholders.
Knowledgeable in data security and compliance, implementing best practices to protect data integrity and confidentiality.
Extensive experience in designing, implementing, and optimizing data systems using AWS services like Redshift, Lambda, S3, DynamoDB, and Glue.
Proficient in using data visualization tools to translate complex datasets into understandable and actionable insights for business use.
Adept at automating and optimizing data workflows to increase efficiency and reduce operational costs.
Skilled communicator, capable of translating technical details into clear, strategic business language for cross-functional teams and senior management.
Committed to ongoing professional development and staying current with industry trends and technologies.
Member of professional organizations like ACM and IEEE, participating actively in conferences and workshops to enhance skills and network with industry peers.
Technical Skills
Programming Languages: Python, Node.js, SQL, Java, Scala
Databases: MongoDB, MySQL, PostgreSQL, Snowflake, Oracle
Big Data Technologies: Hadoop, Spark, Hive, HBase, Kafka, AWS EMR
Cloud Platforms: AWS (EC2, S3, Lambda, Glue, Redshift, DynamoDB, SNS/SQS), Azure, Databricks, Google Cloud Platform (BigQuery, Dataflow, Pub/Sub, Cloud Storage)
Data Modeling: ETL/ELT, Data Warehousing
Tools: Apache Airflow, Maven
Analytics: PySpark, Pandas, NumPy
Version Control: Git, Bitbucket
Project Management: Agile, Scrum, JIRA
BI Tools: Tableau, Amazon QuickSight, Looker
DevOps Tools: AWS CloudFormation, Jenkins, Terraform
Operating Systems: Unix, Linux, Windows
Professional Experience
Senior Data Engineer - Pfizer Inc., New York, NY, August 2022 – Present
Led the design and implementation of a multi-terabyte data warehouse on AWS Redshift, improving query performance through advanced optimization techniques.
Implemented real-time data processing systems using Apache Kafka and Apache Spark, reducing data latency for critical analytics operations (illustrated in the sketch at the end of this role).
Collaborated with business analysts and data scientists to design data models that support complex machine learning algorithms and predictive analytics.
Automated infrastructure provisioning and management using Terraform and AWS CloudFormation, reducing manual setup effort by 40%.
Enhanced data security protocols and backup strategies, achieving compliance with industry standards such as GDPR and HIPAA.
Spearheaded the migration of legacy systems to cloud-based solutions, overseeing a team of five engineers and delivering the project ahead of schedule.
Conducted comprehensive data audits and optimizations, ensuring high data quality and reliability for business-critical reports.
Designed and implemented scalable data pipelines using Google Cloud Dataflow and Apache Beam for real-time data processing.
Developed custom Python scripts to automate data cleansing and transformation tasks, increasing data accuracy and reducing processing times.
Utilized advanced SQL techniques to develop robust data retrieval systems, significantly improving data access for strategic decision-making.
Introduced machine learning models into production environments, enabling automated anomaly detection and improving operational efficiency.
Built a centralized data warehouse using Google BigQuery, enabling high-performance analytics on multi-terabyte datasets.
Designed and deployed serverless ETL pipelines using AWS Lambda to automate data transformations into Redshift.
Leveraged GCP Pub/Sub for real-time messaging and event-driven architecture to process live data feeds.
Authored detailed documentation on data architecture designs, standards, and procedures to maintain high data governance across the department.
Trained junior data engineers and analysts on best practices in data management and processing.
Regularly reviewed and updated data processing scripts and systems to enhance performance and accommodate evolving business needs.
Integrated AWS Redshift Serverless with Glue, Lambda, and S3 for real-time data ingestion and reporting.
Automated infrastructure provisioning on GCP using Terraform, improving deployment speed and consistency.
Developed dashboards and reports using Amazon QuickSight to provide actionable business insights.
Presented findings and insights from data analyses to executive leadership, influencing key business decisions.
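The following is a minimal sketch of the Kafka-to-Spark streaming pattern referenced above. The broker address, topic name, event schema, and S3 paths are illustrative placeholders, not actual Pfizer systems.

# Minimal Spark Structured Streaming job: consume JSON events from Kafka,
# parse them against an explicit schema, and append them to a Parquet sink.
# Requires the Kafka connector, e.g. spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:<version>
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("events-stream").getOrCreate()

# Assumed event shape for illustration only.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_time", TimestampType()),
    StructField("metric_value", DoubleType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder endpoint
       .option("subscribe", "analytics-events")            # placeholder topic
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers bytes; cast to string, then parse JSON into typed columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/events/")                  # placeholder path
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
         .outputMode("append")
         .start())
query.awaitTermination()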
Data Engineer - ServiceNow, Santa Clara, CA, March 2020 – July 2022
Designed data integration solutions using ETL tools like Informatica and AWS Glue, simplifying the data flow from ingestion to insights.
Managed the storage and processing of data using AWS services such as S3, EMR, and DynamoDB, ensuring scalability and high availability.
Optimized Hadoop and Spark environments for performance improvements, enabling faster data processing and analysis.
Implemented and maintained robust data pipelines capable of processing hundreds of gigabytes of data per day with minimal downtime.
Developed API integrations for real-time data ingestion from diverse sources, enhancing data freshness and relevance for analytics purposes.
Engineered SQL and NoSQL database optimizations to support both transactional and analytical operations, improving response times and system stability.
Built event-driven microservices using Node.js and Python, integrated with AWS services like SNS, SQS, and DynamoDB (illustrated in the sketch at the end of this role).
Migrated legacy data systems to GCP-based architecture, leveraging Cloud Storage and BigQuery for optimized data management.
Utilized version control tools like Git and Bitbucket to manage code changes and collaboration among the data team.
Applied best practices in database schema design and indexing to optimize data retrieval and storage efficiency.
Created dashboards and reports using business intelligence tools such as Tableau and Power BI, providing actionable insights to business units.
Orchestrated workflows using AWS Step Functions, streamlining the processing of data pipelines.
Developed ETL pipelines using Google Dataflow and Cloud Functions, streamlining data integration workflows.
Automated testing and deployment of data models and pipelines using CI/CD practices, increasing deployment reliability and speed.
Participated in the full software development lifecycle, from requirements gathering and system design to testing and deployment.
Implemented complex SQL queries for Redshift to support high-volume data extraction and reporting.
Utilized GCP's AI Platform to deploy machine learning models for predictive analytics and anomaly detection.
Supported data governance and metadata management efforts, ensuring adherence to data standards and policies.
Facilitated the successful adoption of cloud-native services for data analytics and processing, leading to a 50% reduction in operational costs.
Delivered training sessions on new technologies and methodologies to enhance the team's capability and efficiency.
Conducted root cause analyses for data-related issues and implemented corrective actions to prevent future occurrences.
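A minimal sketch of the event-driven SQS-to-DynamoDB pattern referenced above, written as a Python Lambda handler. The table name, environment variable, and event fields are hypothetical, not actual ServiceNow values.

# Illustrative AWS Lambda handler: each SQS record carries a JSON order event
# that is upserted into a DynamoDB table.
import json
import os

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ.get("TABLE_NAME", "order-events"))  # placeholder table

def handler(event, context):
    """Triggered by SQS; writes each message body into DynamoDB."""
    records = event.get("Records", [])
    for record in records:
        body = json.loads(record["body"])
        table.put_item(Item={
            "order_id": body["order_id"],         # partition key (assumed)
            "event_time": body["event_time"],
            "status": body.get("status", "UNKNOWN"),
        })
    return {"processed": len(records)}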
Data Engineer - Chewy, Inc., Plantation, FL, January 2017 – February 2020
Architected and developed a centralized data lake on AWS, integrating data from over 20 different source systems, providing a unified data view.
Led the adoption of microservices architecture for data processing tasks, significantly enhancing the modularity and scalability of data operations.
Implemented comprehensive logging and monitoring solutions using Elasticsearch and Kibana, improving system transparency and operational oversight.
Developed robust data transformation frameworks using PySpark, handling complex data structures and large volumes of data efficiently (illustrated in the sketch at the end of this role).
Architected a GCP-based data lake, integrating data from various sources using GCP tools like Cloud Storage and Data Fusion.
Streamlined data ingestion processes with automated tools like Sqoop and Flume, reducing manual intervention and increasing data timeliness.
Designed and executed data retention policies and procedures, ensuring optimal performance of the data platform while complying with legal requirements.
Pioneered the use of container technology with Docker and Kubernetes in data processing workflows, enhancing portability and scalability.
Established and enforced data quality checks and balances, ensuring the accuracy and consistency of data across the organization.
Crafted complex data extraction queries to support ad-hoc reporting and analytical needs, reducing turnaround time for business requests.
Provided technical leadership in database design and performance tuning, advising on best practices and advanced optimization techniques.
Coordinated cross-functional teams to ensure seamless integration of data systems with business processes.
Developed and maintained documentation for system architectures, data flows, and processes to ensure consistency and knowledge sharing.
Evaluated and integrated new data management and analytics technologies, staying ahead of industry trends and competitive advantages.
Monitored and optimized the performance of GCP services, ensuring cost-effectiveness and system reliability.
Mentored new team members, providing guidance on technical challenges and career development.
Engaged with stakeholders to define data requirements and deliver tailored data products that meet specific business needs.
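A minimal sketch of the PySpark transformation approach referenced above: flattening a nested source record, applying a basic quality filter, and writing partitioned Parquet. Paths, column names, and the partition key are assumptions for illustration.

# Illustrative PySpark batch transformation over a data-lake source.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = SparkSession.builder.appName("orders-transform").getOrCreate()

orders = spark.read.json("s3a://example-lake/raw/orders/")  # placeholder path

# Flatten the assumed nested structure into analysis-ready columns.
flat = (orders
        .select(
            col("order.id").alias("order_id"),
            col("order.customer.id").alias("customer_id"),
            col("order.total").cast("double").alias("order_total"),
            to_date(col("order.created_at")).alias("order_date"),
        )
        # Basic data-quality gate: drop null keys and negative totals.
        .filter(col("order_id").isNotNull() & (col("order_total") >= 0)))

(flat.write
     .mode("overwrite")
     .partitionBy("order_date")   # assumed partition key
     .parquet("s3a://example-lake/curated/orders/"))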
Education
Master of Science in Information Technology, Southern New Hampshire University, Manchester, New Hampshire