Nitesh V Ph : 1-224-***-****
Cloud Data Engineer ***********@*****.***
Professional Summary
● Over 8 years of progressive IT experience specializing in Big Data ecosystems and Azure cloud technologies, with a strong focus on data ingestion, storage, querying, processing, and analysis.
● Extensive hands-on experience in Azure Cloud Services (PaaS & IaaS), with expertise in Azure Databricks, Azure Synapse Analytics, Azure Cosmos DB, SQL Azure, Azure Data Factory, Azure Analysis Services, Application Insights, Azure HDInsight, Key Vault, and Azure Data Lake.
● Skilled in developing and optimizing data pipelines using Azure Data Factory, Azure Databricks, and Azure Synapse Analytics to enable efficient data integration and transformation.
● Strong background in deploying AI solutions and leveraging Azure Machine Learning for predictive analytics and data-driven insights.
● Proven ability to design and implement scalable data solutions utilizing Azure components such as Azure Data Lake, Azure Blob Storage, and Azure SQL Data Warehouse.
● Hands-on experience with Apache Hadoop ecosystem tools including HDFS, Spark, Hive, Pig, and HBase, ensuring robust data processing capabilities and performance tuning.
● Proficient in Python, SQL, Scala, and R for scripting, data manipulation, and analysis within Azure environments.
● Expertise in implementing CI/CD pipelines and using version control tools like Git for efficient software development lifecycle management.
● Experienced in Agile methodologies and Waterfall project execution, collaborating effectively with cross-functional teams to deliver business-critical data solutions.
● Exceptional problem-solving skills with a focus on optimizing data workflows, enhancing system efficiencies, and ensuring data integrity and security.
● Strong communication and interpersonal skills, adept at translating complex technical concepts to non-technical stakeholders and fostering productive team collaborations.
● Committed to ongoing learning and staying updated with the latest Azure technologies and industry trends to drive continuous improvement and innovation.
● Demonstrated expertise in designing ETL processes, ensuring seamless data migration and integration across diverse systems and platforms.
● Proficient in data modeling and database design, optimizing storage solutions for improved performance and scalability.
● Adept at utilizing Azure Monitor and Azure Security Center for proactive monitoring and ensuring robust security protocols.
● Experience in managing large-scale data projects, delivering high-quality solutions within tight deadlines and budget constraints.
● Skilled in leveraging Azure AI and Cognitive Services to implement intelligent data solutions and enhance business insights.
● Proven track record of leading and mentoring teams, fostering a culture of collaboration and continuous improvement.
● Strong understanding of data governance and compliance, ensuring adherence to industry standards and regulations.
● Ability to architect end-to-end data solutions, aligning with business objectives and driving strategic initiatives.
● Proficient in conducting root cause analysis and implementing corrective actions to resolve complex data issues and ensure system reliability.
● Extensive experience in integrating and managing big data ecosystems with cloud platforms, ensuring seamless interoperability and data flow across multiple environments.
● Expertise in utilizing Azure DevOps for automating build, test, and deployment processes, enhancing efficiency and reducing time-to-market for data solutions.
● Proven ability to optimize cost management and resource utilization in cloud environments, ensuring cost-effective and scalable data solutions.
Technical Skills
Azure Cloud Services: Azure Databricks, Azure Synapse Analytics, Azure Cosmos DB, SQL Azure, Azure Data Factory, Azure Analysis Services, Application Insights, Azure HDInsight, Key Vault, Azure Data Lake
Big Data Ecosystem: Apache Hadoop (HDFS, Spark, Hive, Pig, HBase), Apache Kafka, Apache Sqoop, Apache NiFi, Apache Oozie
Programming Languages: Python, SQL, Scala, R, Shell Scripting
Data Integration & ETL: Azure Data Factory, Azure Databricks, Apache Sqoop, Talend Integration Suite
Machine Learning & AI: Azure Machine Learning, Python for ML, Spark MLlib
Database Technologies: SQL Server, MySQL, Snowflake, NoSQL databases (HBase, MongoDB, Myria DB, Oracle)
Reporting & Visualization: Power BI, Tableau, Business Objects
Version Control & CI/CD: Git, Jenkins, Azure DevOps
Operating Systems: Windows, Linux (Ubuntu, CentOS)
AWS Cloud Services: AWS EMR, AWS Redshift, AWS Glue, AWS Lambda, AWS S3, AWS DynamoDB, AWS RDS, AWS Kinesis
Apache Spark Ecosystem: Spark Streaming, Spark SQL, Spark MLlib, Spark GraphX, Spark Structured Streaming
Education
Bachelor's in Computer Science from SRM, India, 2016
Working Experience
Grainger, Lake Forest, IL Aug 2023 to Present
Lead Azure Data Engineer
Project Description:
The project involves developing and maintaining a comprehensive data platform to support Grainger's supply chain optimization and inventory management.
Responsibilities:
● Developing a data ingestion pipeline using Azure Data Factory to process and transform large datasets from various sources into Azure Data Lake Storage.
● Designing and implementing data models in Azure Synapse Analytics to support advanced analytics and reporting requirements.
● Integrating Azure Databricks for real-time data processing and machine learning model deployment to optimize supply chain operations.
● Ensuring data security and compliance by implementing Azure Key Vault for managing sensitive information and credentials.
● Automating data workflows and orchestrations using Azure Logic Apps and Azure Automation.
● Collaborating with cross-functional teams to define data requirements, design data solutions, and ensure data integrity.
● Developing and optimizing complex SQL queries and stored procedures in Azure SQL Database to support data analysis and reporting.
● Implementing CI/CD pipelines using Azure DevOps to streamline deployment processes and ensure code quality.
● Monitoring and troubleshooting data pipelines, ensuring high availability and performance of the data platform.
● Creating and maintaining comprehensive documentation for data architecture, data flows, and technical specifications.
● Conducting regular data quality assessments and implementing measures to ensure data accuracy and consistency.
● Providing technical leadership and mentoring junior data engineers, fostering a culture of continuous learning and improvement.
● Engaging with stakeholders to gather requirements, provide project updates, and ensure alignment with business objectives.
● Utilizing Azure Machine Learning for predictive analytics to forecast demand and optimize inventory levels.
● Implementing Azure Monitor and Application Insights for proactive monitoring and alerting to ensure system reliability and performance.
Environments: Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Databricks, Azure Key Vault, Azure Logic Apps, Azure Automation, Azure SQL Database, Azure DevOps, Azure Machine Learning, Azure Monitor, Application Insights, SQL, Python, Scala, Power BI, Git.
CVS Pharmacy, Woonsocket, RI May 2022 to Aug 2023
Azure Data Engineer
Project Description:
Developed a comprehensive data analytics solution to improve CVS Pharmacy's inventory management and sales forecasting.
Responsibilities:
• Led the requirements gathering, analysis, and effort estimation for development and testing phases.
• Architected and implemented scalable data solutions using Azure Data Factory, Azure Databricks, and Azure Synapse Analytics.
• Developed and optimized data pipelines to process large volumes of data efficiently.
• Designed and implemented data ingestion frameworks using Azure Data Factory and Azure Databricks.
• Integrated various data sources, including on-premises databases and cloud-based services, into Azure Data Lake.
• Built and maintained ETL workflows to extract, transform, and load data into Azure SQL Data Warehouse.
• Developed machine learning models using Azure Machine Learning and Python to predict inventory levels and customer demand.
• Conducted performance tuning and optimization of data processing jobs in Azure Databricks.
• Implemented data quality checks and validation processes to ensure data accuracy and integrity.
• Utilized Azure Analysis Services for building and deploying analytical models.
• Created interactive dashboards and reports using Power BI to provide actionable insights to stakeholders.
• Implemented CI/CD pipelines using Azure DevOps for continuous integration and deployment of data solutions.
• Managed Azure resources, including Databricks clusters, Data Lake storage, and Synapse Analytics, ensuring optimal performance and cost-efficiency.
• Collaborated with cross-functional teams to define data requirements and deliver business-critical solutions.
• Provided technical guidance and mentorship to junior team members, fostering a culture of continuous improvement and learning.
Environments: Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Data Lake, Azure Machine Learning, Power BI, Azure DevOps, Python, SQL, Azure Analysis Services.
BestBuy, Minneapolis, MN June 2021 to May 2022
AWS Data Engineer
Project Description:
The BestBuy Customer Data Platform (CDP) is designed to centralize customer data, enhance customer segmentation, and provide personalized marketing strategies. This project aims to improve customer engagement and drive sales through data-driven insights.
Responsibilities:
• Led the design and implementation of AWS-based data architecture to support data ingestion, transformation, and storage.
• Developed ETL processes using AWS Glue and Apache Spark for data transformation and load into Amazon Redshift.
• Designed and implemented data ingestion pipelines using AWS Lambda, S3, and Kinesis for real-time data processing.
• Optimized Amazon Redshift queries and table design to improve query performance and reduce processing time.
• Managed data security and compliance using AWS IAM roles, policies, and AWS KMS for data encryption.
• Implemented data quality checks and validation using AWS Glue and AWS DataBrew.
• Automated data workflows and job scheduling using AWS Step Functions and AWS CloudWatch Events.
• Coordinated with data scientists and analysts to understand data requirements and provide necessary data transformations.
• Monitored and troubleshot data pipelines using AWS CloudWatch, ensuring high availability and reliability.
• Conducted performance tuning and optimization of Spark jobs and Redshift queries.
• Provided support and troubleshooting for data ingestion and transformation issues.
• Worked on data migration from on-premises databases to AWS Redshift using AWS DMS.
• Participated in designing and developing RESTful APIs for data access and manipulation.
• Developed scripts for data extraction and transformation using Python and PySpark.
• Implemented logging and monitoring solutions using AWS CloudWatch, CloudTrail, and SNS for alerting.
Environments: AWS Glue, AWS Lambda, Amazon S3, Amazon Kinesis, Apache Kafka, Amazon Redshift, AWS IAM, AWS KMS, AWS DataBrew, AWS Step Functions, AWS CloudWatch, AWS DMS, Python, PySpark.
Carvana, Tempe, AZ April 2019 to June 2021
Data Engineer
Project Description:
The project involved creating a comprehensive data pipeline to improve the customer experience and operational efficiency, including an end-to-end data processing system to handle large volumes of vehicle and customer data.
Responsibilities:
• Played a lead role in gathering requirements and conducting system analysis to provide accurate development and testing effort estimations.
• Designed and implemented ETL processes to extract, transform, and load data from multiple sources into a centralized data warehouse.
• Developed Spark applications using Scala to process large datasets efficiently and perform data transformations.
• Created and maintained data pipelines using Apache Airflow to automate data ingestion and processing tasks.
• Implemented real-time data streaming solutions using Apache Kafka to ensure timely and reliable data delivery.
• Optimized SQL queries and Spark jobs to improve performance and reduce processing time.
• Built data models and schema designs to support various analytical and reporting requirements.
• Collaborated with data scientists to deploy machine learning models into production environments.
• Monitored system performance and data quality, implementing alerts and dashboards using tools like Grafana and Prometheus.
• Developed and maintained data integration workflows using Apache NiFi.
• Ensured data security and compliance with industry standards by implementing appropriate access controls and data encryption.
• Documented technical specifications, data flow diagrams, and standard operating procedures for data engineering processes.
• Provided training and support to junior data engineers, fostering a collaborative team environment.
• Participated in code reviews and followed best practices for version control using Git.
• Coordinated with cross-functional teams to ensure seamless integration of data solutions with existing systems and applications.
Environments: Apache Spark, Scala, Apache Kafka, Apache Airflow, Apache NiFi, SQL, PostgreSQL, Grafana, Prometheus, Git, Red Hat Enterprise Linux, Hadoop, HDFS, AWS S3.
CES Technologies, Hyderabad, India July 2016 to April 2019
Hadoop Developer
Project Description:
Real-Time Customer Insights Platform: Designed and implemented a real-time customer insights platform leveraging Apache Hadoop and Apache Kafka for streaming analytics, enabling dynamic customer segmentation and personalized marketing campaigns.
Responsibilities:
• Gathered and analyzed business requirements, translating them into technical specifications.
• Developed custom MapReduce programs for real-time data processing and analysis.
• Implemented Sqoop jobs with incremental load for data ingestion into Hadoop from multiple sources.
• Utilized PySpark for developing and optimizing ETL workflows on a distributed Hadoop cluster.
• Designed and optimized Hive queries for efficient data retrieval and storage.
• Managed metadata integration into Hive and migrated existing applications to the Hive environment.
• Created ETL test scripts and guided testing teams during execution.
• Conducted performance tuning of ETL processes to enhance data processing efficiency.
• Provided technical leadership to the development team for PySpark-based ETL solutions.
• Optimized PySpark jobs for Kubernetes deployment, improving data processing speed.
• Implemented data partitioning and bucketing strategies in Hive for optimized query performance.
• Managed and processed data from diverse sources including relational databases using Sqoop and Flume.
• Installed and configured Apache Hadoop, Hive, and other components in the Hadoop ecosystem.
• Developed and maintained HBase tables to store semi-structured data for fast retrieval.
• Integrated Apache Kafka with Hadoop to streamline real-time data streaming and processing.
• Monitored Hadoop cluster performance and ensured high availability of the system.
• Created automated workflows using Oozie to manage complex job dependencies.
• Collaborated with data scientists to implement machine learning models on the Hadoop platform.
• Implemented security protocols and best practices to ensure data integrity and confidentiality.
Environments: Apache Hadoop, Apache Kafka, Apache Hive, PySpark, Sqoop, Flume, Kubernetes.