
Senior Data Scientist/Data Engineer

Location: Seattle, WA 98125


ABHISHEK PRAHARAJ

Email: ad4et0@r.postjobfree.com

Phone: 509-***-****

PROFESSIONAL SUMMARY: -

Highly accomplished Data Scientist and Engineer with over 11 years of experience, specializing in the strategic implementation of machine learning and data engineering technologies to drive business transformation across the Retail, Finance, Insurance, and Energy sectors. Expert in designing and executing scalable data pipelines, predictive analytics, and real-time decision-making systems on cloud platforms such as AWS, Azure, and GCP. Proficient in a broad tech stack including Python, Spark, TensorFlow, PyTorch, Databricks, Kafka, and Hadoop, with deep knowledge of statistical modeling, NLP, and AI. Demonstrates exceptional leadership in directing cross-functional teams, guiding the project lifecycle from conceptualization to deployment, and optimizing operations to significantly enhance performance metrics. Applies a robust analytical skill set and an innovative approach to uncover insights from complex datasets, improving risk management, customer engagement, and operational efficiency. Committed to fostering a culture of continuous learning and innovation, mentoring emerging talent, and advancing the adoption of cutting-edge data science methodologies to sustain competitive advantage and generate substantial business value.

TECHNICAL SKILLS: -

Analytic Development: Python, R, Spark, SQL, Scala, Java, Bash, PySpark

Python Packages: NumPy, pandas, scikit-learn, TensorFlow, Keras, PyTorch, fastai, SciPy, Matplotlib, Seaborn, Numba

Programming Tools: Jupyter, RStudio, GitHub, Git

Cloud Computing: Amazon Web Services (AWS), Azure, Google Cloud Platform (GCP), Databricks, Snowflake

Hadoop Components: Hive, Pig, ZooKeeper, Sqoop, Oozie, YARN, Maven, Flume, HDFS, Airflow

Hadoop Administration: ZooKeeper, Oozie, Cloudera Manager, Ambari, YARN

Data Management: Snowflake, Apache Cassandra, AWS Redshift, Amazon RDS, Apache HBase, SQL, NoSQL, Elasticsearch, HDFS, Data Lake, Data Warehouse, Teradata, SQL Server

ETL Data Pipeline Architecture: Apache Airflow, Hive, Sqoop, Flume, Scala, Python, Apache Kafka, Logstash

Scripting: HiveQL, SQL, Shell scripting

Machine Learning: Natural Language Processing & Understanding, Machine Intelligence, Machine Learning algorithms

Analysis Methods: Forecasting, Predictive, Statistical, Sentiment, Exploratory, and Bayesian Analysis; Regression Analysis, Linear Models, Multivariate Analysis, Sampling Methods, Clustering

Data Science: Natural Language Processing, Machine Learning, Social Analytics, Predictive Maintenance, Chatbots, Interactive Dashboards

Artificial Intelligence: Classification and Regression Trees (CART), Support Vector Machine, Random Forest, Gradient Boosting Machine (GBM), TensorFlow, PCA, Regression, Naïve Bayes

Natural Language Processing: Text Analysis, Classification, Chatbots

Deep Learning: Machine Perception, Data Mining, Machine Learning, Neural Networks, TensorFlow, Keras, PyTorch, Transfer Learning

Data Modeling: Bayesian Analysis, Statistical Inference, Predictive Modeling, Stochastic Modeling, Linear Modeling, Behavioral Modeling, Probabilistic Modeling, Time-Series Analysis

EXPERIENCE: -

Lead Data Scientist / Data Engineer at Nordstrom Inc., Seattle, Washington (Apr 2022 - Present)

• Spearheaded the integration of machine learning and data engineering practices, significantly enhancing data processing and analytics capabilities on GCP.

• Designed and implemented scalable data pipelines using BigQuery, Dataflow, and Apache Beam, processing multi-terabyte datasets for real-time analytics (see the pipeline sketch following this role's bullets).

• Developed predictive models using TensorFlow and PyTorch for customer behavior analysis, increasing marketing campaign effectiveness by 25%.

• Optimized predictive models for inventory management, resulting in a 20% reduction in overstock and stockouts through advanced feature engineering techniques.

• Led cross-functional teams through the agile development lifecycle, from ideation to deployment, enhancing productivity by 30%.

• Integrated real-time data streams with machine learning algorithms for dynamic pricing strategies, improving revenue by 15%.

• Established CI/CD pipelines for seamless deployment of data applications and machine learning models in GCP and AWS environments.

• Implemented advanced analytics and NLP techniques for customer sentiment analysis, enhancing customer service and feedback mechanisms.

• Employed TensorFlow Serving and Flask for deploying machine learning models as microservices, enabling scalable, real-time predictive analytics.

• Facilitated the transition from batch to real-time data processing using Apache Kafka and Spark Streaming, reducing data latency by over 50%.

• Architected a robust data governance and security framework, ensuring compliance with GDPR and CCPA regulations.

• Conducted comprehensive data quality and integrity audits, implementing automated correction processes that improved data accuracy by over 90%.

• Collaborated with stakeholders to translate business requirements into technical solutions, delivering custom data analytics reports and dashboards.

• Championed the use of A/B testing frameworks for evaluating model performance, resulting in a 10% increase in model accuracy.

• Enhanced team knowledge and skillsets through regular training sessions on the latest data science and engineering technologies and methodologies.
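Below is a minimal sketch of the kind of streaming pipeline described in the Dataflow/Apache Beam bullet above, assuming the Beam Python SDK running on Dataflow; the project, topic, bucket, and table names are hypothetical placeholders rather than actual Nordstrom resources.

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_event(message: bytes) -> dict:
    """Decode a Pub/Sub message payload into a row for BigQuery."""
    return json.loads(message.decode("utf-8"))


# All resource names below are hypothetical placeholders.
options = PipelineOptions(
    streaming=True,
    runner="DataflowRunner",
    project="example-project",
    region="us-west1",
    temp_location="gs://example-bucket/tmp",
)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadEvents" >> beam.io.ReadFromPubSub(
            topic="projects/example-project/topics/clickstream")
        | "ParseJson" >> beam.Map(parse_event)
        # Assumes the target BigQuery table already exists with a matching schema.
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "example-project:analytics.events",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )

In practice, windowing, dead-letter handling, and autoscaling options would be layered onto this skeleton per use case.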

Senior Data Scientist at American Family Insurance Group, San Jose, California (May 2020 - Apr 2022)

• Directed the design and execution of machine learning projects, focusing on predictive modeling for insurance risk assessment, reducing errors by 15%.

• Orchestrated the development and maintenance of scalable ETL pipelines in Azure, facilitating efficient data manipulation and analysis.

• Applied advanced statistical models and machine learning algorithms for analyzing large datasets, improving claim processing speed by 20%.

• Leveraged Azure Databricks for collaborative data exploration and model development, enhancing team productivity and model accuracy.

• Implemented machine learning models for personalized insurance product recommendations, increasing cross-sell and up-sell rates by 18%.

• Utilized Azure ML and TensorFlow to automate fraud detection, decreasing fraudulent claims by 25% and saving millions in potential losses.

• Developed and deployed APIs for real-time access to predictive models, streamlining the integration with business applications.

• Established a data lake on Azure, consolidating disparate data sources for a holistic view of customer data, enhancing data accessibility by 40%.

• Automated the data cleansing and preprocessing tasks using PySpark and Databricks notebooks, reducing data preparation time by 50% (see the cleansing sketch following this role's bullets).

• Created a model monitoring and management framework in Azure, ensuring models remained accurate and effective over time.

• Led a team in the application of NLP techniques for extracting insights from customer feedback, improving customer satisfaction scores by 10%.

• Conducted ongoing training sessions for the data science team on best practices in data engineering and machine learning.

• Engaged in strategic planning with senior management to align data science projects with business goals, ensuring a 20% increase in project ROI.

• Pioneered the use of deep learning for image recognition in claim processing, streamlining the assessment of damage and expediting claims.

• Fostered a culture of innovation, encouraging the team to explore new data science technologies and methodologies, resulting in two patented solutions.
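The following is an illustrative sketch of the automated cleansing step referenced above, assuming a Databricks notebook where a SparkSession is available; the lake paths, column names, and rules are hypothetical.

from pyspark.sql import SparkSession, functions as F

# In a Databricks notebook `spark` already exists; getOrCreate() also works locally.
spark = SparkSession.builder.appName("claims-cleansing").getOrCreate()

# Hypothetical lake paths and columns, for illustration only.
raw = spark.read.parquet("/mnt/datalake/raw/claims")

clean = (
    raw
    .dropDuplicates(["claim_id"])                        # drop duplicate claims
    .filter(F.col("claim_amount") > 0)                   # discard invalid amounts
    .withColumn("claim_date", F.to_date("claim_date"))   # normalize the date type
    .fillna({"channel": "unknown"})                      # impute a missing categorical
)

clean.write.mode("overwrite").parquet("/mnt/datalake/curated/claims")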

Data Scientist / Engineer at Bank of America, Chicago, Illinois (Oct 2018 - May 2020)

• Led the creation of a sophisticated fraud detection system using machine learning, significantly reducing fraudulent transactions by 30%.

• Managed and optimized Hadoop and Spark clusters for efficient big data processing, improving data analysis speed by 40%.

• Integrated Databricks for streamlined data analytics and machine learning model development, boosting team efficiency by 25%.

• Designed real-time analytics dashboards using Power BI, providing executives with instant access to financial insights and trends.

• Implemented machine learning algorithms for credit risk modeling, enhancing loan approval processes and reducing default rates by 17%.

• Coordinated with IT and compliance departments to ensure data processing workflows adhered to regulatory standards, passing all audits without discrepancies.

• Utilized advanced NLP techniques for sentiment analysis on customer feedback across social media platforms, improving customer engagement strategies.

• Developed and deployed scalable microservices architecture for serving machine learning models, facilitating easy integration with banking applications.

• Architected a solution for real-time transaction monitoring using Kafka and Spark Streaming, drastically reducing the time to detect and respond to fraudulent activities (see the streaming sketch following this role's bullets).

• Established robust data encryption and anonymization processes for sensitive financial data, ensuring adherence to data privacy laws.

• Enhanced model performance through the implementation of ensemble learning techniques, leading to more accurate predictions of financial trends.

• Spearheaded the migration of data analytics platforms to the cloud, achieving greater scalability and a 30% reduction in operational costs.

• Collaborated with the cybersecurity team to employ machine learning in detecting and mitigating cyber threats, reducing potential security breaches by 40%.

• Conducted detailed exploratory data analysis (EDA) to uncover hidden patterns and anomalies in financial transactions, aiding in the development of targeted fraud prevention strategies.
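A simplified sketch of the real-time monitoring pattern referenced above, assuming Spark Structured Streaming reading from Kafka; the broker address, topic, schema, and alert threshold are hypothetical placeholders.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("txn-monitoring").getOrCreate()

# Hypothetical transaction schema, broker, and topic names.
schema = StructType([
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

txns = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "transactions")
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("t"))
    .select("t.*")
)

# Flag accounts whose 5-minute spend exceeds a simple illustrative threshold.
alerts = (
    txns.withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes"), "account_id")
    .agg(F.sum("amount").alias("total"))
    .filter(F.col("total") > 10000)
)

query = alerts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()

A production version would write alerts to a downstream sink (for example Kafka or a case-management store) instead of the console.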

Big Data Engineer at Anywhere Real Estate Inc., Madison, New Jersey (Aug 2016 - Oct 2018)

• Created and managed AWS CloudFormation templates to automate cloud deployments, ensuring consistency and efficiency across environments

• Monitored and managed EMR clusters through the AWS console, optimizing data processing capabilities

• Utilized the AWS IAM console to create custom users and groups, ensuring precise control over permissions and access to AWS resources

• Played a pivotal role in designing serverless solutions, leveraging AWS API Gateway, Lambda, S3, and DynamoDB to optimize performance and scalability while reducing costs

• Executed a migration from on-premises RDBMS to AWS RDS, implementing Lambda functions in Python to process data and store it efficiently in S3

• Introduced support for Amazon S3 and RDS to host static and media files while housing the database within the Amazon Cloud environment for a streamlined architecture

• Configured and secured connections to the AWS RDS database running on MySQL, facilitating data accessibility and integrity

• Managed policies for AWS S3 buckets and Glacier, enhancing data storage, backup, and retrieval within the AWS ecosystem

• Employed AWS CloudFormation for consistent, automated infrastructure provisioning on AWS, enhancing the infrastructure management process

• Installed and utilized the AWS Command Line Interface (CLI) to control various AWS services via Shell/Bash scripting, streamlining administrative tasks

• Conducted data processing in AWS Glue (Scala and PySpark), extracting data from MySQL data stores and loading it into Redshift for in-depth analysis (see the Glue job sketch following this role's bullets)

• Deployed code to AWS CodeCommit using Git commands such as pull, fetch, push, and commit from the AWS CLI, ensuring version control and collaboration

• Leveraged custom Amazon Machine Images (AMIs) to automate server builds during periods of peak demand, ensuring seamless auto-scaling for applications

• Deployed applications in AWS using Elastic Beanstalk, enabling simplified management, deployment, and scaling
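A condensed sketch of the PySpark side of the AWS Glue job referenced above; the Glue Data Catalog database, table, Redshift connection name, and column mappings are hypothetical placeholders.

import sys

from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Hypothetical catalog database/table registered by a Glue crawler over MySQL.
source = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="mysql_orders"
)

# Keep and type only the columns needed downstream (illustrative mapping).
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[("order_id", "int", "order_id", "int"),
              ("amount", "double", "amount", "double")],
)

# Load into Redshift via a hypothetical catalog connection and staging bucket.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=mapped,
    catalog_connection="redshift-conn",
    connection_options={"dbtable": "analytics.orders", "database": "dw"},
    redshift_tmp_dir="s3://example-bucket/glue-tmp/",
)

job.commit()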

Big Data Engineer at DaVita Inc., Denver, Colorado (Oct 2014 - Aug 2016)

• Conducted performance optimizations for Hive analytics and SQL queries, including the creation of tables and views and the implementation of Hive-based exception processing to enhance query efficiency

• Developed numerous internal and external tables within the Hive framework, addressing specific data storage and retrieval requirements

• Loaded extensive volumes of log data from diverse sources directly into HDFS using Apache Flume for streamlined data ingestion

• Employed advanced techniques such as partitioning and bucketing within Hive tables, ensuring organized data management and daily data aggregation

• Executed the migration of ETL jobs to Pig scripts, facilitating complex data transformations, join operations, and pre-aggregations before storing data in HDFS

• Leveraged multiple Java-based MapReduce programs for efficient data extraction, transformation, and aggregation, accommodating various file formats and processing needs

• Created Hive tables with dynamic partitions and buckets, enabling efficient data sampling and enhancing data manipulation through Hive Query Language (HiveQL)

• Prepared, transformed, and analyzed large volumes of structured, semi-structured, and unstructured data using Hive queries and Pig scripts to extract valuable insights

• Developed UNIX shell scripts and Python scripts for generating reports based on data extracted from Hive, enhancing data reporting capabilities

• Managed the creation of both internal and external tables while utilizing Hive Data Definition Language (DDL) commands to create, alter, and drop tables as needed

• Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, paired RDDs, and Spark on YARN

• Used Google Cloud Composer to build and deploy data pipelines as DAGs using Apache Airflow (see the DAG sketch following this role's bullets)

• Contributed to data cleansing by writing Pig scripts and implementing Hive tables to store processed data in a tabular format, enabling more accurate analysis

• Spearheaded the setup of the Quality Assurance (QA) environment and updated configurations to effectively implement data processing scripts with Pig

• Played a key role in developing Pig scripts for change data capture and delta record processing, allowing seamless integration of newly arrived data with existing data in HDFS

• Employed Hive for table creation, data loading, and the execution of Hive queries, which internally triggered MapReduce jobs to process and analyze the data

• Prepared data for in-depth analysis by executing Hive (HiveQL) queries, Impala queries, and Pig Latin scripts, providing valuable insights into customer behavior and data patterns
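A minimal sketch of the kind of Airflow DAG referenced above, assuming Airflow 2.x on Cloud Composer; the DAG id, task commands, and schedule are hypothetical placeholders.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# Cloud Composer picks up DAG files placed in its GCS dags/ folder.
default_args = {"retries": 1, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="daily_log_ingest",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="ingest_logs",
        bash_command="echo 'pull logs into HDFS/GCS'",   # placeholder command
    )
    aggregate = BashOperator(
        task_id="aggregate_hive",
        bash_command="echo 'run HiveQL aggregation'",    # placeholder command
    )

    # Ingestion must complete before the aggregation step runs.
    ingest >> aggregate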

Data Engineer at Murphy USA, El Dorado, Arkansas (Jun 2012 - Oct 2014)

• Engineered ETL pipelines using Microsoft SQL Server Integration Services (SSIS) for seamless data integration and processing

• Implemented efficient data workflows to extract, transform, and load data from diverse sources into a unified data warehouse

• Designed and maintained database structures using Microsoft SQL Server, ensuring optimal performance and data integrity

• Developed and implemented data models to support business requirements and enhance data storage efficiency

• Implemented data quality checks and validation processes to ensure accuracy and consistency of data (see the validation sketch following this role's bullets)

• Established data governance practices to maintain high-quality and standardized data across the organization

• Leveraged big data technologies such as Apache Hadoop for processing and analyzing large datasets

• Implemented solutions for scalable storage and processing of big data to support analytical insights

• Developed and maintained data warehousing solutions using Microsoft SQL Server Data Warehouse

• Optimized data warehouse performance and scalability to meet evolving business needs

• Collaborated with business intelligence teams to define data requirements and implement analytics solutions

• Developed and maintained reporting systems to provide stakeholders with actionable insights

• Utilized scripting languages such as Python and PowerShell to automate data-related tasks

• Implemented efficient data processing scripts to improve workflow automation

• Collaborated with cross-functional teams to understand data needs and requirements

• Worked with a technology stack that included Microsoft SQL Server, SSIS, Apache Hadoop, Python, PowerShell, and other relevant data engineering tools

• Stayed abreast of emerging trends and technologies in the data engineering domain, incorporating innovative solutions into existing processes

• Documented data engineering processes, workflows, and best practices

• Conducted knowledge transfer sessions to share expertise with team members
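A small sketch of the kind of scripted validation referenced above, assuming pyodbc connectivity to SQL Server; the DSN, table, and rules are hypothetical placeholders.

import pyodbc

# Hypothetical DSN and table; the rules below are illustrative only.
conn = pyodbc.connect("DSN=warehouse;Trusted_Connection=yes")
cursor = conn.cursor()

checks = {
    "null_order_id": "SELECT COUNT(*) FROM dbo.Orders WHERE order_id IS NULL",
    "negative_amount": "SELECT COUNT(*) FROM dbo.Orders WHERE amount < 0",
    "duplicate_order_id": (
        "SELECT COUNT(*) FROM (SELECT order_id FROM dbo.Orders "
        "GROUP BY order_id HAVING COUNT(*) > 1) AS dupes"
    ),
}

failures = {}
for rule, query in checks.items():
    count = cursor.execute(query).fetchone()[0]   # each check returns a violation count
    if count > 0:
        failures[rule] = count

conn.close()

if failures:
    print(f"Data quality checks failed: {failures}")
else:
    print("All data quality checks passed")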

EDUCATION: -

•Master of Science in Data Science from New Jersey Institute of Technology

•B. Tech. in Computer Science and Engineering from Vellore Institute of Technology, India

CERTIFICATIONS: -

•Natural Language Processing (NLP) in Python - Udemy

•Cluster Analysis and Unsupervised Machine Learning in Python – Udemy

•AWS Cloud Practitioner – Udemy

•GCP Associate Cloud Engineer - Google Cloud Certification - Udemy


