
Data Scientist

Location:
Great Hills, TX, 78759
Salary:
$110,000
Posted:
March 01, 2021

Resume:

Name: Kowshar Ahmed

Email ID: adkkyz@r.postjobfree.com

Cell No: 972-***-****

Current Location: Dallas, TX (Open to Relocate)

PROFESSIONAL SUMMARY

•6+ years of IT experience, including 4+ years of comprehensive industry experience in Machine Learning, Statistical Modeling, Deep Learning, Data Analytics, Data Modeling, Data Analysis, Natural Language Processing (NLP), Artificial Intelligence algorithms, and Business Intelligence.

•Good experience with analytics models such as Decision Trees and Linear & Logistic Regression, and with tools including Hadoop (Hive, Pig), R, Python, Spark, Scala, MS Excel, SQL, PostgreSQL, and Erwin.

•Experienced in Data Modeling techniques employing Data Warehousing concepts like star/snowflake schema and Extended Star.

•Expertise in applying Data Mining techniques and optimization techniques in B2B and B2C industries.

•Expertise in writing functional specifications, translating business requirements into technical specifications, and creating/maintaining/modifying database design documents with detailed descriptions of logical entities and physical tables.

•Excellent knowledge of Machine Learning, Mathematical Modeling and Operations Research. Comfortable with R, Python, SAS, Weka, MATLAB and relational databases. Deep understanding of and exposure to the Big Data ecosystem.

•Expertise in Data Migration, Data Profiling, Data Wrangling, Data Cleansing, Transformation, Integration, Data Analysis, Data Import, and Data Export through the use of multiple ETL tools such as Informatica PowerCenter.

•Proficient in Machine Learning, Data/Text Mining, Statistical Analysis & Predictive Modeling.

•Good knowledge of and experience with deep learning algorithms such as Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), including LSTM/RNN-based speech recognition using TensorFlow.

•Excellent knowledge and experience in OLTP/OLAP system study with a focus on the Oracle Hyperion suite of technology, developing database schemas such as Star schema and Snowflake schema (Fact Tables, Dimension Tables) used in relational, dimensional and multidimensional modeling, and physical and logical Data Modeling using the Erwin tool.

•Experienced in building data models using machine learning techniques for Classification, Regression, Clustering and Associative mining.

•Good knowledge of Natural Language Processing (NLP) and Time Series Analysis and Forecasting using the ARIMA model in Python and R (a minimal forecasting sketch follows this summary).

•Working experience in the Hadoop ecosystem and the Apache Spark framework, including HDFS, MapReduce, HiveQL, SparkSQL and PySpark.

•Very good experience and knowledge in provisioning virtual clusters under AWS cloud which includes services like EC2, S3, and EMR.

•Proficient in data visualization tools such as Tableau, Python (Seaborn, Matplotlib, Plotly) and R (ggplot2, Shiny, Plotly) to create visually powerful and actionable interactive reports and dashboards.

•Expertise in building and publishing customized interactive reports and dashboards with customized parameters and user filters using Tableau (9.x/10.x) and Power BI.

•Experienced in Agile methodology and SCRUM process.

•Strong business sense and abilities to communicate data insights to both technical and nontechnical clients.

•Proficient in Python and SQL, with experience building and productionizing end-to-end systems.

•Solid coding and engineering skills, particularly in Machine Learning.

•Proven contributor in shaping the future direction of products and services.
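
As an illustration of the time-series forecasting work mentioned in the summary above, the following is a minimal sketch of ARIMA-based forecasting in Python with statsmodels; the file name, column names and the (1, 1, 1) order are hypothetical placeholders rather than details of any specific project.

```python
# Minimal ARIMA forecasting sketch (hypothetical data and parameters).
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Load a monthly series; "sales.csv" with columns "month" and "sales" is assumed.
df = pd.read_csv("sales.csv", parse_dates=["month"], index_col="month")
series = df["sales"].asfreq("MS")

# The (1, 1, 1) order is a placeholder; in practice it would be chosen from
# ACF/PACF plots or information criteria such as AIC/BIC.
model = ARIMA(series, order=(1, 1, 1))
fitted = model.fit()

# Forecast the next 12 months and inspect the fit.
print(fitted.summary())
print(fitted.forecast(steps=12))
```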

TECHNICAL SKILLS

Databases

MySQL, PostgreSQL, Snowflake, Oracle, HBase, Amazon Redshift, MS SQL Server 2016/2014/2012/2008 R2/2008, Teradata

Statistical Methods

Hypothesis Testing, ANOVA, Time Series, Confidence Intervals, Bayes' Law, Principal Component Analysis (PCA), Dimensionality Reduction, Cross-Validation, Autocorrelation

Machine Learning

Regression analysis, Bayesian Method, Decision Tree, Random Forests, Support Vector Machine, Neural Network, Sentiment Analysis, K-Means Clustering, KNN and Ensemble Method

Hadoop Ecosystem

Hadoop 2.x, Spark 2.x, MapReduce, Hive, HDFS, Sqoop, Flume

Reporting Tools

Tableau suite of tools (10.x/9.x/8.x), including Desktop, Server and Online; SQL Server Reporting Services (SSRS); Power BI

Languages

Python (2.x/3.x), R, Java, SAS, SQL, T-SQL

Operating Systems

Windows (PowerShell), UNIX/Linux (Shell Scripting)

Data Analytics Tools

Python (NumPy, SciPy, pandas, Gensim, Keras), R (caret, Weka, ggplot).

Data Visualization

Seaborn, Matplotlib, ggplot2, Plotly, R Shiny, Microsoft Office (MS Excel, MS Word, MS PowerPoint).

R Package

dplyr, sqldf, data.table, randomForest, gbm, caret, elasticnet and a wide range of other machine learning packages.

PROFESSIONAL EXPERIENCE:

Role: Data Scientist

Client: 7-Eleven Inc, Irving, TX Aug 2019 to Present

Responsibilities:

•Built data pipelines for reporting, alerting and data mining. Experienced with table design and data management using HDFS, Hive, Impala, Sqoop, MySQL, MemSQL, Grafana/InfluxDB and Kafka.

•Designed, developed and implemented a comprehensive data warehouse solution to extract, clean, transform and load data from various sources into the Enterprise Data Warehouse (EDW), managing data quality and accuracy throughout.

•Worked on different data formats such as JSON, XML and performed Machine Learning algorithms in Python.

•Used pandas, numpy, seaborn, matplotlib, scikit-learn, scipy, NLTK in Python for developing various Machine Learning algorithms.

•Utilized Apache Spark with Python to develop and execute Big Data Analytics and Machine Learning applications, and executed Machine Learning use cases under Spark ML and MLlib (see the pipeline sketch after this list).

•Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.

•Worked with product teams to create dashboards (Dash/Plotly) to track the current status of products (see the Dash sketch at the end of this section).

•Used AWS S3, DynamoDB, AWS Lambda and AWS EC2 for data storage and model deployment.

•Designed and implemented unstructured/structured data post-processing, visualization and delivery algorithms deployed to production on the AWS platform in a distributed fashion or as a Lambda service.

•Designed and developed NLP models for sentiment analysis.

•Designed and provisioned the platform architecture to execute Hadoop and Machine Learning use cases on cloud infrastructure (AWS EMR and S3).

•Transformed staging-area data into a star schema (hosted on Amazon Redshift), which was then used to develop embedded Tableau dashboards.

•Worked on Machine Learning over large-scale data using Spark and MapReduce.

•Created a handler function in Python on AWS that is invoked when the service is executed.

•Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources.

•Proficient in SQL across a number of dialects, including MySQL, PostgreSQL, Redshift, Teradata and Oracle.

•Worked on Teradata SQL queries, Teradata Indexes, and utilities such as MultiLoad, TPump, FastLoad and FastExport.

•Applied various Machine Learning algorithms and statistical modeling techniques such as decision trees, regression models, neural networks, SVM and clustering to identify volume, using the scikit-learn package in Python and MATLAB.

•Created and built Docker images for prototype deep learning models running on local GPUs.

•Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data.

•Created various types of data visualizations using Python and Tableau.

•Worked on Amazon Web Services cloud services to do machine learning on big data, using Lambda functions.

•Created a Hive architecture used for real-time monitoring and HBase for reporting, and worked on MapReduce and query optimization for the Hadoop Hive and HBase architecture.

•Used Amazon SageMaker to create a platform for deploying machine learning models into a production environment on AWS.

•Built analytical data pipelines to port data in and out of Hadoop/HDFS from structured and unstructured sources, and designed and implemented the system architecture for an Amazon EC2 based cloud-hosted solution for the client.
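
A minimal sketch of the kind of Spark ML pipeline referred to in the Spark ML/MLlib bullet above; the Hive table name, feature columns and the choice of logistic regression are illustrative assumptions, not the production code.

```python
# Hypothetical Spark ML pipeline: assemble features from a Hive table and
# train/evaluate a logistic regression classifier.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("ml-example").enableHiveSupport().getOrCreate()
df = spark.table("transactions")  # assumed table with numeric features and a binary "label"

assembler = VectorAssembler(inputCols=["amount", "items", "hour"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")
pipeline = Pipeline(stages=[assembler, lr])

train, test = df.randomSplit([0.8, 0.2], seed=42)
model = pipeline.fit(train)

# Area under the ROC curve on the held-out split.
auc = BinaryClassificationEvaluator(labelCol="label").evaluate(model.transform(test))
print(f"AUC = {auc:.3f}")
```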

Environment: R, SQL, Python 2.7.x, SQL Server 2014, regression, logistic regression, random forest, neural networks, Topic Modeling, NLTK, SVM (Support Vector Machine), JSON, XML, Hive, Pig, scikit-learn, SciPy, GraphLab, NoSQL, SAS, SPSS, Spark, Hadoop, Kafka, Erwin 9.6.4, Oracle 12c, Apache PySpark, Jupyter Notebook, Spark MLlib, Tableau, ODS, PL/SQL, OLAP, OLTP, AWS, Lambda, Amazon Machine Learning, AWS S3.
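
A minimal sketch of a Dash (Plotly) product-status dashboard of the kind mentioned in the product-team bullet above; the data, column names and layout are hypothetical.

```python
# Hypothetical Dash app showing product status in a single bar chart.
import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html

df = pd.DataFrame({"product": ["A", "B", "C"], "completion": [0.8, 0.6, 0.95]})
fig = px.bar(df, x="product", y="completion", title="Product status")

app = Dash(__name__)
app.layout = html.Div([html.H1("Product tracker"), dcc.Graph(figure=fig)])

if __name__ == "__main__":
    app.run(debug=True)
```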

Role: Data Scientist/Machine Learning Engineer

Client: Avail MedSystems, Palo Alto, CA Jan 2018 to Jun 2019

Responsibilities:

•Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib and R with a broad variety of machine learning methods, including classification, regression and dimensionality reduction; used the engine to increase user lifetime by 45% and triple user conversions for target categories.

•Participated in feature engineering such as feature intersection generation, feature normalization and label encoding with scikit-learn preprocessing.

•Designed and built R and Shiny applications for the clinical operations team.

•Performed statistical analysis on textual data and built machine learning/deep learning models in the Natural Language domain.

•Developed micro-app web visualization using R-Shiny and D3.js with active filters.

•Developed scripts and small program utilities to help during the proof-of-concept and implementation phases of the Snowflake Cloud and NiFi real-time streaming data pipeline.

•Used Python 3.x (NumPy, SciPy, pandas, scikit-learn, Seaborn) and Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.

•Applied various Machine Learning algorithms and statistical modeling techniques such as decision trees, regression models, neural networks, SVM and clustering to identify volume, using the scikit-learn package in Python and MATLAB.

•Created and built Docker images for prototype deep learning models running on local GPUs.

•Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau.

•Developed and implemented predictive models using machine learning algorithms such as linear regression, classification, multivariate regression, Naive Bayes, Random Forest, K-means clustering, KNN, PCA and regularization for Data Analysis.

•Performed Data Collection, Data Cleaning, Data Visualization and Machine Learning algorithm development using several packages: NumPy, pandas, scikit-learn and Matplotlib.

•Implemented various data pre-processing techniques to handle unstructured, structured and imbalanced data, including SMOTE for class imbalance.

•Clustered customers' action data using K-Means clustering and hierarchical clustering, then segmented customers into different groups for further analysis (see the segmentation sketch after this list).

•Built Support Vector Machine algorithms for detecting fraudulent and dishonest customer behavior using several packages: scikit-learn, NumPy and pandas in Python.

•Designed and developed NLP models for sentiment analysis.

•Led discussions with users to gather business process and data requirements to develop a variety of Conceptual, Logical and Physical Data Models. Expert in Business Intelligence and Data Visualization tools: Tableau, Power BI, MicroStrategy.

•Developed and evangelized best practices for statistical analysis of Big Data.

•Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for client.

•Developed a deep learning algorithm that generated hedging strategies providing 15% ROI per month with a standard deviation of 2.7% (results based on testing strategies on real data for 3 months).

•Designed the Enterprise Conceptual, Logical and Physical Data Model for the 'Bulk Data Storage System' using Embarcadero ER Studio; the data models were designed in 3NF.

•Worked on machine learning over large-scale data using Spark and MapReduce.

•Collaborated with data engineers and operation team to implement ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.

•Performed data analysis by using Hive to retrieve the data from Hadoop cluster, SQL to retrieve data from RedShift.

•Explored and analyzed the customer specific features by using SparkSQL.

•Performed Standardization, Transformations, Normalization, data imputation using Scikit-learn package in Python.

•Led the implementation of new statistical algorithms and operators on Hadoop and SQL platforms, and utilized optimization techniques, linear regression, K-Means clustering, Naive Bayes and other approaches.

•Knowledge of Information Extraction, NLP algorithms coupled with Deep Learning.

•Developed Spark/Scala, SAS and R programs for a regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources.

•Conducted analysis assessing customer consumption behavior and discovered customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering and Hierarchical Clustering.

•Implemented deep learning algorithms to identify fraudulent transactions.

•Built regression models, including Lasso, Ridge, SVR and XGBoost, to predict Customer Lifetime Value.

•Built classification models, including Logistic Regression, SVM, Decision Tree and Random Forest, to predict customer churn rate.

•Used F-Score, AUC/ROC, Confusion Matrix, MAE and RMSE to evaluate the performance of different models.
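
A minimal sketch of the K-Means customer segmentation described in the clustering bullet above, using scikit-learn; the feature columns, sample values and k = 3 are illustrative assumptions.

```python
# Hypothetical customer segmentation with K-Means.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

customers = pd.DataFrame({
    "visits":  [3, 40, 12, 2, 55, 18],
    "spend":   [50.0, 900.0, 300.0, 20.0, 1500.0, 420.0],
    "recency": [60, 3, 15, 90, 1, 10],
})

# Standardize so no single feature dominates the distance metric.
scaled = StandardScaler().fit_transform(customers)

# k = 3 is a placeholder; in practice k would be chosen with the elbow method
# or silhouette scores.
customers["segment"] = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(scaled)
print(customers.groupby("segment").mean())
```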

Environment: AWS Redshift, EC2, EMR, Hadoop Framework, S3, HDFS, Spark (PySpark, MLlib, Spark SQL), Python 3.x (scikit-learn/SciPy/NumPy/pandas/Matplotlib/Seaborn), Tableau Desktop (9.x/10.x), Tableau Server (9.x/10.x), Machine Learning (Regressions, KNN, SVM, Decision Tree, Random Forest, XGBoost, LightGBM, Collaborative Filtering, Ensemble), Deep Learning, Teradata, Git 2.x, Agile/SCRUM

Role: Data Scientist

Client: GAP, San Francisco, CA Mar 2017 to Dec 2017

Responsibilities:

•Tackled a highly imbalanced fraud dataset using undersampling, oversampling with SMOTE, and cost-sensitive algorithms with Python scikit-learn (see the sketch after this list).

•Wrote complex Spark SQL queries for data analysis to meet business requirement.

•Developed MapReduce/Spark Python modules for predictive analytics & machine learning in Hadoop on AWS.

•Built optimization models using Machine Learning and Deep Learning algorithms.

•Participated in feature engineering such as feature intersection generation, feature normalization and label encoding with scikit-learn preprocessing for supervised and unsupervised algorithms.

•Improved fraud prediction performance by using logistic regression, random forest and gradient boosting with Python Scikit-learn.

•Performed feature engineering and NLP using techniques such as Word2Vec, BOW (Bag of Words), TF-IDF and Doc2Vec.

•Performed Naïve Bayes, KNN, Logistic Regression, Random Forest, SVM and XGBoost modeling to identify whether a loan would default or not.

•Implemented an ensemble of Ridge regression, Lasso regression and XGBoost to predict the potential loan default loss.

•Used various Metrics (RMSE, MAE, F-Score, ROC, AUC, accuracy, precision, sensitivity and specificity) to evaluate the performance of each model.

•Performed data cleaning and feature engineering, applied Machine Learning algorithms using the MLlib package in PySpark, and worked with deep learning frameworks.

•Actively involved in all phases of data science project life cycle including Data Extraction, Data Cleaning, Data Visualization and building Models.

•Experience in working with languages Python and R.

•Developed text mining models using TensorFlow and NLP (NLTK, SpaCy and CoreNLP) on call transaction and social media interaction data for existing customer management.

•Experienced in Agile methodology and SCRUM process.

•Experience in the Extract, Transform and Load (ETL) process using tools such as DataStage, Data Integrator and SSIS for data migration and data warehousing projects.

•Experienced in Data Integration Validation and Data Quality controls for ETL process and Data Warehousing using MS Visual Studio, SSAS, SSIS and SSRS.

•Used big data tools Spark (PySpark, SparkSQL and MLlib) to conduct real-time analysis of loan defaults on AWS.
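
A minimal sketch of handling an imbalanced fraud/default dataset with SMOTE, as referenced in the first bullet of this role; it assumes the imbalanced-learn package and uses a synthetic dataset in place of real loan data.

```python
# Hypothetical imbalanced-classification example: oversample the minority class
# with SMOTE on the training split only, then evaluate on the untouched test split.
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.97, 0.03], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
print("before:", Counter(y_train), "after:", Counter(y_res))

clf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_res, y_res)
probs = clf.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, probs), "F1:", f1_score(y_test, probs > 0.5))
```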

Environment: MS SQL Server 2014, Teradata, ETL, SSIS, Alteryx, Tableau (Desktop 9.x/Server 9.x), Python 3.x (scikit-learn/SciPy/NumPy/pandas), Machine Learning (Naïve Bayes, KNN, Regressions, Random Forest, SVM, XGBoost, Ensemble), AWS Redshift, Deep Learning, Spark (PySpark, MLlib, Spark SQL), Hadoop 2.x, MapReduce, HDFS, SharePoint.

AIRLIQUIDE, Radnor, PA

Associate Data Scientist Sep 2016 to Feb 2017

Responsibilities:

•Building Predictive models in Python.

•Designing and conducting A/B testing experiments (see the sketch after this list).

•Devising pricing strategies and building pricing algorithms.

•Data visualization using Python packages such as Matplotlib, Seaborn, Bokeh and Altair.

•Understanding data model, creating effective data pipelines.

•Used SQL (MySQL, PostgreSQL).

•Deep-dive data analysis and reporting to the business team.

•Communicating results to technical and business audiences.
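
A minimal sketch of a two-proportion A/B test of the kind referenced in the bullet above, using statsmodels; the conversion counts and sample sizes are hypothetical.

```python
# Hypothetical A/B test: two-sided z-test for a difference in conversion rates.
from statsmodels.stats.proportion import proportions_ztest

conversions = [320, 365]    # converted users in control and treatment
exposures = [5000, 5000]    # users exposed to each variant

stat, p_value = proportions_ztest(count=conversions, nobs=exposures)
print(f"z = {stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is statistically significant at the 5% level.")
```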

BHHC, San Francisco, CA

ETL/Informatica Developer Jul 2014 to Mar 2016

Responsibilities:

•Worked on complete software development life cycle (SDLC) starting from requirement analysis, design, development, coding, testing, debugging and implementation to support of ETL processes.

•Worked extensively on all stages from data extraction to building the staging area, data warehouse and data marts, including loading dimension and fact tables.

•Worked closely with Managers, Project Managers, Technical Product Managers, clients, subject-matter experts, and data modelers to obtain requirements, objectives, and business rules for projects.

•Involved in Star Schema Data Modelling design.

•Involved in Installation and Configuration of all Components of Informatica and SQL developer.

•Created complex ETL code to move data from various source systems (flat files, spreadsheets, IBM OpenPages) into an Oracle database.

•Implemented CDC (Change Data Capture) to update the dimensional schema to identify, capture and deliver changes made to enterprise data sources in real-time.

•Implemented effective date range mapping (Slowly Changing Dimension Type 2) methodology for accessing the full history of accounts and transaction information, as sketched below.
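
A minimal pandas sketch of the Slowly Changing Dimension Type 2 (effective date range) logic referenced above, shown in Python purely for illustration since the actual implementation was in Informatica; the column names, dates and high-date sentinel are assumptions.

```python
# Hypothetical SCD Type 2 upsert: expire the current row and append a new one.
import pandas as pd

HIGH_DATE = pd.Timestamp("9999-12-31")

dim = pd.DataFrame({
    "account_id": [101],
    "status": ["ACTIVE"],
    "eff_start": [pd.Timestamp("2015-01-01")],
    "eff_end": [HIGH_DATE],
    "is_current": [True],
})

def apply_scd2(dim, account_id, new_status, change_date):
    """Close the current row for the account and append a new current row."""
    current = (dim["account_id"] == account_id) & dim["is_current"]
    changed = current & (dim["status"] != new_status)
    if changed.any():
        dim.loc[changed, "eff_end"] = change_date - pd.Timedelta(days=1)
        dim.loc[changed, "is_current"] = False
        new_row = {"account_id": account_id, "status": new_status,
                   "eff_start": change_date, "eff_end": HIGH_DATE, "is_current": True}
        dim = pd.concat([dim, pd.DataFrame([new_row])], ignore_index=True)
    return dim

dim = apply_scd2(dim, 101, "CLOSED", pd.Timestamp("2015-06-15"))
print(dim)
```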

Education:

MBA in Management Information Systems, Lincoln University, California, 2016


