
Data Python

Location: Vancouver, BC, Canada
Salary: 80000
Posted: November 20, 2020


PRAVEEN

DATA SCIENTIST

adh0aa@r.postjobfree.com

+1-587-***-****

PROFESSIONAL SUMMARY:

5+ years of IT industry experience as a Data Scientist, specializing in implementing advanced Machine Learning, Deep Learning and Natural Language Processing algorithms on data from diverse domains.

Experience in building highly efficient models to derive actionable insights for business environments leveraging exploratory data analysis, feature engineering, statistical modelling and predictive analytics.

Experience in machine learning, data mining, structured and unstructured data analysis, and image data analysis, including feature extraction, pattern recognition, algorithm development, text mining, computer simulation, data modelling, database design, model evaluation and deployment.

Experience with Statistical Analysis, Data Mining and Machine Learning Skills using R, Python and SQL.

Data-driven and highly analytical, with working knowledge of statistical modelling approaches and methodologies (Clustering, Regression analysis, Hypothesis testing, Decision trees, Machine learning), business rules and an ever-evolving regulatory environment.

Strong practical understanding of statistical modelling and supervised/unsupervised/reinforcement machine learning techniques, with a keen interest in applying these techniques to predictive analytics.

Good familiarity with the entire Data Science project life cycle, including Data Acquisition, Data Cleansing, Data Manipulation, Feature Engineering, Modelling, Evaluation, Optimization, Testing and Deployment.

Experience in problem solving, algorithmic development, data science, machine learning, statistical inference, predictive analytics, descriptive analytics, prescriptive analytics, graph analysis, natural language processing and computational linguistics, with extensive experience in predictive analytics and recommendation systems.

Extensive experience in various phases of software development, such as analysing, gathering and designing data, with expertise in documentation.

Hands-on experience with clustering algorithms such as K-Means and K-Medoids, as well as predictive and descriptive algorithms.

Expertise in Model Development, Data Mining, Predictive Modelling, Descriptive Modelling, Data Visualization, Data Cleaning and Management, and Database Management.

Good knowledge of Apache Hadoop technologies such as Pig, Hive, Sqoop, Spark, Flume and HBase.

Experience using machine learning models such as random forest, KNN, SVM and logistic regression, with packages such as ggplot2, dplyr, lm, rpart, randomForest, nnet and tree in R; SAS procedures such as PROC PCA, DTREE, CORR, PRINCOMP, GPLOT, LOGISTIC and CLUSTER; and NumPy, scikit-learn and pandas in Python.

Built classification and forecasting models, automated processes, and delivered text mining, sentiment analysis, statistical models, risk analysis, platform integrations, optimization models, models to improve user experience, and A/B testing using R, SAS, Python, SPSS, SAS Enterprise Miner, EViews, Tableau, etc.

Strong knowledge of statistical methods (regression, time series, hypothesis testing, randomized experiments), machine learning techniques, algorithms, data structures and data infrastructure.

Extensive hands-on experience and high proficiency with structured, semi-structured and unstructured data, using a broad range of data science programming languages and big data tools.

Experienced in designing Star schemas (identification of facts, measures and dimensions) and Snowflake schemas for Data Warehouse and ODS architectures, using tools such as Erwin Data Modeler, PowerDesigner, ER/Studio and Microsoft Visio.

Continuous-learning approach to Elasticsearch (Lucene index-based search), Kibana and other new tools.

Prowess in Excel Macros, Pivot Tables, VLOOKUP and other advanced functions; expert R user with knowledge of the statistical programming language SAS.

Excellent experience with Teradata SQL queries, Teradata Indexes and utilities such as MultiLoad (MLOAD), TPump, FastLoad and FastExport.

Experience working with an Enterprise Fraud Management team, developing new rules to detect internal fraud activity and enhancing existing fraud detection rules.

In-depth experience in R, Python, Spark, PySpark, SQL, MongoDB, scikit-learn, Hadoop, Amazon AWS, Microsoft Azure, REST APIs, Unix/Linux, Git, R Shiny & Shiny Dashboard.

Strong skills in statistical methodologies such as Hypothesis Testing, Principal Component Analysis (PCA) and Correspondence Analysis.

Professional working experience in Machine Learning algorithms such as Linear Regression, Logistic Regression, Random Forests, Decision Trees, K-Means Clustering and Association Rules.

Experience with Amazon Web Services (AWS) in planning, designing, implementing and maintaining system applications in AWS Cloud in Windows and Linux Environments.

Experienced in developing complex database objects like Stored Procedures, Functions, Packages and Triggers using SQL and PL/SQL.

Proficient in Big Data, Hadoop, Hive, MapReduce, Pig and NoSQL databases like MongoDB, HBase, Cassandra.

Experienced in writing and optimizing SQL queries in Oracle, SQL Server, DB2, PostgreSQL, Netezza and Teradata.

Strong experience in the maintenance and version upgrades of PostgreSQL, Oracle and Big Data databases.

Experienced in statistical analysis using R, SPSS, MATLAB and Excel.

Highly motivated team player with excellent interpersonal and customer relations skills, proven communication, organizational, analytical and presentation skills, and leadership qualities.

TECHNICAL SKILLS:

Programming & Scripting Languages: R (packages: stats, zoo, Matrix, data.table, openssl), Python, SQL, C, C++, Java, JCL, COBOL, HTML, CSS, JSP, JavaScript, Scala

Databases: SQL, MySQL, T-SQL, MS Access, Oracle, Hive, MongoDB, Cassandra, PostgreSQL

Statistical Software: SPSS, R, SAS

Algorithm Skills: Machine Learning, Neural Networks, Deep Learning, NLP, Bayesian Learning, Optimization, Prediction, Pattern Identification, Data/Text Mining, Regression, Logistic Regression, Bayesian Belief Networks, Clustering, Classification, Statistical Modelling

Data Science/Data Analysis Tools & Techniques: Generalized Linear Models, Logistic Regression, Boxplots, K-Means Clustering, SVN, PuTTY, WinSCP, Redmine (bug tracking, documentation, Scrum), Neural Networks, AI, Teradata, Tableau, Power BI

Development Tools: RStudio, Notepad++, Python, Jupyter, Spyder IDE

Python Packages: NumPy, SciPy, pandas, scikit-learn, Matplotlib, Seaborn, statsmodels, Keras, TensorFlow, Theano, NLTK, Scrapy

Techniques: Machine Learning, Regression, Clustering, Data Mining

Machine Learning: Naive Bayes, Decision Trees, Regression Models, Random Forests, Time Series, K-Means, Gradient Boosting, XGBoost, SVM, KNN

Cloud Technologies: AWS (EC2, S3, RDS, EBS, VPC, IAM, Security Groups), Microsoft Azure, Rackspace

Operating Systems: Microsoft Windows, Linux (Ubuntu)

Big Data: Hadoop, MapReduce, Apache Spark, Hive, Pig

PROFESSIONAL WORK EXPERIENCE:

Client: Acumen Capital Partners, Calgary, AB (Jul 2019 to Present)
Role: Data Scientist

Responsibilities:

Implemented end-to-end systems for Data Analytics, Data Automation and Integration.

Responsible for data identification, collection, exploration & cleaning for modelling, participated in model development.

Utilized Python and Spark to implement different machine learning algorithms including Generalized Linear Model, SVM, Random Forest, Boosting and Neural Network.
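
As a rough illustration of the Spark-based modelling described above, here is a minimal PySpark sketch of fitting a random forest; the input path, column names and session setup are hypothetical, not drawn from the actual project.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

spark = SparkSession.builder.appName("rf-sketch").getOrCreate()
df = spark.read.parquet("training_data.parquet")  # hypothetical input path

# Spark ML expects features assembled into a single vector column.
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
train = assembler.transform(df)

# Fit a random forest; "label" is assumed to be a numeric target column.
rf = RandomForestClassifier(featuresCol="features", labelCol="label", numTrees=100)
model = rf.fit(train)
```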

Implemented various statistical techniques to manipulate data (missing data imputation, principal component analysis and sampling) and build predictive models.

Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop. Implemented a Python-based distributed random forest via Python streaming.

Used R, Python and Spark to develop a variety of models and algorithms for analytic purposes.

Identified energy consumption parameters and built a model to identify the drivers of higher consumption.

Performed data cleaning, feature scaling and feature engineering using the pandas and NumPy packages in Python.

Responsible for the design and development of advanced R/Python programs to prepare, transform and harmonize data sets in preparation for modelling.

Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau.

Participated in all phases of data acquisition, data cleaning, developing models, validation, and visualization to deliver data science solutions.

Worked on fraud detection analysis of payment transactions, applying supervised learning methods to transaction history.

Collected data in Hadoop and retrieved the data required for building models using Hive.

Developed Spark Python modules for machine learning & predictive analytics in Hadoop.

Used pandas, NumPy, Seaborn, Matplotlib and scikit-learn in Python to develop various machine learning models, utilizing algorithms such as Decision Trees, Logistic Regression, Gradient Boosting, SVM and KNN.

Worked with R and Python to develop neural network algorithms and cluster analysis.

Used cross-validation to test the models with different batches of data, optimizing the models and preventing overfitting.
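
A minimal scikit-learn sketch of this kind of cross-validation, using synthetic stand-in data rather than the project's actual features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic placeholder data; the real work used project-specific features.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

clf = GradientBoostingClassifier(random_state=0)
# 5-fold CV scores the model on held-out batches, exposing overfitting.
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean(), scores.std())
```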

Used PCA and other feature engineering techniques on high-dimensional datasets while maintaining the variance of the most important features.
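
One common way to express "keep most of the variance" in scikit-learn is to pass a variance ratio to PCA; a small sketch on placeholder data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.rand(500, 100)  # placeholder high-dimensional matrix
X_scaled = StandardScaler().fit_transform(X)

# n_components=0.95 keeps enough components to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)
print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```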

Created transformation pipelines for pre-processing large amounts of data with methods such as imputing, scaling and selecting.
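
A sketch of such a pipeline using scikit-learn's Pipeline; the stages and parameters here are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=25, random_state=0)
X[::7, 0] = np.nan  # inject missing values so the imputer has work to do

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # imputing
    ("scale", StandardScaler()),                   # scaling
    ("select", SelectKBest(f_classif, k=10)),      # selecting
])
X_ready = pipe.fit_transform(X, y)
print(X_ready.shape)  # (200, 10)
```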

Used ensemble methods, with different bagging and boosting techniques, to increase the accuracy of the training model.

Environment: Hadoop, HDFS, Hive, Pig Latin, Spark, R, SQL, Tableau, Seaborn, Python (NumPy, pandas, scikit-learn, Matplotlib), Jupyter, GitHub, Linux, Windows

Client: TracRite Software Inc., Winnipeg, MB (Mar 2018 to Jun 2019)
Role: Data Scientist

Responsibilities:

Communicated and coordinated with different offices to gather business requirements.

Gathered data from different sources and performed resampling strategies to deal with the issue of imbalanced data.
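
A minimal sketch of one such resampling strategy (upsampling the minority class with scikit-learn's resample); the file name and target column are hypothetical:

```python
import pandas as pd
from sklearn.utils import resample

df = pd.read_csv("patients.csv")        # hypothetical source file
majority = df[df["readmitted"] == 0]    # "readmitted" is an assumed target
minority = df[df["readmitted"] == 1]

# Upsample the minority class with replacement until the classes balance.
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=0)
balanced = pd.concat([majority, minority_up])
```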

Worked with the ETL team and doctors to understand the data and define a uniform standard format.

Performed data cleansing using advanced SQL queries in a SQL Server database.

Worked with various data pools and DBAs to gain access to data. Applied working knowledge of NLP, NLTK and text mining.

Trained and supervised up to 8 other team members in SQL, Scala and Spark programming, and assisted in the installation and upgrading of Python, Java and Spark.

Used K-Means clustering to group similar data and documented the results.
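
A small K-Means sketch in scikit-learn on placeholder data; scaling first keeps any one feature from dominating the distance metric:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

X = np.random.rand(300, 4)  # placeholder feature matrix
X_scaled = StandardScaler().fit_transform(X)

# Group records into k similar clusters; k=3 is an illustrative choice.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)
print(km.labels_[:10], km.inertia_)
```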

Extracted, transformed and loaded data into a PostgreSQL database using Python scripts.
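
A minimal pandas/SQLAlchemy sketch of such a load script; connection details, file names and columns are placeholders, not project specifics:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder credentials; real values would come from configuration.
engine = create_engine("postgresql://user:password@localhost:5432/clinic")

df = pd.read_csv("extract.csv")                       # extract
df["visit_date"] = pd.to_datetime(df["visit_date"])   # transform
df.to_sql("visits", engine, if_exists="append", index=False)  # load
```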

Split the data into smaller datasets based on diagnosis, and was accountable for conducting exploratory data analysis on three of the diagnosis datasets (diabetes, cold/flu, allergy).

Created solutions to detect useful patterns in 2 TB of log data, helping surface crucial application errors. Proactively provided engineering solutions for all business units involved to support a variety of error-handling decisions, especially from a prioritization point of view, using RStudio.

Created the entire data pre-processing pipeline (imputing, scaling, label encoding) in Python pandas to prepare data for the modelling stage.

Built predictive models using Python scikit-learn, including Support Vector Machine, Decision Tree, Naive Bayes Classifier and Neural Network, to predict potential readmission cases.
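
By way of illustration, a minimal scikit-learn train/evaluate sketch with one of the listed model families (Naive Bayes); the data here is synthetic, not the actual clinical records:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in; the real features came from patient records.
X, y = make_classification(n_samples=2000, n_features=15, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

clf = GaussianNB().fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```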

Applied ensemble strategies, including Gradient Boosting, Random Forest and customized ensemble techniques, to produce more accurate solutions.

Designed and implemented cross-validation and statistical tests, including hypothesis testing, ANOVA and Chi-square tests, to verify the models' significance.

Created an API using Flask and shared the design with the application team, enabling them to define the requirements of the new application.
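
A minimal Flask sketch of a model-serving endpoint of this kind; the route, payload shape and serialized model file are assumptions for illustration:

```python
from flask import Flask, jsonify, request
import joblib

app = Flask(__name__)
model = joblib.load("model.pkl")  # hypothetical serialized model

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"features": [0.1, 2.3, ...]}.
    features = request.get_json()["features"]
    return jsonify({"readmitted": int(model.predict([features])[0])})

if __name__ == "__main__":
    app.run()
```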

Used Agile methodology and the Scrum process for project development.

Environment: SQL Server, SQL Server Integration Services, ETL, Scala, Java, Spark, PostgreSQL, RStudio, Python, Jupyter Notebook, Flask, SharePoint, Linux, Windows

Client: Shareworks, Calgary, AB (Jan 2017 to Feb 2018)
Role: Data Analyst

Responsibilities:

Conducted a range of statistical analyses to provide valuable data-driven insights for business decision making.

Worked with packages such as ggplot2 and shiny in R to understand data and develop applications.

Involved in Data Analysis, primarily identifying Data Sets, Source Data, Source Metadata, Data Definitions and Data Formats.

Prepared ETL technical Mapping Documents along with test cases for each mapping, for future developments to maintain the SDLC and migration process. Used Talend for extraction and reporting purposes.

Took an active part in Data Profiling, Data Cleansing, Data Migration and Data Mapping, and actively helped ETL developers compare data with original source documents and validate data accuracy.

Worked on Tableau, to create dashboards and visualizations.

Analysed customer data in Python and R to track correlations in customer behaviour and define user segments to implement process and product improvements.

Worked on cleaning, exploring and manipulating source data and transforming it to the target system using Python and tools such as pandas, NumPy, Matplotlib and PostgreSQL.

Gathered and analysed business requirements, interacted with various business users, project leaders and developers, and took part in identifying different data sources.

Worked with Python for analysing, designing, developing and implementing statistical/data models, and for integrating Python with the database.

Prepared several analytical reports comprising different data modelling techniques such as time series analysis, financial modelling and trend mapping.

Worked on different data model designs, Data Extraction, Transformations, Mappings, Loading and generating Customized Analytical Reports.

Analysed the business requirements and designed Conceptual and Logical Data models using Erwin and generated database schemas and DDL (Data Definition Language) by using Forward and Reverse Engineering.

Worked as a data modeller/analyst, creating and developing relational and dimensional data models using Erwin.

Identified and designed business Entities and attributes and relationships between the Entities to develop a Conceptual model and Logical model and then translated the model into Physical model.

Implemented Normalization and De-Normalization techniques to build tables, indexes and views, and maintained and implemented stored procedures as per requirements.

Conducted design reviews with developers and business analysts. Extensively worked on Performance Tuning and understanding Joins and Data distribution.

Performed Data Analysis in Python using the NumPy and pandas libraries, taking data from CSV, XML and Excel files.

Participated in creating charts and graphs of data from different data sources using the Matplotlib and SciPy libraries in Python.

Used ad hoc queries for querying and analysing the data, and participated in data profiling, data analysis, data validation and data mining.

Developed complex ETL mappings for Stage, Dimensions, Facts and Data marts load.

Involved in Data Extraction for various Databases & Files using Talend.

Worked on Tableau for Data Analysis, digging into source-system data and diving deep into the data for predictive findings and various analyses using dashboards and visualizations.

Environment: Python, R, NumPy, pandas, SciPy, Erwin, ER/Studio, Matplotlib, PostgreSQL, Oracle, TOAD, MS Excel, JIRA, Teradata, ETL, Tableau, MS Office, Linux, Windows.

Client: Aptiva Technologies India Pvt. Ltd., IN (Sep 2015 to Dec 2016)
Role: Data Analyst

Responsibilities:

Gathered and documented requirements critical to the business mission and translated business requirements into report visualization specification design documents.

Worked with the Business Unit on Gap Analysis to check the compatibility of the existing system infrastructure with the new business requirements.

Prepared UML diagrams (Activity, Logical, Component and Deployment views) to assist development and engineering in understanding the requirements, using Rational Rose.

Conducted Functional Walkthroughs, User Acceptance Testing, and supervised the development of User Manuals for customers.

Created DataStage jobs to extract, transform and load data into data warehouses from various sources such as relational databases, application systems, temp tables and flat files.

Performed data analysis for testing, including supporting the Ab Initio ETL that interacts with both the OLTP and Data Warehouse systems.

Worked with Data Warehouse developers to evaluate the impact on the current implementation and the redesign of all ETL logic.

Created a source-to-target data mapping document of input/output attributes with the proper transformations, which aided the easy development of the SSIS bridge.

To maintain the consistency and quality of the data, worked with the Data Governance, Data Profiling and Data Quality teams and managed Master Data from all business units, ensuring data quality standards across the enterprise.

Developed backend application programs such as Views, Functions, Triggers, Procedures and Packages using SQL and PL/SQL for top management's decision making.

Identified, retrieved, manipulated, related and/or exploited multiple structured and unstructured data sets from various sources, including building or generating new data sets as appropriate.

Involved in writing the SQL Scripts for report development, Tableau reports, Dashboards, Scorecards and handled the performance issues effectively.

Wrote SQL scripts to test mappings and developed a Traceability Matrix of Business Requirements to Test Scripts to ensure change control in requirements for test case updates.

Environment: Rational Rose, ETL, OLTP, Data Warehouse, SQL, SSIS, SSRS, SSAS, PL/SQL, Tableau.

EDUCATION:

Bachelor's in Computer Science Engineering from Sunrise University, India.


