
Data Analyst – Machine Learning

Location: Scarborough, ON, Canada

Salary: 95,000

Posted: August 16, 2023


Pragna Reddy

Data Analyst

adyzkh@r.postjobfree.com

647-***-****

Data Analyst:

Data warehousing and Business Intelligence technologies professional with expertise in building, deploying, and managing applications and services through a global network of Microsoft-managed data centers. Specialized in the design, implementation, integration, maintenance, and testing of web-based, enterprise, client/server, and distributed applications across data warehousing, data engineering, feature engineering, big data, ETL/ELT, and Business Intelligence. As a Data Analyst, specialized in Azure Data Factory, Azure frameworks, the Hadoop ecosystem, Excel, Snowflake, and relational databases, and in tools such as Tableau, Alteryx, Power BI, and Python, as well as Data DevOps frameworks and Azure DevOps pipelines.

Designed and developed scalable and cost-effective architectures in AWS, Azure, and Google Cloud big data services covering the data life cycle of collection, ingestion, storage, processing, and visualization.

PROFESSIONAL SUMMARY:

5 years of IT experience as a Data Analyst with a focus on Machine Learning, Statistics, Data Mining, and Data Visualization. Proficient in Python programming, with applications in Credit Risk Management, Strategic Targeting, and Marketing Campaign Development. Extensive experience in Tableau reporting.

Well versed in various Natural Language Processing (NLP) methodologies including Word Embedding, Text Clustering and Classification, using Deep Learning and other conventional machine learning concepts.

Experienced in acquiring and scrubbing datasets, performing Data Engineering using statistical techniques, and conducting exploratory data analysis. Proficient in building diverse machine learning algorithms for predictive model development and designing impactful visualizations to drive business profitability.

Strong experience in the entire data warehousing lifecycle, including data analysis, design, development, implementation, and testing. Proficient in data conversions, data extraction, data transformation, and data loading (ETL).

Defined project scope for Data Science and Data Analytics projects in collaboration with management and clients.

Leveraged AI technologies such as ChatGPT to meet company needs.

Familiar with KYC procedures and guidelines for verifying customer identities and managing risk in financial transactions.

Proficient in AML (Anti-Money Laundering) regulations and adept at implementing procedures to prevent money laundering and terrorist financing activities.

Knowledgeable in Trade Reporting regulations, ensuring accurate and timely reporting of trade activities to regulatory authorities.

Strong command of data extraction, data cleaning, data loading, statistical data analysis, exploratory data analysis, data wrangling, and predictive modeling using Python, with data visualization in Tableau.

Utilized QA testing methodologies to verify data accuracy and reliability, identifying and tracking defects.

Leveraged CPA designation and operational experience to improve financial processes and reporting accuracy.

Utilized extensive knowledge of Microsoft Excel and Google Sheets to analyze financial data, create detailed reports, and present findings to management.

Hands-on experience with a variety of financial and risk management tools, including Eagle PACE, Bloomberg, MSCI RiskManager, and Persefoni.

Deep understanding of SAP applications, including SAP ECC, SAP TM, and Business Warehouse (BW versions 7.x and higher).

Advanced skills in virtual platforms, including their use for conducting meetings and presentations.

Proficient in the design and development of dashboards and reports in Tableau, using visualizations such as bar graphs, scatter plots, pie charts, and geographic maps, along with actions and local and global filters tailored to end-user requirements.

Profound knowledge in Machine Learning Algorithms like Linear, Non-linear and Logistic Regression, Reinforcement Learning, Natural Language Processing, Fuzzy Logic, Random forests, Ensemble Methods, Decision tree, Gradient-Boosting, K-NN, SVM, Naïve Bayes, Clustering (K-means), Deep Learning.

Good understanding of Agile and Test-driven development (TDD) methodologies.

Skilled in writing SQL queries for RDBMSs such as SQL Server, MySQL, PostgreSQL, Teradata, and Oracle, and for NoSQL databases such as MongoDB, HBase, and Cassandra to handle unstructured data.

Solid understanding of RDBMS concepts, including normalization, and skilled in creating database objects such as tables, views, stored procedures, triggers, row-level audit tables, cursors, indexes, and user-defined data types.

Prepared scripts to ensure proper data access, manipulation, and reporting functions with the R programming language.

Used Azure to streamline data management, significantly enhancing data retrieval efficiency.

Excellent communication skills, demonstrated through effective coordination with management, clients, and internal and external teams.

Gained experience working with and reconciling Capital Markets or Institutional client transactional data, enhancing the accuracy and reliability of business intelligence outcomes.

Formulated procedures for integration of R programming plans with data sources and delivery systems.

Experience with common Machine Learning Algorithms like Regression, Classification and Ensemble models.

Very strong in developing data algorithms, performing data analysis, and applying data warehousing concepts such as star schema, snowflake schema, and canonical schema.

Experience in building and maintaining SAS Data Management jobs and job flows within Visual Process Orchestration, and in architecting data models.

Excellence in handling Big Data Ecosystems like Apache Hadoop, MapReduce, Spark, HDFS Architecture, HBase, Sqoop, and Pig.

Developed expertise in DevOps and MLOps, improving efficiency and quality of the software delivery process.

Proficient in using Enterprise Resource Planning (ERP) systems to manage business processes.

Proficient in using MS Office Suite 2016/365, including Word, Excel, PowerPoint, and Outlook.

Utilized Source Code Management Tools such as Git, GitFlow, Bitbucket to maintain code quality and facilitate collaboration.

Managed multiple projects and priorities in a fast-paced, challenging work environment, demonstrating a strong ability to learn new concepts and technologies quickly.

Experienced in using Apache Kafka and RabbitMQ for real-time data processing.

Deep understanding of OLTP, OLAP, and data warehousing environments, and tuning both kinds of systems for peak performance.

Developed and delivered key initiatives in data management, Data Lake, Data Warehouse, and analytics in line with agreed timelines. Involved in the entire data science project life cycle, including data cleaning, data extraction, and data visualization on large structured and unstructured datasets, and created ER diagrams and schemas.

TECHNICAL SKILLS:

Programming Languages: Python 3, SQL, Scala, R

Python Libraries: Pandas, NumPy, Matplotlib, Scikit-learn

Cloud Services: Azure, AWS, GCP

Operating Systems: Windows, Linux

ETL Tools: SSIS, SSRS, Informatica, DataStage, Talend

Software Methodologies: SDLC - Waterfall, Agile, SCRUM

Machine Learning and Statistics: Regression, Random Forest, Clustering, Time-Series Forecasting, Hypothesis Testing, Exploratory Data Analysis

Machine Learning Algorithms: Linear Regression, Logistic Regression, Linear Discriminant Analysis (LDA), Decision Trees, Random Forests, AdaBoost and Gradient Boosting, Support Vector Machines (SVM), Naïve Bayes, K-Nearest Neighbors, Hierarchical Clustering, K-means Clustering, Density-Based Clustering (DBSCAN)

Machine Learning Techniques: Principal Component Analysis, Singular Value Decomposition, Data Standardization Techniques, L1 and L2 Regularization, RMSprop, Hyperparameter Tuning, KL Divergence, Resampling Techniques (SMOTE, Cluster Centroid Methods), Ensemble Methods, Feature Selection and Feature Engineering, Cross-Validation Methods (K-fold), BLEU Score

Databases/Web Technology: MS SQL Server 15.0.18206.0, MS Access 16.0, MySQL 8.0.21, GitHub 2.16, Oracle Database 19.1.0, SAS

Documentation Tools: Office 365 – Word, Excel, PowerPoint, Outlook, Access, OneNote

Data Modeling: MS Visio, Rational Rose, Erwin, ER/Studio, Star-Schema Modeling, Snowflake Schema, Tableau, Power BI, Google Charts, IBM Cognos

Hadoop Stack: HDFS, MRv2 (YARN), Sqoop, Flume, Hive

Other Software: Control-M, Eclipse, PyCharm, Jupyter, Apache, Jira, PuTTY, Advanced Excel

Frameworks: Django, Flask, WebApp2

EDUCATION:

Auroras Technological and Research Institute June 2014 – May 2018

Bachelor of Technology in Information Technology

Hyderabad, India

PROFESSIONAL EXPERIENCE:

Role: Data Analyst

Client: Propel Holdings, Toronto, Ontario Feb 2022 – Present

Project Description: The company wanted to automate the loan eligibility process in real time based on the customer details provided in the online application form, such as gender, marital status, education, number of dependents, income, loan amount, and credit history. The automated process identifies the customer segments that are eligible for a loan amount so that those customers can be specifically targeted.
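
To illustrate the kind of eligibility model this project called for, here is a minimal scikit-learn sketch. The file name, column names (Gender, Married, Education, Dependents, ApplicantIncome, LoanAmount, Credit_History, Loan_Status), and train/test split are hypothetical placeholders, not the client's actual schema or pipeline.

    # Hypothetical loan-eligibility classifier sketch; names are placeholders.
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder

    df = pd.read_csv("loan_applications.csv")            # assumed input file
    X = df[["Gender", "Married", "Education", "Dependents",
            "ApplicantIncome", "LoanAmount", "Credit_History"]]
    y = df["Loan_Status"]                                 # assumed eligible / not-eligible label

    categorical = ["Gender", "Married", "Education", "Dependents"]
    preprocess = ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
        remainder="passthrough")
    model = Pipeline([("prep", preprocess), ("clf", LogisticRegression(max_iter=1000))])

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model.fit(X_train, y_train)
    print("Hold-out accuracy:", model.score(X_test, y_test))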

Responsibilities:

Utilized industry-leading financial software, including Bloomberg, to gather and analyze market data, track investment performance, and identify emerging trends to support informed investment recommendations.

Proficient in Microsoft Excel, Word, and PowerPoint, producing comprehensive financial reports, investment memos, and client presentations with a high level of accuracy and professionalism.

Involved in Data mapping specifications to create and execute detailed system test plans. The data mapping specifies what data will be extracted from an internal data warehouse, transformed, and sent to an external entity.

Managed full SDLC processes involving requirements management, workflow analysis, source data analysis, data mapping, metadata management, data quality, testing strategy and maintenance of the model.

Conducted multivariate testing on website features to identify the best combinations for user experience, resulting in a 20% increase in average session duration.

Designed and implemented A/B testing to optimize conversion rates, leading to a 15% increase in user engagement.
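
A minimal sketch of the kind of significance check behind such an A/B test, using SciPy on made-up visitor and conversion counts (the numbers below are illustrative only):

    # Chi-square test of independence on a 2x2 conversion table (illustrative counts).
    from scipy.stats import chi2_contingency

    # rows: variant A, variant B; columns: converted, did not convert
    observed = [[420, 9580],
                [505, 9495]]
    chi2, p_value, dof, expected = chi2_contingency(observed)
    print(f"chi2={chi2:.2f}, p={p_value:.4f}")
    # A small p-value (e.g. below 0.05) would support rolling out the better-converting variant.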

Hands-on experience with HTML, CSS, and JavaScript, enabling robust website development and design.

Created and maintained Logical and Physical models for the data mart. Created partitions and indexes for the tables in the data mart.

Expertise in Adobe Analytics implementation using Adobe Launch; experienced in creating dashboards in Adobe & Google Analytics.

Configured web and mobile analytics for multiple clients, resulting in improved data tracking and insights.

Developed Perl scripts for automating system administration tasks such as backups and monitoring. Created web applications using Perl and CGI.

Configured Docker images for production and development environments. Managed Docker containers using Kubernetes.

Managed data warehouse solutions with Azure Synapse Analytics, optimizing data storage, and processing for efficient querying and reporting.

Used the AWS Sagemaker to quickly build, train and deploy the machine learning models.

Used the Python NumPy, SciPy, Pandas, Matplotlib, and statistics packages to perform dataset manipulation, data mapping, data cleansing, and feature engineering. Built and analyzed datasets using R and Python.
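
As an example of the kind of cleansing and feature-engineering step this involves, a short pandas sketch follows; the file and column names are illustrative, not from the actual dataset:

    import numpy as np
    import pandas as pd

    df = pd.read_csv("transactions.csv")                          # assumed input file
    df = df.drop_duplicates()
    df["amount"] = df["amount"].fillna(df["amount"].median())     # impute missing values
    df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")
    df["tenure_days"] = (pd.Timestamp.today() - df["signup_date"]).dt.days
    df["log_amount"] = np.log1p(df["amount"])                     # engineered feature
    df["segment"] = pd.cut(df["amount"], bins=[0, 100, 1000, np.inf],
                           labels=["low", "mid", "high"])         # simple binning feature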

Used SAS PROC IMPORT, DATA step, and PROC DOWNLOAD procedures to extract fixed-format flat files and convert them into Teradata tables.

Integrated Confluence with JIRA to track project milestones and tasks, improving project efficiency by 20%.

Led technical integration projects, ensuring seamless connectivity and functionality across diverse systems.

Leveraged UNIX tools and shell scripting to automate tasks, enhance system performance, and manage files and directories.

Proficient in a wide range of programming languages and data analysis tools including SPSS, Matlab, VBA, and C++, which have been instrumental in data preprocessing, analysis, and post-analysis tasks.

Proficient in Microsoft Office applications, including Word, Excel, PowerPoint, and Outlook, used for data management, report writing, presentation creation, and email communication.

Leveraged GCP data technologies, including Google Cloud and BigQuery, and worked with Snowflake for efficient cloud-based data storage and analysis.

Utilized data intelligence and visualization platforms like Looker, Google Data Studio, and Tableau to create dynamic visualizations and reports, enabling stakeholders to gain insights from complex data sets.

Expanded data visualization skills to include proficiency with Datastudio and Qliksense, complementing existing expertise with Tableau and Power BI.

Developed skills in social reporting to track and analyze social media performance, and applied basic research/survey methodology to gather and analyze user feedback.

Cultivated understanding of HTML, CSS, and JavaScript, enhancing the ability to work closely with web development teams and troubleshoot web analytics tracking issues.

Developed data extraction and analysis tools using C#, reducing the time taken to process data by 50%.

Used Pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, NLTK, and spaCy in Python for developing various machine learning algorithms.

Developed, optimized, standardized, and implemented data science and machine learning solutions at scale in data pipelines and distributed systems (e.g., the Hadoop and Kubernetes ecosystems).

Used Data Quality validation techniques to validate Critical Data elements (CDE) and identified many anomalies. Extensively worked on statistical analysis tools and adept at writing code in Advanced Excel, R and Python.

Involved in Data Modeling using Star Schema, Snowflake Schema.

Enforced model validation using test and validation sets via K-fold cross-validation and statistical significance testing.
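
A minimal example of the K-fold validation pattern described above, shown here with scikit-learn on a synthetic dataset rather than the project data:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import KFold, cross_val_score

    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    cv = KFold(n_splits=5, shuffle=True, random_state=42)            # 5-fold cross-validation
    scores = cross_val_score(RandomForestClassifier(random_state=42), X, y, cv=cv)
    print("Fold accuracies:", scores.round(3), "mean:", scores.mean().round(3))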

Built multi-layer neural networks in Python using the scikit-learn, Theano, TensorFlow, and Keras packages to implement machine learning models, exported them to protobuf, and performed the integration jobs.

Analyzed complex business information, translated requirements into functional specifications, designed campaign pulls (Complex SQL scripting) from the Database (Cursors- Art Type splits, Control pull for promos and trigger campaigns).

Calculated Hadoop storage requirements for the project, deployed the cluster in the development environment, and installed various Hadoop ecosystem components.

Utilized Microsoft Azure for cloud data storage and management, improving data retrieval efficiency by 25%.

Wrote MapReduce and Sqoop programs to ingest data from mainframes and MySQL databases.

Designed easy-to-follow visualizations using Tableau and published dashboards on web and desktop platforms.

Audited monthly response analysis report with the automated Cognos report and standardized the process.

Environment: Python (scikit-learn, Theano, TensorFlow, Keras, NumPy, SciPy, Pandas, Matplotlib, statistics packages), SAS (PROC IMPORT, DATA step, PROC DOWNLOAD), AWS, Star Schema, Snowflake Schema, Hadoop, Cognos, SQL, Teradata, SDLC-Agile/Scrum.

Role: Data Analyst

Client: Sevone, Markham, Ontario July 2020 – Jan 2022

Project Description: The project involved building a data warehouse for PLM (Product Lifecycle Management), which carries data from the beginning of product design through product shipment to different stores. I worked on integrating PLM data from front-end tools into the back-end database, using Informatica PowerCenter as the data integration and loading tool.

Responsibilities:

Involved in the analysis, specification, design, implementation, and testing phases of the Software Development Life Cycle (SDLC) and used Agile methodology for application development.

Worked on the Python OpenStack API and used Python scripts to update content in the database and manipulate files.

Utilized SQL and data modeling tools (e.g., ERwin, PowerDesigner) for data definition and manipulation.

Developed and managed conceptual, logical, and physical data models to support new and existing applications.

Managed and analyzed data within Transport Management Systems (TMS), ensuring accurate and timely reporting of transportation metrics.

Collaborated with logistics and supply chain teams to optimize routing and scheduling through Transport Management Systems (TMS), resulting in a 10% reduction in transportation costs.

Collaborated with Data Analysts, data architects, and data scientists to ensure data model alignment with business requirements.

Led in the creation of a data warehouse model, resulting in a 20% reduction in report generation time.

Monitored KPIs and isolated causes of unexpected variances and trends to orchestrate expert-level data visualizations, and presented recommendations to improve business blueprints.

Imported SAS data files into Python (Pandas) to test the initial phase of the project.

Dealt with data ambiguity and leveraged lazy evaluation in PySpark for code optimization.

Created Spark DataFrames and performed quantitative analysis on them.
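
The two points above can be illustrated with a short PySpark sketch; the input path and column names are hypothetical. The filter and groupBy transformations stay lazy until the show() action triggers execution:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("plm-analysis").getOrCreate()

    df = spark.read.parquet("plm_events.parquet")        # assumed input path
    # Transformations are lazy: Spark only builds an execution plan here.
    summary = (df.filter(F.col("status") == "shipped")
                 .groupBy("product_line")
                 .agg(F.count("*").alias("shipments"),
                      F.avg("cycle_time_days").alias("avg_cycle_time")))
    summary.show()                                       # action: triggers the computation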

Designed visualizations of the core model parameters using ggplot2, Seaborn, and Matplotlib.

Developed proof-of-concept deep learning models in Python to predict outcomes for various datasets.

Worked directly with internal and external teams and account leadership to track and report on the performance.

Understood information needs and extracted data from a variety of sources to create analytics deliverables that met those needs.

Designed a data warehouse using Hive, created and managed Hive tables in Hadoop.

Applied partitioning and bucketing concepts in Hive and designed managed and external tables in Hive to optimize performance.

Designed SSIS packages to extract, transform, and load (ETL) data from flat files to SQL Server using Business Intelligence Development Studio.

Created mappings, tasks, and workflows to accomplish these ETL processes. Also used Informatica to manage data quality and master data management.

Developed reports, prepared analysis, and made predictions based on data and analysis.

Prepared recommendations for decisions based on data and analysis.

Collaborative with exceptional skills in communication, coordination, teamwork, and managing conflicts effectively.

Skilled in building data warehouses and creating business intelligence solutions using Databricks and Gen2.

Experience in managing and reviewing Hadoop log files.

Designed, Developed and Deployed reports in MS SQL Server Environment using SSRS.

Utilized Informatica PowerCenter to extract, transform, and load data from various sources into a data warehouse.

Developed and maintained ongoing data exploratory analyses against internal and client-provided data.

Intellectual curiosity, entrepreneurial drive, innovative structured thinking, and problem-solving skills.

Built predictive models on large datasets by utilizing advanced statistical modeling, machine learning, or other data mining techniques using Python.

Analyzed large data sets, synthesized quantitative results, derived business implications, and made clear recommendations.

Wrote stored procedures to insert and clean data in the data warehouse tables, and fired triggers upon load completion to log all loaded tables with job name and load date.

Environment: Python, Spark, Anaconda Cloud, GitHub, MS SQL, SSIS, SSRS, SAS data files, Hadoop, Pandas, NumPy, Seaborn, Matplotlib, Jupyter Notebook, Informatica, Tableau, TensorFlow, PostgreSQL.

Role: Data Analyst

Client: Sonata, Hyderabad, India June 2018 – May 2020

Project Description: The client platform enables organizations to monitor shipping profiles and billing activities on a unified portal. This project handled all data movements to and from Teradata and the big data analytics platforms through a standardized data integration architecture, using common design patterns and reusable components managed through a common intake and management process. A data warehouse consolidated all the data sources that functional groups use to compute and standardize KPIs. The work also covered multi-cloud (Azure AKS Kubernetes, GKE Kubernetes, serverless) data processing pipeline architectures, SRE (Site Reliability Engineering), security practice implementation, and preparation of end-to-end documentation and architecture designs.

Responsibilities:

Ensured timely and accurate delivery of data, gathered requirements for new claims, and designed solutions in the form of scheduled reports and data feeds.

Supported report development, coding, and operational changes as requested by internal business unit stakeholders and process owners.

Researched, identified, and interpreted trends and patterns in complex data sets using advanced statistics and data mining techniques.

Visualized the categorical and continuous features via mean response, histogram and bar plot using Matplotlib and Seaborn packages.

Monitored incoming data to ensure Russell Investments receives timely delivery and accurate data.

Conducted data analysis when necessary to determine root cause of data inconsistencies.

Acquired and automated additional data sources (internal and external) as needed to improve operating efficiency or support business and product needs.

Applied dimensionality reduction techniques such as PCA and t-SNE to reduce correlation between features and the high dimensionality of the data, making better use of time and storage.
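
A minimal PCA sketch of this dimensionality-reduction step, run on standardized synthetic data rather than the claims data:

    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    X, _ = make_classification(n_samples=500, n_features=50, random_state=0)
    X_scaled = StandardScaler().fit_transform(X)     # PCA is sensitive to feature scale
    pca = PCA(n_components=0.95)                     # keep enough components for 95% of variance
    X_reduced = pca.fit_transform(X_scaled)
    print(X.shape, "->", X_reduced.shape)
    print("Explained variance retained:", pca.explained_variance_ratio_.sum().round(3))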

Analyzed claims using NLP and worked on NLTK package for application development.

Utilized stemming, lemmatization, tokenization, TF-IDF, Bag of Words, and Word2Vec to extract the most relevant and frequent words from the claim documents.
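
A short sketch of that text pipeline, combining NLTK lemmatization with scikit-learn TF-IDF; the two claim texts below are made up for illustration:

    import nltk
    from nltk.stem import WordNetLemmatizer
    from sklearn.feature_extraction.text import TfidfVectorizer

    nltk.download("wordnet", quiet=True)             # lemmatizer lookup data

    claims = ["Claimant reported water damage to the basement",
              "Vehicle damaged in a rear-end collision"]   # illustrative documents

    lemmatizer = WordNetLemmatizer()
    def lemma_tokenizer(text):
        # simple whitespace tokenization; lemmatize each lower-cased alphabetic token
        return [lemmatizer.lemmatize(tok) for tok in text.lower().split() if tok.isalpha()]

    vectorizer = TfidfVectorizer(tokenizer=lemma_tokenizer, token_pattern=None)
    tfidf = vectorizer.fit_transform(claims)
    print(tfidf.shape, sorted(vectorizer.vocabulary_)[:8])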

Employed ensemble learning techniques such as Random Forests, AdaBoost, and Gradient Boosting to improve model performance by 15%.

Employed K-Fold cross-validation to test and verify the model accuracy.

Flagged fake claims and feedback using Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) models built with Keras.
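
A minimal Keras sketch of such an LSTM classifier; the vocabulary size, sequence length, and the randomly generated training data below are placeholders rather than the actual project inputs:

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    vocab_size, max_len = 10000, 100                 # assumed vocabulary size and sequence length
    model = keras.Sequential([
        layers.Embedding(vocab_size, 64),            # integer-encoded tokens -> dense vectors
        layers.LSTM(32),
        layers.Dense(1, activation="sigmoid"),       # probability that a claim is fake
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    # Dummy integer-encoded claim sequences and labels, for illustration only
    X = np.random.randint(1, vocab_size, size=(256, max_len))
    y = np.random.randint(0, 2, size=(256,))
    model.fit(X, y, epochs=2, batch_size=32, validation_split=0.2)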

Used Spark and big data technologies such as PySpark, Spark SQL, and Spark MLlib extensively, and developed ML algorithms using the MLlib libraries on AWS.

Ensured effective and up-to-date data management practices with a sincere focus on data quality.

Networked with other key data management associates globally/across Russell, evangelizing and implementing common best practices wherever possible.

Created and maintained documentation such as workflow diagrams, processes, and desk notes.

Prepared a dashboard and story in Tableau showing the benchmarks and a summary of the model's measures for the organization.

Migrated developed code from lower to higher environments (DEV/INT/UAT/CERT/PROD) and prepared release notes and snapshots for each environment.

Environment: Python, NumPy, Pandas, Seaborn, Matplotlib, Scikit-Learn, TensorFlow, Tableau, PostgreSQL, PySpark, SparkSQL.


