Professional Summary:
Over **+ years of overall IT experience as a Data Scientist/Machine Learning practitioner and in Data Warehouse applications using Informatica, Oracle, and Teradata
Proficient in advising on the use of data for compiling personnel and statistical reports and preparing personnel action documents
Experienced in identifying patterns within data, analyzing data, and interpreting results
Strong ability to analyze data sets for signals, patterns, and ways to group data to answer questions and solve complex data puzzles
Skilled in Advanced Regression Modeling, Time Series Analysis, Statistical Testing, Correlation, Multivariate Analysis, Forecasting, Model Building, Business Intelligence tools and application of Statistical Concepts
Proficient in: Data Acquisition, Storage, Analysis, Integration, Predictive Modeling, Logistic Regression, Decision Trees, Data Mining Methods, Forecasting, Factor Analysis, Cluster Analysis, Neural Networks and other advanced statistical and econometric techniques
Adept at writing R code and T-SQL scripts to manipulate data for data loads and extracts
Proficient in data entry, data auditing, creating data reports & monitoring data for accuracy
Able to perform web search and data collection, web data mining, extraction of data from websites, data entry, and data processing
Strong experience with R visualization, QlikView, and Tableau for data analytics and graphical visualization
Extensive experience with major statistical analysis tools such as R, SQL, SAS, and MATLAB
Strong knowledge of all phases of the SDLC (Software Development Life Cycle), from analysis, design, development, and testing through implementation and maintenance, with timely delivery against deadlines
Good knowledge and understanding of data mining techniques such as classification, clustering, regression, and random forests
Extensive experience creating MapReduce jobs, SQL on Hadoop using Hive, ETL using Pig scripts, and Flume for transferring unstructured data to HDFS
Strong Oracle/SQL Server programming skills, with experience in working with functions, packages and triggers
Experience in all phases of data warehouse development, from requirements analysis and design through development, testing, and post-production support
Strong in-depth knowledge of data analysis, data quality, and source system analysis
Independent self-starter and enthusiastic team player with strong adaptability to new technologies
Experience in Big Data technologies using Hadoop, Sqoop, Pig, and Hive
Experience in writing Hive and Unix shell scripts
Excellent track record of delivering quality software on time to meet business priorities
Developed Data Warehouse/Data Mart systems, using various RDBMS (Oracle, MS-SQL Server, Mainframes, Teradata and DB2)
Highly proficient in Informatica Power Center and Power Exchange, with exposure to Informatica Data Services
Technical Skills
Programming Skills
R language, Python, PL/SQL
Databases
Teradata 12/13/14, Oracle 9i/10g/11g/12c, MySQL, SQL Server 2000/2005, MS Access, DB2, Hadoop (HDFS)
Libraries
Scikit-learn, Keras, TensorFlow, NumPy, Pandas, NLTK, Gensim, Matplotlib, ggplot2
Operating Systems
Windows, Unix, Linux
Web Related
ASP.NET, VBScript, HTML, DHTML, Java, JavaScript
Tools & Utilities
Teradata Parallel Transporter, Aprimo 6.1/8.X, BTEQ, SQL Assistant, Toad, SQL Navigator, SQL*Loader, $U, HP Quality Center, PVCS, DataFlux, UC4, Control-M
Domain Knowledge
Banking, Finance, Insurance, Health Care, Energy
Professional Experience
Safeway, Pleasanton, CA Jan 2017 – Present
Sr. Data Scientist
Project Description:
Albertsons Merger
United Markets is an independent organization owned by Albertsons Companies. The Albertsons marketing team needed visibility into sales, promotional, and marketing data to be merged with the rest of the organization. EDW data from United Markets was extracted, mapped, and integrated with Albertsons data.
Responsibilities:
Focused on customer segmentation through a machine learning and statistical modeling effort, building predictive models and generating data products to support segmentation
Developed a pricing model for bundled product and service offerings to optimize and predict gross margin
Built a price elasticity model for bundled product and service offerings
Developed a predictive causal model using annual failure rate and standard cost basis for the new bundled service offering
Designed and developed analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment, including product recommendation and allocation planning
Partnered with sales and marketing and collaborated with cross-functional teams to frame and answer important data questions
Prototyped and experimented with ML/DL algorithms and integrated them into production systems for different business needs
Worked on multiple datasets containing 2 billion values of structured and unstructured data about web application usage and online customer surveys
Gained hands-on experience with the Amazon Redshift platform
Designed, built, and deployed a set of Python modeling APIs for customer analytics that integrate multiple machine learning techniques for user behavior prediction and support multiple marketing segmentation programs
Segmented customers based on demographics using K-means clustering; a minimal sketch follows this list
Explored different regression and ensemble models in machine learning to perform forecasting
Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user making a referral
Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau, and Power BI
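As an illustration of the K-means segmentation noted above, a minimal sketch in Python with scikit-learn; the input file, feature names, and cluster count are assumptions for illustration, not the project's actual inputs:

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical demographic features; the project's real EDW columns are not shown here
df = pd.read_csv("customers.csv")
features = df[["age", "household_size", "annual_income", "visits_per_month"]]

# Standardize so no single feature dominates the distance metric
X = StandardScaler().fit_transform(features)

# k=5 is illustrative; in practice k would be chosen via the elbow method or silhouette scores
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
df["segment"] = kmeans.fit_predict(X)

# Profile each segment by its mean demographics
print(df.groupby("segment")[features.columns].mean())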
Environment: MS SQL Server, R/R Studio, Python, Redshift, MS Excel, Power BI, Tableau, T-SQL, ETL, MS Access, XML, MS Office 2007, Outlook.
Sutter Health, Sacramento, CA Sep 2015 – Dec 2016
Data Scientist
Project Description:
AQRS (Ambulatory Quality Reporting System) is used to report on the quality of health care, the costs incurred, and the preventive medicine used by doctors and physicians. It covers both commercial and Medicare patients.
Responsibilities:
Analyzed and prepared data, identifying patterns in datasets by applying historical models
Collaborated with senior data scientists to build an understanding of the data
Performed data manipulation, data preparation, normalization, and predictive modeling
Improved efficiency and accuracy by evaluating models in R
Presented the existing model to stakeholders, providing insights into the model using different visualization methods in Power BI
Used R and Python programming to improve the model
Upgraded the models to improve the product
Performed data cleaning, applying backward/forward-filling methods to the dataset to handle missing values; see the sketch after this list
Under the supervision of a Sr. Data Scientist, performed data transformations to rescale and normalize variables
Developed and validated a neural network classification model to predict the target label
Applied boosting methods to the predictive model to improve its performance
Presented dashboards to senior management for further insights using Power BI
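A minimal sketch of the backward/forward filling described above, in Python with pandas; the data and column are hypothetical stand-ins, since the actual AQRS schema is not shown here:

import numpy as np
import pandas as pd

# Hypothetical time-ordered measurements with gaps
df = pd.DataFrame(
    {"measure": [98.6, np.nan, np.nan, 99.1, np.nan, 98.9]},
    index=pd.date_range("2016-01-01", periods=6, freq="D"),
)

# Forward fill propagates the last known value into each gap;
# a trailing backward fill covers any leading gaps that remain
df["measure"] = df["measure"].ffill().bfill()
print(df)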
Environment: R/R Studio, Python, SQL Enterprise Manager, GitHub, Microsoft Power BI, Outlook.
Bank of the West, CA Sep 2014 – Aug 2015
Data Scientist
Project Description:
Customer Relationship Management (CRM), implemented with Teradata Aprimo, handles the bank's contact with its customers. CRM software supports these processes by storing information about current and prospective customers. The interface helps improve services provided directly to customers and puts the information in the system to use for targeted marketing and sales.
Responsibilities:
Used various approaches to collect business requirements and worked with business users on ETL application enhancements, conducting JRD sessions to meet the job requirements
Designed data profiles for processing, including running PL/SQL queries and using R for data acquisition and data integrity, consisting of dataset comparisons and dataset schema checks
Performed exploratory data analysis in R, including calculation of descriptive statistics, detection of outliers, assumption testing, and factor analysis; a sketch follows this list
Conducted data/statistical analysis and generated transaction performance reports on a monthly and quarterly basis for all transactional data from the U.S., Canada, and Latin America markets using SQL Server and BI tools such as Reporting Services and Integration Services (SSRS and SSIS)
Used R to generate regression models to provide statistical forecasting
Applied clustering algorithms such as K-means to categorize customers into groups
Implemented Key Performance Indicator (KPI) Objects, Actions, Hierarchies and Attribute Relationships for added functionality and better performance of SSAS Warehouse
Used Tableau to design various charts and tables for data analysis and to create analytical dashboards that showcase the data to managers
Performed data management, including creating SQL Server Reporting Services reports to develop reusable code and an automatic reporting system, and designed user acceptance tests to give end users an opportunity to provide constructive feedback
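Although this analysis was done in R, an equivalent minimal sketch of the descriptive statistics and IQR-based outlier detection in Python; the sample values are made up for illustration:

import pandas as pd

# Hypothetical transaction amounts; the real market data is not shown here
amounts = pd.Series([120.0, 95.5, 130.2, 88.0, 4500.0, 110.7, 102.3])

# Descriptive statistics: count, mean, std, quartiles, min/max
print(amounts.describe())

# IQR rule: flag points beyond 1.5 * IQR from the quartiles as outliers
q1, q3 = amounts.quantile([0.25, 0.75])
iqr = q3 - q1
outliers = amounts[(amounts < q1 - 1.5 * iqr) | (amounts > q3 + 1.5 * iqr)]
print(outliers)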
Environment: R/R Studio, SAS, Oracle Database 11g, Oracle BI tools, Tableau, MS-Excel
PayPal, CA Feb 2014 – Aug 2014
ETL and Teradata Developer
Project Description:
GCE (Global Credit Expansion) expands BillMeLater services globally. The project aimed at consolidating multiple source systems into a single source of truth for BI reporting and decision support systems.
Responsibilities:
Performed analysis, design, development, testing, and deployment of Informatica workflows, BTEQ scripts, and Python and shell scripts.
Performed source system analysis, provided input to data modeling, and developed ETL design documents per business requirements.
Designed, developed, and tested the various mappings, mapplets, worklets, and workflows involved in the ETL process.
Developed and integrated data quality measures into the ETL framework using Informatica Data Quality (IDQ).
Performed data profiling using IDQ as input into ETL design and data modeling.
Extensively used ETL to transfer data from different source systems and load it into the target database.
Developed Informatica mappings with the full collection of sources, targets, and transformations using Informatica Designer.
Extracted data from various sources across the organization (Oracle, MySQL, SQL Server, and flat files) and loaded it into the staging area.
Environment: Teradata, Oracle, PL/SQL, MySQL, Informatica Power Center, Power Exchange, IDQ, OCL Tool, UC4, Control-M, ER Viewer, Business Intelligence, Windows, HP Quality Center, Unix, Linux.
Maryland State, Annapolis, MD June 2010 – Jan 2014
ETL Developer
Project Description:
The Modernized Integrated Tax System (MITS) enables the Maryland state department of audit control to run analytics on tax filings. MITS extracts tax filing data, filed both electronically and manually, from multiple sources for individuals and organizations. Data is integrated in the EDW and fed to downstream applications in other state departments.
Responsibilities:
Developed low-level mappings for tables and columns from source to target systems.
Wrote and optimized initial data load scripts using Informatica and database utilities.
Used partitions to extract data from the source and load it into Teradata using TPT load, with proper load balancing on the Teradata server.
Wrote complex BTEQ scripts to incorporate business functionality while transforming data from staging into third normal form.
Participated in the Teradata upgrade project from TD12 to TD13.10, conducting regression testing.
Environment: Teradata, Oracle, PL/SQL, MySQL, Informatica Power Center, SSIS, SSRS, ER Viewer, Windows, HP Quality Center, UNIX.
CareFirst (Blue Cross Blue Shield), Owings Mills, MD Dec 2008 – Jun 2010
Senior ETL Developer
Project Description:
The project scope was to build a departmental data mart for CareFirst Human Resources and Administration. This data mart consolidates data from PeopleSoft and external vendors. Employee health plan information is integrated from PeopleSoft for all employees on the CareFirst plan as well as other offered plans. External data is integrated using file extracts on a daily basis.
Responsibilities:
Created Uprocs, sessions, and management units to schedule jobs using $U.
Conducted source system analysis and developed ETL design documents to meet business requirements.
Tuned Teradata SQL queries and resolved performance issues caused by data skew and spool space problems.
Developed flat files from Teradata using FastExport and BTEQ to disseminate to downstream dependent systems.
Environment: Teradata, Oracle, PL/SQL, Informatica Power Center, $U, Business Objects, SSIS, Windows XP, UNIX Shell scripting.
Scott & White Hospital, Temple, TX Jan 2008 – Nov 2008
ETL Developer
Project Description:
This project was executed to develop an enterprise knowledge data warehouse intended to deliver the right information to the right people in the underwriting organization. The system maintains claims, payments, and financial information.
Responsibilities:
Documented functional specifications and other artifacts used for the development of ETL mappings
Designed, developed, and tested the various mappings, mapplets, worklets, and workflows involved in the ETL process.
Optimized Performance of existing Informatica workflows.
Fixed invalid mappings, tested stored procedures and functions, and performed unit and integration testing of Informatica sessions, batches, and the target data.
Environment: Oracle, SQL Server, DB2, Informatica Power Center, Erwin, Cognos, XML, Windows, Unix
XCEL Energy, Minnesota, MN Oct 2006 – Dec 2007
ETL Developer
Project Description:
This project integrated various data marts targeting specific business processes, including Marketing, Generation, Transmission, and Distribution. The data warehouse was designed using Erwin, adopting a Star Schema methodology. Cognos was used to analyze business decisions and to build long-term strategic plans.
Responsibilities:
Developed various mappings with the full collection of sources, targets, and transformations using Informatica Designer
Extracted data from various sources across the organization (Oracle, SQL Server, and flat files) and loaded it into the staging area
Created and scheduled sessions and batch processes to run on demand, on schedule, or only once using Informatica Workflow Manager, and monitored data loads using the Workflow Monitor
Environment: Oracle, SQL Server, PL/SQL, Informatica Power Center, Erwin, Cognos, Windows, UNIX