
Data Sql Server

Location:
Pleasanton, CA, 94588
Salary:
$80/hr
Posted:
October 23, 2017


Professional Summary:

Over **+ years of overall IT experience in Data Science/Machine Learning and Data Warehouse applications using Informatica, Oracle, and Teradata

Proficient in advising on the use of data for compiling personnel and statistical reports and preparing personnel action documents

Experienced in identifying patterns within data, analyzing data, and interpreting results

Strong ability to analyze data sets for signals, patterns, and groupings that answer questions and solve complex data puzzles

Skilled in Advanced Regression Modeling, Time Series Analysis, Statistical Testing, Correlation, Multivariate Analysis, Forecasting, Model Building, Business Intelligence tools and application of Statistical Concepts

Proficient in: Data Acquisition, Storage, Analysis, Integration, Predictive Modeling, Logistic Regression, Decision Trees, Data Mining Methods, Forecasting, Factor Analysis, Cluster Analysis, Neural Networks and other advanced statistical and econometric techniques

Adept at writing R code and T-SQL scripts to manipulate data for data loads and extracts

Proficient in data entry, data auditing, creating data reports & monitoring data for accuracy

Experienced in web search and data collection, web data mining, extracting data from websites, data entry, and data processing

Strong experience with R visualization packages, QlikView, and Tableau for data analytics and graphical visualization

Extensive experience with major statistical analysis tools such as R, SQL, SAS, and MATLAB

Strong knowledge of all phases of the SDLC (Software Development Life Cycle), from analysis, design, development, and testing through implementation and maintenance, with timely delivery against deadlines

Good knowledge and understanding of data mining techniques like classification, clustering, regression techniques and random forests

Extensive experience with creating MapReduce jobs, SQL on Hadoop using Hive and ETL using PIG scripts, and Flume for transferring unstructured data to HDFS

Strong Oracle/SQL Server programming skills, with experience in working with functions, packages and triggers

Experience in all phases of Data warehouse development from Requirements, analysis, design, development, testing and post production support

Strong in-depth knowledge of data analysis, data quality, and source system analysis

Independent, Self-starter, enthusiastic team player with strong adaptability to new technologies

Experience in Big Data Technologies using Hadoop, Sqoop, Pig and Hive.

Experience in writing Hive and Unix shell scripts

Excellent track record in delivering quality software on time to meet the business priorities.

Developed Data Warehouse/Data Mart systems, using various RDBMS (Oracle, MS-SQL Server, Mainframes, Teradata and DB2)

Highly proficient with Informatica PowerCenter and PowerExchange, with exposure to Informatica Data Services

Technical Skills

Programming Skills

R language, Python, PL/SQL

Databases

Teradata 12/13/14, Oracle 9i/10g/11g/12c, MySQL, SQL Server 2000/2005, MS Access, DB2, Hadoop (HDFS)

Libraries

scikit-learn, Keras, TensorFlow, NumPy, pandas, NLTK, Gensim, Matplotlib, ggplot2

Operating Systems

Windows, Unix, Linux

Web Related

ASP.NET, VBScript, HTML, DHTML, Java, JavaScript

Tools & Utilities

Teradata Parallel Transporter, Aprimo 6.1/8.x, BTEQ, SQL Assistant, Toad, SQL Navigator, SQL*Loader, $U, HP Quality Center, PVCS, DataFlux, UC4, Control-M

Domain Knowledge

Banking, Finance, Insurance, Health Care, Energy

Professional Experience

Safeway, Pleasanton, CA Jan 2017 – Present

Sr. Data Scientist

Project Description:

Albertson’s Merger

United Markets is an independent organization owned by Albertsons Companies. The Albertsons marketing team needed visibility into sales, promotional, and marketing data to be merged with the rest of the organization. EDW data from United Markets was extracted, mapped, and integrated with Albertsons data.

Responsibilities

The project focused on customer segmentation using machine learning and statistical modeling, including building predictive models and generating data products to support customer segmentation

Developed a pricing model for various bundled product and service offerings to optimize and predict gross margin

Built a price elasticity model for various bundled product and service offerings

Developed a predictive causal model using annual failure rate and standard cost basis for the new bundled service offering

Designed and developed analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping through production deployment, product recommendation, and allocation planning

Partnered and collaborated with the Sales and Marketing teams and other cross-functional teams to frame and answer important data questions

Prototyped and experimented with ML/DL algorithms and integrated them into production systems for different business needs

Worked on multiple datasets containing 2 billion values of structured and unstructured data on web application usage and online customer surveys

Gained hands-on experience with the Amazon Redshift platform

Designed, built, and deployed a set of Python modeling APIs for customer analytics that integrate multiple machine learning techniques for user behavior prediction and support multiple marketing segmentation programs

Segmented customers based on demographics using K-means clustering (see the sketch following this list)

Explored different regression and ensemble models in machine learning to perform forecasting

Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user making a referral

Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau, and Power BI
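
A minimal sketch of the K-means segmentation step referenced above, assuming a hypothetical customer table; the file name and demographic columns are illustrative, not from the actual project:

# Minimal K-means customer-segmentation sketch (illustrative only).
# The CSV file and demographic column names below are hypothetical.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

customers = pd.read_csv("customer_demographics.csv")             # hypothetical input
features = customers[["age", "household_size", "annual_spend"]]  # illustrative columns

# Scale features so no single demographic dominates the distance metric
scaled = StandardScaler().fit_transform(features)

# Fit K-means with an assumed k; in practice k is chosen via elbow or silhouette analysis
kmeans = KMeans(n_clusters=5, random_state=42, n_init=10)
customers["segment"] = kmeans.fit_predict(scaled)

print(customers.groupby("segment").size())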

Environment: MS SQL Server, R/RStudio, Python, Redshift, MS Excel, Power BI, Tableau, T-SQL, ETL, MS Access, XML, MS Office 2007, Outlook.

Sutter Health, Sacramento, CA Sep 2015 – Dec 2016

Data Scientist

Project Description:

AQRS (Ambulatory Quality Reporting System) is used to report on the quality of health care, the costs incurred, and the preventive medicine used by doctors and physicians. It covers both commercial and Medicare patients.

Responsibilities

Analyzed and prepared data, identifying patterns in the dataset by applying historical models

Collaborated with senior data scientists to build an understanding of the data

Performed data manipulation, data preparation, normalization, and predictive modeling

Improved efficiency and accuracy by evaluating the model in R

Presented the existing model to stakeholders, providing insights into the model using different visualization methods in Power BI

Used R and Python programming to improve the model

Upgraded the models to improve the product

Performed data cleaning, applying backward and forward filling methods to the dataset to handle missing values (see the sketch following this list)

Under the supervision of a Sr. Data Scientist, performed data transformations for rescaling and normalizing variables

Developed a predictive model and validated a neural network classification model to predict the target label

Applied boosting methods to the predictive model to improve its efficiency

Presented dashboards built in Power BI to senior management for additional insights
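
The backward/forward filling step above can be illustrated with a minimal pandas sketch; the DataFrame below is hypothetical and only shows the fill order used for missing values:

# Minimal backward/forward fill sketch for handling missing values (illustrative only).
import numpy as np
import pandas as pd

# Hypothetical time-indexed metrics with gaps (made-up data)
df = pd.DataFrame(
    {"metric_a": [1.0, np.nan, np.nan, 4.0], "metric_b": [np.nan, 2.0, np.nan, 3.0]},
    index=pd.date_range("2016-01-01", periods=4, freq="D"),
)

# Forward fill carries the last observation forward; backward fill then covers
# any leading gaps that forward filling cannot reach.
filled = df.ffill().bfill()
print(filled)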

Environment: R/RStudio, Python, SQL Enterprise Manager, GitHub, Microsoft Power BI, Outlook.

Bank of the West, CA Sep 2014 – Aug 2015

Data Scientist

Project Description:

Customer Relationship Management (CRM), implemented with Teradata Aprimo, handles the bank's contact with its customers. The CRM software supports these processes by storing information about current and prospective customers. The interface helps improve services provided directly to customers and allows the information in the system to be used for targeted marketing and sales.

Responsibilities:

Collected business requirements through various approaches and worked with business users on ETL application enhancements, conducting JRD sessions to meet the job requirements

Designed data profiles for processing, including running PL/SQL queries and using R for data acquisition and data integrity, covering dataset comparisons and dataset schema checks

Performed exploratory data analysis like calculation of descriptive statistics, detection of outliers, assumptions testing, factor analysis, etc., in R

Conducted data/statistical analysis and generated Transaction Performance Reports on a monthly and quarterly basis for all transactional data from the U.S., Canada, and Latin America markets, using SQL Server and BI tools such as Reporting Services (SSRS) and Integration Services (SSIS)

Used R to build regression models for statistical forecasting (a sketch of the approach follows this list)

Applied clustering algorithms such as K-means to categorize customers into groups

Implemented Key Performance Indicator (KPI) Objects, Actions, Hierarchies and Attribute Relationships for added functionality and better performance of SSAS Warehouse

Used Tableau to design various charts and tables for data analysis and to create analytical dashboards showcasing the data to managers

Performed data management, including creating SQL Server Reporting Services reports to develop reusable code and an automated reporting system, and designed user acceptance tests to give end users an opportunity to provide constructive feedback
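
The regression-based forecasting mentioned in the list above was done in R; a minimal sketch of the same idea, written in Python for brevity and using made-up monthly volumes, is:

# Minimal regression-forecasting sketch (illustrative only; the project itself used R).
# Fits a simple linear trend on monthly transaction volume and extrapolates it forward.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical monthly transaction volumes (illustrative numbers only)
volume = np.array([120, 135, 128, 150, 161, 158, 172, 180])
months = np.arange(len(volume)).reshape(-1, 1)   # time index as the single feature

model = LinearRegression().fit(months, volume)

# Forecast the next three months by extrapolating the fitted linear trend
future = np.arange(len(volume), len(volume) + 3).reshape(-1, 1)
print(model.predict(future))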

Environment: R/R Studio, SAS, Oracle Database 11g, Oracle BI tools, Tableau, MS-Excel

PayPal, CA Feb 2014 – Aug 2014

ETL and Teradata Developer

Project Description:

GCE (Global Credit Expansion) expands BillMeLater services globally. The project aimed at consolidating multiple source systems into a single source of truth for BI reporting and decision support systems.

Responsibilities:

Performed analysis, design, development, testing, and deployment of Informatica workflows, BTEQ scripts, and Python and shell scripts.

Performed source system analysis, provided input to data modeling, and developed ETL design documents per business requirements.

Designed, developed, and tested the various mappings, mapplets, worklets, and workflows involved in the ETL process.

Developed and integrated data quality measures into the ETL framework using Informatica Data Quality (IDQ).

Performed data profiling using IDQ as input to ETL design and data modeling.

Extensively used ETL to transfer data from different source systems and load the data into the target database.

Developed Informatica mappings comprising the full collection of sources, targets, and transformations using Informatica Designer.

Extracted data from various sources across the organization (Oracle, MySQL, SQL Server, and flat files) and loaded it into the staging area.

Environment: Teradata, Oracle, PL/SQL, MySQL, Informatica Power Center, Power Exchange, IDQ, OCL Tool, UC4, Control-M, ER Viewer, Business Intelligence, Windows, HP Quality Center, Unix, Linux.

Maryland State, Annapolis, MD June 2010 – Jan 2014

ETL Developer

Project Description:

The Modernized Integrated Tax System (MITS) enables the Maryland State department of audit control to run analytics on tax filings. MITS extracts tax filing data from multiple sources for individuals and organizations, filed both electronically and manually. The data is integrated into the EDW and fed to downstream applications in other state departments.

Responsibilities:

Developed low-level mappings for tables and columns from source to target systems.

Wrote and optimized initial data load scripts using Informatica and database utilities.

Used partitions to extract data from the source and load it into Teradata via TPT load, with proper load balancing on the Teradata server.

Wrote complex BTEQ scripts to incorporate business functionality while transforming data from staging into third normal form.

Participated in the Teradata upgrade project from TD12 to TD13.10, conducting regression testing.

Environment: Teradata, Oracle, PL/SQL, MySQL, Informatica Power Center, SSIS, SSRS, ER Viewer, Windows, HP Quality center, UNIX.

Care First (Blue Cross Blue Shields), Owings Mills, MD Dec 2008 – Jun 2010

Senior ETL Developer

Project Description:

The project scope was to build a departmental data mart for CareFirst Human Resources and Administration. This data mart consolidates data from PeopleSoft and external vendors. Employee health plan information is integrated from PeopleSoft for all employees on the CareFirst plan as well as other offered plans. External data is integrated via file extracts on a daily basis.

Responsibilities:

Created Uprocs, sessions, and management units to schedule jobs using $U.

Conducted source system analysis and developed ETL design documents to meet business requirements.

Tuned Teradata SQL queries and resolved performance issues caused by data skew and spool space constraints.

Developed flat files from Teradata using FastExport and BTEQ to disseminate data to downstream dependent systems.

Environment: Teradata, Oracle, PL/SQL, Informatica Power Center, $U, Business Objects, SSIS, Windows XP, UNIX Shell scripting.

Scott & White Hospital, Temple, TX Jan 2008 – Nov 2008

ETL Developer

Project Description

This project developed an enterprise knowledge data warehouse intended to deliver the right information to the right people in the underwriting organization. The system maintains claims, payments, and financial information.

Responsibilities:

Documented functional specifications and other aspects used for the development of ETL mappings

Designed, developed, and tested the various mappings, mapplets, worklets, and workflows involved in the ETL process.

Optimized the performance of existing Informatica workflows.

Involved in fixing invalid mappings, testing stored procedures and functions, and performing unit and integration testing of Informatica sessions, batches, and target data.

Environment: Oracle, SQL Server, DB2, Informatica Power Center, Erwin, Cognos, XML, Windows, Unix

XCEL Energy, Minnesota, MN Oct 2006 – Dec 2007

ETL Developer

Project Description

This project integrated various data marts targeting specific business processes, including Marketing, Generation, Transmission, and Distribution. The data warehouse was designed in Erwin using a star schema methodology. Cognos was used to analyze business decisions and to build long-term strategic plans.

Responsibilities:

Developed various mappings with the full collection of sources, targets, and transformations using Informatica Designer

Extracted data from various sources across the organization (Oracle, SQL Server, and flat files) and loaded it into the staging area

Created and scheduled sessions and batch processes to run on demand, on schedule, or only once using Informatica Workflow Manager, and monitored data loads using the Workflow Monitor

Environment: Oracle, SQL Server, PL/SQL, Informatica Power Center, Erwin, Cognos, Windows, UNIX


