Kondal **************@*****.*** 469-***-****
Data Scientist
Over 8 years of experience in the design, administration, analysis, and management of Business Intelligence, data warehousing, web-based applications, and databases, with experience across industries such as Retail, Finance, Accounting, Distribution, Logistics, Inventory, Manufacturing, Marketing, Services, Networking, and Engineering.
Experience with the latest BI tools, including Tableau, QlikView dashboard design, and SAS.
Analyze and extract relevant information from large volumes of data to help automate self-monitoring, self-diagnosing, and self-correcting solutions and to optimize key processes.
Experience in data architecture design, development, and maintenance for Windows and Android device applications.
Experience with advanced SAS programming techniques such as PROC SQL (JOIN/UNION), PROC APPEND, PROC DATASETS, and PROC TRANSPOSE.
Highly skilled in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.
Experience in foundational machine learning models and concepts: regression, random forest, boosting, GBM, NNs, HMMs, CRFs, MRFs, deep learning.
Proficient in statistical and general-purpose tools/languages: R, Python, C, C++, Java, SQL, UNIX, the QlikView data visualization tool, and the Anaplan forecasting tool.
Strong data warehousing/ETL experience using Informatica PowerCenter 9.1/8.6.1/8.5/8.1/7.1 client tools (Mapping Designer, Repository Manager, Workflow Manager/Monitor) and server tools (Informatica Server, Repository Server Manager).
Proficient in the integration of various data sources with multiple relational databases such as Oracle 11g/10g/9i, MS SQL Server, DB2, Teradata, and flat files into the staging area, ODS, Data Warehouse, and Data Mart.
Experience in applying predictive modeling and machine learning algorithms to analytical projects.
Developing Logical Data Architecture with adherence to Enterprise Architecture.
Experience designing visualizations in Tableau and publishing and presenting dashboards and storylines on web and desktop platforms.
Proficient in predictive modeling, data mining methods, factor analysis, ANOVA, hypothesis testing, normal distribution, and other advanced statistical and econometric techniques.
Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks (see the illustrative sketch following this summary).
Experienced in the full software development life cycle (SDLC) using Agile and Scrum methodologies.
Skilled in Advanced Regression Modeling, Correlation, Multivariate Analysis, Model Building, Business Intelligence tools and application of Statistical Concepts.
Excellent knowledge in Normalization (1NF, 2NF, 3NF and BCNF) and De-normalization techniques for improved database performance in OLTP, OLAP and Data Warehouse/Data Mart environments.
2+ years of Agile experience in software/data design, development, and deployment, building services and customer support for enterprise applications using Object-Oriented Analysis and Design (OOAD).
Worked on gigabytes of text and image files (2-D and 3-D) to solve real-world problems, visualizing the data and generating reports with Google Data Studio for customer usability.
Good track record of working with complex data sets and translating data into insights to drive key business and product decisions.
Experience with Azure, SQL and Oracle PL/SQL.
Experience working with Amazon Web Services (AWS) products such as S3.
Worked in start-up mode at Aveva, contributing to projects that used Amazon Web Services (AWS) to develop and deploy applications supporting both device and cloud.
Hands-on experience with scripting languages such as Perl, Bash shell, and PHP for automation.
Good understanding of scalable data processing to discover hidden patterns and conduct error analysis in data for financial and statistical modeling.
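As an illustration of the predictive modeling listed above, the following is a minimal sketch, not project code: it uses synthetic data and assumed parameter values to show how two of the listed classifiers could be trained and compared with scikit-learn.

# Minimal illustrative sketch (synthetic data, assumed settings).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for model in (RandomForestClassifier(n_estimators=200, random_state=42),
              LogisticRegression(max_iter=1000)):
    model.fit(X_train, y_train)                        # fit on the training split
    acc = accuracy_score(y_test, model.predict(X_test))
    print(type(model).__name__, round(acc, 3))         # compare held-out accuracy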
Machine Learning
Regression, Classification, Clustering, Association, Simple Linear Regression, Multiple Linear Regression, Polynomial Regression, Decision Trees, Random Forest, Logistic Regression, K-Nearest Neighbors (K-NN), Kernel SVM
R Language skills
Data Preprocessing, Web Scraping, Data Extraction, dplyr, ggplot2, apply functions, Statistical Analysis, Predictive Analysis, ggplotly, rvest, Data Visualization.
Frameworks
Shogun, Accord Framework/AForge.net, Scala, Spark, Cassandra, DL4J, ND4J, Scikit-learn
Development Tools
Cassandra, DL4J, ND4J, scikit-learn, Shogun, Accord Framework/AForge.NET, Mahout, MLlib, H2O, Cloudera Oryx, GoLearn, Apache Singa.
Modelling Tools
CA Erwin Data Modeler 7.1/4, Microsoft Visio 6.0, Sybase PowerDesigner 16.5.
Version Control
TFS, Microsoft Visual SourceSafe, Git, NUnit, MSUnit
Software Packages
MS-Office 2003/ 07/10/13, MS Access, Messaging Architectures.
OLAP/BI/ETL Tools
Business Objects 6.1/XI, MS SQL Server 2008/2005 Analysis Services (MS OLAP, SSAS), Integration Services (SSIS), Reporting Services (SSRS), Performance Point Server (PPS), Oracle 9i OLAP, MS Office Web Components (OWC11), DTS, MDX, Crystal Reports 10, Crystal Enterprise 10 (CMC)
Web Technologies
Windows API, Web Services, Web API (RESTful), HTML5, XHTML, CSS3, AJAX, XML, XAML, MSMQ, Silverlight, Kendo UI.
Web Servers
IIS 5.0, IIS 6.0, IIS 7.5, IIS ADMIN.
Operating Systems
Windows 8/XP/NT/95/98/2000/2008/2012, Android SDK.
Databases
SQL Server 2014/2012/2008/2005/2000, MS Access, Oracle 11g/10g/9i, Teradata; Big Data: Hadoop, Mahout, MLlib, H2O, Cloudera Oryx, GoLearn.
Database Tools
SQL Server Query Analyzer.
Wells Fargo, Charlotte, NC (Data Scientist) Oct ‘16 to present
Responsibilities:
Responsible for applying machine learning techniques (regression/classification) to predict outcomes.
Responsible for the design and development of advanced R/Python programs to prepare, transform, and harmonize data sets for modeling.
Identifying and executing process improvements, hands-on in various technologies such as Oracle, Informatica, and Business Objects.
Designed the prototype of the data mart and documented possible outcomes from it for end users.
Involved in business process modeling using UML
Developed and maintained the data dictionary to create metadata reports for technical and business purposes.
Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS (see the illustrative sketch at the end of this list).
Interaction with Business Analyst, SMEs and other Data Architects to understand Business needs and functionality for various project solutions.
Created SQL tables with referential integrity and developed queries using SQL, SQL*PLUS and PL/SQL.
Involved in data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.
Performed database performance tuning, including indexing, optimizing SQL statements, and monitoring the server.
Wrote simple and advanced SQL queries and scripts to create standard and ad hoc reports for senior managers.
Collaborated on the source-to-target data mapping document and the data quality assessments for the source data.
Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
Participated in Business meetings to understand the business needs & requirements.
Prepared the ETL architecture and design document covering ETL architecture, SSIS design, and the extraction, transformation, and loading of Duck Creek data into the dimensional model.
Provided technical and requirements guidance to team members for ETL/SSIS design.
Designed the ETL framework and led its development.
Designed logical and physical data models using the MS Visio 2003 data modeling tool.
Participated in stakeholder meetings to understand the business needs and requirements.
Participated in solution architecture meetings and provided guidance on dimensional data modeling design.
Coordinate and communicate with technical teams for any data requirements.
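The Hive/MapReduce ingestion bullet above is illustrated by the minimal sketch below; it is not project code, it uses PySpark rather than raw MapReduce, and the paths, table names, and column names are assumptions.

# Minimal illustrative sketch (hypothetical paths, table and column names):
# read raw source files, apply simple transformations, and persist the result
# as a Hive table backed by HDFS.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("source_ingest")
         .enableHiveSupport()
         .getOrCreate())

raw = spark.read.csv("hdfs:///landing/transactions/*.csv",
                     header=True, inferSchema=True)
cleaned = (raw
           .withColumn("txn_date", F.to_date("txn_date", "yyyy-MM-dd"))
           .filter(F.col("amount").isNotNull()))              # drop unusable rows
cleaned.write.mode("overwrite").saveAsTable("ods.transactions")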
Environment: Machine learning, AWS, MS Azure, Cassandra, Spark, HDFS, Hive, Pig, Linux, Python (scikit-learn/SciPy/NumPy/pandas), R, SAS, SPSS, MySQL, Eclipse, PL/SQL, SQL connector, Tableau.
Automation Anywhere, San Jose, CA (Data Scientist) Aug ‘15 to Oct ‘16
Responsibilities:
Developed, tested and productionized a machine learning system for UI optimization, boosting CTR from 18% to 24% for the company’s website
Performed data preprocessing on large data sets containing millions of rows, including missing-data imputation, noise and error tagging/removal, and data consolidation (see the illustrative sketch at the end of this list)
Generalized feature extraction in the machine learning pipeline which improved efficiency throughout the system
Extracted customer time series data from millions of web logs using Apache Spark
Used predictive modeling with tools in SPSS, Python
Applied concepts of probability, distributions, and statistical inference to customer data to uncover findings using comparisons, t-tests, F-tests, R-squared, p-values, etc.
Developed SQL scripts for creating tables, sequences, triggers, views, and materialized views
Designed several high-performance prediction models using Python packages such as pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, pandas-datareader, and statsmodels
Developed several ready-to-use machine learning model templates based on given specifications, with clear descriptions of their purpose and the variables to supply as model inputs
Performed Hadoop ETL using Hive on data at different stages of the pipeline
Developed models for information retrieval from financial trade chats using natural language processing and machine learning
Collaborated with technologists and business stakeholders to drive innovation from conception to production
Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS
Responsible for creating Hive tables, loading structured data produced by MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns
Developed architecture around models for multi-task learning, distributed training on multiple machines, and integration into a consumer-facing API
Involved in creating a monthly retention marketing campaign that improved the customer retention rate by 15%
Prepared reports and presentations using Tableau, MS Office, and ggplot2 that accurately convey data trends and associated analysis
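The preprocessing and statistical-inference bullets above are illustrated by the minimal sketch below; it is not project code, and the file, column, and variant names are hypothetical.

# Minimal illustrative sketch (hypothetical file and column names):
# missing-value imputation followed by a two-sample t-test comparing
# click behavior between two UI variants.
import pandas as pd
from scipy import stats

df = pd.read_csv("weblogs_sample.csv")                       # assumed extract of the web logs
df["session_length"] = df["session_length"].fillna(df["session_length"].median())
df = df.dropna(subset=["variant", "clicked"])                # drop rows unusable for the test

a = df.loc[df["variant"] == "A", "clicked"]
b = df.loc[df["variant"] == "B", "clicked"]
t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)     # Welch's t-test
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")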
Environment: Hadoop HDFS, MapReduce/YARN, HiveQL, Apache Spark, R, SPSS, Python, Google Analytics, Data Mining, Seaborn, SQL, Regression, Cluster analysis, GitHub, Tableau, Amazon EC2, Amazon RDS, Windows/Linux platform
CVS Health, Woonsocket, RI (Data/Business Analyst) Jan ‘14 to Aug ‘15
Responsibilities:
Involved in requirements collection, gap analysis, reporting, and document creation
Documented the complete process flow to describe program development, logic, testing, implementation, application integration, and coding
Assessed completeness, consistency, and validity of customer data and created models and simulations
Explored and analyzed historical customer billing information to build a predictive model forecasting increasing or declining product use
Participated in the Agile planning process and daily scrums, provided details to create stories based on technical solutions and estimates, and worked with internal architects to assist in the development of current- and target-state data architectures
Analyzed sales and performance records, and interpreted results.
Evaluated data profiling, cleansing, combination, and extraction tools
Developed complex SQL queries to bring data together from various systems (see the illustrative sketch at the end of this list).
Performed data alignment and data cleansing
Involved in Data Migration between Teradata and MS SQL server
Sourced and analyzed data from a variety of sources such as MS Access, MS Excel, CSV, and flat files
Used Visual Studio Report Builder to design reports of varying complexity and maintain system design documents
Used ETL processes to extract, transform, and load data into the staging area and data warehouse
Used Tableau, MS PowerPoint, and MS Excel to produce reports
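The data-sourcing and SQL-consolidation bullets above are illustrated by the minimal sketch below; it is not project code, it consolidates in pandas rather than SQL, and the file and column names are hypothetical.

# Minimal illustrative sketch (hypothetical file and column names): combine
# Excel, CSV, and pipe-delimited flat-file extracts into one reporting data set.
import pandas as pd

billing = pd.read_excel("billing_history.xlsx")              # requires openpyxl
usage = pd.read_csv("product_usage.csv")
accounts = pd.read_csv("accounts.txt", sep="|")              # flat-file extract

report = (billing
          .merge(usage, on="customer_id", how="left")
          .merge(accounts, on="customer_id", how="left"))
report.to_csv("customer_usage_report.csv", index=False)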
Environment: SQL Server 2005 Enterprise, MS Visio, MS Project, MS-Office, MS Excel, MS PowerPoint, MS Word, Macros, Teradata, Tableau, ETL, ER Studio, XML and Business Objects
Flexera Software, Chicago, IL (Data/Business Analyst) Oct ‘12 to Dec ‘13
Responsibilities:
Involved in various activities of the project, like information gathering, analyzing the information, documenting the functional and non-functional requirements.
Worked with data warehousing methodologies and dimensional data modeling techniques such as Star/Snowflake schemas using Erwin 9.1.
Extensively used Aginity Netezza Workbench to perform various DDL and DML operations on the Netezza database.
Designed the Data Warehouse and MDM hub Conceptual, Logical and Physical data models.
Performed daily monitoring of Oracle instances using Oracle Enterprise Manager, ADDM, and TOAD, monitoring users, tablespaces, memory structures, rollback segments, logs, and alerts.
Used ER/Studio Data Modeler for data modeling (data requirements analysis, database design, etc.) of custom-developed information systems, including databases for transactional systems and data marts.
Involved in Teradata SQL development, unit testing, and performance tuning, ensuring testing issues were resolved using defect reports.
Involved in customized reports using SAS/MACRO facility, PROC REPORT, PROC TABULATE and PROC.
Used Normalization methods up to 3NF and De-normalization techniques for effective performance in OLTP and OLAP systems.
Generated DDL scripts using Forward Engineering technique to create objects and deploy them into the databases.
Involved in database testing, writing complex SQL queries to verify transactions and business logic, such as identifying duplicate rows, using SQL Developer and PL/SQL Developer.
Used Teradata SQL Assistant, Teradata Administrator, PMON, and data load/export utilities such as BTEQ, FastLoad, MultiLoad, FastExport, and TPump on UNIX/Windows environments, and ran batch processes for Teradata.
Worked on data profiling and data validation to ensure the accuracy of the data between the warehouse and source systems.
Worked on Data warehouse concepts like Data warehouse Architecture, Star schema, Snowflake schema, and Data Marts, Dimension and Fact tables.
Developed SQL queries to fetch complex data from different tables in remote databases using joins, database links, and bulk collects (see the illustrative sketch below).
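The duplicate-row checks and join queries described above are illustrated by the minimal sketch below; it is not project code, it runs the query through pandas/SQLAlchemy rather than SQL Developer, and the connection string, table, and column names are hypothetical.

# Minimal illustrative sketch (hypothetical connection string, table, and
# column names): a join-based duplicate-row check executed from Python.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("oracle+cx_oracle://user:pwd@dbhost:1521/?service_name=DWH")

query = """
    SELECT t.account_id, t.txn_id, COUNT(*) AS dup_count
    FROM   fact_transactions t
    JOIN   dim_account a ON a.account_id = t.account_id
    GROUP  BY t.account_id, t.txn_id
    HAVING COUNT(*) > 1
"""
duplicates = pd.read_sql(query, engine)      # rows that appear more than once
print(duplicates.head())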
Environment: Windows XP, SQL Developer, MS SQL 2008 R2, MS Access, MS Excel, SQL*Plus, Java, SSRS, SSIS.
Polaris, Pune (Data/Business Analyst) – Client: Cisco, Aug ‘09 to Sept ‘12
Responsibilities:
Developed Apex Classes, Controller Classes and Apex Triggers for various functional needs in the application.
Migrated data from external sources and performed Insert, Delete, Upsert & Export operations on millions of records. Designed and developed Service cloud and Integration.
Wrote and executed customized SQL code for ad hoc reporting duties and used other tools for routine tasks
Developed stored procedures and complex packages extensively using PL/SQL and shell programs
Involved in customized reports using SAS/MACRO facility, PROC REPORT, PROC TABULATE and PROC
Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy SQL Server database systems
Used existing UNIX shell scripts and modified them as needed to process SAS jobs, search strings, execute permissions over directories etc.
Extensively used Star Schema methodologies in building and designing the logical data model into Dimensional Models
Involved in designing context flow diagrams, structure charts, and ER diagrams
Worked on database features and objects such as partitioning, change data capture, indexes, views, and indexed views to develop an optimal physical data model
Worked with SQL Server Integration Services in extracting data from several source systems and transforming the data and loading it into ODS
Involved in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatch.
Environment: Windows XP, SQL Developer, MS SQL 2008 R2, MS Access, MS Excel, SQL*Plus, Java.
Education Details:
Bachelor of Science in Computer Science from JNTU, Hyderabad.