Data Analyst

Location:
Trumann, AR, 72472
Posted:
January 22, 2019

Denzil P

Data Scientist

Email: ac79od@r.postjobfree.com

Phone No: 408-***-**** Ext: 515

EDUCATION: Bachelor of Engineering, Computer Science, University of Mumbai, 2009

TOOLS AND TECHNOLOGIES:

Big Data/Hadoop Technologies

Hadoop, HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka.

Languages

HTML5, CSS3, XML, C, C++, R/R Studio, SAS Enterprise Guide, SAS, R, Perl, MATLAB, Schemas, JSON, AJAX, Java, Scala, Python (NumPy, SciPy, Pandas, Gensim, Keras), SQL, PL/SQL, HiveQL, JavaScript, Shell Scripting.

Java & J2EE Technologies

Core Java, JSP, Servlets, JDBC, JAAS, JNDI, Hibernate, Spring, Struts, JMS, EJB, RESTful

Application Servers

WebLogic, WebSphere, JBoss, Tomcat.

Databases

Microsoft SQL Server, MySQL, Oracle, DB2, Teradata, Netezza

NoSQL Databases

HBase, Cassandra, MongoDB, MariaDB

Build Tools

Jenkins, Maven, Ant, Toad, SQL*Loader.

Business Intelligence Tools

Tableau, Tableau Server, Tableau Reader, Splunk, SAP BusinessObjects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse

Development and Cloud Computing Tools

Microsoft SQL Studio, Eclipse, NetBeans, IntelliJ, Amazon AWS, Azure

Development Methodologies

Agile/Scrum, Waterfall, UML, Design Patterns

Version Control Tools and Testing

Git, GitHub, SVN, and JUnit

ETL Tools

Informatica PowerCenter, SSIS

Reporting Tools

MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos.

Data Modelling Tools

ERwin, Rational Rose, ER/Studio, MS Visio, Oracle Designer, SAP PowerDesigner, Enterprise Architect.

Operating Systems

UNIX, Linux, Windows, macOS, Sun Solaris

PROFESSIONAL EXPERIENCE:

Client: J.B. Hunt - Lowell, Arkansas

July 2017 - Present

Role: Data Scientist

Description: J.B. Hunt Transport Services, Inc. is a trucking and transportation company founded by Johnnie Bryan Hunt and based in the Northwest Arkansas city of Lowell.

Responsibilities:

Utilized Apache Spark with Python to develop and execute big data analytics and machine learning applications; executed machine learning use cases with Spark ML and MLlib.

Set up storage and data analysis tools on Amazon Web Services (AWS) cloud computing infrastructure.

Used pandas, NumPy, Seaborn, SciPy, matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.

Worked with different data formats such as JSON and XML and applied machine learning algorithms to them.

Worked with Data Architects and IT Architects to understand the movement of data and its storage.

Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.

Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.

Implemented Agile Methodology for building an internal application.

Applied good knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.

As architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.

Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).

Implemented classification using supervised algorithms such as logistic regression, decision trees, and KNN (a minimal Spark ML sketch follows this section).

Responsible for the design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.

Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification.

Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.

Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.

Interacted with business analysts, SMEs, and other data architects to understand business needs and functionality for various project solutions.

Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for the clients.

Identified and executed process improvements, hands-on with various technologies such as Oracle, Informatica, and Business Objects.

Designed 3NF data models for ODS and OLTP systems, and dimensional data models using star and snowflake schemas.

Environment: R, ODS, OLTP, Big Data, Oracle 10g, Hive, OLAP, DB2, metadata, Python, MS Excel, mainframes, MS Visio, Rational Rose, Teradata, SPSS, T-SQL, PL/SQL, flat files, XML, and Tableau.
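Below is a minimal, hypothetical sketch of the kind of Spark ML classification pipeline referenced above (logistic regression on tabular features). The input path and column names are assumptions for illustration, not details from the original project.

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("classification-sketch").getOrCreate()

# Assumed input: a cleaned table with numeric feature columns and a 0/1 label.
df = spark.read.parquet("s3://example-bucket/shipments/")  # hypothetical path

assembler = VectorAssembler(inputCols=["distance", "weight", "transit_days"],
                            outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")

train, test = df.randomSplit([0.8, 0.2], seed=42)
model = Pipeline(stages=[assembler, lr]).fit(train)

# Area under the ROC curve on the held-out split (the evaluator's default metric).
auc = BinaryClassificationEvaluator(labelCol="label").evaluate(model.transform(test))
print(f"Test AUC: {auc:.3f}")

The same pipeline shape extends to the decision tree variant mentioned above (Spark ML provides DecisionTreeClassifier); KNN is not part of Spark ML and would typically come from scikit-learn instead.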

Client: Visitors Coverage Inc. - Santa Clara, CA

Apr 2016 - Jun 2017

Role: Data Scientist

Description: Visitors Coverage Inc. is disrupting the global travel insurance industry by leveraging technology to redefine the way travelers purchase and manage travel insurance.

Responsibilities:

Performed data profiling to learn about user behavior across features such as traffic pattern, location, date, and time.

Applied various machine learning algorithms and statistical models such as decision trees, regression models, neural networks, SVM, and clustering to identify volume, using the scikit-learn package.

Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, Python, and a broad variety of machine learning methods including classification, regression, and dimensionality reduction; used the resulting engine to increase user lifetime by 45% and triple user conversions.

Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources. Used the K-Means clustering technique to identify outliers and classify unlabelled data (a short scikit-learn sketch follows this section).

Evaluated models using cross-validation, log loss, ROC curves, and AUC for feature selection.

Analyzed traffic patterns by calculating autocorrelation at different time lags.

Developed entire frontend and backend modules using Python on the Django web framework.

Implemented the presentation layer with HTML, CSS, and JavaScript.

Involved in writing stored procedures using Oracle.

Addressed overfitting by implementing regularization methods such as L2 and L1.

Used Principal Component Analysis in feature engineering to analyze high-dimensional data.

Identified and targeted high-risk welfare groups with machine learning algorithms.

Developed Tableau visualizations and dashboards using Tableau Desktop.

Created clusters to classify Control and test groups and conducted group campaigns.

Developed Linux shell scripts using the nzsql/nzload utilities to load data from flat files into Netezza.

Developed triggers, stored procedures, functions, and packages in PL/SQL, using cursor and ref cursor concepts associated with the project.

Performed multinomial logistic regression, random forest, decision tree, and SVM modeling to classify whether a package would be delivered on time for the new route.

Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from relational sources.

Used MLlib, Spark's machine learning library, to build and evaluate different models.

Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments.

Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages.

Developed a MapReduce pipeline for feature extraction using Hive.

Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data. Created various types of data visualizations using Python and Tableau.

Communicated the results to the operations team to support decision-making.

Collected data needs and requirements by interacting with other departments.

Environment: Python, CDH5, HDFS, Hadoop, Hive, Impala, Linux, Spark, Tableau Desktop, SQL Server 2012, Microsoft Excel, MATLAB, Spark SQL, PySpark.
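As a rough illustration of the K-Means outlier-detection step mentioned above, the following scikit-learn sketch flags points that sit unusually far from their cluster centre. The feature matrix, cluster count, and distance threshold are assumptions, not values from the original work.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def flag_outliers(X, n_clusters=5, quantile=0.99):
    # Scale features so no single column dominates the distance metric.
    X_scaled = StandardScaler().fit_transform(X)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_scaled)
    # Distance of each point to its assigned centroid; large distances are candidate outliers.
    dists = np.linalg.norm(X_scaled - km.cluster_centers_[km.labels_], axis=1)
    return dists > np.quantile(dists, quantile)

# Example with synthetic data standing in for the real traffic features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
print(flag_outliers(X).sum(), "points flagged as outliers")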

Client: Target Corporation - Minneapolis, MN

Dec 2014 - Mar 2016

Role: Data Scientist

Description: Target Corporation is the second-largest discount store retailer in the United States, behind Walmart, and a component of the S&P 500 Index. Founded by George Dayton and headquartered in Minneapolis, Minnesota, the company was originally named Goodfellow Dry Goods in June 1902 before being renamed the Dayton's Dry Goods Company in 1903 and later the Dayton Company in 1910.

Responsibilities:

Developed applications of machine learning, statistical analysis, and data visualization for challenging data processing problems in the sustainability and biomedical domains.

Worked on natural language processing with the NLTK module in Python to develop an automated customer response application.

Used predictive modeling with tools in SAS, SPSS, R, Python.

Responsible for the design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.

Identified and executed process improvements, hands-on with various technologies such as Oracle, Informatica, and BusinessObjects.

Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.

Interacted with business analysts, SMEs, and other data architects to understand business needs and functionality for various project solutions.

Created SQL tables with referential integrity and developed queries using SQL*Plus and PL/SQL.

Involved in data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.

Created PL/SQL packages and database triggers, developed user procedures, and prepared user manuals for the new programs.

Prepared the ETL architecture and design document, covering the ETL architecture, SSIS design, and the extraction, transformation, and loading of Duck Creek data into the dimensional model.

Applied linear regression, multiple regression, the ordinary least squares method, mean-variance analysis, the law of large numbers, logistic regression, dummy variables, residual analysis, the Poisson distribution, Bayes and Naive Bayes, fitting functions, etc. to data with the help of scikit-learn, SciPy, NumPy, and pandas (an illustrative regression sketch follows this section).

Applied clustering algorithms such as hierarchical clustering and K-means with the help of scikit-learn and SciPy.

Developed visualizations and dashboards using ggplot and Tableau.

Worked on the development of data warehouse, data lake, and ETL systems using relational and non-relational (SQL and NoSQL) tools.

Built and analyzed datasets using R, SAS, MATLAB, and Python (in decreasing order of usage).

Applied linear regression in Python and SAS to understand the relationships between different attributes of a dataset and the causal relationships between them.

Applied business intelligence and data visualization expertise using R and Tableau.

Validated macro-economic data and performed predictive analysis of world markets using key indicators in Python and machine learning concepts such as regression, bootstrap aggregation, and random forest.

Environment: Machine learning, AWS, MS Azure, Cassandra, Spark, HDFS, Hive, Pig, Linux, Python (scikit-learn/SciPy/NumPy/pandas), R, SAS, SPSS, MySQL, Eclipse, PL/SQL, SQL connector, Tableau.
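The following is an illustrative ordinary least squares sketch in the spirit of the regression work described above, using scikit-learn and pandas; the CSV file and column names are hypothetical, not from the original project.

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

df = pd.read_csv("sales_sample.csv")           # assumed input file
X = df[["price", "promo_flag", "store_size"]]  # assumed predictors
y = df["weekly_sales"]                         # assumed target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
ols = LinearRegression().fit(X_train, y_train)  # ordinary least squares fit

print("Coefficients:", dict(zip(X.columns, ols.coef_)))
print("Test R^2:", r2_score(y_test, ols.predict(X_test)))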

Client: Charles Schwab - Austin, TX

Apr 2013 - Nov 2014

Role: Data Modeler/Data Analyst

Description: The Charles Schwab Corporation is a bank and brokerage firm, based in San Francisco, California. It was founded in 1971 by Charles R. Schwab and is one of the largest banks in the United States as well as one of the largest brokerage firms in the United States. The company provides services for individuals and institutions that are investing online.

Responsibilities:

Communicated effectively, both verbally and in writing, with the client team.

Completed documentation on all assigned systems and databases, including business rules and logic.

Created test data and test case documentation for regression and performance testing.

Designed, built, and implemented relational databases.

Determined changes in physical database by studying project requirements.

Developed intermediate business knowledge of the functional area and its processes to understand how data is applied to support business functions.

Facilitated the gathering of moderately complex business requirements by defining the business problem.

Utilized SPSS statistical software to track and analyze data.

Optimized data collection procedures and generated reports on a weekly, monthly, and quarterly basis.

Used advanced Microsoft Excel to create pivot tables and used VLOOKUP and other Excel functions (a pandas illustration follows this section).

Successfully interpreted data to draw conclusions for managerial action and strategy.

Created data chart presentations, coded variables from original data, conducted statistical analysis as required, and provided summaries of analysis.

Environment: Data Analysis, SQL, FTP, SFTP, XML, Web Services
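For illustration only, the sketch below shows a pandas equivalent of the Excel pivot-table and VLOOKUP workflow noted above; the files and column names are hypothetical, and the original work was done in Excel rather than Python.

import pandas as pd

orders = pd.read_csv("orders.csv")      # assumed columns: account_id, region, amount
accounts = pd.read_csv("accounts.csv")  # assumed columns: account_id, segment

# VLOOKUP-style enrichment: pull the segment for each order's account.
enriched = orders.merge(accounts[["account_id", "segment"]], on="account_id", how="left")

# Pivot-table-style summary: total amount by region and segment.
summary = enriched.pivot_table(index="region", columns="segment",
                               values="amount", aggfunc="sum", fill_value=0)
print(summary)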

Client: Ediko Systems Inc. - Hyderabad, India

Nov 2011 - Mar 2013

Role: Data Analyst

Description: Ediko Systems Integrators, an IBM Premier Business Partner, is a specialist company delivering world-class business solutions leveraging IBM Technologies. EDIKO ensures the delivery of high-quality business integration solutions through the application of sound software architecture principles and using the latest IBM technologies together with agile project management techniques.

Responsibilities:

Processed data received from vendors and loaded it into the database. The process was carried out on a weekly basis, and reports were delivered bi-weekly.

Documented requirements and obtained signoffs.

Coordinated between the Business users and development team in resolving issues.

Documented data cleansing and data profiling.

Wrote SQL scripts to meet business requirements.

Analyzed views and produced reports.

Tested cleansed data for integrity and uniqueness (a short sketch follows this section).

Automated the existing system to achieve faster and more accurate data loading.

Learned to create Business Process Models.

Managed multiple projects simultaneously, tracking them against varying timelines through a combination of business and technical skills.

Developed a good understanding of clinical practice management, medical and laboratory billing, and insurance claim processing, documented with process flow diagrams.

Assisted QA team in creating test scenarios that cover a day in a life of the patient for Inpatient and Ambulatory workflows.

Environment: SQL, data profiling, data loading, QA.
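A minimal sketch of the kind of integrity and uniqueness checks described above, written in pandas for illustration (the original scripts were SQL); the file and column names are assumptions.

import pandas as pd

df = pd.read_csv("vendor_feed_weekly.csv")  # hypothetical weekly vendor extract

checks = {
    "duplicate_keys": int(df["record_id"].duplicated().sum()),
    "null_required_fields": int(df[["record_id", "amount", "posted_date"]].isna().sum().sum()),
    "negative_amounts": int((df["amount"] < 0).sum()),
}

for name, count in checks.items():
    status = "OK" if count == 0 else f"FAILED ({count} rows)"
    print(f"{name}: {status}")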

Client: Hidden Brains - Hyderabad, India

Jan 2010 - Oct 2011

Role: Data Analyst

Description: Hidden Brains InfoTech Pvt. Ltd. is an enterprise web and mobile apps development company. With industry experience of over a decade, it offers a range of client-centric services, enabling customers to achieve competitive advantage through flexible, next-generation global delivery models.

Responsibilities:

Used Microsoft Visio and Rational Rose to design the use case diagrams, class model, sequence diagrams, and activity diagrams for the SDLC process of the application.

Worked with other teams to analyze customers and marketing parameters.

Conducted Design reviews and Technical reviews with other project stakeholders.

Was part of the complete project life cycle, from requirements to production support.

Created test plan documents for all back-end database modules.

Used MS Excel, MS Access, and SQL to write and run various queries.

Used a traceability matrix to trace the organization's requirements.

Recommended structural changes and enhancements to systems and databases.

Supported the testing team with system testing, integration testing, and UAT.

Ensured quality in the deliverables.

Environment: UNIX, SQL, Oracle 10g, MS Office, MS Visio.


