Harsh Patel
Data Scientist
Email: ************@*****.*** Phone No: 567-***-****
Professional Summary:
Knowledge of Machine Learning, Statistical Modeling, Data Analytics, Data Modeling, Data Architecture, Data Analysis, Data Mining, Text Mining and Natural Language Processing (NLP), Artificial Intelligence algorithms, Business Intelligence, and analytics models (such as Decision Trees and Linear and Logistic Regression), using Hadoop (Hive, Pig), R, Python, Spark, Scala, MS Excel, SQL, PostgreSQL, and Erwin.
Good in Big Data, Hadoop, NoSQL databases (MongoDB, HBase), Data Warehousing, Business Intelligence, Data Analytics, and ETL concepts.
Excellent knowledge of Machine Learning, Mathematical Modeling, and Operations Research. Comfortable with R, Python, SAS, Weka, MATLAB, and relational databases. Deep understanding of and exposure to the Big Data ecosystem.
Expert in creating PL/SQL schema objects such as Packages, Procedures, Functions, Subprograms, Triggers, Views, Materialized Views, Indexes, Constraints, Sequences, Exception Handling, Dynamic SQL/Cursors, Native Compilation, Collection Types, Record Types, and Object Types using SQL Developer.
Basic knowledge of Hadoop, Hive, HBase, MapReduce, Pig, Oozie, R, Sqoop, Flume, ZooKeeper, Ambari, YARN, Tez, and SAP HANA.
Strong knowledge of data visualization with Tableau, creating line and scatter plots, bar charts, histograms, pie charts, dot charts, box plots, time series, error bars, multiple chart types, multiple axes, subplots, etc.
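Several of the chart types named above can be sketched programmatically as well; a minimal Python/matplotlib analogue (illustrative only, since the bullet itself refers to Tableau; the data is made up):

```python
# Minimal matplotlib sketch of chart types named above: line plot,
# scatter plot, bar chart, and box plot on a shared figure of subplots.
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display required
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)
y = x ** 2  # toy data for illustration

fig, axes = plt.subplots(2, 2, figsize=(8, 6))  # multiple subplots
axes[0, 0].plot(x, y)     # line plot
axes[0, 1].scatter(x, y)  # scatter plot
axes[1, 0].bar(x, y)      # bar chart
axes[1, 1].boxplot(y)     # box plot
fig.savefig("charts.png")
```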
Excellent knowledge of OLTP/OLAP system study with a focus on the Oracle Hyperion suite of technologies; developed database schemas such as Star and Snowflake schemas (fact tables, dimension tables) used in relational, dimensional, and multidimensional modeling, and performed physical and logical data modeling using the Erwin tool.
Able to leverage a heavy dose of Mathematics, Applied Statistics, Advanced Analytics and Machine learning with visualization and a healthy sense of exploration.
Integrated R with the Hadoop ecosystem using the rhdfs, rhive, and rhbase packages; experience with SAP HANA, HDFS and R integration, PHP, Python, MySQL, PostgreSQL, and MongoDB.
Built web applications in R using the Shiny package.
Knowledge of Scala, Spark, and Jaql.
Adept in Data Quality Management: acquiring, cleaning, processing, and cross-verifying data across multiple sources.
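A typical data-quality pass of this kind can be sketched in a few lines of pandas; a minimal illustration with made-up sources and column names (not the actual project data): clean one source, then cross-verify it against a second:

```python
# Sketch: basic data-quality pass in pandas -- clean one source and
# cross-verify it against a second. Data and column names are invented.
import pandas as pd

crm = pd.DataFrame({"id": [1, 2, 2, 3], "amount": ["10", "20", "20", None]})
billing = pd.DataFrame({"id": [1, 2, 4], "amount": [10.0, 20.0, 99.0]})

# Clean: drop duplicate records and coerce amounts to numeric
crm = crm.drop_duplicates(subset="id")
crm["amount"] = pd.to_numeric(crm["amount"], errors="coerce")

# Cross-verify: outer-merge and flag rows present in only one source
check = crm.merge(billing, on="id", how="outer",
                  suffixes=("_crm", "_billing"), indicator=True)
mismatches = check[check["_merge"] != "both"]
print(mismatches["id"].tolist())  # -> [3, 4]
```

The `indicator=True` flag adds a `_merge` column marking whether each row came from the left source, the right source, or both, which makes one-sided records easy to isolate.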
Completing the Machine Learning course from Stanford University through Coursera.
Domain Knowledge on E-Commerce, E-Learning, Travel, Health Care and Gaming.
Active team player and quick learner with an organized and committed personality.
Involved in installing/configuring Hadoop 1.0 and its ecosystem tools on CentOS 6.x.
Ready to work with the admin team in upgrading Hadoop 1.0 to 2.0 using Apache Ambari 2.0.1 configured with HUE.
Worked with clusters of up to 20 nodes, with dedicated nodes for the NameNode, JobTracker, and Secondary NameNode.
Able to handle a data load up to 20 TB.
Extracted data from log files into HDFS using Flume.
Developed Oozie workflows for scheduling and orchestrating the ETL process.
Extracted data from SAP HANA, MS SQL Server, and MySQL into HDFS using Sqoop.
Good in Teradata RDBMS using the FastLoad, FastExport, MultiLoad, TPump, Teradata SQL Assistant, and BTEQ utilities.
Proficient R user with knowledge of the statistical programming language SAS.
Created and worked on Sqoop (version 1.4.3) jobs with incremental loads to populate Hive external tables.
Developed Hive (version 0.10) scripts for end-user/analyst requirements to perform ad hoc analysis.
Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation, and how they translate to MapReduce jobs.
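The translation mentioned above can be illustrated in miniature: a GROUP BY with SUM becomes a map phase emitting key/value pairs, a shuffle grouping values by key, and a reduce phase aggregating each group. A pure-Python sketch with toy data (not actual Hadoop code):

```python
# Sketch of how GROUP BY + aggregation translates to MapReduce:
# map emits (key, value) pairs, shuffle groups them by key, and
# reduce aggregates each group. Toy data for illustration.
from collections import defaultdict

rows = [("fruit", 3), ("veg", 2), ("fruit", 5), ("veg", 1)]

# Map phase: emit (category, quantity) for each input row
mapped = [(category, qty) for category, qty in rows]

# Shuffle phase: group values by key (Hadoop does this between phases)
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: aggregate each group, i.e. SELECT SUM(qty) ... GROUP BY category
totals = {key: sum(values) for key, values in groups.items()}
print(totals)  # -> {'fruit': 8, 'veg': 3}
```

Seeing the shuffle step explicitly is what explains typical Hive/Pig performance issues: skewed keys pile all their values onto one reducer, which is why join and grouping strategy matters.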
Ready to use Tez execution to speed up query execution time in Hive.
Wrote Pig (version 0.11) scripts to transform raw data from several data sources into baseline data.
Knowledge of Kerberos authentication.
Implemented and configured the Fair Scheduler.
Good with both MapReduce 1 (Job Tracker) and MapReduce 2 (YARN) setups.
Good at monitoring and managing clusters using Ambari with Nagios and Ganglia.
Worked in Agile/Scrum software environments.
Highly motivated team player with excellent interpersonal, communication, analytical, and presentation skills.
Education:
Bachelor's in Commerce.
Master's in Commerce.
Master of Business Administration (M.B.A.) with specializations in International Business and General Management.
Pursuing a Master's in Computer Science.
Technical Skills:
Languages
Java 8, Python, R
Machine Learning
Simple/Multiple Linear Regression, Polynomial Regression, Logistic Regression, Decision Trees, Random Forest, Kernel SVM, K-Nearest Neighbours (K-NN), Classification, Clustering, Association.
OLAP/ BI / ETL Tool
Business Objects 6.1/XI, MS SQL Server 2008/2005 Analysis Services (MS OLAP, SSAS), Integration Services (SSIS), Reporting Services (SSRS), Performance Point Server (PPS), Oracle 9i OLAP, MS Office Web Components (OWC11), DTS, MDX, Crystal Reports 10, Crystal Enterprise 10 (CMC)
Packages
ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr, pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, Beautiful Soup, rpy2, SQLAlchemy.
Web Technologies
JDBC, HTML5, DHTML, XML, CSS3, Web Services, WSDL.
Tools
Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner.
Big Data Technologies
Hadoop, Hive, HDFS, MapReduce, Pig, Kafka.
Databases
SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra, SAP HANA.
Reporting Tools
MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0.
ETL Tools
Informatica PowerCenter, SSIS.
Version Control Tools
SVN, GitHub.
Project Execution
Methodologies
Ralph Kimball and Bill Inmon data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD).
BI Tools
Tableau, Tableau Server, Tableau Reader, SAP BusinessObjects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse.
Operating System
Windows, Linux, Unix, macOS, Red Hat.