Rudra T
*********@*****.***
Summary
Highly efficient Data Scientist with around 7 years of experience in Statistical Modeling, Machine Learning, and Data Mining on large structured and unstructured data sets, including Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
Solid knowledge and experience in Deep Learning techniques including Feedforward Neural Networks, Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN).
Implemented and analyzed RNN-based approaches for automatically predicting implicit discourse relations in text. Discourse relation prediction has potential applications in NLP tasks such as Text Parsing, Text Analytics, Text Summarization, and Conversational Systems.
Actively contributed to all phases of the project life cycle, including Data Acquisition (Web Scraping), Data Cleaning, Data Engineering (Dimensionality Reduction (PCA & LDA), normalization, weight of evidence, information value), Feature Selection, Feature Scaling & Feature Engineering, Statistical Modeling (decision trees, regression models, neural networks, SVM, clustering), Testing and Validation (ROC plot, k-fold cross validation), and Data Visualization.
Worked with various text analytics and word embedding tools such as Word2Vec, CountVectorizer, GloVe, and LDA.
Skilled in Advanced Regression Modeling, Time Series Analysis, Statistical Testing, Correlation, Multivariate Analysis, Forecasting, Model Building, Business Intelligence tools and application of Statistical Concepts.
Worked with several Python packages such as NumPy, Pandas, Matplotlib, SciPy, Seaborn, and Scikit-learn.
Proficient with high-level Logical Data Models, Data Mapping and Data Analysis.
Performed data analysis and visualization using tools such as R, Microsoft Power BI, Tableau, and Azure Machine Learning (AML).
Experience using AWS cloud services including EC2, S3, AWS Lambda, and EMR.
Strong expertise with Hadoop and Spark, and with data tools such as PySpark, Pig, Hive, and Flume.
Strong experience with SQL Server 2008, RStudio, MATLAB, Oracle 10g/11g/12c, and Sybase.
Experience with statistical and regression analysis and multi-objective optimization.
Good knowledge of performance metrics for evaluating algorithm performance.
Worked with clients to identify analytical needs and documented them for further use.
Hands-on advanced SQL experience summarizing, transforming, segmenting, and joining datasets.
Proficient at building and publishing interactive reports and dashboards in Tableau, with design customizations based on stakeholders' needs.
Performed outlier analysis using methods such as Z-score analysis, Linear Regression, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), and Isolation Forest.
Worked on gradient-boosted decision trees with XGBoost to improve performance and accuracy, and with other boosting methods such as AdaBoost and Gradient Boosting.
Extracted data from various database sources such as Oracle, SQL Server, DB2, MongoDB, and Teradata.
Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, MapReduce concepts, and ecosystems including Hive and Pig.
Highly skilled in using Hadoop, HBase, Spark, and Hive for basic analysis and extraction of data in the infrastructure to provide data summarization.
Skills
Languages: R, SQL, Python, Shell scripting, Java, Scala, C++
IDE: RStudio, Jupyter Notebook, Zeppelin, Eclipse, NetBeans, Atom
Databases: Oracle 11g, SQL Server, MS Access, MySQL, MongoDB, Cassandra, PL/SQL, T-SQL, ETL
Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Impala, Spark, MLlib, PySpark
Operating Systems: Windows XP/7/8/10, Ubuntu, Unix, Linux
Packages: ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, wordcloud, kernlab, neuralnet, twitteR, NLP, reshape2, rjson, plyr, pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, Beautiful Soup, rpy2, TensorFlow, PyTorch, CNN, RNN
Web Technologies: HTML, CSS, PHP, JavaScript
Data Analytics Tools: R console, Python (NumPy, pandas, scikit-learn, SciPy), SPSS
BI and Visualization: Tableau, SSAS, SSRS, Informatica, QlikView
Version Control: Git, SVN
DXC Technology, Princeton, NJ June 2019 – Present
Data Scientist
DXC Technology is an American multinational corporation that provides B2B IT services. Worked on a banking project to build an algorithm that accurately classifies credit card holders into multiple classes based on historical data across multiple variables. The aim was to improve the bank's efficiency by reducing the default rate while offering new products.
Applied machine learning and statistical modeling techniques to develop and evaluate algorithms that improve performance, quality, data management, and accuracy.
Experienced in implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
Performed time series analysis using ARIMA models and Tableau.
Proficient with deep learning frameworks such as TensorFlow and Keras and libraries such as Scikit-learn.
Developed machine learning models including logistic regression, K-means clustering, and support vector machines, among others, using the pandas, NumPy, Scikit-learn, NLTK, Seaborn, and Matplotlib libraries.
Implementation experience in machine learning and deep learning, including regression, classification, neural networks, object tracking, and Natural Language Processing (NLP), using packages such as TensorFlow, Keras, NLTK, and spaCy.
Implemented machine learning algorithms in R, Java, and Python with data in JSON and XML formats.
Understanding of data structures, data modeling, and software architecture.
Deep knowledge of math, probability, statistics, and algorithms.
Experience analyzing which ML algorithms could be used to solve a given problem and ranking them by their probability of success.
Worked closely with the team to create efficient designs and develop user interaction screens using AngularJS.
Utilized SparkSQL to extract and process data by parsing using Datasets or RDDs in HiveContext, with transformations and actions (map, flatMap, filter, reduce, reduceByKey).
Developed Spark MLlib functionality not present in Spark ML by converting DataFrames to RDDs and applying RDD transformations and actions.
Environment: Python 3.3, AngularJS, SciPy, Pandas, Scikit-learn, matplotlib, RStudio, SVN, SQL, Tableau, Oracle, Java, GitHub, SQLAlchemy.
CNET Global Solutions Inc, Dallas, TX August 2018 – May 2019
Data Analyst/Data Scientist
CNET Global Solutions enables its clients to outperform their peers by optimizing existing operations and achieving a faster time to market. This project is designed to manage data gathering and integrate large volumes of data, perform analysis, interpret results, and develop actionable insights and recommendations for use across the company.
Developed statistical models, visualizations, machine learning models, and data mining solutions. Developed briefs for outside vendors and managed vendor relationships.
Worked as the lead data strategist, identifying and integrating new datasets that could be leveraged through our product capabilities, and worked closely with the engineering team to strategize and execute the development of data products.
Executed analytical experiments methodically to help solve various problems and make an impact across various domains and industries.
Used predictive models to improve customer experience, ad targeting, revenue generation, and more.
Worked with distributed data and computing tools including MapReduce, MySQL, Hadoop, Spark, Hive, and Impala.
Worked with statistics and data mining techniques including Random Forest, GLM/regression, social network analysis, and text mining.
Improved predictive performance to 81% accuracy using ensemble methods such as Bootstrap Aggregation (Bagging) and Boosting (LightGBM, Gradient Boosting).
Used F-score, precision, and recall to evaluate model performance.
Applied concepts of probability, distribution, and statistical inference to the given dataset to unearth interesting findings using comparisons, t-tests, F-tests, R-squared, p-values, etc.
Good knowledge on Hadoop components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and MapReduce concepts.
Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
Used expert-level understanding of different databases in combination for data extraction and loading, joining data extracted from different databases and loading it into a specific SQL database.
Imported and cleansed high-volume data from various sources such as Teradata, Oracle, flat files, and SQL Server 2005.
Environment: Python, SciPy, Pandas, SQL, Scikit-learn, matplotlib, NumPy, RStudio, Tableau, Teradata, Oracle, SQL Server 2005.
One Technologies, LLC, Hyderabad, India August 2017 – July 2018
Data Analyst
One Technologies is one of the leading credit score analytics companies. Worked on the ScoreSense project to leverage statistical learning and machine learning algorithms to automate Alternate Asset Servicing. The automation helped FM reduce errors and improve operational efficiency. Reduced model error by 20%. Further, developed BI reports that provided predictive analytics and reporting with dashboards on analyst performance, future workload, and anomalies based on data collected from discrete products.
•Defined Project Scope, project Charter & Business Case
•Prototyped machine learning algorithms for POC (Proof of Concept)
•Performed data cleaning, feature scaling, and feature engineering
•Developed predictive models for use in machine learning platform using the SciKit-learn python framework
•Improved statistical models using learning curves, parameter curves, feature selection, and regularization.
•Performed ad-hoc data analysis for customer insights using SQL on an Amazon AWS Hadoop cluster
•Developed Map Reduce pipeline for feature extraction
•Implemented Support Vector Machine (lite)
•Performed Principal Component Analysis (PCA) & Linear Discriminant Analysis (LDA)
•Defined the technical requirements of the analytic solutions.
•Defined the data requirements of the analytic solution.
•Worked on commercial data from disparate source systems, built data models, and transformed data to provide added value in IT applications by streamlining processes, reducing cost, maximizing profits, and rolling out business solutions that met one of the objectives
•Worked with AngularJS to implement single-page applications (SPA) using Directives, Modules, Expressions, AngularJS Routing, Controllers, and Components.
•Made iterative changes to analytic/predictive models and decision logic embedded in operational applications and business process platforms
•Worked with multiple relational, dimensional, and OLAP databases
Environment: MS SQL, Oracle, Hadoop (HDFS), Pig, MySQL, RStudio, Python, AngularJS, SciPy, Pandas, NumPy, Matplotlib, Java, SQLAlchemy.
Symbiosis Technology, Hyderabad, India January 2017 – July 2017
Python Developer/ Data Analyst
Genius Brands International was our client. The project was a web-based laboratory information management system (LIMS) that lets users manage the sample lifecycle, optimize laboratory execution, perform data retrievals, interface with instruments and systems, and enable security and auditing.
The work involved developing workflows triggered by events from other systems.
Developed easy-to-use documentation for the frameworks and tools developed, for adoption by other teams.
Developed Hive UDFs and Pig UDFs using Python in Microsoft HDInsight environment.
Implemented end-to-end systems for Data Analytics, Data Automation and customized visualization tools using Python, R, Hadoop and MongoDB.
Used Pandas, NumPy, seaborn, SciPy, matplotlib, SciKit-learn in Python for developing various machine learning algorithms.
Performed data profiling to merge the data from multiple data sources.
Worked with CSV, JSON, and Excel files for data cleaning and data analysis.
Used Python for statistical operations on the data and ggplot2 for visualizing the data.
Worked with several use cases like campaign sales analysis, forecasting sales, KPI analysis.
Managed offshore projects and coordinated work for a 24-hour productivity cycle.
Designed and developed horizontally scalable APIs using Python Flask.
Developed entire frontend and backend modules using Python on the Django and Flask web frameworks.
Developed SQL and stored procedures with MySQL and SQLAlchemy.
Environment: Python, JavaScript, Django Framework 1.3, Flask, HTML, CSS, SQL, MySQL, LAMP, jQuery, Apache web server, SQLAlchemy.
Imazzle Advertising Pvt. Ltd, Hyderabad, India September 2015 – December 2016
Python Developer
This project for an advertising company aimed to automate operations such as maintaining customer information and generating bills and reports. The project includes modules such as Admin, Customer, Transaction, Bills, and Reports.
Used OOP concepts in the overall design and development of web/system applications.
Worked with a team of developers on Python applications, prioritizing tasks and managing risk.
Programmed utilities in Python that use packages such as SciPy, NumPy, and Pandas.
Designed and developed the UI of the website using HTML, XHTML, AJAX, CSS and JavaScript.
Developed entire frontend and backend modules using Python on the Django and Flask web frameworks.
Implemented most of the client-side validation using JavaScript.
Designed and developed a data management system using MySQL. Built application logic using Python 2.7.
Used Django APIs for database access and worked with databases such as MySQL and PostgreSQL.
Used Python to extract weekly hotel availability information from XML files.
Participated in requirements gathering and worked closely with the team on design and modeling.
Developed SQL and stored procedures with MySQL and SQLAlchemy.
Developed a shopping cart for Library and integrated SOAP web services for payment access.
Experience writing application-level code to interact with APIs and web services using JSON.
Designed and developed horizontally scalable APIs using Python Flask.
Involved in Agile methodologies.
Involved in disaster recovery exercises.
Environment: Python 2.6/2.7, JavaScript, Django Framework 1.3, Flask, HTML, CSS, SQL, MySQL, SOAP, LAMP, jQuery, Apache web server, Shell scripting, SQLAlchemy.
Sutherland Global Services, Hyderabad December 2013 – August 2015
SQL Developer
Sutherland builds processes for the digital age by combining the speed and insight of design thinking with the scale and accuracy of data analytics. Sutherland has customers across industries from financial services to healthcare. My role was to assist the Analytics department with data extraction and cleaning as data preprocessing steps for building models.
Worked with the Business Analysts team on requirements gathering, preparing functional specifications, and translating them into technical specifications.
Used data mapping specifications to create and execute detailed system test plans. The data mapping specifies what data will be extracted from an internal data warehouse, transformed, and sent to an external entity.
Managed full SDLC processes involving requirements management, workflow analysis, source data analysis, data mapping, metadata management, data quality, testing strategy and maintenance of the model.
Performed extensive data validation by writing several complex SQL queries; also involved in back-end testing and resolving data quality issues.
Designed SSIS packages to extract, transform, and load existing data into SQL Server, using many SSIS components such as Pivot Transformation, Fuzzy Lookup, Merge, Merge Join, Data Conversion, Row Count, Sort, Derived Columns, Conditional Split, Execute SQL Task, Data Flow Task, and Execute Package Task.
Created SSIS packages that handled different source formats (flat files, Excel, XML, OLE DB) and different destination formats.
Debugged and troubleshot the ETL packages by using breakpoints, analyzing the process, and capturing error information via SQL commands in SSIS.
Developed SQL queries in SQL Server Management Studio and Toad and generated complex reports for the end users.
Automated and scheduled recurring reporting processes using UNIX shell scripting and Teradata utilities such as MLOAD, BTEQ, and FastLoad.
Experience with Perl.
Performed data analysis and data profiling using complex SQL on various sources systems including Oracle and Teradata.
Environment: SQL Developer, SQL Navigator, Informatica PowerCenter 9.x, Oracle 11g/12c, SQL, PL/SQL, MySQL Workbench, Oracle Hints, UNIX, SQLAlchemy.
Education
Bachelor’s in Electronics and Communications Engineering, Bhagwant University
Master’s in Computer Science Engineering, Texas A&M University