
Data Scientist Machine Learning

Location:
Arlington, TX
Posted:
November 12, 2024


VARDHAN

Email: ***********@*****.*** PH: 682-***-****

LinkedIn: https://www.linkedin.com/in/thatikondavardhan

Data Scientist

PROFESSIONAL SUMMARY:

• A Data Science professional with 9+ years of progressive experience in data analytics, statistical modeling, visualization, and machine learning, with excellent collaboration skills and the ability to learn and adapt quickly.

• Experience in data mining with large structured and unstructured datasets, including data acquisition, data validation, predictive modeling, and data visualization.

• Experience in data integration, profiling, validation, cleansing, transformation, and visualization using R and Python.

• Theoretical foundations and practical hands-on projects in (i) supervised learning (linear and logistic regression, boosted decision trees, support vector machines, neural networks, NLP); (ii) unsupervised learning (clustering, dimensionality reduction, recommender systems); (iii) probability and statistics, experiment analysis, confidence intervals, and A/B testing; and (iv) algorithms and data structures.

• Extensive knowledge of Azure Data Lake and Azure Storage.

• Experience in migrating data from heterogeneous sources, including Oracle, to MS SQL Server.

• Hands-on experience in the design, management, and visualization of databases using Oracle, MySQL, and SQL Server.

• In-depth knowledge of and hands-on experience with the Big Data/Hadoop ecosystem (MapReduce, HDFS, Hive, Pig, and Sqoop).

• Experience with Apache Spark and Kafka for big data processing, and with functional programming in Scala.

• Experience in manipulating large datasets with R packages such as tidyr, tidyverse, dplyr, reshape, lubridate, and caret, and in visualizing data using the lattice and ggplot2 packages.

• Experience in dimensionality reduction using techniques like PCA and LDA.

• Completed an intensive hands-on data analytics boot camp spanning statistics to programming, covering data engineering, data visualization, machine learning, and programming in R and SQL.

• Experience in data analytics and predictive analysis, including classification, regression, and recommender systems.

• Good exposure to factor analysis and to bagging and boosting algorithms.

• Experience with descriptive analysis problems such as frequent pattern mining, clustering, and outlier detection.

• Worked on machine learning algorithms for classification and regression, including KNN, decision trees, Naïve Bayes, logistic regression, SVM, and latent factor models.

• Hands-on experience with Python and libraries such as NumPy, pandas, Matplotlib, Seaborn, NLTK, scikit-learn, and SciPy.

• Expertise in TensorFlow as a machine learning/deep learning package in Python.

• Good knowledge of Microsoft Azure SQL, Azure Machine Learning, and HDInsight.

• Skilled in text analysis, sentiment analysis, and entity recognition using NLP techniques.

• Good exposure to deep learning with TensorFlow in Python.

• Good knowledge of Natural Language Processing (NLP) and of time series analysis and forecasting using ARIMA models in Python and R.
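
For illustration, a minimal sketch of ARIMA forecasting with Python's statsmodels; the (1, 1, 1) order and the synthetic series are assumptions, not values from any engagement described here.

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    series = np.cumsum(np.random.randn(120))    # synthetic random-walk series
    fit = ARIMA(series, order=(1, 1, 1)).fit()  # order would normally be tuned
    forecast = fit.forecast(steps=12)           # 12-step-ahead forecast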

• Good knowledge of Tableau and Power BI for interactive data visualization.

• In-depth understanding of NoSQL databases such as MongoDB and HBase.

• Very good experience and knowledge in provisioning virtual clusters on the AWS cloud using services such as EC2, S3, and EMR.

• Extensive experience working with large language models (LLMs) such as BERT, GPT, and their applications in real-world scenarios.

• Good exposure to creating pivot tables and charts in Excel.

• Experience in developing custom reports and various tabular, matrix, ad hoc, and distributed reports in multiple formats using SQL Server Reporting Services (SSRS).

• Excellent database administration (DBA) skills, including user authorization, database creation, tables, indexes, and backup creation.

TECHNICAL SKILLS:

• Languages: Java 8, Python, R

• NLP/Machine Learning/Deep Learning: LDA (Latent Dirichlet Allocation), NLTK, Apache OpenNLP, Stanford NLP, Sentiment Analysis, SVMs, ANN, RNN, CNN, TensorFlow, MXNet, Caffe, H2O, Keras, PyTorch, Theano, Azure ML

• Cloud: Google Cloud Platform, AWS, Azure, Bluemix

• Web Technologies: JDBC, HTML5, DHTML and XML, CSS3, Web Services, WSDL

• Data Modeling Tools: Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner

• Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka

• Databases: SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra.

• Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0.

• ETL Tools: Informatica PowerCenter, SSIS.

• Version Control Tools: SVN, GitHub

• BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse

• Operating Systems: Windows, Linux, Unix, macOS, Red Hat

PROFESSIONAL EXPERIENCE:

Data Scientist

EMC Insurance, Iowa May 2023 to Present

Responsibilities:

• Perform data profiling to learn about behavior across various features such as traffic pattern, location, and date and time.

• Extracted data from Hive tables by writing efficient Hive queries.

• Performed preliminary data analysis using descriptive statistics and handled anomalies by removing duplicates and imputing missing values.

• Analyze data and perform data preparation by applying the historical model to the dataset in Azure ML.

• Apply various machine learning algorithms and statistical modeling techniques, such as decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering, to identify volume using the scikit-learn package in Python and MATLAB.

• Explore DAGs, their dependencies, and logs using Airflow pipelines for automation.
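
A minimal sketch of the kind of Airflow DAG used for such automation; the DAG id, task, and schedule are hypothetical.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def profile_traffic():
        pass  # placeholder for the profiling step

    with DAG(
        dag_id="traffic_profiling",        # hypothetical name
        start_date=datetime(2023, 5, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(task_id="profile", python_callable=profile_traffic)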

• Performed data cleaning and feature selection using the MLlib package in PySpark, and worked with deep learning frameworks such as Caffe and Neon.

• Designed and implemented AI-driven chatbots to enhance user interaction and automate customer service processes.

• Develop Spark/Scala, Python, and R code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources. Use K-Means clustering to identify outliers and to classify unlabeled data.
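
A sketch of K-Means-based outlier detection of the kind described above, flagging points far from their cluster centroid; the data and the 99th-percentile cutoff are illustrative assumptions.

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.random.rand(500, 4)                    # stand-in feature matrix
    km = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X)
    dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    outliers = X[dist > np.percentile(dist, 99)]  # top 1% farthest points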

• Managed and optimized data warehousing solutions using Snowflake, improving query performance by [percentage] through strategic design and configuration.

• Work with the NLTK library for NLP data processing and pattern finding.

• Categorize comments from different social networking sites into positive and negative clusters using sentiment analysis and text analytics.
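
One common way to implement this with NLTK is VADER scoring; a minimal sketch with sample comments, assuming a compound-score threshold of 0 to split positive from negative.

    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    sia = SentimentIntensityAnalyzer()
    comments = ["Great service!", "Terrible wait times."]  # sample data
    positive = [c for c in comments if sia.polarity_scores(c)["compound"] >= 0]
    negative = [c for c in comments if sia.polarity_scores(c)["compound"] < 0]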

• Analyze traffic patterns by calculating autocorrelation with different time lags.
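
A sketch of the lag-based autocorrelation check, using a synthetic hourly traffic series; the hour/day/week lags are illustrative choices.

    import numpy as np
    import pandas as pd

    traffic = pd.Series(np.random.poisson(100, 24 * 30))  # 30 days of hourly counts
    for lag in (1, 24, 168):                               # hour, day, week lags
        print(lag, round(traffic.autocorr(lag=lag), 3))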

• Ensure that the model has a low false positive rate; perform text classification and sentiment analysis on unstructured and semi-structured data.

• Addressed overfitting by implementing regularization methods such as L2 and L1.
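
A sketch of L2 and L1 regularization with scikit-learn's logistic regression; the synthetic dataset and C values are illustrative.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    l2 = LogisticRegression(penalty="l2", C=1.0).fit(X, y)  # ridge-style penalty
    l1 = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, y)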

• Use Principal Component Analysis (PCA) in feature engineering to analyze high-dimensional data.
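
A sketch of PCA on high-dimensional features; the synthetic data and the 95%-variance threshold are assumptions.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    X = np.random.rand(200, 50)                   # stand-in high-dimensional data
    X_scaled = StandardScaler().fit_transform(X)  # PCA expects centered/scaled input
    pca = PCA(n_components=0.95)                  # keep 95% of the variance
    X_reduced = pca.fit_transform(X_scaled)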

• Create and design reports that use gathered metrics to infer and draw logical conclusions about past and future behavior.

• Integrated deep learning predictions with inventory management systems, reducing stockouts by 20% and minimizing excess inventory by 15%.

• Demonstrated proficiency in deep learning frameworks, including TensorFlow, PyTorch, Caffe, and Theano, to build and fine-tune forecasting models.

• Applied advanced machine learning algorithms to optimize supply chain operations and improve overall efficiency.

• Perform multinomial logistic regression, random forest, decision tree, and SVM modeling to classify whether a package will be delivered on time on a new route.

• Implemented different models such as logistic regression, random forest, and gradient-boosted trees to predict whether a given die will pass or fail the test.

• Perform data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database, and use ETL for data transformation.

• Use MLlib, Spark's machine learning library, to build and evaluate different models.
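
A minimal sketch of building and evaluating a model with Spark MLlib; the toy DataFrame and column names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(1.0, 2.0, 0), (2.0, 1.0, 0), (8.0, 9.0, 1), (9.0, 8.0, 1)],
        ["f1", "f2", "label"])
    data = VectorAssembler(inputCols=["f1", "f2"],
                           outputCol="features").transform(df)
    model = LogisticRegression().fit(data)
    auc = BinaryClassificationEvaluator().evaluate(model.transform(data))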

• Perform data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
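
A sketch of pandas/NumPy cleaning, scaling, and encoding steps of the kind described above; the toy frame and columns are illustrative.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"amount": [10.0, np.nan, 250.0],
                       "region": ["N", "S", "N"]})
    df["amount"] = df["amount"].fillna(df["amount"].median())  # impute missing
    df["amount_z"] = (df["amount"] - df["amount"].mean()) / df["amount"].std()
    df = pd.get_dummies(df, columns=["region"])                # one-hot encode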

• Develop a MapReduce pipeline for feature extraction using Hive and Pig.

• Create data quality scripts using SQL and Hive to validate successful data loads and data quality. Create various types of data visualizations using Python and Tableau.

• Communicate results to the operations team to support decision-making.

• Collect data needs and requirements by interacting with other departments.

Environment: Python 2.x, R, HDFS, Hadoop 2.3, Hive, Linux, Spark, IBM SPSS, Tableau Desktop, SQL Server 2012, Microsoft Excel, MATLAB, Spark SQL, PySpark.

Data Scientist

Cummins, Columbus, Indiana October 2021 to April 2023

Responsibilities:

• Implemented Data Exploration to analyze patterns and to select features using Python SciPy.

• Built Factor Analysis and Cluster Analysis models using Python SciPy to classify customers into different target groups.

• Built predictive models including Support Vector Machine, Random Forests and Bayes Classifier using Python Scikit-Learn to predict the personalized product choice for each client.

• Using R's dplyr and ggplot2 packages, performed extensive graphical visualization of the overall data, including customized graphical representations of revenue reports and specific item sales statistics.

• Designed and implemented cross-validation and statistical tests, including hypothesis testing, ANOVA, and autocorrelation, to verify the models' significance.

• Designed an A/B experiment for testing the business performance of the new recommendation system.
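
A sketch of the kind of significance check such an A/B experiment would use, here a two-proportion z-test via statsmodels; the conversion counts are illustrative.

    from statsmodels.stats.proportion import proportions_ztest

    conversions = [420, 480]                 # control, treatment successes
    samples = [10000, 10000]                 # users per arm
    stat, p_value = proportions_ztest(conversions, samples)
    print(f"z={stat:.2f}, p={p_value:.4f}")  # compare p against 0.05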

• Supported MapReduce Programs running on the cluster.

• Proficient in fine-tuning and deploying BERT, GPT, and similar models for specific tasks such as chatbots and language understanding.
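
A hedged sketch of BERT fine-tuning for sequence classification with Hugging Face Transformers; the checkpoint, label count, and training settings are assumptions, and dataset wiring is elided.

    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)
    args = TrainingArguments(output_dir="out", num_train_epochs=3)
    # trainer = Trainer(model=model, args=args, train_dataset=...)  # dataset elided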

• Configured the Hadoop cluster with a NameNode and slave nodes and formatted HDFS.

• Used Oozie workflow engine to run multiple Hive and Pig jobs.

• Participated in Data Acquisition with Data Engineer team to extract historical and real-time data by using Hadoop MapReduce and HDFS.

• Developed scalable data pipelines with Apache Spark to handle large volumes of data from multiple sources.

• Leveraged Apache Spark’s MLlib for building and deploying machine learning models, increasing predictive accuracy by 25%.

• Created and managed Spark Streaming applications to process real-time data feeds, improving data analysis timeliness.
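
A minimal sketch of a Spark Structured Streaming job over a real-time feed; the Kafka broker and topic names are hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    stream = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical
              .option("subscribe", "events")                     # hypothetical topic
              .load())
    query = (stream.selectExpr("CAST(value AS STRING)")
             .writeStream.format("console").start())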

• Integrated Apache Spark with Hadoop ecosystems to enhance big data processing capabilities.

• Performed data enrichment jobs to handle missing values, normalize data, and select features using HiveQL.

• Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.

• Analyzed the partitioned and bucketed data and computed various metrics for reporting.

• Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.

• Worked on loading the data from MySQL to HBase where necessary using Sqoop.

• Developed Hive queries for analysis across different banners.

• Extracted data from Twitter using Java and the Twitter API. Parsed JSON-formatted Twitter data and uploaded it to the database.

• Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the instances for specific applications.

• Developed Hive queries for analysis, and exported the result set from Hive to MySQL using Sqoop after processing the data.

• Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.

• Created HBase tables to store data arriving in various formats from different portfolios.

• Worked on improving performance of existing Pig and Hive Queries.

• Created reports and dashboards using D3.js and Tableau 9.x to explain and communicate data insights, significant features, model scores, and the performance of the new recommendation system to both technical and business teams.

• Developed and implemented a deep learning model for SKU-level demand forecasting, improving forecasting accuracy by 25%.

• Established data governance practices in Snowflake, ensuring data accuracy, consistency, and compliance with industry standards.

• Employed TensorFlow and PyTorch to design and deploy deep learning models, leveraging convolutional neural networks (CNNs) and long short-term memory (LSTM) networks.
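
A sketch of an LSTM forecasting model in TensorFlow/Keras of the kind mentioned above; the window length, layer size, and loss are illustrative assumptions.

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(30, 1)),  # 30 timesteps, 1 feature
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(1),       # next-step forecast
    ])
    model.compile(optimizer="adam", loss="mse")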

• Utilized SQL, Excel, and several marketing/web analytics tools (Google Analytics, AdWords) to complete business and marketing analysis and assessment.

• Used Git 2.x for version control with Data Engineer team and Data Scientists colleagues.

• Used Agile methodology and the SCRUM process for project development.

Environment: HDFS, Hive, Sqoop, Pig, Oozie, Amazon Web Services (AWS), Python 3.x (SciPy, Scikit-Learn), Tableau 9.x, D3.js, SVM, Random Forests, Bayes Classifier, A/B experiment, Git 2.x, Agile/SCRUM.

Data Analyst/Data Scientist

Chevron Corporation, Santa Rosa, NM November 2018 to September 2021

Responsibilities:

• Studied and understood the business and its functionality through communication with business analysts.

• Analyzed the existing database for performance and suggested methods to redesign the model for improving the performance of the system.

• Supported ad-hoc, standard reporting and production projects.

• Designed and implemented many standard processes that are maintained and run on a scheduled basis.

• Created reports using MS Access and Excel, applying filters to retrieve the best results.

• Developed stored procedures, SQL joins, and SQL queries for data retrieval and analysis, and exported the data into CSV and Excel files.

• Developed Data mapping specifications to create and execute detailed system test plans. The data mapping specifies what data will be extracted from an internal data warehouse, transformed and sent to an external entity.

• Implemented data transformation workflows using Apache Spark, enhancing data processing efficiency by 30%.

• Optimized existing Apache Spark jobs, reducing execution time by 40% through fine-tuning configuration settings.

• Implemented robust security measures, including role-based access controls and encryption, to ensure data compliance and integrity within Snowflake.

• Documented functional requirements and supplementary requirements in Quality Center.

• Set up environments for testing and defined the range of functionality to be tested per technical specifications.

• Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.

• Wrote and executed unit, system, integration, and UAT scripts in data warehouse projects.

• Wrote and executed SQL queries to verify that data had been moved from the transactional system to the DSS, data warehouse, and data mart reporting systems in accordance with requirements.

• Troubleshot test scripts, SQL queries, ETL jobs, and data warehouse/data mart/data store models.

• Responsible for different Data mapping activities from Source systems to Teradata.

• Developed SQL scripts, stored procedures, and views for data processing, maintenance, and other database operations.

• Performed SQL tuning, optimized the database, and created the technical documents.

• Imported Excel sheets, CSV and delimited data, advanced Excel features, and ODBC-compliant data sources into the Oracle database for data extraction, data processing, and business needs.

• Designed and optimized SQL queries, pass-through queries, make-table queries, and joins in MS Access 2003, and exported the data to the Oracle database server.

• Collaborated with cross-functional teams to integrate NLP solutions into existing systems, streamlining workflows and enhancing data-driven decision-making.

• Conducted research on the latest advancements in AI/ML and presented findings to stakeholders to inform strategic initiatives.

• Analyzed data feed requirements for Risk Management, Customer Information Management, and Analytic Support.

• Familiar with data and content migration using SAS migration utility for products that rely on metadata.

• Developed CSV files and reported offshore progress to management using Excel templates, Excel macros, pivot tables, and functions.

• Improved the accuracy and relevance of credit card client planning and budget reports used to make high-level decisions.

• Managed all UAT deliverables to completion across overlapping releases.

Environment: SAS Enterprise Guide 4.0, OLAP Cube Studio, Stored Processes, SAS Management Console, Informatica 8.1, MS Excel, MS PowerPoint, MS Visio, MS Project, Teradata SQL Assistant, Enterprise Miner, SAS DI Studio, MS Access, SQL, SPSS, VBA, PL/SQL, Shell Scripting, Oracle 10g.

Data Analyst/Data Scientist

Ceequence Technologies, Hyderabad, India March 2016 to August 2018

Responsibilities:

• Involved in the complete Software Development Life Cycle (SDLC) process by analyzing business requirements and understanding the functional workflow of information from source systems to destination systems.

• Completed a highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Unix commands, NoSQL, and Hadoop.

• Used pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.

• Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.

• Analyzed sentiment data and detected trends in customer usage and other services.

• Analyzed and prepared data and identified patterns in the dataset by applying historical models.

• Collaborated with senior data scientists to understand the data.

• Used Python and R scripting to implement machine learning algorithms that predict and forecast data for better results.

• Designed and executed complex ETL processes using Apache Spark, improving data integration and quality.

• Employed Apache Spark’s GraphX for performing large-scale graph processing and analytics.

• Utilized Spark SQL for querying structured data, optimizing query performance through strategic indexing and partitioning.
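
A sketch of Spark SQL over partitioned Parquet data; the path, view name, and columns are hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("2016-01", 10), ("2016-02", 20)],
                               ["month", "sales"])
    df.write.mode("overwrite").partitionBy("month").parquet("/tmp/sales")
    spark.read.parquet("/tmp/sales").createOrReplaceTempView("sales")
    spark.sql("SELECT month, SUM(sales) FROM sales GROUP BY month").show()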

• Deployed Apache Spark on cloud platforms (AWS, Azure) for cost-effective and scalable data processing solutions.

• Used Python and R scripting to visualize the data and implemented machine learning algorithms.

• Experience in developing packages in R with a Shiny interface.

• Used predictive analysis to create models of customer behavior that are correlated positively with historical data and use these models to forecast future results.

• Predicted user preference based on segmentation using Generalized Additive Models, combined with feature clustering, to understand non-linear patterns between user segmentation and related monthly platform usage features (time series data).

• Performed data manipulation, data preparation, normalization, and predictive modeling.

• Improved efficiency and accuracy by evaluating models in Python and R.

• Used Python and R scripts to improve the model.

• Applied various machine learning algorithms and statistical modeling techniques, such as decision trees, random forest, regression models, neural networks, SVM, and clustering, to identify volume using the scikit-learn package.

• Performed data cleaning and applied backward/forward-filling methods to handle missing values in the dataset.
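
A sketch of the backward/forward-fill imputation mentioned above, on a toy series.

    import numpy as np
    import pandas as pd

    s = pd.Series([1.0, np.nan, np.nan, 4.0])
    filled = s.ffill().bfill()  # forward-fill, then backward-fill leading gaps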

• Developed and validated a neural network classification model to predict the feature label.

• Applied a boosting method to the predictive model to improve its efficiency.

• Presented dashboards to higher management for further insights using Power BI and Tableau.

• Hands-on experience using Hive, Hadoop, HDFS, and related big data technologies.

Environment: R/R studio, Python, Tableau, Hadoop, Hive, MS SQL Server, MS Access, MS Excel, Outlook, Power BI.

Data Analyst/Data Scientist

Cybage Software Private Limited, Hyderabad, India October 2014 to February 2016

Responsibilities:

• Developed data mapping, data governance, transformation, and cleansing rules for the Master Data Management (MDM) architecture involving OLTP, ODS, and OLAP.

• Provided source-to-target mappings to the ETL team to perform initial, full, and incremental loads into the target data mart.

• Conducted JAD sessions, wrote meeting minutes, and collected and analyzed requirements from business users.

• Involved in defining source-to-target data mappings, business rules, and data definitions.

• Performed transformations on files received from clients and consumed by SQL Server.

• Worked closely with the ETL, SSIS, and SSRS developers to explain complex data transformation logic.

• Worked on DTS packages and DTS Import/Export for transferring data between SQL Server instances.

• Monitored and debugged Apache Spark applications to resolve performance bottlenecks and ensure system reliability.

• Implemented data partitioning and bucketing strategies in Apache Spark to optimize data processing and storage.

• Developed custom Apache Spark applications to address specific business requirements and improve operational efficiency.

• Automated data ingestion and processing tasks with Apache Spark, reducing manual intervention and errors.

• Collaborated with data scientists to integrate Apache Spark into their workflows, enhancing data processing capabilities.

• Performed data profiling, cleansing, integration, and extraction using appropriate tools.

• Defined the list codes and code conversions between the source systems and the data mart using Reference Data Management (RDM).

• Applied data cleansing/data scrubbing techniques to ensure consistency among data sets.

• Extensively used ETL methodology to support data extraction, transformation, and loading processing in a complex EDW.

Environment: MS Excel, Agile, Oracle 11g, SQL Server, SOA, SSIS, SSRS, ETL, UNIX, T-SQL, HP Quality Center 11, RDM (Reference Data Management).

EDUCATION DETAILS:

• SAVEETHA ENGINEERING COLLEGE

Chennai, India. Bachelor of Science in Computer Science, August 2009 - April 2013

Earned a 7.62/10.0 GPA and completed relevant courses in Data Analytics and Machine Learning.
