Akhil Koppera
*****.*****@*****.***
Professional Summary:
Data Scientist with 9+ years of experience in Machine Learning, data mining on large structured and unstructured datasets, data acquisition, data validation, predictive modeling, and data visualization.
Extensive experience developing statistical machine learning and data mining solutions to business problems and generating data visualizations using R, Python, and Tableau.
Strong knowledge of all phases of the SDLC, from analysis, design, development, and testing through implementation and maintenance.
Expertise in data extraction and data manipulation in Python, using widely adopted libraries such as pandas and Matplotlib for data analysis.
Hands-on experience in statistical modeling and Natural Language Processing (NLP) on large structured and unstructured datasets.
Proficient in managing the entire data science project life cycle and actively involved in all of its phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling, dimensionality reduction using Principal Component Analysis (PCA) and Factor Analysis, and data visualization.
Deep understanding of statistical modeling, multivariate analysis, problem analysis, model comparison, and validation.
Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
Extensive experience generating data visualizations using Python and R and creating dashboards in tools such as Tableau.
Completed an intensive, hands-on Data Analytics boot camp spanning statistics to programming, including data visualization, machine learning, and SQL programming.
Expertise in data acquisition, storage, analysis, integration, predictive modeling, logistic regression, decision trees, data mining methods, forecasting, factor analysis, cluster analysis, and other advanced statistical techniques.
Experience building data models using machine learning techniques for regression, clustering, and association rule mining.
Proficient in data visualization tools such as Python's Matplotlib for creating clear, actionable interactive reports and dashboards.
Strong oral and written communication skills, and strong interpersonal skills for building long-term relationships with colleagues and business partners.
Technical Skills:
Machine Learning: Simple/Multiple Linear Regression, Polynomial Regression, Logistic Regression, Decision Trees, Classification, Clustering, Association, Kernel SVM, K-Nearest Neighbors (K-NN)
Project Execution Methodologies: Data Warehousing Methodology, Joint Application Development (JAD)
Languages: Python, R, SQL
Databases: SQL Server, Oracle
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports, Business Objects
Professional Experience:
Dell, Austin, TX Jan 2021 – Present
Data Scientist
Responsibilities:
Forecasted sales for Dell products and provided interactive forecasts as new products were launched.
Proposed an AI-based product catalog to maximize profit by reducing complexity across several products.
Gathered, analyzed, documented, and translated application requirements into data models, and supported standardization of documentation and adoption of standards and practices related to data and applications.
Performed data transformations to rescale and normalize variables.
Analyzed and prepared data, and identified patterns in the dataset by applying historical models.
Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
Participated in all phases of data collection, data cleaning, data mining, model development, validation, and visualization, and performed gap analysis. Improved model efficiency and accuracy by evaluating models in Python and R.
Performed data manipulation, data preparation, normalization, and predictive modeling.
Worked on a customer segmentation effort based on machine learning and statistical modeling, including building predictive models and generating data products to support segmentation. Used Python and R to improve the models.
Used seaborn, Matplotlib, scikit-learn, and NLTK in Python to develop machine learning solutions, applying algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, and KNN for data analysis.
Developed a predictive causal model using the annual failure rate and standard cost basis for a new bundled service offering.
Designed and developed analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping through production deployment, for product recommendation and allocation planning.
Used Python and R scripts to visualize data and implement machine learning algorithms.
Implemented deep learning models and numerical computation with data flow graphs using TensorFlow. Worked on Python and R applications that applied machine learning to improve business forecasts.
Used DeepAR, GluonTS, fbprophet, Bayesian optimization, parallel programming, and linear programming; used scikit-learn and NLP techniques for data analysis and cleaning.
Designed data profiles for processing, including running SQL and PL/SQL queries and using Python and R for data acquisition and data integrity checks such as dataset comparisons and schema checks.
Extracted data from the database using Excel and SQL procedures, and created Python and R datasets for statistical analysis, validation, and documentation.
Worked with statistical analysis tools; adept at writing code in Advanced Excel, R, MATLAB, and Python.
Utilized Python and R in a broad variety of machine learning methods including classifications, regressions, and dimensionality reduction.
Designed and implemented end-to-end data analytics and data visualization systems using Tableau and Python.
Built models using statistical and machine learning classification techniques such as XGBoost, SVM, and Random Forest.
Cleaned data using exploratory data analysis and Python libraries, replacing missing values with imputation techniques.
Exxon Mobil Corporation, Irving, TX May 2019 – Dec 2020
Data Scientist
Responsibilities:
Developed predictive models on large-scale datasets to address various business problems by leveraging advanced statistical modeling, machine learning, and deep learning.
Identified key machine learning processes that could be significantly improved using advanced analytics and data science, driving continuous improvement.
Extracted meaning from huge volumes of data to help improve decision-making and to provide business intelligence through data-driven solutions.
Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
Used Python for data transformation and validation techniques such as dimensionality reduction using Principal Component Analysis (PCA) and Factor Analysis.
Implemented reinforcement learning techniques based on dynamic programming using Python.
Developed machine learning algorithms such as Logistic Regression, SVM, Decision Trees, Random Forests, and XGBoost to predict customer insights, support target marketing, and identify potential lapsed customers.
Developed Excel templates using pivot tables and various functions for consolidating and reconciling customers' past and future products.
Extracted source data from Oracle tables, MS SQL Server, sequential files, and Excel sheets.
Performed Data Analysis and Data Profiling using complex SQL queries on various source systems including Oracle and Teradata.
Wrote complex SQL queries with subqueries, analytical functions, pivot functions, and inline views.
Built a decision mechanism for advancing certain compounds, in the larger context of optimizing drug development, based on complex data pipelines and machine learning models.
McKesson Corporation, Irving, TX Oct 2017 – Apr 2019
Data Scientist
Responsibilities:
Collaborated with data engineers and the operations team to implement an ETL process; wrote and optimized SQL queries to extract data that fit the analytical requirements.
Performed univariate and multivariate analysis to identify underlying patterns in the data and associations between variables.
Performed data imputation using the Scikit-learn package in Python.
Participated in feature engineering, including feature intersection generation, feature normalization, and label encoding with scikit-learn preprocessing.
Used Python 3.x (NumPy, SciPy, pandas, scikit-learn, seaborn) and Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.
Developed and implemented predictive models using machine learning algorithms such as linear regression, classification, multivariate regression, Naive Bayes, Random Forests, K-means clustering, KNN, PCA, and regularization for data analysis.
Assessed customer purchasing behavior and determined customer value with RFM (Recency, Frequency, Monetary) analysis; applied customer segmentation using clustering algorithms such as K-Means and Hierarchical Clustering.
Built regression models, including Lasso, Ridge, SVR, and XGBoost, to predict Customer Lifetime Value.
Built classification models including Logistic Regression, SVM, Decision Tree, and Random Forest to predict Customer Churn Rate.
Used F-Score, AUC/ROC, Confusion Matrix, MAE, and RMSE to evaluate model performance.
Designed and implemented recommender systems that used collaborative filtering to recommend courses to different customers, and deployed them to an AWS EMR cluster.
Utilized natural language processing (NLP) techniques to optimize customer satisfaction.
Designed rich data visualizations with Tableau and Matplotlib to present data in human-readable form.
TechnipFMC, Hyderabad Jun 2013 – Dec 2016
Data Analyst
Responsibilities:
Used temporary and transient tables on different datasets.
Cloned production data for code modifications and testing.
Used Time Travel to recover missed data up to 56 days back.
Combined data from multiple datasets to provide a comprehensive picture and analysis of client usage and trends.
Involved in loading data from the edge node to HDFS using shell scripting.
Responsible for backup, recovery, and upgrades of all PostgreSQL databases; monitored databases to optimize performance and diagnose issues.
Extensive experience with Warm Standby (PostgreSQL 8.x and earlier) and Hot Standby (PostgreSQL 9.x and later).
Set up and maintained Postgres master-slave clusters using streaming replication.
Installed and configured PostgreSQL from source or packages on Linux machines; designed database structures, indexes, views, and partitioning.
Understood business problems and analyzed data using appropriate statistical models to generate insights.
Developed NLP models for topic extraction and sentiment analysis.
Identified and assessed available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms).
Worked with the NLTK library for NLP data processing and pattern discovery.
Applied various machine learning algorithms and statistical modeling techniques, including decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering, to identify volume using the scikit-learn package in Python.
Programmed a Python utility that used multiple packages (SciPy, NumPy, pandas).
Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, and Naive Bayes.