Machine Learning Data Scientist

Location:

Round Rock, TX

Posted:

May 16, 2024

Resume:

Name: NagaKrishna Charitha Chennamsetty

Mobile: +1-737-***-****

Email: ********.*****@*****.***

LinkedIn: https://www.linkedin.com/in/nagakrishna-charitha-chennamsetty-ba05a2126/

Data Scientist

SUMMARY:

Data Scientist with 7.3 years of experience and a demonstrated history of working with machine learning algorithms, statistical methods, neural networks, and predictive analytics.

Expertise in creating pipelines, data flows, and complex data transformations and manipulations using AWS Storage Explorer.

Developed large language models for specific NLP tasks using Hugging Face Transformers, TensorFlow, and PyTorch.

Demonstrated proficiency in deep learning techniques, including Convolutional Neural Networks, Recurrent Neural Networks, and transformer models to solve complex AI challenges.

Designed and implemented Databricks pipelines to orchestrate data processing tasks, enhancing automation and reducing manual intervention.

Experienced with visualization filters, page-level filters, and report-level filters in Power BI, creating groups and content packages, and scheduling refreshes using Power BI gateways.

Skilled in automating the data processing and deployment process using AWS Lambda and other serverless technologies.

Worked on integrating the AWS Lambda functions with other AWS services such as S3, DynamoDB, and API Gateway.

Hands-on experience in R and Python, using machine learning algorithms such as SVR, random forests, decision trees, gradient boosting, time series, linear and non-linear regression, and clustering (K-means).

Worked on statistical techniques like Regression, classification, clustering, and Time Series.

Hands-on experience building and evaluating machine learning models using R libraries such as caret, xgboost, and tidymodels.

Expertise in unsupervised machine learning algorithms such as K-means clustering, hierarchical clustering, and density-based clustering.

Proficient in AI tool stacks including TensorFlow, PyTorch, scikit-learn, and Keras, utilizing their capabilities for AI model development.

Proficient in statistical modeling and machine learning techniques (Linear, Logistic, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian, XGBoost) for forecasting and predictive analytics, segmentation methodologies, regression-based models, hypothesis testing, factor analysis/PCA, and ensembles.

Worked on different libraries related to Data Science and Machine learning like Scikit-learn, NumPy, SciPy, Matplotlib, pandas, JSON, SQL, etc.

Collaborated with cross-functional teams, including data engineers, software developers, and domain experts, to deliver end-to-end AI solutions.

TECHNICAL SKILLS

SDLC

Agile, Scrum, Waterfall, Kanban

Cloud Platforms

Amazon Web Services (AWS)

ETL/ BI Tools

Informatica, SSIS, Tableau, Power BI

Ticketing Tools

JIRA

Data Processing Framework

Hadoop, Kafka

Operating Systems

Linux, Windows

Databases

SQL Server, MongoDB

Programming Languages

R, Python (Pandas, NumPy, SciPy, Scikit-Learn, Seaborn, Matplotlib, NLTK), Java

Version Control

Git, Subversion, GitHub

Data Formats

CSV, Text, XML, JSON

EDUCATION:

Bachelor of Technology in Electrical and Electronics, India

EXPERIENCE:

Hewlett Packard, CA July 2023 – Present

Data Scientist

Responsibilities:

Developed forecasting models using LSTM, regression and ensemble models which allowed clients and internal Account Management teams to scale better and auto-scale pricing.

Configured and optimized EMR clusters for performance, scalability, and cost-effectiveness, utilizing instance types, instance fleets, and auto-scaling policies based on workload requirements.

Implemented security measures within Snowflake to protect sensitive data used in machine learning models.

Adhered to data governance policies, ensuring compliance with industry regulations.

Played a vital role in developing an automated reporting system in R, which reduced data analysis time by 50% and increased report accuracy.

Built real-time data streaming pipelines on AWS using Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics to ingest, process, and analyze high-volume streaming data.

Used Pandas, NumPy, SciPy, Matplotlib, Seaborn, and Scikit-learn in Python at various stages of developing machine learning models, and applied machine learning algorithms such as linear regression, Random Forests, Decision Trees, K-means, and KNN.

Integrated Kinesis with other AWS services like S3, DynamoDB, and Lambda to enable seamless data processing, storage, and analysis in serverless architectures.

Worked on performance tuning of Spark Applications for setting the right Batch Interval time, the correct level of Parallelism, and memory tuning.

Managed the A/B testing process and user surveys to assess the real-world impact of NLP-based applications.

Managed data coming from different sources and loaded structured and unstructured data.

Designed and developed Power BI graphical and visualization solutions with business requirement documents and plans for creating interactive dashboards.

Identified potential non-compliant customers for software audits by analyzing millions of sales transactions.

Anticipated the sales opportunities based on product correlations, dependencies, and maintenance trends.

Environment: Machine Learning Algorithms, Time Series Models, SQL Server Integration Services (SSIS), AWS, Power BI, Import and Export Data wizard, Visual Studio SQL Profiler, Big Data.

Virgin Media, India Apr 2020 – July 2023

Senior Data Scientist

Responsibilities:

Implemented interactive dashboard, big data analytics, and visualization solutions.

Handled missing values, outliers, and anomalies using tools such as Python, Pandas, and AWS Data Wrangler.

Used SQL and AWS Athena to query and manipulate data stored in Amazon S3.

Implemented data processing tasks and batch jobs using Lambda functions, leveraging event triggers from S3, Kinesis, DynamoDB, or SQS to execute code in response to changes in data or application state.

Optimized Lambda functions for performance, cost, and resource utilization, fine-tuning memory allocation, concurrency settings, and execution time to meet application requirements and budget constraints.

Worked on Python NumPy, SciPy, and Pandas packages to perform dataset manipulation.

Built and analyzed datasets using R, MATLAB, and Python.

Used R programming language to graphically analyze the data and perform data mining.

Worked extensively with statistical analysis tools; adept at writing code in Advanced Excel, R, MATLAB, and Python.

Developed Spark Applications for various business logics using Python.

Generated the DDLs for tables as per the Field Mapping Document in the target (Snowflake) database.

Involved in refining and updating the forecasting models and methodologies based on new data and feedback.

Worked on Principal Component Analysis and Truncated to analyze the high dimensional data.

Handled large datasets in PySpark using partitions, broadcasts, efficient joins, and transformations during the ingestion process.

Worked extensively with open-source tools, RStudio (R) and Jupyter Notebook (Python), for regular statistical analysis.

Experienced in working with the Spark ecosystem, using Spark SQL and Python queries on formats such as text and CSV files.

Environment: Python Jupyter Notebook (Anaconda 3), Agile, Data Quality, RStudio, Tableau, Data Governance, Supervised & Unsupervised Learning, Java, NumPy, SciPy, Pandas, Shell Scripting, Spark 2.0, AWS, Glue, Redshift.

Uber, India Jan 2017 – Mar 2020

Data Scientist

Responsibilities:

Applied predictive models to test data and generated classification reports to identify fraudulent transactions.

Monitored ML model performance, detecting and addressing model drift to maintain predictive accuracy and effectiveness in dynamic data environments.

Integrated Snowflake with MLOps tools for seamless model deployment and monitoring.

Updated and automated processes for predictive model validation, deployment, and productization of machine learning models for stock-out prediction.

Created various types of data visualizations using Matplotlib and Seaborn.

Built dashboards for clients using Tableau Analytics.

Used Principal Component Analysis to analyze high-dimensional data in feature engineering and to eliminate unrelated features.

Created and designed reports that use gathered metrics to infer and draw logical conclusions about past and future behavior.

Designed and deployed rich graphic visualizations end to end in Tableau, with drill-down, drop-down menu options, and parameters.

Structured the complex receivables for the customers by reconciling and developing Mobile Collection, Automated Direct Debit, and integrating the reports with Online Systems and Clients.

Developed various solution-driven views and dashboards by developing different chart types including Pie Charts, Bar Charts, Tree Maps, Circle Views, Line Charts, Area Charts, and Scatter Plots in Power BI. Created a dashboard for Transportation system financial information which includes revenue and expenditure by routes.

Used statistical techniques in R for analysis.

Developed and deployed an application, Reckoner, using R Shiny.

Environment: Python 3.x (Scikit-Learn/ SciPy/ NumPy/ Pandas/ Matplotlib/ Seaborn), R (ggplot2/ caret/ trees/ rules), Tableau (9.x/10.x), Machine Learning (Logistic regression/ Random Forests/ KNN/ K-Means Clustering/ Hierarchical Clustering/ Ensemble methods/ Collaborative filtering).
