Data Scientist Machine Learning

Location:

Dallas, TX

Posted:

May 01, 2025

Resume:

NARENDRA REDDY GADE

+1-214-***-**** Dallas, TX

**************@*****.*** linkedin.com/in/narendrareddyg

PROFESSIONAL SUMMARY

Experienced Generative AI Engineer and Data Scientist with 8+ years of expertise in building and deploying AI solutions. Skilled in Python, PyTorch, and deep learning frameworks, with a strong background in machine learning and data science. Currently focused on Generative AI projects, developing and fine-tuning large language models (LLMs) to solve complex problems and drive business impact. Proven ability to deliver scalable, production-ready solutions and extract actionable insights from data.

WORK EXPERIENCE

Generative AI Engineer Apr 2024 – Apr 2025

GlobalLogic (Hitachi Group) Bangalore, IN

Project Description: Building a GenAI solution that generates business documents for incidents occurring in facility centers. These documents provide details about the incidents.

• Designed and developed an end-to-end document ingestion pipeline using Azure Blob Trigger Functions and EventHub, enabling real-time processing and indexing of document data in Azure AI Search.

• Built a microservice to extract textual and tabular data from PDFs, generate embeddings using models like text-embedding-ada-002 and Hugging Face variants, and insert vectors into Azure AI Search for retrieval.

• Implemented a RAG-based architecture leveraging Azure OpenAI and prompt engineering to generate context-aware responses using indexed data from PDF documents and Azure AI Search.

• Conducted PoCs to evaluate chunking techniques (sentence-level, paragraph-level) and experimented with Azure AI Document Intelligence to improve content segmentation for LLM input.

• Built a Retrieval-Augmented Generation (RAG) system using Azure OpenAI and Azure AI Search to generate contextual answers from document data, improving response accuracy by 30% in internal testing.

• Achieved over 90% accuracy in tabular data extraction from PDFs using PDFPlumber, outperforming other parsing tools in structured data recovery tests.

• Developed a GenAI-based document generation solution to create automated business reports from incident data using historical context and OpenAI LLMs.

• Wrote and refined prompts to extract accurate, formatted responses from LLMs for use in document creation workflows.

• Implemented an Azure Timer Trigger Function to handle retries for failed records, improving reliability and reducing manual intervention.

• Logged processing statuses and ingestion events in PostgreSQL, supporting traceability, monitoring, and auditing.

• Worked closely with Japanese clients, Product Owners, and Architects to align technical solutions with business use cases and functional requirements.

Generative AI Engineer Jul 2023 – Mar 2024

GlobalLogic (Hitachi Group) Bangalore, IN

Project Description: The aim of the project is to build a GenAI chatbot that acts as a virtual assistant, answering customers' questions about services and product instructions and thereby improving civic engagement.

• Designed and developed a GenAI-based virtual assistant using Amazon Lex, enabling users to self-serve for civic service and product-related queries, leading to a 40% improvement in user query resolution during pilot testing.

• Created a searchable knowledge base by indexing customer service documents into AWS Kendra, reducing average content retrieval time by 60% compared to traditional static FAQ systems.

• Built a Retrieval-Augmented Generation (RAG) pipeline that preprocessed user inputs, retrieved relevant passages from Kendra, and dynamically invoked LLMs to generate accurate, context-aware responses.

• Engineered and fine-tuned prompts to boost LLM response accuracy by 35%, ensuring high clarity and domain relevance in chatbot answers.

• Conducted end-to-end performance and UX testing, achieving 95% response consistency across varied user intents and improving chatbot reliability for production readiness.

• Delivered multiple live demos to stakeholders, incorporating feedback that led to a 20% improvement in user satisfaction scores during user testing.

• Mentored junior team members on RAG workflows, AWS AI tools, and prompt engineering best practices, contributing to faster onboarding and skill development across the team.

Machine Learning Engineer Jun 2022 – Jun 2023

Persistent Systems Bangalore, IN

Project Description: The aim of the project is to build an ML application for service modernization in the retail domain. Based on customer information and the corresponding transaction data, ML models predict customer affinities and attribute scores. Based on these scores, the application prepares a shopping cart with replenishment, alternate, private, and promotional products as recommendations.

• Developed an ML-powered retail recommendation application that predicts customer affinities and attribute scores using transaction and demographic data, driving personalized cart recommendations for replenishment, alternate, private, and promotional products.

• Performed data preprocessing and EDA using NumPy, Pandas, PySpark, and Azure Databricks, ensuring high-quality, structured datasets for effective model training.

• Built and orchestrated end-to-end data pipelines using Azure Data Factory, automating data ingestion, transformation, and model execution workflows.

• Designed and trained regression models to calculate affinity scores with high accuracy, which directly informed dynamic product recommendations in the application.

• Developed an automation framework for scenario-based functional testing, validating the recommendation engine’s precision and response across multiple use cases.

• Integrated development work with Azure DevOps pipelines, facilitating version control, CI/CD, and cross-team collaboration.

• Contributed to service modernization initiatives, improving product relevance and boosting cross-sell and upsell potential within the customer shopping journey.

Data Scientist Nov 2021 – May 2022

Persistent Systems Bangalore, IN

Project Description: The aim is to build a generalized ML model from multiple clients' data without collecting that data centrally. With Federated Learning, instead of sharing all clients' data in a central repository, models are trained locally and their parameters are aggregated on a central server.

• Designed and implemented a Federated Learning framework to detect cyberattacks using decentralized client data, improving data privacy while enabling robust ML model training without centralizing sensitive datasets.

• Trained models locally on diverse client data (e.g., CICIDS, NCC, MAWI), aggregating model parameters on a central server to form a generalized intrusion detection model with 15–20% improved generalizability across varied environments.

• Performed data cleaning and transformation using NumPy and Pandas, standardizing features and managing inconsistencies across datasets with different formats and attribute types.

• Investigated feature importance for multiple attack types (e.g., DoS, port scanning, brute force), optimizing model inputs for higher detection precision.

• Addressed challenges related to data imbalance, differing data scales, and varying sample sizes across clients, ensuring federated model convergence and fairness.

• Built and compared centralized, ensemble, and federated machine learning models (Random Forest, Logistic Regression, Gradient Boosting), achieving 10–18% performance gain in F1-score with the federated approach over centralized models in cross-client validation.

• Evaluated models using metrics such as accuracy, precision, recall, F1-score, and ROC-AUC, validating consistency and robustness across datasets.

• Proposed novel optimization techniques to improve parameter aggregation in federated learning; authored a research paper detailing the approach and experimental results.

Machine Learning Engineer Jan 2019 – Oct 2021

Tata Consultancy Services Chennai, IN

Project Description: The aim of the project is to predict possible customer churn based on existing customers' data and churned customers' data.

• Developed a machine learning solution to predict customer churn in the insurance sector, enabling proactive retention strategies and improving customer lifetime value.

• Analyzed policyholder data including demographics, claims history, premiums, and interactions to engineer features indicative of churn behavior, improving model signal quality.

• Performed data preprocessing, imputation, and encoding using Pandas, NumPy, and Scikit-learn, ensuring clean and structured input for ML pipelines.

• Built and evaluated multiple classification models (Logistic Regression, Random Forest, XGBoost), selecting the best-performing model with AUC of 0.89 and churn prediction accuracy of 87%.

• Utilized SHAP and feature importance plots to identify key churn drivers (e.g., claim frequency, premium changes), providing explainability to business stakeholders.

• Implemented hyperparameter tuning using GridSearchCV, improving model performance by ~12% over baseline.

• Integrated the final model into a dashboard using Streamlit to visualize churn risk scores, segment customers, and support retention campaign targeting.

• Enabled the business team to identify high-risk customers with 80%+ confidence, contributing to an estimated 15–20% reduction in churn during the pilot phase.

Mainframe Developer Sep 2016 – Nov 2018

Tata Consultancy Services Chennai, IN

Project Description: The project was focused on the development of an application that can sync the real-time securities data (FID1) with Master Security Descriptions (MSD).

• Contributed to the development of a real-time synchronization system to align Master Security Descriptions (MSD) with incoming FID1 securities data, ensuring up-to-date and accurate financial records across platforms.

• Analyzed and mapped the impact of critical fields across modules and applications, helping identify and isolate sensitive updates to avoid cascading data inconsistencies.

• Designed and implemented logic to block unauthorized or high-risk updates to critical security fields in the legacy system, enhancing data integrity and compliance.

• Developed a new system to automatically fetch and sync data from the FID1 repository, performing real-time updates to master records for thousands of securities.

• Conducted module-wise and application-level impact analysis to ensure system stability and seamless integration of the new synchronization logic.

• Performed unit testing and validation on core modules, achieving a 95%+ pass rate and ensuring production-

EDUCATION

Bachelor of Technology, JNTU Hyderabad 2012 – 2016

Specialization: Computer Science Engineering

TECHNICAL SKILLS

Python, Artificial Intelligence, Large Language Models, LangChain, Machine Learning, Deep Learning, Neural Networks, Natural Language Processing, Prompt Engineering, NumPy, Pandas, scikit-learn, TensorFlow, Keras, PyTorch, SQL, linear regression, logistic regression, decision trees, random forests, clustering algorithms (k-means, hierarchical clustering), LSTMs, Transformers, Matplotlib, Seaborn, AWS, Microsoft Azure, Git, Azure DevOps, Docker.

CERTIFICATIONS

• Cutshort Python Advanced

• AZ-900: Microsoft Azure Fundamentals

• Fundamentals of the Databricks Lakehouse Platform Accreditation

PUBLICATIONS

• Anomaly Detection via Federated Learning
