Data Analyst

Location:

Maryland Line, MD

Salary:

75000

Posted:

December 30, 2025


Resume:

Ram Sai Ganesh Manyala

DATA ANALYST

New York, USA 667-***-**** ************@*****.*** https://www.linkedin.com/in/ram-sai-ganesh-manyala/

SUMMARY

Data Analyst with 3+ years of experience in SQL, Python, R, and SAS for data extraction, transformation, and analysis, ensuring data integrity and accuracy for business insights.

Proficient in data visualization tools including Tableau, Power BI, and Looker, creating interactive dashboards and reports to support data-driven decision-making.

Strong background in big data technologies such as Hadoop, Spark, and AWS (Redshift, S3, Athena), enabling efficient data processing and large-scale analytics.

Skilled in ETL development, statistical modeling, and machine learning using NumPy, Pandas, Scikit-learn, and TensorFlow, with experience in EHR systems (Epic, Cerner) and finance data analytics while ensuring compliance with HIPAA and GDPR regulations.

SKILLS

Methodologies:

SDLC, Agile, Waterfall

Programming Languages:

Python, SQL, Scala, R

Packages:

NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, TensorFlow, Seaborn, ggplot2

Visualization Tools:

Tableau, Power BI, Advanced Excel (Pivot Tables, VLOOKUP)

Cloud Technologies:

AWS (EC2, S3, Redshift, Athena, Glue, DynamoDB), Azure, Snowflake

Databases:

MySQL, PostgreSQL, MongoDB, SQL Server, Oracle

Other Technical Skills:

Machine Learning Algorithms, ETL Tools, Statistics, ServiceNow, SSIS, SSRS, MapReduce, Alteryx, Probability Distributions, Confidence Intervals, ANOVA, Hypothesis Testing, Regression Analysis, Linear Algebra, Advanced Analytics, Data Mining, Data Visualization, Data Warehousing, Data Transformation, Data Storytelling, Business Analysis, Clustering, Classification, Regression, A/B Testing, Forecasting & Modelling, Data Cleaning, Data Wrangling, Informatica MDM, Jira, UAT, JAD, Git, GitHub, Visual Studio Code, PyCharm

WORK EXPERIENCE

PNC Financial Services, MD Data Analyst May 2024 – Present

Performed extensive data cleaning and preprocessing using Python (Pandas, NumPy) to handle missing values, outliers, and normalization of continuous financial variables, ensuring data quality and consistency for credit risk model development.

Built scalable ETL pipelines on Databricks using PySpark to handle millions of rows of financial transactions, reducing processing time by 40% compared to traditional methods.

Extracted and integrated data from diverse sources, including transactional records, loan applications, and customer demographic data, accounting for 10% of the required data for predictive credit risk analysis.

Created interactive dashboards in Power BI to visualize key performance indicators (KPIs) such as loan default rates, customer segmentation, and credit utilization patterns.

Utilized Databricks notebooks for distributed processing of loan transaction data, integrating Spark-based ETL transformations and machine learning pipelines within a collaborative environment.

Queried and analyzed large-scale financial datasets using Snowflake, optimizing SQL workflows for faster credit risk scoring and reporting within the cloud-based data warehouse.

Applied the Statsmodels library in Python to perform time series analysis and linear regression for modeling credit score trends, aiding in feature selection and hypothesis testing.

Wrote complex SQL queries to segment customers based on credit risk factors, such as payment history, credit score, and loan tenure, to create cohorts for analysis and predictive modeling.

Integrated the predictive credit risk model into PNC’s loan management system, enabling real-time risk scoring and allowing credit officers to proactively manage high-risk accounts, resulting in a 25% reduction in loan defaults within the first six months of deployment.

Analyzed the distribution of key financial variables, such as credit scores, loan amounts, and default status, using histograms, box plots, and density plots in Matplotlib to identify skewness, outliers, and overall distribution shape.

Capgemini, India Data Analyst Jun 2021 – Aug 2023

Developed a Patient Readmission Risk Prediction model to assess the likelihood of hospital readmissions, enabling data-driven decision-making for care management and improving patient outcomes.

Extracted, integrated, and cleaned diverse data sources including patient demographics, medical history, and external health indicators from Electronic Health Records (EHR) and claims data using SQL queries for structured and unstructured data retrieval.

Built and validated machine learning models using Scikit-learn to predict patient readmission risks, applying logistic regression and random forest algorithms for classification with improved AUC scores.

Utilized R for statistical analysis and visualization of patient readmission trends, performing ANOVA and regression analysis to identify key risk factors influencing hospital returns.

Automated an ETL pipeline using Apache Airflow, reducing data pipeline execution errors by 10% and ensuring timely, consistent updates of patient data from various sources into the healthcare data warehouse for analysis and model predictions.

Utilized AWS Lambda to automate real-time transformation and cleaning of incoming health data, reducing processing time by 15% and ensuring tasks such as missing value imputation, encoding, and feature transformation were executed automatically upon data upload to S3.

Employed Excel VLOOKUP functions to join patient datasets across multiple spreadsheets for quick validation of demographic and treatment data during exploratory analysis.

Leveraged Git for version control and collaborative development of Python scripts and Jupyter notebooks for predictive modeling and ETL automation, ensuring seamless team integration and code tracking.

Created interactive dashboards in Tableau, enabling healthcare providers and administrators to monitor key patient health indicators and readmission risks for proactive intervention.

Engineered new features such as comorbidity scores, length of stay, and medication adherence rates using clinical domain knowledge to enhance model performance, leading to a 5% increase in prediction accuracy; created custom patient risk categories using Python and Pandas for better care segmentation.

Analyzed patient-level data using SAS to identify statistically significant predictors of readmission, supporting data validation and cross-verification with Python-based models.

Aggregated large patient datasets using Excel PivotTables to summarize key metrics such as average length of stay, readmission rates, and treatment effectiveness, providing a quick overview of patient population trends.

EDUCATION

Master of Science in Data Science – University of Maryland, Baltimore County, MD, USA – May 2025

Bachelor of Technology in Electronics and Communication Engineering – Andhra University, India – April 2023
