Ramachandruni Naga Venkata Sai
DATA ANALYST
NJ, USA | Mobile: 847-***-**** | Email: **********@*****.***
LinkedIn - www.linkedin.com/in/nagavenkatsair3008
SUMMARY
Data Analyst with 4+ years of experience in SQL, Python, and SAS for data manipulation, analysis, and predictive modelling,
delivering actionable insights that drive data-driven decision-making.
Proficient in data visualization tools such as Tableau, Power BI, and Excel (Advanced VBA, Pivot Tables, Power Query) to
create interactive dashboards and reports for business intelligence.
Strong background in ETL processes, data warehousing (Snowflake, Redshift, SQL Server, Oracle), and big data technologies
(Hadoop, Spark, AWS, Azure Databricks) to manage and analyze large datasets efficiently.
Experience in healthcare and finance domains, leveraging expertise in EHR systems (Epic, Cerner), financial modelling, machine
learning (scikit-learn, TensorFlow), and data governance (HIPAA, GDPR) for compliance and optimization.
SKILLS
Methodologies: SDLC, Agile, Waterfall
Programming Language: Python, SQL, Scala, R
Packages: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, TensorFlow, Seaborn, ggplot2
Visualization Tools: Tableau, Power BI, Advanced Excel (Pivot Tables, VLOOKUP)
Cloud Technologies: AWS (EC2, S3, Redshift, Athena, Glue, DynamoDB), Azure, Snowflake
Database: MySQL, PostgreSQL, MongoDB, SQL Server, Oracle
Other Technical skills: Machine Learning, Statistics, ServiceNow, SSIS, SSRS, Alteryx, Probability distributions,
Confidence Intervals, ANOVA, Hypothesis Testing, Regression Analysis, Linear Algebra, Advanced
Analytics, Data Mining, Data Visualization, Data Warehousing, Data Transformation, Data
Storytelling, Business Analysis, Clustering, Classification, Regression, A/B Testing, Forecasting &
Modelling, Data Cleaning, Data Wrangling, Informatica, Jira, UAT, Supply Chain Management, GitHub
WORK EXPERIENCE
KPMG, CHI Data Analyst May 2024 – Present
Extracted and aggregated financial transaction, portfolio, and market data for 75,000+ clients from various financial systems
using PostgreSQL and complex SQL joins to support investment risk management and portfolio analysis initiatives.
Utilized Python (Pandas, NumPy) for advanced data cleaning, transformation, and client segmentation, improving data quality
and consistency by 13% across multiple sources, resulting in more accurate and reliable financial insights.
Designed and deployed interactive dashboards in Power BI, enabling real-time visualization of investment performance
metrics, including portfolio growth, asset allocation, and risk indicators for senior stakeholders.
Developed a logistic regression model to predict the likelihood of portfolio underperformance (based on market conditions,
client behavior, and historical performance) within the next 6 months, supporting proactive investment strategies.
Utilized Azure Data Factory to automate ETL pipelines extracting financial data from various systems (investment platforms,
banking, and trading systems), reducing data processing time by 5% and streamlining data transformation for analysis.
Enabled data parallelism for model training and evaluation using Spark's distributed processing framework, speeding up
analysis of complex financial datasets and supporting better-informed portfolio management decisions.
Worked in an Agile environment, collaborating with investment analysts, risk managers, IT, and data governance teams to
enhance data definitions, reporting cadence, and stakeholder engagement in financial projects.
HCL Tech, India Data Analyst Feb 2021 – Aug 2023
Created interactive visualizations in Tableau to present the patient retention model’s results, tracking key performance
indicators (KPIs) and patient segmentation insights for healthcare stakeholders.
Conducted comprehensive exploratory data analysis (EDA) to identify key patient retention indicators, such as treatment
adherence, appointment frequency, and patient satisfaction, providing insights into the root causes of patient churn.
Extracted and transformed data from large-scale healthcare databases using SQL, working with millions of records related to patient
demographics, treatment history, appointment scheduling, and patient-provider interactions to support retention analysis.
Developed a predictive patient retention model using logistic regression and decision trees in scikit-learn, achieving 25%
accuracy in predicting patient dropout risk based on historical treatment and visit data.
Collaborated with the care coordination team to implement personalized retention strategies, resulting in an 18% improvement in
patient engagement and retention by targeting high-risk patients with tailored care plans and follow-up reminders.
Employed AWS S3 for scalable and secure storage of large patient datasets, ensuring efficient management of treatment data for
retention analysis and facilitating easy access for real-time insights and model updates.
Automated data pipeline processes in Python using Apache Airflow, scheduling and monitoring workflows that transformed
and loaded patient data from EHR systems into cloud-based storage for real-time analysis.
Created pivot tables in Excel to summarize and analyze patient retention patterns across different demographics, treatment
plans, healthcare services, and patient engagement factors.
EDUCATION
Master of Science in Computer Information Technology - Eastern Illinois University, Charleston, IL, USA
Bachelor of Technology in Computer Science - JNTU, Hyderabad, India