Harish Reddy Data Analyst
*******.*@************.*** 617-***-**** New York, USA LinkedIn Git
Summary
Results-driven Data Analyst with 5+ years of experience transforming complex data into strategic insights and actionable business solutions. Expertise in advanced data modeling, predictive analytics, and automation of ETL processes using Python, SQL, PySpark, and Databricks. Demonstrated success enhancing decision-making efficiency by over 30% through real-time Power BI dashboards and driving marketing ROI improvements of up to 25%. Collaborative problem solver, skilled in delivering scalable analytics solutions across diverse industries, with a strong foundation in cloud technologies (AWS, Azure) and agile methodologies.
Technical Skills
• Programming & Scripting: Python (Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn), SQL (T-SQL, PL/SQL), R, DAX, MATLAB, HTML/CSS, Bash, PowerShell
• Databases & Storage: MySQL, MS-SQL, Amazon Redshift, Snowflake, BigQuery, PostgreSQL, MongoDB, SparkSQL, Oracle
• ETL & Data Processing: Apache Kafka, Apache Airflow, AWS Glue, Azure Data Factory, Databricks, Informatica, Hadoop, Hive, PySpark, AWS Lambda, Apache NiFi
• Visualization & Reporting: Power BI, Tableau, Google Analytics, Salesforce, Zendesk, MS Excel (Advanced functions, Pivot Tables), Google Data Studio, Amazon QuickSight, QlikView, Looker
• Cloud Platforms & Tools: AWS (EMR, EC2, S3, Redshift, Lambda, Glue), Azure (Data Factory, Synapse Analytics), Google Cloud Platform, Docker, GitHub, Jenkins, Jira, CI/CD Pipelines, Postman, Confluence
• Others: Predictive Modeling, Machine Learning (Regression, Classification, Clustering), NLP Techniques, A/B Testing, Statistical Analysis, Time Series Forecasting, Data Modeling (Dimensional & Relational), Data Governance, Agile (Scrum, Kanban), SDLC, Data Quality Assurance
Professional Experience
Data Analyst, Adobe 01/2024 - Present Remote, USA
• Improved ROI of targeted marketing campaigns by 25% using predictive analytics and customer segmentation models built in Python (Scikit-learn, Pandas) and PySpark.
• Developed interactive real-time Power BI dashboards, improving executive decision-making speed and operational efficiency by over 30%, directly contributing to significant revenue growth.
• Optimized ETL pipelines in Databricks and AWS Glue, achieving a 40% reduction in data processing times for datasets exceeding 10TB through automation and streamlined workflows.
• Leveraged SQL and Snowflake for complex data modeling and analysis, uncovering actionable insights which improved marketing conversion rates by 15%.
• Automated recurring analytics tasks and report generation with Python scripts, reducing manual effort by 50%, significantly boosting accuracy and productivity.
• Collaborated cross-functionally using Agile methodologies and Jira, effectively translating 95% of stakeholder requirements into robust business intelligence solutions, increasing internal stakeholder satisfaction and data reliability.
Data Analyst Intern, Northwest Bank 05/2023 - 12/2023 Remote, USA
• Conducted advanced customer segmentation on a dataset of 10 million+ records using K-means clustering and regression analysis (Python, scikit-learn, SQL), achieving a 25% improvement in marketing campaign effectiveness.
• Optimized complex SQL queries in AWS Redshift using window functions and CTEs, improving data retrieval efficiency by 30% and significantly reducing query runtime.
• Engineered automated ETL workflows leveraging AWS Lambda, S3, and Apache Airflow, managing over 15TB of transactional data and reducing processing latency by 20%.
• Built predictive models using machine learning algorithms (Logistic Regression, Random Forest) to identify high-value customer segments, resulting in a 20% increase in conversion rates and 15% improvement in customer retention.
Data Analyst, Tata Consultancy Services 05/2019 - 05/2022 Hyderabad, India
• Developed and maintained Power BI dashboards and interactive reports, enhancing visibility into customer analytics and directly improving quarterly revenue metrics by informing strategic business decisions.
• Implemented predictive analytics using Logistic Regression and Random Forest in Python to optimize digital marketing strategies, resulting in a 20% year-over-year growth in website engagement.
• Streamlined real-time data ingestion processes using Apache Kafka, improving data availability by 23%, and orchestrated efficient workflow automation with Apache Airflow.
• Executed end-to-end ETL processes and managed data pipelines using PySpark, Hadoop, Hive, and AWS Glue, significantly improving data processing efficiency and scalability.
Education
Master of Science, Data Analytics Engineering Northeastern University, Boston, USA 09/2022 - 05/2024 GPA: 3.85
• Relevant Coursework: Data Mining, Big Data Analytics, Machine Learning, Statistical Analysis, Predictive Modeling, NLP
Bachelor of Technology, Electronics and Communication Sastra University, Tamil Nadu, India 06/2016 - 05/2020 GPA: 3.8
• Relevant Coursework: Python, Database Management Systems, Java, C, C++
Projects
Enhancing Digital Safety through Predictive Modeling for Malware Detection
Python, NumPy, Pandas, Scikit-learn, Feature Engineering, Statistical Analysis, A/B Testing
• Developed a robust classification model achieving 93% accuracy in identifying malware on mobile devices, significantly improving digital safety.
• Applied advanced feature engineering techniques and statistical methods, enhancing predictive analytics reliability through rigorous A/B testing and experimentation.
Real-Time Data Prioritization Pipeline for Intelligent Transportation Systems
AWS Glue, AWS Lambda, Amazon S3, Python, Big Data Processing, Real-Time Analytics, Machine Learning
• Built a scalable real-time ETL pipeline leveraging AWS Glue, Lambda, and S3 for managing and processing terabyte-scale datasets efficiently.
• Implemented a real-time machine learning model, achieving a 20% increase in carrier prioritization accuracy, enabling dynamic decision-making and operational efficiency.
Brain Stroke Detection through Predictive Modeling
Python, NumPy, Pandas, Scikit-learn, Matplotlib, Exploratory Data Analysis, Cross-validation
• Developed predictive models (Random Forest, XGBoost) with 80.66% accuracy for detecting brain strokes.
• Conducted comprehensive Exploratory Data Analysis (EDA) and optimized performance using stratified cross-validation, improving clinical predictive accuracy.
Online Customer Market Segmentation (RFM Analysis)
Python, R, PostgreSQL, Pandas, Power BI, Customer Analytics, Predictive Modeling, Data Visualization
• Performed customer segmentation using RFM analysis, identifying high-value segments and enabling targeted marketing strategies.
• Built interactive Power BI dashboards for visual analytics, significantly enhancing insights into customer behavior and driving improvements in profitability metrics.
Certification
Microsoft Certified: Power BI Data Analyst Associate