HARSHINI KOSURI
DATA ANALYST
Phone: 209-***-**** Email: *************@*****.*** LinkedIn
SUMMARY
Seasoned IT professional with over 3 years of specialized experience in data analysis, market research, analytics, and data modeling utilizing Python, Big Data technologies, AWS, Hadoop, and SQL.
Demonstrated proficiency in crafting optimized SQL queries, predominantly within Oracle and SQL Server environments.
Possesses a profound comprehension of Business Intelligence and Data Warehousing principles, with a strong focus on ETL processes and the Software Development Life Cycle (SDLC).
Skilled in Python-based data visualization techniques, employing Matplotlib, Seaborn, and Plotly to generate informative charts and graphs.
Proficient in leveraging AWS services such as EC2, S3, Lambda, and EMR for data storage, computation, and analysis purposes. Holds the AWS Data Engineer – Associate certification.
Hands-on experience with key data-centric Python and R libraries such as pandas, ggplot2, NumPy, and Seaborn.
Generated comprehensive reports and visualizations using tools like Power BI, Power Pivot, and Power Map.
Proficient in implementing machine learning models and algorithms through Python's Scikit-learn library.
Established seamless connections with databases using Python, streamlining data extraction and querying procedures.
Demonstrated expertise in creating impactful visualizations and dashboards using tools like Tableau and Matplotlib.
Designed and published dynamic reports, dashboards, and data narratives utilizing Tableau across multiple platforms.
Acquired practical experience in Natural Language Processing (NLP) and demonstrated proficiency in statistics, linear algebra, and optimization algorithms.
Utilized Spark for advanced analytics, seamlessly integrating it with Hive, SQL, Oracle, and Snowflake.
Optimized existing Hadoop algorithms leveraging Spark's capabilities, including Spark Context and Spark-SQL.
Actively engaged in all stages of the SDLC, with a particular focus on the agile development model. Regularly participated in Scrum meetings, facilitating productive discussions and efficient development progress.
TECHNICAL SKILLS
EDUCATION
M.S. in Software Engineering - San Jose State University, GPA 3.5, Jan 2022 - Dec 2023
B.S. in Computer Science - Stanley College of Women in Engineering, GPA 3.7, Aug 2016 - Sept 2020
PROFESSIONAL EXPERIENCE
Data Analyst - BCBS, CA, Aug 2023 - Present
Developed predictive analytics models using Python and Tableau, integrating Pandas, NumPy, and Matplotlib for enhanced data interpretation.
Achieved a 20% increase in data accuracy by conducting data gathering and reconciliation using Excel.
Decreased query execution time by 40% through the utilization of AWS Athena's advanced querying capabilities for analyzing complex datasets.
Reduced data-related errors by 70% by implementing automated data quality checks triggered by AWS Lambda functions.
Enhanced predictive accuracy by 25% through the development of predictive analytics use-cases with Python.
Identified optimal BI solutions for generating reports, dashboards, and scorecards, using Tableau to analyze data and systems.
Improved system performance by 25% by establishing and monitoring a scalable distributed system based on Hadoop HDFS.
Provided strategic planning support by generating analytical reports with recommendations for management, utilizing ETL processes to gather data from various external sources.
Optimized and tuned several complex SQL queries for better performance and efficiency, while designing, developing, and maintaining Tableau functional reports based on user requirements.
Data Analyst - Dell Technologies, India, Jul 2019 - Dec 2021
Leveraged Seaborn in Python to craft visually compelling statistical graphics, such as heatmaps, violin plots, and pair plots, driving deeper insights and facilitating data-driven decision-making.
Implemented complex SQL queries to ensure data integrity and optimize retrieval processes, resulting in streamlined operations and enhanced decision-making accuracy.
Spearheaded the management of legal data for a small firm using Excel and cloud-based tools, leading to a notable improvement in data accuracy and expediting decision-making processes.
Showcased expertise in SQL query optimization, including stored procedures and triggers, resulting in significantly improved data retrieval efficiency and maintenance procedures.
Developed and deployed interactive Power BI reports and dashboards, providing real-time insights into key business metrics, leading to more informed strategic decisions.
Utilized Power BI's advanced functionalities to create dynamic and interactive reports, fostering increased stakeholder engagement and comprehension of complex datasets, thereby driving actionable outcomes.
Conducted thorough statistical analysis utilizing tools such as SPSS and SAS, extracting actionable insights from large datasets to inform strategic decision-making processes, resulting in optimized business outcomes.
Optimized Power BI reports and dashboards for enhanced performance, ensuring smoother navigation and quicker access to insights for end-users, resulting in improved operational efficiency.
Programming/Scripting Languages: SQL, HiveQL, Scala, Python, Java, Unix Shell Scripting.
Databases/Warehouses: MS SQL Server, Oracle, MS Access, MySQL, Teradata, PostgreSQL, DB2, Snowflake.
Big Data Technologies: HDFS, YARN, MapReduce, Pig, Hive, HBase, Apache Spark, Scala, Impala, Kafka.
Libraries: NumPy, Pandas, Seaborn, Scikit-Learn, tidyverse, PySpark, Plotly, Matplotlib.
NoSQL Databases: Cassandra, MongoDB.
Cloud Technologies: MS Azure, GCP, AWS.
Reporting Tools/ETL Tools: Informatica, Talend, SSIS, Tableau, Power BI.
Version Control: Git, SVN.
Development Tools: Eclipse, NetBeans, Microsoft Office Suite (Word, Excel, PowerPoint, Access).
Operating Systems: Windows, Macintosh, Linux, Ubuntu, Unix.
Others: Data Structures and Algorithms, Machine Learning, Jupyter Notebook, Docker, Kubernetes, Jenkins, Jira, Agile/Scrum, Waterfall.
Designed and executed A/B tests to evaluate the impact of different strategies on key metrics, enabling data-driven decision-making and optimization of business processes, leading to tangible improvements in performance and efficiency.
Leveraged Azure cloud services to augment ETL processes, resulting in enhanced data integration, transformation, and loading efficiencies, thereby improving overall data management and analysis capabilities.
Integrated Azure Data Factory into ETL workflows, automating data movement and orchestrating complex data transformations seamlessly, leading to streamlined operations and improved productivity.
PROJECTS
Flight Delay Predictor [Google Colab, Flask, AWS Redshift, Tableau]
Built machine learning models using Naïve Bayes, classification and regression trees, and logistic regression, and evaluated their performance through rigorous testing and fine-tuning to improve accuracy.
Created a web application using HTML and Flask to predict the probability of flight delay, and built a Tableau dashboard depicting statistical trends of delays over the months for enhanced decision-making.
Electric Vehicles Trend in Washington [MySQL, AWS Redshift, AWS S3, Power BI]
Analyzed a dataset of 0.1M rows with 15+ features of electric vehicles sold in Washington State from 2017 to 2021, providing a statistical view on the rise in EV sales.
Conducted Exploratory Data Analysis, including data distribution visualization, outlier removal, mean/median imputation for missing values, and data reformatting.
Performed preprocessing using AWS Glue for data storage in AWS S3 and data warehousing using Redshift, culminating in the creation of an interactive Power BI dashboard showcasing EV sales trends and the impact of lithium-ion battery cost reductions.
Music Generation Using Deep Learning [Python, LSTM, Transformers, Streamlit]
Explored LSTM and Transformer neural network architectures for music generation, advancing AI-driven music creation.
Developed novel approaches to AI music generation using deep learning models.
Created an intuitive Streamlit-based interface for easy neural network model selection, enhancing user engagement.
CERTIFICATIONS
AWS Data Engineer – Associate
Microsoft Azure AZ-900, AI-900, DP-900
Astronomer Apache Airflow – Fundamentals
PCAP (Certified Associate in Python Programming)