Priyanka Chauhan Data Analyst
***************@*****.*** +1-334-***-**** LinkedIn GitHub
Summary
Results-oriented Data Analyst with over 3 years of experience delivering insights through advanced analytics, machine learning, and business intelligence solutions. Skilled in developing scalable ETL pipelines, statistical models, and interactive dashboards using Python, SQL, Power BI, Tableau, and AWS. Adept at leveraging big data tools such as Spark, Hadoop, and Snowflake to extract actionable insights. Proven track record of optimizing performance, improving reporting accuracy, and enhancing data-driven decision-making for cross-functional teams.
Technical Skills
• Languages & Analytics: Python, R, SQL, Scala, NumPy, Pandas, SciPy, Excel (Advanced Functions, Macros, Power Query)
• Data Visualization: Tableau, Power BI, Matplotlib, Seaborn, Google Data Studio
• Machine Learning & Modeling: scikit-learn, TensorFlow, Keras, XGBoost, LightGBM, Logistic Regression, Time Series Forecasting
• Cloud & Big Data: AWS (S3, EC2, Glue, SageMaker), GCP (BigQuery), Azure (Synapse, ADF), Hadoop, Spark, Databricks
• Databases: MySQL, PostgreSQL, MongoDB, Cassandra, Azure SQL, NoSQL
• ETL & Pipelines: Airflow, Data Factory, Data Wrangling, Data Cleaning, Workflow Automation
• Statistical Tools: RStudio, SPSS, SAS, Stata
Professional Experience
Business & Data Analyst, Quarks System January 2018 – May 2021 India
• Developed and maintained robust, scalable ETL pipelines using Google Cloud Platform (BigQuery), Snowflake, and Databricks, automating data ingestion and transformation across multiple systems. This reduced reporting delays by 40% and improved pipeline stability and performance.
• Designed and deployed interactive dashboards with Power BI and Tableau, visualizing business KPIs, revenue trends, and customer behavior analytics. These dashboards became central tools in strategic planning and improved executive reporting efficiency by 50%.
• Conducted advanced data wrangling, cleaning, and feature engineering using Python (Pandas, NumPy) and complex SQL queries, preparing high-quality, analysis-ready datasets for predictive modeling and deep-dive analytics.
• Engineered big data workflows using Apache Spark and Scala on Databricks, improving the performance of large-scale data processing jobs and reducing compute costs by 30% through code optimization and caching techniques.
• Built and deployed predictive models (XGBoost, Random Forest, ARIMA, Prophet) for use cases such as sales forecasting, customer segmentation, and churn prediction, leading to data-driven decision-making and a 20% increase in customer retention.
• Implemented data quality checks and monitoring frameworks using Airflow, custom Python scripts, and GCP tools to ensure data accuracy, pipeline reliability, and timely failure alerts. This improved data trustworthiness across teams.
• Automated complex reporting and data validation workflows using Looker, workflow automation tools, and advanced Excel (Power Query, Macros), saving over 70% of manual effort and reducing dependency on repetitive tasks.
• Led cross-functional communication by presenting analytical insights and business recommendations through PowerPoint, SharePoint, and real-time dashboards, aligning data outcomes with business strategy and operational goals.
• Managed both relational and NoSQL databases (MySQL, PostgreSQL, MongoDB, Cassandra), optimizing query performance, indexing, and schema design for faster analytical query execution and more efficient data retrieval.
Project
Image Enhancement and Caption Generation using CNN
Duration: July 2024 – December 2024
Tools & Technologies: Python, TensorFlow, Keras, OpenCV, Pandas, NumPy, Matplotlib, MSCOCO Dataset
• Designed a deep learning pipeline that applied Convolutional Neural Networks (CNNs) for high-resolution image feature extraction and enhancement, improving image quality metrics like PSNR and SSIM for downstream analysis.
• Combined CNN and LSTM architectures to build a caption generation system capable of producing human-like descriptions from image embeddings, integrating NLP with visual data for richer data interpretation.
• Preprocessed and structured image metadata using Python (Pandas, NumPy) for training and validation, enabling scalable dataset handling across large image corpora such as MSCOCO.
• Implemented custom evaluation metrics and visualization plots using Matplotlib and Seaborn to assess model performance, including BLEU score trends and feature map overlays.
• Focused on real-world applications such as automated content tagging, image classification, and visual analytics, demonstrating expertise in machine learning, predictive modeling, and data interpretation.
Education
Master of Science in Computer Science (AI & ML), Troy University Troy, AL January 2023 – December 2024
Bachelor of Business Administration, PITM Kolkata, India August 2012 – July 2015