Sai Subrahmanya Akhil Badampudi
Richardson, TX +1-315-***-**** *******.*********@*****.*** LinkedIn GitHub
(Open to Relocate)
SUMMARY
Results-driven Data Engineer with 4+ years of experience designing scalable ETL pipelines, building data warehouses, and delivering advanced analytics solutions across academia and global enterprises. Proven expertise in Python (PySpark, Pandas, Scikit-Learn), SQL, Power BI, Azure, and Databricks, with hands-on experience in cloud-based data platforms and real-time data processing using Kafka and Airflow. Demonstrated success in driving AI and ML initiatives, optimizing supply chain analytics, and enabling data-driven decisions through interactive dashboards and robust data models. Actively seeking full-time opportunities to contribute in dynamic environments.
SKILLS
● Programming Languages: Python (Pandas, NumPy, PySpark, Scikit-Learn, TensorFlow), SQL, Linux
● Relational Databases & NoSQL: SQL Server, MySQL, Hive, Hadoop, MongoDB
● Data Visualization & Analysis: Power BI, Tableau, QuickSight, Excel
● Cloud platforms: Databricks, Azure, Microsoft Fabric, GCP, AWS, GitHub
● Machine Learning: Regression, Classification, Feature Engineering, Model Evaluation
● Tools: Apache Spark (Big Data), SSIS, SSMS, Kafka, Airflow
WORK EXPERIENCE
o9 Solutions Remote, US
Data Engineer Jun 2024 - Present
Designed and automated a scalable ETL solution on Azure, leveraging Databricks and Azure Data Factory to implement a Medallion Architecture and enhance Supply Chain management.
● Designed a scalable ETL solution on Azure following the Medallion Architecture (Bronze, Silver, Gold layers) in Azure Databricks to enhance demand forecasting accuracy.
● Created and optimized complex SQL queries to extract and manipulate data from relational databases, supporting strategic reporting efforts.
● Developed scalable data processing scripts using Python and libraries like Pandas and NumPy for thorough analysis and insights.
● Configured and utilized Azure Data Lake Storage Gen2 (ADLS Gen2) as the primary storage, ingesting historical sales, promotional calendars, inventory, and master data into the Bronze Layer.
● Developed robust PySpark notebooks within Azure Databricks to perform extensive data cleansing and transformations, moving data to the Silver Layer (curated, refined data).
● Aggregated and enriched Silver Layer data into high-value business tables in the Gold Layer, leveraging Delta Lake tables for efficient storage and querying.
● Integrated the Gold Layer data into Azure Synapse Analytics as the final data warehouse, structuring data as Measure Groups and Dimensions to facilitate immediate consumption by the functional team.
● Leveraged Azure Data Factory (ADF) for comprehensive pipeline orchestration, automating daily data ingestion, weekly forecast generation, and monthly model retraining.
● Streamlined data delivery and operational efficiency by implementing DevOps best practices for continuous integration and deployment (CI/CD) of data pipelines.
● Collaborated closely with functional teams to gather detailed requirements, ensuring the delivery of datasets that precisely met their business needs and specifications.
LTIMINDTREE Hyderabad, India
Data Engineer Oct 2020 - Jan 2023
Client: Sephora, USA & CANADA
Designed and optimized end-to-end data pipelines and warehouses, leveraging Power BI to transform raw data into actionable business insights for Supply Chain management
● Architected end-to-end ETL pipelines using SSIS and SSMS, consolidating multiple data sources into a star schema warehouse to streamline data transformation and cleansing processes
● Optimized SQL queries and stored procedures within these pipelines, reducing runtime by 42% and significantly improving data throughput
● Established comprehensive data quality checks and automated alerts, achieving 99% data pipeline uptime and ensuring high data accuracy from raw sources to analytical datasets
● Designed and developed interactive Power BI dashboards and reports, building robust data models (star schema) with complex DAX measures and calculated columns to provide actionable supply chain insights
● Mastered data extraction and transformation using Power Query, creating efficient and reusable queries to connect, clean, and reshape diverse data sources for Power BI solutions
● Leveraged Power BI Service for comprehensive deployment and management, publishing and sharing reports, configuring data gateways for secure, scheduled refreshes, and managing app workspaces for stakeholder access
● Implemented advanced visualization techniques and performance optimizations within Power BI, utilizing advanced visuals and drill-through functionality, to enhance user experience and report efficiency
● Proactively monitored and resolved data anomalies, minimizing downtime and ensuring smooth data operations essential for real-time reporting and dashboard accuracy
● Collaborated with business stakeholders to define and gather data requirements for supply chain analytics, supporting agile development and integrating change requests
ARTHASHASTRA INTELLIGENCE Hyderabad, India
Data Analytics Engineer Intern Jan 2020 - Oct 2020
Analyzed and visualized large COVID and finance datasets to uncover key trends, improve data accuracy, and deliver actionable insights that supported strategic planning and informed business decisions
● Streamlined dataset preparation for large COVID and finance datasets using Python, Pandas, and NumPy in Jupyter Notebooks for fast and accurate statistical analysis.
● Performed data loading, cleaning, and preprocessing using Python Pandas to prepare datasets for analysis, ensuring accuracy and consistency.
● Created compelling visualizations with Matplotlib and Seaborn to present insights from complex datasets.
● Conducted correlation analysis and statistical tests to uncover key trends and relationships, enabling better business decision-making.
● Delivered clear, actionable insights that supported strategic planning across public health and financial domains.
PROJECTS
Energy Demand Prediction & Peak Load Analysis for a Utilities Company
● Improved utility demand planning by developing a Python-based machine learning pipeline using Pandas and Scikit-learn to forecast peak July electricity consumption for 5,000+ households, integrating large-scale energy, weather, and housing datasets sourced from AWS S3.
● Enhanced forecasting accuracy through advanced regression modeling and cross-validation techniques, providing actionable insights for load balancing and peak demand reduction strategies.
● Enabled data-driven decision-making for utility stakeholders by processing, cleaning, and structuring diverse datasets into an analysis-ready format, ensuring high data integrity and reliability.
Sustainable Energy and Its Effect on National Growth
● Designed and developed 4 interactive Tableau dashboards to analyze sustainable energy adoption and its correlation with economic growth across 20+ countries, enabling stakeholders to identify trends and make data-driven policy recommendations.
● Created a comprehensive Tableau Story to present a clear narrative on how sustainable energy trends impact global economic growth, combining interactive visuals with key insights.
CERTIFICATIONS
Microsoft Azure Data Engineer Associate
Databricks Data Engineer Associate
EDUCATION
SYRACUSE UNIVERSITY, NEW YORK, US
MS in INFORMATION SYSTEMS Jan 2023 - Dec 2024
4.00/4.00 GPA