Rutwiz Gangadhar Gullipalli
+1-857-***-**** | Mail | LinkedIn | GitHub | Portfolio
Profile
Detail-oriented Data Engineer with over 3 years of experience designing, developing, and deploying scalable data pipelines and real-time streaming solutions. Proficient in Python, Apache Kafka, Apache Spark, and Airflow, with hands-on expertise building automated ETL workflows on AWS. Adept with large-scale data warehousing platforms such as Snowflake and Amazon Redshift, with a strong focus on data reliability, cost-efficient cloud infrastructure, and production-ready systems.
Work Experience
Cloudport AI Apr 2024 – Dec 2024
Data Engineer Intern Chicago, IL
• Developed robust, cloud-native ETL pipelines using Python, SQL, and AWS services (S3, Lambda) to automate ingestion and transformation of over 10 million blockchain transaction records weekly, significantly improving data accessibility and operational efficiency.
• Implemented modular pipeline components and reusable functions for data wrangling and enrichment tasks, improving maintainability, version control, and code reuse across analytics teams.
ValueLabs Mar 2021 – Dec 2022
Data Engineer Hyderabad, India
• Architected and optimized large-scale ETL pipelines using PySpark, Kafka, and SQL to process terabytes of structured and semi-structured data into Azure Data Lake Gen2, improving data pipeline throughput and reducing latency in production analytics.
• Designed and implemented dimensional data models (star and snowflake schemas) within Snowflake and Azure Synapse, achieving 45% performance gains in data retrieval for BI and reporting use cases.
• Automated data workflows using Azure Data Factory, orchestrating batch and streaming ingestion with integrated monitoring, alerting, and retry logic to ensure SLA compliance for enterprise data delivery.
IBM Apr 2020 – Dec 2020
Data Analyst Intern Hyderabad, India
• Built scalable data cleansing and standardization scripts using Python (Pandas) and SQL, improving the quality and reliability of enterprise datasets for business reporting across HR, Finance, and Marketing teams.
• Assisted in building centralized metadata dictionaries and documentation for data lineage, pipeline ownership, and governance compliance to support internal audit and analytics teams.
Projects
Spotify Global Top 100 ETL Pipeline
• Designed and developed a robust, serverless end-to-end ETL pipeline using Python, Apache Airflow, and AWS cloud-native services (Lambda, S3, Athena, Glue) to automate the weekly extraction, transformation, and loading of Spotify’s Global Top 100 song data for analytics and reporting.
• Integrated Athena with Power BI and QuickSight dashboards, enabling non-technical stakeholders to explore artist popularity, genre trends, and regional engagement, cutting manual analysis time by 60% and accelerating data-driven decision-making across departments.
Bitcoin Environmental Impact Analysis
• Conducted a comprehensive environmental impact study of Bitcoin by aggregating blockchain transaction datasets, hash rate statistics, and energy consumption metrics from multiple APIs and CSV data dumps, leveraging Python (Pandas, NumPy) and SQL to preprocess and analyze over 5 million data points.
• Enabled seamless refresh of dashboards and data pipelines through scheduled batch jobs and API-based data pulls, delivering near-real-time updates with less than a 5-minute delay; improved data refresh performance by 45% through query optimization and incremental updates.
Netflix API Analytics Pipeline
• Architected a high-throughput, horizontally scalable data ingestion and transformation pipeline to process and analyze API access logs from Netflix’s backend services in near real-time using Apache Kafka for message streaming and Snowflake for storage and analytical processing.
• Implemented multi-stage transformation logic in PySpark to convert semi-structured JSON log data into flattened, structured tables with enriched metadata, including user agent, request response time, status code, and geolocation derived from external APIs.
Certifications
• AWS Certified Data Engineer - Associate
• Snowflake SnowPro Core Certification
Technical Skills
• Languages: Python, SQL, Java, Scala, R
• Big Data & Warehousing: Apache Spark, PySpark, Hadoop, Snowflake, BigQuery, Redshift
• ETL & Orchestration: Airflow, AWS Glue, ADF, Informatica, Dataflow, Cloud Composer
• Cloud Platforms: AWS (S3, Lambda, Kinesis, EMR), Azure, GCP (Dataplex, Pub/Sub)
• Databases: PostgreSQL, MySQL, Oracle, SQL Server, MongoDB, DynamoDB
• DevOps & Infra: Git, Jenkins, Docker, Terraform, CI/CD, Linux
• Streaming & Messaging: Apache Kafka, Confluent Kafka
• Visualization & ML: Power BI, Tableau, Looker, Qlik, Vertex AI, LangChain, FAISS, SageMaker
Education
Northeastern University, Boston 2023 – 2025
Master's in Data Analytics, 4.0 GPA