
Data Engineer, Data Scientist

Location:
Dallas, TX
Posted:
May 19, 2025


Resume:

Nayan Bhiwapurkar

+1-623-***-**** | ****************@***.*** | linkedin.com/in/nayan-bhiwapurkar | Portfolio: nayanbhiwapurkar.github.io

EDUCATION

Master of Science - Data Science, Analytics, and Engineering | August 2023 - May 2025
Arizona State University, Tempe, AZ, US | GPA: 4.0/4.0
Relevant Courses: Statistics for Data Analysts, Data Processing at Scale, Artificial Intelligence, Data Mining, Data Visualization, Statistical Machine Learning, Deep Learning, Numerical Optimization, Statistical Learning and Pattern Recognition.

Bachelor of Engineering - Computer Science | August 2016 - May 2020
University of Pune, Pune, MH, India | GPA: 8.55/10.0
Relevant Courses: Data Mining and Warehousing, Data Analytics, Artificial Intelligence, Machine Learning, Engineering Mathematics.

PROFESSIONAL EXPERIENCE

Data Engineering Intern | HyperData - Tempe, AZ, US | June 2024 - August 2024

• Optimized cloud infrastructure for API scalability and reliability using AWS, Terraform, Docker, and Airbyte for data synchronization.

• Enhanced data processing pipelines for better efficiency and throughput by leveraging Python and PostgreSQL to handle larger datasets.

• Developed and refined SQL scripts to advance data transformation tasks and improve the API’s functionality for diverse use cases.

• Executed API performance testing and benchmarking, and created documentation with usage examples.

Data Engineer | Cognizant Technology Solutions - Pune, MH, India | November 2020 - June 2023

• Led a retail data engineering project, processing daily data volumes of about 100GB using Apache Spark, Python, SQL, and Amazon S3. Designed structured data models, including dimension and fact tables, to provide valuable context for point-of-sale analysis.

• Implemented an incentive program based on sales performance, boosting motivation among sales teams. Designed customer retention strategies using targeted coupons, increasing engagement with infrequent buyers.

• Enhanced data pipeline performance by applying Spark optimizations like caching and broadcast joins. Automated workflows with Airflow, CRON jobs, and integrated Azure CI/CD pipelines.

• Pioneered real-time monitoring services for a Covid Vaccine Survey System using Java, Spring Boot, Apache Kafka, Splunk, and Dynatrace, reducing system response time by 20% and improving user experience by 15%.

• Engineered data ingestion pipelines with Apache Kafka, optimizing data flow and storage efficiency by 25%.

• Spearheaded the migration from a SOAP-based service to a scalable RESTful microservices architecture, improving code quality by 30%. Documented APIs, wrote JUnit tests with Mockito, and debugged critical issues to boost system reliability and developer productivity.

Software Engineering Intern | Cognizant Technology Solutions - Pune, MH, India | December 2019 - April 2020
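The broadcast-join optimization mentioned in the Data Engineer bullets above can be illustrated in plain Python. This is only a sketch of the idea (the actual project used Spark's broadcast hint): a small dimension table is copied to every worker as an in-memory map, so each fact row joins via a local lookup instead of a cluster-wide shuffle. All table contents below are hypothetical sample data.

```python
def broadcast_join(fact_rows, dim_rows, key):
    """Join fact rows against a small dimension table via a hash map."""
    dim_index = {row[key]: row for row in dim_rows}  # the "broadcast" copy
    joined = []
    for fact in fact_rows:
        dim = dim_index.get(fact[key])
        if dim is not None:  # inner-join semantics: unmatched facts dropped
            joined.append({**fact, **dim})
    return joined

# Hypothetical retail sample: small store dimension, larger sales fact stream.
stores = [{"store_id": 1, "region": "West"}, {"store_id": 2, "region": "East"}]
sales = [
    {"store_id": 1, "amount": 120.0},
    {"store_id": 2, "amount": 80.0},
    {"store_id": 3, "amount": 55.0},  # no matching store -> dropped
]
joined = broadcast_join(sales, stores, "store_id")
```

The win over a shuffle join is that only the small table moves; the fact rows stay where they are.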

• Developed a high-performing, Twitter-like social media platform leveraging Java, Spring Boot, Angular, REST APIs, and Microservices.

• Enhanced scalability with Kafka MQ and MongoDB, and implemented real-time monitoring with Grafana, Zipkin, and Prometheus.

• Automated deployment with AWS CodePipeline and Docker builds, improving efficiency by 40% and delivering a robust CI/CD pipeline.

ACADEMIC PROJECTS

Spotify ETL Pipeline | Cloud-based Big-Data Processing and Analytics Project

• Built an automated ETL pipeline to extract Spotify API data, transform it using Apache Spark in AWS Glue, and load it into Snowflake.

• Utilized AWS Lambda for data extraction, Amazon S3 for raw and transformed data storage, and Snowpipe for automated ingestion.

• Scheduled daily pipeline execution with Amazon CloudWatch and created Power BI dashboards for data visualization and insights.

Real-Time Data Streaming Pipeline | Snowflake CDC and Data Warehousing Project
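The transform step of the Spotify ETL project above can be sketched in plain Python: flatten a playlist-tracks response into tabular album, artist, and song records. The input shape is modeled loosely on Spotify's Web API, and the sample payload is hypothetical; the real project did this with Apache Spark in AWS Glue.

```python
def transform_tracks(response):
    """Flatten a playlist-tracks payload into album/artist/song records."""
    albums, artists, songs = [], [], []
    for item in response["items"]:
        track = item["track"]
        album = track["album"]
        albums.append({"album_id": album["id"], "name": album["name"],
                       "release_date": album["release_date"]})
        for artist in track["artists"]:
            artists.append({"artist_id": artist["id"], "name": artist["name"]})
        songs.append({"song_id": track["id"], "name": track["name"],
                      "album_id": album["id"], "added_at": item["added_at"]})
    return albums, artists, songs

# Hypothetical single-track payload for illustration.
sample = {"items": [{
    "added_at": "2025-01-01T00:00:00Z",
    "track": {"id": "t1", "name": "Song One",
              "album": {"id": "a1", "name": "Album One",
                        "release_date": "2024-12-01"},
              "artists": [{"id": "ar1", "name": "Artist One"}]},
}]}
albums, artists, songs = transform_tracks(sample)
```

Each output list maps naturally onto one Snowflake target table, with `album_id` as the join key between songs and albums.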

• Built a real-time data pipeline using Apache NiFi, AWS S3, and Snowflake with Change Data Capture (CDC) for efficient data replication.

• Configured Snowpipe, Streams, and Tasks in Snowflake to populate target tables with SCD1 (overwrite) and SCD2 (historical tracking).

• Deployed NiFi on AWS EC2 (via Docker) to process and stream data, ensuring seamless updates and analytics-ready datasets.

InnSight – Generative AI Chatbot for Hotel Recommendations | Hacklytics, Georgia Tech
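The SCD1 vs. SCD2 handling in the Snowflake CDC project above can be illustrated with a minimal Python sketch. In the project this was implemented with Snowpipe, Streams, and Tasks; here the two merge strategies are shown on hypothetical customer records keyed by `id`.

```python
def scd1_merge(target, change):
    """SCD Type 1: overwrite the current row in place (no history kept)."""
    target[change["id"]] = {k: v for k, v in change.items() if k != "id"}
    return target

def scd2_merge(history, change, ts):
    """SCD Type 2: close the open row and append a new versioned row."""
    for row in history:
        if row["id"] == change["id"] and row["end_ts"] is None:
            row["end_ts"] = ts  # close out the old version
    history.append({**change, "start_ts": ts, "end_ts": None})
    return history

# Hypothetical change event: customer c1 moves from Pune to Tempe.
current = {"c1": {"city": "Pune"}}
scd1_merge(current, {"id": "c1", "city": "Tempe"})        # overwrite

history = [{"id": "c1", "city": "Pune", "start_ts": 1, "end_ts": None}]
scd2_merge(history, {"id": "c1", "city": "Tempe"}, ts=2)  # keep history
```

SCD1 keeps the table small but forgets the past; SCD2 answers "what was true at time t" at the cost of extra rows and the open/closed timestamp bookkeeping.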

• Engineered an AI-driven hotel search and recommendation system using a Retrieval-Augmented Generation (RAG) model with Qdrant vector DB, integrating Traversaal.ai’s Ares API for real-time semantic query processing and data enrichment.

• Built a conversational chatbot leveraging OpenAI function calling, hosted on Intel Developer Cloud, using all-MiniLM-L6-v2 and Llama models from Hugging Face to enhance user interaction and provide context-aware hotel recommendations.

TECHNICAL SKILLS

Languages: Python, SQL, SparkSQL, Java, Scala, R, Bash
Libraries: PySpark, Pandas, NumPy, Matplotlib, Seaborn, Plotly, SciKit-Learn, SciPy, TensorFlow, Keras, PyTorch
Big Data Technologies: Apache Spark, Hadoop, Hive, Kafka, Airflow, Snowflake, dbt, Databricks, Azure Data Factory, Data Modeling
Cloud Technologies: AWS (S3, Lambda, Glue, IAM, EMR, Redshift, Kinesis), Azure, GCP (BigQuery), Docker, Kubernetes, Terraform
Databases and Tools: Relational (PostgreSQL, Oracle SQL, MySQL), NoSQL (MongoDB, Neo4j), Tableau, Power BI, Looker, Git, GitHub
Web App Frameworks: Spring Boot, Django REST, Flask RESTful, FastAPI, Angular
Others: Jira, Confluence, Jenkins, Notion, Jupyter, Google Colab, Microsoft Office (Excel), Postman, Streamlit
Certifications: AWS Certified Cloud Practitioner, ASU Project Management


