
Data Engineer Machine Learning

Location:
Santa Clara, CA
Posted:
June 21, 2024

Resume:

Siddhi Kulkarni

408-***-**** | ****************@*******.*** | LinkedIn | GitHub

SKILLS

Languages: Python, C++, JavaScript, SQL
Databases: PostgreSQL, MySQL, Oracle, MongoDB, Snowflake
Web Technologies: Frameworks (Django, ReactJS), REST APIs, Bootstrap, HTML, CSS
Cloud & Containers: AWS (EC2, S3, RDS, Redshift)
Machine Learning: Support Vector Machines, k-Nearest Neighbours, Neural Networks, LSTM, Semi-supervised Learning, scikit-learn, NumPy, pandas, Matplotlib, Seaborn

Data Science: Data Analysis, Data Preprocessing, Statistical Inference and Modeling, Data Interpretation
Technologies: Git, Hadoop, Apache Spark, Hive, Docker, Kafka, PySpark, Postman, GitHub

WORK EXPERIENCE

Data Engineer Intern at Vee Ventures, United States Jun 2023 – Aug 2023

• Pioneered the development of a comprehensive data analytics framework built on AWS-hosted PostgreSQL and AWS Lambda.

• Led the creation of an event-based ETL pipeline using AWS cloud-native services to transform trade data collected from distributed warehouses and store it in a relational database.

• Integrated and ingested data from heterogeneous external sources, including JSON files, CSV files, and relational databases, and exposed the results as directly renderable data through a REST service.

• Optimized real-time data processing pipelines, using Apache Airflow for workflow automation to improve efficiency and reliability.

• Proactively worked with stakeholders, including the Executive, Product, Data, and Design teams, to resolve data-related technical issues and support their data infrastructure needs.

Data Engineer at ZS Associates, India Aug 2020 – Jun 2022

• Designed data models for efficient storage and retrieval following data-modeling best practices, translating business requirements into scalable solutions within the data architecture.

• Identified and resolved performance bottlenecks in Amazon Redshift SQL queries and processes by optimizing database structures, indexes, and query execution plans, improving overall system performance.

• Identified and implemented process improvements that automated manual tasks and optimized data delivery, including Python scripts that reduced data errors by 50%.

• Streamlined NBA DevOps and raised on-time SLA delivery from 50% to 97% within three months, improving ZS Associates' operational performance and payout.

• Received the firm's ‘Integrated Analytics Outstanding Contribution’ award in 2021.

Software Developer Intern at Semporro, India Jun 2019 – Sep 2019

• Worked on a full-stack web content management application across its entire lifecycle: requirements gathering, system analysis, design, development, testing, deployment, and client interaction.

• Designed a microservice-based architecture for features including JWT-based user authentication, scheduling, and subscriptions, and integrated third-party APIs such as Razorpay for subscription payments.

• Used Celery as a task queue, with RabbitMQ and Redis as message brokers, to execute asynchronous tasks.

EDUCATION

Santa Clara University - Santa Clara, CA Sep 2022 – Jun 2024
Master of Science in Computer Science and Engineering
Coursework: Advanced Data Structures and Algorithms, Object-Oriented Programming

Pune Institute of Computer Technology - Pune, India Aug 2016 – Jun 2020
Bachelor of Engineering in Computer Science
Coursework: Big Data, RDBMS

ACADEMIC PROJECTS

Custom ELT Project (Docker, PostgreSQL, Python) Mar 2023 – May 2023

• Orchestrated a Docker-based ELT project with PostgreSQL using Docker Compose, and developed a Python script for data extraction and loading.

PredictIt Data with Airflow and Snowflake (Python, Airflow, Snowflake) Sep 2023 – Dec 2023

• Orchestrated the pipeline with Amazon Managed Workflows for Apache Airflow (MWAA), using a PythonOperator for data ingestion. Stored raw data in an S3 bucket and loaded it into Snowflake for analytical storage.

• Built a Directed Acyclic Graph (DAG) defining the entire project workflow, including data-scraping tasks and a final "ready" task that serves as a single reference point for the completed DAG.

NewsFlow: Aggregating Real-time Headlines (Python, Django REST, Kafka, MongoDB) Jan 2024 – Mar 2024

• Engineered a data flow that extracts news from RSS feeds, validates its quality, and publishes it to Kafka topics A and B to keep MongoDB, Elasticsearch, and MinIO storage in sync.

• Implemented the Command Query Responsibility Segregation (CQRS) pattern, using separate models for updating and reading data.


