Yashesh Pandya
Seattle, WA • 203-***-**** • ad462z@r.postjobfree.com • www.linkedin.com/in/yashesh-pandya/

SUMMARY
• Nearly 4 years of experience designing and implementing robust data pipelines across diverse industries.
• Proficient in Apache Spark for processing large-scale data, ensuring optimal performance. Experienced in data modeling, ETL development, and translating complex business requirements into scalable solutions.
• Skilled in AWS for scalable, cost-effective data solutions. Proficient in Apache Airflow for seamless coordination and monitoring of data workflows.
• Skilled in Python and SQL for precise data validation, performance tuning, and delivering growth-oriented insights through cross-functional collaboration. Adaptable to emerging technologies for innovative solutions.

SKILLS
Programming Languages: Python, R, SQL, SAS
Packages: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, TensorFlow
IDEs: Visual Studio Code, PyCharm, Eclipse
Visualization Tools: Tableau, Power BI, Excel
Databases: MySQL, MS SQL Server, MongoDB, Oracle Database
Cloud Technologies: Amazon Web Services (AWS), Azure, Google Cloud
Machine Learning: Linear Regression, Logistic Regression, Decision Trees, Supervised Learning, Unsupervised Learning, Classification, SVM, Random Forests, Naive Bayes, KNN, K-Means
Methodologies: SDLC, Agile, Waterfall
Operating Systems: Windows, Linux, MacOS
Other Technical Skills: Hadoop, Spark, Kafka, Snowflake, Redshift, BigQuery, Apache Airflow, Informatica, Talend, Git, GitHub, SVN
Soft Skills: Critical Thinking, Communication, Presentation, Problem-Solving

EDUCATION
Master of Data Science – University at Buffalo, New York  Jan 2022 – Jun 2023
Bachelor of Engineering in Computer Engineering – Gujarat Technological University, India  Aug 2016 – Jan 2021

PROFESSIONAL EXPERIENCE
Data Engineer  Freddie Mac, New York  Jun 2022 – Present
• Led a critical project at Freddie Mac to optimize data processing pipelines for mortgage-backed securities, improving decision-making speed and accuracy.
• Developed a high-performance data pipeline handling 1.5 TB of daily data ingestion from diverse sources.
• Deployed Apache Spark for distributed, parallel data processing, achieving a 30% reduction in processing time.
• Utilized Apache Airflow to orchestrate and monitor data workflows, ensuring timely execution and improved efficiency.
• Employed Amazon S3 and Redshift for scalable data storage and retrieval, supporting growing data volumes.
• Collaborated closely with data analysts and stakeholders to understand requirements and deliver accurate solutions.
• Delivered measurable gains in data processing speed and accuracy, positively impacting downstream operations.
• Combined expertise in modern data technologies with a collaborative approach to deliver a high-impact data processing solution.

Data Engineer  Exert Infotech, India  Jun 2019 – Dec 2021
• Led a challenging project at Exert Infotech to fine-tune data processing and analysis for a financial services client.
• Improved data accuracy, reduced processing time, and enhanced data reliability for better decision-making.
• Leveraged Apache Spark for distributed data processing, achieving a 40% reduction in processing time.
• Implemented data validation and quality checks using Python and SQL, resulting in 98% data accuracy.
• Utilized cloud-based solutions like AWS S3 and Glue for seamless scalability and efficient data storage and ETL processes.
• Collaborated closely with a multidisciplinary team of data analysts and business stakeholders.
• Ensured data security and privacy throughout the project, leading complex data engineering initiatives to tangible results.
• Empowered client's financial services operations with faster data insights and improved responsiveness to market changes.
• The project's success strengthened the client's data-driven decision-making capabilities, benefiting financial performance.