Machine Learning Software Development Engineer

Location:

Bellevue, WA

Posted:

November 11, 2024

Resume:

Vaibhav Piska

Seattle, WA +1-703-***-**** *******************@*****.*** www.linkedin.com/in/mvpiska/

PROFILE

Accomplished Software Development Engineer and Data Engineer with extensive experience in backend engineering, cloud technologies, data analysis, and machine learning. Proven ability to design and optimize scalable applications using AWS, Java, Python, and a wide range of data engineering tools. Skilled in building ETL pipelines, automating data workflows, and implementing machine learning models to derive actionable insights from complex datasets. Demonstrated expertise in cloud platforms, microservices architecture, and delivering high-performance software solutions. Adept at collaborating with cross-functional teams and mentoring junior engineers. Strong track record of leveraging big data technologies to solve real-world problems and improve business outcomes. Seeking opportunities to drive impactful cloud-based solutions in data engineering and software development.

SKILLS

Programming Languages: Java, Python, C, C++, JavaScript (Node.js), TypeScript

Big Data & Machine Learning: Hadoop, Spark (Streaming, PySpark), Airflow, Kafka, Scikit-learn, KNN, Random Forest, SVM

Cloud Computing: AWS (S3, Lambda, DynamoDB, CloudWatch, EC2), Glue, Docker

Databases: SQL (SQL Server, MySQL), Redshift, DynamoDB, MongoDB

Full Stack & APIs: Spring, Hibernate, Microservices, Maven, RESTful APIs

Data Visualization: Power BI, Tableau

Development Tools: Git, Jenkins, Docker, CloudFormation, Linux

WORK EXPERIENCE

Software Development Engineer – Amazon.com Seattle, WA Sep 2022 – Present

• Spearheaded the automation of reimbursement workflows for the Data Warehouse and Distribution team using Java, AWS Lambda, S3, and Glue, eliminating 150+ manual hours weekly and improving operational efficiency by 20%.

• Designed and developed highly optimized microservices using Java and Spring Boot, handling 10 million+ transactions daily and improving system performance by 30%.

• Implemented robust Java-based caching, rate-limiting, and load-balancing mechanisms to handle traffic spikes, ensuring consistent system reliability and reducing latency by 20%.

• Led a major Java-driven optimization of ETL pipelines, integrating AWS Glue, Python, and Redshift, cutting data processing times by 25% and increasing data accuracy for over 10 million financial transactions.

• Streamlined CI/CD pipelines using Jenkins, Git, Docker, and AWS CloudFormation, reducing deployment times by 40% and improving release reliability by 15%.

• Architected monitoring solutions with CloudWatch and Grafana, coordinating with cross-departmental teams to set metrics that cut troubleshooting time by 30% and amplified system-wide transparency for security and compliance.

• Developed and enforced privacy-centric data access controls with AWS IAM and KMS, creating secure role-based access protocols and encryption standards that met rigorous compliance requirements.

• Enhanced Java-based inventory management systems by implementing event-driven microservices, allowing real-time updates of product availability, improving system responsiveness by 25%.

• Integrated Java with AWS SQS for reliable message queuing between services, reducing message processing delays by 15% and improving overall system scalability during peak loads.

Data Analyst Capstone Project – NASK Incorporated – VA Jan 2022 – May 2022

• Developed machine learning models, including K Nearest Neighbor and Random Forest, with Python to predict arrest status from the Chicago Crime dataset, achieving 88% classification accuracy and improving data-driven decision-making for public safety.

• Designed and implemented efficient data pipelines in Python, reducing computational time by 30% through optimized algorithms that streamlined data preparation and model training processes.

• Built and deployed interactive dashboards using Brewlytics, enhancing real-time data analytics and providing crime data insights to stakeholders through an improved user interface.

• Enhanced data processing speed by applying multi-threaded algorithms in Python and C++ on Linux-based environments, reducing latency and enabling rapid data ingestion for analysis.

• Analyzed crime pattern trends and identified high-risk areas by visualizing performance data, which allowed actionable recommendations that reduced targeted crime recurrence by 15%.

• Optimized machine learning models in Python to increase accuracy by 12% for high-risk area prediction, leveraging iterative parameter tuning and cross-validation.

• Automated daily data updates using Python scripts to streamline data ingestion and preprocessing, reducing manual interventions and improving data freshness and reliability for daily reporting.

Data Engineer – CEDAR IT Hyderabad, India Jan 2019 – Nov 2020

• Engineered automated ETL pipelines using Python and SQL, transforming data workflows to cut manual efforts by 50% and achieve a 30% increase in processing efficiency.

• Optimized complex SQL queries for datasets exceeding 1 million records, cutting query execution time by 25% and significantly lowering data processing costs.

• Led a predictive modeling project utilizing linear regression and ridge regression with Python, enhancing forecast accuracy by 20% for car value predictions on a dataset of over 2 million records.

• Built dynamic dashboards with Tableau and Power BI, delivering insights that improved decision-making efficiency by 15% for business stakeholders by visualizing real-time data trends.

• Orchestrated automated data validation frameworks to boost accuracy and compliance, achieving a 20% reduction in data discrepancies and elevating report quality and reliability.

• Migrated legacy data systems by creating Python automation scripts, reducing manual migration efforts by 80% and improving system scalability for future data growth.

• Designed complex SQL queries for real-time data retrieval in a car value prediction model, reducing query execution time by 20% for large-volume streaming data.

• Integrated external car pricing data via Python-based ETL pipelines, increasing machine learning model prediction accuracy by 15% by enhancing data enrichment and completeness.

PROJECT

The Gender Pay Gap in the General Social Survey Jan 2022 – May 2022

• Conducted an extensive analysis of gender pay disparities from 1974 to 2018 using Python and Pandas, identifying critical factors influencing wage gaps and tracking long-term income inequality trends.

• Developed and fine-tuned predictive models with Scikit-Learn to analyze how variables like gender, occupation, and education impacted wage differentials, providing insights for policy recommendations.

• Created data visualizations with Matplotlib to illustrate trends over time, enabling stakeholders to understand shifts in societal norms and their influence on workplace policies.

• Collaborated with a team using Git and GitHub for version control and project management, ensuring seamless integration of updates and efficient workflow coordination throughout the research project.

Tech stack: Python, Jupyter Notebook, CSV, Pandas, NumPy, Scikit-Learn, Matplotlib, Git, GitHub

Utilization Of Support Vector Machine for Analyzing Women Safety in Indian States Jan 2020 – Jun 2020

• Developed a Support Vector Machine (SVM) model to categorize women’s safety concerns across Indian states based on Twitter data, enabling targeted interventions by identifying high-risk regions.

• Enhanced classification accuracy by incorporating Naïve Bayes and Random Forest algorithms alongside SVM, resulting in a more reliable model for pinpointing unsafe areas.

• Processed and cleaned large-scale social media data using Python and Pandas, ensuring high-quality data input for model training, which increased model performance.

• Published research findings in the International Journal of Grid and Distributed Computing, raising public awareness about women’s safety and promoting educational efforts in high-risk regions.

• http://sersc.org/journals/index.php/IJGDC/article/view/29977

Tech stack: Support Vector Machine (SVM), Python, Pandas, Naïve Bayes, Random Forest, Data Collection and Analysis Tools, Social Media Data Mining

EDUCATION

Master of Science in Data Analytics George Mason University, Fairfax, VA Jan 2021 – Aug 2022 GPA: 3.87/4.00

Relevant Courses: Applied Statistics and Visualization, Big Data Technologies, Data Mining for Business Analytics, Analytics and Decision Analysis

Bachelor of Technology in Computer Science Engineering GITAM University, Hyderabad, India Aug 2016 – Jul 2020 GPA: 3.60/4.00

Relevant Courses: Data Structures and Algorithms, Database Management Systems, Object-Oriented Programming (OOP), Machine Learning, Software Engineering and Web Technologies
