Software Engineer Machine Learning

Location: Redwood City, CA
Posted: March 07, 2025

Resume:

ZHITING YANG

Principal Software Engineer

San Jose/Union City, CA 510-***-**** ********@*****.*** LinkedIn

Results-driven, analytical, and performance-focused Software Engineering Professional with extensive experience in Machine Learning, Artificial Intelligence, distributed cloud storage platforms, cloud infrastructure services, website development, and real-time streaming data pipelines. Expertise in Go, Python, C++, Java, TypeScript, React, NodeJS, and AngularJS, and in designing and implementing highly scalable distributed systems and shared services infrastructure in a hybrid cloud environment.

KEY EXPERTISE

Applied AI & Machine Learning, System Design & Architecture, Troubleshooting & Problem-Solving, Continuous Integration & Deployment, Database Management, Advanced Programming & Coding, Performance Optimization, Software Development Lifecycle, Team Leadership & Training

TECHNICAL SKILLS

ChatGPT, Copilot, LLMs, RAG, HuggingFace, DeepSeek, Spark, Change point prediction, SVM, Golang, Python, C++, Java, TypeScript, React, NodeJS, AngularJS, Shell, GCP, AWS, Azure, Azure 900 Certificate, OpenStack, Hadoop, Cassandra, Postgres, Aurora, Elasticsearch, Oracle, MySQL, Jira, Git, ArgoCD, Kubernetes, OpenShift, Kafka, RabbitMQ, Grafana, Prometheus, Kibana, Docker, Zookeeper, Mesos, RESTful, Spring, Hibernate, Pipeline, Thrift, WebSocket, gRPC, Protobuf

PROFESSIONAL EXPERIENCE

Principal Software Engineer, Calix Inc, San Jose, CA, Oct 2023 – Nov 2024

Architected and implemented cloud microservices to process router messages, focusing on optimizing data storage and ensuring the smooth operation of cloud-based infrastructure. Worked with various databases, managed production issues, and mentored colleagues on key technical areas, including data observability. Troubleshot complex issues, optimized data flow, and implemented solutions to improve system performance and reliability.

● Managed and optimized data storage with Cassandra, Presto, PostgreSQL, Aurora, Redis, and ElasticSearch, diagnosing data inconsistencies and improving database performance.

● Designed and developed RESTful and Thrift-based APIs, proposing and implementing both short- and long-term solutions for production issues, using Java, NodeJS, Python, Golang, Scala, and Shell scripting.

● Led cross-functional projects and mentored new colleagues on data observability and microservices design, using tools like Prometheus and Grafana to track system health.

● Developed scalable microservices leveraging Kafka for distributed messaging and AWS tools (S3, EC2, IAM), Kubernetes, and ArgoCD for cloud infrastructure management and deployment.

● Troubleshot complex issues with Kafka messages, CPU spikes, and network instability, collaborating with teams across regions in an agile environment.

● Optimized data storage architectures, resolved production issues such as user account deletion workflows, and supported CI/CD processes with Bamboo, Git, and Azkaban.

● Monitored system performance using Prometheus, Grafana, Zabbix, and Kibana to detect bottlenecks and track KPIs for continuous improvement.

● Developed GeoUI, a global visualization platform for router deployments, using JavaScript, TypeScript, and React.

Senior Engineer, Commvault Systems Inc., Santa Clara, CA, Oct 2019 – Jul 2023

Architected, developed, and maintained a distributed cloud storage platform, focusing on ensuring the platform's functionality, addressing customer complaints, and upgrading S3 ObjectStore production processes. Troubleshot issues related to OpenShift and Kubernetes, enhanced object synchronization, and collaborated with cross-functional teams to improve system performance. Developed and optimized key S3-type functions and conducted extensive testing to ensure smooth operations.

● Resolved a customer complaint by identifying and fixing an API issue where columns in a specific S3 bucket were hidden, and drafted a script to manually delete the hidden columns.

● Researched and diagnosed customer issues in OpenShift and Kubernetes, collaborating with the test and Kubernetes teams to resolve CSI version issues in Golang and Docker.

● Developed and fixed S3-type functions for Bucket Expiration/Lifecycle, Object Tagging, and object deletion, ensuring API results aligned with Amazon S3 standards.

● Identified and optimized a performance issue with the Expiration function, enhancing the overall efficiency of the platform.

● Configured and tested Hadoop, RClone, Humio, and CyberDuck, performing integration and performance tests with Commvault backend systems for Linux OS and clusters.

● Assisted in encryption development and the CentOS 8 upgrade, supporting system security improvements and updates.

Senior Software Engineer, Hedvig Inc., Santa Clara, CA, Aug 2016 – Oct 2019

Architected and deployed distributed storage clusters, ensuring their scalability, reliability, and compatibility with multiple cloud environments. Developed and upgraded object storage functions, optimized backup processes, and conducted performance and compatibility testing to ensure seamless operation. Improved system performance and functionality.

● Upgraded Hedvig’s customized OpenStack object storage from Kilo to later versions (Mitaka, Newton, Ocata, Pike), successfully launching Mitaka for customer testing.

● Reconstructed Hedvig S3 bucket and object functions in Thrift and Python, while wrapping AWS S3 API for backups from the Hedvig cluster in Java.

● Developed and implemented SQL Server and Hyper-V backup and restore functions, streamlining disaster recovery processes from Hedvig S3 to the Hedvig cluster.

● Conducted performance testing using COSBench for Hedvig OpenStack/S3, ensuring compatibility with Hadoop and enhancing the S3 adapter’s performance.

● Built a LogCenter for exception and error analysis, identifying and resolving a time series page loading performance issue.

● Built a module for synchronizing Hedvig S3 data to public clouds such as Azure Blob and Amazon S3, enhancing cloud interoperability and data access, and optimized multipart uploads to improve data transfer performance.

● Set up CORS to allow external access to client web applications, improving user experience and accessibility.

● Applied AI/ML approaches to predict cluster health and change points in time series data.

IoT Cloud Services Developer, Acer Cloud Inc., Sunnyvale, CA, Dec 2015 – Jul 2016

Monitored and troubleshot the product and development environment where developers contributed code daily. Contributed to all phases of the software development lifecycle and fixed assigned bugs.

● Designed and developed high-volume, low-latency, enterprise-grade infrastructure services, enhancing system efficiency and reducing response times for large-scale operations.

● Created and launched cloud infrastructure services and IoT space for consumer platforms, deployed on 100M+ devices using Java 8, AWS/S3, RabbitMQ, Cassandra, Oracle, MySQL, and Tomcat.

● Supported future growth by researching and presenting alternative technologies for architectural review, enabling informed decision-making and scalable solutions for upcoming needs.

Manager, LinkedRoad Inc., Milpitas, CA, May 2015 – Dec 2015

Architected and developed the entire website for LinkedRoad Inc. Implemented Bitcoin price prediction using time series data.

● Led the technical team in implementing the long-distance bus website (takebus.com), supporting routes across 12 states and 61 cities, resulting in enhanced user access and streamlined booking capabilities.

● Optimized scalability functions, improving system performance and capacity by leveraging Android, Spring, AWS/EC2, AngularJS, RESTful/Jersey, JSON, Cassandra, and Python, resulting in a more efficient and reliable platform.

● Designed and developed a Bitcoin price prediction system using time series change point analysis in Python, achieving accurate forecasts and improving market insights.

Computer Programmer, Fuhu Inc., Milpitas, CA, Jan 2013 – May 2015

Architected and implemented a payment module for 10M+ users, including purchase, redemption, refund, delay capture, and promotion using AWS, Spring, Hibernate, Pipeline, RESTful, JSON, Hadoop, Protobuf, Cassandra, Oracle, and MySQL.

Solution Consultant, Watchdata Technologies Pte Ltd., Singapore, Mar 2010 – Aug 2011

Architected and designed a mobile application for accessing SIMpass and SDpass, extended products from SIM and SD cards, by analyzing marketing requirements and creating solutions for the sales team.

Technical Manager, SingaLab Pte Ltd., Singapore, Dec 2008 – Mar 2010

Architected and developed the Media Development Authority website using Maven, Spring, Hibernate, Cron Quartz, JSF, Velocity Template, ICEfaces, SOA, Workflow, and Oracle.

Technical Manager, Lonsuo (Beijing) Technologies Co., Ltd., Beijing, China, Sep 2007 – Jul 2008

Created modules/plugins for xTrans, which supported internet logins for mobile phones using Mina, JNI, JMX, Swing, and JDO.

Test Engineer, Advanced Systems Development Co. Ltd. (IBM), Beijing, China, Jan 2007 – Aug 2007

Tested IBM Quick 8.0 and Sametime on IBM HTTP servers, WebSphere application servers and portals, LDAP, and web servers.

Software Engineer, Time Interactive Company, Beijing, China, Dec 2005 – Dec 2006

Developed an information platform for the Wireless Bureau, implementing information monitors and control platforms.

EDUCATION

Master of Science, Computer Science, University at Albany, State University of New York, Albany, NY, Aug 2011 – May 2012

Master of Software Engineering, National University of Defense Technology, Changsha, China, Sep 2002 – Dec 2004

Bachelor of Mathematics, Shanghai Normal University, Shanghai, China, Sep 1997 – Jul 2001
