Devops Engineer Machine Learning

Location:
Santa Clara, CA, 95054
Posted:
April 02, 2024

Resume:

SUBRAMANYAM CHEMMARTHI

Architect - DevOps, CloudOps, SRE, MLOps

Overview:

Highly skilled DevOps and CloudOps architect with 12+ years of experience designing and implementing successful DevOps strategies. Skilled in automation, orchestration, security, and evaluating new technologies to drive efficiency and productivity. Able to guide teams in DevOps best practices, including improved scalability, performance, reliability, and speed to market, while providing guidance on technical design, planning team activities, and directing team members.

Technical Skills:

Infra Automation: Terraform, Ansible, CloudFormation, Terragrunt

Containerization: Docker, Docker Swarm, Kubernetes

Cloud Providers: AWS, GCP, Azure

CI/CD Tools: Git, Jenkins, GitLab, Argo CD, JFrog, HashiCorp Vault

DORA metrics: Apache DevLake

IDP: Backstage

Languages: Python, SQL, Bash

Monitoring: Prometheus, Grafana, Datadog, New Relic, Splunk

Observability: Loki, Tempo, Faro, Beyla, OpenTelemetry

Machine Learning: AWS SageMaker

Service Discovery: Istio, HashiCorp Consul

Artifact Registry: Docker Hub, Harbor, AWS ECR

Databases: MySQL, PostgreSQL, Redis

NoSQL: Elasticsearch, Cassandra, Aerospike, MongoDB

Graph DB: TigerGraph, Neo4j

Big Data ecosystem: Hadoop, Kafka, ZooKeeper, Spark

Virtualization: VMware vSphere

ITIL Services: Change, incident, and problem management

Business Intelligence: Tableau

Certifications:

AWS Certified Solutions Architect

Educational Qualification:

B.E. (EEE) from Anna University, Chennai 2006

Professional Experience:

Walmart - Bentonville, AR May 2023 – Present

Sr. Architect

Design, build, and operate multi-cloud platforms on AWS and Azure.

Automate cloud infra provisioning with Terraform and Terragrunt.

Ensure successful delivery of solutions with quality, reusability, and sustainability.

Implement security with Wazuh, Tenable, AWS GuardDuty, and AWS CloudTrail.

Set up Prometheus, Mimir, and Alertmanager for metrics observability.

Provision and configure RED metrics with Beyla.

Configure UI observability metrics with Faro.

Set up log management with Loki and trace management with Tempo.

Provision and upgrade Kubernetes clusters in AWS and Azure.

Configure builds (CI) with Jenkins pipelines, GitHub Actions, and SonarQube.

Set up continuous deployment (CD) with Argo CD and Kustomize.

Optimize cloud costs by putting better FinOps controls in place.

Measure DORA engineering metrics using Apache DevLake.

Research and evaluate new DevOps technologies and tools for improvement.

Work closely with cross-platform teams and clients for timely delivery.

Provision and configure the AWS SageMaker machine learning platform.

Tokopedia, Indonesia (E-commerce) Nov 2020 - May 2023

Architect

Convert the technical requirements of projects into suitable architecture.

Ensure successful delivery of solutions with quality, reusability, and sustainability.

Responsible for configuring, optimizing, and documenting IaaS and configuration management practices.

Plan, configure, build, test, and deploy applications using CI/CD pipelines and GitOps.

Manage and provision container-based and serverless infra for workloads.

Analyze the root cause of incidents and recommend permanent fixes.

Improve cloud financial management to reduce operational costs.

Automate and conduct reliability tests by simulating production load.

Oversee backups and restoration procedures as part of disaster recovery management.

Implement security best practices in software design, infra provisioning, and data protection.

Maintain SLA with observability metrics using APM and monitoring tools.

Origin Energy, UK & Australia July 2019 - Nov 2020

Senior DevOps Engineer

Design and implement infrastructure automation with Terraform and Ansible.

Define workflow and deploy components with zero-downtime updates.

Design, review, and implement standards and best practices to ensure quality and high availability across the infrastructure stack.

Build applications in Docker and deploy them to Docker Swarm and Kubernetes clusters.

Set up alerting and monitoring for infrastructure and applications.

Build and set up new deployment tools and infrastructure.

Working on ways to automate and improve development and release processes.

Testing, analyzing, reviewing, and approving code written by others.

Ensuring that systems are safe and secure against cybersecurity threats.

Writing technical and engineering documentation.

Wayz.ai, China Oct 2018 - July 2019

Senior DevOps Engineer

Involved in setting up the DevOps lifecycle across design, architecture, testing, and deployment.

Develop and set up infrastructure as a service with CloudFormation and Terraform.

Instrumental in bringing in new open-source tools to improve operations and reduce cost.

Cross-functional interaction with product management, dev, and QA for building infrastructure.

Automation of provisioning for EMR clusters and big data Hadoop clusters with Ambari.

InterTrust – Berkeley, CA Sep 2017 - Oct 2018

Senior DevOps Engineer

Design and implement data centers across multiple AWS availability zones.

Define and deploy infrastructure as code using Terraform, Ansible, and CloudFormation.

Automated CI/CD process for deploying applications in different environments.

Define and adhere to SLAs for managing applications in production environment.

Automated patch management on Ubuntu and Red Hat servers.

Design and implement monitoring servers using Datadog, with PagerDuty for alerting.

VIT, San Ramon, CA Oct 2015 - Aug 2017

DevOps Engineer

Provide and maintain appropriate CI/CD tools such as Jenkins and Rundeck.

Building and maintaining cloud-based applications with AWS.

Implemented configuration management with Puppet and Ansible.

Automation of operating system tasks with Python and Bash.

Set up development, test, release, and support processes for better operations.

Replicon Inc, Calgary / NTT DATA, Santa Clara, CA Apr 2012 - Oct 2015

Cloud Operations Escalations Engineer

Installation and configuration of load balancing and reverse proxying with HAProxy and Squid using Puppet automation scripts.

Implemented a high-availability cluster using Pacemaker and the Corosync cluster engine with Puppet scripts.

Creation of AWS instances using key pairs for access, and registration of those instances with ELBs for high availability.

Analysis of hardware and software failures on various Red Hat/CentOS servers (core dump and log file analysis).

Performed automated installation and configuration of applications and services using Puppet on Red Hat/CentOS.


