Srivasthav
SRE / Linux / DevOps Engineer
Mail: ****.***@*****.***
phone: 480-***-**** LinkedIn: https://www.linkedin.com/in/srivasthav-d-1014b759/
PROFESSIONAL SUMMARY:
SRE/Cloud/Linux engineer with around 11 years of IT experience in DevOps practices and an excellent background in a wide variety of professional system support and solution-based IT services for Red Hat Linux systems administration and AWS. Experienced in administering and managing systems in multiplatform environments consisting of physical, cloud-based (AWS), containerized (Kubernetes, Docker, Terraform), and virtualized systems, incorporating industry best practices.
Built highly available content distribution sites using CI/CD tools with a focus on Ansible, Docker, Maven, Jenkins, Terraform, and Kubernetes, hosted in Red Hat Satellite and cloud environments.
Used Kubernetes 1.29 to orchestrate the deployment, scaling, and management of Docker containers.
Configured Ansible to manage AWS environments and automate the build process for core AMIs used by all application deployments including Autoscaling, and CloudFormation scripts.
Learning to reuse Terraform modules to standardize cloud resource provisioning across environments.
Integrated Terraform with CI/CD pipelines (e.g., GitHub Actions, Jenkins, GitLab CI) to enable automated infrastructure deployments. Managed remote state using Terraform backends such as AWS S3 with DynamoDB state locking.
Learning Terraform workspaces and variables for environment-based infrastructure management.
Learning Kubernetes security best practices, including pod security policies and network policies.
Led migration of monolithic applications to containerized microservices in Kubernetes.
Automated infrastructure provisioning using Terraform, Helm, and Kubernetes Operators.
Automated deployments using Helm charts and CI/CD pipelines (e.g., Jenkins, Argo CD, GitOps).
Worked on issues related to Hive table tasks, HBase, Kafka lag, DRC, QPS, and performance optimization.
Performed Flink task restarts and fixed issues related to container timeout exceptions; adjusted sampling rates during outages so that lag is consumed sooner. Adjusted Kafka cluster rate limits to consume lag faster during host- and infrastructure-related issues.
Worked with Docker and Kubernetes on multiple cloud providers, from helping developers build and containerize their applications (CI/CD) to deploying on either public or private cloud.
Developed automation for Kubernetes clusters with Ansible by writing playbooks.
Created AWS Lambda functions that automatically run code in response to multiple events, such as HTTP requests via Amazon API Gateway, modifications to objects in Amazon S3 buckets, and table updates in Amazon DynamoDB.
Experience working with various AWS services such as EC2, S3, ELB, Auto Scaling, Route 53, SNS, CloudWatch, RDS, DynamoDB, VPC, CloudFormation, Lambda, CloudFront, ECS, and EKS.
Used Terraform, Ansible, and Kubernetes for infrastructure provisioning and application deployment.
Hands-on experience integrating Terraform with Ansible and Packer to create and version AWS infrastructure.
Experience working with version control systems such as SVN and Git, and with source code management client tools such as Bitbucket, GitPrime, GitHub, and other command-line applications.
Excellent understanding of and hands-on expertise in creating new custom resources in Kubernetes.
Knowledge of configuring and setting up various security architectures for distributed enterprise applications involving LDAP and LTPA to implement SSO (Single Sign-On) for Java applications.
Experience in setting up nodes, data sources, and virtual hosts, and in configuring session management.
Designed and implemented reusable APIs for plugins that needed enhanced customization options.
Worked on the Integrations team, responsible for growing and maintaining the private and public APIs, as well as external integrations in and out of the infrastructure.
Developed complex database objects such as stored procedures, functions, packages, and triggers using SQL and PL/SQL. Experience in Oracle-supplied packages, dynamic SQL, records, and PL/SQL tables.
Certifications:
AWS Certified DevOps Administrator
Education:
Bachelor of Technology in E.C.E., 2012, India.
Master of Science in Computer Science, Northwestern Polytechnic University, 2017, California.
Technical Skills:
Operating Systems
CoreOS, RHEL 7.X, 8.X & CentOS 7.X.
Languages
JavaScript, jQuery, CSS, HTML, Bootstrap, Java, Python, MySQL, REST, SOAP.
Versioning Tools
Bitbucket, Subversion, ClearCase, Git, and Perforce.
CI Tools/Build Tools
Hudson/Jenkins, Bamboo, Ant, Maven.
Automation Tools
Docker, Ansible, Jenkins, Groovy, Kubernetes, Terraform.
Web/App servers
Apache, WebSphere, WebLogic, JBoss, Tomcat.
RDBMS
Oracle, SQL Server, MySQL, DB2, Cassandra.
Networking Services
SNMP, SMTP, TCP/IP, IPX/SPX, OSPF, BGP, IGRP, EIGRP.
AWS Services
EC2, S3, ELB, Auto Scaling, EBS, CloudFront, Relational Database Service (RDS), VPC, Route 53, CloudWatch, CloudTrail, IAM, SNS.
PROFESSIONAL EXPERIENCE:
Client: TikTok USDS September 2022 – Present
Location: San Jose, CA.
Role: Site Reliability Engineer
Project Summary: Responsible for improving the lifecycle of Ads platform systems and services that process critical events such as click, pause, convert, dispatch, and billing from Union dispatch, Ad Log, Octopus, and Billing, followed by streaming tasks that generate reports for advertisers. Performed reviews, deployments, and operations.
Responsibilities:
Worked on issues related to Hive table tasks, HBase, Kafka lag, DRC, QPS, and optimization.
Performed Flink task restarts and fixed issues related to container timeout exceptions; adjusted sampling rates during outages so that lag is consumed sooner. Adjusted Kafka cluster rate limits to consume lag faster during host- and infrastructure-related issues.
Developed reusable Terraform modules to standardize cloud resource provisioning across environments.
Integrated Terraform with CI/CD pipelines (e.g., GitHub Actions, Jenkins, GitLab CI) to enable automated infrastructure deployments.
Managed remote state using Terraform backends such as AWS S3 with DynamoDB state locking.
Used Terraform workspaces and variables for environment-based infrastructure management.
Learning Kubernetes security best practices, including pod security policies and network policies.
Involved in migration of monolithic applications to containerized microservices in Kubernetes.
Learning infrastructure provisioning using Terraform, Helm, and Kubernetes Operators.
Learning how to automate deployments using Helm charts and CI/CD pipelines (e.g., Jenkins, Argo CD, GitOps).
Experience with monitoring and logging tools such as Grafana, the ELK stack, and LogicMonitor. Strong problem-solving and troubleshooting skills, with the ability to analyze and resolve issues.
Processed real-time streaming tasks for the Ads platform running on Flink and Kafka, and improved efficiency by tuning resources and task scheduling, distributing tasks across schedule times, and automating operations with defined rules.
Optimized Flink tasks to recover checkpoints and to resolve Kafka lag, exceptions, slow nodes, and TM issues in a timely manner.
Fixed issues related to "user application exited" permission errors and Parquet crypto runtime exceptions.
Fixed missing block exceptions in HDFS paths and platform issues such as common-09 and common-13.
Client: AAA Insurance, Dec 2019 – Sep 2022
Location: Phoenix, AZ
Role: Sr. Linux Engineer
Project Summary: As one of the adopters of the Cal State project for AAA Auto, helped define the scope of the organization-wide migration from on-prem servers to the cloud using Kubespray and related tools.
Responsibilities:
Developed and contributed an AWS cluster application to the Cal State project on the AAA platform.
Responsible for automating AAA NCNU on-premises infrastructure, upgrading RHEL using automated tools, and coordinating with the respective project managers.
Applied patches monthly to meet audit requirements using Red Hat Satellite server, YUM and RPM tools.
Resolved configuration issues and problems related to OS, mounts, LDAP, user IDs, networking, and DNS.
Experience in addressing networking issues and setting up bonding.
Implemented the VMware vRealize Orchestrator automation tool in the AAA datacenter environment, including IaaS components, Microsoft SQL Server, authentication services, the vRA Business appliance, and the Orchestrator appliance.
Used Ansible Tower to create playbooks for end-to-end security patching on RHEL and AWS servers.
Prepared Ansible playbooks for configuration management and application deployments.
Identified suitable database and application server candidates for cloud migration, migrated them from on-premises to Oracle Cloud and AWS, and performed compatibility tests and analysis.
Created tables, views, constraints, and indexes (B-tree, bitmap, and function-based).
Identified applications in the primary datacenter at Salt Lake City, UT, for disaster recovery requirements such as Active-Active, Active-Passive, RTO, and RPO.
Set up infrastructure applications such as Active Directory, Single Sign-On, Duo Authentication Proxy, and Opsview monitoring as Active-Active.
Created Ansible roles in YAML and defined tasks, variables, files, handlers and templates.
Created inventories and configured Ansible files to automate the continuous delivery process; implemented Ansible to push changes to many or all servers in the inventory.
Wrote Ansible playbooks, the entry point for Ansible provisioning, where the automation is defined through tasks in YAML format. Ran Ansible scripts to provision dev and prod servers.
Wrote Ansible playbooks in YAML to manage configurations of nodes on AWS EC2.
Focused day to day on vulnerability management for RHEL servers, ensuring all vulnerabilities were resolved within SLA, with continuous monitoring in ServiceNow.
Client: AT&T, Mar 2018 – Sep 2019
Location: Los Angeles, CA
Role: DevOps Engineer
Project Summary: As one of the adopters of the Kubernetes project for DIRECTV Now, helped define the scope of the organization-wide migration from on-prem servers to the cloud using Kubespray and related tools. Demonstrated skill using Kubernetes in a pre-prod environment and advocated for the simplicity and business value that Kubernetes offers DIRECTV Now.
Responsibilities:
Developed and contributed an AWS cluster application to a Kubernetes project called Open Video Platform.
Upgraded all prod clusters to 1.11.9 with Kubespray 2.5.0.
Created an easy migration process via Ark (Velero) backup and restore for prod clusters (lift and shift of microservices across environments).
Upgraded the Docker image registry to 2.7.2 to match industry standards.
Implemented AWS Auto Scaling for all our pod clusters to scale microservices according to load and traffic.
Worked with Docker and Kubernetes on multiple cloud providers, from helping developers build and containerize their applications (CI/CD) to deploying to AWS Cloud.
Developed automation for Kubernetes clusters with Ansible, writing playbooks per use case.
Extensively used Docker, Jenkins, CoreOS, Kubernetes, Ansible, Bitbucket, AWS, and Clair.
Worked with the logging system of Kubernetes pods to check logs, find bugs, and fix them.
Created CloudFormation JSON templates in Terraform for infrastructure as code.
Implemented Terraform modules for deployment of applications across multiple cloud providers.
Experienced with event-driven and scheduled AWS Lambda functions to trigger various AWS Resources.
Designed Terraform templates to create custom-sized VPCs, subnets, and NAT to ensure successful deployment of web applications and database templates, and migration from traditional to cloud environments.
Worked on AWS components: EC2, S3, VPC, subnets, IAM, Route 53, CloudFormation, and CloudWatch.
Configured Ansible to manage AWS environments and automate the build process for core AMIs used by all application deployments including Autoscaling, and CloudFormation scripts.
Created a continuous delivery process that supports building Docker images and publishing them to a private repository (Docker Trusted Registry).
Created custom Python APIs to work with vendors unable to expose JSON/XML endpoints; used Requests to facilitate interaction with the vendors' web applications.
Created an authorization scheme that enforced limitations on API clients, with emphasis on securing data.
Hands-on experience in client-server application development using Oracle 11g/10g, PL/SQL, SQL*Plus, and TOAD.
Effectively made use of Table Functions, Indexes, Table Partition, Collections, Analytical functions.
Experience in Oracle supplied packages, Dynamic SQL, Records and PL/SQL Tables.
Worked extensively on Ref Cursor, External Tables and Collections.
Client: State Street Bank, Feb 2017 – Feb 2018
Location: Quincy, MA
Role: Build/Release Engineer
Project Summary: Key contributor to the Network Segmentation project initiative, involving architecture and development of program modules that retrieve and process device configurations from Illumio, Palo Alto, and Cisco products using Ansible, Perl, and Python, and store network configuration files from routers throughout the State Street global network.
Responsibilities:
Wrote playbooks for products new to the organization, such as Illumio (rule sets, PCE, VEN agents, workloads, labels, payloads, etc.), Palo Alto, and Cisco ACI networks.
Installed, configured, and managed the Ansible centralized server (Tower), created playbooks to support various middleware application servers, and configured Ansible Tower as a configuration management tool to automate repetitive tasks.
Handled configuration automation and centralized management with Ansible; implemented Ansible to manage all existing servers and automate the build and configuration of new servers.
Developed and maintained self-service automation jobs (delivered to engineering and QA teams) per project requirements, which reduced DevOps operations tickets by 80 percent.
Supported global customers with customized RHEL configurations and troubleshooting through our global queue system.
Took customer calls, assisting customers in building out their infrastructure with our product offerings.
Mentored junior team members through shadowing and classroom curriculum.
Collaborated on team development and expansion; managed and administered our team lab presence.
Created a break-fix environment for RHEL using VMware to train team members.
Created, maintained, and documented PL/SQL scripts and stored procedures.
Designed, developed and maintained data extraction and transformation processes and ensured that data is properly loaded and extracted in and out of our systems.
Designed and developed analytic and reporting solutions using Oracle PL/SQL and mod_plsql.
Solid understanding of inheriting PL/SQL object types, casting PL/SQL object types into descendant types, converting PL/SQL object collections into a result set, bulk operations and pipelined functions.
Client: Intraedge Consulting, Jan 2014 – July 2015
Location: Pune, India
Role: Systems/Linux Engineer
Project Summary: Served as Systems/Linux Engineer responsible for installation, system maintenance, and troubleshooting of over 80 Red Hat Enterprise Linux servers across 7 remote locations, housing applications used to create disks and books on an "on-demand" basis. First-tier responder to network-related issues at 7 production sites. Created and managed several access control lists (ACLs) implemented on PA aggregate routers within production.
Responsibilities:
Identified and addressed technical and operational risks; managed Tomcat and Apache web servers.
Wrote scripts in Perl, Python, and Unix shell to automate processes and simplify tasks.
Managed end-user accounts, permissions, access rights, and storage allocations in accordance with defined best practices regarding privacy, security, and regulatory compliance.
Supported application development teams.
Analyzed the performance of systems, servers, applications, networks, and input/output devices.
Recommended, scheduled, and performed software and hardware improvements, upgrades, patches, reconfigurations, and purchases.
Trained all Subversion users on branching strategies so they could use the tool effectively.
Automated the build and release management process including monitoring changes between releases.
Built and Deployed Java/J2EE to a web application server in a continuous integration environment.
Used Bugzilla for bug tracking, tracking bugs and forwarding them to the development team.
Client: Solvent Software, Jan 2012 – Dec 2013
Location: India
Role: System Admin
Responsibilities:
Wrote SQL queries to retrieve data from the database using JDBC.
Utilized frameworks such as Hibernate and Spring for persistence and application layers.
Designed and developed OLAP Cubes and Dimensions using SQL Server Analysis Services (SSAS).
Worked with project teams to upgrade SQL Server 2005 to SQL Server 2008.
Used PowerShell scripting to pull data from Active Directory.
Analyzed the ANT Build projects for conversion.
Managed Maven project dependencies by creating parent-child relationships between projects.
Maintained the branching and build/release strategies utilizing Subversion in Linux environments.
Used SoapUI to test SOAP and REST-based web services.
Involved in the complete lifecycle of the project from strategizing to implementation of the test framework.
Responsible for the build validation process on an ongoing basis.
Followed code quality guidelines and implemented them across teams.