SRE/DevOps Engineer & Multi-Cloud Specialist with 12+ years of experience in designing, automating, and managing cloud-native solutions on AWS and Azure. Expertise in Infrastructure as Code (IaC), CI/CD pipelines, container orchestration, observability, and cloud security for enterprise applications. Proven ability to lead DevOps teams and implement scalable solutions that improve deployment efficiency, system reliability, and cost optimization.
Skills:
CI/CD Implementation: Jenkins, GitLab CI/CD, Azure DevOps, TeamCity
Infrastructure as Code (IaC): Terraform
Cloud Platforms: AWS, Azure, VMware Cloud
Kubernetes & Containerization: Rancher, EKS, ECS, AKS
Configuration Management: Ansible
Monitoring & Observability: Datadog, CloudWatch, SolarWinds, Nagios, Tivoli, Splunk
Version Control & GitOps: Git, Bitbucket
Scripting & Automation: Shell, Python, PowerShell
Security & Compliance: Secret Management (Vault, AWS Secrets Manager), KeePass
Agile & Collaboration: Jira, Confluence, ServiceNow, Salesforce
Work Experience:
Senior Staff - Site Reliability Engineer
Altimetrik India Pvt Ltd
Mar 2025 to date
Project: Broadridge Finance Solutions (August 2025 to date)
Served as offshore SRE lead while contributing as a hands-on IC, mentoring team members and owning production reliability.
Ensured high availability, reliability, and performance of cloud-native applications running on AWS (EC2, EKS, S3, RDS, SQS) with defined SLIs, SLOs, and error budgets.
Automated infrastructure provisioning and configuration using Terraform, Groovy, and AWS SSM/Parameter Store to reduce manual intervention and improve system resilience.
Implemented robust monitoring, alerting, and observability using CloudWatch, Datadog, and Splunk, including proactive incident detection and RCA.
Managed security groups, IAM policies, and secrets to enforce least-privilege access and meet security and compliance standards.
Supported Kafka streaming platforms using AWS MSK and Confluent, monitoring throughput, latency, and consumer lag via Confluent dashboards.
Optimized CI/CD pipelines with Jenkins and JFrog, enabling reliable deployments, rollback strategies, and reduced MTTR.
Performed AMI rotation, patching, and system upgrades with minimal downtime, ensuring stability and operational excellence.
Managed DB (RDS, Postgres), Jenkins pipeline configurations, and centralized log management (Splunk).
Collaborated with engineering and product teams through Jira, leading incident response, postmortems, and continuous reliability improvements.
Project: Lululemon (March 2025 to July 2025)
Led a team of engineers in the design and implementation of observability solutions as per client request.
Designed and implemented end-to-end observability using Datadog, covering metrics, logs, traces, and synthetic monitoring across distributed systems and cloud infrastructure.
Developed custom dashboards and service maps in Datadog for real-time visibility into system health, performance, and dependencies.
Integrated application performance monitoring (APM), configuring service, team, and tag metadata in the relevant repositories so that each program appears in the dashboards.
Updated Helm charts in the GitLab repositories to expose the required metrics.
Collaborated with development and infrastructure teams to define SLIs/SLOs/SLAs, tying observability to business outcomes and operational excellence.
Tools used: AWS Console, GitLab, Rancher, Jira, Confluence, Visual Studio IDE
Site Reliability Engineer
Brightly Software India Pvt Ltd
Mar 2022 to Mar 2025
Tenure: 3 years
Managed and led the DevOps team as backup lead.
Administered AWS IaaS and PaaS services, including EC2, EBS, S3, RDS, Lambda, ECS, EKS, IAM, and the AWS CLI.
Automated the deployment and scaling of the SaaS solution on AWS using Ansible and Terraform.
Worked with CI/CD tools such as Jenkins, Octopus, and Bitbucket.
Used monitoring and logging tools such as CloudWatch, Prometheus, and Grafana to monitor performance and troubleshoot issues with the SaaS solution on AWS.
Secured the SaaS solution on AWS using IAM, KMS, and VPC.
Monitored systems with Datadog.
Built automation with Terraform and PowerShell.
Worked with databases including MS SQL, RDS, Redshift, and DynamoDB.
Managed team members responsible for building deployment automation and managing site reliability, availability, and performance.
Tools used: Cisco AnyConnect (VPN), Duo Mobile, Microsoft Authenticator, Jira, Confluence.
Working knowledge of Grafana and Prometheus.
Set up SLA, SLI, and SLO standards for the client.
Assisted management with error budget calculations.
Designed and implemented high availability and disaster recovery for applications at the physical, network, and data layers using services and features such as availability zones, load balancers, traffic manager, and geo-replication.
Drove service improvements based on optimization categories i.e. reliability, security, performance, operational excellence, and cost.
Authored Terraform templates to automate cloud infrastructure.
Optimised network performance and internet usage cost by implementing service and private endpoints.
Managed DB (RDS, Postgres), Jenkins pipeline configurations, and centralized log management (Splunk).
Secured the network by implementing NSG, firewall and VPN services.
Cloud Support Analyst
Trimble Information Technologies India Pvt Ltd
Sep 2020 – Nov 2021
Tenure: 1 year 3 months
•Managed day-to-day activity of the IaaS and PaaS cloud environment, supporting development teams with their requirements.
•Monitored servers via SolarWinds and handled tickets via NetSuite.
•Installed, configured, and administered Windows 2008, 2012, 2016, and 2019 desktops and applications, including adding CPU, RAM, and hard disk capacity.
•Experience with vSphere Client and patching updates via Kaseya.
•Installed, configured, and troubleshot VMware ESXi, Citrix servers, and Citrix Workspace.
•Handled Azure service and server infrastructure and TeamViewer issues on Windows.
•Maintained Windows 2000 servers, Active Directory servers, DNS, and DHCP.
•Backed up and restored data using the native Windows tool.
•Administered user, group, and OU accounts and configured group policies governing user access to resources on the servers and in AD.
•Maintained backups and provided solutions for OS problems.
•Built new development and test environments in Azure.
•Managed and supported ADS, DHCP, DNS, TS, Remote Desktop management, disk quotas, and DFS.
•Experience creating and managing compute, networking, and storage resources on Microsoft Azure.
•Created VMs for Windows servers and added new disks to VMs in Azure.
•Monitored the health status of VMs in Azure.
•Troubleshot Azure-related issues and engaged internal teams and vendors for issue resolution.
Infrastructure Analyst
Citibank - Citicorp service India Pvt Ltd
Apr 2016 – Aug 2020
Tenure: 4 years 4 months
•Provided operations support and monitored servers (Windows, Unix, Linux, Solaris) and batch jobs using Tivoli Enterprise Console.
•Responded to and resolved Wintel/Unix reboot alerts, service-down alerts, and CPU usage alerts in a timely manner.
•Monitored and handled infrastructure hardware issues for Wintel/Unix servers, cloud, Dell EMC, SAN, and NAS devices in coordination with vendors (HP/Dell) and local site support teams across the globe.
•Skilled in SLA management, incident management, and change management.
•Meticulously followed global infrastructure and application freezes to ensure no impact to the business.
•Gathered business approvals by representing the coordinated ServiceNow change records in Regional/Application/Infrastructure Change Advisory Board meetings.
•Coordinated changes end to end for hot-swappable devices in collaboration with all support teams and performed post-implementation server checkouts through iLO to confirm that no hardware failures existed.
•Handled batch job manipulation and processing across the Citi infrastructure using SSH Tectia and SecureCRT.
•Monitored MQ services using the Nastel tool (nodes, queue managers, queues and depths, channel status), resolving issues or escalating them to the next-level support team.
•Assisted with MQ COB tests (channel start, stop, reset, and ping tests) on request and coordinated until activity closure.
•Monitored middleware alerts through the CAS tool and stopped/started services or processes on request.
•Gathered client requirements for various use cases and managed the data center.
•Performed monthly Windows patching for all physical and cloud servers.
•Defined, implemented, and maintained corporate security policies.
•Installed and configured Linux/Windows servers and actively managed, improved, and monitored cloud infrastructure on AWS.
•Set up and managed Linux servers on Amazon IaaS (EC2, EBS, ELB, SSL, security groups, RDS, and IAM).
•Set up and managed VPCs and subnets, connected different zones, and blocked suspicious IPs/subnets via ACLs.
•Created and managed S3 buckets (CLI) to store database and log backups and to upload images served via CDN.
•Set up and managed databases on Amazon RDS; monitored servers through Amazon CloudWatch and SNS.
•Created and managed DNS records on Amazon Route 53 and the GoDaddy panel.
•Created and managed AMIs, snapshots, and volumes; upgraded and downgraded AWS resources (CPU, memory, EBS).
•Created AWS instances and resource bills.
•Worked with DevOps tools such as Git, Jenkins, Ansible, and Docker.
•Hands-on experience with OpsWorks, ElastiCache, DynamoDB, CloudFormation, and ECS.
Assistant Systems Engineer
CSS Corp Pvt Ltd
Jun 2014 – Mar 2016
Tenure: 1 year 9 months
•Managed RHEL, CentOS, Debian, Ubuntu, and Windows 2008/2003 servers in a cloud environment.
•Coordinated with L3 administrators and developers on deployments and investigated issues in the production environment.
•Troubleshot network and connectivity issues on the VMs.
•Managed forward and reverse zones on the name servers.
•Configured cloud storage on servers across multiple data centers in a private cloud (GoGrid).
•Handled CDN provisioning and preliminary troubleshooting.
•Analysed reported latency and performance issues on the VMs.
•Performed data recovery from crashed VMs to other servers.
•Handled cloud and dedicated server infrastructure with GoGrid/ServePath.
•Hands-on experience installing, configuring, and administering Red Hat and other Linux distribution servers.
•Hands-on experience installing, configuring, and administering Windows servers.
•Hands-on experience with DNS configuration (domain parking, adding/updating records, and rDNS configuration).
•Hands-on experience with web server, FTP, NFS, and SMB configuration.
•Troubleshot day-to-day cloud and dedicated server issues, including performance-related issues.
•Hands-on experience with cloud server recovery and cloud server login credential issues.
•Tightened security on Windows and Linux servers.
•Configured and troubleshot cloud storage across Linux and Windows environments.
•Performed basic F5 load balancer troubleshooting.
Junior Engineer
Comodo Security Solution Pvt Ltd
Jul 2013 – Jun 2014
Tenure: 1 year
•Hands-on experience in data analysis
•Installation, configuration, maintenance, and troubleshooting of Linux boxes
•User/group management
•Configuration and maintenance of Samba servers
Achievement / Awards:
•Analyst of the Year, 2018 & 2019: received an individual award in both 2018 and 2019 during my service at Citibank.
•Rise & Shine Award, Dec 2025, for extraordinary achievement and performance at work.
Certification / Trainings:
•AWS Certified Solutions Architect – Associate
•Certified in Azure DevOps (AZ-400) and Azure Administrator (AZ-104)
•Certified in ITIL (GR750354773DV) and ITSM Foundation (6159947.20726324)
•Red Hat Certified System Administrator, Red Hat Enterprise Linux 6 (Certification #140-096892)
•Certificates of completion in DevOps from Intellipaat and in SRE Foundation and Practitioner from GSDC
Education
•Bachelor of Engineering, Arignar Anna Institute of Science and Technology, India.
Declaration
I, Dinesh V, hereby declare that the information contained herein is true and correct to the best of my knowledge and belief.
DATE: (DINESH V)