Amani Reddy Veluri
Sr. Cloud Data Engineer
linkedin.com/in/amani-reddy-veluri-a862582b7
Email: **************@*****.***
Contact: +1-469-***-****
PROFESSIONAL SUMMARY:
Over 7 years of experience in Data Engineering and Python Development, specializing in designing, developing, and optimizing enterprise-level data models, ETL pipelines, and data warehousing solutions to enhance business intelligence and decision-making.
Expertise in Big Data storage, processing, and analysis across AWS, Azure, and GCP, ensuring scalable and efficient data workflows.
Extensive experience with distributed computing architectures, including Hadoop, Snowflake, Apache Spark, and Python-based data processing, leveraging MapReduce and SQL for managing large datasets.
Strong proficiency in AWS cloud services, including EC2, S3, RDS, IAM, Glue, Kinesis, Lambda, EMR, Redshift, and DynamoDB, with expertise in designing, configuring, and managing cloud environments. Skilled in CloudFormation for automating infrastructure deployment using Infrastructure as Code (IaC) principles.
Hands-on experience in real-time data streaming and processing with Apache Kafka and Spark Streaming, ensuring seamless data ingestion, transformation, and analysis.
Proficient in data preprocessing, cleaning, and transformation using Python libraries such as Pandas, NumPy, and PySpark, ensuring high data quality for analytics and reporting.
Strong expertise in SQL and PL/SQL, working extensively with Teradata, Oracle, and NoSQL databases, optimizing queries, stored procedures, and database schemas.
In-depth knowledge of data modeling techniques, including dimensional modeling, Star & Snowflake Schema, OLTP/OLAP systems, and normalization vs. denormalization strategies to enhance data storage and retrieval efficiency.
Experience in ETL orchestration and automation using tools like Talend, Informatica, and Apache Airflow, enabling seamless data integration across multiple sources and platforms.
Skilled in the Hadoop ecosystem, including HDFS, MapReduce, Hive, Pig, HBase, Storm, Oozie, Sqoop, and Zookeeper, managing large-scale data storage and distributed processing.
Experience in containerizing data processing applications with Docker and orchestrating them using Kubernetes to ensure scalability and fault tolerance.
Proficiency in Apache Spark components, including Spark Core, Spark SQL, Spark Streaming, DataFrames, Datasets, and Spark ML, for building high-performance data pipelines and machine learning applications.
Strong background in data ingestion from various sources using Sqoop, efficiently transferring structured and unstructured data into HDFS and Hive tables for analytics and processing.
Expertise in cloud infrastructure automation using CloudFormation, Terraform, and CI/CD pipelines to streamline cloud-based data environment deployment and management.
Hands-on experience with Jenkins, GitHub Actions, and AWS CodePipeline for automating deployment, testing, and monitoring of ETL pipelines.
Extensive experience in cloud architecture, development, and data analytics within AWS ecosystems.
Proficient in version control and collaboration tools, including Git, BitBucket, and SVN, ensuring seamless code management and integration.
Strong experience with C# for software development and integrating data platforms for seamless data processing.
Exceptional analytical and problem-solving skills, optimizing data workflows, improving system performance, and driving data-driven decision-making for enterprise solutions.
Comprehensive knowledge of Big Data Analytics, including installation, configuration, and utilization of ecosystem components such as Hadoop MapReduce, HDFS, HBase, Zookeeper, Cloud Functions, Hive, Sqoop, Pig, Flume, Cassandra, Kafka, Spark, Oozie, and Airflow.
Proficiency in Relational Databases like MySQL, Oracle, and MS SQL Server, as well as NoSQL databases such as MongoDB, HBase, and Cassandra.
TECHNICAL SKILLS:
AWS: EC2, S3, Glacier, Redshift, RDS, EMR, Lambda, Glue, CloudWatch, Kinesis, CloudFront, Route 53, DynamoDB, CodePipeline, EKS, Athena, QuickSight
ETL Tools: AWS Glue, Azure Data Factory, Airflow, Spark, Sqoop, Flume, Apache Kafka, Spark Streaming
Programming & Scripting: Scala (Spark), Python, Java, MySQL, PostgreSQL, Pig, HiveQL, UNIX Shell Scripting
Data Warehouse: AWS Redshift, Snowflake, Teradata
SQL and NoSQL Databases: Oracle DB, Microsoft SQL Server, PostgreSQL, MongoDB, Cassandra
Monitoring Tools: Splunk, Chef, Nagios, ELK
Source Code Management: Bitbucket, Nexus, GitHub
Containerization: Docker, Kubernetes, OpenShift
Hadoop Tools: HDFS, HBase, Hive, YARN, MapReduce, Pig, Apache Storm, Sqoop, Oozie, Zookeeper, Spark, Solr, Atlas
Build & Development Tools: Jenkins, Maven, Gradle, Bamboo
Methodologies: Agile/Scrum, Waterfall
PROFESSIONAL EXPERIENCE:
Client: Principal Financial Group Inc, Des Moines, IA Jun 2021 - Present
Role: AWS Data Engineer
Responsibilities:
Designed and implemented real-time data streaming pipelines using AWS Kinesis to collect and process financial transaction data and market feeds, ensuring low-latency ingestion and processing for timely insights.
Utilized Apache Kafka and Spark Streaming to handle high-throughput data ingestion and real-time analytics, optimizing the processing of financial market data and trading metrics.
Integrated Flume and Sqoop to facilitate seamless movement of structured and unstructured financial data, supporting ETL workflows for transactional systems and data lakes.
Optimized complex SQL queries and PL/SQL procedures for financial data in databases like Teradata, PostgreSQL, and Oracle, ensuring efficient storage and retrieval of large datasets while meeting SLAs for real-time data analysis.
Engineered Python-based ETL pipelines with AWS Glue and PySpark, processing large volumes of financial data at scale, with automated data cleaning, transformation, and enrichment.
Established a scalable data lake architecture on Amazon S3 and Glacier, providing cost-effective storage for structured financial datasets, market data, and unstructured reports.
Leveraged AWS Redshift, Snowflake, and Teradata for optimized operational data storage and retrieval, supporting high-performance queries and analytics on financial data.
Developed and maintained RESTful APIs for seamless integration between financial applications, ensuring secure and scalable data exchange between systems.
Led the migration of raw data to AWS S3 and performed refined data processing, leveraging AWS EMR for large-scale data transformation and movement.
Designed and implemented a Data Lake infrastructure on AWS Cloud, utilizing services like S3, EMR, Redshift, Athena, Glue, EC2, RDS, VPC, IAM, CloudWatch, SNS, SQS, Kinesis, and DynamoDB to support various data tasks including analysis, processing, storage, and reporting.
Utilized serverless architecture with API Gateway, Lambda, and DynamoDB, deploying AWS Lambda functions triggered by events from S3 buckets (see the illustrative sketch following this role's responsibilities).
Created external tables with partitions using Hive, AWS Athena, and Redshift for efficient data organization and querying.
Designed and developed ETL jobs to extract data from multiple sources and load it into Data Lake or Data Mart in Redshift.
Implemented automated ETL workflows using AWS Lambda, S3, EMR, Glue, and Redshift for seamless data processing and integration.
Developed and maintained complex SQL scripts, indexes, views, and queries for comprehensive data analysis and extraction.
Managed relational databases on AWS RDS, writing optimized SQL queries for financial reporting and performing regular maintenance to ensure high availability.
Implemented security policies with AWS IAM, KMS, and CloudTrail to ensure compliance with financial industry standards, including encryption, access control, and audit logging.
Supported CI/CD pipelines using Jenkins and GitHub Actions, enabling continuous delivery of financial applications with zero downtime and enhanced security.
Collaborated in Agile sprints, ensuring timely delivery of features related to financial data processing, including real-time market data pipelines and financial reporting systems.
Optimized data processing with Apache Spark on AWS EMR, accelerating processing for financial transaction logs, market data, and portfolio analytics.
Developed interactive dashboards with AWS QuickSight, Tableau, and Splunk to visualize financial metrics, market trends, and real-time operational performance.
Processed financial transaction logs and market feeds for predictive analytics, enabling better forecasting, fraud detection, and proactive risk management strategies.
Configured AWS CloudWatch and Nagios for real-time monitoring and alerting on data pipeline performance, ensuring rapid response to operational issues in financial systems.
Implemented data governance frameworks using AWS Lake Formation and Apache Atlas to ensure compliance with financial data regulations and industry standards.
Automated infrastructure provisioning using AWS CloudFormation, Terraform, and Kubernetes for scalable and reproducible financial data pipelines and applications.
Deployed containerized financial data processing applications with Docker, Kubernetes, and OpenShift, ensuring consistency, portability, and high availability.
Optimized SQL and ETL processes for financial data to ensure faster report generation, better decision-making, and efficient querying of large-scale datasets.
Utilized advanced data security techniques, ensuring that financial data is secure at rest and in transit, adhering to strict regulatory requirements and compliance standards.
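A minimal sketch of the S3-event-driven serverless pattern referenced above (S3 event triggers a Lambda function that starts a Glue ETL run). The job name, argument keys, and event handling here are placeholders chosen for illustration, not the production implementation.

import boto3

glue = boto3.client("glue")

def handler(event, context):
    """Triggered by an S3 ObjectCreated event; starts a Glue ETL run for the new object."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # "raw-to-refined-etl" is a placeholder Glue job name for this sketch.
        glue.start_job_run(
            JobName="raw-to-refined-etl",
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )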
Client: St Vincent Health, Indiana Jul 2019 - Apr 2021
Role: AWS Data Engineer
Responsibilities:
Designed and developed ETL processes in AWS Glue to migrate data from external sources such as S3 and formatted files into AWS Redshift.
Utilized AWS Glue catalog with a crawler to retrieve data from S3 and perform SQL operations using AWS Athena.
Implemented an ETL framework using Spark with Python, loading standardized data into Hive and HBase tables.
Created Hive raw and standardized tables for data validation and analysis, applying partitioning and bucketing.
Loaded large sets of structured and semi-structured data from multiple sources into the Raw Data Zone (HDFS) using Sqoop imports and Spark jobs.
Developed Lambda functions to execute AWS Glue jobs based on AWS S3 events.
Established monitors, alarms, notifications, and logs for Lambda functions and Glue jobs using CloudWatch.
Developed Spark code in Python (PySpark) on distributed environments to process a large number of CSV files with different schemas.
Wrote MapReduce code to process and parse data from various sources, storing parsed data into HBase and Hive.
Orchestrated continuous integration and deployment of data infrastructure components like Apache Spark clusters, Kafka brokers, and Hadoop clusters to ensure the efficient delivery of scalable and resilient data processing environments.
Monitored and optimized database performance using tools such as Apache Ambari and Prometheus, proactively identifying bottlenecks and inefficiencies to enhance system reliability and resource utilization in a Big Data environment.
Analyzed and executed test cases for various testing phases including integration and regression testing.
Incrementally expanded the data infrastructure to support evolving data needs by embracing the agile methodology.
Designed data pipelines using Flume, Sqoop, Pig, and MapReduce to ingest customer behavioral data into HDFS for analysis.
Implemented Spark using Scala and Spark SQL, optimizing data testing and processing across multiple sources.
Utilized Apache Airflow for authoring, scheduling, and monitoring data pipelines, designing DAGs for automating ETL processes (see the illustrative sketch following this role's responsibilities).
Managed Relational and NoSQL databases, including database design, schema optimization, performance tuning, and troubleshooting.
Designed scalable data storage solutions leveraging distributed columnar databases like Apache Druid and Amazon Redshift, enhancing query performance and resource utilization for analytical workloads.
Developed real-time data processing systems with stream processing frameworks such as Apache Kafka and Apache Flink, enabling low-latency data ingestion and analysis for time-sensitive applications like fraud detection and IoT analytics.
Designed and implemented data models to organize and structure data for efficient storage, retrieval, and analysis.
Containerized data processing applications using technologies like Docker and Kubernetes, facilitating deployment portability and scalability in CI/CD workflows across on-premises and cloud environments.
Demonstrated proficiency in using Git and Bitbucket for collaborative development, code management, and version tracking in data engineering projects.
Designed Python-driven monitoring solutions using CloudWatch, Splunk, and ELK for real-time anomaly detection.
Automated data pipeline orchestration using Apache Airflow, AWS Lambda, and Kubernetes (EKS/OpenShift) to ensure high availability and fault tolerance.
Managed CI/CD pipelines for data workflows using Jenkins, GitHub Actions, and AWS CodePipeline to ensure seamless deployment and version control.
Deployed containerized workloads with Docker and Kubernetes, ensuring portability and scalability of data processing applications.
Utilized Splunk, ELK (Elasticsearch, Logstash, Kibana), and AWS CloudWatch for real-time monitoring, log analysis, and proactive incident resolution.
Implemented disaster recovery (DR) and backup strategies using AWS S3 Glacier, Cross-Region Replication (CRR), and automated snapshots to ensure business continuity.
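A minimal sketch of the Airflow DAG pattern referenced above, assuming hypothetical task names, DAG id, and a daily schedule; the actual pipelines included additional extraction, validation, and load steps.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(**context):
    # Placeholder: pull source files (e.g., from S3 or Sqoop exports) into staging.
    pass

def load(**context):
    # Placeholder: load standardized records into Hive/Redshift tables.
    pass

with DAG(
    dag_id="daily_etl_sketch",          # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task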
Client: Globe Life, McKinney, TX Aug 2017 - Jun 2019
Role: Big Data Developer
Responsibilities:
Developed a centralized platform to integrate key operational data, including equipment statuses, well details, and production metrics.
Designed and developed an enterprise-grade cloud data warehouse to enhance risk assessment and business intelligence (BI) capabilities, ensuring efficient data storage, processing, and reporting for financial risk management.
Built a scalable data warehouse using Amazon Redshift, Google BigQuery, and Snowflake to store financial and risk-related data.
Integrated AWS S3, Google Cloud Storage, and Azure Blob Storage for secure storage of structured and unstructured data.
Configured RDS (PostgreSQL, MySQL) and NoSQL databases like DynamoDB and MongoDB to optimize data retrieval and storage.
Implemented Python-based SQL query execution within ETL workflows to optimize transformations in Redshift, Snowflake, and BigQuery.
Developed ETL pipelines using AWS Glue, Apache Airflow, and Azure Data Factory to streamline data ingestion, transformation, and loading.
Configured and scheduled Autosys jobs for orchestrating ETL workflows, reducing manual effort in pipeline execution.
Processed large-scale datasets using Apache Spark (PySpark/Scala), SQL, and Hadoop-based tools (HDFS, Hive, Pig) to enhance performance and reliability (see the illustrative sketch following this role's responsibilities).
Designed and developed real-time data processing pipelines using Apache Flink for stream analytics and event-driven architecture.
Integrated Apache Flink with AWS Kinesis and Kafka to process high-throughput event streams from IoT devices and user activity logs.
Refactored existing Spark streaming pipelines into Apache Flink for better latency, throughput, and event-time accuracy.
Leveraged Kafka and Kinesis for real-time data streaming and event-driven architectures.
Maintained and optimized UNIX-based scripts for system monitoring, reducing downtime and improving operational efficiency.
Assisted in implementing microservices using Python Flask and Java Spring Boot.
Deployed and tested AWS Lambda functions triggered by S3, CloudWatch, and API Gateway.
Automated database backup and recovery processes using AWS RDS snapshot features.
Used Terraform to automate infrastructure setup for staging environments.
Wrote scripts to validate data integrity and perform ETL tasks from S3 to RDS.
Developed interactive dashboards and reports using Tableau, Power BI, and AWS QuickSight to provide real-time insights into financial risks and operational efficiency.
Enabled self-service BI by integrating Amazon Athena and Google BigQuery for ad hoc querying and analytics.
Enhanced query performance using indexing, partitioning, materialized views, and caching strategies in Redshift, BigQuery, and Snowflake.
Implemented IAM roles, encryption mechanisms (KMS), and fine-grained access controls to ensure data security and compliance with regulatory standards like GDPR, CCPA, and PCI-DSS.
Configured AWS CloudTrail and CloudWatch for auditing, monitoring, and anomaly detection in data workflows.
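A minimal PySpark sketch of the S3-based batch processing described above; bucket paths and column names are placeholders chosen for illustration, not the actual project code.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("standardize-records-sketch").getOrCreate()

# Read raw CSV files from the (placeholder) landing bucket.
raw = (
    spark.read
    .option("header", True)
    .csv("s3a://example-raw-bucket/records/")
)

# Basic standardization: de-duplicate, derive a partition date, drop incomplete rows.
standardized = (
    raw.dropDuplicates()
       .withColumn("event_date", F.to_date("event_ts"))
       .filter(F.col("amount").isNotNull())
)

# Write partitioned Parquet that a downstream warehouse COPY/load step can consume.
(
    standardized.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3a://example-refined-bucket/records/")
)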
EDUCATION DETAILS:
Master's in Computer Science – Southeast Missouri State University.
Bachelor's degree – CVR College of Engineering.