Rakesh Reddy
SR. DATA ENGINEER
******************@*****.*** 313-***-**** LinkedIn: Rakesh
PROFESSIONAL SUMMARY:
Possess 8+ years of experience as a Data Engineer, specializing in cloud technologies, data pipeline development, ETL processes, and data warehousing.
Expertise in building and maintaining distributed data systems using AWS services such as S3, Glue, Lambda, and EMR to process large datasets and automate workflows.
Proficient in working with Azure technologies, including Azure Data Factory (ADF), Synapse Analytics, Data Lake Storage (ADLS), and Azure Databricks, to streamline data ingestion and processing in cloud environments.
Solid experience with Apache Spark, Hadoop, and Kafka for handling big data processing, real-time data streaming, and batch analytics in both cloud and on-premises settings.
Proficient in database management systems (MySQL, PostgreSQL, SQL Server, Oracle) with a focus on optimizing query performance, designing efficient database schemas, and ensuring high availability.
Skilled in using Python, Pandas, NumPy, and PySpark to perform data analysis, develop statistical models, and implement machine learning algorithms.
Proven track record in developing automated ETL pipelines using AWS Glue, Talend, and Informatica, reducing manual intervention and improving data processing efficiency.
Strong experience in building and optimizing data models in Star Schema for large-scale data warehouses, ensuring consistency and performance in reporting and analysis.
Extensive experience using data visualization platforms like Tableau and Power BI to deliver actionable insights supporting business stakeholders' data-driven decision-making.
Expert in CI/CD pipelines using Jenkins, GitLab, and AWS CodePipeline for seamless code deployment and version control, ensuring agile and rapid development cycles.
Proficient in Agile and Scrum practices, working collaboratively with cross-functional teams to deliver high-quality data engineering solutions on schedule.
Excellent troubleshooting abilities, with a thorough understanding of diagnosing and resolving performance issues in complex distributed systems.
Familiar with data security best practices, including data encryption (Azure Key Vault), role-based access control (RBAC), multi-factor authentication (MFA), and identity management in cloud platforms (IAM, Azure AD).
A self-driven and adaptable engineer, continuously learning new technologies and methodologies to improve data processing workflows and architecture.
TECHNICAL SKILLS:
Cloud Platforms: AWS: S3, Glue, Lambda, RDS, DynamoDB, Step Functions, EMR, Data Pipeline, IAM, CloudWatch, CloudFormation, CodePipeline, CodeBuild, EC2, Boto3; Azure: Data Factory (ADF), Data Lake Storage (ADLS), Synapse Analytics, SQL Database, Databricks, Functions, Stream Analytics, DevOps, Active Directory (AAD), Key Vault, PowerShell
Big Data Technologies: Apache Spark, Hadoop, Hive, Kafka, Kafka Streams, Airflow, Sqoop, MapReduce, PySpark, Databricks
Databases: MySQL, PostgreSQL, SQL Server, Redshift, MongoDB, Oracle, T-SQL
ETL & Data Integration: AWS Glue, Informatica, Talend, Step Functions, PowerShell scripting, AWS Data Pipeline
Data Processing & Analysis: Python, Pandas, NumPy, Spark SQL, Scala, TensorFlow, Scikit-Learn, Matplotlib, Seaborn, Excel Analysis ToolPak
Data Warehousing & Data Modelling: Snowflake, Teradata, Redshift, Star Schema, Data Warehousing, Query Optimization, Database Schema Design, Data Partitioning
DevOps & Infrastructure Automation: Terraform, Azure DevOps, Jenkins, CloudFormation
Containerization & Orchestration: Docker, Kubernetes (EKS)
Data Visualization & Reporting: Tableau, Power BI, Alation, Excel (Advanced Reporting)
Streaming & Real-time Data: AWS EMR, Apache Kafka, Azure Stream Analytics
Monitoring & Logging: CloudWatch, ELK Stack, Splunk
Security & Access Control: AWS IAM, Azure Active Directory (AAD), Role-based Access Control (RBAC), Multi-Factor Authentication (MFA), Data Encryption (Azure Key Vault)
Version Control & Collaboration Tools: Git, GitHub, GitLab, JIRA, Confluence, Agile, Scrum, Bitbucket
EXPERIENCE:
US Bank, Minneapolis, MN
Sr. Data Engineer
April 2024 – Present
Managed AWS S3 for efficient storage and management of large datasets, implementing lifecycle policies and data versioning to ensure scalability and data integrity.
Developed and maintained 15+ ETL pipelines using AWS Glue, improving data transformation efficiency by 30% and ensuring seamless integration with AWS services for enhanced data cataloging and schema management.
Optimized data warehouses in Amazon Redshift, executing complex SQL queries that improved query performance by 40% through the effective use of data distribution and sort keys (see the illustrative DDL sketch at the end of this section).
Designed and deployed 10+ serverless data transformation functions with AWS Lambda, reducing operational overhead by 20% while enhancing the scalability of data processing workflows.
Deployed and managed relational databases, such as MySQL, on Amazon RDS, ensuring 90% availability and optimizing performance through automated backups and scaling.
Managed AWS DynamoDB, implementing efficient NoSQL database solutions, including indexing, capacity planning, and cost optimization strategies.
Orchestrated complex serverless workflows and managed multi-step data processes using AWS Step Functions, improving automation and data pipeline efficiency.
Utilized Amazon EMR to process large-scale datasets using Hadoop and Spark, optimizing big data workflows and reducing processing times.
Executed SQL queries on data stored in Amazon S3 using AWS EMR, enabling fast and cost-effective ad-hoc querying directly from storage.
Automated data movement and transformation tasks using AWS Data Pipeline, integrating 10+ AWS services and reducing manual intervention by 25%, improving pipeline efficiency.
Created interactive data visualizations and dashboards using Power BI, providing actionable insights to stakeholders across the organization.
Wrote Python ETL scripts integrated with the AWS SDK (Boto3), automating data workflows that reduced manual intervention by 30% and improved processing speed by 25% (an illustrative sketch follows this section).
Automated AWS tasks and monitored data pipelines using Shell/Bash scripting, improving operational efficiency and reducing manual intervention.
Designed and optimized data models in snowflake schema and normalized forms, improving data accessibility and query performance across data systems.
Designed and implemented data warehousing solutions on Amazon Redshift, focusing on query optimization and efficient data storage techniques.
Built and maintained robust ETL workflows using AWS Glue, reducing transformation process times by 30% and improving pipeline architecture stability.
Enforced least privilege access policies using AWS IAM, securing and controlling access to AWS resources, resulting in a 95% decrease in unauthorized access incidents.
Monitored and logged AWS resource activity using CloudWatch, ensuring compliance with auditing and troubleshooting standards.
Applied data governance best practices using AWS Glue Data Catalog and AWS Lake Formation, ensuring data lineage and compliance across pipelines.
Automated resource provisioning using AWS CloudFormation, enabling infrastructure as code for repeatable and scalable deployments.
Containerized and orchestrated data workloads using Docker and Kubernetes (EKS), streamlining deployment and scaling of big data solutions.
Managed real-time data ingestion and stream processing using Amazon Kinesis Data Streams and Kinesis Data Firehose, ensuring efficient data flow and analytics in real time.
Utilized SNS & SQS for distributed messaging, enhancing system scalability and fault tolerance, reducing message delivery time by 15%.
Automated data pipeline deployments with AWS CodePipeline and AWS CodeBuild, reducing deployment times by 25% and accelerating project delivery timelines.
Employed Hadoop and Spark for batch and stream data processing on Amazon EMR, cutting data processing time by 50% and improving performance for large datasets.
Utilized Apache Hive and Presto for querying and processing large-scale data, enhancing data access and analysis capabilities, and reducing query response times by 30%.
Worked in Agile and Scrum environments, using Jira for project management and ensuring iterative development and continuous delivery of high-quality data solutions.
Environment: AWS (S3, Glue, Redshift, Lambda, RDS, DynamoDB, Step Functions, EMR, Data Pipeline, CloudWatch, IAM, CodePipeline, CodeBuild, CloudFormation, EKS), Hadoop, Spark, Apache Hive, Docker, Kubernetes, Python, Boto3, MySQL, Snowflake, Jira, Shell/Bash scripting, AWS SDK, AWS Glue Data Catalog.
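The following is a minimal, hedged sketch of the kind of Boto3-driven Python ETL script referenced in this role; the bucket names, keys, columns, and transform rule are illustrative placeholders rather than the production implementation:

import csv
import io

import boto3  # AWS SDK for Python

s3 = boto3.client("s3")

# Illustrative bucket and key names, not actual production values.
SOURCE_BUCKET = "example-raw-zone"
TARGET_BUCKET = "example-curated-zone"

def transform(rows):
    """Example rule: keep settled records and normalize the amount field."""
    for row in rows:
        if row.get("status") == "SETTLED":
            row["amount"] = f"{float(row['amount']):.2f}"
            yield row

def run(source_key, target_key):
    # Read the raw CSV object from S3.
    body = s3.get_object(Bucket=SOURCE_BUCKET, Key=source_key)["Body"].read().decode("utf-8")
    rows = csv.DictReader(io.StringIO(body))

    # Apply the transform and write the curated CSV back to S3.
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["id", "status", "amount"], extrasaction="ignore")
    writer.writeheader()
    writer.writerows(transform(rows))
    s3.put_object(Bucket=TARGET_BUCKET, Key=target_key, Body=out.getvalue().encode("utf-8"))

if __name__ == "__main__":
    run("incoming/transactions.csv", "processed/transactions.csv")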
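Similarly, a hedged illustration of the Redshift distribution-key and sort-key tuning mentioned above; the table, columns, and cluster identifiers are hypothetical, and the DDL is submitted through the Redshift Data API (boto3) as one of several possible execution paths:

import boto3

# Hypothetical cluster and database identifiers, for illustration only.
redshift_data = boto3.client("redshift-data")

DDL = """
CREATE TABLE IF NOT EXISTS analytics.fact_transactions (
    transaction_id BIGINT,
    account_id     BIGINT,
    txn_date       DATE,
    amount         DECIMAL(18, 2)
)
DISTSTYLE KEY
DISTKEY (account_id)   -- co-locate rows that are joined on account_id
SORTKEY (txn_date);    -- lets Redshift skip blocks on date-range filters
"""

response = redshift_data.execute_statement(
    ClusterIdentifier="example-cluster",
    Database="analytics",
    DbUser="etl_user",
    Sql=DDL,
)
print(response["Id"])  # statement id, usable with describe_statement for polling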
JP Morgan Chase, Ohio
AWS Data Engineer
August 2023 - April 2024
Designed and managed AWS EC2 instances, optimizing resource utilization and scaling data processing workloads, resulting in a 30% improvement in cost efficiency and 40% faster processing times.
Implemented scalable Amazon S3 storage solutions, improving data storage performance and access times by 25% while ensuring secure and reliable data storage.
Managed Amazon RDS databases, performing regular backups, monitoring, and performance tuning to ensure reliability and speed in data retrieval.
Developed and optimized ELT pipelines using Informatica, integrating data from various sources into data warehouses and ensuring seamless data processing.
Utilized Python, NumPy, and Pandas to clean, transform, and analyze large datasets, automating data manipulation and processing workflows.
Integrated AWS SDKs (Boto3) to automate cloud infrastructure management, reducing manual workflows by 35% and streamlining data engineering tasks.
Orchestrated complex workflows using AWS Step Functions, automating multi-step processes and reducing manual intervention by 30%, improving overall workflow efficiency.
Processed large-scale datasets using Amazon EMR, enabling parallel processing of data and reducing job execution time by 40%.
Developed and optimized big data processing pipelines using Hadoop, Spark, and PySpark, improving processing efficiency and data insights.
Wrote and executed complex Spark SQL queries, improving analysis time by 25% and providing faster insights for decision-making processes (see the illustrative sketch at the end of this section).
Created and maintained interactive Tableau dashboards, delivering real-time business insights and visual analytics that increased stakeholder decision-making speed by 20%.
Integrated Snowflake data storage solutions, reducing data retrieval time by 30% and improving query performance across large datasets.
Utilized AWS EMR and Delta Lake to run fast, scalable analytics queries on data stored in S3, reducing query costs by 15% while improving processing speed by 20%.
Managed and optimized Databricks environments for large-scale data processing, streamlining data workflows, and improving analytics performance.
Implemented Apache Kafka for real-time data streaming, enabling data flow between applications and enhancing real-time analytics capabilities (a sketch of the producer pattern follows this section).
Developed serverless data transformation workflows using AWS Lambda, reducing infrastructure costs and improving scalability.
Used Scala for data processing tasks, enabling high-performance and fault-tolerant processing of big data workloads.
Managed data warehousing tasks, optimizing schema design and ensuring high-performance querying of large datasets using Snowflake, improving data retrieval times and scalability.
Orchestrated and monitored data pipelines with Apache Airflow, automating workflows and reducing manual intervention by 30%, ensuring reliable and smooth data operations.
Developed and maintained CI/CD pipelines with Jenkins, automating code deployment and version control, improving data engineering workflow efficiency by 25%.
Automated infrastructure provisioning using Terraform, ensuring consistent and scalable AWS environments for data engineering tasks, reducing deployment time by 20%.
Worked in Agile and Scrum teams, using Confluence for collaboration, ensuring the timely delivery of high-quality data engineering solutions.
Used Alation for data cataloging and governance, improving data lineage tracking and metadata management and enhancing data discovery and access by 20%.
Developed comprehensive data reports to communicate complex insights to non-technical stakeholders, empowering data-driven decision-making across the organization.
Environment: AWS (EC2, S3, RDS, Step Functions, EMR, Lambda, IAM), Informatica, Python, NumPy, Pandas, Boto3, Spark, PySpark, Hadoop, Spark SQL, Tableau, Snowflake, Delta Lake, Databricks, Apache Kafka, Scala, Apache Airflow, Jenkins, Terraform, Confluence, Git.
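A minimal, hedged sketch of the Spark SQL querying described in this role; the S3 paths, view name, and columns are assumed for illustration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-sql-example").getOrCreate()

# Hypothetical curated dataset registered as a temporary view.
transactions = spark.read.parquet("s3://example-curated-zone/transactions/")
transactions.createOrReplaceTempView("transactions")

# Aggregate the last 30 days of activity with Spark SQL.
daily_totals = spark.sql("""
    SELECT txn_date,
           account_type,
           COUNT(*)    AS txn_count,
           SUM(amount) AS total_amount
    FROM transactions
    WHERE txn_date >= date_sub(current_date(), 30)
    GROUP BY txn_date, account_type
""")

daily_totals.write.mode("overwrite").parquet("s3://example-analytics-zone/daily_totals/")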
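And a hedged sketch of the Kafka streaming integration noted above, assuming the kafka-python client and illustrative broker and topic names (the specific client library is an assumption, not stated in this role):

import json

from kafka import KafkaProducer  # kafka-python client (assumed)

# Illustrative broker address and topic name.
producer = KafkaProducer(
    bootstrap_servers=["broker-1:9092"],
    value_serializer=lambda record: json.dumps(record).encode("utf-8"),
)

event = {"account_id": 1234, "event_type": "PAYMENT_POSTED", "amount": 42.50}
producer.send("payment-events", value=event)
producer.flush()  # block until the buffered event is actually delivered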
Ivy, India
Azure Data Engineer
April 2021 - July 2022
Managed Azure Data Lake Storage (ADLS), optimizing data storage, implementing robust security measures, and improving data retrieval times by 20%.
Built and optimized Azure Synapse Analytics data warehouses, running complex T-SQL queries and implementing distributed computing to enhance data processing performance.
Designed, deployed, and monitored ETL pipelines using Azure Data Factory (ADF) to streamline data integration processes and ensure reliable data flow across various systems.
Developed advanced SQL queries and applied indexing, partitioning, and performance tuning on Azure SQL Database & SQL Server to enhance query performance and data accessibility.
Utilized Azure Databricks for data preparation, machine learning workflows, and big data processing with Apache Spark, significantly reducing data transformation time.
Implemented Azure Functions for serverless architecture to trigger workflows and automate data transformations, increasing process efficiency.
Processed real-time data streams and implemented IoT analytics using Azure Stream Analytics, enabling timely insights from streaming data.
Developed CI/CD pipelines and managed version control using Azure DevOps, ensuring streamlined code deployment and version management through Git.
Leveraged Apache Spark and PySpark for data transformation and batch processing, optimizing big data analytics workloads.
Orchestrated data workflows and managed complex data pipelines with Apache Airflow, enhancing the efficiency and reliability of data processing tasks (see the DAG sketch at the end of this section).
Automated data integration workflows using Azure Logic Apps, reducing manual interventions and speeding up data processing cycles.
Utilized Python for data manipulation, building automated data pipelines, and integrating seamlessly with various Azure services (an illustrative storage-integration sketch follows this section).
Optimized SQL queries for performance, troubleshooting, and enhancing database efficiency to ensure minimal latency in data retrieval.
Automated infrastructure management tasks using PowerShell, improving operational efficiency and reducing manual setup errors in Azure environments.
Managed identity and access using Azure Active Directory (AAD), ensuring secure Role-Based Access Control (RBAC) and implementing multi-factor authentication (MFA).
Ensured secure handling of sensitive data by managing secrets and API keys through Azure Key Vault, adhering to security best practices.
Designed and optimized star schema data models, facilitating efficient data warehousing and reporting.
Built and optimized dynamic Power BI dashboards, leveraging Azure Synapse Analytics to provide real-time business insights and visualizations.
Deployed and managed Azure infrastructure using Terraform, automating provisioning and ensuring scalable and consistent environments.
Collaborated effectively in Agile & Scrum teams, utilizing tools such as Jira and Confluence to track progress, manage tasks, and ensure timely delivery of data solutions.
Environment: ADF, ADLS, Azure Synapse Analytics, Azure SQL Database, SQL Server, Azure Databricks, Azure Functions, Azure Stream Analytics, Azure DevOps, Git, Apache Spark, PySpark, Apache Airflow, Azure Logic Apps, Python, PowerShell, Azure AD, Azure Key Vault, Power BI, Terraform, Jira, Confluence.
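A hedged sketch of the Airflow orchestration described in this role; the DAG id, schedule, and task callables are placeholders rather than the actual pipeline:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(**_):
    # Placeholder: pull source data (e.g., from ADLS) here.
    pass

def load(**_):
    # Placeholder: write transformed data to Synapse or Azure SQL here.
    pass

with DAG(
    dag_id="example_daily_ingest",  # illustrative name
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task  # run extract before load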
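And a minimal sketch of Python integration with Azure storage, assuming the azure-storage-blob SDK and hypothetical container and blob names; in practice the connection secret would come from Azure Key Vault or a managed identity rather than being embedded:

import io

import pandas as pd
from azure.storage.blob import BlobServiceClient  # azure-storage-blob SDK (assumed)

# Placeholder connection string and container; real values would be retrieved securely.
service = BlobServiceClient.from_connection_string("<connection-string>")
container = service.get_container_client("raw-data")

# Download a CSV blob, clean it with pandas, and upload the curated result.
raw_bytes = container.download_blob("sales/2022-06-01.csv").readall()
df = pd.read_csv(io.BytesIO(raw_bytes))
df = df.dropna(subset=["order_id"]).drop_duplicates("order_id")

container.upload_blob("clean/sales_2022-06-01.csv", df.to_csv(index=False), overwrite=True)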
Ronas Soft Solutions, India
Data Engineer
December 2018 - March 2021
Designed and optimized SQL and PostgreSQL queries for efficient data extraction and manipulation, ensuring high performance in large-scale data processing environments.
Utilized Python, Pandas, and NumPy for data transformation and analysis, streamlining ETL workflows and enhancing data processing speed by 20%.
Developed and maintained ETL pipelines to extract, transform, and load data, ensuring smooth data integration across various sources, including real-time data caching with Redis.
Wrote Shell scripts to automate data pipeline tasks, minimizing manual intervention and improving process efficiency by 15%.
Architected and maintained data warehouses to store and retrieve structured and unstructured data, supporting business intelligence and reporting needs.
Collaborated on the design and integration of real-time data caching and analytics applications using Redis, significantly improving data access speeds for analytics teams (see the caching sketch at the end of this section).
Managed and versioned data engineering workflows with Git, ensuring proper collaboration, version control, and streamlined deployment processes.
Developed data models and optimized database schemas for high-performance querying, improving data retrieval efficiency by 25%.
Ensured the reliability and scalability of data pipelines, automating data workflows and streamlining data processing tasks to reduce manual workload and potential errors.
Environment: SQL, PostgreSQL, Python (Pandas, NumPy), Shell, Redis, Git, Hadoop.
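A hedged sketch of the Redis caching pattern referenced in this role, using the redis-py client with an illustrative key scheme and TTL; the warehouse query is stubbed out:

import json

import redis  # redis-py client (assumed)

cache = redis.Redis(host="localhost", port=6379, db=0)

def compute_metrics_from_warehouse(customer_id):
    # Placeholder for the real PostgreSQL aggregation query.
    return {"customer_id": customer_id, "orders_last_30d": 0}

def get_customer_metrics(customer_id):
    """Return cached metrics when present; otherwise compute and cache for 10 minutes."""
    key = f"metrics:{customer_id}"  # illustrative key scheme
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)

    metrics = compute_metrics_from_warehouse(customer_id)
    cache.setex(key, 600, json.dumps(metrics))  # cache with a 600-second TTL
    return metrics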
Numeric Technologies, India
Data Analyst
October 2015 - November 2018
Conducted advanced data analysis using Hadoop, PostgreSQL, and MongoDB to extract valuable insights, driving data-informed business decisions.
Utilized SQL and Python (Pandas, NumPy) to clean, manipulate, and analyze large datasets, ensuring data quality and integrity (an illustrative cleaning sketch follows this section).
Developed and optimized ETL processes using Talend to automate data migration workflows, reducing manual effort by 30%.
Applied hypothesis testing and regression analysis techniques to identify trends and correlations, providing actionable insights for stakeholders.
Built complex Excel models using macros, VLOOKUP, PivotTables, and the Analysis ToolPak for detailed financial and operational reporting.
Collaborated with cross-functional teams using JIRA to manage data analysis tasks, ensuring timely delivery of actionable insights.
Applied predictive analytics to forecast trends, improving business forecasting accuracy by 25%.
Managed version control and collaboration of code and analysis using Git, ensuring proper versioning and team collaboration.
Designed and implemented automated data migration processes, reducing data transfer errors and improving system integration efficiency.
Created and maintained comprehensive documentation for data workflows, analysis methodologies, and ETL processes, ensuring reproducibility and ease of use for future projects.
Delivered training and support to business users on data analysis techniques, empowering them to utilize self-service data tools effectively.
Environment: Hadoop, PostgreSQL, MongoDB, SQL, Python (Pandas, NumPy), Talend, Excel (Macros, VLOOKUP, PivotTables, Analysis ToolPak), JIRA, Git.
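A minimal, hedged sketch of the pandas-based cleaning step mentioned in this role; the file name, columns, and rules are illustrative assumptions:

import pandas as pd

# Illustrative input file and column names.
df = pd.read_csv("orders_raw.csv")

# Standardize column names and coerce types, turning bad values into NaN/NaT.
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")

# Drop unusable rows, de-duplicate, and flag outliers rather than silently removing them.
df = df.dropna(subset=["order_id", "order_date"]).drop_duplicates("order_id")
df["is_outlier"] = (df["amount"] - df["amount"].mean()).abs() > 3 * df["amount"].std()

df.to_csv("orders_clean.csv", index=False)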
Education:
Bachelor of Science, Osmania University, Hyderabad – India. 2012 – 2015.
Master’s in Information Science, Trine University, Detroit – USA. 2022 – 2023.