Indu Ramella
AWS Data Engineer
Ph: 813-***-****
Dallas, Texas
LinkedIn: https://www.linkedin.com/in/indu-r-52657b22a
Email: ************.**@*****.***
PROFESSIONAL SYNOPSIS
6+ years of experience designing and implementing data solutions on AWS, focusing on scalability, automation, and cost efficiency.
Expertise in AWS services including S3, EC2, Redshift, Glue, Lambda, and Athena for building robust data architectures.
Skilled in ETL processes, data integration, and database management, ensuring seamless data flow and consistency.
Proficient in Python, SQL, and Big Data tools for data processing, scripting, and optimization.
Strong background in data modeling, performance tuning, and building data pipelines for analytics and reporting.
Experience in real-time data processing and integrating data from multiple sources for business intelligence solutions.
Collaborative team player, working with cross-functional teams to drive data-driven decision-making and problem-solving.
Committed to staying up-to-date with emerging cloud technologies to enhance data infrastructure and ensure security.
TECHNICAL SKILLS
AWS Services Knowledge- Amazon S3, Amazon Redshift, AWS Glue, Amazon RDS, DynamoDB, Amazon EMR, AWS Lambda, Amazon Kinesis, AWS Data Pipeline, AWS Athena, AWS Step Functions
Data Engineering Tools- Apache Spark, Apache Kafka, Apache Hive, ETL tools (e.g., Talend, AWS Glue)
Programming Languages- Python, SQL, Java, Scala, Shell Scripting.
Big Data Technologies- Hadoop, Apache Spark, Apache Flink, Apache Hive.
Data Modeling and Architecture- Data warehousing (star/snowflake schema), ETL pipeline design, Data Lake architecture.
Cloud Infrastructure Management- AWS CloudFormation, Terraform, IAM, AWS CloudWatch.
Containerization & Orchestration- Docker, Kubernetes (AWS EKS).
Data Security- Encryption (AWS KMS), VPC configuration, IAM roles and policies.
Data Warehousing Design & Optimization- Performance tuning (Redshift), Data partitioning, indexing.
DevOps and Automation- CI/CD pipelines (AWS CodePipeline), Automation tools (Jenkins, GitLab CI)
WORK EXPERIENCE
AWS Data Engineer
Truist Bank, Irving, Texas
January 2023 to present.
• Proficient in using AWS services like Amazon Redshift, S3, Glue, Athena, EMR, Lambda, DynamoDB, RDS, and Kinesis for data storage, transformation, and processing.
• Implemented ETL pipelines using AWS Glue and AWS Lambda for real-time data processing and batch transformations.
• Designed and managed Redshift clusters for large-scale data analytics, ensuring high performance and cost efficiency.
• Expertise in building data lakes and data warehouses on AWS using S3, Glue, Redshift, and Lake Formation.
• Led the design and implementation of scalable, secure, and fault-tolerant data architectures that support enterprise-level analytics and machine learning applications.
• Developed and optimized ETL/ELT pipelines to ensure data consistency and high-quality data flow across various systems.
• Integrated on-premises and cloud-based data sources using AWS Database Migration Service (DMS), Kinesis, and AWS Direct Connect.
• Streamlined and automated data ingestion from external sources (e.g., APIs, third-party systems) into AWS data lakes or warehouses.
• Used Apache Spark, Flink, and EMR to process large-scale data in both batch and real-time streams.
• Implemented data security best practices, including IAM roles/policies, encryption (at rest and in transit), and VPC to ensure secure access and compliance.
• Optimized Redshift queries using distribution keys, sort keys, and materialized views to improve query performance and reduce costs (see the Redshift table-design sketch at the end of this role).
• Tuned AWS Glue jobs and Athena queries for high efficiency, leveraging partitioning, parallel processing, and optimized storage formats like Parquet and ORC (see the Glue job sketch at the end of this role).
• Reduced operational costs by optimizing S3 storage, utilizing S3 Intelligent-Tiering, and cleaning up unused data and obsolete backups.
• Automated routine data engineering tasks using AWS Lambda, Step Functions, and CloudWatch for real-time monitoring, alerting, and logging.
• Built robust CI/CD pipelines for data workflows using CodePipeline, CodeBuild, and Terraform to ensure smooth deployment of infrastructure and data solutions.
• Worked closely with data scientists to integrate machine learning models into data pipelines using SageMaker, TensorFlow, and PyTorch.
• Deployed and managed ML models in production environments to generate actionable insights and predictive analytics for business decision-making.
• Led cross-functional teams of data engineers, analysts, and scientists to implement complex data-driven solutions in the cloud.
• Mentored junior engineers on best practices in cloud-based data engineering and helped develop career growth paths within the team.
• Collaborated with stakeholders, including product managers, business analysts, and IT teams, to ensure successful delivery of data projects.
• Monitored and optimized AWS cost by setting up usage reports and alerts with AWS Cost Explorer and AWS Budgets.
• Implemented solutions to reduce AWS costs by efficiently utilizing resources and taking advantage of reserved instances and spot instances.
• Led successful migrations of legacy on-premises data systems to AWS cloud platforms, reducing operational overhead and increasing data availability.
• Developed and executed cloud data strategies for transitioning to serverless and managed AWS services, driving digital transformation for businesses.
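Illustrative sketch of the Redshift table-design work referenced above (distribution key, sort key). Table, column, cluster, and user names are hypothetical, and the DDL is submitted through the Redshift Data API with boto3.

    import boto3

    redshift_data = boto3.client("redshift-data")

    # Hypothetical fact table: distribute on the join key, sort on the common filter column.
    DDL = """
    CREATE TABLE analytics.fact_transactions (
        transaction_id BIGINT,
        account_id     BIGINT,
        txn_date       DATE,
        amount         DECIMAL(18, 2)
    )
    DISTKEY (account_id)
    SORTKEY (txn_date);
    """

    redshift_data.execute_statement(
        ClusterIdentifier="example-cluster",  # hypothetical cluster name
        Database="analytics",
        DbUser="etl_user",                    # hypothetical database user
        Sql=DDL,
    )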
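A minimal sketch of the Glue/Parquet tuning referenced above, assuming a hypothetical Glue Data Catalog database, table, and S3 path: the job reads a cataloged raw table and rewrites it as partitioned Parquet so Athena scans less data per query.

    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the raw table registered in the Glue Data Catalog (names are placeholders).
    raw = glue_context.create_dynamic_frame.from_catalog(
        database="raw_db", table_name="transactions_csv"
    )

    # Write back as Parquet, partitioned by year/month, for cheaper Athena scans.
    glue_context.write_dynamic_frame.from_options(
        frame=raw,
        connection_type="s3",
        connection_options={
            "path": "s3://example-data-lake/curated/transactions/",
            "partitionKeys": ["year", "month"],
        },
        format="parquet",
    )
    job.commit()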
AWS Data Engineer
TD Bank, Cherry Hill, NJ
September 2021 to December 2022
• Proficient in AWS Data Services such as S3, Redshift, Glue, Athena, Kinesis, and Lambda for efficient data storage, processing, and analytics.
• Implemented AWS Glue ETL workflows for data transformation and migration, ensuring seamless data movement across multiple environments.
• Expertise in building and maintaining data pipelines using AWS Step Functions, Lambda, and EMR to support real-time data processing and batch ETL workflows.
• Designed, developed, and optimized ETL pipelines for large-scale data integration, ensuring high performance and scalability.
• Experience in writing efficient SQL queries and scripts for data extraction, transformation, and loading (ETL) using tools like AWS Glue, Apache Spark, and Python.
• Expertise in data cleansing, data transformation, and data aggregation techniques for financial and transactional data in banking systems.
• Built scalable and high-performance data warehouses on AWS Redshift, optimizing schema design and queries to support real-time business intelligence analytics.
• Collaborated with business stakeholders to implement banking-specific analytics solutions, including fraud detection, transaction monitoring, and customer behavior analysis.
• In-depth understanding of banking data structures and regulations (e.g., AML, KYC, PCI-DSS) and experienced in handling sensitive financial data in a compliant manner.
• Worked on integrating banking systems with AWS cloud architecture, ensuring seamless data flow and integrity between core banking applications and analytical platforms.
• Experience in managing large datasets from various banking channels (e.g., ATM, mobile banking, online transactions) for data-driven decision-making.
• Implemented robust data validation and quality checks throughout the ETL pipeline to ensure the accuracy, consistency, and reliability of financial data.
• Familiar with data governance frameworks to ensure that data usage in banking environments complies with industry regulations and internal policies.
• Automated repetitive ETL tasks using AWS Lambda, Step Functions, and CloudWatch, reducing operational overhead and improving pipeline efficiency (see the Lambda sketch at the end of this role).
• Set up real-time monitoring and alerting for data pipelines using AWS CloudWatch, ensuring proactive issue resolution and minimal downtime.
• Collaborated with cross-functional teams (Data Science, DevOps, and Business Analysts) to design and deploy data solutions that meet banking requirements.
• Participated in Agile methodologies, delivering sprint-based solutions, tracking project progress, and refining solutions to meet evolving business needs.
• Implemented data security best practices using AWS IAM, KMS, and VPC for secure data storage and access, ensuring compliance with GDPR and PCI DSS for banking data.
• Enforced encryption-at-rest and encryption-in-transit for sensitive data in financial applications, aligning with regulatory standards.
• Led cloud migration projects, transitioning legacy on-premises data warehouses to AWS cloud environments, ensuring minimal downtime and data integrity.
• Experienced in using AWS Database Migration Service (DMS) and AWS Snowball for seamless data transfer from on-premises systems to AWS.
• Strong knowledge of SQL, Python, Java, and Scala for developing complex ETL jobs and automation scripts.
• Familiarity with Apache Kafka and Apache Spark for real-time data processing.
• Proficient in version control and CI/CD pipelines using Git, Jenkins, and AWS CodePipeline.
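Illustrative sketch of the Lambda-based ETL automation referenced above; bucket, Glue job, and argument names are hypothetical. The handler reacts to an S3 put event and launches a Glue job for the newly arrived object.

    import boto3

    glue = boto3.client("glue")

    def handler(event, context):
        """Triggered by an S3 put event on the raw landing prefix."""
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            # Start the (hypothetical) curation job for the new file.
            glue.start_job_run(
                JobName="curate_transactions",
                Arguments={"--source_path": f"s3://{bucket}/{key}"},
            )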
Big Data Developer
JPMorgan Chase, Hyderabad, India
January 2020 to July 2021.
• Experience with popular Big Data frameworks like Apache Hadoop, Spark, and Hive for data processing, storage, and analysis.
• Strong working knowledge of MapReduce, HDFS, and YARN for efficient distributed data processing.
• Practical experience in building ETL pipelines to extract, transform, and load large datasets using tools like Apache Nifi, Talend, or custom solutions built with Spark.
• Familiarity with working on data integration and processing using both batch and real-time processing frameworks.
• Expertise in querying large datasets using SQL on traditional RDBMS systems (like PostgreSQL or MySQL) as well as NoSQL databases (like MongoDB, Cassandra, HBase).
• Ability to optimize queries for better performance and reduce processing times.
• Working experience with cloud services like AWS (EMR, S3, Lambda) or Azure (HDInsight, Data Lake) for scalable Big Data processing and storage.
• Familiarity with managing data pipelines in the cloud using various tools like AWS Glue, Databricks, or Azure Data Factory.
• Experience in data preprocessing, transformation, and cleaning to ensure the accuracy and reliability of data for analysis.
• Familiarity with handling missing data, outliers, and noisy data using libraries like PySpark, Pandas, and Scala (see the PySpark sketch at the end of this role).
• Exposure to real-time data streaming frameworks like Apache Kafka and Spark Streaming, ensuring efficient data ingestion and processing in near real-time.
• Knowledge of event-driven architectures and the ability to integrate systems for real-time analytics.
• Strong proficiency in programming languages like Java, Scala, and Python used for developing Big Data applications.
• Ability to write efficient, reusable, and scalable code.
• Familiar with tools such as Tableau, Power BI, or custom visualization libraries to create dashboards and reports that provide actionable insights from large datasets.
• Demonstrated ability to troubleshoot issues in data pipelines, optimize processing jobs, and ensure data quality.
• Able to analyze large datasets and present findings to both technical and non-technical stakeholders.
• Experience working in Agile teams with cross-functional collaboration between developers, data scientists, and business analysts.
• Effective communication skills for discussing technical concepts and solutions with non-technical team members or clients.
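A minimal PySpark sketch of the data-cleaning work described above; the HDFS paths, column names, and outlier cutoff are hypothetical placeholders.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("clean_transactions").getOrCreate()

    # Hypothetical raw input; real sources were Hive/HDFS tables.
    df = spark.read.option("header", True).csv("hdfs:///data/raw/transactions.csv")

    cleaned = (
        df.withColumn("amount", F.col("amount").cast("double"))
          .na.drop(subset=["transaction_id"])   # drop rows missing the key
          .na.fill({"amount": 0.0})             # impute missing amounts
          .filter(F.col("amount") < 1000000)    # crude outlier cutoff
    )

    cleaned.write.mode("overwrite").parquet("hdfs:///data/curated/transactions/")
    spark.stop()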
ETL Developer
Goldman Sachs, Hyderabad, India
June 2018 to December 2019.
• Proficient in using ETL tools like Talend, Apache NiFi, Informatica, and Microsoft SSIS for data extraction, transformation, and loading tasks.
• Strong working knowledge of SQL, with the ability to write complex queries, optimize performance, and manipulate large datasets.
• Experience integrating data from various sources such as databases, flat files, cloud platforms (e.g., AWS, Azure), and APIs.
• Adept at transforming raw data into structured formats, performing data cleaning, and ensuring data quality during the ETL process.
• Skilled in optimizing ETL workflows for performance improvements, including handling large datasets efficiently and debugging performance bottlenecks.
• Familiar with using Git or other version control systems to manage and track ETL scripts and workflows.
• Experience with scheduling and automating ETL jobs to ensure timely data loads, using tools like Airflow, cron jobs, or the scheduling built into ETL tools (see the Airflow sketch at the end of this role).
• Understanding of how to implement error handling, logging, and monitoring for ETL jobs to ensure smooth execution and quick issue resolution.
• Worked closely with data engineers, data analysts, and business stakeholders to understand requirements and deliver accurate, timely data.
• Familiar with cloud technologies (AWS, Google Cloud, Azure) and with using their data services for ETL tasks.
• Capable of troubleshooting complex ETL issues, debugging jobs, and resolving data discrepancies.
• Basic understanding of data warehousing concepts and architecture, such as star schema, snowflake schema, and data mart design.
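Illustrative Airflow sketch of the ETL job scheduling referenced above; the DAG id, schedule, and script path are hypothetical, and the import paths assume Airflow 2.x.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Nightly ETL load at 02:00; names and paths are placeholders.
    with DAG(
        dag_id="nightly_positions_load",
        start_date=datetime(2019, 1, 1),
        schedule_interval="0 2 * * *",
        catchup=False,
    ) as dag:
        run_etl = BashOperator(
            task_id="run_etl_job",
            bash_command="python /opt/etl/load_positions.py",
        )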
EDUCATION
Bachelor's Degree: Computer Science (2015-2018)
Master's Degree: Information Systems and Technology (2021-2023)