Data Engineer Processing

Location:
Fort Worth, TX
Posted:
May 12, 2025

Nandini Javvaji

Sr Data Engineer

Email: ****************@*****.***

LinkedIn: www.linkedin.com/in/nandini-javvaji

Phone: 972-***-****

PROFESSIONAL SUMMARY:

7+ years of expertise in Data Engineering, leveraging AWS, Azure, and Databricks to design, develop, and deploy scalable enterprise data solutions.

Deep experience with AWS services such as Lambda, S3, Redshift, DynamoDB, and CloudFormation, along with proficiency in EC2, VPC, IAM, RDS, and Direct Connect, for managing and optimizing cloud infrastructure.

Developed and optimized large-scale data pipelines using PySpark on Databricks, improving data processing efficiency and reducing job execution times in high-performance environments.

Architected ETL workflows and data transformation layers using AWS Glue, Informatica, and Python for seamless data integration across diverse systems, ensuring accuracy and scalability.

Implemented real-time data streaming solutions using Apache Kafka, integrating with AWS and Spark to streamline data ingestion and transformation workflows.

Proficient in Hadoop ecosystems (HDFS, Hive, Sqoop) for processing multi-terabyte datasets, enhancing storage, transformation, and analysis capabilities.

Skilled in containerization and microservices deployment using Docker and Kubernetes (AKS), ensuring efficient and resilient application delivery.

Hands-on experience with CI/CD pipelines using Azure DevOps, automating build, test, and deployment processes for seamless software delivery.

Advanced cloud architecture skills with both AWS and Azure, including Azure Data Factory, Data Lake, Synapse Analytics, and Cosmos DB, for handling cloud-based data solutions.

Proficient in Spark Core, Spark SQL, Streaming, PySpark, and Scala for developing efficient data processing applications.

Experience designing and optimizing Snowflake data warehouses for real-time analytics and large-scale data processing.

Worked extensively with messaging services such as AWS SNS and SQS, and with monitoring and query tools such as CloudWatch and Athena, for real-time event processing and data infrastructure management.

Utilized Terraform and CloudFormation to automate infrastructure-as-code (IaC) deployments from scratch, streamlining infrastructure provisioning.

Expertise in Data Mining, Visualization, and Model Building using tools like Tableau, Power BI, and Alteryx for advanced reporting and analytics.

Experience using JIRA and Rally for issue and bug tracking, working in both Agile and Waterfall methodologies.

Expertise in all phases of SDLC (Software Development Life Cycle).

Employed Git for version control, enhancing code management and collaboration throughout the development lifecycle.

Complete knowledge of Agile and Scrum software development practices, conducting meetings and coordinating with team members to meet deliverables.

TECHNICAL SKILLS:

AWS Services

AWS S3, EC2, Redshift, EMR, SNS, SQS, Athena, Glue, CloudWatch, Kinesis, API Gateway, Route53, DynamoDB, IAM, Lambda, Step Functions.

NoSQL Databases

MongoDB, Cassandra, Amazon DynamoDB, HBase

SQL Databases

Oracle DB, Microsoft SQL Server, PostgreSQL, Teradata, Amazon RDS

Big Data Technologies

HDFS, SQOOP, PySpark, Hive, MapReduce, Spark, Spark Streaming, HBASE, Kafka

Monitoring Tools

Splunk, Nagios, ELK, AWS CloudWatch.

Containerization

Docker & Docker Hub, OpenShift.

Programming and Scripting

Scala (Spark), Python, Java, SQL (MySQL, PostgreSQL), Shell Scripting, C, HiveQL, T-SQL

Data warehouse

Snowflake, Redshift, Teradata.

Version Control

Git, GitHub, Bitbucket

Cloud Technologies

AWS, Azure, GCP (Google Cloud Platform).

Methodologies

Agile/Scrum, Waterfall

PROFESSIONAL EXPERIENCE:

Barclays Bank, Wilmington, Delaware September 2022 – Present

Sr Data Engineer

Roles and Responsibilities:

Migrated data from an on-prem Cloudera cluster to Amazon EMR on EC2 and developed an ETL pipeline using PySpark to store and process logs in an AWS S3 data lake.
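
For illustration only, a minimal PySpark sketch of this kind of log-to-data-lake ETL job on EMR; the bucket names, paths, and log fields are placeholders, not the actual pipeline.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Placeholder S3 locations -- not the real pipeline paths
RAW_LOGS = "s3://example-raw-bucket/app-logs/"
CURATED = "s3://example-datalake-bucket/curated/app_logs/"

spark = SparkSession.builder.appName("log-etl").getOrCreate()

# Read raw JSON logs, derive a partition date, and drop rows without a usable timestamp
logs = (
    spark.read.json(RAW_LOGS)
    .withColumn("event_date", F.to_date("timestamp"))
    .dropna(subset=["event_date"])
)

# Write analytics-ready Parquet into the S3 data lake, partitioned by date
logs.write.mode("overwrite").partitionBy("event_date").parquet(CURATED)
```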

Designed and optimized ETL workflows in Python for both batch and real-time data ingestion, incorporating data validation, transformation, and loading into AWS Redshift.

Conducted a proof-of-concept (POC) to store server log data in MongoDB, enabling system alert metric identification.

Leveraged Alteryx for advanced data preparation, blending, and analytics to enhance data workflows and integration processes.

Enabled real-time data validation and schema enforcement using Delta Lake within PySpark ETL workflows, ensuring consistency and compliance across banking data assets.

Developed ETL pipelines in AWS Glue for Spark transformations and automated ETL processes using Python scripts.
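
A hedged sketch of what such a Glue job script can look like in Python; the catalog database, table, mappings, and output path are hypothetical.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Glue supplies JOB_NAME (plus any custom arguments) at runtime
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from a placeholder Glue Data Catalog table
source = glue_context.create_dynamic_frame.from_catalog(
    database="example_db", table_name="raw_transactions"
)

# Rename and cast columns into the target layout
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("txn_id", "string", "transaction_id", "string"),
        ("amt", "double", "amount", "double"),
    ],
)

# Write curated Parquet back to S3
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/transactions/"},
    format="parquet",
)
job.commit()
```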

Built scalable data frameworks with PySpark and Scala, transforming large datasets from HDFS and S3 into analytics-ready formats, improving data processing efficiency.

Developed comprehensive Big Data solutions using Hadoop (HDFS, MapReduce, Hive, HBase), enhancing the processing and analysis of large datasets.

Exported data from Teradata to HDFS using Sqoop and built tables in Hive for large-scale data processing.

Created AWS Data Pipelines utilizing API Gateway, Lambda, Snowflake, DynamoDB, and S3, converting API responses into JSON format for further processing.
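
As a rough illustration of the Lambda piece of such a pipeline, a small boto3 handler that stores an API Gateway payload in DynamoDB and archives the JSON to S3; the table, bucket, and key layout are placeholders.

```python
import json

import boto3

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")

TABLE = dynamodb.Table("example_events")     # placeholder table name
BUCKET = "example-landing-bucket"            # placeholder bucket name


def handler(event, context):
    """Triggered by API Gateway; persists the payload to DynamoDB and S3 as JSON."""
    payload = json.loads(event.get("body") or "{}")

    # Store the record for low-latency lookups
    TABLE.put_item(Item=payload)

    # Archive the raw JSON for downstream batch processing
    s3.put_object(
        Bucket=BUCKET,
        Key=f"raw/{payload.get('id', 'unknown')}.json",
        Body=json.dumps(payload).encode("utf-8"),
    )
    return {"statusCode": 200, "body": json.dumps({"status": "stored"})}
```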

Managed data integration workflows with AWS Step Functions and SNS/SQS, and implemented Snowpipe for real-time data ingestion into Snowflake.

Utilized Google Sheets APIs to develop dashboards for real-time financial metrics, enabling stakeholders to track transaction trends efficiently in the financial sector.
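
For illustration, a minimal google-api-python-client sketch of reading such metrics from a sheet; the credentials file, spreadsheet ID, and range are placeholders.

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/spreadsheets.readonly"]

# Placeholder service-account credentials
creds = service_account.Credentials.from_service_account_file(
    "service_account.json", scopes=SCOPES
)
sheets = build("sheets", "v4", credentials=creds)

# Pull a range of transaction metrics from the dashboard's source sheet
result = (
    sheets.spreadsheets()
    .values()
    .get(spreadsheetId="EXAMPLE_SPREADSHEET_ID", range="Metrics!A1:D100")
    .execute()
)

for row in result.get("values", []):
    print(row)
```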

Developed scalable data pipelines using Apache Spark and AWS AppFlow to ingest and process large-scale data from social media platforms.

Integrated Apache Airflow with AWS to monitor multi-stage ML processes on SageMaker and developed Python scripts for ETL validation with regression testing.

Loaded data into Amazon Redshift and monitored AWS RDS instances using AWS CloudWatch, ensuring operational efficiency and system health.

Implemented Profisee MDM to standardize and unify critical financial data across multiple source systems, improving data accuracy and enabling reliable regulatory reporting.

Implemented baseline AWS account security, incorporating endpoint protection, vulnerability scanning, and intelligent threat detection mechanisms.

Developed real-time, interactive dashboards using Power BI, integrating data from multiple sources to deliver actionable insights for stakeholders.

Automated financial reporting and reconciliation processes using SSIS workflows, significantly reducing manual intervention and improving data consistency.

Developed and optimized Spark applications leveraging Spark Core, Streaming, SQL, DataFrames, Datasets, and Spark ML for advanced data processing; created Power BI reports with complex calculations for real-time analytics.

Implemented ETL and data integration processes using Informatica Cloud on AWS, facilitating seamless migration and transformation across cloud services like S3 and Redshift.

Automated infrastructure provisioning and management on AWS using Terraform, reducing deployment time and improving consistency across environments.

Optimized data ingestion and updates using Delta Lake merge and time travel features, improving traceability and version control.
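
A minimal PySpark sketch of the Delta Lake merge and time-travel pattern, assuming a Delta-enabled Spark session (delta-spark); the table paths and keys are placeholders.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-upsert").getOrCreate()

TABLE_PATH = "s3://example-datalake-bucket/delta/transactions"  # placeholder path

target = DeltaTable.forPath(spark, TABLE_PATH)
updates = spark.read.parquet("s3://example-staging-bucket/transactions_incremental/")

# Upsert: update matched transaction IDs, insert new ones
(
    target.alias("t")
    .merge(updates.alias("u"), "t.transaction_id = u.transaction_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

# Time travel: read an earlier version of the table for traceability and audits
previous = spark.read.format("delta").option("versionAsOf", 0).load(TABLE_PATH)
```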

Developed and maintained modular, scalable data transformation models in DBT to standardize financial reporting logic and improve data consistency across Snowflake and Redshift.

Integrated DBT into CI/CD pipelines using GitHub Actions, enabling automated testing, documentation, and deployment of analytics models across environments.

Led Agile data projects using JIRA and Grafana, and collaborated on SAP data transformation for BI and data warehousing.

Environment: AWS, ETL, Data Lake, MongoDB, Informatica, Apache Airflow, Delta Lake, DynamoDB, Snowflake, Lambda, Spark, Redshift, HDFS, Power BI, Git, Kubernetes, Jenkins, PySpark, Grafana, JIRA, JSON, Agile.

AIG Insurance, New York, NY March 2019 – August 2022

Data Engineer

Roles and Responsibilities:

Led the design and implementation of a hybrid cloud strategy, integrating on-premises systems with Azure and AWS cloud services to securely process sensitive financial data.

Created ETL jobs to load and transport server data into S3 buckets and moved S3 data into the Data Warehouse for centralized storage and analytics.

Architected and implemented scalable data solutions on Azure, leveraging Azure Data Factory for orchestrating complex ELT processes and ensuring robust data integration from relational databases like SQL Server and Oracle into Snowflake.

Specialized in data warehousing and ETL tools such as Talend, optimizing data integration and transformation processes.

Implemented AWS S3 alongside Azure Blob Storage for archival and storage of infrequently accessed data, optimizing cost and performance.

Integrated Delta Lake into AIG's data lake architecture to ensure ACID-compliant transactions for claims data processing.

Built AWS data pipelines to extract Big Data from diverse sources (Excel, Flat Files, Oracle, SQL Server, Teradata, log data) into Hadoop HDFS for large-scale processing.

Configured Azure VMs and AWS EC2 with auto-scaling for high availability, and built a Hive layer over HDFS to enable easy querying of unstructured data.

Designed, tested, and implemented data migration, ingestion, and processing frameworks capable of handling hundreds of GBs of data, utilizing Airflow, PySpark, Python, and BigQuery.

Designed and implemented scalable data pipelines using Apache Airflow for orchestrating ETL workflows, ensuring efficient data processing, monitoring, and automated error handling across multiple cloud environments.
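
A small illustrative Airflow DAG showing this orchestration pattern with retries and a failure callback; the task logic, names, and schedule are placeholders, not the production DAG.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def notify_failure(context):
    # Placeholder failure hook; in practice this might publish to SNS or page on-call
    print(f"Task {context['task_instance'].task_id} failed")


def extract():
    print("extract from source systems")


def transform():
    print("transform and validate records")


def load():
    print("load into the warehouse")


default_args = {
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
    "on_failure_callback": notify_failure,
}

with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task
```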

Developed Spark code using Scala and Spark SQL for faster testing, processing, and querying, leveraging Spark on YARN and SparkContext for distributed data computation.

Created Spark streaming applications to pull data from cloud storage into Hive tables, processing large volumes of structured data using Spark SQL and Scala.

Optimized SQL queries for efficient data retrieval from SQL Server and MySQL, significantly reducing execution times for high-volume retail data.

Established data security protocols using Azure Active Directory (AD), Azure Key Vault for encryption, and on-premises LDAP, maintaining compliance with financial regulations.

Worked with distributed data frameworks such as Apache Spark and Presto on Amazon EMR, integrating with Redshift, S3, and DynamoDB for high-performance data solutions.

Implemented Azure Databricks for collaborative data processing, allowing streamlined data engineering and machine learning workflows.

Monitored data pipelines and ensured fault tolerance using AWS CloudWatch, CloudTrail, and SNS for alerting and performance optimization.
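
A hedged boto3 sketch of the alerting side of such monitoring: a CloudWatch alarm that notifies an SNS topic. The metric, namespace, threshold, and topic ARN are illustrative placeholders.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when a pipeline's failed-record count exceeds a threshold within 5 minutes
cloudwatch.put_metric_alarm(
    AlarmName="etl-failed-records-high",
    Namespace="ExampleDataPipeline",      # placeholder custom namespace
    MetricName="FailedRecords",           # placeholder custom metric
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=100,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:example-data-alerts"],
)
```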

Built and maintained DBT models to transform raw insurance and claims data into analytics-ready datasets in Snowflake, improving data accessibility and reporting efficiency for actuarial and compliance teams.

Designed and implemented serverless data processing workflows using AWS Lambda, integrating with S3, MongoDB, and Redshift to automate ETL tasks and real-time data transformations.

Managed large-scale data processing using PySpark on HDInsight and Hadoop, with secure deployment via IAM, KMS, and VPC to meet compliance standards.

Utilized Docker and Azure Kubernetes Service (AKS) to containerize applications and manage their deployment in hybrid environments, ensuring scalability and reliability of data pipelines and ELT workflows.

Built CI/CD pipelines on AWS, automating the software delivery process to streamline development, testing, and deployment.

Leveraged Kubernetes to automate microservices management, ensuring high availability, resilience, and optimal resource utilization for data processing jobs.

Enabled time travel and data versioning using Delta Lake, supporting audit trails and historical analysis for insurance reporting.

Implemented DBT testing and documentation frameworks to ensure data quality, lineage, and transparency across regulatory and operational reporting pipelines.

Developed GitHub Actions workflows to deploy Terraform templates into AWS, enabling automated infrastructure provisioning.

Led agile data engineering projects using JIRA for task management and Grafana for real-time monitoring, and collaborated with cross-functional teams to ensure timely and efficient delivery of data solutions.

Environment: AWS, ETL, Apache Airflow, Kafka, Snowflake, Tableau, Lambda, Redshift, PySpark, Spark, Hadoop, Azure, Python, MongoDB, SQL, Jenkins, Kubernetes, Terraform, GitHub, JIRA, JSON, Agile.

Best Buy, Richfield, MN May 2017 – February 2019

Data / Big Data Engineer

Roles and Responsibilities:

Implemented scalable, real-time data processing and streaming solutions using Python, Apache Kafka, and Spark Streaming, driving efficient data flow and analytics across distributed systems.

Built Python-based frameworks for real-time data processing and streaming using Kafka and Spark Streaming, optimizing inventory management and enhancing real-time order tracking.
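
A minimal Spark Structured Streaming sketch of this Kafka-to-analytics pattern, assuming the spark-sql-kafka connector is on the classpath; the broker, topic, and event schema are placeholders rather than details of the production framework.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

spark = SparkSession.builder.appName("order-stream").getOrCreate()

# Placeholder schema for order events
order_schema = StructType([
    StructField("order_id", StringType()),
    StructField("sku", StringType()),
    StructField("quantity", IntegerType()),
])

# Read order events from Kafka (broker and topic are placeholders)
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
)

orders = (
    raw.select(F.from_json(F.col("value").cast("string"), order_schema).alias("o"))
    .select("o.*")
)

# Running units-ordered per SKU, written to the console for illustration
counts = orders.groupBy("sku").agg(F.sum("quantity").alias("units"))
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```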

Utilized Python libraries such as Pandas, NumPy, and PySpark to clean, manipulate, and transform structured and unstructured data for analytics and reporting.
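
A small pandas/NumPy cleaning sketch of the sort implied above; the file and column names are placeholders.

```python
import numpy as np
import pandas as pd

# Placeholder input file and columns
df = pd.read_csv("example_sales.csv")

# Normalize column names and strip whitespace from text fields
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
df["product_name"] = df["product_name"].str.strip()

# Coerce numeric fields, treating bad values as missing, then fill or drop
df["unit_price"] = pd.to_numeric(df["unit_price"], errors="coerce")
df["quantity"] = df["quantity"].fillna(0).astype(int)
df = df.dropna(subset=["unit_price"])

# Derive a reporting column
df["revenue"] = np.round(df["unit_price"] * df["quantity"], 2)
```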

Leveraged AWS S3 and Azure Blob Storage for scalable, secure data storage and integrated them into Hadoop clusters for long-term retention.

Experienced in designing, deploying, and managing large-scale distributed systems using Hadoop, HDFS, MapReduce, Hive, HBase, and Zookeeper for efficient data processing, storage, and coordination.

Leveraged Apache Spark for real-time and batch processing, and used Sqoop and Flume for seamless data migration and ingestion between Hadoop and relational or unstructured data sources.

Skilled in automating and orchestrating complex workflows with Apache Oozie, optimizing Hive queries for data warehousing, and managing high-throughput NoSQL workloads with HBase for real-time analytics.

Extensive experience in designing, developing, and automating complex data workflows using Apache Airflow, ensuring seamless orchestration of ETL pipelines and data processes.

Optimized Spark jobs through techniques such as partitioning, caching, and tuning resource allocation, improving processing efficiency and reducing execution times.
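
A brief PySpark sketch of the tuning levers mentioned above (repartitioning, caching, and shuffle/resource configuration); the settings and paths shown are illustrative, not the values used on the project.

```python
from pyspark.sql import SparkSession

# Illustrative resource and shuffle settings; real values depend on cluster size and data volume
spark = (
    SparkSession.builder.appName("tuned-job")
    .config("spark.executor.memory", "8g")
    .config("spark.executor.cores", "4")
    .config("spark.sql.shuffle.partitions", "400")
    .getOrCreate()
)

events = spark.read.parquet("s3://example-bucket/events/")

# Repartition on the aggregation key to reduce shuffle skew, then cache for reuse
events = events.repartition(400, "customer_id").cache()

daily = events.groupBy("customer_id", "event_date").count()
daily.write.mode("overwrite").parquet("s3://example-bucket/daily_counts/")
```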

Proficient in leveraging PySpark and Scala for data transformation, ETL pipelines, and advanced analytics, implementing complex business logic for big data solutions.

Integrated data from Hadoop ecosystems (e.g., HDFS, Hive) into reporting platforms such as Tableau, enabling seamless visualization of key metrics and business trends.

Managed large-scale Oracle database solutions to support data processing needs, optimizing for query performance and scalability in a high-volume retail environment.

Configured AWS Lambda and Azure Functions for serverless execution of Python-based workflows, improving resource efficiency.

Developed and optimized database schemas utilizing Star and Snowflake models for Oracle databases, enhancing query efficiency and accelerating analytical reporting performance.

Utilized AWS Glue and Azure Data Factory for orchestrating complex ELT processes, ensuring efficient data integration from multiple sources into the Hadoop ecosystem.

Designed and optimized MongoDB schemas and indexing strategies to support high-performance data storage, efficient retrieval, and scalability for large-scale unstructured datasets, while integrating with Big Data frameworks like Apache Spark.
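
A minimal PyMongo sketch of the indexing side of this work; the connection string, collection, and field names are placeholders.

```python
from pymongo import ASCENDING, DESCENDING, MongoClient

# Placeholder connection string and namespace
client = MongoClient("mongodb://localhost:27017")
products = client["retail"]["products"]

# Compound index to support category browsing sorted by most recent update
products.create_index([("category", ASCENDING), ("updated_at", DESCENDING)])

# Unique index on SKU to prevent duplicate product documents
products.create_index([("sku", ASCENDING)], unique=True)

# Example query served efficiently by the compound index
recent = products.find({"category": "electronics"}).sort("updated_at", DESCENDING).limit(10)
for doc in recent:
    print(doc["sku"])
```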

Managed version control using GitLab and Bitbucket, implementing branching strategies and code reviews to ensure collaboration across retail data engineering teams.

Deployed and managed containerized applications using Docker and Kubernetes, ensuring efficient scalability, orchestration, and automated deployment of data engineering workflows in cloud environments.

Developed and deployed infrastructure as code (IaC) using Terraform to automate the provisioning and scaling of big data platforms, ensuring efficient and consistent cloud resource management.

Implemented CI/CD pipelines using Jenkins for automated deployment of big data applications, while managing project tasks and tracking progress in JIRA to ensure seamless delivery of data solutions.

Implemented agile techniques, combining Trello with Kanban for task management to improve teamwork and workflow efficiency while delivering data solutions.

Environment: Hadoop, Apache Airflow, Kafka, HDFS, MapReduce, Hive, HBase, Sqoop, Flume, Python, Azure, Docker, Kubernetes, Terraform, Jenkins, JIRA, Bitbucket, Agile.

EDUCATION:

Master’s Degree: Texas A&M University, Computer Science

CERTIFICATION:

AWS Certified Data Engineer – Associate.


