Nikhil Koteswara Reddy Bollam
AWS Data Engineer
Phone: 469-***-**** Email: *****************@*****.***
Summary:
Experienced Data Engineer with 7 years of expertise in designing, developing, and optimizing large-scale data pipelines using AWS Cloud, Big Data technologies, and ETL/ELT frameworks.
Built and maintained ETL pipelines using Airflow and Spark to move and clean data from different sources, making it easier for teams to access reliable insights.
Used ELT approach with dbt and BigQuery to transform raw data directly in the warehouse, speeding up reports and simplifying data workflows.
Developed scalable data pipelines in Databricks using PySpark to process large datasets, helping teams get cleaner, faster data for analytics (a brief sketch of this pattern follows this summary).
Designed and managed data models in Snowflake to support real-time reporting and reduce query times for business users.
Used Informatica PowerCenter to integrate data from various sources, ensuring accuracy and consistency across enterprise systems.
Built Python scripts to automate daily data loads and validations, reducing manual work and improving data accuracy across teams.
Used Python with Pandas and NumPy for data wrangling and transformation tasks, making raw data analysis-ready for reporting tools.
Created reusable Python modules to handle API data ingestion and file processing, saving development time for future projects.
Worked with Hadoop ecosystem tools to process and analyze large-scale data, improving batch processing efficiency for business reports.
Used HDFS to store and manage high-volume raw and processed data, ensuring reliable access and scalability across data pipelines.
Used Apache Spark and Hive to process and query large volumes of data for analytics teams, helping deliver faster insights from complex datasets.
Built end-to-end data pipelines using Sqoop, Flume, and Oozie to move data from RDBMS to Hadoop, enabling smooth and automated data flow.
Managed real-time and batch data workflows using HBase, SQS, and SNS, improving system communication and reliability for streaming applications.
Used Apache Airflow to schedule and monitor data pipelines, making it easy to track tasks and ensure data flows smoothly across systems.
Used AWS Glue to automate ETL jobs and clean raw data before loading it into Redshift, making reporting faster and more reliable.
Processed large datasets on EMR with Spark, improving data transformation speed and supporting downstream analytics in Redshift.
Built data processing pipelines using PySpark and Scala to handle millions of records efficiently, making data ready for analytics in near real-time.
Integrated Apache Kafka for real-time data streaming, helping systems react faster to events and reducing delays in business insights.
Collaborated with stakeholders to analyze data trends and documented workflows and processes clearly, keeping the team aligned and projects running smoothly.
Managed and optimized relational databases like MySQL, Oracle, and PostgreSQL to ensure fast, reliable data access for applications and reporting.
Worked with MongoDB to handle flexible, unstructured data, enabling quick development of scalable features and easier data retrieval.
Designed and maintained scalable data pipelines on AWS using services like S3, Lambda, and Glue to automate data processing and reduce manual effort.
Leveraged AWS Redshift and Athena to build fast, cost-effective data warehouses and enable self-service analytics for business teams.
Monitored and optimized AWS cloud resources to improve data pipeline reliability and control costs, ensuring smooth operations at scale.
Used Terraform to automate cloud infrastructure setup, making deployments faster, more consistent, and easy to manage across environments.
Built and maintained CI/CD pipelines using Jenkins and GitHub Actions to automate testing and deployment, speeding up delivery and reducing errors.
Collaborated with development teams to integrate automated quality checks, ensuring smoother releases and better software reliability.
Managed code versions and collaboration using Git and Bitbucket, ensuring smooth teamwork and controlled releases across multiple projects.
Automated build and dependency management with Maven, simplifying project setup and improving development efficiency.
Applied Agile methodologies and wrote unit tests using frameworks like JUnit and PyTest to deliver high-quality, reliable software on time.
Quickly adapted to new tools and technologies as needed, always eager to learn and help improve team processes and project outcomes.
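Illustrative sketch (not drawn from any employer codebase): a minimal PySpark batch cleansing job of the kind described in the summary above; the S3 paths, table, and column names are hypothetical.

```python
# Minimal PySpark cleansing sketch -- bucket paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily_transactions_clean").getOrCreate()

# Read raw CSV files landed in a hypothetical S3 prefix
raw = spark.read.option("header", True).csv("s3a://example-bucket/raw/transactions/")

# Basic cleansing: de-duplicate, cast types, drop rows with missing amounts
clean = (
    raw.dropDuplicates(["transaction_id"])
       .withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("amount").isNotNull())
)

# Write curated output as partitioned Parquet for downstream analytics
clean.write.mode("overwrite").partitionBy("load_date").parquet(
    "s3a://example-bucket/curated/transactions/"
)
```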
Technical Skills:
Databases: PostgreSQL, MySQL, Redshift, Snowflake, MongoDB, DynamoDB
Big Data Technologies: Hadoop, Spark, Kafka, Hive, Pig, HBase, Sqoop, Flume, PySpark, Presto
Cloud Platforms: AWS, Azure, GCP
Programming Languages: Python, SQL, Scala, R, Java
BI Tools: Tableau, Power BI, SSRS
Tools & Software: TOAD, MS Office, BTEQ, Teradata SQL Assistant
ETL Tools: Pentaho, Informatica, Talend
Operating Systems: Windows, DOS, Unix, Linux
Data Warehouses: Snowflake, AWS Redshift, Azure Synapse Analytics
Version Control & Testing: Git, Bitbucket, PyTest, Great Expectations
Orchestration & Workflow: Apache Airflow, AWS Step Functions
Professional Experience:
US Bank, Minneapolis, MN Feb 2021 – Present
Role: AWS Data Engineer
Responsibilities:
Worked as an AWS Data Engineer for a leading banking client, designing and managing secure, scalable data pipelines to support real-time financial analytics and regulatory compliance.
Built and optimized end-to-end ETL pipelines for banking data systems, ensuring smooth data flow across platforms.
Used Databricks with Snowflake to build scalable data pipelines and perform distributed data processing, enabling faster insights for banking analytics and reporting.
Developed and scheduled Informatica workflows to automate complex ETL processes, ensuring accurate data integration across core banking systems.
Developed Python scripts to automate data validation, cleansing, and transformation tasks, improving efficiency and reducing manual errors in banking data pipelines.
Utilized Python for building reusable modules and integrating APIs to streamline data ingestion from multiple banking systems into centralized platforms.
Created Python-based data quality dashboards to monitor pipeline health and identify anomalies in near real-time.
Designed and optimized Spark-based data pipelines to process large volumes of transactional banking data, enabling faster insights and real-time analytics.
Leveraged PySpark for distributed data processing, improving performance and scalability of ETL workflows across diverse financial datasets.
Worked on managing and querying large-scale financial datasets using Hadoop and HDFS, enabling secure and scalable data storage solutions.
Developed and maintained data transformation jobs using Pig and Spark on Hadoop clusters.
Achieved a 30% reduction in overall data processing time through workflow optimization.
Built data ingestion pipelines that leverage HDFS for storing raw and processed banking data, ensuring high availability and fault tolerance.
Implemented data transfers between Hadoop and RDBMS using Sqoop for reliable ETL integration.
Utilized Big Data technologies like HBase, Hive, and Apache Spark to design scalable data storage and processing solutions, enabling efficient handling of large banking datasets for analytics and reporting.
Implemented end-to-end data ingestion and workflow automation using tools such as Sqoop, Flume, Oozie, and Zookeeper to ensure reliable, real-time data availability across banking systems.
Built and managed automated workflows using Apache Airflow to orchestrate complex data pipelines, ensuring reliable and timely data processing for banking analytics.
Leveraged AWS EMR to run large-scale distributed data processing jobs, optimizing performance for complex banking analytics workloads.
Designed and automated ETL workflows using AWS Glue, enabling seamless data transformation and integration across multiple banking data sources.
Managed and optimized Amazon Redshift data warehouses to support fast, reliable querying and reporting of large-scale banking datasets for strategic decision-making.
Developed real-time data streaming solutions using Kafka and PySpark to process and analyze banking transactions with low latency and high reliability (see the streaming sketch after this list).
Implemented scalable data processing workflows in Scala and PySpark, improving the efficiency of ETL pipelines for complex financial datasets.
Collaborated with cross-functional teams to analyze banking data trends and created clear, detailed documentation to support data processes and decision-making.
Designed and maintained efficient relational database schemas using MySQL and Oracle to support secure and high-performance banking applications.
Optimized complex SQL queries and stored procedures for faster data retrieval and reporting in financial systems.
Implemented MongoDB solutions for flexible storage and rapid retrieval of unstructured banking data, enhancing customer insights and operational efficiency.
Designed and deployed scalable data pipelines on AWS using services like S3, Glue, and Lambda to streamline data processing for banking analytics.
Implemented secure data storage and access controls on AWS to ensure compliance with financial industry regulations and protect sensitive banking information.
Automated data ingestion and transformation workflows leveraging AWS Glue and EMR, improving data availability and reducing processing time for business reporting.
Automated the provisioning and management of cloud infrastructure using Terraform, ensuring consistent and scalable environments for banking data applications.
Implemented CI/CD pipelines to automate the deployment of data engineering workflows, reducing manual errors and accelerating release cycles for banking projects.
Integrated testing and monitoring into CI/CD processes to ensure high-quality, reliable data pipelines that meet stringent financial compliance standards.
Managed source code and version control using Git and Bitbucket, enabling smooth collaboration across distributed data engineering teams.
Utilized Maven for efficient build automation and dependency management, ensuring reliable and consistent project deployments.
Applied Agile methodologies and unit testing frameworks to deliver robust, well-tested data pipelines that meet banking compliance and quality standards.
Collaborated with cross-functional teams to troubleshoot data issues and implement process improvements, enhancing overall data quality and project delivery.
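Illustrative sketch only (not taken from the client codebase): one way the Kafka-to-PySpark streaming pattern above can look, assuming a hypothetical topic, broker address, and event schema, and assuming the spark-sql-kafka connector package is on the Spark classpath.

```python
# Structured Streaming sketch -- topic, broker, and schema are hypothetical.
# Requires the spark-sql-kafka connector package to be available to Spark.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("txn_stream_example").getOrCreate()

schema = StructType([
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read JSON transaction events from a Kafka topic
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "transactions")
         .load()
         .select(F.from_json(F.col("value").cast("string"), schema).alias("txn"))
         .select("txn.*")
)

# Aggregate amounts per account over 5-minute windows with a late-data watermark
agg = (
    events.withWatermark("event_time", "10 minutes")
          .groupBy(F.window("event_time", "5 minutes"), "account_id")
          .agg(F.sum("amount").alias("total_amount"))
)

# Console sink for illustration; a real pipeline would write to S3 or Redshift
query = agg.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```

In practice the console sink would be swapped for an S3 or Redshift sink and the window and watermark sizes tuned to the latency requirements.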
UnitedHealth Group, Minnetonka, MN Jan 2019 – Feb 2021
Role: AWS Data Engineer
Data Engineer with experience in designing and managing data pipelines and analytics solutions within the HealthStream healthcare learning platform to support compliance and workforce development.
Designed an efficient ELT framework using Spark and AWS Athena, reducing query execution time by 40%.
Leveraged Databricks and Snowflake to build scalable data pipelines and used Informatica to automate ETL workflows, ensuring efficient and reliable data processing for healthcare analytics.
Developed Python scripts to automate data extraction, transformation, and loading processes, enhancing efficiency and reducing manual errors.
Created reusable Python modules to integrate and analyze complex datasets, supporting accurate and timely data-driven decisions.
Designed and optimized Apache Spark pipelines to process large-scale datasets efficiently, enabling faster data analysis and reporting.
Utilized PySpark to develop scalable ETL workflows that improved data processing speed and reliability across diverse data sources.
Managed large-scale data storage and processing using Hadoop and HDFS, ensuring reliable and efficient handling of complex datasets.
Built and maintained distributed data processing workflows using Hive, MapReduce, and Spark, enabling scalable analytics on high-volume datasets.
Integrated tools like Sqoop, Flume, Oozie, and Zookeeper to automate data ingestion, scheduling, and coordination across complex big data environments.
Designed and managed data workflows using Apache Airflow, ensuring reliable scheduling, monitoring, and orchestration of end-to-end data pipelines (a minimal DAG sketch follows this list).
Built and orchestrated data pipelines using AWS Glue and EMR for processing, loading curated data into Redshift to enable fast, scalable analytics.
Developed real-time data streaming pipelines using Kafka and PySpark, enabling efficient processing of high-volume event data with minimal latency.
Wrote scalable data transformation jobs in Scala and PySpark, improving ETL performance and maintainability across distributed systems.
Analyzed complex datasets to extract actionable insights and created clear, well-structured documentation to support data processes and stakeholder understanding.
Designed and optimized relational database schemas using MySQL and Oracle, supporting efficient querying and reliable data storage for critical applications.
Worked with MongoDB to handle semi-structured data, enabling flexible data modeling and faster access to evolving business requirements.
Built end-to-end data pipelines on AWS using services like S3, Glue, and Lambda to automate data ingestion, transformation, and loading.
Managed scalable data processing jobs using EMR and integrated data into Redshift for fast, cost-effective analytics.
Implemented IAM roles, encryption, and resource tagging to ensure security, compliance, and operational transparency across AWS-based data workflows.
Used Terraform to automate and manage cloud infrastructure as code, ensuring consistent, repeatable, and scalable environment provisioning across projects.
Implemented CI/CD pipelines to automate the build, test, and deployment processes, reducing manual intervention and speeding up delivery cycles.
Integrated quality checks and monitoring into CI/CD workflows, ensuring reliable and secure deployments across development and production environments.
Collaborated using Git and Bitbucket for version control, enabling smooth code integration and team collaboration across development workflows.
Applied Agile methodologies and integrated unit testing frameworks with Maven builds to ensure code quality, consistency, and faster delivery cycles.
Documented technical processes, data flows, and best practices to support knowledge sharing and smooth onboarding across projects.
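Minimal Airflow DAG sketch (assuming Airflow 2.x; DAG and task names are hypothetical) showing the extract-transform-load orchestration pattern referenced above; it is illustrative, not production code.

```python
# Hypothetical DAG and task names -- a minimal orchestration sketch, assuming Airflow 2.x.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Pull data from a source system (placeholder)
    print("extracting")


def transform():
    # Clean and reshape the extracted data (placeholder)
    print("transforming")


def load():
    # Load curated data into the warehouse (placeholder)
    print("loading")


with DAG(
    dag_id="etl_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Linear dependency chain: extract, then transform, then load
    t_extract >> t_transform >> t_load
```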
Electronic Arts (EA), Redwood City, CA Dec 2017 – Dec 2018
Role: Data Engineer
Data Engineer with strong experience in designing scalable data solutions, building efficient pipelines, and driving insights from large datasets using modern data platforms and tools.
Used Python to automate data workflows, perform transformations, and develop reusable modules for scalable and maintainable data engineering solutions.
Wrote complex SQL queries and optimized them for performance to extract, join, and aggregate large datasets across various sources for analytics.
Built robust ETL pipelines to extract data from multiple systems, transform it into structured formats, and load it into data warehouses for downstream analysis.
Utilized Hadoop and HDFS for distributed storage and processing of large datasets, ensuring fault tolerance and high availability.
Built and automated ETL pipelines using AWS Glue and S3 for seamless data movement and transformation (see the Glue trigger sketch after this list).
Processed large-scale datasets using AWS EMR with Spark to enable distributed, high-performance data transformation.
Managed secure data storage on Amazon S3 with IAM roles and KMS for access control and encryption.
Integrated tools like Hive, Spark, and Sqoop within big data ecosystems to process and move high-volume data efficiently in distributed environments.
Worked with Snowflake for scalable data warehousing, building performant queries and managing secure, multi-cluster environments for analytics.
Designed and maintained relational databases like MySQL/Oracle and worked with MongoDB for flexible, schema-less data storage.
Managed data ingestion and optimization in Amazon Redshift, enabling fast querying and reporting over large-scale datasets.
Developed scalable ETL and streaming pipelines using PySpark and Scala, and integrated Kafka for real-time data processing and event-driven architectures.
Used Git and Bitbucket for version control and collaboration, while applying unit testing practices to ensure code reliability and quality in production-grade pipelines.
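Illustrative sketch (the job name and region are hypothetical): starting and checking an AWS Glue ETL job from Python with boto3, the kind of automation described above.

```python
# Hypothetical job name and region -- a minimal sketch of triggering a Glue ETL job with boto3.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Kick off the (hypothetical) job that moves raw S3 data into curated tables
run = glue.start_job_run(JobName="example-etl-job")

# Check the run state once; a real workflow would poll, retry, or use Step Functions
status = glue.get_job_run(JobName="example-etl-job", RunId=run["JobRunId"])
print(status["JobRun"]["JobRunState"])
```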
Education:
Master’s degree in Applied Computer Science from Grand Valley State University, Grand Rapids, MI