
Data Engineer Senior

Location:
Lubbock, TX, 79415
Salary:
$90,000
Posted:
October 15, 2025


Resume:

Data Engineer

Name: Naga Venkata Balaram Mullapudi

Phone: +1-806-***-****

Email: *****************@*****.***

PROFESSIONAL SUMMARY

● Results-driven Senior Data Engineer with over 7 years of experience designing, implementing, and optimizing cloud-based data solutions across AWS, Azure, and Google Cloud Platform (GCP) to drive business insights and operational efficiency.

● Adept in multiple programming languages for building robust and scalable data solutions, including Python (Pandas, NumPy, Matplotlib), SQL, Java, Scala, PL/SQL, PowerShell, and JavaScript.

● Hands-on experience with Amazon Web Services (AWS), using EMR (Elastic MapReduce), S3, Glue, Redshift, Lambda, Athena, and EC2 for data processing.

● Proficient in ETL and data integration, using tools such as AWS Glue, Informatica, Talend, Apache Airflow, and DBT (Data Build Tool) to automate and optimize data pipelines for seamless processing (see the orchestration sketch at the end of this summary).

● Expertise in Azure services such as Databricks, Azure Data Lake Storage Gen2, Azure Data Factory, Azure Synapse Analytics, Azure Cosmos DB, and Azure Blob Storage for comprehensive data engineering.

● In-depth experience with GCP services, including BigQuery, Dataflow, Pub/Sub, Cloud Storage, Storage Transfer Service, and Cloud Functions, and knowledgeable in data warehouse frameworks such as Snowflake, providing scalable cloud-based data solutions.

● Hands-on experience with real-time data streaming and messaging technologies such as AWS Kinesis and Apache Kafka (Kafka Streams) for handling large-scale event-driven architectures.

● Skilled in data governance, security compliance (GDPR, HIPAA, SOX), and CI/CD automation using Terraform, Jenkins, and Azure DevOps.

● Hands-on experience with data modeling, including Data Vault and Star/Snowflake schema design, normalization, and query performance optimization through partitioning and indexing techniques.

● Expertise in big data and distributed computing technologies such as Apache Spark (PySpark, Spark-SQL), Hive, Presto, Apache Beam, Hadoop (HDFS, MapReduce), and Databricks (Unity Catalog, Delta Lake) for scalable and high-performance data processing.

● Expert in data warehousing and database management across various platforms (Cassandra, MySQL, MongoDB, Snowflake, Redshift, HBase, Oracle) for efficient data storage, retrieval, and governance.

● Strong background in machine learning and advanced analytics, specializing in Scikit-learn, feature engineering, predictive modeling, and anomaly detection to drive data-driven strategies.

● Deep understanding of security and data governance, implementing AWS IAM, RBAC, Unity Catalog (Databricks), and Collibra (Metadata Management) to enforce compliance and secure sensitive data.

● Adept at implementing CI/CD pipelines using GitHub Actions, Docker, Kubernetes, Jenkins, and Terraform, ensuring reliable and automated data workflows.
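The orchestration tooling named in this summary (Apache Airflow, DBT, AWS Glue) can be made concrete with a minimal sketch. The Airflow DAG below chains a Python extract step with a dbt run; it is an illustrative sketch only, assuming Apache Airflow 2.4+ and boto3, and the bucket name, dbt project path, and sample data are hypothetical placeholders rather than details taken from this resume.

# Minimal illustrative Airflow DAG: stage a raw extract in S3, then run dbt.
# Bucket name, dbt project path, and sample rows are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def extract_to_s3(**context):
    """Stage a small source extract in S3 (placeholder logic)."""
    import boto3

    s3 = boto3.client("s3")
    # In a real pipeline this body would be a query result or API export.
    s3.put_object(
        Bucket="example-staging-bucket",           # hypothetical bucket
        Key=f"raw/orders/{context['ds']}.csv",     # partitioned by run date
        Body=b"order_id,amount\n1,10.5\n",
    )


with DAG(
    dag_id="daily_orders_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_to_s3", python_callable=extract_to_s3)

    # Run dbt models once the raw extract has landed (project path is assumed).
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt/analytics && dbt run --select staging+",
    )

    extract >> dbt_run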

TECHNICAL SKILLS

● Cloud Platforms & Services: AWS (Glue, RDS, Lambda, S3, EC2, Kinesis, IAM, Lake Formation, Redshift, Redshift Spectrum, Step Functions, Athena, EMR, CloudFormation, CloudWatch, CodePipeline, CodeBuild); Azure (Azure SQL Database, Azure Synapse Analytics, Azure Data Lake Storage, Azure Functions, Azure DevOps, Azure Stream Analytics); Google Cloud Platform (GCP) (Dataflow, Dataproc, Cloud Functions, BigQuery, Cloud Storage, Stackdriver)

● Big Data & Distributed Computing: Apache Spark (PySpark, Spark-SQL), Apache Hive, Presto, Apache Beam, Hadoop (HDFS, MapReduce), Databricks (Unity Catalog, Delta Lake)

● Databases & Data Warehousing: Cassandra, MySQL, MongoDB, Snowflake, Amazon Redshift, Apache HBase, Oracle

● ETL & Data Integration: AWS Glue, Informatica, Talend, Apache Airflow, DBT (Data Build Tool)

● Data Streaming & Messaging: AWS Kinesis, Apache Kafka (Kafka Streams)

● Programming Languages: Python (Pandas, NumPy, Matplotlib), SQL, Java, Scala, PL/SQL, PowerShell, JavaScript

● Data Visualization & BI: Tableau, Power BI, Matplotlib, D3.js, Excel (PivotTables)

● Machine Learning & Analytics: Scikit-learn, Feature Engineering, Predictive Modeling, Anomaly Detection

● DevOps & CI/CD: Jenkins, Azure DevOps, BitBucket, GitHub, Docker, Kubernetes, Terraform, AWS CloudFormation

● Security & Governance: AWS IAM, Data Governance Strategies, Role-Based Access Control (RBAC), Unity Catalog (Databricks), Collibra (Metadata Management)

● Other Tools & Technologies: RESTful APIs, Bugzilla, Agile, Scrum, Kanban, Confluence, JIRA

CERTIFICATIONS

● AWS Certified Solutions Architect – Associate/Professional

● Microsoft Certified: Azure Data Engineer Associate

Professional Experience:

Client: Bank of America, NC Aug 2023 – Present

Role: Data Engineer

Responsibilities:

● Designed, implemented, and optimized AWS Glue ETL pipelines to process large-scale datasets, ensuring high performance and data reliability (illustrated in the sketch following this role's Environment line).

● Administered AWS RDS and Cassandra databases, optimizing query performance, indexing, and partitioning strategies for scalability.

● Developed big data processing frameworks using Apache Spark to handle structured and unstructured datasets at scale.

● Applied advanced knowledge of Python, SQL, Java, and Scala to develop robust data solutions.

● Led Databricks platform management, utilizing Unity Catalog for secure data governance and implementing data quality frameworks to ensure clean and structured data for analytics.

● Integrated machine learning workflows into data pipelines, leveraging Python, Pandas, and NumPy for feature engineering and model deployment.

● Designed, developed, and implemented Enterprise Data Warehouse solutions using DigitalRoute MediationZone and the Talend ETL Big Data Integration suite (version 6.2).

● Developed and optimized Amazon Redshift data warehouses, leveraging Redshift Spectrum for querying external datasets stored in AWS Lake Formation.

● Managed version control and collaborated with development teams using GitHub and Jenkins, following Agile and Kanban methodologies for efficient project execution.

● Established CI/CD pipelines with Jenkins, streamlining deployment processes and maintaining high software quality.

● Architected and maintained a data warehouse using Snowflake, ensuring efficient data storage, retrieval, and analytics.

● Integrated Terraform for infrastructure as code (IaC), ensuring repeatable, scalable, and automated cloud deployments.

● Designed and deployed serverless architectures using AWS Lambda to automate data ingestion and processing workflows.

● Implemented AWS Kinesis for real-time data streaming and event-driven architectures, ensuring seamless integration with downstream analytics platforms.

● Designed, constructed, and implemented Business Intelligence solutions (star schemas, data warehouses, OLAP cubes, partitioning, and aggregations).

● Built AWS Glue Data Catalog solutions for metadata management, enabling efficient data discovery and governance across multiple AWS services.

● Deployed and orchestrated containerized workloads using Kubernetes and Docker, ensuring scalability and high availability of data pipelines.

● Optimized MapReduce and Apache Beam jobs to process massive datasets efficiently, improving ETL performance across distributed systems.

● Designed and deployed AWS CloudFormation templates to automate infrastructure provisioning and management.

● Monitored and optimized data workflows using AWS CloudWatch, enabling proactive issue detection and resolution.

● Engineered scalable Hadoop solutions for distributed data processing, optimizing resource utilization and cost-effectiveness.

Environment: AWS Glue, Apache Spark, Spark-SQL, AWS RDS, Cassandra, Databricks, Unity Catalog, Python, Pandas, NumPy, SQL, Apache Hive, Presto, Amazon S3, Amazon Redshift, AWS Lambda, AWS Kinesis, RESTful APIs, AWS CodePipeline, AWS CodeBuild, Kubernetes, Docker, JavaScript, D3.js, MapReduce, Apache Beam, CloudFormation, Hadoop.
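To illustrate the AWS Glue ETL pattern referenced in the first bullet of this role, the following is a minimal sketch of a Glue PySpark job using the standard awsglue job libraries; the catalog database, table, column names, and S3 bucket are hypothetical placeholders, not details from the engagement.

# Illustrative AWS Glue (PySpark) job: read a raw table from the Glue Data
# Catalog, clean it, and write partitioned Parquet to a curated S3 zone.
# Database, table, column, and bucket names are hypothetical placeholders.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])

sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw data registered in the Glue Data Catalog.
raw = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="orders"
).toDF()

# Basic cleaning: drop duplicates, cast amounts, and keep valid rows only.
clean = (
    raw.dropDuplicates(["order_id"])
    .withColumn("amount", F.col("amount").cast("double"))
    .filter(F.col("amount").isNotNull())
)

# Write partitioned Parquet to the curated zone of the data lake.
(
    clean.write.mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-curated-bucket/orders/")
)

job.commit()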


Client: Elevance Health (Legato), Bangalore, India Feb 2021 – July 2023

Role: Data Engineer

Responsibilities:

● Developed scalable data architecture solutions to ensure high availability, data integrity, and consistency across enterprise applications.

● Ensured compliance with HIPAA regulations and implemented data masking, encryption, and access controls using Cloud DLP and IAM.

● Leveraged Apache Spark and PySpark for distributed data processing, enhancing ETL performance and enabling large-scale analytics (illustrated in the sketch following this role's Environment line).

● Developed scalable, server-side applications in object-oriented languages such as Scala, Java, and Python, applying advanced design patterns and algorithms.

● Designed and implemented scalable cloud-based data processing solutions using Google Cloud Platform (GCP) services like Dataflow, Dataproc, and Cloud Functions.

● Automated complex data workflows using PowerShell and Azure Functions, streamlining ETL processes and reducing manual intervention.

● Engineered Azure Synapse Analytics solutions for enterprise-wide data warehousing, enabling advanced analytics and seamless big data integration.

● Designed, implemented, and maintained Azure Data Lake Storage (ADLS) solutions to store and manage massive structured and unstructured datasets.

● Developed, optimized, and maintained SQL queries, stored procedures, and scripts for data transformation, analytics, and reporting.

● Led Azure DevOps CI/CD pipeline implementations, automating deployment workflows, version control, and infrastructure as code.

● Utilized Pandas and NumPy for data transformation, cleaning, and statistical analysis, improving efficiency in ETL pipelines.

● Managed source code, collaboration, and version control using BitBucket, ensuring smooth integration with development and deployment workflows.

● Designed and implemented role-based access controls (RBAC) to enforce secure data access and safeguard sensitive business information.

● Created and maintained business intelligence reports and dashboards using Power BI, enabling stakeholders to make data-driven decisions.

● Applied machine learning techniques for predictive modeling, anomaly detection, and data-driven insights to enhance business intelligence.

Environment: Azure SQL Database, Apache Spark, PySpark, PowerShell, Azure Functions, Azure Synapse Analytics, Azure Data Lake Storage (ADLS), SQL, Azure Stream Analytics, Azure DevOps, CI/CD, Pandas, NumPy, BitBucket, RBAC, Power BI, Machine Learning.
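A minimal PySpark sketch of the ADLS-based processing described in this role, assuming a Spark environment with the Azure ABFS connector available; the storage account, container names, columns, and SHA-256 masking rule are hypothetical placeholders chosen to echo the HIPAA-driven masking bullet above.

# Illustrative PySpark job: read raw Parquet from ADLS Gen2, mask a sensitive
# column, standardize dates, and publish curated Parquet. Storage account,
# containers, and column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims_cleanup").getOrCreate()

# Authenticate to the storage account with an account key (in practice this
# would come from a secret scope or managed identity, not a literal string).
spark.conf.set(
    "fs.azure.account.key.examplestorage.dfs.core.windows.net",
    "<storage-account-key>",
)

raw_path = "abfss://raw@examplestorage.dfs.core.windows.net/claims/"
curated_path = "abfss://curated@examplestorage.dfs.core.windows.net/claims/"

claims = spark.read.parquet(raw_path)

# Hash the member identifier and normalize service dates before publishing.
curated = (
    claims.withColumn("member_id", F.sha2(F.col("member_id").cast("string"), 256))
    .withColumn("service_date", F.to_date("service_date"))
    .dropDuplicates(["claim_id"])
)

curated.write.mode("overwrite").parquet(curated_path)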

Client: Metamorph IT Systems PVT LTD, Hyderabad, India July 2018 – Jan 2021

Role: ETL Developer

Responsibilities:

● Migrated traditional ETL processes to a cloud-based Data Lake by implementing Spark and Hive jobs, enhancing data processing efficiency.

● Designed the ETL flow and logic for the data warehouse project using Informatica and shell scripting.

● Built the ETL architecture and source-to-target mappings to load data into the data warehouse.

● Extracted data from flat files and other RDBMS sources into a staging area and loaded it into the data warehouse.

● Created various UNIX scripts, including automated creation of Informatica mappings for common logic and data archiving.

● Extensive knowledge of big data technologies, including Hadoop, Hive, Sqoop, HDFS, NoSQL databases such as MongoDB and Cassandra, and other emerging technologies.

● Created and used data sharing in Snowflake to share data with non-Snowflake users, and created roles and policies to achieve data masking per user role.

● Created integrated test environments for ETL applications developed in Go, using Docker and Python APIs.

● Designed ETL packages handling different data sources (SQL Server, CSV, flat files, etc.) and loaded the data into target data stores by performing various transformations using SQL Server Integration Services (SSIS).

● Developed data integration solutions using Talend, enabling efficient data extraction, transformation, and loading (ETL) from diverse data sources.

● Built data pipelines to ingest and process data in real-time using AWS Glue, Lambda, and Kinesis, supporting real-time analytics and reporting.

● Involved in developing test automation scripts using Perl/Python.

● Worked closely with data scientists to migrate prediction algorithms and models from RStudio to the Python scikit-learn API, and participated in feature selection for building the prediction models (see the sketch following this role's Environment line).

● Extracted data from flat files, SQL Server, Sybase, and Oracle databases, applied business logic, and loaded the data into Oracle and Sybase warehouses.

Environment: AWS (S3, EMR, Redshift, Glue, Lambda, CloudFormation), PL/SQL, PySpark, Spark, Hive, HDFS, Kafka, Sqoop, Oozie, HBase, MapReduce, Snowflake, Talend, Spark Streaming.
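As referenced in the model-migration bullet above, an R model ported to scikit-learn typically becomes a small pipeline like the sketch below; the feature names, synthetic training data, and choice of classifier are hypothetical placeholders, not the actual models from this engagement.

# Illustrative scikit-learn pipeline: scale two numeric features and fit a
# random-forest classifier on synthetic placeholder data.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real feature set (hypothetical columns).
rng = np.random.default_rng(42)
features = pd.DataFrame(
    {
        "usage_minutes": rng.normal(300, 50, 1000),
        "invoice_amount": rng.normal(80, 15, 1000),
    }
)
labels = (features["invoice_amount"] > 90).astype(int)  # toy target

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42
)

# Scale features, then fit the classifier; mirrors a typical R-to-sklearn port.
model = Pipeline(
    [
        ("scale", StandardScaler()),
        ("clf", RandomForestClassifier(n_estimators=100, random_state=42)),
    ]
)
model.fit(X_train, y_train)
print(f"holdout accuracy: {model.score(X_test, y_test):.3f}")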


