
Data Engineer Capture

Location:
Frisco, TX
Posted:
September 10, 2025

Resume:

Nagul Shaik

***** ***** ***** **, ******, TX, USA +1-978-***-**** *******************@*****.***

PROFILE SUMMARY

Results-driven Data Engineer with 5 years of experience designing and maintaining scalable data pipelines. Proficient in building change data capture (CDC) and ETL transformations with Apache Spark for both streaming and batch processing. Demonstrated success in transforming raw data into queryable formats and optimizing data solutions on AWS.

TECHNICAL SKILLS

• Programming & Scripting: Python, Java, Scala, C#, SQL, Bash, VB, R, Node.js, JavaScript

• Big Data Technologies: Hadoop, Apache Spark, Hive, HDFS, Presto, Pig, Flink

• ETL/ELT Tools: Informatica, Talend, Apache NiFi, dbt, SSIS, SSRS, Change Data Capture

• Data Warehousing: Snowflake, Redshift, Azure Synapse, BigQuery, Teradata, Vertica

• Data Integration & Orchestration: Apache Airflow, Luigi, AWS Step Functions, Azure Logic Apps, GCP Data Fusion

• Streaming & Real-Time Processing: Apache Kafka, AWS Kinesis, Azure Event Hubs, Google Pub/Sub, Spark Streaming

• Database Systems: PostgreSQL, MySQL, Oracle, SQL Server, MongoDB, DynamoDB, Cosmos DB, HBase

• Business Intelligence Tools: Tableau, Power BI, Looker, Amazon QuickSight, Google Data Studio

• DevOps & CI/CD: Jenkins, GitLab CI/CD, Azure DevOps, AWS CodePipeline, Docker, Kubernetes

• Machine Learning Integration: Amazon SageMaker, Azure ML, GCP Vertex AI

• Infrastructure as Code & Automation: Terraform, CloudFormation, Ansible

• Version Control & Collaboration: Git, GitHub, GitLab, Bitbucket

• Cloud Platforms: AWS, Azure, Google Cloud (GCP)

• Data Quality & Governance: Apache Griffin, AWS Deequ, Apache Hudi
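
The change data capture and stream processing skills listed above typically come together in a Spark Structured Streaming job. Below is a minimal, hypothetical sketch of that pattern; the broker address, topic name, event schema, and storage paths are placeholder assumptions, not details from this resume.

```python
# Minimal CDC-style streaming sketch with PySpark Structured Streaming.
# Broker, topic, schema, and paths are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("cdc-stream-sketch").getOrCreate()

# Debezium-style change events: an operation code plus the new row image.
event_schema = StructType([
    StructField("op", StringType()),     # "c" = create, "u" = update, "d" = delete
    StructField("after", StringType()),  # new row image as a JSON string
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
       .option("subscribe", "orders.cdc")                 # placeholder topic
       .load())

changes = (raw
           .select(from_json(col("value").cast("string"), event_schema).alias("evt"))
           .where(col("evt.op").isin("c", "u"))           # keep inserts and updates
           .select(col("evt.after").alias("row_json")))

# Land the parsed change stream as queryable files; a production job
# would typically merge these changes into a target table instead.
query = (changes.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/cdc/orders/")              # placeholder
         .option("checkpointLocation", "s3a://example-bucket/chk/orders/")
         .start())
query.awaitTermination()
```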

WORK EXPERIENCE

Albany International Nov 2023 - Present

Azure Data Engineer Rochester, New Hampshire, USA

Developed and optimized end-to-end data solutions using Azure technologies to streamline data integration, analytics, and reporting, ensuring data quality, scalability, and compliance.

• Implemented and managed Azure Data Lake solutions for storing and processing large material datasets, resulting in improved data accessibility and processing efficiency.

• Enhanced data integration and processing workflows using Azure Synapse Analytics and Databricks, improving processing speed and reliability.

• Engineered real-time data streaming solutions leveraging Azure Stream Analytics, providing actionable insights and faster decision-making.

• Created interactive dashboards and visualizations using Power BI, supporting clearer decision-making on material data.

• Automated deployment and testing processes using Azure DevOps for CI/CD pipelines, reducing deployment time and increasing testing accuracy.

• Optimized Azure SQL Databases for high-performance analytics, resulting in faster query responses and improved data analysis.

• Enhanced reporting accuracy and compliance by leveraging MS SSRS.

• Expanded data integration capabilities with cloud-based solutions through Azure Data Factory (ADF).

• Fostered improved project alignment and stakeholder satisfaction through effective business collaboration and engagement.

• Refined SSIS package development for data transformation and integration, enhancing overall data processing efficiency and accuracy.

• Constructed scalable data models to support analytical reporting and governance frameworks, improving consistency and reporting accuracy.

• Configured Apache Hadoop and Spark clusters on Azure HDInsight for big data processing, increasing capacity and speed.

• Collaborated using Jupyter Notebooks on Azure Databricks for data analysis and machine learning, enhancing teamwork and accuracy.

• Optimized data storage and query performance with Azure Synapse Analytics, resulting in faster data retrieval and improved efficiency.

• Applied Azure DevOps best practices for automating CI/CD of data workflows, increasing efficiency and reducing errors.

• Set up Azure Traffic Manager for load balancing and optimizing data service performance, enhancing service reliability and user experience.

• Monitored resource usage and costs with Azure Cost Management to maintain budget efficiency.
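
The Data Lake and Databricks work described in the bullets above generally follows a read-curate-write pattern. Here is a minimal, hypothetical sketch of it in PySpark; the storage account, container names, and the event_ts and record_id columns are assumptions made for illustration.

```python
# Minimal sketch: read raw files from ADLS Gen2 on Azure Databricks,
# curate them, and write a Delta table. Account, containers, and
# column names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = SparkSession.builder.appName("adls-curate-sketch").getOrCreate()

raw_path = "abfss://raw@examplestorage.dfs.core.windows.net/materials/"
curated_path = "abfss://curated@examplestorage.dfs.core.windows.net/materials/"

df = spark.read.option("header", "true").csv(raw_path)

curated = (df
           .withColumn("event_date", to_date(col("event_ts")))  # assumes an event_ts column
           .dropDuplicates(["record_id"])                       # assumes a record_id key
           .filter(col("record_id").isNotNull()))

# Delta Lake is the default table format on Databricks.
(curated.write
 .format("delta")
 .mode("overwrite")
 .partitionBy("event_date")
 .save(curated_path))
```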

C&S Wholesale Grocers Nov 2022 - Oct 2023

AWS Data Engineer Keene, New Hampshire, USA

Designed, constructed, and maintained data management systems and data analytic roadmaps, aligning with business and technology objectives.

• Developed and sustained data pipelines and workflows on AWS using AWS Glue, AWS Lambda, Amazon EMR, Amazon S3, and Amazon Redshift to improve processing efficiency.

• Engineered serverless solutions with AWS Lambda to automate data ingestion and processing, reducing manual intervention and accelerating processing.

• Established real-time data streaming solutions using Amazon Kinesis and Apache Kafka for high-velocity data processing, enhancing data availability and timeliness.

• Implemented CI/CD pipelines for data workflows with AWS CodePipeline, AWS CodeBuild, and GitHub Actions to ensure rapid and reliable deployments.

• Utilized AWS Athena for ad-hoc querying and analysis of large datasets stored in S3, enabling quick insights.

• Performed advanced data transformations using Python and PySpark within AWS Glue and EMR environments, enhancing data quality and usability.

• Constructed interactive dashboards and visualizations with Amazon QuickSight to drive data-driven decision-making.

• Ensured scalability and high availability of data infrastructure through Auto Scaling and AWS Elastic Load Balancing (ELB), supporting business continuity.

• Optimized and maintained relational databases with SQL on Amazon RDS and Amazon Redshift, and NoSQL workloads on DynamoDB, to ensure high performance and data integrity.

• Processed and analyzed large-scale datasets using Hadoop ecosystem tools such as HDFS, Apache Hive, and Apache Spark on Amazon EMR, enabling efficient big data processing.

• Developed and managed ETL workflows with AWS Glue, Apache NiFi, and Talend to systematically extract, transform, and load data from diverse sources into data lakes and warehouses.

• Constructed scalable big data solutions using Apache Spark, Hadoop, Amazon EMR, and Kinesis, supporting data-driven strategies.

• Integrated and deployed machine learning models with Amazon SageMaker, TensorFlow, and PyTorch to facilitate predictive analytics.

• Formulated data pipelines for ingestion, transformation, and analysis in Snowflake, leveraging its multi-cluster architecture and seamless AWS integration for enhanced processing.

• Deployed serverless data processing workflows with AWS Lambda to automate real-time ingestion, transformation, and event-driven tasks across AWS services, improving operational efficiency.

• Streamlined continuous integration and deployment using Jenkins, AWS Code Pipeline, and GitLab CI/CD to accelerate deployment speed and reliability.

• Managed and optimized data infrastructure on Linux and Windows platforms for seamless AWS integration, high availability, and robust security configurations.

• Developed interactive and insightful Power BI dashboards and reports by integrating data from AWS Redshift, SQL databases, and Snowflake.
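
The serverless ingestion bullets above usually take the form of an event-driven Lambda function. The following is a minimal, hypothetical sketch of an S3-triggered handler; the DynamoDB table name and manifest fields are invented for the example.

```python
# Minimal sketch: S3-triggered AWS Lambda that records newly landed objects
# so a downstream job can process them. Table and field names are
# illustrative placeholders.
import json
import urllib.parse

import boto3

dynamodb = boto3.resource("dynamodb")
manifest = dynamodb.Table("ingestion-manifest")  # hypothetical tracking table


def handler(event, context):
    """Register each newly created S3 object for downstream processing."""
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        size = record["s3"]["object"].get("size", 0)

        manifest.put_item(Item={
            "object_key": key,  # assumed partition key of the table
            "bucket": bucket,
            "size_bytes": size,
            "status": "PENDING",
        })

    return {"statusCode": 200, "body": json.dumps({"processed": len(records)})}
```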

Liberty Mutual Insurance (Infosys) Nov 2021 - Jul 2022

Data Engineer Bangalore, India

Modernized and transformed enterprise data solutions on Tableau, simplifying complex software designs by deconstructing them into clear, understandable diagrams.

• Automated manual tasks such as data ingestion, metric computation, and metadata management, streamlining data visualization and reporting processes.

• Converted multiple SQL Server and Oracle stored procedures into Hadoop implementations using Spark SQL, Hive, Scala, and Java, thereby improving processing efficiency and scalability.

• Established a production data lake capable of managing transactional processing operations using Hadoop ecosystem components, enhancing storage and retrieval capabilities.

• Developed PySpark and Spark SQL code to process data on Amazon EMR, ensuring efficient data transformations and analysis.

• Validated and cleansed data using Pig Latin statements and developed Pig macros, ensuring high data accuracy and reliability.

• Integrated Hadoop Big Data with ETL processes to streamline data extraction, loading, and transformation for ERP data, optimizing workflow efficiency.

• Conducted extensive exploratory data analysis using Teradata, enhancing dataset quality and generating actionable insights through Tableau visualizations.

• Leveraged Python libraries such as Pandas and NumPy to streamline data manipulation and analysis.

• Implemented natural language processing using the PyTorch library to enhance data analysis capabilities.

• Developed Tableau data visualizations to monitor day-to-day accuracy of models with incoming data.

• Presented and reported business insights using SSRS and Tableau dashboards, integrating various diagrams for comprehensive analysis.

• Utilized Jira for project management and Git for version control, ensuring collaborative and efficient program development.
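
To show the shape of the stored-procedure-to-Spark-SQL conversions described above, here is a minimal, hypothetical sketch; the claims table, its columns, and the target table are invented for illustration.

```python
# Minimal sketch: a stored-procedure-style aggregation re-expressed in
# Spark SQL. Table and column names are invented for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("proc-to-sparksql-sketch").getOrCreate()

# In a real migration the source would be Hive tables loaded into the lake.
monthly_claims = spark.sql("""
    WITH claim_totals AS (
        SELECT policy_id,
               date_trunc('MONTH', claim_date) AS claim_month,
               SUM(claim_amount)               AS total_amount,
               COUNT(*)                        AS claim_count
        FROM claims
        GROUP BY policy_id, date_trunc('MONTH', claim_date)
    )
    SELECT policy_id, claim_month, total_amount, claim_count
    FROM claim_totals
    WHERE total_amount > 0
""")

monthly_claims.write.mode("overwrite").saveAsTable("reporting.monthly_claims")  # hypothetical target
```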

IDFC FIRST Bank Jun 2019 - Oct 2021

Data Engineer Bangalore, India

Designed and developed scalable data models to support business analytics, ensuring optimal performance and scalability for enhanced decision-making.

• Constructed, optimized, and maintained data pipelines for processing large-scale datasets, significantly improving processing efficiency.

• Integrated structured and unstructured data from multiple sources using ETL tools and data integration frameworks, enhancing data accessibility and analysis.

• Managed relational databases including SQL Server, PostgreSQL, and MySQL to boost query performance and optimize storage efficiency.

• Crafted data transformation workflows using SQL, Python, and Apache Spark, thereby increasing readiness and speed for analysis.

• Leveraged Hadoop and Apache Spark for distributed data processing, ensuring efficient handling of large datasets and supporting data-driven decisions.

• Engineered data warehousing solutions using Snowflake, Redshift, and Teradata, improving storage and retrieval capabilities.

• Automated data pipelines using Apache Airflow and Apache NiFi, facilitating seamless data flow and process orchestration.

• Enforced data governance policies to maintain data integrity, security, and compliance with GDPR and CCPA regulations.

• Developed custom reporting and dashboard solutions using Power BI and Tableau, enhancing business intelligence with superior data visualization.

• Implemented cloud-based data solutions with Azure, enabling scalable and cost-effective data storage and processing to support business growth.

• Designed and deployed real-time data streaming solutions using Apache Kafka for live data ingestion and processing.
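
Finally, to illustrate the Apache Airflow pipeline automation noted above, here is a minimal, hypothetical DAG sketch; the task bodies and schedule are placeholders, and the schedule argument assumes Airflow 2.4 or newer.

```python
# Minimal sketch: a daily Airflow DAG wiring extract -> transform -> load.
# Callables and schedule are placeholders for illustration.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull data from source systems")  # placeholder


def transform():
    print("clean and reshape the extract")  # placeholder


def load():
    print("load curated data into the warehouse")  # placeholder


with DAG(
    dag_id="daily_etl_sketch",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```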

EDUCATION

Southern New Hampshire University 2022 - 2024

Master's, Information Technology

• GPA: 3.944

CERTIFICATIONS

• Professional Data Engineer Certification

• Azure Data Engineer Certification

• Azure Data Fundamentals

• AWS Data Engineer Associate


