NAGESH KUMAR KARUTURI
Email: ********@*****.***
Phone: 940-***-****
OBJECTIVE
To leverage my 17 years of experience in the IT industry, specifically as an Azure Data Engineer, along with my extensive knowledge of Big Data analysis, design, and development using Hadoop, Snowflake, Azure, Python, Data Lake, Scala, and PySpark. My objective is to apply this broad experience and technical expertise to design and implement high-performance data solutions that deliver significant business value and empower data-driven decision-making.
PROFESSIONAL SUMMARY
9+ years of experience as an Azure Data Engineer and in Big Data with Hadoop, Hive, MapReduce, Spark, Spark Core, Spark SQL, and the DataFrame/Dataset/RDD APIs.
Accomplished Data Engineer with over 8 years of experience designing and implementing scalable data ingestion pipelines using Big Data technologies, Microsoft Azure Cloud, Python, and PySpark, including on-premises to Azure cloud solutions.
Developed and deployed numerous data pipelines using Azure Data Factory.
Demonstrated expertise in leveraging Azure Databricks for distributed data processing, transformation, validation, cleansing, and ensuring data quality and integrity.
Secured and managed sensitive cryptographic keys and secrets by leveraging Azure Key Vault, ensuring robust data protection and compliance in cloud-based environments.
Developed end-to-end data workflows using Azure Logic Apps, Azure Functions, and serverless solutions.
Demonstrated expertise in utilizing Azure Event Hub for real-time streaming data ingestion.
Proficient in utilizing Azure Synapse Pipelines for orchestrating and managing complex data integration and transformation workflows.
Extensive experience in working with Azure Blob Storage for efficient storage and retrieval of unstructured and semi-structured data.
Wrote complex SnowSQL and Python queries in Snowflake.
Created databases, schemas, virtual warehouses, tables, views, stages, sequences, and database replication in the Snowflake Cloud Data Warehouse.
Used replication tasks to maintain data backups in Snowflake.
Designed and implemented complex ETL workflows using Apache Airflow, ensuring seamless data pipeline orchestration across diverse data sources.
Developed custom scheduling strategies in Airflow to automate daily, weekly, and monthly data ingestion and transformation tasks, optimizing resource usage and ensuring timely data availability.
Optimized Airflow DAGs for high performance and scalability, reducing execution time by 30% through efficient task parallelization and resource management.
Worked on advanced Snowflake concepts such as virtual warehouse sizing, query performance tuning, Data Sharing, UDFs, Zero-Copy Clone, Time Travel, and data pipelines (Streams and Tasks); a brief Python sketch of Time Travel and cloning follows this summary.
Followed Snowflake best practices such as clustering views and tables, creating materialized views, enabling the result cache, and resizing and multi-clustering warehouses to improve performance in Snowflake.
Designed dimensional models, data lake architecture, and Data Vault 2.0 in Snowflake, and used the Snowflake logical data warehouse for compute.
Utilized Azure DevOps to streamline software development and deployment processes, enabling efficient collaboration, version control, and automated CI/CD pipelines for accelerated software delivery.
Proficient in scripting languages such as Python, PySpark and Scala, enabling seamless integration of custom functionalities into data pipelines.
Hands-On working experience with a diverse range of file formats, including CSV, JSON, Parquet, and Avro, to efficiently store, process, and exchange data within data engineering pipelines and analytics workflows.
Highly skilled in utilizing Hadoop, HDFS, MapReduce, Hive, and Spark SQL for efficient ETL tasks, real-time data processing, and analytics.
Exceptional command over Kafka streaming technology, adeptly utilizing its distributed messaging capabilities to construct resilient and high-performing data flows.
Demonstrated mastery in utilizing Apache Sqoop for seamless import and export of data between HDFS and Hive, and expertly configured and managed workflows using Apache Oozie and Control-M for effective scheduling and management of Hadoop jobs.
Expert in optimizing query performance in Hive and Spark by designing and implementing Bucketing and Partitioning strategies to enable efficient data retrieval and storage optimization.
Developed streamlined data ingestions and integrations in large-scale data engineering initiatives utilizing Tez.
Configured and managed Zookeeper to ensure efficient coordination and synchronization of distributed data processing systems.
Demonstrated ability to design and implement data integration strategies between Snowflake and external systems, leveraging technologies such as Apache Airflow or custom-built orchestration frameworks to ensure seamless data movement and synchronization.
Demonstrated expertise in implementing advanced serialization techniques to optimize data storage, transfer, and deserialization processes.
Adept at designing cutting-edge, cloud-based data warehousing solutions using Snowflake on Azure, optimizing schemas, tables, and views for streamlined data storage and retrieval.
Demonstrated expert-level proficiency in using SnowSQL to retrieve and manipulate large datasets in Snowflake data warehouses.
Developed, enhanced and maintained Snowflake database applications, including crafting logical and physical data models, incorporating necessary changes and improvements.
Expertly defined roles and privileges to ensure controlled access to various database objects within the Snowflake ecosystem.
Collaborated seamlessly with data analysts and stakeholders to implement well-aligned data models, structures, and designs. Prior experience in Java/J2EE development roles across various domains.
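Illustrative sketch (not tied to a specific project): a minimal example of exercising the Snowflake features referenced above, Time Travel and zero-copy cloning, from Python via the Snowflake connector. Connection settings and object names (ANALYTICS_WH, SALES_DB, ORDERS) are hypothetical placeholders.

```python
# Minimal sketch: Snowflake Python connector exercising zero-copy clone and
# Time Travel. All connection values and object names are hypothetical.
import os
import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="ANALYTICS_WH",
    database="SALES_DB",
    schema="PUBLIC",
)

cur = conn.cursor()
try:
    # Zero-copy clone: an instant, storage-efficient copy of a table for testing.
    cur.execute("CREATE OR REPLACE TABLE ORDERS_CLONE CLONE ORDERS")

    # Time Travel: query the table as it existed one hour ago.
    cur.execute("SELECT COUNT(*) FROM ORDERS AT(OFFSET => -3600)")
    print("Row count one hour ago:", cur.fetchone()[0])
finally:
    cur.close()
    conn.close()
```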
WORK EXPERIENCE
Nov 2022 – Present: Freeman.
Nov 2006 – Oct 2022: Tata Consultancy Services.
Feb 2006 – Oct 2006: Siemens, as Software Engineer.
EDUCATION
Bachelor of Science in Computer Science, 2001, Kakatiya University.
Master of Science in Computer Science, 2003, Periyar University.
TECHNICAL SKILLS:
Cloud Technology : Azure Data Factory, Azure Databricks, Logic Apps, Function Apps, Snowflake, Azure DevOps.
Big Data Technologies : HDFS, MapReduce, Hive, Sqoop, Oozie, Zookeeper, Kafka, Apache Spark, Spark Streaming
Operating Systems : Amazon Linux AMI, Linux (Ubuntu, Centos, Red Hat, Debian), Solaris, Windows.
Java/J2EE Technologies : Java, Servlets/JSP, Struts, JMS, XML, BEA WebLogic, Apache web server, JBoss
Virtualization Tools : VMware Workstation, Oracle VirtualBox, Vagrant.
Version Control Tools : GIT, SVN, GitHub, Bitbucket, GitLab.
CI/CD Tools : Jenkins, GitHub Actions, Azure DevOps.
Containerization : Docker, Kubernetes.
Repo Management : Docker Hub.
Web/ App Servers : Nginx, Apache Tomcat, Jetty, Apache HTTP Server, WebLogic.
Scripting Languages : Bash, Shell, Python (boto3).
SDLC : Agile, Waterfall.
Bug Tracking Tools : JIRA.
Databases : SQL Server, MySQL, DynamoDB.
PROFESSIONAL EXPERIENCE:
Client : Freeman, USA
Role : Azure Data Migration Engineer
Location : Freeman, USA
Project Duration : Nov 2022 – Present
Designed and implemented end-to-end data pipelines using Azure Data Factory to facilitate efficient data ingestion, transformation, and loading (ETL) from diverse data sources into Snowflake data warehouse.
Implemented Azure Data Factory (ADF) extensively for ingesting data from different source systems, both relational and unstructured, to meet business functional requirements.
Developed pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back scenarios.
Deployed Azure Data Lake Storage as a reliable and scalable data lake solution, implementing efficient data partitioning and retention strategies to store and manage both raw and processed data effectively.
Employed Azure Blob Storage for optimized data file storage and retrieval, implementing advanced techniques like compression and encryption to bolster data security and streamline storage costs.
Integrated Azure Logic Apps seamlessly into the data workflows, ensuring comprehensive orchestration and triggering of complex data operations based on specific events, enhancing overall data pipeline efficiency.
Gained strong experience working with Azure Blob and Data Lake Storage and loading data into Azure Synapse Analytics (SQL DW).
Designed and developed metadata control tables for Data Factory pipelines to pass parameters to the pipelines dynamically using pipeline parameters.
Troubleshot and resolved data processing issues promptly, minimizing downtime and ensuring uninterrupted data availability for analysis.
Managed codebase and deployments using Git, facilitating seamless collaboration with team members, and ensuring version control integrity.
Utilized SQL to normalize data structures and schemas, optimizing data storage and improving query performance for analytical workloads, while ensuring consistency and adherence to data governance standards.
Developed and deployed Azure Functions to handle critical data preprocessing, enrichment, and validation tasks within the data pipelines, elevating the overall data quality and reliability.
Designed and implemented end-to-end data solutions (storage, integration, processing, visualization) in Azure.
Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL.
Implemented medium- to large-scale solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
Designed and implemented migration strategies for traditional systems moving to Azure (lift and shift, Azure Migrate, and other third-party tools).
Implemented comprehensive monitoring and alerting for Airflow tasks using built-in logging and external tools like Prometheus and Grafana, ensuring proactive issue detection and resolution.
Established robust error handling and retry mechanisms within Airflow workflows, minimizing data pipeline failures and ensuring data integrity (a minimal DAG sketch appears at the end of this section).
Orchestrated complex Spark jobs within Airflow, leveraging its capabilities to handle large-scale data processing tasks efficiently, including both batch and real-time processing.
Developed custom Airflow operators and plugins to extend functionality, enabling seamless integration with proprietary tools and third-party APIs.
Implemented data quality checks and validation steps within Airflow pipelines to ensure data accuracy and consistency.
Utilized version control systems (Git) and CI/CD pipelines to manage and deploy Airflow DAGs, ensuring consistent and reliable deployment across different environments.
Ensured data security and compliance within Airflow workflows by implementing role-based access control (RBAC), encryption, and adherence to industry standards and best practices.
Developed dashboards and visualizations to help business users analyze data and provided data insights to upper management, focusing on Microsoft products such as SQL Server Reporting Services (SSRS) and Power BI.
Engaged with business users to gather requirements, design visualizations, and provide training on self-service BI tools.
Developed conceptual solutions and created proofs of concept to demonstrate solution viability.
Technically guided projects through to completion within target timeframes.
Collaborated with application architects and DevOps teams.
Identified and implemented best practices, tools, and standards; architected and implemented ETL and data movement solutions using Azure Data Factory and SSIS, creating and running SSIS packages in ADF V2 on the Azure-SSIS Integration Runtime.
Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL, recreating existing application logic and functionality in the Azure Data Lake, Data Factory, SQL Database, and SQL Data Warehouse environment; gained experience implementing DWH/BI projects using Azure Data Factory.
Migrated data from traditional database systems to Azure databases.
Interacted with business analysts, users, and SMEs on requirements.
Designed logical and physical data models for the staging, DWH, and data mart layers; recommended, designed, and constructed policies and standards that impact infrastructure operations and services and improve overall business performance.
Provided leadership and work guidance to less experienced personnel; worked with Microsoft on-premises data platforms, specifically SQL Server and related technologies such as SSIS, SSRS, and SSAS.
Managed Azure Data Lake Storage (ADLS) and Data Lake Analytics, integrating them with other Azure services.
Identified potential problems and recommended alternative technical solutions.
Participated in technical architecture documents, project design, and implementation discussions; handled Azure Automation through runbook creation and migration of existing .PS1 scripts, including authorizing, configuring, and scheduling.
Configured Network Security Groups (NSGs); worked within and across Agile teams to design, implement, test, and support technical solutions across a full stack of technologies.
Collaborated with application architects on moving Infrastructure-as-a-Service (IaaS) applications to Platform-as-a-Service (PaaS).
Designed and implemented high-availability (HA) systems (24x7x365), with expertise in setting up high availability and recoverability of databases using SQL Server technologies, including Always On on Azure VMs.
Architected, designed, and validated the Azure Infrastructure-as-a-Service (IaaS) environment.
Applied cloud security architecture, governance, best practices, and tools.
Performed Azure cloud build and automation using ARM templates, deploying Azure Resource Manager JSON templates from PowerShell; worked across the Azure suite: Azure SQL Database, Azure Data Lake, Azure Data Factory, Azure SQL Data Warehouse, and Azure Analysis Services.
Worked on SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS), SQL Server Reporting Services (SSRS), and T-SQL (stored procedures, triggers); familiar with Excel PowerPivot, Power View, and Power BI.
Collaborated closely with cross-functional teams including data scientists, data analysts, and business stakeholders, ensuring alignment with data requirements and delivering scalable and reliable data solutions.
Environment: Azure Data Factory, Azure Databricks, Snowflake Data Warehouse, Azure Event Hubs, Azure Functions, Azure Data Lake Storage, Azure Blob Storage, Azure Logic Apps, Azure Machine Learning, Azure Monitor, Power BI, Azure Analysis Services, Azure Purview, Airflow.
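Illustrative sketch of the Airflow pattern described in this section: a daily ingestion DAG with retries and a data-quality gate. Assumes Airflow 2.x; the DAG ID, task names, and callables are hypothetical placeholders, not the project's actual code.

```python
# Minimal sketch of a scheduled ingestion DAG with retries and a data-quality
# check. Assumes Airflow 2.x; names and callables are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_source(**context):
    # Placeholder: pull the daily extract from the source system.
    print("extracting data for", context["ds"])


def validate_rows(**context):
    # Placeholder data-quality check; raising fails the task and triggers retries/alerts.
    row_count = 100  # would come from the staging layer in a real pipeline
    if row_count == 0:
        raise ValueError("No rows ingested - failing the run")


default_args = {
    "owner": "data-engineering",
    "retries": 3,
    "retry_delay": timedelta(minutes=10),
    "email_on_failure": True,
}

with DAG(
    dag_id="daily_ingestion",
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 2 * * *",  # daily at 02:00
    catchup=False,
    default_args=default_args,
) as dag:
    extract = PythonOperator(task_id="extract_source", python_callable=extract_source)
    validate = PythonOperator(task_id="validate_rows", python_callable=validate_rows)
    extract >> validate
```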
Company : Tata Consultancy Services
Client : ING Bank
Role : Azure Data Engineer
Location : Bangalore, India
Project Duration : Jun 2020 – Oct 2022
Designed and developed POCs in Spark using Scala to compare the performance of Spark with MapReduce and Hive.
Hands-on experience with Azure Cloud Services, Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.
Created Batch & Streaming Pipelines in Azure Data Factory (ADF) using Linked Services/Datasets/Pipeline/ to Extract, Transform and load data.
Created and managed various types of Snowflake tables, including transient, temporary, and persistent tables, to cater to specific data storage and processing needs.
Implemented advanced partitioning techniques in Snowflake to significantly enhance query performance and expedite data retrieval.
Defined robust roles and access privileges within Snowflake to enforce strict data security and governance protocols.
Implemented regular expressions in Snowflake for seamless pattern matching and data extraction tasks.
Developed and implemented Snowflake scripting solutions to automate critical data pipelines, ETL processes, and data transformations.
Developed an event-driven data mesh that combines the scale and performance of data in motion with product-focused rigor and self-service capabilities, putting data at the front and center of both operational and analytical use cases.
The event-driven data mesh reduced barriers between operational and analytical use cases while enabling both real-time applications and batch-based jobs.
Created Azure Data Factory (ADF) batch pipelines to ingest data from relational sources into Azure Data Lake Storage (ADLS Gen2) in an incremental fashion and then load it into Delta tables after cleansing (see the sketch at the end of this section).
Created Azure Logic Apps to trigger when a new email with an attachment is received and load the file to Blob Storage.
Implemented CI/CD pipelines using Azure DevOps in cloud with GIT, Maven, along with Jenkins plugins.
Built a Spark Streaming application to perform real-time analytics on streaming data.
Worked on the Snowflake connector for developing Python applications.
Created roles and access-level privileges and handled Snowflake admin activity end to end.
Implemented Snowflake Continuous Data Loading, Snowflake Time Travel & Fail-safe and secure Data Sharing.
Created conceptual, logical, and physical models for OLTP, Data Warehouse Data Vault and Data Mart Star/Snowflake schema implementations.
Built the logical and physical data models for Snowflake as per the required changes.
Loaded tables from Azure Data Lake into Azure Blob Storage to push them to Snowflake.
Developed a Spark Streaming application that integrates with event-driven architectures such as Azure Functions or Azure Logic Apps.
Automated critical data workflows and monitoring tasks using Python scripting, saving over 20 hours of manual effort per week.
Used Spark Streaming to process events in real time and trigger downstream workflows based on the results.
Involved in creating Hive tables and loading and analyzing data using Hive queries.
Designed and developed custom Hive UDFs.
Used the JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
Involved in migrating ETL processes from Oracle to Hive to validate easier data manipulation.
Implemented reprocessing of failed messages in Kafka using offset IDs.
Used HiveQL to analyze the partitioned and bucketed data.
Worked on Informatica Data Quality to resolve customer address-related issues.
Designed a STAR schema for the detailed data marts and Plan data marts involving shared dimensions (Conformed).
Ensured the feasibility of the logical and physical design models.
Developed a Spark job in Java that indexes data into Azure Functions from external Hive tables stored in HDFS.
Wrote Hive queries on the analyzed data for aggregation and reporting.
Developed Sqoop jobs to load data from RDBMS into external systems such as HDFS and Hive.
Developed Spark applications using PySpark and Spark-SQL for data extraction, transformation, and aggregation from multiple file formats.
Worked on converting dynamic XML data for ingestion into HDFS.
Transformed and copied data from JSON files stored in Data Lake Storage into an Azure Synapse Analytics table using Azure Databricks.
Used Azure Databricks, Azure Storage Accounts, and related services for source stream extraction, cleansing, consumption, and publishing across multiple user bases.
Created resources, using Azure Terraform modules, and automated infrastructure management.
Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
Loaded data from the UNIX file system into HDFS.
Configured Spark Streaming to receive real-time data from Apache Flume and stored the stream data in Azure Tables using Scala.
Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
Used several RDD transformations to filter the data ingested into Spark SQL.
Used HiveContext and SQLContext to integrate the Hive metastore with Spark SQL for optimum performance.
Used the Git version control system to access repositories and coordinate with CI tools.
Environment: Spark SQL, HDFS, Hive, Pig, Apache Sqoop, Java (JDK SE 6, 7), Scala, Shell scripting, Linux, MySQL, Oracle Enterprise DB, IntelliJ, CI/CD, Oracle, Subversion, and Agile Methodologies.
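Illustrative sketch of the ADLS-to-Delta cleansing step mentioned in this section. The storage account, container paths, and column names are hypothetical, and Delta Lake is assumed to be available on the cluster (as it is on Databricks).

```python
# Minimal sketch: cleanse JSON landed in ADLS Gen2 and append it to a Delta
# table. Paths, storage account, and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, trim

spark = SparkSession.builder.appName("json-to-delta").getOrCreate()

raw_path = "abfss://raw@mystorageaccount.dfs.core.windows.net/customers/"
delta_path = "abfss://curated@mystorageaccount.dfs.core.windows.net/customers_delta/"

raw = spark.read.json(raw_path)

cleansed = (
    raw.dropDuplicates(["customer_id"])
       .filter(col("customer_id").isNotNull())
       .withColumn("customer_name", trim(col("customer_name")))
)

# Append into the Delta table (Delta Lake assumed available, e.g. on Databricks).
cleansed.write.format("delta").mode("append").save(delta_path)
```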
Company : Tata Consultancy Services
Client : Lexmark
Role : Azure Data Engineer
Location : Bangalore, India
Project Duration : Jan 2018 – May 2020
Designed and implemented scalable data ingestion pipelines using Azure Data Factory, efficiently ingesting data from diverse sources such as SQL databases, CSV files, and REST APIs.
Developed robust data processing workflows leveraging Azure Databricks and Spark for distributed data processing and transformation tasks.
Ensured data quality and integrity through comprehensive data validation, cleansing, and transformation operations performed using Azure Data Factory and Databricks.
Leveraged Azure Synapse Analytics to seamlessly integrate big data processing and analytics capabilities, empowering data exploration and insights generation.
Automated data pipelines and workflows by configuring event-based triggers and scheduling mechanisms, streamlining data processing and delivery, which resulted in a 48% reduction in manual intervention.
Implemented comprehensive data lineage and metadata management solutions, ensuring end-to-end visibility and governance over data flow and transformations.
Identified and resolved performance bottlenecks within data processing and storage layers, optimizing query execution and reducing data latency.
Applied advanced techniques such as partitioning, indexing, and caching in Snowflake and Azure services to enhance query performance and reduce processing time.
Conducted meticulous performance tuning and capacity planning exercises, ensuring scalability and maximizing efficiency within the data infrastructure.
Demonstrated proficiency in scripting languages like Python and Scala, enabling efficient data manipulation and integration of custom functionalities.
Developed and fine-tuned high-performance Spark jobs to handle complex data transformations, aggregations, and machine learning tasks on large-scale datasets.
Developed end-to-end data pipelines using Kafka, Spark, and Hive, enabling seamless data ingestion, transformation, and analysis.
Leveraged Kafka and Spark Streaming to process and analyze streaming data, contributing to real-time data processing and insights generation and improving real-time analytics capabilities by 30% (see the sketch at the end of this section).
Utilized Spark core and Spark SQL scripts using Scala to expedite data processing and enhance performance.
Architected and implemented a cloud-based data warehousing solution utilizing Snowflake on Azure, harnessing its exceptional scalability and performance capabilities.
Created and optimized Snowflake schemas, tables, and views to facilitate efficient data storage and retrieval, catering to advanced analytics and reporting requirements.
Collaborated closely with data analysts and business stakeholders to deeply understand their needs and implement well-aligned data models and structures within Snowflake.
Executed Hive scripts through Hive on Spark and SparkSQL, effectively supporting ETL tasks, maintaining data integrity, and ensuring pipeline stability.
Proficiently worked within Agile methodologies, actively participating in daily stand-ups and coordinated planning sessions.
Environment: Azure Databricks, Data Factory, Snowflake Data Warehouse, Logic Apps, Function Apps, Spark SQL, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, Jenkins, Kafka, Power BI, Spark Streaming.
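Illustrative sketch of the Kafka plus Spark Structured Streaming pattern referenced in this section. The broker address, topic, schema, and output paths are hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath.

```python
# Minimal sketch: consume JSON events from Kafka with Spark Structured
# Streaming and persist them as Parquet. All names and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

event_schema = StructType([
    StructField("device_id", StringType()),
    StructField("metric", DoubleType()),
    StructField("event_time", StringType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "device-events")
    .option("startingOffsets", "latest")
    .load()
    # Kafka delivers bytes; cast the value to string and parse the JSON payload.
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream.outputMode("append")
    .format("parquet")
    .option("path", "/data/streams/device_events/")
    .option("checkpointLocation", "/data/checkpoints/device_events/")
    .start()
)
query.awaitTermination()
```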
Company : Tata Consultancy Services
Client : Deutsche Bank
Role : Data Engineer
Location : Singapore / India
Project Duration : May 2015 – Dec 2017
Imported data using Sqoop to load data from MySQL into HDFS on a regular basis.
Worked with Data Lakes and big data ecosystems (Hadoop, Spark, Hortonworks, Cloudera).
Loaded and transformed large sets of structured, semi-structured, and unstructured data.
Wrote Hive queries for data analysis to meet business requirements.
Built HBase tables by leveraging HBase integration with Hive on the analytics zone.
Used Kafka and Spark Streaming to process streaming data for specific use cases.
Developed data pipelines using Flume and Sqoop to ingest customer behavioral data histories into HDFS for analysis.
Analyzed the Hadoop cluster using different big data analytic tools, including Hive and MapReduce.
Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze data.
Wrote Hive queries for data analysis to meet the specified business requirements by creating Hive tables and working on them using Hive QL to simulate MapReduce functionalities.
Implemented UNIX scripts to define the use case workflow and to process the data files and automate the jobs.
Implemented automation for deployments using YAML scripts for large-scale builds and releases.
Migrated the existing data to Hadoop from RDBMS (Oracle) using Sqoop for processing the data.
Implemented CI/CD pipelines to build and deploy the projects in the Hadoop environment.
Used JIRA to manage issues and the project workflow.
Worked on Spark using Python (PySpark) and Spark SQL for faster testing and processing of data (see the sketch at the end of this section).
Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
Used Zookeeper to coordinate, synchronize and serialize the servers within the clusters.
Worked on Oozie workflow engine for job scheduling.
Used Git as the version control tool to maintain the code repository.
Worked on Spark SQL using PySpark for analyzing and processing data.
Worked closely with the team to fix JVM-related issues.
Environment: Sqoop, MySQL, HDFS, Apache Spark, Scala, Hive, Hadoop, Cloudera, HBase, Kafka, MapReduce, Zookeeper, Oozie, RDBMS, Python, PySpark, shell scripting, Ambari, JIRA.
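Illustrative sketch of the PySpark and Spark SQL batch analysis over Hive tables described in this section. The database, table, and column names are hypothetical placeholders.

```python
# Minimal sketch: aggregate a partitioned Hive table with Spark SQL and write
# the result back to HDFS as Parquet. Names and paths are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("hive-aggregation")
    .enableHiveSupport()          # gives Spark SQL access to the Hive metastore
    .getOrCreate()
)

# Summarize customer activity per day for downstream reporting.
daily_summary = spark.sql("""
    SELECT event_date, customer_id, COUNT(*) AS events
    FROM analytics.customer_activity
    WHERE event_date >= '2017-01-01'
    GROUP BY event_date, customer_id
""")

daily_summary.write.mode("overwrite").parquet(
    "/warehouse/summaries/daily_customer_activity/"
)
```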
Company : Tata Consultancy Services
Client : Deutsche Bank
Role : Java Developer / Java Tech Lead
Location : Singapore / Australia / UK / India
Project Duration : Nov 2006 – April 2015
Involved in design, development, test changes for enhancements and setup of the CIT, SIT, UAT, staging environments.
Prepared high-level and low-level design documents for the enhancements.
Reviewed code, test cases, and technical design documents prepared by peers.
Ensured timely and continuous communication with the onshore counterpart.
Identified, recommended, and participated in continuous improvement activities.
Ensured adherence to the application SLA and provided input to operational reporting.
Provided reports to management regarding task/assignment status.
Analyzed, determined, and documented service requests and business requirements, and developed design specifications for required software modifications.
Worked with client business process analysts, application architects, and Accenture team leads/business liaisons to translate service requests and business requirements into Interact and interface configuration/software requirements.
Supported current configurations/systems by providing customer service, maintenance, and problem resolution.
Mobilized and motivated the team to generate ideas that could deliver re-engineering savings and to pursue performance and productivity improvement opportunities.
Ensured systems met functional and performance standards.
Coordinated among affected groups for code drops, UAT deployments, and outages through bridge and conference calls.
Coordinated with the project team, change management team, and system assurance for smooth deployment of changes to production.
Served as the single point of contact for escalations from the customer and the team.
As a Release coordinator, was involved in set up of the CIT, SIT and UAT environments from scratch.
Conducted ordered and organized deployments that respected all change and release management processes.
Environment: Java, Servlets/JSP, Struts, JMS, XML, BEA WebLogic, APACHE web server, Maven, Ant, Oracle, Jenkins, Nginx, IBM MQ and JMS.