Dhanumjaya Naidu
SQL Developer - ETL | AI | Big Data | Data Science
Phone: +1-316-***-****
Email: ****************@*****.***
LinkedIn: www.linkedin.com/in/dhanunjai1
PROFESSIONAL SUMMARY:
6+ years of experience in IT, specializing in software development, data engineering, and performance optimization across diverse industry sectors.
Expertise in C/C++ architecture, including low-level system programming, memory management, multithreading, and performance tuning.
Mobile device experience (Android/iOS), with hands-on development, debugging, and optimization of mobile applications for seamless user experience.
Familiarity with Machine Learning models, implementing AI-driven solutions in data processing, predictive analytics, and automation workflows.
Experience with barcode scanners, integrating and optimizing barcode scanning technology for various applications, ensuring high-speed and accurate data capture.
Performance tuning expert, optimizing applications, databases, and algorithms to enhance speed, efficiency, and scalability.
Strong knowledge of Big Data technologies, including Hadoop ecosystem, Spark (PySpark), Hive, Kafka, and real-time data ingestion pipelines.
Designed and implemented job scheduling workflows using Automic, Control-M, and Apache Airflow to automate ETL pipelines and data processing tasks across Azure, AWS, and GCP environments.
Proficient in cloud platforms, including AWS (EC2, EMR, S3, Lambda, Glue, Athena, DynamoDB, IAM) and Azure (ADF, ADLS, Synapse, Databricks, Blob Storage, SQL Server, DevOps) for cloud-based data processing and analytics.
Experience in AWS EMR clusters, leveraging PySpark for large-scale data processing and storing results in AWS S3.
Expertise in SQL-based technologies, including Hive, Oracle, SQL Server, MySQL, T-SQL, PL/SQL, with strong skills in data modeling, querying, and optimization.
Optimized Spark and Python-based data workflows in Airflow by tuning parallel execution, implementing dynamic retries, and integrating with Azure Data Factory, Snowflake, and Databricks for scalable data processing.
Experience in data pipelines, developing and managing ETL workflows using Sqoop, Spark, HiveQL, Informatica, and Power BI for effective data integration and visualization.
Strong background in real-time data processing, using Apache Kafka and Spark Streaming for ingesting and analyzing streaming data.
Hands-on experience with Azure services, such as Azure Data Lake, Synapse Analytics, Databricks, Data Factory, Logic Apps, and EventHub, facilitating cloud-based data solutions.
CI/CD pipeline implementation, using Jenkins, Bitbucket, and GitHub, ensuring seamless software development and deployment.
Experience in system-level debugging and troubleshooting, identifying performance bottlenecks and optimizing software for enhanced reliability.
Proficient in writing complex Spark SQL, Spark UDFs, and MapReduce jobs in Java, ensuring high-performance data transformations.
Familiar with Agile and Waterfall methodologies, collaborating effectively with cross-functional teams, managing client interactions, and delivering high-quality software solutions.
Excellent problem-solving and communication skills, with a strong ability to analyze technical challenges and implement innovative solutions.
TECHNICAL SKILLS:
Programming Languages
C, C++, Java, Python, SQL, PL/SQL, HiveQL, Scala, T-SQL, PostgreSQL
Big Data Technologies
HDFS, MapReduce, Hive, Sqoop, Oozie, Zookeeper, Kafka, Apache Spark, Spark Streaming, PySpark
Cloud Services
AWS (EC2, EMR, S3, Athena, Lambda, Glue, DynamoDB, IAM), Azure (Data Factory, Databricks, Synapse, DevOps, Blob Storage, ADLS, EventHub, Logic Apps, Function Apps), Snowflake, Google Cloud Platform (GCP)
Machine Learning & AI
ML Models Deployment, TensorFlow, Scikit-learn, AI-driven Data Processing, Data Wrangling
Performance Optimization
Query Tuning, Indexing, Spark Optimization, Partitioning, Bucketing, Workload Classification
Data Warehousing & ETL
Azure Synapse Analytics, Teradata, Informatica, Power BI, SSIS, SSRS
Streaming & Real-Time Processing
Apache Kafka, Spark Streaming, AWS Kinesis, Event-driven Data Pipelines
CI/CD & DevOps
Jenkins, Git, GitHub, Bitbucket, Azure DevOps, Docker, Kubernetes, Terraform
Databases
Oracle 11g/12c, MS SQL Server (2016/2014/2012), MySQL, Azure SQL DB, Cosmos DB, DynamoDB
IDE & Development Tools
Visual Studio, Eclipse, IntelliJ IDEA, Xcode, Android Studio
Software Development Methodologies
Agile (Scrum, Kanban), Waterfall, Test-Driven Development (TDD)
Version Control & Build Tools
Git, GitHub, Bitbucket, Maven, Gradle, Visual Studio, Eclipse
PROFESSIONAL EXPERIENCE:
Tailored Brands - Houston, TX June 2024 - Present
Senior Data Engineer (SQL/ETL)
Responsibilities:
Implemented a real-time data ingestion pipeline using PostgreSQL and Apache Kafka, enabling seamless processing and analysis of streaming data for business intelligence.
Developed event-driven automation in Control-M and Automic, triggering real-time data ingestion from Kafka, Event Hub, and Cosmos DB, ensuring seamless integration with SQL and NoSQL databases.
Developed and optimized C/C++ applications for high-performance data processing and analytics, ensuring low-latency execution for critical business workflows.
Migrated legacy ETL processes from SAS to SQL, optimizing data workflows and improving processing efficiency.
Converted complex SAS scripts into SQL stored procedures to streamline data transformation and integration.
Analyzed existing SAS-based ETL logic and re-engineered it using SQL for better performance and scalability.
Worked extensively with mobile devices (Android/iOS) to integrate barcode scanning solutions, leveraging Zebra SDKs and Honeywell APIs for real-time inventory management.
Collaborated with data scientists to integrate ML-driven insights into data pipelines using Azure Databricks and Python.
Developed and maintained automated data pipeline workflows using Apache Airflow, ensuring seamless scheduling and monitoring of ETL processes (see the DAG sketch following this section).
Tuned SQL queries, C++ applications, and PySpark transformations for high-efficiency data processing.
Integrated job scheduling tools with CI/CD pipelines in Azure DevOps and Terraform, enabling automated deployment of data engineering workloads in cloud and on-premises environments.
Integrated Snowflake with Azure services, including Azure Data Factory and Blob Storage, enabling seamless data migration and orchestration from on-premises systems to the cloud.
Implemented job scheduling automation using Azure Data Factory, optimizing data refresh and reducing manual intervention.
Configured and managed workload automation tools like Control-M to streamline batch job execution and improve system efficiency.
Designed and developed Azure Synapse dedicated SQL pools, incorporating materialized views, column store indexes, and Python-based stored procedures for optimized data retrieval.
Built scalable ETL pipelines using DBT, ensuring data transformation consistency and automation across diverse data sources.
Worked closely with teams to validate SAS to SQL migration results, ensuring data accuracy and consistency.
Optimized Azure Synapse workloads, leveraging Azure Databricks cluster policies, workload importance classification, and parallel pipeline execution via Azure Data Factory with Python monitoring.
Implemented CI/CD pipelines in Azure DevOps, automating data solution deployments to improve efficiency and reduce release cycle time.
Designed and scheduled cloud-based automation scripts in Python and Shell scripting to handle data processing, backups, and system monitoring.
Documented SAS to SQL conversion processes, providing clear guidelines for future data migration efforts.
Worked with Confluence for collaborative documentation, maintaining knowledge bases, data dictionaries, and best practices for data engineering workflows.
Developed barcode scanning and tracking solutions, integrating with PostgreSQL and Hasura's real-time capabilities for live data updates across retail applications.
Conducted SQL development, unit testing, and performance tuning, resolving issues based on defect reports and ensuring high-quality database performance.
Built and optimized scalable analytics applications with Spark SQL, Presto, and BigQuery, using their Python client libraries.
Implemented security best practices, including row-level security, dynamic data masking, and managed identity-based authentication in Azure Synapse and Databricks for secure data movement.
Led the migration of legacy data systems to a modern data lake architecture, ensuring business continuity while improving data accessibility and analytics capabilities.
Developed and maintained data profiling solutions, using Ganglia metrics, PySpark profiling functions, and Azure Data Factory monitoring to enhance data governance.
Implemented payment platform solutions, leveraging Azure Databricks Delta Live Tables, Unity Catalog, and MLflow, orchestrating complex Python and PySpark ML pipelines via Azure Synapse and message queues.
Worked with Informatica Data Transformations, parsing complex data files and loading structured and semi-structured data into Snowflake and Azure Synapse.
Automated testing and documentation in DBT, leveraging version control to track changes and ensure smooth collaboration among team members.
Environment: C, C++, Python, SQL, PL/SQL, T-SQL, PostgreSQL, Apache Kafka, Spark, PySpark, Hadoop, Hive, Presto, BigQuery, Snowflake, Azure (Data Factory, Synapse, Databricks, Blob Storage, ADLS, DevOps, EventHub), Android, iOS, Barcode Scanning (Zebra, Honeywell), SDK Integration, DBT, Informatica, Azure Synapse Analytics, SQL Server, Teradata, Cosmos DB, MySQL, Azure DevOps, Jenkins, GitHub, Bitbucket, Docker, Kubernetes, Terraform, Query Tuning, Indexing, Workload Classification, Multithreading, Memory Management, Confluence, Jira, Agile (Scrum, Kanban)
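The Airflow scheduling and retry pattern referenced above can be illustrated with a minimal sketch; the DAG id, task names, and extract callables below are hypothetical placeholders (Airflow 2.x assumed), and the actual pipelines integrated with ADF, Snowflake, and Databricks rather than simple Python callables.

# Minimal Airflow DAG sketch: parallel extracts with retries, fanning into a single load step.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

default_args = {
    "retries": 3,                               # retry transient failures
    "retry_delay": timedelta(minutes=5),
    "retry_exponential_backoff": True,
}

def extract_source(source_name: str) -> None:
    # Placeholder: pull one source table into the staging layer.
    print(f"extracting {source_name}")

with DAG(
    dag_id="daily_sales_etl",                   # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
    max_active_tasks=8,                         # cap parallel extracts per DAG run
) as dag:
    extracts = [
        PythonOperator(
            task_id=f"extract_{name}",
            python_callable=extract_source,
            op_kwargs={"source_name": name},
        )
        for name in ("orders", "inventory", "customers")
    ]

    load = PythonOperator(
        task_id="load_to_warehouse",
        python_callable=lambda: print("loading curated tables"),
    )

    extracts >> load                            # fan-in: all extracts must finish before load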
Tenet Healthcare - Dallas, TX March 2023 - May 2024
Data Engineer (SQL/ETL)
Responsibilities:
Implemented a real-time data ingestion pipeline using PostgreSQL and Apache Kafka, enabling seamless processing and analysis of streaming healthcare data.
Developed and optimized C/C++ applications for high-performance data processing, ensuring efficient handling of large-scale medical and patient data.
Worked extensively with mobile devices (Android/iOS) to integrate barcode scanning solutions, leveraging Zebra SDKs and Honeywell APIs for real-time patient tracking and medication administration.
Collaborated with data scientists to integrate ML-driven insights into data pipelines using Azure Databricks and Python for predictive healthcare analytics.
Tuned SQL queries, C++ applications, and PySpark transformations to enhance healthcare data processing efficiency (see the PySpark sketch following this section).
Integrated Snowflake with Azure services, including Azure Data Factory and Blob Storage, enabling seamless data migration and orchestration from on-premises hospital systems to the cloud.
Developed Azure Synapse dedicated SQL pools, incorporating materialized views, column store indexes, Python-based stored procedures, and dynamic SQL queries, improving data retrieval efficiency for healthcare reporting.
Built scalable ETL pipelines using DBT, ensuring data transformation consistency and automation across electronic health records (EHR) and other hospital data sources.
Optimized Azure Synapse workloads, leveraging Azure Databricks cluster policies, workload importance classification, and parallel pipeline execution via Azure Data Factory with Python monitoring.
Implemented CI/CD pipelines in Azure DevOps, automating deployment of data solutions to enhance efficiency and reduce downtime for hospital data services.
Developed barcode scanning and tracking solutions, integrating with PostgreSQL and Hasura's real-time capabilities for real-time patient medication and supply chain tracking.
Conducted SQL development, unit testing, and performance tuning, resolving issues based on defect reports to ensure accurate and reliable patient data processing.
Worked with Presto, Hive, Spark SQL, and BigQuery, using Python client libraries to optimize analytics applications for healthcare reporting and compliance.
Implemented security best practices, including row-level security, dynamic data masking, and managed identity-based authentication in Azure Synapse and Databricks to ensure HIPAA compliance.
Led the migration of legacy healthcare data systems to a modern data lake architecture, ensuring business continuity while improving data accessibility and analytics capabilities.
Developed and maintained data profiling solutions, using Ganglia metrics, PySpark profiling functions, and Azure Data Factory monitoring to enhance data governance and compliance monitoring.
Built a payment platform leveraging Azure Databricks Delta Live Tables, Unity Catalog, and MLflow, orchestrating complex Python and PySpark ML pipelines via Azure Synapse and message queues for healthcare billing and claims processing.
Worked with Informatica Data Transformations, parsing complex data files and loading structured and semi-structured data into Snowflake and Azure Synapse for advanced healthcare analytics.
Automated testing and documentation in DBT, leveraging version control to track changes and ensure smooth collaboration among cross-functional teams.
Environment: C, C++, Python, SQL, PL/SQL, T-SQL, PostgreSQL, Apache Kafka, Spark, PySpark, Hadoop, Hive, Presto, BigQuery, Snowflake, Azure (Data Factory, Synapse, Databricks, Blob Storage, ADLS, DevOps, EventHub), Android, iOS, Barcode Scanning (Zebra, Honeywell), SDK Integration, DBT, Informatica, Azure Synapse Analytics, SQL Server, Teradata, Cosmos DB, MySQL, Azure DevOps, Jenkins, GitHub, Bitbucket, Docker, Kubernetes, Terraform, Query Tuning, Indexing, Workload Classification, Multithreading, Memory Management, HIPAA, Row-Level Security, Dynamic Data Masking, Managed Identities, Role-Based Access Control (RBAC), Confluence, Jira, Agile (Scrum, Kanban)
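A minimal PySpark sketch of the transformation tuning and PHI handling described above; the storage paths, column names, and shuffle settings are hypothetical placeholders and would be adjusted to the actual cluster and schema.

# PySpark sketch: broadcast join, basic PHI masking, and partition-aligned writes.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import broadcast

spark = (
    SparkSession.builder
    .appName("claims_transform")
    .config("spark.sql.shuffle.partitions", "200")    # tune shuffle width to cluster size
    .getOrCreate()
)

claims = spark.read.parquet("abfss://raw@datalake.dfs.core.windows.net/claims/")
providers = spark.read.parquet("abfss://raw@datalake.dfs.core.windows.net/providers/")

curated = (
    claims
    .filter(F.col("claim_status") == "FINAL")
    .withColumn("patient_ssn", F.lit("***-**-****"))  # mask PHI before the curated layer
    .join(broadcast(providers), "provider_id")        # broadcast the small dimension table
    .repartition("service_month")                     # align partitions with the write layout
)

curated.write.mode("overwrite").partitionBy("service_month").parquet(
    "abfss://curated@datalake.dfs.core.windows.net/claims_by_provider/"
)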
IBM - India Feb 2020 - Dec 2022
Data Engineer
Responsibilities:
Designed and developed proof of concepts (POCs) in Spark using Scala, comparing Spark's performance with MapReduce and Hive for large-scale data processing.
Demonstrated expertise in C/C++ architecture, optimizing performance and implementing scalable solutions for high-throughput data processing applications.
Worked with mobile devices (Android/iOS) and barcode scanning solutions, integrating Zebra and Honeywell SDKs for seamless data capture and processing.
Collaborated with data scientists to develop and integrate ML-driven insights using Azure Databricks and PySpark.
Implemented performance tuning strategies, optimizing SQL queries, Spark jobs, and C++ applications for high-efficiency data pipelines.
Worked extensively with Azure cloud services, including Azure Synapse Analytics, SQL Azure, Data Factory, Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Data Lake.
Developed batch and streaming pipelines in Azure Data Factory (ADF) using Linked Services, Datasets, and Pipeline components for ETL processes.
Implemented incremental data ingestion pipelines in Azure Data Lake Storage (ADLS Gen2) and transformed data into Delta tables using PySpark (see the merge sketch following this section).
Developed Azure Logic Apps to automate data ingestion workflows, such as processing email attachments and storing files in Azure Blob Storage.
Built and managed CI/CD pipelines using Azure DevOps, integrating Git, Maven, and Jenkins plugins to streamline deployments.
Developed Spark Streaming applications for real-time analytics and integrated event-driven architectures using Azure Functions and Azure Logic Apps.
Created Hive tables, optimized data loading, and developed Hive UDFs for complex business logic execution within the Hive ecosystem.
Utilized JSON and XML SerDes for serialization/deserialization of structured and semi-structured data into Hive tables.
Led migration of ETL processes from Oracle to Hive, ensuring seamless data transformation and analysis in the Hadoop ecosystem.
Developed Spark applications using PySpark and Spark SQL, processing data from multiple file formats and transforming it for analytical reporting.
Worked on converting dynamic XML data into HDFS, ensuring compatibility and integrity in large-scale data storage.
Transformed and copied JSON-based datasets from Azure Data Lake Storage into Synapse Analytics tables using Azure Databricks.
Utilized Azure Databricks and Azure Storage Accounts for real-time data extraction, cleansing, and publishing across multiple business units.
Created and automated infrastructure management using Azure Terraform modules, optimizing cloud resource allocation.
Configured Spark Streaming to receive real-time data from Apache Kafka, storing and processing streaming data using Scala in Azure Tables.
Integrated Hive Metastore with Spark SQL, leveraging HiveContext and SQLContext for efficient metadata management.
Managed version control using Git, ensuring seamless repository access and coordination with CI/CD tools.
Developed data warehouse models in Snowflake, working with over 100 datasets using WhereScape for optimized storage and retrieval.
Created secure data-sharing workflows between Snowflake accounts, enabling cross-environment access with performance optimizations.
Redesigned Snowflake views, improving query performance and reducing data processing latency.
Developed interactive Power BI reports, connecting directly to Snowflake for real-time business intelligence and visualization.
Environment: C, C++, Python, SQL, PL/SQL, Scala, Java, Shell Scripting, Azure (Data Factory, Synapse, Databricks, SQL Azure, ADLS, Key Vault, Monitoring, Blob Storage), AWS (EC2, S3, Lambda), Android, iOS, Barcode Scanning (Zebra, Honeywell), SDK Integration, DBT, Informatica, Snowflake, Azure Synapse Analytics, SQL Server, Oracle, Teradata, MySQL, Azure DevOps, Jenkins, GitHub, Bitbucket, Terraform, Docker, Kubernetes, Query Tuning, Indexing, Workload Classification, Multithreading, Memory Management, Power BI, Azure Analysis Services, Data Studio, Confluence, Jira, Agile (Scrum, Kanban)
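The incremental ADLS-to-Delta pattern referenced above can be sketched as a watermark filter plus a Delta MERGE; the container paths, key column, and watermark value are hypothetical placeholders, and delta-spark (built into Databricks) is assumed to be available.

# PySpark sketch: incremental upsert of new records into a Delta table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("incremental_orders").getOrCreate()

raw_path = "abfss://raw@datalake.dfs.core.windows.net/orders/"
delta_path = "abfss://curated@datalake.dfs.core.windows.net/orders_delta/"

# Read only records newer than the last processed watermark (e.g. passed in by ADF).
last_watermark = "2022-01-01T00:00:00"
incoming = spark.read.json(raw_path).filter(F.col("updated_at") > last_watermark)

if DeltaTable.isDeltaTable(spark, delta_path):
    target = DeltaTable.forPath(spark, delta_path)
    (
        target.alias("t")
        .merge(incoming.alias("s"), "t.order_id = s.order_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )
else:
    incoming.write.format("delta").save(delta_path)   # first run creates the Delta table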
Societe Generale - India Aug 2018 - Jan 2020
Hadoop Developer
Responsibilities:
Developed and managed large-scale data processing pipelines using Hadoop, Spark, and Hive, ensuring efficient and optimized data workflows.
Implemented and optimized ETL workflows to extract, transform, and load data from multiple sources into HDFS, Hive, and HBase.
Developed Spark jobs using PySpark and Scala, processing structured and semi-structured data for business intelligence and analytics.
Worked on data ingestion frameworks using Sqoop and Kafka, importing large datasets from SQL Server, Oracle, and other relational databases into Hadoop.
Assisted in performance tuning of Hive and Spark queries, optimizing execution plans and indexing for efficient data retrieval.
Managed Hive schema evolution, creating Hive tables, partitions, and bucketing strategies to improve query performance.
Utilized Apache Oozie workflows to automate and schedule Hadoop jobs, ensuring seamless execution of data pipelines.
Developed and maintained HDFS file management strategies, including data ingestion, compression techniques (Parquet, Avro), and archival.
Implemented real-time data ingestion pipelines using Apache Kafka and Spark Streaming, ensuring low-latency data processing (see the streaming sketch following this section).
Worked on SQL optimization and query performance tuning for efficient data retrieval in Hive, Spark SQL, and Presto.
Integrated Hadoop with various cloud environments, supporting data migration from on-premises databases to Azure and AWS cloud storage (S3, ADLS, Blob Storage).
Developed Shell scripts and Python scripts for job automation, data monitoring, and file system operations in the Hadoop ecosystem.
Collaborated with data engineering and analytics teams to support big data modeling, data cleansing, and business intelligence requirements.
Worked with Sqoop and Kafka to ensure seamless data integration, moving data from RDBMS to Hadoop-based data lakes for processing and analysis.
Supported data pipeline troubleshooting and debugging, resolving issues related to performance, memory utilization, and data consistency.
Utilized Git for version control, ensuring smooth collaboration across the team for maintaining and updating Hadoop jobs.
Implemented data security best practices in Hadoop, including Kerberos authentication, data masking, and encryption for compliance and data protection.
Assisted in documentation of big data workflows and best practices, ensuring knowledge sharing and smooth project handovers.
Environment: SQL Server, Oracle, MySQL, PostgreSQL, HBase, Python, Scala, SQL, T-SQL, Shell Scripting, Informatica, SSIS, Snowflake, HiveQL, Presto, Git, Bitbucket, Jenkins, Docker, Kubernetes, Query tuning, Indexing, Spark Optimization, Partitioning, Bucketing, Confluence, Jira, Agile (Scrum, Kanban)
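The Kafka ingestion pattern referenced above can be sketched in PySpark; the original pipelines used Spark Streaming, while this sketch uses Structured Streaming with a hypothetical topic, schema, and HDFS paths, and it assumes the spark-sql-kafka connector package is available on the cluster.

# PySpark Structured Streaming sketch: parse Kafka JSON events and land them in HDFS.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("trade_stream_ingest").getOrCreate()

schema = StructType([
    StructField("trade_id", StringType()),
    StructField("instrument", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical brokers
    .option("subscribe", "trades")                        # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
)

parsed = (
    raw.selectExpr("CAST(value AS STRING) AS json")
    .select(F.from_json("json", schema).alias("t"))
    .select("t.*")
)

query = (
    parsed.writeStream.format("parquet")
    .option("path", "hdfs:///data/trades/")               # land in HDFS for Hive external tables
    .option("checkpointLocation", "hdfs:///checkpoints/trades/")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()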
EDUCATION:
Master's in Computer Science
Wichita State University (WSU)