
Data Engineer Software Development

Location:
United States
Salary:
$70/hr on C2C
Posted:
March 05, 2025

Contact this candidate

Resume:

Nandakumar Konda

Professional Summary:

Over 11 years of experience in data engineering, data analytics, Hadoop, Snowflake, database administration, and software development.

Experience across a variety of industries working with Big Data platforms such as the Cloudera and Hortonworks distributions; the Hadoop working environment includes Hadoop, Spark, MapReduce, Kafka, Hive, Ambari, Sqoop, HBase, and Impala.

Integrated different data sources and performed data wrangling: cleaning, transforming, merging, and reshaping data sets by writing Python scripts.

Extensively used Python libraries including PySpark, pytest, PyMongo, cx_Oracle, pyexcel, boto3, psycopg2, embedPy, NumPy, and Beautiful Soup.

Hands-on, fluent programming experience with Scala, Java, Python, SQL, T-SQL, and R.

Good knowledge of AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.

Hands-on experience in developing and deploying enterprise-based applications using major Hadoop ecosystem components like MapReduce, YARN, Hive, HBase, Flume, Sqoop, Spark MLlib, Spark GraphX, Spark SQL, Kafka.

Strong database development skills with database servers such as Oracle, IBM DB2, and MySQL, and hands-on experience with SQL and PL/SQL. Extensive experience in backend database programming in Oracle environments using PL/SQL with tools such as TOAD.

Good knowledge of integrating various data sources such as RDBMSs, spreadsheets, text files, JSON, and XML files.

Skilled in developing Python applications for multiple platforms and familiar with Python software development processes.

Adept at configuring and installing Hadoop/Spark Ecosystem Components.

Experience loading data from various sources such as Oracle SE2, SQL Server, flat files, and unstructured files into a data warehouse.

Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.

Experience using Kafka and Kafka brokers to initiate the Spark context and process live streaming data.

Developed custom Kafka producers and consumers for publishing and subscribing to Kafka topics.

Good working experience with Spark (Spark Streaming, Spark SQL) using Scala and Kafka.

Worked on reading multiple data formats from HDFS using Scala. Worked with Spark SQL, created DataFrames by loading data from Hive tables, prepared the data, and stored it in AWS S3 (see the sketch at the end of this summary).

Hands-on experience using other Amazon Web Services such as Auto Scaling, Redshift, DynamoDB, and Route 53.

Experience with operating systems: Linux (including Red Hat) and UNIX.

Experience in analyzing data using HiveQL, Pig, HBase and custom MapReduce programs in Java 8.

Experience working with GitHub/Git 2.12 source and version control systems.

Able to use Sqoop to migrate data between RDBMS, NoSQL databases and HDFS.

Experience in Extraction, Transformation and Loading (ETL) data from various sources into Data Warehouses, as well as data processing like collecting, aggregating and moving data from various sources using Apache Flume, Kafka, PowerBI and Microsoft SSIS.

Design and implement Snowflake-based data solutions, including data warehousing, data lakes, and data marts, to support various business needs.

Develop and maintain ETL (Extract, Transform, Load) pipelines to move and transform data from various source systems into Snowflake.

Proficient in designing and implementing data solutions using Snowflake, a cloud-based data warehousing platform.

Hands-on experience in creating and managing Snowflake warehouses, databases, schemas, tables, and views.

Hands-on experience with Hadoop architecture and components such as the Hadoop Distributed File System (HDFS), JobTracker, TaskTracker, NameNode, DataNode, and Hadoop MapReduce programming.

Comprehensive experience in developing simple to complex MapReduce and Streaming jobs using Scala and Java for data cleansing, filtering, and data aggregation, with detailed knowledge of the MapReduce framework.

Ample knowledge of data architecture including data ingestion pipeline design, Hadoop/Spark architecture, data modeling, data mining, machine learning and advanced data processing.

Experience working with NoSQL databases like Cassandra and HBase and developed real-time read/write access to very large datasets via HBase.

Utilized Snowflake's built-in query optimization features to enhance query performance and reduce execution times.

Implemented normalized and denormalized data models based on specific business requirements.

Designed and developed ETL pipelines to extract, transform, and load data from diverse sources into Snowflake using Snowflake's native features or third-party tools.
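
As a minimal, hedged illustration of the Hive-to-Spark-to-S3 pattern mentioned in this summary, the sketch below loads a Hive table through Spark SQL, prepares it, and writes partitioned Parquet to S3; the database, table, column, and bucket names are hypothetical, not taken from any project above.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hive support lets spark.sql() see tables registered in the Hive metastore.
spark = (
    SparkSession.builder
    .appName("hive-to-s3-prep")          # hypothetical job name
    .enableHiveSupport()
    .getOrCreate()
)

# Load a Hive table into a DataFrame (database/table names are hypothetical).
orders = spark.sql("SELECT * FROM sales_db.orders")

# Light preparation: drop incomplete rows and derive a partition column.
prepped = (
    orders
    .dropna(subset=["order_id", "order_ts"])
    .withColumn("order_date", F.to_date("order_ts"))
)

# Write the prepared data to S3 as partitioned Parquet (bucket is hypothetical).
(
    prepped.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3a://example-prep-bucket/orders_prepped/")
)
```

On EMR, the s3a:// credentials and endpoint would normally come from the cluster configuration rather than the script itself.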

Education:

MS in Information Assurance from Wilmington University, DE (2015 to 2016)

Bachelor's in Computers from Osmania University, 2010

Technical Skills:

Big Data Ecosystem: Hadoop 2.1, HDFS, Hive 0.13, HBase, Sqoop, ZooKeeper 3.4.5, YARN, Spark Streaming, Spark SQL, Kafka, Cloudera CDH3/CDH4, Hortonworks, Oozie, Apache Airflow, Flume, Impala, Talend, Tableau

Hadoop Management & Security: Hortonworks Ambari, Cloudera Manager, Kafka

NoSQL Databases: MongoDB, HBase, Redis, Couchbase, Cassandra

Server-Side Scripting: UNIX shell scripting

Databases: Oracle 11g/10g/9i/8i, MS SQL Server 2012/2008, DB2 v8.1, MySQL, Teradata, Snowflake

Scripting Languages: Python, shell scripting

OS/Platforms: Windows 7/2008/Vista/2003/XP/2000, Linux (all major distributions, mainly CentOS and Ubuntu), UNIX

Methodologies: Agile, UML, Design Patterns, SDLC

Tools: PuTTY, TOAD SQL Client, MySQL Workbench, ETL, DWH, Oracle SQL Developer, Teradata, Alteryx

Office Tools: MS Office (Excel, Word, PowerPoint)

PROFESSIONAL EXPERIENCE:

ALBERTSONS – Plano, TX

Aug 2023 – Present

Sr. Data Engineer

Albertsons Companies is committed to helping people live healthier lives by providing quality food and pharmacy services. The company's mission is to be locally great while being nationally strong, and to make a positive impact on the communities it serves.

Responsibilities:

Designed and developed multiple Audi components with Apache Airflow and Apache Beam.

Migrated on-prem ETLs from SQL Server to Azure Cloud using Azure Data Factory and Databricks.

Performed data transformations in Azure Data Factory using pipelines, data flows, and bulk load and copy utilities.

Utilized Databricks SQL & Spark SQL to analyze and transform large datasets efficiently.

Developed a component to dynamically parse input configs and render DAGs in Airflow with multiple dependencies (see the sketch at the end of this section).

Developed Dataflow components to consume data from JSON and CSV files, and implemented tokenization and loading into tables.

Built data ingestion and transformation scripts using Python with Pandas, NumPy, and PySpark for scalable data processing.

Developed JDBC-to-BigQuery migration components that users can dynamically configure and execute with custom templates and Flex Templates.

Analyzed and debugged production issues and helped users resolve them.

Created documentation and maintained the products on a day-to-day basis.

Involved in migration projects from different legacy platforms to cloud-based solutions, using technologies such as BigQuery, Apache Beam, Airflow, Python, PySpark, and Databricks.

Used Data Flows in ADF to perform complex transformations without writing code.

Designed and developed a migration product to move data from Snowflake to BigQuery and from BigQuery to Snowflake.
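
The config-driven DAG rendering mentioned above can be sketched roughly as below: an Airflow DAG whose tasks and dependencies are built from a parsed config. The config contents, task names, and callable are hypothetical placeholders, not the actual component.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical parsed input config: task names mapped to their upstream tasks.
CONFIG = {
    "dag_id": "dynamic_ingest_example",
    "tasks": {
        "extract_orders": [],
        "tokenize_orders": ["extract_orders"],
        "load_to_bq": ["tokenize_orders"],
    },
}

def run_step(step_name, **_):
    # Placeholder for the real per-step logic (Beam/Dataflow launch, BigQuery load, ...).
    print(f"running step: {step_name}")

with DAG(
    dag_id=CONFIG["dag_id"],
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    operators = {
        name: PythonOperator(
            task_id=name,
            python_callable=run_step,
            op_kwargs={"step_name": name},
        )
        for name in CONFIG["tasks"]
    }
    # Wire up dependencies exactly as declared in the config.
    for name, upstreams in CONFIG["tasks"].items():
        for upstream in upstreams:
            operators[upstream] >> operators[name]
```

In practice the config would be read from a file or metadata table rather than hard-coded, and each step would call into the real Beam or BigQuery components.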

Environment: Snowflake, Spark, Spark Core, PySpark, SQL, Python, Apache Airflow, Apache Beam, JSON, CSV, Azure Databricks, SQL Server, SQL stored procedures, Azure Data Factory, Azure DevOps, SSAS, SSRS, MS Excel, DDL, DML, MS Azure, OLAP, Kubernetes.

Elevance Health (formerly Anthem, Inc.) – Richmond, VA

Dec 2021 – July 2022

Sr. Data Engineer

Anthem, Inc., is a provider of health insurance in the United States. It is the largest for-profit managed health care company in the Blue Cross Blue Shield Association.

Responsibilities:

Performed Data Cleansing using Python and loaded into the target tables.

Implemented Spark scripts using Spark SQL to access Hive tables in Spark for faster data processing.

Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.

Implemented Spark RDD transformations and actions to implement business analysis.

Worked on Spark Streaming and Spark SQL to run sophisticated applications on Hadoop.

Used Oozie and Oozie coordinators to deploy end-to-end processing pipelines and schedule the workflows.

Developed Python-based APIs for data integration and real-time streaming using FastAPI and Flask.

Automated data validation and quality checks using Python libraries like Great Expectations and pytest.

Created and maintained technical documentation for launching Hadoop clusters and executing scripts.

Collaborated with data analysts and business stakeholders to define data requirements and create Snowflake views and materialized views.

Developed ETL pipelines in Databricks using PySpark to process large-scale structured and unstructured data.

Experience working with Snowflake within cloud platforms such as AWS, Azure, or Google Cloud, leveraging cloud-native services for data processing and storage.

Worked closely with cloud infrastructure teams to manage the deployment, monitoring, and scaling of Snowflake resources.

Implemented role-based access control (RBAC) within Snowflake to enforce data security and manage user access to databases, tables, and views.

Utilized Snowflake's security features to ensure data encryption, authentication, and compliance with industry regulations.

Collaborated with data analysts, data scientists, and other stakeholders to understand data requirements and translate them into technical solutions.

Utilized Databricks SQL & Spark SQL to analyze and transform large datasets efficiently.

Optimized and tuned Snowflake databases and queries for performance and scalability.

Was part of developing a fully operational, production-grade, large-scale data solution on the Snowflake Data Warehouse.

Configured triggers, event-based execution, and monitoring for efficient pipeline automation in ADF.

Automated daily extract, load, and transform jobs using Apache Airflow.
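
As a minimal sketch of the role-based access control work described above, the snippet below issues Snowflake RBAC grants through the Snowflake Python connector. The account, credentials, role, database, schema, and user names are all hypothetical placeholders, not the actual setup.

```python
import snowflake.connector

# Connection details are placeholders; in practice they come from a secrets store.
conn = snowflake.connector.connect(
    account="example_account",
    user="example_admin",
    password="example_password",
    role="SECURITYADMIN",
    warehouse="ADMIN_WH",
)

# Grant a reporting role read-only access to one schema's tables and views.
rbac_statements = [
    "CREATE ROLE IF NOT EXISTS REPORTING_READER",
    "GRANT USAGE ON DATABASE ANALYTICS TO ROLE REPORTING_READER",
    "GRANT USAGE ON SCHEMA ANALYTICS.CLAIMS TO ROLE REPORTING_READER",
    "GRANT SELECT ON ALL TABLES IN SCHEMA ANALYTICS.CLAIMS TO ROLE REPORTING_READER",
    "GRANT SELECT ON ALL VIEWS IN SCHEMA ANALYTICS.CLAIMS TO ROLE REPORTING_READER",
    "GRANT ROLE REPORTING_READER TO USER EXAMPLE_ANALYST",
]

cur = conn.cursor()
try:
    for stmt in rbac_statements:
        cur.execute(stmt)
finally:
    cur.close()
    conn.close()
```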

Environment: Hadoop, CDH 4, CDH 5, HDFS, Hive, Sqoop, HBase, Spark SQL, Snowflake, Teradata, Microsoft SQL Server Management Studio, Spark Streaming, Control-M scheduler, Python, Microsoft Visual Studio, UNIX, and shell scripting.

BLUE CROSS BLUE SHIELD, DURHAM, NC.

NOV 2019 – Nov 2021

Position: Data Engineer.

Description:

Blue Cross Blue Shield Association is a federation of 36 separate United States health insurance companies that provide health insurance in the United States to more than 106 million people. The BCBS manages communications between its members and the operating policies required to be a licensee of the trademarks.

Job Responsibilities:

Extract data from multiple sources, integrate disparate data into a common data model, and load it into a target database, application, or file using efficient programming processes

Document and test moderately sized data systems that bring together data from disparate sources, making it available to data scientists and other users using scripting and/or programming languages

Write and refine code to ensure performance and reliability of data extraction and processing

Configured Databricks clusters (Standard & High Concurrency) and optimized resource allocation for cost efficiency.

Participate in requirements gathering sessions with business and technical staff to distill technical requirements from business requests.

Develop SQL queries to extract data for analysis and model construction

Own delivery of moderately sized data engineering projects

Integrated ADF with Databricks, Azure SQL Database, ADLS, and Synapse Analytics for enterprise-level data management.

Define and implement integrated data models, allowing integration of data from multiple sources

Design and develop scalable, efficient data pipeline processes to handle data ingestion, cleansing, transformation, integration, and validation required to provide access to prepared data sets to analysts and data scientists

Used Python logging and exception handling to improve the reliability of data pipelines.

Define and implement data stores based on system requirements and consumer requirements

Document and test data processes, including performing thorough data validation and verification

Collaborate with a cross-functional team to resolve data quality and operational issues and ensure timely delivery of products

Develop and implement scripts for database and data process maintenance, monitoring, and performance tuning.

Analyze and evaluate databases to identify and recommend improvements and optimization
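
The logging and exception-handling pattern referenced in the bullet about pipeline reliability can be sketched as below; the pipeline name, file path, and column name are hypothetical.

```python
import logging
import sys

import pandas as pd

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s - %(message)s",
)
logger = logging.getLogger("claims_pipeline")  # hypothetical pipeline name

def extract(path):
    logger.info("Extracting %s", path)
    return pd.read_csv(path)

def transform(df):
    before = len(df)
    df = df.dropna(subset=["member_id"])  # hypothetical required column
    logger.info("Dropped %d rows missing member_id", before - len(df))
    return df

def run(path):
    try:
        df = transform(extract(path))
        logger.info("Pipeline finished with %d rows", len(df))
    except FileNotFoundError:
        logger.error("Source file not found: %s", path)
        sys.exit(1)
    except Exception:
        # Log the full traceback so failed runs can be diagnosed from the logs.
        logger.exception("Unexpected pipeline failure for %s", path)
        raise

if __name__ == "__main__":
    run("claims_extract.csv")  # hypothetical input file
```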

Environment: SQL, Snowflake, Teradata, Teradata BTEQ, Microsoft SQL Server Management Studio, Impala, Alteryx, Spark Streaming, Control-M scheduler, Python, R, Microsoft Visual Studio, Tableau, UNIX, and shell scripting.

NTT DATA / Texas Department of Transportation, Austin, TX

Nov 2018 – Oct 2019

Sr Data Engineer

Description: NTT DATA is an IT services company and an SAP Gold Partner for implementing SAP systems; a related company in India, NTT DATA GDS (Global Delivery Services), is also a services company. Its business areas are in the national and local government, financial, and telecommunications sectors.

Responsibilities:

Hadoop development and implementation (environment: HDFS, HBase, Spark, Kafka, Oozie, Apache Airflow, Sqoop, Flume, Kerberos, Oracle ASO, MySQL).

Designed and implemented ADF Pipelines to orchestrate ETL workflows for ingesting and transforming data from multiple sources.

Loaded data from disparate data sets using the Hadoop stack of ingestion and workflow tools.

Designing, building, installing, configuring and supporting Hadoop.

Configured and implemented data marts on the Hadoop platform.

Involved in loading data from Teradata, Oracle database into HDFS using Sqoop queries.

Worked on setting up Kafka for streaming data and monitoring for the Kafka Cluster.

Responsible for importing log files from various sources into HDFS using Flume.

Imported data using Sqoop to load data from MySQL into HDFS on a regular basis.

Developed workflow in Oozie or Airflow to automate the tasks of loading the data into HDFS and processing with Sqoop and Hive.

Developed Spark jobs using Scala in test environment for faster data processing and used Spark SQL for querying.

Monitored the daily scheduled jobs in Apache Airflow, investigated failed jobs, and provided immediate solutions.

Created partitions and buckets based on state for further processing using bucket-based Hive joins.

Loaded multiple NOSQL databases including MongoDB, PostgreSQL, Couchbase, HBase and Cassandra.

Participated in the software development lifecycle (scope, design, implement, deploy, test), including design and code reviews, test development, and test automation.

Set up Snowflake connections through private links from AWS EC2 and AWS EMR to secure data transfers between the application and the database.
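
The Kafka streaming and Spark work in this role was implemented in Scala; a roughly equivalent PySpark Structured Streaming sketch is shown below, with hypothetical broker, topic, and HDFS path names.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("kafka-to-hdfs-stream")  # hypothetical job name
    .getOrCreate()
)

# Subscribe to a Kafka topic (broker and topic names are hypothetical).
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "vehicle_events")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; cast the payload to string for parsing.
parsed = events.select(
    F.col("key").cast("string").alias("event_key"),
    F.col("value").cast("string").alias("payload"),
    "timestamp",
)

# Continuously append the stream to HDFS as Parquet, with checkpointing for recovery.
query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/streams/vehicle_events/")
    .option("checkpointLocation", "hdfs:///checkpoints/vehicle_events/")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```

Running this also requires the spark-sql-kafka connector package to be available on the cluster.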

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Spark, Sqoop, Kafka, Oozie, Big Data, Python, DataStax, flat files, MySQL, Toad, Windows NT, Linux, Cassandra, UNIX, SVN, Hortonworks, Avro files, SQL, ETL, DWH, Cloudera Manager, MongoDB.

First Data, Atlanta, GA

May 2018 – Oct 2018

Position: Data Engineer

Description: First Data Corporation provides electronic commerce solutions for merchants, financial institutions, and card issuers worldwide. It operates through three segments: Global Business Solutions (GBS), Global Financial Solutions (GFS), and Network & Security Solutions (NSS).

Responsibilities:

Involved in loading the generated HFiles into HBase for faster access to a large customer base without taking a performance hit.

Worked in AWS environment for development and deployment of Custom Hadoop Applications.

Involved in creating and designing data ingest pipelines using technologies such as Apache Storm and Kafka.

Developed Spark scripts by using Scala shell commands as per the requirement.

Implemented discretization and binning, and data wrangling (cleaning, transforming, merging, and reshaping data frames) using Python (see the sketch at the end of this section).

Created Hbase tables to store various data formats of PII data coming from different portfolios.

Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis

Involved in creating Pig tables, loading with data and writing Pig Latin queries which will run internally in Map Reduce way.

Experienced in using Pig as an ETL tool to do transformations, even joins, and some pre-aggregations before storing the data onto HDFS.

Transferred the data using Informatica tool from AWS S3 to AWS Redshift. Involved in file movements between HDFS and AWS S3.

Provided batch processing solutions for large volumes of unstructured data using the Hadoop MapReduce framework.

Developed Spark code using Scala and Spark SQL for faster processing and testing.

Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.

Worked on Python scripts to load data from CSV, JSON, MySQL, and Hive files into the Neo4j graph database.

Handled administration of Cassandra: installing, upgrading, and managing distributions.

Assisted in exporting data into Cassandra and writing column families to provide fast listing outputs.

Used Oozie Scheduler system to automate the pipeline workflow and orchestrate the map reduce jobs that extract the data on a timely manner.

Experience in integrating Apache Kafka with Apache Storm and creating Storm data pipelines for real time processing.

Experienced in working with the Spark ecosystem using Scala and Hive queries on different data formats such as text files and Parquet.

Exposure to using Apache Kafka to develop data pipelines of logs as streams of messages using producers and consumers.

Worked with the Hue GUI for scheduling jobs, file browsing, job browsing, and metastore management.

Worked with Talend on a POC for integration of data from the data lake.

Highly involved in development/implementation of Cassandra environment.

Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
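
The discretization, binning, and data-wrangling work mentioned above can be illustrated with a small pandas sketch; the transaction records, columns, and bin edges below are made up for the example.

```python
import pandas as pd

# Made-up transaction records standing in for the real portfolio data.
transactions = pd.DataFrame({
    "merchant_id": ["m1", "m1", "m2", "m2", "m3"],
    "amount": [12.5, 250.0, 40.0, None, 1200.0],
    "channel": ["card_present", "online", "online", "card_present", "online"],
})

# Cleaning: drop rows with missing amounts.
clean = transactions.dropna(subset=["amount"]).copy()

# Discretization/binning: bucket transaction amounts into labeled ranges.
clean["amount_band"] = pd.cut(
    clean["amount"],
    bins=[0, 50, 500, float("inf")],
    labels=["small", "medium", "large"],
)

# Reshaping: count transactions per merchant and amount band.
summary = clean.pivot_table(
    index="merchant_id",
    columns="amount_band",
    values="amount",
    aggfunc="count",
    fill_value=0,
)
print(summary)
```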

Environment: Hadoop, MapReduce, HDFS, Python, Hive, Spark, Hue, Pig, Sqoop, Kafka, AWS, Avro, HBase, Oozie, Cassandra, Impala, ZooKeeper, Talend, Teradata, Oracle 11g/10g, JDK 1.6, Scala, UNIX, SVN, Hortonworks, Maven.

Abbott Laboratories, IL

Jan 2017 – May 2018

Position: Data Engineer

Description: Abbott Laboratories is an American health care company with headquarters in Lake Bluff, Illinois, United States. The company was founded by Chicago physician Wallace Calvin Abbott in 1888 to formulate known drugs; it eventually grew to also sell research-based drugs, medical devices, diagnostics, and nutritional products.

Responsibilities:

Manage data coming from different sources and involved in HDFS maintenance and loading of structured and unstructured data.

Extracted files from MongoDB through Sqoop and placed in HDFS and processed.

Created a customized BI tool for the manager team that performs query analytics using HiveQL.

Created partitions and buckets based on state for further processing using bucket-based Hive joins.

Experienced in using Kafka as a data pipeline between JMS and Spark Streaming Applications.

Created storage with Amazon S3 for storing data. Worked on transferring data from Kafka topic into AWS S3 storage.

Estimated the hardware requirements for Name Node and Data Nodes & planning the cluster.

Consolidated customer data from lending, insurance, trading, and billing systems into a data warehouse, and subsequently into data marts, for business intelligence reporting.

Optimized Hive queries using partitioning and bucketing techniques to control the data distribution (see the sketch at the end of this section).

Collaborated with Business users for requirement gathering for building Tableau reports per business needs.

Experience in Upgrading Apache Ambari, CDH and HDP Cluster.

Used Impala to read, write and query the Hadoop data in HDFS from HBase or Cassandra.

Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.

Implemented optimized joins across different data sets to get the top claims by state using MapReduce.

Converted queries to Spark SQL and used Parquet files as the storage format.

Configured Spark streaming to receive real time data from the Kafka and store the stream data to HDFS using Scala.

Wrote Spark programs in Scala and ran Spark jobs on YARN.

Used in-depth Tableau features such as data blending from multiple data sources for data analysis.

Used Maven extensively for building MapReduce JAR files and deployed them to Amazon Web Services (AWS) EC2 virtual servers in the cloud.

Set up Amazon Web Services (AWS) to evaluate whether Hadoop was a feasible solution.

Worked with BI teams in generating reports and designing ETL workflows on Tableau.

Knowledgeable on Talend for Data integration purposes.

Experienced in monitoring the cluster using Cloudera Manager.
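
The partitioning and bucketing approach mentioned above can be sketched with PySpark's writer API as below. This uses Spark-native buckets rather than Hive-managed ones, and the database, table, column names, and bucket count are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("claims-partitioning")  # hypothetical job name
    .enableHiveSupport()
    .getOrCreate()
)

# Read the staging data (table name is hypothetical).
claims = spark.table("claims_db.claims_staging")

# Partition by state so state filters prune whole directories, and bucket by
# member_id so joins on member_id can reduce shuffling.
(
    claims.write
    .mode("overwrite")
    .partitionBy("state")
    .bucketBy(32, "member_id")
    .sortBy("member_id")
    .saveAsTable("claims_db.claims_by_state")
)

# A query that touches only one partition and benefits from pruning.
spark.sql("""
    SELECT member_id, SUM(claim_amt) AS total_claims
    FROM claims_db.claims_by_state
    WHERE state = 'CA'
    GROUP BY member_id
""").show()
```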

Environment: Hadoop, HDFS, HBase, MapReduce, JDK 1.5, J2EE 1.4, Struts 1.3, Spark, Python, Hive, Sqoop, Flume, Impala, Oozie, Hue, Zookeeper, Kafka, AWS, Cassandra, AVRO Files, SQL, ETL, DWH, Cloudera Manager, Talend, MySQL, Scala, MongoDB.

Cardinal Health, Dublin, OH

July 2016 – Dec 2016

Role: Data Engineer

Description:

Cardinal Health is improving patient care through an optimized supply chain using analytic innovations with Teradata and Hadoop. Employing a unified data strategy has allowed Cardinal Health to see significant business value, including a 50% time savings for end users working with raw data. I was part of the Hadoop development team responsible for improving the customer experience through Hadoop's advantages.

Responsibilities:

Analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase and Sqoop.

Creating multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.

Successfully loading files to Hive and HDFS from Oracle, SQL Server using Sqoop.

Writing Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the log data.

Creating Hive tables, loading with data and writing Hive queries.

Involved in using Spark for fast processing of data and in defining job flows.

Using Hive to analyze the partitioned data and compute various metrics for reporting.

Using Pig as an ETL tool to do transformations, even joins, and some pre-aggregations.

Performed unit testing and delivered unit test plans and results documents.

Exporting data from HDFS environment into RDBMS using Sqoop for report generation and visualization purposes.

Worked on Oozie workflow engine for job scheduling.
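
A rough sketch of the log-parsing and structuring idea described above, done here with PySpark rather than a Hive script; the log location, log line format, and table names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("log-structuring")  # hypothetical job name
    .enableHiveSupport()
    .getOrCreate()
)

# Raw application logs landed in HDFS (path and line format are made up), e.g.
# "2016-08-01 12:00:03 WARN inventory Slow response from supplier feed".
raw = spark.read.text("hdfs:///logs/app/2016/08/")

pattern = r"^(\S+ \S+) (\w+) (\S+) (.*)$"
structured = raw.select(
    F.regexp_extract("value", pattern, 1).alias("event_time"),
    F.regexp_extract("value", pattern, 2).alias("level"),
    F.regexp_extract("value", pattern, 3).alias("component"),
    F.regexp_extract("value", pattern, 4).alias("message"),
)

# Persist as a queryable table so reporting queries can filter by level/component.
structured.write.mode("overwrite").saveAsTable("ops_db.app_logs_structured")

# Example metric: warning/error counts per component.
spark.sql("""
    SELECT component, level, COUNT(*) AS events
    FROM ops_db.app_logs_structured
    WHERE level IN ('WARN', 'ERROR')
    GROUP BY component, level
    ORDER BY events DESC
""").show()
```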

Environment: Hadoop, HDFS, Hive, Sqoop, HBase, Kafka, AWS, Oozie, ZooKeeper, Spark.

Open Solutions, India

Sept 2013 – Sep 2015

Role: Associate Developer

Responsibilities:

Executed end-to-end scenarios for migration testing with DB2, Oracle, and SQL Server (XMETA and IADB) on 91 MBR Refresh and 9.1 FP1, 912.

Designed, developed, and optimized complex SQL queries, stored procedures, and functions to support business analytics and reporting requirements.

Created and maintained database schemas, tables, views, indexes, and triggers to enhance data integrity and access speed.

Performed query optimization and tuning to reduce processing time by up to 30%, improving overall database performance.

Involved in installing IS on Windows 2008, AIX, and Red Hat with Oracle, DB2, and SQL Server.

Involved in applying BAR and Migration patches on IS.

Executed Backup and Restore Scenarios.

Opened and closed work items in the RTC (Rational Team Concert) tool.

Reported defects and validated them once they were fixed.

Involved in UI testing with the latest builds.

Environment: DataStage 8.x, UNIX, AIX, SQL, data models, PL/SQL, Linux.


