
Data Engineer Sr

Location:
Tampa, FL
Posted:
May 06, 2025


Resume:

Bala Srikanth Gandiboina

SR DATA ENGINEER

***************@*****.***

813-***-****

LinkedIn: https://www.linkedin.com/in/bala-srikanth-0668a2138/

Professional Summary:

6+ years of IT experience across a variety of industries working with Big Data technologies such as the Cloudera and Hortonworks distributions. Hadoop working environment includes Hadoop, Spark, MapReduce, Kafka, Hive, Ambari, Sqoop, HBase, and Impala.

Fluent programming experience with Scala, Java, Python, SQL, T-SQL, and R.

Hands-on experience in developing and deploying enterprise-based applications using major Hadoop ecosystem components like MapReduce, YARN, Hive, HBase, Flume, Sqoop, Spark MLlib, Spark GraphX, Spark SQL, and Kafka.

Excellent knowledge of cloud computing and IaaS, PaaS, and SaaS models on AWS, Azure, and GCP.

Hands-on experience with Google Cloud Platform (GCP) big data products: BigQuery, Cloud Dataproc, Google Cloud Storage, and Composer (Airflow as a service).

Adept at configuring and installing Hadoop/Spark Ecosystem Components.

Proficient with Spark Core, Spark SQL, Spark MLlib, Spark GraphX, and Spark Streaming for processing and transforming complex data using in-memory computing capabilities, written in Scala. Worked with Spark to improve the efficiency of existing algorithms using Spark Context, Spark SQL, Spark MLlib, DataFrames, Pair RDDs, and Spark on YARN.

Experience integrating various data sources such as Oracle SE2, SQL Server, flat files, and unstructured files into a data warehouse.

Able to use Sqoop to migrate data between RDBMS, NoSQL databases and HDFS.

Experience in Extraction, Transformation, and Loading (ETL) of data from various sources into data warehouses, as well as data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume, Kafka, Power BI, and Microsoft SSIS.

Hands-on experience with Hadoop architecture and various components such as the Hadoop Distributed File System (HDFS), Job Tracker, Task Tracker, Name Node, Data Node, and Hadoop MapReduce programming.

Comprehensive experience in developing simple to complex MapReduce and streaming jobs using Scala and Java for data cleansing, filtering, and data aggregation, along with detailed knowledge of the MapReduce framework.

Used IDEs like Eclipse, IntelliJ IDEA, PyCharm, Notepad++, and Visual Studio for development.

Seasoned in Machine Learning algorithms and Predictive Modeling, such as Linear Regression, Logistic Regression, Naïve Bayes, Decision Tree, Random Forest, KNN, Neural Networks, and K-means Clustering.

Ample knowledge of data architecture, including data ingestion pipeline design, Hadoop/Spark architecture, data modeling, data mining, machine learning, and advanced data processing.

Experience working with NoSQL databases like Cassandra and HBase and developed real-time read/write access to very large datasets via HBase.

Developed Spark applications that can handle data from various RDBMS (MySQL, Oracle Database) and streaming sources.

Proficient SQL experience in querying, data extraction/transformations, and developing queries for a wide range of applications.

Capable of processing large sets (Gigabytes) of structured, semi-structured or unstructured data.

Experience in analyzing data using HiveQL, Pig, HBase and custom MapReduce programs in Java 8.

Experience working with GitHub/Git 2.12 source and version control systems.

Strong in core Java concepts including Object-Oriented Design (OOD) and Java components like the Collections Framework, exception handling, and the I/O system.

Technical Skills:

Hadoop/Big Data Technologies: HDFS, Hive, Pig, Sqoop, YARN, Spark, Spark SQL, Kafka

Hadoop Distributions: Hortonworks and Cloudera Hadoop

Languages: C, C++, Python, Scala, UNIX Shell Script, COBOL, SQL and PL/SQL

Tools: Teradata SQL Assistant, PyCharm, Autosys

Operating Systems: Linux, Unix, z/OS and Windows

Databases: Teradata, Oracle 9i/10g, DB2, SQL Server, MySQL 4.x/5.x

Cloud Platforms: AWS EMR, EC2, Athena, GCP, Azure

ETL Tools: IBM InfoSphere Information Server V8, V8.5 & V9.1

Reporting: Tableau

Professional Experience:

Role: Sr Data Engineer Oct 2024 to Present

Client: Centene, Tampa, FL.

Responsibilities:

Primarily involved in data migration using SQL, SQL Azure, Azure Storage, Azure Data Factory, SSIS, and PowerShell.

Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
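A minimal sketch of such an S3-to-Redshift load, issuing a COPY through psycopg2; the cluster endpoint, table, bucket, and IAM role below are hypothetical:

    import psycopg2

    # Connection details and object names are illustrative only.
    conn = psycopg2.connect(
        host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="analytics", user="etl_user", password="***",
    )
    copy_sql = """
        COPY public.sales_staging
        FROM 's3://my-data-bucket/sales/2025/05/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        FORMAT AS PARQUET;
    """
    with conn, conn.cursor() as cur:
        cur.execute(copy_sql)  # Redshift loads the S3 objects in parallel
    conn.close()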

Created Tables, Stored Procedures, and extracted data using T-SQL for business users whenever required.

Performed data analysis and design, and created and maintained large, complex logical and physical data models and metadata repositories using ERWIN and MB MDR.

Developed a PySpark script to encrypt raw data using hashing algorithms on client-specified columns.
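A minimal sketch of this kind of column hashing in PySpark, assuming hypothetical input/output paths and column names:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import sha2, col

    spark = SparkSession.builder.appName("column-hashing").getOrCreate()
    df = spark.read.parquet("s3://raw-zone/customers/")  # hypothetical path
    sensitive_cols = ["ssn", "email"]                    # hypothetical client-specified columns
    for c in sensitive_cols:
        # Replace each sensitive column with its SHA-256 digest
        df = df.withColumn(c, sha2(col(c).cast("string"), 256))
    df.write.mode("overwrite").parquet("s3://curated-zone/customers/")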

Responsible for design, development, and testing of the database, and developed stored procedures, views, and triggers.

Developed Python-based API (RESTful Web Service) to track revenue and perform revenue analysis.
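A minimal sketch of such a revenue-tracking endpoint using Flask; the route, schema, and SQLite backing store are illustrative, not the actual service:

    from flask import Flask, jsonify, request
    import sqlite3

    app = Flask(__name__)

    @app.route("/revenue", methods=["GET"])
    def revenue_by_region():
        # Hypothetical schema: revenue(region TEXT, amount REAL, sale_date TEXT)
        region = request.args.get("region")
        conn = sqlite3.connect("revenue.db")
        if region is None:
            rows = conn.execute(
                "SELECT region, SUM(amount) FROM revenue GROUP BY region").fetchall()
        else:
            rows = conn.execute(
                "SELECT region, SUM(amount) FROM revenue WHERE region = ? GROUP BY region",
                (region,)).fetchall()
        conn.close()
        return jsonify([{"region": r, "total_revenue": t} for r, t in rows])

    if __name__ == "__main__":
        app.run(port=5000)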

Compiled and validated data from all departments and presented it to the Director of Operations.

Built a KPI calculator sheet and maintained that sheet within SharePoint.

Created Tableau reports with complex calculations and worked on ad-hoc reporting using Power BI.

Created a data model that correlates all the metrics and produces valuable output.

Tuned SQL queries to bring down run time by working on indexes and execution plans.

Performed ETL testing activities such as running jobs, extracting data from the database with the necessary queries, transforming it, and uploading it into the data warehouse servers.

Performed pre-processing using Hive and Pig.

Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).

Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure SQL DW) and processed the data in Azure Databricks.
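A minimal sketch of such an ingestion step in Azure Databricks, assuming a hypothetical ADLS Gen2 storage account and containers already accessible from the cluster:

    # `spark` is the SparkSession that Databricks provides in every notebook.
    raw = spark.read.csv(
        "abfss://raw@mystorageacct.dfs.core.windows.net/claims/2024/",
        header=True, inferSchema=True)
    # Drop records missing a key and land the result in the curated zone as Parquet
    curated = raw.dropna(subset=["claim_id"])
    curated.write.mode("overwrite").parquet(
        "abfss://curated@mystorageacct.dfs.core.windows.net/claims/")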

Implemented Copy activities and custom Azure Data Factory pipeline activities.

Architected and implemented medium to large scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).

Migrated on-premises data (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake Store (ADLS) using Azure Data Factory (ADF V1/V2).

Developed a detailed project plan and helped manage the data conversion migration from the legacy system to the target Snowflake database.

Designed, developed, and tested dimensional data models using star and snowflake schema methodologies under the Kimball method.

Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight.

Developed a data pipeline using Spark, Hive, Pig, Python, Impala, and HBase to ingest customer data.

Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
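As a sketch of this kind of conversion, a Hive aggregation rewritten as a Spark DataFrame transformation; the table and column names are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("hive-to-spark").enableHiveSupport().getOrCreate()

    # Hive/SQL version:
    #   SELECT customer_id, SUM(amount) AS total FROM sales GROUP BY customer_id;
    # Equivalent DataFrame transformation:
    totals = (spark.table("sales")
                   .groupBy("customer_id")
                   .agg(F.sum("amount").alias("total")))
    totals.show()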

Ensured deliverables (daily, weekly, and monthly MIS reports) were prepared to satisfy the project requirements, cost, and schedule.

Worked with DirectQuery in Power BI to compare legacy data with current data, and generated reports and dashboards.

Designed SSIS packages to extract, transform, and load (ETL) existing data into SQL Server from different environments for the SSAS (OLAP) cubes.

Worked with SQL Server Reporting Services (SSRS); created and formatted cross-tab, conditional, drill-down, Top N, summary, form, OLAP, sub-report, ad-hoc, parameterized, interactive, and custom reports.

Created action filters, parameters, and calculated sets for preparing dashboards and worksheets using Power BI.

Developed visualizations and dashboards using Power BI.

Used ETL to implement the Slowly Changing Dimension transformation to maintain historical data in the data warehouse.


Created dashboards for analyzing POS data using Power BI.

Environment: MS SQL Server 2016, T-SQL, SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), SQL Server Analysis Services (SSAS), Management Studio (SSMS), Advanced Excel (creating formulas, pivot tables, HLOOKUP, VLOOKUP, Macros), Spark, Python, ETL, Power BI, Tableau, Hive/Hadoop, Snowflake, AWS Data Pipeline, IBM Cognos 10.1, DataStage, Cognos Report Studio 10.1, Cognos 8 & 10 BI, Cognos Connection, Cognos Office Connection, Cognos 8.2/3/4, DataStage and QualityStage 7.5

Role: Sr Data Engineer July 2018 to Mar 2022

Client: Kroger, Imp: Sonata Software, Bangalore.

Automated data processing workflows and improved ETL efficiency using Python.

Utilized Google Cloud Platform (GCP) services like BigQuery and Cloud Storage for data storage, processing, and analytics.

Implemented GCP services like Cloud Functions and Pub/Sub to automate workflows and enable real-time data processing, enhancing overall system efficiency.

Optimized GCP resources by using efficient storage classes and BigQuery partitioning and clustering, while leveraging Cloud Monitoring and Logging for proactive issue resolution.
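A minimal sketch of creating a partitioned and clustered BigQuery table through the Python client; the dataset, table, and columns are hypothetical:

    from google.cloud import bigquery

    client = bigquery.Client()
    # Partition on the DATE of event_ts and cluster on store_id
    ddl = """
    CREATE TABLE IF NOT EXISTS analytics.sales_events (
        event_ts TIMESTAMP,
        store_id STRING,
        amount   NUMERIC
    )
    PARTITION BY DATE(event_ts)
    CLUSTER BY store_id
    """
    client.query(ddl).result()  # partition/cluster pruning keeps query scans small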

Developed complex SQL queries to extract, transform, and load data from various relational databases for further analysis.

Managed and optimized MySQL databases to store large volumes of transactional and claims data efficiently.

Utilized Apache Hive for querying large datasets in Hadoop and performing data warehousing tasks.

Leveraged Apache Spark for fast and scalable data processing, reducing the time required for analyzing large datasets.

Automated and orchestrated data workflows with Apache Airflow, streamlining the scheduling and monitoring of ETL tasks.
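A minimal sketch of such an Airflow DAG, with a hypothetical daily ETL task (the DAG id and callable are illustrative):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def load_sales():
        # Hypothetical ETL step: extract from the source, transform, load to the warehouse
        pass

    with DAG(
        dag_id="daily_sales_etl",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(task_id="load_sales", python_callable=load_sales)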

Managed and optimized data storage, querying, and integration processes within the Snowflake data warehouse for business intelligence applications.

Used Apache Sqoop to transfer large datasets between relational databases and Hadoop, improving data migration efficiency.

Applied version control with Git to track and manage code changes, enabling smooth collaboration with team members on data pipeline development.

Hosted and managed project repositories on GitHub, facilitating easy access, code reviews, and collaboration within the data engineering team.

Implemented data transformation techniques to clean, enrich, and prepare datasets for analysis and reporting in Snowflake.

Collaborated with cross-functional teams, including analysts and software engineers, to ensure smooth data integration and system interoperability across platforms.

Followed the Agile Software Development Life Cycle (SDLC) methodology to ensure systematic development and delivery.

Environment: GCP, BigQuery, Spark, Python, Snowflake, Hive, Airflow, SQL, Sqoop, Git, GitHub

Role: Data Engineer May 2016 to June 2018

Client: CGI, Hyderabad, India.

Imported required modules such as Keras and NumPy into the Spark session, and created directories for data and output.

Read train and test data into the data directory as well as into Spark variables for easy access, and proceeded to train the model based on a sample submission.

When displayed, the images are represented as NumPy arrays; for easier data manipulation, all the images are stored as NumPy arrays.

Created a validation set using Keras2DML in order to test whether the trained model was working as intended.

Defined multiple helper functions that are used while running the neural network in a session, and defined placeholders and the number of neurons in each layer.

Created the neural network's computational graph after defining weights and biases.

Created a TensorFlow session used to run the neural network as well as validate the accuracy of the model on the validation set.
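A minimal sketch of this graph-and-session pattern in TensorFlow 1.x, with hypothetical layer sizes and data arrays (the actual pipeline ran on Spark via Keras2DML):

    import tensorflow as tf  # TensorFlow 1.x graph/session API

    # Placeholders and variables: 784 input pixels, 10 output classes (hypothetical)
    x = tf.placeholder(tf.float32, [None, 784])
    y = tf.placeholder(tf.float32, [None, 10])
    w = tf.Variable(tf.random_normal([784, 10]))
    b = tf.Variable(tf.zeros([10]))

    logits = tf.matmul(x, w) + b
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
    accuracy = tf.reduce_mean(
        tf.cast(tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1)), tf.float32))

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # train_x, train_y, val_x, val_y would be NumPy arrays prepared earlier
        # sess.run(train_op, feed_dict={x: train_x, y: train_y})
        # print(sess.run(accuracy, feed_dict={x: val_x, y: val_y}))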

After executing the program and achieving acceptable validation accuracy, a submission was created and stored in the submission directory.

Executed multiple Spark SQL queries after forming the database to gather specific data corresponding to an image.

Environment: Scala, Python, PySpark, Spark, Spark MLlib, Spark SQL, TensorFlow, NumPy, Keras, Power BI


