Data Engineer Big

Location: Louisville, KY

Salary: 140000

Posted: December 08, 2023


Anwesh Boddu ad1ta4@r.postjobfree.com

Cloud Data Engineer 732-***-****

Professional Summary:

Overall 8+ years of experience in all phases of software application work, including requirement analysis, design, development, and maintenance of Hadoop/Big Data applications (Spark, Kafka, ADLS, Hive, Sqoop, Databricks, Snowflake) and Python applications tailored to industry needs.

●Hands-on experience with Spark Core, Spark SQL, and Spark Streaming.

●Used Spark SQL to perform transformations and actions on data residing in Hive (see the sketch at the end of this summary).

●Used Kafka and Spark Streaming for real-time processing.

●Able to spin up Azure resources such as ADLS Gen2, Data Factory, and virtual machines using Terraform templates.

●Hands-on experience creating Databricks managed tables and external tables in Unity Catalog.

●Hands-on experience creating Data Factory pipelines to move files from the landing zone to the raw zone.

●Hands-on experience with Snowflake and Databricks, including writing Snowflake cursor connections and querying data.

●Hands-on experience with MongoDB.

●Hands-on experience with Azure services such as Databricks, SQL Data Warehouse, Delta Lake, and Synapse.

●Ensure data integrity and data security on Azure by implementing Azure best practices.

●Templated Azure infrastructure as code with Terraform to build out staging and production environments.

●3+ years of hands-on experience with Big Data ecosystems including Hadoop/YARN, Spark, Kafka, DynamoDB, Redshift, SQS, SNS, Hive, Sqoop, Flume, Pig, Oozie, MapReduce, and ZooKeeper, across industries such as finance and healthcare.

●Good knowledge of Teradata.

●Experience migrating data between RDBMS/unstructured sources and HDFS using Sqoop.

●Good knowledge of Apache Spark data processing for handling data from RDBMS and streaming sources with Spark Streaming.

●Experience developing HiveQL scripts for data analysis and ETL, and extending default functionality by writing user-defined functions (UDFs) for data-specific processing.

●Extensive experience using Splunk to transfer log data files to the Hadoop Distributed File System (HDFS).

●Good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.

●Tested various Flume agents, data ingestion into HDFS, and retrieval and validation of Snappy-compressed files.

●Skilled in writing Spark jobs in Python to process large sets of structured and semi-structured data and store them in HDFS.

●Good knowledge of Spark SQL queries for loading tables into HDFS and running select queries on top of them.

●Experience with NoSQL databases such as HBase and MongoDB and their integration with Hadoop clusters.

●Experience writing Hive queries for processing and analyzing large volumes of data.

●Experience importing and exporting data between relational database systems and HDFS using Sqoop.

●Developed Oozie workflows by integrating all tasks related to a project and scheduling the jobs as per requirements.
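
The following is a minimal PySpark sketch of the Spark SQL on Hive pattern described in this summary. The table name sales.transactions, its columns, and the HDFS output path are illustrative placeholders, not details of any specific engagement.

    from pyspark.sql import SparkSession

    # Start a Spark session with Hive support so spark.sql can see Hive tables.
    spark = (SparkSession.builder
             .appName("hive-transform-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Query a (hypothetical) Hive table and run a simple aggregation.
    daily_totals = spark.sql("""
        SELECT order_date, SUM(amount) AS total_amount
        FROM sales.transactions
        WHERE order_status = 'COMPLETED'
        GROUP BY order_date
    """)

    # Store the result in HDFS as Parquet, partitioned by date.
    (daily_totals.write
     .mode("overwrite")
     .partitionBy("order_date")
     .parquet("hdfs:///data/publish/daily_totals"))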

Technical Skills:

Big Data/Hadoop: HDFS, Hive, Sqoop, Oozie, Flume, Impala, Zookeeper, Kafka, MapReduce, Cloudera.

Spark Components: Spark Core, Spark SQL, Spark Streaming, RDD, Spark DataFrame.

Programming Languages: SQL, Python.

Databases & NoSQL: MySQL, T-SQL, HiveQL, Oracle.

Cloud: Azure ADLS, Data Factory, Databricks, Azure SQL Server, Terraform, Prefect, Snowflake, Microsoft Azure, StreamSets, JFrog Artifactory.

Operating Systems: Windows, Unix, Red Hat Linux.

Software Methodologies: Agile.

Other Tools: Toad, PuTTY, Dollar Universe, Visual Studio Code, IntelliJ.

Professional Experience:

Humana

Cloud Data Engineer Aug 2018 – Present

Responsibilities:

●Research, design, and develop computer software systems and applications that require the use of advanced computational and quantitative methodologies.

●Modify existing workflows to improve performance and to support setup of new environments.

●Process data and load it into Azure Delta Lake and Snowflake.

●Use Azure Databricks to process JSON and CSV files and load them into on-prem Oracle tables.

●Use Azure Databricks to convert raw files to Parquet and load them into the native zone.

●Help the team with large-scale data migration from on-prem to the cloud using Hadoop technologies and ETL tools such as StreamSets.

●Use Python to create reusable functions for file conversions.

●Use Databricks to perform transformations, joins, filters, and pre-aggregations before storing the data in Unity Catalog (see the sketch at the end of this section).

●Work with three data storage layers: raw, intermediate, and publish (medallion architecture).

●Apply optimization techniques including partitioning and bucketing.

●Automate XML parsing using Apache Spark and load the results into Azure SQL tables.

●Create data visualizations using business intelligence tools such as Tableau and Power BI.

●Work with Medicare and Medicaid data belonging to different state clients.

●Use Apache Spark to parse XML data and create stage tables.

●Use StreamSets to create data pipelines that migrate data in real time.

●Create IIB workflows to consume data from IBM MQ servers.

●Use StreamSets to consume data from Kafka topics and load it into an Oracle database.

●Use StreamSets to process JSON data and load it into Oracle stage tables.

●Use StreamSets to stream data in real time, applying transformations in flight.

●Assign work to offshore and nearshore teams according to priority.

●Coordinate application installation and deployment across multiple environments.

●Participate in the evaluation and selection of new technologies to support system efficiency.

●Participate in the development and execution of system and disaster recovery processes.

●Analyze and modify data logs generated by different sources using Splunk.

●Conduct proofs of concept (POCs) when introducing new technologies to the existing ecosystem.

●Ensure CI/CD continuous deployment of applications with Azure DevOps.

●Use Git repositories and create Git workflows for deploying code.

●Use Databricks bundles to create Databricks workflows from Git.
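
A minimal Databricks sketch of the ADLS-to-Unity-Catalog flow described above. The storage account, container, catalog, schema, table, and column names are placeholders chosen for illustration; they are not taken from the Humana environment.

    # In a Databricks notebook the `spark` session is provided by the runtime.
    from pyspark.sql import functions as F

    # Hypothetical ADLS Gen2 raw-zone path.
    raw_path = "abfss://raw@exampleadls.dfs.core.windows.net/claims/2023/12/"

    # Read raw CSV files from the raw zone.
    raw_df = (spark.read
              .option("header", "true")
              .option("inferSchema", "true")
              .csv(raw_path))

    # Light cleanup before publishing.
    publish_df = (raw_df
                  .filter(F.col("claim_status").isNotNull())
                  .withColumn("load_date", F.current_date()))

    # Write the result as a managed Delta table in Unity Catalog (publish layer).
    (publish_df.write
     .format("delta")
     .mode("append")
     .saveAsTable("main.publish.claims"))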

Cisco – San Jose, CA

Hadoop Developer Jan 2018 – Aug 2018

●Interacted with multiple teams to understand their business requirements for designing flexible, common components.

●Validated source files for data integrity and data quality by reading header and trailer information and performing column validations.

●Implemented Spark SQL to access Hive tables in Spark for faster data processing.

●Used Hive to perform transformations, joins, filters, and pre-aggregations before storing the data.

●Performed data visualization on datasets using PySpark in Jupyter notebooks.

●Validated and visualized the data in Tableau.

●Worked with offshore and onsite teams for sync-ups.

●Used Hive extensively to create views for the feature data.

●Created and maintained automation jobs for different datasets.

●Worked closely with the platform and Hadoop teams on the team's needs.

●Used Kafka for data ingestion for different datasets (see the sketch at the end of this section).

●Imported and exported data into HDFS and assisted in exporting analyzed data to RDBMS using Sqoop.

●Created Sentry policy files to give business users access to the required databases and tables through Impala in the dev, UAT, and prod environments.

●Created and validated Hive views using Hue.

●Created a deployment document and user manual for dataset validations.

●Created a data dictionary for universal datasets.

●Created shell scripts and TES scheduler jobs to automate workflow execution and loading of different datasets.

●Developed Sqoop jobs to import data from RDBMS and file servers into Hadoop.

●Worked closely with the EDNA and platform teams on team needs and platform issues.
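
A minimal Structured Streaming sketch of the Kafka ingestion pattern referenced above. The broker address, topic name, and HDFS paths are placeholders, and the job assumes the spark-sql-kafka connector is available on the cluster.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

    # Subscribe to a (hypothetical) Kafka topic.
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "feature-events")
              .load())

    # Kafka delivers key/value as binary; cast the value to string for downstream parsing.
    parsed = events.select(F.col("value").cast("string").alias("raw_json"))

    # Land the stream in HDFS as Parquet, with checkpointing for fault tolerance.
    query = (parsed.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/raw/feature_events")
             .option("checkpointLocation", "hdfs:///checkpoints/feature_events")
             .start())

    query.awaitTermination()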

Choice Software Limited – Hyderabad, India

SQL Developer Nov 2012 – Aug 2015

●Worked as a developer on the Mediation team, which handles integration and services.

●Analyzed requirements and implemented changes via new processing mechanisms.

●Analyzed business specification documents and prepared entity-relationship models that best suit the requirements.

●Built SQL scripts and complex queries (CASE, DECODE, ISNULL, and conversion functions) to validate extracted data and generate readiness boards (see the sketch at the end of this section).

●Worked on creating sessions and SQL scripts based on requirements.

●Worked with SQL*Loader to load data from flat files into the database.

●Created low-level design documents.

●Reviewed code development in new processing streams.

●Reviewed the preparation and execution of test cases for stream- and node-wise testing.

●Performed unit testing and system testing.

●Provided System Integration Testing (SIT) support.

●Prepared test case sheets.

●Created database objects such as tables, synonyms, and sequences.

●Wrote SQL queries for DML operations and performed DDL operations on tables.
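
A small Python sketch in the spirit of the SQL validation work above, using cx_Oracle purely for illustration. The connection string, staging table, and column names are hypothetical.

    import cx_Oracle

    # Placeholder credentials and DSN; replace with real connection details.
    conn = cx_Oracle.connect("scott/tiger@dbhost/ORCLPDB1")
    cur = conn.cursor()

    # CASE/NVL-based validation of extracted data, similar to the readiness checks described above.
    cur.execute("""
        SELECT record_type,
               COUNT(*) AS total_rows,
               SUM(CASE WHEN NVL(amount, 0) <= 0 THEN 1 ELSE 0 END) AS bad_amounts
        FROM stg_billing_records
        GROUP BY record_type
    """)

    for record_type, total_rows, bad_amounts in cur:
        print(record_type, total_rows, bad_amounts)

    cur.close()
    conn.close()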

Certifications:

DP-203: Data Engineering on Microsoft Azure

DEA-C01: SnowPro Advanced Data Engineer

Prefect Associate Certification

Education:

Bachelor of Technology in Computer Science and Engineering from CSJM University, India.

Master of Science in Computer Science from Silicon Valley University, CA, USA.

Master of Science in Information Technology from the University of the Cumberlands, KY, USA.


