Data Engineer Software Development

Location: Mount Juliet, TN
Salary: 70/hour
Posted: January 26, 2024

SAIKIRAN

Email: ad24f9@r.postjobfree.com PH: 615-***-****

Sr. Data Engineer

Professional Summary

8+ years of experience across the Software Development Life Cycle (SDLC) and software engineering, including requirement gathering, analysis, design, implementation, testing, support and maintenance.

Well versed in Big Data on AWS cloud services such as EC2, S3, Glue, Athena, DynamoDB and Redshift.

Extensively used PySpark to build scalable data pipelines for reporting.

Migrated code from Hive to PySpark using RDDs, DataFrames and Datasets.

3+ years of experience writing Python ETL frameworks and PySpark jobs to process large volumes of data daily.

Extensively used SQL, NumPy, Pandas, scikit-learn, Spark and Hive for data analysis and model building.

Experience with the Hadoop ecosystem, including HDFS, Spark, MapReduce, Pig, Hive and HBase.

Experience importing and exporting data between RDBMSs such as MySQL, Oracle and SQL Server and HDFS/Hive using Sqoop.

Developed Spark code for AWS Glue jobs and EMR.

Good understanding of cloud technologies such as GCP and AWS.

Proficient with the Apache Spark ecosystem, including Spark Core and Spark Streaming, using Scala and Python.

Worked on dimensional data modeling with star and snowflake schemas and Slowly Changing Dimensions (SCD).

Experience developing custom MapReduce programs to perform data transformation and analysis.

Strong competency in Hive schema design, data imports and analysis.

Enthusiast of Spark for ETL, Databricks, cloud adoption and data engineering in the open-source community.

Good working knowledge of Snowflake and Teradata databases.

Experience converting Hive queries into Spark transformations using Spark RDDs and Scala.

Wrote PySpark jobs in AWS Glue to merge data from multiple tables and used Glue Crawlers to populate the AWS Glue Data Catalog with table metadata definitions (see the sketch at the end of this summary).

Hands-on experience writing Pig Latin scripts, Pig UDFs and Hive UDFs.

Excellent experience analyzing, designing, developing, testing and implementing ETL processes, including database performance tuning and query optimization.

Experience extracting source data from sequential, XML and Excel files and transforming and loading it into the target data warehouse.

Hands-on experience with various database platforms such as Oracle, MySQL and MS SQL Server.

Experience in object-oriented analysis and design (OOAD), Unified Modeling Language (UML) and Agile methodologies.
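
For illustration only, a minimal sketch of the kind of Glue PySpark job referenced above, assuming hypothetical Data Catalog names (sales_db, orders, customers) and an illustrative S3 output path; a real job runs inside the AWS Glue job environment, where the awsglue libraries are available.

```python
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import Join
from awsglue.utils import getResolvedOptions

# Standard Glue job bootstrap: resolve the job name and initialize the job.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read two tables that a Glue Crawler registered in the Data Catalog
# (database and table names are hypothetical).
orders = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="orders")
customers = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="customers")

# Merge the two tables on a shared key.
merged = Join.apply(orders, customers, "customer_id", "customer_id")

# Write the merged result back to S3 as Parquet (illustrative path).
glue_context.write_dynamic_frame.from_options(
    frame=merged,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/merged/"},
    format="parquet")

job.commit()
```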

Professional Experience

Sr. Data Engineer

EMC Insurance, Iowa May 2022 to Present

Responsibilities:

Worked on migrating the on-premises Big Data project to the AWS cloud.

Design, develop, implement, test, document, and operate large-scale, high-volume, high-performance big data structures for business intelligence analytics.

Created an ETL framework using Spark on AWS EMR in Scala/Python.

Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.

Parsed and transformed complex daily data feeds for multi-destination delivery. Provided analytics support to the leadership team by proactively asking questions and rigorously analyzing some of the most important issues impacting the future of the consumer business.

Used the Cloud Shell SDK in GCP to configure Dataproc, Cloud Storage and BigQuery services.

Implemented a one-time migration of multi-state data from SQL Server to Snowflake using Python and SnowSQL.

Developed Spark applications using Spark SQL in Databricks for data extraction, transformation and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.

Built and architected multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP.

Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources.

Wrote Spark applications in Scala and Python.

Wrote Spark programs in Scala for data quality checks.

Developed Cloud Functions in Python to process JSON files from the source and load them into BigQuery.

Developed Python code to gather data from HBase (Cornerstone) and designed the solution for implementation in PySpark.

Constructed AWS data pipelines using VPC, EC2, S3, Auto Scaling groups, EBS, Snowflake, IAM, CloudFormation, Route 53 and CloudWatch.

Used AWS services such as EMR, S3 and CloudWatch to run and monitor Hadoop/Spark jobs.

Used Scala to convert SQL queries into RDD transformations in Apache Spark.

Used the Kafka consumer API in Scala to consume data from Kafka topics.

Other responsibilities included designing and implementing ETL configurations, working with large, complex datasets, enhancing query performance, and migrating jobs to HDFS to improve scaling and performance.

Developed a data quality framework to automate quality checks at different layers of data transformation (a minimal sketch follows this list).

Built a framework to extract data files from the mainframe system and perform ETL using shell scripts and PySpark.

Developed deployment architecture and scripts for automated system deployment in Jenkins.
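
A rough illustration of the data quality checks mentioned above (not the production framework), as a minimal PySpark sketch; the table name curated.policies and the column names are hypothetical.

```python
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()

def null_rate(df: DataFrame, column: str) -> float:
    """Fraction of rows where `column` is null."""
    total = df.count()
    if total == 0:
        return 0.0
    return df.filter(F.col(column).isNull()).count() / total

def run_checks(df: DataFrame, required_columns, max_null_rate=0.01):
    """Return (check_name, passed) pairs for one transformation layer."""
    results = [("row_count_nonzero", df.count() > 0)]
    for col in required_columns:
        results.append((f"null_rate:{col}", null_rate(df, col) <= max_null_rate))
    results.append(("no_duplicate_rows",
                    df.count() == df.dropDuplicates().count()))
    return results

# Hypothetical usage on a curated-layer table.
curated = spark.table("curated.policies")
for check, passed in run_checks(curated, ["policy_id", "effective_date"]):
    print(check, "PASSED" if passed else "FAILED")
```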

Environment: Spark, PySpark, Spark SQL, GCP, Python, Cloud, AWS, Glue, HDFS, Hive, Apache Kafka, Sqoop, Scala, Shell scripting, Linux, MySQL, Oracle Enterprise DB, Java, Jenkins, Git, Oozie, SOAP, NiFi, Cassandra and Agile methodologies

Data Engineer

Cummins, Columbus, Indiana September 2020 to April 2022

Responsibilities:

Developed real-time data processing applications using Scala and Python and implemented Apache Spark Streaming from sources such as Kafka and JMS.

Created multi-node Hadoop and Spark clusters on AWS instances to generate terabytes of data and stored it in HDFS on AWS.

Performed data extraction, aggregation and consolidation of Adobe data within AWS Glue using PySpark.

Migrated an existing on-premises application to AWS.

Responsible for estimating cluster size and monitoring and troubleshooting the Spark Databricks cluster.

Used AWS services like EC2 and S3 for small data sets.

Developed and deployed Spark and Scala code on a Hadoop cluster running on GCP.

Used Snowflake to create and maintain tables and views.

Used Spark SQL with Scala to create DataFrames and perform transformations on them.

Built a program with Python and Apache Beam and executed it in Cloud Dataflow to run data validation between raw source files and BigQuery tables (see the sketch after this list).

Built an AWS CI/CD data pipeline and AWS data lake using EC2, AWS Glue and AWS Lambda.

Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.

During the migration, upgraded the code to new Spark and Python versions.

Developed a POC for migrating the project from an on-premises Hadoop MapR system to GCP.

Performed data ingestion from multiple RDBMSs and vendor files using Qlik and stored the data in ADLS Gen2 (blob containers).

Developed Spark applications using Scala for easy Hadoop transitions.

After transformation, the data is stored in data marts and used by the reporting layer; Snowflake stores the transformed data, which is consumed by data scientists, the reporting layer, etc.
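
The Apache Beam / Cloud Dataflow validation mentioned above could be sketched roughly as follows. The bucket, project and table names are hypothetical, Dataflow-specific flags (--runner=DataflowRunner, --project, --region, --temp_location) would be passed on the command line, and a real job would write results to a sink rather than print them.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

RAW_FILE = "gs://example-bucket/raw/events.csv"      # hypothetical raw source file
BQ_TABLE = "example-project:analytics.events"        # hypothetical BigQuery table

def run():
    with beam.Pipeline(options=PipelineOptions()) as p:
        # Count rows in the raw file (skipping the header line).
        raw_count = (
            p
            | "ReadRawFile" >> beam.io.ReadFromText(RAW_FILE, skip_header_lines=1)
            | "CountRawRows" >> beam.combiners.Count.Globally())

        # Count rows already loaded into BigQuery.
        bq_count = (
            p
            | "ReadBigQuery" >> beam.io.ReadFromBigQuery(table=BQ_TABLE)
            | "CountBqRows" >> beam.combiners.Count.Globally())

        # Bring the two single-element counts together and compare them.
        _ = ((raw_count, bq_count)
             | "MergeCounts" >> beam.Flatten()
             | "CollectCounts" >> beam.combiners.ToList()
             | "Compare" >> beam.Map(
                 lambda counts: print(
                     "row counts match" if len(set(counts)) == 1
                     else f"row count mismatch: {counts}")))

if __name__ == "__main__":
    run()
```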

Environment: Spark, PySpark, Scala, Python, Cloud, GCP, AWS, Glue, HDFS, Hive, Apache Kafka, Sqoop, Shell scripting, Linux, MySQL, Oracle Enterprise DB, Java, Jenkins, Git, Oozie, SOAP, NiFi

Big Data Engineer

Sam's Club, Bentonville, AR June 2018 to August 2020

Responsibilities:

Developed the ETL data pipeline to load data from a centralized data lake on AWS S3 into PostgreSQL (RDBMS) using Spark.

Spearheaded the Big Data project end to end.

Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala, and Python.

Created a data lake in Hadoop by extracting data from different sources.

Created an AWS Lambda function and configured it to receive events from an S3 bucket (see the sketch after this list).

Developed Spark jobs in Scala and Python on top of YARN/MRv2 for interactive and batch analysis.

Implemented feature engineering to prepare data for ML algorithms.

Used AWS utilities such as EMR, S3 and CloudWatch to run and monitor Hadoop/Spark jobs.

Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially implemented in Python (PySpark).

Worked with AWS services such as EMR and EC2 for fast and efficient processing of big data.

Loaded data incrementally into HDFS using Sqoop.

Strong experience working with Elastic MapReduce (EMR) and setting up environments on AWS EC2 instances.

Tuned Hive queries.

Used the Snowflake Time Travel feature to access historical data.

Created shell scripts for module automation.

Assigned tasks to the team using Jira and tracked task progress across the team.

Created the Technical Design Document (TDD).

Created Spark jobs in Python for ETL and data analysis.

Loaded XML, JSON, CSV and Parquet files using PySpark and Spark-Scala jobs.

Loaded complex XML files using JAXB libraries.

Analyzed real-time data using Spark Streaming.

Created Unix shell scripts to call the Spark jobs.
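
The S3-triggered Lambda function mentioned above could look roughly like this minimal Python handler; it only logs metadata for each newly created object, and the downstream processing step is left out.

```python
import json
import urllib.parse

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    """Triggered by S3 ObjectCreated events; logs basic metadata per object."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        head = s3.head_object(Bucket=bucket, Key=key)
        print(json.dumps({
            "bucket": bucket,
            "key": key,
            "size_bytes": head["ContentLength"],
            "content_type": head.get("ContentType"),
        }))
    return {"statusCode": 200}
```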

Environment: Hadoop, PySpark, AWS, MapReduce, Cloud, HDFS, Hive, Cassandra, Sqoop, Oozie, SQL, Kafka, Spark, Scala, Python, GitHub, Big Data Integration, Impala.

Big Data Developer

Brio Technologies Private Limited, Hyderabad, India January 2017 to March 2018

Responsibilities:

Imported and exported data between RDBMSs such as MySQL, Oracle and SQL Server and HDFS/Hive using Sqoop.

Generated a script in AWS Glue to transfer the data and used AWS Glue to run ETL jobs and aggregations in PySpark code.

Monitored all MapReduce read jobs running on the cluster using Cloudera Manager and ensured that they could read the data into HDFS without any issues.

Debugged NameNode and ResourceManager logs when the cluster was down or jobs were failing.

Experience rebalancing an HDFS cluster.

Worked in an AWS environment to develop and deploy custom Hadoop applications.

Wrote shell scripts to perform audits on the Hive tables.

Extracted data from Hive tables for data analysis.

Developed MapReduce code for data cleaning and transformation.

Developed Spark applications using Scala for easy Hadoop transitions.

Used Pig for transformations, event joins, bot-traffic filtering and some pre-aggregations before storing the data in HDFS.

Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting (see the sketch after this list).

Collected data from an AWS S3 bucket in near real time using Spark Streaming and performed the necessary transformations.

Spun up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.

Involved in a comprehensive proof of concept entailing all aspects of development using Spark.

Wrote Spark programs in Scala for data quality checks.

Proficient in programming with Resilient Distributed Datasets (RDDs).

Experience tuning and debugging Spark applications running in both standalone and YARN cluster modes.

Responsible for troubleshooting MapReduce job execution issues by inspecting and reviewing log files.
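
As an illustration of computing reporting metrics over a partitioned Hive table with Spark (the table analytics.web_events, its partition column and the metric columns are hypothetical):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hive support lets Spark read partitioned/bucketed warehouse tables.
spark = (SparkSession.builder
         .appName("hive-reporting-metrics")
         .enableHiveSupport()
         .getOrCreate())

# Read one partition of a hypothetical table partitioned by event_date.
events = spark.table("analytics.web_events").where(
    F.col("event_date") == "2017-06-01")

# Example reporting metrics: event count and distinct users per page.
metrics = (events
           .groupBy("page")
           .agg(F.count("*").alias("event_count"),
                F.countDistinct("user_id").alias("unique_users"))
           .orderBy(F.desc("event_count")))

metrics.show(20, truncate=False)
```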

Environment: Hadoop, AWS, HDFS, Hive, MapReduce, AWS EC2, Sqoop, Kafka, Spark, Python, PySpark, YARN, Pig, Oozie, Scala, Cloudera, Agile methodologies

Application Developer

Dhruvsoft Services Private Limited, Hyderabad, Telangana August 2015 to December 2016

Responsibilities:

Prepared Functional Requirement Specifications and performed coding, bug fixing and support.

Involved in various phases of the Software Development Life Cycle (SDLC), such as requirement gathering, data modeling, analysis, architecture design and development for the project.

Developed SSIS pipelines to automate ETL activities and migrate SQL Server data to a SQL database.

Built SSIS packages and scheduled jobs to migrate data from disparate sources into SQL Server and vice versa.

Created SSIS packages that loaded data daily from different sources to create and maintain a centralized data warehouse, and made the packages dynamic to fit the environment.

Developed schemas for business applications, providing full life-cycle architectural guidance and ensuring quality technical deliverables.

Developed data profiling, munging and missing-value imputation scripts in Python on raw data as part of understanding the data and its structure (see the sketch after this list).

Managed a team of analysts responsible for executing business reporting functions, performing analysis and driving operational metrics.

Formulated the strategy, development, and implementation of executive and line of business dashboards through SSRS.

Managed internal Sprints, release schedules, and milestones through JIRA. Functioned as the primary point of contact for the client Business Analysts, Directors and Data Engineers for project communications.

Involved in the design, development and modification of PL/SQL stored procedures, functions, packages and triggers to implement business rules in the application.

Developed ETL processes to load data from flat files, SQL Server and Access into the target Oracle database, applying business logic in the transformation mappings to insert and update records on load.

Gained solid Informatica ETL development experience in an offshore/onsite model and was involved in ETL code reviews and testing of ETL processes.
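
A minimal sketch of the data profiling and missing-value imputation scripts mentioned above, using pandas; the file name raw_orders.csv and the imputation rules (median for numeric columns, mode for categoricals) are illustrative assumptions, not the original code.

```python
import pandas as pd

# Hypothetical raw extract; column names are whatever the source provides.
raw = pd.read_csv("raw_orders.csv")

# Basic profiling: dtypes, null counts and null percentages per column.
profile = pd.DataFrame({
    "dtype": raw.dtypes.astype(str),
    "null_count": raw.isna().sum(),
    "null_pct": raw.isna().mean().round(3),
})
print(profile)
print(raw.describe(include="all").transpose())

# Simple imputation: median for numeric columns, mode for categoricals.
for col in raw.columns:
    if raw[col].isna().any():
        if pd.api.types.is_numeric_dtype(raw[col]):
            raw[col] = raw[col].fillna(raw[col].median())
        elif not raw[col].mode().empty:
            raw[col] = raw[col].fillna(raw[col].mode().iloc[0])

raw.to_csv("orders_imputed.csv", index=False)
```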

Environment: MSBI, SSIS, SSRS, SSAS, Informatica, ETL, PL/SQL, SQL Server 2000, Ant, CVS, Hibernate, Eclipse, Linux


