
Data Engineer Migration

Location:
Novi, MI
Posted:
October 08, 2023


Resume:

Padmaja PNV Skype Id: adz8p8@r.postjobfree.com

Sr. Big Data Engineer/Developer Current Location: Michigan

Email: adz8p8@r.postjobfree.com Phone: 248-***-****

12+ years of experience with a strong emphasis on Design, Development, Implementation, Testing and Deployment of Software Applications.

6+ years of comprehensive IT experience in Big Data and Big Data Analytics, including Hadoop, HDFS, MapReduce, YARN, the Hadoop ecosystem and Shell Scripting.

6 years of development experience using Java, J2EE, and Spring Boot REST APIs.

Highly capable of processing large Structured, Semi-structured and Unstructured datasets and supporting Big Data applications.

Hands-on experience with Hadoop Ecosystem components like MapReduce (Processing), HDFS (Storage), YARN, Sqoop, Flume, Storm, Pig, Hive, HBase, Oozie, Kafka, ZooKeeper, Spark and Scala for data storage and analysis.

Very good hands-on experience with advanced Big Data technologies such as the Spark ecosystem (Spark SQL, MLlib, SparkR and Spark Streaming), Kafka and predictive analytics.

Expertise in transferring data between the Hadoop ecosystem and structured data stores in RDBMSs such as MySQL, Oracle, and DB2 using Sqoop.

Experienced with cloud platforms: Hadoop on Azure, AWS EMR, and Cloudera Manager (as well as Hadoop directly on EC2, non-EMR).

Experienced in using distributed computing architectures such as Hadoop and Spark, with effective use of MapReduce, Python, SQL and Cassandra to solve big data problems.

Experience in NoSQL databases like MongoDB, HBase and Cassandra.

Hands-on experience in installing, configuring and using Apache Hadoop ecosystem components like Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, Apache Crunch, ZooKeeper, Sqoop, Hue, Scala, Solr, Git, Maven, Avro, JSON and Chef.

Experience with Apache Spark clusters and stream processing using Spark Streaming.

Expertise in moving large volumes of log and streaming event data.

Good experience in data manipulation and system management using Python scripts.

Experience in developing MapReduce jobs in Java for data cleaning and preprocessing.

Expertise in writing Pig Latin and Hive scripts and extending their functionality using User Defined Functions (UDFs).

Expertise in controlling data layout using Partitions and Bucketing in Hive.

Hands-on experience in developing Oozie workflows that execute MapReduce, Sqoop, Pig, Hive and Shell scripts.

Experience working with Cloudera Hue Interface and Impala.

Experience in designing and developing POCs using Scala, Spark SQL and MLlib, then deploying them on YARN clusters.

Experience with file formats such as Text, Avro and Parquet for Hive querying and processing.

Used Avro, Parquet and ORC data formats to store data in HDFS, as in the sketch below.
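
A minimal PySpark sketch of writing a dataset to HDFS in these formats; the paths and the header option are illustrative assumptions, and the Avro writer assumes the spark-avro package is available on the classpath.

from pyspark.sql import SparkSession

# Illustrative only: paths are placeholders, not actual project locations.
spark = SparkSession.builder.appName("format-demo").getOrCreate()

df = spark.read.option("header", "true").csv("hdfs:///data/raw/events.csv")

# Same DataFrame persisted in three common HDFS storage formats.
df.write.mode("overwrite").parquet("hdfs:///data/curated/events_parquet")
df.write.mode("overwrite").orc("hdfs:///data/curated/events_orc")
# Requires --packages org.apache.spark:spark-avro_2.12:<spark-version>
df.write.mode("overwrite").format("avro").save("hdfs:///data/curated/events_avro")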

Expertise in Object-Oriented Analysis and Design (OOAD) using UML and various design patterns.

Experience in Java, JSP, Servlets, EJB, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, XML and HTML.

Fluent with the core Java concepts like I/O, Multi-Threading, Exceptions, Reg Ex, Data Structures and Serialization.

Extensive experience in Java and J2EE technologies like Servlets, JSP, JSF, JDBC, JavaScript, Spring, Hibernate, and JUnit testing.

Performed Unit Testing using the JUnit testing framework and Log4J to monitor error logs.

Experience in process Improvement, Normalization/De-normalization, Data extraction, cleansing and Manipulation.

Converting requirement specifications and source system understanding into Conceptual, Logical and Physical Data Models and Data Flow Diagrams (DFD).

Expertise in working with transactional databases like Oracle, SQL Server, MySQL and DB2.

Experienced in building data pipelines: exploring and preparing data, building pipelines for batch processing and streaming data, orchestrating data pipelines, and delivering data sets to Machine Learning or Advanced Analytics applications.

Large-scale migrations of application workloads that must be completed quickly typically use a rehost (lift-and-shift) cloud migration strategy, where applications are moved to the cloud largely as-is rather than re-architected.

Manage end-to-end complex data migration, conversion and data modeling

Participate in quality management reviews as outlined in the Verification and Validation Overview

Identifying the data migration impact of all proposed changes

Owning the quality of the migrated data and the validity of the migration processes and operation.

Perform source system data analysis in order to manage source-to-target data mapping.

Worked in an Agile framework as an individual contributor; responsibilities included interacting with the business team in story grooming and reviewing stories/acceptance criteria.

Converted a monolithic app to a microservices architecture using Spring Boot and the 12-factor app methodology. Deployed, scaled, configured and wrote manifest files for various microservices in PCF.

Implemented REST microservices using Spring Boot. Generated metrics with method-level granularity and persistence using Spring AOP and Spring Boot Actuator.

Integrated Swagger UI and wrote integration tests along with REST documentation.

Used Spring Cloud Config Server for centralized configuration and Splunk for centralized logging. Used Concourse and Jenkins for microservice deployment.

Developed stories/tasks following TDD/BDD and pair-programming practices. Provided daily status in Scrum meetings with the client. Mentored new team members on effective use of Spring Boot, JPA and Java.

Used Java 8 features such as Streams and Lambda expressions.

Maintained interface compatibility and concurrency in the project using Java 8 features such as default and static interface methods and the Concurrency API.

Used Java 8 method references to refer to methods by name and used functional interfaces.

TECHNICAL SKILLS:

Big Data Ecosystem

Hadoop, Big Data, HDFS, MapReduce, Hive, Sqoop, Pig, HBase, Flume, Zookeeper, Oozie, Impala, Cassandra, MongoDB, Kafka, Spark

Hadoop Technologies & Distributions

Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Spark, Scala, Impala, Kafka, Hue, Sqoop, Oozie, Flume, Zookeeper, Cassandra, Cloudera CDH5, Python, Solr and Hortonworks.

Databases

SQL/NOSQL, MySQL, Teradata, MS SQL, Oracle, HBase, Cassandra, MongoDB, Neo4j

Programming Languages

C, C++, JSE, XML, JSP/Servlets, Struts, Spring, HTML, JavaScript, jQuery, Web services.

IDE & ETL Tools

Eclipse, NetBeans, Intellij, Maven, Jenkins

Other Tools

Putty, WinSCP, Stream Weaver, Amazon AWS, Hortonworks, Cloudera, Azure.

Version Control

GitHub, SVN, CVS

Methodologies

Agile, Scrum, Waterfall

Operating Systems

UNIX, Windows, iOS, LINUX

EDUCATION

Master's in Information Systems from Nagarjuna University, A.P. in 2004 with 71%.

BCA (Computer Applications) from Nagarjuna University, A.P. in 2002 with 60%.

WORK EXPERIENCE:

BCBSM, MI July 2021 – Current

Sr. Big Data Developer

Responsibilities:

Involved in developing Spark code using Scala and Spark SQL for faster testing and processing of data, and explored optimizing it using SparkContext, Spark SQL, pair RDDs and Spark on YARN.

Responsible for importing log files from various sources into HDFS using Flume; worked with Flume, Storm and Spark.

Developed PySpark code for saving data into Avro and Parquet formats and building Hive tables on top of them.

Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.

Experience in extracting appropriate features from data sets in order to handle bad, null, partial records using Spark SQL.

Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.

Expertise in implementing Spark using Scala and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.

Created a data pipeline that regularly uploads files to HDFS, then processes the file data and loads it into Hive using Spark, as outlined in the sketch below.
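
A minimal PySpark sketch of that pipeline, assuming files have already landed in an HDFS directory; the database, table and column names shown are placeholders.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("daily-file-load")
         .enableHiveSupport()
         .getOrCreate())

# Read whatever the upstream upload step has dropped into the landing zone.
raw = spark.read.option("header", "true").csv("hdfs:///landing/claims/")

# Light cleanup before the data becomes queryable.
cleaned = (raw.dropDuplicates()
              .filter(F.col("claim_id").isNotNull())
              .withColumn("load_date", F.current_date()))

# Append into a date-partitioned Hive table so Hive/Impala queries can prune partitions.
(cleaned.write
        .mode("append")
        .partitionBy("load_date")
        .saveAsTable("analytics.claims_curated"))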

Performed data cleaning and data validation.

Developed a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs.

Created PySpark DataFrames to bring data from DB2 to Amazon S3, as in the sketch below.
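
A hedged sketch of such a DB2-to-S3 job; the JDBC URL, credentials, table and bucket are placeholders, and it assumes the DB2 JDBC driver and the S3A connector are available to the Spark job.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("db2-to-s3").getOrCreate()

# Pull the source table over JDBC into a DataFrame.
db2_df = (spark.read.format("jdbc")
          .option("url", "jdbc:db2://db2-host:50000/SAMPLE")
          .option("driver", "com.ibm.db2.jcc.DB2Driver")
          .option("dbtable", "SCHEMA.POLICY")
          .option("user", "db2user")
          .option("password", "****")
          .load())

# Land the extract on S3 as Parquet for downstream consumers.
db2_df.write.mode("overwrite").parquet("s3a://my-landing-bucket/policy/")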

Provided guidance to the development team working on PySpark as an ETL platform.

Ensured that quality standards were defined and met.

Optimized PySpark jobs to run on a Kubernetes cluster for faster data processing.

Developed code locally and shared work using Docker containers.

Used Docker to push applications into a test environment and execute automated and manual tests.

Once testing was complete, delivered fixes to the customer by pushing the updated image to the production environment.

Automated cloud deployments using Python and AWS CloudFormation templates (see the sketch below).
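
A small illustrative boto3 sketch of that kind of automation; the stack name, template file and parameter are assumptions, not the actual templates used.

import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

with open("emr_cluster.yaml") as f:
    template_body = f.read()

# Create (deploy) the stack described by the template.
cfn.create_stack(
    StackName="bigdata-emr-dev",
    TemplateBody=template_body,
    Parameters=[{"ParameterKey": "InstanceType", "ParameterValue": "m5.xlarge"}],
    Capabilities=["CAPABILITY_NAMED_IAM"],
)

# Block until creation completes, or raise if the stack rolls back.
cfn.get_waiter("stack_create_complete").wait(StackName="bigdata-emr-dev")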

Involved in analyzing and optimizing RDDs by controlling partitions for the given data; expert in writing business analytics scripts using Hive SQL.

Designed, implemented and tested cloud computing solutions using Snowflake.

Used a data lake for acquiring and storing content from disparate sources, making it available for search and providing a 360-degree view of organization-wide data.

Used Azure Data Lake to analyze data in a single place with no artificial constraints; Data Lake Store can store trillions of files, with a single file potentially larger than a petabyte.

Worked with Delta Lake, which extends Parquet data files with a file-based transaction log for ACID transactions and scalable metadata handling.

Used a combination of Azure Data Factory, T-SQL, Spark SQL and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.

Ingested and transformed real-time data feeds in both structured and semi-structured formats and delivered meaningful data insights within minutes.

Used Snowflake Time Travel and zero-copy cloning to build a data recovery strategy that balances system resilience with ongoing storage costs (illustrated in the sketch below).
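
A hedged sketch of how Time Travel and zero-copy cloning support that strategy, using the Snowflake Python connector; account details, table names and the one-hour offset are placeholders.

import snowflake.connector

conn = snowflake.connector.connect(
    user="etl_user", password="****", account="xy12345",
    warehouse="ETL_WH", database="ANALYTICS", schema="PUBLIC",
)
cur = conn.cursor()

# Zero-copy clone of the table as it existed an hour ago: an instant,
# storage-efficient recovery point that shares unchanged micro-partitions.
cur.execute(
    "CREATE OR REPLACE TABLE CLAIMS_RESTORED CLONE CLAIMS AT (OFFSET => -3600)"
)

# Inspect historical rows directly via Time Travel without restoring anything.
cur.execute("SELECT COUNT(*) FROM CLAIMS AT (OFFSET => -3600)")
print(cur.fetchone())

Because a clone shares storage with its source until either side changes, retention windows and clone lifetimes are the main levers for balancing resilience against storage cost.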

Securely shared data and reduced or eliminated data integration costs by accessing ready-to-query datasets available in the Snowflake Marketplace.

Environment: Big Data, Spark, YARN, Hive, Pig, Scala, PySpark, AWS EMR, AWS S3, JDBC, Redshift, HBase, Azure, Azure Delta Lake and Databricks, Snowflake, Java, JDK 1.8

Ford, Dearborn, MI March 2020 – July 2021

Sr. Big Data Developer

Responsibilities:

Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies such as Hadoop, Map Reduce Frameworks, HBase, and Hive.

Utilize Azure services with focus on big data Architect /analytics / enterprise data warehouse and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability, performance, and to provide meaningful and valuable information for better decision-making.

Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.

Utilized Apache Spark with Python to develop and execute Big Data analytics and machine learning applications; executed machine learning use cases with Spark ML and MLlib (a minimal sketch follows).
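
A minimal Spark ML sketch of such a use case; the Hive table, feature columns and label are illustrative assumptions.

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("ml-demo").enableHiveSupport().getOrCreate()
data = spark.table("analytics.telemetry_features")

# Assemble raw numeric columns into the single vector column Spark ML expects.
assembler = VectorAssembler(
    inputCols=["sensor_1", "sensor_2", "sensor_3"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="failure_flag")

train, test = data.randomSplit([0.8, 0.2], seed=42)
model = Pipeline(stages=[assembler, lr]).fit(train)

# Score the hold-out split; predictions could be written back to Hive for reporting.
model.transform(test).select("failure_flag", "prediction").show(5)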

Identified areas of improvement in the existing business by unearthing insights from vast amounts of data using machine learning techniques.

Interpreted problems and provided solutions to business problems using data analysis, data mining, optimization tools, machine learning techniques and statistics.

Worked on machine learning over large datasets using Spark and MapReduce.

Led the implementation of new statistical algorithms and operators on Hadoop and SQL platforms and utilized optimization techniques, linear regression, K-means clustering, Naive Bayes and other approaches.

Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows big data resources.

Data sources are extracted, transformed and loaded to generate CSV data files with Python programming and SQL queries.

Stored and retrieved data from data-warehouses using Amazon Redshift.

Worked on data pre-processing and cleaning to perform feature engineering, and applied data imputation techniques for missing values in the dataset using Python (a minimal sketch follows).
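
A minimal pandas sketch of the imputation step; the file and column names are placeholders for whatever the real dataset used.

import pandas as pd

df = pd.read_csv("measurements.csv")

# Numeric gaps filled with the column median, categorical gaps with the mode.
df["pressure"] = df["pressure"].fillna(df["pressure"].median())
df["plant_code"] = df["plant_code"].fillna(df["plant_code"].mode()[0])

# Rows still missing the key identifier are dropped rather than guessed.
df = df.dropna(subset=["asset_id"])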

Created data quality scripts using SQL and Hive to validate successful data load and the quality of the data.

Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.

Extracted, transformed and loaded data from source systems to Azure Data Storage services.

Environment: Big Data, Spark, YARN, Hive, Pig, Scala, PySpark, Hadoop, Cloudera, Java, JDK 1.8, JDBC, Sqoop, MySQL, Cassandra, HBase

Dell, Austin Texas Feb 2019 to Dec 2019

Sr. Big Data Developer

Responsibilities:

Implemented solutions for ingesting data from various sources and processing the Data-at-Rest

Imported millions of structured data from relational databases using Sqoop import to process using Spark and stored the data into HDFS in CSV format.

Developed a data pipeline using a Spark Streaming application to pull data from the cloud into Hive tables and used Spark SQL to process large amounts of structured data.

Wrote programs in Scala using Spark and worked on migrating MapReduce programs into Spark using Scala

Automated the cloud deployments using chef, python and AWS Cloud Formation Templates.

Involved in analyzing and optimizing RDDs by controlling partitions for the given data; expert in writing business analytics scripts using Hive SQL.

Used a data lake for acquiring and storing content from disparate sources, making it available for search and providing a 360-degree view of organization-wide data.

Used Azure Data Lake to analyze data in a single place with no artificial constraints; Data Lake Store can store trillions of files, with a single file potentially larger than a petabyte.

Used a combination of Azure Data Factory, T-SQL, Spark SQL and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.

Environment: Big Data, Spark, YARN, Hive, Pig, Scala, Python, Hadoop, Azure, Azure Data Lake, Kibana, AWS EMR, AWS S3, JDBC, Redshift, NoSQL, Sqoop, MySQL, Cassandra, MongoDB, HBase.

Honeywell, Bangalore, India Feb 2018 to Feb 2019

Sr. Big Data Developer

Responsibilities:

Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.

Involved in complete Big Data flow of the application starting from data ingestion upstream to HDFS, processing the data in HDFS and analyzing the data and involved Low level design for MR, Hive, Impala, Shell scripts to process data.

Handled Hive queries using Spark SQL, integrated with the Spark environment and implemented in Scala.

Used the Spark Streaming API with Kafka to build live dashboards; worked on transformations and actions on RDDs, Spark Streaming, pair RDD operations, checkpointing, and SBT (a Python structured-streaming analogue is sketched below).
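
The original work used the Scala DStream API; the same idea expressed as a PySpark Structured Streaming sketch looks like the following, where the broker, topic and checkpoint path are placeholders and the spark-sql-kafka package is assumed to be on the classpath.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-dashboard-feed").getOrCreate()

# Consume the Kafka topic as a streaming DataFrame.
events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "plant-telemetry")
          .load()
          .selectExpr("CAST(value AS STRING) AS payload", "timestamp"))

# One-minute tumbling counts feeding a live dashboard sink (console here for brevity).
counts = events.groupBy(F.window("timestamp", "1 minute")).count()

query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .option("checkpointLocation", "hdfs:///checkpoints/kafka-dashboard")
         .start())
query.awaitTermination()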

Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using the Scala IDE for Eclipse.

Creating Hive tables to import large data sets from various relational databases using Sqoop and export the analyzed data back for visualization and report generation by the BI team.

Installing and configuring Hive, Sqoop, Flume, Oozie on the Hadoop clusters and involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.

Developed a process for the Batch ingestion of CSV Files, Sqoop from different sources and generating views on the data source using Shell Scripting and Python.

Extended Hive and Pig core functionality with custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF) and User Defined Aggregating Functions (UDAF) written in Python (illustrated below).
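
For Hive, a Python UDF of this kind is typically a streaming script invoked through TRANSFORM. A minimal sketch, assuming a two-column tab-separated input (customer_id, phone):

import sys

# Hive pipes tab-separated rows to stdin and reads tab-separated rows back from stdout.
for line in sys.stdin:
    customer_id, raw_phone = line.rstrip("\n").split("\t")
    digits = "".join(ch for ch in raw_phone if ch.isdigit())
    normalized = digits[-10:] if len(digits) >= 10 else ""
    print(f"{customer_id}\t{normalized}")

The script would be registered with ADD FILE normalize_phone.py and invoked as SELECT TRANSFORM(customer_id, phone) USING 'python normalize_phone.py' AS (customer_id, phone_norm) FROM customers; the script name and column layout here are illustrative.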

Involved in converting Hive/SQL queries into Spark transformations using Spark SQL, Python and Scala.

Configured the Message Driven Beans (MDB) for messaging to different clients and agents who are registered with the system.

Involved in the start-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark and shell scripts (for scheduling a few jobs); extracted and loaded data into a data lake environment (Amazon S3) using Sqoop, which was accessed by business users and data scientists.

Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.

Improved the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark-SQL, Data Frame, Pair RDD's, Spark YARN.

Worked with teams in setting up AWS EC2 instances by using different AWS services like S3, EBS, Elastic Load Balancer, and Auto scaling groups, VPC subnets and CloudWatch.

Using Kafka as a data pipeline between JMS (Producer) and Spark Streaming Application (Consumer) and implemented partitioning, dynamic partitions and buckets in HIVE.

Environment: Hadoop, HDFS, Map Reduce, Hive, HBase, Zookeeper, Impala, Java (jdk1.8), Cloudera, Oracle, SQL Server, UNIX Shell Scripting, Flume, Oozie, Scala, Spark, ETL, Sqoop, Python, kafka, AWS, S3, MongoDB, Oracle, SQL, Hortonworks, XML.

Volvo/Capgemini, Bangalore, India Oct 2016 – Nov 2017

Java/Big Data Developer

Responsibilities:

Developed a shell script to create staging, landing tables with the same schema like the source and generate the properties which are used by Oozie jobs.

Developed Oozie workflows for executing Sqoop and Hive actions and worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.

Performance optimizations on Spark/Scala. Diagnose and resolve performance issues.

Responsible for developing Python wrapper scripts that extract a specific date range using Sqoop by passing the custom properties required for the workflow (a sketch follows).
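
A hedged sketch of such a wrapper; the connection string, table, password file and HDFS target are placeholders, and the date range arrives as YYYY-MM-DD arguments from the calling workflow action.

import subprocess
import sys

start_date, end_date = sys.argv[1], sys.argv[2]

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",
    "--username", "etl_user", "--password-file", "/user/etl/.pwd",
    "--table", "SALES_ORDERS",
    "--where", f"ORDER_DATE >= DATE '{start_date}' AND ORDER_DATE < DATE '{end_date}'",
    "--target-dir", f"/landing/sales_orders/{start_date}",
    "--num-mappers", "4",
]

# Fail the wrapper (and the calling Oozie action) if Sqoop returns non-zero.
subprocess.run(sqoop_cmd, check=True)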

Developed scripts to run Oozie workflows, capture the logs of all jobs that run on cluster and create a metadata table which specifies the execution times of each job.

Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS.

Environment: Hadoop, HDFS, MapReduce, Hive, HBase, Kafka, Zookeeper, Oozie, Impala, Java (JDK 1.8), Cloudera, Oracle, Python, UNIX Shell Scripting, Flume, Scala, Spark, Sqoop, AWS, S3, EC2.

Wyndham/Cognizant, Bangalore, India / Orlando, FL Oct 2014 to Oct 2016

Big data Developer

Responsibilities:

Responsible for Writing MapReduce jobs to perform operations like copying data on HDFS and defining job flows on EC2 server, load and transform large sets of structured, semi-structured and unstructured data.

Developed a process for Sqooping data from multiple sources like SQL Server, Oracle and Teradata and responsible for creation of mapping document from source fields to destination fields mapping.

Developed a shell script to create staging, landing tables with the same schema like the source and generate the properties which are used by Oozie jobs.

Developed Oozie workflows for executing Sqoop and Hive actions and worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.

Performance optimizations on Spark/Scala. Diagnose and resolve performance issues.

Responsible for developing Python wrapper scripts that extract a specific date range using Sqoop by passing the custom properties required for the workflow.

Developed scripts to run Oozie workflows, capture the logs of all jobs that run on cluster and create a metadata table which specifies the execution times of each job.

Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS.

Environment: Hadoop, HDFS, MapReduce, Hive, HBase, Kafka, Zookeeper, Oozie, Impala, Java (JDK 1.8), Cloudera, Oracle, Python, UNIX Shell Scripting, ETL, Flume, Scala, Spark, Sqoop, AWS, S3, EC2, MySQL, Hortonworks, YARN.

Hewlett-Packard, Bangalore, India Aug 2011 to Sep 2014

Sr. Java Developer

Responsibilities:

Created use cases for the case create service utilization from various systems.

Provided XML and JSON response format to support various service clients.

Utilized the Jackson processor for JSON data binding and JAXB for XML data binding.

Designed and developed Customer Event API with all the CRUD capabilities.

Used MongoTemplate to establish communication with the MongoDB collections.

Published APIs for application services and generated CSV-formatted data reports.

Developed Java RESTful web services to upload data from local storage to Amazon S3, list S3 objects and perform file manipulation operations.

Experience in developing microservices using Spring Boot, following domain-driven design.

IBM, India March 2007 to July 2011

Java Developer

Responsibilities:

Involved in design, development and building the travel network file system to be stored in NAS drives.

Extensively worked with Hibernate Query Language (HQL) to store and retrieve the data from Oracle database.

Developed Java web applications using JSP and Servlets, Struts, Hibernate, Spring, REST web services and SOAP.

Provided support in all phases of the Software Development Life Cycle (SDLC), quality management systems and project life cycle processes. Utilized databases such as MySQL, and followed HTTP and WSDL standards to design REST/SOAP-based web APIs using XML, JSON, HTML, and DOM technologies.


