Data Engineer

Location: Jacksonville, FL

Posted: January 20, 2021

BIG DATA ENGINEER

Name - Sairam

Contact: 469-***-****

adjkqa@r.postjobfree.com

PROFESSIONAL SUMMARY

5+ years of IT experience as a Big Data Engineer who undertakes complex assignments, meets tight deadlines, and delivers superior performance; practical knowledge in data analytics and optimization; applies strong analytical skills to inform senior management of key trends identified in the data.

Experienced in working with the Hadoop framework and its ecosystem, including HDFS, MapReduce, YARN, Spark, Hive, Impala, Sqoop and Oozie.

Experience in data ingestion using Spark, Sqoop and Kafka.

Experienced in Spark Programming with Scala and Python.

Experienced in working on NoSQL databases like Cassandra and HBase.

Experienced in tuning and monitoring Hadoop jobs and clusters in a production environment.

Implemented the AWS Cloud platform and its features, including EC2, VPC, EBS, AMI, SNS, RDS, CloudWatch, CloudTrail, CloudFormation, AWS Config, Auto Scaling, CloudFront, IAM, S3 and Route 53.

2+ years of experience on the AWS cloud platform.

2+ years of experience working with Spark technology.

Expertise in Spark Streaming (Lambda architecture), Spark SQL, and tuning and debugging Spark clusters (Mesos).

Familiar with data architecture including data ingestion pipeline design, Hadoop information architecture, data modeling.

Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing and other services of the AWS family.

Provisioned highly available EC2 instances using Terraform and CloudFormation, and wrote new plugins to support new functionality in Terraform.

Experience with multiple Hadoop file formats such as Avro, Parquet, ORC and JSON.

Selecting appropriate AWS services to design and deploy an application based on given requirements. Implementing cost control strategies.

Set up and managed a CDN on Amazon CloudFront to improve site performance.

Expertise in working with MongoDB and Apache Cassandra.

Solid programming knowledge of Scala and Python.

Experience working with Teradata and batch-processing its data using distributed computing.

Good working experience with Hadoop data-warehousing tools such as Hive and Pig, and with extracting data from these tools onto the cluster using Sqoop.

Developed Oozie workflow schedulers to run multiple Hive and Pig jobs that run independently based on time and data availability.

Experience handling the cluster when it is in safe mode.

Mentored junior developers and kept them updated on cutting-edge technologies such as Hadoop, Spark and Spark SQL.

All the projects I have worked on are open-source projects and have been tracked using JIRA.

Experience with Agile methodologies (Scrum).

AWS CERTIFIED:

TECHNICAL SKILLS:

Big Data/Hadoop Technologies: HDFS, MapReduce, YARN, Spark, NiFi, Hive, HBase, ZooKeeper, Oozie, Pig, Sqoop, Flume, Apache Avro, Storm, Kafka.

Programming Languages: Python, Scala, Java.

Reports: Jasper.

Frameworks: Struts, Spring, Hibernate.

Server Technologies: WebLogic, WebSphere, Apache Tomcat.

Operating Systems: Windows, Unix, Linux.

Databases: Oracle 10g, MySQL, SQL Server, DB2.

Cloud Services: AWS (EC2, S3, CloudWatch, RDS, Elastic Cache, IAM).

CI Tools: Jenkins/Hudson, Bamboo.

Version Control: TFS, Git, IBM ClearCase, JIRA, SVN.

ETL TOOLS: Informatica, Ab Initio.

Professional Experience:

Role: Big Data Developer, April 2019 - Present

Client: BCBS, FL

Processed data into HDFS by developing solutions, analyzed the data using Spark and Hive, and produced summary results from Hadoop for downstream systems.

Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and tuning memory (a configuration sketch follows).
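
A minimal sketch of the kind of tuning knobs involved, with illustrative values only; the app name, partition counts, memory sizes and the 30-second batch interval are hypothetical, not taken from the resume:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Illustrative settings only; real values depend on cluster size and data volume.
val conf = new SparkConf()
  .setAppName("tuned-spark-job")
  .set("spark.sql.shuffle.partitions", "400")   // level of parallelism for shuffles
  .set("spark.default.parallelism", "400")
  .set("spark.executor.memory", "6g")           // executor heap size
  .set("spark.executor.cores", "4")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

val spark = SparkSession.builder().config(conf).getOrCreate()

// For DStream jobs, the batch interval is fixed when the StreamingContext is created.
val ssc = new StreamingContext(spark.sparkContext, Seconds(30))
```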

Developed a Spark/Scala-based analytics and reporting platform for the Ameren Customer Cross Analytics with daily incremental data uploads.

Hands-on experience in Spark and Spark Streaming, creating RDDs and applying operations (transformations and actions).

Responsible for fetching real-time data using Kafka and processing it with Spark Streaming in Scala (see the sketch below).
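
A minimal sketch of Kafka ingestion with Spark Streaming in Scala (spark-streaming-kafka-0-10); the broker address, topic name and output path are hypothetical:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092",            // placeholder broker
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "spark-ingest",
  "auto.offset.reset" -> "latest"
)

val ssc = new StreamingContext(new SparkConf().setAppName("kafka-ingest"), Seconds(30))
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

// Persist each micro-batch of raw payloads to a per-batch HDFS directory (path is hypothetical).
stream.map(_.value).foreachRDD { (rdd, time) =>
  if (!rdd.isEmpty()) rdd.saveAsTextFile(s"/data/raw/events/${time.milliseconds}")
}

ssc.start()
ssc.awaitTermination()
```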

Used Spark SQL to read data from external sources and processed the data using Scala.

Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins (a Spark sketch follows).
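
A minimal sketch of the Spark side of a map-side join, broadcasting the smaller table; the table and column names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder()
  .appName("broadcast-join")
  .enableHiveSupport()
  .getOrCreate()

val claims  = spark.table("warehouse.claims")     // large fact table (hypothetical)
val members = spark.table("warehouse.members")    // small dimension table (hypothetical)

// Broadcasting the small table avoids a shuffle, mirroring a Hive map-side join.
val joined = claims.join(broadcast(members), Seq("member_id"))

// Partitioned output lets downstream Hive queries prune to only the dates they need.
joined.write.mode("overwrite").partitionBy("claim_date").parquet("/data/curated/claims_joined")
```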

Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.

Involved in creating Hive tables and then applying HiveQL on those tables for data validation.

Moved the data from Hive tables into Mongo collections.

Used Zookeeper for various types of centralized configurations.

Implemented Spark using Scala and Spark SQL for faster testing and processing of data.

Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries.

Managed and reviewed Hadoop log files.

Shared responsibility for administration of Hadoop and Hive.

Worked on AWS services such as EMR and EC2 for fast and efficient processing of big data.

Created S3 buckets, managed their policies, and utilized S3 and Glacier for storage and backup on AWS.

Implemented a data lake to consolidate data from multiple source databases such as Teradata using Hadoop stack technologies (Sqoop, Hive/HQL).

Good understanding of Kafka architecture and of designing consumer and producer applications.

Used Kafka for publish-subscribe messaging as a distributed commit log and experienced its speed, scalability and durability.

Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, and for writing data back into HDFS through Sqoop (a UDF sketch follows).
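
A minimal sketch of a Scala UDF used in a DataFrame aggregation; the UDF logic, input path and column names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

val spark = SparkSession.builder().appName("udf-aggregation").getOrCreate()

// Hypothetical UDF that normalizes free-text state codes before grouping.
val normalizeState = udf((s: String) => Option(s).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))

val accounts = spark.read.parquet("/data/raw/accounts")   // placeholder input path

val byState = accounts
  .withColumn("state", normalizeState(col("state")))
  .groupBy("state")
  .count()

// Results written back to HDFS for downstream consumers.
byState.write.mode("overwrite").parquet("/data/curated/accounts_by_state")
```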

Implemented a centralized security management system to secure data in the Hadoop cluster and restrict or allow user-level access to HDFS folders and Hive tables/columns using Ranger Key Management Service policies.

Strong knowledge of Hadoop in-memory technology stack including Apache Spark and real-time data streaming.

Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra (a sketch follows).
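
A minimal sketch of a Kafka-to-Cassandra stream using Structured Streaming and the DataStax Spark-Cassandra connector; the broker, topic, keyspace and table names are hypothetical, and the actual pipeline may have used the DStream API instead:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().appName("kafka-to-cassandra").getOrCreate()

// Kafka source; broker and topic are placeholders.
val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("subscribe", "learner-events")
  .load()
  .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS payload")

// Each micro-batch is appended to Cassandra via the connector's data source format.
def writeToCassandra(batch: DataFrame, batchId: Long): Unit = {
  batch.write
    .format("org.apache.spark.sql.cassandra")
    .options(Map("keyspace" -> "learner", "table" -> "events"))   // hypothetical keyspace/table
    .mode("append")
    .save()
}

val query = events.writeStream
  .foreachBatch(writeToCassandra _)
  .option("checkpointLocation", "/checkpoints/learner-events")
  .start()

query.awaitTermination()
```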

Environment: Hadoop, Hive, HBase, Linux, Scala, Unix shell scripting, Sqoop, Oozie, Spark, NiFi, Kafka, Hortonworks 2.2, Ab Initio.

Role: Big Data Developer, Jan 2017 - Mar 2019

Client: Wells Fargo, NC

Provider Affinity: Customer and user data is collected to derive a member's affinity towards a doctor. Data is collected from various sources, business rules are applied to determine the affinity, and reports are then generated for use by the business.

Responsibilities:

Designed and developed ETL data pipelines using Scala and Spark (a minimal sketch follows).
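
A minimal sketch of such a pipeline's extract/transform/load shape in Scala and Spark; the staging and curated table names and the cleansing rules are hypothetical, not the project's actual business rules:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date}

val spark = SparkSession.builder()
  .appName("provider-affinity-etl")
  .enableHiveSupport()
  .getOrCreate()

// Extract: read raw visits from a staging Hive table (name is hypothetical).
val raw = spark.table("staging.provider_visits")

// Transform: basic cleansing and typing before the affinity business rules are applied.
val cleaned = raw
  .filter(col("member_id").isNotNull)
  .withColumn("visit_date", to_date(col("visit_ts")))
  .dropDuplicates("member_id", "provider_id", "visit_ts")

// Load: write to a curated Hive table partitioned by visit date for reporting.
cleaned.write
  .mode("overwrite")
  .partitionBy("visit_date")
  .saveAsTable("curated.provider_visits")
```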

Automated workflows using Oozie job scheduling tool.

Experienced in creating Hive schemas, external tables, and managing views.

Worked on both External and Managed HIVE tables for optimized performance.

Responsible for loading disparate data sets and pre-processing data using Hive.

Maintained the application and took corrective action whenever it faced issues.

Implemented ingestion pipelines to migrate ETL to Hadoop using Spark Streaming and Oozie workflow.

Loaded home mortgage data from the existing DWH tables (SQL Server) to HDFS using Sqoop.

Responsible for using Spark and Hive for data transformation, with intensive use of Spark SQL to analyze vast data stores and uncover insights.

Experience in handling different file formats like Parquet, Sequence file, JSON, Text and XML.

Responsible for extracting data loads from Teradata into the Hadoop environment and creating Hive tables.

Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and PySpark.

Developed Hadoop scheduler jobs using Oozie coordinator.

Imported and exported data to and from HDFS using Sqoop, including incremental loading.

Created partitioned tables in Hive for better performance and faster querying (a sketch follows).
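
A minimal sketch of a partitioned Hive table created and loaded through Spark SQL; the database, table, column names and partition date are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-partitioned-table")
  .enableHiveSupport()
  .getOrCreate()

// Partitioning by load_date keeps each daily extract in its own HDFS directory,
// so date-filtered queries scan only the partitions they need.
spark.sql("""
  CREATE TABLE IF NOT EXISTS mortgage.loans (
    loan_id STRING,
    balance DECIMAL(18,2)
  )
  PARTITIONED BY (load_date STRING)
  STORED AS PARQUET
""")

// Load one day's data into its partition (date and source table are hypothetical).
spark.sql("""
  INSERT OVERWRITE TABLE mortgage.loans PARTITION (load_date = '2019-01-31')
  SELECT loan_id, balance
  FROM staging.loans_daily
""")
```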

Worked on optimizing and tuning Spark jobs to achieve optimal performance.

Automated scripts for creating Hive tables and loading the data using Hive queries.

Environment: HDP, Spark, HDFS, Hive, Flat files, Teradata, Sqoop and UNIX Shell Scripting.

Design Tech, Hyderabad, India Jan 2015 - Dec 2015

Hadoop Developer

Responsibilities:

Experienced in creating Hive schemas, external tables, and managing views.

Worked on both External and Managed HIVE tables for optimized performance.

Experienced in loading and transforming large sets of structured and semi-structured data.

Experience in handling different file formats like Parquet, Sequence file, JSON, Text and XML.

Worked closely with QA and Operations teams to understand, design, and develop end-to-end data flow requirements.

Utilizing Falcon and Oozie to schedule workflows.

Developing efficient and error-free code for big data requirements using knowledge of Hadoop and its ecosystem.

Optimized MapReduce code and Pig scripts, and performed performance tuning and analysis.

Storing, processing and analyzing huge datasets to derive valuable insights from them.

Experience in using Spark and Tez as Hive execution engines for faster query response.

Developed Sqoop jobs to import and store massive volumes of data in HDFS and Hive.

Responsible for managing data coming from different sources.

Experienced in shell scripting and Python.

Environment: Hortonworks, Hive, HDFS, Yarn, Shell Scripting.


