
Data Developer

Location:
Berwyn, IL, 60402
Salary:
Industry Standards
Posted:
March 13, 2017


Resume:

Manaswi Badam, MASTERS IN COMPUTER APPLICATIONS

acy9ss@r.postjobfree.com 312-***-**** 403 S Harlem Ave, Berwyn, IL 60402

EDUCATION AND CERTIFICATION

MASTERS IN COMPUTER APPLICATIONS (M.C.A) • Osmania University, Hyderabad, India • Jun 2007 – Sep 2010

BACHELOR OF SCIENCE IN COMPUTER SCIENCE (B.Sc) • Osmania University, Hyderabad, India • Jun 2004 – Apr 2007

TECHNOLOGIES

Operating Systems: Windows, DOS, Linux

Programming Languages: C, C++, Java, Scala

Web Technologies: HTML, JavaScript

Databases: Oracle, MySQL, Cassandra, Redshift, HBase

Tools and Technologies: Spark, Kafka, Eclipse, IntelliJ, SQL Workbench, AWS (S3, RDS, EC2, EMR), Hive, Pig, MapReduce

EXPERIENCE & NOTABLE CONTRIBUTIONS

Professional Summary

Strong experience with the major Big Data ecosystem components: Spark, Spark SQL and Streaming, Kafka, Cassandra, Hive, and Sqoop.

Extensive experience with AWS EMR, EC2, S3, Redshift, and IAM.

Hands-on experience designing and configuring the major Hadoop distributions (Apache Hadoop, Cloudera, Hortonworks).

Hands-on experience using Flume and Kafka to ingest data into the cluster.

Good working knowledge of NoSQL databases such as HBase and Cassandra, as well as Redshift.

Hands-on experience with the MapReduce programming model.

Experience processing very large datasets on Hadoop by writing Pig scripts/UDFs and Hive queries.

Imported and exported data between RDBMS and HDFS using Sqoop.

Hands-on experience with core Java, including OOP concepts and the Collections framework.

Self-managed and results-oriented, delivering quality work on time.

Flexible and able to work in dynamic, challenging environments.

Very good functional knowledge of the insurance domain.

Very good communication and interpersonal skills.

Quick learner, easily adaptable to new environments, and an effective team player.

Strong analytical, programming, and presentation skills.

Works well both in a team and individually.

Quick to adapt to new tools and applications.

CSC - Zurich • India • Jun 2014 – Nov 2016

Spark Developer

Data Ingestion: Ingested data from Oracle and MySQL into S3, where it can be queried through Hive and Spark SQL tables.

Worked on Sqoop jobs for ingesting data from MySQL into Amazon S3.

Created Hive external tables for querying the ingested data.

Used the Spark DataFrame API to ingest Oracle data into S3 and load it into Redshift (sketched below).

Wrote a script to move RDBMS data into Redshift.
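A minimal Scala sketch of the Oracle-to-S3 ingestion step described above. The JDBC URL, credentials, table, and bucket names are placeholders, not details from the project, and the Redshift load was a separate step.

import org.apache.spark.sql.SparkSession

object OracleToS3 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("oracle-to-s3-ingestion")
      .getOrCreate()

    // Pull a source table from Oracle over JDBC.
    val policies = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCL") // placeholder host
      .option("dbtable", "CLAIMS.POLICIES")                       // placeholder table
      .option("user", sys.env("ORACLE_USER"))
      .option("password", sys.env("ORACLE_PASSWORD"))
      .load()

    // Land the data in S3 as Parquet so Hive and Spark SQL
    // external tables can query it directly.
    policies.write
      .mode("overwrite")
      .parquet("s3a://example-data-lake/raw/policies/") // placeholder bucket

    spark.stop()
  }
}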

Analytics framework: Processed the datasets and applied different transformation rules on top of them.

Processed complex/nested JSON and CSV data using the DataFrame API (see the sketch after this list).

Automatically scaled EMR instances up and down based on data volume.

Applied transformation rules on top of DataFrames.

Ran and scheduled the Spark scripts in EMR pipelines.

Processed Hive, CSV, JSON, and Oracle data in a single job (POC).
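A sketch of the nested-JSON flattening described above, using the DataFrame API. The schema (a customer struct plus an orders array) and the paths are invented for illustration.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

object FlattenNestedJson {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("nested-json-poc").getOrCreate()

    // Spark infers the nested struct/array schema from the JSON itself.
    val events = spark.read.json("s3a://example-data-lake/raw/events/")

    // Flatten a nested struct and explode an array into one row per element.
    val flat = events
      .select(
        col("customer.id").as("customer_id"), // field inside a struct
        explode(col("orders")).as("order")    // array -> one row per order
      )
      .select(col("customer_id"), col("order.total").as("order_total"))

    flat.write.mode("overwrite").csv("s3a://example-data-lake/curated/orders/")
    spark.stop()
  }
}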

Validation framework: Validated and debugged the pipeline between source and destination.

Validated the source data against the final output data.

Tested the data using the Dataset API instead of RDDs (see the sketch below).

Debugged and tested whether the process met the client's expectations.
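A sketch of the source-versus-destination check. Table and path names are hypothetical, and the except comparison assumes both sides share a schema.

import org.apache.spark.sql.SparkSession

object ValidatePipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("validation")
      .enableHiveSupport()
      .getOrCreate()

    val source = spark.read.parquet("s3a://example-data-lake/raw/policies/")
    val target = spark.table("curated.policies") // hypothetical Hive table

    // Row-count reconciliation between source and final output.
    require(source.count() == target.count(), "row counts differ between source and target")

    // Content diff: rows present in the source but missing from the target.
    val missing = source.except(target)
    if (missing.count() > 0) missing.show(20, truncate = false)

    spark.stop()
  }
}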

Optimization framework: Optimized the datasets in several ways.

Tuned query execution to improve end-to-end processing time.

Applied different optimization and transformation rules as newer Spark versions became available.

Debugged the jobs to minimize data shuffling (see the sketch below).
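One common way to cut shuffling, sketched below: broadcast the small side of a join so the large table is never repartitioned across the cluster. The table names are invented for illustration.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object ReduceShuffle {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("shuffle-tuning")
      .enableHiveSupport()
      .getOrCreate()

    val facts = spark.table("curated.claims")     // large fact table (hypothetical)
    val dim   = spark.table("curated.policy_dim") // small dimension table (hypothetical)

    // Broadcasting the small side turns a shuffle join into a map-side join,
    // so the large table never moves across the network.
    val joined = facts.join(broadcast(dim), Seq("policy_id"))

    joined.write.mode("overwrite").parquet("s3a://example-data-lake/curated/claims_joined/")
    spark.stop()
  }
}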

Reporting framework:

Analyzed and reported on the data using Tableau.

Created dashboards in Tableau.

CSC - Estee Lauder, Web Intelligence • India • Apr 2012 – May 2014

DEVELOPER AND DESIGNER

Worked on analyzing the Hadoop cluster using different big data analytic tools, including Pig, Hive, and MapReduce.

Developed a data pipeline using Flume and Sqoop to ingest customer behavioral data and purchase histories into HDFS for analysis.

Used Pig to validate the data ingested via Sqoop and Flume; the cleansed dataset was pushed into HBase.

Participated in the development and implementation of a Cloudera Hadoop environment.

Collected and aggregated large amounts of log data using Apache Flume and staged it in HDFS for further analysis.

Worked with Zookeeper, Oozie, and Data Pipeline operational services for coordinating the cluster and scheduling workflows.

Extracted the needed data from the server into HDFS and bulk-loaded the cleaned data into HBase.

Responsible for creating Hive tables, loading the structured data produced by MapReduce jobs into those tables, and writing Hive queries to analyze the logs for issues and behavioral patterns (a representative query is sketched after this section).

Involved in running MapReduce jobs to process millions of records.

Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.

Developed Hive queries and Pig scripts to analyze large datasets.

Imported and exported data between RDBMS and HDFS using Sqoop.

Generated ad hoc reports using Pig and Hive queries.

Used Hive to analyze data ingested into HBase via the Hive-HBase integration and computed various metrics for dashboard reporting.

Developed job flows in Oozie to automate the workflows for Pig and Hive jobs.

Loaded the aggregated data from the Hadoop environment into Oracle using Sqoop for dashboard reporting.

Environment: HDFS, MapReduce, Hive, Java, Pig, Sqoop, Flume, Zookeeper, Oozie, Oracle, HBase.
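For illustration, a representative Hive external table and analysis query of the kind this pipeline used. The original statements ran through Hive directly; they are wrapped in spark.sql here only to keep all examples in one language, and the table and columns are invented.

import org.apache.spark.sql.SparkSession

object WeblogAnalysis {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("weblog-analysis")
      .enableHiveSupport()
      .getOrCreate()

    // External table over log data already landed in HDFS by Flume/Sqoop.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS weblogs (
        user_id STRING, url STRING, ts TIMESTAMP
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      LOCATION '/data/raw/weblogs'
    """)

    // A typical behavioral-pattern query: page views per user per day.
    spark.sql("""
      SELECT user_id, to_date(ts) AS day, COUNT(*) AS views
      FROM weblogs
      GROUP BY user_id, to_date(ts)
      ORDER BY views DESC
    """).show(20)

    spark.stop()
  }
}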

IBM - Coating-Substrate Interaction Tool • India • Sep 2010 – Mar 2012

DEVELOPER AND DESIGNER

• Interacted with customers and end users.

• Understood the existing architecture and processes, and developed code.

• Tested and fixed bugs.

Environment: Core Java, Oracle, and Eclipse.

Seeking a full-time opportunity in the software engineering field. Strong design, integration, and problem-solving skills.


