Data Developer

Location:

McKinney, TX, 75070

Posted:

January 20, 2021

Resume:

SAISREE

***********@*****.***

903-***-****

* ***** ** ** **********

OBJECTIVE

Searching for the opportunity to bring 6 years of programming, technology, and engineering expertise in developing applications while incorporating critical thinking, problem solving, and leadership.

TECHNOLOGIES SKILL SET

Hadoop Core Services

HDFS, MapReduce, Hadoop YARN

Hadoop Data Services

Apache Hive, Pig, Sqoop, Spark

Hadoop Distributions

Hortonworks, Cloudera

Hadoop Operational Services

Apache ZooKeeper, Airflow

Cloud Computing Services

AWS (Amazon Web Services), Amazon EC2

IDE Tools

Eclipse, NetBeans, Jira.

Programming Languages

Java, Python, C#

Operating Systems

Windows (XP, 7, 8, 10), UNIX, Linux, Ubuntu

Reporting Tools / ETL Tools

Power View for Microsoft Excel, Tableau

Databases

Oracle, MySQL, DB2, Derby, NoSQL databases (HBase, Cassandra)

Web Technologies

HTML5, JavaScript

Environmental Tools

SQL Developer, WinSCP, PuTTY, JIRA

Version Control Systems

Git, Tortoise, SVN

EXPERIENCE

DATA ENGINEER/84.51, CINCINNATI-OH

Feb 2020 - Present

Experienced in designing and developing highly effective ETL pipelines using multiple big data technologies.

In-depth understanding of Spark architecture, including Spark Core, Spark SQL, and DataFrames.

Developed Spark applications in Python (PySpark) to transform data according to business rules.

Developed dataflows and processes for data processing using SQL (Spark SQL and DataFrames).

Strong skills in working with very large data sets using a variety of tools; experienced in resolving performance and scaling issues.

Handled large datasets in Spark by implementing efficient joins and transformations during the ingestion process.

Experience migrating data from on-premises platforms to cloud-based technology (Google Cloud Platform).

Designed and developed an ingestion framework to cleanse, validate, transform, and load data from RDBMS to the cloud and vice versa.

Worked on different file formats such as Avro, Parquet, JSON, and CSV, and used various compression techniques to optimize storage in HDFS.

Responsible for the secure maintenance and transfer of PII, CCPA, and PHC data.

Created Hive tables, loaded them with data, and ran HiveQL queries on those tables for data validation.

Used the Airflow scheduler to automate pipeline workflows.

Experience writing Airflow workflows and job controllers for job automation.

Used shell scripting to automate data validations and create parameter files.

Identified critical failure points during production and created test cases to address failure points.

Involved in a POC on GCP and AWS. Good understanding of AWS services such as EMR, RDS, S3, Athena, and EC2 instances.

Experience in using the Git version control system.

Worked across the complete software development life cycle, using Agile and Kanban methodologies to maintain user stories and tasks.

Took part in daily Scrum meetings to discuss development progress and actively worked to make those meetings more productive.

Worked closely with development, test, documentation, and product management teams to deliver high-quality products and services in a fast-paced environment.

Experience in managing and reviewing Hadoop log files.

Worked with the Grafana dashboard visualization tool.

Participated in meetings with stakeholders and product managers to obtain domain knowledge and product expectations.

Environment: Apache Hadoop, Spark, Hive, MapReduce, Sqoop, YARN, HDFS, Airflow, Python, GitHub, Grafana, GCP, AWS, Oracle, Linux.

HADOOP DEVELOPER/JPMORGAN CHASE, COLUMBUS-OH

Aug 2018 - Feb 2020

Evaluated, extracted, and transformed data for analytical purposes within a big data environment.

Worked on many ETL jobs, creating aggregate tables through transformations and actions such as joins and sums of amounts.

Strong experience in designing ETL processes and developing source-to-target mappings.

Used Spark SQL to load data into Hive tables and wrote queries to fetch data from these tables.

Experienced in performance tuning of Spark applications, including setting the correct level of parallelism and tuning memory.

Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, and efficient joins.

Brought data from various sources into the Hadoop ecosystem using big data tools such as Sqoop.

Worked with Oracle and Teradata for data import/export operations from different data marts.

Created Hive tables, loaded them with data, and wrote Hive queries.

Worked extensively on data migration, data cleansing, and data profiling.

Tuned Hive to improve performance and resolved performance issues by understanding how joins, grouping, and aggregation translate to MapReduce jobs.

Implemented partitioning, dynamic partitioning, and bucketing in Hive.

Modeled Hive partitions extensively for data separation.

Expertise in Hive queries; created user-defined aggregate functions and worked on advanced optimization techniques.

Used automation tools such as Airflow for scheduling jobs.

Exported the analyzed data to relational databases using Sqoop for visualization and report generation.

Environment: Apache Hadoop, Hive, MapReduce, Sqoop, Spark, Python, Cloudera Manager CM 5.1.1, HDFS, Airflow, PuTTY.

HADOOP DEVELOPER/WALMART, BENTONVILLE-AR

Jan 2017 - June 2018

Responsible for building scalable distributed data solutions using Hadoop.

Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, IAM, Amazon Elastic Load Balancing, and other services of the AWS family.

Managed data coming from different sources and was involved in HDFS maintenance and the loading of structured and unstructured data.

Wrote multiple MapReduce programs in Java for data analysis.

Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.

Loaded data from various data sources into HDFS.

Used Kafka to build real-time data pipelines between clusters.

Integrated Apache Kafka for data ingestion.

Installed Kafka Manager to monitor consumer lag and Kafka metrics; also used it for adding topics, partitions, etc.

Experienced in migrating HiveQL to minimize query response time.

Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.

Worked on sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.

Experienced in handling different types of joins in Hive.

Responsible for performing extensive data validation using Hive.

Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.

Used Pig as an ETL tool for transformations, event joins, filters, and some pre-aggregations.

Worked on different file formats and different compression codecs.

Worked within a story-driven agile development methodology and actively participated in daily Scrum meetings.

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Airflow, Java, Linux, Maven, Teradata, ZooKeeper, Git, AutoSys, HBase.

JAVA DEVELOPER/PRATIAN TECHNOLOGIES

Aug 2013 - Nov 2015

Analyzed, designed, and developed a J2EE-based application using Struts and Hibernate.

Interacted with the Business Analyst and Architect during the Sprint Planning sessions.

Developed a rich user interface using RIA, HTML, JSP, JSTL, JavaScript, jQuery, CSS, YUI, and AUI on the Liferay portal.

Worked on a new portal theme for the website using Liferay and customized its look and feel.

Involved in developing the user interface using Struts tags, core Java development involving concurrency/multithreading, Struts-Hibernate integration, and database operation tasks.

Implemented core Java functionality such as collections, multithreading, and exception handling.

Performed code optimization and rewrote database queries to resolve performance-related issues in the application.

Involved in writing SQL, PL/SQL stored procedures using PL/SQL Developer.

Used Eclipse as IDE for application development.

Supported production deployments and validated the flow of the application after each deployment.

Used PL/SQL for creating triggers, packages, procedures, and functions.
