
Praketa Saxena - Hadoop / Big Data

Location:
Chicago, IL
Salary:
95
Posted:
August 31, 2020


Resume:

PRAKETA SAXENA

adfp79@r.postjobfree.com

708-***-****

PROFESSIONAL SUMMARY

Hands-on Big Data Engineer with 10 years of experience leading projects and delivering technology solutions across cloud and on-premises products. Comprehensive knowledge of platform development, enterprise architecture, agile methodologies, cloud services, and web-based applications. Innovative change agent with a unique mix of high-level technology direction and deep technical expertise.

AWS Certified Cloud Consultant with professional experience in DevOps, Python, and cloud services such as EC2, S3, RDS, EBS, ELB, Auto Scaling groups, Elasticsearch, VPC, and IAM. Experienced in software configuration and build management, process development, and tools support, including code compilation, packaging, and deployment/release methodology.

EMPLOYMENT HISTORY

Dec. 2019 – Present

Chicago, Illinois

Data Engineer

Public Storage

Supporting and maintaining applications on AWS, UNIX/Linux, and Windows platforms, including Redshift.

Hands-on experience with various AWS services such as Redshift clusters and Route 53 domain configuration.

Involved in designing a Spring MVC application for crawling and application servlets.

Implemented Spring Boot microservices to process messages into the Kafka cluster.

Worked closely with the Kafka admin team to set up Kafka clusters in the QA and production environments.

Knowledge of Kibana and Elasticsearch for identifying Kafka message failure scenarios.

Implemented reprocessing of failed messages in Kafka using offset IDs (a Python sketch follows this group of Kafka bullets).

Implemented Kafka producer and consumer applications on a Kafka cluster coordinated with ZooKeeper.

Developed Kafka producer, consumer, and schema testing code for data integrity and cluster validation.

Used Spring Kafka API calls to process messages reliably on the Kafka cluster.

Used the Spark Java API to generate PairRDDs.

Created partitions for Kafka topics and set up replication factors in the Kafka cluster.
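For illustration, a minimal kafka-python sketch of the producer/consumer and offset-based reprocessing pattern described in the bullets above; the broker address, topic, consumer group, offset, and handler are hypothetical placeholders rather than details from this work history.

# Minimal kafka-python sketch of the reprocessing pattern described above.
# Broker, topic, group id, offset, and handle() are hypothetical placeholders.
from kafka import KafkaConsumer, KafkaProducer, TopicPartition

BROKERS = ["localhost:9092"]          # hypothetical broker list
TOPIC = "orders"                      # hypothetical topic

# Producer: publish a message to the topic.
producer = KafkaProducer(bootstrap_servers=BROKERS)
producer.send(TOPIC, b'{"order_id": 123}')
producer.flush()

# Consumer: seek back to a known offset to reprocess failed messages.
consumer = KafkaConsumer(
    bootstrap_servers=BROKERS,
    group_id="reprocess-group",       # hypothetical consumer group
    enable_auto_commit=False,
)
partition = TopicPartition(TOPIC, 0)
consumer.assign([partition])
consumer.seek(partition, 42)          # offset of the first failed message (hypothetical)

for record in consumer:
    handle(record.value)              # hypothetical reprocessing handler
    consumer.commit()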

Worked on Big Data integration and analytics based on Hadoop, Spark, Kafka, Storm, and webMethods.

Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, PairRDDs, and Spark on YARN.

Implemented a Data Server distributed across the globe to achieve strong and eventual consistency for high data availability.

Used Java multi-threading for sharding and replication of data to scale a Data Store horizontally and improve system performance.

Created Airflow scheduling scripts in Python.

Developed DAG-based data pipelines for onboarding and change management of datasets, as sketched below.

Experience with Apache Airflow, including installing, configuring, and monitoring the Airflow cluster.

Understanding of Airflow REST services and integration of Airflow with the platform ecosystem.

Orchestrated Airflow workflows in a hybrid cloud environment.

Developed guidelines for the Airflow cluster and DAGs; performance-tuned DAGs and task implementations.
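As an illustration of the DAG work above, a minimal Python DAG using Airflow 2-style imports; the DAG id, schedule, and callables are hypothetical.

# Minimal Airflow DAG sketch for onboarding a dataset; names and schedule are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    ...  # pull the source dataset (hypothetical)

def load():
    ...  # write the transformed dataset to the warehouse (hypothetical)

with DAG(
    dag_id="onboard_dataset",          # hypothetical DAG name
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task          # simple linear dependency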

Developed Data layer using Hibernate framework, configured XML mapping files, wrote POJO classes and PL/SQL stored procedures

Implemented module enabling different levels of logging in Google Stackdriver which reduced the time for root cause analysis.

Integrated the Apigee gateway into the data lake, enabling monetization, access control, and traffic management.

Responsible for creating CRUD methods using the Hibernate and Spring frameworks.

Wrote HQL to handle data from databases using Hibernate APIs such as createQuery.

Wrote jobs to move data from RDBMS to HBase systems.

Used Hibernate reverse engineering to create beans corresponding to the database schema.

Configured DispatcherServlet and ViewResolver to intercept incoming requests, manage the Spring MVC flow, and invoke view components.

Wrote REST controllers following RESTful standards in Spring MVC to connect the model with the view.

Worked on a React JS service that interacts with RESTful services at the backend.

Collaborated on the React JS service that handles the view layer of the application.

Ensured that routing in the React JS application was appropriate, helping wire views together.

Implemented multi-threading to handle synchronization for users accessing modules

Tested REST APIs in Spring controllers at the backend for JSON data using Postman.

Developed various modules using design patterns such as Factory and Singleton.

Used Maven to add dependencies required for the project

Worked on JUnit for unit testing of the application

Used Spring Boot at back-end which helps to develop application with ease

Handled importing of data from various data sources, performed transformations using Hive and Spark, loaded data into HDFS, and extracted data from SQL into HDFS using Sqoop.

Used GIT for version control and Eclipse IDE for development

Used JIRA to handle software development issues

Apr. 2018 – Dec. 2019

Birmingham, Alabama

Big Data Engineer

Shipt

Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.

Migrated various Hive UDFs and queries to Spark SQL for faster processing.

Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data in HDFS using Scala (see the sketch below).

Hands-on experience in Spark and Spark Streaming, creating RDDs and applying transformations and actions.
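The bullets above describe a Scala implementation; purely for illustration, here is a PySpark Structured Streaming sketch of the same Kafka-to-HDFS flow (requires the spark-sql-kafka package). The broker, topic, and HDFS paths are hypothetical.

# PySpark Structured Streaming sketch of a Kafka-to-HDFS flow; names and paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # hypothetical broker
    .option("subscribe", "events")                         # hypothetical topic
    .load()
    .selectExpr("CAST(value AS STRING) AS value")
)

query = (
    stream.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/events")                 # hypothetical HDFS path
    .option("checkpointLocation", "hdfs:///checkpoints/events")
    .start()
)
query.awaitTermination()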

Experienced in scheduling jobs using Control-M.

Developed and implemented custom Hive UDFs involving date functions.

Used Sqoop to import data from Oracle into Hadoop.

Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.

Experienced in developing scripts for doing transformations using Scala.

Involved in developing Shell scripts to orchestrate execution of all other scripts and move the data files within and outside of HDFS.

Installed and configured Hive, Pig, Sqoop and Oozie on the Hadoop cluster.

Used Kafka for publish-subscribe messaging as a distributed commit log; experienced with its speed, scalability, and durability.

Developed a Java web crawler to scrape market data for internal products.

Created a breadth-first-search web crawler with five modules: Downloader, Queue, Parser, Cache, and DB (see the sketch after these bullets).

Ran concurrent web-crawling bots with the Java ProcessBuilder, Jsoup, and Selenium packages.

Developed a distributed Downloader process using an httplib HTTP client running on a server cluster.

Parsed HTML/XML with a Java HTML parser library; cached frequently crawled URL lookups with Memcached.

Stored crawled HTML/XML content in a MySQL database; the DB module communicates with the Downloader and Parser over the AMQP protocol.
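The crawler above was written in Java with Jsoup; as a language-agnostic illustration only, here is a compact Python sketch of the same breadth-first queue pattern, with the module roles noted in comments. The seed URL and the store() persistence hook are hypothetical.

# Compact Python sketch of a breadth-first crawl loop; the original modules were Java/Jsoup.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collects href values from anchor tags (the Parser module's role)."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)

def store(url, html):
    ...  # DB module: persist crawled content (hypothetical hook)

def crawl(seed, max_pages=100):
    queue = deque([seed])   # Queue module: breadth-first frontier of URLs
    seen = {seed}           # Cache module: skip URLs already queued
    crawled = 0
    while queue and crawled < max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "ignore")  # Downloader module
        except OSError:
            continue
        crawled += 1
        store(url, html)
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)

crawl("https://example.com")  # hypothetical seed URL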

Operating System Process Scheduling and Memory Management Simulation.

Designed and developed a BI web app for performance analytics.

Designed Python-based notebooks for automated weekly, monthly, and quarterly reporting ETL.

Designed a Java-based email solution for automated emailing and publishing of internal reports.

Developed an HTML/CSS/JavaScript-based visualization tool to delineate the company's pricing strategy.

Developed an ArcGIS/JavaScript-based map to understand pricing for the cost of data transfer across global regions.

Built a tool to compare different AWS RDS pricing options and identify the best deployment strategy.

Created Airflow scheduling scripts in Python to automate data pipelines and data transfers.

Developed DAG-based data pipelines for onboarding and change management of datasets.

Experience with Apache Airflow, including installing, configuring, and monitoring the Airflow cluster.

Understanding of Airflow REST services and integration of Airflow with the platform ecosystem.

Orchestrated Airflow workflows in a hybrid cloud environment, from local on-premises servers to the cloud.

Designed the backend database and AWS cloud infrastructure for maintaining company proprietary data

Used AWS Aurora RDS for backend data storage; performed cloud migration pricing analysis.

Wrote shell FTP scripts for migrating data to Amazon S3.

Used AWS Lambda to trigger parallel scheduled cron jobs for scraping and transforming data (see the sketch below).
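For illustration, a minimal Python Lambda handler of the kind described above, assuming it is invoked on a schedule (for example, an EventBridge cron rule); the source URL, bucket, and key are hypothetical.

# Minimal Lambda handler sketch for a scheduled scrape-and-store job; names are hypothetical.
import json
import urllib.request

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """Invoked on a schedule; fetches source data and lands it in S3."""
    raw = urllib.request.urlopen("https://example.com/prices.json", timeout=10).read()
    transformed = json.dumps(json.loads(raw))  # placeholder transformation
    s3.put_object(
        Bucket="pricing-raw-data",             # hypothetical bucket
        Key="scrapes/latest.json",
        Body=transformed.encode("utf-8"),
    )
    return {"status": "ok"}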

Dec. 2015 – Apr. 2018

New York, New York

Data Engineer III

Master Card

Designed architecture of data driven applications as part of the Product Development & Management team

Responsible for backend schema design of transactional data

Designed ETL pipelines via Hive MapReduce and Spark Scala

Designed backend schemas of tables and stored procedures for ETL pipelines in SQL.

Developed Automated report generation scripts in Python

Built Python-Excel applications to generate large scale static reports

Experience on multi-threading, data structures, algorithms, object oriented design, and design patterns

Built customer segmentation group clustering for optimized audience targeting using K-Means and Python (a sketch follows).
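A minimal scikit-learn sketch of the K-Means segmentation described above; the input file, feature columns, and number of clusters are hypothetical.

# Minimal K-Means customer segmentation sketch; columns and cluster count are hypothetical.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

customers = pd.read_csv("customer_features.csv")            # hypothetical input file
features = customers[["spend", "frequency", "recency"]]     # hypothetical feature columns

scaled = StandardScaler().fit_transform(features)
kmeans = KMeans(n_clusters=5, random_state=42)
customers["segment"] = kmeans.fit_predict(scaled)

# Inspect segment-level averages to profile each audience group.
print(customers.groupby("segment")[["spend", "frequency"]].mean())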

Designed and developed product range algorithms (SKU x category Kendall ranks) to optimize merchants' product ranging and inventory management using the Hadoop framework.

Designed and Developed Post Campaign Analytical Tool to model campaign performance parameters like uplift, incremental sales, and basket ROI measures using core Hadoop framework

Built reports to understand the evolution of customer shopping behavior with Base SAS.

Designed test and control population selection for campaign design and evaluation using Base SAS, including statistical design of campaign population marketing.

Developed Issuer Operations Insights Tool to provide issuer performance insights using Base SAS

Developed automated Acquirer Chargeback Modelling reports with SAS macros for mitigating fraudulent refunds and merchant liability protection

Developed Dynamic Benchmark Selection tool as SQL Server stored procedures for selecting competitor set for custom benchmark

Developed Fraud monitoring visualization for merchant adoption and optimized merchant transaction processing

Responsible for backend schema design and integration of Banking clients transactional data with proprietary MasterCard data

Apr. 2012 – Nov. 2015

Seattle, Washington

Data Engineer

Amazon

Developed Consumer Segments to optimize store inventory and campaign targeting in Scala

Designed and Developed Product Margin Reports, Inventory Profiling reports in Spark/Scala

Responsible for analyzing, designing, and architecting performance reports with Hive and HDFS.

Developed backend database schemas to manage QSR data in base Hive and custom UDFs.

Performed ongoing performance tuning of Hive programs and backend data tables for efficient OLTP and OLAP queries.

Researched, tested, built and coordinated integration of reporting infrastructure

Wrote XML- and VBA-macro-based automated Excel reporting programs for weekly deliverables.

Reviewed complex datasets and programmed ship code.

Prepared algorithm based web services by transformation of data.

Created program code for distributed computation in HBase.

Suggested and implemented opportunities in user behavioral aspects.

Designed, developed and implemented critical software services.

Provided technical guidance during development of Scala applications.

Formulated and executed coding and change control procedures.

Debugged existing software systems and conducted integration tests

Expert hands-on knowledge of SAS/BASE, SAS/MACRO, SAS/STAT, SAS/ODS, and SAS/SQL.

Extensive programming experience with SAS datasets and procedures such as PROC FORMAT, PROC TABULATE, PROC REPORT, PROC TRANSPOSE, PROC PRINT, and PROC SQL, and SAS/STAT procedures such as PROC CORR, PROC REG, PROC GLM, PROC ANOVA, PROC FREQ, PROC MEANS, and PROC UNIVARIATE.

Worked with the marketing team and analyzed marketing data using SAS tools to generate reports, tables, listing and graphs.

Modification of existing SAS programs and creation of new programs using SAS macros to improve ease and speed of modification as well as consistency of results.

Experience in developing SAS Macros and application for data updates, data cleansing and reporting.

Extracted data from an Oracle database using the SQL pass-through facility.

Extensively worked on PROC REPORT and PROC TABULATE to produce reports for the purpose of validation.

EDUCATION

Master of Science in Electrical and Computer Engineering

UAB, Birmingham, Alabama, USA

SKILLS

Python (Expert)

SAS (Expert)

Agile (Expert)

Microsoft Azure (Skillful)

AWS (Experienced)

Apache Kafka (Expert)

Apache Spark (Expert)

Hadoop (Expert)

SQL (Expert)

Software Development (Experienced)


