

Data Software Engineer

Location: Bentonville, AR
Posted: February 19, 2020


Resume:

POOJA BADGUJAR

Phone: 551-***-**** Email: adbvi4@r.postjobfree.com

https://www.linkedin.com/in/pooja-badgujar-1b58a1166/

Professional Summary:

Overall 6+ years of professional experience as a Hadoop Developer using the Apache Spark framework and as an Oracle Database Administrator.

Worked on a Python application that creates DataFrames, saves them as tables, and is submitted as Spark jobs.

Worked on Sqoop imports and exports with SQL Server, Oracle, DB2 and Teradata, and configured them to run in parallel using shell scripts.

Loaded large sets of structured and semi-structured data from Swift Object Store into HDFS from the edge node using the DistCp command, and created staging tables.

Worked with Power BI to create report graphs and cards for analysis across different years and months, based on KPI requirements.

Worked with the TDCH connector to export tables to Teradata and configured the jobs in shell scripts.

Experience creating HQL and Python scripts submitted as PySpark jobs with optimized resources.

Created shell scripts to perform file transfers between a Windows server and a file server.

Extensively worked on error handling in shell scripts and in Python, according to downstream job requirements.

Extensively used shell scripting to configure Spark jobs and Sqoop jobs that pull data from databases.

Experience handling huge datasets in Spark jobs by partitioning them and submitting smaller subsets of jobs to run in parallel.

Hands-on experience with the fundamental building blocks of Spark: RDDs and related manipulations (transformations, actions and functions performed on RDDs) for implementing business logic.

In-depth understanding of DataFrames and Datasets in Spark SQL.

Experience importing and exporting data with Sqoop between HDFS and relational database systems.

Good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.

Expert in writing business analytics scripts using Hive SQL.

Used IDEs such as Eclipse and IntelliJ to develop, deploy and debug applications.

Good knowledge of Data Warehousing, ETL development, Distributed Computing, and large-scale data processing.

Experienced in working with different file formats such as Text, Sequence, XML and JSON.

Expertise in working with relational databases such as Oracle 10g and SQL Server 2012.

Collaborated with the infrastructure, network, database and application teams to ensure data quality and availability.

Strong knowledge of the Software Development Life Cycle and expertise in detailed design documentation.

Excellent communication skills; able to perform at a high level and meet deadlines.

Technical Skills:

Big Data: HDFS, Apache Spark, Spark SQL, Spark Streaming, ZooKeeper, Hive, Sqoop, HBase, Kafka, Flume, YARN, Power BI.

Languages: Java, Scala, SQL/PLSQL, Shell Scripting, Python.

Database: MySQL, MongoDB, Cassandra, Oracle 10g/11g, Microsoft SQL Server 2014

IDE / Testing Tools: Eclipse, IntelliJ IDEA

Operating System: Windows, UNIX, Linux, macOS

Tools: SQL Developer, Maven, Hue, TOAD, Snowflake

Professional Experience:

Project 1

Client: Walmart Home Office (Mar 2019 – Present)

Project: International Pricing

Position: Software Engineer

Responsibilities:

Involved in gathering business requirements for the JDA application.

Worked on a Python application that creates DataFrames, saves them as tables, and is submitted as Spark jobs.

Worked on Sqoop imports and exports with SQL Server, Oracle, DB2 and Teradata, and configured them to run in parallel using shell scripts.

Loaded large sets of structured and semi-structured data from Swift Object Store into HDFS from the edge node using the DistCp command, and created staging tables.

Worked with Power BI to create report graphs and cards for analysis across different years and months, based on KPI requirements.

Worked with the TDCH connector to export tables to Teradata and configured the jobs in shell scripts.

Experience creating HQL and Python scripts submitted as PySpark jobs with optimized resources.

Created shell scripts to perform file transfers between a Windows server and a file server.

Extensively worked on error handling in shell scripts and in Python, according to downstream job requirements.

Extensively used shell scripting to configure Spark jobs and Sqoop jobs that pull data from databases.

Experience handling huge datasets in Spark jobs by partitioning them and submitting smaller subsets of jobs to run in parallel.

Performed transformations across multiple Hive tables to create final feeds.

Explored Spark to improve the performance of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.

Wrote SQL queries for data analysis to meet business requirements and performed validations on the results.

Hands-on experience troubleshooting Spark errors related to executors and Java heap memory.

Converted datasets to Hive ORC format and partitioned the data by week number.

Hands-on experience creating multiple PySpark jobs and wrapping them into one to optimize cluster performance.

Gathered business requirements and configured the complete feed for JDA applications.

Provided on-call support and responded to tickets within SLA.

Experienced in Concord setup for various environments and in configuring dependencies.

Involved in designing software solutions with reporting and visualization tools such as Power BI, reading from a SQL Server database that connects to Hive over JDBC/ODBC. This involved assessing business rules, collaborating with product owners, and performing source-to-target data mapping, design and review.

Created low-level and high-level designs for various JDA applications, covering the complete layout from source systems and upstream jobs through the final downstream jobs. This also involved extensive data analysis, requirements gathering, dependency scheduling, error-handling mechanisms and documentation.

Environment: Hadoop HDFS, Apache Spark, Spark-Core, Spark-SQL, JDK 1.8, SQL Server, Teradata, DB2, Power BI, GitHub, Python, HQL, Hive, Linux, Oracle.

Project 2

Client: Capital One (June 2018 – Mar 2019)

Project: Floga (Loan Service)

Position: Hadoop and Spark Developer

Responsibilities:

Involved in requirements gathering with business analysts.

Used the Arrow and Control-M workflow engines to manage and schedule Hadoop jobs. Modified and customized upstream job schedules and dependencies to keep them in sync with our Floga (downstream) jobs and their dependencies.

Worked extensively with Amazon Web Services (AWS) cloud services such as S3 buckets for storage.

Involved in moving files between AWS S3 and Snowflake, and worked extensively with S3 buckets.

Used Spark jobs to transfer data between S3 buckets and Snowflake.

Performed transformations to load data from history tables into production tables.

Worked with business functional lead to review and finalize requirements and data profiling analysis.

Created Spark RDDs and extracted data from the data warehouse into them.

Explored Spark to improve the performance of existing algorithms in Hadoop using SparkContext, Scala, Spark SQL, DataFrames and pair RDDs.

Wrote SQL queries for data analysis to meet business requirements and performed validations on the results.

Ran D2A loader jobs to load data from History into the testing environment.

Experienced working with GitHub and extracting SQL scripts into the D2A loader; merged changes into the GitHub repository and raised pull requests.

Created data visualizations and generated reports to present clear results.

Loaded and transformed large sets of structured, semi-structured and unstructured data in various formats such as text, XML and JSON.

Implemented data integrity and data quality checks using the Nebulla repository and created workarounds for various data loading techniques in the test environment.

Developed a Java project that uses a JDBC connection to SQL Server to extract table metadata.

Worked on the Java project and used the Maven build tool to manage and extend the build process.

Worked with Kafka Consul to modify and test tstamp tables. Integrated Kafka with the Spark Streaming APIs to process data and save it to HDFS.

Environment: Hadoop HDFS, Apache Spark, Spark-Core, Spark-SQL, Scala, JDK 1.8, SQL Server, Teradata, Snowflake, EMR, S3, GitHub, Control-M & Arrow.

Project 3

Client: Tata Consultancy Services, India

Project: Target Corporation

Designation: Software Engineer (Jan 2016 - Sep 2016)

Position: Hadoop and Spark Developer

Team Size: 7

Responsibilities:

Worked with Spark to improve the performance of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.

Implemented complex Spark programs to perform joins across different tables.

Developed custom aggregate functions using Spark SQL and performed interactive querying.

Good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.

Created Hive external tables, views and scripts for transformations such as filtering, aggregation and partitioning tables.

Handled importing data from various sources, performed transformations using Hive, and loaded data from Teradata into HDFS.

Expert in writing business analytics scripts using Hive SQL.

Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcast variables, effective and efficient joins, and transformations during the ingestion process itself.

Worked with data in multiple file formats including Parquet, Sequence files and Text/ CSV.

Expertise in creating ThoughtSpot pinboards and bringing together all data for reports according to business requirements.

Worked on Hive for storing data in ORC file format and performed transformations using various functions. Used Spark SQL to run Hive queries.

Involved in developing ThoughtSpot reports and automated workflows to load data.

Followed Agile methodology and Scrum meetings to track, optimize and tailor features to customer needs.

Participated in meetings with clients (internal and external), assisted in framing projects, and designed solutions based on client needs and the problems to be solved.

Environment: Hadoop HDFS, Apache Spark, Spark-Core, Spark-SQL, Scala, JDK 1.7, Sqoop, Eclipse, MySQL, HBase, CentOS Linux and ZooKeeper

Project 4

Client: Rolta India Limited, India.

Designation: Software Engineer (Dec 2014 – Jan 2016)

Project: Government Project (MPSRDH)

Position: Hadoop and Spark Developer

Team Size: 8

Responsibilities:

Involved in requirements gathering with the BA.

Worked closely with the BA and client to create technical documents such as High-Level Design and Low-Level Design specifications.

Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.

Imported data using Sqoop to load data from MySQL to HDFS on regular basis.

Developed RDDs to schedule various Hadoop programs.

Wrote Spark SQL queries for data analysis to meet business requirements.

Experienced in defining job flows.

Used Kafka and ZooKeeper for cluster coordination services.

Serialized JSON data and stored it in tables using Spark SQL.

Wrote shell scripts to automate the process flow.

Stored the extracted data in HDFS using Flume.

Experienced with multiple file formats, including XML, JSON, CSV and other compressed file formats.

Experienced writing queries in Spark SQL using Scala

Communicated all issues and participated in weekly strategy meetings.

Collaborated with the infrastructure, network, database and application teams to ensure data quality and availability.

Experience in daily production support, monitoring and troubleshooting Hadoop/Hive jobs.

Supported and troubleshot Hive programs running on the cluster and was involved in fixing issues arising from duration testing.

Prepared daily and weekly project status reports and shared them with the client.

Implemented Data Integrity and Data Quality checks in Hadoop using Hive and Linux scripts

Environment: Hadoop HDFS, Apache Spark, Spark-Core, Spark-SQL, Scala, JDK 1.7, Sqoop, Eclipse, MySQL, CentOS Linux.

Project 5

Client: eClinicalWorks Software Pvt. Ltd., India

Designation: Software Specialist (March 2013 - Dec 2014)

Project: Healthcare

Position: Oracle Database Administrator

Team Size: 8

Responsibilities:

Managed over 20 critical applications single-handedly.

Configured Data Guard for OLTP databases.

Upgraded databases to Oracle 12c.

Worked with the application team to resolve performance-related issues.

Rebuilt indexes for better performance and performed Oracle database maintenance.

Generated performance reports and performed daily database health checks, using utilities such as AWR and Statspack to gather performance statistics.

Identified and tuned poorly performing SQL statements using EXPLAIN PLAN, SQL TRACE and TKPROF, and analyzed tables and indexes to improve query performance.

Troubleshot various issues such as user database connectivity and privilege problems.

Created users and allocated appropriate table space quotas with necessary privileges and roles for all databases.

Wrote scripts to monitor the database using shell and PL/SQL or SQL code such as procedures, functions and packages.

Created and cloned Oracle instances and databases on ASM. Performed database cloning and relocation activities.

Managed tablespaces, data files, redo logs, tables and their segments.

Maintained data integrity and managed profiles, resources, password security, users, privileges and roles.

Performed RMAN backups, restores, cloning, or refreshing databases and applications.

Monitored and planned ASM storage for all databases.

Environment: Oracle 11g/12c, TOAD, Linux, UNIX, Putty, E-Manager, SQL Server, Windows Server, Web services, WebLogic

Education:

Bachelor of Engineering in Computer Science, 2013. ARMIET, Mumbai, India.

Master of Science in Computer Science, 2018. Monroe College, New York

Certifications:

Oracle Database Certified SQL Expert

Oracle Database Certified Associate

Relevant Coursework:

• Object Oriented Software
• Software System Design
• Research & Critical Analysis
• Professional Ethics & Leadership
• Computer Architecture
• Database Systems
• Hadoop & Spark Development
• Project Management
• Scala Programming


