SANDEEP KUMAR ************.*******@*****.*** Mobile: 832-***-****
IT professional with around 8 years of experience and extensive knowledge of the Software Development Lifecycle, including analysis, design, development, debugging and deployment of various software applications. More than 4 years of hands-on experience in Big Data and the Hadoop ecosystem, covering ingestion, storage, querying, processing and analysis using HDFS, MapReduce, Pig, Hive, Spark, Flume, Kafka, Oozie and Sqoop. 5+ years of work experience using Java/J2EE technologies and Oracle SQL Developer.
Professional Summary:
Worked with various formats of files like delimited text files, click stream log files, Apache log files, Avro files, JSON files, XML Files.
Expertise in importing and exporting data from/to traditional RDBMS using Apache Sqoop.
Good exposure to performance tuning of Hive queries, Pig scripts and Sqoop jobs.
Worked in multi-cluster environments and set up Cloudera and Hortonworks Hadoop ecosystems.
Skilled in data management, extraction, manipulation, validation and analysis of large volumes of data.
Experienced in using Zookeeper and Oozie Operational Services for coordinating the cluster and scheduling workflows.
Used Sqoop to load data from DB2 to HBase for faster querying and performance optimization.
Used Sqoop incremental imports to ingest data produced on a daily basis, scheduling and monitoring the jobs using Autosys and Cron.
Good knowledge of Spark and MapReduce jobs.
Worked on analyzing and writing Hadoop MapReduce jobs using Java API, Pig and Hive. Responsible for building scalable distributed data solutions using Hadoop.
Tuned Pig and Hive scripts by analyzing their joins, grouping and aggregation. Worked extensively on HiveQL join operations and custom UDFs, with good experience in optimizing Hive queries.
Good knowledge of Oracle Database and strong skills in writing SQL queries.
Experience using the SQL Server Import/Export Wizard to migrate heterogeneous sources such as Oracle and MS Access databases, Excel and flat files to SQL Server.
Experience in transferring Streaming data from different data sources into HDFS and HBase using Apache Flume.
Well versed in Talend Big Data, Hadoop and Hive; used Talend Big Data components such as HDFS Output, HDFS Input and Hive Load.
Utilized standard Python modules such as csv and pickle for development.
Worked with data frames and MySQL; queried MySQL databases from Python using the Python-MySQL connector and the MySQLdb package to retrieve information.
Experience in data processing like collecting, aggregating, moving from various sources using Apache Flume and Kafka.
Experience in creating ETL/Talend jobs, covering both design and code, to process data into target databases.
Worked with several Python libraries like NumPy, Pandas and MatplotLib.
Experience in creating Talend jobs to load data into various Oracle tables. Utilized Oracle stored procedures and wrote some Java code.
Hands-on experience working with NoSQL databases like MongoDB, HBase and Cassandra and their integration with Hadoop clusters.
Experience working with Oracle, DB2, SQL Server and MySQL databases.
Replicated tables across platforms and created materialized views.
Good exposure to the Software Development Life Cycle.
Supported team using Talend as ETL tool to transform and load the data from different databases.
Excellent communication and interpersonal skills; flexible and adaptive to new environments; self-motivated team player and positive thinker who enjoys working in a multicultural environment.
Analytical, organized and enthusiastic about working in a fast-paced, team-oriented environment. Skilled at interacting with business users, understanding their requirements and providing matching solutions.
Proactive in time management, with good problem-solving and analytical skills.
Technical Skills:
Programming Languages: Java, C, Python, Shell Scripting
Big Data Technologies: HDFS, MapReduce, Hive, Pig, Hue, Impala, Sqoop, Apache Spark, Apache Kafka, Apache Ignite, Apache NiFi, Oozie, Flume, Zookeeper, YARN
NoSQL Databases: MongoDB, HBase, Cassandra
Hadoop Distribution: Hortonworks, Cloudera, MapR
Databases: Oracle 10g, MySQL, MSSQL
IDE/Tools: Eclipse, NetBeans, Maven
Version Control: Git, SVN, ClearCase
Platforms: Windows, Unix, Linux
BI Tools: Tableau, MS Excel
Web/Server Application: Apache Tomcat, WebLogic, WebSphere, MSSQL Server, Oracle Server
Web Technologies: HTML, CSS, JavaScript, jQuery, JSP, Servlets, Ajax
Professional Experience:
Client: Comerica Bank - Dallas, TX February 2016 to Present
Role: Hadoop Developer
Responsibilities:
Worked on Hortonworks Data Platform Hadoop distribution for data querying using Hive to store and retrieve data.
Created ETL/Talend jobs, covering both design and code, to process data into target databases.
Reviewed and managed Hadoop log files by consolidating logs from multiple machines using Kafka.
Involved in data ingestion into HDFS using Sqoop for full loads and Kafka for incremental loads from a variety of sources such as web servers, RDBMS and data APIs.
Implemented custom input format and record reader to read XML input efficiently using SAX parser.
Collected log data from web servers and integrated it into HDFS using Kafka.
Created Hive tables and loaded data into them for querying using HiveQL.
Created custom user-defined functions in Python for Pig.
Imported data from different sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the results into HDFS.
Developed Python Mapper and Reducer scripts and implemented them using Hadoop streaming.
Developed Spark scripts using Java and Scala shell commands as per requirements.
Worked with NiFi to manage the flow of data from source to HDFS.
Stored DataFrames as Hive tables using PySpark.
Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
Managed Git repository, code merging, and production deployments.
Used Spark API over Hortonworks Hadoop YARN for performing transformations and analytics on Hive tables.
Experience with AWS S3, including creating buckets and configuring them with permissions, logging, versioning and tagging.
Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
Good experience with bug tracking/management tools like JIRA.
Developed Pig Latin scripts to sort, join and filter source data.
Involved in Cassandra data modeling and building efficient distributed schemas.
Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
Streamed data into Apache Ignite by setting up caches for efficient data analysis.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
Proactively monitored systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
Environment: Hadoop Hortonworks Distribution, MapReduce (YARN), Python, Scala, Spark, Apache NiFi, Apache Ignite, ETL Tools, Talend, HDFS, Pig, Hive, Kafka, Cassandra, Eclipse, Sqoop, Splunk, Linux shell scripting, BI Tools (Tableau).
Client: TMG Health - Jessup, PA August 2014 to December 2015
Role: Hadoop Developer
Responsibilities:
Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters using agile methodology.
Monitored multiple Hadoop cluster environments, including workload, job performance and capacity planning, using Cloudera Manager.
Designed and Developed Sqoop scripts to extract data from a relational database into Hadoop.
Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
Worked extensively with Sqoop for importing and exporting the data from HDFS to Relational Database systems/mainframe and vice-versa.
Developed Oozie workflow for scheduling Pig and Hive Scripts.
Loaded and transformed large sets of structured, semi-structured and unstructured data.
Configured Hadoop ecosystem components such as YARN, Hive, Pig, HBase and Impala on an Amazon EMR cluster.
Analyzed web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration and the most purchased product on the website.
Involved in setting up the QA environment by implementing Pig and Sqoop scripts.
Involved in creating the workflow to run multiple Hive and Pig jobs, which run independently based on time and data availability.
Ran MapReduce programs on log data to transform it into a structured form for finding user location, age group and time spent.
Used PySpark SQL to speed up SQL scripts by defining the number of executors and the executor memory used to execute the pipeline.
Used the subprocess module in Python to call UNIX shell commands to verify file existence.
Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports by Business Intelligence tools.
Proactively monitored systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
Participated in functional reviews, test specifications and documentation review.
Documented system processes and procedures for future reference; responsible for managing data coming from different sources.
Environment: Cloudera Hadoop Distribution, HDFS, Talend, MapReduce (Java), Python, PySpark, Impala, Pig, Sqoop, Flume, Hive, Oozie, HBase, Shell Scripting, Agile Methodologies.
Client: Vodafone - Hyderabad, India May 2013 to July 2014
Role: Hadoop Developer
Responsibilities:
Coordinated with business customers to gather business requirements and interacted with technical peers to derive technical requirements; delivered the BRD and TDD documents.
Analyzed the Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase and Sqoop.
Extracted data from Oracle into HDFS using Sqoop to store it and generate reports for visualization purposes.
Collected and aggregated large amounts of web log data from different sources such as web servers and mobile devices using Apache Flume and stored the data in HDFS for analysis.
Built a data flow pipeline using Flume, Java (MapReduce) and Pig.
Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
Worked on analyzing and writing Hadoop MapReduce jobs using Java API, Pig and Hive. Responsible for building scalable distributed data solutions using Hadoop.
Developed Hive scripts to analyze data and categorize mobile numbers into segments, so that promotions could be offered to customers based on segment.
Extensive experience in writing Pig scripts to transform raw data into baseline data.
Developed UDFs in Java as needed for use in Pig and Hive queries.
Worked on Oozie workflow engine for job scheduling.
Created Hive tables, partitions and loaded the data to analyze using HiveQL queries.
Leveraged Solr API to search user interaction data for relevant matches.
Designed the Solr schema and used the Solr client API for storing, indexing and querying the schema fields.
Loaded data into HBase using bulk load and the HBase API.
Environment: Hortonworks Hadoop Distribution, MapReduce, HBase, Hive, Pig, Sqoop, Oozie, Flume, Solr, Shell script.
Client: Meta Minds Inc - Hyderabad, Telangana January 2012 to April 2013
Role: Java/J2EE Developer
Responsibilities:
Analyzed project requirements for the product and was involved in design using UML.
Interacted with system analysts and business users for design and requirement clarification.
Made extensive use of HTML5 with AngularJS, JSTL, JSP, jQuery and Bootstrap for the presentation layer, along with JavaScript for client-side validation.
Handled Java multithreading in back-end components.
Developed HTML reports for various modules as per the requirement.
Developed web services using SOAP, SOA, WSDL and Spring MVC, and developed DTDs and XSD schemas for XML (parsing, processing and design) to communicate with the Active Directory application using a RESTful API.
Created multiple RESTful web services using the Jersey 2 framework.
Used AquaLogic BPM (Business Process Management) for workflow management.
Developed the application using MongoDB (NoSQL) for storing data on the server.
Developed the complete business tier with stateful session beans and CMP entity beans using EJB 2.0.
Developed integration services using SOA, Web Services, SOAP, and WSDL.
Designed, developed and maintained the data layer using the ORM framework in Hibernate.
Used the Spring framework's JMS support for writing to a JMS queue and HibernateDaoSupport for interfacing with the database, and integrated Spring with JSF.
Involved in writing unit test cases using JUnit and in integration testing.
Environment: Java, J2EE, HTML, CSS, JSP, JavaScript, Bootstrap, AngularJS, Servlets, JDBC, EJB, Java Beans, Hibernate, Spring MVC, RESTful, JMS, MQ Series, AJAX, WebSphere Application Server, SOAP, XML, MongoDB, JUnit, Rational Suite, CVS Repository.
Client: IDBI Bank - Hyderabad, India July 2010 to December 2011
Role: Oracle Developer
Responsibilities:
Developed Oracle PL/SQL stored procedures, functions, packages and SQL scripts.
Worked with users and application developers to identify business needs and provide solutions.
Created Database Objects, such as Tables, Indexes, Views, and Constraints.
Enforced database integrity using primary keys and foreign keys.
Tuned pre-existing PL/SQL programs for better performance.
Created many complex SQL queries and used them in Oracle Reports to generate reports.
Implemented data validations using Database Triggers.
Used utilities such as UTL_FILE for data transfer between tables and flat files.
Performed SQL tuning using Explain Plan.
Provided support in the implementation of the project.
Worked with built-in Oracle standard packages like DBMS_SQL, DBMS_JOB and DBMS_OUTPUT.
Created and implemented report modules in the database from the client system using Oracle Reports as per business requirements.
Used dynamic PL/SQL procedures during package creation.
Environment: Oracle 9i, Oracle Reports, SQL, PL/SQL, SQL*Plus, SQL*Loader, Windows XP.