SMIT CHANDARANA
Hadoop Developer
Email: ******.*@******************.*** Contact: 469-***-****
Professional Summary:
• Around 5 years of IT experience involving project development, implementation, deployment, and maintenance using the Hadoop ecosystem and related technologies, with domain knowledge in Finance, Banking, Manufacturing, and Healthcare.
• 3 years of experience in using Hadoop and its ecosystem components such as HDFS, MapReduce, YARN, Spark, Hive, Pig, HBase, Zookeeper, Oozie, Flume, Storm, and Sqoop.
• In-depth understanding of Hadoop architecture and its various components, such as JobTracker, TaskTracker, NameNode, DataNode, and ResourceManager.
• In-depth understanding of MapReduce and AWS cloud concepts and their critical role in the analysis of huge and complex datasets.
• Hands-on experience in ingesting data into data warehouses using various data loading techniques.
• Experienced in processing large datasets of different forms including structured, semi-structured and unstructured data.
• Expertise in usage of Hadoop and its ecosystem commands.
• Expertise in designing tables in Hive and MySQL, and in importing and exporting data between relational databases and HDFS using Sqoop.
• In-depth understanding of Spark architecture, including Spark Core, Spark SQL, and DataFrames.
• Skilled in streaming data using Apache Spark and in migrating data from Oracle to HDFS using Sqoop.
• Expertise in using Spark SQL with various data sources like JSON and Parquet.
• Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into the RDBMS through Sqoop (see the sketch after this summary).
• Imported data from different sources like HDFS and HBase into Spark RDDs.
• Proficient in processing data using Apache Pig by registering User Defined Functions (UDFs) written in Java.
• Experienced in integrating various data sources including Java, JDBC, RDBMS, shell scripts, spreadsheets, and text files.
• Experienced with different file formats like Parquet, ORC, CSV, Text, Sequence, XML, JSON and Avro files.
• Skilled in scheduling recurring Hadoop jobs using Apache Oozie workflows.
• Proficient in designing and querying NoSQL databases like MongoDB, HBase, and Cassandra.
• Skilled in RDBMS with strong hands-on experience on database systems like Oracle and MySQL.
• Proficient in advanced UNIX concepts with working experience in advanced scripting/programming.
• Strong experience creating real-time data streaming solutions using Apache Spark Core, Spark SQL, and DataFrames.
• Hands-on experience with Spark Streaming to receive real-time data using Kafka.
• Worked extensively with Hive DDL and HiveQL.
• Developed UDF, UDAF, and UDTF functions and implemented them in Hive queries.
• Extensive experience with big data ETL and query tools such as Pig Latin and HiveQL.
• Used Pig as an ETL tool to perform transformations, event joins, filtering, and pre-aggregations.
• Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
• Developed Hive and Pig scripts for handling business transformations and analyzing data.
• Developed Sqoop scripts for large dataset transfer between Hadoop and RDBMS.
• Experience with Big Data ML toolkits, such as Mahout and Spark ML.
• Expert database engineer with NoSQL and relational data modeling skills.
• Experience in HBase Cluster Setup and Implementation.
• Expertise in commissioning and decommissioning of nodes in clusters, backup configuration, and recovery from a NameNode failure.
• Experience in installation, configuration, support and monitoring of Hadoop clusters using Apache, Cloudera distributions and AWS.
• Understanding of Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies including Waterfall and Agile.
• Proficient in using data visualization tools like Tableau and MS Excel.
• Experience in sending weekly and monthly status reports to Client/Higher Management.
• Flexible self-starter, ready to take on new challenges, with excellent problem-solving skills and the ability to explore and learn new concepts, tools, and applications.
• Possess good team management, coordination, documentation and presentation skills along with excellent communication and interpersonal skills.
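As referenced in the Spark DataFrame/UDF bullet above, the following is a minimal Scala sketch of that style of aggregation. All names here (dataset path, columns, bucketing logic) are hypothetical illustrations rather than details from a specific project; the final load into the RDBMS would be a separate Sqoop export or JDBC write.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum, udf}

object TxnAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("TxnAggregation").getOrCreate()

    // Read a Parquet dataset from HDFS (path is illustrative only).
    val txns = spark.read.parquet("hdfs:///data/transactions")

    // Example UDF: bucket transaction amounts into coarse ranges.
    val amountBucket = udf((amount: Double) =>
      if (amount < 100) "small" else if (amount < 1000) "medium" else "large")

    // DataFrame aggregation: total amount per account and bucket.
    val totals = txns
      .withColumn("bucket", amountBucket(col("amount")))
      .groupBy("account_id", "bucket")
      .agg(sum("amount").alias("total_amount"))

    // Write the aggregated result back to HDFS; a separate Sqoop export
    // (or a JDBC write) would then push it into the RDBMS.
    totals.write.mode("overwrite").parquet("hdfs:///data/txn_totals")

    spark.stop()
  }
}
```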
Educational Details:
• Northeastern University, Boston, MA
Master of Science in Computer System Engineering (Internet of Things)
• Gujarat Technological University, Ahmedabad, Gujarat, India
Bachelor of Engineering, Mechanical Engineering
Technical Skills:
Big Data/Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume, Oozie, Spark, Kafka, Impala
Java/J2EE Technologies: Core Java, Servlets, JSP, JDBC, XML, REST, SOAP, WSDL
Programming Languages: Java, Scala, SQL, PL/SQL, Linux shell scripts
NoSQL Databases: MongoDB, Cassandra, HBase
Databases: Oracle 11g/10g, DB2, MS SQL Server, MySQL, Teradata
Web Technologies: HTML, XML, JDBC, JSP, CSS, JavaScript, SOAP
Tools Used: Eclipse, PuTTY, WinSCP, NetBeans, QC, QlikView
Operating Systems: Ubuntu (Linux), Windows 95/98/2000/XP, Mac OS, Red Hat
Methodologies: Agile/Scrum, Rational Unified Process, Waterfall
Hadoop Distributions: Hortonworks, Cloudera, MapR
Professional Experience:
United Airlines, Chicago, Illinois Jan 2019 to Present
Hadoop / Big Data / Spark Developer
Project Description: United Airlines, Inc., commonly referred to as United, is a major United States airline headquartered in Chicago, Illinois. United operates a large domestic and international route network, with an extensive presence in the Asia-Pacific region. Worked on the Single View of Customer (SVOC) initiative, covering POC phases of data acquisition, data modeling and mapping, and the creation of Hive tables and query scripts for accessing Teradata in order to build Hive tables for different environments.
Responsibilities:
• Performed Spark streaming and micro-batch processing using Scala as the programming language.
• Used Hive scripts in Spark for data cleaning and transformation.
• Imported data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
• Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
• Created a data pipeline process for structuring, processing, and transforming data using Kafka and Scala.
• Created Kafka/Spark Streaming data pipelines for consuming data from external sources and performing transformations in Scala (see the sketch at the end of this list).
• Contributed towards developing a Data Pipeline to load data from different sources like Web, RDBMS, NoSQL to Apache Kafka or Spark cluster.
• Extensively used Pig for data cleansing and created partitioned tables in Hive.
• Used Hive to analyze the partitioned and bucketed data and computed various metrics for reporting.
• Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
• Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
• Created custom Python/shell scripts to import data from Oracle databases via Sqoop.
• Monitored and troubleshot Hadoop jobs using the YARN Resource Manager and EMR job logs using Genie and Kibana.
• Designed efficient Spark code using Python and Spark SQL that can be forward-engineered by code generation developers.
• Loaded and transformed large sets of structured, semi-structured, and unstructured data.
• Created big data workflows to ingest data from various sources into Hadoop using Oozie; these workflows comprised heterogeneous jobs such as Hive, Sqoop, and Python scripts.
• Channeled log data collected from the web servers into HDFS using Flume and Spark Streaming.
• Developed Spark jobs using Scala in the test environment for faster data processing and used Spark SQL for querying.
• Involved in developing Scala programs that support functional programming.
• Used GitHub for project version management.
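A minimal Scala sketch of the Kafka-to-HDFS Spark Structured Streaming pipeline referenced in the list above. The topic name, broker address, JSON schema, and HDFS paths are assumptions for illustration only; running it requires the spark-sql-kafka connector on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{DoubleType, StringType, StructType}

object BookingStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("BookingStream").getOrCreate()

    // Hypothetical JSON schema for the incoming events.
    val schema = new StructType()
      .add("booking_id", StringType)
      .add("origin", StringType)
      .add("destination", StringType)
      .add("fare", DoubleType)

    // Consume from a Kafka topic (broker and topic names are placeholders).
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "bookings")
      .load()

    // Parse the JSON payload and keep only the typed columns.
    val events = raw
      .select(from_json(col("value").cast("string"), schema).alias("e"))
      .select("e.*")

    // Land the cleansed stream on HDFS in Parquet, micro-batch by micro-batch.
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/bookings")
      .option("checkpointLocation", "hdfs:///checkpoints/bookings")
      .start()

    query.awaitTermination()
  }
}
```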
Environment: Hadoop, HDFS, Spark, Scala, MapReduce, HQL, Impala, Pig, Java, AWS, Kafka, SBT, Maven, Sqoop, Zookeeper
Wells Fargo, Des Moines, IA Sep 2018 to Dec 2019
Hadoop/Spark Developer
Project Description: Wells Fargo is an international banking and financial services holding company that provides banking, insurance, commercial and consumer finance, and mortgage services. Worked with data engineers to implement various application and schema objects, support data visualizations, and support dynamic reports and dashboards. Implemented solutions using MicroStrategy to meet operational requirements including maintainability, scalability, reliability, and security to deliver relevant content.
Responsibilities:
• Analyzed the functional specs provided by the client and developed a detailed solution design document with the architect and the team.
• Worked with the client business teams to confirm the solution design and change the requirements where needed.
• Worked on importing and exporting data from DB2 into AWS and Hive using Sqoop for analysis, visualization, and report generation.
• Developed Pig Latin scripts to perform MapReduce jobs.
• Developed product profiles using Pig and commodity UDFs.
• Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
• Created HBase tables and column families to store the user event data.
• Wrote automated HBase test cases for data quality checks using HBase command-line tools.
• Used Hive and Impala to query the data in HBase.
• Developed and implemented core API services using Java/Scala/Python and Spark.
• Converted CSV files into Parquet format, loaded the Parquet files into DataFrames, and queried them using Spark SQL (see the sketch at the end of this list).
• Migrated data from Amazon AWS to databases such as MySQL and Vertica using Spark DataFrames.
• Built a continuous ETL pipeline using Kafka, Spark Streaming, and HDFS.
• Performed ETL on data from different formats such as JSON, Parquet, and database tables, then ran ad-hoc queries using Spark SQL.
• Performed complex data transformations in Spark using Scala.
• Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
• Connected Tableau and SQuirreL SQL clients to Spark SQL (Spark Thrift Server) as a data source and ran queries.
• Involved in importing the real time data to Hadoop using Kafka and implemented the Oozie job for daily imports.
• Managed and scheduled Jobs on a Hadoop cluster.
• Collected the log data from web servers and integrated into HDFS using Flume.
• Involved in defining job flows, managing and reviewing log files.
• Set up and configured Cassandra, Spark, and other relevant architecture components.
• Installed Oozie workflow engine to run multiple Spark, HiveQL and Pig jobs.
• Created Hive tables to store the processed results in a tabular format.
• Designed and created ETL jobs in Talend to load huge volumes of data into Cassandra, the Hadoop ecosystem, and relational databases.
• Worked with various compression codecs and file formats such as Snappy, gzip, bzip2, Avro, Parquet, and text.
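A minimal Scala sketch of the CSV-to-Parquet conversion and Spark SQL querying referenced in the list above. The file paths, column names, and query are hypothetical, assuming a CSV extract with a header row.

```scala
import org.apache.spark.sql.SparkSession

object CsvToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CsvToParquet").getOrCreate()

    // Read the raw CSV extract (path and schema inference are illustrative).
    val accounts = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///landing/accounts.csv")

    // Convert to Parquet for efficient columnar storage.
    accounts.write.mode("overwrite").parquet("hdfs:///curated/accounts")

    // Load the Parquet data back into a DataFrame and query it with Spark SQL.
    val parquetDf = spark.read.parquet("hdfs:///curated/accounts")
    parquetDf.createOrReplaceTempView("accounts")

    val summary = spark.sql(
      """SELECT state, COUNT(*) AS num_accounts, AVG(balance) AS avg_balance
        |FROM accounts
        |GROUP BY state""".stripMargin)

    summary.show()
    spark.stop()
  }
}
```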
Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Spark/Scala, Pig, Sqoop, Oozie, Zookeeper, Teradata, PL/SQL, MySQL, Windows, HBase.
INDI Technologies, India May 2015 to Jul 2018
Python Developer
Project Description: The company designs and markets medical devices and receives information from various research and manufacturing organizations. The log files generated by these systems were maintained in a Hadoop cluster and processed with Python for further analysis.
Responsibilities:
• Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
• Worked on requirement gathering and High-level design.
• Used Python scripts to update content in the database and manipulate files.
• Converted a Visual Basic application to Python and MySQL.
• Used HTML/CSS and JavaScript for UI development.
• Developed object-oriented programming to enhance company product management.
• Utilized Python to handle all hits on Django, Redis, and other applications.
• Used several Python libraries such as wxPython, NumPy, and matplotlib.
• Was involved in environment code installation as well as the SVN implementation.
• Created unit test/regression test framework for working/new code.
• Responsible for debugging and troubleshooting the web application.
• Built all database mapping classes using Django models.
Environment: Python, SVN, Eclipse, JavaScript, XML, Linux, Bugzilla, HTML5/CSS.