Sayeed UR Rahman Mohammed
Email:******@*****.***
Number:469-***-****
Professional Summary:
•Having 7+ years of experience in IT, including analysis, design, development, implementation, and maintenance of Big Data projects using the Apache Hadoop/Spark ecosystem, as well as design and development of web applications using Java technologies.
•Experience in analysis, design, development, and integration using Big Data Hadoop ecosystem components on Cloudera, working with various file formats such as Avro and Parquet.
•Experience working with various compression techniques such as Snappy, LZO, and GZip.
•Experience in developing customized partitioners and combiners for effective data distribution.
•Expertise in tuning Impala queries to overcome concurrent-job contention and out-of-memory errors for various analytics use cases.
•Rigorously applied transformations in Spark and R programs.
•Worked with an AWS cloud-based CDH 5.13 cluster and developed merchant campaigns using Hadoop.
•Developed and maintained ETL (data extraction, transformation, and loading) mappings using Informatica Designer 8.6 to extract data from multiple source systems, comprising databases such as Oracle 10g and SQL Server 7.2 and flat files, into the staging area, the EDW, and then the data marts.
•Expertise in using built-in Hive SerDes and developing custom SerDes.
•Developed multiple internal and external Hive tables using dynamic partitioning and bucketing.
•Designed and developed a full-text search feature with multi-tenant Elasticsearch after collecting real-time data through Spark Streaming.
•Hands-on experience with AWS cloud services (VPC, EC2, S3, RDS, Redshift, Data Pipeline, EMR, DynamoDB, Workspaces, Lambda, Kinesis, SNS, SQS).
•Experience in developing NoSQL applications using MongoDB, HBase, and Cassandra.
•Tuned multiple Spark applications for better performance.
•Developed data pipelines for real-time use cases using Kafka, Flume, and Spark Streaming.
•Experience in importing and exporting multi-terabytes of data using Sqoop between HDFS/Hive and relational database systems (RDBMS).
•Developed multiple Hive views for accessing HBase table data.
•Wrote complex Spark SQL programs to join data and displayed the results on Kibana dashboards.
•Expertise in using various formats such as Text and Parquet while creating Hive tables.
•Experience in analyzing large-scale data to identify new analytics, insights, trends, and relationships, with a strong focus on data clustering.
•Expertise in collecting data from various source systems such as social media and databases.
•End-to-end hands-on experience with the ETL process, including setting up automation to load terabytes of data into HDFS.
•Good experience in developing applications using core Java, Collections, Threads, JDBC, Servlets, JSP, Struts, Hibernate, and XML components, using IDEs such as Eclipse 6.0 and MyEclipse.
•Experience in SQL programming: writing queries using joins, stored procedures, triggers, and functions, and performing query optimization with Oracle, SQL Server, and MySQL.
•Excellent team worker with good interpersonal skills and leadership qualities.
•Excellent organizational and communication skills.
•Excellent understanding of Agile and Scrum methodologies.
Education:
●Bachelor of Science in Information Technology, 2013
●Master of Science in Information Technology, Campbellsville University, 2017
Technical Skills:
Programming Languages
C, C++, Java, R, Python, UNIX
Distributed Computing
Apache Hadoop, HDFS, MapReduce, Pig, Hive, Oozie, Hue, Kerberos, Sentry, Zookeeper, Kafka, Flume, Impala, HBase and Sqoop
AWS Components
EC2, S3, RDS, Redshift, EMR, DynamoDB, Lambda, SNS, SQS
Web Development
HTML, JSP, XML, JavaScript and AJAX
Web Application Server
Tomcat 6.0, JBoss 4.2 and WebLogic 8.1
Operating Systems
Windows, Unix, iOS, Ubuntu and RedHat Linux
Tools
Eclipse, NetBeans, Visual Studio, Agitator, Bugzilla, Arc Styler (MDA), Rational Rose, Enterprise Architect and Rational Software Architect
Source Control Tools
VSS, Rational Clear Case, Subversion
Application Framework
Struts 1.3, Spring 2.5, Hibernate 3.3, Jasper Reports, JUnit and JAXB
RDBMS
Oracle and SQL Server 2016
NOSQL
MongoDB, Cassandra and HBase
Professional Experience:
JPMC, Chicago, IL June 2020 to Present
Role: Hadoop/Spark/Java Developer
Roles & Responsibilities:
•Responsible for building scalable distributed data solutions using Hadoop.
•Responsible for managing and scheduling Jobs on a Hadoop cluster.
•Responsible for loading customer data from SAS to MS SQL 2016, performing data massaging, mining, and cleansing, then exporting to HDFS and Hive using Sqoop.
•Wrote Pig scripts to process credit card and debit card transactions for active customers by joining data from HDFS and Hive using HCatalog for various merchants.
•Wrote Python UDFs using Hadoop Streaming to apply regular expressions and return valid merchant codes and names.
•Responsible for creating data pipelines using Kafka and Spark Streaming.
•Loading data from UNIX file system to HDFS and vice versa.
•Improved the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, and Spark on YARN.
•Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
•Developed a POC using Spark SQL and MLlib libraries along with Kafka and other tools as required, then deployed it on the YARN cluster.
•Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS.
•Implemented Data Ingestion in real time processing using Kafka.
•Developed Spark code using Spark SQL and Spark Streaming for faster data processing.
•Configured Spark Streaming to receive real-time data and store the streamed data in HDFS.
•Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive as per requirements.
•Documented the requirements, including the available code to be implemented using Spark, Hive, and HDFS.
•Used Kafka Streams with Spark Streaming to collect information and store it in HDFS.
•Developed multiple Kafka Producers and Consumers as per the software requirement specifications.
•Streamed data in real time using Spark with Kafka.
•Responsible for creating Hive tables and working on them using Hive QL.
•Implemented various Hive UDFs as per business requirements.
•Exported the analyzed data to the databases using Sqoop for visualization and to generate reports for the BI team.
•Involved in Data Visualization using Tableau for Reporting from Hive Tables.
•Developed Python Mapper and Reducer scripts and implemented them using Hadoop Streaming.
•Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
•Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
•Responsible for writing Hive queries for data analysis to meet the business requirements.
•Customized Apache Solr to handle fallback searching and provide custom functions.
•Responsible for setup and benchmarking of Hadoop/HBase clusters.
Environment: Hadoop, HDFS, Cassandra, Sqoop, Hive, MapReduce, Spark Streaming/SQL, Scala, Kafka, Solr, sbt, Java, Python, Ubuntu/CentOS, MySQL, Linux, GitHub, Maven, Jenkins
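The merchant-code work above (Python UDFs run through Hadoop Streaming to validate codes with regular expressions) could be sketched roughly as follows. This is a minimal illustration, not the actual JPMC job: the tab-delimited record layout and the `^[A-Z]{4}\d{4}$` merchant-code pattern are assumptions made for the example.

```python
import re
import sys

# Hypothetical pattern: a valid merchant code is 4 uppercase letters + 4 digits.
MERCHANT_CODE = re.compile(r"^[A-Z]{4}\d{4}$")

def parse_line(line):
    """Split a tab-delimited transaction record into (merchant_code, merchant_name).

    Returns None when the record is malformed or the code does not match
    the expected pattern, so invalid merchants are filtered out.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        return None
    code, name = fields[0], fields[1]
    if MERCHANT_CODE.match(code):
        return (code, name)
    return None

def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hadoop Streaming feeds records on stdin and collects emitted lines on stdout.
    for line in stdin:
        parsed = parse_line(line)
        if parsed:
            stdout.write("\t".join(parsed) + "\n")

if __name__ == "__main__":
    main()
```

In a streaming job this script would be passed as the `-mapper`; keeping the parsing in a pure function makes it unit-testable outside the cluster.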
Juniper Networks (IBM), Sunnyvale, CA May 2019 – May 2020
Hadoop/Java Developer
Responsibilities:
•Developed Spark applications using Java and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
•Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.
•Used Spark Streaming APIs to perform on-the-fly transformations and actions for building a common learner data model that receives data from Kafka in near real time and persists it to Cassandra.
•Consumed XML messages using Kafka and processed the XML files using Spark Streaming to capture UI updates.
•Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files.
•Loaded DStream data into Spark RDDs and performed in-memory computation to generate the output response.
•Experienced in writing live Real-time Processing and core jobs using Spark Streaming with Kafka as a data pipe-line system.
•Used Radar for task tracking and Box and Quip for requirement documents.
•Worked with and learned a great deal from AWS cloud services such as EC2, S3, EBS, RDS, and VPC.
•Migrated an existing on-premises application to AWS, using EC2 and S3 for small-data-set processing and storage, and maintained the Hadoop cluster on AWS EMR.
•Imported data from AWS S3 into Spark RDD, Performed transformations and actions on RDD's.
•Implemented Elasticsearch on the Hive data warehouse platform.
•Worked with Elastic MapReduce and set up the Hadoop environment on AWS EC2 instances.
•Good understanding of Cassandra architecture, replication strategy, gossip, snitch etc.
•Designed column families in Cassandra; ingested data from RDBMS, performed data transformations, and exported the transformed data to Cassandra per business requirements.
•Used the Spark DataStax Cassandra Connector to load data to and from Cassandra.
•Experienced in creating data models for clients' transactional logs; analyzed the data in Cassandra tables for quick searching, sorting, and grouping using the Cassandra Query Language (CQL).
•Tested cluster performance using the cassandra-stress tool to measure and improve reads/writes.
•Used Hive QL to analyze the partitioned and bucketed data, Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business specification logic.
•Used Kafka functionality such as distribution, partitioning, and the replicated commit log service for messaging systems by maintaining feeds.
•Used Apache Kafka to aggregate web log data from multiple servers and make them available in Downstream systems for Data analysis and engineering type of roles.
•Experience in using Avro, Parquet, RCFile and JSON file formats, developed UDFs in Hive and Pig.
•Worked with Log4j framework for logging debug, info & error data.
•Performed transformations like event joins, filter bot traffic and some pre-aggregations using PIG.
•Developed Custom Pig UDFs in Java and used UDFs from Piggybank for sorting and preparing the data.
•Developed Custom Loaders and Storage Classes in PIG to work on several data formats like JSON, XML, CSV and generated Bags for processing using pig etc.
•Used Amazon DynamoDB to gather and track the event-based metrics.
•Developed Sqoop and Kafka Jobs to load data from RDBMS, External Systems into HDFS and HIVE.
•Developed Oozie coordinators to schedule Pig and Hive scripts to create Data pipelines.
•Written several Map reduce Jobs using Java API, also Used Jenkins for Continuous integration.
•Set up and worked on Kerberos authentication principals to establish secure network communication on the cluster, and tested HDFS, Hive, Pig, and MapReduce cluster access for new users.
•Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
•Modified Ant scripts to build JARs, class files, WAR files, and EAR files.
•Generated various kinds of reports using Power BI and Tableau based on Client specification.
•Used Jira for bug tracking and Bit Bucket to check-in and checkout code changes.
•Worked with Network, Database, Application, QA and BI teams to ensure data quality and availability.
•Responsible for generating actionable insights from complex data to drive real business results for various application teams and worked in Agile Methodology projects extensively.
Environment: Eclipse, JDK 1.8.0, Cloudera Manager 5.14, HDFS, MapReduce, Hive 2.0, HBase, Apache Maven 3.0.3, MongoDB, Splunk 6.0, SAP, JIRA, Kubernetes, Microservices, EC2, ELB, RDS, S3, CloudWatch, SNS, SQS, EBS
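The JSON-flattening preprocessing described above was done with Spark DataFrames; the core recursive flattening idea can be sketched in plain Python as below. This is an illustration of the technique only: the dotted key separator and list-index convention are assumptions, not the project's actual schema.

```python
def flatten(doc, parent_key="", sep="."):
    """Recursively flatten a nested JSON-like document into a single-level dict.

    Nested keys are joined with `sep`, so {"a": {"b": 1}} becomes {"a.b": 1};
    list elements are addressed by their position. The flat dict can then be
    written out as one row of a flat file.
    """
    items = {}
    if isinstance(doc, dict):
        for key, value in doc.items():
            new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
            items.update(flatten(value, new_key, sep))
    elif isinstance(doc, list):
        for i, value in enumerate(doc):
            new_key = f"{parent_key}{sep}{i}" if parent_key else str(i)
            items.update(flatten(value, new_key, sep))
    else:
        # Scalar leaf: record it under the accumulated dotted path.
        items[parent_key] = doc
    return items
```

For example, `flatten({"user": {"id": 7, "tags": ["a", "b"]}})` yields `{"user.id": 7, "user.tags.0": "a", "user.tags.1": "b"}`; in Spark the same effect is usually achieved with column expressions and `explode`.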
T-Mobile, Atlanta, GA Jan 2018 – Jan 2019
Hadoop/Spark Developer
Responsibilities:
•Used SCRUM for agile development and participated in requirement gathering, design, implementation, reviewing phases.
•Implemented Micro Services based Cloud Architecture on AWS platform.
•Implemented Spring Boot microservices to process messages into the Kafka cluster setup.
•Used Spark Streaming APIs to perform on-the-fly transformations and actions for building a common learner data model that receives data from Kafka in near real time and persists it to Cassandra.
•Used Kafka and Kafka brokers, initiated the spark context and processed live streaming information with RDD and Used Kafka to load data into HDFS and NoSQL databases.
•Worked on both Producer API and Consumer API in Kafka.
•Used Kafka functionality such as distribution, partitioning, and the replicated commit log service for messaging systems by maintaining feeds, and created applications that monitor consumer lag within Apache Kafka clusters.
•Implemented reprocessing of failed messages in Kafka using offset IDs.
•Implemented Kafka producer and consumer applications on Kafka cluster setup with help of Zookeeper.
•Used Swagger for the scheduling process.
•Installed and configured Talend ETL in single- and multi-server environments.
•Experience in monitoring the Hadoop cluster using Cloudera Manager, interacting with Cloudera support, logging issues in the Cloudera portal, and fixing them per recommendations.
•Experience in Cloudera Hadoop Upgrades and Patches and Installation of Ecosystem Products through Cloudera manager along with Cloudera Manager Upgrade.
•Worked on continuous Integration tools Jenkins and automated jar files at end of day.
•Worked with Tableau and Integrated Hive, Tableau Desktop reports and published to Tableau Server.
•Used Spring Kafka API calls to process the messages smoothly on Kafka Cluster setup.
•Knowledgeable about partitioning of Kafka messages and setting up replication factors in the Kafka cluster.
•Integrated REST API using JWT token for authentication and security for the microservices.
•Involved in deploying systems on Amazon Web Services infrastructure services: EC2, S3, DynamoDB, SQS, and CloudFormation.
•Developed and maintained cloud-based architecture in AWS, including creating machine images (AMIs).
•Implemented jobs using Groovy scripts to create Jenkins jobs for continuous integration.
•Utilized AWS services such as S3 as a data store for files dropped into the bucket, IAM roles, and Lambda functions triggered by S3 events.
•Maintained the Git repo during project development and conducted merges as part of peer reviews.
•Experienced in writing Spark Applications in Python.
•Used Spark SQL to handle structured data in Hive.
•Imported semi-structured data from Avro files using Pig to make serialization faster.
•Processed web server logs by developing multi-hop Flume agents using the Avro sink, and loaded the logs into MongoDB for further analysis.
•Experienced in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
•Experienced in connecting Avro sink ports directly to Spark Streaming for analysis of web logs.
•Developed unit test cases using Mockito framework for testing accuracy of code and logging is done using SLF4j + Log4j.
•Responsible for developing a data pipeline with Amazon AWS to extract data from web logs and store it in Amazon EMR and Azure.
•Used Zookeeper to provide coordination services to the cluster.
•Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with reference tables and historical metrics.
•Prepared and implemented the project plan using JIRA and TFS for tracking bugs.
Environment: Java, Python, Sqoop, Spring Boot, Micro services, AWS, jQuery, JSON, Git, Jenkins, Docker, Maven, Apache Kafka, Apache Spark, SQL Server, Kibana, Elastic Search
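The consumer-lag monitoring mentioned above (lag is the gap between a partition's log-end offset and the consumer group's committed offset) can be sketched as below. This is a simplified illustration: real monitoring would fetch both offset maps from Kafka's admin/consumer APIs, whereas here they are passed in as plain dicts.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Compute per-partition consumer lag: log-end offset minus committed offset.

    Both arguments map (topic, partition) -> offset. A partition with no
    committed offset is treated as fully unconsumed (lag == end offset),
    and lag is clamped at zero to tolerate slightly stale end offsets.
    """
    lag = {}
    for tp, end in end_offsets.items():
        committed = committed_offsets.get(tp, 0)
        lag[tp] = max(end - committed, 0)
    return lag

def total_lag(end_offsets, committed_offsets):
    """Aggregate lag across all partitions, e.g. to compare with an alert threshold."""
    return sum(consumer_lag(end_offsets, committed_offsets).values())
```

A monitoring loop would poll these numbers periodically and alert when `total_lag` stays above a threshold, indicating consumers are falling behind producers.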
State Farm Insurance, Bloomington, IL Oct 2015 – Dec 2017
Hadoop Developer
Responsibilities:
•Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper and Spark.
•Developed Spark applications using Java and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
•Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.
•Used Spark Streaming APIs to perform on-the-fly transformations and actions for building a common learner data model that receives data from Kafka in near real time and persists it to Cassandra.
•Developed an enterprise-wide PySpark application to load and process transactional data into the Cassandra NoSQL database.
•Consumed XML messages using Kafka and processed the XML files using Spark Streaming to capture UI updates.
•Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files.
•Loaded DStream data into Spark RDDs and performed in-memory computation to generate the output response.
•Experienced in writing live Real-time Processing and core jobs using Spark Streaming with Kafka as a data pipe-line system.
•Created new custom columns depending on the use case while ingesting data into the Hadoop lake using PySpark.
•Worked with and learned a great deal from AWS cloud services such as EC2, S3, EBS, RDS, and VPC.
•Migrated an existing on-premises application to AWS, using EC2 and S3 for small-data-set processing and storage, and maintained the Hadoop cluster on AWS EMR.
•Written multiple MapReduce programs for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV & other compressed file formats.
•Developed automated processes for flattening the upstream data from Cassandra, which is in JSON format, using Hive UDFs to flatten the JSON data.
•Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
•Developed PIG UDFs to provide Pig capabilities for manipulating the data according to Business Requirements and worked on developing custom PIG Loaders and Implemented various requirements using Pig scripts.
•Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
•Created a POC using Spark SQL and MLlib libraries.
•Developed a Spark Streaming module for consumption of Avro messages from Kafka.
•Implemented Regression models using PySpark MLlib.
•Converted SQL scripts to PySpark.
•Experienced in managing and reviewing Hadoop log files.
•Worked with different file formats such as TextFile, AvroFile, ORC, and Parquet for Hive querying and processing.
•Created and maintained Teradata tables, views, macros, triggers, and stored procedures.
•Monitored workload, job performance and capacity planning using Cloudera Distribution.
•Worked on Data loading into Hive for Data Ingestion history and Data content summary.
•Involved in developing Impala scripts for extraction, transformation, loading of data into data warehouse.
•Used Hive and Impala to query the data in HBase.
•Created Impala tables and SFTP scripts and Shell scripts to import data into Hadoop.
•Developed HBase java client API for CRUD Operations.
•Created Hive tables and involved in data loading and writing Hive UDFs. Developed Hive UDFs for rating aggregation
•Generated Java APIs for retrieval and analysis on No-SQL database such as HBase and Cassandra
•Provided ad-hoc queries and data metrics to the Business Users using Hive, Pig
•Did various performance optimizations like using distributed cache for small datasets, partition and bucketing in hive, doing map side joins etc.
•Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop for analysis, visualization and to generate reports.
•Involved in file movements between HDFS and AWS S3 and extensively worked with S3 bucket in AWS
•Experienced with AWS and Azure services to smoothly manage applications in the cloud, creating or modifying instances.
•Created data pipeline for different events of ingestion, aggregation and load consumer response data in AWS S3 bucket into Hive external tables in HDFS location to serve as feed for tableau dashboards.
•Used EMR (Elastic MapReduce) to perform big data operations in AWS.
•Loaded data from different sources (databases and files) into Hive using the Talend tool.
•Implemented Spark using Python, utilizing Spark Core, Spark Streaming, and Spark SQL for faster data processing instead of Java MapReduce.
•Experience in integrating Apache Kafka with Apache Spark for real time processing.
•Exposure to using Apache Kafka to develop data pipelines of logs as a stream of messages using producers and consumers.
•Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
•Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and compressed CSV.
•Involved in running Hadoop Streaming jobs to process terabytes of data.
•Used JIRA for bug tracking and CVS for version control.
Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Cloudera, Flume, HBase, Solr, CDH3, Cassandra, Oracle, DB2, J2EE, Unix/Linux, JavaScript, Ajax, Eclipse IDE, CVS, JIRA, Azure
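The Hive partitioning/bucketing optimizations above rely on how `CLUSTERED BY` assigns a row to a bucket: bucket = hash(key) mod number-of-buckets. The sketch below illustrates only this assignment idea; it uses a simple Java-style string hash, whereas Hive's actual hash functions vary by column type.

```python
def hive_bucket(key, num_buckets):
    """Illustrate Hive-style bucket assignment: hash(key) mod num_buckets.

    The property that matters for bucketed map-side joins is determinism:
    two tables bucketed the same way on the join key place matching rows
    in matching bucket numbers, so each bucket pair can be joined locally.
    """
    h = 0
    for ch in str(key):
        # Java-style 31x rolling string hash, masked to stay non-negative.
        h = (h * 31 + ord(ch)) & 0x7FFFFFFF
    return h % num_buckets
```

Because the assignment is deterministic, the same policy ID always lands in the same bucket, which is what lets Hive skip the shuffle for sort-merge-bucket joins.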
Capital IQ, India Mar 2014 – Aug 2015
Java Developer
Responsibilities:
•Designed and developed a system framework using J2EE technologies based on MVC architecture.
•Followed agile methodology to implement the requirements and tailored the application to customer needs.
•Involved in all phases of the SDLC (Software Development Life Cycle), including requirement collection, design and analysis of customer specifications, and development and customization of the application.
•Developed and enhanced web applications using JSTL, JSP, JavaScript, AJAX, HTML, CSS, and Collections.
•Developed the UI components using jQuery and JavaScript Functionalities.
•Developed J2EE components on Eclipse IDE.
•Created the EAR and WAR files and deployed the application in different environments.
•Used JNDI as part of service locator to locate the Factory objects, Data Source Objects and other service factories.
•Hands on experience using Teradata utilities (Fast Export, Multiload, Fast Load, Tpump, BTEQ and Query Man).
•Implemented test scripts to support test driven development and continuous integration.
•Modifications on the database were done using Triggers, Views, Stored procedures, SQL and PL/SQL.
•Implemented the mechanism of logging and debugging with Log4j.
•Used JIRA as a bug-reporting tool for updating the bug report.
Environment: Java, J2EE, Servlets, JSP, Struts, Spring, Hibernate, JDBC, JNDI, JMS, JIRA, JavaScript, XML, DB2, SVN, log4j.
Birla Soft, Hyderabad, India June 2013 – Mar 2014
Java Developer
Responsibilities:
•Involved in requirements gathering, design, development, and integration testing of modules.
•Worked on use case, class, and sequence diagrams using Rational Rose during the design phase.
•Used Agile methodology for every module in the project while developing the application.
•Developed the application implementing Spring MVC Architecture with Hibernate as ORM framework.
•Developed the application using Front Controller, Business delegate, DAO and Session Facade patterns.
•Designed and developed the user interface using JSP, HTML, CSS, MXML, JSF, JSTL, AJAX, and XML; also involved in designing and developing several Flex UI screens.
•Involved in designing and developing the user interface using Flex components (ViewStack, Checkboxes, Repeater, Title).
•Involved in developing database transactions through JDBC.
•Used DOM and SAX parsers to transfer XML data between different components.
•Extensively worked on developing custom tags from Struts tags to highlight invalid input fields when validation errors occur.
•Developed WSDL-based web services using WSDL, SOAP, JAX-WS, Axis, Apache XFire, and JAXB.
•Used RESTful web services, developing XML and JSON payloads using JAX-RS.
•Used CVS for version control.
•Developed and deployed the applications using servers like Apache Tomcat, JBoss.
•Created test cases using JUnit and FlexUnit.
•Wrote Maven build scripts for building applications.
Environment: Java, J2EE, MVC, Servlets, Spring, JSP, XML, HTML, MXML, Maven, Adobe Flex Builder, Flex API, BlazeDS, Flex, Tag libs, REST, CSS, JavaScript, jQuery, AJAX, JSON, CAS, Eclipse, Apache Tomcat 7, JBoss, Web Services (WSDL, SOAP, RESTful), JUnit, FlexUnit, ClearCase