
Big Data Engineer

Location:
California Hot Springs, CA
Posted:
June 09, 2022


Resume:

Sayeed UR Rahman Mohammed

Sr Big Data Engineer

469-***-****

************@*****.***

Professional Summary:

•8+ years of IT experience covering analysis, design, development, implementation, and maintenance of Big Data projects using the Apache Hadoop/Spark ecosystems, as well as design and development of web applications using Java technologies.

•Experience in analysis, design, development, and integration using Big Data Hadoop ecosystem components on Cloudera, working with file formats such as Avro and Parquet.

•Worked with various compression codecs such as Snappy, LZO, and Gzip.

•Experience developing custom partitioners and combiners for effective data distribution.

•Expertise in tuning Impala queries to handle high query concurrency and out-of-memory errors across various analytics use cases.

•Rigorously applied transformations in Spark and R programs.

•Worked with an AWS cloud-based CDH 5.13 cluster and developed merchant campaigns using Hadoop.

•Developed and maintained ETL (data extraction, transformation, and loading) mappings using Informatica Designer 8.6 to extract data from multiple source systems, including databases such as Oracle 10g and SQL Server 7.2 and flat files, into the staging area, the EDW, and then the data marts.

•Expertise in using built-in Hive SerDes and developing custom SerDes.

•Developed multiple internal and external Hive tables using dynamic partitioning and bucketing (a minimal Spark sketch follows this summary).

•Designed and developed a full-text search feature with multi-tenant Elasticsearch on real-time data collected through Spark Streaming.

•Hands-on experience with AWS cloud services (VPC, EC2, S3, RDS, Redshift, Data Pipeline, EMR, DynamoDB, WorkSpaces, Lambda, Kinesis, SNS, SQS).

•Experience in developing NoSQL applications using MongoDB, HBase, and Cassandra.

•Tuned multiple Spark applications for better performance.

•Developed data pipelines for real-time use cases using Kafka, Flume, and Spark Streaming.

•Experience importing and exporting multi-terabyte datasets with Sqoop between HDFS/Hive and relational database systems (RDBMS).

•Developed multiple Hive views for accessing data in HBase tables.

•Wrote complex Spark SQL programs to join datasets and displayed the results on a Kibana dashboard.

•Expertise in using various formats such as Text and Parquet when creating Hive tables.

•Experience in analyzing large scale data to identify new analytics, insights, trends, and relationships with a strong focus on data clustering.

•Expertise in collecting data from various source systems such as social media and databases.

•End-to-end hands-on experience with the ETL process and automation setup to load terabytes of data into HDFS.

•Good experience developing applications using core Java, Collections, Threads, JDBC, Servlets, JSP, Struts, Hibernate, and XML components with IDEs such as Eclipse 6.0 and MyEclipse.

•Good experience with the OpenShift platform for managing Docker containers and Kubernetes clusters.

•Experience in SQL programming: writing queries with joins, stored procedures, triggers, and functions, and applying query optimization techniques on Oracle, SQL Server, and MySQL.

•Excellent team worker with good interpersonal skills and leadership qualities.

•Excellent organizational and communication skills.

•Excellent understanding of Agile and Scrum methodologies.
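As a hedged illustration of the dynamic partitioning and bucketing noted above, the sketch below writes a partitioned, bucketed table with the Spark DataFrame API (Spark's bucketing rather than native Hive DDL). All table, column, and path names (txn_raw, txn_by_day, /data/warehouse/...) are hypothetical placeholders, not details from the projects described here.

```python
# Minimal sketch, assuming hypothetical table/column/path names:
# a partitioned, bucketed table written via the Spark DataFrame writer.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("partitioning-bucketing-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Source data already registered in the metastore (hypothetical table).
txns = spark.table("txn_raw")

# Partition by txn_date, bucket by customer_id, store as Parquet at an
# explicit location so the table data lives outside the default warehouse.
(txns.write
 .partitionBy("txn_date")
 .bucketBy(16, "customer_id")
 .sortBy("customer_id")
 .format("parquet")
 .option("path", "/data/warehouse/txn_by_day")
 .mode("overwrite")
 .saveAsTable("txn_by_day"))

# Downstream queries can then prune partitions, for example:
spark.sql("SELECT count(*) FROM txn_by_day WHERE txn_date = '2022-01-01'").show()
```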

Education:

●Bachelor of Science in Information Technology, 2013

●Master of Science in Information Technology, Campbellsville University, 2017

Technical Skills:

Programming Languages

C, C++, Java, R, Python, UNIX

Distributed Computing

Apache Hadoop, HDFS, MapReduce, Pig, Hive, Oozie, Hue, Kerberos, Sentry, Zookeeper, Kafka, Flume, Impala, HBase and Sqoop

AWS Components

EC2, S3, RDS, Redshift, EMR, DynamoDB, Lambda, SNS, SQS

Web Development

HTML, JSP, XML, JavaScript, and AJAX

Web Application Server

Tomcat 6.0, JBoss 4.2, and WebLogic 8.1

Operating Systems

Windows, Unix, iOS, Ubuntu and RedHat Linux

Tools

Eclipse, NetBeans, Visual Studio, Agitator, Bugzilla, Arc Styler (MDA), Rational Rose, Enterprise Architect and Rational Software Architect

Source Control Tools

VSS, Rational Clear Case, Subversion

Application Framework

Struts 1.3, Spring 2.5, Hibernate 3.3, JasperReports, JUnit, and JAXB

RDBMS

Oracle and SQL Server 2016

NOSQL

MongoDB, Cassandra and HBase

Professional Experience:

JPMC, Chicago, IL June 2020 to Present

Sr. Big Data Engineer

Roles & Responsibilities:

•Responsible for building scalable distributed data solutions using Hadoop.

•Responsible for managing and scheduling Jobs on a Hadoop cluster.

•Responsible for loading customer data from SAS into MS SQL Server 2016, performing data massaging, mining, and cleansing, and then exporting it to HDFS and Hive using Sqoop.

•Wrote Pig scripts to process credit card and debit card transactions for active customers across various merchants by joining data from HDFS and Hive using HCatalog.

•Wrote Python UDFs that apply regular expressions and return valid merchant codes and names via Hadoop Streaming.

•Responsible for creating data pipelines using Kafka and Spark Streaming.

•Loading data from UNIX file system to HDFS and vice versa.

•Improved the performance of existing algorithms in Hadoop using Spark Context, Spark SQL, and Spark on YARN.

•Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.

•Developed a POC using Spark SQL and MLlib along with Kafka and other tools as required, then deployed it on the YARN cluster.

•Extracted a real-time feed using Kafka and Spark Streaming, converted it to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS (see the sketch after this responsibilities list).

•Implemented Data Ingestion in real time processing using Kafka.

•Developed Spark code using Spark SQL/Streaming for faster data processing.

•Configured Spark Streaming to receive real-time data and store the streamed data in HDFS.

•Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive as per the requirements.

•Documented the requirements, including the existing code to be reimplemented using Spark, Hive, and HDFS.

•Used Kafka streams to feed Spark Streaming and then stored the data in HDFS.

•Developed multiple Kafka Producers and Consumers as per the software requirement specifications.

•Streamed data in real time using Spark with Kafka.

•Responsible for creating Hive tables and working with them using HiveQL.

•Implemented various Hive UDFs as per business requirements.

•Exported the analyzed data to the databases using Sqoop for visualization and to generate reports for the BI team.

•Involved in Data Visualization using Tableau for Reporting from Hive Tables.

•Developed Python Mapper and Reducer scripts and implemented them using Hadoop Streaming.

•Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.

•Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.

•Responsible for writing Hive queries for data analysis to meet the business requirements.

•Customized Apache Solr to handle fallback searching and provide custom functions.

•Responsible for setup and benchmarking of Hadoop/HBase clusters.
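The sketch below illustrates the Kafka-to-Parquet flow referenced above, written with the Structured Streaming DataFrame API rather than the original DStream/RDD code. The broker address, topic name, schema fields, and HDFS paths are hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath.

```python
# Simplified sketch, assuming a hypothetical card-transaction topic and schema.
# Requires the spark-sql-kafka-0-10 package supplied at submit time.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-to-parquet-sketch").getOrCreate()

# Hypothetical payload schema for the transaction feed.
schema = StructType([
    StructField("txn_id", StringType()),
    StructField("merchant_code", StringType()),
    StructField("amount", DoubleType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
       .option("subscribe", "card-transactions")           # placeholder topic
       .load())

# Parse the Kafka value bytes into typed columns.
parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("txn"))
          .select("txn.*"))

# Persist the parsed stream to HDFS as Parquet with checkpointing.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/transactions/parquet")
         .option("checkpointLocation", "hdfs:///checkpoints/transactions")
         .outputMode("append")
         .start())

query.awaitTermination()
```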

Environment: Hadoop, HDFS, Cassandra, Sqoop, Hive, MapReduce, Spark Streaming/SQL, Scala, Kafka, Solr, sbt, Java, Python, Ubuntu/CentOS, MySQL, Linux, GitHub, Maven, Jenkins

Juniper Networks (IBM), Sunnyvale, CA May 2019 – May 2020

Big Data Engineer

Responsibilities:

•Developed Spark Applications by using Java and Implemented Apache Spark data processing project to handle data from various RDBMS and Streaming sources.

•Worked with Spark to improve performance and optimize existing algorithms in Hadoop using Spark Context, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.

•Used Spark Streaming APIs to perform on-the-fly transformations and actions for building a common learner data model that receives data from Kafka in near real time and persists it to Cassandra.

•Consumed XML messages from Kafka and processed them using Spark Streaming to capture UI updates.

•Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files.

•Loaded DStream data into Spark RDDs and performed in-memory computation to generate the output response.

•Experienced in writing live real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline system.

•Worked on Apple applications; used Radar for tasks and Box and Quip for requirement documents.

•Worked extensively with AWS cloud services such as EC2, S3, EBS, RDS, and VPC.

•Migrated an existing on-premises application to AWS; used AWS services such as EC2 and S3 for processing and storage of small datasets, and maintained the Hadoop cluster on AWS EMR.

•Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.

•Implemented Elasticsearch on the Hive data warehouse platform.

•Worked with Elastic MapReduce (EMR) and set up the Hadoop environment on AWS EC2 instances.

•Good understanding of Cassandra architecture, replication strategies, gossip, snitches, etc.

•Designed column families in Cassandra, ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra as per business requirements.

•Used the Spark DataStax Cassandra Connector to load data to and from Cassandra (sketched after this list).

•Experienced in creating data models for clients' transactional logs and analyzed the data from Cassandra tables for quick searching, sorting, and grouping using the Cassandra Query Language (CQL).

•Tested cluster performance using the cassandra-stress tool to measure and improve read/write throughput.

•Used HiveQL to analyze partitioned and bucketed data and executed Hive queries on Parquet tables stored in Hive to perform data analysis meeting the business specification logic.

•Used Kafka features such as distribution, partitioning, and the replicated commit log service for messaging systems by maintaining feeds.

•Used Apache Kafka to aggregate web log data from multiple servers and make it available to downstream systems for data analysis and engineering.

•Experience in using Avro, Parquet, RCFile and JSON file formats, developed UDFs in Hive and Pig.

•Worked with Log4j framework for logging debug, info & error data.

•Performed transformations such as event joins, bot traffic filtering, and pre-aggregations using Pig.

•Developed Custom Pig UDFs in Java and used UDFs from Piggybank for sorting and preparing the data.

•Managed a hosted Kubernetes environment, making it quick and easy to deploy and manage containerized applications.

•Developed a CI/CD system with Jenkins on Kubernetes, using Kubernetes as the runtime environment for the CI/CD system to build, test, and deploy.

•Developed custom loaders and storage classes in Pig to work with data formats such as JSON, XML, and CSV, and generated bags for processing in Pig.

•Used Amazon DynamoDB to gather and track the event-based metrics.

•Developed Sqoop and Kafka jobs to load data from RDBMS and external systems into HDFS and Hive.

•Developed Oozie coordinators to schedule Pig and Hive scripts to create Data pipelines.

•Wrote several MapReduce jobs using the Java API and used Jenkins for continuous integration.

•Set up Kerberos authentication principals to establish secure network communication on the cluster and tested HDFS, Hive, Pig, and MapReduce access for new users.

•Continuously monitored and managed the Hadoop cluster through Cloudera Manager.

•Modified Ant scripts to build the JAR, class, WAR, and EAR files.

•Generated various kinds of reports using Power BI and Tableau based on Client specification.

•Used Jira for bug tracking and Bit Bucket to check-in and checkout code changes.

•Worked with Network, Database, Application, QA and BI teams to ensure data quality and availability.

•Responsible for generating actionable insights from complex data to drive real business results for various application teams and worked in Agile Methodology projects extensively.
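Below is a minimal sketch of loading data to and from Cassandra with the Spark DataStax Cassandra Connector referenced above. The keyspace, table, and column names are hypothetical, and the connector package is assumed to be supplied at submit time (for example via --packages).

```python
# Minimal sketch, assuming a hypothetical client_logs keyspace and tables.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("cassandra-connector-sketch")
         .config("spark.cassandra.connection.host", "cassandra-host")  # placeholder host
         .getOrCreate())

# Read a Cassandra table into a DataFrame.
logs = (spark.read
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace="client_logs", table="transactions")
        .load())

# Example transformation before writing back (hypothetical grouping column).
daily_counts = logs.groupBy("event_date").count()

# Write the aggregated result to another Cassandra table (table must already exist).
(daily_counts.write
 .format("org.apache.spark.sql.cassandra")
 .options(keyspace="client_logs", table="daily_counts")
 .mode("append")
 .save())
```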

Environment: Eclipse, JDK 1.8.0, Cloudera Manager 5.14, HDFS, MapReduce, Hive 2.0, HBase, Apache Maven 3.0.3, DynamoDB, MongoDB, Splunk 6.0, SAP, JIRA, Kubernetes, Microservices, EC2, ELB, RDS, S3, CloudWatch, SNS, SQS, EBS.

T-Mobile, Atlanta, GA Jan 2018 – Jan 2019

Data Engineer

Responsibilities:

●Responsible for building an Enterprise Data Lake to bring ML ecosystem capabilities to production and make it readily consumable for data scientists and business users.

●Processing and transforming the data using AWS EMR to assist the Data Science team as per business requirement.

●Developing Spark applications for cleaning and validation of the ingested data into the AWS cloud.

●Working on fine-tuning Spark applications to improve the overall processing time for the pipelines.

●Implement simple to complex transformation on Streaming Data and Datasets.

●Work on analyzing the Hadoop cluster and different big data analytics tools including Hive, Spark, Python, Sqoop, Flume, and Oozie.

●Use Spark Streaming to stream data from external sources via the Kafka service; responsible for migrating the code base from the Cloudera platform to Amazon EMR and evaluating Amazon ecosystem components such as Redshift and DynamoDB.

●Perform configuration, deployment, and support of cloud services in Amazon Web Services (AWS).

●Designing and building multi-terabyte, full end-to-end Data Warehouse infrastructure from the ground up on Confidential Redshift.

●Design, develop, and test ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.

●Migrate an existing on-premises application to AWS.

●Build and configure a virtual data centre in the Amazon Web Services cloud to support Enterprise Data Warehouse hosting including Virtual Private Cloud, Security Groups, Elastic Load Balancer.

●Implement data ingestion and handling clusters in real time processing using Kafka.

●Develop Spark programs using the Scala and Java APIs and perform transformations and actions on RDDs.

●Develop Spark applications that filter JSON source data in AWS S3 and store it into HDFS with partitions, using Spark to extract the schema of the JSON files (see the sketch following this list).

●Develop Terraform scripts to create AWS resources such as EC2, Auto Scaling Groups, ELB, S3, SNS, and CloudWatch alarms.

●Developed various kinds of mappings with collection of sources, targets and transformations using Informatica Designer.

●Develop Spark programs with PySpark and applied principles of functional programming to process the complex unstructured and structured data sets. Processed the data with Spark from Hadoop Distributed File System (HDFS).

●Implement serverless architecture using AWS Lambda with Amazon S3 and Amazon DynamoDB.
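The following is a hedged sketch of the S3 JSON filtering and partitioned HDFS write described above. The bucket, paths, column names, and filter predicate are illustrative assumptions, and S3 (s3a) credentials are assumed to be configured on the cluster.

```python
# Sketch, assuming hypothetical bucket/paths/columns: read JSON from S3,
# let Spark infer (extract) the schema, filter, and write partitioned Parquet to HDFS.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("s3-json-filter-sketch").getOrCreate()

# Spark infers the schema of the JSON files automatically.
events = spark.read.json("s3a://example-campaign-bucket/raw/events/")
events.printSchema()

# Filter out invalid / test records (hypothetical predicate).
clean = events.filter(col("event_type").isNotNull() & (col("is_test") == False))

# Store into HDFS partitioned by event date.
(clean.write
 .mode("overwrite")
 .partitionBy("event_date")
 .parquet("hdfs:///data/campaigns/events"))
```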

Environment: Apache Spark, Scala, Java, PySpark, Hive, HDFS, Hortonworks, Apache HBase, AWS EMR, EC2, AWS S3, AWS Redshift, Redshift Spectrum, RDS, Lambda, Informatica Center, Maven, Oozie, Apache NiFi, CI/CD Jenkins, Tableau, IntelliJ, JIRA, Python and UNIX Shell Scripting

State Farm Insurance, Bloomington, IL Oct 2015 – Dec 2017

Data Engineer

Responsibilities:

●Worked on the development of data ingestion pipelines using the Talend ETL tool and Bash scripting with big data technologies including, but not limited to, Hive, Impala, Spark, and Kafka.

●Experience in developing scalable & secure data pipelines for large datasets.

●Gathered requirements for ingestion of new data sources including life cycle, data quality check, transformations, and metadata enrichment.

●Importing data from MS SQL server and Teradata into HDFS using Sqoop.

●Supported data quality management by implementing data quality checks in the data pipelines (a minimal example follows this list).

●Enhancing Data Ingestion Framework by creating more robust and secure data pipelines.

●Implemented data streaming capability using Kafka and Talend for multiple data sources.

●Responsible for maintaining and handling inbound and outbound data requests through the big data platform.

●Working knowledge of cluster security components like Kerberos, Sentry, SSL/TLS etc.

●Worked with multiple storage formats (Avro, Parquet) and databases (Hive, Impala, Kudu).

●Involved in the development of agile, iterative, and proven data modeling patterns that provide flexibility.

●Created Oozie workflows to automate and productionize the data pipelines.

●Troubleshot users' analysis bugs (JIRA and IRIS tickets).

●Involved in developing Spark applications to perform ELT-style operations on the data.

●Worked with SCRUM team in delivering agreed user stories on time for every Sprint.

●Worked on analyzing and resolving the production job failures in several scenarios.

●Implemented UNIX scripts to define the use case workflow and to process the data files and automate the jobs.

●Knowledge of implementing JILs to automate jobs in the production cluster.

●Involved in creating Hive external tables to perform ETL on data that is produced on a daily basis.

●Utilized Hive partitioning and bucketing and performed various kinds of joins on Hive tables.
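Below is a minimal sketch of the kind of data quality check referenced in the pipeline bullets above. The dataset path, key column, and thresholds are hypothetical examples, not values from the original project.

```python
# Minimal sketch, assuming a hypothetical ingested dataset and policy_id key:
# fail the pipeline step when null or duplicate key counts breach thresholds.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count

spark = SparkSession.builder.appName("dq-check-sketch").getOrCreate()

df = spark.read.parquet("hdfs:///data/ingest/policies/current")  # placeholder path

total = df.count()
null_keys = df.filter(col("policy_id").isNull()).count()
dupe_keys = (df.groupBy("policy_id").agg(count("*").alias("n"))
             .filter(col("n") > 1).count())

# Abort the scheduled (e.g. Oozie/shell) step so the downstream load
# does not run on bad data.
if total == 0 or null_keys / total > 0.01 or dupe_keys > 0:
    raise SystemExit(
        f"DQ check failed: total={total}, null_keys={null_keys}, dupes={dupe_keys}")

print(f"DQ check passed: {total} rows, {null_keys} null keys, {dupe_keys} duplicate keys")
```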

Environment: Spark, HDFS, Hive, Pig, Sqoop, Scala, Kafka, Shell scripting, Linux, Jenkins, Eclipse, Git, Oozie, Talend, Agile Methodology, Teradata.

Capital IQ, India Mar 2014 – Aug 2015

Java Developer

Responsibilities:

•Designed and developed a system framework using J2EE technologies based on MVC architecture.

•Followed agile methodology to implement the requirements and tailored the application to customer needs.

•Involved in SDLC (Software Development Life Cycle) phases including requirements collection, design and analysis of customer specifications, and development and customization of the application.

•Developed and enhanced web applications using JSTL, JSP, JavaScript, AJAX, HTML, CSS, and Collections.

•Developed the UI components using jQuery and JavaScript Functionalities.

•Developed J2EE components on Eclipse IDE.

•Created the EAR and WAR files and deployed the application in different environments.

•Used JNDI as part of service locator to locate the Factory objects, Data Source Objects and other service factories.

•Hands-on experience using Teradata utilities (FastExport, MultiLoad, FastLoad, TPump, BTEQ, and Queryman).

•Implemented test scripts to support test driven development and continuous integration.

•Modifications on the database were done using Triggers, Views, Stored procedures, SQL and PL/SQL.

•Implemented the mechanism of logging and debugging with Log4j.

•Used JIRA as a bug-reporting tool for updating the bug report.

Environment: Java, J2EE, Servlets, JSP, Struts, Spring, Hibernate, JDBC, JNDI, JMS, JIRA, JavaScript, XML, DB2, SVN, log4j.

Birla Soft, Hyderabad, India June 2013 – Mar 2014

Java Developer

Responsibilities:

•Involved in requirements gathering, design, development, and integration testing of modules.

•Worked on use case diagrams, class diagrams, and sequence diagrams using Rational Rose during the design phase.

•Used Agile methodology for every module of the project when developing the application.

•Developed the application implementing Spring MVC Architecture with Hibernate as ORM framework.

•Developed the application using Front Controller, Business delegate, DAO and Session Facade patterns.

•Designed and developed the user interface using JSP, HTML, CSS, MXML, JSF, JSTL, AJAX, and XML; also involved in designing and developing several Flex UI screens.

•Involved in designing and developing the user interface using Flex components such as ViewStack, Checkbox, Repeater, and Title.

•Involved in developing database transactions through JDBC.

•Used XML with DOM and SAX parsers to transfer data between different components.

•Extensively worked on developing custom tags based on Struts tags to highlight invalid input fields when validation errors occur.

•Developed WSDL-based web services using WSDL, SOAP, JAX-WS, Axis, Apache XFire, and JAXB.

•Used RESTful web services to produce XML and JSON using JAX-RS.

•Used CVS for version control.

•Developed and deployed the applications using servers like Apache Tomcat, JBoss.

•Created test cases using JUnit and FlexUnit.

•Wrote Maven build scripts for building applications.

Environment: Java, J2EE, MVC, Servlets, Spring, JSP, XML, HTML, MXML, Maven, Adobe Flex Builder, Flex API, BlazeDS, Flex, Tag Libs, REST, CSS, JavaScript, jQuery, AJAX, JSON, CAS, Eclipse, Apache Tomcat 7, JBoss, Web Services (WSDL, SOAP, RESTful), JUnit, FlexUnit, Clear Case.


