Big Data Computer Science

Location:
Phoenix, AZ
Posted:
May 17, 2024

Resume:

Amna Shahid

Big Data Developer

Cell #: 480-***-****

Email: ad5rf0@r.postjobfree.com

Education

Bachelors in Computer Science

Masters in Computer Science

Executive Summary

I am a Hadoop and Spark developer with 4+ years of extensive Big Data experience in the banking and retail sectors.

I have built my career across a variety of technologies, working not only as a developer but also as an admin and a strong team player.

Strong experience working with Big Data technologies, including Apache Hadoop, Apache Hive, Apache Sqoop, Apache Kafka, Apache Spark, Spark MLlib, MapReduce, Apache HBase, Apache Pig, MongoDB, and Apache Ambari on Hortonworks/Cloudera.

I have very good experience using Hadoop and Spark ecosystem components to handle the data ingestion, data storage, data processing, and data visualization layers effectively.

I am a quick learner with a thirst for learning new things and pursuing knowledge, and I adapt quickly to changes and new environments.

Technical Skills

Big Data Technologies

Data Ingestion: HDFS Commands, Sqoop, Kafka, Flume

Data Storage: HDFS, MongoDB, HBase

Data Processing: YARN, MapReduce, Pig, Hive, Spark Core, Spark SQL, Spark Streaming, MLlib

Visualization: Tableau, QlikView

Programming: Python, Core Java, C, C++, Maven

RDBMS: MySQL, Oracle, MS SQL Server

Cloud Services: Amazon Web Services (AWS) S3, EMR

IDEs / Version Control: Eclipse, IntelliJ, Git

SDLC Methodologies: Agile/Scrum Methodology, Waterfall Model

Professional Experience

Techmate Technologies, Phoenix, AZ Jan 2021 – Present

Spark and Hadoop Developer

Project scope - Ensure all requirements are translated into technical terms; analyze, develop, maintain, and enhance the risk assessment.

Skill Set: Hadoop - HDFS, Spark (RDDs, DataFrames, and Datasets), Apache Kafka, Apache Sqoop, Apache Hive, Ambari - Hortonworks, MySQL, HBase, Python, Unix Shell Scripting, PySpark, pip, IntelliJ.

Role and Responsibilities:

Develop Hadoop batch applications using Python, Unix shell scripting, and Hadoop ecosystem technologies, including Hive and PySpark, on Cloudera/Hortonworks distributions.

Worked on improving the in-memory computing performance of Spark applications by optimizing the Spark core RDD transformations.

Involved in loading large sets of structured, semi-structured, and unstructured data from various sources and transforming them into the required formats.

Develop a utility that reads complex data using Spark DataFrames, flattens complex JSON, and stores the results in the designated tables.

Develop data ingestion pipelines using Apache Sqoop and Apache Kafka to ingest database tables and streaming data into HDFS for analysis.

Worked on ingesting data from MySQL to HDFS using Apache Sqoop.

Worked extensively on optimizing Sqoop data ingestion by moving to delta and incremental data extraction from source systems.

Write complex Hive and HBase queries to test new account rules and data integrity.

Implemented partitioning, dynamic partitions, and buckets in Hive.

Involved in writing custom Hive (HQL) queries for order and sales analysis based on ad hoc requirements.

Involved in writing Hive scripts against the same data in various source formats to compare performance and query execution time for Hive optimization.

Develop Unix shell utilities that automatically clean up cluster space and delete unused HBase tables to free space for our queue.

Provide technical leadership and expertise as well as manage the development & analysis of business requirements for multiple corporate project initiatives.

Review code with team members.

Recommend alternative approaches to satisfying business requirements based on detailed analyses.

Use technologies like Nexus, Jenkins, and XLR for Continuous Integration and Continuous Delivery (CI/CD).

Involved in unit testing, integration testing, and defect fixing.

PTI - Lahore, Pakistan Jan 2020 – Dec 2020

Big Data Developer

Project scope - Common application for logging real-time ingestion using streaming technologies.

Skill Set: Spark Streaming, Kafka, Map-Reduce, PIG, Amazon Web Services - EC2, Lambda services, S3 storage, MongoDB, Oozie workflows, Scala.

Role and Responsibilities:

Use Apache Kafka queues to build a common application that supports real-time log ingestion.

Develop a Spark Streaming application that receives data streams from Kafka, processes the continuous streams, and triggers actions based on fixed events.

Worked extensively on integrating Kafka (data ingestion) with Spark Streaming to achieve a high-performance real-time processing system.

Involved in the design and development of various custom data processing modules using Spark Core, Spark SQL, Hive, and MapReduce jobs.

Involved in writing Pig and Hive user-defined functions (UDFs) based on custom and frequent analysis requirements.

Worked in an Agile development approach, created estimates, and defined the sprint stages.

Provide support for production issues under strict timelines based on the criticality of the issue.

Prepare the testing strategy, test plan, and test cases, and perform unit and integration testing.

Work closely and efficiently with the business owners to understand their needs from an overall application standpoint.

Participate in architecture design sessions to discuss and suggest new technologies and methodologies to improve current applications and processes.

Perform change management activities such as monitoring and submitting defects in tools like Rally, RCM, or Jira, and generating reports.

Mediland Pakistan (PVT.) LTD, Lahore, Pakistan Mar 2018 – Dec 2019

Project Manager

Project scope - Support various teams and projects, using project management, leadership, and technical skills to ensure successful delivery and achieve business objectives.

Skill Set: Agile, Waterfall, MS Project, JIRA, Risk Management and Mitigation, Budget Monitoring, Financial forecasting

Role and Responsibilities:

Led multiple IT projects, including software development, infrastructure upgrades, and system implementations.

Defined project scope, goals, and deliverables, and developed project plans and schedules using Agile and Waterfall methodologies.

Coordinated project activities and communication among cross-functional teams, including developers, testers, business analysts, and stakeholders, to ensure project progress and completion.

Monitored project risks, issues, and dependencies, and developed mitigation strategies to address them.

Managed project budget, forecasted resource needs, and tracked project expenses against budget to ensure financial accountability.

Conducted regular project status meetings, reported project progress and issues to senior management, and prepared project-related documentation, including project charters, status reports, and change requests.
