
Abdul Hameed

Sr. Data Engineer

****************@*****.***

+1-475-***-****

SUMMARY:

Big Data and Cloud Data engineering professional with 10+ years of experience in building distributed data solutions, data analytical applications, predictive modeling, and ETL and streaming pipelines leveraging big data, Hadoop ecosystem components, Databricks platform, AWS, and Azure Cloud Services.

Experience in building data solutions using SQL Server, MSBI, and Azure Cloud.

Experience in Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), and Databricks.

Experienced in the AWS platform and its features, including IAM, EC2, EBS, VPC, RDS, CloudWatch, CloudTrail, CloudFormation, AWS Config, Auto Scaling, CloudFront, S3, SQS, SNS, Lambda, and Route 53.

Expertise working with EC2, ECS, Elastic Beanstalk, Lambda, Glue, RDS, DynamoDB, CloudFront, CloudFormation, S3, Athena, SNS, SQS, X-Ray, Elastic Load Balancing (ELB), and auto-scaling groups.

Hands-on experience in in-memory data processing with Apache Spark using Scala and Python.

Experience with Big Data Hadoop and Hadoop ecosystem components such as MapReduce, Sqoop, Flume, Kafka, Pig, Hive, Spark, Storm, and HBase.

Experienced in building, deploying, and managing SSIS packages with SQL Server Management Studio, creating and configuring SQL Server Agent jobs, configuring data sources, and scheduling packages through SQL Server Agent jobs.

Experienced with SSIS performance tuning across control flow, data flow, error handling, and event handlers, and with re-running failed SSIS packages.

Adept at utilizing BI tools such as Power BI and QlikView to enhance reporting capabilities and develop BI applications per client requirements.

Demonstrated capability to liaise with key stakeholders to deliver compelling business value to senior leadership and clients.

Well versed in data warehouse concepts, normalization/de-normalization techniques for optimum performance in relational and dimensional database environments, and building referential integrity constraints.

Knowledge of data modeling and data analysis in MS SQL Server and good knowledge of query optimization.

Extensive experience in developing complex Stored Procedures, Functions, Triggers, Views, Cursors, Indexes, CTEs, Joins, and Subqueries with T-SQL.

Experienced in managing Azure Data Lake Storage (ADLS) and Data Lake Analytics, with an understanding of how to integrate them with other Azure services.

Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.

Experience in developing ETL data pipelines using PySpark.

Hands-on experience working with configuration management tools such as Chef, Puppet, and Ansible.

Experience in installing and setting up Hadoop environments in the cloud through Amazon Web Services (AWS), such as EMR and EC2, for efficient data processing.

Experience working with the open-source Apache Hadoop distribution and technologies such as HDFS, MapReduce, Python, Pig, Hive, Hue, HBase, Sqoop, Oozie, Zookeeper, Spark, Spark Streaming, Storm, Kafka, Cassandra, Impala, Snappy, Greenplum, MongoDB, and Mesos.

TECHNICAL SKILLS:

Hadoop/Spark Ecosystem

Hadoop, MapReduce, Pig, Hive/impala, YARN, Kafka, Flume, Oozie, Zookeeper, Spark, Airflow

Cloud Platforms

AWS: Amazon EC2, S3, RDS, IAM, Auto Scaling, CloudWatch, SNS, Athena, Glue, Kinesis, Lambda, EMR, Redshift, DynamoDB

Azure: Azure Cloud Services (PaaS & IaaS), Azure Synapse Analytics, Azure SQL, Azure Data Factory (ADF), Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, Azure Data Lake, Azure HDInsight; GCP, OpenStack.

ETL/BI Tools

Informatica, SSIS, Tableau, PowerBI, SSRS

CI/CD

Jenkins, Splunk, Ant, Maven, Gradle.

Ticketing Tools

JIRA, Service Now, Remedy

Database

Oracle, SQL Server, Cassandra, Teradata, PostgreSQL, Snowflake, HBase, MongoDB

Programming Languages

Scala, Java (Hibernate), PL/SQL, R

Scripting

Python, Shell Scripting, JavaScript, jQuery, HTML, JSON, XML.

Web/Application server

Apache Tomcat, WebLogic, WebSphere

Tools

Eclipse, NetBeans

Version Control

Git, Subversion, Bitbucket, TFS.

Scripting Languages

Python, Scala, R, PL/SQL, Shell Scripting, GitBash

DevOps Tools

Jenkins, Docker, Kubernetes

IDE & Build Tools

IntelliJ, Eclipse, PyCharm, Maven, Gradle

Platforms

Windows, Linux (Ubuntu), Mac OS, CentOS (Cloudera)

PROFESSIONAL EXPERIENCE:

Centene Corp, St. Louis, Missouri May 2023 – Present

Sr. Data Engineer

Responsibilities:

Worked closely with stakeholders to understand business requirements and design quality technical solutions that align with business and IT strategies and comply with the organization's architectural standards.

Developed multiple applications for transforming data across multiple layers of the Enterprise Analytics Platform and implemented Big Data solutions to support distributed processing using Big Data technologies.

Responsible for data identification and extraction using third-party ETL and data-transformation tools or scripts. (e.g., SQL, Python)

Worked on migration of data from On-prem SQL server to Cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB).

Developed and managed Azure Data Factory pipelines that extracted data from various data sources, transformed it according to business rules using Python scripts built on PySpark, and consumed APIs to move data into an Azure SQL database.

Created a new data quality check framework project in Python that utilized pandas.
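
Illustrative sketch (not taken from the project): a minimal pandas-based quality-check routine of the kind described above; the key column, sample data, and rules are hypothetical.

import pandas as pd

def run_quality_checks(df: pd.DataFrame) -> dict:
    """Run a few generic data quality checks and return a summary.

    The specific rules (uniqueness on 'member_id', no negative numerics)
    are hypothetical examples, not the original project's rules.
    """
    results = {}
    # Completeness: fraction of nulls per column.
    results["null_ratio"] = df.isna().mean().to_dict()
    # Uniqueness: duplicate rows on an assumed key column.
    if "member_id" in df.columns:
        results["duplicate_keys"] = int(df["member_id"].duplicated().sum())
    # Validity: numeric columns should not contain negative values.
    numeric_cols = df.select_dtypes("number").columns
    results["negative_counts"] = {c: int((df[c] < 0).sum()) for c in numeric_cols}
    return results

if __name__ == "__main__":
    sample = pd.DataFrame({"member_id": [1, 2, 2], "claim_amount": [100.0, -5.0, None]})
    print(run_quality_checks(sample))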

Implemented source control and development environments for Azure Data Factory pipelines utilizing Azure Repos.

Created Hive/Spark external tables for each source table in the Data Lake and wrote Hive SQL and Spark SQL to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
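
Illustrative PySpark sketch of this pattern; the ADLS path, log layout, regular expressions, and table names are assumptions, not details from the original project.

from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_extract

spark = SparkSession.builder.appName("log-parsing").enableHiveSupport().getOrCreate()

# Register an external table over the raw log files (path and layout assumed).
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS raw_logs (line STRING)
    LOCATION 'abfss://datalake@account.dfs.core.windows.net/raw/app_logs/'
""")

# Parse the free-text log lines into a tabular structure with Spark SQL functions.
parsed = (
    spark.table("raw_logs")
    .select(
        regexp_extract("line", r"^(\S+)", 1).alias("timestamp"),
        regexp_extract("line", r"\[(\w+)\]", 1).alias("level"),
        regexp_extract("line", r"\]\s+(.*)$", 1).alias("message"),
    )
)
# Target database is assumed to exist.
parsed.write.mode("overwrite").saveAsTable("analytics.parsed_logs")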

Designed and developed ETL frameworks using Azure Data Factory and Azure Databricks.

Created generic Databricks notebooks for performing data cleansing.

Created Azure Data Factory pipelines to refactor on-prem SSIS packages into Data Factory pipelines.

Worked with Azure Blob and Data Lake storage to load data into Azure Synapse (SQL DW).

Ingested and transformed source data using Azure Data flows and Azure HDInsight.

Created Azure Functions to ingest data at regular intervals.

Created Databricks notebooks for performing complex transformations and integrated them as activities in ADF pipelines.

Wrote complex SQL queries for data analysis and extraction of data in the required format.

Created Power BI datamarts and reports for various stakeholders in the business.

Created CI/CD pipelines using Azure DevOps.

Enhanced the functionality of existing ADF pipeline by adding new logic to transform the data.

Used Chef recipes to set up a continuous delivery pipeline with Jenkins, SonarQube, and Vagrant, along with the infrastructure to run these packages and supporting software components such as Maven.

Worked on Spark jobs for data preprocessing, validation, normalization, and transmission.

Optimized code and configurations for performance tuning of Spark jobs.

Integrated Ant and Maven with TFS source control to automatically trigger Builds and published results back to TFS.

Worked with unstructured and semi-structured data sets to aggregate and build analytics on the data.

Worked independently with business stakeholders, with a strong emphasis on influencing and collaboration.

Participated daily in an Agile-based Scrum team with tight deadlines.

Environment: Azure Synapse Analytics, Azure Data Factory, Azure Databricks, Azure Synapse Studio, Hadoop, Chef, SQL Server, Power BI, Oracle 12c/11g, SQL scripting, PL/SQL, Python, Maven, Unix Shell, Jira, Confluence.

American Airlines, Dallas, TX Jan 2021 – Apr 2023

Sr. Data Engineer

This project mainly focused on a single data warehouse to consolidate all the data sources that the functional groups use to compute and standardize KPIs. I used Azure Data Factory (V2) and PySpark on Databricks as ETL tools to increase the speed of information access. I was involved in data warehouse implementations using Azure SQL Data Warehouse, SQL Database, Azure Data Lake Storage (ADLS), and Azure Data Factory V2.

Responsibilities:

Involved in creating specifications for ETL processes, finalized requirements, and prepared specification documents.

Migrated data from the on-premises SQL database to Azure Synapse Analytics using Azure Data Factory and designed an optimized database architecture.

Created Azure Data Factory pipelines for copying data from Azure Blob Storage to SQL Server.

Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight/Databricks.

Worked with similar Microsoft on-prem data platforms, specifically SQL Server, SSIS, SSRS, and SSAS.

Created reusable ADF pipelines to call REST APIs and consume Kafka events.

Used Control-M for scheduling DataStage jobs and Logic Apps for scheduling ADF pipelines.

Developed and configured build and release (CI/CD) processes using Azure DevOps, and managed application code in Azure Git with the required security standards for .NET and Java applications.

Migrated the ETL logic running in SSIS and MS Access to Azure Data Factory pipelines without any change in business logic.

Developed high-performance data ingestion pipelines from multiple sources using Azure Data Factory and Azure Databricks.

Extensively worked on creating pipelines in Azure Cloud ADF V2 using activities such as Move & Transform, Copy, Filter, ForEach, and Databricks.

Developed dynamic Data Factory pipelines using parameters and triggered them as desired using events such as file availability on Blob Storage, on a schedule, and via Logic Apps.

Wrote SQL queries to help the ETL team with a system migration, including DDL and DML code to map and migrate data from the source to the new destination server in Azure SQL DB.

Utilized PolyBase and T-SQL queries to efficiently import large volumes of data from Azure Data Lake Store to Azure SQL Data Warehouse.

Created an Azure Runbook to scale Azure Analysis Services and Azure SQL Data Warehouse up and down.

Upgraded Azure SQL Data Warehouse Gen1 to Azure SQL Data Warehouse Gen2.

Designed and developed Azure SQL Data Warehouse fact and dimension tables, using different distributions (Hash, Replicated, and Round-Robin) when creating them.

Developed Power BI and SSRS reports and created SSAS database cubes to facilitate self-service BI.

Created Azure Data Factory Pipeline to load data from on-premises SQL Server to Azure Data Lake store.

Utilized Azure's ETL service, Azure Data Factory (ADF), to ingest data from legacy, disparate data stores into Azure Data Lake Storage.

Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which consumes data from Kafka in near real time and persists it into Cassandra and Redshift as part of large-scale data lake pipelines.
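
Illustrative PySpark Structured Streaming sketch of this kind of Kafka ingestion; the broker, topic, event schema, and output path are assumptions, and the sketch lands the parsed stream as Parquet (rather than Cassandra/Redshift) to stay self-contained. It also assumes the spark-sql-kafka connector is on the classpath.

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType, TimestampType

spark = SparkSession.builder.appName("learner-stream").getOrCreate()

# Assumed topic name and event schema -- illustrative only.
schema = (
    StructType()
    .add("learner_id", StringType())
    .add("event_type", StringType())
    .add("event_ts", TimestampType())
)

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "learner-events")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# The original pipeline persisted to Cassandra and Redshift; this sketch
# simply writes the parsed stream as Parquet with a checkpoint.
query = (
    events.writeStream.format("parquet")
    .option("path", "/tmp/learner_events")
    .option("checkpointLocation", "/tmp/learner_events_chk")
    .outputMode("append")
    .start()
)
query.awaitTermination()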

Environment: Azure Data Factory, Selenium, ER Studio, Teradata 13.1, Oracle, Python, Tableau, Hadoop, Spark, Scala, Hive, SQL Server, SSIS, SSRS, SSAS, Kafka, Redshift.

Bank of America, Charlotte, NC Sept 2018 – Dec 2020

Sr. Data Engineer/Data Analyst

Bank of America is committed to continuous innovation to meet digital consumer needs. The company delivered new business and IT initiatives on its mobile platform, providing a digital app experience for small business banking and lending. The project covered the digitization of the Business Advantage Term Loan and Advantage Credit Line on the digital platform and the design of data pipelines. I was responsible for creating data pipelines and extracting data from various databases to support Data Science research across different business verticals of the Data Intelligence team.

Responsibilities:

Actively participated in requirement analysis and in designing the mapping toolkit to build dimension and fact tables for data modelling.

Created Entity Relationship Diagrams (ERD), Functional diagrams, Data flow diagrams and enforced referential integrity constraints and created logical and physical models using Erwin.

Analyzed the system for new enhancements/functionalities and performed impact analysis of the application for implementing ETL changes.

Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.

Worked on AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and DynamoDB.

Migrated the on-premises database structure to the Confidential Redshift data warehouse.

Extracted data from Teradata and ingested it into Azure Blob Storage using Azure Data Factory (ADF).

Defined and deployed monitoring, metrics, and logging systems on AWS.

Connected to Amazon Redshift through Tableau to extract live data for real time analysis.

Implemented Lambda functions to extract data from an API and load it into DynamoDB.
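
Illustrative sketch of such a Lambda handler; the API endpoint, table name, and the assumption that each record carries a unique partition key are hypothetical placeholders, not details from the original project.

import decimal
import json
import urllib.request
import boto3

# Hypothetical placeholders for the upstream API and the target table.
TABLE_NAME = "api_events"
API_URL = "https://api.example.com/events"

dynamodb = boto3.resource("dynamodb")

def lambda_handler(event, context):
    """Pull records from an upstream API and write them to DynamoDB."""
    with urllib.request.urlopen(API_URL) as resp:
        # Parse floats as Decimal so the DynamoDB resource API accepts them.
        records = json.loads(resp.read(), parse_float=decimal.Decimal)

    table = dynamodb.Table(TABLE_NAME)
    with table.batch_writer() as batch:
        for rec in records:
            # Assumes each record already contains the table's partition key.
            batch.put_item(Item=rec)

    return {"statusCode": 200, "body": f"loaded {len(records)} records"}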

Configured Step Functions to orchestrate multiple EMR tasks for data processing.

Used Spark SQL to load JSON data, create schema RDDs, and load them into Hive tables, and handled structured data using Spark SQL.
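
Illustrative PySpark sketch of loading JSON with Spark SQL and persisting it to a Hive table; the S3 path, database, and column names are assumptions.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-to-hive").enableHiveSupport().getOrCreate()

# Landing path and schema are illustrative assumptions.
raw = spark.read.json("s3a://data-bucket/landing/transactions/")
raw.createOrReplaceTempView("transactions_stage")

# Handle the structured data with Spark SQL before persisting to Hive.
curated = spark.sql("""
    SELECT txn_id, account_id, CAST(amount AS DECIMAL(12,2)) AS amount, txn_date
    FROM transactions_stage
    WHERE txn_id IS NOT NULL
""")
curated.write.mode("overwrite").saveAsTable("finance.transactions")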

Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.

Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala, and used Sqoop for importing and exporting data between RDBMS and HDFS.

Optimized and tuned the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics.

Designed solutions for high-volume data stream ingestion, processing, and low-latency data provisioning using Hadoop ecosystem tools: Hive, Pig, Sqoop, Kafka, Python, Spark, Scala, NoSQL, NiFi, and Druid.

Designed and implemented big data ingestion pipelines to ingest multi-TB data from various data sources using Kafka and Spark Streaming, including data quality checks and transformations, and stored the results in efficient storage formats; performed data wrangling for a variety of downstream purposes, such as analytics, using PySpark.

Implemented Workload Management (WLM) in Redshift to prioritize basic dashboard queries over more complex, longer-running ad hoc queries. This allowed for a more reliable and faster reporting interface, giving sub-second query response for basic queries.

Wrote Chef cookbooks for various DB configurations to modularize and optimize product configuration, converted production support scripts to Chef recipes, and provisioned AWS servers using Chef recipes.

Implemented a Continuous Delivery pipeline with Docker, GitHub, and AWS.

Built performant, scalable ETL processes to load, cleanse and validate data.

Created ad hoc queries and reports to support business decisions using SQL Server Reporting Services (SSRS).

Analyzed existing application programs and tuned SQL queries using the execution plan, Query Analyzer, SQL Profiler, and Database Engine Tuning Advisor to enhance performance.

Worked on publishing interactive data visualization dashboards, reports, and workbooks on Tableau and SAS Visual Analytics.

Scheduled Airflow DAGs to run multiple Hive and Pig jobs, which run independently based on time and data availability, and performed exploratory data analysis and data visualizations using Python and Tableau.
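
Illustrative Airflow sketch of such scheduling, assuming the Hive and Pig jobs are launched through their CLIs with BashOperator; the DAG id, schedule, and script paths are placeholders.

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

# Schedule and script locations are placeholders, not the project's actual values.
with DAG(
    dag_id="daily_hive_pig_jobs",
    start_date=datetime(2020, 1, 1),
    schedule_interval="0 2 * * *",
    catchup=False,
) as dag:
    hive_aggregation = BashOperator(
        task_id="hive_aggregation",
        bash_command="hive -f /opt/jobs/daily_aggregation.hql",
    )
    pig_cleanup = BashOperator(
        task_id="pig_cleanup",
        bash_command="pig -f /opt/jobs/cleanup.pig",
    )
    # Run the Hive aggregation first, then the Pig job.
    hive_aggregation >> pig_cleanup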

Environment: Hadoop/Big Data ecosystem (Spark, Kafka, Hive, HDFS, Sqoop, Oozie, Cassandra, MongoDB), AWS (S3, AWS Glue, Redshift, RDS, Lambda, Athena, SNS, SQS), Chef, Oracle, Docker, Git, SQL Server, Python 3.x, PySpark, Teradata, Tableau, QuickSight, data warehousing.

AIG, NY Oct 2016 – Aug 2018

Data Engineer

Responsibilities:

Worked on Building and implementing a real-time streaming ETL pipeline using Kafka Streams API.

Worked on Hive to implement Web Interfacing and stored the data in Hive tables.

Migrated MapReduce programs into Spark transformations using Spark and Scala.

Worked with the open-source Apache Hadoop distribution, where Hadoop admins must manually set up all the configuration files (core-site, hdfs-site, yarn-site, and mapred-site); with popular distributions such as Hortonworks, Cloudera, or MapR, the configuration files are set up at startup and the Hadoop admin need not configure them manually.

Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster processing.

Implemented data quality checks using Spark Streaming and flagged records as passing or bad.

Implemented Hive Partitioning and Bucketing on the collected data in HDFS.
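
Illustrative sketch of producing a partitioned and bucketed Hive-style table, here via Spark's DataFrame writer rather than raw HiveQL DDL; the source table, column names, and bucket count are assumptions.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition-bucket").enableHiveSupport().getOrCreate()

# The staged events table is assumed to exist; column names are illustrative.
events = spark.table("analytics.events_staging")

# Partition by event date and bucket by user_id into 32 buckets (counts assumed).
(
    events.write
    .partitionBy("event_date")
    .bucketBy(32, "user_id")
    .sortBy("user_id")
    .mode("overwrite")
    .saveAsTable("analytics.events_bucketed")
)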

Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters.

Extensively used ZooKeeper as a backup server and job scheduler for Spark jobs.

Deployed and monitored scalable infrastructure on Amazon Web Services (AWS), managed servers on the AWS platform using Ansible configuration management, created instances in AWS, and migrated data to AWS from the data center.

Built CI/CD pipelines using Jenkins for end-to-end automation of all builds and deployments.

Created job chains with Jenkins Job Builder, parameterized triggers, and target host deployments; utilized many Jenkins plugins and the Jenkins API.

Backed up AWS PostgreSQL to S3 via a daily job run on EMR using DataFrames.

Developed Spark scripts using Scala shell commands as per the business requirement.

Worked on Cloudera distribution and deployed on AWS EC2 Instances.

Well-versed in using data manipulation and compaction in Cassandra.

Automated weekly releases with Gradle scripting for compiling Java code, debugging, and publishing builds to the Maven repository.

Worked on connecting Cassandra database to the Amazon EMR File System for storing the database in S3.

Implemented usage of Amazon EMR for processing Big Data across a Hadoop Cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).

Deployed the project on Amazon EMR with S3 connectivity for setting backup storage.

Environment: Hadoop, MapReduce, Hive, Spark, Oracle, GitHub, Tableau, UNIX, Cloudera, Kafka, Sqoop, Scala, NiFi, HBase, AWS, Amazon EC2, S3, Maven, Cassandra cluster

Vedic Soft Solutions, PVT LTD, India Dec 2013 – Aug 2016

Hadoop Developer

Responsibilities:

Converted raw data to serialized and columnar formats such as Avro and Parquet to reduce data processing time and increase data transfer efficiency across the network.
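
Illustrative PySpark sketch of such a conversion; the input and output paths are placeholders, and writing Avro assumes the spark-avro package is available on the classpath.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("raw-to-parquet").getOrCreate()

# Input and output paths are placeholder HDFS locations.
raw = spark.read.option("header", "true").csv("hdfs:///data/raw/orders/")

# Columnar Parquet cuts processing time; Avro suits row-oriented exchange.
raw.write.mode("overwrite").parquet("hdfs:///data/curated/orders_parquet/")
raw.write.mode("overwrite").format("avro").save("hdfs:///data/curated/orders_avro/")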

Worked on building end-to-end data pipelines on Hadoop Data Platforms.

Worked on normalization and de-normalization techniques for optimum performance in relational and dimensional database environments.

Created files and tuned SQL queries in Hive using Hue; implemented MapReduce jobs in Hive by querying the available data.

Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.

Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming.

Developed an API to write XML documents from a database. Utilized XML and XSL Transformation for dynamic web-content and database connectivity.

Created user-defined functions (UDFs) and user-defined aggregate functions (UDAFs) in Pig and Hive.

Worked on building custom ETL workflows using Spark/Hive to perform data cleaning and mapping.

Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions.

Provided support for the cluster and topics via Kafka Manager, and handled CloudFormation scripting, security, and resource automation.

Environment: Python, HDFS, MapReduce, Flume, Kafka, Zookeeper, Pig, Hive, HQL, HBase, Spark, ETL, Web Services, Linux Red Hat, AXIS 1.2, UML, SOA, JAX-WS, Sun Metro stack, RESTful, SOAP UI, Log4j.

EDUCATION:

Bachelor of Technology, JNTUK, 2013.


