
Sathish Mani Email: ad15de@r.postjobfree.com

Senior Data Engineer Phone: 732-***-****

Summary

** ***** ** ********** **** full development lifecycle from inception through implementation leveraging legacy applications and various emerging enterprise frameworks.

Extensive experience in architecting and engineering Data Warehouse, Data Lake, ODS, and OLTP data platforms and processing layers on cloud and on-premises platforms.

8+ years of designing and implementing Hadoop technologies including Spark, HDFS, MapReduce, Hive, Sqoop, and Python.

5+ years leveraging big data consumption tools such as Impala, Hive, or similar query engines.

5+ years of experience with Big Data technologies (Hadoop, Hive, HBase, Pig, Spark, etc.) and adoption of these frameworks with AWS services.

Guide clients and management on which tools and technologies to apply in a given scenario for best utilization and cost optimization.

Experience building and operating highly available, distributed systems for extraction, ingestion, and processing of large data sets.

Understand the requirements, assess client capabilities and analyze findings to provide appropriate cloud recommendations and adoption strategy.

Expertise in Data Engineering on Big Data technologies on both on-premises and cloud platforms, specifically AWS.

Experience working across the AWS ecosystem as a whole, including migrating ETL pipelines from Talend/Ab Initio to Hadoop/Glue/Spark.

Experience developing enterprise-grade ETL/ELT data pipelines with a deep understanding of data manipulation and wrangling techniques.

Used Spark for RDD and DataFrame transformations, event joins, traffic filtering, and pre-aggregations before storing the data in HDFS and Hive.
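
A minimal PySpark sketch of this join/filter/pre-aggregate pattern; the table names, columns, and flags are illustrative assumptions, not the actual project schema:

```python
# Illustrative PySpark sketch: join event data with reference traffic data,
# filter unwanted records, and pre-aggregate before persisting to Hive/HDFS.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("event-preaggregation")
         .enableHiveSupport()
         .getOrCreate())

events = spark.table("raw.events")      # hypothetical event table in Hive
traffic = spark.table("raw.traffic")    # hypothetical traffic reference table

joined = (events.join(traffic, on="session_id", how="inner")
          .filter(~F.col("is_bot"))                              # drop unwanted traffic
          .groupBy("event_date", "event_type")
          .agg(F.count("*").alias("event_count"),
               F.sum("amount").alias("total_amount")))           # pre-aggregate before storage

# Persist the pre-aggregated result to a Hive table backed by HDFS
joined.write.mode("overwrite").saveAsTable("curated.event_summary")
```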

Created Hive tables, loaded and analyzed data with Hive queries, and wrote Hive queries to validate business scenarios and support data visualization.

Establish strong relationships with global business stakeholders and ensure transparency of project deliveries.

Ensure and improve stability and scalability within the platforms; mentor and build knowledge within the group.

Professional Skills

Programming : Spark, Python, Spark SQL, Core Java

Database : Apache Hive, HBase, SQL Server 2005, Redshift, DynamoDB

Cloud Services : EC2, EMR, RDS, S3, Cloud Watch, IAM, Kinesis

Frameworks : Hadoop, HDFS, EMR

Data Loading tools: SQOOP, FLUME, SPARK

IDEs : Anaconda Editor, RStudio

Analytics : Arcadia, Trifacta, Waterline, QlikSense, Paxata, Athena

Education: Bachelor of Technology in Information Technology

Experience

Citibank, Florida July 2019 – Present

Senior Data Engineer

Project : Enterprise Data Analytics

Description:

The Data Strategy team builds a next-generation Data Fabric to solve the evolving business, analytical, and regulatory needs of the Marketing Clients Group. The Fabric brings together disparate data sources for the Global Markets organization and enables industry-leading analytics, client reporting, regulatory, surveillance, and supervisory reporting, and data science solutions. It also provides enhanced data quality controls, completeness and accuracy checks, reconciliations, entitlements, performance, and management of data retention and archival per regulatory guidelines. The data program involves centralizing Risk data across Markets to have one golden source of data for all Risk reporting and analytics.

Responsibilities:

• Design and development experience on distributed platforms using Hadoop, PySpark and/or Python, EMR, EC2, RDS, S3, CloudWatch, IAM, Lambda, SQS, SNS, and Redshift.

• Design and develop data pipeline architectures from on-premises to AWS Cloud using DMS, S3, Glue, Spark, Snowflake and related AWS Services.

• Design and develop architectures for data migration, data ingestion, data storage, and data lakes, creating the various data lake layers and ETL using Hadoop tools like Spark and AWS tools like Glue and EMR.

• Well versed in performance tuning of ETL pipelines, including Spark jobs and cloud databases.

• Experience publishing Python REST APIs to enable data consumption by analytics tools.
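
A minimal sketch of how such an API could expose curated data to analytics tools. Flask and SQLite are used purely as stand-ins for the actual framework and warehouse driver; the endpoint path, schema, and query are hypothetical:

```python
# Illustrative read-only REST endpoint serving aggregated data to analytics tools.
from flask import Flask, jsonify, request
import sqlite3  # stand-in for the real warehouse driver (e.g. a Redshift/Snowflake connector)

app = Flask(__name__)

@app.route("/api/v1/positions", methods=["GET"])
def positions():
    """Return aggregated positions, optionally filtered by business date."""
    business_date = request.args.get("business_date", "2023-12-01")
    conn = sqlite3.connect("analytics.db")
    rows = conn.execute(
        "SELECT desk, SUM(notional) FROM positions WHERE business_date = ? GROUP BY desk",
        (business_date,),
    ).fetchall()
    conn.close()
    return jsonify([{"desk": d, "notional": n} for d, n in rows])

if __name__ == "__main__":
    app.run(port=8080)
```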

• Developed Spark/Python code to perform analytics on financial data, resulting in significant savings for the company and its portfolio.

• Define S3 data retention, object tagging, and data archival strategy in collaboration with application business requirements.
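
A hedged boto3 sketch of one way such a retention/tagging/archival policy can be expressed; the bucket, prefixes, tag, and day counts are illustrative assumptions:

```python
# Illustrative S3 lifecycle (retention/archival) rule and object tagging with boto3.
import boto3

s3 = boto3.client("s3")

# Archive raw objects to Glacier after 90 days and expire them after ~7 years
s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-lake",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "raw-retention",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 2555},
            }
        ]
    },
)

# Tag an object so downstream retention/entitlement rules can key off it
s3.put_object_tagging(
    Bucket="example-data-lake",
    Key="raw/trades/2023-12-01/part-0000.parquet",
    Tagging={"TagSet": [{"Key": "data-classification", "Value": "confidential"}]},
)
```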

• Architect & develop the overall Redshift infrastructure, including cluster sizing, partition strategies, data distribution and schema design.

• Developed Python code to ingest dynamic S3 data in various formats into Snowflake and Redshift.
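
A hedged sketch of loading newly arrived S3 data into Snowflake (via an external stage) and Redshift (via COPY with an IAM role); the connection details, stage/table names, and role ARN are placeholders, not project values:

```python
# Illustrative loads of an S3 prefix into Snowflake and Redshift.
import snowflake.connector
import psycopg2

s3_path = "s3://example-data-lake/raw/customers/2023-12-01/"

# Snowflake: COPY from an external stage that points at the S3 location
sf = snowflake.connector.connect(account="xy12345", user="etl_user", password="...",
                                 warehouse="LOAD_WH", database="ANALYTICS", schema="RAW")
sf.cursor().execute(
    "COPY INTO raw.customers FROM @raw_stage/customers/2023-12-01/ "
    "FILE_FORMAT = (TYPE = PARQUET) MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE"
)
sf.close()

# Redshift: COPY directly from S3 using an attached IAM role
rs = psycopg2.connect(host="example-cluster.redshift.amazonaws.com",
                      dbname="analytics", user="etl_user", password="...", port=5439)
with rs, rs.cursor() as cur:
    cur.execute(
        f"COPY raw.customers FROM '{s3_path}' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role' FORMAT AS PARQUET"
    )
rs.close()
```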

• Responsible for building scalable distributed data solutions using Hadoop & AWS Services

• Involved in an end-to-end data modeling project using Hive databases, analytical tools, and wrangling tools.

• Expertise in handling structured, semi-structured (JSON), and unstructured data, managing data schema drift, and understanding big data file formats such as Avro, Parquet, and ORC.

• Developed PySpark code for a variety of business-rule transformation logic using DataFrames and ingested the results into Hive.
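
An illustrative PySpark sketch of this kind of rule-based DataFrame transformation before ingesting into Hive; the rules, thresholds, and table names are assumed for the example:

```python
# Illustrative business-rule transformations applied with DataFrames, then written to Hive.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

trades = spark.table("raw.trades")  # hypothetical source table

curated = (trades
    # Rule 1: classify notional size into risk buckets
    .withColumn("risk_bucket",
                F.when(F.col("notional") >= 10_000_000, "HIGH")
                 .when(F.col("notional") >= 1_000_000, "MEDIUM")
                 .otherwise("LOW"))
    # Rule 2: default a missing trading desk to UNKNOWN
    .withColumn("desk", F.coalesce(F.col("desk"), F.lit("UNKNOWN")))
    # Rule 3: keep only business-valid records
    .filter(F.col("trade_date").isNotNull()))

curated.write.mode("append").saveAsTable("curated.trades_enriched")
```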

• Analyzed data with Hive/Athena queries and executed data models to gain insights into existing and new customers.

• Analyze ETL processing capabilities and choose AWS EMR vs. Glue, or run both platforms in parallel, based on project infrastructure.

• Architect & develop Event Driven Architectures to process S3 objects evens using Lambda.

• Experience with AWS workflow orchestration and AWS monitoring services like CloudWatch

• Experience implementing metadata solutions leveraging AWS services such as the Glue Data Catalog as the persistent metastore.

• Experience proposing AWS managed vs. serverless services.

• Develop applications in AWS data and analytics technologies including, but not limited to, Glue, EMR, Lambda, Airflow, CloudTrail, CloudWatch, SNS, SQS, S3, VPC, EC2, RDS, and IAM.

Environment: EC2, EMR, S3, Redshift, Snowflake, CloudWatch, Cloud Monitor, IAM, Lambda, Python, R, REST API, PySpark, Spark SQL, MySQL, Hive, Linux, HDFS, Sqoop, Oracle, NoSQL, Apache Airflow, Analytical Tools, Athena, SQS, SNS, Glue

SunTrust Bank, Georgia Dec 2016 – June 2019

Spark Technology Lead

Project : Marketing Data Analytics

Description:

Marketing Data Analytics uses big data to bring various types of PII, CIN, and account-holder interaction data from client servers to the lake server for business needs. Large data-loading operations run on a daily and monthly basis from different marketing data marts, DB2, and flat files. All these loads are performed per client requirements and ingested in natively supported Hadoop formats. High-end analytical tools were developed on top of the lake server for users to prepare data models.

Responsibilities:

• Good understanding and usage of the Hadoop architecture and its components, such as HDFS, Hive, Spark SQL, Sqoop, Scala functions, RDD transformations, and MapReduce concepts.

• Imported data from MySQL and Oracle into HDFS/Hive using Sqoop.

• Experience writing queries to move data from HDFS to Hive and analyzing data using HiveQL.

• Created static and dynamic partitions per data model requirements and performed Hive optimization.
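
An illustrative sketch of static vs. dynamic Hive partitioning issued from PySpark; the database, table, and partition column are hypothetical:

```python
# Illustrative static and dynamic Hive partitioning via Spark SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS marketing.interactions (
        customer_id STRING,
        channel     STRING,
        amount      DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    STORED AS PARQUET
""")

# Static partition: the partition value is fixed in the statement
spark.sql("""
    INSERT OVERWRITE TABLE marketing.interactions PARTITION (load_date = '2018-06-01')
    SELECT customer_id, channel, amount
    FROM staging.interactions_daily
""")

# Dynamic partitions: partition values are derived from the data itself
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE marketing.interactions PARTITION (load_date)
    SELECT customer_id, channel, amount, load_date
    FROM staging.interactions_history
""")
```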

• Developed Python-based REST APIs and performed analytics on real-time data.

• Developed PySpark code for a variety of data transformations using DataFrames and ingested the results into Hive.

• Responsible for building scalable distributed data solutions using Hadoop.

• Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.

• Involved in an end-to-end data modeling project using Hive databases, analytical tools, and wrangling tools.

• Analyzed data with Hive queries and executed data models to gain insights into existing customers and potential new buyers.

• Experience with Agile-based methodologies and project environments.

• Involved in end-to-end development of new Data Lake projects and handled multiple Big Data implementations in Agile-based delivery projects.

Environment: Spark/Scala, Python, Spark SQL, Hive, HBase, Linux, Kafka, MapReduce, HDFS, Sqoop, DB2, Analytical & Wrangling Tools.

Express Scripts, USA, New Jersey Mar 2014 – Dec 2016

Hadoop Developer

Project : Point of Sale System (POS)

Description:

The Point of Sale System uses big data to guide successful customer interactions every day. Targeted messaging helps deliver the right messages to the right customers at the right time, building stronger customer relationships. The system took into account around 5 million customer records and hundreds of millions of transaction details, including demographics, transactions, web login counts, call counts, call center notes, email/chat transcripts, household data, life events, etc. We received 50 GB – 500 GB of data per day, expected to increase up to 1 TB per day, so we started using big data to store and analyze this volume. Analysis jobs that previously had long response times completed in a few minutes. Based on the analysis, we anticipated customer needs and delivered on them; for example, for customers not utilizing in-plan features, we generated reports on those features and communicated them to the customers. We also sent targeted messages about other products of interest and derived new strategies to engage, satisfy, and retain existing customers. Products offered include pharmacy solutions, insurance solutions, and claims benefit management.

Responsibilities:

• Experience handling structured and unstructured/semi-structured data (flat files, JSON, XML, binary files).

• Good understanding and usage of the Hadoop architecture and its components, such as HDFS, Pig, Hive, SQL, Sqoop, Python scripting, UDF development, and MapReduce concepts.

• Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.

• Analyzed large data sets to determine the optimal way to aggregate and report on them.

• Provided quick response to ad hoc internal and external client requests for data and experienced in creating ad hoc reports.

• Responsible for building scalable distributed data solutions using Hadoop.

• Handled importing of data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS.

• Analyzed data by performing Hive queries and running Pig scripts to gain insights into existing customers and potential new buyers.

• Developed Hive queries to process the data and generate reports.

Environment: Java 6, Eclipse, Hadoop, Hive, HBase, Linux, Pig, MapReduce, HDFS, Shell Scripting, MySQL

Xerox USA, Phoenix, AZ Feb 2011 – Dec 2013

JAVA/J2EE Analyst

Project : Corporate Apps

Description: The project’s mission is to consistently deliver outstanding operating results and elevate the value of corporate functions to the business through increased capability and technology innovation. Increased value by creating web applications such as voting/survey websites and healthcare member support websites.

Responsibilities:

• Responsible for business analyst activities for critical functionality on business-interfacing projects.

• Responsible for creating Business rules from the code as part of reverse engineering activities.

• Provided QA support for the application using frameworks/tools such as JMS and Spring 3.0.

• As an application developer, actively involved in designing various business-layer and data-management components of this web-based system.

• Involved in the design of key components of the system using PL/SQL procedures on Oracle DB.

• Built packages and procedures implementing business rules, and Unix shell scripts for EOD processes on the database side of the application.

• Provided quick turnaround for ad hoc service requests and reports, and supported the Liaison Team and Client Services on all customer requests as an SME of the application.

• Experience developing for Unix/Linux based systems

• Developed tools and value-adds to assist performance testing.

Environment: Java 5, Spring, JavaScript, Oracle 10g, HTML, Autosys, Windows XP, JSP, JMS

Project : WGS Pricing System, Thousand Oaks, CA May 2009 – Jan 2011

Mainframe Developer

Client : WellPoint, CA

Description:

Pricing IT is responsible for maintaining the WGS/STAR Code Databases. The Code Databases house CPT/HCPCS Procedure, Modifier, Diagnosis, Revenue, ICD9 Procedure, and ZIP Code data. While all the data within the Code Databases is owned by Provider Reimbursement, Pricing IT is responsible for the database structures and the programming that supports their functionality.

Responsibilities:

• Gathered requirements from various users and business stakeholders. Coordinated preparation and review of the Test Strategy document. Coordinated and performed development activities: coding and unit testing.

• Imparted domain and technical knowledge to new entrants and fellow team members.

• Responsible for analyzing various enhancements, performing impact analysis to identify the systems/programs potentially affected by proposed changes, and carrying out coding, testing, and implementation activities.

• Coordinated effectively with the offshore team to complete development efforts per design and reviewed their deliverables to meet quality standards and client expectations.

• Led a team of 6 as the onsite client lead.

• For rollout work, responsibilities included gathering requirements from clients, analyzing business requirements, developing high-level and detailed system designs, driving development with the offshore team, unit testing, system testing, implementing the system in the production environment, and providing warranty support.

• Prepared various technical and functional documents (where applicable, depending on request type): Requirement Specification, High-Level Analysis and Approach, High-Level Design, Detailed Design, System Test Plan and Test Procedure, and implementation plans.

• Reviewed deliverables for completeness and correctness to ensure business objectives were met. Performed rigorous testing. Set up the test environment for User Acceptance Testing.

Environment: COBOL, JCL, IMS DB, DB2, Mainframe OS

Project : Consolidated Loan Application System Jan 2007 – Apr 2009

Mainframe Developer

Client : SallieMae Bank Inc

Description:

CLASS is an acronym for the “Consolidated Loan Administration and Servicing System”. CLASS is a mainframe system developed by Sallie Mae for the servicing of student loan accounts. The CLASS system was created in response to the need for an automated computer system for processing, servicing, and maintaining student loans at Sallie Mae Servicing Corporation. There are several characteristics that make the CLASS system beneficial to the loan servicing environment. Responsibilities:

• Involved in Design Review sessions such as Business System Design (BSD), Technical System Design (TSD) Reviews with Business Stakeholders, IT Leads, Business Analyst and Test Leads/Managers.

• Created Business Spec documents, Technical Spec documents and Development Test plan

• Conducted internal reviews with impacted teams and approved the test documents.

• Coordinate with Testing Team for any defects and issues found in testing.

• Prepared the development test plan approach.

• Conducted meetings with client business leads for weekly status, test approach, defect analysis, and risk analysis, and prepared minutes-of-meeting (MOM) documents.

• Ensure that the project documentation is maintained as per Project Life Cycle and all documents are version controlled and maintained for client review and audits.

• Created JCL for executing the batch programs, using a tool to generate the JCL. Good experience with DB2; proficient in SQL queries to retrieve data from the DB2 database.

• Created multiple online screens using CICS.

• Conducted unit testing for the programs that are developed. Made sure that all the items in test plan get executed as expected.

• Coordinated with the offshore team on requirements clarification and scheduling.

• Received multiple client appreciations for excellent offshore coordination and timely completion of project activities.

Environment: IBM DB2 V9.1, COBOL, JCL, VSAM, Mainframe OS

Project : Communication Utility System Jun 2004 – Dec 2006

Mainframe Developer

Client : American Express

Description:

American Express uses the Communication Utility (CU) as the point-of-arrival system of record for client communications. The Communication Utility generates the emails and letters sent to customers, for instance an update to a card member or a welcome mail to a new customer. Xpression and Autograph are tools that store document templates, create customer-specific documents based on data, format them as desired, and publish them by sending them to a printer, mail, archival system, fax, or website.

Responsibilities:

• Gathered user requirements by interacting with the various interface teams.

• Created detailed-level designs and specifications for new development and obtained sign-off from all stakeholders.

• Code analysis, modification and Coding

• System, Regression and Unit Testing with the Test Plan preparation

• Coordinated with the onsite development team to complete development efforts per the design, solved problems and issues they faced, and handled development of critical functionality.

• Installing the enhancements in production environment

• Performed rigorous testing. Set up the test environment for User Acceptance Testing.

Environment: IBM DB2 V9.1, COBOL, JCL, VSAM, Mainframe OS


