
Informatica Developer Data Engineer

Location:
Houston, TX
Posted:
March 29, 2023


Mounish

adv7lb@r.postjobfree.com

Mobile: +1-512-***-****

Professional Summary:

7+ years of experience in IT, specializing in Big Data, Data Warehousing, Snowflake, and the AWS Cloud.

Good knowledge of migrating data from on-premises databases to cloud databases using the AWS DMS and SCT tools.

Good knowledge of AWS services such as S3, RDS, Redshift, Athena, Glue, and EMR.

Good experience with Hive, Sqoop, and Hue on the AWS cloud platform using EMR clusters.

Good knowledge of AWS Glue, a fully managed Extract, Transform, and Load (ETL) service.

Good knowledge of Glue Crawlers, which populate the AWS Glue Data Catalog with tables, and Glue DataBrew, which enables users to clean and normalize data without writing code.

Good knowledge of AWS Athena, an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.
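For illustration, a minimal sketch of running an Athena query from Python with boto3; the region, database, table, and result location are placeholders, not names from any specific project:

import boto3

# Placeholder region, database, and result bucket for illustration only.
athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="SELECT order_id, amount FROM sales_db.orders LIMIT 10",
    QueryExecutionContext={"Database": "sales_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)

# The query runs asynchronously; poll its state and read results from the S3 output location.
query_id = response["QueryExecutionId"]
state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
print(query_id, state)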

Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.

3+ years’ experience with Snowflake Multi-Cluster Warehouses.

Hands-on experience with Snowflake utilities, SnowSQL, Snowpipe, and big data modeling techniques using Python.

Developed ETL pipelines using a combination of Python and Snowflake, and used SnowSQL to write SQL queries against Snowflake for data warehousing.
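A minimal sketch of this pattern using the Snowflake Python connector; the account, credentials, warehouse, and table names are placeholders, not production values:

import snowflake.connector

# Placeholder connection parameters; real credentials would come from a secrets manager.
conn = snowflake.connector.connect(
    account="xy12345.us-east-1",
    user="ETL_USER",
    password="********",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)

cur = conn.cursor()
try:
    # Aggregate staged orders into a reporting table (hypothetical table names).
    cur.execute("""
        CREATE OR REPLACE TABLE DAILY_SALES AS
        SELECT ORDER_DATE, SUM(AMOUNT) AS TOTAL_AMOUNT
        FROM STG_ORDERS
        GROUP BY ORDER_DATE
    """)
finally:
    cur.close()
    conn.close()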

3+ years’ experience with Hadoop projects.

Very strong knowledge of delivering end-to-end solutions using big data technologies such as Hive, Pig, Sqoop, HDFS, Data-X, TT-Synchronization, and more.

Good knowledge of AWS utilities such as EMR, S3, and CloudWatch for running and monitoring Hadoop and Spark jobs on AWS.

4+ years with Informatica PowerCenter 8.6, 9.1, 9.5, 9.6, 10.1, and 10.2; developed several pipelines to extract data from different sources, transform it, and load it into multiple targets.

Loaded structured and semi-structured data into Spark clusters using Spark SQL and DataFrames.

Experienced working with Spark Streaming, Spark SQL, and Kafka for real-time data processing.
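A minimal Structured Streaming sketch of this kind of Kafka-to-Spark processing; the broker, topic, and message schema are assumed for illustration only:

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# The spark-sql-kafka connector package must be on the Spark classpath.
spark = SparkSession.builder.appName("kafka-stream-demo").getOrCreate()

# Assumed JSON payload schema for the events on the topic.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read the Kafka topic as a streaming DataFrame (broker and topic names are placeholders).
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "orders")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("data"))
    .select("data.*")
)

# Write the parsed stream to the console purely for demonstration.
query = events.writeStream.outputMode("append").format("console").start()
query.awaitTermination()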

Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames.

Worked with CSV/TXT/Avro/Parquet files in the Spark framework using Java, processing the data by creating Spark DataFrames and RDDs and saving the output in Parquet format in HDFS.

Good knowledge of Spark components; proficient in Spark Core, Spark SQL, and Spark Streaming, and in building PySpark applications for interactive analysis, batch processing, and stream processing.

Experience importing and exporting data between HDFS/Hive and relational database systems using Sqoop.

Experience building ETL jobs in Jupyter notebooks with PySpark; used PySpark and Spark SQL to create Spark applications and apply transformations according to business rules.
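A short PySpark sketch of this notebook-style ETL flow; the paths, column names, and business rules below are hypothetical:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = SparkSession.builder.appName("notebook-etl-demo").getOrCreate()

# Read raw CSV input (path is a placeholder).
raw = spark.read.option("header", True).csv("hdfs:///data/raw/orders.csv")

# Example business rules: drop cancelled orders and normalize the date column.
clean = (
    raw.filter(col("status") != "CANCELLED")
       .withColumn("order_date", to_date(col("order_date"), "yyyy-MM-dd"))
)

# Register a Spark SQL view for ad hoc queries inside the notebook.
clean.createOrReplaceTempView("orders")
daily = spark.sql("SELECT order_date, COUNT(*) AS order_cnt FROM orders GROUP BY order_date")

# Persist the result in Parquet format on HDFS.
daily.write.mode("overwrite").parquet("hdfs:///data/curated/daily_orders")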

Experience working with Oracle using SQL and PL/SQL.

Executed complex HiveQL queries to extract the necessary data from Hive tables and created Hive User Defined Functions (UDFs) as required.
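The original UDFs were written for Hive itself; purely as an illustrative PySpark equivalent, the sketch below registers a date-handling UDF and uses it in a HiveQL-style query (the fiscal_quarter function, database, table, and columns are assumptions, not names from the resume):

from datetime import datetime
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

# enableHiveSupport lets spark.sql run against existing Hive tables.
spark = SparkSession.builder.appName("hive-udf-demo").enableHiveSupport().getOrCreate()

def fiscal_quarter(date_str):
    """Return a quarter label such as 'FY2023-Q1' for a yyyy-MM-dd string."""
    d = datetime.strptime(date_str, "%Y-%m-%d")
    return f"FY{d.year}-Q{(d.month - 1) // 3 + 1}"

spark.udf.register("fiscal_quarter", fiscal_quarter, StringType())

# Use the UDF inside a HiveQL-style query against a hypothetical Hive table.
result = spark.sql("""
    SELECT fiscal_quarter(order_date) AS quarter, SUM(amount) AS total
    FROM sales.orders
    GROUP BY fiscal_quarter(order_date)
""")
result.show()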

Good knowledge of Teradata BTEQ and FastLoad (FLOAD) scripts.

Expertise in using Version Control systems like Bitbucket.

Technical Summary

Languages: Python, PySpark, Java, Scala.

Big Data: HDFS, Hive, Pig, Sqoop, Spark, Kafka.

Cloud: AWS services such as Redshift, EMR, Glue, Athena, S3, Lambda.

RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL.

NoSQL: MongoDB, Cassandra.

ETL: Informatica PowerCenter.

Reporting tools: Crystal Reports, FBI.

Scheduling: ESP (Mainframe), BuildForge, Informatica Scheduler.

Code build/deployment tools: GitHub, Terraform, Jules.

Development/build tools: Eclipse, NetBeans, VS Code, PyCharm, Jupyter Notebook, Anaconda, IntelliJ.

Work Experience

Client: American Equity    May 2022 – Present

Role: Data Engineer

Responsibilities:

Created end-to-end data pipelines using AWS services such as EMR, Redshift, Athena, and S3, along with PySpark and Python.

Constructed pipelines using Glue and PySpark according to the technical specifications.
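A minimal Glue job skeleton of the kind referred to here; the catalog database, table, column mappings, and output path are placeholders:

import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

# Standard Glue job boilerplate: resolve arguments and build the Glue/Spark contexts.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from the Glue Data Catalog (database and table names are placeholders).
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="orders"
)

# Simple column mapping, then write the result to S3 as Parquet.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[("order_id", "string", "order_id", "string"),
              ("amount", "double", "amount", "double")],
)
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-curated/orders/"},
    format="parquet",
)
job.commit()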

Utilized Python and PySpark to construct ETL pipelines for loading data into data warehousing tables.

Applied various transformations and actions to the RDDs and DataFrames and stored the results in HDFS as Parquet files and in HBase for auto-generating CRs.

Wrote Spark applications for data validation, cleansing, transformation, and custom aggregation; used the Spark engine and Spark SQL for data analysis and provided the results to data scientists for further analysis.

Developed AWS Lambda functions and assigned roles to run Python scripts, and built event-driven processing in AWS Lambda using Java.
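A minimal Python handler for this kind of S3-triggered, event-driven processing (the bucket/key handling is illustrative; the Java variant is analogous and not shown):

import json
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    """Triggered by an S3 put event; logs each object that arrives in the lake."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        head = s3.head_object(Bucket=bucket, Key=key)
        print(json.dumps({"bucket": bucket, "key": key, "size": head["ContentLength"]}))
    return {"statusCode": 200}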

Created, modified, and executed DDL on AWS Redshift and Snowflake tables to load data.

Implemented schema extraction for Parquet and Avro file formats in Hive.

Extracted data from multiple sources and maintained it in the data lake in Iceberg format.

Handled Redshift database operations for data extraction from AWS services such as Glue, S3, and EMR.

Improved the performance of existing long-running jobs that were straining cloud resources.

Created documentation and shared best practices and coding standards with various departments to enhance their knowledge of the subject matter.

Worked on structured, unstructured, and semi-structured data from a variety of sources to find patterns in the data, and applied data quality measures using the appropriate Python scripts depending on the source.

Created Bash scripts to add dynamic partitions to Hive staging tables. Responsible for loading large volumes of data into HBase using MapReduce jobs.

Environment: AWS Cloud, EMR, Snowflake, Redshift, Athena, Lambda, RDBMS, HDFS, PySpark, MapReduce, Hive, Sqoop, HBase, Spark SQL, Avro, Parquet.

Client: United Services Automobile Association (USAA)    August 2019 – December 2021

Role: Data Engineer

Responsibilities:

Developed and implemented data pipelines as per the technical specifications.

Imported data from AWS S3 into Spark DataFrames, applied cleansing rules to process the data, and loaded the pre-processed data into a Postgres database.
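A condensed sketch of that flow, with placeholder paths, JDBC URL, credentials, and cleansing rules:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, trim

# The PostgreSQL JDBC driver must be on the Spark classpath (e.g. via --jars).
spark = SparkSession.builder.appName("s3-to-postgres-demo").getOrCreate()

# Read raw files from S3 (path is a placeholder).
raw = spark.read.option("header", True).csv("s3a://example-raw-bucket/customers/")

# Example cleansing rules: trim names and drop rows without a customer id.
clean = (
    raw.withColumn("customer_name", trim(col("customer_name")))
       .filter(col("customer_id").isNotNull())
)

# Write the pre-processed data into a Postgres table over JDBC.
(clean.write.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/analytics")
    .option("dbtable", "staging.customers")
    .option("user", "etl_user")
    .option("password", "********")
    .mode("append")
    .save())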

Worked on SnowSQL and Snowpipe.

Created Snowpipes for continuous data loading jobs.

Redesigned views in Snowflake to improve performance.

Defined virtual warehouse sizing in Snowflake for different types of workloads.

Developed stored procedures and views in Snowflake to load dimension and fact tables.

Developed mappings using various transformations such as Lookup, XML Generator, SQL, Expression, Filter, and Router.

Involved in performance tuning at the source, target, transformation, and session levels.

Created several FastLoad (FLOAD) and MultiLoad (MLOAD) scripts to process data using Teradata utilities.

Redesigned several Teradata scripts to improve job performance.

Worked with AWS SNS, with AWS Lambda subscribed to SNS topics so that alerts are raised when data reaches the lake.

Constructed pipelines using Glue and PySpark according to the technical specifications.

Developed several mappings using transformations such as SQ, EXP, AGG, Lookup, Joiner, Union, and Normalizer.

Implemented SCD Type 1 and Type 2 logic to load data into dimension and fact tables.
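Here the SCD logic itself was built in Informatica mappings; purely as an illustration of the Type 2 pattern, a compact PySpark sketch follows, with assumed columns (customer_id, address, start_date, end_date, is_current) and paths:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, current_date, lit

spark = SparkSession.builder.appName("scd2-demo").getOrCreate()

# Existing dimension (is_current flags the active version) and today's source extract.
dim = spark.read.parquet("hdfs:///dw/dim_customer")
src = spark.read.parquet("hdfs:///staging/customer")

current = dim.filter(col("is_current") == True)

# Source rows whose tracked attribute changed compared to the current dimension version.
changed = (src.alias("s")
    .join(current.alias("d"), col("s.customer_id") == col("d.customer_id"))
    .filter(col("s.address") != col("d.address"))
    .select("s.*"))
changed_keys = changed.select("customer_id")

# Split the dimension into rows of changed customers vs everything else.
dim_changed = dim.join(changed_keys, "customer_id")
dim_rest = dim.join(changed_keys, "customer_id", "left_anti")

# Close the currently-active versions of the changed customers; keep their history as-is.
closed = (dim_changed.filter(col("is_current") == True)
    .withColumn("end_date", current_date())
    .withColumn("is_current", lit(False)))
history = dim_changed.filter(col("is_current") == False)

# New versions of the changed rows become the active records (assumes the source extract
# carries the same business columns as the dimension; brand-new customers are omitted for brevity).
new_versions = (changed
    .withColumn("start_date", current_date())
    .withColumn("end_date", lit(None).cast("date"))
    .withColumn("is_current", lit(True)))

updated_dim = dim_rest.unionByName(history).unionByName(closed).unionByName(new_versions)
updated_dim.write.mode("overwrite").parquet("hdfs:///dw/dim_customer_new")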

Designed and developed different dimension and fact tables as per the business requirement.

Implemented Informatica partitioning techniques so that mappings run in multiple threads, improving performance.

Worked on different reusable transformations.

Worked on Mapplets, Worklets and different tasks in Informatica.

Created unit test case documents, performed unit testing, and resolved issues.

Environment: S3, Postgres, SNS, Lambda, Snowflake, Informatica PowerCenter, AWS Glue, SQS, Teradata, PySpark, SnowSQL.

Client: Anthem Inc., Connecticut    July 2017 – August 2019

Role: Big Data Developer

Responsibilities:

Created an end-to-end ETL data pipeline that uses Spark to load data from Surge into RDBMSs.

Imported and exported data into HDFS and Hive using Sqoop and Flume to extract data from multiple sources.

Supported Hive and Sqoop batch jobs that extract data from different sources and load it into dimension and fact tables.

Created several ETL pipelines using the Informatica tool.

Involved in the design and development of Snowflake database components such as stages, Snowpipes, streams, and tasks.

Worked on external stages, Snowpipe, and external tables.
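For reference, a Snowpipe over an external stage is typically defined with SQL along these lines; the sketch below issues the DDL through the Python connector, with placeholder names and a storage integration/credentials clause omitted:

import snowflake.connector

# Placeholder connection; in practice credentials come from a secrets manager,
# and the S3 stage would reference a STORAGE INTEGRATION.
conn = snowflake.connector.connect(
    account="xy12345.us-east-1", user="ETL_USER", password="********",
    warehouse="ETL_WH", database="ANALYTICS", schema="PUBLIC",
)

# An S3-backed external stage and a pipe that auto-ingests new files into a staging table.
ddl_statements = [
    """CREATE STAGE IF NOT EXISTS ORDERS_STAGE
         URL = 's3://example-raw-bucket/orders/'
         FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)""",
    """CREATE PIPE IF NOT EXISTS ORDERS_PIPE AUTO_INGEST = TRUE AS
         COPY INTO STG_ORDERS
         FROM @ORDERS_STAGE""",
]

cur = conn.cursor()
try:
    for ddl in ddl_statements:
        cur.execute(ddl)
finally:
    cur.close()
    conn.close()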

Worked on streams and tasks in Snowflake.

Involved in tuning SQL queries in Snowflake.

Experience in working with AWS S3 and Snowflake.

Hands-on experience bulk loading and unloading data to and from Snowflake tables using the COPY command.

Developed and implemented Hive custom UDFs involving date functions.

Experience in importing and exporting data into HDFS and Hive using Sqoop.

Created multiple Hive tables and implemented partitioning, dynamic partitioning, and bucketing in Hive for efficient data access.

Wrote Hive queries and functions for evaluating, filtering, loading, and storing data.
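A hedged sketch of the partitioning pattern, expressed here through Spark SQL against Hive; the database, tables, and columns are illustrative, and bucketing would be added in Hive itself with a CLUSTERED BY (customer_id) INTO 16 BUCKETS clause:

from pyspark.sql import SparkSession

spark = (SparkSession.builder.appName("hive-partition-demo")
         .enableHiveSupport().getOrCreate())

# Allow dynamic partition inserts into the Hive table.
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

# Partitioned target table (illustrative schema).
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales.orders_part (
        order_id STRING,
        customer_id STRING,
        amount DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS PARQUET
""")

# Dynamic-partition insert from a hypothetical staging table.
spark.sql("""
    INSERT OVERWRITE TABLE sales.orders_part PARTITION (order_date)
    SELECT order_id, customer_id, amount, order_date
    FROM sales.orders_staging
""")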

Loaded data into Spark RDDs from a variety of sources, including HDFS and HBase, then used PySpark to perform computations and produce the output.

Performed ETL operations using Python and PySpark to import substantial volumes of data from various sources (Amazon S3, Parquet files, APIs).

Environment: Snowflake, Snowpipe, Informatica PowerCenter, S3, SQS, Sqoop, RDBMS, HDFS, Hive, HBase, PySpark, Parquet.

Client: PayPal, Hyderabad    June 2015 – July 2017

Role: Informatica Developer

Responsibilities:

Developed several mappings per the S2T (source-to-target) specifications, using transformations such as SQ, EXP, AGG, Lookup, Joiner, Union, and Normalizer.

Designed and developed different dimension and fact tables as per the business requirement.

Implemented Informatica partitioning techniques so that mappings run in multiple threads, improving performance.

Worked on different reusable transformations.

Worked on Mapplets, Worklets and different tasks in Informatica.

Implemented different lookup caches to improve the mapping performance.

Mapped data between source systems and warehouses.

Prepared functional and technical documentation data for warehouses.

Created several ESP schedules per the requirements and scheduled the jobs accordingly.

Worked on several QC defect items and delivered the code per the timelines.

Involved in the V8-to-V9 migration and prepared the documents per the migration template.

Created unit test case documents, performed unit testing, and resolved issues.

Supported UAT and SIT and fixed defects.

Involved in improving the performance of long-running jobs.

Environment: Informatica, Oracle, ESP, DB2, Linux, BuildForge.


