Data Engineer Real Time

Location:
United States
Posted:
July 27, 2023

Resume:

N RAMYA

Sr. Data Engineer & ETL Developer Ph: +1-512-***-**** Email: *********@*****.***

Certified professional data engineer with more than 9 years of overall IT experience across clients in different domains and technology stacks. Experienced in building highly scalable, large-scale applications using Cloud, Big Data, DevOps and ETL technologies. Comfortable working in both Agile and Waterfall environments. Experienced in multi-cloud, migration and scalable application projects.

Professional Summary

BigData

Experience working with various Hadoop distributions like Cloudera, Hortonworks and MapR.

Expert in ingesting batch data for incremental loads from various RDBMS tools using Apache Sqoop.

Developed scalable applications for real-time ingestions into various databases using Apache Kafka.

Developed Pig Latin scripts and MapReduce jobs for large data transformations and loads.

Experience in using optimized data formats like ORC, Parquet and Avro.

Experience in building optimized ETL data pipelines using Apache Hive and Spark.

Implemented various optimizing techniques in Hive scripts for data crunching and transformations.

Experience in building ETL scripts in Impala for faster access from the reporting layer.

Built Spark data pipelines with various optimization techniques using Python and Scala.

Experience in loading transactional and delta loads into NoSQL databases like HBase.

Developed various automation flows using Apache Oozie, Azkaban and Airflow.

Experience in working with NoSQL Databases like HBase, Cassandra and MongoDB.

Experience in various integration tools like Talend and NiFi for ingesting batch and streaming data.

Cloud

Experience in working with various cloud platforms like AWS, Azure and GCP.

Developed various ETL applications using Databricks Spark distributions and Notebooks.

Implemented streaming applications to consume data from Event Hub and Pub/Sub.

Developed various scalable big data applications in Azure HDInsight for ETL services.

Developed scalable applications using AWS tools like Redshift and DynamoDB.

Worked on building pipelines using Snowflake for extensive data aggregations.

Working knowledge of GCP tools like BigQuery, Pub/Sub, Cloud SQL and Cloud Functions.

Experience in visualizing reporting data using tools like Power BI and Google Analytics.

ETL

Worked with the Informatica business team on a prototype for Informatica Cloud Application Integration.

Worked on scheduling job-dependent MFT jobs in Autosys.

Designed and automated secure file transfers using Managed File Transfer (MFT).

Worked on Informatica Life Cycle Management (ILM) jobs and supporting ILM process.

Experience with data profiling and data quality using the Informatica Developer, BDM and MDM toolset.

Experienced in writing Unix shell scripts for the automation of ETL processes and using schedulers like Autosys and Control-M.

DevOps

Experience in building continuous integration and deployments using Jenkins, Drone, Travis CI.

Expert in building containerized apps using tools like Docker, Kubernetes and Terraform.

Developed reusable application libraries using Docker containers.

Experience in building metrics dashboards and alerts using Grafana and Kibana.

Expert in Java and Scala build tools like Maven (POM) and SBT for application development.

Experience in working with tools like GitHub, GitLab and SVN for code repository.

Expert in writing various YAML scripts for automation purposes.

Experience Summary

Client

Etrade Financial Services

Location

Atlanta, Georgia

Designation

Sr. Data Engineer

Duration

Nov 2021 – Present

Responsibilities

Developed code for importing and exporting data from DB2 into S3 buckets and vice versa.

Handled ETL framework in Spark with Python for data transformations.

Implemented various optimization techniques for Spark applications for improving performance.

Involved in Spark streaming for real-time computations to process JSON files from Kafka.

Integrated AWS Glue jobs with supporting AWS services including DynamoDB and SQS.

Developed SQL logic in various flavors, including Spark SQL, Snowflake, DB2 and Oracle.

Developed various scripting functionality using shell, Bash and Python for various operations.

Pushed application logs and data stream logs to GEM server for monitoring and alerting purposes.

Developed Jenkins pipelines for continuous integration and deployment purposes.

Implemented integrations with various cloud environments like AWS, Azure and GCP to support file exchange with external vendors.

Implemented secure transfer routes for external clients using microservices to integrate external storage locations like AWS S3.

Developed automated file transfer mechanism using Python from MFT, SFTP to HDFS.

Gathered requirements from business users for the Genome project and translated them into technical specifications for new features or enhancements.

Developed detailed user stories based on the requirements and worked closely with business users.

Developed implementation plans to streamline processes to reduce costs.

Designed and developed Informatica Cloud mappings and sessions based on business user requirements and business rules to load data from sources such as flat files and Snowflake tables.

Developed Informatica Cloud mappings to load data from a NAS location to a Snowflake internal stage.

Worked on various transformations like Expression, Aggregator, SQL, Lookup, Filter, Router, Rank.

Environment: Apache Hadoop 2.0, Kafka, Spark, Linux, MySQL, Oozie, SFTP, GEM, DB2, Oracle, Snowflake, IICS

Client

Toyota Financial Services

Location

Dallas, Texas

Designation

Sr. Data Engineer

Duration

Feb 2020 – Oct 2021

Responsibilities

Developed code for importing and exporting data from RDBMS into HDFS using Sqoop and vice versa.

Implemented partitioning and bucketing by state to enable bucket-based Hive joins in downstream processing.

Developed custom UDFs in Java and used them where necessary to reduce code in Hive queries.

Handled ETL framework in Spark with Python and Scala for data transformations.

Implemented various optimization techniques for Spark applications for improving performance.

Involved in Spark streaming for real-time computations to process JSON files from Kafka.

Developed APIs for quick real-time lookup on top of HBase tables for transactional data.

Built optimized dynamic schema tables using Avro and columnar tables using Parquet.

Built various Oozie actions, workflows and coordinators for automation purposes.

Developed various scripting functionality using shell, Bash and Python for various operations.

Pushed application logs and data stream logs to Grafana server for monitoring and alerting purposes.

Developed Jenkins and Drone pipelines for continuous integration and deployment purposes.

Worked on building various pipelines and integration using NiFi for ingestion and exports.

Built custom endpoints and libraries in NiFi for ingesting data from traditional legacy systems.

Implemented integrations with various cloud environments like AWS, Azure and GCP to support file exchange with external vendors.

Implemented secure transfer routes for external clients using microservices to integrate external storage locations like AWS S3 and Google Cloud Storage (GCS) buckets.

Built SFTP integrations using various VMware solutions for external vendor onboarding.

Developed automated file transfer mechanism using Python from MFT, SFTP to HDFS.

Environment: Apache Hadoop 2.0, Cloudera, HDFS, MapReduce, Hive, Impala, HBase, Sqoop, Kafka, Spark, Linux, MySQL, NiFi, Oozie, SFTP

Client

Charter Communications

Location

Hartford, Connecticut

Designation

Sr. Data Engineer

Duration

Oct 2018 – Feb 2020

Responsibilities

Developed Hive ETL logic for data cleansing and transformation of data coming through RDBMS.

Implemented complex data types in Hive and used multiple data formats like ORC and Parquet.

Worked in different parts of data lake implementations and maintenance for ETL processing.

Developed Spark Streaming applications using Scala and Python for processing data from Kafka.

Implemented various optimization techniques in Spark Streaming applications written in Python.

Imported batch data using Sqoop to load data from MySQL to HDFS on regular intervals.

Extracted data from various APIs and performed data cleansing and processing using Java and Scala.

Converted Hive queries into Spark SQL, integrating with the Spark environment for optimized runs.

Developed migration data pipelines from an on-prem HDFS cluster to Azure HDInsight.

Developed complex queries and ETL processes in Jupyter notebooks using Databricks Spark.

Developed different modules in microservices to collect application statistics for visualization.

Used Docker and Kubernetes to containerize and deploy applications.

Implemented NiFi pipelines to export data from HDFS to cloud locations like AWS and Azure.

Ingested data from Azure Event Hub for real-time data ingestion into various applications.

Experience designing solutions in Azure tools like Azure Data Factory, Azure Data Lake, Azure SQL & Azure SQL Data Warehouse, Azure Functions.

Implemented data lake migration from on-prem clusters to Azure for highly scalable solutions.

Worked on implementing various Airflow automations for building integrations between clusters.

Environment: Hive, Sqoop, Linux, Cloudera CDH 5, Scala, Kafka, HBase, Avro, Spark, ZooKeeper, MySQL, Azure, Databricks, Python, Airflow

Client

Amadeus

Location

Portsmouth, NH

Designation

Sr. Data Engineer

Duration

Aug 2016 – Sep 2018

Responsibilities

Worked on building and developing ETL pipelines using Spark-based applications.

Worked on migration of RDBMS data into data lake applications.

Built optimized Hive and Spark jobs for data cleansing and transformations.

Developed optimized Spark applications in Scala so that jobs completed on time.

Worked on various optimization techniques in Hive for data transformations and loading.

Expert in working with formats that support schema evolution, such as Avro.

Built an API on top of HBase data to give external teams quick lookups.

Experience in building Impala scripts for quick retrieval of data to expose through Tableau.

Experience in developing various Oozie actions for automation purposes.

Developed a monitoring platform for our jobs in Kibana and Grafana.

Developed real-time log aggregations on Kibana for analyzing data.

Developed NiFi pipelines for extracting data from external sources.

Developed Jenkins pipelines for data pipeline deployments.

Worked on building different modules in scalable Spring Boot applications.

Developed Docker containers for automating runtime environments for various applications.

Expert in building ingestion pipelines for reading real-time data from Kafka.

Worked on a POC for setting up Talend environments and custom libraries for different pipelines.

Developed various Python and shell scripts for various operations.

Worked in an Agile environment with various teams and projects in fast-paced settings.

Environment: Hadoop, Sqoop, Pig, HBase, Hive, Flume, Java 6, Eclipse, Apache Tomcat 7.0, Oracle, J2EE, Talend, NiFi, Scala, Python

Client

Cube It Innovations

Location

Hyderabad, TG

Designation

ETL Developer

Duration

Aug 2013 – Oct 2015

Responsibilities

Logical to Physical model conversion

Source views creation for Consumer CCAR

ETL jobs for Base to Main load

Working with data modelling team for solving physical model related issues

Wrote UNIX shell scripts for file validation

Metadata design for ETL process

Developing loading scripts using Teradata Utilities

Automating the loading process using Unix shell scripts

Creating Edit check views

Creating Business access views

Technical Summary

Big Data

Hadoop, Sqoop, Flume, Hive, Spark, Pig, Kafka, Talend, HBase, Impala

ETL Tools

Informatica, Talend, Microsoft SSIS, IBM DataStage

Database

Oracle, MongoDB, SQL Server 2016, Teradata, Netezza, MS Access

Reporting

Microsoft Power BI, Tableau, QlikView, SSRS, Business Objects (Crystal)

Business Intelligence

MDM, Change Data Capture (CDC), Metadata, Data Cleansing, OLAP, OLTP, SCD, SOA, REST, Web Services

Tools

Ambari, SQL Developer, TOAD, Erwin, Visio, Tortoise SVN

Operating Systems

Windows Server, UNIX (Red Hat, Linux, Solaris, AIX)

Web Technologies

J2EE, JMS, Web Service

Languages

UNIX shell scripting, Scala, SQL, PL/SQL, T-SQL

Scripting

HTML, JavaScript, CSS, XML, Shell Script, Perl, and Ajax

Cloud

Azure, AWS, GCP

Version control

Git, SVN, CVS, GitLab

Tools

FileZilla, PuTTY, PL/SQL Developer, JUnit

IDE

Eclipse, Microsoft Visual Studio 2008, 2012, Flex Builder


