
Senior Data Engineer

Location:
Charlotte, NC
Posted:
November 21, 2023

Resume:

Renuka Nalluru

Email: ad1cco@r.postjobfree.com

LinkedIn: linkedin.com/in/renuka-nalluru, Phone: +1-704-***-****

CAREER SUMMARY:

Around 8 years of experience as a Data Engineer designing, developing, and building data pipelines that process large volumes of data and deliver solutions for complex business problems.

Hands-on experience with data analytics and big data technologies on Azure cloud services such as Azure Data Factory, Azure Data Lake, Azure Databricks, Azure SQL, Azure Event Hub, Azure Logic Apps, and Azure Synapse Analytics.

Proficient in Software Development Life Cycle (SDLC) methodologies and database design for Online Transactional Processing (OLTP) and Online Analytical Processing (OLAP).

Experience in creating data governance policies, business glossaries, data dictionaries, reference data, metadata, data lineage, and data quality rules.

Developed Python scripts using Spark Streaming to ingest data from source systems into data models.

Developed data orchestration pipelines and job-scheduling workflows using Apache Airflow and Apache Kafka (a minimal illustrative sketch appears at the end of this summary).

Implemented data quality and data lineage using Erwin Data Governance, Manta, and Collibra Data Governance.

Developed complex stored procedures, functions, triggers, views, cursors, indexes, CTEs, joins and subqueries, and temporary tables in SQL.

Experience in transforming data using data mapping and data processing capabilities such as Kafka, Spark, Spark SQL, and HiveQL.

Experience in data integration and data warehousing techniques using ETL tools such as Informatica PowerCenter 10.2/9.6/9.1/8.6, Informatica PowerExchange 10.2/9.6, and Informatica Intelligent Cloud Services (IICS).

Managed workflow processes in Collibra via Activiti.

Experienced data modeler with strong conceptual, logical, and physical data modeling skills; maintained data quality, created data mapping documents, and wrote functional specifications and queries.

Experience in implementing complex business rules by creating reusable transformations, developing complex mapplets and mappings, and writing PL/SQL stored procedures and triggers.

Experience in creating ETL design documents and strong experience with complex PL/SQL packages, functions, cursors, indexes, views, and materialized views.

Experience using ETL tools such as Alteryx, Excel, Power BI, and DataStage, with Tableau for visualization of the end results.

Experience in development, support, and maintenance of ETL (Extract, Transform, and Load) processes, and in writing SQL queries to perform end-to-end ETL validations and support ad-hoc business requests.

Experienced in extracting, transforming, and loading (ETL) data from spreadsheets, database tables, and other sources.
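
As a purely illustrative companion to the orchestration experience summarized above, the sketch below shows how a simple daily extract-transform-load workflow might be wired up in Apache Airflow 2.x using Python. The DAG name, task names, and callables are hypothetical placeholders, not artifacts from any of the projects listed below.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull a batch of records from a source system.
    print("extracting source data")


def transform():
    # Placeholder: apply cleansing and business rules to the extracted batch.
    print("transforming data")


def load():
    # Placeholder: write the transformed batch to the target data model.
    print("loading data into target tables")


with DAG(
    dag_id="daily_ingest_pipeline",      # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run the three stages in sequence once per day.
    extract_task >> transform_task >> load_task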

Technical Skills:

Databases

Oracle, Snowflake, SSIS, MySQL Server, PostgreSQL, T-SQL, DB2.

Big data Technologies and ETL tools

Azure Databricks, dbt (data build tool), Erwin Data Modeler, AWS S3, AWS Glue, Azure Storage, Alteryx, Azure Data Share, HDFS, MapReduce, Sqoop, Azure Event Hub.

Programming

Python, SQL, Scala, Unix, Linux.

Developer Tools

Databricks Notebooks, Jupyter Notebooks, IntelliJ, Eclipse.

Tools and languages

SQL, PL/SQL, PowerShell, Python, and R (intermediate). Tools: Toad, SQL Developer, Teradata, Visual Studio, Collibra, Confluence, Teradata SQL Assistant, Salesforce, MuleSoft, Erwin, Manta.

Data Visualization & Business Intelligence tools

Tableau (user), Tableau Server, Power BI (developer), SAP BO, IBM Cognos, MicroStrategy, ad-hoc reporting.

Operating System

Linux, Windows

Replication, Scheduling and Version control tools

Qlik Replicate, AWS AppFlow, Airflow, Rundeck, Git, source control, Azure Key Vault, AWS SNS

Functional areas

Finance/Insurance, Healthcare, Marketing, Services, and Advanced Analytics

Professional Experience:

Client: MasterCard, O’Fallon, MO Aug 2022 to Present

Sr Data Engineer

Responsibilities:

·Migrated data from on-premises systems to the AWS cloud using Spark Streaming and batch processing.

·Developed Python scripts to transfer data to AWS S3 buckets; ingested, extracted, cleansed, and transformed data through AWS Lambda, AWS Glue, and Step Functions (see the illustrative sketch after this section).

·Configured and tuned Spark to process data and improve performance.

·Migrated Teradata queries into the Snowflake data warehouse and performed query optimization and tuning.

·Developed data pipelines in Azure Data Factory using linked services and datasets, loading data into the data lake and Snowflake.

·Built orchestration pipelines and developed PySpark scripts to perform ETL transformations in Databricks with automated jobs.

·Upgraded Azure SQL Data Warehouse Gen1 to Gen2.

·Developed Power BI dashboards and generated reports for business users.

·Built data orchestration pipelines, workflow jobs, and scheduled pipelines with workflow management tools such as Apache Airflow.

·Migrated MapReduce programs to Spark transformations using Spark and Scala.

·Implemented data quality checks using Spark Streaming and organized the data using Spark scripts.

·Developed Spark programs to process raw data, populate staging tables, and store refined data (JSON, XML, CSV) files in partitioned data warehouse tables.

·Created visual representations of analyzed data using Amazon QuickSight and shared them across the organization.

·Built dashboards and reports using Amazon QuickSight and optimized data quality metrics.

·Worked with NoSQL databases such as HBase and integrated them with Spark real-time data processing.

·Deployed pipelines and developed Python scripts and YAML files to extract data from Netezza databases into cloud storage (AWS S3, Snowflake).

Environment: NoSQL, Spark, JSON, HBase, Python, Netezza, AWS, Snowflake, Scala, Apache, Azure, ETL, PySpark.
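
A rough sketch of the S3 landing and AWS Glue hand-off described in the bullets above, using boto3. The bucket, key, local file path, Glue job name, and job argument are hypothetical placeholders rather than details from the engagement.

import boto3

s3 = boto3.client("s3")
glue = boto3.client("glue")

# Land a locally staged extract in the raw zone of the lake (placeholder names).
s3.upload_file(
    Filename="/tmp/transactions.csv",
    Bucket="example-raw-bucket",
    Key="raw/transactions/transactions.csv",
)

# Trigger a Glue job that cleanses and transforms the landed file.
run = glue.start_job_run(
    JobName="cleanse_transactions",                      # hypothetical Glue job
    Arguments={"--source_key": "raw/transactions/transactions.csv"},
)
print("Started Glue job run:", run["JobRunId"])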

Client: Entergy, Louisiana Nov 2018 to Jul 2022

Sr Data Engineer

Responsibilities:

Developed code for importing and exporting data between RDBMS and HDFS using Sqoop.

Implemented partitions and buckets based on state for further processing using bucket-based Hive joins.

Developed custom UDFs in Java and used them where necessary to reduce code in Hive queries.

Handled the ETL framework in Spark with Python and Scala for data transformations.

Implemented various optimization techniques to improve Spark application performance.

Worked on Spark Streaming for real-time computations to process JSON messages from Kafka (see the illustrative sketch after this section).

Developed APIs for quick real-time lookups on top of HBase tables for transactional data.

Built optimized dynamic-schema tables using Avro and columnar tables using Parquet.

Built various Oozie actions, workflows, and coordinators for automation.

Developed scripting functionality using shell, Bash, and Python for various operations.

Pushed application logs and data stream logs to a Grafana server for monitoring and alerting.

Developed Jenkins and Drone pipelines for continuous integration and deployment.

Built various pipelines and integrations using NiFi for ingestion and exports.

Built custom endpoints and libraries in NiFi for ingesting data from traditional legacy systems.

Implemented integrations with cloud environments such as AWS, Azure, and GCP for external vendor file-exchange systems.

Implemented secure transfer routes for external clients using microservices to integrate external storage locations such as AWS S3 and Google Cloud Storage (GCS) buckets.

Built SFTP integrations using various VMware solutions for external vendor onboarding.

Developed an automated file transfer mechanism in Python from MFT and SFTP to HDFS.

Environment: Apache Hadoop 2.0, Cloudera, HDFS, MapReduce, Hive, Impala, HBase, Sqoop, Kafka, Spark, Linux, MySQL, Nifi, Oozie, SFTP
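
A minimal sketch of the kind of Spark Streaming job referenced above: a PySpark Structured Streaming application that reads JSON events from a Kafka topic and writes them to HDFS as Parquet. The broker address, topic, schema fields, and paths are assumptions for illustration only.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

# Requires the spark-sql-kafka connector on the Spark classpath.
spark = SparkSession.builder.appName("kafka_json_stream").getOrCreate()

# Expected shape of each JSON event (illustrative fields only).
schema = StructType([
    StructField("record_id", StringType()),
    StructField("payload", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "events-topic")                # placeholder topic
    .load()
)

# Kafka delivers the message body as bytes; cast to string and parse the JSON.
events = raw.select(from_json(col("value").cast("string"), schema).alias("e")).select("e.*")

query = (
    events.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/events")              # placeholder output path
    .option("checkpointLocation", "hdfs:///checkpoints/events")
    .start()
)
query.awaitTermination()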

Client: Eureka IT Solutions, Hyderabad, India Apr 2017 to Jun 2018

Data Engineer

Responsibilities:

Developed Hive ETL logic for cleansing and transforming data coming from RDBMS sources.

Implemented complex data types in Hive and used multiple data formats such as ORC and Parquet.

Worked on different parts of data lake implementation and maintenance for ETL processing.

Developed Spark Streaming applications using Scala and Python to process data from Kafka.

Implemented various optimization techniques in Spark Streaming applications written in Python.

Imported batch data using Sqoop to load data from MySQL into HDFS at regular intervals.

Extracted data from various APIs and performed data cleansing and processing using Java and Scala.

Converted Hive queries into Spark SQL to take advantage of the Spark environment for optimized runs (see the illustrative sketch after this section).

Developed data migration pipelines from the on-premises HDFS cluster to Azure HDInsight.

Developed complex queries and ETL processes in Jupyter notebooks using Databricks Spark.

Developed microservice modules to collect application statistics for visualization.

Worked with Docker and Kubernetes to containerize and deploy applications.

Implemented NiFi pipelines to export data from HDFS to cloud locations such as AWS and Azure.

Ingested data from Azure Event Hub for real-time data ingestion into various applications.

Designed solutions using Azure tools such as Azure Data Factory, Azure Data Lake, Azure SQL, Azure SQL Data Warehouse, and Azure Functions.

Implemented data lake migration from on-premises clusters to Azure for highly scalable solutions.

Implemented various Airflow automations to build integrations between clusters.

Environment: Hive, Sqoop, Linux, Cloudera CDH 5, Scala, Kafka, HBase, Avro, Spark, ZooKeeper, MySQL, Azure, Databricks, Python, Airflow.
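
A hypothetical example of the Hive-to-Spark SQL conversion mentioned above: the same aggregation a Hive query would perform is run through Spark SQL with Hive support and persisted as a partitioned ORC table. Database, table, and column names are invented for illustration.

from pyspark.sql import SparkSession

# Hive support assumes a configured Hive metastore is reachable from the cluster.
spark = (
    SparkSession.builder
    .appName("hive_to_spark_sql")
    .enableHiveSupport()
    .getOrCreate()
)

# The same aggregation a Hive query would run, executed through Spark SQL.
daily_totals = spark.sql("""
    SELECT order_date, customer_id, SUM(amount) AS total_amount
    FROM sales.orders                -- hypothetical Hive table
    GROUP BY order_date, customer_id
""")

# Persist the result as a partitioned ORC table for downstream consumers.
(
    daily_totals.write
    .mode("overwrite")
    .partitionBy("order_date")
    .format("orc")
    .saveAsTable("sales.daily_totals")   # hypothetical target table
)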

Client: Green Byte Technologies, Hyderabad, India Jan 2016 to Mar 2017

Data Engineer

Responsibilities:

·Migrated data from on-premises systems to the AWS cloud using Spark Streaming and batch processing.

·Developed Python scripts to transfer data to AWS S3 buckets; ingested, extracted, cleansed, and transformed data through AWS Lambda, AWS Glue, and Step Functions.

·Configured and tuned Spark to process data and improve performance.

·Migrated Teradata queries into the Snowflake data warehouse and performed query optimization and tuning (see the illustrative sketch after this section).

·Built data orchestration pipelines, workflow jobs, and scheduled pipelines with workflow management tools such as Apache Airflow.

·Experience with Spark Context, Spark SQL, and Spark on YARN.

·Implemented Spark scripts using Scala and Spark SQL to connect to Hive tables for faster processing.

·Implemented Hive partitioning and bucketing on the collected data in HDFS.

·Migrated MapReduce programs to Spark transformations using Spark and Scala.

·Implemented data quality checks using Spark Streaming and organized the data using Spark scripts.

·Developed Spark programs to process raw data, populate staging tables, and store refined data (JSON, XML, CSV) files in partitioned data warehouse tables.

·Worked with NoSQL databases such as HBase and integrated them with Spark real-time data processing.

·Deployed pipelines and developed Python scripts and YAML files to extract data from Netezza databases into cloud storage (AWS S3, Snowflake).

·Created visual representations of analyzed data using Amazon QuickSight and shared them across the organization.

·Built dashboards and reports using Amazon QuickSight and optimized data quality metrics.

·Created dashboards and views on Tableau Server, scheduled them weekly or monthly according to business requirements, and displayed the data accurately.

Environment: JSON, XML, AWS, Tableau, Snowflake, YAML, CSV, SQL, Scala, Spark, AWS Lambda, HDFS, etc.
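
As an illustration of loading and validating data in Snowflake from Python, related to the Teradata-to-Snowflake migration bullet above, the sketch below uses the snowflake-connector-python package. The account, credentials, warehouse, stage, and table names are placeholders; real credentials would come from a secrets store.

import snowflake.connector

# Connection parameters are placeholders, not real project values.
conn = snowflake.connector.connect(
    account="example_account",
    user="etl_user",
    password="********",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="STAGING",
)

try:
    cur = conn.cursor()
    # Bulk-load previously staged files from a (hypothetical) external stage into a target table.
    cur.execute(
        "COPY INTO STAGING.TRANSACTIONS FROM @S3_LANDING_STAGE FILE_FORMAT = (TYPE = CSV)"
    )
    # Simple row-count check to validate the load.
    cur.execute("SELECT COUNT(*) FROM STAGING.TRANSACTIONS")
    print("Rows loaded:", cur.fetchone()[0])
finally:
    conn.close()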

Client: Coupon Up, India Feb 2015 to Dec 2015

Data Engineer

Responsibilities:

·Gathered requirements from end users and analyzed them to find the best approach for designing the data model (dimensional or ER).

·Analyzed project data to determine statistics, cardinalities, and any other information pertaining to the data in question.

·Discussed the analyzed data with the solution architect and the whole team to find the best way to design and implement the data model in the target environment.

·Ran SQL queries in the source database environment while analyzing the data and shared the findings with the business team.

·Created conceptual and logical models and reviewed them with the data architect and business users.

·Designed the Cassandra data model and used CQL (Cassandra Query Language) to perform CRUD operations on the Cassandra file system (see the illustrative sketch after this section).

·Used ER/Studio and Erwin to create schemas for different use cases according to the requirements provided.

·Transformed the logical model into a physical data model and developed tables accordingly.

·Developed source-to-target mapping documents for the ETL developers to ease the transition of data from the old schema to the new one.

·Created Informatica source and target instances and maintained shared folders so that shortcuts could be used across the project.

·Responsible for unit testing and integration testing of mappings and workflows.

·Validated that the requirements provided in the business requirements document were implemented in the data model.

·Prepared the data model diagram and the list of reports and use cases supported by the data model.

·Designed stored procedures and triggers for the reporting needs of business users.

·Familiarized with the business terms and rules of the domains involved, such as Finance, Services, HR, and Sales.

Environment: ERStudio, Erwin, Teradata SQL Assistant, Palantir Foundry (Cloud Database platform), SSMS, SQL Developer, Oracle BI Client, MS Visio, MS Excel, JIRA, Confluence, IICS, XML, SQL.
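
A brief sketch of the CQL CRUD operations mentioned above, using the DataStax Python driver (cassandra-driver). The contact point, keyspace, table, and column names are hypothetical and chosen only for illustration.

from cassandra.cluster import Cluster

# Single hypothetical contact point; a real cluster would list several nodes.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS retail
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace("retail")

session.execute("""
    CREATE TABLE IF NOT EXISTS coupons (
        coupon_id text PRIMARY KEY,
        merchant text,
        discount_pct int
    )
""")

# Create, read, update, and delete a row using parameterized CQL statements.
session.execute(
    "INSERT INTO coupons (coupon_id, merchant, discount_pct) VALUES (%s, %s, %s)",
    ("C100", "ExampleMart", 15),
)
row = session.execute("SELECT * FROM coupons WHERE coupon_id = %s", ("C100",)).one()
print(row.merchant, row.discount_pct)

session.execute("UPDATE coupons SET discount_pct = %s WHERE coupon_id = %s", (20, "C100"))
session.execute("DELETE FROM coupons WHERE coupon_id = %s", ("C100",))

cluster.shutdown()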


