
Data Engineer (Data Lake)

Location: Hartford, CT
Posted: August 23, 2023


Name: Praveen Kumar Reddy

Email: ady586@r.postjobfree.com

Phone: +1-860-***-****

Summary:

Data Engineer

• 8+ years of IT development experience in the Big Data ecosystem and related technologies, with expertise in Business Intelligence, data warehousing, ETL, and Big Data technologies.

• Strong experience in writing scripts using the Python, PySpark, and Spark APIs for analyzing data.

• Expertise in using major Hadoop ecosystem components such as HDFS, YARN, MapReduce, Hive, Impala, Pig, Sqoop, HBase, Spark, Spark SQL, Kafka, Spark Streaming, Flume, Oozie, Zookeeper, and Hue.

• Expertise in the Amazon Web Services (AWS) cloud platform, including services such as EC2, S3, VPC, IAM, DynamoDB, CloudFront, CloudWatch, Auto Scaling, and Security Groups.

• Experience in using Kafka and Kafka brokers with Spark Streaming contexts to process live streaming data.

• Experienced in big data analysis and developing data models using Hive, Pig, MapReduce, and SQL, with strong data architecture skills for designing data-centric solutions.

• Excellent experience in designing and developing enterprise applications for the J2EE platform using Servlets, JSP, Struts, Spring, Hibernate, and web services.

• Experience with the Snowflake cloud data warehouse and AWS S3 buckets for integrating data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables.

• Extensive knowledge of reporting objects such as facts, attributes, hierarchies, transformations, filters, prompts, calculated fields, sets, groups, and parameters in Tableau. Experience in working with Flume and NiFi for loading log files into Hadoop.

• Worked on Spark SQL, created DataFrames by loading data from Hive tables, prepared the data, and stored it in AWS S3 (see the Spark sketch at the end of this summary).

• Extensively used ETL methodologies to support data extraction, transformation, and loading in a corporate-wide ETL solution using Ab Initio and Informatica PowerCenter 6.2/6.1/5.1.

• Highly experienced in the ETL tool Ab Initio using the GDE designer, with strong working experience across Ab Initio components.

• Well versed in Ab Initio Transform, Partition, Departition, Dataset, and Database components.

• Experience in writing PL/SQL statements - Stored Procedures, Functions, Triggers and packages.

• Good understanding of distributed systems, HDFS architecture, Internal working details of MapReduce and Spark processing frameworks.

• Expertise in writing complex SQL queries, made use of Indexing, Aggregation and materialized views to optimize query performance.

• Experience in migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.

• Working experience in developing applications involving Big Data technologies such as MapReduce, HDFS, Hive, Sqoop, Pig, Oozie, HBase, NiFi, Spark, Scala, Kafka, and Zookeeper, and ETL (DataStage).

• Expertise with Python, Scala, and Java in designing, developing, administering, and supporting large-scale distributed systems.

• Experience in using build/deploy tools such as Jenkins, Docker, and OpenShift for continuous integration and deployment of microservices.

• Used the Databricks XML plug-in to parse incoming XML data and generate the required XML output.

• Proficient in big data tools such as Hive and Spark and the relational data warehouse Teradata.

• Developed custom Kafka producers and consumers for publishing to and subscribing from Kafka topics (see the Kafka sketch at the end of this summary).

• Ability to work effectively in cross-functional team environments, with excellent communication and interpersonal skills.

• Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
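
A minimal PySpark sketch of the Hive-to-S3 preparation flow mentioned in this summary; the database, table, columns, and S3 path are hypothetical placeholders, not actual project values.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hive-enabled session; assumes the cluster exposes a Hive metastore.
    spark = (SparkSession.builder
             .appName("hive-prep-to-s3")
             .enableHiveSupport()
             .getOrCreate())

    # Load a Hive table into a DataFrame (hypothetical table name).
    orders = spark.table("sales_db.orders")

    # Example prep step: filter and aggregate before publishing.
    prep = (orders
            .filter(F.col("order_status") == "COMPLETED")
            .groupBy("order_date")
            .agg(F.sum("amount").alias("daily_amount")))

    # Write the prepared data to S3 as Parquet (hypothetical bucket/prefix).
    prep.write.mode("overwrite").parquet("s3a://example-bucket/prep/orders_daily/")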
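
A hedged sketch of a custom Kafka producer and consumer using the kafka-python client, as referenced above; the broker address, topic name, and consumer group are illustrative assumptions.

    import json
    from kafka import KafkaProducer, KafkaConsumer

    BROKERS = "localhost:9092"   # placeholder broker list
    TOPIC = "events"             # hypothetical topic

    # Producer: publish JSON-encoded messages to the topic.
    producer = KafkaProducer(
        bootstrap_servers=BROKERS,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"))
    producer.send(TOPIC, {"event_id": 1, "status": "created"})
    producer.flush()

    # Consumer: subscribe to the same topic within a consumer group.
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=BROKERS,
        group_id="example-group",
        auto_offset_reset="earliest",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")))
    for message in consumer:
        print(message.offset, message.value)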

Project Experience:

Client: Nationwide, Columbus, OH

Role: Data Engineer

March 2022 – Present

Responsibilities:

• Work with subject matter experts and project team to identify, define, collate, document and communicate the data migration requirements.

• Validate Sqoop jobs, Shell scripts & perform data validation to check if data is loaded correctly without any discrepancy. Perform migration and testing of static data and transaction data from one core system to another.

• Developed Spark scripts using Python on Azure HDInsight for Data Aggregation, Validation and verified its performance over MR jobs.

• Architect & implement medium to large scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).

• Worked with developer teams on NiFi workflows to pick up data from a REST API server, the data lake, and an SFTP server and send it to Kafka.

• Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations.

• Worked in a highly parallelized (MP Solution) Ab Initio environment to process 1+ terabytes of data daily.

• Tuning of Ab Initio graphs for better performance.

• Implemented a data interface to get customer information using REST APIs, pre-processed the data using MapReduce, and stored it in HDFS (Hortonworks).

• Extracted and restructured the data into MongoDB using import and export command line utility tool.

• Used Zookeeper to store offsets of messages consumed for a specific topic and partition by a specific Consumer Group in Kafka

• Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the Cosmos activity.

• Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.

• Designed, set up, maintained, and administered Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse, and Azure Data Factory.

• Designed and implemented Spark SQL tables and Hive script jobs, scheduled with Stonebranch, and created the corresponding workflows and task flows.

• Experience in working with different join patterns and implemented both Map and Reduce Side Joins.

• Wrote Flume configuration files for importing streaming log data into HBase with Flume.

• Developed Spark applications using Pyspark and Spark-SQL for data extraction, transformation and aggregation from multiple file formats.

• Responsible for developing Spark Cassandra connector jobs to load data from flat files into Cassandra for analysis.

• Used Spark Streaming to receive real-time data from Kafka and store the streaming data to HDFS and NoSQL databases such as HBase and Cassandra using Python (see the sketch below).
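
A minimal sketch of the Kafka Direct Stream pattern referenced above, assuming a Spark 2.x environment where the Python KafkaUtils DStream API is available; the broker list, topic, and HDFS path are placeholders.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils  # available up to Spark 2.x

    sc = SparkContext(appName="kafka-direct-stream")
    ssc = StreamingContext(sc, 10)  # 10-second micro-batches

    # Direct (receiver-less) stream from Kafka; placeholder brokers and topic.
    stream = KafkaUtils.createDirectStream(
        ssc, ["events"], {"metadata.broker.list": "broker1:9092"})

    # Each record is a (key, value) pair; keep the values and persist to HDFS.
    values = stream.map(lambda kv: kv[1])
    values.saveAsTextFiles("hdfs:///data/streaming/events")

    ssc.start()
    ssc.awaitTermination()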

Environment: Sqoop, Shell scripts, Python, Azure HDInsight, Azure Data Lake, Azure Data Factory, Azure Data Lake Analytics, Azure Stream Analytics, Ab Initio, Azure SQL Data Warehouse, HDInsight/Databricks, NoSQL DB, NiFi, Kafka, MapReduce, HDFS, Zookeeper, Azure Cosmos activity, Hive, Azure SQL Database, Azure Analysis Services, Spark SQL, PySpark, Spark Cassandra connector, HBase, Cassandra, Spark Streaming

Client: GE Healthcare, Atlanta, GA

Role: Big Data Engineer

Aug 2019 – Feb 2022

Responsibilities:

• Worked on Amazon AWS concepts like EMR and EC2 web services for fast and efficient processing of Big Data.

• Generated parameterized queries for generating tabular reports using global variables, expressions, functions, and stored procedures using SSRS.

• Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift (see the sketch at the end of this project).

• Data Extraction, Aggregations and consolidation of Adobe data within AWS Glue using PySpark.

• Create external tables with partitions using Hive, AWS Athena and Redshift

• Designed and Developed PL/SQL procedures, functions and packages to create Summary tables.

• Worked on Performance Tuning of the database which includes indexes, optimizing SQL Statements.

• Fixed load-balancing issues of DataStage jobs and database jobs on the server.

• Created data models for AWS Redshift and Hive from dimensional data models.

• Involved in selecting and integrating any Big Data tools and frameworks required to provide requested capabilities

• Developed Pig scripts and UDFs as per the business logic.

• Developed a new architecture for the project which uses less infrastructure and costs less, by converting the data load jobs to read directly from on premise data sources.

• Executed change management processes surrounding new releases of SAS functionality

• Prepared complex T-SQL queries, views and stored procedures to load data into staging area.

• Participated in data collection, data cleaning, data mining, developing models and visualizations.

• Worked with Sqoop to transfer data between HDFS and relational databases such as MySQL, and used Talend for the same purpose.

• Developed Spark and Java applications for data streaming and data transformation

• Designed DataStage ETL jobs to extract data from heterogeneous source systems, transform it, and load it into the data warehouse.

• As a Data Engineer, designed and modified database tables and used HBase queries to insert and fetch data from tables.

• Created Hive External tables to stage data and then move the data from Staging to main tables

• Created jobs and transformation in Pentaho Data Integration to generate reports and transfer data from HBase to RDBMS.

• Wrote DDL and DML statements for creating, altering tables and converting characters into numeric values.

• Worked on Master Data Management (MDM) Hub and interacted with multiple stakeholders.

• Worked on Kafka and Storm to ingest real-time data streams and push the data to HDFS or HBase as appropriate.

• Extensively involved in development and implementation of SSIS and SSAS applications.

• Developed a Conceptual Model and Logical Model using Erwin based on requirements analysis.

Environment: Hadoop, HDFS, HBase, SSIS, SSAS, OLAP, Hortonworks, Data Lake, OLTP, ETL, Java, ANSI-SQL, AWS, AWS Glue, SDLC, T-SQL, SAS, MySQL, Big Integrate, Sqoop, Cassandra, MongoDB, Hive, SQL, PL/SQL, Teradata, Oracle 11g, MDM
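
A hedged AWS Glue (PySpark) sketch of the S3-to-Redshift campaign load described in this project; the S3 paths, column mappings, catalog connection, and table names are illustrative assumptions rather than the project's actual values.

    import sys
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read campaign files from S3 (hypothetical bucket/prefix, Parquet format).
    source = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://example-bucket/campaigns/"]},
        format="parquet")

    # Map source columns onto the Redshift table layout (illustrative mapping).
    mapped = ApplyMapping.apply(
        frame=source,
        mappings=[("campaign_id", "string", "campaign_id", "varchar"),
                  ("spend", "double", "spend", "double")])

    # Write to Redshift through a Glue catalog connection (placeholder names).
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=mapped,
        catalog_connection="redshift-connection",
        connection_options={"dbtable": "public.campaigns", "database": "analytics"},
        redshift_tmp_dir="s3://example-bucket/tmp/")

    job.commit()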

Client: Ascena Retail Group, Patskala, Ohio

Role: Data Engineer

Mar 2017 – July 2019

Responsibilities:

• Worked on Amazon AWS concepts like EMR and EC2 web services for fast and efficient processing of Big Data.

• Generated parameterized queries for generating tabular reports using global variables, expressions, functions, and stored procedures using SSRS.

• Designed and Developed PL/SQL procedures, functions and packages to create Summary tables.

• Worked on Performance Tuning of the database which includes indexes, optimizing SQL Statements.

• Fixed load-balancing issues of DataStage jobs and database jobs on the server.

• Created data models for AWS Redshift and Hive from dimensional data models.

• Involved in selecting and integrating any Big Data tools and frameworks required to provide requested capabilities

• Developed Pig scripts and UDFs as per the business logic.

• Developed a new architecture for the project which uses less infrastructure and costs less, by converting the data load jobs to read directly from on premise data sources.

• Executed change management processes surrounding new releases of SAS functionality

• Prepared complex T-SQL queries, views and stored procedures to load data into staging area.

• Participated in data collection, data cleaning, data mining, developing models and visualizations.

• Worked with Sqoop to transfer data between HDFS and relational databases such as MySQL, and used Talend for the same purpose.

• Developed Spark and Java applications for data streaming and data transformation

• Designed DataStage ETL jobs to extract data from heterogeneous source systems, transform it, and load it into the data warehouse.

• As a Data Engineer, designed and modified database tables and used HBase queries to insert and fetch data from tables.

• Created Hive External tables to stage data and then move the data from Staging to main tables

• Created jobs and transformation in Pentaho Data Integration to generate reports and transfer data from HBase to RDBMS.

• Wrote DDL and DML statements for creating, altering tables and converting characters into numeric values.

• Worked on Master Data Management (MDM) Hub and interacted with multiple stakeholders.

• Worked on Kafka and Storm to ingest real-time data streams and push the data to HDFS or HBase as appropriate.

• Extensively involved in development and implementation of SSIS and SSAS applications.

• Developed a Conceptual Model and Logical Model using Erwin based on requirements analysis.

Environment: Hadoop, HDFS, HBase, SSIS, SSAS, OLAP, Hortonworks, Data Lake, OLTP, ETL, Java, ANSI-SQL, AWS, SDLC, T-SQL, SAS, MySQL, Big Integrate, Sqoop, Cassandra, MongoDB, Hive, SQL, PL/SQL, Teradata, Oracle 11g, MDM

Client: Grapesoft Solutions, Hyderabad, India

Role: Data Engineer

Aug 2015 – Dec 2016

Responsibilities:

• As a Data Engineer, analyzed and evaluated business rules, data sources, and data volumes, and produced estimation, planning, and execution plans to ensure the architecture met business requirements.

• Developed automated regression scripts in Python to validate ETL processes across multiple databases such as AWS Redshift, Oracle, MongoDB, and SQL Server (T-SQL).

• Automated data loading into the Hadoop Distributed File System using Oozie.

• Used Sqoop to channel data from different sources between HDFS and RDBMS.

• To meet specific business requirements, wrote UDFs in Scala and stored procedures. Replaced the existing MapReduce programs and Hive queries with Spark applications written in Scala.

• Created several types of data visualizations using Python and Tableau.

• Extracted large volumes of data from AWS using SQL queries to create reports.

• Performed regression testing for golden test cases from State (end-to-end test cases) and automated the process using Python scripts.

• Responsible for data cleansing from source systems using Ab Initio components such as Join, Dedup Sorted, Denormalize, Normalize, Reformat, Filter by Expression, and Rollup.

• Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis pertaining to Loan products.

• Gathered data from the help desk ticketing system and wrote ad-hoc reports, charts, and graphs for analysis.

• Worked to ensure high levels of Data consistency between diverse source systems including flat files, XML and SQL Database.

• Worked on SQL Server Integration Services (SSIS) to integrate and analyze data from multiple heterogeneous information sources.

• Built reports and report models using SSRS to enable end user report builder usage.

• Built S3 buckets, managed bucket policies, and used S3 and Glacier for storage and backup on AWS (see the sketch after this list).

• Worked with the Spark ecosystem using Spark SQL and Scala queries on different formats such as text and CSV files.

• Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.

• Used Hive to implement a data warehouse with data stored in HDFS on Hadoop clusters set up in AWS EMR.
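
A minimal boto3 sketch of the S3 bucket, bucket policy, and Glacier lifecycle work mentioned above; the bucket name, region, prefix, and transition period are placeholder assumptions.

    import json
    import boto3

    s3 = boto3.client("s3", region_name="us-east-1")  # placeholder region
    bucket = "example-backup-bucket"                   # hypothetical bucket

    # Create the bucket and attach a simple policy (illustrative statement).
    s3.create_bucket(Bucket=bucket)
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}}
        }]
    }
    s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

    # Lifecycle rule: move backup objects to Glacier after 30 days.
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [{
            "ID": "archive-to-glacier",
            "Status": "Enabled",
            "Filter": {"Prefix": "backups/"},
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}]
        }]})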

Environment: Apache Spark, Hadoop, Spark SQL, AWS, Java, Scala, MapReduce, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, Linux Shell Scripting.

Client: Ceequence Technologies, Hyderabad, India

Role: Data Analyst

Apr 2014 – Jul 2015

Responsibilities:

• Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.

• Data analysis: expertise in analyzing data using Pig scripting, Hive queries, Spark (Python), and Impala.

• Worked with developer teams on NiFi workflows to pick up data from a REST API server, the data lake, and an SFTP server and send it to Kafka.

• Configured Hadoop tools like Hive, Pig, Zookeeper, Flume, Impala and Sqoop.

• Wrote Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).

• Responsible for data extraction and data ingestion from different data sources into Hadoop Data Lake by creating ETL pipelines using Pig, and Hive.

• Built pipelines to move hashed and un-hashed data from XML files to Data Lake.

• Built large-scale data processing systems for data warehousing solutions and worked on unstructured data mining with NoSQL stores.

• Specified the cluster size, resource pool allocation, and Hadoop distribution by writing specifications in JSON format.

• Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment.

• Enhanced and optimized product Spark code to aggregate, group, and run data mining tasks using the Spark framework (see the sketch below).
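
A small PySpark sketch of the kind of aggregation and grouping optimization referred to above; the table, columns, and the repartition/cache tuning choices are illustrative assumptions, not the project's actual code.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("aggregation-tuning")
             .enableHiveSupport()
             .getOrCreate())

    # Hypothetical Hive source table of click events.
    events = spark.table("analytics.click_events")

    # Repartition on the grouping key so the shuffle is balanced, and cache
    # the intermediate result when several aggregations reuse it.
    by_user = events.repartition("user_id").cache()

    summary = (by_user
               .groupBy("user_id")
               .agg(F.count("*").alias("clicks"),
                    F.countDistinct("page_id").alias("distinct_pages"),
                    F.max("event_ts").alias("last_seen")))

    summary.write.mode("overwrite").saveAsTable("analytics.user_click_summary")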

Environment: Hadoop, Java, MapReduce, HBase, JSON, Spark JDBC, Hive, Pig.


