
Data Engineer Business Intelligence

Location:
Santa Fe, NM, 87501
Posted:
February 23, 2024


Resume:

VEERANJANEYULU GANJI

********************@*****.***

+1-940-***-****

Data Engineer

Data Engineering professional with 4+ years of total IT experience, with expertise in data pipeline design, application development, and implementation.

Professional Summary:

•Expert in designing custom reports using data extraction and reporting tools and in developing algorithms based on business cases.

•Extensive experience in data modeling, data architecture, solution architecture, data warehousing and business intelligence concepts, and master data management (MDM) concepts.

•Hands-on experience in developing and deploying enterprise-based applications using major Hadoop ecosystem components like MapReduce, YARN, Hive, HBase, Flume, Sqoop, Spark SQL, Kafka.

•Excellent understanding of Hadoop architecture and underlying framework including storage management.

•Experience working with MapReduce concepts and techniques and leveraging Hive commands for faster data retrieval.

•Created Hive UDFs and ran complex HiveQL queries to retrieve the required data from Hive tables.

•Good understanding of data modeling (Dimensional & Relational) concepts like Star-Schema Modelling, Snowflake Schema Modelling, Fact and Dimension tables.

•Hands-on experience with Snowflake utilities, SnowSQL, Snowpipe, and big data modeling techniques using Python.

•Experience with the Snowflake cloud data warehouse and AWS S3 buckets for integrating data from multiple source systems, including loading JSON-formatted data into Snowflake tables (see the sketch after this list).

•Experience in designing star and snowflake schemas for data warehouse and ODS architectures. Creative skills in developing elegant solutions to challenges related to pipeline engineering.

•Domain experience supporting the SNAP, Medicaid, LIHEAP, LIWAP, MSP, and Cash assistance programs.

•Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.

•Experience in AWS cloud computing, including configuring and deploying instances and automating cloud environments.

•Experience with the AWS Cloud platform and its features, including Amazon AWS cloud administration and services such as EC2, S3, EBS, VPC, ELB, IAM, Glue, Crawler, Spectrum, SNS, Auto Scaling, Lambda, CloudWatch, CloudTrail, and CloudFormation.
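
The following is a minimal illustrative sketch, not taken from any specific project above, of the S3-to-Snowflake JSON load pattern described in this summary, using the Snowflake Python connector; the account, credentials, stage, and table names are hypothetical placeholders.

# Minimal illustrative sketch (hypothetical names): load JSON files staged in S3 into a Snowflake table.
import snowflake.connector

# Connection parameters are placeholders; real credentials would come from a secrets manager.
conn = snowflake.connector.connect(
    account="my_account",
    user="etl_user",
    password="***",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="RAW",
)

try:
    cur = conn.cursor()
    # Target table with a single VARIANT column for semi-structured JSON.
    cur.execute("CREATE TABLE IF NOT EXISTS CUSTOMER_EVENTS (record VARIANT)")
    # COPY from an external stage (hypothetical) that points at the S3 bucket.
    cur.execute(
        "COPY INTO CUSTOMER_EVENTS "
        "FROM @S3_EVENTS_STAGE "
        "FILE_FORMAT = (TYPE = 'JSON') "
        "ON_ERROR = 'CONTINUE'"
    )
finally:
    conn.close()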

Skills:

Data Technologies: MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Oozie, Flume, Kafka, ZooKeeper, YARN, Spark MLlib.

Databases: Oracle, MySQL, SQL Server, MongoDB, Cassandra, DynamoDB, PostgreSQL, Teradata, Cosmos.

Frameworks: Django, Flask, Hadoop, Apache Spark, BigQuery, Apigee.

Programming Languages: Python, PySpark, Scala, R, Shell Scripting, Java.

Web Technologies: CSS, HTML, XHTML, AJAX, XML, JSON, JavaScript

Web Services: AWS, GCP, Azure, Snowflake, Apache Tomcat, WebLogic.

Visualization/Reporting: Tableau, PowerBI, Looker, SSIS, SSRS, SSAS.

Development Tools: Databricks, R Studio, PyCharm, Jupyter Notebook.

Version Control: Git, GitHub, SVN, CVS

Methodologies: Agile (Scrum), Waterfall.

Experience:

Data Engineer, State of New Mexico, Santa Fe, NM April 2023 to Present

Responsibilities:

•Part of a data migration team whose goal is to transfer all data from an on-premises Oracle database to the AWS cloud platform.

•Working on data ingestion from an existing on-premises application to AWS, using AWS services such as Amazon Kinesis to process real-time data.

•Migrated data from AWS S3 bucket to Snowflake by writing a custom read/write Snowflake utility function using Scala.

•Also working on the SNAP, Medicaid, LIHEAP, LIWAP, MSP, and Cash assistance programs.

•Developed solutions leveraging ETL tools and identified opportunities for process improvements using a scheduling tool.

•Troubleshoot and maintain ETL/ELT jobs running in Matillion.

•Handle streaming data warehousing onto AWS S3 and Snowflake by adopting Apache Kafka, with Spark making the connections between them (illustrative streaming sketch below).

•Worked on data migration from Teradata to the Snowflake environment on AWS using Python and BI tools such as Alteryx.

•Scheduled all jobs using Airflow scripts written in Python, adding different tasks to DAGs (Directed Acyclic Graphs) along with Lambda invocations (illustrative Airflow sketch below).

•Created a Lambda Deployment function and configured it to receive events from the S3 bucket.

•Created data partitions on large data sets in S3 and DDL/DML on partitioned data.

•Designed and implemented an incremental job to read data from DB2 and load it into Hive tables, and connected Tableau through HiveServer2 to generate interactive reports.

•Performed data modeling and joined the data with other dimension (DIM) tables for Tableau reporting.

•Worked on a single view of the customer (MDM) and testing of MDM features.

•Implemented continuous integration & deployment (CI/CD) through Jenkins for Hadoop jobs and Managed Hadoop clusters using Cloudera.

•Using CloudWatch, created monitors, alarms, alerts, and logs for EC2 hosts, Glue jobs, and Lambda functions.
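
A minimal sketch, under assumed names, of the Kafka-to-S3 streaming pattern referenced above; the broker address, topic, and S3 paths are hypothetical, and the spark-sql-kafka connector package is assumed to be on the Spark classpath.

# Illustrative sketch (hypothetical broker, topic, and paths): stream Kafka events to S3 as Parquet.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka_to_s3_sketch").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
    .option("subscribe", "benefit-events")               # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
    .selectExpr("CAST(value AS STRING) AS payload", "timestamp")
)

query = (
    events.writeStream.format("parquet")
    .option("path", "s3a://example-bucket/streaming/benefit-events/")                # hypothetical path
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/benefit-events/")
    .trigger(processingTime="1 minute")
    .start()
)

query.awaitTermination()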

Environment: Spark, Glue, SQL, PostgreSQL, AWS S3, EC2, DynamoDB, Mongo, Hive, Pyspark, Snowflake, Matillion, Kafka, Redshift, HDFS, Flask, Alteryx, Lambda, Hadoop, Tableau, CloudWatch, Cloudera, CI/CD.
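
Below is a hedged sketch of the Airflow scheduling pattern mentioned above, with a single task that asynchronously invokes a hypothetical Lambda function via boto3; the DAG id, schedule, region, and function name are assumptions rather than the actual project configuration.

# Illustrative sketch (hypothetical DAG id, schedule, region, and function name):
# an Airflow DAG whose single task asynchronously invokes an ingestion Lambda.
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator


def trigger_lambda(**_context):
    # Fire-and-forget invocation of a hypothetical Lambda function.
    client = boto3.client("lambda", region_name="us-west-2")
    client.invoke(FunctionName="s3-ingest-handler", InvocationType="Event")


with DAG(
    dag_id="daily_s3_ingest",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="trigger_ingest_lambda",
        python_callable=trigger_lambda,
    )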

Data Engineer, Capital One, Plano, Texas Dec 2021 to April 2023

Responsibilities:

•Migrated data from Snowflake DB to Redshift to support Data Science initiatives.

•Primarily worked with Python and PySpark Lambda functions to create on-demand tables on S3 files and to set up steps triggered when files are modified.

•Filtered data stored in S3/Athena buckets using Elasticsearch and loaded the data into Hive external tables.

•Established JDBC connections between the database and Spark to perform PySpark transformations (illustrative JDBC sketch below).

•Used SparkContext, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.

•Used SQL for Querying the database in the UNIX environment.

•Worked on AWS RDS and Redshift for implementing models, and was involved in converting Hive/SQL queries and mapping business analysis into Spark transformations using Spark RDDs, Python, and Scala.

•Used the Spark SQL Scala and Python interfaces, which automatically convert RDDs of case classes to schema RDDs.

•Imported data from different sources such as HDFS and HBase into Spark RDDs and performed computations using PySpark to generate the output response.

•Integrated Apache Airflow with AWS to monitor multi-stage ML workflows with tasks running on Amazon SageMaker.

•Implemented AWS Step Functions to automate and orchestrate Amazon SageMaker tasks such as publishing data to S3 and Athena, training the ML model, and deploying it for prediction.

•Created AWS Lambda functions, provisioned EC2 instances in the AWS environment, implemented security groups, and administered Amazon VPCs.

•Redesigned views in Snowflake to improve performance and unit-tested the data between the star and snowflake schemas.

•Worked on snowflake schemas and data warehousing, and processed batch and streaming data load pipelines using Snowpipe and Matillion from the Advance Auto data lake (AWS S3, Athena).

•Developed automated regression scripts in Python to validate the ETL process between multiple databases such as AWS and SQL Server.

Environment: Spark, SSIS, Cloudera, SQL, AWS S3, EC2, Hive, Pyspark, Snowflake, Matillion, Kafka, Redshift, HDFS, Flask, Lambda, Hadoop, Alteryx, Talend, Tableau, CloudWatch, Cassandra, Kinesis.
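
A minimal PySpark sketch of the JDBC read-and-transform pattern referenced in this role; the JDBC URL, credentials, table, and column names are hypothetical placeholders rather than the actual pipeline.

# Illustrative sketch (hypothetical URL, credentials, table, and columns):
# read a table over JDBC into Spark and run a simple aggregation.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("jdbc_transform_sketch").getOrCreate()

accounts = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/warehouse")  # hypothetical URL
    .option("dbtable", "public.accounts")                       # hypothetical table
    .option("user", "spark_reader")
    .option("password", "***")                                  # supplied securely in practice
    .option("driver", "org.postgresql.Driver")
    .load()
)

# Example transformation: total balance of active accounts per state.
summary = (
    accounts.filter(F.col("status") == "ACTIVE")
    .groupBy("state")
    .agg(F.sum("balance").alias("total_balance"))
)

summary.write.mode("overwrite").parquet("s3a://example-bucket/summaries/")  # hypothetical output path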

Data Analyst, Bizzflo, Hyderabad, India December 2019 to August 2021

Responsibilities:

•Worked as a Data Analyst/Modeler to generate data models using SAP PowerDesigner and developed relational database systems.

•Developed the long-term data warehouse roadmap and architectures, and designed and built the data warehouse framework per the roadmap.

•Conducted user interviews, gathered requirements, and analyzed the requirements using Rational Rose and RequisitePro under RUP.

•Participated in JAD sessions, gathered information from Business Analysts, end users and other stakeholders to determine the requirements.

•Worked on performing transformations & actions on RDDs and Spark Streaming data with Scala.

•Used MS Visio and Rational Rose to represent system under development in a graphical form by defining use case diagrams, activity, and workflow diagrams.

•Created ER diagrams using Power Designer modeling tool for the relational and dimensional data modeling.

•Involved in creating the source-to-target data mapping document and performing data quality assessments on the source data.

•Responsible for data profiling and data quality checks to satisfy the report requirements (illustrative profiling sketch at the end of this section).

•Worked with data investigation, discovery, and mapping tools to scan every single data record from many sources.

•Developed and maintained a data dictionary to create metadata reports for technical and business purposes.

•Created SQL tables with referential integrity and developed queries using SQL, SQL*PLUS and PL/SQL, Transact-SQL.

•Designed both 3NF data models for ODS, OLTP systems and dimensional data models using star and snowflake Schemas.

•Involved in Normalization / De-normalization, Normal Form and database design methodology.

•Designed & Developed Oracle PL/SQL and Shell Scripts, Data Import/Export, Data Conversions and Data Cleansing.

•Handled performance requirements for databases in OLTP and OLAP models; used Excel sheets, flat files, and CSV files to generate ad-hoc Tableau reports; and generated metadata while designing OLTP and OLAP systems.

•Facilitated in developing testing procedures, test cases and User Acceptance Testing (UAT).

•Developed Data mapping, Data Governance, Transformation and Cleansing rules for the Data Management involving OLTP, ODS and OLAP.

•Produced reports using SQL Server Reporting Services (SSRS), creating various types of reports.

•Worked on Data analytics migration using Python and Big Data - HDFS & Hive.

•Worked on sort & filters of tableau like Basic Sorting, quick, context, condition, top and operations, created excel summary reports as well as gathered analytical data to develop functional requirements using data modeling and ETL tools.

Environment: SQL, SQL Server, PL/SQL, MS Visio, Rational Rose, SSIS, T-SQL, SSRS, SSAS, Teradata, HDFS, Hive, SAP PowerDesigner 16.6, OLTP, OLAP, ODS, Oracle Database, Lambda, R Studio, Python, Tableau.
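
To illustrate the kind of data profiling and quality checks performed in this role, here is a small, hypothetical Python sketch; the file path, column names, and rules are assumptions for illustration only.

# Illustrative sketch (hypothetical file, columns, and rules): basic data profiling and quality checks.
import pandas as pd

df = pd.read_csv("source_extract.csv")  # hypothetical source extract

# Profiling: null rates, distinct counts, and types per column.
profile = pd.DataFrame({
    "null_rate": df.isna().mean(),
    "distinct_values": df.nunique(),
    "dtype": df.dtypes.astype(str),
})
print(profile)

# Quality checks the downstream reports would depend on (all hypothetical rules).
checks = {
    "customer_id is unique": df["customer_id"].is_unique,
    "enrollment_date has no nulls": df["enrollment_date"].notna().all(),
    "benefit_amount is non-negative": (df["benefit_amount"] >= 0).all(),
}
failed = [name for name, passed in checks.items() if not passed]
if failed:
    print("Failed checks:", failed)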


