
Data Warehouse Engineer

Location:
Dallas, TX
Posted:
September 18, 2023

Vineeth Goshika

E-mail: adzsja@r.postjobfree.com Ph: +1-234-***-****

EDUCATION

Kent State University, Kent, OH

Master's in Computational Data Science

JB Institute of Engineering and Technology, Hyderabad, Telangana, India

Bachelor of Technology in Computer Science

SKILLS

Programming Languages: Python, PySpark, Java

ETL Tools: Informatica, ODI (11g/12c)

Cloud Platforms: AWS, Azure, Snowflake

Web Technologies: HTML, CSS, XML, JavaScript

Databases: MySQL, Oracle 11g/12c, Snowflake, Redshift

Version Control: GitHub, Bitbucket

Visualization: Qlik Sense, Tableau, Power BI

WORK EXPERIENCE

Data Engineer

Webster Bank, Atlanta, GA Mar 2023 - Present

•Perform complex transformations on data from different sources in AWS Redshift and unload the result sets into a Hive/Presto stage built on the AWS S3 data lake.

•Involved in requirements analysis and preparation of functional design documents.

•Design and develop ETL processes in AWS Glue to migrate campaign data from external sources such as S3 Parquet/text files into AWS Redshift.

•Created external tables with partitions using AWS Athena and Redshift, and developed a staging mechanism to store external S3 bucket files.

•Developed business logic in the semantic layer by creating views in AWS Redshift to provide visibility into transformation logic.

•Create and load staging tables based on the logic defined in views, using distribution styles and sort keys for optimal performance.

•Experience in data migration management and in implementing MySQL server security and object permissions, including maintaining database authentication modes, creating users, configuring permissions, and assigning roles to users.

•Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and ZooKeeper.

•Developed frontend and backend modules using Python and Django, including the Tastypie web framework, with Git for version control.

•Communicate with peers and supervisors routinely, document work, meetings, and decisions.

•Perform tuning of AWS Redshift SQL queries by effectively using appropriate distribution styles and keys.

•Created a Lambda function to run an AWS Glue job in response to AWS S3 events (see the sketch after this list).

•Experience developing data visualizations using Microsoft Power BI and Tableau.

•Created alarms, monitors, notifications, and logs for Lambda functions and Glue jobs using CloudWatch.

•Built Python programs to extract data from AWS S3 and load it into SQL Server for one of the business teams.

•Worked on big data technologies (HBase, Hive, MapR, Pig, and Talend); implemented Spark using Scala and Spark SQL for faster testing and processing of data.

•Used Eclipse as IDE tool to develop the application and JIRA for bug and issue tracking.

•Worked on the backend using Scala and Spark to implement several aggregation logics; implemented Hive-HBase integration by creating Hive external tables with the HBase storage handler.

•Created business views using Denodo VDP and scheduled jobs using the ActiveBatch scheduler.

•Implemented Snowpipe for real-time data ingestion and built solutions using Snowflake's data sharing, cloning, and Time Travel features.

•Implemented a one-time migration of multi-state-level data from SQL Server to Snowflake using Python and SnowSQL.

•Worked on software development in Python using pandas DataFrames, NumPy, SciPy, and Matplotlib.

•Performed data transformations on large datasets using pandas DataFrames and scikit-learn algorithms.

•Developed ETL pipelines into and out of the data warehouse and built major financial and regulatory reports using advanced SQL queries in Snowflake.
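
For illustration, below is a minimal sketch of the S3-event-driven Glue trigger referenced above. The Glue job name, argument key, and event handling are hypothetical placeholders, not details from this project.

    # Minimal sketch: Lambda handler that starts a Glue job for each new S3 object.
    # The job name and argument key below are hypothetical placeholders.
    import urllib.parse
    import boto3

    glue = boto3.client("glue")

    def lambda_handler(event, context):
        runs = []
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            # Pass the new object's location to the Glue job as a script argument.
            response = glue.start_job_run(
                JobName="campaign-data-load",                 # hypothetical job name
                Arguments={"--source_path": f"s3://{bucket}/{key}"},
            )
            runs.append(response["JobRunId"])
        return {"started_runs": runs}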

Software Development Engineer-Big Data

Amazon Web Services Dec 2021 - Feb 2023

•Participating in and supporting software development projects across various phases of the software development lifecycle, including design, development, testing, implementation, and maintenance, for enterprise-level, multi-tiered, distributed software solutions.

•Developed automation and processes that enable teams to deploy, manage, configure, scale, and monitor applications using Python and PySpark in the AWS cloud.

•Developed back-end web services using Python and the Flask REST framework with JWT authentication.

•Experience working with GraphQL APIs via client-side JavaScript and server-side Node.js. Developed web applications using React.js and jQuery, and used frameworks such as Bootstrap and Angular.

•Built ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake, writing SQL queries against Snowflake.

•Experience in handling, configuration, and administration of databases like MySQL and NoSQL databases like MongoDB and Cassandra.

•Implemented end-to-end data ingestion, transformation, and storage solutions using Delta Lake, ensuring data integrity, reliability, and scalability.

•Created a PySpark DataFrame pipeline to bring data from DB2 to Amazon S3.

•Designed and implemented data lake architectures using Delta Lake on Databricks, enabling efficient data storage and access.

•Created Spark Streaming jobs in Python to read messages from Kafka and download JSON files from AWS S3 buckets (see the sketch after this list).

•Developed Data ingestion modules using AWS Glue, AWS Step Functions and Python Modules.

•Developed a data warehouse model in Snowflake for over 100 datasets using WhereScape.

•Designed the front end and back end of the application using Python on the Django web framework; developed buyer-facing features and applications using Python and Django with test-driven development and pair programming.

•Wrote and executed various MySQL database queries from Python using the Python MySQL connector and MySQL database package.

•Wrote Python scripts integrating Boto3 to supplement automation provided by Ansible and Terraform, for tasks such as scheduling Lambda functions for routine AWS work.

•Involved in developing interconnected Python microservices in the AWS cloud, and in consuming and building both SOAP and RESTful web services.

•Worked on data cleaning to ensure data quality, consistency, and integrity using pandas and NumPy.

•Developed data ingestion modules that load data into various layers in S3, Redshift, and Snowflake using AWS Kinesis, AWS Lambda, AWS Glue, and AWS Step Functions.

•Understanding of IAM auditing and compliance, including monitoring IAM logs and setting up compliance reports.

•Experience with the Snowflake cloud data warehouse and AWS S3 buckets for integrating data from multiple source systems, including loading nested JSON data into Snowflake tables.

•Performed SQL queries using AWS Athena to analyze data in AWS S3 buckets.

•Ability to troubleshoot IAM issues and provide technical support to users and stakeholders.

•Developed a chatbot/virtual assistant using NLTK's natural language understanding and generation tools.

•Developed automated regression scripts in Python for validation of ETL processes between multiple databases such as AWS Redshift, Oracle, MongoDB, and SQL Server (T-SQL).

•Generated graphical reports using the Python packages NumPy and Matplotlib.

•Knowledge and usage of open-source machine learning frameworks such as TensorFlow and other Python ML libraries.

•Utilized OOP to model complex systems and data structures, allowing for easier debugging and maintenance.

•Utilized Apache Spark with Python to develop and execute big data analytics and machine learning applications; executed machine learning use cases with Spark ML and MLlib.

•Created DataFrames with defined schemas from raw data stored in Amazon S3 using PySpark in Lambda functions.

•Designed and maintained databases using Python and developed a Python-based RESTful API using Flask, SQLAlchemy, and PostgreSQL.

•Developed Python microservices with the Flask framework for Confidential & Confidential internal web applications.

•Added support for Amazon AWS S3 and RDS to host static/media files and the database into Amazon Cloud.

•Designed and implemented serverless architectures using AWS Lambda and Scala, improving system scalability and reducing operational overhead; optimized EC2 instances to ensure maximum performance and cost efficiency.

•Configured and deployed EMR clusters for batch processing of large datasets using features such as spot instances, custom AMIs, and persistent storage; handled infrastructure provisioning and configuration management to scale resources on EC2 and EMR.

•Designed and developed database solutions using both column-store and row-store tables, applying data modeling standards and optimizing table performance through appropriate data types, keys, and sharding strategies.

•Implemented a data flow from raw data in S3 to organized and structured data using Python in Lambda functions.
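
As a rough illustration of the Kafka-to-S3 streaming work noted above, the sketch below uses PySpark Structured Streaming; the broker address, topic name, message schema, and S3 paths are hypothetical placeholders rather than values from the actual project.

    # Illustrative PySpark Structured Streaming job: Kafka topic -> JSON files on S3.
    # Broker, topic, schema, and S3 paths are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType

    spark = SparkSession.builder.appName("kafka-to-s3-sketch").getOrCreate()

    # Assumed shape of each Kafka message for this example.
    schema = StructType([
        StructField("event_id", StringType()),
        StructField("payload", StringType()),
    ])

    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
        .option("subscribe", "events-topic")                  # placeholder topic
        .option("startingOffsets", "latest")
        .load()
    )

    # Kafka values arrive as bytes; cast to string and parse the JSON payload.
    parsed = (
        raw.select(from_json(col("value").cast("string"), schema).alias("msg"))
           .select("msg.*")
    )

    query = (
        parsed.writeStream.format("json")
        .option("path", "s3a://example-bucket/events/")                     # placeholder output
        .option("checkpointLocation", "s3a://example-bucket/checkpoints/")  # placeholder checkpoint
        .start()
    )
    query.awaitTermination()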

Data Engineer

Health Care Service Corporation, Chicago, IL Feb 2021 - Dec 2021

•Knowledge of building large-scale data pipelines in distributed environments such as Apache Spark, HBase, Hive, Kafka, Azure, and AWS, and with integration tools such as MuleSoft.

•Proficiency in the design, development, and implementation of big data technologies in the healthcare space.

•Developed a framework for converting existing PowerCenter mappings to PySpark jobs and provided guidance to the development team working on PySpark as the ETL platform.

•Designed and developed ETL integration patterns using Scala on Spark and created a PySpark DataFrame pipeline to bring data from DB2 to AWS S3.

•Participated in team activities with the Strategy and Financial teams, such as design discussions, stand-up meetings, and design and code reviews.

•Built ETL jobs using PySpark with Jupyter Notebooks on an on-premises cluster for specific transformation needs, with HDFS as the data storage system.

•Leveraged Hive Query Language to perform data analysis and profiling and to compute performance metrics on data.

•Created software repositories relying on Python, Scala, and UNIX shell scripting to build Spark data ingestion mappings that contain data transformations, data quality procedures, and verification checks.

•Enhanced Spark ingestion methodologies to increase performance and enable faster data processing through both code- and infrastructure-side changes.

•Defined behavior-driven tests for post-commit validations and published coverage reports on GitHub; designed formal tests, unit test cases, functional tests, smoke tests, and regression tests for spark-submit jobs.

•Analyzed and created business models, logical specifications, and user requirements to deliver solutions for the application environment.

•Created software repositories relying on Python, PySpark, and UNIX shell scripting to build Spark data ingestion mappings that contain data transformations, data quality procedures, and verification checks (see the sketch after this list).

•Created ETL pipelines and triggers in Azure Data Factory; configured events to perform change data capture or bulk loads into storage layers such as Raw and Curated, and created date partitions based on file names for loading through Azure Data Factory.

•Worked on Spark applications using Spark SQL in Databricks for extraction, transformation, and aggregation of data from various file formats, analyzing and transforming the data to reveal consumer usage trends.

•Used a combination of Azure Data Factory, Spark SQL, and U-SQL (Azure Data Lake Analytics) to extract, transform, and load data from source systems to Azure storage services; processed data in Azure Databricks after it was ingested into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, and Azure DW).

•Built Spark applications using PySpark and Spark SQL to extract, convert, and aggregate data from various file formats in order to analyze and transform the data and learn more about user behavior.

•Optimized table performance through effective indexing, including creating and managing indexes and keys to minimize data skewness and improve query performance.

•Designed, implemented, and maintained stored procedures, including handling exceptions and ensuring database integrity.

•Developed and maintained ETL processes to extract, transform, and load data from various sources into databases.

•Mentored the team on monitoring and debugging the Spark Databricks cluster and predicting cluster size.

•Capability to conduct data manipulation within a Spark session using the Spark DataFrame API; solid grasp of the Spark architecture, including the execution hierarchy, fault tolerance, DataFrames, Spark SQL, driver and worker nodes, stages, executors, and tasks.

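The sketch below is a simplified, hypothetical example of the kind of PySpark ingestion mapping with data-quality and verification checks described above; the paths, column names, and rejection threshold are invented for illustration.

    # Hypothetical PySpark ingestion mapping: read raw files, apply basic
    # data-quality checks, and write validated records to HDFS.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("ingestion-mapping-sketch").getOrCreate()

    # Placeholder input location.
    raw = spark.read.option("header", True).csv("hdfs:///raw/claims/")

    # Transformation: normalize a key column and drop exact duplicates.
    cleaned = raw.withColumn("member_id", col("member_id").cast("string")).dropDuplicates()

    # Data-quality procedure: reject rows missing required keys.
    valid = cleaned.filter(col("member_id").isNotNull() & col("claim_id").isNotNull())
    rejected = cleaned.subtract(valid)

    # Verification check: fail the job if too many rows were rejected.
    total, bad = cleaned.count(), rejected.count()
    if total > 0 and bad / total > 0.05:   # 5% threshold is an arbitrary example
        raise ValueError(f"Data quality check failed: {bad}/{total} rows rejected")

    # Placeholder curated output location.
    valid.write.mode("overwrite").parquet("hdfs:///curated/claims/")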

Big Data Developer/Data Engineer Dec 2019 - Dec 2020

Credit Acceptance, Michigan.

•Created various stages like internal/external stages to load data from various sources like AWS S3 and on-premises systems.

•Responsible for all activities related to the development, implementation, administration, and support of ETL processes for large scale Snowflake cloud data warehouse.

•Developed stored procedures/views in Snowflake and used Talend for loading dimensions and facts.

•Bulk loaded data from an external AWS S3 stage to an internal Snowflake stage using the COPY command (see the sketch after this list).

•Developed various views using window functions such as row_number, rank, and dense_rank.

•Used COPY, LIST, PUT and GET commands for validating internal and external stage files.

•Writing complex Snowflake SQL scripts in Snowflake cloud data warehouse for Business Analysis and reporting.

•Utilized AWS DataSync to transfer large amounts of data between on-premises storage and S3.

•Designed and implemented S3 bucket policies to ensure proper access controls and security, utilized S3's versioning feature to maintain a historical record of changes to objects.

•Configured DataSync to run in a fully automated and scalable manner.

•Designed and developed ETL integration patterns using Python on Spark and created a PySpark DataFrame pipeline to bring data from Snowflake to AWS S3.

•Validated data from SQL Server against Snowflake to ensure an exact match, and implemented Change Data Capture in Talend to load deltas to the data warehouse.

•Design and Develop ETL Processes in AWS Glue to migrate Campaign data from external sources like S3, ORC/Parquet/Text Files into AWS Redshift.

•Data Extraction, aggregations, and consolidation of Adobe data within AWS Glue using PySpark.

•Created external tables with partitions using Hive, AWS Athena, and Redshift.

•Designed External and Managed tables in Hive and processed data to the HDFS using Sqoop.

•Created user-defined functions (UDFs) in Redshift.

•Migrated Adobe Marketing Campaign data from Oracle into HDFS using Hive, Pig, and Sqoop; defined virtual warehouse sizing in Snowflake for different types of workloads and built the logical and physical data models for Snowflake as per the required changes.

•Used UNIX for automated job scheduling; involved in unit testing of newly created PL/SQL code blocks.

•Worked on the development of self-services for common database services, providing easy-to-use and automated solutions for the team.

•Strong understanding of containerization technologies such as Docker and Kubernetes, and experience deploying containerized applications to production environments.

•Created and maintained ETL jobs using AWS Glue to transform data from various sources.
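
For reference, a minimal sketch of the S3-to-Snowflake bulk load pattern mentioned above, using the Snowflake Python connector. The connection parameters, external stage, target table, and file format are hypothetical placeholders; the actual loads may equally have been run through SnowSQL or Talend.

    # Illustrative bulk load from an external S3 stage into a Snowflake table
    # using COPY INTO. All connection details and object names are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="example_account",     # placeholder account identifier
        user="example_user",
        password="example_password",
        warehouse="LOAD_WH",
        database="ANALYTICS",
        schema="STAGING",
    )

    copy_sql = """
        COPY INTO staging.campaign_events            -- placeholder target table
        FROM @ext_s3_stage/campaign/                 -- placeholder external stage over S3
        FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
        ON_ERROR = 'ABORT_STATEMENT'
    """

    try:
        cur = conn.cursor()
        cur.execute(copy_sql)
        for row in cur.fetchall():   # COPY INTO returns one status row per loaded file
            print(row)
    finally:
        conn.close()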

Associate Engineer Mar 2018 - Aug 2019

IBM India Private Limited, India.

•Developed views and templates with Python and Django's view controller and templating language to create a user-friendly website interface.

•Handled all the client-side validation using JavaScript.

•Rewrote existing Python/Django modules to deliver data in specific formats.

•Used the Subversion version control tool to coordinate team development.

•Used Django Database APIs to access database objects.

•Wrote constraints, indexes, views, stored procedures, cursors, triggers, and user-defined functions.

•Used jQuery for all client-side JavaScript manipulation.

•Designed and developed data management system using MySQL.

•Developed entire frontend and backend modules using Python on Django Web Framework.

•Created unit test/regression test framework for working/new code.

•Developed Spark code using Python for faster processing of data on Hive (Hadoop), and developed MapReduce jobs in Python for data cleaning and data processing.

•Used different types of transformations and actions in Apache Spark.

•Experience in writing custom User Defined Functions (UDF) in Python for Hadoop (Hive and Pig).

•Used a Spark cluster to manipulate RDDs (resilient distributed datasets) and applied concepts of RDD partitioning.

•Connected to the MySQL database through the Spark driver (see the sketch below).
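
A small, hypothetical sketch of the RDD manipulation and MySQL-through-Spark connectivity described in the last few bullets; the JDBC URL, table, and credentials are placeholders, and the MySQL JDBC driver jar is assumed to be on the Spark classpath.

    # Hypothetical sketch: load a MySQL table through Spark's JDBC data source,
    # then work with the underlying RDD using transformations, actions, and partitions.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("mysql-rdd-sketch").getOrCreate()

    orders = (
        spark.read.format("jdbc")
        .option("url", "jdbc:mysql://localhost:3306/shop")   # placeholder URL
        .option("dbtable", "orders")                          # placeholder table
        .option("user", "report_user")
        .option("password", "example_password")
        .option("driver", "com.mysql.cj.jdbc.Driver")
        .load()
    )

    # Drop to the RDD API: map/filter are lazy transformations; sum() is an action.
    amounts = orders.rdd.map(lambda row: float(row["amount"])).filter(lambda x: x > 0)

    # Control parallelism explicitly via RDD partitions.
    amounts = amounts.repartition(8)
    print("partitions:", amounts.getNumPartitions())
    print("total:", amounts.sum())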


