Data Engineer Quality

Location:
Phoenixville, PA
Posted:
April 10, 2024

Shiva Ram Kothapally

ad4wwy@r.postjobfree.com 484-***-****

SUMMARY

Data engineer with deep expertise in AWS Redshift, Athena, Glue (including Workflows and Crawlers), and Lambda, and a strong command of multi-cloud environments and DevOps tools such as Terraform, Ansible, Docker, and Kubernetes. Proficient in Spark, MapReduce, SQL, Python, and Scala, with a track record of designing and implementing highly scalable, performant data pipelines and a consistent focus on data quality and integrity. Adept at resolving complex issues and delivering practical solutions through strong analytical and problem-solving skills.

SKILLS

Big Data Technologies: Hadoop, Spark, Hive, MapReduce.

DevOps: Terraform, Ansible, Docker, Kubernetes, Jenkins.

AWS Services: S3, Athena, Glue (Crawlers, Workflows, ETL jobs), Lambda, CloudWatch.

Version control systems: Git, Gerrit.

IDE Tools: Visual Studio, PyCharm, NetBeans, Xcode, JupyterLab.

Languages/Scripts: Python, Scala, Spark (PySpark), CSS, Bootstrap, HTML, JavaScript, Node.js, React, Swift.

Databases: RDBMS (MySQL, Oracle, PostgreSQL, MS SQL Server), MongoDB, Redshift (columnar data warehouse), Elasticsearch.

Data visualization: Tableau, Power BI, Metabase.

EXPERIENCE

Data Engineer

SEI Investments Sept 2023 - Present, PA

Designed and developed a security framework providing fine-grained access to objects in AWS S3 using AWS Lambda and DynamoDB.
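
A minimal sketch of this pattern, assuming a hypothetical DynamoDB table (object_permissions) and bucket name: the Lambda checks the caller's entitlement before issuing a short-lived presigned S3 URL.

```python
# Sketch only: table, bucket, and attribute names are illustrative placeholders.
import boto3

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")
PERMISSIONS_TABLE = "object_permissions"   # hypothetical table name
DATA_BUCKET = "example-data-bucket"        # hypothetical bucket name


def lambda_handler(event, context):
    user_id = event["user_id"]
    object_key = event["object_key"]

    # Look up whether this user is entitled to the requested object.
    item = dynamodb.Table(PERMISSIONS_TABLE).get_item(
        Key={"user_id": user_id, "object_key": object_key}
    ).get("Item")
    if not item or not item.get("read_allowed"):
        return {"statusCode": 403, "body": "Access denied"}

    # Grant temporary, object-scoped access via a presigned URL.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": DATA_BUCKET, "Key": object_key},
        ExpiresIn=900,
    )
    return {"statusCode": 200, "body": url}
```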

Extracted data and loaded it into HDFS using Sqoop commands, and scheduled MapReduce jobs on Hadoop.

Implemented Lambda functions to configure the DynamoDB auto-scaling feature and built a data access layer for AWS DynamoDB data.

Loaded data into S3 buckets using AWS Glue and PySpark; filtered data stored in S3 using Elasticsearch and loaded it into Hive external tables.

Migrated existing databases from on-premises systems to AWS Redshift using various AWS services.

Developed the Spark code for AWS Glue Jobs and EMR.

Worked with different file formats (CSV, JSON, flat files, Parquet) to load data from source into raw tables, and implemented triggers to schedule pipelines.

Migrated an existing on-premises application to AWS, using services such as EC2 and S3 for data processing and storage.

Performed configuration, deployment, and support of cloud services in Amazon Web Services (AWS).

Created on-demand tables on S3 files using Lambda functions and AWS Glue with Python and PySpark.

Transformed data using AWS Glue dynamic frames with PySpark, catalogued the transformed data using crawlers, and scheduled the job and crawler using the Glue workflow feature.
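
A minimal Glue job sketch of this flow, assuming an illustrative catalog database (raw_db), table (events), and output path: it reads a dynamic frame from the catalog, applies a mapping, and writes Parquet back to S3 for a follow-up crawler to catalogue.

```python
# Sketch only: database, table, columns, and the S3 path are placeholders.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the source table registered by a crawler.
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="events"
)

# Rename/cast columns as the transformation step.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[("event_id", "string", "event_id", "string"),
              ("event_ts", "string", "event_time", "timestamp")],
)

# Write Parquet back to S3; a second crawler catalogs this output.
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/events/"},
    format="parquet",
)
job.commit()
```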

Used data integration tooling to manage data with speed and scalability using the Apache Spark engine and Databricks on AWS.

Designed, developed, and tested ETL processes in AWS Glue to migrate campaign data from external sources (S3, ORC/Parquet/text files) into AWS Redshift.

Designed, built, deployed, and operated much of the AWS stack (including EC2 and S3), focusing on high availability, fault tolerance, and auto-scaling.

Extracted structured data from multiple relational data sources as Data Frames in Spark SQL on Databricks.

Responsible for loading data from the internal server and the Snowflake data warehouse into S3 buckets.

Migrated large data sets to Databricks (Spark): created and administered clusters, configured data pipelines, and loaded data from Oracle into Databricks.

Created Databricks notebooks to streamline and curate data for various business use cases.

Worked on migrating an existing on-premises application to AWS Redshift, using AWS services such as EC2 and S3 for processing and storage.

Assisted with the development and review of technical and end-user documentation, including ETL workflows, research, and data analysis.

Scheduled all jobs using Airflow scripts written in Python, adding tasks to DAGs and Lambda triggers. Created database tables, indexes, constraints, and triggers for data integrity.
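
A minimal Airflow sketch of this setup, with an illustrative DAG id, schedule, and Lambda function name: one Python task stages data and a second invokes a Lambda via Boto3.

```python
# Sketch only: DAG id, schedule, and the Lambda function name are placeholders.
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_and_stage():
    # Placeholder for the extraction/staging logic run by the pipeline.
    print("staging data")


def invoke_cleanup_lambda():
    # Trigger a downstream Lambda function by name (hypothetical name).
    boto3.client("lambda").invoke(FunctionName="cleanup-fn")


with DAG(
    dag_id="nightly_load",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    stage = PythonOperator(task_id="stage", python_callable=extract_and_stage)
    cleanup = PythonOperator(task_id="cleanup", python_callable=invoke_cleanup_lambda)
    stage >> cleanup
```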

AWS Data Engineer/Software Developer

AMAZON June 2022 - July 2023, Virginia

Designed and set up an enterprise data lake to support various use cases, including storage, processing, analytics, and reporting of voluminous, rapidly changing data, using a range of AWS services.

Created Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics applications to capture and process streaming data and output it to S3, DynamoDB, and Redshift for storage and analysis.
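
A minimal producer sketch for this pipeline, assuming an illustrative stream name; Firehose and Analytics downstream of the stream handle delivery to S3, DynamoDB, and Redshift.

```python
# Sketch only: the stream name and event shape are illustrative placeholders.
import json

import boto3

kinesis = boto3.client("kinesis")


def publish_event(event: dict, stream_name: str = "clickstream-events"):
    # Partition by user so events for one user stay ordered within a shard.
    kinesis.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event.get("user_id", "anonymous")),
    )


publish_event({"user_id": 42, "action": "page_view"})
```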

Designed and developed ETL integration patterns using Python on Spark.

Developed a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs.

Wrote various data normalization jobs for new data ingested into Redshift.

Worked on optimizing volumes and EC2 instances, created multiple VPC instances, and used IAM to create new accounts, roles, and groups.

Implemented Spark RDD transformations to map business logic and applied actions on top of those transformations.

Built S3 buckets, managed bucket policies, and used S3 and Glacier for storage and backup on AWS.

Optimized and tuned the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics.

Integrated services like GitHub, AWS CodePipeline, Jenkins and AWS Elastic Beanstalk to create a deployment pipeline.

Created monitors, alarms, and notifications for EC2 hosts using CloudWatch. Implemented a build framework for new projects using Jenkins.

Designed and implemented ETL jobs for data processing using various AWS services, including Glue Crawlers for automatic schema discovery, S3 buckets for storage, and Athena for data transformation, which led to a 90% reduction in processing time for the application.
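
A minimal sketch of querying crawler-created tables with Athena via Boto3; the database, query, and results location are illustrative placeholders.

```python
# Sketch only: database, table, and output location are placeholders.
import time

import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="SELECT event_date, COUNT(*) AS events "
                "FROM events GROUP BY event_date",
    QueryExecutionContext={"Database": "raw_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
query_id = response["QueryExecutionId"]

# Poll until the query finishes, then fetch the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)[
        "QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```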

Extracted data from multiple source systems (S3, Redshift, RDS) and created multiple tables and databases in the Glue Data Catalog using Glue Crawlers.

Wrote code that optimizes the performance of AWS services used by application teams and provided code-level application security for clients (IAM roles, credentials, encryption, etc.).

Used Athena extensively to run queries on data processed by Glue ETL jobs, then used QuickSight to generate reports for business intelligence.

Converted and delivered 300M+ CSV records to 3.5M Parquet files using AWS Glue Jobs and Python scripts.
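
A minimal PySpark sketch of the CSV-to-Parquet conversion pattern; the bucket paths and the repartition count are illustrative, not the actual job settings.

```python
# Sketch only: paths and partition count are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv_to_parquet").getOrCreate()

df = (spark.read
      .option("header", "true")
      .csv("s3://example-raw-bucket/campaign/*.csv"))

# Repartitioning controls the number and size of the output Parquet files.
(df.repartition(200)
   .write
   .mode("overwrite")
   .parquet("s3://example-curated-bucket/campaign_parquet/"))
```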

Redesigned an architecture and changed the triggering of long-running ETL jobs from Lambda functions to Workflows.

Designed various dashboards for the Code Quality team to provide insight into developers' productivity based on the number of code reviews and the number of bugs logged, resolved, and reopened, using Metabase and Python.

Developed a generic framework using Scala and Bash scripts covering data ingestion, processing of different data formats (CSV, JSON, Parquet), and loading the data into Hive tables.

Chunked large data sets into smaller pieces using Python scripts to enable faster data processing.
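
A minimal sketch of the chunking idea using pandas, assuming an illustrative file name and chunk size, so a large extract never has to fit in memory at once.

```python
# Sketch only: the file path and chunk size are placeholders.
import pandas as pd

total_rows = 0
for chunk in pd.read_csv("large_extract.csv", chunksize=100_000):
    # Process each 100k-row slice independently (filter, aggregate, write out).
    total_rows += len(chunk)

print(f"processed {total_rows} rows in chunks")
```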

Designed and implemented Spark and Python scripts to fetch build data and test results from Jenkins and push them into AWS S3 and Redshift to enable faster reporting.

Implemented a serverless architecture using API Gateway, Lambda, and DynamoDB; deployed AWS Lambda code from Amazon S3 buckets and created a Lambda function configured to receive events from an S3 bucket.
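
A minimal sketch of the S3-triggered Lambda, assuming a hypothetical DynamoDB table for tracking uploads; the handler parses the standard S3 event notification payload.

```python
# Sketch only: the DynamoDB table name is a placeholder.
import boto3

table = boto3.resource("dynamodb").Table("uploaded_objects")


def lambda_handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Track the new object so downstream APIs can look it up.
        table.put_item(Item={"object_key": key, "bucket": bucket})
    return {"statusCode": 200}
```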

Utilized Python libraries such as Boto3 and NumPy for AWS work.

Created external tables with partitions using Hive, AWS Athena, and Redshift.

Responsible for logical and physical data modeling for various data sources on Redshift.

Experienced with event-driven and scheduled AWS Lambda functions to trigger various AWS resources.

Data Engineer/Software Developer

Quadrant Resources Jun 2018 - December 2020, Hyderabad, India

Maintained quality reference data at the source by performing cleaning and transformation operations and ensuring integrity in a relational environment, working closely with stakeholders and the solution architect.

Developed real-time streaming applications using PySpark, Apache Flink, Kafka, and Hive on a distributed Hadoop cluster.

Performed end-to-end architecture and implementation assessments of various AWS services, including Amazon EMR, Redshift, and S3.

Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.

Imported data from sources such as HDFS and HBase into Spark RDDs and performed computations using PySpark to generate output responses.

Created Lambda functions with Boto3 to deregister unused AMIs in all application regions to reduce EC2 costs.
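
A minimal sketch of the AMI-cleanup idea with Boto3: deregister self-owned images not referenced by any instance in the region. The region is illustrative, and a production job would also consider pagination, launch templates, and snapshot retention before deleting anything.

```python
# Sketch only: region and scope are illustrative; pagination is omitted.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# AMIs currently referenced by instances in this region.
in_use = {
    inst["ImageId"]
    for res in ec2.describe_instances()["Reservations"]
    for inst in res["Instances"]
}

# Deregister self-owned images that nothing is running from.
for image in ec2.describe_images(Owners=["self"])["Images"]:
    if image["ImageId"] not in in_use:
        ec2.deregister_image(ImageId=image["ImageId"])
```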

Imported and exported databases using SQL Server Integration Services (SSIS) and Data Transformation Services (DTS) packages.

Automated the nightly build to run quality control using Python with the Boto3 library to ensure the pipeline does not fail, reducing manual effort by 70%.

Developed Ansible playbooks with Python and SSH wrappers to manage AWS node configurations, and tested the playbooks on AWS instances.

Developed Python AWS serverless Lambda functions with concurrency and multi-threading to speed up processing by executing callables asynchronously.
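
A minimal sketch of the concurrency pattern inside a Lambda, fanning out I/O-bound S3 reads across a thread pool; the bucket name and event shape are illustrative.

```python
# Sketch only: bucket name and event payload are placeholders.
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client("s3")
BUCKET = "example-data-bucket"


def fetch(key: str) -> int:
    # I/O-bound work: download one object and report its size.
    body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    return len(body)


def lambda_handler(event, context):
    keys = event.get("keys", [])
    with ThreadPoolExecutor(max_workers=8) as pool:
        sizes = list(pool.map(fetch, keys))
    return {"bytes_read": sum(sizes)}
```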

Experienced in full software development life cycle (SDLC), architecting scalable platforms, and utilizing object-oriented programming, database design, and agile methodologies.

Proficient in Python and Django, developing web applications with views, templates, and dynamic web pages.

Proficient in version control systems like CVS, Git, and Bitbucket, deploying applications with Apache, GitHub, and Amazon EC2.

Skilled in web scraping, data parsing, and database integration.

Created web crawlers using the Requests and Beautiful Soup modules to extract data from different websites for report validation and generation.
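
A minimal sketch of the scraping pattern with Requests and Beautiful Soup; the URL and the table-cell selector are illustrative placeholders.

```python
# Sketch only: the URL and the elements scraped are placeholders.
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com/reports", timeout=30)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")

# Pull the text of every table cell for downstream report validation.
values = [cell.get_text(strip=True) for cell in soup.find_all("td")]
print(values[:10])
```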

Extensive experience working with RESTful web services and leveraging Python libraries including Requests, NumPy, SciPy, Matplotlib, and pandas for data analysis and visualization.

EDUCATION

Master’s in Computer Science

Northwest Missouri University • Maryville, MO • 2022

CERTIFICATIONS

Intensive Cloud Computing Hands-on Training

Introduction to Data Science by Cisco


