Venkata Satyanarayana Gadhamsetty
+1-903-***-**** ***********.********@*****.*** LinkedIn
PROFESSIONAL SUMMARY
Sr Data Engineer
Experienced Big Data Engineer skilled in architecting scalable data solutions and implementing AWS cloud services such as Lambda, S3, and Glue. Proficient in designing and delivering complex ETL and data integration projects with a strong background in the Hadoop ecosystem, including Spark and Kafka. Dedicated to leveraging expertise to optimize data workflows and enhance analytics capabilities.
TECHNICAL SKILLS
•Hadoop Ecosystems: Hadoop, MapReduce, YARN, Apache Kafka, Spark, Sqoop, Hive, Oozie, PySpark, Pig, HDFS, HBase
•Cloud Ecosystems: S3, Lambda, SNS, SQS, CloudWatch, DynamoDB, Glue, CloudFormation, IAM (Identity and Access Management), Terraform, Athena, ADF (Azure Data Factory), Databricks, Cosmos DB
•SDLC Methodologies: Waterfall, Agile/Scrum.
•Reporting Tools: Tableau, Power BI, T-SQL, SQL (Advanced).
EDUCATION
Texas A&M University-Commerce
Master's, Business Analytics
Vasavi College of Engineering, Osmania University
Bachelor's, Electronics and Communication Engineering
WORK EXPERIENCE
Rocket Mortgage, LLC Nov 2021 - Present
Big Data Engineer
•Designed data-intensive applications using AWS services like Lambda, Glue, API Gateway, and S3, enhancing data processing capabilities
•Maintained quality reference data by cleaning, transforming, and ensuring integrity in a relational environment, improving data accuracy and stakeholder satisfaction
•Created Glue ETLs to parse deeply nested JSON files and ingest them into the target S3 bucket, streamlining data processing (see the sketch after this role's bullets)
•Developed a generic, reusable ingestion pattern covering data ingestion, transformation, data quality checks, and failure notifications using AWS Glue, reducing development and testing effort by 70%
•Worked hands-on with data analytics services such as Athena and the Glue Data Catalog
•Wrote IAC (Infrastructure as Code) scripts using Terraform to create AWS resources and attach in-line policies across multiple environments, improving resource management efficiency
•Set up Spark UI for AWS Glue jobs to monitor job performance and memory utilization, enhancing job efficiency and resource allocation
•Orchestrated Glue ETLs, EMR jobs, and Lambda functions using Step Functions to efficiently invoke the EMR cluster, streamlining data processing workflows
•Developed and maintained ETL data pipelines to load on-premises data from sources like SQL Server and Kafka to AWS Data Lake using Attunity or Databricks, ensuring seamless data integration
•Created objects, stored procedures, and schedules on Redshift cluster through Flyway deployment tool, maintaining them to ensure data consistency and reliability
•Extracted, transformed, and loaded data from various sources in formats like JSON, CSV, and Parquet, using AWS compute services, improving data processing efficiency
•Created ETL jobs in AWS Glue to read data from multiple sources such as S3 and the Glue Data Catalog, apply business rules and data cleansing, and load the results into S3 in Parquet format
•Implemented Redshift Pre/Post actions in AWS Glue jobs to run stored procedures, enhancing data processing automation and efficiency
•Wrote and scheduled stored procedures on Redshift using Scheduled Queries, improving data management and query efficiency
•Developed a cross-account Lambda function to move files between AWS accounts based on S3 triggers, enhancing data transfer efficiency and security
•Created and managed SSIS packages and data warehousing objects like tables and stored procedures, implementing run managers to efficiently schedule jobs, improving data processing efficiency
•Implemented an end-to-end solution for hosting a REST API using API Gateway on AWS Cloud, integrating with S3 buckets and DynamoDB tables, which enhanced data accessibility and system performance
•Allotted permissions, policies, and roles to users and groups using AWS Identity and Access Management, ensuring secure and efficient access control
•Built Azure Data Pipelines to integrate data from multiple sources into a centralized data warehouse, utilizing ADF and Azure Databricks, which supported the development of ML models for demand forecasting
•Developed common utilities for use across various teams in Azure Databricks, enhancing collaboration and efficiency in Gold Applications
•Created Athena/Redshift views for external and internal data consumers by granting role-based access and maintaining data security/masking, enhancing data accessibility and security
•Implemented a framework to parse CSV and JSON files exported through DynamoDB and processed via Firehose, improving data handling efficiency
•Chunked data to convert larger datasets into smaller chunks using Python scripts, facilitating faster data processing
•Maintained Apache Iceberg tables by running compaction regularly and applying retention policies to remove outdated snapshots, ensuring data integrity and efficiency
•Worked with open table formats such as Apache Hudi and Apache Iceberg and queried them through Athena, enhancing data management capabilities
•Performed API testing using Postman for various request methods such as GET, POST, PUT, and DELETE to ensure accurate responses and effective error handling, improving system reliability
•Developed several PySpark scripts used in AWS Glue jobs to perform ETL processes with Auto Scaling, enhancing data processing efficiency
•Granted and revoked permissions on specific Glue Data Catalog databases and tables using the AWS CloudFormation service.
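The following is a minimal, hypothetical PySpark/Glue sketch of the nested-JSON-to-Parquet ingestion pattern referenced above; the bucket names and paths are placeholders, and the production jobs also applied business rules, data quality checks, and failure notifications.

    import sys
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read deeply nested JSON from the raw bucket into a DynamicFrame (placeholder path).
    source = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://example-raw-bucket/input/"]},
        format="json",
    )
    df = source.toDF()

    # Flatten one level of struct nesting; repeat for deeper structures as needed.
    flat_cols = [
        f.name + ".*" if f.dataType.typeName() == "struct" else f.name
        for f in df.schema.fields
    ]
    flat_df = df.select(*flat_cols)

    # Land the curated output in the target bucket as Parquet (placeholder path).
    flat_df.write.mode("overwrite").parquet("s3://example-curated-bucket/output/")
    job.commit()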
Quicken Loans LLC Sep 2020 - Nov 2021
Software Engineer/ Big Data Engineer
•Developed Python scripts to make API calls and process response data using Lambda functions, improving data transformation efficiency
•Designed and developed Amazon S3, SQS, SNS, and Lambda services, enhancing AWS infrastructure capabilities
•Ran scripts and code snippets using AWS Lambda in response to cloud events, ensuring efficient event-driven processing
•Integrated services like GitHub, CircleCI, and HAL to create a deployment pipeline, streamlining the deployment process
•Managed roles and permissions using AWS IAM, ensuring secure access control for users and resources
•Initiated alarms in CloudWatch to monitor server performance, improving CPU utilization and disk usage for enhanced system efficiency
•Wrote UNIX Shell scripts to automate jobs and scheduled Cron jobs using Crontab, reducing manual intervention and increasing process efficiency
•Built a generic API ingestion pattern to retrieve payloads from multiple APIs and ingest data into Athena tables, streamlining data processing and improving data accessibility
•Exported NoSQL DynamoDB streams using AWS Kinesis Firehose, maintaining near real-time data analytics with low latency for timely insights
•Designed and architected data migration of existing data engineering pipelines from SQL Server DW to AWS cloud using Attunity, enhancing data processing capabilities and scalability
•Utilized Athena and Glue Data Catalog for data analytics, enhancing data processing efficiency
•Wrote Infrastructure as Code scripts using Terraform to create AWS resources and attach in-line policies across multiple environments
•Developed checkpointing logic backed by DynamoDB tables for Lambda-based ingestion, improving data processing reliability (see the sketch after this role's bullets)
•Monitored AWS resources using CloudWatch logs, ensuring system performance and reliability
•Set up environments by copying files from S3, writing files to S3, and configuring Hive tables to point to S3 locations, streamlining data access and management
•Extracted, transformed, and loaded data from various sources in formats like JSON and databases, processing them using AWS compute services, which improved data accessibility and processing efficiency
•Developed Hive queries to create tables and views pointing to S3 locations and to pre-process data for business purposes.
•Developed and deployed AWS Lambda functions for ETL migration services, generating serverless data pipelines written to Glue Catalog and queried from Athena, enhancing data processing efficiency
•Created Oozie workflows to schedule Hive partition management strategies for syncing data in S3 and Hive, and developed Coordinators to automate these workflows, improving data synchronization and workflow efficiency
•Ingested data from the data lake to the data warehouse using SSIS packages, setting up connection managers from Hive to the data warehouse.
•Created stored procedures to remove duplicates and cleanse incoming data according to business requirements, ensuring accurate data loading into target tables
•Worked with server engineering and platform engineering teams to deploy AWS resources and SSIS packages to the respective production environments.
•Created SQS and Lambda triggers for incoming data from SNS and configured SQS queues to handle the incoming batch data.
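As a rough, hypothetical illustration of the API ingestion and DynamoDB checkpointing referenced above, a Lambda handler along these lines reads the last checkpoint, pulls newer records from the source API, lands them in S3, and only then advances the checkpoint; the table, bucket, endpoint, and field names are placeholders.

    import json
    import boto3
    import urllib3

    dynamodb = boto3.resource("dynamodb")
    s3 = boto3.client("s3")
    http = urllib3.PoolManager()

    CHECKPOINT_TABLE = "example-ingestion-checkpoints"  # placeholder table name
    TARGET_BUCKET = "example-landing-bucket"            # placeholder bucket name
    API_URL = "https://api.example.com/records"         # placeholder endpoint

    def handler(event, context):
        table = dynamodb.Table(CHECKPOINT_TABLE)

        # Fetch the last checkpoint for this feed; default to epoch on the first run.
        item = table.get_item(Key={"feed": "example-feed"}).get("Item", {})
        since = item.get("last_ts", "1970-01-01T00:00:00Z")

        # Call the source API for records newer than the checkpoint.
        resp = http.request("GET", API_URL, fields={"since": since})
        records = json.loads(resp.data.decode("utf-8"))

        if records:
            # Land the raw payload in S3, keyed by the newest record timestamp.
            newest = max(r["updated_at"] for r in records)
            key = f"raw/example-feed/{newest}.json"
            s3.put_object(Bucket=TARGET_BUCKET, Key=key, Body=json.dumps(records))

            # Advance the checkpoint only after the S3 write succeeds.
            table.put_item(Item={"feed": "example-feed", "last_ts": newest})

        return {"ingested": len(records)}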
Blue Cross Blue Shield (BCBS) Sep 2019 - Aug 2020
Software Engineer
•Developed Spark applications to perform data cleansing, transformations, and aggregations, improving data quality for downstream teams
•Created Sqoop scripts to import/export user profile data from RDBMS (DB2) to Azure Data Lake, enhancing data accessibility and integration
•Designed and implemented job scheduling using Shell scripting for data ingestion, processing, and storage on HDFS, streamlining data workflows
•Contributed to Agile methodologies by managing Scrum stories and sprints in a Python-based environment, improving team efficiency and project delivery
•Worked on data ingestion to Kafka and processed and stored data using Spark Streaming, enhancing real-time data processing capabilities
•Analyzed and fixed the root cause of technical issues and defects during development, improving system reliability
•Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala, enhancing data processing efficiency
•Troubleshot Spark applications to improve error tolerance, ensuring smoother data processing operations
•Developed Hive Queries in Spark-SQL for data analysis and processing, leading to more accurate data insights
•Used Scala programming to perform transformations and apply business logic, streamlining data workflows
•Built Kafka producers using the producer API to send live-stream data into various Kafka topics, improving data processing efficiency
•Created Hive tables and analyzed data using Hive scripts, enhancing data accessibility and insights
•Implemented partitioning and bucketing in Hive, optimizing query performance and data management
•Involved in continuous integration and deployment of applications using Jenkins and Urban Code Deployment, streamlining release processes
•Interacted with infrastructure, network, database, application, and BA teams to ensure data quality and availability, facilitating seamless operations
•Conducted daily data audits for incoming incremental data and monitored scheduled production processes, ensuring data integrity and process efficiency
•Collaborated with UNIX teams to deploy keytabs on multiple Edge nodes, enabling seamless server logon to Active Directory and improving system startup efficiency
•Executed Sqoop scripts on multiple Edge nodes to ingest data from RDBMS to Data Lake, enhancing data processing performance and scalability
•Automated pipeline workflows using ASG ZENA Scheduler to orchestrate MapReduce jobs, ensuring timely data extraction and improved workflow efficiency
•Developed PySpark scripts to ingest XML files from E-Gateway servers and CSV files into the Data Lake, performing audit checks to ensure data accuracy and reliability (see the sketch after this role's bullets)
•Maintained cluster by adding and removing nodes, monitoring and troubleshooting issues, and managing data backups and Hadoop log files, ensuring system stability and data integrity
•Conducted QA testing for end-to-end application testing before deployment, ensuring smooth transition to production servers
•Worked on Performance Tuning of Hadoop jobs by applying techniques such as Map Side Joins, Partitioning, and Bucketing.
•Created high-level and detailed design documents with architects, facilitating clear communication and successful project execution
•Executed change management activities for production deployment, supporting developers, quality control analysts, and environment management personnel, leading to efficient deployment processes
•Monitored and deployed all production activities using Terraform and AWS Lambda, ensuring seamless operations and minimizing downtime
•Worked with the Scrum Master and system analysts to create epics and user stories and to update JIRA dashboards each sprint.
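The CSV-ingestion-with-audit pattern referenced above can be sketched in PySpark roughly as follows; the paths are placeholders, and the production scripts also handled XML sources and additional validations.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("csv-ingest-audit").getOrCreate()

    source_path = "/data/incoming/customers.csv"   # placeholder landing path
    target_path = "/data/lake/customers_parquet"   # placeholder lake path

    # Read the incoming CSV with a header row and an inferred schema.
    incoming = (spark.read.option("header", "true")
                          .option("inferSchema", "true")
                          .csv(source_path))
    source_count = incoming.count()

    # Write the data to the lake in Parquet format.
    incoming.write.mode("overwrite").parquet(target_path)

    # Audit check: re-read the target and compare row counts before declaring success.
    written_count = spark.read.parquet(target_path).count()
    if written_count != source_count:
        raise RuntimeError(
            f"Audit check failed: wrote {written_count} rows, expected {source_count}"
        )
    print(f"Audit check passed: {written_count} rows loaded")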
Silicon Matrix Enduring Solutions Jan 2016 - Jun 2017
Hadoop Developer
•Built scalable distributed data solutions using Hadoop, enhancing data processing efficiency
•Imported data from various sources and performed transformations using MapReduce and Hive, improving data accessibility in HDFS
•Configured Sqoop jobs to import data from RDBMS into HDFS using Oozie workflows, streamlining data integration processes
•Loaded data from MySQL into HBase using Sqoop where necessary.
•Solved the small file problem using Sequence files processing in MapReduce, optimizing storage and processing efficiency
•Developed Kafka producers and consumers, HBase clients, Spark jobs, and Hadoop MapReduce jobs, enhancing data processing efficiency on HDFS and Hive (see the producer sketch after this role's bullets)
•Created a data pipeline integrating Kafka with Spark streaming applications using Scala, improving real-time data processing
•Implemented HBase coprocessors to notify the support team when data was inserted into HBase tables.
•Optimized MapReduce Jobs by implementing compression mechanisms, reducing data processing time and resource usage
•Performed clickstream analysis to understand customer behavior and used Flume for data ingestion, improving customer insights
•Tuned Hive and MapReduce jobs to enhance performance, resulting in faster data processing times
•Optimized data processing by using distributed cache for small datasets and implementing partitioning and bucketing in Hive, leading to improved query efficiency
•Developed scripts using Scala for data transformations, which streamlined data processing workflows
•Utilized various Big Data file formats like text, sequence, Avro, Parquet, and Snappy compression to enhance data storage efficiency
•Monitored and managed the Hadoop cluster using Cloudera Manager, ensuring system stability and performance
•Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
•Monitored, controlled, and reviewed code at each phase of QlikView projects, reducing inefficient coding methods and improving design quality
•Identified source data issues, resolved them using Tableau functions, and designed the data model using data sources such as Teradata and Excel sheets.
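For illustration only, a Kafka producer along the lines of the one referenced above might look like this Python sketch (written with the kafka-python library regardless of the original client language); the broker address and topic name are placeholders.

    import json
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="broker.example.com:9092",  # placeholder broker
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    def publish_event(event: dict) -> None:
        # Send each event to the clickstream topic and block until it is acknowledged.
        future = producer.send("example-clickstream", value=event)  # placeholder topic
        future.get(timeout=10)

    publish_event({"user_id": 42, "action": "page_view", "page": "/home"})
    producer.flush()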