Rahul Maloth Data Engineer
**************@*****.*** 940-***-**** linkedin.com/in/rahulmaloth/
ABOUT ME
Experienced Data Engineer and Cloud Specialist with hands-on expertise in cloud platforms (AWS, Azure), big data technologies (Spark, Python, SQL), and end-to-end ETL development. Proficient in developing ETL pipelines, building data warehouses, and creating advanced analytics solutions. Skilled in database management, data modeling, and automation, with a track record of successful project delivery and team leadership.
PROFESSIONAL SUMMARY
●4+ years of experience as a Data Engineer spanning product/dashboard development, software development, and design using Python, SQL, Redshift, Spark, and AWS.
●Around one year of experience leading and mentoring data engineers on the team, helping them resolve code and database issues.
●Strong knowledge of the Spark ecosystem, including Spark Core, Spark SQL, and Spark Streaming.
●Experience with container-based deployments using Docker, including building Docker images and publishing them to Docker registries such as AWS ECR.
●Extensive experience with Python libraries for AWS workloads, including Boto3, Pandas, PySpark, Snowpark, and NumPy.
●Developed PySpark code for AWS Glue jobs and for EMR.
●Experience with AWS services including S3, EC2, Redshift, DynamoDB, SQS, SNS, Lambda, IAM, CloudWatch, and RDS, as well as orchestration and data-pipeline services such as AWS Step Functions, Data Pipeline, and Glue.
●Proficient in SQL databases (Redshift, MS SQL Server, MySQL, Oracle DB, PostgreSQL) and NoSQL databases (DynamoDB).
●Write SQL queries against Snowflake databases and build Snowsight dashboards; experienced with Materialized Views, Caching, Time Travel, and Fail-safe in Snowflake.
●Working knowledge of AWS CodePipeline, CloudFormation, Snowflake, BigQuery, and RDS.
●Good understanding of various ML and statistical models; developed POCs implementing them using scikit-learn and AWS SageMaker.
EDUCATION
University of North Texas, Texas Jan 2023 – May 2024
Master’s in Advanced Data Analytics Engineering - (GPA: 3.8/4.0)
●Relevant Courses: Object-Oriented Programming, Database Management, Agile Software Development, Data Engineering, Python
Indian Institute of Technology Kharagpur, India Jul 2016 – Apr 2020
Bachelor’s in Electronics Engineering - (GPA: 3.7/4.0)
●Relevant Courses: Data Structures, Design & Analysis of Algorithms, Design Patterns, DBMS
CERTIFICATION
●AWS Solutions Architect certification, April 2025.
●AWS Cloud Practitioner certification, July 2024.
TECHNICAL SKILLS
Operating Systems
Linux, Windows, and macOS.
Programming Language and Skills
Python 3.7 and 3.9 (snowflake-connector, Snowpark, Flask, Django, pytest, PySpark, gRPC, pandas, Matplotlib, Selenium, PyTorch, NumPy, scikit-learn, Seaborn, Requests, BeautifulSoup), Airflow, advanced SQL, C++, Linux shell scripting.
Databases
MySQL, SQLAlchemy, SQL Server, PostgreSQL, MongoDB, and Oracle.
Big Data
Redshift, Delta Lake, Spark (RDDs, MapReduce), Airflow, Kafka, PySpark.
Tools
DBT, Tableau Prep Builder 2022.1, Docker, Kubernetes, Jenkins, PyCharm, FileZilla, Spyder, VS Code, Jupyter Notebook, SnowSQL, Slack, Oracle, JIRA, Confluence, GitHub, Azure DevOps, and VersionOne.
CI/CD Tools
Docker, Jenkins, Kubernetes.
AWS
AWS Redshift, S3, EMR, RDS, Glue, Athena, Step Functions, Cloud9, EC2, SageMaker, Boto3 SDK, Lambda, CloudWatch, SNS, SQS, DynamoDB, CodePipeline, MWAA.
WORK EXPERIENCE
Role: Cloud Data Engineer
Client: HealthStream, Nashville Sep 2024 - Present
Job Description: At HealthStream, I built scalable ETL pipelines using AWS Glue and PySpark to process and integrate Salesforce data into Redshift and Athena. I automated workflows with Airflow, managed Databricks clusters for large-scale analytics, and triggered Glue jobs using AWS Lambda and S3 events. I also containerized applications with Docker and streamlined deployments using Kubernetes and Jenkins.
Responsibilities:
●Deployed and managed Spark clusters on AWS using Python scripts and performed ETL processes with Spark for data processing and validation.
●Built and maintained Docker container clusters orchestrated by Kubernetes, optimizing CI/CD pipelines for build, test, and deployment.
●Automated data ingestion and ETL workflows using Airflow DAGs, integrating Salesforce data into AWS Redshift, and created partitioned external tables using AWS Athena and Redshift.
●Designed and developed ETL pipelines with AWS Glue and used Athena for querying partitioned datasets from S3.
●Implemented Lambda functions to trigger AWS Glue jobs based on S3 events and set up monitoring using CloudWatch for Glue jobs and Lambda functions (a minimal sketch of this trigger pattern appears after this role's Environment line).
●Developed and managed NoSQL DynamoDB tables and SQL instances on AWS, ensuring efficient CRUD operations and optimized database performance.
●Created and maintained technical documentation for data pipelines, transformation logic, and system architecture using Confluence and internal wikis.
●Designed unit and regression testing strategies for ETL pipelines to ensure data quality and consistency across deployments.
●Designed and implemented distributed ETL pipelines using Python, AWS Lambda, and Amazon Redshift, processing 1M+ records daily; optimized the system architecture to enable high-speed computation and seamless processing of terabytes of data, reducing latency by 40%.
●Built scalable, fault-tolerant data storage systems with advanced indexing and querying capabilities, improving query performance by 35%.
●Developed data pipelines to process real-time and batch datasets, ensuring a 99.9% uptime and scalability to accommodate future data growth.
Environment: Python 3.x, PySpark, AWS Glue, AWS Lambda, AWS Redshift, AWS Athena, AWS S3, AWS CloudWatch, AWS EC2, AWS SNS/SQS, AWS DynamoDB, Docker, Docker Compose, Kubernetes, Airflow, SQL, CI/CD, Spark RDD, Shell, Pandas, Jenkins, JavaScript, MapReduce, GitHub, Linux.
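The following is a minimal, illustrative sketch of the S3-event-to-Glue trigger pattern referenced in the responsibilities above, not code from the project itself; the Glue job name and argument keys are hypothetical placeholders, and the handler assumes the standard S3 put-notification event shape.

```python
# Hypothetical sketch: Lambda handler that starts a Glue job when an object
# lands in S3. The job name and argument keys are illustrative placeholders.
import json
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    run_ids = []
    # A standard S3 notification delivers one or more Records, each naming
    # the bucket and object key that triggered the event.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Start the (hypothetical) Glue ETL job, passing the new object's
        # location through as job arguments.
        response = glue.start_job_run(
            JobName="salesforce_to_redshift_etl",  # placeholder job name
            Arguments={"--source_bucket": bucket, "--source_key": key},
        )
        run_ids.append(response["JobRunId"])
    return {"statusCode": 200, "body": json.dumps(run_ids)}
```

In this pattern, the Lambda's own logs land in CloudWatch automatically, and CloudWatch metrics and alarms on the Glue job runs cover the monitoring side described above.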
Role: Data Science Engineer
Client: Sterlite Technologies Ltd, Delhi, India Jan 2020 – Dec 2022
Job Description: At Sterlite Technologies, I played a crucial role in enhancing data processing and web development. I developed a user-friendly website interface using Python and Django, and created network mapping microservices deployed on Kubernetes. I built RESTful microservices with Python and Flask and developed serverless AWS Lambda functions to enhance processing speed and concurrency.
Responsibilities:
●Performed exploratory data analysis (EDA) using Python and SQL to investigate data anomalies, validate transformations, and support downstream analytics use cases.
●Collaborated with Data Scientists to develop feature-engineered datasets, supporting machine learning model training and deployment workflows.
●Delivered end-to-end Computer Vision-based solutions, including an Automatic Number Plate Recognition System (ANPR) with 95% detection accuracy and a Facial Recognition System (FRS) achieving 98% precision.
●Developed deep learning-based object detection models for vehicle identification, license plate detection, and optical character recognition (OCR), achieving high accuracy in text extraction from license plates.
●Experienced in developing RESTful microservices in Python with Flask.
●Hands-on experience with Amazon EC2, S3, RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, EMR, AWS CodePipeline, AWS EFS, and other services in the AWS family; developed Python serverless Lambda functions with concurrency and multi-threading to speed up processing and execute callables asynchronously.
●Wrote complex Python pandas code for data manipulation and migrated it from pandas to Koalas on PySpark (a minimal sketch of this migration appears after this role's Environment line).
●Evaluated emerging technologies such as DBT and Snowpark to assess feasibility for enhancing existing transformation and modeling frameworks.
●Engineered and deployed machine learning models and used state-of-the-art deep learning models for face detection, person identification, age, gender, and emotion analysis.
Environment: Python 3.x, Django, Flask, Shell scripting, Pandas, PySpark, PyQuery, PHP, HTML5, CSS3, MySQL, Hadoop, JSON, Jenkins, GitHub, Linux.
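As a minimal, illustrative sketch of the pandas-to-Koalas migration mentioned above (column names and data are invented for the example; Koalas requires a Spark runtime and has since been folded into pyspark.pandas):

```python
# Hypothetical sketch: the same column-level transformation expressed in
# single-node pandas and in distributed Koalas on Spark.
import pandas as pd
import databricks.koalas as ks  # merged into pyspark.pandas in Spark 3.2+

# Original pandas logic (illustrative columns).
pdf = pd.DataFrame({"plate": ["KA01AB1234", "DL05CD5678"],
                    "confidence": [0.97, 0.92]})
pdf["high_conf"] = pdf["confidence"] > 0.95

# Migrated logic: convert once, keep the same column operations, and let
# Spark execute them in a distributed fashion.
kdf = ks.from_pandas(pdf)
kdf["high_conf"] = kdf["confidence"] > 0.95
print(kdf[kdf["high_conf"]].to_pandas())
```

The appeal of this approach is that the pandas-style API carries over largely unchanged, so the migration is mostly a matter of swapping the DataFrame type rather than rewriting the transformation logic.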