Devasena Malli
Email: *********@*****.*** Ph: +1-341-***-**** LinkedIn: devasenamalli
PROFESSIONAL SUMMARY:
• Data Engineer with 6+ years of experience designing, developing, and maintaining scalable data pipelines and architectures using AWS services, Python, PySpark, and Apache Kafka.
• Proficient in Python, with expertise in object-oriented programming and in designing reusable, modular classes.
• Experience with Apache Kafka, including setting up consumers, producers, and S3 sinks for streaming data pipelines, and a good understanding of Amazon MSK for scalable and secure Kafka deployments.
• Designed and implemented real-time and batch ETL data pipelines using AWS Glue, Lambda, and Step Functions, handling both structured and semi-structured data.
• Skilled in creating efficient data ingestion workflows, implementing incremental loads and data quality checks, and integrating streaming data sources with data lakes and data warehouses (a pattern sketched after this summary).
• Deep understanding of modern data lake architectures using AWS S3, optimized for raw and curated data storage, and downstream analytics.
• Expertise in PySpark for querying, transforming, and optimizing data in Redshift, Snowflake, and Athena, ensuring high performance and reliability.
• Experienced in containerizing workflows with Docker and deploying them on EC2 for scalable, fault-tolerant execution.
• Built and maintained interactive dashboards with Amazon QuickSight to support data-driven decision-making for business stakeholders.
• Adept at securing AWS environments with IAM roles and policies, enforcing best practices in data security and compliance.
• Proficient in monitoring and alerting data workflows using AWS CloudWatch and other tools to ensure operational excellence and high availability.
• Developed robust web applications with Django, focusing on backend logic, API integrations, and secure user authentication to support data platform interfaces and internal tools.
• Experienced with Git for version control, collaboration, and supporting Agile/Scrum workflows.
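The sketch below illustrates the incremental-load and data-quality pattern referenced in the summary. It is a minimal example, not code from a specific engagement: the S3 paths, watermark value, and column names are illustrative assumptions.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("incremental_load_sketch").getOrCreate()

# Hypothetical paths and watermark; real jobs would read these from a config store.
RAW_PATH = "s3://example-bucket/raw/orders/"
CURATED_PATH = "s3://example-bucket/curated/orders/"
LAST_WATERMARK = "2023-01-01 00:00:00"

# Incremental load: pick up only records newer than the last processed watermark.
raw_df = spark.read.parquet(RAW_PATH)
incremental_df = raw_df.filter(F.col("last_updated") > F.lit(LAST_WATERMARK))

# Basic data-quality checks: drop rows missing keys and remove duplicate records.
clean_df = (incremental_df
            .dropna(subset=["order_id", "customer_id"])
            .dropDuplicates(["order_id"]))

rejected = incremental_df.count() - clean_df.count()
print(f"Rows rejected by quality checks: {rejected}")

# Write curated output partitioned by load date for efficient downstream queries.
(clean_df.withColumn("load_date", F.current_date())
         .write.mode("append").partitionBy("load_date").parquet(CURATED_PATH))
```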
TECHNICAL SKILLS:
Programming: Python, PySpark, SQL, Shell Script
Cloud: Amazon Web Services
Data Orchestration: AWS Glue, Step Functions, EMR
Data Warehousing: Redshift, Snowflake
Databases: MySQL, PostgreSQL, MongoDB, DynamoDB
BI/Visualization: Tableau, Power BI, QuickSight, Splunk
Analysis Tools: SQL, Pandas, JIRA, MS Excel
Big Data: Spark, Kafka
DevOps & CI/CD: Docker, Git, GitHub, GitLab, Jenkins, Postman
PROFESSIONAL EXPERIENCE:
Client: Advent HealthCare – Altamonte Springs, FL (Remote) Nov 2023 – Present
Role: AWS Data Engineer
Responsibilities:
● Developed and maintained scalable data pipelines to extract, transform, and load (ETL) data from multiple sources into Amazon S3, ensuring data integrity and availability for downstream analytics.
● Designed and managed IAM roles and policies to enforce security best practices, ensuring least-privilege access across AWS services.
● Built complex data transformations in PySpark, improving the efficiency and scalability of large-scale data processing workflows.
● Utilized Python in Spark applications to optimize performance for high-volume ETL tasks, reducing job execution time by 30%.
● Monitored and optimized workflows using AWS CloudWatch, improving job reliability and ensuring high availability across Glue and Lambda jobs.
● Developed interactive dashboards in Amazon QuickSight, enabling business teams to visualize key KPIs and drive data-informed decisions.
● Integrated AWS Glue Schema Registry to handle schema drift, reducing ETL job failures by 92% and ensuring 99.9% data completeness in Athena queries.
● Implemented JSON Schema validation and Lambda-based preprocessing, filtering malformed records before ETL to improve pipeline stability (a simplified sketch follows this section).
● Enforced data governance using AWS Lake Formation, including schema versioning and access control, achieving 100% compliance with HIPAA audits and 0 audit findings.
Env - Python, PySpark, AWS Glue, MongoDB, MySQL, Athena, Lambda, S3, Step Functions, Redshift, CloudWatch, SNS, Databricks, Git, GitLab
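A minimal, illustrative sketch of the Lambda-based JSON Schema preprocessing described above, not the production code: the schema, bucket names, and event shape are assumptions, and the jsonschema package is assumed to be bundled with the function.

```python
import json
import boto3
from jsonschema import validate, ValidationError

s3 = boto3.client("s3")

# Hypothetical record schema; the production schema lived in the Glue Schema Registry.
RECORD_SCHEMA = {
    "type": "object",
    "properties": {
        "patient_id": {"type": "string"},
        "event_ts": {"type": "string"},
        "metric": {"type": "number"},
    },
    "required": ["patient_id", "event_ts"],
}

CLEAN_BUCKET = "example-clean-bucket"      # assumed bucket name
QUARANTINE_BUCKET = "example-quarantine"   # assumed bucket name

def handler(event, context):
    """Validate each incoming record and route it to the clean or quarantine bucket."""
    for record in event.get("records", []):   # assumed event shape
        key = record["key"]
        body = json.loads(record["body"])
        try:
            validate(instance=body, schema=RECORD_SCHEMA)
            target = CLEAN_BUCKET
        except ValidationError:
            target = QUARANTINE_BUCKET
        s3.put_object(Bucket=target, Key=key, Body=json.dumps(body).encode("utf-8"))
    return {"status": "done"}
```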
Client: Global Atlantic Financial Group – Indianapolis, IN (Remote) Jan 2022 – Dec 2022
Role: AWS Data Engineer
Responsibilities:
● Developed and maintained data ingestion pipelines for processing structured and semi-structured data from MySQL, MongoDB, and Kafka sources.
● Utilized AWS Lambda functions to securely extract data from MySQL databases using securely managed credentials.
● Implemented data ingestion from MongoDB using Python and stored the extracted JSON data into Amazon S3 buckets for further processing.
● Built Kafka consumers to read real-time messages from Kafka topics and ingest them into S3 (sketched after this section).
● Leveraged Boto3 to connect AWS Lambda with S3 for file uploads and to organize data into partitioned folders.
● Automated data ingestion tasks using CloudWatch triggers and scheduled events for time-based execution.
● Ensured secure handling of credentials and connection information using environment variables and IAM roles in Lambda.
● Followed best practices for logging and error handling to track ingestion failures and retries via CloudWatch.
● Collaborated with the ETL and analytics teams to ensure ingested data met schema requirements and format standards for downstream Glue processing.
Env - Python, PySpark, AWS Glue, MongoDB, MySQL, Athena, Lambda, S3, Kafka, Databricks, Git
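A minimal sketch of the Kafka-to-S3 ingestion pattern referenced above, assuming the kafka-python client; the topic name, broker addresses, and bucket are placeholders rather than actual project values.

```python
import json
from datetime import datetime, timezone

import boto3
from kafka import KafkaConsumer  # kafka-python client

s3 = boto3.client("s3")
BUCKET = "example-ingest-bucket"  # placeholder bucket name

consumer = KafkaConsumer(
    "policy-events",                      # placeholder topic
    bootstrap_servers=["broker-1:9092"],  # placeholder brokers
    group_id="s3-ingest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    # Build a partitioned key (year/month/day) so Glue and Athena can prune partitions.
    now = datetime.now(timezone.utc)
    key = (
        f"raw/policy-events/year={now:%Y}/month={now:%m}/day={now:%d}/"
        f"{message.topic}-{message.partition}-{message.offset}.json"
    )
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(message.value).encode("utf-8"))
```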
Client: COMCAST – Philadelphia, PA (Remote) May 2020 – Dec 2021
Role: Data Engineer
Responsibilities:
● Designed and secured a scalable data pipeline to process IoT telemetry data in support of hourly service-level agreements (SLAs).
● Utilized Amazon S3 as the primary data source, processing and transforming data using AWS Glue, and storing the output in internal storage systems.
● Implemented Amazon CloudWatch logging and configured SNS alerts for real-time pipeline monitoring and incident response.
● Applied IAM roles and policies to enforce fine-grained access control, ensuring secure data processing and compliance with internal data governance standards.
● Mitigated straggler tasks in Spark jobs to keep processing within SLA targets.
● Implemented schema evolution handling for dynamic telemetry fields, using the Glue Schema Registry for backward compatibility and Glue DynamicFrames to merge schema versions, reducing infrastructure overhead (a simplified job fragment follows this section).
Env - Python, PySpark, AWS Glue, MongoDB, MySQL, Lambda, S3, Jupyter Notebook, Git
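An illustrative AWS Glue job fragment showing how DynamicFrames and resolveChoice can absorb telemetry schema drift, as referenced above; the catalog database, table, field names, and S3 path are assumed, not actual project values.

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read telemetry from the Glue Data Catalog (database/table names are placeholders).
telemetry = glue_context.create_dynamic_frame.from_catalog(
    database="iot_raw", table_name="device_telemetry"
)

# Resolve ambiguous column types introduced by schema drift:
# cast a known numeric field, then fold remaining conflicts into structs.
resolved = telemetry.resolveChoice(
    specs=[("signal_strength", "cast:double")]
).resolveChoice(choice="make_struct")

# Write the reconciled data back to S3 in Parquet for downstream processing.
glue_context.write_dynamic_frame.from_options(
    frame=resolved,
    connection_type="s3",
    connection_options={"path": "s3://example-curated/telemetry/"},
    format="parquet",
)

job.commit()
```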
Client: COMCAST – Philadelphia, PA May 2018 – Apr 2020
Role: Web Application Developer
Responsibilities:
● Developed responsive Django-based web applications using Python, HTML5, CSS3, and Bootstrap.
● Maintained accessibility and security compliance with Single Sign-On (SSO).
● Implemented caching mechanisms to reduce redundant API calls (see the sketch after this section).
● Enhanced query performance by optimizing sqlite3 operations, resulting in faster data retrieval.
● Built dashboards integrating real-time data visualization, facilitating quicker and more informed user interactions.
● Used NumPy and Pandas for data visualization.
● Implemented back-end functionality using Django and a MySQL database.
Env - Python, HTML, CSS, sqlite3, Splunk, Django Framework, VS Code, Git, GitHub
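A small sketch of the caching pattern referenced above, based on Django's cache framework and the requests library; the view, cache key, and upstream URL are illustrative assumptions.

```python
import requests
from django.core.cache import cache
from django.http import JsonResponse

UPSTREAM_URL = "https://api.example.com/metrics"  # placeholder upstream API
CACHE_KEY = "dashboard_metrics"
CACHE_TTL_SECONDS = 300  # refresh at most every 5 minutes

def dashboard_metrics(request):
    """Return dashboard metrics, hitting the upstream API only on cache misses."""
    data = cache.get(CACHE_KEY)
    if data is None:
        response = requests.get(UPSTREAM_URL, timeout=10)
        response.raise_for_status()
        data = response.json()
        cache.set(CACHE_KEY, data, CACHE_TTL_SECONDS)
    return JsonResponse(data, safe=False)
```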
EDUCATION:
● Master of Computer Applications from Vikrama Simhapuri University - Nellore, India.
● Bachelor of Computer Science from Sri Venkateswara University - Tirupati, India.