
Data Engineer Quality

Location:
Hillsboro, OR
Salary:
90000
Posted:
September 10, 2025

Resume:

Devasena Malli

Email: *********@*****.*** Ph: +1-341-***-**** LinkedIn: devasenamalli

PROFESSIONAL SUMMARY:

• Data Engineer with 6+ years of experience in designing, developing, and maintaining scalable data pipelines and architectures using AWS services, Python, PySpark and Apache Kafka.

• Proficient in Python, with expertise in object-oriented programming and in designing reusable, modular classes.

• Experience with Apache Kafka, including setting up consumers, producers, and S3 sinks for streaming data pipelines, with a good understanding of Amazon MSK for scalable and secure Kafka deployments.

• Designed and implemented real-time and batch ETL data pipelines using AWS Glue, Lambda, and Step Functions, handling both structured and semi-structured data.

• Skilled in creating efficient data ingestion workflows, implementing incremental loads and data quality checks, and integrating streaming data sources with data lakes and data warehouses (a minimal sketch follows this summary).

• Deep understanding of modern data lake architectures using AWS S3, optimized for raw and curated data storage and downstream analytics.

• Expertise in PySpark for querying, transforming, and optimizing data in Redshift, Snowflake, and Athena, ensuring high performance and reliability.

• Experienced in containerizing workflows with Docker and deploying them on EC2 for scalable, fault-tolerant operation.

• Built and maintained interactive dashboards with Amazon QuickSight to support data-driven decision-making for business stakeholders.

• Adept at securing AWS environments with IAM roles and policies, enforcing best practices in data security and compliance.

• Proficient in monitoring and alerting data workflows using AWS CloudWatch and other tools to ensure operational excellence and high availability.

• Developed robust web applications with Django, focusing on backend logic, API integrations, and secure user authentication to support data platform interfaces and internal tools.

• Experienced with Git for version control, collaboration, and supporting Agile/Scrum workflows.
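As an illustration of the incremental-load and data-quality pattern noted in the summary above, the following is a minimal PySpark sketch; the bucket paths, column names, and watermark handling are illustrative assumptions rather than details of any particular project.

# Minimal PySpark sketch of an incremental load with basic data quality checks.
# Bucket paths, column names, and the watermark value are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("incremental-load-sketch").getOrCreate()

# Watermark: in practice this would come from a job bookmark or control table.
last_loaded_at = "2025-01-01 00:00:00"

# Read only records newer than the watermark (incremental load).
raw = (spark.read.parquet("s3://example-raw-bucket/orders/")
       .filter(F.col("updated_at") > F.lit(last_loaded_at)))

# Simple data quality checks: dedupe and drop rows missing a key or with negative amounts.
clean = (raw.dropDuplicates(["order_id"])
            .filter(F.col("order_id").isNotNull())
            .filter(F.col("amount") >= 0))

# Track how many records were rejected, for monitoring and alerting.
print(f"Rejected {raw.count() - clean.count()} records failing quality checks")

# Write the curated increment to the data lake, partitioned by load date.
(clean.withColumn("load_date", F.current_date())
      .write.mode("append")
      .partitionBy("load_date")
      .parquet("s3://example-curated-bucket/orders/"))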

TECHNICAL SKILLS:

Programming: Python, PySpark, SQL, Shell Script

Cloud: Amazon Web Services

Data Orchestration: AWS Glue, Step Functions, EMR

Data Warehousing: Redshift, Snowflake

Databases: MySQL, PostgreSQL, MongoDB, DynamoDB

BI/Visualization: Tableau, Power BI, QuickSight, Splunk

Analysis Tools: SQL, Pandas, JIRA, MS Excel

Big Data: Spark, Kafka

DevOps & CI/CD: Docker, Git, GitHub, GitLab, Jenkins, Postman

PROFESSIONAL EXPERIENCE:

Client: Advent HealthCare – Altamonte Springs, FL (Remote), Nov 2023 – Present
Role: AWS Data Engineer

Responsibilities:

● Developed and maintained scalable data pipelines to extract, transform, and load (ETL) data from multiple sources into Amazon S3, ensuring data integrity and availability for downstream analytics.

● Designed and managed IAM roles and policies to enforce security best practices, ensuring least-privilege access across AWS services.

● Built complex data transformations in PySpark, improving the efficiency and scalability of large-scale data processing workflows.

● Utilized Python in Spark applications to optimize performance for high-volume ETL tasks, reducing job execution time by 30%.

● Monitored and optimized workflows using AWS CloudWatch, improving job reliability and ensuring high availability across Glue and Lambda jobs.

● Developed interactive dashboards in Amazon QuickSight, enabling business teams to visualize key KPIs and drive data-informed decisions.

● Integrated AWS Glue Schema Registry to handle schema drift, reducing ETL job failures by 92% and ensuring 99.9% data completeness in Athena queries.

● Implemented JSON Schema validation and Lambda-based preprocessing, filtering malformed records before ETL to improve pipeline stability (see the sketch at the end of this section).

● Enforced data governance using AWS Lake Formation, including schema versioning and access control, achieving 100% compliance with HIPAA audits and zero audit findings.

Environment: Python, PySpark, AWS Glue, MongoDB, MySQL, Athena, Lambda, S3, Step Functions, Redshift, CloudWatch, SNS, Databricks, Git, GitLab
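The JSON Schema validation and Lambda-based preprocessing described above can be sketched roughly as follows; the schema fields, bucket layout, and S3 trigger shape are illustrative assumptions rather than the actual production configuration.

# Sketch of a Lambda preprocessor that validates incoming JSON-lines records against
# a JSON Schema and quarantines malformed ones before they reach the ETL stage.
# Schema fields, bucket layout, and key prefixes are illustrative assumptions.
import json
import boto3
from jsonschema import validate, ValidationError

s3 = boto3.client("s3")

RECORD_SCHEMA = {
    "type": "object",
    "required": ["record_id", "event_type", "timestamp"],
    "properties": {
        "record_id": {"type": "string"},
        "event_type": {"type": "string"},
        "timestamp": {"type": "string"},
    },
}

def handler(event, context):
    # Triggered by an S3 put event; read the uploaded JSON-lines object.
    bucket = event["Records"][0]["s3"]["bucket"]["name"]
    key = event["Records"][0]["s3"]["object"]["key"]
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

    valid, rejected = [], []
    for line in body.splitlines():
        try:
            record = json.loads(line)
            validate(instance=record, schema=RECORD_SCHEMA)
            valid.append(record)
        except (json.JSONDecodeError, ValidationError):
            rejected.append(line)

    # Valid records move to the curated prefix; rejects are quarantined for review.
    s3.put_object(Bucket=bucket, Key=f"curated/{key}",
                  Body="\n".join(json.dumps(r) for r in valid))
    if rejected:
        s3.put_object(Bucket=bucket, Key=f"quarantine/{key}",
                      Body="\n".join(rejected))
    return {"valid": len(valid), "rejected": len(rejected)}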

Client: Global Atlantic Financial Group – Indianapolis, IN (Remote), Jan 2022 – Dec 2022
Role: AWS Data Engineer

Responsibilities:

● Developed and maintained data ingestion pipelines for processing structured and semi-structured data from MySQL, MongoDB, and Kafka sources.

● Utilized AWS Lambda functions to securely extract data from MySQL databases, handling database credentials safely.

● Implemented data ingestion from MongoDB using Python and stored the extracted JSON data into Amazon S3 buckets for further processing.

● Built Kafka consumers to consume real-time messages from Kafka topics and ingested them into S3 (see the sketch at the end of this section).

● Leveraged Boto3 to connect AWS Lambda with S3, uploading files and organizing data into partitioned folders.

● Automated data ingestion tasks using CloudWatch triggers and scheduled events for time-based execution.

● Ensured secure handling of credentials and connection information using environment variables and IAM roles in Lambda.

● Followed best practices for logging and error handling to track ingestion failures and retries via CloudWatch.

● Collaborated with the ETL and analytics teams to ensure ingested data met schema requirements and format standards for downstream Glue processing.

Environment: Python, PySpark, AWS Glue, MongoDB, MySQL, Athena, Lambda, S3, Kafka, Databricks, Git
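The Kafka-to-S3 ingestion described above can be sketched roughly as follows, using the kafka-python client and Boto3; the topic name, bucket, batch size, and partition layout are illustrative assumptions rather than the actual production values.

# Sketch of a Kafka consumer that batches messages and lands them in S3
# under date-partitioned prefixes for downstream Glue/Athena processing.
import json
from datetime import datetime, timezone

import boto3
from kafka import KafkaConsumer

BUCKET = "example-landing-bucket"   # illustrative bucket name
BATCH_SIZE = 500

consumer = KafkaConsumer(
    "telemetry-events",                       # illustrative topic name
    bootstrap_servers=["localhost:9092"],
    group_id="s3-ingestion",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    enable_auto_commit=False,
)
s3 = boto3.client("s3")

batch = []
for message in consumer:
    batch.append(message.value)
    if len(batch) >= BATCH_SIZE:
        now = datetime.now(timezone.utc)
        # Partitioned folder layout (year=/month=/day=/) keeps downstream scans cheap.
        key = (f"raw/telemetry/year={now:%Y}/month={now:%m}/day={now:%d}/"
               f"batch-{now:%H%M%S}.json")
        s3.put_object(Bucket=BUCKET, Key=key,
                      Body="\n".join(json.dumps(r) for r in batch))
        consumer.commit()  # commit offsets only after a successful write
        batch = []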

Client: COMCAST – Philadelphia, PA (Remote), May 2020 – Dec 2021
Role: Data Engineer

Responsibilities:

● Designed and secured a scalable data pipeline to process IoT telemetry data in support of hourly service-level agreements (SLAs).

● Utilized Amazon S3 as the primary data source, processing and transforming data using AWS Glue, and storing the output in internal storage systems.

● Implemented Amazon CloudWatch logging and configured SNS alerts for real-time pipeline monitoring and incident response.

● Applied IAM roles and policies to enforce fine-grained access control, ensuring secure data processing and compliance with internal data governance standards.

● Mitigated straggler tasks and implemented schema evolution handling for dynamic telemetry fields, using AWS Glue Schema Registry for backward compatibility and Glue DynamicFrames to merge schema versions, reducing the infrastructure required (see the sketch at the end of this section).

Environment: Python, PySpark, AWS Glue, MongoDB, MySQL, Lambda, S3, Jupyter Notebook, Git
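The schema-evolution handling described above can be sketched roughly as follows using Glue DynamicFrames; the catalog database, table, field names, and type resolutions are illustrative assumptions rather than the actual job configuration.

# Sketch of an AWS Glue job step that reads drifting telemetry data as a
# DynamicFrame and resolves ambiguous ("choice") types introduced by schema
# evolution, so downstream writes see one consistent schema.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read telemetry from the Glue Data Catalog; fields added over time show up
# as choice types when old and new records disagree on a type.
telemetry = glue_context.create_dynamic_frame.from_catalog(
    database="telemetry_db", table_name="device_events"
)

# Resolve schema drift by casting ambiguous fields to a single type.
resolved = telemetry.resolveChoice(
    specs=[("signal_strength", "cast:double"), ("firmware_version", "cast:string")]
)

# Write the reconciled data back to S3 as Parquet for downstream consumers.
glue_context.write_dynamic_frame.from_options(
    frame=resolved,
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/telemetry/"},
    format="parquet",
)
job.commit()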

Client: COMCAST – Philadelphia, PA, May 2018 – Apr 2020
Role: Web Application Developer

Responsibilities:

● Developed responsive Django-based web applications using Python, HTML5, CSS3, and Bootstrap.

● Maintained accessibility and security compliance with Single Sign-On (SSO).

● Implemented caching mechanisms to reduce redundant API calls (see the sketch at the end of this section).

● Enhanced query performance by optimizing sqlite3 operations, resulting in faster data retrieval.

● Built dashboards integrating real-time data visualization, facilitating quicker and more informed user interactions.

● Used NumPy and Pandas for data visualization.

● Implemented back-end functionality using Django and a MySQL database.

Environment: Python, HTML, CSS, sqlite3, Splunk, Django Framework, VS Code, Git, GitHub
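The caching approach for redundant API calls mentioned above can be sketched roughly as follows with Django's cache framework; the upstream URL, cache key, and timeout are illustrative assumptions rather than the actual application code.

# Sketch of a Django view that caches an upstream API response so repeated
# requests within the timeout window skip the redundant call.
import requests
from django.core.cache import cache
from django.http import JsonResponse

UPSTREAM_URL = "https://internal.example.com/api/metrics"  # illustrative endpoint
CACHE_KEY = "dashboard_metrics"
CACHE_TTL_SECONDS = 300  # serve cached data for five minutes

def dashboard_metrics(request):
    data = cache.get(CACHE_KEY)
    if data is None:
        # Cache miss: call the upstream API once and store the result.
        response = requests.get(UPSTREAM_URL, timeout=10)
        response.raise_for_status()
        data = response.json()
        cache.set(CACHE_KEY, data, CACHE_TTL_SECONDS)
    return JsonResponse(data)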

EDUCATION:

● Master of Computer Applications from Vikrama Simhapuri University - Nellore, India.

● Bachelor of Computer Science from Sri Venkateswara University - Tirupati, India.


