Data Engineer Quality

Location:
Hyderabad, Telangana, India
Posted:
October 15, 2025

Resume:

RAJYA LAKSHMI ALAPATI

Data Engineer

EMAIL ID: **********@*****.*** PHONE NUMBER: +1-945-***-****

LINKEDIN: linkedin.com/in/rajyalakshmi-alapati-a4780b222

PROFESSIONAL SUMMARY

• Possess over 6 years of professional IT experience as a Data Engineer, specializing in big data, data warehousing, and cloud computing, with hands-on work across GCP, AWS, and Azure platforms.

• Proficient in designing and developing ETL pipelines using tools like Apache Airflow, Apache NiFi, and cloud-native services such as AWS Glue, Azure Data Factory, and GCP Dataflow.

• Strong expertise in programming languages including Python, Java, Scala, and SQL, with a focus on building reusable, scalable, and optimized data transformation scripts and data quality routines.

• Experienced in working with distributed data processing frameworks such as Apache Spark, Hadoop, and Dataproc, efficiently handling high-volume batch and real-time data streams.

• Expertise in cloud data warehousing solutions including Snowflake, Google BigQuery, and Amazon Redshift, leveraging these technologies for scalable analytics and business intelligence workflows.

• Adept at implementing data modeling techniques such as star schema, snowflake schema, and 3NF, ensuring normalized data structures and optimized query performance in large-scale systems.

• Skilled in managing both structured and unstructured data, using NoSQL databases like MongoDB, Cassandra, and cloud-native storage solutions such as AWS S3, GCS, and Azure Blob Storage.

• Hands-on experience in data orchestration using Apache Airflow and Prefect, with well-designed DAGs to automate data workflows, monitor performance, and handle failures gracefully.

• Deep understanding of Kafka and data streaming pipelines, with practical implementation of real-time processing using Apache Flink, Kafka Streams, and Spark Structured Streaming.

• Strong knowledge of data quality assurance, including data profiling, validation, cleansing, and lineage tracking, ensuring trusted data pipelines for downstream analytics and reporting.

• Collaborated with cross-functional teams using Agile methodologies, participating in daily standups, sprint planning, backlog grooming, and release cycles, promoting team productivity and delivery excellence.

• Created dashboard visualizations and reports using Power BI, Tableau, and Looker, delivering actionable insights to stakeholders by integrating clean data from various enterprise sources.

• Adept at debugging and performance tuning complex ETL processes and SQL queries, reducing latency, improving throughput, and optimizing resource consumption across cloud and on-prem platforms.

• Solid grasp of software development lifecycle (SDLC), version control using Git, CI/CD practices using Jenkins, and container orchestration via Docker and Kubernetes for deployment of data services.

• Known for excellent problem-solving, critical thinking, time management, and communication skills, with the ability to translate complex technical concepts to non-technical stakeholders and business partners.

TECHNICAL SKILLS

• Programming Languages: Python, Java, Scala, SQL, Shell Scripting

• Big Data Technologies: Apache Spark, Hadoop, Hive, Pig, Kafka, Flink

• Cloud Platforms: Google Cloud Platform (GCP), Amazon Web Services (AWS), Microsoft Azure

• Cloud Services: BigQuery, Dataflow, Dataproc, AWS S3, Glue, Lambda, Azure Data Factory

• Data Warehousing: Snowflake, Google BigQuery, Amazon Redshift, Teradata

• Databases: PostgreSQL, MySQL, Oracle, Cassandra, MongoDB

• ETL Tools & Frameworks: Apache Airflow, Talend, Informatica, AWS Glue, Azure Data Factory

• Data Modeling: Star Schema, Snowflake Schema, Normalization, Dimensional Modeling

• Streaming Technologies: Apache Kafka, Apache Flink, Spark Structured Streaming, Kafka Streams

• Orchestration & Scheduling: Apache Airflow, Prefect, Luigi, Oozie

• Data Visualization: Tableau, Power BI, Looker, Google Data Studio

• Version Control & DevOps: Git, GitHub, Jenkins, Docker, Kubernetes

• Software Development Practices: Agile, Scrum, SDLC, CI/CD, TDD

• Operating Systems: Linux, Unix, Windows

• Other Tools: JIRA, Confluence, Postman, Swagger, Visual Studio Code, PyCharm

PROFESSIONAL EXPERIENCE

UnitedHealth Group (Jan 2024 – Present)

Data Engineer

• Developed and maintained complex ETL pipelines using Azure Data Factory, ensuring efficient extraction, transformation, and loading of healthcare data while adhering to data governance and security standards.

• Leveraged advanced SQL skills to optimize queries for large-scale data stored in Azure Synapse Analytics, improving performance and reducing runtime by over 30% for critical reporting workflows.

• Designed scalable data warehousing solutions using Azure Synapse and Azure Data Lake Storage, enabling centralized access to massive datasets for analytics and business intelligence purposes.

• Utilized Apache Spark and Hadoop frameworks to process and analyze large volumes of healthcare data, writing custom transformation jobs in Scala and Python to meet data engineering goals.

• Automated workflow scheduling and monitoring with Apache Airflow, integrating job dependencies and alert mechanisms to ensure high reliability and timely completion of data pipeline executions.

• Implemented real-time data streaming solutions using Apache Kafka, building fault-tolerant pipelines to capture and process patient and operational data across multiple healthcare systems.

• Developed robust Bash and Python scripts for pipeline automation, data validation, and system monitoring, improving developer efficiency and data quality checks.

• Collaborated with DevOps teams to establish CI/CD pipelines using Jenkins and GitHub Actions, enabling automated testing and deployment of data engineering code with rapid turnaround.

• Ensured strict adherence to data security protocols and HIPAA compliance by implementing role-based access controls and encryption on sensitive healthcare datasets stored in Azure Blob Storage.

• Applied comprehensive knowledge of the US healthcare domain, including regulatory requirements and common data formats like HL7 and FHIR, to tailor data solutions that support healthcare operations and reporting.

• Created interactive dashboards and visualizations using Power BI, providing business stakeholders with actionable insights into patient outcomes, claims processing, and operational efficiency.

• Participated in Agile ceremonies, including daily stand-ups and sprint retrospectives, collaborating closely with cross-functional teams to iteratively deliver high-quality data products.

• Designed and developed RESTful APIs using .NET and Java frameworks to enable secure, scalable data access for internal applications and external healthcare partners.

• Proactively performed data profiling and quality checks, identifying anomalies and inconsistencies early, leading to improvements in the accuracy and trustworthiness of healthcare data assets.

• Documented data engineering processes, pipeline architecture, and security controls thoroughly, facilitating knowledge sharing across teams and ensuring smooth onboarding of new members.

State of Florida (March 2022 – August 2023)

Data Engineer

• Developed efficient ETL pipelines using Apache Spark and custom Python scripts to extract, transform, and load large volumes of state data into Snowflake and Amazon Redshift data warehouses, ensuring data quality and data governance compliance.

• Utilized SQL extensively for complex querying, data transformation, and performance tuning within various relational databases like PostgreSQL and MySQL, optimizing data retrieval for state analytics and reporting needs.

• Designed and implemented robust data models using star schema, dimensional modeling, and normalization techniques to support efficient storage and fast querying in the state’s enterprise data platform.

• Managed cloud infrastructure across AWS, Microsoft Azure, and Google Cloud Platform (GCP), provisioning scalable compute and storage resources to support big data processing workloads and data pipeline automation.

• Built automated data pipelines and workflows using Apache Airflow, enabling reliable scheduling, monitoring, and dependency management of critical data processing jobs across cloud and on-premise environments.

• Developed data integration solutions to consolidate data from multiple state agencies and external sources, employing RESTful APIs, flat files, and data streaming technologies to achieve a unified data platform.

• Collaborated closely with data scientists and machine learning teams to build data pipelines supporting predictive analytics, integrating training datasets and managing feature engineering workflows.

• Created insightful dashboards and reports using Power BI and Tableau, presenting key performance indicators and trends to state stakeholders to inform policy decisions and resource allocation.

• Optimized database management by tuning PostgreSQL, MySQL, and NoSQL databases such as MongoDB and Cassandra, improving query performance and ensuring high availability of critical data services for state applications.

• Applied data governance principles, implementing data cataloging, lineage tracking, and security protocols to maintain data compliance and regulatory adherence within the state data environment.

• Automated deployment and version control of data engineering code using Git, integrating CI/CD pipelines with tools like Jenkins to ensure consistent delivery of enhancements and bug fixes across multiple environments.

• Participated actively in Agile teams, engaging in sprint planning, backlog refinement, and retrospectives, fostering collaboration and continuous improvement in project execution.

• Developed and maintained Java applications and microservices for data ingestion and transformation, ensuring reliable and scalable data flows across distributed systems.

• Employed problem-solving skills to troubleshoot complex data issues, collaborating with cross-functional teams to resolve bottlenecks and improve overall data pipeline stability.

• Documented technical workflows, data models, and architecture diagrams thoroughly, promoting knowledge sharing and smooth onboarding for new engineers within the Florida data team.

Citizens Bank (April 2019 – February 2022)

Data Engineer

• Developed and maintained robust ETL/ELT pipelines using Informatica PowerCenter, IICS, AWS Glue, and DBT to ensure seamless data integration and high-quality data delivery across banking systems.

• Executed complex SQL queries for data extraction, transformation, and performance optimization on large datasets stored in Amazon Redshift, Snowflake, and Teradata data warehouses.

• Designed and implemented scalable data models using both dimensional modeling and relational modeling principles to support business intelligence and reporting initiatives.

• Managed cloud infrastructure primarily on AWS, leveraging services such as S3 for storage, Redshift for data warehousing, Glue for ETL processes, and Athena for serverless querying.

• Utilized big data frameworks such as Apache Hadoop (including HDFS, Hive, Pig, HBase, and HiveQL) and Apache Spark to process and analyze massive volumes of transactional and customer data.

• Built and maintained real-time data streaming solutions with Apache Kafka, enabling low-latency data ingestion and processing for critical banking operations.

• Enforced strict data governance policies by implementing data quality checks, lineage tracking, and compliance with regulations like GDPR and CCPA to ensure data protection and auditability.

• Performed performance tuning on data pipelines and database objects, optimizing query execution plans and reducing data processing times by up to 40% for key workflows.

• Automated deployment workflows and code versioning using GitLab, integrating CI/CD pipelines to accelerate release cycles and ensure reliable, repeatable deployments.

• Collaborated with cross-functional teams to develop Python and Java scripts for data manipulation, automation, and integration with banking applications and APIs.

• Supported the migration of legacy data systems to cloud platforms by designing scalable architectures that leveraged Snowflake, Amazon Redshift, and Oracle databases.

• Designed and implemented robust data security mechanisms, including role-based access controls and encryption, to protect sensitive financial and customer information.

• Developed dashboards and visual reports using tools like Tableau and Power BI to deliver actionable insights to banking leadership and operational teams.

• Worked within Agile development teams, participating in sprint planning, backlog grooming, and daily standups to deliver high-quality data engineering solutions on time.

• Documented technical designs, data models, pipeline workflows, and compliance protocols, enabling efficient knowledge transfer and team scalability within the data engineering department.

EDUCATION

Master of Science in Computer Information Science

Southern Arkansas University, Magnolia, Arkansas, 2025

CERTIFICATIONS

• Google Cloud Professional Data Engineer

• Microsoft Certified: Azure Data Engineer Associate


