Meghana Kolluri
Senior Data Engineer
****************@*****.*** +1-817-***-**** LinkedIn
Professional Summary
Senior Data Engineer with 6+ years of experience designing and implementing scalable data pipelines, cloud-native architectures, and big data solutions across the healthcare, financial services, and e-commerce domains. Expert in ETL/ELT, AWS, Azure, Apache Spark, Kafka, and Hadoop, with proven experience architecting data lakes and data warehouses on Snowflake, Amazon Redshift, and Azure Synapse Analytics, optimized for performance, scalability, and cost efficiency. Strong proficiency in Python, PySpark, and SQL, with deep expertise in real-time streaming, data modeling, and workflow orchestration. Experienced in data quality frameworks, HIPAA compliance, and delivering analytics-ready datasets for BI and machine learning initiatives.
Technical Skills
Programming Languages: Python, SQL, PySpark, Scala, Shell Scripting
Cloud Platforms - AWS: Glue, EMR, S3, Redshift, Lambda, Kinesis, DynamoDB, Athena, Step Functions, CloudWatch, IAM
Cloud Platforms - Azure: Data Factory, Databricks, Data Lake Storage, Synapse Analytics, Blob Storage, Event Hub
Big Data Technologies: Apache Spark, PySpark, Hadoop, HDFS, Hive, Kafka, Databricks, Delta Lake
Data Warehousing: Snowflake, Amazon Redshift, Azure Synapse Analytics, Teradata
Databases: PostgreSQL, MySQL, Oracle, SQL Server, MongoDB, Cassandra, DynamoDB
ETL/ELT Tools: AWS Glue, Apache Airflow, Azure Data Factory, Informatica, Talend, SSIS
Streaming Technologies: Apache Kafka, AWS Kinesis, Spark Streaming, Azure Event Hub
Data Formats: Parquet, Avro, ORC, JSON, CSV, XML
DevOps & CI/CD: Docker, Kubernetes, Jenkins, Git, GitHub, Terraform, Azure DevOps
Data Quality: Great Expectations, Deequ, Apache Griffin
Data Visualization: Tableau, Power BI, Looker
Methodologies: Agile/Scrum, DataOps, CI/CD
Professional Experience
FedEx Dataworks | Memphis, TN | July 2023 – Present
Senior Data Engineer
● Designed, developed, and maintained ETL workflows using AWS Glue and Python, integrating diverse data sources, including APIs, flat files, and relational databases.
● Improved ETL performance by 15% by optimizing Glue scripts with efficient data transformations and Spark SQL.
● Automated data pipelines using AWS Lambda, Glue, and Step Functions, ensuring seamless processing and reducing operational errors.
● Configured and optimized AWS Cloud Services, including S3, EC2, RDS, and Glue Tables, for scalable and secure backend systems.
● Queried and integrated data using Amazon Athena, processing complex datasets and enabling advanced analytics for business insights.
● Built and managed CI/CD pipelines with Git, GitHub Actions, and Jenkins, streamlining deployments and infrastructure automation.
● Implemented automated testing frameworks for data validation, ensuring the accuracy and reliability of workflows.
● Monitored and debugged production pipelines using AWS CloudWatch, SNS, and SQS, ensuring minimal downtime and SLA adherence.
● Developed real-time data pipelines using Python and PySpark, enabling high-performance backend and middle-tier operations.
● Designed and implemented orchestration workflows with Apache Airflow and Step Functions to streamline ETL processes.
● Created and maintained Glue Tables for efficient data cataloging and querying across multiple pipelines.
● Collaborated with data architects to modernize enterprise data systems, integrating PostgreSQL (Amazon RDS) for enhanced relational database support.
● Enhanced search functionality by implementing semantic search solutions, including vector embeddings and similarity scoring.
● Integrated file-consuming workflows using AWS Glue and Athena, ensuring efficient processing and storage in Amazon S3.
● Debugged and resolved issues in production pipelines using AWS tools, ensuring data consistency and operational reliability.
● Configured AWS CloudFormation templates to automate infrastructure provisioning for EMR clusters and Lambda functions.
● Participated in Agile methodologies, including sprint planning and retrospectives, to ensure timely delivery of development milestones.
● Designed and developed real-time Power BI dashboards with data sourced from AWS services for actionable insights.
● Built scalable backend systems to support enterprise compliance modernization, leveraging AWS technologies.
● Processed and validated large-scale datasets using PySpark, ensuring high levels of data accuracy and integrity.
● Supported production environments by troubleshooting pipeline failures, resolving data inconsistencies, and enhancing workflow stability.
Environment: AWS (Glue, Lambda, S3, Glue Tables, Athena, Step Functions, EC2, RDS, CloudFormation, CloudWatch, SNS, SQS), Python, PySpark, Jenkins, GitHub Actions, Git, Apache Airflow, PostgreSQL (RDS), Power BI, Agile
eHealth Insurance | Hyderabad, India | Oct 2020 – Dec 2021
Data Engineer
● Architected scalable ETL pipelines using AWS Glue, Apache Spark, and Python, processing large-scale healthcare datasets (claims, enrollment, provider data) and loading them into the Snowflake data warehouse.
● Built real-time streaming pipelines using Apache Kafka and Spark Streaming, enabling near real-time analytics for care management and operational reporting.
● Designed dimensional data models in Snowflake, implementing star schema, fact tables, and dimension tables to optimize query performance.
● Implemented data quality frameworks using Great Expectations and Python, ensuring data accuracy, completeness, and consistency across healthcare systems.
● Integrated FHIR APIs and HL7 feeds using AWS Lambda and API Gateway, improving clinical data interoperability from EMR systems.
● Developed automated workflows using Apache Airflow, ensuring dependency management, fault tolerance, and reliable pipeline execution.
● Implemented HIPAA-compliant security controls using AWS IAM, encryption at rest and in transit, and audit logging to protect PHI.
● Optimized Spark jobs using partitioning, broadcast joins, and caching, significantly improving processing performance.
● Collaborated with data scientists and analysts to deliver analytics-ready datasets supporting ML models, risk scoring, and readmission prediction.
● Built scalable ETL workflows using PySpark, supporting insurance quotes, policy conversions, and claims analytics.
● Created dimensional data models with Slowly Changing Dimensions (SCD Type 2) for historical and regulatory reporting.
● Implemented data validation and quality checks using Python and PySpark.
● Optimized Spark workloads using caching, broadcast joins, and resource tuning to reduce processing time and cost.
● Partnered with business stakeholders to deliver analytics-ready datasets for decision-making.
Environment: Python, PySpark, SQL, AWS (Glue, S3, Redshift, Lambda, Kinesis), Snowflake, Apache Spark, Apache Kafka, Apache Airflow, Great Expectations, HIPAA Compliance, Healthcare Data
eBay | Bangalore, India | May 2019 – Oct 2020
Data Engineer
● Developed ETL pipelines using Python and Apache Spark to process e-commerce transaction data, user behavior analytics, and product catalog information.
● Built data processing workflows using Hadoop and Hive to analyze large-scale marketplace data, supporting seller analytics and fraud detection systems.
● Implemented streaming data pipelines using Apache Kafka to process real-time events, including listings, bids, and transactions, enabling operational dashboards.
● Developed data warehouse models implementing dimensional schemas for product performance, seller metrics, and customer engagement analytics.
● Created data validation frameworks using Python, ensuring data quality across transaction systems and analytics platforms.
● Developed reporting solutions by integrating Amazon Redshift with Tableau, delivering BI dashboards on transaction volumes, conversion rates, and marketplace performance.
● Optimized Hive queries and Spark jobs by implementing partitioning and bucketing strategies, improving processing performance for e-commerce analytics.
● Built automated reporting pipelines delivering daily metrics on marketplace activity, transaction volumes, and seller performance.
● Collaborated with product teams and data scientists, providing clean datasets for recommendation systems and search optimization algorithms.
Environment: Python, PySpark, SQL, Hadoop, Hive, Apache Spark, Apache Kafka, HDFS, E-commerce Analytics
Education
Master of Science in Computer Science
Rivier University, Nashua, NH