MANASA KONAGANTI
Senior Data Engineer
Contact: +1-469-***-**** Email: *****************@*****.***
LinkedIn: https://www.linkedin.com/in/manasa-reddy-6a52b724b/
Professional Summary:
Data Engineer with 7+ years of experience in designing, building, and maintaining scalable data infrastructure and ETL/ELT pipelines across cloud and on-premises environments, supporting analytics and machine learning initiatives for enterprise-level organizations.
Extensive experience with Big Data technologies including Apache Spark, Hadoop ecosystem (HDFS, MapReduce, Hive), Apache Kafka, Apache Airflow, and Databricks for processing large-scale datasets and real-time streaming analytics.
Proficient in cloud data platforms such as AWS (S3, Redshift, EMR, Glue, Kinesis, Lambda), Azure (Data Factory, Synapse Analytics, Data Lake Storage, Event Hubs), and Google Cloud Platform (BigQuery, Dataflow, Pub/Sub, Cloud Storage) for building modern data architectures.
Hands-on Snowflake experience building modern data transformation pipelines with dbt for SQL-based transformations, testing, and documentation, leveraging Snowflake's cloud data warehouse capabilities for scalable analytics workloads.
Strong programming skills in Python, Scala, Java, SQL, and R with expertise in data manipulation libraries including Pandas, NumPy, PySpark, Dask, and Apache Spark SQL for data processing and analysis.
Database expertise across multiple systems including PostgreSQL, MySQL, Oracle, MongoDB, Cassandra, Redis, Snowflake, and Amazon DynamoDB with experience in database design, optimization, and migration strategies.
Data warehouse and data lake architecture experience using Amazon Redshift, Google BigQuery, Snowflake, and Azure Synapse Analytics, implementing dimensional modeling, star/snowflake schemas, and modern lakehouse architectures.
Advanced ETL/ELT pipeline development using tools like Apache Airflow, Prefect, Luigi, AWS Glue, Azure Data Factory, Talend, and Informatica with focus on data quality, monitoring, and error handling mechanisms.
Real-time data streaming expertise with Apache Kafka, Amazon Kinesis, Azure Event Hubs, Apache Pulsar, and Apache Storm for building event-driven architectures and real-time analytics platforms.
Container and orchestration technologies including Docker, Kubernetes, and Apache Mesos for deploying and scaling data applications, with experience in Helm charts and container registries.
Infrastructure as Code (IaC) proficiency using Terraform, CloudFormation, Azure Resource Manager (ARM), and Ansible for automated provisioning and management of data infrastructure.
Data modeling and schema design experience with dimensional modeling, data vault methodology, and modern approaches like dbt (data build tool) for maintaining data lineage and implementing data transformations.
CI/CD pipeline implementation for data engineering workflows using Jenkins, GitLab CI, Azure DevOps, GitHub Actions with automated testing, deployment, and monitoring of data pipelines.
Data governance and quality frameworks implementation including Apache Atlas, AWS Glue Data Catalog, Azure Purview, and Great Expectations for metadata management, data lineage tracking, and quality monitoring (a brief example follows this summary).
Performance optimization expertise in query tuning, partitioning strategies, indexing, and caching mechanisms across various database systems and big data platforms to handle high-volume data processing.
Machine Learning pipeline integration experience working with MLflow, Kubeflow, SageMaker, and Azure ML to build end-to-end ML data pipelines and feature stores for data science teams.
Monitoring and observability tools proficiency including Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), Datadog, and CloudWatch for pipeline monitoring and troubleshooting.
Agile and DevOps methodologies experience working in Scrum and Kanban environments, with strong collaboration across data science, analytics, and engineering teams to deliver data-driven solutions.
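As an illustration of the data quality checks referenced above, the following is a minimal sketch using the classic pandas-DataFrame API of Great Expectations (pre-1.0 releases); the DataFrame, column names, and thresholds are hypothetical and not drawn from any specific project.

```python
import great_expectations as ge
import pandas as pd

# Hypothetical batch of records to validate before loading downstream.
claims = pd.DataFrame({
    "claim_id": ["C-1001", "C-1002", "C-1003"],
    "amount": [1250.00, 380.50, 99.99],
})

# Wrap the DataFrame with the classic Great Expectations pandas API.
batch = ge.from_pandas(claims)
batch.expect_column_values_to_not_be_null("claim_id")
batch.expect_column_values_to_be_between("amount", min_value=0, max_value=1_000_000)

# Validate and fail fast if any expectation is broken.
results = batch.validate()
if not results.success:
    raise ValueError("Data quality checks failed")
```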
Technical Skills:
Big Data Frameworks: Apache Spark, Hadoop (HDFS, MapReduce, Hive), Databricks
Cloud Platforms: AWS (S3, Redshift, EMR, Glue, Kinesis, Lambda), Azure (Data Factory, Synapse Analytics, Data Lake Storage), GCP (BigQuery, Pub/Sub), Snowflake
Programming Languages: Python, Scala, Java, SQL, Shell Scripting
Data Processing Libraries: PySpark, Pandas, NumPy, Apache Spark SQL
Streaming Technologies: Apache Kafka, Amazon Kinesis, Azure Event Hubs, Confluent Platform
ETL/ELT Tools: Apache Airflow, AWS Glue, Azure Data Factory, dbt
Data Warehouses: Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse Analytics
Data Lakes/Storage: Delta Lake, Amazon S3, Azure Data Lake Storage, Google Cloud Storage
Infrastructure as Code: Terraform
CI/CD Tools: Jenkins, GitLab CI, Azure DevOps, GitHub Actions
Version Control: Git, GitHub, GitLab, Bitbucket, Azure Repos
Monitoring & Observability: Prometheus, ELK Stack, CloudWatch
Development IDEs: PyCharm, IntelliJ IDEA, VS Code
Workflow Orchestration: Apache Airflow, AWS Step Functions, Azure Logic Apps
Data Quality Tools: Great Expectations, dbt tests
Methodologies: Agile, Scrum, Kanban
Professional Experience:
Client: General Motors, Columbus, OH July 2023 - Present
Senior Data Engineer
Responsibilities:
Architected enterprise-wide data lake platform using AWS S3, Glue, and EMR to process 500TB+ of insurance claims data, implementing Delta Lake for ACID transactions and enabling real-time analytics for risk assessment models
Designed and implemented real-time streaming pipelines using Apache Kafka and Apache Flink to process 10M+ daily transactions from policy management systems, reducing data latency from hours to sub-second for fraud detection algorithms (illustrated in the sketch below)
Led data warehouse modernization initiative migrating legacy systems to Snowflake, implementing dbt for transformation logic and establishing data governance frameworks that improved query performance by 75% and reduced infrastructure costs by 40%
Implemented advanced data observability platform using Great Expectations with custom monitoring solutions, reducing data quality incidents by 80% and establishing automated data lineage tracking across 150+ data pipelines
Optimized big data processing workflows using Apache Spark on Databricks with PySpark and Scala, processing insurance claims data 10x faster through partitioning strategies and adaptive query execution
Established CI/CD practices for data pipelines using Jenkins, GitLab, and Terraform with automated testing and infrastructure as code, reducing deployment time from days to hours
Collaborated with data science teams to build feature stores and real-time inference pipelines supporting machine learning models for customer segmentation and risk analysis
Designed event-driven architecture using Apache Kafka Connect and AWS Kinesis to capture real-time policy changes and claims events for immediate processing and alerting
Environment: Python, Apache Spark, Databricks, Snowflake, dbt, Apache Kafka, Apache Flink, AWS (S3, EMR, Glue, Kinesis, Lambda), Apache Airflow, SageMaker, Terraform, Docker, Kubernetes, PostgreSQL, Delta Lake
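A minimal, illustrative sketch of the kind of Kafka-to-Delta streaming job described in this role, written with Spark Structured Streaming on a Databricks-style runtime (the Flink portion is not shown); the broker address, topic name, schema fields, and S3 paths are placeholders rather than actual project values.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("claims-stream").getOrCreate()

# Illustrative schema for claim events; a real schema would come from a registry.
claim_schema = StructType([
    StructField("claim_id", StringType()),
    StructField("policy_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read raw claim events from Kafka (broker and topic names are placeholders).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "claims-events")
       .option("startingOffsets", "latest")
       .load())

# Parse the JSON payload and keep only well-formed records.
claims = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), claim_schema).alias("c"))
          .select("c.*")
          .where(col("claim_id").isNotNull()))

# Append to a Delta table, with checkpointing for restartable, idempotent writes.
query = (claims.writeStream
         .format("delta")
         .option("checkpointLocation", "s3://example-bucket/checkpoints/claims")
         .outputMode("append")
         .start("s3://example-bucket/delta/claims"))

query.awaitTermination()
```

In this pattern, the checkpoint location combined with Delta's transactional writes is what gives the stream its end-to-end reliability guarantees.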
Client: Cenit IT Hub Pvt Ltd, Hyderabad, India Sept 2018 - July 2022
Data Engineer
Responsibilities:
Developed scalable ETL pipelines using Apache Airflow and Python to process 100GB+ daily data from multiple e-commerce sources, implementing error handling and data quality validation for production reliability (see the DAG sketch below)
Built data warehouse solution on Amazon Redshift with optimized table design, compression strategies, and query optimization, improving dashboard load times by 60% and supporting analytics for 1M+ customers
Implemented streaming data processing using Apache Kafka and Spark Streaming to handle real-time inventory updates and customer behaviour tracking, enabling real-time recommendation engines
Designed dimensional data models for retail analytics using star schema methodology, creating fact and dimension tables that support complex analytical queries for business intelligence reporting
Created automated data quality monitoring using Python scripts and SQL to validate daily data loads, implement data profiling, and alert on anomalies, establishing foundational data governance practices
Developed API data integration solutions using Python and REST APIs to extract data from CRM, ERP, and third-party systems, consolidating data into centralized analytical databases
Built NoSQL data solutions using MongoDB and Cassandra for handling semi-structured customer interaction data and product catalogue information, implementing appropriate data modeling strategies
Implemented CI/CD for data workflows using Jenkins with automated testing and deployment, reducing manual deployment efforts and eliminating configuration errors
Optimized database performance on PostgreSQL and MySQL through indexing strategies, query tuning, and partitioning, improving analytical query response times by 50%
Collaborated with business stakeholders to gather requirements and translate business needs into technical data solutions, supporting cross-functional analytics initiatives
Environment: Python, Apache Airflow, Apache Spark, Apache Kafka, Amazon Redshift, PostgreSQL, MySQL, MongoDB, Cassandra, AWS (S3, EC2, RDS), Jenkins, Docker, Pandas, SQL
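A minimal, illustrative sketch of the kind of daily Airflow pipeline with a data-quality gate described in this role, written against the Airflow 2.x TaskFlow API (assuming a recent 2.x deployment); the schedule, task bodies, file path, and column names are hypothetical placeholders, not actual project code.

```python
import pendulum
from airflow.decorators import dag, task

@dag(
    schedule="0 2 * * *",  # daily at 02:00 UTC
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
    tags=["ecommerce", "etl"],
)
def daily_orders_etl():
    @task
    def extract() -> str:
        # Pull yesterday's orders from the source system into a staging file (placeholder logic).
        staging_path = "/tmp/orders.parquet"
        return staging_path

    @task
    def transform(staging_path: str) -> str:
        # Clean and conform the staged records (placeholder logic).
        return staging_path

    @task
    def validate(staging_path: str) -> str:
        import pandas as pd
        df = pd.read_parquet(staging_path)
        # Simple data-quality gate: fail the run rather than load bad data.
        assert not df.empty, "no rows extracted"
        assert df["order_id"].notna().all(), "null order_id found"
        return staging_path

    @task
    def load(staging_path: str) -> None:
        # Copy validated data into the warehouse (e.g. a Redshift COPY); placeholder.
        pass

    load(validate(transform(extract())))

daily_orders_etl()
```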
Client: Indus Business Systems Ltd, India Jan 2018 - Aug 2018
Data Engineer
Responsibilities:
Built foundational ETL processes using Python and Pandas to extract data from MySQL and Oracle databases, transforming and loading the data into a centralized reporting database for business analytics (see the ETL sketch below)
Created automated reporting solutions using Python scripts and SQL queries to replace manual Excel-based processes, saving 15+ hours per week and improving data accuracy for financial reporting
Implemented basic data quality checks using SQL and Python to validate data consistency and completeness, creating alerts for data anomalies and establishing data monitoring practices
Supported database administration tasks including backup and recovery procedures for PostgreSQL and MySQL, ensuring data availability and implementing basic security measures
Created documentation for data sources, transformations, and business logic, establishing knowledge base for data lineage and supporting future development efforts
Participated in requirements gathering sessions with business users to understand reporting needs and translate them into technical data solutions
Developed basic web scraping solutions using Python and BeautifulSoup to collect external market data for competitive analysis and enrichment of internal datasets
Environment: Python, Pandas, PostgreSQL, MySQL, Oracle, Tableau, SQL, BeautifulSoup, Excel, Basic ETL tools
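A minimal, illustrative sketch of the kind of Pandas-based extract-transform-load flow described in this role, assuming SQLAlchemy engines for a MySQL source and a PostgreSQL reporting database; the connection strings, tables, and columns are hypothetical.

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection strings for the source and reporting databases.
source = create_engine("mysql+pymysql://user:pass@source-host/sales")
reporting = create_engine("postgresql+psycopg2://user:pass@report-host/reporting")

# Extract: pull yesterday's transactions from the operational database.
orders = pd.read_sql(
    "SELECT order_id, customer_id, amount, order_date "
    "FROM orders WHERE order_date = CURDATE() - INTERVAL 1 DAY",
    source,
)

# Transform: basic cleansing plus a simple derived daily aggregate.
orders = orders.dropna(subset=["order_id", "customer_id"])
orders["amount"] = orders["amount"].astype(float)
daily_summary = (orders.groupby("order_date", as_index=False)
                 .agg(order_count=("order_id", "count"),
                      revenue=("amount", "sum")))

# Load: append to the reporting tables.
orders.to_sql("fact_orders", reporting, if_exists="append", index=False)
daily_summary.to_sql("daily_sales_summary", reporting, if_exists="append", index=False)
```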
Client: Standalone IT Solutions, Hyderabad, India July 2016 - Dec 2017
Software Developer
Responsibilities:
Performed data extraction and analysis using SQL queries on Oracle 9i to support business reporting requirements and ad-hoc analytical requests from various departments
Created basic data reports using Excel and SQL to track business KPIs, customer metrics, and operational performance, providing insights to management teams
Developed simple data validation scripts using SQL to ensure data quality and identify inconsistencies in transactional systems, supporting data cleanup initiatives
Assisted in manual testing of data-related applications and documented test results, gaining foundational understanding of data quality and testing methodologies
Supported data migration projects by writing SQL scripts to transfer data between systems and validate data integrity during system upgrades
Created basic data visualizations using Excel charts and pivot tables to present analytical findings to business stakeholders in accessible formats
Participated in database maintenance activities including data archiving and cleanup procedures under senior supervision
Documented data sources and business rules to support knowledge transfer and maintain institutional knowledge about data systems
Environment: Oracle 9i, SQL, Excel, Basic reporting tools, Manual testing, Data documentation
Education:
New England College Oct 2023
Master’s in Computer & Information Systems
JNTU University May 2016
Bachelor of Technology, Computer Science and Engineering