MEGHANA EMMIDI
***************@*****.*** +1-314-***-**** LinkedIn Github Portfolio
SUMMARY
Results-oriented Data Engineer with over 4 years of experience designing, developing, and optimizing robust data pipelines in cloud and on-prem environments. Proficient in building scalable ETL/ELT solutions using tools such as Apache Spark, Kafka, Airflow, and AWS Glue. Strong expertise in handling large-scale data platforms, data modeling, and integrating structured and semi-structured data from diverse sources. Adept at streamlining data workflows, enabling real-time analytics, and improving data quality and governance across business domains.
TECHNICAL SKILLS
Programming & Scripting: Python, Java, Scala, SQL, PySpark, KornShell, Bash. Data Formats: JSON, YAML.
Data Engineering Tools: Apache Spark, Apache Kafka, Apache Airflow, Apache NiFi, AWS Glue, AWS Data Pipeline, Azure Data Factory, Google Dataflow, Informatica, Talend, SSIS, ODI, BODI, dbt, DataStage.
Databases & Warehousing: PostgreSQL, MySQL, Oracle, SQL Server, MongoDB, Cassandra, DynamoDB, Cosmos DB, AWS RDS, Azure SQL DB, Snowflake, Amazon Redshift, Google BigQuery, Hive, HiveQL, HBase.
Big Data Technologies: Hadoop, HDFS, YARN, MapReduce, Spark (SQL, Streaming, MLlib), EMR, Flink, Presto, Delta Lake.
Cloud Platforms: AWS (S3, Lambda, Redshift, Glue, Athena, CloudWatch, EC2, IAM), Azure (ADF, Blob Storage, Synapse, Databricks), GCP (BigQuery, Pub/Sub, Dataflow).
DevOps & CI/CD: Docker, Kubernetes, Terraform, Jenkins, GitHub Actions, GitLab CI/CD, Airflow DAGs.
Data Modeling & Governance: Star/Snowflake Schema, Data Vault, DDL/DML, Data Lineage, MDM, Data Governance, Data Quality (DQ).
Visualization Tools: Power BI, Tableau, Looker, AWS QuickSight, Google Data Studio.
Libraries & APIs: Pandas, NumPy, PyArrow, FastAPI, RESTful APIs, JDBC, ODBC.
Monitoring & Logging: ELK Stack (Elasticsearch, Logstash, Kibana), Prometheus, Grafana, Splunk, Datadog, CloudWatch.
Testing & Validation: PyTest, JUnit, TestNG, dbt Tests.
PROFESSIONAL EXPERIENCE
Jenius Bank California, US
Data Engineer Aug 2024 – Present
• Designed and maintained scalable ETL pipelines using Apache Airflow, AWS Glue, and PySpark, processing over 2 TB of data daily.
• Implemented data ingestion workflows from APIs, Kafka, and RDS into Snowflake and Redshift for analytical reporting.
• Built reusable data models using dbt for marketing, operations, and finance teams, improving query efficiency by 40%.
• Automated AWS infrastructure deployment using Terraform and managed CI/CD pipelines with GitHub Actions.
• Led a migration project from on-prem SQL Server to AWS Redshift, reducing latency by 60%.
• Created dashboards using Power BI and Tableau to track operational KPIs and data pipeline health.
• Integrated data quality checks using Great Expectations, improving data trustworthiness across teams.
• Collaborated with business analysts and stakeholders to align data architecture with business goals.
Fiserv Alpharetta, GA
Data Engineer June 2023 – May 2024
• Developed and optimized data pipelines using Apache Spark and Kafka to support near-real-time data processing.
• Utilized Azure Data Factory and Databricks to transform and store data for financial analytics applications.
• Created custom monitoring dashboards using the ELK Stack and Prometheus for performance visibility and alerting.
• Built star and snowflake schemas in Snowflake and Azure Synapse for improved BI reporting.
• Reduced data load time by 30% through query optimization and parallel processing in Spark SQL.
• Conducted end-to-end testing using PyTest and integrated data validation with dbt tests.
• Enhanced data governance by implementing MDM and metadata lineage tracking using Informatica.
• Coordinated with cross-functional DevOps teams to manage containerized deployments using Docker and Kubernetes.
Capgemini Hyderabad, India
Jr. Data Engineer Jan 2021 – Dec 2022
• Assisted in building automated ETL pipelines with SSIS and Talend for data migration and transformation.
• Supported Apache NiFi-based ingestion workflows to collect streaming data from multiple IoT sensors.
• Performed SQL tuning and schema optimization in Oracle and SQL Server to enhance data processing performance.
• Worked with Hadoop and Hive to support batch data workloads and generate reports.
• Collaborated with QA teams to validate pipeline output and verify metadata integrity using dbt.
• Built Tableau dashboards for end users to visualize metrics and operational KPIs.
• Managed AWS S3 and EC2 environments to stage and process large datasets for downstream applications.
• Documented data lineage and process flow diagrams to improve pipeline transparency and troubleshooting.
EDUCATION
Webster University January 2023 – May 2024
Master of Science in Information Systems, GPA: 3.6