
Data Engineer Governance

Location:
Denton, TX
Salary:
80000
Posted:
September 10, 2025


Resume:

Meghana Karre

Data Engineer

Location: Texas, USA | Mail: ****************@*****.*** | Ph.: +1-469-***-**** | LinkedIn

PROFESSIONAL SUMMARY:

3+ years of experience as a Data Engineer specializing in architecting, optimizing and deploying robust, fault-tolerant data pipelines for high-volume, real-time financial and enterprise data.

Proficient in Spark Structured Streaming, Apache Kafka, Snowflake, Azure Databricks, Azure Synapse Analytics and AWS services (S3, EMR, Lambda, EC2).

Expertise includes ETL/ELT development with Apache Airflow, dbt and SQLAlchemy; implementing data governance, lineage and security frameworks using Collibra and Apache Atlas; and ensuring stringent regulatory compliance.

Skilled in real-time data ingestion via Apache NiFi and Kafka Connect, leveraging Apache Flink CEP for complex event processing and fraud detection.

Adept at data quality validation with Great Expectations, monitoring with Prometheus and Grafana, and CI/CD automation with Jenkins and Docker.

Experienced in developing recommendation engines using MLlib and Scikit-learn, and delivering actionable insights through SQL and Power BI.

TECHNICAL SKILLS:

Programming & Query Languages: Python, SQL, NoSQL, Scala

Big Data Frameworks: Spark, Spark Structured Streaming, Apache Flink CEP, Apache Kafka, Kafka Connect, AWS Glue, PySpark, Hadoop, Hive

ETL/ELT: Apache Airflow, dbt, SQLAlchemy, Apache NiFi

Data Modeling: Data Vault 2.0, Erwin Data Modeler

Data Warehousing: Snowflake, Azure Synapse Analytics, Oracle

Scripting & OS: Unix, Shell Scripting

Cloud Platforms: Azure Databricks, Azure Data Lake Storage (ADLS Gen2), AWS (S3, EMR, Lambda, EC2), GCP

Machine Learning: MLlib, Scikit-learn

Monitoring & Logging: Prometheus, Grafana

Data Governance: Collibra, Apache Atlas

Security & Compliance: RBAC, Fine-grained encryption, GDPR, SOX, CCPA, GLBA, PCI-DSS, FFIEC, BSA, FATCA, OFAC

Financial Messaging & Payment Standards: SWIFT, ACH, FEDWIRE, FIX, ISO 20022

Visualization & BI: Power BI

DevOps & CI/CD: Jenkins, Docker

Version Control: Git, GitHub, GitLab

PROFESSIONAL EXPERIENCE:

State Street – TX August 2024 – Present

Data Engineer

• Cut end-to-end batch processing time by 48% by leveraging Spark Structured Streaming with adaptive query execution (AQE) on Azure Databricks, reducing operational costs in treasury data reporting.
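On Databricks, AQE for a job like this is typically enabled through Spark session configuration. A minimal configuration sketch, with a hypothetical application name; the settings shown are the standard Spark 3 AQE knobs, not values taken from the pipeline above:

```python
# Sketch: enabling adaptive query execution (AQE) for a Spark job.
# Requires a Spark runtime (e.g. a Databricks cluster); the app name
# is hypothetical and the config values are illustrative defaults.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("treasury-reporting")  # hypothetical job name
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .config("spark.sql.adaptive.skewJoin.enabled", "true")
    .getOrCreate()
)
```

With these flags set, Spark re-plans shuffles at runtime (coalescing small partitions and splitting skewed ones), which is where batch-time savings of this kind usually come from.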

• Maintained 99% data availability SLA for regulatory and risk reporting pipelines, ensuring audit-readiness for SOX, PCI-DSS and FFIEC compliance across investment banking platforms.

• Architected and deployed low-latency, fault-tolerant data pipelines using Apache Kafka, Spark Structured Streaming and Snowflake, enabling real-time ingestion of high-volume financial transactions (SWIFT, ACH, FEDWIRE).

• Designed KYC/AML data pipelines utilizing Data Vault 2.0 architecture with Erwin Data Modeler, supporting regulatory reporting for BSA, FATCA and OFAC compliance frameworks.
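Data Vault 2.0 hubs and links are keyed by deterministic hashes of business keys. A minimal pure-Python sketch of that convention (the normalization rules and MD5 choice follow common DV2 practice; the customer key used below is invented for illustration):

```python
# Sketch: computing a Data Vault 2.0 style hub hash key.
# DV2 convention: trim and case-normalize each business key part,
# join with a delimiter, then hash (MD5 is the common choice).
import hashlib

def hash_key(*business_keys: str, sep: str = "||") -> str:
    """Deterministic hash key from one or more business key parts."""
    normalized = sep.join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# The same customer always yields the same key, regardless of
# incidental whitespace or casing differences between source feeds:
assert hash_key(" cust-001 ") == hash_key("CUST-001")
```

The determinism is the point: hubs loaded from different source systems converge on the same key without a central sequence generator.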

• Developed modular ETL/ELT pipelines using Apache Airflow, dbt and SQLAlchemy to automate ingestion from SWIFT, FIX and ISO 20022 feeds, improving reconciliation processes for capital markets data.

• Integrated Apache NiFi with Azure Data Lake Storage (ADLS Gen2) to enable real-time ingestion of multi-source retail and commercial banking data for fraud detection and transaction monitoring.

• Implemented robust data governance and security frameworks by enforcing role-based access controls (RBAC) and fine-grained encryption, and generating audit-ready artifacts to maintain compliance with SOX, GDPR, GLBA and CCPA.

• Executed phased migration of legacy Oracle ETL frameworks to Azure Synapse Analytics, achieving a 35% improvement in query execution times and reducing technical debt in enterprise data warehouses.

• Developed Kafka Connect pipelines to stream real-time transactional updates from core banking systems (FIS, Fiserv) to downstream analytics platforms for risk and liquidity monitoring.
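A Kafka Connect pipeline of this kind is defined declaratively by connector configuration. A hypothetical JDBC source config sketch (the connector class is Confluent's JDBC source connector; the host, database, table, and column names are placeholders, not details from the systems above):

```json
{
  "name": "core-banking-transactions",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:oracle:thin:@//db-host:1521/CORE",
    "mode": "timestamp+incrementing",
    "timestamp.column.name": "UPDATED_AT",
    "incrementing.column.name": "TXN_ID",
    "table.whitelist": "TRANSACTIONS",
    "topic.prefix": "corebank."
  }
}
```

The `timestamp+incrementing` mode is the usual choice for transactional tables: the timestamp column catches updates while the incrementing column guarantees new rows are not missed within the same timestamp.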

• Built enterprise-wide data lineage and metadata management frameworks using Collibra and Apache Atlas, enhancing data traceability, lineage visualization and ensuring regulatory audit readiness.

• Integrated Apache Flink CEP (Complex Event Processing) to process high-frequency trading event streams, enabling real-time fraud detection and alert generation for anti-money laundering (AML) workflows.

Accenture – India March 2021 – June 2023

Data Engineer

• Optimized data processing pipelines using Apache Spark and AWS Glue, resulting in a 40% reduction in data processing time for daily sales data aggregation.

• Implemented data validation and cleansing processes leveraging Great Expectations, leading to a 25% increase in data quality as measured by reduced data discrepancy rates in inventory management reports.
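Great Expectations encodes checks such as not-null and in-range expectations against each batch of data. The dependency-free stand-in below illustrates the idea only; it is not the Great Expectations API, and the inventory columns are hypothetical:

```python
# Sketch: the kind of row-level expectations Great Expectations
# encodes, hand-rolled so it runs without the library installed.
def expect_column_values_not_null(rows, column):
    failures = [r for r in rows if r.get(column) is None]
    return {"success": not failures, "unexpected_count": len(failures)}

def expect_column_values_between(rows, column, lo, hi):
    failures = [r for r in rows if not (lo <= r[column] <= hi)]
    return {"success": not failures, "unexpected_count": len(failures)}

inventory = [
    {"sku": "A1", "qty": 40},
    {"sku": "A2", "qty": -3},   # bad row: negative quantity
    {"sku": None, "qty": 12},   # bad row: missing SKU
]

assert expect_column_values_not_null(inventory, "sku")["unexpected_count"] == 1
assert expect_column_values_between(inventory, "qty", 0, 10_000)["unexpected_count"] == 1
```

Wiring checks like these into the pipeline before load is what turns "data quality" from a report into a gate: failing batches are quarantined instead of reaching inventory reports.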

• Developed a real-time product recommendation engine utilizing Spark Streaming & Kafka to deliver personalized product suggestions based on user browsing history and purchase behavior, directly contributing to a 15% uplift in click-through rates on recommended products.

• Implemented collaborative filtering (ALS) and content-based filtering algorithms using MLlib and Scikit-learn to enhance the accuracy of product recommendations, resulting in a 10% improvement in recommendation conversion rates.
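The content-based half of such a hybrid can be sketched as cosine similarity over item feature vectors; the ALS half would come from MLlib. The items and feature encodings below are invented for illustration, and the code is pure stdlib so it runs anywhere:

```python
# Sketch: content-based filtering via cosine similarity over item
# feature vectors (e.g. category/brand one-hot encodings).
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical item -> feature vector mapping.
items = {
    "running_shoe": [1, 0, 1, 0],
    "trail_shoe":   [1, 0, 1, 1],
    "dress_shirt":  [0, 1, 0, 0],
}

def recommend(seed, k=1):
    """Return the k items most similar to the seed item."""
    scores = {i: cosine(items[seed], v) for i, v in items.items() if i != seed}
    return sorted(scores, key=scores.get, reverse=True)[:k]

assert recommend("running_shoe") == ["trail_shoe"]
```

In a production hybrid, the collaborative (ALS) scores and content scores are typically blended, with the content side covering cold-start items that have no interaction history yet.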

• Built a robust and scalable data pipeline using Apache Airflow to ingest, transform and load data from diverse sources, including transactional databases (MySQL, PostgreSQL), web logs (Apache logs) and social media feeds (Facebook, Instagram APIs), ensuring timely and accurate data availability for business intelligence and analytics.

• Proactively monitored the performance of the recommendation engine and data pipeline using Prometheus and Grafana, identifying and resolving performance bottlenecks, thereby maintaining 99.9% uptime for critical data services.

• Leveraged AWS services such as S3, EMR, Lambda and EC2 to build and deploy scalable and cost-effective data solutions, optimizing cloud resource utilization and reducing infrastructure costs by 12%.

• Developed and maintained CI/CD pipelines using Jenkins and Docker to automate the deployment of code changes, ensuring rapid and reliable software releases with minimal downtime.

• Maintained complex SQL queries and stored procedures for data analysis and reporting, providing actionable insights to business stakeholders on key performance indicators (KPIs) such as sales trends, customer segmentation and marketing campaign effectiveness.
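KPI queries of this kind can be illustrated with a small aggregation. A hypothetical monthly sales-trend query, run here against an in-memory SQLite table (the schema and rows are invented for the sketch):

```python
# Sketch: a monthly revenue-trend KPI query of the kind described
# above, against an in-memory SQLite table. Schema/data hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_date TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("2023-01-05", "south", 120.0),
     ("2023-01-20", "south", 80.0),
     ("2023-02-02", "north", 50.0)],
)

rows = conn.execute(
    """
    SELECT strftime('%Y-%m', order_date) AS month,
           SUM(amount)                   AS revenue
    FROM sales
    GROUP BY month
    ORDER BY month
    """
).fetchall()

assert rows == [("2023-01", 200.0), ("2023-02", 50.0)]
```

The same shape of query (bucket by period, aggregate, order) underlies most sales-trend and campaign-effectiveness KPIs, whatever the warehouse.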

• Utilized Power BI to create interactive dashboards and reports that visualize data and surface insights for business stakeholders, enabling data-driven decision-making and improved business outcomes.

EDUCATION:

Master’s in Information Technology – University of North Texas, TX, USA

Bachelor’s in Information Technology – Malla Reddy Engineering College for Women, Hyderabad, Telangana, India

CERTIFICATES:

• PL-300: Microsoft Power BI Data Analyst Associate

• Google Certified Professional Data Engineer


