Data Engineer Governance

Location:
Delray Beach, FL, 33484
Salary:
69000
Posted:
September 10, 2025

Resume:

MANI SHANKAR CHAMIDISETTY

+1-561-***-**** ***********@**********.*** LinkedIn GitHub

PROFESSIONAL SUMMARY

Data Engineer with 5 years of experience designing, developing, and managing scalable data pipelines across AWS, Azure, and GCP. Skilled in real-time analytics, data modeling, and regulatory compliance for large-scale datasets. Proficient in Apache Airflow, Spark, AWS Glue, and ETL/ELT optimization. Adept at data governance, performance tuning, and cloud cost optimization, and at ensuring compliance with GDPR, HIPAA, and SOC 2.

TECHNICAL SKILLS

Programming & Scripting: Python, Scala, Java, R, Bash, SQL, PL/SQL, Shell Scripting

Big Data Technologies: Apache Spark, Apache Flink, Apache Kafka, Apache Hadoop, Hive, HBase, Impala, Sqoop, Pig, Zookeeper

Cloud Platforms: AWS (S3, EMR, Redshift, Glue, Lambda, Athena, RDS, DynamoDB, QuickSight), Azure (Data Factory, Synapse, Blob Storage, Data Lake, SQL Database), GCP (BigQuery, Dataflow, Cloud Storage, Pub/Sub), IBM Cloud

ETL/ELT & Orchestration: Apache Airflow, Azure Data Factory, AWS Glue, Apache NiFi, Talend, Informatica, DBT, Luigi, SSIS

Databases: PostgreSQL, MySQL, MongoDB, MS SQL Server, Oracle, Snowflake, BigQuery, Cassandra, Redis, DynamoDB

DevOps & CI/CD: Jenkins, Terraform, Git, Docker, Kubernetes, GitHub Actions, Maven, Apache Ant

Data Visualization: Power BI, Tableau, AWS QuickSight, Looker, Apache Superset, SSRS

ML & Data Science: Pandas, NumPy, Scikit-learn, TensorFlow, Matplotlib, Seaborn, PyTorch, Jupyter

Data Governance: Data Lineage, Data Catalog, Data Masking, Metadata Management, GDPR, HIPAA, SOC 2

Testing & Monitoring: JUnit, PyTest, Postman, Swagger, ELK Stack, Prometheus, Grafana, New Relic

Project Management: Agile, Scrum, SDLC, Jira, Confluence

PROFESSIONAL EXPERIENCE

State Street, USA — Data Engineer Sep 2024 – Present

Engineered high-performance Elasticsearch clusters for insurance and financial policy search, reducing query latency by 65% and improving call-center efficiency.
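
An illustrative Python query against such a cluster, using the elasticsearch-py 8.x client; the index name, field names, and endpoint are assumptions, not the actual schema:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# Full-text search over policy documents; field names are illustrative.
resp = es.search(
    index="policies",
    query={"match": {"policy_text": "water damage claim"}},
    size=10,
)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("policy_id"))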

Designed real-time streaming architectures using Apache Kafka, Spark Structured Streaming, and Databricks to process over 1M transactions/day for fraud detection and claim triage.
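
A minimal PySpark Structured Streaming sketch of this pattern, assuming a Kafka topic named transactions, a placeholder broker address, an illustrative transaction schema, and a simple amount threshold standing in for the real fraud rules:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("fraud-triage").getOrCreate()

# Assumed transaction schema; the production schema is not shown here.
schema = StructType([
    StructField("txn_id", StringType()),
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
])

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "transactions")               # assumed topic name
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("t"))
    .select("t.*")
)

# Simple stand-in rule: route large transactions to the fraud-review sink.
flagged = stream.filter(col("amount") > 10000)

query = flagged.writeStream.format("console").outputMode("append").start()
query.awaitTermination()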

Built scalable AWS data lake solutions (S3, Glue, Athena) for structured/unstructured data, improving compliance and audit readiness.
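
A small boto3 sketch of querying such a lake through Athena; the database, table, and S3 output location are illustrative assumptions:

import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Kick off an asynchronous Athena query against Glue-cataloged S3 data.
resp = athena.start_query_execution(
    QueryString="SELECT claim_id, status FROM claims WHERE year = '2024' LIMIT 10",
    QueryExecutionContext={"Database": "claims_lake"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print(resp["QueryExecutionId"])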

Developed hybrid ETL/ELT pipelines for actuarial, underwriting, and claims data, increasing reporting accuracy and reducing audit-related delays by 35%.

Created centralized reporting systems in Redshift and AWS Data Pipeline, optimizing portfolio queries and regulatory disclosures.

Delivered dynamic Tableau dashboards with real-time tracking of claims-cycle, SLA, and retention KPIs.

Optimized DBT models, cutting load times by 40% and cloud costs by 20%.

Automated CI/CD pipelines for data infrastructure, improving release velocity and reliability.

Implemented data governance policies to ensure HIPAA, GDPR, and SOC 2 compliance.
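
One common building block of such policies is column-level masking of direct identifiers before data leaves a restricted zone. A minimal PySpark sketch, with illustrative paths and column names:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sha2

spark = SparkSession.builder.appName("pii-masking").getOrCreate()

claims = spark.read.parquet("s3://example-bucket/claims/")  # illustrative path

# One-way hash of direct identifiers so analysts can still join on them
# without ever seeing the raw values.
masked = (
    claims
    .withColumn("ssn", sha2(col("ssn"), 256))
    .withColumn("email", sha2(col("email"), 256))
)

masked.write.mode("overwrite").parquet("s3://example-bucket/claims_masked/")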

Mentored junior engineers, promoting best practices in data pipeline optimization and fault tolerance.

Accenture Solutions, India — Data Engineering Analyst Oct 2021 – Jul 2023

Developed ETL workflows in Informatica, Teradata SQL, and Oracle SQL for insurance KPI reporting, claims, commissions, and premium data, resolving abends and ensuring production data quality.

Orchestrated healthcare data migration using PySpark, AWS Step Functions, Lambda, SNS, and JDBC, reducing Oracle-to-Redshift transfer time from 5 hours to 30 minutes via parallel processing.
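
The parallel-read half of this pattern can be sketched in PySpark as below; the connection details, partition column, and bounds are placeholders to be tuned per table and cluster:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("oracle-to-redshift").getOrCreate()

# Split the Oracle read into 32 parallel JDBC partitions on a numeric key.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB")  # placeholder
    .option("dbtable", "claims.encounters")                         # illustrative
    .option("user", "etl_user")
    .option("password", "<secret>")
    .option("partitionColumn", "encounter_id")
    .option("lowerBound", "1")
    .option("upperBound", "50000000")
    .option("numPartitions", "32")
    .load()
)

# Stage as Parquet on S3; a Redshift COPY (not shown) loads the staged files.
df.write.mode("overwrite").parquet("s3://example-stage/encounters/")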

Built and modified Informatica jobs linked to Oracle schemas and views, and created Linux shell scripts for automated file transfer, data validation, and batch processing.

Implemented AWS security architecture with IAM roles, policies, Step Functions, Lambda, and SNS; integrated PySpark with JDBC for secure extraction from the Oracle information schema.

Designed Tableau dashboards for multi-frequency reporting; optimized PySpark performance with scaling, partitioning, and custom Oracle-to-Redshift data mapping.

Cybage Software, India — Jr. Data Engineer Jan 2020 – Sep 2021

Developed ETL solutions using Python and PySpark to ingest structured/semi-structured data from APIs and CRM tools.

Enabled metadata tracking and lineage for 150+ datasets using Apache Airflow and Talend.
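
A minimal Airflow 2.x sketch of the lineage-tracking idea; the DAG, dataset, and source names are illustrative, and the real jobs wrote lineage records to a metadata store rather than printing them:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def record_lineage(dataset: str, source: str, **_):
    # Stand-in for writing a lineage record to the metadata store.
    print(f"lineage: {source} -> {dataset}")

with DAG(
    dag_id="crm_ingest_with_lineage",
    start_date=datetime(2021, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="record_lineage",
        python_callable=record_lineage,
        op_kwargs={"dataset": "crm_contacts", "source": "crm_api"},
    )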

Built real-time processing jobs in Apache Flink for retail inventory analytics.

Optimized Spark jobs via partitioning, caching, and broadcasting to enhance performance.
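
A short PySpark sketch showing the three techniques together; the paths, column names, and partition count are assumptions:

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("spark-tuning").getOrCreate()

sales = spark.read.parquet("s3://example/sales/").repartition(200, "store_id")
stores = spark.read.parquet("s3://example/stores/")  # small dimension table

# Broadcast the small table so the join avoids a full shuffle.
joined = sales.join(broadcast(stores), "store_id")

# Cache because several downstream aggregations reuse the joined data.
joined.cache()
daily = joined.groupBy("store_id", "sale_date").sum("quantity")
daily.show()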

Created Power BI dashboards visualizing onboarding KPIs and usage metrics.

Migrated workloads to Delta Lake with structured logging for better traceability.
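
A minimal sketch of such a migration, assuming a cluster where the Delta Lake libraries are already configured (for example Databricks); paths are illustrative:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-migration").getOrCreate()

events = spark.read.parquet("s3://example/events_raw/")

# Rewrite the Parquet workload as a Delta table to gain ACID semantics
# and a transaction log (_delta_log/) for auditable history.
events.write.format("delta").mode("overwrite").save("s3://example/events_delta/")

# The table history is the "structured logging" used for traceability.
spark.sql("DESCRIBE HISTORY delta.`s3://example/events_delta/`").show()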

Integrated Flink with Kafka for low-latency analytics, reducing time-to-insight.

Supported Azure Data Factory enhancements, lowering pipeline failure rates.

Built reusable Python scripts for log analysis and data quality checks.
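
A minimal sketch of what such reusable helpers might look like; the log pattern and the data-quality rule are illustrative:

import re
from collections import Counter

ERROR_RE = re.compile(r"\b(?:ERROR|FATAL)\b\s+(?P<msg>.+)")

def summarize_errors(log_path: str, top_n: int = 5) -> list[tuple[str, int]]:
    # Count distinct error messages so the noisiest failures surface first.
    counts = Counter()
    with open(log_path) as fh:
        for line in fh:
            m = ERROR_RE.search(line)
            if m:
                counts[m.group("msg").strip()] += 1
    return counts.most_common(top_n)

def check_not_null(rows: list[dict], column: str) -> bool:
    # Simple quality rule: a required column must never be null or empty.
    return all(row.get(column) not in (None, "") for row in rows)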

Developed robust QA tests for ETL validation and data governance enforcement.

EDUCATION

Master’s in Data Science and Analytics — Florida Atlantic University, Boca Raton, FL (May 2025)

CERTIFICATIONS

AWS Certified Solutions Architect – Associate

Microsoft Certified: Azure Administrator Associate

IBM Data Engineering Professional Certificate

Google Data Analytics Professional Certificate

HashiCorp Certified: Terraform Associate (003)

TECHNICAL PROJECTS

Parkinson’s Prediction – Built an ML model using XGBoost, SVM, and Extra Trees classifiers for disease prediction.
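
A minimal XGBoost sketch of the approach; synthetic data stands in for the Parkinson’s voice-measurement features, and the hyperparameters are illustrative:

import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real feature matrix (not reproduced here).
X, y = make_classification(n_samples=500, n_features=22, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))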

Reddit Sentiment Analyzer – Analyzed Reddit posts using Python, PostgreSQL, and Tableau.

Movie Analytics Pipeline – Designed ETL pipeline on AWS with Python and Tableau.

GCP Image Web App – Developed photo storage app using Flask, Firebase, and GCP.

Spotify Analyzer – Visualized track popularity using Spotify API and Python.


