
Data Engineer Real-Time

Location:
Vadodara, Gujarat, India
Posted:
October 15, 2025


Resume:

Chandu Somepalli

*****************@*****.*** +1-443-***-**** Maryland, USA LinkedIn

SUMMARY

Results-driven Data Engineer with 4+ years of experience building scalable data platforms, real-time pipelines, and enterprise analytics across banking, healthcare, and compliance domains. Proficient in Python, SQL, Spark, Kafka, and cloud ecosystems (AWS, Azure, GCP), with expertise in Snowflake, BigQuery, and Databricks. Skilled in ETL automation, regulatory compliance (GDPR, HIPAA, AML, KYC), and BI dashboards that accelerate executive decision-making. Adept at optimizing data quality, reducing costs, and delivering secure, cloud-native solutions powering mission-critical analytics and fraud detection systems.

TECHNICAL SKILLS

Programming & Data Processing: Python, SQL, Java, Scala, PySpark, Bash, NumPy, Pandas, Matplotlib
Big Data & Databases: Apache Spark, Hadoop, Hive, Pig, Apache Kafka, HDFS, Flink, SparkSQL, Snowflake, Redshift, BigQuery, Synapse Analytics, MS-SQL, MySQL, PostgreSQL, NoSQL, MongoDB, Oracle, Delta Lake, Vector DB, Cassandra, Data Lake
ETL, Data Pipelines & Orchestration: Apache Airflow, AWS Glue, Talend, Informatica, Databricks, Microsoft Fabric Suite, dbt, Dagster, HVR, Apache NiFi, Docker & Docker-Compose, Data Vault, Star Schema
Data Warehousing, Analytics & BI: Power BI, Looker, Mode Analytics, Google Data Studio, DAX, OLAP, SQL Server Reporting Services (SSRS), KPI Dashboards, Data Governance & Quality, Streaming Data
Cloud Computing & DevOps: AWS (S3, EMR, Glue, Lambda, RDS, Redshift), Kubernetes, Terraform, CI/CD
Data Engineering, Optimization & Security: Data Lakes, Streaming (Kafka, Kinesis, Pulsar, Pub/Sub), Real-time & Batch Processing, Query Optimization, Indexing, Data Lineage, GDPR, HIPAA, Compliance & Security, Distributed Systems

PROFESSIONAL EXPERIENCE

Data Engineer, Capital One Oct 2024 – Present Remote, USA

• Architected centralized data lakehouse using Snowflake and AWS, merging transactional, risk, and customer records, improving compliance reporting efficiency and accelerating cross-departmental analytics turnaround by 39%.

• Developed scalable ETL pipelines via Airflow, Python, and SQL, processing millions of daily records, reducing reconciliation inconsistencies by 34%, and strengthening compliance-focused financial reporting frameworks.

• Implemented Spark Structured Streaming integrated with Kafka, enabling real-time fraud detection in under three seconds, securing daily transaction volumes exceeding $2M across enterprise-scale banking platforms.
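
As a rough illustration of the streaming pattern named in the bullet above, the sketch below reads card transactions from Kafka with Spark Structured Streaming and flags bursts of high-value activity. The broker address, topic, schema, and thresholds are illustrative assumptions, not the actual production rules.

```python
# Minimal PySpark Structured Streaming sketch: flag accounts with repeated
# high-value card transactions arriving on a Kafka topic.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("fraud-stream-sketch").getOrCreate()

txn_schema = StructType([
    StructField("txn_id", StringType()),
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
       .option("subscribe", "card_transactions")           # hypothetical topic
       .load())

# Parse the Kafka message value (JSON) into typed columns.
txns = (raw.select(F.from_json(F.col("value").cast("string"), txn_schema).alias("t"))
           .select("t.*"))

# Simple illustrative rule: more than 3 transactions over $1,000 per account
# within a 1-minute event-time window.
suspicious = (txns
    .filter(F.col("amount") > 1000)
    .withWatermark("event_time", "2 minutes")
    .groupBy(F.window("event_time", "1 minute"), "account_id")
    .count()
    .filter(F.col("count") > 3))

query = (suspicious.writeStream
         .outputMode("update")
         .format("console")   # a real pipeline would write to an alerting sink
         .start())
query.awaitTermination()
```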

• Streamlined onboarding, KYC, and AML data integration from multiple sources, enhancing regulatory accuracy by 29%, and significantly reducing dependency on manual intervention during audits.

• Built Python and dbt-based monitoring frameworks ensuring data lineage visibility, boosting quality metrics by 23%, and delivering regulator-ready transparency across critical enterprise banking datasets.
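
A minimal sketch of how a Python wrapper around dbt can surface test failures for monitoring: it shells out to `dbt test` and parses the `run_results.json` artifact. The project path and the `tag:critical` selector are assumptions for illustration.

```python
# Run a selected set of dbt tests and fail loudly on any non-passing result.
import json
import subprocess
import sys
from pathlib import Path

PROJECT_DIR = Path("/opt/dbt/banking_marts")   # hypothetical dbt project location

def run_dbt_tests(select: str = "tag:critical") -> None:
    # Execute the tests; inspect the artifact ourselves rather than relying
    # only on the process exit code.
    subprocess.run(
        ["dbt", "test", "--select", select, "--project-dir", str(PROJECT_DIR)],
        check=False,
    )
    results = json.loads((PROJECT_DIR / "target" / "run_results.json").read_text())
    failures = [r["unique_id"] for r in results["results"]
                if r["status"] not in ("pass", "success")]
    if failures:
        print("dbt data-quality failures:", failures)
        sys.exit(1)
    print("all selected dbt tests passed")

if __name__ == "__main__":
    run_dbt_tests()
```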

• Created interactive Power BI dashboards for risk executives, improving visibility into liquidity exposure trends, driving faster portfolio actions, and raising decision-making efficiency by 26%.

Data Engineer, Artificial Inventions LLC Apr 2022 – Sep 2024 Maryland, USA

• Architected end-to-end data processing pipelines with PySpark and Snowflake, integrating cross-domain datasets, optimizing queries, and improving transformation efficiency, ensuring 30% faster analytics delivery across enterprise workloads.

• Developed and automated Databricks workflows for PySpark and Snowflake jobs, reducing manual effort by 40% while ensuring seamless scheduling, monitoring, and governance of large-scale financial and healthcare data pipelines.

• Migrated 200+ Hive data transformation scripts to BigQuery SQL, modernizing enterprise pipelines, leveraging clustering and partitioning features, and cutting query runtimes by 38% in production environments.
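
To illustrate the clustering and partitioning mentioned above, here is a small sketch using the google-cloud-bigquery client to define a date-partitioned, clustered table; the project, dataset, table, and column names are placeholders rather than the migrated production schema.

```python
# Recreate a migrated Hive table as a partitioned + clustered BigQuery table so
# large scans prune by date and cluster on common filter columns.
from google.cloud import bigquery

client = bigquery.Client(project="analytics-prod")   # hypothetical project

table = bigquery.Table(
    "analytics-prod.finance.loan_transactions",      # placeholder table ID
    schema=[
        bigquery.SchemaField("txn_date", "DATE"),
        bigquery.SchemaField("customer_id", "STRING"),
        bigquery.SchemaField("region", "STRING"),
        bigquery.SchemaField("amount", "NUMERIC"),
    ],
)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="txn_date"
)
table.clustering_fields = ["customer_id", "region"]

created = client.create_table(table, exists_ok=True)
print("created", created.full_table_id)
```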

• Implemented Airflow DAGs orchestrating end-to-end data ingestion from GCS to BigQuery, enabling serverless workflows, improving automation reliability by 32%, and supporting real-time reporting requirements.
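
A minimal Airflow DAG sketch for the GCS-to-BigQuery ingestion pattern described above, using the provider's GCSToBigQueryOperator; the bucket, object prefix, destination table, and schedule are assumptions for illustration (Airflow 2.x syntax).

```python
# Daily load of partitioned GCS exports into a BigQuery staging table.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    dag_id="gcs_to_bq_daily_load",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",        # Airflow 2.4+ style schedule argument
    catchup=False,
) as dag:
    load_events = GCSToBigQueryOperator(
        task_id="load_events",
        bucket="raw-exports-bucket",                   # placeholder bucket
        source_objects=["events/{{ ds }}/*.parquet"],  # templated daily path
        destination_project_dataset_table="analytics.events_raw",
        source_format="PARQUET",
        write_disposition="WRITE_APPEND",
        autodetect=True,
    )
```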

• Established rigorous data quality frameworks comparing Hive and BigQuery outputs, improving migration accuracy by 29% and ensuring consistent results across financial, compliance, and customer-facing reporting systems.
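
One possible shape for such a cross-platform validation check, sketched below: compare a row count and a rounded amount checksum between a Hive table and its migrated BigQuery counterpart. The Hive host, table names, and checked column are hypothetical.

```python
# Compare basic reconciliation metrics between Hive and BigQuery copies of a table.
from google.cloud import bigquery
from pyhive import hive

CHECK_SQL = "SELECT COUNT(*) AS row_cnt, ROUND(SUM(amount), 2) AS amount_sum FROM {table}"

def hive_metrics(table: str) -> tuple:
    conn = hive.connect(host="hive-gateway.internal", port=10000)  # placeholder host
    cur = conn.cursor()
    cur.execute(CHECK_SQL.format(table=table))
    return cur.fetchone()

def bq_metrics(table: str) -> tuple:
    client = bigquery.Client()
    row = next(iter(client.query(CHECK_SQL.format(table=table)).result()))
    return (row["row_cnt"], row["amount_sum"])

if __name__ == "__main__":
    source = hive_metrics("finance.loan_transactions")      # hypothetical tables
    target = bq_metrics("analytics.loan_transactions")
    status = "MATCH" if tuple(source) == tuple(target) else "MISMATCH"
    print(f"hive={source} bigquery={target} -> {status}")
```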

Data Engineer, HCL Tech Jan 2019 – Dec 2019 Chennai, India
• Streamlined regulatory workflows with Informatica and SQL, processing 600K+ loan records daily, boosting AML compliance accuracy by 32%, and enabling reliable reporting for financial oversight.

• Developed Python and Apache NiFi accelerators, shortening onboarding timelines from weeks to days, accelerating data integration processes across regulated banking systems with greater consistency and reliability.

• Implemented granular security in Snowflake and Synapse, enforcing column-level access restrictions, strengthening GDPR and HIPAA compliance while increasing confidence during external regulatory audits.
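
Column-level restrictions of this kind can be expressed in Snowflake as masking policies; the sketch below applies one through the snowflake-connector-python driver. The role, database, schema, table, and column names are illustrative assumptions, and the Synapse side is not shown.

```python
# Create a masking policy so only a compliance role sees raw SSN values,
# then bind it to a column. Object names are placeholders.
import os
import snowflake.connector

MASKING_POLICY_DDL = """
CREATE MASKING POLICY IF NOT EXISTS pii.ssn_mask AS (val STRING)
RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('COMPLIANCE_ANALYST') THEN val ELSE '***MASKED***' END
"""
APPLY_POLICY_DDL = "ALTER TABLE pii.customers MODIFY COLUMN ssn SET MASKING POLICY pii.ssn_mask"

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    role="SECURITYADMIN",      # assumes a role allowed to manage masking policies
    database="ANALYTICS",      # placeholder database
)
try:
    cur = conn.cursor()
    cur.execute(MASKING_POLICY_DDL)
    cur.execute(APPLY_POLICY_DDL)
finally:
    conn.close()
```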

• Built automated anomaly detection alerts using AWS Lambda and CloudWatch, proactively monitoring streaming pipelines, reducing incident recovery times by 43%, and significantly improving overall system reliability.
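
A minimal sketch of such an alerting Lambda, assuming a custom CloudWatch metric emitted by the streaming job and an SNS topic for notifications; the namespace, metric name, and topic ARN are placeholders.

```python
# Lambda handler: if the streaming job ingested no records in the last 15
# minutes, publish an SNS alert.
import datetime
import os

import boto3

cloudwatch = boto3.client("cloudwatch")
sns = boto3.client("sns")

ALERT_TOPIC_ARN = os.environ.get(
    "ALERT_TOPIC_ARN", "arn:aws:sns:us-east-1:111122223333:pipeline-alerts"  # placeholder
)

def handler(event, context):
    now = datetime.datetime.utcnow()
    stats = cloudwatch.get_metric_statistics(
        Namespace="StreamingPipeline",     # hypothetical custom namespace
        MetricName="RecordsIngested",      # hypothetical metric
        StartTime=now - datetime.timedelta(minutes=15),
        EndTime=now,
        Period=300,
        Statistics=["Sum"],
    )
    total = sum(point["Sum"] for point in stats["Datapoints"])
    if total == 0:
        sns.publish(
            TopicArn=ALERT_TOPIC_ARN,
            Subject="Streaming pipeline anomaly",
            Message=f"No records ingested in the last 15 minutes (as of {now.isoformat()}Z).",
        )
    return {"records_last_15min": total}
```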

EDUCATION

Master's in Information Technology, UNIVERSITY OF CUMBERLANDS Jan 2025 – May 2026 Williamsburg, USA

Master of Science in Data Science, UNIVERSITY OF MARYLAND Jan 2020 – Dec 2021 Maryland, USA

Bachelor of Science, SRM INSTITUTE OF SCIENCE AND TECHNOLOGY Apr 2015 – May 2019 Chennai, India


