Data Engineer Machine Learning

Location:

Raniganj, West Bengal, India

Salary:

80000

Posted:

October 29, 2025

Resume:

Ranith Reddy Veeramalla

*************@*****.*** LinkedIn +1-682-***-**** Arlington, TX

SUMMARY

Data Engineer with 3+ years of experience designing, building, and optimizing scalable data pipelines and cloud-based data infrastructure. Proficient in Python, SQL, PySpark, and cloud platforms (AWS, Azure, GCP) to enable robust data processing and warehousing solutions. Demonstrated expertise in automating ETL workflows, developing real-time streaming applications with Kafka, and deploying machine learning models into production. Experience in enhancing data reliability, improving processing efficiency by up to 75%, and supporting data-driven decision-making across financial and insurance domains.

TECHNICAL SKILLS

• Data Engineering: ETL/ELT Pipeline Development, Data Warehousing, Data Modeling (Star/Snowflake Schema), Data Lakehouse, Batch & Stream Processing, Data Integration, Data Quality Frameworks (Great Expectations, DBT), Data Governance

• Programming & Scripting: Python (Pandas, PySpark), SQL (Advanced Queries, Query Optimization, Stored Procedures, CTEs), Scala, Bash/Shell Scripting

• Big Data: Apache Spark (Spark SQL, Structured Streaming), Apache Kafka, Hadoop (HDFS, Hive), Databricks

• Cloud Platforms: AWS (S3, Redshift, Glue, SageMaker, CloudFormation, EC2), Azure (Data Factory, Blob Storage), GCP

• Databases & Data Warehouses: Snowflake, PostgreSQL, MySQL, Google BigQuery, Amazon Redshift, Oracle, SQL Server

• Data Pipeline Orchestration: Apache Airflow, AWS Step Functions, Prefect, Luigi

• Containerization & DevOps: Docker, Git, GitLab CI/CD, Jenkins, Kubernetes (Basic)

• Machine Learning & ML Ops: Model Deployment, MLOps Principles, Scikit-learn, TensorFlow, AWS SageMaker

• Methodologies & Tools: Agile (Scrum), Jira, Confluence, Advanced Excel

WORK EXPERIENCE

Data Engineer, Capital One, Dallas, TX April 2024 – Present

• Automated ETL pipelines for multi-terabyte financial datasets using PySpark and Airflow, reducing manual data reconciliation by 75% and cutting pipeline processing time by 40%.

• Built real-time streaming pipelines with Kafka and Spark Structured Streaming to process 5M+ daily transactions, enabling fraud detection with under 2-second latency and reducing false positives by 15%.

• Deployed scalable ML models for credit risk, improving prediction accuracy by 12% and reducing portfolio default rates by 8% through containerized APIs on AWS SageMaker.

• Optimized complex SQL queries and data models in Redshift and Snowflake, improving query performance by 30% and supporting 50+ executive KPIs for compliance and business intelligence.

• Authored and maintained 60+ data dictionaries and governed Git repositories, standardizing analytics assets for 5+ cross-functional teams and reducing onboarding time for new engineers by 20%.

Data Engineer, Assurant, Hyderabad, India July 2022 – June 2023

• Automated end-to-end ETL pipelines for credit risk and claims reporting using Python and BigQuery, processing 2M+ records monthly and saving 15+ hours of manual effort for actuarial and compliance teams.

• Executed a large-scale data remediation project across 5+ years of historical data, resolving 200,000+ data integrity issues to improve the accuracy of fraud detection models by 25%.

• Migrated 3 mission-critical Excel reserving models to modular Python frameworks, reducing error-prone manual calculations by 25% and enhancing model reliability for financial forecasting.

• Engineered Tableau dashboards integrated with real-time data sources for 50+ regional branches, enabling data-driven interventions that improved claims processing efficiency by 20%.

• Established data governance standards and documented data lineage for insurance data marts in Snowflake and BigQuery, improving metadata clarity and reducing data-related inquiry resolution time by 30%.

Data Engineer, HCL, Hyderabad, India May 2021 – June 2022

• Initiated and optimized complex ETL pipelines and modular SQL procedures, reducing query execution time by 30% and improving data processing scalability for enterprise-wide reporting.

• Integrated disparate data sources from ERP (SAP, Oracle) and CRM systems (Salesforce), standardizing data schemas and improving overall data accuracy for analytics from 85% to 95%.

• Developed interactive Power BI dashboards with custom DAX measures, enabling business leaders to monitor KPIs and increasing decision-making efficiency by 25% within the first three months.

• Applied data clustering and outlier detection algorithms to sales and customer data, identifying key trends that guided margin optimization strategies and reduced operational costs by 10%.

• Conducted root-cause analysis on data anomalies using Python and R, resolving 95% of recurring data quality issues and increasing data trustworthiness across business teams.

EDUCATION

Master of Science in Data Science August 2023 – May 2025 University of Texas at Arlington

B.Tech in Electronics and Communications August 2019 – May 2023 J.B. Institute of Engineering and Technology

CERTIFICATIONS

Microsoft Certified: Power BI Data Analyst Associate (PL-300)

AWS Certified Data Engineer – Associate

Tableau Desktop Specialist

Microsoft Certified: Azure Data Fundamentals (DP-900)
