
Data Engineer Machine Learning

Location:
Lewisville, TX
Salary:
70000
Posted:
October 15, 2025


Resume:

Cherishma Meher Medisetty

Data Engineer

*******************@*****.*** | Phone: +1-940-***-**** | LinkedIn

Professional Summary:

Data Engineer with 4+ years of experience developing and automating scalable data pipelines in AWS, Azure, Snowflake, PySpark, and Airflow. Proven experience in the financial and IT services domains with clients such as Capital One and Wipro. Skilled at improving data processing performance, maintaining data quality and security, and delivering actionable business insights. A quick-adapting team player with extensive experience in cloud data architecture and agile environments.

Technical Skills:

Programming & Automation: Python, SQL, PySpark, Automation of Data Workflows, Performance Monitoring

Data Engineering & Architecture: Data Pipeline Development, ETL Processes, Data Architecture, Data Manipulation, Data Analysis, Snowflake

Cloud & Big Data Technologies: Azure (Data Factory, Data Lake, Databricks, Machine Learning, Synapse Analytics), AWS (Lambda, S3, Redshift, Glue), Apache Airflow

Machine Learning & Statistical Analysis: Predictive Modeling, Statistical Analysis, Machine Learning Integration

Data Visualization: Power BI, MS Excel

Data Quality & Security: Data Quality Assessment, Data Integrity, HIPAA Compliance

Methodologies & Collaboration: Agile, Scrum, MS Teams, Cisco Webex

Other Skills: Data Warehousing, Real-Time Data Processing, System Integration, Data Cataloging, CI/CD Pipelines, Data Governance, Data Lifecycle Management

Professional Experience:

Capital One, USA | Data Engineer | Apr 2024 – Present

Designed, developed, and maintained scalable ETL pipelines to process high-volume datasets using AWS Glue, Lambda, and PySpark, improving data processing speed by 35%.
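
A minimal sketch of the kind of Glue PySpark job described here; the database, table, bucket, and column names are hypothetical placeholders, not the actual pipeline.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Standard Glue job boilerplate: resolve arguments and initialize the job.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the raw dataset from the Glue Data Catalog (hypothetical names).
raw = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="transactions"
).toDF()

# Example transformations: normalize timestamps, drop bad rows, dedupe.
cleaned = (
    raw.withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("event_date", F.to_date("event_ts"))
       .filter(F.col("account_id").isNotNull())
       .dropDuplicates(["transaction_id"])
)

# Land partitioned Parquet in S3 for downstream consumers.
(cleaned.write.mode("overwrite")
        .partitionBy("event_date")
        .parquet("s3://example-curated-bucket/transactions/"))

job.commit()
```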

Built end-to-end data pipelines using Apache Spark on AWS EMR, efficiently handling large-scale batch processing and in-memory computation.

Integrated data from multiple sources (S3, RDS, Kafka, and on-premise databases) into Spark DataFrames, improving data ingestion workflows by 30%.
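
A sketch of what ingesting those sources into Spark DataFrames might look like; hosts, credentials, topic and table names are illustrative only, and the JDBC driver and Kafka connector jars are assumed to be on the classpath.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("multi_source_ingest").getOrCreate()

# S3: columnar files landed by upstream jobs.
s3_df = spark.read.parquet("s3a://example-bucket/landing/orders/")

# RDS or an on-premise RDBMS over JDBC.
rds_df = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://example-host:5432/appdb")
          .option("dbtable", "public.customers")
          .option("user", "reader")
          .option("password", "example-pass")  # in practice, from a secrets manager
          .load())

# Kafka: batch-read the topic's current contents.
kafka_df = (spark.read.format("kafka")
            .option("kafka.bootstrap.servers", "broker1:9092")
            .option("subscribe", "transactions")
            .load()
            .selectExpr("CAST(value AS STRING) AS payload"))

# Combine sources into one curated view (hypothetical join key).
curated = s3_df.join(rds_df, "customer_id", "left")
```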

Implemented and optimized data transformation logic using PySpark and Spark SQL, enhancing the accuracy of analytical models by 20%.

Enhanced data model efficiency by leveraging Snowflake’s virtual warehouses and time travel features, supporting multiple analytical teams with minimal resource contention.
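
One way this could look with the Snowflake Python connector, shown as a hedged sketch: queries run on a team-dedicated virtual warehouse, and Time Travel's `AT(OFFSET => ...)` clause reads the table as of an earlier point. Account, warehouse, and table names are placeholders.

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",
    user="etl_user",
    password="example-pass",   # in practice, from a secrets manager
    warehouse="ANALYTICS_WH",  # dedicated warehouse avoids resource contention
    database="FINANCE",
    schema="CURATED",
)

cur = conn.cursor()
# Time Travel: row count as the table looked one hour (3600 s) ago.
cur.execute("SELECT COUNT(*) FROM transactions AT(OFFSET => -3600)")
print(cur.fetchone())
conn.close()
```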

Leveraged AWS services (S3, EC2, Glue, Lambda, CloudWatch) to develop cloud-native data processing solutions and automate job orchestration.

Migrated over 100 legacy data programs from SAS to Snowflake, enhancing data processing speed and enabling cloud-based scalability.

Migrated legacy ETL jobs from Teradata to a modern AWS data lake architecture using PySpark, reducing data latency and storage costs by 25%.

Used CI/CD pipelines with Jenkins, Git, and Ansible to automate deployment and manage code versioning, ensuring minimal downtime and improved code quality.

Scheduled and monitored batch jobs using Autosys, ensuring on-time and reliable data availability for downstream consumers.

Performed unit, integration, and end-to-end ETL testing to ensure robust data quality and integrity across multiple stages of data pipelines.
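
A minimal sketch of what a unit test for one pipeline stage could look like with pytest; `dedupe_transactions` is a hypothetical transform standing in for real pipeline logic.

```python
import pytest
from pyspark.sql import SparkSession


def dedupe_transactions(df):
    """Hypothetical transform under test: one row per transaction_id."""
    return df.dropDuplicates(["transaction_id"])


@pytest.fixture(scope="module")
def spark():
    return (SparkSession.builder
            .master("local[1]")
            .appName("etl_tests")
            .getOrCreate())


def test_dedupe_removes_duplicates(spark):
    df = spark.createDataFrame(
        [("t1", 100.0), ("t1", 100.0), ("t2", 50.0)],
        ["transaction_id", "amount"],
    )
    assert dedupe_transactions(df).count() == 2
```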

Collaborated with business analysts and product owners in Agile teams to gather requirements and translate business logic into scalable data solutions.

Conducted performance tuning and optimization for Spark applications, leveraging broadcast variables, partitioning, and efficient joins.
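
The two techniques named here, sketched in PySpark under assumed table and column names: broadcasting the small side of a join to skip a shuffle, and repartitioning on the aggregation key so work spreads evenly.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("tuning_demo").getOrCreate()

facts = spark.read.parquet("s3a://example-bucket/facts/")        # large
dim = spark.read.parquet("s3a://example-bucket/dim_merchant/")   # small

# Broadcast the small dimension so the large fact table is never shuffled.
joined = facts.join(broadcast(dim), "merchant_id")

# Repartition on the grouping key so the aggregation shuffle is balanced.
result = (joined.repartition("account_id")
                .groupBy("account_id")
                .sum("amount"))
```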

Worked with diverse file formats including Parquet, Avro, ORC, and CSV, and handled schema evolution to support changing data requirements.
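
For the schema-evolution point, a small sketch: Spark's `mergeSchema` option reconciles Parquet files written before and after a column was added, filling the new column with nulls for older rows. The path is a placeholder.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema_evolution").getOrCreate()

# Older files in this prefix lack newer columns; mergeSchema unions all
# file schemas into one DataFrame schema instead of failing.
df = (spark.read
      .option("mergeSchema", "true")
      .parquet("s3a://example-bucket/events/"))

df.printSchema()
```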

Ensured security and governance by managing credentials via AWS Key Management Service (KMS) and enforcing JSON Schema validation on incoming data.
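
A hedged sketch of that pattern: boto3's KMS client decrypts a stored credential, and the `jsonschema` package validates incoming records. The ciphertext, schema, and field names are illustrative.

```python
import json

import boto3
from jsonschema import ValidationError, validate

kms = boto3.client("kms")


def get_db_password(encrypted_blob: bytes) -> str:
    # KMS identifies the key from metadata embedded in the ciphertext.
    resp = kms.decrypt(CiphertextBlob=encrypted_blob)
    return resp["Plaintext"].decode("utf-8")


# Hypothetical record contract for incoming data.
RECORD_SCHEMA = {
    "type": "object",
    "properties": {
        "transaction_id": {"type": "string"},
        "amount": {"type": "number"},
    },
    "required": ["transaction_id", "amount"],
}


def is_valid(record: dict) -> bool:
    try:
        validate(instance=record, schema=RECORD_SCHEMA)
        return True
    except ValidationError:
        return False


print(is_valid(json.loads('{"transaction_id": "t1", "amount": 42.5}')))
```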

Participated in onshore-offshore collaboration for 24/7 pipeline support and incident management.

Wipro, India | Data Engineer | Jun 2021 – Jun 2022

Designed and automated a batch ETL pipeline using Azure Functions to parse and transform 30 GB of daily transaction data from core banking systems, incorporating Change Data Capture (CDC) for incremental updates.
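
A sketch of how such a function might implement watermark-based incremental loads; `load_watermark`/`save_watermark` are hypothetical stubs for durable state, and the connection string and table names are placeholders.

```python
import logging

import azure.functions as func
import pandas as pd
import sqlalchemy


def load_watermark() -> str:
    """Hypothetical stub: read the last processed timestamp from storage."""
    return "1970-01-01"


def save_watermark(ts) -> None:
    """Hypothetical stub: persist the new watermark."""


def main(timer: func.TimerRequest) -> None:
    engine = sqlalchemy.create_engine(
        "mssql+pyodbc://reader:example-pass@example-server/corebanking"
        "?driver=ODBC+Driver+17+for+SQL+Server"
    )
    last_ts = load_watermark()

    # Incremental (CDC-style) pull: only rows changed since the last run.
    df = pd.read_sql(
        sqlalchemy.text("SELECT * FROM transactions WHERE updated_at > :ts"),
        engine,
        params={"ts": last_ts},
    )
    logging.info("Pulled %d changed rows", len(df))

    # ... transform and land df in the data lake here ...
    save_watermark(df["updated_at"].max())
```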

Optimized data ingestion from 11 in-house databases with a combined volume of 3 TB+ using Python scripts with Pandas; removing duplicate transaction data cut processing time by 31%, enabling faster forecasting of processing loads.
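
A minimal Pandas sketch of the deduplication step; the file names and key columns are placeholders.

```python
import pandas as pd

df = pd.read_csv("transactions_extract.csv", parse_dates=["posted_at"])

# Keep the most recent record per transaction id, dropping duplicates
# before downstream processing.
deduped = (df.sort_values("posted_at")
             .drop_duplicates(subset=["transaction_id"], keep="last"))

deduped.to_parquet("transactions_clean.parquet", index=False)
print(f"Removed {len(df) - len(deduped)} duplicate rows")
```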

Built optimized Snowflake-based ETL pipelines to unify and query large volumes of structured financial data, streamlining regulatory reporting.

Developed an automated workflow using PySpark on a distributed Spark cluster to extract risk and fraud indicators from unstructured transaction data, transforming raw financial data into structured risk profiles.
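
A hedged sketch of turning free-text transaction descriptions into structured risk flags with PySpark; the keywords, threshold, and storage path are illustrative, not the production rules.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("risk_profiles").getOrCreate()

txns = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/txns/")

profiles = (
    txns.withColumn("desc_lower", F.lower(F.col("description")))
        # Keyword-based indicator derived from unstructured text.
        .withColumn("wire_flag", F.col("desc_lower").contains("wire transfer"))
        .withColumn("high_value", F.col("amount") > 10000)
        # Aggregate per account into a structured risk profile.
        .groupBy("account_id")
        .agg(F.sum(F.col("wire_flag").cast("int")).alias("wire_count"),
             F.sum(F.col("high_value").cast("int")).alias("high_value_count"))
)
```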

Authored and optimized SQL queries by implementing indexing and efficient joins in Azure Synapse Analytics to streamline data retrieval, reducing query execution time by 23%.

Deployed and managed Airflow DAGs on Azure Kubernetes Service with Azure Monitor to orchestrate and observe data pipelines, and built automated troubleshooting workflows to detect and resolve errors, ensuring high reliability.
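
A minimal DAG sketch of that orchestration; the task callables, schedule, and retry policy are placeholders rather than the production configuration.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    ...  # hypothetical: incremental pull from source systems


def transform():
    ...  # hypothetical: PySpark/Pandas transformations


def load():
    ...  # hypothetical: write to the warehouse


with DAG(
    dag_id="daily_transactions_pipeline",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```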

Built a centralized Power BI dashboard that streamlined data aggregation and analysis, resulting in a 37% reduction in reporting time for compliance and risk management teams.

Seven Fincorp, India | Data Engineer | May 2020 – Jun 2021

Designed and optimized end-to-end data pipelines using Azure SQL and Synapse Analytics, reducing report processing time by 25% and enhancing real-time business intelligence capabilities.

Improved data governance by integrating Snowflake’s role-based access controls and schema enforcement to meet compliance needs across financial applications.
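
A hedged sketch of how those role-based grants might be issued from Python with the Snowflake connector; the role, database, and schema names are illustrative, not the real policy.

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account", user="admin_user", password="example-pass"
)
cur = conn.cursor()

# Grant analysts read-only access to the curated schema; restricting grants
# to vetted objects is what enforces the schema contract downstream.
for stmt in (
    "CREATE ROLE IF NOT EXISTS ANALYST_RO",
    "GRANT USAGE ON DATABASE FINANCE TO ROLE ANALYST_RO",
    "GRANT USAGE ON SCHEMA FINANCE.CURATED TO ROLE ANALYST_RO",
    "GRANT SELECT ON ALL TABLES IN SCHEMA FINANCE.CURATED TO ROLE ANALYST_RO",
):
    cur.execute(stmt)
conn.close()
```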

Automated complex forecasting workflows by integrating Azure Data Factory and Python, resulting in a 30% increase in prediction speed and more accurate resource planning.
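
One plausible shape of the Azure Data Factory/Python integration, sketched with the `azure-mgmt-datafactory` SDK; the subscription, resource group, factory, pipeline, and parameter names are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="00000000-0000-0000-0000-000000000000",
)

# Kick off a forecasting pipeline run with a runtime parameter.
run = client.pipelines.create_run(
    resource_group_name="rg-data",
    factory_name="adf-forecasting",
    pipeline_name="daily_forecast",
    parameters={"as_of_date": "2021-01-31"},
)
print(f"Started pipeline run {run.run_id}")
```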

Built scalable data lake solutions in Azure to transform raw enterprise data into actionable insights, driving a 20% improvement in operational efficiency and warehouse utilization.

Developed high-performance ETL pipelines using Azure Databricks, PySpark, and SQL to cleanse and validate large datasets, increasing data reliability by 40% and supporting critical decision-making across finance and operations.

Education:

Master's in Computer and Information Science, University of North Texas, Aug 2022 – May 2024

Bachelor's in Computer Science Engineering, Amrita Vishwa Vidyapeetham, Jun 2017 – May 2021


