
Data Engineer Machine Learning

Location:
Hyderabad, Telangana, India
Posted:
September 11, 2025

Contact this candidate

Resume:

Krishnakanth Manchiraju

Data Engineer

513-***-**** ***************@*****.*** LinkedIn Cincinnati, OH

Professional Summary

Data Engineer with 5+ years of experience designing, developing, and maintaining robust, scalable, and secure data pipelines across the Capital Markets, Banking, and Healthcare domains. Adept at building both batch and real-time ETL/ELT workflows with Apache Spark, PySpark, SQL, and Kafka, and at deploying data infrastructure on leading cloud platforms including AWS (Glue, S3, Redshift, Lambda, Kinesis) and Azure (Data Factory, Synapse, Data Lake). Proficient in working with structured, semi-structured, and unstructured datasets, implementing data quality checks, and automating ingestion pipelines for large-scale data lakes and warehouses. Demonstrated success in supporting data analytics, reporting, fraud detection, compliance, and machine learning projects by ensuring data is accurate, timely, and reliable. Experienced in coordinating with cross-functional teams, including Data Scientists, BI Developers, and Product Owners, to gather requirements, improve data models, and deliver high-impact data solutions. Skilled in using Airflow, Git, Docker, and DevOps tools to maintain agile, production-grade environments.

Technical Skills

● End-to-End Data Analysis: SQL, Python (Pandas, NumPy, SciPy, Scikit-learn), R, Excel

● Databases: Snowflake, MySQL, PostgreSQL, Oracle, Microsoft SQL Server, Amazon Redshift, Google BigQuery

● Cloud Platforms: AWS (S3, Glue), Azure (Blob, Synapse), Google Cloud (BigQuery, GCS)

● ETL & Data Integration: Apache NiFi, Talend, AWS Glue, Azure Data Factory, Informatica

● Data Visualization & BI: Power BI, Tableau, Looker (for DataOps & reporting support)

● Data Warehousing: Snowflake, Amazon Redshift, Azure Synapse, Google BigQuery

● Big Data & Distributed Systems: Apache Spark, Hadoop, Hive, HDFS, Kafka

● Data Security & Compliance: HIPAA, GDPR, Data Masking, Encryption, IAM

● DevOps & Automation: Airflow, Git, Jenkins, Docker, Terraform

● Version Control & CI/CD: Git, GitHub, Bitbucket, Jenkins

● Deep Learning: CNN, RNN, LSTM, GRU, Transformer, BERT, GPT, ResNet, YOLO, GAN

● Natural Language Processing: NLTK, spaCy, Transformers, Word2Vec, BERT, GPT-3/4, Sentiment Analysis

● Others: REST APIs, JSON, XML, JIRA, Agile/Scrum, SDLC

Work Experience

Data Engineer Jun 2024 – Present

Jefferies New York City, NY

● Designed and built real-time AI/ML data pipelines using Apache Kafka, AWS Kinesis, and Flink to process high-frequency trading (HFT) feeds from Bloomberg and Reuters, powering quant models for predictive trading strategies.

● Developed feature engineering workflows in PySpark & Scala, transforming structured/unstructured trading data into Amazon S3 and Snowflake to create training-ready datasets for ML models.

● Integrated and standardized credit risk data from Calypso and Murex into Snowflake using AWS Glue and Step Functions, supporting machine learning risk-scoring models with a 40% faster data refresh cycle.

● Orchestrated metadata-driven ML pipelines with Airflow (Amazon MWAA), dynamically generating DAGs to automate model retraining, feature refresh, and hyperparameter tuning workflows.

● Implemented data validation and anomaly detection using Great Expectations and AWS Lambda, ensuring consistent input quality for downstream fraud detection and compliance ML models.

● Built scalable cloud-native ML data infrastructure with AWS S3, EMR, Redshift, and Glue to support training, testing, and deployment of predictive and anomaly detection models across financial domains.

● Optimized SQL and Spark-based transformations for feature extraction and time-series analysis, enabling forecasting and AI-driven insights for trading and compliance teams.

● Deployed CI/CD for ML pipelines with GitLab, Terraform, and Docker, ensuring reproducibility, automated testing, and environment consistency for model development and deployment.

● Partnered with Quants and Data Scientists to operationalize ML models, provisioning structured datasets and embedding pipelines for fraud detection, trade surveillance, and AI-powered reporting.

● Strengthened AI/ML governance and monitoring with AWS CloudWatch, Glue Data Catalog, and Apache Atlas to ensure traceability, auditability, and regulatory compliance of model-driven workflows.

Data Engineer Sep 2021 – Dec 2023

Capgemini India

● Engineered a robust data ingestion framework using Apache NiFi and Azure Data Factory to load daily credit card transactions into a centralized Azure Data Lake Storage (ADLS Gen2) account for a large Indian private bank.

● Designed PySpark-based ETL jobs to process sensitive banking data from OLTP systems and transform it into dimensional models maintained in Azure Synapse Analytics, ensuring secure access via Azure RBAC and fine-grained roles.

● Engaged with risk and fraud analytics teams to provide enriched customer profiles by joining datasets from core banking, CRM and external fraud APIs with optimized Spark and SQL transformations.

● Integrated transaction anomaly alerts into the data pipeline using Azure Functions and Logic Apps, reducing turnaround time for fraud detection by over 30%.

● Built scalable data pipelines using Azure (Data Lake, Data Factory, Functions, Synapse) to support batch and near real-time data processing across structured and semi-structured data formats including JSON, Parquet and CSV.

● Developed and maintained Spark jobs for data cleansing, enrichment and standardization, enabling better usability for reporting, analytics and ML use cases.

● Implemented CI/CD pipelines using Git, Azure DevOps and Docker for automated deployment and version control of data engineering components in dev, QA and production environments.

● Worked closely with cross-functional teams (data scientists, BI analysts, product owners) to understand data requirements and helped create pipelines aligned with evolving business goals and KPIs.

● Created reusable data quality validation scripts for customer onboarding, account status, and loan lifecycle data using Great Expectations, embedding them in Azure Data Factory pipelines or Apache Airflow for daily validation jobs.

Data Engineer Intern Jan 2020 – Sep 2021

Citius Tech India

● Assisted in developing a patient data integration pipeline used to collect real-time data from EMR systems (Electronic Medical Records), lab systems and appointment scheduling tools into a unified reporting layer.

● Built basic ETL workflows using Python and SQL to transform clinical data, patient demographics and treatment records for monthly reporting and compliance audits.

● Collaborated with hospital IT and BI teams to support COVID-19 case tracking and generate daily reports using pre-aggregated datasets in Excel and Power BI.

● Helped secure data storage practices for sensitive health information by uploading anonymized files into Azure Data Lake Storage under supervision.

● Contributed to developing scalable Azure Data Factory pipelines for transferring and transforming healthcare data to cloud environments.

● Utilized Azure Synapse Analytics and Azure SQL Database to create and query structured datasets used in hospital performance dashboards and departmental KPIs.

● Performed data cleaning and transformation with PySpark and Azure Databricks, ensuring healthcare data met quality and privacy standards (HIPAA-aligned).

● Monitored and maintained pipeline performance and failures in Azure Monitor and Log Analytics, reducing downtime by 50% and improving troubleshooting by 40%.

● Participated in version control and deployment practices with Git and Azure DevOps, supporting the CI/CD process for pipeline updates and enhancements.

Certifications

● Big Data: The Overview – Pluralsight
● Big Data – Data Engineer – Level 1 – Capgemini
● Data Analytics Professional Certificate – Google
● Data Science: The Big Picture – Coursera
● Getting Started with HDFS – Pluralsight
● Hands-on Introduction to Linux Commands and Shell Scripting – Coursera
● Intro to Analytic Thinking, Data Science and Data Mining – Coursera
● Python Basics – Pluralsight

Education

Master of Science in Information Technology, University of Cincinnati, OH, USA
Bachelor of Technology in Electrical & Electronics Engineering, Pragati Engineering College, India


