Sri Chandana Purella
AI Data Engineer
Email: ************@*****.*** Contact: 347-***-****
PROFESSIONAL SUMMARY
AI Data Engineer with 3+ years of experience designing and optimizing large-scale ETL pipelines, real-time data streaming solutions, and machine learning models across multi-cloud platforms (Azure, AWS, GCP).
Skilled in Python, PySpark, SQL, Scala, and MLlib, with expertise in financial risk modeling, fraud detection, customer segmentation, and NLP-driven analytics.
Adept at building and operationalizing AI/ML models, integrating MLOps practices, and managing data lakes and warehouses for governance, performance, and scalability.
Proven track record of delivering data-driven insights through BI dashboards (Tableau, Power BI, Looker, D3.js) and enabling organizations to enhance risk management, compliance, and decision-making.
TECHNICAL SKILLS
Programming & Scripting: Python, Pandas, NumPy, Scikit-learn, NLTK, Matplotlib, Seaborn, R, Scala, SQL, SAS
Big Data & ETL: PySpark, Spark SQL, MLlib, Hadoop, Hive, Sqoop, Flume
Cloud Platforms: Microsoft Azure (Data Factory, Databricks, Data Lake), Google Cloud Platform (BigQuery, Dataflow), Amazon Web Services (S3, AWS ML)
Databases & Storage: SQL Server, MongoDB, Cassandra, HDFS, Azure Data Lake, GCP BigQuery, AWS S3
Streaming & Messaging: Apache Kafka, Spark Streaming
Machine Learning & AI: Supervised & Unsupervised ML (SVM, Random Forest, Gradient Boosting, Neural Networks, K-Means, PCA, Factor Analysis), Model Validation (ROC, AUC, Precision/Recall, A/B Testing, Regularization), NLP & Text Analytics
Business Intelligence & Visualization: Tableau, Power BI, Looker, D3.js
Version Control & DevOps: Git, MLOps (Model Deployment, Monitoring, Cross-Validation)
Operating Systems: Linux, Windows
PROFESSIONAL EXPERIENCE
Wells Fargo, New Jersey AI Data Engineer Jan 2024 – Present
Responsibilities:
Engineered scalable ETL pipelines using PySpark, Databricks, Hive, and Scala to process structured and unstructured financial data, improving data quality and reducing ETL runtime.
Automated CCAR financial reporting workflows by sourcing and integrating large-scale datasets across SQL Server, MongoDB, and Hadoop ecosystems.
Designed and implemented machine learning models (SVM, Neural Networks, Gradient Boosting, K-Means, PCA) for stress testing, customer segmentation, and risk modeling, boosting predictive accuracy.
Developed real-time data pipelines with Kafka and Spark Streaming to detect anomalies and aggregate event-driven data (see the streaming sketch after this role).
Partnered with Data Scientists to build and deploy ML models on Azure Databricks with MLlib and Python libraries (scikit-learn, NLTK), ensuring production scalability.
Applied text analytics and NLP (topic modeling, sentiment analysis) for unstructured financial documents and customer data.
Administered and optimized multi-cloud data lakes (AWS S3, Azure Data Lake, HDFS), implementing fine-grained governance and access controls.
Built interactive dashboards in Tableau, Power BI, and D3.js to communicate insights for compliance, risk, and fraud detection.
Integrated MLOps practices (Git, model validation, cross-validation, ROC/AUC monitoring) to streamline deployment and ensure robust performance in production.
Delivered measurable impact by enabling data-driven risk assessment, reducing reporting cycles, and improving fraud detection accuracy.
Environment: Python, R, PySpark, Databricks, Azure Data Lake, AWS S3, Hadoop, Hive, Kafka, SQL Server, MongoDB, Tableau, Power BI, Git
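
A minimal sketch of the Kafka-to-Spark Structured Streaming pattern referenced in this role. The broker address, topic name, event schema, and flagging threshold are illustrative placeholders, not details from the actual Wells Fargo pipeline.

# Illustrative sketch only: broker, topic, schema, and threshold are assumptions.
# Requires the spark-sql-kafka connector package on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("txn-anomaly-stream").getOrCreate()

# Hypothetical schema for JSON transaction events on the Kafka topic.
schema = StructType([
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read raw events from Kafka (topic and broker are placeholders).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "transactions")
       .load())

events = (raw
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Aggregate per account over 5-minute windows and flag unusually large totals.
agg = (events
       .withWatermark("event_time", "10 minutes")
       .groupBy(F.window("event_time", "5 minutes"), "account_id")
       .agg(F.sum("amount").alias("total"), F.count("*").alias("n"))
       .withColumn("flagged", F.col("total") > 10000.0))  # illustrative threshold

query = (agg.writeStream
         .outputMode("update")
         .format("console")
         .start())
query.awaitTermination()

The watermark bounds the state kept for late events, which is what keeps a windowed aggregation like this stable over a long-running stream.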
Cyient, India Data Engineer Jul 2021 – Dec 2022
Responsibilities:
Designed and maintained ETL pipelines using PySpark, Scala, Sqoop, and Flume to process billions of claim records from RDBMS and Hadoop clusters.
Engineered real-time streaming solutions with Kafka and Spark Streaming to capture, clean, and analyze event data from multiple sources.
Optimized data pipelines in Azure Data Factory, Azure Databricks, and GCP Dataflow, improving workload performance and cost efficiency.
Developed fraud detection and risk models using ML algorithms (XGBoost, Random Forest, SVM, Neural Networks) with Python and MLlib, improving detection accuracy.
Applied dimensionality reduction (PCA, Factor Analysis) and feature engineering to enhance ML model performance.
Built business intelligence dashboards in Tableau, Power BI, and Looker to deliver actionable insights to stakeholders.
Managed and secured data lakes and warehouses across Azure Data Lake, BigQuery, and SQL Server, ensuring governance and compliance.
Implemented model validation and tuning (ROC, F-score, Precision/Recall, A/B testing, regularization) to ensure robustness and prevent overfitting (see the validation sketch after this role).
Delivered data visualizations with Python (Matplotlib, Seaborn), R, and D3.js to support fraud analysis and decision-making.
Collaborated with cross-functional teams to enable cloud-native, scalable data engineering solutions across Azure, GCP, and hybrid big data ecosystems.
Environment: PySpark, Scala, SQL, Hive, Kafka, Azure Data Factory, Azure Databricks, Azure Data Lake, GCP BigQuery, Dataflow, Tableau, Power BI, Python (Pandas, NumPy, scikit-learn), MLlib, R, Hadoop, Cassandra, SQL Server, Git
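
A minimal sketch of the model-validation workflow referenced in this role, using scikit-learn. Synthetic, imbalanced data stands in for the real claim records, and the classifier and fold counts are illustrative choices.

# Illustrative sketch only: synthetic data replaces real claim records.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.metrics import roc_auc_score, precision_recall_curve

# Synthetic, imbalanced binary data as a stand-in for fraud labels (~5% positives).
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)

# Stratified cross-validation guards against overfitting to a single split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")
print(f"CV ROC-AUC: {auc_scores.mean():.3f} +/- {auc_scores.std():.3f}")

# Held-out evaluation: ROC-AUC plus the precision/recall trade-off.
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]
print(f"Test ROC-AUC: {roc_auc_score(y_test, proba):.3f}")
precision, recall, thresholds = precision_recall_curve(y_test, proba)

Stratified folds preserve the fraud/non-fraud ratio in every split, which keeps ROC-AUC estimates reliable on imbalanced data; the precision/recall curve then guides where to set the operating threshold.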