
Data Engineer | Python Developer | Cloud & AI Specialist

Location:
Jersey City, NJ
Salary:
$75/hr
Posted:
May 08, 2026

Resume:

KRISHNA TEJA

Data Engineer | Python Developer | Cloud & AI Specialist

**.*******@*****.*** | +1-732-***-**** | Jersey City, NJ

PROFESSIONAL SUMMARY

Results-driven Python Developer and Data Engineer with 10 years of experience architecting end-to-end data pipelines, building scalable cloud infrastructure, and delivering AI-powered solutions across the banking, healthcare, and retail domains. Proven expertise in ETL/ELT design, real-time data streaming, data warehousing, and generative AI integration. Adept at leveraging Azure, AWS, and modern data stack tools to process large-scale datasets and drive business intelligence.

10 years of expertise as a Data Engineer and Python Developer specializing in building enterprise-scale ETL/ELT pipelines, distributed data processing systems, cloud data lake/warehouse architectures, and real-time streaming platforms across banking, healthcare, and retail domains.

Certified Azure Data Engineer Associate (DP-203) — hands-on expert in Azure Data Factory (ADF) pipeline orchestration, Azure Databricks, Azure Synapse Analytics, Delta Lake, ADLS Gen2, and Azure Stream Analytics for end-to-end cloud data engineering.

Mastered RESTful API development with Flask and Django, adopting industry best practices for endpoint security, data serialization, and API versioning to facilitate robust backend services.

Architected and delivered end-to-end data pipelines processing 800GB+ daily using PySpark on Azure Databricks and AWS EMR — orchestrating complex DAGs with Apache Airflow and Azure Data Factory across 60+ batch jobs for front-office risk analytics.

Deep expertise in Apache Spark and PySpark — designed and tuned distributed batch and streaming jobs, managed Spark cluster configurations (executor tuning, memory management, broadcast joins, partition optimization), and implemented custom UDFs and aggregations on Azure Databricks and AWS EMR.
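
A minimal PySpark sketch of the tuning patterns above (broadcast join, key-based repartitioning, and a custom UDF); paths, table names, and sizing values are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    spark = (
        SparkSession.builder
        .appName("broadcast-join-demo")
        # executor sizing shown for illustration; real values depend on the cluster
        .config("spark.executor.memory", "8g")
        .config("spark.sql.shuffle.partitions", "200")
        .getOrCreate()
    )

    trades = spark.read.parquet("/data/trades")         # large fact table (hypothetical)
    desks = spark.read.parquet("/data/desk_reference")  # small dimension table

    # Broadcast the small table so the join avoids shuffling the large one
    enriched = trades.join(F.broadcast(desks), on="desk_id", how="left")

    # Repartition on the aggregation key to reduce skew before wide operations
    enriched = enriched.repartition(64, "counterparty_id")

    # UDF example; built-in functions are preferred where they exist (faster)
    @F.udf(returnType=StringType())
    def risk_bucket(exposure):
        return "HIGH" if exposure and exposure > 1_000_000 else "LOW"

    enriched.withColumn("bucket", risk_bucket(F.col("exposure"))) \
        .write.mode("overwrite").parquet("/data/enriched_trades")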

Built and governed multi-zone cloud data lakes on Azure Data Lake Storage Gen2 and AWS S3 using Bronze/Silver/Gold medallion architecture, Delta Lake for ACID-compliant transactions, schema evolution, Z-ordering, and data compaction for optimized read performance.
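
A condensed sketch of that medallion flow, assuming Delta Lake on Databricks; all paths and column names are illustrative:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("medallion-demo").getOrCreate()

    # Bronze: land raw data as-is so loads are replayable
    raw = spark.read.json("/mnt/landing/events/")
    raw.write.format("delta").mode("append").save("/mnt/lake/bronze/events")

    # Silver: deduplicate and conform; Delta provides ACID semantics on the lake
    bronze = spark.read.format("delta").load("/mnt/lake/bronze/events")
    silver = (
        bronze.dropDuplicates(["event_id"])
        .filter(F.col("event_ts").isNotNull())
        .withColumn("event_date", F.to_date("event_ts"))
    )
    (silver.write.format("delta")
        .mode("overwrite")
        .partitionBy("event_date")
        .save("/mnt/lake/silver/events"))

    # Gold: business-level aggregate served to BI consumers
    gold = silver.groupBy("event_date", "product").agg(F.count("*").alias("events"))
    gold.write.format("delta").mode("overwrite").save("/mnt/lake/gold/daily_events")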

Experience working with Azure Functions, Redis Cache, orchestrators, and BigQuery for parallel data processing and streaming pipelines.

Designed high-throughput real-time streaming architectures using Apache Kafka (producers, consumers, Kafka Streams, topic partitioning, consumer group rebalancing), AWS Kinesis Data Streams, and Azure Event Hubs processing millions of financial events per hour with guaranteed delivery.

Provisioned and version-controlled all cloud data infrastructure using Terraform (IaC) across Azure and AWS, managing Databricks workspaces, ADF pipelines, Redshift clusters, S3 data lakes, VNets, IAM roles, and security groups with full environment parity.

Implemented robust data quality frameworks using Great Expectations and custom PySpark validation rules: automated schema checks, null detection, referential integrity validation, statistical drift monitoring, and SLA alerting across all critical data pipelines.
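
A minimal sketch of the custom PySpark validation layer (the Great Expectations integration is not shown); table paths, columns, and thresholds are hypothetical:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("dq-checks").getOrCreate()
    df = spark.read.format("delta").load("/mnt/lake/silver/trades")

    failures = []

    # Null checks on required columns
    for col in ["trade_id", "counterparty_id", "notional"]:
        nulls = df.filter(F.col(col).isNull()).count()
        if nulls > 0:
            failures.append(f"{col}: {nulls} null values")

    # Row-count reconciliation against the upstream bronze zone
    expected = spark.read.format("delta").load("/mnt/lake/bronze/trades").count()
    actual = df.count()
    if actual < expected * 0.99:  # tolerate <1% filtered rows; threshold illustrative
        failures.append(f"row count {actual} below 99% of source count {expected}")

    # Simple referential-integrity check against a reference table
    orphans = df.join(
        spark.read.format("delta").load("/mnt/lake/silver/counterparties"),
        on="counterparty_id", how="left_anti",
    ).count()
    if orphans:
        failures.append(f"{orphans} trades reference unknown counterparties")

    if failures:
        # In production this would page on-call via the alerting stack
        raise ValueError("Data quality checks failed: " + "; ".join(failures))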

Experience with pipeline orchestration using Apache Airflow, including building DAGs with retry logic, branching, and task dependencies, and working with Airflow deployments on Kubernetes and MWAA.
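
For illustration, a small DAG showing the retry, branching, and dependency patterns described above, assuming Airflow 2.4+; the task logic and schedule are placeholders:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.empty import EmptyOperator
    from airflow.operators.python import BranchPythonOperator, PythonOperator

    def extract():
        print("pull data from source")  # placeholder extract step

    def choose_branch(**context):
        # Branch on the run date: weekends take the lightweight path
        return "full_load" if context["logical_date"].weekday() < 5 else "skip_load"

    with DAG(
        dag_id="example_etl",
        start_date=datetime(2024, 1, 1),
        schedule="0 6 * * *",
        catchup=False,
        default_args={
            "retries": 3,                        # retry transient failures
            "retry_delay": timedelta(minutes=5),
        },
    ) as dag:
        pull = PythonOperator(task_id="extract", python_callable=extract)
        branch = BranchPythonOperator(task_id="branch", python_callable=choose_branch)
        full_load = PythonOperator(task_id="full_load",
                                   python_callable=lambda: print("full load"))
        skip_load = EmptyOperator(task_id="skip_load")

        pull >> branch >> [full_load, skip_load]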

Built and maintained comprehensive data observability and monitoring stacks using Azure Monitor, AWS CloudWatch, ELK Stack, Prometheus, and Grafana, tracking pipeline SLAs, data freshness, row-count anomalies, schema drift, and infrastructure health KPIs.

Containerized data engineering workloads using Docker and orchestrated them on Kubernetes (AKS/EKS); managed Helm chart deployments, resource quotas, auto-scaling, and persistent volume claims for stateful data processing services.

Established CI/CD pipelines for data engineering using Azure DevOps and GitHub Actions: automated DAG deployment, Databricks notebook promotion, dbt model runs, data contract testing, and environment promotion gates across dev/staging/prod.

Mentored junior and offshore data engineers, led design reviews, and drove Agile delivery — translating complex financial and healthcare data requirements from quant analysts, risk managers, and business stakeholders into production-grade pipeline architectures.

EDUCATION & CERTIFICATIONS

Bachelor of Computer Science — PESIT, Bangalore, India

Microsoft Certified: Azure Data Engineer Associate (DP-203)

TECHNICAL SKILLS

Programming Languages: Python, SQL, JavaScript, Scala, Bash/Shell, R

Web Frameworks: Flask, Django, FastAPI, React.js, Node.js

Data Engineering & ETL: Apache Spark, PySpark, Apache Kafka, Apache Airflow, Apache NiFi, AWS Glue, Azure Data Factory, dbt, Delta Lake

Data Warehousing: Snowflake, Amazon Redshift, Azure Synapse Analytics, Google BigQuery, Apache Hive

Data Processing & Analytics: Pandas, NumPy, PySpark, Databricks, Scikit-learn, Dask

Generative AI & LLMs: LangChain, LlamaIndex, OpenAI API, Azure OpenAI, Hugging Face Transformers, Prompt Engineering, RAG (Retrieval-Augmented Generation), Vector Databases (Pinecone, Weaviate, ChromaDB)

Machine Learning & AI: TensorFlow, Keras, Scikit-learn, XGBoost, NLP, Sentiment Analysis, Predictive Modeling, TensorFlow Serving, Azure ML

Cloud Platforms: AWS (S3, EC2, RDS, Lambda, Glue, EMR, Kinesis, Redshift, Athena, SageMaker, IAM, CloudWatch, Step Functions, DynamoDB), Azure (Databricks, Data Factory, Synapse, Blob Storage, Data Lake, Cosmos DB, Function Apps, DevOps), GCP (BigQuery, Dataflow, Pub/Sub)

Databases: PostgreSQL, MySQL, Amazon RDS, DynamoDB, Azure Cosmos DB, MongoDB, Redis, Elasticsearch

DevOps & CI/CD: Docker, Kubernetes, Jenkins, GitHub Actions, Azure DevOps, Terraform, Bitbucket

Data Visualization & BI: Tableau, AWS QuickSight, ThoughtSpot, Plotly, D3.js, Power BI

Security & Compliance: HIPAA, GDPR, PCI-DSS, SOX, AWS IAM, Azure Active Directory, RBAC, OAuth 2.0, JWT

Testing & Monitoring: PyTest, Apache JMeter, AWS CloudWatch, Azure Monitor, ELK Stack, Prometheus, Grafana, New Relic

Version Control & Collaboration: Git, GitHub, Bitbucket, JIRA, Confluence, Agile/Scrum

PROFESSIONAL EXPERIENCE

Python Data Engineer

SMBC Sumitomo Mitsui Banking Corporation — Jersey City, NJ | Aug 2024 – Present

Enterprise-wide Counterparty Credit Risk (CCR) platform processing large-scale financial data using Python-based data pipelines, Markit RaaS analytics, and Azure cloud services to generate PFE, EPE, and exposure sensitivity metrics for front-office trading desks.

Built credit simulation pipelines to model what-if scenarios on trade exposures and market data — sourced market data from Niwa APIs, implemented CUSIP/redcode/ticker waterfall mapping logic, applied enrichment and normalization transformations, and generated Markit RaaS-compatible structured input datasets.
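
The waterfall mapping reduces to a simple precedence loop; a pandas sketch of the idea (the Niwa and Markit schemas are proprietary, so all column names and reference data here are hypothetical):

    import pandas as pd

    def waterfall_map(trades: pd.DataFrame, ref: pd.DataFrame) -> pd.DataFrame:
        """Resolve each trade to a reference entity, trying identifiers in
        priority order: CUSIP first, then redcode, then ticker."""
        resolved = trades.copy()
        resolved["entity_id"] = pd.NA
        for id_col in ["cusip", "redcode", "ticker"]:   # waterfall priority
            unresolved = resolved["entity_id"].isna()
            lookup = ref.dropna(subset=[id_col]).set_index(id_col)["entity_id"]
            resolved.loc[unresolved, "entity_id"] = (
                resolved.loc[unresolved, id_col].map(lookup)
            )
        return resolved

    trades = pd.DataFrame({
        "trade_id": [1, 2, 3],
        "cusip":   ["037833100", None, None],
        "redcode": [None, "2H6677", None],
        "ticker":  [None, None, "AAPL"],
    })
    ref = pd.DataFrame({
        "entity_id": ["E1", "E2", "E3"],
        "cusip":   ["037833100", None, None],
        "redcode": [None, "2H6677", None],
        "ticker":  [None, None, "AAPL"],
    })
    print(waterfall_map(trades, ref)[["trade_id", "entity_id"]])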

Implemented Python runtime upgrade to 3.13 across CI/CD, containers, and environments, resolving dependency issues and ensuring long-term maintainability and security compliance.

Developed a monitoring and observability stack using Egmonitoring and Azure Monitor — configured custom metric alerts on App Services, Function Apps, and Storage Accounts (~1.5TB), tracking HTTP error rates, memory utilization, TCP availability, SSL certificate expiry, API latency P95, and storage capacity thresholds.

Designed Azure Disaster Recovery (DR) strategies for CRISTAL workloads — implemented geo-redundant backups, automated failover runbooks, data consistency validation across Azure Storage and Databricks, and quarterly DR drills ensuring 99.9% SLA for critical risk analytics systems.

Built and maintained CI/CD pipelines in Azure DevOps (migrating to GitHub Actions): authored multi-stage YAML pipelines with unit testing (PyTest), integration tests, Docker image builds to ACR, Databricks notebook promotion, and branch-based environment deployments.

Led end-to-end design and delivery of the Archive Framework and Archive Statistics platform — standardized archival workflows, automated storage metric tracking across 60+ batch jobs, and built a REST API layer exposing snapshot-based reporting, eliminating manual validation processes entirely.

Designed and implemented Python-based ETL data pipelines (Pandas, PySpark, Django) ingesting and processing 800GB+ of financial datasets daily — built parameterized, reusable pipeline components with configurable source/target mappings supporting 60+ batch jobs across multiple risk analytics workloads.
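
A toy sketch of the parameterized-component idea: one generic step driven entirely by a source/target mapping config (the config shape shown is illustrative, not the production schema):

    import json
    import pandas as pd

    # A pipeline step is described entirely by configuration, so the same
    # code serves many batch jobs with different sources, targets, and rules
    config = json.loads("""
    {
      "source":  {"path": "/data/in/positions.csv", "format": "csv"},
      "target":  {"path": "/data/out/positions.parquet", "format": "parquet"},
      "rename":  {"cpty": "counterparty_id", "notl": "notional"},
      "filters": [{"column": "notional", "op": "gt", "value": 0}]
    }
    """)

    def run_step(cfg: dict) -> None:
        readers = {"csv": pd.read_csv, "parquet": pd.read_parquet}
        df = readers[cfg["source"]["format"]](cfg["source"]["path"])

        df = df.rename(columns=cfg.get("rename", {}))

        # Apply declarative filters via pandas comparison methods
        for f in cfg.get("filters", []):
            df = df[getattr(df[f["column"]], f["op"])(f["value"])]

        writers = {"parquet": df.to_parquet, "csv": df.to_csv}
        writers[cfg["target"]["format"]](cfg["target"]["path"], index=False)

    run_step(config)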

Built archive statistics collection system with a REST API layer (Django + Azure App Services) exposing storage metrics, snapshot-based reporting, and diff comparisons across release branches — reducing operational support tickets by 35% through automated visibility.

Led multiple production releases and PRs end-to-end, driving QA stabilization, fixing environment inconsistencies, and ensuring dev–release parity across branches.

Built and optimized data transformation pipelines to produce structured simulation outputs and risk metrics, improving data accuracy, traceability, and alignment with downstream Markit analytics systems.

Engineered distributed PySpark jobs on Azure Databricks to parallelize heavy financial data transformations; refactored iterrows-based Pandas code into vectorized Spark DataFrames using broadcast joins, partition pruning, and cache strategies, cutting batch runtimes by 40% on high-volume datasets.

Implemented Apache Kafka producers and consumers for real-time ingestion of market data feeds and trade lifecycle events — configured topic partitioning, consumer group offsets, dead-letter queues, and exactly-once semantics to ensure no data loss in credit exposure update pipelines.
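
A minimal kafka-python sketch of the consumer side: manual offset commits plus a dead-letter topic for poison messages (true exactly-once additionally requires transactional producers or idempotent sinks; broker address and topic names are hypothetical):

    import json
    from kafka import KafkaConsumer, KafkaProducer

    consumer = KafkaConsumer(
        "trade-events",
        bootstrap_servers="localhost:9092",
        group_id="exposure-updater",
        enable_auto_commit=False,   # commit only after successful processing
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    dlq = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    for msg in consumer:
        try:
            event = msg.value
            # ... apply exposure update logic here ...
            consumer.commit()       # at-least-once: commit after success
        except Exception as exc:
            # Park poison messages on a dead-letter topic instead of blocking
            dlq.send("trade-events-dlq", {"error": str(exc), "payload": msg.value})
            dlq.flush()
            consumer.commit()       # skip the bad record once it is parked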

Implemented Generative AI-powered document analysis using LangChain and Azure OpenAI to automatically extract and summarize counterparty risk reports, reducing manual review time by 60%.

Built a RAG (Retrieval-Augmented Generation) pipeline integrated with internal knowledge bases and Markit documentation, enabling trading desk analysts to query risk policies and model outputs using natural language.
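
The production pipeline used LangChain against Azure OpenAI; the standalone sketch below shows the same retrieve-then-generate pattern with the plain OpenAI v1 SDK and an in-memory vector store (documents, model names, and the question are illustrative):

    import numpy as np
    from openai import OpenAI  # assumes the v1 OpenAI Python SDK

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    docs = [
        "PFE limits for single-name CDS are reviewed quarterly.",
        "Collateral thresholds are defined per CSA agreement.",
        "Wrong-way risk exposures require desk-head sign-off.",
    ]

    def embed(texts):
        resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return np.array([d.embedding for d in resp.data])

    doc_vecs = embed(docs)

    def answer(question: str) -> str:
        # Retrieve: rank documents by cosine similarity to the question embedding
        q = embed([question])[0]
        sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
        context = "\n".join(docs[i] for i in sims.argsort()[::-1][:2])

        # Generate: ground the model's answer in the retrieved context
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "Answer using only the provided context."},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
            ],
        )
        return resp.choices[0].message.content

    print(answer("Who must approve wrong-way risk exposures?"))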

Developed an LLM-powered anomaly explanation system that detects statistical outliers in credit exposure data and generates human-readable risk narratives for quant and front-office teams.

Integrated vector search (Pinecone) with Azure OpenAI embeddings to build a semantic search layer over 5+ years of archived risk reports, enabling instant regulatory and audit-ready document retrieval.

Engineered fault-tolerant data processing frameworks with multi-layer exception handling, checkpoint-based restartability, validation layers, row-count reconciliation, and data quality assertions — ensuring zero data loss and full accuracy for downstream CCR risk model inputs.

Optimized large-scale SQL queries across Azure Synapse and PostgreSQL — rewrote complex joins using CTEs and window functions, applied partition elimination, rebuilt statistics, and analyzed execution plans to reduce query runtimes from minutes to seconds on 100M+ row tables.
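
A representative rewrite: replacing a greatest-n-per-group self-join with a single ROW_NUMBER() window scan, executed here via psycopg2 (table, columns, and connection string are hypothetical):

    import psycopg2

    # "Latest snapshot per counterparty" as one window scan; the date
    # predicate enables partition elimination on a date-partitioned table
    LATEST_EXPOSURES = """
    WITH ranked AS (
        SELECT counterparty_id,
               exposure,
               as_of_date,
               ROW_NUMBER() OVER (
                   PARTITION BY counterparty_id
                   ORDER BY as_of_date DESC
               ) AS rn
        FROM exposures
        WHERE as_of_date >= %(start)s
    )
    SELECT counterparty_id, exposure, as_of_date
    FROM ranked
    WHERE rn = 1;
    """

    with psycopg2.connect("dbname=risk") as conn, conn.cursor() as cur:
        cur.execute(LATEST_EXPOSURES, {"start": "2025-01-01"})
        for row in cur.fetchmany(10):
            print(row)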

Integrated Azure services (Function Apps, Blob Storage, Redis Cache, Key Vault, Service Bus) to build a scalable event-driven processing layer — implemented async data processing patterns, distributed caching for hot market data, and secure secrets management via Key Vault references.

Applied data visualization using Plotly dashboards and ThoughtSpot search-driven analytics — built custom risk metric dashboards, exposure trend charts, and simulation output reports enabling self-service analytics for front-office and risk management stakeholders.

Mentored offshore developers, led sprint planning, conducted code reviews, and collaborated with quant analysts, risk managers, and operations teams — translating complex financial risk requirements into scalable, maintainable Python and Spark data engineering solutions.

Environment: Python, PySpark, Apache Spark, Pandas, Django, Azure Databricks, Azure Data Lake Gen2 (ADLS Gen2), Delta Lake, Azure Data Factory (ADF), Apache Kafka, Azure Function Apps, Azure App Services, Azure Blob Storage, Azure Redis Cache, Azure Key Vault, Azure Service Bus, AKS, Terraform, Azure DevOps, GitHub Actions, Docker, ACR, PostgreSQL, PyTest, Markit RaaS, Niwa APIs, Egmonitoring, Azure Monitor, Log Analytics

Python Full Stack Developer

Orion Health — Boston, MA | Jun 2023 – May 2024

Predictive Analytics for Early Detection of Chronic Diseases — web-based platform leveraging Azure cloud and Python ML tools to identify individuals at risk of diabetes, heart disease, and hypertension from healthcare data.

Initiated the project by collaborating with healthcare professionals to understand the requirements for early detection of chronic diseases through predictive analytics.

Built Flask RESTful APIs serving 10,000+ concurrent users — implemented connection pooling with SQLAlchemy, query result caching via Azure Redis Cache, JWT/OAuth 2.0/OpenID Connect authentication with Azure AD, and Gunicorn multi-worker deployment on AKS for high availability.
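
A pared-down Flask sketch of the pooling, caching, and JWT pieces (Azure AD/OIDC wiring and the Gunicorn deployment are omitted; connection strings, table, and route names are hypothetical):

    import json

    import redis
    from flask import Flask, jsonify
    from flask_jwt_extended import JWTManager, jwt_required
    from flask_sqlalchemy import SQLAlchemy

    app = Flask(__name__)
    app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://user:pass@db/health"
    app.config["SQLALCHEMY_ENGINE_OPTIONS"] = {"pool_size": 20, "max_overflow": 10}
    app.config["JWT_SECRET_KEY"] = "change-me"  # from a secret store in production

    db = SQLAlchemy(app)
    jwt = JWTManager(app)
    cache = redis.Redis(host="localhost", port=6379)

    @app.get("/api/patients/<int:pid>/risk")
    @jwt_required()
    def risk_score(pid: int):
        # Serve from Redis when possible; fall back to the database on a miss
        cached = cache.get(f"risk:{pid}")
        if cached:
            return jsonify(json.loads(cached))
        row = db.session.execute(
            db.text("SELECT score FROM risk_scores WHERE patient_id = :pid"),
            {"pid": pid},
        ).first()
        payload = {"patient_id": pid, "score": row[0] if row else None}
        cache.setex(f"risk:{pid}", 300, json.dumps(payload))  # 5-minute TTL
        return jsonify(payload)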

Implemented JWT-based authentication and authorization in a Flask application, ensuring secure data access and transfer across multiple client systems.

Implemented advanced React features such as Suspense and Lazy for code-splitting and lazy loading, optimizing application performance and user experience.

Integrated Material-UI for React to leverage pre-built UI components, ensuring a consistent, modern design language across the application.

Developed a comprehensive back-end using Flask, structuring RESTful APIs to manage healthcare data inputs, model predictions, and user interactions.

Utilized Flask-Migrate for database schema migrations, ensuring smooth evolution of the database structure alongside application updates.

Implemented secure authentication and authorization mechanisms using OAuth 2.0 and OpenID Connect, safeguarding access to the web application.

Implemented ThoughtSpot to enable self-service analytics across the organization, reducing dependency on the IT department for report generation and data insights.

Implemented data observability using Azure Monitor, Application Insights, and ELK Stack — built custom dashboards tracking pipeline job durations, row throughput, data freshness SLAs, API latency P99, Databricks cluster utilization, and Kafka consumer lag with automated PagerDuty alerting.

Ensured full HIPAA compliance — implemented PHI field-level encryption, PII masking in all logs, audit trail logging to Log Analytics, Azure Policy guardrails, Defender for Cloud recommendations, and conducted quarterly vulnerability assessments and penetration tests.

Designed and deployed machine learning models using TensorFlow, focusing on predictive analytics for early detection of chronic diseases.

Designed and implemented end-to-end ETL pipelines using Azure Data Factory (ADF) to ingest EHR data from HL7 FHIR feeds, wearable device sensors, lab result APIs, and hospital EMR exports into Azure Data Lake Storage Gen2 — built parameterized ADF pipelines with mapping data flows, self-hosted integration runtimes, and automated schema drift detection.

Managed Azure Data Lake Storage Gen2 as the central data repository — structured data zones using Bronze/Silver/Gold medallion architecture, applied Delta Lake for ACID transactions and time-travel, configured lifecycle management policies, ACL-based access controls, and AES-256 encryption-at-rest for HIPAA compliance.

Leveraged Azure Blob Storage for storing healthcare data, including electronic health records (EHRs), medical images, and lab results, ensuring scalability and security.

Designed and implemented robust data processing pipelines using PySpark, optimizing data transformation and aggregation tasks to enhance performance and scalability in big data environments.

Contributed to the migration of legacy data processing scripts to PySpark, resulting in a 50% improvement in processing times and a significant reduction in computational costs.

Integrated Python and PowerShell scripts to leverage the strengths of both languages for comprehensive automation solutions.

Implemented Apache Airflow DAGs for orchestrating multi-step healthcare data workflows — built custom operators for ADF pipeline triggers, Databricks job submissions, and data quality checks; configured SLA callbacks, retry logic, task dependencies, and XCom-based data passing across pipeline stages.

Deployed predictive ML models as containerized REST services on Azure Kubernetes Service (AKS) — authored Helm charts for versioned deployments, configured horizontal pod autoscaling based on request latency, implemented blue-green deployments for zero-downtime model updates serving real-time disease risk predictions.

Configured Azure Cosmos DB (NoSQL, multi-region write) for real-time ingestion of high-frequency wearable device streams and unstructured health records — designed partition key strategies for even distribution, set TTL policies for data expiry, and integrated Change Feed for downstream stream processing.
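
A short azure-cosmos SDK sketch of the container setup described above: a device-id partition key plus a container-level default TTL (account URL, names, and the 30-day TTL are illustrative):

    from azure.cosmos import CosmosClient, PartitionKey

    client = CosmosClient(
        url="https://myaccount.documents.azure.com:443/",
        credential="<key>",  # use managed identity / Key Vault in production
    )
    db = client.create_database_if_not_exists("telemetry")

    # Partition on device id for even write distribution; expire raw
    # readings automatically after 30 days via the container default TTL
    container = db.create_container_if_not_exists(
        id="wearable_readings",
        partition_key=PartitionKey(path="/deviceId"),
        default_ttl=30 * 24 * 3600,
    )

    container.upsert_item({
        "id": "reading-001",
        "deviceId": "dev-42",
        "heartRate": 72,
        "ts": "2024-01-15T10:30:00Z",
    })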

Environment: Python, PySpark, Apache Spark, Pandas, Azure Databricks, Delta Live Tables, Azure Data Factory (ADF), Azure Data Lake Gen2 (ADLS Gen2), Delta Lake, Azure Synapse Analytics, Apache Kafka, Azure Event Hubs, Apache Airflow, Azure ML, MLflow, AKS, Helm, Terraform, Azure Cosmos DB, Azure SQL, Azure Redis Cache, Azure Purview, Azure Monitor, Azure DevOps, GitHub Actions, Docker, ACR, Flask, TensorFlow, XGBoost, Scikit-learn, PyTest, ELK Stack

Sr. Python Full Stack Developer

GM Financial — Fort Worth, TX | Jan 2022 – Jan 2023

AI-powered customer service chatbot for banking, integrating Flask/Django backend, React frontend, AWS deployment, and TensorFlow/Keras NLP models for intelligent, context-sensitive banking dialogues.

Built and trained NLP models using TensorFlow and Keras with sequence-to-sequence architecture and Word2Vec embeddings, enabling the chatbot to process natural language banking queries with high accuracy.

Deployed TensorFlow Serving for scalable, high-performance model inference; established continuous training pipeline incorporating new user interactions for ongoing accuracy improvement.

Orchestrated ETL pipelines using AWS Glue for processing transaction data and user interactions, and leveraged AWS Lambda for serverless back-end logic to optimize resource usage.

Implemented Flask-SocketIO and WebSocket technology for bi-directional real-time chat communication; designed microservices architecture with Docker and Kubernetes for scalability.

Secured the application with SSL/TLS encryption, Flask-JWT-Extended session management, and OAuth 2.0, adhering to banking compliance standards (PCI-DSS, SOX).

Utilized Terraform for infrastructure-as-code across AWS, reducing deployment time by 30% and established CI/CD pipelines with Jenkins for automated build/test/deploy workflows.

Environment: Python, Flask, Django, React.js, TensorFlow, Keras, AWS (S3, EC2, RDS, DynamoDB, Glue, Lambda, API Gateway), Snowflake, Apache Airflow, Jenkins, Bitbucket, Terraform, Docker, Kubernetes, PowerShell

Sr. Python Developer

The Kroger Co. — Cincinnati, OH | Mar 2020 – Feb 2021

Customer Segmentation platform using ML models for segmentation, credit risk assessment, and fraud detection — improving marketing ROI, reducing loan defaults, and enhancing user engagement.

Designed and implemented ML pipelines using Pandas and Scikit-learn for customer segmentation with K-means clustering, driving measurable improvements in marketing ROI.
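
A compact scikit-learn sketch of the segmentation approach, using illustrative RFM-style features in place of the real transaction-derived feature set:

    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Illustrative recency/frequency/monetary features
    customers = pd.DataFrame({
        "recency_days": [5, 40, 3, 120, 60, 7],
        "frequency":    [30, 4, 45, 1, 3, 25],
        "monetary":     [900.0, 120.0, 1500.0, 30.0, 80.0, 700.0],
    })

    # Standardize so no single feature dominates the distance metric
    X = StandardScaler().fit_transform(customers)

    kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
    customers["segment"] = kmeans.fit_predict(X)
    print(customers.groupby("segment").mean())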

Deployed the web application on AWS Elastic Beanstalk; configured AWS S3 for large dataset storage, AWS Lambda for serverless ML inference, and AWS RDS for relational data management.

Managed container orchestration with Kubernetes on AWS EKS, enabling automated scaling, self-healing, and load balancing; containerized all components with Docker.

Built React.js front-end with Redux state management and custom D3.js visualization components for interactive customer segmentation dashboards.

Implemented CI/CD pipelines with Jenkins and AWS CodePipeline; set up monitoring with AWS CloudWatch and ELK Stack for real-time application performance insights.

Environment: Python, Flask, Django, AWS (S3, EC2, RDS, DynamoDB, Glue, Lambda, Elastic Beanstalk, EKS, CloudWatch), React.js, D3.js, Scikit-learn, Apache Airflow, Jenkins, Terraform, Docker, Kubernetes, ETL

Sr. Python Full Stack Developer

Cerner Corporation — Kansas City, MO | Dec 2016 – Feb 2020

Electronic Health Records Integration Platform — seamlessly connecting disparate healthcare systems via HL7 FHIR standards, enabling efficient data exchange, improved patient care, and HIPAA-compliant workflows.

Designed scalable microservices architecture for EHR integration, built React.js front-end dashboards for patient record management, and developed Flask RESTful APIs with HL7 FHIR compliance.

Designed and implemented data pipelines using Apache Kafka and Apache NiFi for ingesting and processing EHR data from multiple disparate healthcare systems.

Integrated OAuth 2.0 and JWT authentication, conducted HIPAA security audits and vulnerability assessments; deployed on AWS EC2, RDS, and S3.

Implemented real-time data synchronization with WebSocket technology; built comprehensive logging and monitoring with ELK Stack and AWS CloudWatch.

Environment: Python, Flask, React.js, Apache Kafka, Apache NiFi, AWS (EC2, RDS, S3), Docker, Kubernetes, Jenkins, ELK Stack, PostgreSQL, HL7 FHIR, OAuth 2.0, PyTest

Python Full Stack Developer

BlackRock — New York City, NY | Sep 2013 – Nov 2016

Stock Market Data Analysis Pipeline — ingesting, processing, and analyzing vast stock market datasets using data warehousing, real-time streaming, and ML for traders, investment firms, and financial analysts.

Designed and developed data processing pipelines using Apache Kafka for real-time data streaming and Apache Spark for large-scale stock market data analysis.
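
A minimal Structured Streaming sketch of the Kafka-to-Spark path: parse JSON ticks and maintain windowed per-symbol aggregates (broker, topic, schema, and paths are hypothetical):

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import (DoubleType, StringType, StructField,
                                   StructType, TimestampType)

    spark = SparkSession.builder.appName("ticks").getOrCreate()

    schema = StructType([
        StructField("symbol", StringType()),
        StructField("price", DoubleType()),
        StructField("ts", TimestampType()),
    ])

    # Read the tick stream from Kafka and parse the JSON payload
    ticks = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "market-ticks")
        .load()
        .select(F.from_json(F.col("value").cast("string"), schema).alias("t"))
        .select("t.*")
    )

    # 1-minute per-symbol aggregates with a watermark to bound state
    agg = (
        ticks.withWatermark("ts", "2 minutes")
        .groupBy(F.window("ts", "1 minute"), "symbol")
        .agg(F.avg("price").alias("avg_price"), F.count("*").alias("tick_count"))
    )

    query = (
        agg.writeStream.outputMode("append")
        .format("parquet")
        .option("path", "/data/tick_aggregates")
        .option("checkpointLocation", "/data/checkpoints/ticks")
        .start()
    )
    query.awaitTermination()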

Integrated Python ML libraries (Pandas, NumPy, Scikit-learn, TensorFlow) for predictive modeling; built React.js dashboards with D3.js and Highcharts for real-time market visualization.

Deployed on AWS (EC2, RDS, S3), implemented SQL/NoSQL data storage (PostgreSQL, MongoDB), and configured Prometheus, ELK Stack, and Grafana for monitoring.

Implemented CI/CD with GitLab, containerization with Docker, and Kubernetes orchestration for reliable deployment under variable market data loads.

Environment: Python, Flask, React.js, Apache Kafka, Apache Spark, AWS (EC2, S3, RDS, Lambda, SageMaker, IAM), TensorFlow, Scikit-learn, PostgreSQL, MongoDB, Docker, Kubernetes, Grafana, Prometheus, ELK Stack


