
Data Engineer with 6+ Years in Big Data Solutions

Location:
Hyderabad, Telangana, India
Salary:
100000
Posted:
April 30, 2026

Resume:

ASHLESH YANALA

DATA ENGINEER

********.****@*****.*** 331-***-**** LinkedIn

PROFESSIONAL SUMMARY

6+ years of experience designing and implementing enterprise-grade data solutions across the healthcare, retail, and banking industries using Big Data, cloud, and ETL technologies, including dbt.

Skilled in orchestration using Apache Airflow (EC2 & Composer), ADF, Oozie, Azkaban, and AWS Step Functions for managing dependency-driven workflows and SLA monitoring.
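
For illustration, a minimal Airflow DAG sketch of this kind of dependency-driven, SLA-monitored orchestration; the DAG name, schedule, tasks, and SLA value below are hypothetical placeholders rather than details from any project listed here:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical defaults: retries and a per-task SLA that feeds Airflow's SLA-miss callbacks.
default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
    "sla": timedelta(hours=1),
}

with DAG(
    dag_id="daily_claims_ingestion",          # placeholder DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 2 * * *",            # placeholder daily schedule
    catchup=False,
    default_args=default_args,
) as dag:
    extract = BashOperator(task_id="extract_source", bash_command="echo extract")
    transform = BashOperator(task_id="transform_spark", bash_command="echo transform")
    load = BashOperator(task_id="load_warehouse", bash_command="echo load")

    # Dependency-driven ordering: extract -> transform -> load
    extract >> transform >> load
```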

Hands-on experience with all three major cloud platforms:
AWS: EMR, S3, Redshift, Glue, Lambda, Athena, RDS, DynamoDB, SQS, SNS
Azure: ADLS Gen2, Blob Storage, ADF, Synapse, Azure SQL, Azure Functions
GCP: Cloud Dataflow (Beam), Cloud Composer, BigQuery, Dataproc, Pub/Sub

Proven expertise in Big Data ecosystems, delivering scalable and fault-tolerant pipelines using Apache Hadoop, Spark, Kafka, Databricks, and Snowflake for high-volume data processing and streaming use cases.

Proficient in Python, Scala, R, and Java for building data pipelines, transformation logic, external REST API integrations, and real-time streaming frameworks.

Experienced in data modeling (Star and Snowflake schemas), handling structured/semi-structured formats (Parquet, JSON, Avro, ORC), and metadata-driven ingestion.

Experienced in building CI/CD pipelines with GitHub Actions, Jenkins, Docker, and Kubernetes; familiar with Agile methodologies (JIRA, Confluence) for sprint planning, deployment, and collaboration.

Exposure to ML workflows with model deployment in SageMaker and Azure ML for use cases such as fraud detection, healthcare risk prediction, and customer segmentation.

Deep understanding of SQL and NoSQL databases, including:

SQL/T-SQL/PL-SQL – Oracle, SQL Server, PostgreSQL, MySQL, Azure SQL, Hive

NoSQL – DynamoDB, MongoDB, Redis, HBase

Deep expertise in ETL tools such as Apache NiFi, Informatica, AWS Glue, and ADF, delivering insights through BI tools like Power BI, Tableau, and Looker.

TECHNICAL SKILLS

Programming Languages: Python, Scala, SQL, Java, Shell Scripting, C#, R, HTML, CSS, JavaScript
Big Data Tools: Apache Hadoop, Apache Spark (PySpark, Spark-Scala), Apache Hive, Apache Kafka, Apache NiFi, Apache HBase, Ruby, Rust, Go
Data Warehousing: Snowflake, Amazon Redshift, Azure Synapse Analytics, Apache Storm, MSK, ThoughtSpot
Cloud Platforms: Azure – ADLS Gen2, Azure Data Factory, Azure Databricks, Azure Synapse, Azure Key Vault, Purview; AWS – S3, Glue, Redshift, EMR, Lambda, Kinesis
ETL & Data Pipelines: Azure Data Factory, AWS Glue, Informatica, Apache NiFi, Sqoop, Oozie, Spark Structured Streaming, Databricks, FHIR
Databases: SQL – Oracle, MySQL, SQL Server, PostgreSQL; NoSQL – Cosmos DB, DynamoDB, MongoDB, HBase, Apache Flink, Vertica
Streaming & Messaging: Apache Kafka, AWS Kinesis, Azure Event Hubs
Scheduling/Orchestration: Apache Airflow, Azure Data Factory (Triggers), Oozie, Jenkins, Cassandra
Data Formats: JSON, CSV, XML, Avro, ORC, Parquet, SSIS, SSRS
Data Modeling: Star Schema, Snowflake Schema, Dimensional Modeling, OLAP Cubes (Kyvos)
Security & Access Control: Apache Ranger, Azure Key Vault, IAM Roles, Data Encryption, Column Masking, Row-Level Security
Monitoring & Logging: Log4j, Spark History Server, AWS CloudWatch
DevOps Tools: GitHub, GitLab, Jenkins, Azure DevOps, Terraform
Data Quality/Testing: Great Expectations, Custom PySpark Validations, Null/Range/Schema Checks
Visualization Tools: Power BI, Tableau, Kyvos
Containerization: Docker (basic usage for packaging PySpark and utility scripts)
Documentation & Collaboration: JIRA, Confluence, SharePoint, Lucidchart
SDLC Methodologies: Agile (Scrum), Waterfall
Compliance Knowledge: HIPAA, GDPR, PII/PHI Handling Policies

PROFESSIONAL EXPERIENCE

Cigna – Plano, TX (July 2022 – Present)

Senior Data Engineer

Designed and implemented a scalable Azure Data Lakehouse architecture (ADLS Gen2 – raw, staging, curated layers), ensuring HIPAA compliance and secure healthcare data processing.

Developed ETL pipelines using Azure Data Factory (ADF) and PySpark, integrating data from SQL Server, REST APIs, and flat files to enable seamless ingestion and transformation.

Built real-time ingestion pipelines using Apache Kafka and Spark Structured Streaming, achieving <2 min latency for event-driven healthcare workflows.
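
A minimal PySpark Structured Streaming sketch of this style of Kafka ingestion; the brokers, topic, schema, and paths below are hypothetical placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-claims-ingestion").getOrCreate()

# Placeholder schema for incoming healthcare events.
event_schema = StructType([
    StructField("claim_id", StringType()),
    StructField("member_id", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder brokers
    .option("subscribe", "claims-events")                # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

parsed = raw.select(from_json(col("value").cast("string"), event_schema).alias("e")).select("e.*")

# Micro-batch writes to a Delta table; a short trigger interval keeps end-to-end latency low.
query = (
    parsed.writeStream.format("delta")
    .option("checkpointLocation", "/lake/checkpoints/claims")  # placeholder path
    .trigger(processingTime="30 seconds")
    .start("/lake/raw/claims")                                  # placeholder path
)
```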

Optimized Delta Lake tables with compaction, Z-ordering, and caching, reducing compute costs by 25% and improving dashboard refresh time by 40%.
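
The compaction and Z-ordering step could look roughly like the following, assuming a Databricks/Delta Lake environment; the table and column names are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-maintenance").getOrCreate()

# Compact small files and cluster data by common filter columns (hypothetical table/columns).
spark.sql("OPTIMIZE curated.claims ZORDER BY (member_id, service_date)")

# Remove files no longer referenced by the table, keeping 7 days (168 hours) of history.
spark.sql("VACUUM curated.claims RETAIN 168 HOURS")
```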

Built and maintained Elasticsearch/OpenSearch indices for healthcare claims and provider data, enabling sub-second search and retrieval for operational analytics.
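
A rough opensearch-py sketch of creating and querying such an index; the endpoint, index name, and field mappings are hypothetical:

```python
from opensearchpy import OpenSearch

# Hypothetical cluster endpoint; in practice this would carry auth/TLS settings.
client = OpenSearch(hosts=[{"host": "search.internal.example.com", "port": 9200}])

claims_index = {
    "settings": {"number_of_shards": 3, "number_of_replicas": 1},
    "mappings": {
        "properties": {
            "claim_id": {"type": "keyword"},
            "provider_name": {"type": "text"},
            "claim_amount": {"type": "double"},
            "service_date": {"type": "date"},
        }
    },
}

client.indices.create(index="claims-v1", body=claims_index)

# Sub-second retrieval for operational analytics: full-text search by provider name.
hits = client.search(
    index="claims-v1",
    body={"query": {"match": {"provider_name": "Acme Health"}}},
)
```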

Created Kibana dashboards for monitoring ETL jobs, Kafka streams, and healthcare compliance metrics, providing real-time visibility to business stakeholders.

Implemented data quality checks using schema validation, null handling, and business rule validation to ensure 99.9% accuracy of healthcare datasets.
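
A simplified sketch of this kind of validation logic in PySpark; the column names, rules, and quarantine path are hypothetical:

```python
from pyspark.sql import DataFrame
from pyspark.sql.functions import col

def validate_claims(df: DataFrame) -> DataFrame:
    """Apply schema, null, and business-rule checks; quarantine failing rows."""
    required_cols = ["claim_id", "member_id", "claim_amount"]

    # Schema validation: fail fast if expected columns are missing.
    missing = [c for c in required_cols if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    # Null handling and business-rule validation (claim amounts must be positive).
    valid = df.filter(
        col("claim_id").isNotNull()
        & col("member_id").isNotNull()
        & (col("claim_amount") > 0)
    )

    # Rows failing the rules go to a quarantine location for review.
    rejected = df.subtract(valid)
    rejected.write.mode("append").parquet("/lake/quarantine/claims")  # placeholder path

    return valid
```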

Enforced data governance and lineage with Azure Purview, ensuring regulatory compliance and audit readiness for Medicare Advantage risk adjustment.

Designed and deployed CI/CD pipelines with GitHub Actions and Azure DevOps for automated testing, version control, and reproducible deployments.

Mentored junior engineers on PySpark best practices and cloud architecture, driving team-wide adoption of optimized ETL standards and improving productivity.

American Express Bank – New York City, NY (Dec 2019 – Aug 2021)

Data Engineer

Designed and developed ETL pipelines using AWS Glue + Spark-Scala on EMR, ingesting millions of credit card transactions into S3, improving data availability by 40%.

Built real-time fraud detection pipelines integrating Apache Kafka + Spark Streaming, delivering sub-second alerts and reducing fraud by 35%.

Developed embedding-based anomaly detection models, storing vectors in FAISS/Pinecone for nearest-neighbor search to identify suspicious transaction patterns.
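
Roughly how FAISS could back that nearest-neighbor lookup; the embedding dimension, data, and flagging threshold below are purely illustrative:

```python
import faiss
import numpy as np

dim = 128  # hypothetical embedding dimension
rng = np.random.default_rng(42)

# Embeddings of historical "normal" transactions (placeholder random data).
normal_embeddings = rng.standard_normal((10_000, dim)).astype("float32")

index = faiss.IndexFlatL2(dim)   # exact L2 nearest-neighbor index
index.add(normal_embeddings)

# Score a new transaction: a large distance to its nearest normal neighbors is suspicious.
new_txn = rng.standard_normal((1, dim)).astype("float32")
distances, _ = index.search(new_txn, 5)

if distances.mean() > 50.0:      # illustrative threshold
    print("Flag transaction for fraud review")
```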

Managed and optimized a PostgreSQL-to-Redshift migration, improving query performance by 50% and reducing storage costs.

Built and maintained dbt transformation models in Redshift, standardizing business logic across 25+ datasets and cutting ETL time by 30%.

Implemented Elasticsearch/OpenSearch indices for transaction logs, enabling fast text search and compliance audits.

Delivered Kibana and Power BI dashboards for fraud detection and GAAP compliance reporting with 1-minute latency.

Implemented data lineage and governance controls in AWS Glue Data Catalog, ensuring regulatory compliance and audit readiness.

Automated deployments using Jenkins CI/CD pipelines integrated with GitHub for consistent release cycles.

Collaborated with finance and compliance teams, ensuring U.S. GAAP reporting accuracy and embedding ML-driven fraud insights into financial dashboards.

Lululemon – Fort Worth, TX (Jan 2018 – Dec 2019)

Data Engineer

Designed and implemented an on-prem Hadoop Data Lake using Cloudera, structuring raw and curated zones in HDFS for POS, returns, and behavior data.

Developed Spark-Scala jobs for cleaning, deduplication, and aggregation of large datasets, outputting partitioned ORC tables for downstream analytics.
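
A condensed PySpark analogue of such a cleaning/deduplication/aggregation job (the original work used Spark-Scala); the paths, columns, and aggregation are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sum as sum_, trim

spark = SparkSession.builder.appName("pos-cleaning").getOrCreate()

pos = spark.read.parquet("/datalake/raw/pos")  # placeholder input path

cleaned = (
    pos.dropna(subset=["transaction_id", "store_id"])  # drop incomplete records
    .withColumn("sku", trim(col("sku")))                # basic cleansing
    .dropDuplicates(["transaction_id"])                 # deduplication
)

daily_sales = (
    cleaned.groupBy("store_id", "business_date", "sku")
    .agg(sum_("net_amount").alias("daily_net_amount"))
)

# Partitioned ORC output for downstream Hive/analytics consumption.
(daily_sales.write.mode("overwrite")
    .partitionBy("business_date")
    .orc("/datalake/curated/daily_sales"))
```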

Built Hive dimension and fact tables using dynamic partitioning and pre-aggregations to support merchandise planning and markdown analysis.

Implemented Sqoop ingestion pipelines for Oracle ERP data into HDFS/Hive, using CDC logic to minimize reloads.

Scheduled Oozie workflows chaining Spark, Hive, and Sqoop jobs with automated error handling, integrated into Control-M for enterprise scheduling.

Integrated Kafka + Spark Streaming pipelines for real-time inventory and fraud detection, storing logs in HBase for instant lookups.

Exposed inventory and sales datasets via REST APIs and GraphQL endpoints, enabling mobile apps and merchandising teams to access curated data with sub-second query responses.

Developed Tableau dashboards powered by Hive marts for store performance and top-selling categories, improving decision-making speed by 30%.

Applied encryption, masking, and Apache Ranger policies to secure sensitive retail data, ensuring GDPR compliance.

Migrated legacy Informatica mappings to reusable Spark-Scala functions, improving ETL performance by 40% and reducing operational costs.

EDUCATION

Master’s in Computer Science Engineering

Lewis University


