ALISHA RUQSHAN KADIRI
******************@*****.*** 716-***-**** linkedin.com/in/alishark/ github.com/AlishaRuqshan
EDUCATION
Master's, Data Science, University at Buffalo, The State University of New York Jan 2023 – May 2024
Bachelor's, Computer Science, Sri Venkateswara College of Engineering, Tirupati, A.P.
Technologies
Programming & Query Languages: Python (Pandas, NumPy), SQL, PL/SQL, Java, Shell, R, JavaScript, C
Data Engineering Tools: Databricks, Apache Spark (PySpark), Delta Lake, Delta Live Tables, dbt, Airflow, ADF, Azure Synapse, Git, Jenkins, CI/CD
Cloud Platforms: Azure (Data Factory, Microsoft Fabric, Synapse, Data Lake, Key Vault, DevOps), AWS (S3, RDS, Glue, Lambda, Step Functions, Secrets Manager), GCP
Databases: Oracle, Azure SQL, SQL Server, Amazon RDS, PostgreSQL, MySQL, Snowflake, DynamoDB, DB2, Cassandra
Visualization & Reporting: Power BI, Tableau, Qlik Sense, Excel (Pivot Tables, Charts), Salesforce Dashboards, Confluence
Data Governance & Modeling: Unity Catalog, Data Lineage, Metadata Documentation, Data Mapping, ERD (MS Visio), UML, SCD Types
APIs & Integration Tools: REST APIs, Swagger, Postman, JSON, XML, Salesforce Data Loader, Optymyze
Machine Learning Concepts: Supervised & Unsupervised Learning, Regression, Classification, Clustering (K-Means, KNN), SVM, PCA, Time Series Forecasting, Transformers, LLMs, Model Evaluation, Hyperparameter Tuning
Summary
• Built scalable ETL/ELT pipelines across Azure, AWS, and hybrid environments.
• Proven ability to gather business requirements, define KPIs, and document workflows using BRDs, FRDs, and RTMs in Waterfall and Agile SDLCs.
• Proficient in developing ADF pipelines with ARM templates for batch automation, job orchestration, and error handling.
• Experienced in Databricks, PySpark, and Delta Lake for large-scale data processing and real-time enrichment.
• Integrated on-prem data from SQL Server, DB2, and Cassandra into Azure SQL and Data Lake using secure patterns.
• Led SSIS-to-cloud migrations using Azure Synapse, Glue, and event-based triggers for modernized data architecture.
• Skilled in data modeling, SCD logic, and metadata-driven pipelines using Unity Catalog and enterprise documentation tools.
• Built CI/CD pipelines using Azure DevOps, GitHub Actions, and REST APIs for version control and multi-stage deployment.
• Troubleshot issues in Databricks clusters, integration runtimes, and VNet/storage configs across complex cloud environments.
• Converted business needs into data flows using ERDs, UML diagrams, KPIs, and Agile documentation to support delivery planning and tracking.
• Certified Azure and AWS Data Engineer with an M.S. in Data Science and a proven record of enterprise data project success.
Experience
Data Engineer, Department of Environmental Protection of Florida – Tallahassee, FL Oct 2024 – Present
• Develop and maintain end-to-end ETL pipelines using SQL and PL/SQL to load and export environmental data into Oracle for EPA reporting.
• Engineer scalable workflows to ingest and merge field and lab datasets, streamlining compliance reporting across 6,950+ monitoring stations.
• Detect anomalies and environmental trends using advanced SQL analytics, enabling proactive corrections and audit-ready datasets.
• Build and optimize regulatory SQL workflows, collaborating with data scientists, QA teams, and compliance managers to ensure reporting accuracy.
• Perform geospatial processing in ArcGIS and R, extracting spatial patterns for visual impact assessments and EPA audits.
• Tune Oracle queries and indexes to reduce pipeline latency by 40%.
• Modernize legacy pipelines using Microsoft Fabric components including Data Factory, Synapse, and Lakehouse to streamline compliance reporting architecture.
• Build unified reports in Power BI (Fabric) with direct lake access to visualize KPIs, station metrics, and real-time compliance indicators.
• Document data flows, data lineage, and business rules across Fabric workloads to support governance, reusability, and audit transparency.
• Drive ongoing infrastructure upgrades and team adoption of Fabric-native workflows, simplifying operations and enhancing delivery speed.
Data Engineer, IBM – Bengaluru, KA Mar 2021 – Dec 2022
• Built scalable ETL pipelines using Python and advanced SQL to consolidate CRM, web, and sales data into a unified Snowflake warehouse.
• Designed a dimensional model in Snowflake with clustering and materialized views to optimize performance for customer segmentation analytics.
• Developed modular ELT workflows using dbt and Git, enabling versioned transformations with auto-documentation and reusable macros.
• Ingested raw clickstream and transaction data from AWS S3, processed using PySpark on EMR to support daily customer insight reporting.
• Orchestrated multi-stage data pipelines using Apache Airflow, incorporating SLA monitoring, failure alerts, and dependency management.
• Integrated AWS Glue, Lambda, and Athena with Snowflake to build a hybrid pipeline for batch and on-demand processing needs.
• Modeled Slowly Changing Dimensions (SCD Type 2) and surrogate keys in Snowflake to enable full historical tracking of customer profiles (pattern sketched at the end of this section).
• Automated CI/CD for dbt transformations with GitHub Actions, incorporating lint checks, model testing, and staging deployments.
• Deployed data quality microservices using Docker, and configured isolated validation environments using Kubernetes (EKS) clusters.
• Collaborated with internal teams to design and implement a Customer 360 solution using AWS and Snowflake for unified customer analytics.
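A minimal sketch of the SCD Type 2 pattern referenced above, issued from Python through the Snowflake connector; the table, column, and sequence names (stg_customer, dim_customer, dim_customer_seq) are illustrative assumptions, not the actual warehouse schema.

# Illustrative SCD Type 2 sketch; object names and credentials are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="ANALYTICS_WH", database="CRM", schema="DIM",
)
cur = conn.cursor()

# 1) Expire the current row for any customer whose tracked attributes changed.
cur.execute("""
    MERGE INTO dim_customer d
    USING stg_customer s
      ON d.customer_id = s.customer_id AND d.is_current = TRUE
    WHEN MATCHED AND (d.name <> s.name OR d.segment <> s.segment) THEN UPDATE SET
      is_current = FALSE,
      effective_to = CURRENT_TIMESTAMP()
""")

# 2) Insert a new current row (fresh surrogate key) for new or just-expired customers.
cur.execute("""
    INSERT INTO dim_customer (customer_sk, customer_id, name, segment,
                              is_current, effective_from, effective_to)
    SELECT dim_customer_seq.NEXTVAL, s.customer_id, s.name, s.segment,
           TRUE, CURRENT_TIMESTAMP(), NULL
    FROM stg_customer s
    LEFT JOIN dim_customer d
      ON d.customer_id = s.customer_id AND d.is_current = TRUE
    WHERE d.customer_id IS NULL
""")

conn.close()

Software Engineer, IBM – Bengaluru, KA Apr 2020 – Mar 2021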
• Built ETL pipelines using AWS Glue and Python to extract and transform data from RDS, S3, and SQL Server into partitioned S3 layers.
• Designed daily full and incremental load workflows using Step Functions and job bookmarks to manage state across Glue jobs.
• Orchestrated workflows with Step Functions and Lambda, incorporating Wait, Choice, and failover handling for robust data pipelines.
• Automated metadata updates and logging by integrating Athena queries and Lambda functions into ingestion pipelines.
• Migrated legacy SSIS packages to AWS Glue and S3-based workflows, reducing manual intervention and batch delays.
• Transferred structured and unstructured data from SQL Server, DB2, and Cassandra to S3, registering schemas in Glue Catalog.
• Implemented CloudWatch and S3 event triggers to launch pipelines and monitor batch loads across raw and processed zones.
• Processed JSON, CSV, and Parquet files using PySpark in Databricks, performing joins, filters, and window operations at scale.
• Scheduled and managed Databricks jobs using Workflows with REST API hooks for cross-platform orchestration and error capture.
• Resolved compute and memory issues in Databricks clusters, optimizing configurations for better stability and parallelism.
• Queried raw S3 data using Athena and Redshift Spectrum to validate ingested files and perform schema-on-read transformations.
• Secured access credentials using AWS Secrets Manager, integrated with Glue and Databricks for token-based auth flows.
Associate Software Engineer, Xoriant – Chennai, TN Jun 2017 – Apr 2020
• Designed reusable ADF pipelines using Copy, Web, ForEach, and Stored Procedure activities to automate on-prem to cloud data migrations.
• Migrated enterprise data from SQL Server and DB2 to Azure SQL, Data Lake, and SQL DW using parameterized ADF pipelines.
• Scheduled and triggered ADF workflows via SQL Server Agent for daily, weekly, and monthly batch loads across multiple data domains.
• Investigated failures using Azure Monitor, pipeline logs, and diagnostics to troubleshoot job issues and minimize runtime delays.
• Executed full SQL migrations to Azure SQL and Managed Instance, validating schema, indexes, stored procedures, and performance.
• Built scalable data models and transformations in Azure SQL DW for reporting, analytics, and cross-system data access use cases.
• Automated deployment of ADF resources using ARM templates for linked services, datasets, and environment-specific configurations.
• Processed semi-structured batch files via HDInsight clusters integrated with blob storage and orchestrated using ADF pipelines.
• Resolved infrastructure issues involving storage permissions, VNet security, and integration runtime configs during cloud migration phases.
• Collaborated in an onsite-offshore delivery model, managing development tasks, handovers, and pipeline troubleshooting with distributed teams.
Data Engineer Intern, Xoriant – Chennai, TN Dec 2016 – May 2017
• Streamlined data ingestion by uploading and organizing large datasets into Amazon S3 with structured folder hierarchies and metadata tagging, enabling faster access and improved traceability for analytics teams.
• Automated routine data workflows by developing Python scripts using Pandas for data cleaning and reporting; integrated with Amazon S3 and RDS to support scalable data processing and reduce manual effort by 60%.
• Conducted exploratory data analysis (EDA) using advanced Excel functions and SQL queries on Amazon RDS to uncover data quality issues, identify trends, and guide subsequent data transformation and visualization tasks.
Certifications
AWS Certified Data Engineer – Associate
Microsoft Certified: Azure Data Engineer Associate
Projects
Streaming Analytics Pipeline – Azure, Power BI, Databricks azure-databricks-DLT
• Built a modular, parameterized ELT pipeline in ADF and Databricks to dynamically ingest and transform streaming data across Delta Lake layers.
• Automated ingestion using ADF Set Variable and ForEach activities to parse GitHub metadata, enabling scalable, file-agnostic workflows.
• Executed reusable Databricks notebooks with input parameters, increasing development efficiency and modularity.
• Applied weekday-driven conditional logic to trigger weekend-only transformations in Databricks Workflows using lookup task routing.
• Built Bronze, Silver, and Gold layers using Auto Loader, custom enrichment logic, and Delta Live Tables (DLT) with Unity Catalog for lineage tracking and access control.
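A minimal sketch of the Auto Loader plus Delta Live Tables pattern described above, as it might appear in a DLT notebook (the spark session is provided by Databricks); landing paths, schema location, and table names are illustrative assumptions.

# Illustrative DLT notebook cell; paths and table names are placeholders.
import dlt
from pyspark.sql.functions import current_timestamp, input_file_name

@dlt.table(name="bronze_events", comment="Raw streaming files ingested with Auto Loader")
def bronze_events():
    return (
        spark.readStream.format("cloudFiles")                      # Auto Loader source
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/mnt/landing/_schemas/events")
        .load("/mnt/landing/events/")
        .withColumn("ingest_ts", current_timestamp())
        .withColumn("source_file", input_file_name())
    )

@dlt.table(name="silver_events", comment="Cleaned, deduplicated events")
@dlt.expect_or_drop("valid_event_id", "event_id IS NOT NULL")       # drop rows failing the expectation
def silver_events():
    return dlt.read_stream("bronze_events").dropDuplicates(["event_id"])

In this pattern, a Gold-layer table would aggregate from silver_events in the same pipeline, with Unity Catalog capturing lineage and access control across all three layers.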
Modular ELT Pipeline – DBT, Snowflake, Airflow snowflake-dbt-airflow
• Configured secure Snowflake access by creating a dedicated dbt role, schema, and warehouse; enforced role-based access for staging and mart layers.
• Developed a modular dbt project with reusable SQL models and Jinja2 macros, enabling efficient transformations and CI-friendly deployment.
• Built an end-to-end orchestration layer using Apache Airflow (Astro CLI) and Bash to automate dbt run/test and enable scheduled DAGs (sketched below).
• Verified materialized views and tables in Snowflake, validating output via the Graph view and log checks to ensure pipeline execution and data quality.
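A minimal sketch of the Airflow orchestration referenced above, assuming the dbt project ships inside the Astro image; the DAG id, project path, and schedule are illustrative assumptions.

# Illustrative Airflow DAG; dag_id, paths, and schedule are placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_snowflake_elt",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",              # Airflow 2.4+; use schedule_interval on older versions
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /usr/local/airflow/dbt && dbt run --profiles-dir .",
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="cd /usr/local/airflow/dbt && dbt test --profiles-dir .",
    )
    dbt_run >> dbt_test             # build models first, then run schema/data tests

LLM Application – HuggingFace, Tokenizers, Transformers LLM-from-scratch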
• Built custom BPE tokenizer and fine-tuned transformer models on domain-specific data for LLM experimentation.
• Deployed real-time semantic search using FastAPI on AWS EC2, tracked training via TensorBoard and Weights & Biases.
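A minimal sketch of the custom BPE tokenizer training described above, using the HuggingFace tokenizers library; the corpus file, vocabulary size, and special tokens are illustrative assumptions.

# Illustrative BPE tokenizer training; corpus path and hyperparameters are placeholders.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))       # byte-pair encoding model
tokenizer.pre_tokenizer = Whitespace()              # split on whitespace/punctuation first

trainer = BpeTrainer(
    vocab_size=30_000,
    special_tokens=["[UNK]", "[PAD]", "[CLS]", "[SEP]", "[MASK]"],
)
tokenizer.train(files=["domain_corpus.txt"], trainer=trainer)
tokenizer.save("domain_bpe_tokenizer.json")

# Quick check: encode a sample sentence and inspect the produced subword tokens.
print(tokenizer.encode("Fine-tuning transformers on domain-specific data").tokens)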