AMULYA NITTALA
857-***-**** ************@*****.*** https://www.linkedin.com/in/amulya-nittala/
PROFESSIONAL SUMMARY
Proficient IT professional with over 5 years of experience, specializing in Azure Data Engineering and Big Data systems for data acquisition, ingestion, modeling, storage, analysis, and processing.
Extensive hands-on experience with Microsoft Azure services such as Azure Databricks, Synapse Analytics, Azure Data Factory, ADLS, Data Lakehouse, Delta Tables, and Azure SQL Database, including setting up elastic pool jobs and designing tabular models.
Strong knowledge of building ETL/ELT pipelines that extract, transform, and load data into target data warehouses for reporting and analysis.
Implemented frameworks for Data Quality Analysis, Data Governance, Data Trending, Data Validation, and Data Profiling using Big Data technologies.
Hands-on experience with Hadoop ecosystem components such as Hive, Pig, HBase, Spark SQL, and the MapReduce framework.
Designed ETL pipelines using Snowflake, PySpark, SQL, and SnowSQL, automating CI/CD with dbt, Git, and Azure DevOps, and enabling continuous integration of dynamic data models for robust business reporting.
Proficient with AWS S3, EC2, Athena, and Redshift, providing complete solutions for computing, query processing, and storage across a wide range of applications.
Experienced with Snowflake, ingesting data into the Snowflake cloud data warehouse using Snowpipe.
Extensive experience with Python, using libraries such as pandas and NumPy for numerical and analytical operations.
Implemented Python solutions with libraries such as Matplotlib for charts and graphs, MySQL connectors for database connectivity, pandas DataFrames, NetworkX, and urllib2.
Wrote advanced SQL queries with CTEs, window functions, and recursive logic for KPI tracking; performed data modeling with star and snowflake schemas to support scalable analytics and improve dashboard performance.
Led performance tuning, job failure resolution, and debugging across Spark and Azure clusters; applied advanced data transformations, schema drift handling, and cluster performance tuning to reduce pipeline latency and improve SLA adherence.
Built data integration pipelines ingesting from relational, non-relational, and streaming sources; automated transformation and transfer processes to AWS S3, Azure Blob Storage, Redshift, and Snowflake, enabling analytics across hybrid data platforms.
Supported predictive modeling workflows by performing feature engineering, outlier detection, signal smoothing, and regression modeling (Ridge, Lasso), improving model performance and forecast accuracy in trading and operations use cases.
Integrated GitHub Actions, Azure DevOps, and containerized jobs to implement robust CI/CD pipelines across data workflows, reducing deployment time and enabling automated testing, versioning, and monitoring of data applications.
Skilled in Azure authentication and authorization; experienced with visualization tools such as Tableau and Power BI, with basic hands-on experience in Kusto.
Worked with cross-functional teams—including data science, DevOps, and business units—to translate requirements into scalable solutions, participated in Agile ceremonies, and helped reduce manual workload across teams through automation.
TECHNICAL SKILLS
Programming Languages
SQL (MS SQL, MySQL, Oracle, PostgreSQL), Python, R, Java, PySpark, C#
Microsoft Azure
Azure Data Factory, Synapse Analytics, ADLS, IoT Hub, Active Directory, Key Vault, HDInsight
Amazon AWS
EC2, EMR, S3, Redshift, Athena, Glue, Step Functions, Lambda, IAM
Software and Tools
Jupyter Notebooks, Excel, GitHub, Alteryx, Tableau, CI/CD, Jira, Snowflake
Relational Databases
Oracle, MySQL, MS SQL Server, Azure SQL Database
NoSQL Databases
MongoDB, Neo4j, HBase
Reporting
Power BI, Tableau
Operating Systems
Linux, Unix, Windows, macOS
Methodologies
Agile, Waterfall
PROFESSIONAL EXPERIENCE
American Express New York, NY
Data Engineer May 2024 – Present
Engineered end-to-end data ingestion pipelines using Azure IoT Hub and Azure Data Lake Storage Gen2 to process real-time transaction logs, ATM telemetry, and fraud signals from retail branches and mobile platforms
Orchestrated batch and streaming workflows in Azure Data Factory, implementing Mapping Data Flows with schema drift handling to normalize multi-source transaction data for compliance reporting and audit readiness
Designed and built scalable pipelines using PySpark and Delta Lake in Azure Databricks, enabling a hybrid real-time and batch processing framework handling 5M+ transaction records per day (pattern illustrated in the sketch after this section)
Built ingestion pipelines in Azure Data Factory to integrate structured (SQL Server, Oracle) and semi-structured (JSON logs, Kafka streams) data sources into a unified data warehouse
Provisioned and managed Databricks clusters for real-time fraud analytics and regulatory checks, installing custom libraries for anomaly detection, time-series modeling, and transaction scoring
Conducted advanced data transformations and signal preprocessing (e.g., outlier removal, smoothing) to improve data quality by 90%, supporting cleaner AML (Anti-Money Laundering) and KYC (Know Your Customer) analytics
Detected suspicious spending behaviors and internal inefficiencies through trend analysis and event-based anomaly detection, helping the bank reduce potential fraud risk and achieve 15% operational cost savings
Migrated data from Azure to Snowflake using SnowSQL, Snowpipe, and Python scripts, supporting downstream analytics for regulatory teams and executive dashboards
Automated Snowflake transformation workflows using dbt version-controlled in Git, and deployed via Azure DevOps CI/CD pipelines, enabling continuous and auditable updates to core banking models
Wrote optimized SQL queries against Snowflake to generate compliance reports, exposure risk summaries, and customer segmentation profiles for data science and credit risk teams
Built automated ETL pipelines and reporting layers that powered real-time alerts for large transaction anomalies and daily historical summaries for internal audit teams
Worked cross-functionally with fraud detection, infrastructure, and compliance teams to ensure the platform aligned with internal data governance and regulatory reporting
Environment: Azure IoT Hub, Azure DevOps, ADLS, Azure Databricks, Snowflake, Snowpipe, Spark
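Illustrative sketch of the Databricks ingestion pattern described in this role: a minimal PySpark + Delta Lake job that normalizes raw JSON transaction logs from ADLS Gen2, appends them to a Delta table with schema evolution enabled, and adds an incremental streaming feed. Storage account, container, path, and column names are hypothetical; this is a sketch of the approach, not the production code.

# Minimal PySpark + Delta Lake sketch (hypothetical paths and columns)
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("txn-ingest-sketch").getOrCreate()

# Batch side: normalize raw JSON transaction logs landed in ADLS Gen2
raw = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/transactions/")
clean = (raw
         .withColumn("txn_ts", F.to_timestamp("txn_ts"))
         .withColumn("amount", F.col("amount").cast("double"))
         .dropDuplicates(["txn_id"]))

# Delta sink with schema evolution to tolerate drifting source schemas
(clean.write
 .format("delta")
 .mode("append")
 .option("mergeSchema", "true")
 .save("abfss://curated@examplelake.dfs.core.windows.net/delta/transactions"))

# Streaming side: feed the same Delta table incrementally (Databricks Auto Loader, assumed available)
stream = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")
          .load("abfss://raw@examplelake.dfs.core.windows.net/transactions_stream/"))

(stream.writeStream
 .format("delta")
 .option("checkpointLocation", "abfss://curated@examplelake.dfs.core.windows.net/_chk/transactions")
 .outputMode("append")
 .start("abfss://curated@examplelake.dfs.core.windows.net/delta/transactions"))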
Verizon Boston, MA
Data Engineer Oct 2022 – May 2024
Collaborated with Business Analysts and Subject Matter Experts (SMEs) from network operations and customer service teams to gather requirements and translate them into data engineering solutions for telecom usage analytics and churn prediction
Partnered with ETL developers to ensure high-quality call detail records (CDRs) and usage logs were processed using Hive, supporting accurate KPIs for dropped calls, signal strength, and data consumption
Automated extraction and transformation of daily network traffic and customer interaction data into CSV, then securely transferred it to AWS S3 via EC2 scripts and loaded into Redshift for business intelligence reporting
Performed statistical profiling on call drop rates, latency, and data throughput, calculating variance, skewness, and burst rate to identify performance bottlenecks
Utilized PySpark and Pandas to compute technical indicators such as network congestion thresholds, session duration trends, and usage anomalies, storing the output in the data warehouse to support both real-time and historical analytics
Optimized existing Hadoop-based algorithms used for signal quality scoring using Spark Context, RDDs, and Spark SQL, improving pipeline throughput by 30% and scalability across network zones
Developed infrastructure provisioning scripts using Terraform to manage scalable and secure AWS environments for ingesting and processing telecom telemetry data
Integrated Spark with Hadoop clusters to support batch processing of CDRs and graph-based network topology analysis using GraphX, identifying weak signal nodes and optimizing routing
Created custom data validation frameworks in Python to ensure end-to-end consistency of time-series and geo-spatial datasets from tower logs, GPS pings, and user devices
Built and tuned machine learning models using Ridge and Lasso regression to forecast total call volume and predict customer churn based on usage patterns and complaint frequency (approach sketched after this section)
Automated telecom data engineering workflows using Python, reducing manual intervention by 40% and significantly improving turnaround for operational reports
Queried and joined structured telecom data across Amazon S3 and HDFS using Spark SQL, powering executive dashboards for network health and performance
Enhanced predictive model performance through polynomial feature transformations and domain-specific feature selection, improving churn model accuracy by 22%
Environment: AWS S3, EC2, Redshift, Spark, SQL, HDFS, Hadoop, Python
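Illustrative sketch of the regularized-regression modeling described in this role: a minimal scikit-learn pipeline that tunes a Ridge model over polynomial features and fits a Lasso variant for sparse feature selection. The input file, feature names, and target column are hypothetical placeholders; a sketch of the technique, not the production pipeline.

# Minimal scikit-learn sketch (hypothetical feature names and data export)
import pandas as pd
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.metrics import mean_absolute_error

usage = pd.read_parquet("daily_usage_features.parquet")  # hypothetical export from Redshift
X = usage[["avg_session_min", "dropped_call_rate", "complaints_90d", "data_gb_month"]]
y = usage["next_month_call_volume"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Polynomial expansion + scaled, regularized regression, tuned over alpha
pipe = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                     StandardScaler(), Ridge())
grid = GridSearchCV(pipe, {"ridge__alpha": [0.1, 1.0, 10.0]}, cv=5,
                    scoring="neg_mean_absolute_error")
grid.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, grid.predict(X_test)))

# Lasso variant used for sparse feature selection
lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.01)).fit(X_train, y_train)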
T-Mobile Bellevue, WA
Data Analyst Aug 2021 – Sep 2022
Built SQL queries using CTEs, window functions, and hierarchical and recursive queries to extract and aggregate audit data, creating robust datasets for KPI dashboards in Tableau (query pattern sketched after this section)
Built robust, scalable data integration (ETL) pipelines using SQL, Python, and Spark; designed and developed data models in Erwin using star and snowflake schemas, handling Slowly Changing Dimensions and normalization for efficient data storage
Utilized Azure services such as Azure Data Factory, Azure Logic Apps, Azure Blob Storage, Azure Virtual Machines, Power BI, Azure Event Hubs, and Azure Monitor to build a robust project framework
Worked with the DWA to enforce data modeling best practices within Power BI, ensuring consistency, scalability, and ease of maintenance
Conducted research on data mining tools, protocols, and standards to drive informed procurement and development decisions
Developed QC reports to identify data anomalies and trend differences in high-revenue data, and built automation frameworks to expedite QC reports, trending reports, and ad hoc reporting requirements for data validation
Prepared and optimized data definitions for new and existing database tables to support comprehensive data analysis
Improved audit efficiency by 30% through enhanced dashboard interactivity with drill-through, cross-filtering, and bookmarking, allowing data exploration at different levels of granularity and generating insights into daily-to-yearly trends
Provided regular reporting on Power BI usage, data consumption, and query performance within the data warehouse, working with the DWA to identify areas for improvement
Troubleshot and fixed automated production job failures, investigated data issues, provided solutions, and deployed updated code into production
Managed high-quality datasets for internal and external consumption, ensuring data integrity and security by collaborating with DevOps and infrastructure teams to design Terraform-based data engineering solutions
Environment: Azure Data Factory, Azure Blob Storage, Virtual Machines, Power BI, Python, SQL
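Illustrative sketch of the CTE and window-function query style described in this role, run here through Spark SQL so the example is self-contained in Python. The audit_events table and its columns are hypothetical; the sketch covers CTEs and window functions only, since recursive queries would be issued against the relational source.

# Minimal Spark SQL sketch (hypothetical table and column names)
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kpi-query-sketch").getOrCreate()

kpi_sql = """
WITH daily AS (                                   -- CTE: aggregate audit events per day
    SELECT audit_date, region, COUNT(*) AS audit_cnt
    FROM audit_events
    GROUP BY audit_date, region
)
SELECT audit_date,
       region,
       audit_cnt,
       LAG(audit_cnt) OVER (PARTITION BY region ORDER BY audit_date)    AS prev_cnt,
       AVG(audit_cnt) OVER (PARTITION BY region ORDER BY audit_date
                            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)   AS rolling_7d_avg
FROM daily
"""
kpi_df = spark.sql(kpi_sql)            # assumes audit_events is registered as a table/view
kpi_df.write.mode("overwrite").parquet("kpi_daily_trends")  # feeds the Tableau extract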
TCS India
Data Analyst Jun 2019 – Jul 2021
Designed and developed multiple modules within a Hadoop ecosystem using Apache Spark (Core, SQL, MLlib, Streaming, and Structured Streaming) to support large-scale data processing
Engineered Spark Streaming workflows for geospatial analysis, enabling real-time insights from vehicle telematics and location-based data
Created interactive dashboards and analytical reports in Tableau, delivering business-critical insights such as VIN-level uptime, dealer location mapping, vehicle trajectory tracking, and fault code analysis
Partnered with the Data Science team to visualize analytical outcomes and KPIs via Tableau for executive stakeholders
Developed RESTful microservices that interfaced with Azure Cosmos DB (IoT data) and integrated with Kafka to transfer data into a Hadoop data lake
Built and maintained Spark batch jobs for uptime tracking and data science pipelines, supporting real-time and historical analysis
Implemented a geospatial data science framework using Magellan and GeoSpark, extracting actionable insights from large-scale spatial datasets
Built real-time structured streaming pipelines in Apache Spark 2.2, ingesting data from Azure IoT Hub and Azure Event Hub, processing millions of messages per second, and persisting to HDFS
Integrated external datasets (e.g., weather, crash reports, potholes, and highway geospatial data) with real-time telematics streams to enrich context and accuracy of vehicle insights
Optimized Spark SQL queries to enhance data pipeline performance and ensure scalability for growing data volumes
Developed streaming applications to aggregate telematics data from various service providers via Kafka and persist it into HBase tables for downstream analytics (ingestion pattern sketched after this section)
Environment: Azure Cosmos DB, Azure IoT Hub, Hadoop, HDFS, Spark, Magellan, GeoSpark, Kafka, HBase, Tableau
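Illustrative sketch of the streaming ingestion pattern described in this role: a minimal Spark Structured Streaming job that reads telematics events from Kafka, parses the JSON payload, and persists partitioned Parquet to HDFS. Broker, topic, schema fields, and paths are hypothetical; the HBase sink mentioned above would additionally require an HBase connector and is not shown.

# Minimal Spark Structured Streaming sketch (hypothetical topic, fields, and paths)
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("telematics-stream-sketch").getOrCreate()

schema = StructType([
    StructField("vin", StringType()),
    StructField("lat", DoubleType()),
    StructField("lon", DoubleType()),
    StructField("fault_code", StringType()),
    StructField("event_ts", TimestampType()),
])

# Read raw telematics messages from Kafka and parse the JSON value
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical broker
          .option("subscribe", "vehicle-telematics")           # hypothetical topic
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Persist events to HDFS as partitioned Parquet for downstream batch analytics
(events.writeStream
 .format("parquet")
 .option("path", "hdfs:///data/telematics/events")
 .option("checkpointLocation", "hdfs:///chk/telematics/events")
 .partitionBy("fault_code")
 .outputMode("append")
 .start())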
Certifications: Microsoft Azure Data Fundamentals, Microsoft Azure Data Engineer Associate