Data Engineer

Location:
Boston, MA
Posted:
June 04, 2025

Resume:

AMULYA NITTALA

857-***-**** ************@*****.*** https://www.linkedin.com/in/amulya-nittala/

PROFESSIONAL SUMMARY

Proficient IT professional with over 5 years of experience, specializing in Azure Data Engineering and Big Data systems for data acquisition, ingestion, modeling, storage, analysis, and processing.

Extensive hands-on experience with Microsoft Azure services such as Azure Databricks, Synapse Analytics, Azure Data Factory, ADLS, Data Lakehouse, Delta Tables, and Azure SQL Database, including setting up elastic pool jobs and designing tabular models.

Strong knowledge of building ETL/ELT pipelines that extract, transform, and load data into target data warehouses for reporting and analysis.

Implemented frameworks such as Data Quality Analysis, Data Governance, Data Trending, Data Validation, and Data Profiling using Big Data technologies.

Hands-on experience with Hadoop ecosystem components such as Hive, Pig, HBase, Spark SQL, and the MapReduce framework.

Designed ETL pipelines using Snowflake, PySpark, SQL, and SnowSQL, automating CI/CD with dbt, Git, and Azure DevOps, and enabling continuous integration of dynamic data models for robust business reporting.

Proficient with AWS S3, EC2, Athena, and Redshift, providing a complete solution for computing, query processing, and storage across a wide range of applications.

Experienced with Snowflake; ingested data into the Snowflake cloud data warehouse using Snowpipe.

Extensive experience with Python, using libraries such as pandas and NumPy for numerical and data operations.

Implemented Python solutions with libraries such as Matplotlib for charts and graphs, MySQL connectors for database connectivity, pandas DataFrames, NetworkX, and urllib2.

Wrote advanced, complex SQL queries with CTEs, window functions, and recursive logic for KPI tracking; modeled data using star and snowflake schemas to support scalable analytics and improve dashboard performance.
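
For illustration, a minimal sketch of a windowed KPI query of this kind run through PySpark; the usage_events table and its columns are hypothetical:

```python
# Minimal sketch: a CTE plus a window function for a rolling KPI (hypothetical table/columns).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kpi_example").getOrCreate()

kpi_sql = """
WITH daily_usage AS (                       -- CTE: aggregate raw events per customer per day
    SELECT customer_id,
           CAST(event_ts AS DATE) AS usage_date,
           SUM(data_mb)           AS daily_mb
    FROM   usage_events
    GROUP  BY customer_id, CAST(event_ts AS DATE)
)
SELECT customer_id,
       usage_date,
       daily_mb,
       AVG(daily_mb) OVER (                 -- window function: 7-day rolling average per customer
           PARTITION BY customer_id
           ORDER BY usage_date
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS rolling_7d_avg_mb
FROM daily_usage
"""

kpi_df = spark.sql(kpi_sql)   # assumes usage_events is registered as a table or view
kpi_df.show(5)
```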

Led performance tuning, job failure resolution, and debugging across Spark and Azure clusters; applied advanced data transformations, schema drift handling, and cluster performance tuning to reduce pipeline latency and improve SLA adherence.

Built data integration pipelines ingesting from relational, non-relational, and streaming sources; automated transformation and transfer processes to AWS S3, Azure Blob Storage, Redshift, and Snowflake, enabling analytics across hybrid data platforms.

Supported predictive modeling workflows by performing feature engineering, outlier detection, signal smoothing, and regression modeling (Ridge, Lasso), improving model performance and forecast accuracy in trading and operations use cases.
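
A minimal sketch of the regularized-regression step described above, using scikit-learn on synthetic data; feature names and hyperparameters are placeholders:

```python
# Minimal sketch: feature scaling plus Ridge/Lasso regression on synthetic data.
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(1_000, 10))                        # engineered features (placeholder)
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=1_000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for name, model in [("ridge", Ridge(alpha=1.0)), ("lasso", Lasso(alpha=0.01))]:
    pipe = make_pipeline(StandardScaler(), model)       # scale features before the regularized fit
    pipe.fit(X_train, y_train)
    print(name, mean_absolute_error(y_test, pipe.predict(X_test)))
```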

Integrated GitHub Actions, Azure DevOps, and containerized jobs to implement robust CI/CD pipelines across data workflows, reducing deployment time and enabling automated testing, versioning, and monitoring of data applications.

Skilled in Azure authentication and authorization, and experienced with visualization tools such as Tableau and Power BI; basic hands-on experience with Kusto.

Worked with cross-functional teams—including data science, DevOps, and business units—to translate requirements into scalable solutions, participated in Agile ceremonies, and helped reduce manual workload across teams through automation.

TECHNICAL SKILLS

Programming Languages

SQL (MS SQL, MySQL, Oracle, PostgreSQL), Python, R, Java, PySpark, C#

Microsoft Azure

Azure Data Factory, Synapse Analytics, ADLS, IoT Hub, Active Directory, Key Vault, HDInsight

Amazon AWS

EC2, EMR, S3, Redshift, Athena, Glue, Step Functions, Lambda, IAM

Software and Tools

Jupyter Notebooks, Excel, GitHub, Alteryx, Tableau, CI/CD, Jira, Snowflake

Relational Databases

Oracle, MySQL, MS SQL Server, Azure SQL Database

NoSQL Databases

MongoDB, Neo4j, HBase

Reporting

Power BI, Tableau

Operating Systems

Linux, Unix, Windows, macOS

Methodologies

Agile, Waterfall

PROFESSIONAL EXPERIENCE

American Express New York, NY

Data Engineer May 2024 – Present

Engineered end-to-end data ingestion pipelines using Azure IoT Hub and Azure Data Lake Storage Gen2 to process real-time transaction logs, ATM telemetry, and fraud signals from retail branches and mobile platforms

Orchestrated batch and streaming workflows in Azure Data Factory, implementing Mapping Data Flows with schema drift handling to normalize multi-source transaction data for compliance reporting and audit readiness

Designed and built scalable pipelines using PySpark and Delta Lake in Azure Databricks, enabling a hybrid real-time and batch processing framework capable of handling 5M+ transaction records per day
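
As a rough sketch of this hybrid pattern, the snippet below combines a streaming append and a batch aggregation over the same Delta table; the paths, schema, and storage container are illustrative:

```python
# Minimal sketch: streaming writes and batch reads against the same Delta table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("txn_pipeline").getOrCreate()

# Streaming leg: continuously append raw transaction events to a bronze Delta table.
raw_stream = (spark.readStream
              .format("json")
              .schema("txn_id STRING, amount DOUBLE, merchant STRING, event_ts TIMESTAMP")
              .load("abfss://raw@storageaccount.dfs.core.windows.net/transactions/"))

(raw_stream
 .withColumn("ingest_date", F.to_date("event_ts"))
 .writeStream
 .format("delta")
 .option("checkpointLocation", "/mnt/checkpoints/transactions")
 .partitionBy("ingest_date")
 .start("/mnt/delta/bronze/transactions"))

# Batch leg: daily aggregation over the same Delta table for reporting.
daily = (spark.read.format("delta").load("/mnt/delta/bronze/transactions")
         .groupBy("ingest_date", "merchant")
         .agg(F.count("*").alias("txn_count"), F.sum("amount").alias("total_amount")))
daily.write.format("delta").mode("overwrite").save("/mnt/delta/gold/daily_merchant_summary")
```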

Built ingestion pipelines in Azure Data Factory to integrate structured (SQL Server, Oracle) and semi-structured (JSON logs, Kafka streams) data sources into a unified data warehouse

Provisioned and managed Databricks clusters for real-time fraud analytics and regulatory checks, installing custom libraries for anomaly detection, time-series modeling, and transaction scoring

Conducted advanced data transformations and signal preprocessing (e.g., outlier removal, smoothing) to improve data quality by 90%, supporting cleaner AML (Anti-Money Laundering) and KYC (Know Your Customer) analytics
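
A minimal sketch of such a preprocessing step with pandas, assuming a hypothetical txn_amount column; the z-score threshold and smoothing window are placeholders:

```python
# Minimal sketch: z-score outlier removal followed by rolling-median smoothing.
import pandas as pd

def clean_signal(df: pd.DataFrame, col: str = "txn_amount", z_thresh: float = 3.0) -> pd.DataFrame:
    """Drop extreme outliers by z-score, then apply a short rolling-median smoother."""
    z = (df[col] - df[col].mean()) / df[col].std(ddof=0)
    cleaned = df.loc[z.abs() <= z_thresh].copy()          # remove points beyond the threshold
    cleaned[f"{col}_smoothed"] = (cleaned[col]
                                  .rolling(window=5, min_periods=1)
                                  .median())              # smooth the remaining signal
    return cleaned
```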

Detected suspicious spending behaviors and internal inefficiencies through trend analysis and event-based anomaly detection, helping the bank reduce potential fraud risk and achieve 15% operational cost savings

Migrated data from Azure to Snowflake using SnowSQL, Snowpipe, and Python scripts, supporting downstream analytics for regulatory teams and executive dashboards
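
For illustration, a small Python sketch of loading staged files into Snowflake with the snowflake-connector-python package; the account, stage, and table names are hypothetical, and continuous ingestion via Snowpipe would typically be configured on the stage itself rather than run this way:

```python
# Minimal sketch: run a COPY INTO from an external Azure stage using the Snowflake connector.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",   # placeholder credentials
    warehouse="ETL_WH", database="ANALYTICS", schema="STAGING",
)
try:
    cur = conn.cursor()
    # Assumes an external stage pointing at the ADLS/Blob container already exists.
    cur.execute("""
        COPY INTO staging.transactions
        FROM @azure_txn_stage/transactions/
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)
finally:
    conn.close()
```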

Automated Snowflake transformation workflows using dbt version-controlled in Git, and deployed via Azure DevOps CI/CD pipelines, enabling continuous and auditable updates to core banking models

Wrote optimized SQL queries against Snowflake to generate compliance reports, exposure risk summaries, and customer segmentation profiles for data science and credit risk teams

Built automated ETL pipelines and reporting layers that powered real-time alerts for large transaction anomalies and daily historical summaries for internal audit teams

Worked cross-functionally with fraud detection, infrastructure, and compliance teams to ensure the platform aligned with internal data governance and regulatory reporting

Environment: Azure IoT Hub, Azure DevOps, ADLS, Azure Databricks, Snowflake, Snowpipe, Spark

Verizon Boston, MA

Data Engineer Oct 2022 – May 2024

Collaborated with Business Analysts and Subject Matter Experts (SMEs) from network operations and customer service teams to gather requirements and translate them into data engineering solutions for telecom usage analytics and churn prediction

Partnered with ETL developers to ensure high-quality call detail records (CDRs) and usage logs were processed using Hive, supporting accurate KPIs for dropped calls, signal strength, and data consumption

Automated extraction and transformation of daily network traffic and customer interaction data into CSV, then securely transferred it to AWS S3 via EC2 scripts and loaded into Redshift for business intelligence reporting

Performed statistical profiling on call drop rates, latency metrics, and data throughput—calculating metrics like variance, skewness, and burst rate to identify performance bottlenecks

Utilized PySpark and Pandas to compute technical indicators such as network congestion thresholds, session duration trends, and usage anomalies, storing the output in the data warehouse to support both real-time and historical analytics
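
A rough PySpark sketch of computing such session-level indicators; the source path, column names, and anomaly rule are illustrative:

```python
# Minimal sketch: per-subscriber session duration trend and a simple usage-anomaly flag.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("usage_indicators").getOrCreate()
sessions = spark.read.parquet("s3://telecom-usage/sessions/")   # hypothetical source path

w = Window.partitionBy("subscriber_id").orderBy("session_start")

indicators = (sessions
    .withColumn("session_minutes",
                (F.col("session_end").cast("long") - F.col("session_start").cast("long")) / 60)
    .withColumn("prev_minutes", F.lag("session_minutes").over(w))
    .withColumn("duration_trend", F.col("session_minutes") - F.col("prev_minutes"))
    .withColumn("usage_anomaly",
                F.col("session_minutes") > 3 * F.avg("session_minutes").over(
                    Window.partitionBy("subscriber_id"))))

indicators.write.mode("overwrite").parquet("s3://telecom-usage/indicators/")
```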

Optimized existing Hadoop-based algorithms used for signal quality scoring using Spark Context, RDDs, and Spark SQL, improving pipeline throughput by 30% and scalability across network zones

Developed infrastructure provisioning scripts using Terraform to manage scalable and secure AWS environments for ingesting and processing telecom telemetry data

Integrated Spark with Hadoop clusters to support batch processing of CDRs and graph-based network topology analysis using GraphX, identifying weak signal nodes and optimizing routing

Created custom data validation frameworks in Python to ensure end-to-end consistency of time-series and geo-spatial datasets from tower logs, GPS pings, and user devices
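
A minimal sketch of the kind of consistency checks such a framework might run, with hypothetical field names; a real framework would cover many more rules:

```python
# Minimal sketch: time-series and geo-spatial sanity checks over tower log records.
import pandas as pd

def validate_tower_logs(df: pd.DataFrame) -> dict:
    issues = {}
    # Time-series consistency: timestamps must parse and be non-decreasing per tower.
    ts = pd.to_datetime(df["event_ts"], errors="coerce")
    issues["unparseable_timestamps"] = int(ts.isna().sum())
    issues["out_of_order_events"] = int(
        (df.assign(event_ts=ts).sort_values(["tower_id", "event_ts"])
           .groupby("tower_id")["event_ts"].diff() < pd.Timedelta(0)).sum())
    # Geo-spatial sanity: coordinates must fall in valid ranges.
    issues["bad_latitude"] = int((~df["lat"].between(-90, 90)).sum())
    issues["bad_longitude"] = int((~df["lon"].between(-180, 180)).sum())
    return issues
```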

Built and tuned machine learning models using Ridge and Lasso regression to forecast total call volume and predict customer churn based on usage patterns and complaint frequency

Automated telecom data engineering workflows using Python, reducing manual intervention by 40% and significantly improving turnaround for operational reports

Queried and joined structured telecom data across Amazon S3 and HDFS using Spark SQL, powering executive dashboards for network health and performance

Enhanced predictive model performance through polynomial feature transformations and domain-specific feature selection, improving churn model accuracy by 22%

Environment: AWS S3, EC2, Redshift, Spark, SQL, HDFS, Hadoop, Python

T-Mobile Bellevue, WA

Data Analyst Aug 2021 – Sep 2022

Built SQL queries using CTEs, window functions, and hierarchical and recursive queries to extract and aggregate audit data, creating robust datasets for KPI dashboards in Tableau

Built robust and scalable data integration (ETL) pipelines using SQL, Python, and Spark; designed and developed data models in Erwin using star and snowflake schemas to handle Slowly Changing Dimensions and normalization for efficient data storage

Utilized Azure services such as Azure Data Factory, Azure Logic Apps, Azure Blob Storage, Azure Virtual Machines, Power BI, Azure Event Hubs, and Azure Monitor to build a robust project framework

Worked with the DWA to enforce data modeling best practices within Power BI, ensuring consistency, scalability, and ease of maintenance

Conducted research on data mining tools, protocols, and standards to drive informed procurement and development decisions

Developed QC reports to identify data anomalies and trend differences for high-revenue automation frameworks, expediting QC reports, trending reports, and ad hoc reporting requirements used to validate data

Prepared and optimized data definitions for new and existing database tables to support comprehensive data analysis

Improved audit efficiency by 30% through enhanced dashboard interactivity with drill-through, cross-filtering, and bookmarking, allowing data exploration at different levels of granularity and generating insights into daily-to-yearly trends

Provided regular reporting on Power BI usage, data consumption, and query performance within the data warehouse, working with the DWA to identify areas for improvement

Troubleshot and fixed automated production job failures, investigated data issues, provided solutions, and deployed updated code into production

Managed high-quality datasets for internal and external consumption, ensuring data integrity and security by collaborating with DevOps and infrastructure teams to design Terraform-based data engineering solutions

Environment: Azure Data Factory, Azure Blob Storage, Virtual Machines, Power BI, Python, SQL

TCS India

Data Analyst Jun 2019 – Jul 2021

Designed and developed multiple modules within a Hadoop ecosystem using Apache Spark (Core, SQL, MLlib, Streaming, and Structured Streaming) to support large-scale data processing

Engineered Spark Streaming workflows for geospatial analysis, enabling real-time insights from vehicle telematics and location-based data

Created interactive dashboards and analytical reports in Tableau, delivering business-critical insights such as VIN-level uptime, dealer location mapping, vehicle trajectory tracking, and fault code analysis

Partnered with the Data Science team to visualize analytical outcomes and KPIs via Tableau for executive stakeholders

Developed RESTful microservices that interfaced with Azure Cosmos DB (IoT data) and integrated with Kafka to transfer data into a Hadoop data lake

Built and maintained Spark batch jobs for uptime tracking and data science pipelines, supporting real-time and historical analysis

Implemented a geospatial data science framework using Magellan and GeoSpark, extracting actionable insights from large-scale spatial datasets

Built real-time structured streaming pipelines in Apache Spark 2.2, ingesting data from Azure IoT Hub and Azure Event Hub, processing millions of messages per second, and persisting to HDFS
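
As a sketch of this kind of job, the snippet below reads from the Event Hubs Kafka-compatible endpoint and persists parquet files to HDFS; the namespace, topic, schema, and paths are hypothetical, and authentication options are omitted:

```python
# Minimal sketch: structured streaming from an Event Hubs Kafka endpoint into HDFS parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("telematics_stream").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "my-namespace.servicebus.windows.net:9093")
          .option("kafka.security.protocol", "SASL_SSL")   # SASL mechanism / JAAS config omitted
          .option("subscribe", "vehicle-telemetry")
          .load())

schema = StructType([
    StructField("vin", StringType()),
    StructField("speed", DoubleType()),
    StructField("lat", DoubleType()),
    StructField("lon", DoubleType()),
    StructField("event_ts", TimestampType()),
])

parsed = (events
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

(parsed.writeStream
 .format("parquet")
 .option("path", "hdfs:///data/telematics/raw")
 .option("checkpointLocation", "hdfs:///checkpoints/telematics")
 .trigger(processingTime="1 minute")
 .start())
```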

Integrated external datasets (e.g., weather, crash reports, potholes, and highway geospatial data) with real-time telematics streams to enrich context and accuracy of vehicle insights

Optimized Spark SQL queries to enhance data pipeline performance and ensure scalability for growing data volumes

Developed streaming applications to aggregate telematics data from various service providers via Kafka and persist it into HBase tables for downstream analytics

Environment: Azure Cosmos DB, Azure IoT Hub, Hadoop, HDFS, Spark, Magellan, GeoSpark, Kafka, HBase, Tableau

Certifications: Microsoft Azure Data Fundamentals, Microsoft Azure Data Engineer Associate


