
Data Engineer

Location:
Aubrey, TX
Posted:
March 11, 2025



Sushmitha Nama

Email id: **************@*****.***

Mobile: 980-***-****

Professional Summary

• 6+ years of IT experience across the Software Development Life Cycle, using Waterfall and Agile methodologies.

• Experience in the healthcare, insurance, and financial domains.

• Strong expertise in both SQL and NoSQL databases.

• Strong exposure to Apache Spark, Spark Streaming, and Spark MLlib, developing production-ready Spark applications through the Python API with a focus on optimizing real-time and batch processing for large-scale data.

• Developed Spark applications using PySpark to perform enrichments of user behavioral data.

• Designed and implemented PySpark and Spark-SQL workflows to perform complex ETL operations, converting raw data into structured, consumable formats such as Parquet, JSON, and ORC for downstream analytics (a minimal sketch follows this summary).

• Experience creating Spark applications using PySpark and Spark-SQL in Databricks to extract, transform, and aggregate data from various file formats, revealing insights into user behavior.

• Experienced in designing and developing data pipelines and analytics solutions using Snowflake Data Warehouse.

• Experience working with AWS services, including the EMR cluster platform, the S3 file system, and Athena.

• Experience migrating SQL databases to Azure Data Lake, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.

• Proficient in dimensional data modeling techniques, including star schema, snowflake schema, and fact and dimension tables.

• Involved in Agile methodologies, daily scrum meetings, and sprint planning.

• Good communication, analytical, and problem-solving skills, with an aptitude for learning new technical and functional skills.
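
To illustrate the PySpark and Spark-SQL ETL pattern referenced above, here is a minimal, hedged sketch of a raw-to-Parquet enrichment job; the bucket paths, column names, and daily aggregation are illustrative assumptions, not code from any engagement listed below.

    # Minimal sketch of a raw-to-Parquet enrichment job (illustrative paths and columns).
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("raw-events-etl").getOrCreate()

    # Read semi-structured raw events from an assumed landing zone.
    raw = spark.read.json("s3://example-bucket/landing/events/")

    # Basic cleansing: drop exact duplicates and rows missing a user id.
    clean = raw.dropDuplicates().filter(F.col("user_id").isNotNull())

    # Enrichment via Spark-SQL: daily event counts per user.
    clean.createOrReplaceTempView("events")
    daily = spark.sql("""
        SELECT user_id,
               to_date(event_ts) AS event_date,
               COUNT(*)          AS event_count
        FROM events
        GROUP BY user_id, to_date(event_ts)
    """)

    # Write the consumable layer as partitioned Parquet for downstream analytics.
    daily.write.mode("overwrite").partitionBy("event_date").parquet(
        "s3://example-bucket/curated/daily_user_events/"
    )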

Education

Master of Science, Computer Science, Texas A&M University-Commerce, Texas, Jan 2022 – July 2023, GPA: 3.7 / 4.0

Bachelor of Technology, Information Technology, Jawaharlal Nehru Technological University (JNTU), Hyderabad, India, Aug 2012 – May 2016, GPA: 3.6 / 4.0

Technical Skills

• Programming Languages: Python, SQL, PL/SQL, Java, Shell Scripting

• NoSQL Databases: Cassandra, DynamoDB

• Big Data Ecosystem: Apache Spark, Hive, HDFS, YARN, MapReduce, Sqoop

• ETL Tools: Talend, Informatica PowerCenter

• Hadoop Distributions: AWS EMR, Cloudera, Hortonworks, Azure Databricks

• Data Storage Solutions: MySQL, Oracle, AWS S3, AWS Redshift, Snowflake, Azure Data Lake, Azure Blob Storage

• Data Visualization: Tableau

• Version Control & CI/CD Tools: Git, Bitbucket, Jira, Confluence, CI/CD pipelines

• Other Tools & Services: REST APIs, Apache Airflow, AWS Athena, Azure Key Vault, AWS Secrets Manager, AWS Lambda

Work Experience

Ventois Inc, Shrewsbury, MA Aug 2023 – Current

Data Engineer

Client: UnitedHealth Group

• Collaborated with insurance departments to define requirements for integrating claims, policyholder, and provider network data to support data analytics and reporting.

• Created detailed design documents per business and functional requirements using Notion and Lucidchart.

• Consumed data from various source systems, including web-scale datasets, processed it efficiently using Apache Spark, and integrated it directly into Snowflake for storage and further analysis.

• Developed Spark applications using PySpark and Spark-SQL in Databricks to transform and aggregate data from multiple file formats, then loaded the processed data into Snowflake for business insights and customer usage pattern analysis (see the sketch at the end of this list).

• Implemented ETL processes to populate fact and dimension tables in Snowflake using star and snowflake schema, maintaining data integrity and supporting efficient querying for business intelligence.

• Developed and managed automated ETL pipelines using AWS Glue to ingest and process data in AWS S3, enhancing data availability and reducing manual intervention.

• Implemented and maintained multiple data pipeline DAGs in Airflow for orchestration.

• Performed data validations by developing custom Python functions to verify schema consistency, detect duplicate entries, and check data format adherence across diverse datasets (see the sketch after this section's tech stack).

• Performed unit and integration testing with data debugging, executing a range of scenarios for each use case.

• Integrated Python scripts with external APIs and Kafka for real-time data ingestion; the ingested data was then processed and stored in the Snowflake data warehouse for downstream analysis.

• Ensured data compliance with industry regulations (HIPAA, ACA) by implementing data encryption and access control practices to protect sensitive insurance and healthcare information.

• Developed and deployed real-time Tableau dashboards connected to Snowflake to provide stakeholders with insights into market trends, growth opportunities, and business performance.

• Responsible for architectural decisions such as estimating cluster size and availability, and for troubleshooting related issues in production.
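
As a hedged illustration of the Databricks-to-Snowflake loading described in this list, the sketch below appends a curated claims DataFrame to a Snowflake table via the Snowflake Spark connector; the account URL, credentials, table, and column names are placeholders, and in practice the credentials would come from a secrets manager rather than literals.

    # Sketch: load curated claims into Snowflake from Spark (placeholder values throughout).
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("claims-to-snowflake").getOrCreate()

    # Read curated claims and derive a reporting month (illustrative path and columns).
    claims = (
        spark.read.parquet("s3://example-bucket/curated/claims/")
        .withColumn("service_month", F.date_trunc("month", F.col("service_date")))
    )

    # Snowflake connection options; real values would be pulled from a secrets manager.
    sf_options = {
        "sfURL": "example_account.snowflakecomputing.com",
        "sfUser": "etl_user",
        "sfPassword": "***",
        "sfDatabase": "ANALYTICS",
        "sfSchema": "CLAIMS",
        "sfWarehouse": "ETL_WH",
    }

    # Append into the fact table that downstream BI queries read from.
    (claims.write
        .format("snowflake")  # short connector name available in Databricks runtimes
        .options(**sf_options)
        .option("dbtable", "CLAIMS_FACT")
        .mode("append")
        .save())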

Tech Stack: Python, SQL, Apache Spark, Spark-SQL, PySpark, Delta Lake, AWS S3, Databricks, AWS Glue, Snowflake, Apache Kafka, Tableau
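
The following is a minimal sketch of the kind of custom Python validation functions mentioned in the bullets above; the expected column set and the claim-ID format are assumptions made only for illustration.

    # Sketch of reusable validation checks for a claims DataFrame (assumed schema).
    from pyspark.sql import DataFrame, functions as F

    EXPECTED_COLUMNS = {"claim_id", "member_id", "provider_id", "claim_amount", "service_date"}

    def missing_columns(df: DataFrame) -> list:
        """Return expected columns that are absent from the frame (schema consistency)."""
        return sorted(EXPECTED_COLUMNS - set(df.columns))

    def duplicate_rows(df: DataFrame, key: str = "claim_id") -> int:
        """Count rows that share a key value with another row (duplicate entries)."""
        return df.count() - df.dropDuplicates([key]).count()

    def bad_claim_ids(df: DataFrame) -> int:
        """Count rows whose claim_id is null or fails the assumed CLM-######## pattern."""
        bad = F.col("claim_id").isNull() | ~F.col("claim_id").rlike(r"^CLM-\d{8}$")
        return df.filter(bad).count()

    def validate(df: DataFrame) -> dict:
        """Run all checks and return a summary a pipeline can assert on before loading."""
        return {
            "missing_columns": missing_columns(df),
            "duplicate_rows": duplicate_rows(df),
            "bad_claim_ids": bad_claim_ids(df),
        }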

Ventois Inc, Shrewsbury, MA Aug 2020 – Dec 2021

Data Engineer

Client: USAA

• Collaborated with banking product teams, business analysts, and data scientists to gather data requirements for regulatory reporting, risk management, and fraud detection.

• Analyzed existing Netezza legacy systems and processes, mapping out data flow requirements for migration to cloud-based systems (Azure Data Lake, Synapse Analytics).

• Profiled data from different sources, such as core banking systems, payment gateways, credit card transactions, and external credit bureaus (e.g., Experian, Equifax).

• Developed end-to-end ETL pipelines using Azure Data Factory, Databricks, and SQL to extract data from core banking systems, clean and transform it for financial reporting and risk analysis.

• Designed and implemented complex data transformation logic using PySpark for cleaning and standardizing banking data, such as credit card transactions and loan applications (see the sketch at the end of this list).

• Implemented data encryption and tokenization techniques in ETL workflows using Azure Key Vault to safeguard sensitive customer information during data transfers between Azure Blob Storage and Azure SQL Database.

• Implemented error handling and retry mechanisms in ETL pipelines using ADF and Databricks, with alerting for failed jobs in high-frequency banking transaction systems.

• Conducted integration testing on ETL pipelines to ensure seamless ingestion of transaction data from diverse banking systems into Azure Synapse Analytics.

• Optimized PySpark jobs to run on a Kubernetes cluster for faster data processing.

• Tested ETL performance and optimized pipeline jobs to handle large-scale data from millions of daily financial transactions without performance degradation.

• Worked with BI developers to create dashboards in Tableau for credit card fraud detection and tracking loan origination and default rates.
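
Below is a rough sketch of the PySpark cleaning and standardization logic described in this list for card-transaction data; the ADLS paths, column names, and cleansing rules are assumptions rather than the production logic.

    # Sketch: standardize card transactions read from an assumed ADLS Gen2 location.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("txn-standardize").getOrCreate()

    txns = spark.read.parquet(
        "abfss://raw@exampleaccount.dfs.core.windows.net/card_transactions/"
    )

    standardized = (
        txns
        # Normalize merchant names so matching and fraud features are consistent.
        .withColumn("merchant_name", F.upper(F.trim(F.col("merchant_name"))))
        # Cast amounts to fixed precision and drop negative test records.
        .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
        .filter(F.col("amount") >= 0)
        # Derive a date column used for partitioning downstream.
        .withColumn("txn_date", F.to_date("txn_ts"))
        # Keep one row per transaction id.
        .dropDuplicates(["transaction_id"])
    )

    standardized.write.mode("overwrite").partitionBy("txn_date").parquet(
        "abfss://curated@exampleaccount.dfs.core.windows.net/card_transactions/"
    )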

Tech Stack: Spark, Azure Data Lake, Azure Blob Storage, Azure Synapse, Databricks, ADF, Azure Key Vault, Tableau

Foray Software Pvt Limited, Hyderabad, India Apr 2019 – Jan 2020

Data Engineer

• Designed, developed, and implemented performant pipelines for managing HR and Payroll domain data using Apache Spark and Talend.

• Configured EMR clusters on AWS and used them for running Spark jobs.

• Implemented data pipelines to process batch data, integrating with AWS S3, Hive, and AWS Redshift.

• Developed Airflow DAGs using the Spark, EMR, Hive, and Bash operators (see the sketch at the end of this list).

• Experienced in working with Spark SQL and Talend components for SQL-based transformations.

• Created Talend jobs for ingesting audit data into DynamoDB tables and updating the record status accordingly.

• Optimized application processing times by caching and partitioning data where appropriate, and tuned Spark application performance via configuration changes.

• Experience in writing reusable, testable, and efficient code.

• Involved in unit testing and delivered unit test plans and result documents, including Talend job validations.
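
The sketch below outlines an Airflow DAG in the spirit of the pipelines described in this list, using BashOperator placeholders for the staging, Spark, and load steps; the schedule, paths, and shell commands are assumptions, and the real DAGs used the dedicated Spark, EMR, and Hive operators.

    # Sketch of a daily batch DAG with placeholder commands (Airflow 2.x style).
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="hr_payroll_batch",
        start_date=datetime(2019, 6, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Stage raw extracts into S3 (placeholder command and bucket).
        stage_to_s3 = BashOperator(
            task_id="stage_to_s3",
            bash_command="aws s3 sync /data/exports/payroll s3://example-bucket/raw/payroll/",
        )

        # Run the Spark transformation via spark-submit (placeholder script path).
        spark_transform = BashOperator(
            task_id="spark_transform",
            bash_command="spark-submit --deploy-mode cluster s3://example-bucket/jobs/payroll_etl.py",
        )

        # Load the curated output into Redshift (placeholder load script).
        load_redshift = BashOperator(
            task_id="load_redshift",
            bash_command="psql -f /opt/etl/load_payroll.sql",
        )

        stage_to_s3 >> spark_transform >> load_redshift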

Tech Stack: Talend, Apache Spark, AWS S3, Hive, Airflow, Python, AWS EMR, AWS Redshift, DynamoDB

DXC Technology, Hyderabad, India July 2016 – Mar 2019

Database Developer

• Gained IT industry experience across the various stages of a work order or user story, including analysis, development, debugging, testing, support, data fixes, and code fixes.

• Developed and optimized SQL queries, views, and stored procedures in SQL Server and Oracle databases to support insurance policy data management.

• Assisted in building data pipelines to manage claims processing by implementing ETL processes using SSIS to transfer data from transactional systems into a data warehouse for reporting.

• Created ETL procedures to transfer data from legacy sources to staging and from staging to data warehouse using Oracle Warehouse Builder.

• Created PL/SQL scripts to perform ETL conversions per business requirements and load the data into Oracle tables for data warehousing and BI purposes.

• Improved the performance of critical insurance reports by restructuring complex SQL queries, implementing proper indexing, and ensuring efficient joins across large datasets.

• Developed complex analytical logic scripts for implementing business requirements in Tableau dashboards.

• Created data warehouse objects such as fact tables, dimension tables, table partitions, sub-partitions, materialized views, stored packages, functions, and procedures.

• Worked on real-time production issues, resolving existing defects and delivering enhancements.

Tech Stack: SQL Server, Oracle, SSIS, Oracle Warehouse Builder, Tableau


