Sushmitha Nama
Email id: **************@*****.***
Mobile: 980-***-****
Professional Summary
• 6+ years of IT experience across the Software Development Life Cycle using Waterfall and Agile methodologies.
• Experience in the healthcare, insurance, and financial domains.
• Strong expertise in both SQL and NoSQL databases.
• Strong exposure to Apache Spark, Spark Streaming, and Spark MLlib, with experience developing production-ready Spark applications through the Python API (PySpark), focused on optimizing real-time and batch processing for large-scale data.
• Developed Spark applications using PySpark to perform enrichments of user behavioral data.
• Designed and implemented PySpark and Spark-SQL workflows to perform complex ETL operations, converting raw data into structured, consumable formats such as Parquet, JSON, and ORC for downstream analytics.
• Experience creating Spark applications with PySpark and Spark-SQL in Databricks to extract, transform, and aggregate data from various file formats, revealing insights into user behavior (see the ETL sketch at the end of this summary).
• Experienced in designing and developing data pipelines and analytics solutions using Snowflake Data Warehouse.
• Experience working with AWS services, including EMR clusters, S3 storage, and Athena.
• Experience migrating SQL databases to Azure Data Lake, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
• Proficient in Dimensional Data Modeling techniques, including Star Schema, Snowflake Schema, and Fact and Dimension tables.
• Involved in Agile processes, including daily scrum meetings and sprint planning.
• Good communication, analytical, and problem-solving skills, with an adaptable approach to learning new technical and functional skills.
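The PySpark and Spark-SQL work summarized above generally followed a read-transform-write pattern like the minimal sketch below. The S3 paths, column names, and cleansing rules are hypothetical placeholders, not project code.

```python
# Minimal PySpark ETL sketch: raw CSV in, curated Parquet out.
# Paths and column names are hypothetical; this is an illustration, not production code.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("raw_to_parquet_etl").getOrCreate()

# Extract: read raw event data (header row assumed, schema inferred for brevity).
raw = spark.read.option("header", True).csv("s3://example-bucket/raw/events/")

# Transform: basic cleansing and enrichment with Spark-SQL expressions.
curated = (
    raw.dropDuplicates(["event_id"])
       .withColumn("event_ts", F.to_timestamp("event_time"))
       .withColumn("event_date", F.to_date("event_ts"))
       .filter(F.col("user_id").isNotNull())
)

# Load: write partitioned Parquet for downstream analytics.
(curated.write
        .mode("overwrite")
        .partitionBy("event_date")
        .parquet("s3://example-bucket/curated/events/"))
```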
Education
Master of Science, Computer Science Jan 2022 – July 2023
Texas A&M University-Commerce, Texas GPA: 3.7 / 4.0
Bachelor of Technology, Information Technology Aug 2012 – May 2016
Jawaharlal Nehru Technological University (JNTU), Hyderabad, India GPA: 3.6 / 4.0
Technical Skills
• Programming Languages: Python, SQL, PL/SQL, Java, Shell Scripting
• NoSQL Databases: Cassandra, DynamoDB
• Big Data Ecosystem: Apache Spark, Hive, HDFS, YARN, MapReduce, Sqoop
• ETL Tools: Talend, Informatica PowerCenter
• Hadoop Distributions: AWS EMR, Cloudera, Hortonworks, Azure Databricks
• Data Storage Solutions: MySQL, Oracle, AWS S3, AWS Redshift, Snowflake, Azure Data Lake, Azure Blob Storage
• Data Visualization: Tableau
• Version Control & CI/CD Tools: Git, Bitbucket, Jira, Confluence, CI/CD pipelines
• Other Tools & Services: REST APIs, Apache Airflow, AWS Athena, Azure Key Vault, AWS Secrets Manager, AWS Lambda
Work Experience
Ventois Inc, Shrewsbury, MA Aug 2023 – Current
Data Engineer
Client: UnitedHealth Group
• Collaborated with insurance departments to define requirements for integrating claims, policyholder, and provider network data to support data analytics and reporting.
• Created detailed design documents per business and functional requirements using Notion and Lucidchart.
• Consumed data from various source systems, including web-scale datasets, processed it efficiently using Apache Spark, and integrated it directly into Snowflake for storage and further analysis.
• Developed Spark applications using PySpark and Spark-SQL in Databricks to transform and aggregate data from multiple file formats, then loaded the processed data into Snowflake for business insights and customer usage pattern analysis.
• Implemented ETL processes to populate fact and dimension tables in Snowflake using star and snowflake schema, maintaining data integrity and supporting efficient querying for business intelligence.
• Developed and managed automated ETL pipelines using AWS Glue to ingest and process data in AWS S3, enhancing data availability and reducing manual intervention.
• Implemented and maintained multiple data pipeline DAGs in Airflow for orchestration.
• Performed data validations by developing custom Python functions to verify schema consistency, duplicate entries, and data format adherence across diverse datasets (see the validation sketch at the end of this role).
• Performed unit and integration testing with data-level debugging, executing a range of scenarios for each use case.
• Integrated Python scripts with external APIs and Kafka for real-time data ingestion; the ingested data was then processed and stored in the Snowflake data warehouse for downstream analysis.
• Ensured data compliance with industry regulations (HIPAA, ACA) by implementing data encryption and access control practices to protect sensitive insurance and healthcare information.
• Developed and deployed real-time Tableau dashboards connected to Snowflake to provide stakeholders with insights into market trends, growth opportunities, and business performance.
• Responsible for architectural decisions such as estimating cluster size and availability, and for troubleshooting related issues in production.
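A minimal sketch of the kind of custom Python validation functions referenced above. The expected columns, key field, and date format are assumed for illustration and do not reflect the actual claims schema.

```python
from pyspark.sql import DataFrame, functions as F

# Hypothetical expected schema for an incoming claims feed.
EXPECTED_COLUMNS = {"claim_id", "member_id", "claim_amount", "service_date"}

def missing_columns(df: DataFrame) -> list:
    """Return expected columns that are absent from the DataFrame."""
    return sorted(EXPECTED_COLUMNS - set(df.columns))

def duplicate_key_count(df: DataFrame, key: str = "claim_id") -> int:
    """Count key values that appear more than once."""
    return df.groupBy(key).count().filter(F.col("count") > 1).count()

def bad_date_count(df: DataFrame, col: str = "service_date") -> int:
    """Count values that do not match the assumed YYYY-MM-DD format."""
    return df.filter(~F.col(col).rlike(r"^\d{4}-\d{2}-\d{2}$")).count()
```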
Tech Stack: Python, SQL, Apache Spark, Spark-SQL, PySpark, Delta Lake, AWS S3, Databricks, AWS Glue, Snowflake, Apache Kafka, Tableau
Ventois Inc, Shrewsbury, MA Aug 2020 – Dec 2021
Data Engineer
Client: USAA
• Collaborated with banking product teams, business analysts, and data scientists to gather data requirements for regulatory reporting, risk management, and fraud detection.
• Analyzed existing Netezza legacy systems and processes, mapping out data flow requirements for migration to cloud-based systems (Azure Data Lake, Azure Synapse Analytics).
• Profiled data from different sources, such as core banking systems, payment gateways, credit card transactions, and external credit bureaus (e.g., Experian, Equifax).
• Developed end-to-end ETL pipelines using Azure Data Factory, Databricks, and SQL to extract data from core banking systems, clean and transform it for financial reporting and risk analysis.
• Designed and implemented complex data transformation logic using PySpark for cleaning and standardizing banking data, such as credit card transactions and loan applications (see the sketch at the end of this role).
• Implemented data encryption and tokenization techniques in ETL workflows using Azure Key Vault to safeguard sensitive customer information during data transfers between Azure Blob Storage and Azure SQL Database.
• Implemented error handling and retry mechanisms in ETL pipelines using ADF and Databricks, with alerts for failed jobs in high-frequency banking transaction systems.
• Conducted integration testing on ETL pipelines to ensure seamless ingestion of transaction data from diverse banking systems into Azure Synapse Analytics.
• Optimized PySpark jobs to run on a Kubernetes cluster for faster data processing.
• Tested ETL performance and optimized pipeline jobs to handle large-scale data from millions of daily financial transactions without performance degradation.
• Worked with BI developers to create dashboards in Tableau for credit card fraud detection and tracking loan origination and default rates.
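A hedged illustration of the PySpark cleaning and standardization logic referenced above. The column names, card-number masking rule, and storage paths are assumptions, not the actual transformation code.

```python
# Illustrative standardization of card-transaction records; paths and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("standardize_transactions").getOrCreate()

txns = spark.read.parquet("abfss://raw@exampleaccount.dfs.core.windows.net/card_txns/")

standardized = (
    txns.withColumn("txn_ts", F.to_timestamp(F.col("txn_time")))
        .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
        .withColumn("merchant_name", F.upper(F.trim(F.col("merchant_name"))))
        # Keep only the last four digits of the card number, drop the full PAN.
        .withColumn("card_last4", F.substring(F.col("card_number"), -4, 4))
        .drop("card_number")
        .dropDuplicates(["txn_id"])
)

standardized.write.mode("overwrite").parquet(
    "abfss://curated@exampleaccount.dfs.core.windows.net/card_txns/"
)
```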
Tech Stack: Spark, Azure Data Lake, Azure Blob Storage, Azure Synapse, Databricks, ADF, Azure Key Vault, Tableau
Foray Software Pvt Limited, Hyderabad, India Apr 2019 – Jan 2020
Data Engineer
• Designed, developed, and implemented performant pipelines for managing HR and Payroll domain data using Apache Spark and Talend.
• Configured EMR clusters on AWS and used them to run Spark jobs.
• Implemented data pipelines to process batch data by integrating AWS S3, Hive, and AWS Redshift.
• Developed Airflow DAGs using the Spark, EMR, Hive, and Bash operators (see the DAG sketch at the end of this role).
• Worked with Spark SQL and Talend components for SQL-based transformations.
• Created Talend jobs for ingesting audit data into DynamoDB tables and updating the record status accordingly.
• Optimized application processing times by caching and partitioning data where appropriate, and tuned Spark application performance through configuration changes.
• Wrote reusable, testable, and efficient code.
• Involved in unit testing and delivered unit test plans and result documents, including Talend job validations.
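A sketch of an Airflow DAG in the spirit of the one described above, assuming Airflow 2.x with the Spark and Hive provider packages installed. The DAG id, schedule, paths, and connection ids are hypothetical, and the EMR step is omitted for brevity.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.apache.hive.operators.hive import HiveOperator
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="hr_payroll_batch",          # hypothetical DAG id
    start_date=datetime(2019, 6, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Stage the day's extracts into S3 (placeholder command).
    stage_files = BashOperator(
        task_id="stage_files",
        bash_command="aws s3 sync /data/exports s3://example-bucket/staging/",
    )

    # Run the Spark transformation job.
    transform = SparkSubmitOperator(
        task_id="transform_payroll",
        application="s3://example-bucket/jobs/transform_payroll.py",
        conn_id="spark_default",
    )

    # Refresh Hive partition metadata after the load.
    repair_table = HiveOperator(
        task_id="repair_hive_table",
        hql="MSCK REPAIR TABLE payroll.transactions;",
        hive_cli_conn_id="hive_cli_default",
    )

    stage_files >> transform >> repair_table
```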
Tech Stack: Talend, Apache Spark, AWS S3, Hive, Airflow, Python, AWS EMR, AWS Redshift, DynamoDB
DXC Technology, Hyderabad, India July 2016 – Mar 2019
Database Developer
• Gained IT industry experience by working across the various stages of work orders and user stories, including analysis, development, debugging, testing, support, data fixes, and code fixes.
• Developed and optimized SQL queries, views, and stored procedures in SQL Server and Oracle databases to support insurance policy data management.
• Assisted in building data pipelines to manage claims processing by implementing ETL processes using SSIS to transfer data from transactional systems into a data warehouse for reporting.
• Created ETL procedures to transfer data from legacy sources to staging and from staging to data warehouse using Oracle Warehouse Builder.
• Created PL/SQL scripts to convert data per business requirements and load it into Oracle tables for data warehousing and BI purposes.
• Improved the performance of critical insurance reports by restructuring complex SQL queries, implementing proper indexing, and ensuring efficient joins across large datasets.
• Developed complex analytical logic scripts for implementing business requirements in Tableau dashboards.
• Created data warehouse objects such as fact tables, dimension tables, table partitions, sub-partitions, materialized views, stored packages, functions, and procedures.
• Worked on real-time production issues, providing solutions for existing defects and enhancements.
Tech Stack: SQL Server, Oracle, SSIS, Oracle Warehouse Builder, Tableau