Sowmya Kollipara
AWS Data Engineer
Email: *****************@*****.***
Contact: +1-678-***-****
LinkedIn: www.linkedin.com/in/kollipara-sowmya
Summary:
●Over 9 years of IT experience, including 5 years in Data Engineering, Automation, and Data Analysis, and 4 years in Oracle SQL & PL/SQL.
●Expertise in cloud platforms, primarily AWS (S3, EC2, EMR, Lambda, Glue, Step Functions, Redshift).
●Proficient with Big Data stacks such as HDFS, Sqoop, Hive, Spark, Python, Scala, HBase, Oozie, and Airflow.
●Hands-on experience in designing, automating, and optimizing ETL workflows, ensuring data accuracy, consistency, and seamless migration from external vendor cloud environments.
●Experienced in requirement gathering, impact analysis, development, testing, documentation, and supporting SIT/UAT until implementation.
●Actively collaborated with stakeholders and non-technical business users to better understand business requirements and define the scope, working closely with technical architects and solution designers to plan optimal solutions.
●Excellent understanding of Hadoop architecture and the underlying Hadoop framework, including storage management.
●Managed data integration and transformation from various sources, handling both structured and unstructured data.
●Designed, developed, and optimized data warehousing solutions, ensuring efficient ETL processes for seamless data integration and migration.
●Handled data imports from vendor cloud environments, ensuring seamless data integration into hosted environments.
●Extensive experience with Apache Spark, utilizing DataFrames and RDDs for efficient data processing and transformation.
●In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, Driver Node, Worker Node, Stages, Executors, and Tasks.
●Designed and developed PySpark and Spark-SQL applications on Dataproc for complex data transformations.
●Proficient in designing and implementing data models (Star, Snowflake, SCD), creating fact and dimension tables to enable efficient reporting and analytics.
●Developed modules in Scala and Python for data validation and load, including writing Lambdas for triggers and automation.
●Designed and implemented ETL workflows to transform and load source files from S3 into Hive tables, using JSON, HQL, configuration, and YAML files to drive data quality checks (count, null, and duplicate checks); a minimal sketch of these checks follows this summary.
●Worked in an Agile Scrum environment, utilizing tools like Git, IntelliJ, Jira for version control and project tracking.
●Worked extensively in the processing layer using Talend Open Studio (TOS) and Spark components (RDDs, DataFrames) for data transformation and optimization.
●Implemented indexing strategies (bitmap, compound indexes) to accelerate query performance on complex datasets.
●Strong analytical and problem-solving skills, collaborating closely with stakeholders, architects, and solution designers to define optimal ETL strategies.
●Experience migrating code from Talend to Spark using Spark SQL, Spark RDD, and Scala.
●Knowledge of time-based and event-based triggers in cloud environments, as well as Airflow and Oozie scheduling and operations.
●Created functional, integrated, and UAT test cases, verifying through Jenkins pipeline according to business requirements.
●Proficient in Git/GitHub and Jenkins for continuous integration and deployment of applications.
●Strong interpersonal skills, with the ability to work well within a team and independently, and quickly adapt to new environments.
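Illustrative sketch only: a minimal PySpark example of the configuration-driven data quality checks mentioned above (count, null, and duplicate checks). Table, column, and threshold names are hypothetical placeholders, not from any specific project.

```python
# Illustrative sketch of config-driven data quality checks in PySpark.
# Table and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks").enableHiveSupport().getOrCreate()

# A config like this would normally come from a YAML/JSON file.
dq_config = {
    "table": "staging.membership",
    "key_columns": ["member_id", "effective_date"],
    "not_null_columns": ["member_id", "plan_code"],
    "expected_min_count": 1,
}

df = spark.table(dq_config["table"])

# Count check: fail if the table has fewer rows than expected.
row_count = df.count()
assert row_count >= dq_config["expected_min_count"], f"Count check failed: {row_count}"

# Null check: no nulls allowed in the configured columns.
for col in dq_config["not_null_columns"]:
    nulls = df.filter(F.col(col).isNull()).count()
    assert nulls == 0, f"Null check failed for {col}: {nulls} null rows"

# Duplicate check: the configured key columns must be unique.
dupes = (df.groupBy(dq_config["key_columns"])
           .count()
           .filter(F.col("count") > 1)
           .count())
assert dupes == 0, f"Duplicate check failed: {dupes} duplicate keys"
```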
Technical Skills Summary
Programming Languages
Scala, Python, SQL, HQL
Big Data Technologies
Spark, Hive, HDFS, HBase, Databricks
ETL & Data Processing
TOS, AWS Glue, Apache Spark, PySpark
Cloud Services
AWS (S3, EC2, EMR, Step Functions, CloudWatch, Lambda, Glue)
Databases
Hive, PostgreSQL, Aurora DB, Oracle, MS SQL Server
Issue Management
Jira, ServiceNow
Version Control
GitHub, Bitbucket
Requirements Management
Jira, Confluence, Word, Excel
Development Tools
IntelliJ, DBeaver, PuTTY, Rally
Operating Systems
Linux, Windows
Professional Experience
Client Name : Anthem Jul 2022 to Jan 2025
Project Name : CarelonRX
Role : Senior Software Engineer
Environment : Spark, Scala, Python, SQL, Databricks, AWS (EMR, EC2, S3, Step Functions, Lambda, Glue, CloudWatch, SNS)
Project Details:
CarelonRX is a division of Elevance Health, one of the most trusted health insurance providers in the United States, and tracks customer data across multiple domains; Carelon is its global capability center. The project covers business modules such as Membership, Billing, Provider, and Claims, delivered in Agile sprints. Work includes Big Data validations, data ingestion, and transformations, with application classes and jobs developed in Scala.
Responsibilities:
●Created modules for parsing, transforming, and loading membership data into S3, handling different file formats (Parquet, AVRO, JSON, XML, CSV) from various sources, applying transformation logic, and loading data into partitioned Hive tables (see the sketch after this list).
●Developed transformation logic using an existing Scala framework and created supporting artifacts like JSON, YAML, and configuration files for seamless data loading from files into Hive tables.
●Implemented ETL workflows from one zone table to another, creating data extracts as per client requirements.
●Followed Agile methodology for development, testing (unit, SIT, UAT), and deployment into production.
●Worked with Apache Spark for large-scale data processing, utilizing RDDs and DataFrames to transform, aggregate, and analyze complex datasets efficiently.
●Developed Python-based automation scripts for data processing and integration, utilizing AWS Lambda and PySpark for scalable, serverless data transformation and analysis.
●Developed, configured, and unit-tested complex data processing solutions, checking data in Hive tables with custom queries.
●Automated tasks using AWS Lambda for various operations, including S3 file transfers with email confirmations on success.
●Monitored and optimized production jobs, troubleshot Big Data applications, and enhanced AWS resources for cost and performance optimization.
●Conducted performance tuning of SQL queries and data pipelines, resolving bottlenecks to improve processing speed, query performance, and storage efficiency.
●Collaborated with cross-functional teams to align data models with business requirements and technical specifications.
●Developed and executed Scala projects using SBT in IntelliJ IDEA, ensuring proper dependency management and configuration.
●Managed version control using Git and deployed through Bitbucket and Bamboo.
●Verified test cases through input, mapping, modeling, and expected output validation for various file formats.
●Provided training (KT) to new joiners, ensuring proper understanding of job applications and processes.
●Collaborated closely with cross-functional teams (BI Analysts, Data Scientists, Software Engineers) to deliver business-driven data solutions.
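Illustrative sketch of the S3-to-partitioned-Hive load pattern described in the first bullet above. The production framework was Scala-based, so this PySpark version is a simplified analogue; bucket, schema, and column names are hypothetical.

```python
# Hedged PySpark analogue of an S3-to-partitioned-Hive load.
# Bucket, schema, and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("membership-load")
         .enableHiveSupport()
         .getOrCreate())

# Read source files from S3; the format would be chosen per feed
# (Parquet, AVRO, JSON, XML, or CSV).
raw = spark.read.parquet("s3://example-bucket/membership/incoming/")

# Apply simple transformation logic: standardize columns,
# derive a partition column, and drop duplicate records.
transformed = (raw
    .withColumn("load_date", F.current_date())
    .withColumn("member_name", F.upper(F.col("member_name")))
    .dropDuplicates(["member_id", "effective_date"]))

# Write into a partitioned Hive table.
(transformed.write
    .mode("append")
    .partitionBy("load_date")
    .format("parquet")
    .saveAsTable("curated.membership"))
```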
Client Name : UHG-OPTUM Jul 2019 - Dec 2021
Project Name : Digicom
Role : Data Engineering Analyst
Environment : TOS, Big Data, Spark with Scala, Sqoop
Project Details:
Digicom focuses on generating Claims extracts for customers for pharmacy and medical products. The main objective of the project is to classify Pharmacy and Medical claims as Internal or External based on the data mapping and join conditions provided. The code was developed in Talend Big Data using Spark with Scala, reading from the CRM Teradata source into Hive target tables and applying the specified transformation rules. The test strategy includes end-to-end validation of the Sqoop and Claims modules, with the corresponding output files validated against the mapping criteria. Test conditions and scenarios are created for all functional requirements listed in the technical documents.
Responsibilities:
●Involved in analysis and mapping exercises for new integrations and enhancements, collaborating with business users to gather requirements and translate them into actionable tasks.
●Worked on ETL transformations using Hive and Spark, mapping data according to documentation for efficient integration and processing.
●Developed and optimized Talend ETL workflows and Pig scripts for data integration, transformation, and processing, focusing on efficiency and performance improvements.
●Converted Talend jobs into Spark/Scala as part of a modernization initiative, ensuring end-to-end testing and validation.
●Designed and managed data partitioning schemes, optimizing storage and query performance across large datasets.
●Implemented performance optimizations, resolved issues in faulty code, and ensured smooth execution of ETL processes.
●Participated in User Acceptance Testing (UAT), addressing identified issues and migrating components to production after approval.
●Conducted unit testing of the code and executed through Jenkins pipeline for continuous integration and automated deployment.
●Scheduled and monitored jobs using Airflow, ensuring successful execution, and managed job failures with automatic restarts and cleanup procedures (see the DAG sketch after this list).
●Optimized ETL pipeline architecture for high-volume data loads, ensuring timely and accurate data delivery to business users.
●Developed a regression automation framework by creating a Jenkins pipeline using Groovy scripting, enabling end-to-end execution of the NextGen complex flow in the test environment. Automated the triggering of TOS jobs, eliminating manual intervention and enhancing efficiency.
●Reported and tracked defects/incidents using Rally and ServiceNow, ensuring quick resolution and clear communication to stakeholders.
●Compiled and distributed daily status reports to stakeholders, highlighting project progress, incidents, and key deliverables.
●Recognized with Quarterly Star Awards and Aquamarine Awards for exceptional performance within the UHG organization.
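Illustrative sketch of the Airflow scheduling pattern described above (nightly run, automatic retries, and cleanup even after failure). DAG id, schedule, and commands are hypothetical placeholders.

```python
# Illustrative Airflow DAG for a scheduled claims-extract pipeline.
# DAG id, schedule, and commands are hypothetical placeholders.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-engineering",
    "retries": 2,                      # automatic restarts on failure
    "retry_delay": timedelta(minutes=10),
    "email_on_failure": True,
}

with DAG(
    dag_id="claims_extract_daily",
    start_date=datetime(2021, 1, 1),
    schedule_interval="0 2 * * *",     # nightly run
    catchup=False,
    default_args=default_args,
) as dag:

    sqoop_import = BashOperator(
        task_id="sqoop_import",
        bash_command="sqoop job --exec claims_import",
    )

    spark_transform = BashOperator(
        task_id="spark_transform",
        bash_command="spark-submit --class com.example.ClaimsExtract claims-extract.jar",
    )

    cleanup = BashOperator(
        task_id="cleanup_staging",
        bash_command="hdfs dfs -rm -r -f /staging/claims/_tmp",
        trigger_rule="all_done",       # run cleanup even if upstream fails
    )

    sqoop_import >> spark_transform >> cleanup
```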
Client Name : CGI (Internal Product) Mar 2015 - Jul 2019
Project Name : Wealth Management Solutions (Msuite)
Role : Software Engineer
Tools : TOS, SQL Developer, Shell Scripting
Project Details:
Global Wealth Capital Markets (GWCM) is part of the M-Suite Portfolio Solutions product, used by most Canadian banks to provide end-to-end wealth management for their high-net-worth customers. GWCM has two versions of the product: a legacy version built on PowerHouse running on UNIX servers, and a newer web application built on Java. GWCM handles front-office, middle-office, and back-office data used to prepare the trade file, and connects to external systems such as ADP, the reporting database, custodians, brokerage, trading, and record-keeping systems to exchange data. The ETL process loads data from the Mpower system into the MVest system overnight.
Two databases are maintained: one for day-to-day transactions and application use, and one for reporting, which is loaded from the Mpower source through the ETL process. The ETL process runs on a nightly schedule for the legacy system and through triggers on the updated Java version of the product.
Responsibilities:
●Developed shell scripts for the support team to check the status of processes, streamlining monitoring and issue identification.
●Resolved performance bottlenecks through effective query tuning, improving system efficiency.
●Worked with maintenance teams to enhance and resolve defects in the production environment.
●Monitored daily production jobs, identifying and addressing issues promptly to ensure smooth operations.
●Provided quick solutions to production issues, ensuring timely resolution to minimize downtime.
●Assisted in optimizing SQL queries and reports under guidance, learning basic techniques like EXPLAIN PLAN and performing simple optimizations to improve query performance.
●Implemented ETL processes for data loading and unloading using SQL Loader concepts, ensuring accurate and efficient data migration.
●Conducted code reviews, allocated tasks, and tracked progress using the BMC Remedy tool.
●Took ownership of issues and participated in daily calls with both onshore and offshore teams to provide status updates and solutions.
●Collaborated with the testing team, providing technical clarifications on fixes delivered to QA.
●Automated repetitive tasks and job monitoring processes using shell scripting and SQL automation, significantly reducing manual effort and improving efficiency.
●Prepared technical specifications and created test cases for unit testing, ensuring thorough validation of code changes.
●Modified CRON jobs based on client requirements, ensuring schedules aligned with business needs.
●Received ‘Pat on the Back’ award from CGI for outstanding contributions to the team.
Educational Qualifications
Bachelor of Technology in Computer Science Engineering (CSE)
Acharya Nagarjuna University, India April 2013
Certifications & Trainings:
●AWS Certified Data Engineer - Associate (DEA-C01)
●Microsoft Azure Certified AZ-900
●Oracle Certified Associate – 2014