Data Engineer

Location:
Manassas, VA
Salary:
70
Posted:
January 07, 2025

Resume:

Vamsi K

*********************@*****.***

971-***-****

California, United States 94542

Skills

Data migration

SQL expertise

Metadata management

ETL development

Big data processing

Real-time analytics

Spark framework

API development

Data pipeline design

Data warehousing

Scripting languages

NoSQL databases

Data modeling

Data quality assurance

Data security

Data pipeline control

Data governance

Data curating

Hadoop ecosystem

Tableau reporting

Docker containers

DevOps practices

Apache HBase

Spark development

Python programming

Data lake management

Data integration

Real-time processing

Apache Kafka

Shell scripting

Microsoft Azure

Education and Training

Master of Science

California State University, CA

Summary

Highly competent Data Engineer with a background in designing, testing, and maintaining data management systems. Strong skills in database design and data mining, coupled with adeptness at using machine learning to improve business decision-making. Previous work optimized data retrieval processes and improved system efficiency.

Experience

L.A. Care Health Plan - Data Engineer

CA, United States Of America

05/2023 - Current

Designed and built real-time data pipelines using Apache Kafka and AWS Kinesis to capture and stream prescription data from pharmacies and electronic health records (EHR), ensuring timely insights for patient care and medication management (a minimal sketch follows this role's bullets).

Configured SQL stored procedures and triggers to automate pharmacy data processing workflows, improving data consistency and reducing manual intervention.

Developed ETL workflows using Apache NiFi and AWS Glue to extract, transform, and load data from various L.A. Care systems (pharmacy management, EHR systems, and insurance claims) into Snowflake, reducing data integration time by 30%.

Employed Python for orchestrating ETL tasks, allowing for efficient data handling and error logging.

Utilized Apache Spark and PySpark to process real-time patient data, analyzing prescription adherence, medication usage patterns, and patient outcomes to provide L.A. Care pharmacists and healthcare providers with actionable insights.

Implemented strict data validation and cleansing processes in accordance with HIPAA and other healthcare regulations, ensuring high data accuracy and security, with a 99.9% compliance rate across all pharmacy data flows.

Integrated the system with business intelligence tools such as Power BI and AWS QuickSight, enabling real-time dashboards for tracking patient prescription adherence, pharmacy stock levels, and customer service performance, leading to faster decision-making.

Developed and implemented data models, database designs, and data access and table maintenance code.

Configured and maintained cloud-based data infrastructure on platforms like AWS, Azure, and Google Cloud to enhance data storage and computation capabilities.

Created stored procedures for automating periodic tasks in SQL Server.

Identified, protected, and leveraged existing data.
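
For illustration only, a minimal Python sketch of the kind of Kafka producer used in the real-time pipeline above; the broker address, topic, and event fields are hypothetical, not taken from the actual project.

# Minimal sketch: stream prescription events to Kafka (kafka-python).
# Broker, topic, and field names are hypothetical.
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_prescription_event(event: dict) -> None:
    # Send one prescription event to the hypothetical "prescriptions" topic.
    producer.send("prescriptions", value=event)

publish_prescription_event({
    "prescription_id": "rx-123",
    "patient_id": "p-456",
    "drug_code": "0002-1433-80",
    "quantity": 30,
})
producer.flush()  # block until buffered events are delivered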

Verizon - Data Engineer

TX, United States Of America

10/2019 - 04/2023

Implemented Continuous Integration and Continuous Delivery processes using GitLab, along with Python and shell scripts, to automate routine jobs, including synchronizing installers, configuration modules, packages, and application requirements.

Gained hands-on experience with the Django framework using PyCharm and with Airflow workflow management.

Wrote AWS Lambda code in Python to convert, compare, and sort nested JSON files (see the sketches following this role's bullets).

Constructed AWS data pipelines using VPC, EC2, S3, Auto Scaling Groups (ASG), EBS, Snowflake, IAM, CloudFormation, Route 53, CloudWatch, CloudFront, and CloudTrail.

Provided guidance to the development team working on PySpark as an ETL platform.

Ensured that quality standards were defined and met.

Optimized PySpark jobs to run on the Kubernetes cluster for faster data processing.

Developed PySpark and SparkSQL code to process data in Apache Spark on Amazon EMR, performing the necessary transformations based on the source-to-target mappings (STMs) developed.

Created packages and procedures to automatically drop and create table indexes.

Developed and executed automated testing for data warehousing.

Developed RESTful and SOAP APIs using Swagger, and tested the mobile app and the customer product details app using Postman.

Worked hands-on with the Redshift database (ETL data pipelines from AWS Aurora, MySQL engine, to Redshift).

Built a data virtualization layer (Denodo base and derived views) and data visualizations in Tableau, and accessed aggregations using SQL clients, PostgreSQL, and SQL Workbench.

Assigned user-level and group-level permissions on the Redshift schema for security.

Installed and configured Hive, wrote Hive UDFs, and used Piggy Bank, a repository of UDFs for Pig Latin.

Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access (see the sketches following this role's bullets).

Worked with different data sources, such as HDFS, Hive, and Teradata, for Spark to process.

Used Spark to process data before ingesting it into HBase; created both batch and real-time Spark jobs in Scala.

Designed and implemented data marts, coordinated DBS, and handled DML generation and usage.

Provided data architecture support for enterprise data and for projects, including the development of enterprise data models, master and reference data, physical data models, data warehouses, and data marts.

Created Databricks notebooks using SQL and Python, and automated notebooks using jobs.

Built an end-to-end ETL pipeline from AWS S3 to the DynamoDB key-value store and the Snowflake data warehouse for analytical queries on cloud data.

Documented all changes implemented across systems and components using Confluence and Atlassian Jira, covering technical changes, infrastructure changes, and business process changes; post-release documentation also included known issues from production implementation and deferred defects.
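
For illustration only, a minimal Python sketch of the kind of AWS Lambda handler for nested JSON mentioned in this role; the event shape and field names are assumptions, not taken from the actual code.

# Minimal AWS Lambda sketch: flatten nested JSON records and sort them.
# The "records" event shape and "timestamp" field are hypothetical.
import json

def flatten(obj: dict, prefix: str = "") -> dict:
    # Recursively flatten nested dicts into dotted keys,
    # e.g. {"a": {"b": 1}} -> {"a.b": 1}.
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

def lambda_handler(event, context):
    records = [flatten(r) for r in event.get("records", [])]
    records.sort(key=lambda r: r.get("timestamp", ""))  # sort by an assumed field
    return {"statusCode": 200, "body": json.dumps(records)}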
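
And a minimal PySpark sketch of the Hive dynamic-partitioning pattern mentioned in this role; the table and column names are hypothetical, and bucketing would be added with a CLUSTERED BY clause in the Hive DDL.

# Minimal PySpark sketch: dynamic-partition inserts into a Hive table.
# Table and column names are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-partitioning-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Allow Hive to route rows to partitions at insert time.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("""
    CREATE TABLE IF NOT EXISTS events (
        event_id STRING,
        payload  STRING
    )
    PARTITIONED BY (event_date STRING)
    STORED AS ORC
""")

# Hive derives the event_date partition from the last SELECT column.
spark.sql("""
    INSERT INTO TABLE events PARTITION (event_date)
    SELECT event_id, payload, event_date FROM staging_events
""")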

Morgan Stanley - Data Engineer

TX, United States Of America

06/2017 - 09/2019

Designed and implemented data pipelines for processing large volumes of financial data, ensuring high performance and reliability.

Collaborated with quantitative analysts and traders to understand data requirements and deliver accurate, timely data solutions.

Developed Python and SQL used in the Matillion transformation process.

Utilized SQL and NoSQL databases to manage and store structured and unstructured financial data.

Integrated market data feeds from various sources, ensuring real-time data availability for trading applications.

Developed Python scripts and automated workflows to improve data processing efficiency and reduce manual intervention.

Ensured data quality and consistency by implementing data validation and cleansing procedures (a minimal sketch follows this role's bullets).

Created a Matillion job to extract the data from a file on the shared drive.

Designed the new flow of data into Snowflake using Matillion ETL.

Created and maintained data models and schemas to support complex financial analytics and reporting.

Worked with cloud platforms (AWS, Azure, GCP) to deploy scalable data solutions for capital markets applications.

Created an audit framework using shared jobs in Matillion.

Conducted performance tuning and optimization of data queries to enhance application performance.

Tracked market activities and provided daily pricing and transactional support to the team using Glue.

Participated in team discussions and represented the desk in internal discussions with key stakeholders as needed.

Assisted in the overall monitoring of Canadian debt capital markets (flows, trends, funding opportunities) using CloudWatch.

Created and developed data load and scheduler processes for ETL jobs using the Matillion ETL package.

Provided support for existing products that include SSIS, SQL Server, stored procedures, interim data marts, Matillion, AWS, and Snowflake.

Troubleshot and maintained ETL and ELT jobs running in Matillion.

Provided technical support and troubleshooting for data-related issues in production environments.

Documented data engineering processes and best practices to facilitate knowledge sharing and onboarding of new team members.
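
For illustration only, a minimal Python sketch of the kind of validation and cleansing step described in this role; the column names and rules are assumptions, not taken from the actual procedures.

# Minimal sketch: validate and cleanse trade records with pandas.
# Column names and rules are hypothetical.
import pandas as pd

def cleanse_trades(df: pd.DataFrame) -> pd.DataFrame:
    df = df.drop_duplicates(subset=["trade_id"])         # remove duplicate trades
    df = df.dropna(subset=["trade_id", "price"])         # require key fields
    df = df[df["price"] > 0]                             # reject invalid prices
    df["symbol"] = df["symbol"].str.strip().str.upper()  # normalize tickers
    return df

trades = pd.DataFrame({
    "trade_id": [1, 1, 2, 3],
    "symbol": [" msft", " msft", "aapl ", "ibm"],
    "price": [420.5, 420.5, 227.0, -1.0],
})
print(cleanse_trades(trades))  # two clean rows remain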

CVS Health - Data Engineer

CA, United States Of America

02/2015 - 05/2017

Designed and built real-time data pipelines using Apache Kafka and AWS Kinesis to capture and stream prescription data from CVS pharmacies and electronic health records (EHR), ensuring timely insights for patient care and medication management.

Configured SQL stored procedures and triggers to automate pharmacy data processing workflows, improving data consistency and reducing manual intervention.

Developed ETL workflows using Apache NiFi and AWS Glue to extract, transform, and load data from various CVS systems (pharmacy management, EHR systems, and insurance claims) into Snowflake, reducing data integration time by 30% (a minimal sketch follows this role's bullets).

Employed Python for orchestrating ETL tasks, allowing for efficient data handling and error logging.

Utilized Apache Spark and PySpark to process real-time patient data, analyzing prescription adherence, medication usage patterns, and patient outcomes to provide CVS pharmacists and healthcare providers with actionable insights.

Implemented strict data validation and cleansing processes in accordance with HIPAA and other healthcare regulations, ensuring high data accuracy and security, with a 99.9% compliance rate across all pharmacy data flows.

Integrated the system with business intelligence tools such as Power BI and AWS QuickSight, enabling real-time dashboards for tracking patient prescription adherence, pharmacy stock levels, and customer service performance, leading to faster decision-making.
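
For illustration only, a minimal sketch of the kind of AWS Glue PySpark job described in this role; the database, table, column, and S3 path names are hypothetical, not taken from the actual pipelines, and the script assumes the Glue job environment.

# Minimal AWS Glue sketch: read claims from the Glue Data Catalog,
# remap columns, and stage Parquet on S3 for a downstream Snowflake load.
# All names here are hypothetical.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw claims registered in the Glue Data Catalog.
claims = glue_context.create_dynamic_frame.from_catalog(
    database="pharmacy_raw", table_name="insurance_claims"
)

# Rename and retype columns to match the warehouse schema.
mapped = ApplyMapping.apply(
    frame=claims,
    mappings=[
        ("claim_id", "string", "claim_id", "string"),
        ("member", "string", "member_id", "string"),
        ("amt", "double", "claim_amount", "double"),
    ],
)

# Stage as Parquet; a Snowflake COPY INTO picks it up from this prefix.
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/claims/"},
    format="parquet",
)
job.commit()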
