
Data Engineer SQL Server

Location:
Cedar Park, TX
Posted:
December 22, 2023


Resume:

Sathvik

989-***-**** ad155j@r.postjobfree.com

Professional Summary

Data Engineer with 4+ years of experience building ETL/ELT processes, primarily loading data into Teradata. Extensive work with Teradata, BigQuery, SQL Server, Oracle, DMExpress, and Unix, and an avid developer in Bash/Korn shell, GCP, PySpark, and Python. Thorough, with creative approaches to relatively complex problems, excited about evolving cloud technologies, and backed by substantial hands-on GCP experience.

Technology Skills

Systems –

Vast experience with the Teradata database; most work was ELT, with transformations and optimizations done in Teradata. Created pipelines to load data into the EDW.

Mixed use of Microsoft SQL Server, DB2, and Oracle. Experience with Windows/Linux and VMware.

Specific healthcare experience in a clinical data warehouse, handling many data types including employee, patient, and asset-management data.

DMExpress (ETL tool) – Focused on building pipelines from MS SQL Server, DB2, Oracle, and flat files (CSV, XML, TXT, JSON, etc.), plus some API work with SOAP, AWS, and Salesforce.

Streaming – Experience pulling messages from Kafka with TPT scripts and loading them to Teradata every 15 minutes via an enterprise mainframe scheduler.

Programming/Environments –

Python – Created macros, web data scrapes (where no API was available), FTP file transfers, and local modules.
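
For illustration, a minimal Python sketch of a scripted FTP pull of the kind described above; the host, credentials, and file paths are hypothetical, and real credentials would come from a vault or environment variable.

import ftplib

HOST = "ftp.example.com"              # hypothetical vendor host
REMOTE_FILE = "outbound/extract.csv"  # hypothetical remote path
LOCAL_FILE = "extract.csv"

with ftplib.FTP(HOST) as ftp:
    ftp.login(user="svc_account", passwd="***")  # placeholder credentials
    with open(LOCAL_FILE, "wb") as fh:
        # Stream the remote file to disk in binary mode.
        ftp.retrbinary(f"RETR {REMOTE_FILE}", fh.write)

print(f"Downloaded {REMOTE_FILE} to {LOCAL_FILE}")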

Scripting (Bash/sh/Korn) – Used sed and awk to correct bad source data and prepare files for processing in the EDW. My Korn shell framework development is reused in other LOB structures. ODBC connections live in the Korn shell framework and are used to pull data database-to-database or database-to-flat-file.
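
For illustration, a minimal Python sketch of the database-to-flat-file pull described above; the production version ran ODBC calls inside a Korn shell wrapper, and the DSN, query, and output path here are hypothetical.

import csv

import pyodbc  # ODBC driver wrapper

# Connect through a pre-configured ODBC DSN (placeholder credentials).
conn = pyodbc.connect("DSN=EDW_SOURCE;UID=svc_etl;PWD=***")
cursor = conn.cursor()
cursor.execute("SELECT member_id, admit_dt, facility_cd FROM stage.admissions")

# Write a pipe-delimited extract with a header row for downstream EDW processing.
with open("admissions_extract.txt", "w", newline="") as out:
    writer = csv.writer(out, delimiter="|")
    writer.writerow([col[0] for col in cursor.description])
    for row in cursor:
        writer.writerow(row)

conn.close()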

Teradata – Tuned queries to distribute load evenly across all AMPs and remove unneeded cross/product joins; brought queries that ran in 2 hours down to 5 minutes using knowledge of AMPs and data distribution. Developed TPT scripts pulling from Kafka topics.
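
For illustration, a sketch of checking a query plan for product joins using the teradatasql driver; the host, credentials, and query are hypothetical, and the actual tuning was done interactively with EXPLAIN output and collected statistics.

import teradatasql

QUERY = (
    "SELECT a.member_id, b.claim_amt "
    "FROM edw.members a JOIN edw.claims b ON a.member_id = b.member_id"
)

with teradatasql.connect(host="tdprod.example.com", user="svc_etl", password="***") as conn:
    with conn.cursor() as cur:
        # EXPLAIN returns the optimizer plan as rows of text.
        cur.execute("EXPLAIN " + QUERY)
        plan = "\n".join(row[0] for row in cur.fetchall())

if "product join" in plan.lower():
    print("WARNING: plan contains a product join -- check join conditions and statistics")
else:
    print("No product join detected in the plan")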

Airflow – Created DAGs in GCP Cloud Composer in Python using many operators, including PythonOperator, BashOperator, TriggerDagRunOperator, GCSToBigQueryOperator, and others. Also created DAGs that trigger other DAGs in a loop using the stable REST API.
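
For illustration, a minimal sketch of the kind of Composer DAG described above; the DAG name, bucket, dataset, and downstream DAG are hypothetical.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.operators.trigger_dagrun import TriggerDagRunOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator


def log_batch_date(**context):
    # Log the logical date so the run can be traced in the Composer UI.
    print(f"Loading batch for {context['ds']}")


with DAG(
    dag_id="gcs_to_bq_daily_load",        # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    announce = PythonOperator(task_id="announce", python_callable=log_batch_date)

    archive = BashOperator(
        task_id="archive_prior_file",
        bash_command="echo 'archive step placeholder'",
    )

    load = GCSToBigQueryOperator(
        task_id="load_to_bq",
        bucket="example-landing-bucket",                    # hypothetical bucket
        source_objects=["daily/{{ ds }}/extract.csv"],
        destination_project_dataset_table="example_ds.daily_extract",
        source_format="CSV",
        write_disposition="WRITE_TRUNCATE",
    )

    trigger_downstream = TriggerDagRunOperator(
        task_id="trigger_reporting_dag",
        trigger_dag_id="reporting_refresh",                 # hypothetical downstream DAG
    )

    announce >> archive >> load >> trigger_downstream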

GCP – Knowledge of the full GCP data suite and the use cases for each storage option. Hands-on experience developing data pipelines with BigQuery, Bigtable, Cloud SQL, Spanner, GKE, GCS, Pub/Sub, Dataflow, Data Fusion, and Cloud Composer. In-depth knowledge of how BigQuery, Bigtable, and Spanner store and process data. Able to design data pipelines from business use cases and determine how best to structure a data warehouse in BigQuery.
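
For illustration, a minimal sketch of a GCS-to-BigQuery load using the BigQuery client library; the project, bucket, and table names are hypothetical.

from google.cloud import bigquery

client = bigquery.Client(project="example-project")  # hypothetical project

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

# Load a daily extract from GCS into a warehouse table, replacing the prior load.
load_job = client.load_table_from_uri(
    "gs://example-landing-bucket/daily/extract.csv",
    "example_ds.daily_extract",
    job_config=job_config,
)
load_job.result()  # block until the load job finishes

table = client.get_table("example_ds.daily_extract")
print(f"Loaded {table.num_rows} rows")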

PySpark – Created complex PySpark code for a large-scale Hadoop system hosted on Dataproc. Also created scheduled Spark jobs for testing in Airflow.
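
For illustration, a minimal sketch of the kind of batch transform run on Dataproc; the input/output paths and column names are hypothetical.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily_sales_rollup").getOrCreate()

# Read raw parquet from GCS (Dataproc clusters read gs:// paths natively).
sales = spark.read.parquet("gs://example-raw-zone/sales/")

# Aggregate net sales and distinct transactions per store per day.
rollup = (
    sales.filter(F.col("sale_dt") >= "2023-01-01")
    .groupBy("store_id", "sale_dt")
    .agg(
        F.sum("net_amount").alias("net_sales"),
        F.countDistinct("txn_id").alias("txn_count"),
    )
)

rollup.write.mode("overwrite").partitionBy("sale_dt").parquet(
    "gs://example-curated-zone/daily_sales/"
)

spark.stop()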

Snowflake – Understand the architecture and have used features such as Time Travel, secure views, and data sharing. Compared Snowflake features to BigQuery, including data sharing, data masking, and tasks/streams.
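
For illustration, a minimal sketch of a Time Travel comparison through the Snowflake Python connector; the account, credentials, warehouse, and table are hypothetical.

import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",   # hypothetical account identifier
    user="svc_analytics",
    password="***",
    warehouse="ANALYTICS_WH",
    database="EDW",
    schema="CORE",
)

cur = conn.cursor()
# Compare the current row count with the table's state one hour ago via Time Travel.
cur.execute("SELECT COUNT(*) FROM daily_sales")
print("now:", cur.fetchone()[0])
cur.execute("SELECT COUNT(*) FROM daily_sales AT(OFFSET => -3600)")
print("one hour ago:", cur.fetchone()[0])

cur.close()
conn.close()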

Education

Master's in Computer Science, Central Michigan University

Experience

Spotify (Remote) 02-2023 – Current

Data Engineer

Joined as a Teradata and GCP resource to migrate on-prem Teradata to Teradata Vantage (cloud) and to move some processes to GCP with BigQuery and Airflow. Support was another big part of the role: I maintained and ran the monthly royalty process.

Learned and managed the monthly royalties process, a complex SQL ELT pipeline; documented all stored procedures, shell scripts, and table structures.

Gave daily updates on my work and held bi-weekly meetings with clients.

Worked with vendors to resolve problems with file/report delivery. Also worked with Teradata support to resolve database problems such as default time zone differences and Unicode issues.

Created temporary pipelines to load data from GCP to Teradata during the move to BigQuery, and built GCP-to-BigQuery pipelines based on the existing pipelines that ran in Teradata.

Performed extensive performance tuning and troubleshooting on SQL and stored procedures so they would run in Teradata Vantage, including collecting statistics, rewriting stored procedures, and increasing spool space for service accounts.

Kavaliro (Remote) 07-2021 – 01-2023

Data Engineer

Worked on two projects at Kavaliro alongside the Capgemini and Google teams. The first was a healthcare project where I advised based on my healthcare background. The second, much longer project involved working with the team to improve Walmart's Spark-based data processing system.

Worked with the Walmart and Google teams to rewrite PySpark code as common code and improve performance.

Attended meetings with the client (Walmart) on a regular basis to provide updates.

Met with the Google team for regular code review.

Worked with the team to set a path for a data mesh strategy in GCP with Dataplex, Analytics Hub, and other GCP tools.

I created Snowflake dashboards in Snowsight to monitor usage and suggested alert integration in GCP.

Encompass Health, Atlanta, Georgia 04-2019 – 06-2021

ETL Developer

Developed data flows from sources, either an RDBMS or files, and brought them into Teradata with a mostly ELT approach. Tested the scripts in QA before deploying them to production. The scripts ran through DMExpress with Korn shell wrappers for parameters and logging.

Created complex queries in Teradata including extensive performance tuning.

Was part of the initial data analytics partnership with Google and HCA on GCP. Helped guide the reference data pipeline, which aimed at a better data warehouse structure that could be linked to FHIR resources. Played a big part in determining how data should be stored in the BigQuery data warehouse and worked with our team to provide specific, technical guidelines to the Google team.

Developed Korn/Bash scripts on the Unix box to create ELT pipelines.

Worked as the lead ETL developer for all COVID-19 projects, deploying more than 50 jobs in 2020. Also increased job frequency from hourly down to every 15 minutes through intense performance tuning and strict scheduling rules.

Worked on the HL7 streaming project, building data flows for many specific message types; pulled data from Teradata staging and created dimension tables.

HCA has several different EMRs, and I worked to pull in two additional ones. Integrating multiple EMRs into the EDW was one of the most challenging projects the team worked on, as each EMR has its own data structure.


