
Mounika V

Data Engineer

ad4pqh@r.postjobfree.com

469-***-****

linkedin.com/in/mounika-v-2633a0249

Professional Summary:

* ***** ** ******** ** experience as a Big Data Engineer, Python Engineer, and Data Analyst; expert in designing, developing, and implementing data models for big data applications.

Experience with technologies, tools, and databases including Big Data, AWS, Azure, Tableau, Hadoop, Hive, Pig, Sqoop, HBase, Spark, and Cassandra.

Proficient in relational databases like Oracle, MySQL, and SQL Server

Extensive experience developing Bash, T-SQL, and PL/SQL scripts.

Extensively worked on AWS services including EC2, S3, EMR, RDS (Aurora), Athena, Lambda, Step Functions, Glue Data Catalog, SNS, Redshift, DynamoDB, QuickSight, and other services of the AWS family.

Extensive knowledge of the Azure cloud platform (HDInsight, VMs, Blob Storage, Data Lake, Databricks, ADLS Gen2, Azure Data Factory, Synapse, and Storage Explorer).

Expertise in Spark architecture with Databricks and Structured Streaming: setting up Databricks on AWS and Microsoft Azure, configuring Databricks workspaces for business analytics, managing clusters, and managing the machine learning lifecycle.

Integrated electronic health records, claims data, and other healthcare datasets using GCP services like Dataflow.

Strong experience with ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and with data warehouse tools for reporting and data analysis.

Extensively used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data and ran DataFrame operations for required data validations.

Extensive experience configuring Spark Streaming to receive real-time data from Apache Kafka and persist the stream to HDFS, and expertise in using Spark SQL with data sources such as JSON, Parquet, and Hive.
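
A minimal PySpark sketch of that pattern, assuming placeholder broker, topic, and HDFS paths (and the spark-sql-kafka package on the classpath):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Receive real-time data from a Kafka topic as a streaming DataFrame.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
          .option("subscribe", "events")                      # placeholder topic
          .load()
          .selectExpr("CAST(value AS STRING) AS value"))

# Persist the stream to HDFS as Parquet, with a checkpoint for fault tolerance.
(events.writeStream
       .format("parquet")
       .option("path", "hdfs:///data/events")
       .option("checkpointLocation", "hdfs:///checkpoints/events")
       .start())

# Spark SQL over a static source such as Parquet.
spark.read.parquet("hdfs:///data/events").createOrReplaceTempView("events_v")
spark.sql("SELECT COUNT(*) AS cnt FROM events_v").show()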

Proficient in Python scripting; used NumPy and Pandas statistical functions to organize data and Matplotlib for visualization.
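
A small illustrative sketch of that NumPy/Pandas/Matplotlib workflow, using hypothetical sample data:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample data for illustration.
rng = np.random.default_rng(42)
df = pd.DataFrame({"amount": rng.normal(loc=100, scale=15, size=500)})

# Summary statistics with Pandas.
print(df["amount"].describe())

# Quick distribution check with Matplotlib.
df["amount"].plot(kind="hist", bins=30, title="Order amounts")
plt.savefig("amount_hist.png")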

Extensive use of JIRA for tracking open-source projects; experience with Agile (Scrum) methodology.

Proficient in converting Hive/SQL queries into Spark transformations using DataFrames and Datasets.

Hands-on experience with visualization tools such as Tableau and Power BI.

TECHNICAL SKILLS:

Hadoop/Big Data Ecosystem: Apache Spark, HDFS, MapReduce, Hive, Kafka, Sqoop

Programming & Scripting: Python, PySpark, SQL, Scala

NoSQL Databases: MongoDB, DynamoDB

SQL Databases: MS SQL Server, MySQL, Oracle, Postgres

Cloud Computing: AWS, Azure

Operating Systems: Ubuntu (Linux), Mac OS X, Windows 10/8

Reporting: Power BI, Tableau

Version Control: Git, GitHub, SVN

Methodologies: Agile/Scrum, Rational Unified Process, Waterfall

Professional Experience:

Client: Solera Health May 2023 – Present

Role: GCP Data Engineer

Responsibilities:

Migrated an Oracle database to BigQuery and used Power BI for reporting as a POC.

Built data pipelines in Airflow on GCP for ETL jobs using various Airflow operators.
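
A minimal Airflow DAG sketch for such a pipeline; the DAG id, schedule, query, and the Google provider operator shown are illustrative assumptions:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

def extract(**_):
    # Placeholder extraction step (e.g., pull files from the source system).
    print("extracting source data")

with DAG(
    dag_id="daily_etl",                 # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)

    load_task = BigQueryInsertJobOperator(
        task_id="load_to_bq",
        configuration={
            "query": {
                "query": "SELECT CURRENT_DATE() AS load_date",  # illustrative query
                "useLegacySql": False,
            }
        },
    )

    extract_task >> load_task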

Developed scripts that automated the DDL and DML statements used to create databases, tables, and constraints and to apply updates.

Built a Python program with Apache Beam and executed it on Cloud Dataflow to run data validation between raw source files and BigQuery tables.
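
A sketch of that validation with the Beam Python SDK; the project, bucket, and table names are placeholders, and the check simply compares row counts between the raw file and the BigQuery table:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder project, region, bucket, and table names.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
)

with beam.Pipeline(options=options) as p:
    # Row count of the raw source file on Cloud Storage.
    file_count = (
        p
        | "ReadRaw" >> beam.io.ReadFromText("gs://my-bucket/raw/source.csv",
                                            skip_header_lines=1)
        | "CountFile" >> beam.combiners.Count.Globally()
    )

    # Row count of the target BigQuery table.
    table_count = (
        p
        | "ReadBQ" >> beam.io.ReadFromBigQuery(table="my-project:dataset.target_table")
        | "CountTable" >> beam.combiners.Count.Globally()
    )

    # Emit both counts so they can be compared downstream or in the job logs.
    _ = ((file_count, table_count)
         | "MergeCounts" >> beam.Flatten()
         | "LogCounts" >> beam.Map(print))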

Selected and implemented GCP services such as Cloud Storage, BigQuery, and Dataflow for efficient and secure data migration.

Gained extensive experience with Agile methodology in software projects; participated in Scrum meetings, followed biweekly sprints, and tracked progress in JIRA.

Downloaded BigQuery data into pandas or Spark DataFrames for advanced ETL capabilities.

Developed and executed data quality assurance processes to validate the accuracy and completeness of health data after migration.

Created comprehensive documentation for the entire data migration process, including procedures, configurations, and troubleshooting steps.

Expert knowledge of Hive SQL, Presto SQL, and Spark SQL for ETL jobs, and of choosing the right technology for each job.

Experienced with various operators in Cloud Composer/Airflow and with the Google Cloud client libraries in Python for BigQuery and Storage hooks.

Demonstrated experience working with healthcare data standards such as HL7, FHIR.

Applied GCP Solutions for processing and analyzing medical and imaging data, contributing to improved diagnostic accuracy.

Used Cloud Dataflow with the Python SDK to deploy streaming jobs in GCP, as well as batch jobs, for custom cleaning of text and JSON files and writing them to BigQuery.

Environment: MySQL, Oracle, GCP Cloud Composer, BigQuery, Cloud Dataflow, Cloud Functions, Power BI.

Client: Medco Health Solutions Sep 2021 – Apr 2023

Role: Data Engineer

Responsibilities:

Designed and set up an enterprise data lake to support various use cases, including analytics, processing, storage, and reporting of voluminous, rapidly changing data.

Designed and developed a security framework to provide fine-grained access to objects in AWS S3 using AWS Lambda and DynamoDB.

Performed end-to-end architecture and implementation assessments of various AWS services such as Amazon EMR, Redshift, and S3.

Implemented machine learning algorithms in Python to predict the quantity a user might want to order for a specific item, enabling automatic suggestions, using Kinesis Firehose and an S3 data lake.

Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.

Used the Spark SQL Scala and Python interfaces, which automatically convert RDDs of case classes to schema RDDs.

Converted various files from CSV to JSON or Parquet using Python code and uploaded the new files to S3.
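
A minimal sketch of that conversion and upload, with placeholder file paths and bucket name:

import boto3
import pandas as pd

# Placeholder local paths and bucket name.
df = pd.read_csv("input/orders.csv")

# Convert to Parquet and line-delimited JSON.
df.to_parquet("output/orders.parquet", index=False)
df.to_json("output/orders.json", orient="records", lines=True)

# Upload the converted files to S3.
s3 = boto3.client("s3")
s3.upload_file("output/orders.parquet", "my-bucket", "converted/orders.parquet")
s3.upload_file("output/orders.json", "my-bucket", "converted/orders.json")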

Wrote complex SQL queries with multiple joins to automate and manipulate these extracts in Alteryx.

Imported data from sources such as HDFS and HBase into Spark RDDs and performed computations using PySpark to generate the output response.
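
A short PySpark sketch of that pattern; the HDFS path and the column layout of the records are assumptions:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-rdd-compute").getOrCreate()

# Load raw records from HDFS into an RDD (path is a placeholder).
lines = spark.sparkContext.textFile("hdfs:///data/transactions/*.csv")

# Parse each line and aggregate amounts per account
# (assumes an account id in column 0 and an amount in column 2).
totals = (lines
          .map(lambda line: line.split(","))
          .map(lambda cols: (cols[0], float(cols[2])))
          .reduceByKey(lambda a, b: a + b))

for account, total in totals.take(10):
    print(account, total)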

Imported and exported databases using SQL Server Integration Services (SSIS) and Data Transformation Services (DTS) packages.

Worked on RDS, AWS Lambda, EC2, and S3 to create and support data pipelines in Python that load and unload data from S3 and Oracle to Snowflake.

Developed reusable framework to be leveraged for future migrations that automates ETL from RDBMS systems to the Data Lake utilizing Spark Data Sources and Hive data objects.

Conducted data blending and data preparation using Alteryx and SQL for Tableau consumption, and published data sources to Tableau Server.

Implemented AWS Step Functions to automate and orchestrate Amazon SageMaker tasks such as publishing data to S3, training an ML model, and deploying it for prediction.

Understanding of structured datasets, data pipelines, ETL tools, and data reduction, transformation, and aggregation techniques; knowledge of tools such as dbt and DataStage.

Provided extensive production support for data warehouse internal and external data flows to the Oracle DBMS from ETL servers and remote servers.

Worked as a development DBA on a single company product and as a production DBA supporting a SQL Server farm, leading a support team of 7 people.

Performed data analysis and data validation by writing complex SQL queries in TOAD against the Oracle database.

Extensive experience building scalable, elastic, secure, and highly available multi-tier application infrastructure following best practices on the Amazon Web Services (AWS) public cloud.

Worked with the Snowflake cloud data warehouse and AWS S3 buckets to integrate data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables.
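
A sketch of loading staged nested JSON into a Snowflake VARIANT column with the Python connector; the connection parameters, external stage, and table names are placeholders and assume the stage already points at the S3 bucket:

import snowflake.connector

# Placeholder credentials and object names.
conn = snowflake.connector.connect(
    account="my_account",
    user="etl_user",
    password="***",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="RAW",
)

cur = conn.cursor()
try:
    # Load nested JSON files from the external S3 stage into a VARIANT column.
    cur.execute("""
        COPY INTO RAW.EVENTS_JSON (payload)
        FROM @S3_STAGE/events/
        FILE_FORMAT = (TYPE = 'JSON')
    """)
finally:
    cur.close()
    conn.close()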

Responsibilities included, but were not limited to, SQL development, ETL development using Microsoft SQL Server Integration Services (SSIS), business intelligence delivery using Microsoft SQL Server Analysis Services (SSAS), and reporting solutions using Microsoft SQL Server Reporting Services (SSRS), Power BI, and Tableau Desktop.

Environment: AWS EMR, S3, RDS, Redshift, Lambda, Boto3, DynamoDB, Amazon SageMaker, Apache Spark, HBase, Apache Kafka, Hive, Sqoop, MapReduce, Snowflake, Apache Pig, Python, SSRS, Tableau, Terraform, T-SQL, ECS.

Client: Veritex Bank, Dallas, Texas Feb 2019 – Jul 2021

Role: Data Engineer

Responsibilities:

Extensively worked on the Azure cloud platform (HDInsight, Data Lake, Databricks, Blob Storage, Data Factory, Synapse, SQL DWH, and Storage Explorer).

Designed and deployed data pipelines using Data Lake, Databricks, and Apache Airflow.

Worked on Azure Data Factory to integrate data from both on-premises (MySQL, Cassandra) and cloud (Blob Storage, Azure SQL DB) sources and applied transformations to load it back into Azure Synapse.

Involved in developing Spark Scala functions for mining data to provide real-time insights and reports.

Used Data Lake to store data and perform all types of processing and analytics.

Used SQL procedures in Informatica mappings to truncate data in target tables at run time.

Ingested data into Azure Blob Storage and processed it using Databricks; wrote Spark Scala scripts and UDFs to perform transformations on large datasets.

Involved in developing data ingestion pipelines on an Azure HDInsight Spark cluster using Azure Data Factory and Spark SQL; also worked with Cosmos DB (SQL API and Mongo API).

Designed custom-built input adapters using Spark, Hive, and Sqoop to ingest and analyze data from Snowflake, MS SQL, and MongoDB into HDFS.

Built reliable data pipelines and solved data problems using Databricks, partner products, and other OSS tools.

Used Azure DevOps and VSTS (Visual Studio Team Services) for CI/CD, Active Directory for authentication and Apache Ranger for authorization.

Performed Data Analysis and Data validation by writing complex SQL queries using TOAD.

Configured services such as API Management, Azure Cosmos DB, App Service, Key Vault, and Container Registry.

Designed and developed an automation framework using Python and shell scripting.

Experience in SSIS Package scripting, scheduling, monitoring, deployment, and production.

Used Power BI Power Pivot to develop data analysis prototypes, and used Power View and Power Map to visualize reports.

Created the ETL exception reports and validation reports after the data is loaded into the warehouse database.

Environment: Azure HDInsight, Databricks, Data Lake, Cosmos DB, MySQL, Snowflake, MongoDB, Teradata, Ambari, Flume, VSTS, Tableau, Power BI, Azure DevOps, Ranger, Azure AD, Git, Blob Storage, Data Factory, Storage Explorer, Scala, Hadoop 2.x (HDFS, MapReduce, YARN), Spark v2.0.2, Airflow, Hive, Sqoop, HBase

Client: HSBC, New York City, NY (Crayon Data) Dec 2018 – Jan 2019

Role: Big Data Engineer

Responsibilities:

Experienced working with Big Data, Data Visualization, Python Development, SQL, and UNIX.

Expertise in quantitative analysis, data mining, and the presentation of data to see beyond the numbers and understand trends and insights.

Handled a high volume of day-to-day Informatica workflow migrations.

Reviewed Informatica ETL design documents and worked closely with development to ensure correct standards were followed.

Designed and implemented complex ETL data processes using Informatica PowerCenter and advanced SQL queries (analytical functions).

Created Informatica mappings using various transformations such as Joiner, Aggregator, Expression, Filter, and Update Strategy.

Created stored procedures, parameter tables, and custom tables for migration.

At least one year of experience building data lake or data warehouse solutions with Snowflake or Power BI; scripting experience with JavaScript, Python, or Bash.

Designed and Developed ETL data pipelines using Scala and Spark.

Identified and documented data quality limitations that jeopardized the work of internal and external data analysts; wrote standard SQL queries to perform data validation, created Excel summary reports (pivot tables and charts), and gathered analytical data to develop functional requirements using data modeling and ETL tools.

Scripting experience with JavaScript, PySpark, Python, T-SQL, and other similar languages.

Used Spark optimization techniques such as caching/refreshing tables, broadcast variables, coalesce/repartitioning, increasing memory overhead limits, tuning parallelism, and modifying Spark default configuration variables for performance tuning.
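
A brief PySpark sketch of those tuning levers; the table names, paths, join key, and specific values are illustrative and would be tuned to the actual workload:

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (SparkSession.builder
         .appName("spark-tuning")
         .config("spark.sql.shuffle.partitions", "200")   # override a default config
         .config("spark.executor.memoryOverhead", "2g")   # raise the memory overhead limit
         .getOrCreate())

facts = spark.read.parquet("hdfs:///data/facts")   # placeholder paths
dims = spark.read.parquet("hdfs:///data/dims")

# Cache a frequently reused DataFrame and refresh its table metadata.
facts.cache()
facts.createOrReplaceTempView("facts_v")
spark.catalog.refreshTable("facts_v")

# Broadcast the small dimension table to avoid a shuffle join.
joined = facts.join(broadcast(dims), "dim_id")

# Reduce output file count with coalesce (or rebalance with repartition).
joined.coalesce(16).write.mode("overwrite").parquet("hdfs:///data/joined")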

Worked extensively on building Power BI dashboards with quick insights into KPIs and advanced analysis using actions, calculations, parameters, maps, and scatter plots.

Developed and handled business logic through backend Python code.

Environment: Python, UNIX, SQL, ETL, Informatica, Spark, HTML.

Client: Apollo Pharmacy, Hyderabad, India May 2017 – Nov 2018

Role: Data Analyst

Responsibilities:

Understanding the functionality of core business processes.

Implemented coding, procedural, and change control guidelines for application development in proprietary environments.

Involved in Data Quality, and profiling from source systems.

Responsible for ETL design: identifying source systems, designing source-to-target relationships, data cleansing, data quality, creating source specifications and ETL design documents, and ETL development following Velocity best practices.

Involved in data extraction, transformation, and loading (ETL) from source systems.

Designed a reporting table approach to get better performance of complex queries and Views.

Used External Tables to Transform and load data from Legacy systems into Target tables.

Performed incremental loading of the fact table from the source system to the staging table on a daily basis.

Designed and implemented the proof of concept in the development environment.

Expertise in SQL tuning, DB/DBM parameter tuning, and memory allocation at different levels; experienced with tools such as export, import, load, and autoloader, and with all kinds of database snapshots.

Attended to day-to-day user queries and sorted out database problems.

Coordinated with user departments and attended to their requirements for smooth functioning.

Environment: ETL, Data Quality, SQL, Python, AWS, Docker.

EDUCATION DETAILS:

Master's in Computer Science, Governors State University, August 2021 – December 2022.


