
Database Administrator / Data Engineer

Location: Corpus Christi, TX
Salary: 70
Posted: April 10, 2023


PROFESSIONAL SUMMARY: -

Highly dedicated and experienced Senior Data Engineer with 8+ years of IT industry experience working with Azure, AWS, and a wide range of tools, technologies, and databases.

Experienced in designing and building ETL pipelines and visualizations using Azure, AWS, and open-source frameworks.

Excellent knowledge of Data Validation, Data Analysis, Data Cleansing, and Data Verification.

Experienced in AWS services such as S3, RDS, EC2, IAM, Glue, Redshift, Lambda, Athena, AWS Kinesis, and CloudWatch.

Experienced in working on EMR clusters to modify Python and shell scripts.

Experienced in using Azure services such as Azure Data Lake Storage, Azure SQL Database, Azure Log Analytics, Azure Stream Analytics, Azure Triggers, and HDInsight.

Experienced in creating Spark applications within Databricks to extract, transform, and aggregate data from various file formats, enabling analysis of customer usage patterns.

Experienced in working with Hadoop-ecosystem technologies such as HDFS, NoSQL stores, and Spark.

Proficient in programming with Python, Scala, and Bash.

Experienced in developing Spark applications using Spark Streaming, RDD transformations, Spark SQL, and Spark MLlib.

Implemented Continuous Integration & Deployment (CI/CD) with Jenkins and Azure DevOps.

Experienced in real-time data streaming using Apache Kafka (a minimal streaming sketch appears at the end of this summary).

Experienced with databases such as MySQL, Oracle, DynamoDB, and MongoDB, as well as spreadsheet data.

Hands-on experience with ETL tools like SSIS for data extraction, transformation, and loading.

Hands-on experience working on Teradata databases and Snowflake.

Experienced in using monitoring tools such as Splunk and CloudWatch.

Experienced in visualization tools such as Tableau, Power BI, and Amazon QuickSight.

Hands-on experience with Agile and Waterfall methodologies.
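
Illustrative sketch only: a minimal PySpark Structured Streaming job of the kind referenced above (Kafka topic, broker address, schema, and output paths are hypothetical placeholders, not taken from any specific project).

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-events-demo").getOrCreate()

# Hypothetical schema for JSON messages arriving on the Kafka topic
schema = (StructType()
          .add("event_id", StringType())
          .add("customer_id", StringType())
          .add("amount", DoubleType()))

# Read the topic as a streaming DataFrame and parse the JSON payload
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker-1:9092")   # placeholder broker
          .option("subscribe", "customer-events")                # placeholder topic
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Continuously land parsed events as Parquet for downstream analysis
query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://example-lake/curated/events/")               # placeholder path
         .option("checkpointLocation", "s3a://example-lake/checkpoints/events/")
         .start())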

TECHNICAL SKILLS: -

AWS

S3, IAM, EC2, Glue, Redshift, Lambda, Athena, RDS, AWS Kinesis,

CloudWatch.

AZURE

Azure Data Lake Storage, HD Insights, Azure SQL Database,

Azure Log Analytics, Azure Stream Analytics, Databricks, Azure Triggers.

DATABASES

My SQL, Oracle, Dynamo DB, Snowflake, Apache Pyspark, SQL, Mongo DB.

OTHER TECHNOLOGIES

Pandas, Python, Scala, Shell Scripting, GCP Cloud Storage, Big Query.

ETL/BI

Tableau, Power BI, Quick sight

SOFTWARE METHODOLOGY

Agile, Waterfall

CERTIFICATIONS:-

AWS Certified Data Analytics - Specialty

WORK EXPERIENCE:-

Client: Santander, Dallas September 2021-Present

Title: Sr. Data Engineer

Santander is a retail and commercial bank offering a range of financial products. The project dealt with unorganized data from third-party clients used to determine auto-financing customer eligibility; the data migration effort aimed to standardize and normalize customer data so that it is easily accessible and immediately usable across the organization.

Responsibilities: -

Designed an ETL architecture for data transfer from the source server to the Data Warehouse.

Developed an ETL process in AWS Glue to migrate customer data from external data stores such as S3 into AWS Redshift (illustrated in the sketch at the end of this section).

Built pipelines to copy data from multiple sources into destination tables in AWS Redshift.

Developed Python code using modules that manipulate data in formats such as Excel, CSV, JSON, Avro, and Parquet.

Developed PySpark scripts to perform advanced data processing and transformation tasks.

Developed Spark application scripts using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats.

Developed PySpark code for ETL in AWS Glue jobs and EMR.

Performed Real-time event processing of data from multiple servers in the organization using Kafka.

Developed Airflow workflows to schedule batch and real-time data movement from source to target.

Worked through all phases of the software development life cycle using Agile methodology.

Developed and implemented a CI/CD pipeline with Bitbucket and Jenkins for complete automation from commit to deployment.

Worked on ETL migration by developing and deploying Lambda functions to generate serverless data pipelines.

Used Glue to crawl JSON data stored in Amazon S3 buckets.

Implemented Spark in EMR for data processing in AWS Data Lake.

Involved in designing and developing Spark workflows using Scala to pull data from AWS S3 buckets.

Built a cloud data warehouse on Snowflake for batch processing and streaming (with Snowpipe).

Used Athena to transform and clean the data before it was loaded into data warehouses.

Experienced in creating logical and physical data models, including database queries, tables, schema, indexes, and constraints, as per business needs.

Experienced in improving the performance of dashboards and visualizing customer data using Amazon QuickSight.

Used CloudWatch to monitor and alert production and corporate servers/storage.

Environment: - Python, PySpark, Kafka, Scala, Glue, S3, Redshift, EMR, Airflow, Snowflake, Amazon QuickSight, Agile, Bitbucket, Jenkins, Lambda, Athena, CloudWatch.
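
Illustrative sketch only: a minimal AWS Glue PySpark job of the kind described in this section, reading crawled JSON from the Glue Data Catalog and loading it into Redshift. The database, table, connection, and bucket names are hypothetical placeholders, not the project's actual objects.

import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read JSON that a Glue crawler has already catalogued from S3 (placeholder names)
raw = glue_context.create_dynamic_frame.from_catalog(
    database="customer_db", table_name="raw_customers")

# Keep and type only the columns the warehouse needs (placeholder columns)
mapped = ApplyMapping.apply(frame=raw, mappings=[
    ("customer_id", "string", "customer_id", "string"),
    ("eligibility_flag", "string", "eligibility_flag", "string"),
    ("updated_at", "string", "updated_at", "timestamp"),
])

# Load into Redshift through a catalogued JDBC connection (placeholder connection)
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=mapped,
    catalog_connection="redshift-dw",
    connection_options={"dbtable": "analytics.customers", "database": "dw"},
    redshift_tmp_dir="s3://example-temp-bucket/redshift/")

job.commit()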

Client: Groupon, Chicago March 2018-August 2021

Title: Data Engineer

Groupon is an American global e-commerce marketplace connecting subscribers with merchants offering activities, travel, goods, and services in different countries. The project was a big-data migration of customer data from Hadoop to Azure to improve customer experience.

Responsibilities: -

Migrated customer data through ETL processes from Azure Data Lake Storage to Azure SQL Database using Azure HDInsight.

Used Azure Data Factory to transfer data from Hadoop to Azure Data Lake Storage or other Azure data stores.

Used Apache PySpark to load the data into Azure SQL Database (see the sketch at the end of this section).

Involved in executing Spark jobs and SQL queries on Databricks clusters using notebooks.

Used HDInsight for the extract, transform, and load process from Hadoop to Azure.

Designed and implemented streaming solutions using Azure Stream Analytics.

Involved in developing and maintaining multiple Power BI visualizations and Dashboards as per the requirement.

Developed Azure DevOps pipelines for CI/CD.

Used Snowflake as a source for analyzing and visualizing data in Power BI.

Used Azure Log Analytics to monitor and troubleshoot the process of migration.

Involved in using Azure Triggers to monitor the automation of the migration process.

Extensive knowledge of working with various data formats: CSV, JSON, XML, and tabular (relational and non-relational) data sets.

Environment: - PySpark, Hadoop, Azure Data Lake Storage, Azure SQL Database, HDInsight, Azure Data Factory, Databricks, Azure Stream Analytics, Azure DevOps, Azure Log Analytics, Azure Triggers, Snowflake, Power BI.
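
Illustrative sketch only: a minimal Databricks/PySpark load into Azure SQL Database of the kind described above. The storage path, server, table, and secret-scope names are hypothetical placeholders.

# Runs inside a Databricks notebook, where `spark` and `dbutils` are predefined.
from pyspark.sql.functions import col

# Read raw customer files landed in Azure Data Lake Storage (placeholder path)
raw = spark.read.parquet("abfss://raw@examplelake.dfs.core.windows.net/customers/")

# Basic standardization before loading into the relational store
cleaned = (raw.dropDuplicates(["customer_id"])
              .filter(col("customer_id").isNotNull()))

# Append into Azure SQL Database over JDBC (placeholder server and credentials)
(cleaned.write
    .format("jdbc")
    .option("url", "jdbc:sqlserver://example-server.database.windows.net:1433;database=customerdb")
    .option("dbtable", "dbo.customers")
    .option("user", dbutils.secrets.get("kv-scope", "sql-user"))
    .option("password", dbutils.secrets.get("kv-scope", "sql-password"))
    .mode("append")
    .save())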

Client: JK Tech, Bangalore June 2016-February 2018

Title: Data Engineer

JK Tech is a global IT solutions company delivering high-impact technology solutions. The project was to build data pipelines that maintain customer data received from different marketing platforms and migrate it to AWS.

Responsibilities: -

Primarily responsible for converting a manual reporting system into a fully automated CI/CD data pipeline that ingests data from different marketing platforms into an AWS S3 data lake.

Deployed the project through Jenkins using the Git version control system and worked with Jenkins as the continuous integration tool for the deployment process.

Performed data cleaning using pandas and other Python packages.

Involved in configuring EC2 instances and IAM roles, and created an S3 data pipe using the Boto API to load data from internal data sources (see the sketch at the end of this section).

Developed Spark SQL scripts and designed solutions implemented with PySpark.

Worked in a Linux environment for application development.

Involved in developing Pig scripts to transform raw data into consumable data as specified by business users.

Worked on NoSQL databases such as MongoDB.

Involved in setting up databases in AWS using RDS, using S3 buckets for storage, and configuring instance backups to S3 buckets.

Used AWS Kinesis streaming application for real-time processing.

Designed serverless application CI/CD using the AWS Serverless Application Model (SAM) and AWS Lambda.

Used visualization tools such as Tableau to get quick business insights from data.

Used Splunk to create dashboards, search queries, and reports for multiple applications.

Environment: - Jenkins, Python, SQL, PySpark, EC2, IAM roles, crontab, Linux, Pig scripts, MongoDB, RDS, AWS CodePipeline, AWS Kinesis, AWS Lambda, Tableau, Splunk.
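
Illustrative sketch only: a minimal Boto3/pandas S3 data pipe of the kind described in this section. The bucket, key, and column names are hypothetical placeholders.

import io
import boto3
import pandas as pd

s3 = boto3.client("s3")

# Pull a raw marketing export from the landing bucket (placeholder bucket/key)
obj = s3.get_object(Bucket="marketing-raw", Key="exports/campaigns_2018-01-01.csv")
df = pd.read_csv(io.BytesIO(obj["Body"].read()))

# Basic pandas cleaning before the file enters the data lake
df = df.drop_duplicates().dropna(subset=["campaign_id"])
df["spend"] = pd.to_numeric(df["spend"], errors="coerce").fillna(0)

# Write the cleaned file to the curated zone of the S3 data lake
buf = io.StringIO()
df.to_csv(buf, index=False)
s3.put_object(Bucket="marketing-lake",
              Key="curated/campaigns/2018-01-01.csv",
              Body=buf.getvalue())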

Client: HealthPlix, Hyderabad June 2015 - May 2016

Title: SQL Server Database Administrator

HealthPlix offers a digital healthcare service empowering healthcare professionals. Worked as a SQL Server Database Administrator, creating databases using MySQL and managing permissions so that clients could use the updated database on a daily basis.

Responsibilities: -

Developed views, stored procedures, and functions for databases (a minimal sketch follows this section).

Involved in developing SQL queries for building and testing ETL processes.

Used various data modeling techniques to develop the database and was involved in the complete Software Development Life Cycle of the system.

Created constraints and triggers to maintain data integrity.

Managed and configured all SQL health checks and monitoring using various methods and tools.

Used left, right, and inner joins to connect dynamic and static datasets.

Regularly monitored database performance, connections, and logs.

Environment: - MySQL Workbench, SQL queries, stored procedures, views, functions, constraints, triggers, joins, connections, logs.
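
Illustrative sketch only: a minimal example of the view and join work described in this section, driven from Python with the mysql-connector-python driver. The connection details and the table, view, and column names are hypothetical placeholders.

import mysql.connector

# Placeholder connection details
conn = mysql.connector.connect(host="localhost", user="dba_user",
                               password="change-me", database="healthplix_demo")
cur = conn.cursor()

# View joining patient master data (static) with daily visit data (dynamic)
cur.execute("""
    CREATE OR REPLACE VIEW daily_visits AS
    SELECT p.patient_id, p.full_name, v.visit_date, v.doctor_id
    FROM patients p
    INNER JOIN visits v ON v.patient_id = p.patient_id
""")
conn.commit()

# Quick check that the view returns rows for today's visits
cur.execute("SELECT COUNT(*) FROM daily_visits WHERE visit_date = CURDATE()")
print(cur.fetchone()[0])

cur.close()
conn.close()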

EDUCATIONAL QUALIFICATIONS: -

Master’s in Computer Information & Science from Western Illinois University.

Bachelor’s in Electrical & Electronic Engineering from Mahaveer Institute of Science & Technology.


