Data Engineer Big

Location:

Arlington, TX

Salary:

110000

Posted:

August 14, 2023


Resume:

Highly dedicated, inspiring, and expert Data Engineer with over 4 years of IT industry experience exploring various technologies, tools, and databases.

Comprehensive experience in Big Data processing using Hadoop and its ecosystem.

Worked on Spark for computational analytics: installed it on top of Hadoop and built advanced analytical applications using Spark with SQL, Oracle, and Snowflake. Familiar with AWS Cloud EC2 infrastructure for computational tasks, Simple Storage Service (S3) as a storage mechanism, and various other services.

Debugged Pig and Hive scripts and optimized MapReduce jobs.

Hands-on experience migrating on-premises ETLs to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage, and Composer.

Familiar with job workflow scheduling and monitoring tools such as Lambda and Airflow.

Developed various shell and Python scripts to address production issues.

Experienced with all stages of the SDLC and the Agile development model, from requirements gathering to deployment and production support.

Involved in daily Scrum meetings to discuss development progress.

Languages: Python, Spark, R, C++, Java, SQL, JavaScript, HTML, CSS

Database: MySQL, SQL Server, MongoDB, Hubble, Coginity

Programming tools: Tableau, SAS, RapidMiner, TensorFlow, MapReduce/Hadoop, Power BI, JIRA, GitHub, Scikit-learn

Data Analytics & Machine Learning: Linear Regression, Logistic Regression, K-Means, KNN, Decision Trees, Cluster Analysis, Neural Networks, Naïve Bayes, BigQuery

Statistical Tools: Time Series, Regression Models, Principal Component Analysis, Hypothesis Testing, A/B Testing, Confidence Intervals, T-test, ANOVA, NLP

Python packages: NumPy, SciPy, pandas, PyTables, PyTorch, TensorFlow

Operating Systems: Windows, Linux, Mac

Cloud Platform: AWS S3, Redshift, Glue, Lambda, Athena, SNS, SQS, RDS, IAM, CloudWatch, VPC, Secrets Manager, AWS Lake Formation, API Gateway, EMR, Datanet (ETL), GCP

● Master of Science in Data Science: University of North Texas, Denton, TX Aug 2019-May 2021

● Bachelor of Engineering in Computer Engineering: Pune University, IN Aug 2015-May 2018

● Diploma in Computer Engineering: MSBTE, IN July 2013-May 2015

Merck Pharma, TX Oct 2022 – Present

Role: Data Engineer

Responsibilities:

Developed Spark applications using Python and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.

Created complex SQL queries and scripts to extract and aggregate data to validate the accuracy of the data.

Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.

Developed a Kafka consumer in Python to consume data from Kafka topics and processed the XML files using Spark Streaming to capture user interface updates.

Used REST APIs with Python to ingest data from external sites into BigQuery.

Built a program with Apache Beam in Python and executed it in Cloud Dataflow to run data validation between raw source files and BigQuery tables.

Used Google Cloud Functions with Python to load newly arrived CSV files from a GCS bucket into BigQuery.

Keen to learn the newer technology stack that Google Cloud Platform (GCP) offers.

Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators.

Aishwarya Deshmukh

Data Engineer

LinkedIn: https://linkedin.com/in/asd-data-engineer

Phone: 817-***-****

Location: Arlington, TX

Email: ****************@*****.***


Migrated an existing on-premises application to AWS.

Worked with Google Data Catalog and other Google Cloud APIs for monitoring, query, and billing analysis of BigQuery usage.

Developed Sqoop and Kafka jobs to load data from RDBMS and external systems into Hive.

Created Hive tables on top of the data and, based on the business rules, performed data cleansing, standardization, and a few transformations on raw data using Spark.

Optimized existing Scala code and improved cluster performance.

Stored incoming data in Snowflake's staging area.

Worked on Amazon Redshift to consolidate all data warehouses into a single data warehouse.

Involved in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation.

Well versed in building CI/CD pipelines with Jenkins, using a tech stack including GitLab, Jenkins, Helm, and Kubernetes.

Amazon, TX Oct 2021 – Sept 2022

Role: Data Engineer

Responsibilities:

Designed and set up an Enterprise Data Lake for Amazon Prime Air to support various use cases, including analytics, processing, storage, and reporting of voluminous, rapidly changing data.

Designed the end-to-end architecture for retail data and implemented various AWS services such as Amazon Redshift, S3, Glue, and Athena.

Imported data from different sources such as Apriso, Infor, Jama, and Swift UI into the data lake and performed computations using PySpark to generate the output response.

Worked with business users and business analysts in analysing business requirements and translating them into functional and technical design specifications.

Monitored the production environment and optimized stored procedures for better performance by resolving locking conflicts and improving resource utilization.

Monitored job workflow scheduling using Lambda and Airflow.

Good understanding of data ingestion, data orchestration, and the related Python libraries.

Worked on REST APIs and Python APIs and landed data in S3 from external sources.

Created data pipelines for the ingestion, aggregation, and loading of consumer response data from AWS S3 buckets into external tables in Athena and Redshift to serve as a feed for Tableau dashboards.

Analysed data and created Excel visualization reports as a story narrative for business users and product owners.

Used an ETL solution to develop pipelines that extract and transform data from multiple sources and load it into Redshift for automation.

Developed Data Central using Python and SQL for transformation process.

Worked on Java code built through Jenkins and used a Jenkins plugin to deploy it to AWS.

Worked in an Agile development process using Jira.

Dell Technologies, TX Aug 2020 – Sept 2021

Role: Data Engineer

Responsibilities:

Managed and reviewed Hadoop log files.

Monitored workload and job performance and performed capacity planning using Cloudera.

Provided ad-hoc queries and data metrics to the Business Users using Hive.

Worked on importing and exporting data from Snowflake and DB2 into HDFS.

Worked on Snowflake modelling and data cleansing.

Used Informatica Workflow Manager to create sessions and batches, schedule workflows, and monitor sessions.

Configured Apache Airflow for S3 buckets and the Snowflake data warehouse and created DAGs to run in Airflow.

Experienced with AWS and Azure cloud services for smoothly managing applications in the cloud.

Created data pipelines for the ingestion, aggregation, and loading of responses in AWS S3 to serve as a feed for Tableau dashboards.

Performed end-to-end delivery of a PySpark ETL pipeline on Azure Databricks to transform data, orchestrated via Azure Data Factory.

Worked on Apache Spark, writing Python and Java applications to convert TXT, JSON, CSV, Excel, and Parquet files.

Trigent, India Jan 2017 – May 2019

Role: Data Engineer

Responsibilities:

Responsible for analysing the business requirements and preparing the mapping design documents for Confidential Point of Sale and Direct Sales.

Analysed large and critical datasets using various big data technologies.

Developed Spark applications using Scala and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.

Used Spark Streaming APIs to perform transformations.

Consumed XML messages using Kafka and processed the XML files using Spark Streaming to capture UI updates.

Developed a pre-processing job using Spark DataFrames to convert JSON documents to flat files.

Developed dashboard reports in Tableau.

Worked with and learned a great deal about AWS Cloud services such as EC2, S3, and EMR.
