
Professional Data Engineer

Location: Dallas, TX
Posted: March 23, 2023


Resume:

Shashank

Email: **********@*****.*** Phone: +1-760-***-****

LinkedIn: https://www.linkedin.com/in/shashank-n-507545166

PROFESSIONAL SUMMARY:

●6 years of experience working on data analytics projects in big data environments; Google Certified Cloud Engineer; solid understanding of big data and its evolving technologies since 2016.

●Practical experience using open-source technologies such as Hadoop, NiFi, and dbt, and building ELT architectures on the Google Cloud Platform.

●Practical knowledge of migrating on-premises ETL to the Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage, Cloud Dataflow, Cloud Composer, Cloud Functions, and Cloud Pub/Sub.

●Developed a design template for Cloud IAM solutions that allows for auditing and data catalog integration.

●Practical experience working with Forseti, an open-source security scanner for Google Cloud.

●Knowledge and hands-on experience using Terraform templates for the Google Cloud Platform, such as enabling APIs and creating Google Cloud Storage buckets, projects, etc.

●Refactored BigQuery SQL code so that existing deployed code runs in less time and consumes fewer billed bytes.

●Working knowledge of dimensional and relational data modeling concepts such as fact and dimension tables, star-schema modeling, and snowflake-schema modeling.

●Highly skilled at creating distributed data warehousing designs and data marts.

●Use SQL fundamentals, Presto SQL, Hive SQL, Python (Pandas, NumPy, SciPy, Matplotlib), Scala, and PySpark to handle growing data volumes.

●Knowledge of developing Docker-based applications for Airflow data engineering pipelines.

●Practical experience writing Bash scripts and setting up data pipelines on Unix/Linux systems.

●Understanding of the Azure big data stack for building GCP-compatible data pipelines.

●Practical knowledge of multiple programming languages, including Scala and Python.

●Used Kubernetes and Docker as the CI/CD system's runtime environment for building, testing, and deploying.

●Experience handling the Python and Spark contexts when writing PySpark programs for ETL.

●Working knowledge of SQOOP for transferring data from RDBMS to HDFS and Hive.

●Wide-ranging experience in all stages of the software development life cycle (SDLC), particularly in the development, testing, and deployment of applications throughout the Analysis, Design, and Development phases.

●Strong expertise in data preparation, data modeling, and data visualization using Power BI, as well as experience creating a variety of reports and dashboards utilizing Tableau's Visualizations.

●Analyzed and assessed the reporting needs of the various business units.

●Knowledge of Makefiles and Buildkite for developing CI/CD architectures.

●Expertise with different HDFS file formats, such as Avro, ORC, and Parquet.

●Excellent interpersonal and communication abilities, and quick to pick up new technology.

TECHNICAL SKILLS:

RDBMS Databases:

MySQL, Oracle SQL, SQL Server

Google Cloud Platform:

Cloud Storage, BigQuery, Cloud Composer, Cloud Dataproc, Cloud SQL, Cloud Functions, Cloud Pub/Sub, Cloud Data Fusion, Cloud Dataprep, Cloud IAM

Big Data:

Hive, Presto, Ambari, Oozie, Spark, Impala, Sqoop

Reporting:

Power BI, Data Studio, Tableau

Python:

Pandas, NumPy, SciPy, Matplotlib

Other:

Terraform, Git, CI/CD

PROFESSIONAL EXPERIENCE:

Paramount Aug 2022 – Present

Sr. GCP Data Engineer

Responsibilities:

●Build data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators.

●Migrated existing Airflow 1 pipelines running on a Kubernetes cluster to Cloud Composer 2 with Airflow 2, setting up the environment using Terraform modules and CI/CD deployment strategies.

●Pushed application-specific source code images to Container Registry using Docker and used them in application-specific DAGs via the KubernetesPodOperator to achieve the required functionality (see the sketch after this list).

●Used Terraform to create and alter table structures and to create secret entries in Google Secret Manager (GSM).

●Created table-specific DDLs to define new schemas for existing tables.

●Wrote BigQuery SQL to perform data manipulation in different tasks of the DAGs.

●Helped configure a housekeeping DAG for Airflow 2.5 so that Airflow performance does not degrade over time and space utilization stays under threshold.
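
As an illustration of the KubernetesPodOperator pattern referenced above, below is a minimal, hypothetical sketch of an Airflow 2 DAG that runs an application-specific container image from Container Registry. The DAG id, namespace, and image path are assumptions for illustration only, not values from an actual project.

# Minimal sketch: run an app-specific container image from a DAG.
# Import path may vary slightly by cncf-kubernetes provider version.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

with DAG(
    dag_id="app_specific_pipeline",               # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_app = KubernetesPodOperator(
        task_id="run_app_container",
        name="run-app-container",
        namespace="composer-user-workloads",       # assumed namespace
        image="gcr.io/my-project/app-image:latest", # hypothetical image pushed via Docker
        cmds=["python", "main.py"],
        get_logs=True,
    )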

Wayfair Mar 2021 – Aug 2022

Sr. GCP Data Engineer

Responsibilities:

●Create data pipelines utilizing various Airflow operators for ETL-related tasks in GCP.

●Converted BigQuery jobs to dbt in order to create more lineage-based models for each data mart.

●Migrated Oracle and SQL Server tables from on-premises to cloud-based storage for a CDC-based requirement.

●Hands-on knowledge of BigQuery, GCS, Cloud Functions, and GCP Dataproc.

●Expertise in developing dashboards for the associate experience while running ELT workloads with Dataflow, Pub/Sub, and BigQuery.

●Deployed a Cloud IAM solution for auditing purposes, such as finding out what level of access each GCP user in the organization has across all projects.

●Deployed Java- and Python-based Cloud Dataflow jobs to convert GCS files into BigQuery and subsequently MySQL for the web server-based React apps.

●Created a program with Python and Apache Beam, run in Cloud Dataflow, to perform data validation between raw source files and BigQuery tables.

●Used Cloud Dataflow with Python to process and load bounded and unbounded data from Google Pub/Sub topics into BigQuery (see the sketch after this list).

●Monitored Stackdriver jobs for BigQuery, Dataproc, and Cloud Dataflow across all environments.

●Transferred BigQuery data into pandas or Spark DataFrames for more sophisticated ETL capabilities.

●Assisted with monitoring, query, and billing-related analyses for BigQuery consumption using the Google Data Catalog and other Google Cloud APIs.

●Developed BigQuery authorized views for row- and column-level security and for exposing data to other teams, allowing them to query the BigQuery data from Tableau Server with their own access levels.

●Proficient in developing and deploying Hadoop clusters and several big data analytical tools, such as Apache Spark, Pig, Hive, Sqoop, and the Cloudera distribution.
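
As an illustration of the Dataflow work described above, here is a minimal, hypothetical sketch of a streaming Apache Beam pipeline in Python that reads from a Pub/Sub subscription and writes to BigQuery. The project, subscription, table, and schema names are assumptions for illustration.

# Minimal sketch: stream Pub/Sub messages into BigQuery with Apache Beam.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Runner, project, region, and temp_location would be supplied at deploy time for Dataflow.
options = PipelineOptions(streaming=True)  # unbounded (streaming) pipeline

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/events-sub")  # hypothetical subscription
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:analytics.events",  # hypothetical destination table
            schema="event_id:STRING,event_ts:TIMESTAMP,payload:STRING",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )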

Schlumberger Mar 2019 - Dec 2020

GCP Data Engineer

Responsibilities:

●Practical knowledge of creating data pipelines in Python, PySpark, Hive SQL, and Presto.

●Monitored data engines for relational and non-relational databases, such as Cassandra and HDFS, to establish data requirements and data acquisition.

●Built data pipelines in Composer on the GCP platform for ETL-related tasks using various Airflow operators, both traditional and newer operators.

●Used the Cloud Shell SDK to configure the Dataproc, Cloud Storage, and BigQuery services on GCP.

●Used Python and a Cloud Function to load newly arrived CSV files from the GCS bucket into BigQuery (see the sketch after this list).

●Designed and built a Spark job in Scala to implement an end-to-end batch processing pipeline.

●Processed data in Parquet format in Hive partitioned tables using Scala, Spark, and Spark SQL.

●Built an ETL pipeline with Spark and Hive to ingest data from various sources.

●Applied Hive SQL, Presto SQL, and Spark SQL as appropriate for ETL workloads.

●Automating operational processes and implementing and managing ETL systems.

●Created stored procedures in MS SQL to process incoming files and update the tables with data fetched from various servers via FTP.

●Created a report in Tableau that keeps track of the dashboards uploaded to Tableau Server, allowing us to identify prospective new customers within the company.

●Created sophisticated tables with performance optimizations such as partitioning, clustering, and skew handling by writing scripts in Hive SQL.

●Participated in the development of Oozie workflow and coordinated jobs to start on time and with the availability of data.
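
As an illustration of the GCS-to-BigQuery load described above, here is a minimal, hypothetical sketch of a Python Cloud Function triggered by an object-finalize event that loads a newly arrived CSV file into BigQuery. The dataset and table names are assumptions for illustration.

# Minimal sketch of a GCS-triggered Cloud Function that loads a new CSV into BigQuery.
from google.cloud import bigquery

def load_csv_to_bq(event, context):
    """Background Cloud Function triggered by a google.storage.object.finalize event."""
    bucket = event["bucket"]
    name = event["name"]
    if not name.endswith(".csv"):
        return  # ignore non-CSV objects

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,          # assume a header row
        autodetect=True,              # infer the schema from the file
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    uri = f"gs://{bucket}/{name}"
    load_job = client.load_table_from_uri(
        uri, "my_dataset.landing_table", job_config=job_config)  # hypothetical dataset.table
    load_job.result()  # wait for the load to complete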

SunSoft Technologies Inc July 2017 – Feb 2019

Role: Hadoop Data Engineer

Responsibilities:

●Designed and created managed and external tables in Hive in accordance with the specifications.

●Created Hive tables, populated them with data, and developed Hive queries that are executed within a MapReduce framework.

●Implemented dynamic partitioning, bucketing, and partitioning in Hive.

●Developed code using Sqoop for data import and export into HDFS and Hive.

●Extensively used Sqoop to import and export data between HDFS and DB2 database systems, as well as to load data into HDFS.

●Created Hive and Pig scripts in accordance with specifications.

●Contributed to writing UDFs in Hive.

●In charge of generating CRs and CRQs for the release and turning the code over to QA.

●Implemented a proof of concept to convert MapReduce tasks into Spark RDD transformations (see the sketch after this list).

●Excellent exposure to the Cloudera/Hortonworks Hadoop distributions for application development.

●Built sophisticated SQL queries and leveraged JDBC connectivity to access the database.
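
As an illustration of the MapReduce-to-RDD proof of concept mentioned above, here is a minimal, hypothetical sketch of the classic word count expressed as Spark RDD transformations in Python; the HDFS paths are assumptions for illustration.

# Minimal sketch: MapReduce-style word count as Spark RDD transformations.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mr_to_rdd_poc").getOrCreate()
sc = spark.sparkContext

counts = (
    sc.textFile("hdfs:///data/input/*.txt")               # read input splits (hypothetical path)
      .flatMap(lambda line: line.split())                 # map: emit one record per word
      .map(lambda word: (word, 1))                        # map: (key, 1) pairs
      .reduceByKey(lambda a, b: a + b)                    # reduce: sum counts per key
)
counts.saveAsTextFile("hdfs:///data/output/word_counts")  # write results back to HDFS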

EDUCATION:

●Master's in Computer Science, State University of New York.

●Bachelor's in Electrical and Electronics Engineering, Osmania University.

Certifications:

Google Cloud Certified Professional Data Engineer.


