
Data Engineer

Location:
Arlington, TX, 76011
Posted:
January 01, 2025


Resume:

BHARATHWAJ REDDY YANALA

Data Engineer

Address: Houston, Texas 77056, USA

Phone: +1-979-***-**** Email: *********************@*****.***

Professional Summary

• Around 6 years of progressively responsible experience in Data Engineering, Data Analysis, ETL, Data Warehousing, and Business Intelligence, spanning the design, development, analysis, implementation, and post-implementation support of DW/BI applications.

• Worked on architecting, designing, and implementing large-scale data and analytics solutions on the Snowflake Cloud Data Warehouse.

• Implemented multiple machine learning techniques in Python, such as Decision Trees, Naive Bayes, Logistic Regression, and Linear Regression, and evaluated the accuracy of each model.

• Analyzed and transformed data using SQL on MySQL Server to generate data extracts and reports.

• Conducted data analysis experiments, A/B testing, and statistical modeling to optimize business processes, improve customer experience, and increase operational efficiency.

• Stayed up to date with the latest cloud technologies, data analytics tools, and industry trends, and recommended innovative solutions to enhance data analytics capabilities.

• Performed data wrangling by gathering and querying data with SQL, and implemented data cleaning processes in R and Python.

• Worked with a diverse set of clients all over the world, identifying and analyzing Dynamic Values, and managing development with agile methodology.

• Streamlined data processing by automating ETL workflows in Informatica, DataStage, and Ab Initio, resulting in a 30% reduction in manual handling and improved data processing speed. This initiative enhanced data quality and integrity within Snowflake, with additional integration of Informatica with Airflow for enhanced workflow management.

• Developed backend applications to integrate real-time data and automate data loading processes using Oracle SQL, PL/SQL, T-SQL, and Unix Shell scripting.

• Engaged in SQL performance tuning, troubleshooting, and database administration to ensure optimal database performance.

• Designed, optimized, and secured CI/CD pipelines using Apache Airflow, Docker, Kubernetes, and SQL, collaborating cross-functionally for efficient data processing and analysis.

• Achieved operational cost reduction and improved data processing speed by 20% through pipeline optimizations, while mentoring junior data engineers to ensure team growth and development.

• Utilized Snowflake's architecture for external data sharing to support data monetization strategies, and collaborated on end-to-end data pipelines, optimizing ETL job performance and modernizing infrastructure.

Experience

Data Engineer - Paramount - Atlanta, GA January 2022 to Present

• Collaborated with business analysts and technology teams to create comprehensive requirements documentation, including ETL/ELT mappings, interface specifications, and business rules, facilitating efficient data management and integration.

• Developed and demonstrated a POC to migrate Snowflake workloads to Google Cloud Platform using GCS, BigQuery, Cloud SQL, and Cloud Dataproc. Built data pipelines in Airflow on GCP for ETL-related jobs using a range of Airflow operators, both legacy and newer, as sketched below.
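
For illustration, a minimal sketch of a GCP Airflow DAG of this kind, assuming hypothetical bucket, dataset, and table names; GCSToBigQueryOperator is shown as one common provider operator, not necessarily the exact operator used here:

```python
# Minimal sketch of a GCP Airflow DAG loading files from GCS into BigQuery.
# Bucket, dataset, and table names below are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    dag_id="gcs_to_bigquery_daily",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load_events = GCSToBigQueryOperator(
        task_id="load_events",
        bucket="example-landing-bucket",                        # hypothetical GCS bucket
        source_objects=["events/{{ ds }}/*.parquet"],           # partitioned by run date
        destination_project_dataset_table="analytics.events",   # hypothetical dataset.table
        source_format="PARQUET",
        write_disposition="WRITE_APPEND",
    )
```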

• Worked with product teams to create various store-level metrics and supporting data pipelines written on GCP's big data stack. Created BigQuery authorized views for row-level security and for exposing data to other teams.

• Migrated data pipelines to GCP's Dataflow/Dataproc and streamlined pipeline development using templates.

• Analyzed the system for new enhancements/functionalities and performed Impact analysis of the application for implementing ETL changes.

• Created daily BigQuery jobs to load data into BigQuery tables from files stored in Google Cloud Storage, as sketched below.
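
A minimal sketch of such a daily load using the BigQuery Python client; the GCS URI and table ID are hypothetical placeholders:

```python
# Sketch of a daily BigQuery load job from Cloud Storage using the Python client.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

load_job = client.load_table_from_uri(
    "gs://example-landing-bucket/daily/2022-01-01/*.csv",  # hypothetical path
    "my-project.analytics.daily_events",                    # hypothetical table
    job_config=job_config,
)
load_job.result()  # wait for the load to finish before reporting row counts
print(client.get_table("my-project.analytics.daily_events").num_rows)
```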

• Used Cloud Functions to support the data migration from BigQuery to the downstream applications.

• Documented the inventory of modules, infrastructure, storage, components of existing Snowflake data warehouse for analysis and identifying the suitable technologies/strategies required for Google Cloud Migration.

• Developed Client Quality Status and Flag reporting tables at DAY/WEEK/MONTH granularity to identify new, winback, returning, and quality vs. non-quality users based on activity and viewing minutes, enabling targeted insights by flagging users who meet the '30-minutes in 30-days' quality threshold and focusing analysis on the top ~50% of users contributing ~99% of Monthly TVMs.
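
A simplified sketch of how the '30-minutes in 30-days' quality flag described above could be computed, executed here through the Snowflake Python connector; table, column, and connection names are hypothetical:

```python
# Simplified sketch: flag users meeting a "30 minutes in 30 days" quality threshold.
# Table and column names are hypothetical; the threshold follows the rule described above.
import snowflake.connector

QUALITY_FLAG_SQL = """
CREATE OR REPLACE TABLE analytics.user_quality_flags AS
SELECT
    user_id,
    SUM(viewing_minutes) AS minutes_30d,
    IFF(SUM(viewing_minutes) >= 30, 'QUALITY', 'NON_QUALITY') AS quality_flag
FROM analytics.viewing_activity
WHERE activity_date >= DATEADD('day', -30, CURRENT_DATE())
GROUP BY user_id
"""

conn = snowflake.connector.connect(
    account="example_account",  # hypothetical credentials
    user="example_user",
    password="***",
    warehouse="ANALYTICS_WH",
)
conn.cursor().execute(QUALITY_FLAG_SQL)
```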

• Developed an automated pipeline to transfer client quality data from Snowflake to Amazon S3 and Deloitte SFTP server, ensuring seamless data flow and secure access for downstream processing.
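
A minimal sketch of the Snowflake-to-S3 leg of such a pipeline using COPY INTO with an external stage; the stage, table, and path names are hypothetical, and the SFTP transfer is not shown:

```python
# Sketch of unloading client quality data from Snowflake to S3 through a
# hypothetical external stage; names and paths are placeholders.
import snowflake.connector

UNLOAD_SQL = """
COPY INTO @analytics.s3_export_stage/client_quality/{run_date}/
FROM (SELECT * FROM analytics.user_quality_flags)
FILE_FORMAT = (TYPE = PARQUET)
OVERWRITE = TRUE
"""

conn = snowflake.connector.connect(account="example", user="example", password="***")
conn.cursor().execute(UNLOAD_SQL.format(run_date="2022-01-01"))
```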

• Designed and implemented an ETL pipeline to match Pluto device information with TapAd device graph data, using Commercial Audience IDs (CAID) and IP addresses (IPV4, IPV6) to classify devices and identify associated household IDs.
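
A simplified PySpark sketch of this kind of device-graph match, preferring CAID joins and falling back to IP address; table and column names are hypothetical:

```python
# Sketch of matching device records to a device-graph table on CAID and IP,
# then resolving a household ID. Column and table names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("device_graph_match").getOrCreate()

devices = spark.table("pluto.device_info")   # hypothetical: device_id, caid, ip_address
graph = spark.table("tapad.device_graph")    # hypothetical: caid, ip_address, household_id

# Prefer an exact CAID match; fall back to IP address when CAID is missing.
by_caid = devices.join(graph.select("caid", "household_id"), on="caid", how="inner")
by_ip = (
    devices.filter(F.col("caid").isNull())
    .join(graph.select("ip_address", "household_id"), on="ip_address", how="inner")
)

matched = by_caid.unionByName(by_ip, allowMissingColumns=True).dropDuplicates(["device_id"])
matched.write.mode("overwrite").saveAsTable("analytics.device_household_map")
```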

• Migrated existing Airflow DAGs to Google Cloud Platform (GCP) Airflow instances, ensuring a smooth transition and continued operation of data workflows.

• Provided on-call support for production issues related to existing pipelines, ensuring the reliability and availability of critical data processes.

• Designed and developed a Proof of Concept (POC) for TVM Attribution by Entitlement, showcasing the potential for tracking attribution within the organization, including identifying registered and non-registered PlutoTV users and their entitlements such as Walmart and T-Mobile affiliations.

• Conducted data profiling analysis on user registration data, contributing to a better understanding of user demographics and behavior.

• Created a reporting structure to identify distinct segments of Pluto users based on usage percentile, enabling more targeted performance tracking and analysis.

• Developed a Streamlit application to streamline partner dashboard access and content mappings, featuring a user-friendly interface for managing partner mappings and facilitating Tableau access requests.
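
A minimal Streamlit sketch of this style of mapping tool; the data and column names are hypothetical stand-ins for the real partner data:

```python
# Minimal Streamlit sketch for browsing partner-to-dashboard mappings and
# submitting Tableau access requests. Data and names are hypothetical.
import pandas as pd
import streamlit as st

st.title("Partner Dashboard Mappings")

# In the real application this would come from a database; here it is a stub.
mappings = pd.DataFrame(
    {
        "partner": ["Partner A", "Partner B"],
        "dashboard": ["Content Performance", "Audience Overview"],
        "tableau_project": ["Partners/A", "Partners/B"],
    }
)

partner = st.selectbox("Partner", mappings["partner"].unique())
st.dataframe(mappings[mappings["partner"] == partner])

if st.button("Request Tableau access"):
    st.success(f"Access request submitted for {partner}")
```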

• Designed and built data pipelines in Google Cloud Platform's Airflow for various ETL-related jobs, leveraging different Airflow operators to automate and streamline data processes.

• Provided valuable support to the Business Intelligence (BI) team by optimizing Snowflake queries, enhancing query performance, and ensuring efficient data retrieval for analytics and reporting.

• Utilized Snowflake's stored procedures and corresponding DDL statements, incorporating them into data processing workflows.

• Enhanced operational support for data migration and data profiling processes, ensuring data quality and integrity throughout the organization.

Data Engineer – Humana, IL, USA May 2021 – March 2022

• Orchestrated and managed data workflows using Apache Airflow, ensuring timely and reliable execution of data processing tasks.

• Built and maintained data warehouse solutions using technologies like Snowflake, optimizing data storage and retrieval for analytical purposes.

• Collaborated with data scientists and analysts to understand data requirements and provide optimal data solutions that meet business objectives.

• Conducted data modeling and schema design to ensure effective organization and storage of data in databases.

• Implemented data quality checks and validation processes to ensure the accuracy and consistency of data across multiple systems.

• Worked closely with stakeholders to define data governance policies and ensure compliance with privacy regulations.

• Conducted performance tuning and optimization of database queries, enhancing overall system performance.

• Collaborated with cross-functional teams to integrate new data sources into existing data pipelines, ensuring seamless data flow.

• Maintained documentation for data pipelines, data models, and data governance processes, facilitating knowledge transfer and onboarding.

• Evaluated and implemented new technologies and tools to enhance data engineering capabilities, staying abreast of industry best practices.

• Performed data wrangling by gathering and querying data with SQL, and implemented data cleaning processes in R and Python.

• Worked with a diverse set of clients all over the world, identifying and analyzing Dynamic Values, and managing development with agile methodology.

• Worked in a Unix environment with Oracle SQL Developer, Microsoft SQL Developer, Oracle Warehouse, and PyCharm.

• Presented dashboards to management for deeper insights using Microsoft Power BI. Queried data from SQL Server, imported data in other formats, and performed data checking, cleansing, manipulation, and reporting using SQL.

• Migrated data from on-premises databases (SQL Server) to cloud databases/Snowflake.

• Developed and maintained data models to support business intelligence and machine learning projects, and optimized Spark tasks to run on the Kubernetes Cluster for quicker data processing.

Data Science Intern – InstaHub, IL, USA May 2020 – Nov 2020

• Developed an energy model leveraging sensor data collected from building data loggers to predict temperature differences between indoor and outdoor environments, enabling the automation of HVAC set point adjustments within the building.

• Implemented an anomaly detection system using the energy model to identify outliers and promptly notify clients of abnormal temperature fluctuations outside the expected range.
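
A simplified sketch of the modeling and anomaly-flagging approach described in the two bullets above, using scikit-learn linear regression and a residual threshold; file and column names are hypothetical:

```python
# Fit a regression model on sensor features to predict the indoor/outdoor
# temperature difference, then flag readings whose residuals fall outside an
# expected range. Feature and column names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data-logger export with sensor readings and the observed delta.
df = pd.read_csv("datalogger_readings.csv")
features = ["outdoor_temp", "humidity", "hour_of_day", "hvac_setpoint"]

model = LinearRegression()
model.fit(df[features], df["temp_delta"])

# Flag anomalies: residuals more than 3 standard deviations from the mean.
residuals = df["temp_delta"] - model.predict(df[features])
df["anomaly"] = np.abs(residuals - residuals.mean()) > 3 * residuals.std()
print(df.loc[df["anomaly"], ["timestamp", "temp_delta"]])
```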

• Conducted extensive exploratory data analysis (EDA) to classify trends within the data logger data, facilitating a better understanding of temperature patterns and aiding in decision-making for temperature control systems.

• Employed data logger data to create an occupancy classification system, determining whether the building was occupied or unoccupied based on sensor data. This classification supported energy efficiency efforts by adjusting HVAC operations as needed.

• Collaborated closely with the web development team to automate the energy models, streamlining the process of managing building HVAC systems and optimizing energy consumption.

Data Engineer – Accenture, India September 2018 – February 2020

• Collaborated with data engineering teams to implement Azure-based data solutions, leveraging services such as Azure Data Factory and Azure Databricks.

• Implemented data lakes on Azure Storage, enabling efficient storage and retrieval of structured and unstructured data.

• Conducted performance tuning for Azure SQL Data Warehouse, optimizing query performance for analytical workloads.

• Implemented Azure Active Directory (AAD) for identity and access management, strengthening security measures and achieving compliance with industry standards.

• Orchestrated the adoption of Azure DevOps for continuous integration and continuous deployment (CI/CD) pipelines, resulting in a 30% reduction in deployment time.

• Spearheaded the migration of on-premises applications to Microsoft Azure, achieving a 20% reduction in operational costs and improving overall scalability.

• Set up a continuous integration (CI) process: created a Git repository, configured a Jenkins server with the Git plugin, created a new Jenkins job to build the project from the repository, configured the build settings, ran a test build, and published successful builds to a remote Git repository or deployed them to the production environment.

• Implemented a new ETL solution based on business requirements.

• Designed and implemented strategies to build new data quality frameworks replacing legacy systems.

• Worked in an Agile environment, using Rally to maintain user stories and tasks. Participated in a collaborative team designing and developing a Snowflake data warehouse.

• Designed workflows covering the setup of DEV, QA, and PROD environments, creating users, and managing their permissions.

• Created and maintained data workflows using Apache Airflow to schedule and monitor ETL jobs, ensuring data quality and accuracy.

• Worked on architecting, designing, and implementing large-scale data and analytics solutions on the Snowflake Cloud Data Warehouse.

• Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation across multiple file formats, as sketched below.
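
A minimal PySpark sketch of this pattern: reading multiple file formats, aggregating, and showing the Spark SQL equivalent; paths and column names are hypothetical:

```python
# Extract from multiple file formats, transform, and aggregate with PySpark.
# Paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("multi_format_etl").getOrCreate()

orders_parquet = spark.read.parquet("s3a://example-bucket/orders/parquet/")
orders_json = spark.read.json("s3a://example-bucket/orders/json/")
orders = orders_parquet.unionByName(orders_json, allowMissingColumns=True)

daily_totals = (
    orders.withColumn("order_date", F.to_date("order_ts"))
    .groupBy("order_date", "region")
    .agg(F.sum("amount").alias("total_amount"), F.count("*").alias("order_count"))
)

# Spark SQL equivalent of the aggregation above.
orders.createOrReplaceTempView("orders")
daily_totals_sql = spark.sql(
    "SELECT to_date(order_ts) AS order_date, region, SUM(amount) AS total_amount "
    "FROM orders GROUP BY to_date(order_ts), region"
)

daily_totals.write.mode("overwrite").parquet("s3a://example-bucket/marts/daily_totals/")
```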

• Collaborated with Data Science teams to gather requirements, analyze data, and design, implement, and test applications following the Agile paradigm.

• Designed and deployed scalable Apache Airflow data pipelines with complex processing to handle massive datasets from diverse sources such as SQL databases, NoSQL databases, One Lake, and APIs.

• Built real-time streaming applications with Apache Kafka and Apache Spark to enable real-time data analytics; see the sketch below.
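
A minimal sketch of such a streaming pipeline with Spark Structured Streaming reading from Kafka; the broker, topic, and event schema are hypothetical:

```python
# Read events from Kafka with Spark Structured Streaming, parse JSON payloads,
# and write windowed aggregates to a sink (console here, for illustration).
# Requires the spark-sql-kafka package on the Spark classpath.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka_stream").getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", TimestampType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "events")                      # hypothetical topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

query = (
    events.withWatermark("event_ts", "10 minutes")
    .groupBy(F.window("event_ts", "5 minutes"))
    .agg(F.sum("amount").alias("total_amount"))
    .writeStream.outputMode("update")
    .format("console")
    .start()
)
query.awaitTermination()
```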

• Created data visualization dashboards in Tableau to communicate findings to stakeholders.

Education

Master of Science in Data Science – DePaul University, IL, USA, March 2020 – March 2022, GPA: 3.78/4.0

Bachelor of Technology in Civil Engineering – National Institute of Technology University, Kerala, India, August 2014 – August 2018, GPA: 3.1/4.0

Skills

Methodologies: SDLC, Agile, Waterfall, Scrum

Programming Languages and Platforms: Python, R, C, Scala, SQL, Unix Shell Script, FactSet, Bloomberg, Euromonitor, Eikon, MergerMarket, Dealogic, Factiva, S&P CapIQ, SNL, One Source, Zawya

ML Algorithms: Linear Regression, Logistic Regression, Decision Trees, Supervised Learning, Unsupervised Learning, Classification, SVM, Random Forests, Naive Bayes, KNN, K-Means, CNN

Big Data: Hadoop, HDFS, YARN, Sqoop, Oozie, Hive, HBase, Spark, Kafka, Impala, NiFi, Cassandra, Apache Airflow, Databricks

ETL/ELT Tools: SSRS, SSIS, SSAS, Informatica, Matillion, Azure Data Factory, DBT

Databases: MySQL, SQL Server, Snowflake, DB2, PostgreSQL, Oracle, MongoDB, DynamoDB (SQL and NoSQL)

Data Analytics Skills: Data Cleaning, Data Masking, Data Manipulation, Data Visualization

DigitalOcean: Droplets, Spaces

BI & CRM Tools: Tableau, Microsoft Power BI, Sigma Computing

Packages: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, Seaborn, TensorFlow

Tools/IDE/Build Tools: Power BI, Tableau, Talend Studio, Git, Git Bash, Eclipse, IntelliJ, Maven, ANT, Jenkins, GitHub, Jira, Snowflake, Bitbucket, Data Pipelines, QlikView

Cloud Computing: AWS (S3, CloudWatch, Athena, Redshift, EMR, EC2, DynamoDB, IAM, Secrets Manager, Lambda, SNS, SQS), Azure (Azure Data Factory, Azure Blob Storage, Azure Databricks)

File Formats: Parquet, Avro, ORC, JSON

Operating Systems: Windows, Linux, Unix, macOS


