
Anand. K

Sr. Cloud Data Engineer

Email: ad43vk@r.postjobfree.com

Ph:512-***-****

Professional Summary:

Overall 12+ years of experience across the Banking, Healthcare Payer, Insurance, and Telecommunications domains.

Extensive experience in ETL development (Informatica Cloud IICS, Informatica 9.6 and 10.1), data warehousing, data integration, data migration, PL/SQL-to-ETL conversion, and Microsoft Power Automate and PowerApps.

Expertise with Hadoop ecosystem tools including Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, YARN, and Oozie, NoSQL databases such as HBase, MongoDB, and Druid, and visualization with Tableau.

Experience in using tools such as Sqoop, Flume, Kafka, and Pig to ingest structured, semi-structured, and unstructured data into the cluster.

Good understanding of cloud-based technologies such as Azure, AWS, and GCP.

Proficient in designing, implementing, and managing scalable data infrastructure solutions, covering data modelling, ETL processes, data warehousing, and big data technologies, using Apache Airflow and Kubernetes to enable seamless deployment and monitoring of data pipelines.

Data analytics with a focus on cloud platforms (Azure, AWS, GCP) and experience in Snowflake, Databricks, dbt, and Informatica; proficient in leveraging cloud technologies to optimize data processing and storage.

Designed and implemented scalable data pipelines on AZURE, AWS and GCP, integrating various data sources to facilitate real-time analytics and reporting.

Worked on Airflow 1.8 (Python 2) and Airflow 1.9 (Python 3) for orchestration and built custom Airflow operators.

Proficient in data warehousing principles (fact tables, dimension tables, dimensional data modelling).

Designed data architectures spanning databases, APIs, streaming data, and flat files.

Implemented automated ETL processes using Apache Spark and Google Cloud Dataflow, reducing manual intervention and accelerating data processing times.

Built data platforms using AWS S3, Athena, Glue, and Redshift; built a data lake on AWS and transformed data in Snowflake to build analytical dashboards and data visualizations in Tableau and Power BI.

Worked on GCP services such as BigQuery, Cloud Storage, Cloud Dataflow, Cloud Pub/Sub, and Cloud SQL.

Worked on processing real-time data using Kafka 0.10.1 producers and stream processors, and implemented multiple analytics algorithms using Cassandra with Spark and Scala.
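As an illustrative sketch of the kind of real-time ingestion described above, the snippet below uses the kafka-python client to publish JSON events; the broker address, topic name, and event fields are placeholders rather than details from any specific project.

# Minimal sketch of a JSON Kafka producer (kafka-python); broker and topic are illustrative.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["broker1:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_event(event: dict) -> None:
    # Key-less send; partition assignment is left to the default partitioner.
    producer.send("clickstream-events", value=event)

publish_event({"user_id": 123, "action": "page_view"})
producer.flush()  # block until buffered records are delivered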

Implemented stream processing using Kinesis and landed data into the S3 data lake.

Worked with big data technologies such as Hadoop, MapReduce, HDFS, Apache Spark, Hive, Pig, Sqoop, Flume, and Kafka.

Developed data models and schemas optimized for query performance and data retrieval, ensuring data integrity and consistency across multiple platforms.

Led the migration of on-premises data infrastructure to cloud environments, resulting in improved performance, reliability, and cost efficiency.

Implemented processes and tools to monitor data quality, identify anomalies, and take corrective actions to maintain data accuracy and consistency.

Development and operations involving AWS cloud services: EC2, EBS, VPC, RDS, Auto Scaling, CloudWatch, and SNS.

Experience in installation, configuration, support, and maintenance of Cloudera clusters.

Designed and developed BI products using the Microsoft BI stack (SSIS and SSAS) and Tableau.

Worked on data models and loading unstructured data using HBase, DynamoDB, and Cassandra.

Tuned database configurations, optimized queries, and implemented caching mechanisms.

Implemented access controls, encryption, and other security measures to protect sensitive data and meet regulatory requirements such as GDPR, HIPAA, and PCI-DSS.

Worked closely with data scientists and analysts to understand their data requirements and provide the necessary infrastructure and tools.

Scaled systems to handle increasing volumes of data while maintaining high availability and reliability to minimize downtime and ensure uninterrupted data processing.

Proven ability to collaborate with cross-functional teams to deliver robust and reliable data solutions that drive business insights and decision-making.

Implemented security best practices and data governance policies to ensure compliance with regulatory requirements and protect sensitive information.

Experience in various databases such as Snowflake, Teradata, and Oracle for data processing.

Experience in using Python scripts to read and archive files on Unix.

Experience in all phases of SDLC and Agile Methodologies including Dimensional Data Modelling.

EDUCATION:

Bachelor’s in Computer Science Engineering, JNTU, India, 2005.

Master’s in Computer Networking, University of Sunderland, UK, 2009.

CERTIFICATIONS:

GCP Associate Cloud Engineer - Certification ID: QeNqUH

Technical Skills:

Project Methods: Agile, Scrum, Software development lifecycle, Model development lifecycle

ETL Data Pipelines: Apache Airflow, Hive, Talend, Informatica Intelligent Cloud Services–IICS, Informatica 9.6 and 10.1, Informatica Power Exchange

Big Data: Hadoop, HDFS, MapReduce, Spark, Spark Streaming, Hive, Pig, Kafka, Sqoop, Flume, Oozie

Databases: Cassandra, HBase, DynamoDB, MongoDB, MS Access, MySQL, Oracle, SQL, PL/SQL, RDBMS, AWS Redshift, Amazon RDS, Teradata

Data warehousing: Redshift, Snowflake, Google BigQuery

Programming & Scripting: Python, Scala, Java, Bash, SQL, PL/SQL, C++, ER/Studio

Visualization: MS PowerPoint, Power BI, Tableau, QuickSight, QlikView

Libraries: pandas, NumPy, SciPy, scikit-learn, spaCy, Matplotlib, TensorFlow, Keras

Data modeling: ER/Dimensional Modeling, Star Schema, Snowflake Schema

Cloud platforms: Amazon Web Services (AWS), Google Cloud Platform (GCP)

Version control: Git, SVN

Financial analysis: Risk assessment, fraud detection, anti-money laundering (AML)

Domains: Banking, Telecom and Retail CPG

Professional Experience

United Airlines, TX Mar 2023 - Present

Sr. Cloud Data Engineer/Lead Data Engineer.

Responsibilities:

Designing and implementing data pipelines and architectures to support data processing and analytics, including data ingestion, storage, transformation, and consumption.

Managing and implementing data engineering solutions on cloud platforms such as AWS, Azure, and Google Cloud.

Worked on complex data pipelines integrating Azure Data Lake, Azure Databricks, and Apache Airflow, optimizing data flow architecture for scalable analytics.

Developed comprehensive Erwin data models to visualize and manage data architecture, supporting effective data governance and standardization across departments.

Installed and configured Apache Airflow for workflow management and created workflows in Python.

Created S3 data lake infrastructure and automated the entire process using AWS Lambda functions and API Gateway, with a downstream ETL process into the cloud warehouse (Redshift) to support advanced analytics.
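A minimal sketch of such an S3-triggered Lambda is shown below, assuming the Redshift Data API (boto3 "redshift-data") is used for the COPY step; the cluster, database, IAM role, and table names are hypothetical.

# Hypothetical Lambda handler: on each S3 object-created event, issue a Redshift COPY
# through the Redshift Data API. Cluster, role, and table names are placeholders.
import boto3

redshift_data = boto3.client("redshift-data")

def handler(event, context):
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        copy_sql = (
            f"COPY staging.raw_events FROM 's3://{bucket}/{key}' "
            "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role' "
            "FORMAT AS PARQUET;"
        )
        redshift_data.execute_statement(
            ClusterIdentifier="analytics-cluster",
            Database="analytics",
            DbUser="etl_user",
            Sql=copy_sql,
        )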

Streamlined real-time data ingestion using Kafka, channelling information into Azure Data Lake, Azure Storage, and Azure SQL Data Warehouse, facilitating immediate data availability for processing.

Meticulously designed robust data architecture, intricate models, and efficient column families in Cassandra. Reliably loaded and updated data in Cassandra using advanced Spark SQL techniques.

Created Redshift Spectrum external schemas and tables for S3 data on the running Redshift instance, querying S3 data directly from Redshift and loading it into fact and dimension tables rather than using the COPY command when data volumes are large.
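The Spectrum setup can be sketched as the DDL below, run here through psycopg2; the schema name, IAM role, table definition, and S3 location are illustrative assumptions.

# Sketch of Redshift Spectrum DDL executed via psycopg2; all identifiers are placeholders.
import psycopg2

ddl = """
CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum_events
FROM DATA CATALOG DATABASE 'events_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/spectrum-role'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

CREATE EXTERNAL TABLE spectrum_events.raw_clicks (
    user_id BIGINT,
    event_ts TIMESTAMP,
    url VARCHAR(2048)
)
STORED AS PARQUET
LOCATION 's3://my-data-lake/clicks/';
"""

conn = psycopg2.connect(host="redshift-host", port=5439,
                        dbname="analytics", user="etl_user", password="***")
conn.autocommit = True  # external-object DDL cannot run inside a transaction block
with conn.cursor() as cur:
    for stmt in filter(None, (s.strip() for s in ddl.split(";"))):
        cur.execute(stmt)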

Pulled data from databases and APIs, using Cloud Composer to orchestrate extraction from the source systems.

Followed Google Cloud's security and compliance standards, ensuring data integrity and confidentiality.

Partitioned data streams using Kafka; designed and configured the Kafka cluster to accommodate heavy throughput.

Used REST APIs with Python to ingest data from external sites into BigQuery.
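A hedged example of this pattern using the requests library and the BigQuery streaming insert API; the endpoint URL, table id, and record shape are placeholders.

# Illustrative sketch: pull JSON from a REST endpoint and stream it into BigQuery.
import requests
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.raw_zone.api_events"  # placeholder table

resp = requests.get("https://api.example.com/v1/events", timeout=30)
resp.raise_for_status()
rows = resp.json()  # expected: a list of flat JSON records

errors = client.insert_rows_json(table_id, rows)
if errors:
    raise RuntimeError(f"BigQuery insert errors: {errors}")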

Knowledge of the Tableau Administration Tool for configuration, adding users, managing licenses and data connections, scheduling tasks, and embedding views by integrating with other platforms.

Implemented robust security and monitoring solutions and optimized cost management using Azure services including Azure Key Vault, Azure Security Center, Azure Monitor, and Azure Cost Management.

Worked with the AWS stack: S3, EC2, EMR, Athena, Glue, Redshift, DynamoDB, RDS, Aurora, IAM, and Lambda. Provided and implemented solutions and proofs of concept for data pipelines using Microsoft Azure cloud services such as Azure Databricks, Azure Data Factory, and PySpark.

Google, Austin, TX Mar 2022-Feb 2023

GCP Data Engineer

Responsibilities:

Developed Spark applications using Python and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.

Experienced in maintaining the Hadoop cluster on GCP using Google Cloud Storage, BigQuery, and Dataproc.

Led the migration and replication of data from SQL Server to BigQuery using Datastream, GCS buckets, Pub/Sub, and Dataflow.

Used Apache Airflow in the GCP Composer environment to build data pipelines, using operators such as the Bash operator, Hadoop operators, Python callables, and branching operators.
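A minimal Composer-style DAG in the Airflow 1.x idiom, showing the Bash, Python, and branching operators mentioned above; the task logic, table, and bucket names are purely illustrative.

# Sketch of an Airflow 1.x DAG as run on Cloud Composer; all names are placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator, BranchPythonOperator
from airflow.operators.dummy_operator import DummyOperator

def choose_path(**context):
    # Branch on whether the upstream extract produced any rows (illustrative condition).
    row_count = context["ti"].xcom_pull(task_ids="extract")
    return "load_to_bq" if row_count else "skip_load"

with DAG(dag_id="daily_ingest",
         start_date=datetime(2022, 1, 1),
         schedule_interval="@daily",
         catchup=False) as dag:

    extract = PythonOperator(task_id="extract",
                             python_callable=lambda: 42)  # stand-in extract step

    branch = BranchPythonOperator(task_id="branch",
                                  python_callable=choose_path,
                                  provide_context=True)

    load_to_bq = BashOperator(
        task_id="load_to_bq",
        bash_command="bq load --source_format=CSV raw.events gs://bucket/events/*.csv")

    skip_load = DummyOperator(task_id="skip_load")

    extract >> branch >> [load_to_bq, skip_load]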

Implemented both batch and real-time ingestion workflows in Apache Druid, enabling the ingestion of historical and streaming data for comprehensive analytics.

Extracted and transformed data from structured and semi-structured formats such as JSON, CSV, and Parquet.

Used the Cloud Shell SDK in GCP to configure the Dataproc, Cloud Storage, and BigQuery services.

Used Google Cloud Functions with Python to load data into BigQuery upon arrival of CSV files in GCS buckets.
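A sketch of such a background Cloud Function, assuming a GCS object-finalize trigger and the google-cloud-bigquery client; the project, dataset, and table names are placeholders.

# Background Cloud Function (GCS finalize trigger) that loads an arriving CSV into BigQuery.
from google.cloud import bigquery

client = bigquery.Client()
TABLE_ID = "my-project.landing.daily_csv"  # placeholder destination table

def gcs_to_bq(event, context):
    uri = f"gs://{event['bucket']}/{event['name']}"
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    )
    load_job = client.load_table_from_uri(uri, TABLE_ID, job_config=job_config)
    load_job.result()  # wait for completion; raises on load errors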

Used Cloud Composer, which handles infrastructure scaling and maintenance and integrates with other Google Cloud services such as BigQuery, Cloud Storage, Dataflow, and Dataproc.

Applied data encryption to ensure data protection and compliance.

Transformed data and developed metrics using Spark SQL to be displayed on dashboards.

Worked closely with data analysts, scientists, and business stakeholders to gather requirements and deliver effective data solutions aligned with business needs.

Analysed the SQL scripts and designed the solution to implement them using PySpark.

Implemented data quality checks and validation processes within PySpark pipelines to ensure accurate and consistent data.
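One possible shape of these checks, sketched in PySpark with hypothetical column names and storage paths:

# Illustrative PySpark data-quality gate: null and duplicate checks on a key column
# before writing the validated output. Paths and columns are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq_checks").getOrCreate()
df = spark.read.parquet("gs://bucket/curated/orders/")

null_keys = df.filter(F.col("order_id").isNull()).count()
dupes = df.groupBy("order_id").count().filter(F.col("count") > 1).count()

if null_keys or dupes:
    raise ValueError(f"DQ failure: {null_keys} null keys, {dupes} duplicate keys")

df.write.mode("overwrite").parquet("gs://bucket/validated/orders/")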

Built data pipelines using Airflow in GCP for ETL-related jobs using different Airflow operators.

Implemented custom visualizations and charts in Apache Superset using its extensible visualization library, catering to specific business requirements and data insights.

Configured Snowpipe to pull data from Google Cloud Storage buckets into Snowflake tables.
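A hedged sketch of the Snowpipe setup, issued here through the Snowflake Python connector; the stage, notification integration, file format, and table names are assumptions, and the GCS-side notification wiring is omitted.

# Sketch of stage and pipe DDL run via snowflake-connector-python; identifiers are placeholders.
import snowflake.connector

statements = [
    """CREATE STAGE IF NOT EXISTS gcs_events_stage
         URL = 'gcs://my-bucket/events/'
         STORAGE_INTEGRATION = gcs_int
         FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)""",
    """CREATE PIPE IF NOT EXISTS events_pipe
         AUTO_INGEST = TRUE
         INTEGRATION = 'GCS_NOTIF_INT'
         AS COPY INTO raw.events FROM @gcs_events_stage""",
]

conn = snowflake.connector.connect(account="xy12345", user="etl_user", password="***",
                                   warehouse="LOAD_WH", database="RAW", schema="PUBLIC")
with conn.cursor() as cur:
    for stmt in statements:
        cur.execute(stmt)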

Installed Kafka Manager to track consumer lag and monitor Kafka metrics; also used it for adding topics, partitions, etc.

Designed solutions that scaled to handle growing data volumes while maintaining optimal performance and resource utilization.

Developed a pre-processing job using Spark DataFrames to flatten JSON documents into flat files.
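An illustrative flattening step in PySpark, assuming a nested order/items structure; the field names are hypothetical.

# Flatten nested JSON into a delimited flat file with explode and nested-field selects.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("flatten_json").getOrCreate()
raw = spark.read.json("gs://bucket/raw/events/*.json")

flat = (raw
        .withColumn("item", F.explode("order.items"))        # one row per nested item
        .select(F.col("order.id").alias("order_id"),
                F.col("customer.name").alias("customer_name"),
                F.col("item.sku").alias("sku"),
                F.col("item.qty").alias("qty")))

flat.write.mode("overwrite").option("header", True).csv("gs://bucket/flat/events/")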

Worked on data warehouse migration to GCP and modernize analytical capabilities.

Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.

Integrated PySpark applications with GCP for scalable and distributed data processing

Tuned PySpark jobs for optimal performance by configuring partitioning, caching, and utilizing DataFrame optimization techniques.

Exported the analysed data to Teradata using Sqoop for visualization and to generate reports for the BI team.

Migrated the computational code in HQL to PySpark.

Utilized PySpark's window functions and advanced aggregation techniques to perform time-series analysis and sliding-window computations.
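For example, a trailing seven-day aggregation can be expressed with a range-based window as sketched below; the column names and window bounds are illustrative.

# Sliding-window aggregation with a PySpark Window ordered by event time.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("rolling_metrics").getOrCreate()
df = spark.read.parquet("gs://bucket/validated/orders/")

w = (Window.partitionBy("customer_id")
           .orderBy(F.col("order_ts").cast("long"))
           .rangeBetween(-7 * 86400, 0))   # trailing 7-day window, in seconds

rolling = df.withColumn("rolling_7d_spend", F.sum("amount").over(w))
rolling.show(5)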

Executed complex join operations using PySpark to combine data from multiple sources and create unified datasets.

Assessed the existing on-premises data infrastructure, including data sources, databases, and ETL processes.

Created automated Python scripts to convert data from different sources and to generate ETL pipelines.

Converted SQL queries into Spark transformations using Spark RDDs, Python, and Scala.

Comcast, Denver, CO Mar 2019 - Feb 2022

Sr. Data Engineer/Data Analyst

Responsibilities:

Building and maintaining data pipelines to collect and store large volumes of data from various sources, including social media, streaming platforms, and content management systems.

Designing and optimizing databases and data warehouses for efficient data storage and retrieval.

Collaborating with data scientists and analysts to ensure data availability and integrity for analysis purposes.

Implementing data governance policies and ensuring compliance with regulations such as GDPR or CCPA.

Integrating third-party APIs and tools to enrich and augment existing datasets.

Involved in creating detailed LLD (Low Level Design) document which includes all Business requirements and technical specifications.

Worked on designing logical and physical data models for master data and transactional data.

Prepared the Mapping document which details out the mapping of source to the target data model with application of business rules.

Support and Maintenance of Monthly Metric Reports for LOB Analysis.

Designed and developed audit control process for PL/SQL jobs to keep track of all the job execution along with balance and reconciliation reports.

Developed ETL workflows in Python to process data ingested into HDFS and HBase using Flume.

Involved in ETL Design and development sessions.

Developed PL/SQL programs to load and validate the data into staging environment.

Tuned legacy operations and automated the analysis reports.

Developed Tableau reports to display monthly analysis for each LOB's requirements. Designed and implemented the ETL for the report flow, created dynamic data types, and implemented data aggregation in Tableau.

Involved in performance considerations in Tableau visualizations, calculations, and dashboard design.

Involved in creating OBIEE dashboards and migrating OBIEE Dashboard reports to Tableau.

Automated daily Outlook email reports into Tableau displays.

Experienced in creating the sites and access control in Tableau Server.

Used AWS tools like S3 and Athena to query large amounts of data stored in S3 buckets.
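A representative Athena query issued through boto3; the database, table, and results location are placeholders.

# Run an Athena query over S3-backed data and poll for completion.
import time
import boto3

athena = boto3.client("athena")

qid = athena.start_query_execution(
    QueryString="SELECT region, COUNT(*) AS n FROM usage_logs GROUP BY region",
    QueryExecutionContext={"Database": "datalake"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)["QueryExecutionId"]

while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]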

Developed and maintained ETL pipelines using Apache Spark and Python on Google Cloud Platform (GCP) for large-scale data processing and analysis.

Designed and implemented efficient data models and schema designs using Big Query for optimized data querying and storage.

Outlined a scalable and efficient data architecture that integrates Snowflake, Oracle, GCP services, and other relevant components.

Recognized data flow patterns, considering data extraction, transformation, loading, and analytics processing.

Developed ETL pipelines to extract data from Oracle databases using efficient methods such as change data capture (CDC) or scheduled batch processing.
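A hedged sketch of the scheduled-batch variant using a watermark column, with cx_Oracle and pandas; the connection details, table, and column names are assumptions, and a log-based CDC tool would replace this for true change capture.

# Incremental (watermark-based) batch extract from Oracle; identifiers are placeholders.
import cx_Oracle
import pandas as pd

def extract_since(last_loaded_ts: str) -> pd.DataFrame:
    conn = cx_Oracle.connect("etl_user", "***", "oracle-host:1521/ORCLPDB1")
    query = """
        SELECT order_id, customer_id, amount, updated_at
        FROM sales.orders
        WHERE updated_at > TO_TIMESTAMP(:1, 'YYYY-MM-DD HH24:MI:SS')
    """
    try:
        return pd.read_sql(query, conn, params=[last_loaded_ts])
    finally:
        conn.close()

df = extract_since("2021-01-01 00:00:00")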

Worked with various data sources including structured, semi-structured and unstructured data to develop data integration solutions on GCP.

Implemented real-time data processing using Spark, GCP Cloud Composer, and Google Dataflow for streaming data processing and analysis.

Built data ingestion pipelines (Snowflake staging) using disparate sources and other data formats to enable real-time data processing and analysis.

Integrated data pipelines with various data visualization and BI tools such as Tableau and Looker for dashboard and report generation.

Mentored junior data engineers and provided technical guidance on best practices for ETL data pipelines, Snowflake, Snow pipes, and JSON.

Implemented data security measures such as access control policies and data encryption to ensure data protection and compliance with regulatory requirements.

Collaborated with cross-functional teams including data scientists and analysts to deliver data-driven solutions.

Worked with stakeholders to understand business requirements and translate them into technical data engineering solutions.

American Electric Power, Columbus, OH Apr 2017 - 2019

BI Developer/Data Analyst

Responsibilities:

Involved in planning and designing database schema and provided documentation.

Based on business requirements, developed complex SQL queries with joins and T-SQL, PL/SQL stored procedures, views, and triggers to implement the business rules and transformations.

Extensively worked on ETL and data integration, developing ETL mappings and scripts using SSIS; worked on data transfer from text files to SQL Server using the Bulk Insert task in SSIS.

Developed database objects including tables, clusters, Indexes, views, sequences, packages, triggers, and procedures to troubleshoot any database problems.

Involved in performance considerations in Tableau visualizations, calculations, and dashboard design.

Automated daily Outlook email reports into Tableau displays. Created Gantt charts and box charts for data display.

Developed PL/SQL programs to load and validate the data into staging environment

Tuned legacy operations and automated the analysis reports.

Experienced in working with cross-functional technical teams.

Experienced in creating AD queues with single-user and multi-user consumption.

Involved in creating word cloud and bubble charts in Tableau to track trends in a project.

Used SQL*Loader and external tables to load bulk files at daily/monthly frequency.

Involved in writing External table scripts with complex shell scripting and data scrubbing based on the Business requirements.

Worked on importing and cleansing data from various sources such as Mainframe, Sybase, flat files, and Excel to and from SQL Server with high-volume data.

Built interactive Tableau dashboards utilizing Parameters, calculated fields, Table Calculations, User Filters to handle views more efficiently.

Developed various reports using best practices and different visualizations like Bars, Lines and Pies, Maps, Gantt Charts, Bubble Charts, Histograms, Bullet Charts and Highlight tables.

Huntington National Bank, Columbus, OH Feb 2015- Feb 2017

BI Developer

Responsibilities:

Developed SSIS and PL/SQL packages based on requirements gathered from business users, moving data from different sources to the staging area.

Scheduling work sessions to help non-technical team members to understand the SQL queries used to create and run various PL/SQL, SSIS Packages used by the team.

Developed various complex stored procedures, packages, interfaces, and triggers in PL/SQL.

Created various SQL and PL/SQL scripts for verification of the required functionalities.

Created database triggers to enforce data integrity and additional referential integrity.

Developed complex SQL queries for data retrieval from various database objects including tables and views.

Worked with DBA extensively to create External tables, Loading schedules, Redo & Undo logging. Worked on various backend Procedures and Functions using PL/SQL.

Performance tuning of the existing packages. Changing the packages according to the new requirements so that they can fetch data from new tables under new databases.

Managing the daily, weekly, and monthly packages for the team.

Working on complex stored procedures and performance tuning the procedures which are slow.

Created documents that help the business team understand the various SSIS packages they use, and created worksheets they can use to run the packages monthly.

Worked in the Tableau environment to create dashboards such as yearly and monthly reports using Tableau Desktop and published them to the server.

Converted Excel Reports to Tableau Dashboard with High Visualization and Good Flexibility.

United Nations, NYC Jun 2012- Dec 2014

Database Analyst/Report Developer

Responsibilities:

Generated reports using MS SQL Server Reporting Services 2005/2008, UNIX Shell scripts from OLTP and OLAP data sources.

Writing complex SQL Queries used by SSRS Reports.

Created new logical and physical design of database to fit new business requirement and implemented new design into SQL Server 2005.

Designed and developed audit control process for SQL jobs to keep track of all the job execution along with balance and reconciliation reports.

Working with database connections, SQL joins, cardinalities, loops, aliases, views, aggregate conditions, parsing of objects, and hierarchies.

Involved actively with the PM role to understand and assign tasks to offshore team and get the daily updates, solely responsible for representing the offshore team.

Performed SQL tuning using Explain Plan, Hints, and indexes. Responsible for performing code reviews.

Created SSIS packages to load data into temporary staging tables. Utilized SQL*Loader to load flat files into database tables.

Involved in table and index partitioning for performance improvement and manageability and identified, tested, and resolved database performance issues (monitoring and tuning) to ensure database optimization.

Involved in creating the test Cases and validation scripts in SQL to test the migration of data from one data warehouse and OLAP to another.

Scheduling work sessions to help non-technical team members to understand the PL/SQL queries used to create SSRS Reports and to run various SSIS Packages used by the team.

Prudential Financial, Scranton, PA May 2011 - May 2012

BI Developer

Responsibilities:

Developed stored procedures to transform the data from enterprise data warehouse to SQL server and load data to fact tables.

Transferred data from various sources like MS Excel, MS Access, and SQL Server using SSIS and then created reports using this data using SSRS. Created new and converted several Crystal reports into detailed SSRS reports. Debugged reports for the business end users.

Involved in table and index partitioning for performance improvement and manageability and identified, tested, and resolved database performance issues (monitoring and tuning) to ensure database optimization.

Performed T-SQL script tuning and optimization of queries for reports that take longer execution time using MS SQL Profiler, index tuning wizard and SQL Query Analyzer in MS SQL Server 2005.

Generated reports using SSRS that could be used to send information to different primary vendors, clients, and managers.

Involved in documentation for Reports, DTS and SSIS packages.

Designed and implemented stored procedures and triggers for automating tasks.

Developed trend indicating reports which monitor changes in different categories and record historical analysis and displays them in a single dashboard.

Developed graphs and charts to display telecommunication data categorized by hierarchy and organization.

Developed Parameterized, Drill Through, Rolling, Linked and Sub-reports with specific KPIs. Developed indicators for rolling reports and drill down reports.

Worked simultaneously with slowly changing dimensions and real-time data to report accurately.

Generated Reports using Global Variables, Expressions and Functions for the reports and created stylish Report Layouts and Deployed the Reports using SSRS 2005/2008.

Wrote Complex Stored Procedures, Triggers, Views and Queries. Created indexes, Constraints, and rules on database objects.
