Data engineer

Location:
California City, CA
Salary:
80
Posted:
April 26, 2024

Resume:

Reddy. K

AWS, GCP, Data Engineer

Email: ad35yi@r.postjobfree.com PH: 512-***-****

Summary:

• Overall 12+ years of experience in IT, data engineering, and analytics, with a focus on cloud platforms (AWS, GCP) and experience in Snowflake, Databricks, DBT, Informatica, Immuta, and Alteryx for moving data from multiple sources into common targets such as data marts and data warehouses.

• Strong knowledge of OLAP systems and dimensional modeling using Star schema and Snowflake schema.

• Experience in implementing update strategies, incremental loads and Change Data Capture (CDC) and handling SCDs (Slowly Changing Dimensions) using Informatica.

• Experience with various databases such as Snowflake, Teradata, and Oracle for data processing across various business purposes.

• Extensive experience in performance tuning, identifying and fixing bottlenecks, and tuning complex Informatica mappings for better performance.

• Experience in using Python scripts to read and archive files in Unix.

• Experience in all phases of SDLC and Agile Methodologies including Dimensional Data Modeling.

• Data warehousing domain experience in banking, healthcare payer, insurance, and telecommunications applications.

• Strong interpersonal skills; hardworking, with the ability to work independently, deliver on time, and adapt to new technologies very quickly.

• Built models and developed data mining and reporting solutions that scale across massive volumes of structured and unstructured data.

• Extensive experience in ETL development (Informatica Cloud IICS, Informatica 9.6 and 10.1), data warehousing, data integration, data migration, PL/SQL conversion to ETL, Microsoft Power Automate, and PowerApps.

• Expertise with Hadoop ecosystem tools including Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, YARN, and Oozie; NoSQL databases such as HBase, MongoDB, and Druid; and Tableau.

• Expertise in developing strategies for extraction, transformation, and loading (ETL) of data from various sources into the data warehouse.

• Experience in using various tools like Sqoop, Flume, Kafka, and Pig to ingest structured, semi-structured, and unstructured data into cluster.

• Good understanding of Cloud Based technologies such as AWS, GCP.

• Worked on Airflow 1.8 (Python 2) and Airflow 1.9 (Python 3) for orchestration; built custom Airflow operators and orchestrated workflows with dependencies spanning multiple clouds.

• Extensive experience in building data platforms using AWS S3, Athena, Glue, and AWS Redshift.

• Built data lake on AWS cloud and transformed data in Snowflake to build analytical dashboards and data visualizations on Tableau and PowerBI

• Worked on HBase to conduct quick lookups such as updates, inserts and deletes in Hadoop.

• Experience in infrastructure development and Operations involving AWS Cloud Services, EC2, EBS, VPC, RDS, Auto scaling, Cloud Watch, and SNS

• Expert-level proficiency in Apache Druid, utilizing its capabilities for real-time analytics and high-performance querying on large-scale datasets.

• Integrated Informatica Cloud processes with third-party tools like PagerDuty and Five9 Reports

• Working knowledge in Spark architecture with Python/Scala scripts

• Worked on GCP services such as BigQuery, Cloud Storage, Cloud Dataflow, Cloud Pub/Sub, and Cloud SQL.

• Migrated Teradata database to Amazon Redshift using AWS SCT data extraction agents and created data pipelines to load the legacy files, CSV & Hadoop files.

• Supported business requests, developed stored procedures and triggers, and used Quest tools like TOAD.

• Proficient in big data technologies, advanced analytics, data engineering, data warehouse concepts, cloud platforms, and business intelligence across major structured and unstructured databases and distributed systems such as the Microsoft BI stack, Oracle, and big data platforms.

• Experience in installation, configuration, support and maintenance of Cloudera

• Designed and developed BI products using Microsoft BI stack (SSIS and SSAS) and Tableau

• Worked on data models and loading unstructured data using HBase, DynamoDB, and Cassandra.

• Experience in working with Big Data technologies such as Hadoop, MapReduce jobs, HDFS, Apache Spark, Hive, Pig, Sqoop, Flume, Kafka.

• Worked on Teradata DBA activities such as database upgrades, creating user databases, space management, TASM, role creation, and backup of Teradata objects, as well as performance DBA work such as query tuning and rewrites.

• Worked on Teradata development and TDE (Teradata Decision Experts) applications, including developing BTEQ, FastLoad, MultiLoad, and FastExport scripts, stored procedures, functions, and triggers.

• Worked on Cloudera, Hortonworks and MapR distributions

• Demonstrated advanced proficiency in leveraging Fivetran transformations to process and shape raw data from source systems into structured, usable formats for analysis.

• Loaded files into HDFS from Oracle, SQL Server, Teradata, and Netezza using Sqoop.

• Implemented multiple algorithms for analytics using Cassandra with Spark and Scala

• Worked on processing real-time data using Kafka 0.10.1 producers and stream processors, implemented stream processing using Kinesis, and landed data into the S3 data lake.

• Specified nodes and analysed data on Amazon RedShift clusters on AWS.

• Extensive experience with Amazon Web Services along with provisioning and maintaining AWS resources such as EMR, S3 buckets, EC2 instances, and RDS.

• Hands on experience with Google Cloud Services – GCP, BigQuery, GCS Bucket and G-Cloud function

• Proficient in Data Warehousing principles (Fact Tables, Dimensional Tables, Dimensional Data Modeling - Star Schema and Snowflake Schema)

• Utilized Apache Superset's filter and parameter features to create dynamic and interactive dashboards that allow users to tailor their analysis to specific criteria.

• Built Data Lake on AWS cloud and placed transformed data in Snowflake to build dashboards on Tableau.

Technical Skills:

Project Methods: Agile, Scrum, Software Development Lifecycle, Model Development Lifecycle

ETL Data Pipelines: Apache Airflow, Hive, Talend, Informatica Intelligent Cloud Services (IICS), Informatica 9.6 and 10.1, Informatica Power Exchange

Big Data: Hadoop, Spark, Spark Streaming, HDFS, Flume, Hive, MapReduce, Pig, Kafka, Sqoop, Oozie

Databases: Cassandra, HBase, DynamoDB, MongoDB, MS Access, MySQL, Oracle, PL/SQL, SQL, RDBMS, AWS Redshift, Amazon RDS, Teradata.

Data Warehousing: Redshift, Snowflake, Google BigQuery

Programming & Scripting: Python, Scala, Java, Bash, SQL, PL/SQL, C++, ER/Studio

Visualization: MS PowerPoint, Power BI, Tableau, QuickSight, QlikView

Libraries: Pandas, NumPy, SciPy, Scikit-learn, spaCy, Matplotlib, TensorFlow, Keras

Data Modeling: ER/Dimensional Modeling, Star Schema, Snowflake Schema

Cloud Platforms: Amazon Web Services (AWS), Google Cloud Platform (GCP)

Version Control: Git, SVN

Financial Analysis: Risk assessment, fraud detection, anti-money laundering (AML)

Domains: Banking, Telecom, and Retail CPG

EDUCATION:

• Bachelor’s in computer science engineering, JNTU, India.

• Master’s in Computer Networking, University of Sunderland, UK.

CERTIFICATIONS:

• GCP Associate Cloud Engineer - Certification ID: QeNqUH

Professional Experience

United Airlines, Houston, TX Mar 2023 – Present

Sr. Data engineer

Responsibilities:

• Worked on converting loads from Teradata to AWS.

• Led the onboarding process of various applications onto AWS Redshift, facilitating a smooth transition from Teradata.

• Helped individual users across the organization to migrate from Teradata to Redshift.

• Collaborated with end users to resolve data and performance-related issues during the onboarding of new users.

• Performed data analysis by comparing data between Teradata and Redshift; once data discrepancies were identified, moved to the next level of analysis by checking load runtimes and, in a few cases, verifying with the source team.

• Implemented performance optimization strategies, enhancing query performance by optimizing sort keys and distribution keys.

• Converted complex Teradata SQL queries to Redshift-compatible SQL.

• Developed Airflow pipelines to efficiently load data from multiple sources into Redshift and monitored job schedules.
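
As an illustration of the kind of Airflow pipeline described above, the sketch below copies a daily S3 extract into a Redshift staging table. It is a minimal, hypothetical example assuming Airflow 2.x with the Amazon provider installed; the bucket, schema, table, and connection names are placeholders rather than the actual project code.

```python
# Minimal sketch, not the production DAG -- names and connections are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

with DAG(
    dag_id="s3_to_redshift_daily_load",
    start_date=datetime(2023, 3, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # COPY the day's extract from S3 into a Redshift staging table.
    load_orders = S3ToRedshiftOperator(
        task_id="load_orders",
        schema="staging",
        table="orders",
        s3_bucket="example-data-bucket",
        s3_key="orders/{{ ds }}/",
        redshift_conn_id="redshift_default",
        aws_conn_id="aws_default",
        copy_options=["FORMAT AS PARQUET"],
    )
```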

• Successfully migrated data from Teradata to AWS, improving data accessibility and cost efficiency.

• Guided the L1 offshore team through onboarding and through Confluence pages for offshore-hours support.

• Worked on migrating the reports and dashboards from OBIEE to Power BI.

• Assisted multiple users from the data visualization team in connecting to Redshift from Power BI, Power Apps, Excel, Spotfire, and Python.

• Set up a Cloudera Hadoop distribution cluster using AWS EC2.

• Designed cloud-based architecture for scalability to handle growing data volumes.

• Perform thorough testing of the migrated data and ETL processes to ensure data accuracy and completeness.

• Create pipelines for building, testing, and deploying data engineering code and infrastructure as code (IaC).

• Utilized AWS Redshift to store Terabytes of data on the Cloud.

• Used Spark SQL and Data Frames API to load structured and semi-structured data into Spark Clusters

• Used SQL and Azure Synapse for analytics and MS Power BI for reporting.

• Leverage PySpark's capabilities for data manipulation, aggregation, and filtering to prepare data for further processing.

• Implemented AWS Fully Managed Kafka streaming to send data streams from the company APIs to Spark cluster in AWS Databricks.

• Implement data ingestion from various sources into the AWS S3 data lake using AWS Lambda functions.

• Utilize PySpark to extract and transform data from different file formats (CSV, JSON, Parquet) stored in S3.
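
A minimal PySpark sketch of the kind of extraction described above is shown below; the bucket paths and column names are hypothetical placeholders.

```python
# Illustrative only -- paths and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("s3-extract-transform").getOrCreate()

# Read the common landing formats from the S3 data lake.
csv_df = spark.read.option("header", True).csv("s3://example-lake/raw/csv/")
json_df = spark.read.json("s3://example-lake/raw/json/")
parquet_df = spark.read.parquet("s3://example-lake/raw/parquet/")

# Light standardization before downstream processing.
cleaned = csv_df.withColumn("load_date", F.current_date()).dropDuplicates(["id"])
cleaned.write.mode("overwrite").parquet("s3://example-lake/curated/csv_source/")
```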

• Joined, manipulated, and drew actionable insights from large data sources using Python and SQL.

• Implement data enrichment pipelines using PySpark to combine data from Snowflake with additional details from MongoDB.

• Used Spark-SQL and Hive Query Language (HQL) for obtaining client insights

• Develop PySpark ETL pipelines to cleanse, transform, and enrich the raw data.

• Integrate with MongoDB to retrieve relevant information and enrich the existing data.

• Finalized the data pipeline using DynamoDB as a NoSQL storage option.

Google, Austin, TX Mar 2022 – Feb 2023

GCP Data Engineer

Responsibilities:

• Worked on data migration from an RDBMS to a NoSQL database and developed a schema for data deployed in varied data systems.

• Developed Spark applications using Python and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.

• Experienced in Maintaining the Hadoop cluster on GCP using the Google cloud storage, Big Query and Dataproc

• Led the migration/replication from SQL Server to BigQuery using Datastream, GCS buckets, Pub/Sub, and Dataflow.

• Used Apache Airflow in the GCP Composer environment to build data pipelines, using various Airflow operators such as the bash operator, Hadoop operators, Python callables, and branching operators.
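
The bullet above can be illustrated with a small Composer-style DAG that combines a bash task, a Python callable, and a branching operator. This is a hedged sketch assuming Airflow 2.x; the task logic and names are hypothetical.

```python
# Hypothetical Composer DAG sketch -- task names and logic are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import BranchPythonOperator, PythonOperator


def _choose_branch(**_):
    # Placeholder decision; a real DAG might inspect a GCS prefix or a control table.
    new_files_found = True
    return "process_files" if new_files_found else "skip_run"


def _process_files(**_):
    print("processing new files")


with DAG(
    dag_id="composer_branching_example",
    start_date=datetime(2022, 3, 1),
    schedule_interval="0 2 * * *",
    catchup=False,
) as dag:
    check = BranchPythonOperator(task_id="check_for_files", python_callable=_choose_branch)
    process_files = PythonOperator(task_id="process_files", python_callable=_process_files)
    skip_run = BashOperator(task_id="skip_run", bash_command="echo 'nothing to load'")

    check >> [process_files, skip_run]
```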

• Implemented both batch and real-time ingestion workflows in Apache Druid, enabling the ingestion of historical and streaming data for comprehensive analytics.

• Extracted and transformed data from structured and semi-structured formats like JSON, CSV, and Parquet

• Used cloud shell SDK in GCP to configure the services Data Proc, Storage, Big Query

• Designed a GCP Cloud Composer DAG to load data from on-prem CSV files into GCP BigQuery tables, and scheduled the DAG to load in incremental mode.

• Used Google Cloud Functions with Python to load data into BigQuery for CSV files on arrival in a GCS bucket.
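
A Cloud Function of the kind mentioned above might look roughly like the sketch below, which loads a newly arrived CSV object into BigQuery. The dataset and table names are hypothetical, and this assumes a first-generation GCS-triggered function.

```python
# Hypothetical sketch of a GCS-triggered Cloud Function -- dataset/table names are placeholders.
from google.cloud import bigquery


def load_csv_to_bq(event, context):
    """Triggered when a CSV file lands in the GCS bucket; loads it into BigQuery."""
    uri = f"gs://{event['bucket']}/{event['name']}"
    client = bigquery.Client()

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    load_job = client.load_table_from_uri(uri, "example_dataset.landing_table", job_config=job_config)
    load_job.result()  # Block until the load job completes.
```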

• Developed consumer intelligence reports based on market research, data analytics, and social media.

• Implemented data encryption to ensure data protection and compliance.

• Integrated Informatica Cloud (IICS) with PagerDuty for monitoring scheduled jobs.

• Transformed data and developed metrics using Spark SQL to be displayed on dashboards.

• Stage the API or Kafka Data (in JSON file format) into Snowflake DB by Flattening the same for different functional services.

• Worked closely with data analysts, scientists, and business stakeholders to gather requirements and deliver effective data solutions aligned with business needs.

• Analysed the SQL scripts and designed the solution for implementation using PySpark.

• Implemented data quality checks and validation processes within PySpark pipelines to ensure accurate and consistent data.

• Build data pipelines using airflow in GCP for ETL related jobs using different airflow operators.

• Created Alerts which were sent to MS teams for successful completion of informatica jobs using MS Power Automate

• Implemented custom visualizations and charts in Apache Superset using its extensible visualization library, catering to specific business requirements and data insights.

• Configured Snowpipe to pull data from Google Cloud Storage buckets into Snowflake tables.
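
One plausible way to configure the Snowpipe referenced above is a CREATE PIPE statement over a GCS external stage with auto-ingest; the sketch below issues that DDL through the Snowflake Python connector. The account, stage, notification integration, and table names are all hypothetical.

```python
# Hypothetical Snowpipe setup -- connection details and object names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",
    user="example_user",
    password="********",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="STAGING",
)

create_pipe_sql = """
CREATE PIPE IF NOT EXISTS staging.gcs_events_pipe
  AUTO_INGEST = TRUE
  INTEGRATION = 'GCS_NOTIFICATION_INT'
AS
  COPY INTO staging.events
  FROM @staging.gcs_events_stage
  FILE_FORMAT = (TYPE = 'JSON');
"""

cur = conn.cursor()
try:
    cur.execute(create_pipe_sql)  # Snowpipe then auto-loads new files announced via the notification integration.
finally:
    cur.close()
    conn.close()
```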

• Installed Kafka Manager for consumer lag and Kafka metrics monitoring; also used it for adding topics, partitions, etc.

• Designed solutions that scaled to handle growing data volumes while maintaining optimal performance and resource utilization.

• Managed datasets using pandas DataFrames and MySQL; ran MySQL database queries from Python using the MySQL Connector and MySQLdb packages to retrieve information.

• Developed a pre-processing job using Spark DataFrames to flatten JSON documents into flat files.
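
A minimal version of such a pre-processing job is sketched below; the nested schema (user struct, items array) is hypothetical and only illustrates the flattening pattern.

```python
# Illustrative flattening sketch -- the nested schema is hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("flatten-json").getOrCreate()
raw = spark.read.json("gs://example-bucket/raw/events/")

# Promote nested struct fields to top-level columns and explode an array of items.
flat = (
    raw.select(
        F.col("event_id"),
        F.col("user.id").alias("user_id"),
        F.col("user.address.city").alias("city"),
        F.explode_outer("items").alias("item"),
    )
    .select("event_id", "user_id", "city", F.col("item.sku").alias("sku"))
)

flat.write.mode("overwrite").option("header", True).csv("gs://example-bucket/flat/events/")
```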

• Worked on data warehouse migration to GCP and modernize analytical capabilities.

• Perform Data Cleaning, features scaling, and features engineering using pandas and NumPy packages in Python.

• Leveraged Fivetran transformations to enrich data sets by aggregating, summarizing, and calculating key metrics, providing valuable insights for business intelligence and reporting.

• Integrated PySpark applications with GCP for scalable and distributed data processing

• Tuned PySpark jobs for optimal performance by configuring partitioning, caching, and utilizing Data Frame optimization techniques.

• Exported the analysed data to Teradata using Sqoop for visualization and to generate reports for the BI team

• Migrated the computational code in HQL to PySpark

• Implemented Workload Management (WLM) in Redshift to prioritize basic dashboard queries over more complex, longer-running ad hoc queries, which allowed for a more reliable and faster reporting interface, giving sub-second query responses for basic queries.

• Utilized PySpark’s window functions and advanced aggregation techniques to perform time-series analysis and sliding window computations.
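
The sliding-window computations mentioned above can be illustrated with the short sketch below; the table and column names are hypothetical.

```python
# Hypothetical time-series windowing sketch -- dataset and columns are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("timeseries-windows").getOrCreate()
metrics = spark.read.parquet("gs://example-bucket/curated/daily_metrics/")

ordered = Window.partitionBy("device_id").orderBy("event_date")
trailing_7 = ordered.rowsBetween(-6, 0)  # current row plus the six preceding rows

rolling = (
    metrics
    .withColumn("rolling_7d_avg", F.avg("reading").over(trailing_7))
    .withColumn("prev_reading", F.lag("reading", 1).over(ordered))
)
rolling.show(5)
```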

• Executed complex join operations using PySpark to combine data from multiple sources and create unified datasets.

• Assess the existing on-premises data infrastructure, including data sources, databases, and ETL processes.

• Version controls the ETL code and configurations using tools like Git.

• Created automated Python scripts to convert data from different sources and to generate ETL pipelines.

• Converted SQL queries into Spark transformations using Spark RDDs, Python, and Scala.

• Implemented in-memory data computation to generate the output response.

• Tracked work and tasks in Jira (Kanban board) and ServiceNow following Agile methodology.

Comcast, Denver, CO Mar 2019 – Feb 2022

Sr. Database developer /Data Analyst

Responsibilities:

• Involved in creating detailed LLD (Low Level Design) document which includes all Business requirements and technical specifications.

• Worked on designing logical and physical data models for master data and transactional data.

• Prepared the mapping document detailing the source-to-target data model mapping with the applicable business rules.

• Support and Maintenance of Monthly Metric Reports for LOB Analysis.

• Designed and developed audit control process for PL/SQL jobs to keep track of all the job execution along with balance and reconciliation reports.

• Worked on developing ETL workflows on the data obtained using Python for processing it in HDFS and HBase using Flume.

• Involved in ETL Design and development sessions.

• Developed PL/SQL programs to load and validate the data into staging environment

• Tuned legacy operations and automated the analysis reports.

• Developed Tableau Reports to display monthly analysis on each LOB requirements. Designed and Implemented the ETL for the report flow, created Dynamic Data Types, Implemented Data Aggregation in Tableau.

• Involved in Performance considerations in Tableau vis, calculation, and Dashboard design.

• Involved in creating OBIEE dashboards and migrating OBIEE Dashboard reports to Tableau.

• Automated Daily outlook email reports to Tableau display.

• Experienced in creating the sites and access control in Tableau Server.

• Used AWS tools like S3 and Athena to query large amounts of data stored in S3 buckets.
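
As an illustration of querying S3 data through Athena, the sketch below runs a query with boto3 and polls for completion. The database, table, and results bucket are hypothetical.

```python
# Hypothetical Athena query sketch -- database, table, and output location are placeholders.
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="SELECT lob, COUNT(*) AS row_count FROM usage_events GROUP BY lob",
    QueryExecutionContext={"Database": "analytics_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
query_id = response["QueryExecutionId"]

# Poll until the query reaches a terminal state, then read the first page of results.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if status in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if status == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print(f"fetched {len(rows)} result rows (including header)")
```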

• Developed and maintained ETL pipelines using Apache Spark and Python on Google Cloud Platform (GCP) for large-scale data processing and analysis.

• Designed and implemented efficient data models and schema designs using Big Query for optimized data querying and storage.

• Outlined a scalable and efficient data architecture that integrates Snowflake, Oracle, GCP services, and other relevant components.

• Recognized data flow patterns, considering data extraction, transformation, loading, and analytics processing.

• Develop ETL pipelines to extract data from Oracle databases using efficient methods like change data capture (CDC) or scheduled batch processing.

• Worked with various data sources including structured, semi-structured and unstructured data to develop data integration solutions on GCP.

• Implemented real-time data processing using Spark, GCP Cloud Composer, and Google Dataflow for streaming data processing and analysis.

• Built data ingestion pipelines (Snowflake staging) using disparate sources and other data formats to enable real-time data processing and analysis.

• Integrated data pipelines with various data visualization and BI tools such as Tableau and Looker for dashboard and report generation.

• Mentored junior data engineers and provided technical guidance on best practices for ETL data pipelines, Snowflake, Snow pipes, and JSON.

• Implemented data security measures such as access control policies and data encryption to ensure data protection and compliance with regulatory requirements.

• Collaborated with cross-functional teams including data scientists and analysts to deliver data-driven solutions.

• Worked with stakeholders to understand business requirements and translate them into technical data engineering solutions.

American Electric Power, Columbus, OH Apr 2017 – 2019

BI Developer/Data Analyst

Responsibilities:

• Involved in planning and designing database schema and provided documentation.

• Based on business requirement, developed the Complex SQL queries with Joins and T-SQL, PL/SQL Stored procedure, Views, Trigger to implement the business rules and transformations.

• Extensively worked in ETL and data integration in developing ETL mappings and scripts using SSIS, Worked on Data transfer from a Text file to SQL Server by using bulk insert task in SSIS.

• Developed database objects including tables, clusters, Indexes, views, sequences, packages, triggers, and procedures to troubleshoot any database problems.

• Involved in Performance considerations in Tableau VIS, calculation, and Dashboard design.

• Automated daily Outlook email reports to Tableau display. Created Gantt charts and box charts for data display.

• Developed PL/SQL programs to load and validate the data into staging environment

• Tuned legacy operations and automated the analysis reports.

• Experienced in working with cross-functional technical teams.

• Experienced in creating AD queue with single user consumption and multiuser consumption.

• Involved in creating Word Cloud bubble charts in tableau to track the trends in a project.

• Used SQL loader and external tables in loading Bulk files Daily/Monthly Frequency.

• Involved in writing External table scripts with complex shell scripting and data scrubbing based on the Business requirements.

• Worked on importing and cleansing high-volume data from various sources like mainframe, Sybase, flat files, and Excel to and from SQL Server.

• Built interactive Tableau dashboards utilizing Parameters, calculated fields, Table Calculations, User Filters to handle views more efficiently.

• Developed various reports using best practices and different visualizations like Bars, Lines and Pies, Maps, Gantt Charts, Bubble Charts, Histograms, Bullet Charts and Highlight tables.

Huntington National Bank, Columbus, OH Feb 2015 – Feb 2017

BI Developer

Responsibilities:

• Develop SSIS, PL/SQL Packages based on the requirements. Gathering requirements from the Business Users for SSIS, PL/SQL Packages. Getting the data from different sources to the Staging area.

• Scheduling work sessions to help non-technical team members to understand the SQL queries used to create and run various PL/SQL, SSIS Packages used by the team.

• Developed various complex stored procedures, packages, interfaces, and triggers in PL/SQL.

• Created various SQL and PL/SQL scripts for verification of the required functionalities.

• Developed Database Triggers to enforce Data integrity and additional Referential Integrity.

• Developed complex SQL queries for data retrieval from various database objects including tables and views.

• Worked with DBA extensively to create External tables, Loading schedules, Redo & Undo logging. Worked on various backend Procedures and Functions using PL/SQL.

• Performance tuning of the existing packages. Changing the packages according to the new requirements so that they can fetch data from new tables under new databases.

• Managing the daily, weekly, and monthly packages for the team.

• Working on complex stored procedures and performance tuning the procedures which are slow.

• Creating documents that help the business team understand the various SSIS packages they use, and creating worksheets they can use for running the packages on a monthly basis.

• Worked in Tableau environment to create dashboards like Yearly, monthly reports using Tableau desktop & publish them to server.

• Converted Excel reports to Tableau dashboards with rich visualization and good flexibility.

United Nations, NYC Jun 2012 – Dec 2014

Database Analyst/Report Developer

Responsibilities:

• Generated reports using MS SQL Server Reporting Services 2005/2008, UNIX Shell scripts from OLTP and OLAP data sources.

• Writing complex SQL Queries used by SSRS Reports.

• Created new logical and physical design of database to fit new business requirement and implemented new design into SQL Server 2005.

• Designed and developed audit control process for SQL jobs to keep track of all the job execution along with balance and reconciliation reports.

• Working with database connections, SQL joins, cardinalities, loops, aliases, views, aggregate conditions, parsing of objects, and hierarchies.

• Involved actively with the PM role to understand and assign tasks to offshore team and get the daily updates, solely responsible for representing the offshore team.

• Performed SQL tuning using Explain Plan, Hints, and indexes. Responsible for performing code reviews.

• Created SSIS packages to load data into temporary staging tables. Utilized SQL*Loader to load flat files into database tables.

• Involved in table and index partitioning for performance improvement and manageability and identified, tested, and resolved database performance issues (monitoring and tuning) to ensure database optimization.

• Involved in creating the test Cases and validation scripts in SQL to test the migration of data from one data warehouse and OLAP to another.

• Scheduling work sessions to help non-technical team members understand the PL/SQL queries used to create SSRS Reports and to run the various SSIS Packages used by the team.

Prudential Financial, Scranton, PA May 2011 – May 2012

BI Developer

Responsibilities:

• Developed stored procedures to transform the data from enterprise data warehouse to SQL server and load data to fact tables.

• Transferred data from various sources like MS Excel, MS Access, and SQL Server using SSIS and then created reports using this data using SSRS. Created new and converted several Crystal reports into detailed SSRS reports. Debugged reports for the business end users.

• Involved in table and index partitioning for performance improvement and manageability and identified, tested, and resolved database performance issues (monitoring and tuning) to ensure database optimization.

• Performed T-SQL script tuning and optimization of queries for reports that take longer execution time using MS SQL Profiler, index tuning wizard and SQL Query Analyzer in MS SQL Server 2005.

• Generated reports using SSRS that could be used to send information to different primary vendors, clients, and managers.

• Involved in documentation for Reports, DTS and SSIS packages.

• Designed and implemented stored procedures and triggers for automating tasks.

• Developed trend indicating reports which monitor changes in different categories and record historical analysis and displays them in a single dashboard.

• Developed graphs and charts to display telecommunication data categorized by hierarchy and organization.

• Developed Parameterized, Drill Through, Rolling, Linked and Sub-reports with specific KPIs. Developed indicators for rolling reports and drill down reports.

• Work simultaneously with slowly changing dimensions and real time data to report accurately.

• Generated Reports using Global Variables, Expressions and Functions for the reports and created stylish Report Layouts and Deployed the Reports using SSRS 2005/2008.

• Wrote Complex Stored Procedures, Triggers, Views and Queries. Created indexes, Constraints, and rules on database objects.


