Data Engineer Senior

Location:
Chicago, IL
Posted:
February 29, 2024

Shiva Sai Bandari

Senior Data Engineer

Phone: +1-872-***-****

Email: ad30fo@r.postjobfree.com

LinkedIn: http://linkedin.com/in/shiva-sai-bandari-

Experience Summary:

With a decade of IT experience, including 5 years of specialized expertise as an AWS Data Engineer, I focus on big data solutions in cloud environments. Proficient in Informatica PowerCenter, OLTP, and OLAP, I work extensively with Hadoop and Agile methodologies to deliver robust data solutions.

Proficient in managing AWS cloud databases (Redshift/RDS) using PySpark for effective data warehousing.

Exceptional proficiency in AWS (Amazon Web Services), encompassing S3, Amazon RDS, IAM, EMR, Redshift, and Apache Spark RDD concepts. Additionally, I am adept at developing Logical Data Architecture in alignment with Enterprise Architecture principles.

Led the development of a data pipeline on Amazon AWS, orchestrating the extraction of data from source systems and its storage in Amazon EMR.

Integrated Snowflake cloud data warehouse and AWS S3 bucket, connecting multiple source systems and facilitating the loading of nested JSON formatted data into Snowflake tables.

Hands-on experience with AWS EC2, involving server configuration for Auto Scaling and Elastic Load Balancing.

Executed the migration of an on-premises application to AWS, employing AWS services such as EC2 and S3 for processing and storing small data sets. Proficient in maintaining Hadoop clusters on AWS EMR.

Engaged in the end-to-end Big Data flow using MapReduce, encompassing data ingestion from upstream systems to HDFS and subsequent processing within Hadoop clusters.

Extensive expertise in designing and developing ETL solutions using Ab Initio 3.1.6, Informatica 9.1/9.5, and DataStage.

Experience in Informatica Master Data Management (MDM), PowerExchange, Change Data Capture, Enterprise Data Quality, and Informatica Proactive Monitoring (IPM) using RulePoint.

Engineered automation and SFTP processes by creating shell and Python scripts.

Proficiently employed ETL techniques to efficiently load data from diverse sources, including Oracle, MS SQL Server 2008/2012, Flat Files, XML files, and Teradata.

Engaged in projects using Agile methodologies, adapting to diverse project management approaches.

Proficient in data profiling, employing Frequency Reports in IIR (Informatica Identity Resolution) - both Auto and Custom modes.

Expertise in working with diverse data sources, including Oracle 11g/10g/9i/8x, SQL Server 2008/2005, DB2 8.0/7.0, Teradata, flat files, XML, COBOL, and Mainframe systems.

Developed and optimized SQL scripts in PL/SQL to extract, transform, and load data from Oracle databases to AWS data storage solutions like S3, Redshift, and RDS.

Solid understanding of database and data warehousing concepts, encompassing both OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing).

Designed, rigorously tested, and successfully deployed plans using Informatica Data Quality (IDQ).

Proven expertise in handling large-volume data integration and designing data warehouse solutions exceeding 10 TB.

Applied practical understanding of Star Schema and Snowflake Schema Methodology using Data Modeling tool Erwin 4.0/4.2.

Designed and implemented Partitioned and Bucketed Hive tables in Parquet file format with Snappy compression, adhering to Star Schema principles.

Extensive experience with Python, Pig, Sqoop, Oozie, Hadoop Streaming, and Hive in the context of data processing and analytics.

In-depth understanding of the Hadoop Distributed File System (HDFS), demonstrating a strong grasp of distributed storage and processing concepts.

Analyzed and transformed stored data by writing MapReduce or Pig jobs based on business requirements.

Extracted data from Oracle and SQL Server, then used Teradata for data warehousing.

Implemented Power Exchange to seamlessly integrate diverse sources, including Mainframe MVS, VSAM, GDG, DB2, and XML files.

Proficient in Microsoft Power BI for designing interactive dashboards and generating insightful reports, using a wide range of visualization tools and features to transform complex data into actionable insights. Skilled in Power BI Desktop and Service for data modeling, visualization customization, and report optimization, enabling effective communication of data-driven insights to stakeholders.

Education:

Chicago State University (Aug 2012-Dec 2013)

-Master's in Computer Science

Malla Reddy Engineering College (Aug 2008-May 2012)

-Bachelor's in Mechanical Engineering

Certifications:

AWS Certified Solutions Architect - Associate

Technical Skills:

Cloud : AWS (Amazon EC2, Amazon VPC, Amazon S3, Lambda, IAM, Redshift, Amazon RDS, Amazon Kinesis, Athena, EMR, Glue).

Data Warehousing / ETL : Informatica PowerCenter 9.1/9.5, PowerExchange, SSIS, Redshift.

RDBMS : Oracle 11g/9i/8i, MS SQL Server 2008/2005, Teradata 12/13/14/15

Database Languages : SQL, PL/SQL, T-SQL, N-SQL

Office Package : MS-Office 2007/XP/2003/2000

Scripting : Unix shell, Python.

Programming Languages : Python, Scala.

Scheduler Tools : Autosys, Airflow.

Reporting Tools : Power BI, Tableau (exposure).

Database Utilities : SQL Server Management Studio (SSMS), SQL Loader.

Big Data Technologies : HiveQL, Pig Latin, Hadoop basics, Oozie.

PROJECT DETAILS:

Client: AT&T, Dallas, TX. Nov 2022 to Present

Role: Senior AWS Data Engineer

Use Case: As a Senior AWS Data Engineer at AT&T, I optimized telecommunications network performance by analyzing user behavior and ensuring data quality. Leveraging AWS services such as EMR, S3, Glue, and Athena, I ingested and transformed data for integration with Redshift. Using Spark applications in Databricks, I extracted insights from various file formats to uncover customer usage patterns, and I employed Redis for efficient in-memory data storage and manipulation to enhance analytics.

Responsibilities:

Ingested data into the data lake (S3) and employed AWS Glue to expose and transform the data seamlessly for integration with Redshift.

Configured EMR clusters for efficient data ingestion and employed dbt (data build tool) to perform transformative processes on data within Redshift.

Implemented job scheduling in Airflow to automate the seamless ingestion process into the data lake.
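
For illustration, a minimal sketch of the kind of daily Airflow DAG behind such scheduling follows; the DAG id, bucket names, and copy logic are hypothetical, not the production pipeline.

    # Hypothetical daily ingestion DAG; bucket names and task logic are illustrative only.
    from datetime import datetime, timedelta

    import boto3
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def ingest_to_data_lake(**context):
        """Copy one day's raw files from the landing zone into the S3 data lake (placeholder logic)."""
        s3 = boto3.client("s3")
        run_date = context["ds"]  # Airflow's execution date, e.g. '2024-01-01'
        s3.copy_object(
            Bucket="example-data-lake",                    # assumed target bucket
            CopySource={"Bucket": "example-landing-zone",  # assumed source bucket
                        "Key": f"raw/{run_date}/events.json"},
            Key=f"bronze/{run_date}/events.json",
        )

    with DAG(
        dag_id="daily_ingestion_to_s3",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",   # one run per day
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        PythonOperator(task_id="ingest_to_data_lake", python_callable=ingest_to_data_lake)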

Orchestrated multiple ETL jobs using AWS Step Functions and Lambda, and used AWS Glue for loading and preparing data for customer analytics.

Collaborated with AWS services like RDS, S3, and Redshift in conjunction with PL/SQL for data integration and analytics purposes.

Implemented a serverless architecture using API Gateway and Lambda, deploying AWS Lambda code from an S3 bucket.

Developed a Lambda deployment function and configured it to receive events from an S3 Bucket.
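
A minimal sketch of such an S3-triggered Lambda handler is shown below; the bucket layout and curated prefix are assumptions rather than the deployed function.

    # Hypothetical Lambda handler for S3 ObjectCreated events; the target prefix is an assumption.
    import json
    import urllib.parse

    import boto3

    s3 = boto3.client("s3")

    def lambda_handler(event, context):
        """Copy each newly created object into a curated prefix for downstream processing."""
        records = event.get("Records", [])
        for record in records:
            bucket = record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            s3.copy_object(
                Bucket=bucket,
                CopySource={"Bucket": bucket, "Key": key},
                Key=f"curated/{key.split('/')[-1]}",   # assumed curated prefix
            )
        return {"statusCode": 200, "body": json.dumps(f"processed {len(records)} objects")}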

Wrote Java and Node.js APIs for AWS Lambda to manage several AWS services.

Created Glue ETL jobs using S3, Redshift, NoSQL stores (DynamoDB and MongoDB), Lambda, Secrets Manager, and other AWS services (Python, SQL).

Utilized AWS Lambda to run code without managing servers, implementing S3 event triggers for code execution.

Engineered data transition programs from DynamoDB to AWS Redshift through an ETL process using AWS Lambda.
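
One simplified way such a DynamoDB-to-Redshift transfer can be staged through S3 is sketched below; the table, bucket, cluster, and IAM role names are placeholders, not the actual pipeline.

    # Illustrative DynamoDB -> S3 -> Redshift COPY flow; all resource names are placeholders.
    import json

    import boto3

    dynamodb = boto3.resource("dynamodb")
    s3 = boto3.client("s3")
    redshift_data = boto3.client("redshift-data")

    def export_and_load(table_name="example_events",
                        bucket="example-staging-bucket",
                        key="staging/events.json"):
        # 1. Scan the DynamoDB table (small tables only; large tables would be paginated).
        items = dynamodb.Table(table_name).scan()["Items"]

        # 2. Stage the rows in S3 as newline-delimited JSON.
        body = "\n".join(json.dumps(item, default=str) for item in items)
        s3.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))

        # 3. Ask Redshift to COPY the staged file (role ARN is a placeholder).
        redshift_data.execute_statement(
            ClusterIdentifier="example-cluster",
            Database="analytics",
            DbUser="etl_user",
            Sql=f"COPY staging.events FROM 's3://{bucket}/{key}' "
                "IAM_ROLE 'arn:aws:iam::123456789012:role/example-copy-role' FORMAT AS JSON 'auto';",
        )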

Implemented solutions on the AWS cloud computing platform using RDS, Python, DynamoDB, S3, and Redshift.

Utilized AWS SDKs in Python to integrate with internal systems and services, streamlining data processing workflows and enhancing automation capabilities.

Utilized AWS Lake Formation to design and implement scalable data lakes for storing, cataloging, and securing structured and unstructured data.

Utilized Redis Sorted Sets and Hashes to efficiently store and manipulate structured data in memory, enabling fast retrieval and aggregation of data for analytics and reporting purposes.
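
A small sketch of how Redis Hashes and Sorted Sets can back this kind of in-memory aggregation, using redis-py with placeholder keys and values:

    # Illustrative use of Redis Hashes and Sorted Sets via redis-py; keys and fields are placeholders.
    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    # Hash: one record per customer, keyed by customer id.
    r.hset("customer:1001", mapping={"plan": "unlimited", "region": "midwest"})

    # Sorted set: customers ranked by monthly data usage (score = GB used).
    r.zadd("usage:2024-01", {"customer:1001": 42.5, "customer:1002": 17.0})

    # Fast retrieval for reporting: top 10 heaviest users this month, enriched from the hash.
    for member, gb_used in r.zrevrange("usage:2024-01", 0, 9, withscores=True):
        print(member, gb_used, r.hgetall(member))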

Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation across multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.

Worked with various file formats, including delimited text files, clickstream log files, Apache log files, Avro, JSON, and XML files, and used columnar file formats such as RCFile, ORC, and Parquet.

Engineered automated data archiving workflows by integrating AWS Kinesis with data lake solutions such as Amazon S3 and AWS Glue, enabling seamless storage and analysis of real-time streaming data alongside batch data sources.

Developed PySpark scripts and UDFs involving both DataFrames and RDDs, using Spark SQL for aggregations and queries and writing data back into the OLTP system directly or through Sqoop.
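
A condensed sketch of this DataFrame/UDF/Spark SQL pattern follows; the column names, paths, and banding rule are made up for illustration.

    # Illustrative PySpark UDF plus DataFrame aggregation; column names and paths are made up.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

    @F.udf(returnType=StringType())
    def usage_band(gb_used):
        """Bucket raw usage into coarse bands for reporting."""
        if gb_used is None:
            return "unknown"
        return "heavy" if gb_used > 50 else "light"

    events = spark.read.parquet("s3://example-bucket/bronze/usage/")   # assumed source path
    daily = (
        events.withColumn("band", usage_band(F.col("gb_used")))
              .groupBy("customer_id", "band")
              .agg(F.sum("gb_used").alias("total_gb"))
    )
    daily.createOrReplaceTempView("daily_usage")
    spark.sql("SELECT band, COUNT(*) AS customers FROM daily_usage GROUP BY band").show()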

Administered and managed Informatica PowerCenter environments, ensuring optimal performance and reliability.

Ensured data quality and integrity using PL/SQL-based data validation checks in AWS environments.

Handled metadata management, security, and configuration of Informatica components.

Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.

Migrated on-premises database structure data to AWS Redshift data warehouse.

Integrated Snowflake cloud data warehouse and AWS S3 bucket, connecting multiple source systems and facilitating the loading of nested JSON formatted data into Snowflake tables.

Implemented data quality checks and validation rules in AWS Lake Formation to maintain data integrity and reliability within the data lake.

Configured and managed AWS VPN connections to securely connect on-premises networks to AWS resources, ensuring secure data transmission and adherence to security best practices.

Developed custom SQS message processing logic using AWS Lambda functions to implement business-specific data processing tasks, enhancing flexibility and extensibility of data workflows.

Created and managed IAM users, groups, and roles, assigning appropriate permissions based on job roles and responsibilities.

Developed ETL pipelines utilizing Spark and Hive to execute diverse business-specific transformations, considering Snowflake Schema methodology.

Proficient in deploying applications on Kubernetes clusters, leveraging its container orchestration capabilities.

Implemented workload management in Redshift to prioritize basic dashboard queries over more complex longer-running ad-hoc queries.

Designed and developed ETL jobs to extract data from various data applications and load it into the data mart in Redshift.

Developed MapReduce jobs for extracting, transforming, and loading data between HDFS and relational databases using Sqoop.

Developed and optimized SQL scripts in PL/SQL to extract, transform, and load data from Oracle databases to AWS data storage solutions like S3, Redshift, and RDS.

Utilized Infrastructure as Code (IaC) tools like AWS CloudFormation or Terraform to provision and manage Kubernetes infrastructure on AWS in a declarative manner.

Implemented MapReduce programs to handle large volumes of structured and semi-structured data, ensuring efficient parallel processing.

Ensured data integrity and accuracy by validating and cleansing input data prior to loading using SQL Loader features and transformations.

Responsible for loading and transforming huge sets of structured, semi-structured, and unstructured data.

Designed and developed end-to-end ETL processing from Oracle to AWS using Amazon S3, EMR, and Spark.

Written SQL Scripts and PL/SQL Scripts to extract data from Database to meet business requirements and for Testing Purposes.

Environment: AWS S3, Redshift, EMR, Airflow, Spark, Lambda, API Gateway, DynamoDB, MongoDB, RDS, Jenkins, Git, Docker, MapReduce, Kubernetes, Databricks, HDFS, Snowflake, CloudWatch, Hive, PySpark, SQL, Informatica, PL/SQL, Infrastructure as Code (IaC), SQL Loader, IAM, AWS VPN, SDK, Lake Formation, Kinesis, Redis, SQS.

Client: MasterCard, Illinois. Dec 2018 to Oct 2022

Role: Senior AWS Data Engineer

Use Case: As a Senior AWS Data Engineer at MasterCard, I led a real-time fraud detection project. Utilizing AWS Kinesis, I integrated live transaction data with S3 and Glue. By implementing Amazon SNS for alerts, we responded swiftly to suspicious activities. My role involved translating requirements into scalable solutions using AWS services and SDKs. This initiative optimized fraud detection, enhancing MasterCard's security measures.

Responsibilities:

Extensively utilized AWS services, including EMR, S3, Glue Metastore, and Athena, in the construction of robust data applications.

Hands-on experience in creating and managing PostgreSQL instances through Amazon RDS.

Leveraged Amazon Elastic Compute Cloud (EC2) infrastructure for computational tasks and utilized Simple Storage Service (S3) as the primary storage mechanism.

Proficient in utilizing AWS utilities, including EMR, S3, and CloudWatch, for executing and monitoring Hadoop and Spark jobs on the AWS platform.

Engaged in the end-to-end Big Data flow using PL/SQL, involving data ingestion, transformation, and loading processes on AWS cloud platforms.

Played a significant role in the extensive migration and rewriting of existing Oozie jobs to AWS Simple Workflow, enhancing workflow management efficiency.

Integrated AWS Kinesis with data lake solutions such as S3 and Glue for storing and analyzing real-time streaming data alongside batch data sources.

Designed and implemented event-driven architectures using Amazon SNS for real-time alerts and notifications.
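
A minimal sketch of publishing a real-time alert to an SNS topic with boto3; the topic ARN and payload fields are assumptions.

    # Illustrative SNS alert publisher; the topic ARN and payload shape are assumptions.
    import json

    import boto3

    sns = boto3.client("sns")

    def publish_fraud_alert(transaction_id, score):
        """Send a real-time alert when a transaction's fraud score crosses a threshold."""
        sns.publish(
            TopicArn="arn:aws:sns:us-east-1:123456789012:example-fraud-alerts",
            Subject="Suspicious transaction detected",
            Message=json.dumps({"transaction_id": transaction_id, "fraud_score": score}),
        )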

Worked closely with stakeholders to understand business requirements and translate them into technical solutions, leveraging AWS services and SDKs to deliver robust and scalable data processing solutions.

Led the design, implementation, and architecture of expansive data intelligence solutions on large-scale big data platforms.

Conducted analysis on large and critical datasets utilizing HDFS, HBase, Hive, HQL, Pig, Sqoop, and Zookeeper.

Executed multiple Proof of Concepts (POCs) using Spark and Scala, deployed on the YARN cluster. Conducted performance comparisons between Spark, Hive, and SQL.

Integrated CloudFormation with AWS Step Functions to orchestrate complex data processing workflows, enabling seamless coordination and execution of ETL jobs and data transformations.

Hands-on experience in fetching live stream data from UDB into HBase tables using PySpark streaming and Apache Kafka.
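
A simplified Structured Streaming sketch of consuming a Kafka topic in PySpark follows; the brokers, topic, paths, and sink are placeholders, and the actual HBase write path is omitted.

    # Illustrative Kafka -> PySpark Structured Streaming read; brokers, topic, and sink are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

    stream = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")   # assumed broker
        .option("subscribe", "transactions")                 # assumed topic
        .option("startingOffsets", "latest")
        .load()
    )

    parsed = stream.select(
        F.col("key").cast("string"),
        F.col("value").cast("string").alias("payload"),
        F.col("timestamp"),
    )

    def write_batch(batch_df, batch_id):
        # Placeholder sink; in the real pipeline each micro-batch would be written to HBase.
        batch_df.write.mode("append").parquet("s3://example-bucket/landing/transactions/")

    (parsed.writeStream
           .foreachBatch(write_batch)
           .option("checkpointLocation", "s3://example-bucket/checkpoints/transactions/")
           .start()
           .awaitTermination())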

Implemented fault-tolerant data processing workflows in PySpark, utilizing resilient distributed datasets (RDDs) and lineage tracking mechanisms to ensure data reliability and integrity.

Developed programs in Spark using Python (PySpark) packages, focusing on performance tuning, optimization, and data quality validations.

Contributed to RDD (Resilient Distributed Datasets) Architecture, implementing Spark operations on RDD, and optimizing transformations and actions within the Spark framework.

Engaged in the end-to-end Big Data flow of the application, encompassing data ingestion from upstream to HDFS, and subsequent processing and analysis of the data within HDFS, considering Star Schema requirements.

Utilized Zookeeper for distributed coordination and synchronization in Hadoop environments.

Implemented and maintained Zookeeper configurations to manage distributed systems and maintain consensus among multiple nodes.

Configured Autosys job schedules to trigger AWS Glue jobs for seamless data transformation and integration with Redshift, enhancing data availability and accessibility for analytics purposes.
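
A small sketch of the kind of wrapper script an Autosys schedule could invoke to start and poll a Glue job; the job name is an assumption. The non-zero exit code lets Autosys mark the job as failed and apply its own retry rules.

    # Illustrative wrapper an Autosys job might call to run a Glue job; the job name is an assumption.
    import sys
    import time

    import boto3

    glue = boto3.client("glue")

    def run_glue_job(job_name="example-redshift-load"):
        run_id = glue.start_job_run(JobName=job_name)["JobRunId"]
        while True:
            state = glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]["JobRunState"]
            if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"):
                return state
            time.sleep(30)   # poll every 30 seconds

    if __name__ == "__main__":
        sys.exit(0 if run_glue_job() == "SUCCEEDED" else 1)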

Utilized SSMS to perform various database administration tasks, including database creation, configuration, and maintenance.

Implemented Kafka for real-time data streaming, enhancing the architecture with scalable and fault-tolerant event-driven communication.

Executed SQL queries in dimensional and relational data warehouses, conducting comprehensive Data Analysis and Profiling using intricate SQL queries across various systems.

Developed a PySpark framework to efficiently transfer data from DB2 to Amazon S3, optimizing the extraction and storage processes.

Implemented backup and recovery strategies for AWS Aurora databases to ensure data resilience and disaster recovery readiness.

Engineered a framework for the conversion of existing PowerCenter mappings into PySpark (Python and Spark) jobs, ensuring seamless transition and leveraging the capabilities of Python and Spark technologies.

Implemented efficient and scalable data transformations on the ingested data leveraging the Spark framework.

Developed various custom UDFs in Spark for performing transformations on date fields, complex string columns, encrypting PI fields, etc.

Implemented VPC peering connections and VPN gateways to establish secure connectivity between VPCs, on-premises data centers, and external networks, facilitating hybrid cloud deployments and cross-region communication.

Implemented workload management in Redshift to prioritize basic dashboard queries over more complex, longer-running ad hoc queries, maintaining Star Schema efficiency.

Engineered ETL pipelines utilizing Spark and Hive to execute diverse business-specific transformations.

Efficiently loaded processed data into Redshift tables, facilitating downstream ETL and Reporting teams to seamlessly consume the processed data.

Proactively engaged in data modeling discussions and adeptly troubleshooted and resolved data processing issues.

Applied MapReduce for implementing custom data transformations and aggregations.

Played a key role in the comprehensive migration of our existing on-premises data pipelines to AWS cloud, focusing on improved scalability and infrastructure maintenance.

Engaged in projects using Agile/Scrum methodologies, adapting to diverse project management approaches.

Participated in multiple requirements gathering sessions with source system teams and business users.

Contributed to the development of input adapters for data dumps from FTP Servers using Apache Spark.

Automated the data pipeline to conduct ETL processes for all datasets, including both full loads and incremental loads of data.

Implemented Oozie and Oozie Coordinators to automate and schedule our data pipelines effectively.

Collaborated in the design and preparation of Low-Level Design/Source-Target job specifications, ensuring detailed and comprehensive documentation, adhering to Snowflake Schema principles.

Developed and maintained UNIX shell scripts for the automation of ETL jobs, ensuring efficient data processing and validation.

Collaborated on version control and code management using GitHub, ensuring seamless collaboration among development teams.

Implemented Git workflows, pull requests, and code reviews on GitHub for efficient and organized software development.

Utilized Tableau to create customized, interactive reports, worksheets, and dashboards.

Worked on extracting, transforming, and loading data from UDB into the data lake and other target systems.

Environment: EMR, S3, Glue Metastore, Athena, EC2, Kinesis, CloudWatch, Redshift, RDS, PostgreSQL on Amazon RDS, HDFS, HBase, Hive, Pig, Sqoop, Zookeeper, Oozie, Spark, Scala, YARN cluster, Kafka, PySpark, UDB, SQL, MapReduce, SDK, Tableau, Snowflake, UNIX Shell Scripting, GitHub, Git workflows, pull requests, SSMS, Autosys, SNS, VPC, Aurora, CloudFormation.

Client: CVS Pharmacy, Woonsocket, RI. Aug 2016 to Nov 2018

Role: Big Data Engineer.

Use Case: During my tenure as a Big Data Engineer at CVS Pharmacy, I led the development of prototypes and executed intricate big data projects. My role involved managing large datasets across various platforms, using tools like Sqoop to seamlessly transfer data between HDFS and RDBMS. I also developed Python scripts to interface with the Cassandra Rest API, enabling smooth data transformation and loading into Hive. Spark Streaming APIs were utilized for real-time data processing, particularly in building a common learner data model from Kafka. Collaborating closely with MDM systems teams, I contributed to technical aspects and report generation. ETL processes were streamlined using FLUME and SQOOP, and algorithms were optimized in Hadoop using Spark. Additionally, I executed MapReduce jobs in Pig and Hive, managed structured data efficiently using Cassandra, and leveraged Tableau for generating daily data reports from Hive.

Responsibilities:

Organized the development of prototypes for selected solutions and implemented complex big data projects. Focused on collecting, parsing, managing, analyzing, and visualizing large datasets across multiple platforms.

Expertise in importing and exporting data using Sqoop between HDFS and RDBMS, ensuring seamless migration in alignment with client requirements.

Implemented a Python script to invoke the Cassandra Rest API, conducted necessary transformations, and loaded the data seamlessly into Hive.
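
A condensed sketch of that script's pattern follows; the endpoint URL, field names, and Hive table are assumptions, and the real transformation logic is not reproduced.

    # Illustrative REST-to-Hive load; the endpoint, fields, and table name are assumptions.
    import requests
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType

    spark = SparkSession.builder.appName("cassandra-rest-to-hive").enableHiveSupport().getOrCreate()

    # 1. Pull rows from the (hypothetical) Cassandra REST endpoint.
    resp = requests.get("https://example-cassandra-api.internal/v1/prescriptions", timeout=60)
    resp.raise_for_status()
    rows = resp.json()   # expected: list of dicts

    # 2. Light transformation: keep only the fields needed downstream.
    schema = StructType([
        StructField("rx_id", StringType()),
        StructField("store_id", StringType()),
        StructField("filled_date", StringType()),
    ])
    records = [(r.get("rx_id"), r.get("store_id"), r.get("filled_date")) for r in rows]

    # 3. Load into a Hive staging table (assumed name).
    spark.createDataFrame(records, schema).write.mode("append").saveAsTable("pharmacy.prescriptions_stg")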

Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment. Worked with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores, to enable data access and analysis.

Employed Spark Streaming APIs for on-the-fly transformations and actions, building a common learner data model sourced from Kafka.

Executed the import of data from diverse sources to the HBase cluster using Kafka Connect. Additionally, contributed to the creation of data models for HBase based on existing data models.

Developed Pig scripts to transform the data into a structured format and automated them through Oozie coordinators.

Collaborated with MDM systems team on technical aspects and report generation. Developed Spark code using Scala and Spark-SQL for accelerated processing and testing, along with executing complex HiveQL queries on Hive tables.

Contributed to the development of ETL processes, loading data from diverse sources into HDFS using Flume and Sqoop, and conducted structural modifications through MapReduce and Hive.

Optimized existing algorithms in Hadoop using Spark Context, Spark-SQL, Data Frames, and Pair RDDs.

Executed multiple MapReduce jobs in Pig and Hive to perform data cleaning and pre-processing tasks.

Engaged in the end-to-end Big Data flow of the application, encompassing data ingestion from upstream to HDFS, and subsequent processing and analysis of the data within HDFS.

Designed and implemented Partitioned and Bucketed Hive tables in Parquet file format with Snappy compression. Successfully loaded data into Parquet Hive tables from Avro Hive tables.
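
A compact sketch of this Avro-to-Parquet load is shown below; the table and column names are made up, and Spark's bucketBy stands in here for the Hive-side bucketing used in the project.

    # Illustrative partitioned, bucketed, Snappy-compressed Parquet load; names are made up.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("avro-to-parquet")
        .enableHiveSupport()
        .config("spark.sql.parquet.compression.codec", "snappy")   # Snappy-compressed Parquet
        .getOrCreate()
    )

    # Source: an existing Avro-backed Hive table (assumed name).
    avro_df = spark.table("pharmacy.claims_avro")

    # Target: Parquet table partitioned by date and bucketed by store for pruning and join efficiency.
    (
        avro_df.write
        .mode("overwrite")
        .partitionBy("claim_date")
        .bucketBy(32, "store_id")
        .sortBy("store_id")
        .format("parquet")
        .saveAsTable("pharmacy.claims_parquet")
    )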

Utilized reporting tools such as Tableau to connect with Hive, facilitating the generation of daily data reports.

Effectively managed structured data designed for scalable deployment across numerous commodity servers, eliminating single points of failure using Cassandra.

Environment: HDFS, Hadoop MapReduce, Pig, Hive, Sqoop, Flume, Spark, Spark Streaming APIs, Spark Context, DataFrames, Pair RDDs, Scala, Spark-SQL, Kafka, Kafka Connect, HBase, RDBMS, Cassandra, NoSQL data stores, Tableau, Oozie coordinators, MDM, Avro, Parquet, Snappy, HiveQL, Python, REST API.

Client: British Petroleum. Houston, TX. Feb 2014 to July 2016

Role: Data Warehouse Developer

Use Case:

As a Data Warehouse Developer at British Petroleum, I optimized toll transaction processing by efficiently loading data into Teradata Data Warehouse using BTEQ, FASTEXPORT, and MULTI LOAD. I designed Slowly Changing Dimensions for data integrity, integrated diverse data sources with Power Exchange, and automated processes with UNIX shell scripting. Utilizing Python, Pig, Sqoop, and Hive in Hadoop clusters, I managed end-to-end Big Data flow, ensuring efficient data processing. Additionally, I developed interactive reports with Power BI, implemented error handling mechanisms, and followed Agile/Scrum methodologies for project management, driving efficiency and accuracy in data processing and analysis.

Responsibilities:

Efficiently loaded data into the Teradata Data Warehouse using utilities such as BTEQ, FastExport, and MultiLoad, incorporating compression techniques.

Designed data structures for storage and programmed logic to facilitate seamless data flow across various stages of toll transaction processing.

Developed Type 1 and Type 2 Slowly Changing Dimensions (SCDs).
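
The SCDs themselves were built as Informatica mappings; purely to illustrate the Type 2 expire-and-insert pattern, a compact PySpark sketch with made-up paths, keys, and tracked columns follows. Expiring old rows rather than overwriting them is what preserves full history in a Type 2 dimension.

    # Purely illustrative Type 2 SCD logic in PySpark; paths, keys, and columns are assumptions.
    # Assumed dimension columns: account_id, address, start_date, end_date, is_current.
    # Assumed staging columns:   account_id, address.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("scd-type2-sketch").getOrCreate()

    dim = spark.read.parquet("s3://example/dim_account/")        # current dimension rows
    incoming = spark.read.parquet("s3://example/stg_account/")   # today's source extract

    # Keys whose tracked attribute changed against the current dimension row.
    changed_keys = (
        dim.filter("is_current").alias("d")
        .join(incoming.alias("i"), "account_id")
        .filter(F.col("d.address") != F.col("i.address"))
        .select("account_id")
    )

    # 1. Close out the current versions of changed keys (expire, don't overwrite).
    closed = (
        dim.join(changed_keys, "account_id")
        .filter("is_current")
        .withColumn("is_current", F.lit(False))
        .withColumn("end_date", F.current_date())
    )

    # 2. Keep every other existing row untouched (history plus unchanged current rows).
    untouched = dim.join(changed_keys, "account_id", "left_anti").unionByName(
        dim.join(changed_keys, "account_id").filter(~F.col("is_current"))
    )

    # 3. Insert the changed records as the new current versions (brand-new keys omitted for brevity).
    new_versions = (
        incoming.join(changed_keys, "account_id")
        .withColumn("start_date", F.current_date())
        .withColumn("end_date", F.lit(None).cast("date"))
        .withColumn("is_current", F.lit(True))
    )

    untouched.unionByName(closed).unionByName(new_versions) \
        .write.mode("overwrite").parquet("s3://example/dim_account_updated/")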

Extensive experience with Python, Pig, Sqoop, Oozie, Hadoop Streaming, and Hive in the context of data processing and analytics.

Applied practical understanding of Star Schema and Snowflake Schema Methodology using Data Modeling tool Erwin.

Engaged in the end-to-end Big Data flow using MapReduce, encompassing data ingestion from upstream systems to HDFS and subsequent processing within Hadoop clusters.

In-depth understanding of the Hadoop Distributed File System (HDFS), demonstrating a strong grasp of distributed storage and processing concepts.

Developed code in accordance with business logic, executing Ab Initio graphs and Informatica mappings.

Integrated various data sources, including Mainframe MVS, VSAM, GDG, DB2, and XML files, seamlessly using Power Exchange technology.

Utilized Informatica PowerExchange for Loading/Retrieving data from mainframe systems.

Engaged in projects using Agile/Scrum methodologies, adapting to diverse project management approaches.

Participated in multiple requirements gathering sessions with source system teams and business users.

Consistently maintained and tracked weekly status calls with the team, ensuring timely delivery of project milestones.

Developed mappings to extract data from SQL Server, Oracle, Flat files, XML files, and loaded it into Data warehouse using the Mapping Designer.

Extensive Unix Shell Scripting knowledge.

Prepared comprehensive Low-Level Design Documents, detailing the intricacies of the design specifications.

Prepared detailed test plans, test cases, test scripts, and test validation datasets for all newly constructed interfaces.

Automated all new ETL jobs through the implementation of UNIX shell scripts, incorporating data validation checks inclusive of business rules and referential integrity checks.

Designed and implemented Partitioned and Bucketed Hive tables in Parquet file format with Snappy compression, adhering to Star Schema principles.

Developed interactive reports and dashboards using Power BI to visualize key insights and KPIs derived from various data sources, customizing visualizations and layouts to meet business requirements.

Implemented error handling mechanisms and balancing logic to ensure robust data processing and maintain data integrity.

Solid understanding of database and data warehousing concepts, encompassing both OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing).

Analyzed or transformed stored data by writing MapReduce or Pig jobs based on business requirements.

Environment: Teradata Data Warehouse (BTEQ, FastExport, MultiLoad), Python, Pig, Sqoop, Oozie, Hadoop Streaming, Hive, Erwin, MapReduce, Ab Initio, Informatica PowerExchange, Mainframe, SQL Server, Oracle, Parquet, UNIX Shell Scripting, OLTP, OLAP, PL/SQL, Power BI.


