
Big Data SQL Database

Location:
Seattle, WA
Posted:
July 30, 2024


Gautham Tota

Contact : 713-***-****

Mail ID : ad7m6x@r.postjobfree.com

Professional Summary:

Experience with the use of Azure services like Azure SQL Database, Networking, Azure DNS, Azure Active Directory, Azure Blob Storage, Azure Virtual Machines, and administering Azure resources using Azure Portal & Azure CLI.

Hands-on experience in GCP: BigQuery, GCS, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, and Dataproc.

Comprehensive working experience in implementing Big Data projects using Apache Hadoop, Pig, Hive, HBase, Spark, Sqoop, Flume, Zookeeper, Oozie.

Experience working on the Hortonworks, Cloudera, and MapR distributions.

Excellent working knowledge of the HDFS filesystem and Hadoop daemons such as ResourceManager, NodeManager, NameNode, DataNode, Secondary NameNode, containers, etc.

In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG scheduler, task scheduler, stages, and tasks.

Experience working on Spark and Spark Streaming.

Hands-on experience with major components in Hadoop Ecosystem like Map Reduce, HDFS, YARN, Hive, Pig, HBase, Sqoop, Oozie, Cassandra, Impala, and Flume.

Knowledge in installing, configuring, and using Hadoop ecosystem components like Hadoop Map Reduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Spark, Kafka, Storm, Zookeeper, and Flume

Experience with new Hadoop 2.0 architecture YARN and developing YARN Applications on it

Worked on performance tuning to ensure that assigned systems were patched, configured, and optimized for maximum functionality and availability. Implemented solutions that reduced single points of failure and improved system uptime to 99.9% availability.

Experience with distributed systems, large-scale non-relational data stores, and multi-terabyte data warehouses.

Firm grip on data modeling, data marts, database performance tuning, and NoSQL map-reduce systems

Experience in managing and reviewing Hadoop log files

Hands-on experience with Hadoop/Big Data technologies for the storage, querying, processing, and analysis of data.

Experience in setting up Hadoop clusters on cloud platforms like Azure.

Customized dashboards and performed identity and access management in Azure.

Worked with data serialization formats, converting complex objects into bit sequences using Avro, Parquet, JSON, and CSV.

Expertise in extending Hive and Pig core functionality by writing custom UDFs and UDAFs.

Designing and creating Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and bucketing (a minimal sketch appears at the end of this summary).

Proficient in NoSQL databases like HBase.

Experience in importing and exporting data using Sqoop between HDFS and Relational Database Systems.

Built Talend and NiFi integrations for bi-directional data ingestion across different sources.

Hands-on experience building data pipelines using Hadoop components Sqoop, Hive, Pig, MapReduce, Spark, Spark SQL.

Loaded and transformed large sets of structured, semi-structured, and unstructured data in various formats like text, zip, XML, and JSON.

Experience in designing both time-driven and data-driven automated workflows using Oozie.

Good understanding of Zookeeper for monitoring and managing Hadoop jobs.

Monitoring Map Reduce Jobs and YARN Applications.

Strong Experience in installing and working on NoSQL databases like HBase, Cassandra.

Work experience with cloud infrastructures such as Azure Services Compute, Amazon Web Services (AWS) EC2, and S3.

Used Git for source code and version control management.

Experience with RDBMS and writing SQL and PL/SQL scripts used in stored procedures.

Proficient in Java, J2EE, JDBC, Collection Framework, JSON, XML, REST, SOAP Web services. Strong understanding of Agile and Waterfall SDLC methodologies.

Experience working with both small and large groups; successful in meeting new technical challenges and finding solutions that meet customer needs.

Have excellent problem solving, proactive thinking, analytical, programming, and communication skills.

Experience working both independently and collaboratively to solve problems and deliver high-quality results in a fast-paced, unstructured environment.
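
Illustrative sketch for the Hive external-table item above: a minimal PySpark example, assuming a shared Hive metastore, that creates a partitioned external table and loads it with dynamic partitioning. The database, table, columns, and HDFS location are hypothetical; bucketing (CLUSTERED BY ... INTO N BUCKETS) is typically added when the DDL is run directly in Hive, since Spark's support for bucketed Hive-serde tables varies by version.

    from pyspark.sql import SparkSession

    # Minimal sketch: create and load a partitioned Hive external table from PySpark.
    # Database, table, and HDFS path names are hypothetical placeholders.
    spark = (
        SparkSession.builder
        .appName("hive-external-table-sketch")
        .enableHiveSupport()  # use the shared Hive metastore rather than local Derby
        .getOrCreate()
    )

    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS sales_db.orders_ext (
            order_id    BIGINT,
            customer_id BIGINT,
            amount      DECIMAL(10,2)
        )
        PARTITIONED BY (order_date STRING)
        STORED AS PARQUET
        LOCATION '/data/warehouse/orders_ext'
    """)

    # Enable dynamic partitioning, then populate partitions from a (hypothetical) staging table.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT INTO TABLE sales_db.orders_ext PARTITION (order_date)
        SELECT order_id, customer_id, amount, order_date
        FROM sales_db.orders_staging
    """)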

Technical Skills:

Big Data Frameworks: Hadoop (HDFS, MapReduce), Spark, Spark SQL, Spark Streaming, Hive, Impala, Kafka, HBase, Flume, Pig, Sqoop, Oozie, Cassandra

Big Data Distributions: Cloudera, Hortonworks, Azure

Programming Languages: Core Java, Scala, Python, Shell scripting

Operating Systems: Windows, Linux (Ubuntu, CentOS)

Databases: Oracle, SQL Server, MySQL

Design Tools: UML, Visio

IDEs: Eclipse, NetBeans

Java Technologies: JSP, JDBC, Servlets, JUnit

Web Technologies: XML, HTML, JavaScript, jQuery, JSON

Linux Experience: System administration tools, Puppet

Development Methodologies: Agile, Waterfall

Logging Tools: Log4j

Application / Web Servers: Apache Tomcat, WebSphere

Messaging Services: ActiveMQ, Kafka, JMS

Version Control Tools: Git, CVS

Others: PuTTY, WinSCP, Data Lake, Talend, Azure, Terraform

Peer Participations and Certifications:

Azure Data Engineer Professional

Azure AI Engineer Associate

Google Cloud Professional Machine Learning Engineer

Google Cloud Professional Database Engineer

Certified: Python Programming

Education Details:

Bachelor's in Mechanical Engineering (India, 2014)

Master's in Computer Science – University of the Cumberlands, KY (2018)

Professional Experience:

Client: Ralph Lauren - Sterling, Virginia

Role: Azure Data Developer March 2022 to Present

Responsibilities:

Building robust and scalable data integration (ETL) pipelines using SQL, Azure Data Lake, and Azure Databricks.

Designing solutions based on needs gathered after discussing with end users/Stakeholders.

Code/implement solutions based on design adhering to department best practices and processes.

Developing code in Python to gather and parse incoming data from multiple sources (Azure Blob Storage and Azure SQL Data Warehouse), processing it, and creating views for downstream teams to consume for further processing.

Fine-tuning the performance of PySpark jobs on Azure Databricks based on factors such as file format and tool compatibility, and applying techniques like salting and partitioning (see the salting sketch at the end of this section).

Testing for edge and outlier cases in the pipeline to avoid downtime. Testing the developed pipelines in different environments (development, staging, and production) to validate the desired outcomes.

Worked on moving high- and low-volume data objects from Teradata and Hadoop to Azure Synapse.

Documenting the successfully created pipelines in the PostgreSQL database so the Analytics Operations team can access the views.

Created interactive visualizations and dashboards using Power BI that enabled business users and executives to explore product usage and customer trends.

Merging all the completed codes to Azure Repos, updating details about the job in Confluence, and ensuring all the stakeholders are notified about the completed jobs and the resolved errors.

Collaborate with Software Solution team members and other staff to validate desired outcomes for code before, during, and post-development.

Involved in Data Modeling using Star Schema, Snowflake Schema.

Cleaning raw data to make it available for visualization, leading to better business decisions.

Based on business requirements from the business development team and discussions with the MTD (developer) team, developing data pipelines that are easier for other teams to transition to and work with.

Experienced in loading and transforming large sets of structured and semi-structured data using the ingestion tool Talend.

Experience in Power BI calculations and applying complex calculations to large, complex data sets.

Worked on the Azure Synapse database, writing queries and stored procedures for normalization.

Worked with Azure Synapse stored procedures, pairing procedures with their corresponding DDL statements and using a JavaScript API to wrap and execute numerous SQL queries.

Training technical staff to understand how to access/utilize the delivered solution.

Responsible for scheduled maintenance of Azure File Storage, Azure Databricks clusters, and other deployed resources for testing, staging, and production phases.

Reviewing and debugging code to speed up production.
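
Illustrative sketch for the salting item above: a minimal PySpark example of spreading a skewed join key across extra partitions. The DataFrames, column names, paths, and salt factor are hypothetical, not the client's actual Databricks jobs.

    from pyspark.sql import SparkSession, functions as F

    # Minimal sketch: salt a skewed join key so hot keys spread over several tasks.
    # Paths, column names, and SALT_BUCKETS are hypothetical placeholders.
    spark = SparkSession.builder.appName("salting-sketch").getOrCreate()
    SALT_BUCKETS = 8

    fact = spark.read.parquet("/mnt/raw/transactions")   # large, skewed side
    dim = spark.read.parquet("/mnt/raw/customers")       # small side

    # Add a random salt to the skewed side.
    fact_salted = fact.withColumn("salt", (F.rand() * SALT_BUCKETS).cast("int"))

    # Replicate the small side once per salt value so every salted key finds a match.
    dim_salted = dim.withColumn(
        "salt", F.explode(F.array(*[F.lit(i) for i in range(SALT_BUCKETS)]))
    )

    # Join on (customer_id, salt), then drop the helper column.
    joined = fact_salted.join(dim_salted, on=["customer_id", "salt"]).drop("salt")

    # Partition the output by a well-distributed key to keep files evenly sized.
    (joined.write.mode("overwrite")
        .partitionBy("order_date")
        .parquet("/mnt/curated/transactions_enriched"))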

Environment: Hadoop, HDFS, Azure Data Factory, Azure Data Lake Analytics, Azure HDInsight, Azure Synapse Analytics, Hive, NoSQL (HBase), Shell Scripting, Scala, Spark SQL, Azure SQL Database, Power BI.

Client: PayPal (Remote) - Austin, Texas September 2021 to January 2022

Role: GCP Data Engineer

Responsibilities:

Analyzed the scope of migrating existing data and its pipeline to GCP cloud.

Analyzed data pulled from EIDW (Teradata) and sent as files, on which transformations were applied before loading into SQL Server tables.

Analyzed ML models created from the data.

Implemented a data lake architecture in GCP, with the data ultimately loaded into BigQuery (see the load sketch at the end of this section).

Created Metric tables, End user views in Snowflake to feed data for Tableau refresh.

Migrated data from FS to Snowflake within the organization.

Analyzed the Python job scripts in the existing pipeline and created a mapping document between source and target database tables and their corresponding transformation logic.

Created data dictionary documents (collections of names, definitions, attributes, and table properties of databases) through SQL Server Management Studio.

Used Hive to analyze the partitioned and bucketed data and computed various metrics for reporting.

Implemented data streaming capability using Kafka and Talend for multiple data sources.

Developed a POC for project migration from an on-prem Hadoop (MapR) system to Snowflake.

Analyzed the data by performing Hive queries (Hive SQL) and running Pig scripts (Pig Latin) to study customer behavior.

Implemented PySpark and Spark SQL for faster testing and processing of data.

Developed multiple MapReduce jobs for data cleaning.

Used JIRA to track bugs.
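
Illustrative sketch for the BigQuery load item above: a minimal example, assuming the google-cloud-bigquery client library and Application Default Credentials, of loading Parquet files from a GCS landing bucket into BigQuery and running a row-count check. Project, dataset, table, and bucket names are hypothetical.

    from google.cloud import bigquery

    # Minimal sketch: load Parquet files from GCS into BigQuery, then validate.
    # Project, dataset, table, and bucket names are hypothetical placeholders.
    client = bigquery.Client(project="my-analytics-project")

    table_id = "my-analytics-project.curated.payments"
    gcs_uri = "gs://my-datalake-landing/payments/dt=2021-12-01/*.parquet"

    load_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.PARQUET,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    load_job = client.load_table_from_uri(gcs_uri, table_id, job_config=load_config)
    load_job.result()  # block until the load job finishes

    # Simple validation query after the load.
    for row in client.query(f"SELECT COUNT(*) AS row_count FROM `{table_id}`").result():
        print(f"Loaded table now has {row.row_count} rows")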

Environment: SQL Server, Python, SQL, GCP Cloud Storage, BigQuery, Snowflake, PySpark

Client: CVS Health - Chicago, IL June 2020 to July 2021

Role: Data Engineer

Responsibilities:

Developed ETL data pipelines using Sqoop, Spark, Spark SQL, Scala, and Oozie.

Used Spark for interactive queries, processing of streaming data and integrated with popular NoSQL databases.

Experience with AWS Cloud IAM, Data pipeline, EMR, S3, EC2.

Supported continuous storage in AWS using Elastic Block Storage, S3, and Glacier. Created volumes and configured snapshots for EC2 instances.

Worked on ETL migration by developing and deploying AWS Lambda functions to build a serverless data pipeline that writes to the Glue Catalog and can be queried from Athena.

Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations.

Developed Spark code using Scala and Spark-SQL for faster processing of data.

Created Oozie workflow engine to run multiple Spark jobs.

Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark SQL, DataFrames, pair RDDs, and Spark on YARN.

Wrote Terraform scripts that automate step execution in EMR to load data into ScyllaDB.

Developed stored procedures/views in Snowflake and used them in Talend for loading dimensions and facts.

Prepared scripts to automate the ingestion process using PySpark and Scala as needed from various sources such as APIs, AWS S3, Teradata, and Snowflake.

Implemented scheduled downtime for non-prod servers for optimizing AWS pricing.

Denormalized data coming from Netezza as part of the transformation and loaded it into NoSQL databases and MySQL.

Experienced in dimensional data modeling, Star/Snowflake schemas, and fact and dimension tables.

Developed Kafka consumer API in Scala for consuming data from Kafka topics.

Experienced in writing live Real-time Processing and core jobs using Spark Streaming with Kafka as a data pipeline system using Scala programming.

Implemented data quality checks using Spark Streaming and flagged records as bad or passable.

Good knowledge in setting up batch intervals, split intervals, and window intervals in Spark Streaming using Scala Programming language.

Implemented Spark-SQL with various data sources like JSON, Parquet, ORC, and Hive.

Loaded the data into Spark RDDs and performed in-memory computation to generate the output response.

Used Spark Streaming APIs to perform on-the-fly transformations and actions to build a common learner data model that receives data from Kafka in near real time and persists it to Cassandra (see the streaming sketch at the end of this section).

Involved in converting MapReduce programs into Spark transformations using Spark RDD in Scala.

Developed Spark scripts using Scala Shell commands as per the requirements.
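
Illustrative sketch for the Kafka-to-Cassandra streaming item above. The production jobs were written in Scala with Spark Streaming; this PySpark Structured Streaming version shows the same consume, flag, and persist pattern. Broker addresses, topic, schema, keyspace, and table names are hypothetical, and it assumes the spark-sql-kafka and spark-cassandra-connector packages are available to the session.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    # Minimal sketch: read JSON events from Kafka, flag bad records, persist good
    # ones to Cassandra. Names are hypothetical; requires the Kafka and Cassandra
    # connector packages on the Spark classpath.
    spark = SparkSession.builder.appName("kafka-to-cassandra-sketch").getOrCreate()

    event_schema = StructType([
        StructField("learner_id", StringType()),
        StructField("event_type", StringType()),
        StructField("score", DoubleType()),
    ])

    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")
           .option("subscribe", "learner-events")
           .load())

    events = (raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
              .select("e.*")
              .withColumn("quality_flag",
                          F.when(F.col("learner_id").isNull() | F.col("event_type").isNull(),
                                 F.lit("bad")).otherwise(F.lit("pass"))))

    def write_batch(batch_df, batch_id):
        # Persist only passable records; bad rows would be routed to a reject store.
        (batch_df.filter(F.col("quality_flag") == "pass")
         .drop("quality_flag")
         .write.format("org.apache.spark.sql.cassandra")
         .options(keyspace="learning", table="learner_events")
         .mode("append")
         .save())

    (events.writeStream.foreachBatch(write_batch)
     .option("checkpointLocation", "/tmp/checkpoints/learner-events")
     .start()
     .awaitTermination())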

Environment: HDFS, Spark, Scala, Tomcat, Netezza, EMR, Oracle, Sqoop, AWS, Terraform, Scylla DB, Cassandra, MySQL, Oozie

Client: Cardinal Health, Columbus, OHIO Jan 2019 to May 2020

Role: Sr. Hadoop Developer

Responsibilities:

Experience with the complete SDLC process, including staging, code reviews, source code management, and the build process.

Experience in building and architecting multiple data pipelines, including end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinating tasks among the team.

Implemented Big Data platforms using Cloudera CDH4 as data storage, retrieval, and processing systems.

Built data pipelines in Airflow on GCP for ETL jobs using different Airflow operators (see the DAG sketch at the end of this section).

Experience in GCP Dataproc, GCS, Cloud functions, BigQuery.

Experience in moving data between GCP and Azure using Azure Data Factory.

Experience in building Power BI reports on Azure Analysis Services for better performance.

Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery.

Developed data pipelines using Flume, Sqoop, Pig, and Map Reduce to ingest data into HDFS for analysis.

Created ETL Mapping with Talend Integration Suite to pull data from Source, apply transformations, and load data into target database.

Developed Oozie Workflows for daily incremental loads, which get data from Teradata and then imported into hive tables.

Implemented Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data into HDFS through Sqoop.

Developed Pig scripts to transform data into a structured format, automated through Oozie coordinators.

Developed a pipeline for continuous data ingestion using Kafka and Spark Streaming.

Wrote Sqoop scripts for importing large data sets from Teradata into HDFS.

Performed Data Ingestion from multiple internal clients using Apache Kafka.

Wrote MapReduce jobs to discover trends in data usage by the users.

Developed Flume configurations to extract log data from different sources and transfer data in different file formats (JSON, XML, Parquet) to Hive tables using different SerDes.

Loaded and transformed large sets of structured, semi-structured, and unstructured data using Pig.

Experienced working on Pig to do transformations, event joins, filtering, and some pre-aggregations before storing the data onto HDFS.

Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.

Involved in developing Hive UDFs for needed functionality that is not available out of the box in Hive.

Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.

Responsible for executing Hive queries using the Hive command line, the Hue web GUI, and Impala to read, write, and query data in HBase.

Developed and executed Hive queries for denormalizing the data.

Developed the Apache Storm, Kafka, and HDFS integration project to do a real-time data analysis.

Experience loading and transforming structured and unstructured data into HBase and exposure handling Automatic failover in HBase.

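Illustrative sketch for the Airflow item above: a minimal daily DAG, assuming Airflow 2.x with the apache-airflow-providers-google package, that loads Parquet files from GCS into BigQuery and then builds a summary table. The DAG id, bucket, project, dataset, and table names are hypothetical.

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

    # Minimal sketch of a daily GCS-to-BigQuery ETL DAG; all names are hypothetical.
    with DAG(
        dag_id="daily_orders_etl",
        start_date=datetime(2020, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:

        load_raw = GCSToBigQueryOperator(
            task_id="load_raw_orders",
            bucket="my-landing-bucket",
            source_objects=["orders/{{ ds }}/*.parquet"],
            destination_project_dataset_table="my-project.raw.orders",
            source_format="PARQUET",
            write_disposition="WRITE_APPEND",
        )

        build_summary = BigQueryInsertJobOperator(
            task_id="build_daily_summary",
            configuration={
                "query": {
                    "query": (
                        "SELECT order_date, SUM(amount) AS total_amount "
                        "FROM `my-project.raw.orders` "
                        "WHERE order_date = '{{ ds }}' GROUP BY order_date"
                    ),
                    "useLegacySql": False,
                    "destinationTable": {
                        "projectId": "my-project",
                        "datasetId": "curated",
                        "tableId": "daily_order_summary",
                    },
                    "writeDisposition": "WRITE_APPEND",
                }
            },
        )

        load_raw >> build_summary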

Environment: Cloudera, GCP, Java, Scala, Hadoop, Spark, HDFS, MapReduce, Yarn, Hive, Pig, Zookeeper, Impala, Oozie, Sqoop, Flume, Kafka, Teradata, SQL, GitHub, Phabricator.

Client: Options Clearing Corporation, Chicago, IL Feb 2018 to Dec 2018

Role: Hadoop Developer

Responsibilities:

Developed data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis

Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase database and Sqoop

Implemented a real-time analytics pipeline using Confluent Kafka, Storm, Elasticsearch, Splunk, and Greenplum.

Working knowledge of various AWS technologies like SQS Queuing, SNS Notification, S3 storage, Redshift, Data Pipeline, EMR.

Responsible for all public (AWS) and private (OpenStack/VMware/DC/OS/Mesos/Marathon) cloud infrastructure.

Designed and developed Informatica BDE applications and Hive queries to ingest data into the raw landing zone, transform it with business logic into the refined zone, and load it into Greenplum data marts for the reporting layer consumed through Tableau.

Installed, configured, and maintained big data technologies and systems. Maintained documentation and troubleshooting playbooks.

Automated the installation and maintenance of Kafka, Storm, ZooKeeper, and Elasticsearch using SaltStack.

Developed connectors for Elasticsearch and Greenplum for data transfer from a Kafka topic. Performed data ingestion from multiple internal clients using Apache Kafka. Developed Kafka Streams consumers in Java for real-time data processing (an analogous Python consumer sketch appears at the end of this section).

Responded to and resolved access and performance issues. Used the Spark API over Hadoop to perform analytics on data in Hive.

Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and Spark on YARN.

Imported and exported data into HDFS and Hive using Sqoop, and developed a POC on Apache Spark and Kafka. Proactively monitored performance and assisted in capacity planning.

Worked on the Oozie workflow engine for job scheduling; imported and exported data into MapReduce and Hive using Sqoop.

Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS. Good understanding of performance tuning with NoSQL, Kafka, Storm, and SQL technologies.

Designed and developed a framework to leverage platform capabilities using MapReduce and Hive UDFs.

Worked on data transformation pipelines such as Storm. Worked with operational analytics and log management using ELK and Splunk. Assisted teams with SQL and MPP databases such as Greenplum.

Worked on SaltStack automation tools. Helped teams working with batch processing and tools in the Hadoop technology stack (MapReduce, YARN, Pig, Hive, HDFS).
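
The Kafka consumers above were Java Kafka Streams applications; as a language-consistent sketch of the same consume-and-transform loop, here is a minimal Python example assuming the kafka-python library. Topic, broker, and consumer-group names are hypothetical.

    import json

    from kafka import KafkaConsumer

    # Minimal sketch: consume JSON events and enrich them before handing off to the
    # downstream sink (Elasticsearch / Greenplum in the original pipeline).
    # Topic, broker, and group names are hypothetical placeholders.
    consumer = KafkaConsumer(
        "trade-events",
        bootstrap_servers=["broker1:9092"],
        group_id="realtime-analytics",
        auto_offset_reset="earliest",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )

    for message in consumer:
        enriched = {
            **message.value,
            "partition": message.partition,
            "offset": message.offset,
        }
        print(enriched)  # placeholder for the real downstream write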

Environment: Hadoop, MapReduce, HDFS, Hive, HBase, Sqoop, Pig, Flume, Oracle 11/10g, DB2, Teradata, MySQL, Eclipse, PL/SQL, Java, Linux, Shell Scripting, SQL Developer, SOLR

Client: Infosolz, Kolkata, India Aug 2015 to Dec 2016

Role: Jr. Hadoop Developer

Responsibilities:

Worked with business teams and created Hive queries for ad hoc access.

Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.

Involved in review of functional and non-functional requirements

Responsible for managing data coming from various sources.

Loaded daily data from websites to Hadoop cluster by using Flume.

Involved in loading data from UNIX file system to HDFS.

Creating Hive tables and working on them using Hive QL.

Created complex Hive tables and executed complex Hive queries on Hive warehouse.

Wrote MapReduce code to convert unstructured data to semi-structured data (see the mapper sketch at the end of this section).

Used Pig for the extraction, transformation, and loading of semi-structured data.

Installed and configured Hive and wrote Hive UDFs.

Developed Hive queries for the analysts.

Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.

Designed a technical solution for real-time analytics using Kafka and HBase.

Provided cluster coordination services through ZooKeeper.

Collected the logs data from web servers and integrated it to HDFS using Flume.

Used Pig as ETL tool to do transformations, event joins, and some pre-aggregations before storing the data onto HDFS.

Supported BI data analysts and developers with Hive/Pig development.
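
Illustrative sketch for the MapReduce conversion item above. The original jobs were Java MapReduce; this Hadoop Streaming mapper in Python shows the same unstructured-to-semi-structured idea, and the space-delimited log layout it assumes is hypothetical.

    #!/usr/bin/env python
    # Minimal Hadoop Streaming mapper: turn raw web-server log lines into JSON records.
    # The assumed log layout (ip timestamp url status ...) is hypothetical.
    import json
    import sys

    for line in sys.stdin:
        parts = line.strip().split()
        if len(parts) < 4:
            continue  # skip malformed lines
        record = {
            "ip": parts[0],
            "timestamp": parts[1],
            "url": parts[2],
            "status": parts[3],
        }
        # Emit key<TAB>value so downstream reducers (or HDFS) get one JSON record per line.
        print(parts[0] + "\t" + json.dumps(record))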

Environment: Apache Hadoop, HDFS, Cassandra, MapReduce, HBase, Impala, Java (JDK 1.6), Kafka, MySQL, Amazon, DB Visualizer, Linux, Sqoop, Apache Hive, Apache Pig, InfoSphere, Python, Scala, NoSQL, Flume, Oozie

Client: Kovair Software Pvt. Ltd, India April 2013 to June 2015

Role: SQL Developer

Responsibilities:

Create and maintain database for Server Inventory, Performance Inventory

Working with SQL, T-SQL, and VBA

Involved in creating tables, stored procedures, indexes

Creating and maintaining users

Creating / Running jobs with packages

Design, Develop, Deploy Packages with WMI Queries

Importing data from various sources like Excel, SQL Server, and FrontBase

Collecting Server Inventory data from users using InfoPath 2003/2007 into SQL Server 2005

Creating linked servers to other databases, such as FrontBase, and importing data

Using Linked Servers

Ensuring Data Consistency, Analyzing the Data

Generate Dashboard Reports for Internal Users using SQL Server 2005 Reporting Services

Backing up the Database

Rack/Stack ProLiant Servers, installing base operating system

Deploy various reports on SQL Server 2005 Reporting Server

Design reports per user requirements

Involved in migrating servers i.e. physical to virtual, virtual to virtual

Installing and Configuring SQL Server 2005 on Virtual Machines

Migrated hundreds of Physical Machines to Virtual Machines

Conduct System Testing and functionality after virtualization

Monitor the migrated systems for the next 48 hours

Work closely with the team

Environment: Java/J2EE, JDK 1.7/1.8, LINUX, Spring MVC, Eclipse, JUnit, Servlets, DB2, Oracle 11g/12c, GIT, GitHub, JSON, RESTful, HTML5, CSS3, JavaScript, Rally, Agile/Scrum

** References will be provided upon request **


