
Naga Satya Alaikya Kalidindi

Email: adzuat@r.postjobfree.com

Phone: 518-***-****

Big Data Engineer/Hadoop Developer

Summary of Experience:

5+ years of diversified experience in software design and development, including experience as a Hadoop developer solving business use cases for several clients and expertise in backend applications.

Experienced in using Agile methodologies, including Extreme Programming, Scrum, and Test-Driven Development (TDD).

Experienced working with various Hadoop distributions (Cloudera, Hortonworks, MapR, Amazon EMR) to fully implement and leverage new Hadoop features.

Experience developing Kafka producers and consumers for streaming millions of events per second.

Strong experience using HDFS, MapReduce, Hive, Spark, Sqoop, Oozie, and HBase.

Established and executed a data quality governance framework, including an end-to-end process and data quality framework for assessing decisions that ensure the suitability of data for its intended purpose.

Experience setting up the AWS data platform: AWS CloudFormation, development endpoints, AWS Glue, EMR, Jupyter/SageMaker notebooks, Redshift, S3, and EC2 instances.

Expertise in working with the Hive data warehouse infrastructure: creating tables, distributing data through partitioning and bucketing, and developing and tuning HQL queries.
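
As an illustration of the partitioning and bucketing pattern described above, below is a minimal HQL sketch driven from PySpark; the database, table, and column names are hypothetical placeholders, not an actual client schema.

    from pyspark.sql import SparkSession

    # Hive support is needed to manage partitioned warehouse tables.
    spark = (SparkSession.builder
             .appName("hive-ddl-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Partitioned table: partition pruning on order_date keeps scans small.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales.orders_part (
            order_id    BIGINT,
            customer_id BIGINT,
            amount      DOUBLE
        )
        PARTITIONED BY (order_date STRING)
        STORED AS PARQUET
    """)

    # Dynamic-partition insert from a staging table.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT OVERWRITE TABLE sales.orders_part PARTITION (order_date)
        SELECT order_id, customer_id, amount, order_date
        FROM sales.orders_staging
    """)

    # Bucketed copy (Spark-managed bucketing) to speed up joins on customer_id.
    (spark.table("sales.orders_part")
          .write
          .bucketBy(32, "customer_id")
          .sortBy("customer_id")
          .mode("overwrite")
          .saveAsTable("sales.orders_bucketed"))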

Replaced existing MR jobs and Hive scripts with Spark SQL & Spark data transformations for efficient data processing.

Experience in developing Spark applications using the Spark RDD, Spark SQL, and DataFrame APIs.

Worked with real-time data processing and streaming techniques using Spark Streaming and Kafka.
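
A minimal PySpark Structured Streaming sketch of the Kafka-to-Spark pattern referred to above; the broker address, topic, and HDFS paths are placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    # Read a Kafka topic as a streaming DataFrame (requires the spark-sql-kafka package).
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "events")
              .option("startingOffsets", "latest")
              .load())

    # Kafka keys/values arrive as bytes; cast to strings before downstream parsing.
    parsed = events.select(col("key").cast("string"), col("value").cast("string"))

    # Write micro-batches to HDFS with checkpointing for fault tolerance.
    query = (parsed.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/streams/events")
             .option("checkpointLocation", "hdfs:///checkpoints/events")
             .start())
    query.awaitTermination()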

Experience in moving data into and out of the HDFS and Relational Database Systems (RDBMS) using Apache Sqoop.
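
The Sqoop-based movement between RDBMS and HDFS mentioned above can be sketched as a Python wrapper around the standard sqoop import CLI; the JDBC URL, credentials file, table, and target directory are placeholders.

    import subprocess

    # Import an Oracle table into HDFS as Parquet using four parallel mappers.
    sqoop_import = [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",
        "--username", "etl_user",
        "--password-file", "hdfs:///user/etl/.oracle_pw",
        "--table", "ORDERS",
        "--target-dir", "/data/raw/orders",
        "--num-mappers", "4",
        "--as-parquetfile",
    ]
    subprocess.run(sqoop_import, check=True)

A matching sqoop export with --export-dir pointed back at the HDFS directory covers the reverse direction.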

Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance.

Database design, modeling, migration, and development experience using stored procedures, triggers, cursors, constraints, and functions in MySQL, MS SQL Server, DB2, and Oracle.

Strong understanding of the Java Virtual Machine and multi-threaded processing.

Experience working with NoSQL database technologies, including MongoDB, Cassandra and HBase.

Experience in developing custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HQL (HiveQL); used UDFs from the Piggybank UDF repository.

Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Azure Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.

Experience with Snowflake Multi-Cluster Warehouses

Expert in generating on-demand and scheduled reports for business analysis and management decisions using SQL Server Reporting Services (SSRS) and Tableau, with periodic reporting on a daily, weekly, monthly, and quarterly basis.

Good understanding of data modeling (dimensional and relational) concepts such as star schema and snowflake schema modeling and fact and dimension tables.

Experience in manipulating and analyzing large datasets and finding patterns and insights within structured and unstructured data.

Strong experience in core Java, Scala, SQL, PL/SQL and Restful web services.

Strong experience with ETL and/or orchestration tools (e.g. Talend, Oozie, Airflow)

Used Informatica PowerCenter for ETL (extraction, transformation, and loading) of data from heterogeneous source systems into target databases.

Experience with Software development tools such as JIRA, Play, GIT.

Strong written and oral communication skills for giving presentations to non-technical stakeholders.

Experience in writing complex SQL queries, creating reports and dashboards.

Proficient in using Unix-based command-line interfaces.

IT Skills:

Big Data Tools: Kafka, Cassandra, Apache Spark, Spark Streaming, HBase, Impala, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Zookeeper

Hadoop Distributions: Cloudera CDH, Apache, AWS, Hortonworks HDP

Programming Languages: SQL, PL/SQL, Python, UNIX, PySpark, Pig, HiveQL, Scala, Shell Scripting

Spark Components: RDD, Spark SQL, Spark Streaming

Data Modeling Tools: Erwin Data Modeler, ER Studio v17

Methodologies: RAD, JAD, System Development Life Cycle (SDLC), Agile

Cloud Management: MS Azure, Amazon Web Services (AWS): EC2, EMR, S3, Redshift, Lambda, Athena

Databases: Oracle 12c/11g/10g, MySQL, MS SQL Server, DB2, Snowflake

NoSQL Databases: MongoDB, HBase, Cassandra

OLAP Tools: Tableau, SSAS, Business Objects, and Crystal Reports 9

ETL/Data Warehouse Tools: Informatica and Tableau

Version Control: CVS, SVN, Clear Case, Git

Operating System: Windows, Unix, Sun Solaris

PROFESSIONAL EXPERIENCE:

Client: Vistex Inc., Hoffman Estates, IL Apr 2020 to present

Role: Big Data Engineer/Hadoop Developer

Responsibilities:

Developed a NiFi workflow to pick up data from an SFTP server and send it to a Kafka broker.

Developed Oozie workflows to run multiple Hive, Pig, Tealeaf, MongoDB, Git, Sqoop, and Spark jobs.

Good experience using relational databases: Oracle, MySQL, SQL Server, and PostgreSQL.

Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and Scala for data cleaning and preprocessing.

Installed Kafka producers on different servers and scheduled them to produce data every 10 seconds.

Developed a Spark job in Java that indexes data from external Hive tables in HDFS into Elasticsearch.
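
The job above was written in Java; the following is an equivalent PySpark sketch of the same idea, assuming the elasticsearch-hadoop connector is available on the Spark classpath. The Hive table, Elasticsearch host, and index name are hypothetical.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-to-es-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Read the external Hive table whose data lives in HDFS.
    df = spark.table("weblogs.page_views")

    # Index the rows into Elasticsearch through the es-hadoop connector.
    (df.write
       .format("org.elasticsearch.spark.sql")
       .option("es.nodes", "es-host")
       .option("es.port", "9200")
       .mode("append")
       .save("page_views"))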

Built Tableau reports and dashboards and published them to end users for executive-level business decisions.

Responsible for developing data pipelines on Amazon AWS to extract data from weblogs and store it in HDFS; worked extensively with Sqoop for importing metadata from Oracle.

Good exposure to MapReduce programming using Java, Pig Latin scripting, distributed applications, and HDFS.

Responsible for implementation and ongoing administration of Hadoop infrastructure

Designed MapReduce and YARN flows, wrote MapReduce scripts, and performed performance tuning and debugging.

Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, PySpark, Impala, Tealeaf, pair RDDs, NiFi, DevOps, and Spark on YARN.

Set up and configured AWS EMR clusters and used Amazon IAM to grant users fine-grained access to AWS resources.

Good understanding of NoSQL databases and hands-on experience writing applications against HBase, Cassandra, and MongoDB.

Used the AWS CLI to suspend an AWS Lambda function and to automate backups of ephemeral data stores to S3 buckets and EBS.

Designed, developed and tested Tableau visualizations for dashboard and ad-hoc reporting solutions by connecting from different data sources and databases.

Used the Spark DataFrame API in Scala for analyzing data.

Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades when required.

Worked with various HDFS file formats such as Parquet, IAM, and JSON for serializing and deserializing.

Implemented a cluster for the NoSQL tool HBase as part of a POC to address HBase limitations.

Used IAM to detect and stop risky identity behaviors using rules, machine learning, and other statistical algorithms

Developed Java MapReduce programs for the analysis of sample log files stored in the cluster.

Implemented data quality checks in the ETL tool Talend; good knowledge of data warehousing.

Installed application on AWS EC2 instances and configured the storage on S3 buckets.

Evaluated client needs and translated business requirements into functional specifications, thereby onboarding clients onto the Hadoop ecosystem.

Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.

Used Kafka and Kafka brokers to initiate the Spark context and process live streaming data.

Developed custom Kafka producers and consumers for publishing and subscribing to different Kafka topics.
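
A minimal sketch of a producer/consumer pair of the kind described above, using the kafka-python client; the broker, topic, and consumer group are placeholders, and the original implementation may have used a different client or language.

    import json
    from kafka import KafkaProducer, KafkaConsumer

    # Producer: publish JSON-encoded events to a topic.
    producer = KafkaProducer(
        bootstrap_servers="broker1:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("orders", value={"order_id": 1, "amount": 42.5})
    producer.flush()

    # Consumer: subscribe to the same topic as part of a consumer group.
    consumer = KafkaConsumer(
        "orders",
        bootstrap_servers="broker1:9092",
        group_id="orders-etl",
        auto_offset_reset="earliest",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    for message in consumer:
        print(message.value)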

Implemented many Kafka ingestion jobs to consume data for both real-time and batch processing.

Migrated MapReduce jobs to Spark jobs to achieve better performance.

Strong knowledge of the architecture and components of Tealeaf; efficient in working with Spark Core and Spark SQL. Designed and developed RDD seeds using Scala and Cascading, and streamed data into Spark Streaming using Kafka.

Responsible for creating SQL data sets for Tableau recurring and Ad-hoc Reports.

Responsible for managing data coming from different sources through Kafka.

Responsible for developing a data pipeline using Spark, Scala, and Apache Kafka to ingest data from the CSL source and store it in a protected HDFS folder.

Used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in an Amazon S3 bucket.

Stored data in AWS S3 (analogous to HDFS) and ran EMR jobs on the stored data.

Exposure to Spark, Spark Streaming, Spark MLlib, Snowflake, and Scala; created and handled DataFrames in Spark with Scala.

Very good implementation experience with object-oriented concepts, multithreading, and Java/Scala.

Experienced with Scala and Spark, improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, pair RDDs, and Spark on YARN.

Created functions and assigned roles in AWS Lambda to run Python scripts, and used AWS Lambda with Java for event-driven processing. Created Lambda jobs and configured roles using the AWS CLI.

Wrote MapReduce programs and Hive UDFs in Java.

Extracted and updated the data into HDFS using Sqoop import and export.

Developed Hive UDFs to incorporate external business logic into Hive scripts and developed join data set scripts.

Developed end-to-end data processing pipelines that receive data through the distributed messaging system Kafka and persist it into Cassandra.
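
A hedged PySpark sketch of the Kafka-to-Cassandra persistence step described above, assuming the Kafka and spark-cassandra-connector packages are supplied at submit time; the keyspace, table, topic, and hosts are hypothetical.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("kafka-to-cassandra-sketch")
             .config("spark.cassandra.connection.host", "cassandra-host")
             .getOrCreate())

    stream = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "events")
              .load()
              .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS payload"))

    # Persist each micro-batch into a Cassandra table via the connector.
    def write_batch(batch_df, batch_id):
        (batch_df.write
         .format("org.apache.spark.sql.cassandra")
         .options(keyspace="events_ks", table="events")
         .mode("append")
         .save())

    query = (stream.writeStream
             .foreachBatch(write_batch)
             .option("checkpointLocation", "hdfs:///checkpoints/events_cassandra")
             .start())
    query.awaitTermination()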

Monitored Hadoop cluster connectivity and security and managed the file system.

Worked on AWS Lambda functions in Python that invoke scripts to perform various transformations and analytics on large data sets in EMR clusters.
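
One way to realize the pattern above is a Lambda handler that submits a PySpark step to a running EMR cluster with boto3; the cluster ID, bucket, and script path below are placeholders, not values from the actual project.

    import boto3

    emr = boto3.client("emr")

    def lambda_handler(event, context):
        # Triggered, for example, by an S3 object-created event; hand the new
        # object's location to a PySpark transformation step on an EMR cluster.
        record = event["Records"][0]["s3"]
        input_path = f"s3://{record['bucket']['name']}/{record['object']['key']}"

        response = emr.add_job_flow_steps(
            JobFlowId="j-XXXXXXXXXXXXX",  # placeholder cluster id
            Steps=[{
                "Name": "transform-new-dataset",
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster",
                             "s3://my-scripts/transform.py", input_path],
                },
            }],
        )
        return {"step_ids": response["StepIds"]}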

Developed Apache Spark applications for processing data from various streaming sources.

Environment: Hadoop (HDFS, MapReduce), Kafka, Scala, AWS services (Lambda, EMR, Auto Scaling), YARN, IAM, PostgreSQL, Spark, Impala, MongoDB, Java, Tableau, Pig, DevOps, HBase, Oozie, Hue, Sqoop, Flume, Oracle, NiFi, Git.

Client: Anthem, Atlanta, GA Mar 20 to Oct 20

Role: Data Engineer/Hadoop Developer/ETL Developer

Responsibilities:

Collected data from Flume agents installed on various servers using a multi-hop flow.

Ingested real-time and near-real-time (NRT) streaming data into HDFS using Flume.

Experienced in handling administration activities using Cloudera Manager.

Created an architectural solution that leverages the best Azure analytics tools to solve the specific needs of the Chevron use case.

Created and maintained an optimal data pipeline architecture in Microsoft Azure using Data Factory and Azure Databricks.

Built the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and big data technologies such as Hadoop Hive and Azure Data Lake Storage.

Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.

Managed and reviewed Hadoop log files.

Wrote PySpark and Spark SQL transformations in Azure Databricks to perform complex transformations for business rule implementation.
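
A minimal sketch of the kind of PySpark business-rule transformation written in Azure Databricks notebooks; the tables, columns, and threshold are hypothetical examples, not the actual rules.

    from pyspark.sql import functions as F

    # In a Databricks notebook the SparkSession is already available as `spark`.
    claims = spark.table("staging.claims")

    # Example rule: flag high-value claims and standardize status codes.
    enriched = (claims
                .withColumn("high_value_flag",
                            F.when(F.col("claim_amount") > 10000, F.lit("Y")).otherwise(F.lit("N")))
                .withColumn("status_code", F.upper(F.trim(F.col("status_code")))))

    # Persist the curated result for downstream reporting.
    enriched.write.mode("overwrite").saveAsTable("curated.claims_enriched")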

Implemented Spark Kafka streaming to pick up data from Kafka and feed it into the Spark pipeline.

Experience working with different join patterns; implemented both map-side and reduce-side joins.

Wrote Flume configuration files for importing streaming log data into HBase with Flume.

Imported several transactional logs from web servers with Flume to ingest the data into HDFS.

Installed and configured Pig and wrote Pig Latin scripts to convert data from text files to Avro format.

Created Partitioned Hive tables and worked on them using HiveQL.

Involved in data ingestion into HDFS using Sqoop for full loads and Flume for incremental loads on a variety of sources such as web servers, RDBMS, and data APIs.

Involved in developing Impala scripts for extraction, transformation, and loading of data into the data warehouse.

Brought data into HBase using the HBase shell as well as the HBase client API.
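
An illustrative sketch of loading rows through an HBase client API, shown here with the Python happybase library over the HBase Thrift gateway (the original work may have used the Java client or the shell); host, table, and column family are placeholders.

    import happybase

    # Connect through the HBase Thrift gateway.
    connection = happybase.Connection(host="hbase-thrift-host", port=9090)
    table = connection.table("web_events")

    # Batched puts cut round trips when loading semi-structured records.
    with table.batch(batch_size=1000) as batch:
        batch.put(b"row-0001", {b"cf:url": b"/home", b"cf:status": b"200"})
        batch.put(b"row-0002", {b"cf:url": b"/cart", b"cf:status": b"404"})

    connection.close()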

Worked on UDFs using Python for data cleansing.
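
A small sketch of a Python data-cleansing UDF of the kind referred to above, written here as a PySpark UDF; the table and column names are hypothetical.

    import re
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-cleansing-sketch").getOrCreate()

    # Normalize free-text phone numbers to digits only; empty values become NULL.
    def clean_phone(raw):
        if raw is None:
            return None
        digits = re.sub(r"\D", "", raw)
        return digits or None

    clean_phone_udf = F.udf(clean_phone, StringType())

    customers = spark.table("staging.customers")
    cleaned = customers.withColumn("phone_clean", clean_phone_udf(F.col("phone")))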

Created ad-hoc reports for users in Tableau Desktop by connecting to various data sources and building multiple views and associated reports.

Documented project designs and test plans for various projects landing on the Hadoop platform.

Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.

Implemented the workflows using Apache Oozie framework to automate tasks.

Involved in migrating tables from RDBMS into Hive tables using Sqoop and later generating visualizations using Tableau.

Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data.

Troubleshot day-to-day issues on multiple Hadoop clusters.

Designed and implemented incremental imports into Hive tables and wrote Hive queries to run on Tez.

Created ETL Mapping with Talend Integration Suite to pull data from Source, apply transformations, and load data into target database.

Involved in transforming data from Mainframe tables to HDFS, and HBase tables using Sqoop.

Developed NiFi workflows to pick up data from a REST API server, the data lake, and an SFTP server and send it to a Kafka broker.

Used Kafka capabilities such as distribution, partitioning, and the replicated commit log service for messaging by maintaining feeds, and created applications that monitor consumer lag within Apache Kafka clusters.

Created various reports using Tableau based on the client’s needs.

Built pipelines to move hashed and un-hashed data from XML files to the data lake.

Developed Spark scripts using Python on Azure HDInsight for data aggregation and validation and verified their performance against MR jobs.

Loaded data into HBase using both bulk and non-bulk loads.

Extensively worked with the Spark SQL context to create DataFrames and datasets to pre-process the model data.

Environment: Hadoop, Cloudera, Flume, HBase, HDFS, Tableau, MapReduce, Kafka, YARN, Hive, Pig, Sqoop, Oozie, Java, Azure, Data Factory, Databricks, HDInsight, PL/SQL, MySQL, Oracle, TEZ

Client: Prithvi Information Solution Ltd, Hyderabad, India Mar 2017 to Nov 2019

Role: Hadoop Developer

Responsibilities:

Imported and exported data between HDFS and Oracle Database using Sqoop.

Involved in review of functional and non-functional requirements.

Wrote MapReduce jobs using Pig Latin; involved in ETL, data integration, and migration.

Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs.

Installed and configured Pig and wrote Pig Latin scripts.

Created SSIS packages to pull data from SQL Server and exported to Excel Spreadsheets and vice versa.

Loaded data from various sources such as OLE DB and flat files into SQL Server databases using SSIS packages, and created data mappings to load data from source to destination.

Set up and benchmarked Hadoop/HBase clusters for internal use.

Created Hive tables and worked on them using HiveQL; experienced in defining job flows.

Created batch jobs and configuration files to build automated processes using SSIS.

Designed and implemented data transfer from and to Hadoop and AWS.

Designed and implemented a MapReduce-based large-scale parallel relation-learning system.

Extensive use of expressions, variables, and row counts in SSIS packages.

Deployed and scheduled reports using SSRS to generate daily, weekly, monthly, and quarterly reports.

Involved in creating Hive tables, loading data, and writing Hive queries that run internally as MapReduce jobs. Developed a custom file system plugin for Hadoop so it can access files on the data platform.

The custom file system plugin allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly.

Environment: Hadoop, MapReduce, AWS, Amazon S3, Pig, SQL Server, Hive, HBase, SSIS, SSRS, Report Builder, MS Office, Excel, flat files, T-SQL.

Education Details:

Bachelor of Technology, JNTU Kakinada, 2009.

Master of Technology, JNTU Kakinada, 2012.


