Data Engineer (SQL Server)

Location:
Piscataway, NJ
Posted:
November 22, 2023

SHEKHAR REDDY

Sr Data Engineer

Mail: ad1dd1@r.postjobfree.com | Phone: 475-***-****

Professional Summary:

Proficient Data Engineer with 8 years of experience designing and implementing solutions for complex business problems involving all aspects of Database Management Systems, large-scale data warehousing, reporting solutions, data streaming, and real-time analytics.

Extensive experience in utilizing Hadoop components such as Apache Spark, MapReduce, HDFS, Sqoop, PIG, Hive, HBase, Oozie, Flume, NiFi, Kafka, Zookeeper, and YARN.

Working knowledge of Cloud Environments (AWS, Azure, GCP) and experience in automating, configuring, and deploying instances on the Cloud.

Experience designing and implementing Data & Analytics solutions, including ODS, Data Warehouses, Lakes, Pipelines, BI, Reporting, Metadata, Data Quality, Modeling, Compliance, and Governance.

Good exposure to data quality, data mapping, and data filtration using data warehouse ETL and BI tools such as Informatica, Talend, SSIS, SSRS, SSAS, ER Studio, Tableau, and Power BI.

Expert in SQL; have worked extensively with DBMSs such as Oracle, SQL Server, MySQL, Cassandra, Teradata, PostgreSQL, Amazon Redshift, HBase, and MongoDB.

Experience in database design and development for Business Intelligence using SQL Server, SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS), OLAP cubes, star schema, and snowflake schema.

Knowledge of Teradata and of converting data workloads to batch processing via distributed computing.

Extensive experience in implementing Continuous Integration (CI), Continuous Delivery, and Continuous Deployment (CD) on various Applications using Jenkins, Maven, GIT, Nexus, Docker, and Kubernetes.

Good exposure to creating dashboards in reporting tools such as SAS, Tableau, Power BI, SAP BusinessObjects (BO), and QlikView, using multiple filters and sets while working with massive data volumes.

Well-versed in Agile with Scrum, the Waterfall model, and Test-Driven Development (TDD) methodologies.

Able to handle complex situations under tight deadlines; an innovative thinker and creative problem solver with strong multitasking, analytical, and coordination skills.

Technical Skills:

Languages: Python, R, Java, Shell scripting, Scala, SQL, PL/SQL, T-SQL.

Cloud: AWS, Azure, GCP

Hadoop distributions: Amazon Elastic MapReduce (EMR), Cloudera, Hortonworks Data Platform (HDP), MapR.

ETL/BI Tools: Tableau, Power BI, Informatica, Talend, SSIS, SSRS, SSAS, ER Studio

Databases: Oracle, SQL Server, MySQL, Cassandra, Teradata, PostgreSQL, Amazon Redshift, HBase, MongoDB

Data Warehouses: Snowflake, BigQuery, Netezza

Web Technologies: HTML5, CSS3, JavaScript, XML

Big Data Tools: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Kafka, Flume, Impala, Oozie, Zookeeper, Airflow, NiFi, YARN, Kibana, Splunk

ML Algorithms: Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbors (KNN), Principal Component Analysis (PCA).

Methodologies: Agile/Scrum/Kanban, Waterfall

Operating Systems: Windows, Red Hat Linux, UNIX, Mac OS

Other tools: JSON, JIRA, Nagios, Jenkins, Maven, ANT, Docker, Kubernetes, Terraform, Ansible, GIT

Education Details:

Bachelor of Engineering Computer Science from JNTUH, Hyderabad, India (2011 – 2015)

Work Experience:

Client: Walmart April 2022 - Present

Role: Sr Data Engineer Location: Sunnyvale, CA

Responsibilities:

Build complex distributed systems involving massive data handling, metrics collection, data pipeline construction, and analytics.

Involved in Data Ingestion to one or more Azure Services - Azure Data Factory, Azure Databricks, Azure SQL Database, Azure Synapse Analytics (formerly SQL Data Warehouse), Azure Cosmos DB, Azure Data Lake Storage, and Azure Analysis Services.

Conceived, designed, and implemented ETL solutions with SQL Server Integration Services (SSIS).

Involved in data modeling and the physical and logical design of the database.

Created and modified several database objects such as Tables, Views, Indexes, constraints, stored procedures, packages, functions, and triggers using SQL and PL/SQL.

Developed Python scripts for file validations in Databricks and used ADF to automate the process.
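
A minimal sketch of the kind of file-validation script described above, assuming a Databricks/PySpark context; the path, expected columns, and key column are illustrative:

# Hypothetical file validation for PySpark on Databricks; paths/columns are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("file-validation").getOrCreate()

EXPECTED_COLUMNS = {"order_id", "customer_id", "order_date", "amount"}  # assumed schema
INPUT_PATH = "abfss://landing@storageaccount.dfs.core.windows.net/orders/"  # placeholder

def validate(path: str) -> None:
    df = spark.read.option("header", "true").csv(path)
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    if df.count() == 0:
        raise ValueError(f"No rows found under {path}")
    # Null check on the key column
    null_keys = df.filter(df["order_id"].isNull()).count()
    if null_keys:
        raise ValueError(f"{null_keys} rows have a null order_id")

validate(INPUT_PATH)  # an ADF pipeline would trigger this notebook/script on a schedule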

Analyze, design, and build modern data solutions using Azure PaaS services to support data visualization.

Set up the Hadoop and Spark cluster for the various POCs to load the Cookie level data and real-time streaming. Integrate with ecosystems like Hive, HBase, Kafka, Spark, and HDFS/Data Lake/Blob Storage.

Involved in developing Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
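
As an illustration of the PySpark/Spark SQL pattern above, a small sketch that reads two file formats and aggregates usage per customer (paths and column names are assumptions):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

# Illustrative sources in two formats
events = spark.read.json("hdfs:///data/raw/events/")           # placeholder path
customers = spark.read.parquet("hdfs:///data/raw/customers/")  # placeholder path

usage = (events.join(customers, "customer_id")
         .groupBy("customer_id", "segment")
         .agg(F.count("*").alias("event_count"),
              F.sum("duration_sec").alias("total_duration_sec")))

# The same aggregation expressed in Spark SQL
events.createOrReplaceTempView("events")
customers.createOrReplaceTempView("customers")
usage_sql = spark.sql("""
    SELECT e.customer_id, c.segment,
           COUNT(*)            AS event_count,
           SUM(e.duration_sec) AS total_duration_sec
    FROM events e
    JOIN customers c ON e.customer_id = c.customer_id
    GROUP BY e.customer_id, c.segment
""")

usage.write.mode("overwrite").parquet("hdfs:///data/curated/usage/")  # placeholder path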

Use T-SQL to develop and maintain SQL tables, indexes, stored procedures, views, and triggers created for Power BI reporting.

Designed and developed a new solution to process near-real-time (NRT) data using Azure Stream Analytics, Azure Event Hubs, and Service Bus queues.

Created Build and Release for multiple projects (modules) in a production environment using Visual Studio Team Services (VSTS).

Developed a CI/CD system with Jenkins on a Kubernetes container environment, using Kubernetes and Docker to build, test, and deploy.

Design, develop, and deliver the REST APIs necessary to support new feature development and enhancements in an agile environment.

Responsible for using GIT for version control. Worked in Agile Methodology and used JIRA to maintain the project stories.

Environment: Python, Azure, ETL, SQL, ADF, Hadoop, Spark, Hive, HBase, Kafka, HDFS, PySpark, T-SQL, Power BI, VSTS, Jenkins, Kubernetes, Docker, REST APIs, GIT, Agile, JIRA.

Client: CMS (Centers for Medicare & Medicaid Services) Jan 2021 – March 2022

Role: Data Engineer Location: Baltimore, MD

Responsibilities:

Designed and set up Enterprise Data Lake to support various use cases, including Analytics, processing, storing, and reporting of voluminous, rapidly changing data.

Responsible for setting up and building AWS infrastructure using resources such as VPC, EC2, S3, DynamoDB, IAM, EBS, Route 53, SNS, SES, SQS, CloudWatch, CloudTrail, Security Groups, Kinesis, Auto Scaling, and RDS using CloudFormation templates.

Created multiple scripts to automate ETL/ELT processes using PySpark from various sources. Worked on dimensional and relational data modeling concepts.

Processed data from different sources to Snowflake and MySQL.

Designed and implemented Scala programs using Spark DataFrames and RDDs for transformations and actions on input data.

Created a Bash script that uses AWS CLI commands to return a report of desired metrics from CloudWatch and CloudTrail.
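
The reporting step above was a Bash script over the AWS CLI; the same idea is sketched in Python with boto3 for consistency with the other examples here (the namespace, metric, instance id, and time window are assumptions):

import datetime
import boto3

cloudwatch = boto3.client("cloudwatch")
cloudtrail = boto3.client("cloudtrail")

end = datetime.datetime.utcnow()
start = end - datetime.timedelta(hours=24)

# Average CPU for one EC2 instance over the last 24 hours (instance id is a placeholder)
cpu = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=start,
    EndTime=end,
    Period=3600,
    Statistics=["Average"],
)
for point in sorted(cpu["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 2))

# Recent console logins pulled from CloudTrail
events = cloudtrail.lookup_events(
    LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "ConsoleLogin"}],
    StartTime=start,
    EndTime=end,
    MaxResults=50,
)
for e in events["Events"]:
    print(e["EventTime"], e.get("Username", "unknown"))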

Programming using Python, Scala, and Hadoop framework utilizing Cloudera Ecosystem projects (HDFS, Spark, Sqoop, Hive, HBase, Oozie, Impala, Zookeeper, etc.).

Performed ETL testing activities such as running jobs, extracting data from the database with the necessary queries, transforming it, and loading it into the data warehouse servers.

Designed and built data warehouses and data marts on AWS Redshift. Used AWS Glue for transformations and AWS Lambda to automate the process.
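
A sketch of the Lambda-triggers-Glue automation mentioned above; the Glue job name and the event field are hypothetical, and in practice the function would be wired to an S3 event or a schedule:

import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    """Start a Glue transformation job when new data lands (job name is a placeholder)."""
    response = glue.start_job_run(
        JobName="redshift_load_job",  # hypothetical Glue job
        Arguments={"--ingest_date": event.get("ingest_date", "latest")},
    )
    return {"JobRunId": response["JobRunId"]}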

Developed strategies and plans for data capacity planning, life-cycle data management, scalability, backup, and archiving, and ensured that data security and privacy standards, such as role-based security, encryption, and tokenization, were implemented.

Worked with AWS services such as AWS CloudFormation, AWS CodePipeline, and AWS CodeDeploy for infrastructure automation, continuous integration, and continuous deployment (CI/CD).

Wrote multiple MapReduce programs for data extraction, transformation, and aggregation from numerous file formats, including XML, JSON, CSV, and other compressed files.

Implemented Jenkins build pipelines to push all microservices to the Docker registry and deploy them to Kubernetes.

Created views in Tableau Desktop and released them to the internal team for review, further data analysis, and customization utilizing filters and actions for business performance.

Used Apache Airflow in the GCP Cloud Composer environment to build data pipelines, using operators such as BashOperator, Hadoop operators, PythonOperator callables, and BranchPythonOperator.
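
A compact sketch of the kind of Composer/Airflow DAG described above, combining BashOperator, PythonOperator, and BranchPythonOperator; the DAG id, task logic, and schedule are illustrative, and the imports assume the Airflow 2.x package layout:

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator, BranchPythonOperator

def _choose_path(**_):
    # Hypothetical branching rule: full reload on the first of the month, otherwise incremental
    return "full_load" if datetime.utcnow().day == 1 else "incremental_load"

with DAG(
    dag_id="example_pipeline",          # illustrative
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting data")
    branch = BranchPythonOperator(task_id="choose_path", python_callable=_choose_path)
    full_load = BashOperator(task_id="full_load", bash_command="echo full load")
    incremental_load = BashOperator(task_id="incremental_load", bash_command="echo incremental load")
    notify = PythonOperator(
        task_id="notify",
        python_callable=lambda: print("pipeline finished"),
        trigger_rule="none_failed",  # run even though one branch is skipped
    )

    extract >> branch >> [full_load, incremental_load] >> notify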

Used the Cloud Shell SDK in GCP to configure Dataproc, Cloud Storage, and BigQuery services.

Involved in developing Python scripts and using Informatica and other ETL tools for extracting, transforming, and loading data into the data warehouse. Used JIRA for ticketing and tracking issues.

Used the GIT version control tool to push and pull functions to get the updated code from the repository.

Environment: Python, AWS, GCP, ETL, MySQL, Snowflake, Scala, Hadoop, Spark, Hive, HBase, Kafka, HDFS, PySpark, T-SQL, Tableau, Jenkins, Kubernetes, Docker, MapReduce, XML, JSON, REST APIs, Apache Airflow, GIT, Agile, JIRA.

Client: Brainbox Apps Oct 2017 – Dec 2020

Role: Data Engineer Location: Rajasthan, India.

Responsibilities:

Extensive involvement in Data Analysis, Cleansing, Requirements gathering, Data Mapping, Functional and Technical design docs, and Process Flow diagrams.

Involved in extensive Data validation using SQL queries and backend testing.

Leveraged driver tables to pull data from Teradata and Redshift. Worked on a Django API for accessing the database.

Managed and reviewed Hadoop log files. Imported and exported data into HDFS, Pig, Hive, and HBase using Sqoop.

Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.

Extracted data from flat files and mainframe files, transformed it, and loaded it into the landing area and then the staging area, followed by the integration and semantic layers of the data warehouse (Teradata), using Informatica mappings and complex transformations (Aggregator, Joiner, Lookup, Update Strategy, Source Qualifier, Filter, Router, and Expression).

Connected to Redshift through Tableau to extract live data for real-time Analysis.

Created a Tableau dashboard of the top key performance indicators for senior management by connecting various data sources such as Excel, flat files, and SQL databases.

Designed and created backend data access modules using PL/SQL stored procedures and Oracle.

Wrote SQL Queries and implemented stored procedures, functions, packages, tables, views, Cursors, and triggers.

Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS.
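
The flow above used DStreams/RDDs; a Structured Streaming sketch of the same Kafka-to-Parquet idea, where the broker, topic, schema, and HDFS paths are placeholders and the spark-sql-kafka connector is assumed to be on the classpath:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-to-parquet").getOrCreate()

# Hypothetical event schema
schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("amount", DoubleType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
       .option("subscribe", "events")                     # placeholder topic
       .load())

events = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events_parquet")       # placeholder path
         .option("checkpointLocation", "hdfs:///chk/events")   # placeholder path
         .outputMode("append")
         .start())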

Experience using Python collections for manipulating and looping through different user-defined objects.
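
A small example of the kind of collections-based looping over user-defined objects referred to above; the Trade class and the values are made up:

from collections import defaultdict, Counter, namedtuple

# A simple user-defined object (illustrative)
Trade = namedtuple("Trade", ["symbol", "side", "quantity"])

trades = [
    Trade("AAPL", "BUY", 100),
    Trade("AAPL", "SELL", 40),
    Trade("MSFT", "BUY", 25),
]

# Net quantity per symbol with a defaultdict
net_position = defaultdict(int)
for t in trades:
    net_position[t.symbol] += t.quantity if t.side == "BUY" else -t.quantity

# Count of trades per side with a Counter
side_counts = Counter(t.side for t in trades)

print(dict(net_position))  # {'AAPL': 60, 'MSFT': 25}
print(side_counts)         # Counter({'BUY': 2, 'SELL': 1})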

Developed and executed the User Acceptance Testing portion of the test plan.

Involved in Agile methodologies, daily Scrum meetings, and Sprint planning.

Environment: Python, AWS, SQL, Teradata, Redshift, Django, Hadoop, HDFS, Pig, Hive, HBase, SQOOP, Scala, Kafka, Cassandra, Zookeeper, Tableau, Oracle, UNIX/LINUX, Agile, Scrum

Client: FusionCharts July 2015 – Sept 2017

Role: Data Analyst Location: West Bengal, India

Responsibilities:

Collaborated with data managers to define and implement data standards and common data elements for data collection.

Built ETL Pipeline using SQL to query telecom data from MySQL database by filtering, joining, and aggregating various tables.
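
An illustrative version of the kind of filter/join/aggregate query described above, run from Python with pandas and SQLAlchemy; the connection string, table names, and columns are assumptions:

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string for the telecom MySQL database
engine = create_engine("mysql+pymysql://user:password@localhost:3306/telecom")

query = """
    SELECT c.customer_id,
           c.plan_type,
           COUNT(u.call_id)    AS call_count,
           SUM(u.duration_min) AS total_minutes
    FROM customers c
    JOIN usage_records u ON u.customer_id = c.customer_id
    WHERE u.call_date >= '2017-01-01'
    GROUP BY c.customer_id, c.plan_type
"""

usage_summary = pd.read_sql(query, engine)
print(usage_summary.head())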

Manipulated the raw data with the NumPy and pandas libraries in Python for data cleaning, exploratory analysis, and feature engineering.
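
A sketch of the pandas/NumPy cleaning and feature-engineering steps described above; the file name and column names are illustrative:

import numpy as np
import pandas as pd

df = pd.read_csv("telecom_usage.csv")  # placeholder file

# Basic cleaning
df = df.drop_duplicates()
df["monthly_charge"] = pd.to_numeric(df["monthly_charge"], errors="coerce")
df["monthly_charge"] = df["monthly_charge"].fillna(df["monthly_charge"].median())

# Simple feature engineering
df["tenure_years"] = df["tenure_months"] / 12
df["high_usage"] = np.where(df["total_minutes"] > df["total_minutes"].quantile(0.75), 1, 0)

print(df.describe())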

Designed A/B tests to identify variables contributing to customer churn and used Shiny library in R to turn analyses into dashboards.

Applied data mining in Spark to extract various features that provide additional information to enhance churn prediction.

Supported constructing machine learning models using the scikit-learn library in Python to predict customer churn, including Decision Tree, Random Forest, and Gradient Boosting models.
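
A minimal scikit-learn sketch of the churn models named above; the dataset, features, and label column are hypothetical:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

df = pd.read_csv("churn_features.csv")  # placeholder dataset
X = df.drop(columns=["churned"])        # assumed label column
y = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "decision_tree": DecisionTreeClassifier(max_depth=6, random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")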

Wrote and ran the necessary PL/SQL queries to analyze and validate the data.

Involved in requirements gathering, database design, and implementation of star schema and snowflake schema dimensional data warehouses using Erwin.

Designed and developed T-SQL stored procedures to extract, aggregate, transform, and insert data.

Assisted in maintaining all MySQL database applications and resolved database-related issues submitted to the help desk ticketing system.

Used Teradata utilities such as FastExport and MultiLoad (MLOAD) for handling various tasks.

Implemented highly available, scalable, and self-healing systems on the AWS platform.

Used Tableau to design and maintain reports and dashboards to track and communicate customer churn prediction performance.

Environment: Python, Spark, SQL, R, T-SQL, Erwin, MySQL, Teradata, AWS, Tableau.


