Data Engineer - SQL Server

Location: Downtown, TX, 78701

Posted: December 26, 2023

Resume:

SHEFALI

Email: ad18j9@r.postjobfree.com PH: 936-***-****

Sr. Data Engineer

Professional Summary

8+ years of experience as a Data Engineer with strong knowledge of data warehouse environments, PySpark, Data Lake, Oracle, Teradata, SQL Server, and AWS services, along with data visualization tools such as Tableau, Apache Hadoop technologies such as Hadoop Distributed File System (HDFS), the MapReduce framework, Hive, Sqoop, Spark, and Scala, and Microsoft technologies such as .NET.

6+ years of experience with Apache Hadoop technologies such as Hadoop Distributed File System (HDFS), the MapReduce framework, Hive, Pig, Sqoop, Oozie, HBase, Spark, Scala, and Python.

Experience in evaluation, design, development, and deployment of technologies and automation for managed services on AWS, including Alexa, Lex, Lambda, EC2, EMR, Kinesis, Firehose, SQS, SNS, CloudWatch, Data Pipeline, DynamoDB, AWS Glue, Athena, Aurora, and RDS.

Created chatbots with Amazon Alexa and Amazon Lex services using Node.js.

Experience in Amazon Alexa and Amazon Lex chatbot development using AWS Lambda functions.

Experience as a BI Consultant in data warehousing environments, covering analysis, data modeling, design, development, administration, data migration, testing, support, and maintenance using MSBI tools (SQL Server, SSIS, SSAS, SSRS) along with Power BI and SharePoint 2010.

Extensive working experience with normalization and de-normalization techniques for both OLTP and OLAP systems, creating database objects such as tables, constraints (primary key, foreign key, unique, default), and indexes.

Strong experience in writing and understanding complex Hive queries (HQL).

Expertise in writing SQL queries, dynamic queries, sub-queries, and complex joins for complex stored procedures, triggers, user-defined functions, views, and cursors, with extensive experience in advanced SQL queries and PL/SQL stored procedures.

Strong knowledge of Amazon Kinesis, AWS Lambda, Amazon Simple Queue Service (Amazon SQS), Amazon Simple Notification Service (Amazon SNS), and Amazon Simple Workflow Service (Amazon SWF).

Developed Docker health checks and auto-scaling policies to maintain application uptime and resource utilization.

Experience in design, development, administration, data migration, testing, support, and maintenance using the Amazon Redshift database.

Implemented backup and disaster recovery solutions for Kubernetes workloads.

Integrated Kubernetes with identity providers (e.g., LDAP, OAuth, SSO) for centralized authentication and authorization.

Managed Kubernetes secrets and Config Maps for sensitive data and configuration management.

Utilized Docker volumes and data persistence techniques to ensure data integrity and availability for stateful applications.

Strong experience in migrating and implementing multiple applications from on-premises to the cloud using AWS services such as SQS, SNS, Redshift, DynamoDB, CloudFormation, Route 53, EC2, RDS, Aurora, Lambda, Kinesis, and VPC.

Experience in implementing big data solutions using Azure Data Lake, Data Factory, HDInsight, and Databricks.

Experience in writing complex queries and stored procedures in Teradata and other MPP systems such as Redshift and Netezza.

Experience using EC2 and RDS instances and working with big data clusters on EC2.

Strong experience in extracting and loading data from different data sources using Impala and complex business logic.

Strong hands-on experience in implementing Dashboards, Data Visualizations and Analytics using Tableau Desktop.

Created Tableau dashboards using stacked bars, bar graphs, scatter plots, Gantt charts, and geographical maps using the Show Me functionality.

Developed Tableau Data Visualization using Heat Maps, Tree Maps, Scatter Plots, Circle Plots and Pie Charts.

Combined Tableau visualizations into Interactive Dashboards using filter actions, highlight actions etc., and published them to the web page using URL action and displayed the filtered results dynamically.

Effectively used Data Blending feature in Tableau and defined best practices for Tableau report development.

Experienced in Consolidating and auditing Metadata from disparate tools and sources, including business intelligence (BI), extract, transform, and load (ETL), relational databases, modeling tools, and third-party metadata into a single repository.

Sound knowledge of database architecture for OLTP and OLAP applications, Data Analysis and ETL processes and data modeling and dimension modeling.

Experience in tasks like Project tracking, Mentoring, Version Controls, Software Change Request management, Project Deliveries / Quality Control and Migration.

Technical Skills:

Data Technologies: MapReduce, HDFS, Sqoop, Pig, Hive, Kafka, YARN, Spark MLlib

Databases: Oracle, MySQL, SQL Server, MongoDB, Cassandra, DynamoDB, PostgreSQL, Teradata

Frameworks: Hadoop, Apache Spark, APEX

Programming Languages: Python, PySpark, Scala, Shell Scripting, Java, Unix, SQL, PL/SQL

Cloud Platforms: AWS, Azure, Snowflake, Databricks

Visualization/Reporting: Tableau, Power BI, SSAS, SSRS

Development Tools: PyCharm, Jupyter Notebook, Docker, Kubernetes

Version Control: Git, GitHub

Methodologies: Agile (Scrum), Waterfall

Professional Experience

Data Engineer

Broadridge, Lake Success, NY April 2022 to Present

Roles & Responsibilities

Developed generic Spark ingestion code that queries data from multiple data sources (Oracle, Teradata, SAP, RDS) and writes the data to AWS S3.
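
For illustration, a minimal PySpark sketch of such a generic JDBC-to-S3 ingestion step; the connection URL, credentials, table, and bucket path are hypothetical placeholders, and the JDBC driver and S3A connector are assumed to be on the classpath:

    from pyspark.sql import SparkSession, functions as F

    # Sketch of a generic JDBC-to-S3 ingestion step (all names are placeholders).
    spark = SparkSession.builder.appName("generic_ingestion").getOrCreate()

    df = (spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")  # source-specific JDBC URL
          .option("dbtable", "SALES.ORDERS")                       # table or pushdown query
          .option("user", "etl_user")
          .option("password", "********")
          .option("fetchsize", "10000")
          .load())

    # Stamp a load date and land the extract in S3 as Parquet.
    (df.withColumn("load_date", F.current_date())
       .write.mode("overwrite")
       .partitionBy("load_date")
       .parquet("s3a://data-lake-raw/sales/orders/"))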

Providing end-to-end solutions for Enterprise Data Management, including data architecture, design of data quality and governance, metadata management, designing and defining data strategy, master data management, and conceptual, logical, and physical data modeling.

Worked in an Agile data modeling methodology, creating data models in sprints within an SOA architecture, and was involved in delivering complex enterprise data solutions with a comprehensive understanding of architecture, security, performance, scalability, and reliability.

Developed Databricks notebooks to generate Hive CREATE statements from the data and load the data into the tables.
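
A minimal sketch of how such a notebook cell might derive a Hive CREATE statement from a DataFrame's schema; the source path, database, and table name are hypothetical, and the spark session is assumed to be predefined as in a Databricks notebook:

    # Derive a Hive DDL from a DataFrame schema, create the table, and load it.
    staged_df = spark.read.parquet("s3a://data-lake-raw/sales/orders/")

    columns = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in staged_df.dtypes)
    ddl = f"""
    CREATE TABLE IF NOT EXISTS analytics.orders (
      {columns}
    )
    STORED AS PARQUET
    """

    spark.sql(ddl)                                                 # create the Hive table
    staged_df.write.mode("append").insertInto("analytics.orders")  # load the data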

Developed Python code used for orchestration of all batch processes, including batch and file-based processing.

Developed HQL scripts to transfer data between source and target tables.

Responsible for full data loads from production to the AWS Redshift staging environment and worked on migrating the EDW to AWS using EMR and various other technologies.

Worked on designing Conceptual, Logical and Physical data models and performed data design reviews with the Project team members.

Expertise in converting existing AWS infrastructure to serverless architecture (AWS Lambda, Kinesis), deployed via Terraform or AWS CloudFormation.

Worked on SQL Server concepts SSIS (SQL Server Integration Services), SSAS (Analysis Services) and SSRS (Reporting Services).

Developed Airflow DAGs to orchestrate sequential and parallel ETL jobs.
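
A minimal Airflow sketch of this pattern; the DAG id, schedule, and task callables are hypothetical placeholders:

    # Sketch of an Airflow DAG mixing sequential and parallel ETL steps.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract(**_):      print("extract source data")
    def transform_a(**_):  print("transform feed A")
    def transform_b(**_):  print("transform feed B")
    def load(**_):         print("load to warehouse")

    with DAG(
        dag_id="etl_daily",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_a = PythonOperator(task_id="transform_a", python_callable=transform_a)
        t_b = PythonOperator(task_id="transform_b", python_callable=transform_b)
        t_load = PythonOperator(task_id="load", python_callable=load)

        # extract runs first, the two transforms run in parallel, then load.
        t_extract >> [t_a, t_b]
        [t_a, t_b] >> t_load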

Developed PySpark code that is used to compare data between HDFS and S3.
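
An illustrative PySpark sketch of such a reconciliation, assuming hypothetical paths and a simple two-way row diff:

    from pyspark.sql import SparkSession

    # Sketch: compare a dataset in HDFS against its copy in S3 (paths are placeholders).
    spark = SparkSession.builder.appName("hdfs_s3_compare").getOrCreate()

    hdfs_df = spark.read.parquet("hdfs:///data/orders/")
    s3_df = spark.read.parquet("s3a://data-lake-raw/orders/")

    missing_in_s3 = hdfs_df.exceptAll(s3_df)   # rows present in HDFS but not in S3
    extra_in_s3 = s3_df.exceptAll(hdfs_df)     # rows present in S3 but not in HDFS

    print("row counts:", hdfs_df.count(), s3_df.count())
    print("rows only in HDFS:", missing_in_s3.count())
    print("rows only in S3:", extra_in_s3.count())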

Worked on analyzing Hadoop cluster and different big data analytic tools including MapReduce, Hive, Spark, and Scala.

Extensively used Hive, Spark optimization techniques like Partitioning, Bucketing, Map Join, parallel execution, Broadcast join and Repartitioning.

Developed and maintained DataStage job templates and standards, ensuring consistency and adherence to best practices across the organization.

Managed and monitored DataStage server performance, fine-tuning job configurations for optimal resource utilization.

Led the adoption of Docker containers within the organization, resulting in improved deployment efficiency and reduced infrastructure costs.

Implemented Docker Swarm and Kubernetes orchestration for container management, enhancing scalability and high availability of applications.

Developed Docker containerization strategies to containerize legacy applications, reducing deployment time by 50%.

Worked on migration projects to upgrade DataStage to newer versions, ensuring minimal disruptions to ongoing operations.

Strong understanding of Data Modeling (Relational, dimensional, Star and Snowflake Schema), Data analysis, implementations of Data warehousing using Windows and UNIX.

Led a team of ETL developers in designing and implementing complex data integration solutions using IBM DataStage.

Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau.
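
An illustrative sketch of such a validation, run through Spark SQL against a hypothetical Hive table and key column:

    from pyspark.sql import SparkSession

    # Sketch: post-load data quality checks (table, key column, and thresholds are placeholders).
    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    checks = {
        "row_count": "SELECT COUNT(*) AS v FROM analytics.orders",
        "null_keys": "SELECT COUNT(*) AS v FROM analytics.orders WHERE order_id IS NULL",
        "dup_keys":  """SELECT COUNT(*) AS v FROM (
                          SELECT order_id FROM analytics.orders
                          GROUP BY order_id HAVING COUNT(*) > 1) t""",
    }

    results = {name: spark.sql(q).collect()[0]["v"] for name, q in checks.items()}
    print(results)

    # Fail the load if any basic expectation is violated.
    assert results["row_count"] > 0, "target table is empty"
    assert results["null_keys"] == 0, "null primary keys found"
    assert results["dup_keys"] == 0, "duplicate primary keys found"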

Worked on creating Hive tables and writing Hive queries for data analysis to meet business requirements; experienced with Sqoop and Falcon to import and export data to Oracle.

Used various Spark transformations and actions for cleansing the input data and implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster data processing.

Spearheaded the adoption of Kubernetes within the organization, resulting in improved application scalability, resilience, and resource optimization.

Implemented Jenkins pipelines to automate end-to-end machine learning workflows, from data preprocessing to model training and deployment.

Orchestrated machine learning experiments using Jenkins, allowing for efficient hyperparameter tuning and model evaluation.

Developed custom Jenkins plugins and extensions to integrate with popular machine learning frameworks like TensorFlow, PyTorch, and scikit-learn.

Orchestrated the migration of legacy monolithic applications to Kubernetes-based microservices, increasing agility and reducing operational overhead.

Implemented and managed Kubernetes clusters on multiple cloud platforms, including AWS, Azure, and GCP, for hybrid and multi-cloud deployments.

Developed Spark scripts by using Scala shell commands as per the requirement.

Created partitioned, bucketed Hive tables, loaded data into respective partitions at runtime, for quick downstream access.
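
A minimal sketch of the same pattern written with the PySpark writer API; the database, table, column names, and bucket count are hypothetical:

    from pyspark.sql import SparkSession

    # Sketch: write a partitioned, bucketed table so downstream reads can
    # prune partitions and avoid shuffles on the bucket key.
    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    events = spark.read.parquet("s3a://data-lake-raw/events/")

    (events.write
        .partitionBy("event_date")     # one directory per event_date
        .bucketBy(32, "user_id")       # 32 buckets on the frequent join key
        .sortBy("user_id")
        .mode("overwrite")
        .format("parquet")
        .saveAsTable("analytics.events"))

    # Partition pruning: only the requested date is scanned.
    spark.sql("SELECT COUNT(*) FROM analytics.events WHERE event_date = '2023-12-01'").show()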

Involved in converting Hive/SQL queries into Spark transformations using Spark SQL and Scala.

Used Spark and Spark-SQL to read the parquet data and create the tables in hive using the Scala API.

Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS.
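
A minimal sketch of an equivalent flow using Spark Structured Streaming; the broker, topic, and paths are hypothetical, and the spark-sql-kafka connector is assumed to be available:

    from pyspark.sql import SparkSession, functions as F

    # Sketch: read a real-time Kafka feed and persist it to HDFS as Parquet.
    spark = SparkSession.builder.appName("kafka_to_parquet").getOrCreate()

    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "broker-1:9092")
           .option("subscribe", "orders")
           .option("startingOffsets", "latest")
           .load())

    # Kafka delivers bytes; cast the value and keep the event timestamp.
    events = raw.select(
        F.col("value").cast("string").alias("payload"),
        F.col("timestamp").alias("event_ts"),
    )

    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/streams/orders/")
             .option("checkpointLocation", "hdfs:///checkpoints/orders/")
             .trigger(processingTime="1 minute")
             .start())

    query.awaitTermination()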

Worked with client stakeholders to generate valuable insights that are useful for business.

Worked on a POC and suggested all possible optimization techniques that can be used in Spark (broadcast join, repartition vs. coalesce, partitioning vs. bucketing).

Environment: Erwin 9.x, SQL, Oracle 10g, Teradata, SAP, RDS, SQL Server, Rational Rose, Windows XP, TOAD, PL/SQL, Flat Files, Jenkins, T-SQL, Netezza, MDM, DataStage, Docker, Informatica PowerCenter, Kubernetes, DB2, SSRS, SAS, SSIS, SSAS, Visio, SharePoint, Tableau, Linux

Data Engineer

Merck Pharma, Branchburg, NJ November 2019 to March 2022

Roles & Responsibilities

Developed common component code using Python; these modules can be used by any downstream application for data load, data transformation, and data migration.

Developed Python code to start and stop Databricks clusters and to submit jobs of any compatible type.
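
An illustrative sketch of such helpers, assuming the Databricks REST API 2.0 endpoints; the workspace URL, access token, cluster ID, and notebook path are placeholders:

    import requests

    # Sketch: thin wrappers over the Databricks REST API for cluster start/stop
    # and one-time job submission. All identifiers below are placeholders.
    HOST = "https://<workspace>.cloud.databricks.com"
    HEADERS = {"Authorization": "Bearer <personal-access-token>"}

    def start_cluster(cluster_id: str) -> None:
        r = requests.post(f"{HOST}/api/2.0/clusters/start",
                          headers=HEADERS, json={"cluster_id": cluster_id})
        r.raise_for_status()

    def stop_cluster(cluster_id: str) -> None:
        # "delete" terminates the cluster; it can be restarted later.
        r = requests.post(f"{HOST}/api/2.0/clusters/delete",
                          headers=HEADERS, json={"cluster_id": cluster_id})
        r.raise_for_status()

    def submit_notebook_run(notebook_path: str, cluster_id: str) -> int:
        payload = {
            "run_name": "adhoc-run",
            "existing_cluster_id": cluster_id,
            "notebook_task": {"notebook_path": notebook_path},
        }
        r = requests.post(f"{HOST}/api/2.0/jobs/runs/submit",
                          headers=HEADERS, json=payload)
        r.raise_for_status()
        return r.json()["run_id"]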

Developed orchestration modules using Python code to handle the ingestion and transformation flows; these modules log when an Airflow task is started, stopped, or running, which helps achieve restartability.

Expertise in implementing Spark using Scala and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.

Used Sqoop scripts to ingest data from different RDBMS sources into Hadoop Cluster (HDFS) and created Hive tables, partitions, data loading into hive tables, etc.

Created custom Docker images optimized for various programming languages and frameworks, ensuring consistent development and deployment environments.

Managed Docker Compose configurations to define multi-container applications, simplifying development workflows.

Developed custom DataStage stages and connectors to integrate with external data sources and targets.

Managed metadata and data lineage in DataStage, maintaining documentation and ensuring data traceability.

Worked with several functions in the Scala library to build Spark applications (Spark SQL, RDD transformations and actions, DataFrames) and pushed the results to HDFS.

Developed Jenkins workflows to automate data collection, preprocessing, and labeling for machine learning datasets.

Utilized Jenkins to schedule regular data ingestion and updates for machine learning models with fresh data.

Created custom DataStage routines and user-defined functions to address unique data transformation requirements.

Implemented change data capture (CDC) mechanisms in DataStage to capture and process real-time data changes.

Worked on DataStage job migration and deployment, ensuring seamless transitions between development, testing, and production environments.

Integrated Kubernetes with CI/CD pipelines, automating the deployment of containerized applications through tools like Jenkins and GitLab CI/CD.

Implemented Kubernetes networking solutions like Ingress controllers and Service Mesh (e.g., Istio) for enhanced application routing and traffic control.

Created Hive external tables and partitioned tables using Hive indexes and used HQL to ease data analytics.

Worked on building and implementing a real-time streaming ETL pipeline using the Kafka Streams API.

Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and to write data back.

Performed exploratory data analysis (EDA) using Python, integrated Python with Hadoop MapReduce and Spark, and worked on NoSQL databases including MongoDB and Cassandra, implementing a multi-datacenter, multi-rack Cassandra cluster.
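
For illustration, a minimal EDA sketch in Python with pandas; the file and column names are hypothetical:

    import pandas as pd

    # Sketch: quick exploratory data analysis on an extract (all names are placeholders).
    df = pd.read_csv("orders_extract.csv", parse_dates=["order_date"])

    print(df.shape)                      # rows x columns
    print(df.dtypes)                     # column types
    print(df.isnull().sum())             # missing values per column
    print(df.describe(include="all"))    # summary statistics

    # Simple profile of a categorical and a numeric column.
    print(df["order_status"].value_counts(normalize=True))
    print(df.groupby("order_status")["order_amount"].agg(["count", "mean", "sum"]))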

Implemented Cloud Identity-Aware Proxy (IAP) for secure access control to applications hosted on GCP.

Conducted regular security assessments and vulnerability scans on SAP infrastructure, addressing and remediating any issues.

Analyzed existing systems and proposed improvements in processes and systems for usage of modern scheduling tools like Airflow and migrating the legacy systems into an Enterprise data lake built on Azure Cloud.

Generated DDL from the model for RDBMS targets (Oracle, Teradata) and created, managed, and modified logical and physical data models using a variety of data modeling philosophies and techniques, including Inmon and Kimball.

Involved in a complete remodeling of the data processing pipeline by changing the design of the data flow, and created pipelines to migrate data from on-prem resources through the data lake into the Azure SQL Data Warehouse.

Environment: Erwin r9.6, SQL, Oracle 12c, MDM, Azure SQL, Azure SQL DW, Azure Databricks, Azure ADLS, Data Factory, Teradata SQL Assistant, Kubernetes, Python, DataStage, Docker, Tableau, Netezza, Informatica, PL/SQL, SQL Server, Jenkins, Windows, Hive, Sqoop, Cassandra, Hadoop, and UNIX

Data Engineer

Charter Communications, Negaunee, MI May 2017 to October 2019

Roles & Responsibilities

Migrated complex MapReduce programs into Spark RDD transformations, actions.

Created and worked on Sqoop jobs with incremental load to populate Hive External tables.

Involved in importing data from Microsoft SQL Server and Teradata into HDFS using Sqoop.

Developed Sqoop scripts to import data from relational sources and handled incremental loading.

Extensively worked on Data frames and Datasets using Spark and Spark SQL.

Used AWS services like EC2 and S3 for small datasets.

Worked with Data Steward Team for designing, documenting, and configuring Informatica Data Director for supporting management of MDM data.

Worked with BTEQ to submit SQL statements, import, and export data, and generate reports in Teradata.

Designed and developed T-SQL stored procedures to extract, aggregate, transform, and insert data and developed SQL Stored procedures to query dimension and fact tables in data warehouse.

Extensively used Sqoop to import/export data between RDBMS and Hive tables, performed incremental imports, and created Sqoop jobs based on the last saved value.

Implemented automated testing within Jenkins pipelines to verify the correctness of machine learning models and their integration with applications.

Established Jenkins jobs for continuous model monitoring, including fairness, bias, and security checks.

Conducted code reviews and mentored junior ETL developers to improve coding standards and skills in DataStage.

Designed and implemented DataStage job scheduling and orchestration using tools like Control-M or Autosys.

Developed serverless applications using Google Cloud Functions and Cloud Pub/Sub, reducing operational overhead.

Involved in ETL processing using Pig & Hive in AWS EMR, S3 and Data Profiling, Mapping, and Integration from multiple sources to AWS S3.

Designed and optimized Kubernetes deployment strategies, including rolling updates, blue-green deployments, and canary releases.

Created custom Kubernetes operators to automate the management of complex stateful applications, reducing manual intervention.

Experienced in deploying and scheduling reports using SSRS to generate daily, weekly, monthly, and quarterly reports, including status reports, and in designing and deploying reports with drill-down, drill-through, and drop-down menu options, as well as parameterized and linked reports.

Implemented Docker security best practices, including image scanning and runtime security, to mitigate vulnerabilities.

Integrated Docker into the CI/CD pipeline, enabling automated testing and deployment of containerized applications.

Collaborated with DevOps teams to optimize Dockerfile creation and image size, reducing deployment times and resource consumption.

Extensively used Star Schema methodologies in building and designing the logical data model into dimensional models.

Developed Tableau visualizations and dashboards using Tableau Desktop, building Tableau workbooks from multiple data sources using Data Blending.

Extensively worked on Jenkins to implement continuous integration (CI) and Continuous deployment (CD) processes.

Incorporated Git to keep track of history, merge code between different versions of the software, and manage check-in/check-out.

Environment: ER/Studio, OLTP, SQL, SSIS, SSAS, SSRS, PL/SQL, 3NF, Hadoop, Hive, Pig, MapReduce, MongoDB, HBase, AWS S3, AWS Redshift, Kubernetes, Python, Big Data, Docker, Spark, XML, Tableau, Teradata, Netezza, and Teradata SQL Assistant.

Data Engineer

Grapesoft Solutions, Hyderabad, India October 2015 to February 2017

Roles & Responsibilities

Used update strategy to effectively migrate data from source to target.

Moved the mappings from development environment to test environment.

Designed ETL Process using Informatica to load data from Flat Files, and Excel Files to target Oracle Data Warehouse database.

Interacted with the business community and database administrators to identify the business requirements and data realities.

Experienced in Agile Methodologies and SCRUM Process.

Built various graphs for business decision-making using the Python Matplotlib library.
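
A minimal sketch of such a chart; the categories and values are illustrative placeholders:

    import matplotlib.pyplot as plt

    # Sketch: a simple bar chart for a business metric (data is illustrative).
    regions = ["North", "South", "East", "West"]
    revenue = [120.5, 98.3, 143.8, 110.1]

    plt.figure(figsize=(6, 4))
    plt.bar(regions, revenue, color="steelblue")
    plt.title("Quarterly Revenue by Region")
    plt.xlabel("Region")
    plt.ylabel("Revenue (USD, thousands)")
    plt.tight_layout()
    plt.savefig("revenue_by_region.png")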

Worked on development of applications, especially in a UNIX environment, and familiar with its commands.

Implemented code in Python to retrieve and manipulate data.

Reviewed basic SQL queries and edited inner, left, & right joins in Tableau Desktop by connecting live/dynamic and static datasets.

Reported and created dashboards for Global Services & Technical Services using SSRS, Oracle BI, and Excel. Deployed Excel VLOOKUP, PivotTable, and Access Query functionalities to research data issues.

Created conceptual, logical and physical relational models for integration and base layer; created logical and physical dimensional models for presentation layer and dim layer for a dimensional data warehouse in Power Designer.

Involved in reviewing business requirements and analyzing data sources from Excel, Oracle, and SQL Server for design, development, testing, and production rollout of reporting and analysis projects.

Analyzed, designed, developed, implemented, and maintained ETL jobs using IBM InfoSphere DataStage and Netezza.

Designed Data Flow Diagrams, E/R Diagrams and enforced all referential integrity constraints.

Extensively worked in Client-Server application development using Oracle 10g, Teradata 14, SQL, PL/SQL, Oracle Import and Export Utilities.

Coordinated with DB2 on database build and table normalizations and de-normalizations.

Conducted brainstorming sessions with application developers and DBAs to discuss various de-normalization, partitioning, and indexing schemes for the physical model.

Involved in several facets of MDM implementations including Data Profiling, metadata acquisition and data migration.

Extensively used SQL*Loader to load data from legacy systems into Oracle databases using control files and used the Oracle external tables feature to read data from flat files into Oracle staging tables.

Involved in extensive Data validation by writing several complex SQL queries and involved in back-end testing and worked with data quality issues.

Used SSIS to create ETL packages to validate, extract, transform and load data to data warehouse databases, data mart databases, and process SSAS cubes to store data to OLAP databases.

Created ETL packages using OLTP data sources (SQL Server, Flat files, Excel source files, Oracle) and loaded the data into target tables by performing different kinds of transformations using SSIS.

Migrated SQL Server 2008 to SQL Server 2008 R2 on Microsoft Windows Server 2008 R2 Enterprise Edition.

Developed reusable objects such as PL/SQL program units and libraries, database procedures and functions, and database triggers to be used by the team while satisfying the business rules.

Performed data validation on the flat files that were generated in UNIX environment using UNIX commands as necessary.

Strong understanding of Data Modeling (Relational, dimensional, Star and Snowflake Schema), Data analysis, and implementations of Data warehousing using Windows and UNIX.

Used the Model Mart feature of Erwin for effective model management (sharing, dividing, and reusing model information and designs) to improve productivity, and identified and tracked slowly changing dimensions and determined the hierarchies in dimensions.

Designed the high-level ETL architecture for overall data transfer from the source server to the warehouse using SSIS, defined various facts and dimensions in the data mart, including factless fact tables, and designed the data mart by defining entities, attributes, and the relationships between them.

Environment: Power Designer, Teradata, Oracle, PL/SQL, MDM, SQL Server 2008, ETL, Netezza, DB2, SSIS, SSRS, SAS, SPSS, DataStage, Informatica, SQL, T-SQL, UNIX, Aginity, Teradata SQL Assistant, etc.

Data Analyst

Avon Technologies Pvt Ltd, Hyderabad, India June 2014 to September 2015

Responsibilities:

Attended and participated in information and requirements gathering sessions and translated business requirements into working logical and physical data models for Data Warehouse, Data marts and OLAP applications.

Performed extensive data analysis and data validation on Teradata and designed Star and Snowflake data models for the Enterprise Data Warehouse using Erwin.

Created and maintained the Logical Data Model (LDM) for the project, including documentation of all entities, attributes, data relationships, primary and foreign key structures, allowed values, codes, business rules, glossary terms, etc.

Integrated data from various data sources such as MS SQL Server, DB2, Oracle, Netezza, and Teradata using Informatica to perform extraction, transformation, and loading (ETL). Worked on ETL development and data migration using SSIS, SQL*Loader, and PL/SQL.

Involved in designing and developing logical and physical data models and metadata to support the requirements using Erwin.

Used the ETL tool Informatica to populate the database and transform data from the old database to the new database on Oracle.

Involved in modeling (Star Schema methodologies), building and designing the logical data model into dimensional models, and performance query tuning to improve performance along with index maintenance.

Involved in the creation, maintenance of Data Warehouse and repositories containing Metadata and wrote and executed unit, system, integration and UAT scripts in a Data Warehouse project.

Collaborated with DevOps teams to automate infrastructure provisioning and application deployment using Terraform and GCP Deployment Manager.

Provided training and mentoring to junior team members on GCP best practices and technologies.

Wrote and executed SQL queries to verify that data has been moved from transactional system to DSS, Data Warehouse, and data mart reporting system in accordance with requirements.

Responsible for Creating and Modifying T-SQL stored procedures/triggers for validating the integrity of the data.

Worked on Data Warehouse concepts and dimensional data modelling using Ralph Kimball methodology.

Created a number of standard and complex reports to analyze data using slice-and-dice, drill-down, and drill-through in SSRS.

Developed separate test cases for ETL process (Inbound & Outbound) and reporting.

Environment: Oracle 9i/10g, MS Visio, PL/SQL, Microsoft SQL Server, SSRS, T-SQL, Rational Rose, Data Warehouse, OLTP, OLAP, Erwin, Informatica 9.x, Windows, SQL, Talend Data Quality, Flat Files.


