
Data Engineer SQL Server

Location:
Georgetown, TX
Posted:
March 03, 2025


Resume:

Sai Shruthi Gaddam

**************@*****.***

913-***-****

https://www.linkedin.com/in/shruthi-yadav/

Sr. Data Engineer

Professional Summary

10+ years of professional experience in Information Technology as a Data Engineer with expertise in Database Development, ETL Development, Data Modeling, Report Development, and Big Data Technologies.

Experience in Data Integration and Data Warehousing using various ETL tools, including Informatica PowerCenter, AWS Glue, SQL Server Integration Services (SSIS), and Talend.

Proficient in designing Business Intelligence Solutions with Microsoft SQL Server, leveraging SSIS, SQL Server Reporting Services (SSRS), and SQL Server Analysis Services (SSAS).

Extensively used Informatica PowerCenter and Informatica Data Quality (IDQ) as ETL tools for extracting, transforming, and cleansing data from various source systems to targets in both batch and real-time.

Experience working with Amazon Web Services (AWS) and its services, including EC2, S3, RDS, EMR, VPC, IAM, Elastic Load Balancing, Lambda, Redshift, ElastiCache, Auto Scaling, CloudFront, CloudWatch, Data Pipeline, DMS, and Aurora, as well as Snowflake running on AWS.

Strong expertise in Relational Database Systems like Oracle, MS SQL Server, Teradata, MS Access, and DB2. Proficient in SQL, PL/SQL, SQL*Plus, TOAD, and SQL*Loader, with experience writing, testing, and implementing triggers, stored procedures, functions, packages, and cursors.

Hands-on experience with AWS Snowflake Cloud Data Warehouse and AWS S3 Bucket for integrating data from multiple source systems, including loading nested JSON formatted data into Snowflake tables.

Extensive experience integrating Informatica Data Quality (IDQ) with Informatica PowerCenter.

Expertise in Data Mining and generating data visualizations using Tableau, Power BI, and Alteryx.

Proficient in the Cloudera ecosystem, including HDFS, Hive, Sqoop, HBase, and Kafka, for building data pipelines and performing data analysis and processing with Hive SQL, Impala, Spark, and Spark SQL.

Worked with scheduling tools like Talend Administrator Console (TAC), UC4/Automic, Tidal, Control-M, Autosys, cron, and Tivoli Workload Scheduler (TWS).

Used Flume, Kafka, and Spark Streaming to ingest real-time or near real-time data into HDFS.

Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.

Led data migration from Teradata to AWS Snowflake Environment using Python and BI tools like Alteryx.

Expertise in building and architecting multiple Data Pipelines, end-to-end ETL/ELT processes for data ingestion and transformation.

Experience in moving data between GCP and Azure using Azure Data Factory.

Developed Python scripts to parse flat files, CSV, XML, and JSON files, extracting data from various sources and loading it into a data warehouse (a minimal sketch of this kind of parsing follows this list).

Automated migration scripts using Unix Shell Scripting, Python, Oracle/TD SQL, Teradata Macros, and Procedures.

Expert-level mastery in designing and developing complex mappings to extract data from diverse sources, including Flat Files, RDBMS Tables, Legacy System Files, XML Files, Applications, COBOL Sources, and Teradata.

Used JIRA for defect and issue logging and tracking, and documented all work in Confluence.

Experience with ETL Workflow Management Tools like Apache Airflow and significant experience writing Python scripts to implement workflows.

Performed performance tuning of user queries by analyzing Explain Plans, recreating user driver tables with the right Primary Index, and scheduling Collection of Statistics, Secondary Indexes, and Join Indexes.

Expert knowledge in Fact-Dimensional Modeling (Star Schema, Snowflake Schema), Transactional Modeling, and Slowly Changing Dimensions (SCD).

Created clusters in Google Cloud and managed them using Kubernetes (K8s).

Used Jenkins to deploy code to Google Cloud, create new Namespaces, build Docker Images, and push them to Google Cloud Container Registry.
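
As an illustration of the file-parsing work described above, the following is a minimal Python sketch; it is not one of the original project scripts, and the input file, field names, and staging path are hypothetical placeholders.

import csv
import json
from pathlib import Path

def json_to_staging_csv(json_path: str, csv_path: str) -> int:
    """Flatten a JSON array of records into a CSV staging file that a
    warehouse bulk loader (e.g. COPY) can pick up; returns the row count.
    All field names used here are hypothetical examples."""
    records = json.loads(Path(json_path).read_text())
    rows = [
        {
            "customer_id": rec.get("customerId"),
            "usage_kwh": rec.get("usage", {}).get("kwh"),
            "read_date": rec.get("readDate"),
        }
        for rec in records
    ]
    Path(csv_path).parent.mkdir(parents=True, exist_ok=True)
    with open(csv_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["customer_id", "usage_kwh", "read_date"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)

if __name__ == "__main__":
    print(json_to_staging_csv("meter_readings.json", "staging/meter_readings.csv"))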

EDUCATION:

Bachelor’s degree in Computer Science from JNTU, 2014.

PROFESSIONAL EXPERIENCE

Southern California Edison (SCE), California October 2023 – Present

Sr Data Engineer

Responsibilities:

•Designed and maintained scalable ETL/ELT pipelines for processing large datasets, including power grid operations, renewable energy metrics, and customer usage patterns.

•Developed workflows for ingesting and transforming data from sources such as IoT sensors, smart meters, and SCADA systems.

•Migrated legacy data systems from Teradata to AWS Redshift and Snowflake, optimizing storage and query performance to support high-scale analytics.

•Implemented cloud-based data solutions leveraging AWS S3, Glue, and Lambda for real-time and batch data processing.

•Processed and analyzed renewable energy and grid performance data using PySpark and Databricks, enabling predictive insights for load forecasting and peak demand management.

•Developed real-time streaming solutions with Kafka to monitor energy usage patterns and identify anomalies in power grid operations (a minimal sketch follows this list).

•Automated generation of compliance reports for regulatory bodies like CAISO, ensuring adherence to renewable energy integration and emissions standards.

•Supported Net Energy Metering (NEM) reporting by building robust data pipelines for customer solar and battery system data.

•Processed and integrated telemetry data from IoT devices, including smart meters and power transformers, to enhance grid reliability and identify maintenance needs.

•Centralized real-time grid data into a unified analytics platform to facilitate predictive maintenance and operational decision-making.

•Designed and implemented optimized data models, such as star and snowflake schemas, for billing systems and energy usage tracking.

•Improved SQL query performance in Teradata, AWS Redshift, and Snowflake, significantly reducing query execution times for large datasets.

•Enabled data-driven insights for renewable energy adoption programs, battery storage systems, and EV charging infrastructure projects.

•Supported energy efficiency initiatives by integrating customer energy usage data with demand-response systems for actionable insights.

•Set up monitoring systems using AWS CloudWatch and Splunk to track performance and availability of critical pipelines.

•Developed alerting mechanisms to identify and resolve issues in real-time, ensuring uninterrupted operational workflows.

•Automated deployment of data pipelines and infrastructure using Jenkins, Terraform, and Docker, reducing deployment cycles and minimizing manual interventions.

•Deployed Infrastructure as Code (IaC) to provision and manage cloud resources for scalable and consistent environments.

•Collaborated with business analysts, data scientists, and regulatory teams to define data requirements and deliver insights aligned with SCE’s strategic goals.

•Partnered with operational teams to optimize data workflows for renewable energy integration and grid modernization efforts.

•Streamlined integration of renewable energy data sources, such as solar and wind, into the grid system to optimize storage and distribution.

•Analyzed data from distributed energy resources (DERs) to enhance grid flexibility and support California’s decarbonization targets.
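
The Kafka streaming work noted above can be illustrated with a minimal PySpark Structured Streaming sketch. This is not SCE code: the broker address, topic name, message schema, and the simple threshold rule are hypothetical placeholders, and the Spark Kafka connector package is assumed to be on the classpath.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("meter-anomaly-stream").getOrCreate()

# Hypothetical schema for smart-meter readings arriving as JSON on Kafka.
schema = StructType([
    StructField("meter_id", StringType()),
    StructField("usage_kwh", DoubleType()),
    StructField("reading_ts", TimestampType()),
])

readings = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "meter-readings")             # placeholder topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("r"))
    .select("r.*")
)

# Naive anomaly rule purely for illustration: flag unusually high readings.
anomalies = readings.filter(F.col("usage_kwh") > 15.0)

query = (
    anomalies.writeStream.format("console")
    .outputMode("append")
    .option("truncate", "false")
    .start()
)
query.awaitTermination()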

Environment: Informatica PowerCenter 10.x, Informatica Data Quality (IDQ), AWS Redshift, Snowflake, S3, Teradata, Oracle, Google Cloud Platform (GCP), MS SQL Server, BigQuery, Python, Unix Shell Scripting, Tableau, Power BI, Apache Kafka, Apache Spark, HDFS, Hive, Sqoop, AWS Glue, AWS EMR, GitHub, Jenkins, Docker, Kubernetes (K8s), Apache Airflow, Postman, JIRA, Confluence

Chewy, Florida May 2022 – October 2023

Sr Data Engineer

Responsibilities:

•Worked extensively with AWS services like EC2, S3, and CloudFormation to enhance system scalability and reliability, resulting in improved data processing efficiency

•Developed Python scripts to parse XML and JSON files and load data into AWS Snowflake Data Warehouse, improving data accessibility and processing speed

•Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources like S3 into AWS Redshift, resulting in more efficient data integration and analysis

•Performed several Banking Centre-level performance analyses aimed at streamlining and optimizing the sales process and monitoring sales of certain high-value checking products.

•Built a program with Python and Apache Beam, executed in Cloud Dataflow, to run data validation between raw source files and BigQuery tables, ensuring data accuracy and consistency

•Built a Scala and Spark-based framework to connect data sources like MySQL, Oracle, PostgreSQL, and BigQuery, improving data integration efficiency

•Documented Extract, Transform, and Load processes and developed Talend ETL processes for the Data Warehouse team using Pig and Hive, enhancing data processing accuracy

•Applied transformations using AWS Glue and loaded data back to Redshift and S3, ensuring data consistency and availability

•Analyzed and wrote SQL queries to extract data in JSON format through REST API calls, improving data retrieval efficiency

•Designed ETL pipelines from various relational databases to the Data Warehouse using Apache Airflow, streamlining data flow processes (a minimal DAG sketch follows this list)

•Extracted, aggregated, and consolidated Adobe data within AWS Glue using PySpark, improving data processing efficiency

•Developed SSIS packages to extract, transform, and load data into the SQL Server database from legacy mainframe data sources, enhancing data accessibility and integration

•Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators, streamlining data workflow automation

•Utilized Postman with HTTP requests to GET data from RESTful APIs and validate API calls, ensuring data accuracy and reliability

•Integrated different applications and relational databases using Informatica PowerCenter and PowerExchange, improving data integration capabilities

•Created Informatica workflows and IDQ mappings for Batch and Real Time, improving data processing efficiency and accuracy

•Developed PySpark code for AWS Glue jobs and EMR, enhancing data processing speed and reliability

•Created custom T-SQL procedures to import data from flat files into SQL Server database, streamlining data integration processes

•Developed ETL Python scripts for ingestion pipelines on AWS infrastructure (EMR, S3, Redshift, Lambda), improving data ingestion efficiency

•Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver for all environments, ensuring system stability and performance

•Configured EC2 instances and IAM users and roles, and created an S3 data pipeline using Boto API to efficiently load data from internal sources, improving data accessibility and processing speed.
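
To illustrate the Airflow pipelines referenced above, here is a minimal DAG sketch in the Airflow 2.x style. It is not the production DAG: the DAG id, schedule, and task callables are hypothetical placeholders standing in for the real extract and load logic.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(**context):
    # Placeholder: pull rows from a source system for the execution date.
    print("extracting for", context["ds"])

def load(**context):
    # Placeholder: bulk-load the extracted data into the warehouse.
    print("loading for", context["ds"])

default_args = {"retries": 2, "retry_delay": timedelta(minutes=10)}

with DAG(
    dag_id="daily_campaign_load",          # hypothetical name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task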

Environment: Informatica PowerCenter 10.x/9.x, IDQ, AWS Redshift, Snowflake, S3, Postgres, Google Cloud Platform (GCP), MS SQL Server, BigQuery, Salesforce SQL, Python, Postman, Tableau, Unix Shell Scripting, EMR, GitHub.

UBS, New Jersey Nov 2019 – Feb 2022

Sr Data Engineer

Responsibilities:

•Completed the full Software Development Life Cycle (SDLC) including Business Requirements Analysis, Technical Design documentation, Data Analysis, and deployment, ensuring seamless delivery to business users

•Developed complex mappings using Informatica PowerCenter Designer to transform and load data from Oracle, Teradata, and Sybase, improving data integration efficiency

•Analyzed and transformed source data from SQL Server, XML files, and Flat files using Informatica, ensuring accurate data loading into target tables according to business rules

•Created tables in Greenplum and loaded data through Alteryx for the Global Audit Tracker, enhancing data accessibility for audit processes

•Analyzed large and critical datasets using HDFS, HBase, Hive, HiveQL, Pig, Sqoop, and ZooKeeper, optimizing data processing and storage solutions

•Extracted, aggregated, and consolidated Adobe data within AWS Glue using PySpark, improving data accessibility for analysis

•Developed data engineering and ETL Python scripts for ingestion pipelines on AWS infrastructure (EMR, S3, Glue, Lambda), which enhanced data processing efficiency and reliability

•Modified existing data models using Erwin to enhance data warehouse projects, leading to improved data management

•Integrated Talend connectors into Redshift for BI development, supporting multiple technical projects simultaneously and improving project delivery

•Utilized Google Cloud Functions with Python to load data into BigQuery for on-arrival CSV files in a GCS bucket, streamlining data processing (a minimal sketch follows this list)

•Created an iterative macro in Alteryx to send JSON requests and download JSON responses from a web service, which improved data analysis efficiency by automating the data retrieval process

•Migrated data from transactional source systems to Redshift data warehouse using Spark and AWS EMR, enhancing data accessibility and processing speed for analytics teams

•Supported business teams with data mining and reporting by writing complex SQL queries using basic and advanced SQL, including OLAP functions like ranking, partitioning, and windowing functions, to improve decision-making processes

•Worked with EMR, S3, and EC2 services in the AWS cloud to migrate servers, databases, and applications from on-premises to AWS, resulting in increased scalability and reduced infrastructure costs

•Tuned SQL queries using Explain to analyze data distribution among AMPs and index usage, collect statistics, define indexes, revise correlated subqueries, and use hash functions, which improved query performance and reduced execution time

•Created, maintained, and customized System & Splunk applications, search queries, and dashboards using Python and AWS Glue, enhancing system performance and user experience
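
The Cloud Functions bullet above can be sketched roughly as follows, assuming a first-generation, GCS-triggered background function; the target dataset and table and the append write disposition are hypothetical choices, not the production configuration.

from google.cloud import bigquery

def load_csv_to_bq(event, context):
    """Triggered by a google.storage.object.finalize event on a GCS bucket."""
    if not event["name"].endswith(".csv"):
        return  # ignore non-CSV objects

    uri = f"gs://{event['bucket']}/{event['name']}"
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    # "analytics.audit_events" is a hypothetical destination table.
    load_job = client.load_table_from_uri(uri, "analytics.audit_events", job_config=job_config)
    load_job.result()  # wait so failures surface in the function logs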

Environment: Informatica PowerCenter 9.5, AWS Glue, Talend, Google Cloud Platform (GCP), PostgreSQL, Python, Oracle, Teradata, cron, Unix Shell Scripting, SQL, Erwin, AWS Redshift, GitHub, EMR

Ace Hardware, IL June 2017 – Oct 2019

Data Engineer

Responsibilities:

•Gathered business requirements and designed logical and physical databases, improving data transformation and loading processes using SQL and performance tuning techniques

•Used SSIS to populate data from various sources, creating packages for different application data loading operations.

•Created complex reports including drill-down, drill-through, parameterized, matrix, and charts using reporting services, enhancing data analysis capabilities

•Developed and executed a migration strategy to move Data Warehouse from an Oracle platform to AWS Redshift.

•Created S3 buckets, managed bucket policies, and used S3 and Glacier for storage and backup on AWS.

•Developed Spark scripts using Python on AWS EMR for data aggregation, validation, and ad-hoc querying, improving data processing efficiency and accuracy (a minimal PySpark sketch follows this list)

•Performed data analytics on Data Lake using PySpark on the Databricks platform, enhancing data insights and decision-making capabilities

•Designed and documented the entire architecture of Power BI POC, facilitating a clear understanding and streamlined implementation process

•Implemented and delivered MSBI platform solutions to develop and deploy ETL, analytical, reporting, and scorecard/dashboards on SQL Server using SSIS and SSRS, resulting in improved data management and reporting capabilities

•Designed and developed new Power BI solutions and migrated reports from SSRS, enhancing reporting efficiency and data visualization

•Developed and executed a migration strategy to move the Data Warehouse from Greenplum to Oracle, resulting in improved data processing efficiency and system reliability

•Loaded data into Amazon Redshift and used AWS CloudWatch to collect and monitor AWS RDS instances, enhancing data accessibility and system performance

•Developed under scrum methodology and in a CI/CD environment using Jenkins, which streamlined development processes and reduced deployment times

•Developed UNIX shell scripts to run batch jobs in Autosys and load data into production, which improved job scheduling efficiency and reduced processing time

•Participated in the architecture council for database architecture recommendation, contributing to improved database design and performance

•Loaded data using Teradata utilities such as FastLoad, MultiLoad, and TPump, improving data processing efficiency

•Analyzed SQL execution plans and recommended hints, restructuring, or introducing indexes and materialized views, enhancing query performance

•Deployed EC2 instances for Oracle databases, ensuring reliable and scalable database hosting

•Utilized Power Query in Power BI to pivot and unpivot data models, enhancing data cleansing processes
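
A minimal PySpark batch sketch of the EMR aggregation and validation work mentioned above follows; the S3 paths, column names, and validation rule are hypothetical placeholders, not the actual job.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-sales-aggregation").getOrCreate()

orders = spark.read.parquet("s3://example-bucket/raw/orders/")  # placeholder path

# Basic validation: keep rows with a key and a non-negative amount.
valid = orders.filter(F.col("order_id").isNotNull() & (F.col("amount") >= 0))

daily = (
    valid.groupBy(F.to_date("order_ts").alias("order_date"), "store_id")
    .agg(F.count("*").alias("order_count"), F.sum("amount").alias("total_amount"))
)

daily.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_sales/")  # placeholder
spark.stop()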

Environment: MS SQL Server 2016, ETL, SSIS, SSRS, SSMS, Cassandra, AWS Redshift, AWS S3, Oracle 12c, Oracle Enterprise Linux, Teradata, Databricks, Jenkins, Power BI, Autosys, Unix Shell Scripting.

Maisa Solutions, India Dec 2015 – March 2017

Data Engineer

Responsibilities:

Performed Data Analysis, Data Migration, Data Cleansing, Transformation, Data Import, and Data Export.

Worked on Design, Development, and Documentation of the ETL strategy to populate the data from the various source systems using the Talend ETL tool into the Data Warehouse.

Devised PL/SQL Stored Procedures, Functions, Triggers, Views, and packages. Made use of Indexing, Aggregation, and Materialized views to optimize query performance.

Created Tableau dashboards and reports for data visualization, reporting, and analysis, and presented them to the business.

Implemented FTP operations using Talend Studio to transfer files between network folders as well as to an FTP server, using components such as tFileCopy, tFileArchive, tFileDelete, tCreateTemporaryFile, tFTPDelete, tFTPCopy, tFTPRename, tFTPPut, and tFTPGet (a rough Python equivalent is sketched after this list).

Designed and developed Spark jobs with Scala to implement end-to-end data pipelines for batch processing.

Created Data Connections, Published on Tableau Server for usage with Operational or Monitoring Dashboards.

Used Tableau administration tools for configuration, adding users, managing licenses and data connections, scheduling tasks, and embedding views by integrating with other platforms.

Performed data profiling and developed various data quality rules using Informatica Data Quality (IDQ).

Created new UNIX scripts to automate and handle different file-processing, editing, and execution sequences with shell scripting, using basic Unix commands and the awk and sed editing languages.

Integrated Collibra with the Data Lake using the Collibra Connect API.

Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.

Created firewall rules to access Google Dataproc from other machines.

Wrote Scala programs for Spark transformations in Dataproc, and worked with Impala, Spark, and Unix shell scripting.
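
The Talend FTP components listed above handle file transfer declaratively; purely as an illustration, a rough Python equivalent using the standard library's ftplib is sketched below. The host, credentials, directories, and file pattern are hypothetical placeholders.

import ftplib
from pathlib import Path

def push_csv_files(local_dir: str, remote_dir: str) -> None:
    """Upload every CSV in local_dir to an FTP directory, then mark each
    file as processed locally. Host and credentials are placeholders."""
    with ftplib.FTP("ftp.example.com") as ftp:   # hypothetical host
        ftp.login("etl_user", "secret")          # hypothetical credentials
        ftp.cwd(remote_dir)
        for path in sorted(Path(local_dir).glob("*.csv")):
            with open(path, "rb") as fh:
                ftp.storbinary(f"STOR {path.name}", fh)       # cf. tFTPPut
            path.rename(path.with_name(path.name + ".done"))  # local bookkeeping

if __name__ == "__main__":
    push_csv_files("outbound", "/incoming")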

Environment: MS SQL Server 2016, ETL, AWS Redshift, AWS S3, Oracle 12c, Oracle Enterprise Linux, Teradata, Databricks, Jenkins, Power BI, Autosys, Unix Shell Scripting.

Amigos Software Solutions, Hyd, India July 2014 - Nov 2015

Big Data & Hadoop Engineer

Responsibilities:

•Analyzed Hadoop cluster and various big data analytic tools, including HBase database and Sqoop, to enhance data processing efficiency

•Built scalable distributed data solutions using Hadoop, improving data handling capabilities

•Implemented a nine-node CDH3 Hadoop cluster on Red Hat Linux and loaded data from the Linux file system to HDFS, enhancing data storage and retrieval processes (a minimal load sketch follows this list)

•Installed cluster, managed commissioning & decommissioning of data nodes, performed name node recovery, capacity planning, and slots configuration, and created HBase tables to store variable data formats of PII data, improving data management and security

•Deployed GCP-based Big Data solutions (Batch/Real-Time) using BigQuery, Bigtable, Cloud Storage, Pub/Sub, Data Fusion, Dataflow, and Dataproc, enhancing data processing and analytics capabilities

•Worked on building dashboards in Tableau with ODBC connections from different sources like BigQuery and the Presto SQL engine.

•Implemented a script to transmit sysprin information from Oracle to HBase using Sqoop.

•Developed and implemented best income logic using scripts and UDFs, optimizing data processing and accuracy

•Implemented test scripts to support test driven development and continuous integration.

•Optimized performance queries using MS SQL Server, improving query execution speed and system efficiency

•Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades, ensuring system stability and improved performance

•Managed data coming from different sources and loaded data from UNIX file system to HDFS, ensuring data integrity and accessibility

•Loaded and transformed large sets of structured, semi-structured, and unstructured data, improving data processing efficiency

•Coordinated cluster services through Zookeeper and managed Hadoop log files, enhancing system reliability and troubleshooting efficiency

•Installed Oozie workflow engine to run multiple Hive jobs, streamlining data processing workflows

•Analyzed large data sets using Spark and MS SQL Server to identify optimal aggregation methods, improving reporting efficiency
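
The Linux-filesystem-to-HDFS loads described above can be sketched with a small Python wrapper around the hdfs command-line client (assumed to be installed on the node); the directories and file pattern are hypothetical placeholders.

import subprocess
from pathlib import Path

def put_to_hdfs(local_dir: str, hdfs_dir: str) -> None:
    """Copy matching local files into HDFS by shelling out to the hdfs CLI."""
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)
    for path in sorted(Path(local_dir).glob("*.log")):
        # -f overwrites a file that already exists in the target HDFS directory
        subprocess.run(["hdfs", "dfs", "-put", "-f", str(path), hdfs_dir], check=True)

if __name__ == "__main__":
    put_to_hdfs("/data/incoming", "/user/etl/raw/logs")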

Environment: Hadoop, HDFS, Hive, Sqoop, HBase, Shell Scripting, Ubuntu, Red Hat Linux, GCP (Dataproc, BigQuery, Bigtable, Cloud Storage, Dataflow, Data Fusion).


