Sql Server Data Quality

Location:
Cleveland, OH
Salary:
As per norms
Posted:
February 17, 2025

Resume:

SATYANARAYANA KALLEM

*******.********@*****.***

+1-440-***-****

PROFESSIONAL SUMMARY:

8+ years of professional experience as a Data Engineer with expertise in database development, ETL development, data modeling, report development, and Big Data technologies.

Experience in data integration and data warehousing using various ETL tools such as Informatica PowerCenter, AWS Glue, SQL Server Integration Services (SSIS), and Talend.

Experience in designing Business Intelligence solutions with Microsoft SQL Server using SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), and SQL Server Analysis Services (SSAS).

Extensively used Informatica PowerCenter and Informatica Data Quality (IDQ) as ETL tools for extracting, transforming, cleansing, and loading data from various source inputs to various targets, in both batch and real time.

Experience working with Amazon Web Services (AWS) and its services such as Snowflake, EC2, S3, RDS, EMR, VPC, IAM, Elastic Load Balancing, Lambda, Redshift, ElastiCache, Auto Scaling, CloudFront, CloudWatch, Data Pipeline, DMS, Aurora, ETL, and other AWS services.

Strong expertise in relational database systems such as Oracle, MS SQL Server, Teradata, MS Access, and DB2 design, and in database development using SQL, PL/SQL, SQL*Plus, TOAD, and SQL*Loader. Highly proficient in writing, testing, and implementing triggers, stored procedures, functions, packages, and cursors using PL/SQL.

Hands-on experience with the Snowflake cloud data warehouse on AWS and AWS S3 buckets for integrating data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables.
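
As an illustration of this pattern, the following is a minimal sketch (not the actual project code) that loads nested JSON from an S3-backed external stage into a Snowflake VARIANT column and then flattens it; the stage, tables, column paths, and credentials are all hypothetical:

    import snowflake.connector

    # Connection parameters are placeholders.
    conn = snowflake.connector.connect(
        account="my_account", user="etl_user", password="***",
        warehouse="ETL_WH", database="ANALYTICS", schema="RAW",
    )
    cur = conn.cursor()

    # Land the raw JSON in a VARIANT column.
    cur.execute("CREATE TABLE IF NOT EXISTS raw_events (payload VARIANT)")
    cur.execute("""
        COPY INTO raw_events
        FROM @s3_events_stage          -- external stage pointing at the S3 bucket
        FILE_FORMAT = (TYPE = 'JSON')
    """)

    # Flatten selected nested attributes into a typed table.
    cur.execute("""
        INSERT INTO events_flat (event_id, event_ts, user_id)
        SELECT payload:id::STRING,
               payload:timestamp::TIMESTAMP_NTZ,
               payload:user.id::STRING
        FROM raw_events
    """)
    conn.close()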

Experience in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP.

Led the development and automation of data quality testing frameworks using pytest and qTest, resulting in a 40% increase in data accuracy for real-time processing systems.
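
A minimal sketch of what such a pytest-based data quality check can look like; the dataset location, columns, and rules below are illustrative assumptions, not the original framework:

    # test_data_quality.py
    import pandas as pd
    import pytest

    @pytest.fixture
    def orders() -> pd.DataFrame:
        # In a real framework this would pull from the warehouse or a staging extract
        # (reading s3:// paths with pandas requires s3fs to be installed).
        return pd.read_parquet("s3://example-bucket/staging/orders.parquet")

    def test_primary_key_is_never_null(orders):
        assert orders["order_id"].notna().all()

    def test_primary_key_is_unique(orders):
        assert not orders["order_id"].duplicated().any()

    def test_amounts_are_positive(orders):
        assert (orders["amount"] > 0).all()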

Automated end-to-end testing of data pipelines in AWS Glue and PySpark, improving the efficiency of data validation processes for large-scale datasets.

Developed and deployed AWS Lambda functions for automated serverless data checks, ensuring real-time data quality monitoring and reducing manual intervention by 50%.

Designed and executed data integrity checks and data validation strategies for complex Snowflake data warehouses, reducing the occurrence of data discrepancies by 35%.

Managed the entire QA lifecycle for cloud-based data processing projects, including test planning, execution, and defect management using Jira and qTest.

Implemented Selenium-based test automation for web-based interfaces connected to data processing workflows, improving the coverage of UI testing for real-time data applications.

Extensive experience in integration of Informatica Data Quality (IDQ) with Informatica PowerCenter.

Extensive experience in applying data mining solutions to various business problems and generating data visualizations using Tableau, Power BI, and Alteryx.

Strong knowledge of and experience with the Cloudera ecosystem, including HDFS, Hive, Sqoop, HBase, Kafka, data pipelines, and data analysis and processing with Hive SQL, Impala, Spark, and Spark SQL.

Worked with different scheduling tools such as Talend Administration Center (TAC), UC4/Automic, Tidal, Control-M, Autosys, cron, and TWS (Tivoli Workload Scheduler).

Experienced in design, development, unit testing, integration, debugging, implementation, and production support, as well as client interaction and understanding of business applications, business data flows, and data relationships.

Used Flume, Kafka, and Spark Streaming to ingest real-time or near-real-time data into HDFS.

Analyzed data and provided insights with Python pandas.

Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.

Worked on Data Migration from Teradata to AWS Snowflake Environment using Python and BI tools like Alteryx.

Experience in moving data between GCP and Azure using Azure Data Factory.

Developed Python scripts to parse flat files, CSV, XML, and JSON files, extract the data from various sources, and load it into the data warehouse.
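
A simplified sketch of such a parser, assuming hypothetical file layouts and field names; each function yields dictionaries that a downstream loader can insert into the warehouse:

    import csv
    import json
    import xml.etree.ElementTree as ET

    def parse_csv(path):
        with open(path, newline="") as f:
            yield from csv.DictReader(f)

    def parse_json(path):
        with open(path) as f:
            # Assumes the file holds a top-level JSON array of records.
            yield from json.load(f)

    def parse_xml(path):
        # Assumes <record> elements with simple child tags.
        for node in ET.parse(path).getroot().findall("record"):
            yield {child.tag: child.text for child in node}

    PARSERS = {".csv": parse_csv, ".json": parse_json, ".xml": parse_xml}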

Developed Automated scripts to do the migration using Unix shell scripting, Python, Oracle/TD SQL, TD Macros and Procedures.

TECHNICAL SKILLS

Big Data Eco Systems

Hadoop, HDFS, Map Reduce, Hive, Yarn, SQOOP, HBase, Pig, Kafka, Flume, Spark, Scala, Oozie

Cloud Technologies

AWS, Microsoft AZURE, GCP

Programming

Python, C, C++, Java, PySpark, Scala

Data Warehousing

Cloudera, Integration suite

Applications

Salesforce, RightNow

Databases

MySQL, Oracle, PostgreSQL, DynamoDB, MS SQL Server, Snowflake, Redshift, MongoDB

BI Tools

Business Objects XI, Tableau, PowerBI

Query Languages

SQL, PL/SQL, T-SQL

Scripting Languages

Unix, Windows PowerShell, Shell Scripting

RDBMS Utility

SQL Plus, SQL Loader

Reporting and ETL Tools

Tableau, Power BI, AWS Glue, SSIS, SSRS, DataStage

Data Quality Testing

Data Validation, ETL Testing, Regression Testing, Test Automation

EDUCATION:

Bachelor of Computers from Osmania University, India – 2003

PROFESSIONAL EXPERIENCE

Client: PNC Bank, Cleveland, OH (Onsite) Jul 2024 to Present

Role: Sr. Data Engineer

Description:

The PNC Financial Services Group, Inc. is an American bank holding company and financial services corporation based in Pittsburgh, Pennsylvania.

Responsibilities:

The system is a full microservices architecture written in Python utilizing distributed message passing via Kafka with JSON as the data exchange format.
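
A minimal sketch of that style of message passing using the kafka-python client; the broker address, topic name, and payload shape are assumptions for illustration:

    import json
    from kafka import KafkaProducer, KafkaConsumer

    producer = KafkaProducer(
        bootstrap_servers="broker:9092",
        value_serializer=lambda obj: json.dumps(obj).encode("utf-8"),
    )
    producer.send("payments.events", {"event": "payment_posted", "amount": 125.40})
    producer.flush()

    consumer = KafkaConsumer(
        "payments.events",
        bootstrap_servers="broker:9092",
        group_id="ledger-service",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )
    for message in consumer:
        print(message.value)   # each microservice would react to the JSON payload here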

Created VPCs, private and public subnets, and NAT gateways in a multi-region, multi-zone infrastructure landscape to manage worldwide operations.

Developed multiple ETL applications using Python, and Spark.

Provided production support for Python ETL application.

Developed ETL pipelines to perform analytics on data.

Responsible for managing and deploying the product WSGI server and Apache server on a Microsoft Azure CentOS server.

Developed Oozie workflows to schedule Spark and Hive jobs.

Built Azure Data Lakes that handle high-volume data efficiently using the functionality of Azure Data Lake Analytics.

Ingested data from various on-prem APIs into a data lake or Blob Storage through Azure Data Factory.

Developed REST APIs using Python with the Flask framework and integrated various data sources including Java, JDBC, RDBMS, shell scripting, spreadsheets, and text files.
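
For illustration, a minimal Flask endpoint in the same spirit; the route, schema, and the SQLite stand-in for the JDBC/RDBMS sources are assumptions:

    import sqlite3
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/api/devices/<device_id>", methods=["GET"])
    def get_device(device_id):
        conn = sqlite3.connect("devices.db")
        row = conn.execute(
            "SELECT id, name, status FROM devices WHERE id = ?", (device_id,)
        ).fetchone()
        conn.close()
        if row is None:
            return jsonify({"error": "not found"}), 404
        return jsonify({"id": row[0], "name": row[1], "status": row[2]})

    if __name__ == "__main__":
        app.run(port=5000)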

Developed and implemented Python scripts to automate retrieval, parsing, and reporting of configuration parameters from Network Devices connected to customer networks.

Modified controlling databases using SQL generated via Python and Perl code, and collected and analyzed data from the systems under test with Python programs that ran SQL queries against the database.

Created PL/pgSQL functions to return tables, sets of refcursors, and text to Java applications.

Created row-level security policies on tables to ensure data security when inserting, deleting, and selecting data with data modification commands.

Created Users, User Attributes, Privileges, Groups, Roles, and Schemas and worked on setting search paths to Schemas.

Analyzed and interpreted SonarQube reports to identify areas for code improvement and optimization.

Inserted data into multiple tables using COPY.

Created, inserted, and extracted data from multi-dimensional arrays and composite types.

Created temporary tables in pl/pgsql programs to share data with Java programs.

Created Procedures/Functions to execute arbitrary SQL statements (Dynamic SQL) without a result set/with input parameters/result set.

Wrote application-level code to interact with APIs and web services using AJAX and JSON, building type-ahead lookups for zip code, net worth, city, and county using jQuery, AJAX, and jQuery UI.

Developed the front end using HTML, CSS, JavaScript, React, Redux, Icinga, and Bootstrap.

Skilled in using collections in Python for manipulating and looping through different user-defined objects.

Participated in code reviews and actively contributed to discussions on code quality and improvements based on SonarQube feedback.

Developed views and templates with Python and Django's view controller and templating language to create a user-friendly website interface.

Automated different workflows, which are initiated manually with Python scripts and Unix shell scripting.

Experience working with Splunk, SCOM, and Nagios/Icinga in a large enterprise.

Created and maintained technical documentation for launching Hadoop Clusters and for executing Hive queries and Pig Scripts.

Developed Sqoop scripts to handle change data capture for processing incremental records between new arrived and existing data in RDBMS tables.

Involved in web/application development using Python, HTML5, CSS3, Icinga, AJAX, JSON, and jQuery.

Worked on tasks related to DevOps, automation, and infrastructure using Groovy scripts.

Generated Python Django forms to record data of online users and used PyTest for writing test cases.

Prototyped a proposal for an issue-tracker website using Python/Django with MySQL as the database.

Client: Charter Communications, St. Louis, MO July 2022 to Jun 2024

Role: Sr. Data Engineer

Description: Charter Communications, Inc., is a telecommunications and mass media company with services branded as Spectrum.

Responsibilities:

Responsible for Design, Development, and testing of the database, Developed Stored Procedures, Views, and Triggers.

Participated in weekly release meetings with Technology stakeholders to identify and mitigate potential risks associated with the releases.

Architected AWS solutions encompassing Cloud migration, AWS EMR, DynamoDB, Redshift, and event processing utilizing lambda functions.

Used AWS EMR clusters for creating Hadoop and spark clusters. These clusters are used for submitting and executing Python applications in production.

Stored the time-series transformed data from the Spark engine built on top of a Hive platform to Amazon S3 and Redshift.

Generated a script in AWS Glue to transfer the data, utilized AWS Glue to run ETL jobs, and ran aggregations with PySpark.
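
A skeleton of this kind of Glue ETL job, assuming hypothetical catalog database, table, and S3 paths (the real job parameters differ):

    import sys
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from pyspark.sql import functions as F

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read from the Glue Data Catalog, aggregate with PySpark, write curated output to S3.
    dyf = glue_context.create_dynamic_frame.from_catalog(
        database="sales_db", table_name="raw_orders")
    daily = (dyf.toDF()
                .groupBy("order_date")
                .agg(F.sum("amount").alias("daily_total")))
    daily.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_totals/")

    job.commit()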

Utilized Spark Structured Streaming for real-time data processing and analytics within distributed computing environments.

Worked on creating data pipelines with Airflow to schedule PySpark jobs that perform incremental loads of weblog server data.
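
A sketch of such an Airflow schedule, assuming a nightly cadence and a hypothetical spark-submit entry point for the incremental load:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="weblog_incremental_load",
        start_date=datetime(2024, 1, 1),
        schedule_interval="0 2 * * *",   # nightly at 02:00
        catchup=False,
    ) as dag:
        BashOperator(
            task_id="spark_incremental_load",
            bash_command=(
                "spark-submit --master yarn "
                "/opt/jobs/weblog_incremental_load.py --run-date {{ ds }}"
            ),
        )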

Worked on SNS and SQS by pushing the events and calling the lambda functions.

Implemented AWS cost reductions by writing Lambda functions to automatically spin up and shut down the Redshift clusters.
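
A minimal sketch of such a Lambda handler using boto3's pause_cluster/resume_cluster calls; the cluster identifier and the scheduled rule that passes the action are assumptions:

    import boto3

    redshift = boto3.client("redshift")
    CLUSTER_ID = "analytics-cluster"   # placeholder

    def lambda_handler(event, context):
        # A scheduled rule is assumed to invoke this with {"action": "pause"|"resume"}.
        action = event.get("action", "pause")
        if action == "pause":
            redshift.pause_cluster(ClusterIdentifier=CLUSTER_ID)
        else:
            redshift.resume_cluster(ClusterIdentifier=CLUSTER_ID)
        return {"cluster": CLUSTER_ID, "action": action}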

Used the AWS Identity and Access Management (IAM) tool to create groups and permissions for users to work collaboratively.

Led the development and automation of data quality testing frameworks using pytest and qTest, resulting in a 40% increase in data accuracy for real-time processing systems.

Automated end-to-end testing of data pipelines in AWS Glue and PySpark, improving the efficiency of data validation processes for large-scale datasets.

Developed and deployed AWS Lambda functions for automated serverless data checks, ensuring real-time data quality monitoring and reducing manual intervention by 50%.
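
As an illustration, a hedged sketch of a serverless check over newly landed files; the S3 event trigger, required fields, and the SNS alert topic are assumptions rather than the original design:

    import json
    import boto3

    s3 = boto3.client("s3")
    sns = boto3.client("sns")
    REQUIRED_FIELDS = {"order_id", "amount", "currency"}

    def lambda_handler(event, context):
        # Triggered by an S3 put event; the file is assumed to be newline-delimited JSON.
        record = event["Records"][0]["s3"]
        bucket, key = record["bucket"]["name"], record["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        bad_rows = sum(
            1 for line in body.splitlines()
            if line and not REQUIRED_FIELDS.issubset(json.loads(line))
        )
        if bad_rows:
            sns.publish(
                TopicArn="arn:aws:sns:us-east-1:123456789012:data-quality-alerts",
                Message=f"{key}: {bad_rows} rows missing required fields",
            )
        return {"file": key, "bad_rows": bad_rows}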

Designed and executed data integrity checks and data validation strategies for complex Snowflake data warehouses, reducing the occurrence of data discrepancies by 35%.

Managed the entire QA lifecycle for cloud-based data processing projects, including test planning, execution, and defect management using Jira and qTest.

Implemented Selenium-based test automation for web-based interfaces connected to data processing workflows, improving the coverage of UI testing for real-time data applications.

Built a pipeline using Spark Streaming to receive real-time data from Apache Kafka and stored the streamed data in HDFS.
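
A minimal PySpark Structured Streaming sketch of that pipeline; the broker, topic, and HDFS paths are placeholders, and the job assumes the Spark Kafka connector package is available on the cluster:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "clickstream")
        .load()
        .select(col("value").cast("string").alias("payload"), col("timestamp"))
    )

    query = (
        events.writeStream.format("parquet")
        .option("path", "hdfs:///data/raw/clickstream/")
        .option("checkpointLocation", "hdfs:///checkpoints/clickstream/")
        .trigger(processingTime="1 minute")
        .start()
    )
    query.awaitTermination()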

Analyzed data stored in S3 buckets using SQL, PySpark and stored the processed data in Redshift and validated data sets by implementing Spark components.

Collected and aggregated large amounts of log data, staged the data in HDFS for further analysis, and transferred data between relational databases and Hadoop.

Involved in the development of Spark applications using Scala and Python for data transformation, cleansing, and validation using the Spark API.

Used Jenkins to work on distributed test automation execution over different environments as part of the Continuous Integration Process.

Developed optimal strategies for distributing the web log data over the cluster importing and exporting the stored web log data into HDFS and Hive using Sqoop.

Worked on ETL Migration services by developing and deploying AWS Lambda functions for generating a serverless data pipeline which can be written to Glue Catalog and can be queried from Athena.

Used Spark SQL to process the vast amount of structured data and executed programs in python using Spark.

Worked on Cloudera distribution and deployed on AWS EC2 Instances.

Developed Spark Applications by using python and Implemented Apache Spark data processing Project to handle data from various RDBMS and Streaming sources.

Built a continuous integration and continuous delivery (CI/CD) pipeline on AWS that helps in automating processes in the software delivery process.

Prepared scripts to automate the ingestion process using PySpark as needed from various sources such as APIs, AWS S3, Teradata, and Redshift.

Used Spark SQL to load data, created schema RDDs on top of the data that load into Hive tables, and handled the structured data using Spark SQL.

Created SQL scripts for tuning and scheduling and converted data from flat files into normalized data structures.

Worked on both batch processing and streaming data Sources. Used Spark Streaming and Kafka for streaming data processing.

Wrote various automation scripts to help developers interact with SQS and SNS, and performed performance tuning of processes driven by the SQS queue.

Created a Snowflake procedure to import Parquet files from an Amazon S3 bucket into a Snowflake schema and utilized a Snowflake task to run the operation on a regular basis.
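
A compact sketch of the stage-to-table load on a schedule, shown here as a task wrapping a COPY rather than the actual stored procedure; the stage, table, warehouse, and cron schedule are illustrative:

    import snowflake.connector

    conn = snowflake.connector.connect(account="my_account", user="etl_user",
                                        password="***", warehouse="ETL_WH",
                                        database="ANALYTICS", schema="RAW")
    cur = conn.cursor()
    cur.execute("""
        CREATE OR REPLACE TASK load_parquet_daily
          WAREHOUSE = ETL_WH
          SCHEDULE = 'USING CRON 0 3 * * * UTC'
        AS
          COPY INTO raw.orders
          FROM @s3_parquet_stage/orders/
          FILE_FORMAT = (TYPE = 'PARQUET')
          MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)
    cur.execute("ALTER TASK load_parquet_daily RESUME")   # tasks start suspended
    conn.close()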

Used data profiling tasks and queries to perform data analysis and profiling to find legacy data that needed to be cleaned up before migrations.

Responsible for developing a data pipeline with Amazon AWS to extract the data from weblogs and store it in HDFS.

Worked with different File Formats like text file, Parquet for HIVE querying and processing based on business logic.

Wrote Spark applications for data validation, cleansing, transformation, and custom aggregation and used Spark engine, Spark SQL for data analysis and provided to the data scientists for further analysis.

Facilitated deployment of multi-clustered environment using AWS EC2 and EMR apart from deploying Dockers for cross-functional deployment.

Analyzed the existing data flow to the warehouses and took a similar approach to migrate the data into HDFS.

Derived valuable insights from datasets through statistics and creative visualization, and increased data quality and operational efficiency by designing data pipelines, databases, processes, programs, and dashboards.

Worked on Agile Methodology in delivering agreed user stories on time for every Sprint.

Environment: AWS EMR, S3, RDS, Redshift, Lambda, DynamoDB, Spark Structured Streaming, AWS Glue, Athena, Apache Spark, CI/CD, HBase, Airflow, Apache Kafka, Hive, MapReduce, Snowflake, Python, SSRS.

Client: Sherwin-Williams, India Nov 2018 to Apr 2022

Role: Sr. Data Engineer

Description: The Sherwin-Williams Company is an American paint and coating manufacturing company based in Cleveland, Ohio.

Responsibilities:

Designed and developed data pipelines using Azure Data Factory to ingest sales data from multiple retail stores and inventory data from third-party vendors. Implemented data cleansing and transformation logic to standardize the data and ensure consistency across data sources.

Worked on migrating data from different source systems to Azure Data Lake Storage.

Implemented Azure Data Factory (ADF) extensively for ingesting data from different source systems like relational and unstructured data to meet business functional requirements.

Created numerous pipelines in Azure using Azure Data Factory to get data from disparate source systems, using different Azure activities such as Move & Transform, Copy, Filter, ForEach, and Databricks.

Developed a dynamic pipeline capable of handling various data sources and extracting data to multiple targets.

Implemented Azure Key Vault for secure storage and retrieval of connection strings, certificates and utilized the key vaults in Azure Data Factory during the creation of linked services.

Configured a self-hosted integration runtime to copy files from on-premises sources and load them into ADLS Gen2 and Azure Synapse Analytics.

Created, and provisioned different Databricks clusters, notebooks, jobs, and autoscaling.

Created infrastructure for optimal extraction, transformation, and loading of data from a wide variety of data sources.

Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
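
For illustration, a notebook-style Databricks sketch of such a transformation; the ADLS paths and columns are hypothetical:

    from pyspark.sql import functions as F

    # `spark` is the session Databricks injects into notebooks.
    raw = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/sales/")

    cleaned = (
        raw.dropDuplicates(["order_id"])
           .withColumn("order_date", F.to_date("order_ts"))
           .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
           .filter(F.col("amount").isNotNull())
    )

    (cleaned.write.format("delta")
            .mode("overwrite")
            .partitionBy("order_date")
            .save("abfss://curated@examplelake.dfs.core.windows.net/sales/"))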

Utilized PySpark and Spark SQL extensively for efficient data extraction, transformation, and aggregation tasks across multiple file formats for enabling the analysis and identification of data patterns.

Automated jobs using different ADF triggers such as event, schedule, and tumbling window triggers.

Utilized Azure Logic Apps to create custom triggers and automate workflows based on external events and conditions.

Implemented notification mechanisms in Azure Logic Apps to send email notifications in the event of pipeline failure.

Created and optimized Snowflake schemas, tables, and views to support efficient data storage and retrieval for analytics and reporting purposes.

Designed and implemented complex Snowflake stored procedures and triggers to automate data processing and improve data quality and accuracy.

Collaborated with cross-functional teams to design, develop, and optimize Snowflake data models and SQL scripts to support business intelligence and analytics initiatives.

Developed and maintained Snowflake SQL scripts to perform data analysis, extract and transform data, and load data into Snowflake data warehouses.

Designed and implemented efficient Snowflake joins between large datasets to support real-time data processing and analysis.

Performed performance tuning on Snowflake databases, including optimizing queries, indexing, and partitioning, to ensure optimal query performance and minimize query execution time.

Collaborated with cross-functional teams to understand data requirements and deliver solutions that align with business needs.

Utilized Azure DevOps for version control, continuous integration, and continuous deployment (CI/CD) of data solutions.

Implemented Azure DevOps pipelines for automating the deployment of data solutions, ensuring efficient and reliable delivery of data pipelines and workflows.

Environment: Hadoop, Azure Data Factory, Azure Databricks, Spark SQL, PySpark, SQL, Azure Data Lake Storage (Gen 2), Azure SQL Database, Azure Key Vault, Azure Logic Apps, Azure Synapse Analytics, Snowflake, Terraform, Azure DevOps, Azure PowerShell, MapReduce, Hive, Spark, Python, Yarn, Tableau, Kafka, Sqoop, Scala, HBase.

Client: Vivint Solar, India Jan 2017 to Oct 2018

Role: Data Engineer

Description: Vivint Solar was an American provider of photovoltaic solar energy generation systems, primarily for residential customers.

Responsibilities:

Involved in the development of large-scale, complex, cross-functional projects; experienced in designing, building, and architecting data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation.

Proficient in developing Python scripts for parsing XML and JSON files and loading the data into the Azure Synapse Analytics data warehouse.

Using Azure Data Factory pipelines, terabytes of Medicare data were transferred from on-premises Mainframe, Teradata, and SQL databases to the cloud's Azure Data Lake. Hands-on experience in data ingestion to one or more Azure Services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processing data in Azure Databricks.

Participating in the design, development, and testing of big data solutions on Azure.

Targeted root cause analysis of critical business and production problems by reviewing and analyzing various Python, MySQL, Azure SQL, and DB2 scripts and stored procedures.

Utilized SSIS to extract, transform and load data from various sources and implemented Azure Data Factory, Spark SQL, T-SQL, and U-SQL for Azure Data Lake Analytics.

Designed and implemented high-performance DB2 database solutions to support mission-critical applications.

Worked on migrating two data-intensive applications from on-prem servers to Azure App Service environments while utilizing Azure SQL, Blob store, and Databricks to store and process the data.

Utilized the capabilities of Azure Functions to perform event-driven data transformations and data transfers from Oracle to Azure SQL DB.

Extensively used Azure DevOps to develop CI/CD pipelines for various projects. Proven ability in creating UNIX shell scripts for file transfer and manipulation.

Collaborated in data modeling, data mining, Machine Learning methodologies, advanced data processing, ETL optimization.

Developed a custom tool that could traverse various dynamic directories containing patient record flat files, read data from the flat files, transform the data, and load it into the corresponding database using dynamic SQL, reducing the need for manual intervention.
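
A much-simplified sketch of that kind of loader, assuming a hypothetical directory layout, column set, and PostgreSQL target (the original tool covered more formats and databases):

    import csv
    import os
    import psycopg2

    conn = psycopg2.connect("dbname=records user=etl")   # placeholder DSN
    cur = conn.cursor()

    for root, _dirs, files in os.walk("/data/incoming/patients"):
        for name in files:
            if not name.endswith(".csv"):
                continue
            # Target staging table derived dynamically from the folder name.
            table = f"stg_{os.path.basename(root)}"
            with open(os.path.join(root, name), newline="") as f:
                for row in csv.DictReader(f):
                    cur.execute(
                        f"INSERT INTO {table} (patient_id, visit_date, code) "
                        "VALUES (%s, %s, %s)",
                        (row["patient_id"], row["visit_date"], row["code"]),
                    )
    conn.commit()
    conn.close()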

Collaborated with business partners and other data engineers to build complex database systems based on business requirements.

Developed a custom tool to create near-real-time data exchange between hospitals, clinics, pharmacies, and patients.

Programming: Python, SQL, Java, Scala, R, HTML, CSS, AngularJS, Spring Boot, JS, JSP

Data: MySQL, SQL Server, PostgreSQL, Oracle, Cosmos DB, ETL Pipelines, Teradata, MongoDB, Kafka, MapReduce, Azure Stream Analytics, Spark, Hive

BI Tools: Tableau, Power BI, Python visualization libraries

Scripting Languages: Python, Shell script, Perl, R

Cloud Tools: Azure DevOps, Azure Active Directory, Azure Functions, Azure Data Lake, Azure Data Factory, Azure Webapps, Azure Service Bus, Databricks, ACR

AI/ML Concepts: Regression, Classification, Time-series analysis, SVM, NLP, Pandas, Random Forest, Gradient Descent, Gradient Boosting, Bayesian Statistics

IDEs: Jupyter, Eclipse, IntelliJ, Visual Studio, VS Code

Methodologies: Agile, Scrum, Waterfall Model, Iterative Development

Worked with DBA's and other engineers to document the normalization and denormalization techniques for various dimensional and relational databases. Developed time-based and event-based triggers for ADF pipelines, and created various types of reports using reporting services.

Built model training sets with labeling from varying claims data (CMS-BPCI, MA claims, CJR, and pharmacy) collected from various sources with the help of Azure Data Factory.

Extensively worked with pharmacy claims data, Medicare claims, and Medicare Advantage claims data.

Assisted in documentation and document review of data mapping and transformation techniques used for various business processes.

Client: Valassis Communications, India Jun 2015 to Dec 2016

Role: Data Engineer

Description: Valassis Communications, Inc. is a Livonia, Michigan-based company that provides media and marketing services in the United States, Europe, Mexico, and Canada.

Responsibilities:

Involved in requirements gathering, system analysis, design, development, testing, and deployment.

Created a Python/Django-based web application using Python scripting and MySQL Workbench for the database, with HTML/CSS/jQuery and Highcharts for data visualization of the served pages.

Developed user interface using CSS, HTML, JavaScript, and jQuery.

Created a database using MySQL, and wrote several queries to extract/store data.

Wrote Python modules to extract/load asset data from the MySQL source database.

Designed and implemented a dedicated MYSQL database server to drive the web apps and report on daily progress.

Provide feedback and recommendations to developers based on SonarQube analysis results to ensure code quality improvement.

Developed views and templates with Python and Django's view controller and templating language to create a user-friendly website interface.

Involved in creating a reusable component using React JS and Redux JS for DOM Manipulation.

Used Django framework for application development.

Created the entire application using Python, Django, MySQL, and Linux.

Used React JS to create Controllers to handle events triggered by clients and send a request to the server.

Enhanced existing automated solutions, such as the Inquiry Tool for automated Asset Department reporting and added new features, and fixed bugs.

Embedded AJAX in UI to update small portions of the web page, avoiding reloading the entire page.

Created the most important business rules relevant to the scope of the project and the needs of customers.

Improved performance by using a more modularized approach and using more in-built methods.

Wrote Python modules to view and connect the Apache Cassandra instance.

Involved in the development of Web Services using SOAP for sending and getting data from the external interface in the XML format.

Collaborate with other team members to integrate SonarQube findings into the development workflow for better code quality control.

Implemented PySpark jobs using Python, utilizing DataFrames and temporary-table SQL for data processing.
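
A brief sketch of the DataFrame-plus-temp-view pattern; the input path and columns are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("asset-report").getOrCreate()
    assets = spark.read.csv("/data/assets.csv", header=True, inferSchema=True)
    assets.createOrReplaceTempView("assets")

    summary = spark.sql("""
        SELECT site, COUNT(*) AS asset_count, SUM(value) AS total_value
        FROM assets
        GROUP BY site
    """)
    summary.show()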

Developed APIs by modularizing existing Python modules with the help of pygmy libraries.

Wrote unit test cases for testing tools.

Coded test programs and evaluated existing engineering processes.

Collaborated with internal teams to convert end-user feedback into meaningful and improved solutions.

Environment: Python, C++, Django, Puppet, Jenkins, Pandas, Grafana/Graphite, React JS, GCP, MySQL, NoSQL, Linux, HTML, CSS, jQuery, JavaScript, Apache, Git, Eclipse, PyTest, Oracle, Cassandra DB.

Client: Progrexion ASG, India Jun 2014 to Jun 2015

Role: Python Developer

Description: Progrexion provides a wide range of credit repair services through its trusted consumer brands, including Lexington Law Firm and CreditRepair.com.

Responsibilities:

Designed and developed end-to-end ETL processes from various source systems to the staging area, and from staging to the data mart.

Responsible for preparing the existing SQL platform for upgrades that were installed as soon as they were released.

Broad design, development, and testing experience with Talend Integration Suite and knowledge of performance tuning of mappings.

Developed jobs in Talend Enterprise edition from stage to source, intermediate, conversion and target.

Involved in writing SQL queries and used joins to access data from Oracle and MySQL.

Developed Talend jobs to populate the claims data to data warehouse - star schema.

Experienced in writing expressions within tMap as per business needs, and used ETL methodologies and best practices to create Talend ETL jobs.

Environment: Talend 5.5, Oracle 11g, Teradata SQL Assistant, HDFS, MS SQL Server 2012/2008, PL/SQL, Agile Methodology
