
Data Processing SQL Server

Location: Allen, TX
Posted: September 29, 2023


SAI DEEKSHITH

AWS SNOWFLAKE DATA ENGINEER | 940-***-**** | adz1gv@r.postjobfree.com

PROFESSIONAL SUMMARY

Highly skilled Data Engineer with 9+ years of experience in AWS, Snowflake, and Big Data technologies.

Designed and implemented data pipelines using various AWS services such as EC2, S3, Redshift, Glue, Lambda functions, Step Functions, EMR, CloudWatch, DynamoDB, RDS, Kinesis, SNS, and SQS.

Successfully executed large-scale data solutions using EMR, Glue, Lambda, and Redshift.

Utilized the Glue Data Catalog, Glue crawlers, and Glue DynamicFrame APIs to build efficient data pipelines.

Developed event-based pipelines using S3 Lambda trigger events.
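
A minimal sketch of an S3-triggered Lambda of the kind described above; the event shape follows the standard S3 notification format, while the Glue job name and argument key are hypothetical placeholders.

```python
import json
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    """Triggered by S3 ObjectCreated events; starts a downstream Glue job per new object."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # "raw_to_staging_job" is a hypothetical Glue job name used for illustration.
        glue.start_job_run(
            JobName="raw_to_staging_job",
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
    return {"statusCode": 200, "body": json.dumps("processed")}
```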

Implemented on-demand tables on S3 files using Lambda functions and AWS Glue.

Built real-time streaming data pipelines using Kinesis and Spark Streaming.

Created scalable Data Warehouse solutions using Redshift Spectrum.

Executed distributed data pipelines at a large scale using EMR.

Experienced in working with various AWS services such as S3, EC2, EMR, SNS, SQS, Lambda, Redshift, Data Pipeline, Athena, AWS Glue, S3 Glacier, CloudWatch, CloudFormation, IAM, AWS Single Sign-On, Key Management Service, AWS Transfer for SFTP, VPC, SES, CodeCommit, and CodeBuild.

Proficient in SnowSQL for complex data manipulation and the development of efficient data pipelines.

Expertise in optimizing query performance and scalability in Snowflake through partitioning strategies and multi-cluster warehouses.

Strong knowledge of real-time data ingestion and processing using virtual warehouses, caching, and Snowpipe.

Extensive experience leveraging advanced data analysis techniques such as window functions, arrays, regular expressions, and JSON parsing.

Highly skilled in Snowflake scripting for automating ETL processes and data pipelines.

Proficiency in Hadoop, Spark, Hive, MapReduce, and PySpark for big data processing.

Experience in developing and optimizing real-time data processing applications with Spark and Spark Streaming.

Skilled in job orchestration using IBM Tivoli, Control-M, Oozie, and Airflow.

Strong database development skills in Teradata, Oracle, and SQL Server.

Proficiency in version control systems like Git, GitLab, and VSS for code repository management and collaboration.

Demonstrated proficiency in handling semi-structured data in JSON format as well as unstructured (audio/video) data.

Solid understanding of networking concepts including VPCs, subnets, DNS, and gateways.

Developed Lambda functions with Boto3 to optimize EC2 resource costs by deregistering unused AMIs.
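
A simplified sketch of such a cleanup Lambda, assuming "unused" means AMIs owned by the account that are older than a retention window and not referenced by any instance; the 90-day window is an illustrative assumption.

```python
from datetime import datetime, timedelta, timezone
import boto3

ec2 = boto3.client("ec2")

def lambda_handler(event, context):
    # AMIs owned by this account.
    images = ec2.describe_images(Owners=["self"])["Images"]
    # AMIs currently referenced by instances stay protected.
    in_use = {
        inst["ImageId"]
        for res in ec2.describe_instances()["Reservations"]
        for inst in res["Instances"]
    }
    cutoff = datetime.now(timezone.utc) - timedelta(days=90)  # assumed retention window
    for image in images:
        created = datetime.fromisoformat(image["CreationDate"].replace("Z", "+00:00"))
        if image["ImageId"] not in in_use and created < cutoff:
            ec2.deregister_image(ImageId=image["ImageId"])
```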

Implemented Workload Management (WLM) in Redshift to prioritize basic dashboard queries.

Automated quality control processes using Python and Boto3 for continuous integration.

Designed and implemented reusable ETL/ELT data pipeline solutions.

Developed production data pipelines using Apache Airflow and Jenkins.

Improved performance in Redshift by implementing optimization techniques such as Distribution Keys, Sort Keys, and Partitioning.

Created data visualizations using Tableau.

Implemented data pipeline jobs using the Databricks workflow API.

TECHNICAL SKILLS

AWS Services: AWS S3, AWS Lambda, AWS EC2, Amazon Redshift, AWS SageMaker, AWS Glue, AWS Athena, AWS EMR, AWS RDS, AWS DynamoDB, AWS Kinesis, AWS Step Functions, AWS IAM, AWS CloudWatch, AWS AppFlow, AWS CloudFront, Route 53.

Snowflake: Data Processing, Data Analysis, Data Manipulation, Scripting, Time Travel, Fail-safe, Virtual Warehouses, Zero-Copy Cloning, Auto Scaling.

Data Warehouse: Snowflake, Amazon Redshift.

ETL Tools: Informatica, Talend, SSIS, IBM DataStage, Apache Spark, Blendo, Matillion.

Hadoop Distributions: Cloudera, Hortonworks.

Big Data Technologies: HDFS, Sqoop, PySpark, Hive, MapReduce, Spark, Spark Streaming, HBase.

Languages: Java, SQL, PL/SQL, Python, HiveQL, Scala, SnowSQL.

Operating Systems: Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS.

Databases: Teradata, Oracle, SQL Server.

Scheduling: IBM Tivoli, Control-M, Oozie, Airflow.

Version Control: Git, GitHub, VSS, SVN.

Methodology: Agile, Scrum.

Data Visualization: Power BI, Tableau, QlikView, Looker.

IDE & Build Tools, Design: Eclipse, Visual Studio.

EDUCATION

Bachelor’s in Mechanical Engineering from Vignana Bharathi Institute of Technology, India (2011).

Master’s in Computer Science from University of North Texas (2013).

WORK EXPERIENCE

Role: AWS Data Engineer March 2022 – Present

Client: International Business Machines (IBM), New York

Responsibilities:

Designed and implemented data ingestion and storage solutions on AWS using S3, Redshift, and Glue.

Configured and managed multiple S3 buckets for secure data storage and organization.

Created and scheduled Glue Crawlers for automatic discovery and cataloging of data stored in various formats.
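
A hedged sketch of provisioning and scheduling a Glue crawler with Boto3; the crawler name, IAM role ARN, database, S3 path, and cron expression are placeholders, not values from the original projects.

```python
import boto3

glue = boto3.client("glue")

# Names, role ARN, S3 path, and schedule are hypothetical placeholders.
glue.create_crawler(
    Name="raw_zone_crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="raw_zone_db",
    Targets={"S3Targets": [{"Path": "s3://example-data-lake/raw/"}]},
    Schedule="cron(0 2 * * ? *)",  # run daily at 02:00 UTC
)
glue.start_crawler(Name="raw_zone_crawler")
```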

Developed serverless functions using Lambda for data transformations and business logic execution.

Provisioned and managed EC2 instances for running custom applications and services.

Utilized EC2 instances when more control over the computing environment was required.

Developed ETL workflows with AWS Glue to extract, transform, and load data into Redshift from various sources.

Demonstrated expertise in data modeling and schema design within Redshift to support complex analytical queries and ensure data accuracy and consistency.

Strong knowledge of query performance tuning in Redshift, employing distribution keys, compression, and workload management to achieve faster data retrieval and improved overall performance.

Leveraged Athena for ad-hoc querying and analysis on the data lake.

Enabled quick insights without the need for pre-defined schemas or data transformations.

Configured and deployed EMR clusters with Hadoop and Spark for batch data processing.

Set up and administered multiple RDS instances for relational database management.

Designed and implemented NoSQL data models in DynamoDB for high-velocity data storage.

Leveraged DynamoDB for low-latency data retrieval and scalability.

Utilized Kinesis for real-time data streaming and ingestion, processing continuous data streams from various sources in real time.
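
A small illustrative producer for this kind of Kinesis ingestion; the stream name, payload fields, and partition-key choice are assumptions.

```python
import json
import boto3

kinesis = boto3.client("kinesis")

def publish_event(event: dict, stream_name: str = "clickstream-events") -> None:
    """Send one JSON event to Kinesis; the partition key spreads records across shards."""
    kinesis.put_record(
        StreamName=stream_name,                    # hypothetical stream name
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event.get("user_id", "unknown")),
    )

publish_event({"user_id": 42, "action": "page_view", "ts": "2023-09-29T12:00:00Z"})
```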

Orchestrated complex workflows with Step Functions by defining state machines.

Managed AWS Identity and Access Management for secure user access and permissions.

Controlled access to AWS resources and maintained security standards.

Configured CloudWatch alarms and metrics to monitor the health and performance of AWS resources.

Implemented data integration and automation using AWS AppFlow to securely transfer data between AWS services and third-party applications.

Implemented CloudFront for improved data delivery and reduced latency.

Enhanced user experience when accessing data visualizations and reports.

Managed DNS configurations and routing using Route 53 for efficient application and service deployment.

Optimized query performance by selecting appropriate distribution keys to evenly distribute data across compute nodes, reducing data movement.

Defined sort keys to enhance data retrieval efficiency, enabling quick access to sorted data and reducing sorting overhead during queries.

Utilized WLM to prioritize and manage query execution, allocating resources based on query priorities for faster execution.

Restructured complex SQL queries to minimize joins and aggregations, leveraging Redshift's parallel processing for faster execution.

Performed regular VACUUM and ANALYZE operations on tables to reclaim space and maintain accurate statistics, optimizing query planning.

Analyzed query plans using EXPLAIN commands to identify bottlenecks and make necessary adjustments for improved performance.
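
A condensed sketch of the routine Redshift maintenance described in the bullets above, issued here through psycopg2; the cluster endpoint, credentials, and table names are placeholders.

```python
import psycopg2

# Connection details are illustrative placeholders, not real endpoints.
conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="etl_user", password="***",
)
conn.autocommit = True  # VACUUM cannot run inside a transaction block
cur = conn.cursor()

for table in ("sales_fact", "customer_dim"):   # hypothetical tables
    cur.execute(f"VACUUM FULL {table};")       # reclaim space and re-sort rows
    cur.execute(f"ANALYZE {table};")           # refresh planner statistics

# Inspect the plan of a frequent query to spot redistribution steps.
cur.execute("EXPLAIN SELECT region, SUM(amount) FROM sales_fact GROUP BY region;")
for row in cur.fetchall():
    print(row[0])
```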

Developed complex SnowSQL queries to extract, transform, and load data from various sources into Snowflake.

Improved query performance and data retrieval by implementing partitioning techniques in Snowflake.

Configured and effectively managed multi-cluster warehouses in Snowflake to handle high-concurrency workloads.

Defined roles and access privileges in Snowflake to ensure proper data security and governance.

Implemented Snowflake caching mechanisms, reducing data transfer costs, and improving query performance.

Utilized Snowpipe for real-time data ingestion into Snowflake, ensuring continuous data availability and automated data loading processes.

Utilized Snowflake's time travel feature for auditing and analyzing historical data.
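
A brief sketch of the Snowpipe and Time Travel usage mentioned above, issued through the Snowflake Python connector; the account, stage, pipe, and table names are illustrative assumptions.

```python
import snowflake.connector

# Account and credentials are placeholders.
conn = snowflake.connector.connect(
    account="example_account", user="etl_user", password="***",
    warehouse="LOAD_WH", database="ANALYTICS", schema="PUBLIC",
)
cur = conn.cursor()

# Snowpipe: auto-ingest new files landing in an external stage.
cur.execute("""
    CREATE PIPE IF NOT EXISTS orders_pipe AUTO_INGEST = TRUE AS
    COPY INTO orders_raw
    FROM @orders_stage
    FILE_FORMAT = (TYPE = 'JSON')
""")

# Time Travel: inspect the table as it looked one hour ago for auditing.
cur.execute("SELECT COUNT(*) FROM orders_raw AT(OFFSET => -3600)")
print(cur.fetchone()[0])
```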

Implemented data engineering solutions using the Databricks platform, leveraging its powerful Apache Spark-based capabilities for scalable data processing, ETL workflows, and advanced analytics.

Developed data processing pipelines using Hadoop, including HDFS, Sqoop, Hive, MapReduce, and Spark.

Implemented Spark Streaming for real-time data processing and analytics.

Implemented scheduling and job automation using IBM Tivoli, Control-M, Oozie, and Airflow.

Designed and configured workflows for data processing and ETL pipelines.

Designed and developed database solutions using Teradata, Oracle, and SQL Server.

Utilized Git and GitLab for version control and collaboration.

Environment: AWS S3, AWS AppFlow, AWS Glue, AWS SageMaker, AWS Redshift, AWS Athena, AWS CloudWatch, AWS IAM, AWS EMR, AWS SNS, AWS SQS, AWS Route 53, Databricks, Apache Spark, Hive, Kinesis, HDFS, Sqoop, Kafka, Zookeeper, Oozie, PySpark, Ambari, MySQL, Cloudera, Teradata, Oracle, Python, SQL, Matillion, JIRA, IBM Tivoli, Control-M, Airflow.

Role: Snowflake Data Engineer Nov 2020 – Feb 2022

Client: USDA, Kansas City, Missouri

Responsibilities:

Designed and implemented efficient Snowflake stages for loading data from diverse sources into Snowflake tables.

Managed and created various types of Snowflake tables, including transient, temporary, and persistent tables.

Optimized Snowflake warehouses by selecting appropriate sizes and configurations to achieve optimal performance and cost efficiency.

Developed complex SnowSQL queries to extract, transform, and load data from various sources into Snowflake.

Improved query performance and data retrieval by implementing partitioning techniques in Snowflake.

Configured and effectively managed multi-cluster warehouses in Snowflake to handle high-concurrency workloads.

Defined roles and access privileges in Snowflake to ensure proper data security and governance.

Implemented Snowflake caching mechanisms, reducing data transfer costs, and improving query performance.

Utilized Snowpipe for real-time data ingestion into Snowflake, ensuring continuous data availability and automated data loading processes.

Utilized Snowflake's time travel feature for auditing and analyzing historical data.

Utilized regular expressions in Snowflake for pattern matching and data extraction tasks.
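
For illustration, a pattern-matching query of the kind described might look like the following sketch; the table and column names are hypothetical.

```python
import snowflake.connector

# Account and credentials are placeholders.
conn = snowflake.connector.connect(account="example_account", user="etl_user", password="***")
cur = conn.cursor()

# Pull the domain part out of free-text email fields and keep only well-formed rows.
# "contacts_raw" and "email_text" are hypothetical names.
cur.execute("""
    SELECT REGEXP_SUBSTR(email_text, '@([A-Za-z0-9.-]+)$', 1, 1, 'e', 1) AS email_domain
    FROM contacts_raw
    WHERE REGEXP_LIKE(email_text, '[^@ ]+@[^@ ]+[.][A-Za-z]{2,}')
""")
for (domain,) in cur.fetchall():
    print(domain)
```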

Automated data pipelines, ETL processes, and data transformations using Snowflake scripting.

Demonstrated proficiency in designing and implementing Machine Learning (ML) workflows with a focus on optimizing data access and processing for enhanced efficiency and accuracy.

Showcased skills in building ML pipelines that ensured fast data retrieval and processing, facilitating seamless integration of cutting-edge algorithms and models for data-driven insights.

Facilitated consolidation of health information and analytics-driven decision-making using Ab Initio for data integration and analysis.

Designed and optimized ETL workflows using AWS Glue, extracting, transforming, and loading data from diverse sources into Redshift for efficient data processing.

Configured and fine-tuned Redshift clusters to achieve high-performance data processing and streamline querying.

Integrated AWS SNS and SQS to enable real-time event processing and efficient messaging.

Utilized AWS Athena for on-demand data analysis and querying on data stored in S3.

Designed and implemented data streaming solutions using AWS Kinesis, enabling real-time data processing and analysis.

Managed DNS configurations and routing using AWS Route 53 to support efficient deployment of applications and services.

Implemented robust IAM policies and roles to ensure secure user access and permissions for AWS resources.

Developed and optimized data processing pipelines using Hadoop ecosystem technologies such as HDFS, Sqoop, Hive, MapReduce, and Spark.

Implemented Spark Streaming for real-time data processing and advanced analytics.

Demonstrated expertise in scheduling and job automation using IBM Tivoli, Control-M, Oozie, and Airflow for executing data processing and ETL pipelines.

Designed and developed database solutions using Teradata, Oracle, and SQL Server, including schema design and optimization, stored procedures, triggers, and cursors.

Proficiently utilized version control systems like Git, GitLab, and VSS for efficient code repository management and collaborative development processes.

Implemented Time Travel functionality in data pipelines, enabling historical data analysis and providing the ability to track changes and recover data at any point in time.

Utilized advanced optimization techniques to enhance data processing performance, reducing query execution time and improving overall efficiency of data pipelines.

Developed and maintained a robust Metadata Manager system, ensuring accurate and comprehensive documentation of data sources, transformations, and lineage for improved data governance and compliance.

Designed and implemented event monitoring solutions for data engineering pipelines, setting up real-time monitoring and alerting mechanisms to track pipeline executions, job statuses, and data quality issues.

Proactively resolved incidents by analyzing event logs and performing root cause analysis, ensuring high data integrity and pipeline reliability.

Environment: AWS, AWS S3, Redshift, EMR, SNS, SQS, Kinesis, Athena, Python, SQL, Glue, Sqoop, Ab Initio, CloudWatch, Event Monitoring, IAM, Apache Spark, Hive, HDFS, Cloudera, Kafka, Zookeeper, Oozie, PySpark, SnowSQL, Ambari, Airflow, Control-M, JIRA, MySQL, Teradata, Oracle, Time Travel, Metadata Manager, Optimizer.

Role: Big Data Developer May 2018 – Oct 2020

Client: Change Healthcare, Nashville, TN

Responsibilities:

Imported data from MySQL to HDFS regularly using Sqoop for efficient data loading.
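
A minimal sketch of such a scheduled Sqoop import, wrapped in Python for orchestration; the MySQL host, table, and HDFS target directory are placeholders.

```python
import subprocess

# Host, database, table, and target directory are illustrative placeholders.
sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://mysql-host:3306/sales_db",
    "--username", "etl_user",
    "--password-file", "/user/etl/.mysql_pass",  # avoid plaintext passwords on the CLI
    "--table", "orders",
    "--target-dir", "/data/raw/orders",
    "--num-mappers", "4",
]
subprocess.run(sqoop_cmd, check=True)
```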

Proficient in utilizing Talend ETL tool to design and implement complex data integration workflows, enabling seamless extraction, transformation, and loading of data from diverse sources into target systems for enhanced data analysis and business intelligence.

Demonstrated expertise in leveraging Talend's data integration capabilities to streamline data migration processes, ensuring efficient data synchronization and maintaining data quality, contributing to improved data governance and decision-making within the organization.

Performed aggregations on large volumes of data using Apache Spark and Scala, storing the results in the Hive data warehouse for further analysis.

Extensive experience with Data Lakes and big data ecosystems, including Hadoop, Spark, Hortonworks, and Cloudera; efficiently loaded and transformed structured, semi-structured, and unstructured datasets.

Developed and executed Hive queries to analyze data and meet specific business requirements.

Leveraged HBase integration with Hive to construct HBase tables in the Analytics Zone.

Utilized Kafka and Spark Streaming for processing streaming data in specific use cases.

Created data pipelines using Flume and Sqoop to ingest customer behavioral data into HDFS for analysis.

Utilized big data analytics tools like Hive and MapReduce for Hadoop cluster analysis.

Implemented a data pipeline using Kafka, Spark, and Hive for ingestion, transformation, and analysis of data.
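
A hedged PySpark Structured Streaming sketch of this Kafka-to-Hive style pipeline; the topic, brokers, payload schema, and storage paths are assumptions rather than the original configuration.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = (SparkSession.builder
         .appName("claims-stream")        # hypothetical application name
         .enableHiveSupport()
         .getOrCreate())

# Assumed JSON payload schema for the Kafka topic.
schema = StructType([
    StructField("claim_id", StringType()),
    StructField("member_id", StringType()),
    StructField("amount", DoubleType()),
])

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder brokers
          .option("subscribe", "claims_topic")                # placeholder topic
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("c"))
          .select("c.*"))

# Land parsed records as Parquet files that an external Hive table can sit over.
query = (events.writeStream
         .format("parquet")
         .option("path", "/warehouse/claims_stream")
         .option("checkpointLocation", "/checkpoints/claims_stream")
         .start())
query.awaitTermination()
```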

Wrote Hive queries in HiveQL to simulate MapReduce functionality for data analysis and processing.

Migrated data from RDBMS (Oracle) to Hadoop using Sqoop for efficient data processing.

Developed custom scripts and tools using Oracle's PL/SQL language to automate data validation, cleansing, and transformation processes.

Implemented CI/CD pipelines for building and deploying projects in the Hadoop environment.

Utilized JIRA for issue and project workflow management.

Utilized PySpark and Spark SQL for faster testing and data processing in Spark.

Employed Spark Streaming to process streaming data in micro-batches for efficient processing.

Leveraged Zookeeper to coordinate, synchronize, and serialize servers within clusters.

Utilized the Oozie workflow engine for job scheduling in Hadoop.

Utilized PySpark with Spark SQL for data analysis and processing.

Used Git as a version control tool to maintain the code repository.

Environment: Sqoop, MySQL, HDFS, Apache Spark, Scala, Hive, Hadoop, Python, Cloudera, Kafka, MapReduce, Zookeeper, Oozie, Data Pipelines, Talend, RDBMS, PySpark, Ambari, JIRA.

Role: Data Warehouse Developer March 2016 – April 2018

Client: JP Morgan Chase, West Haven, CT

Responsibilities:

Proficient in implementing and optimizing ETL processes, utilizing SQL, PL/SQL, and stored procedures to extract, transform, and load data from diverse sources into the data warehouse, ensuring data integrity, quality, and performance for critical decision-making.

Developed and scheduled automated processes, including creating jobs, using SQL Mail Agent, and setting up alerts and DTS/SSIS packages.

Demonstrated expertise in managing and updating Erwin models for logical and physical data modeling of Consolidated Data Store (CDS), Actuarial Data Mart (ADM), and Reference DB, aligning with user requirements.

Utilized TFS for source control and tracking environment-specific script deployments, facilitating collaboration and version control.

Efficiently exported Erwin data models to PDF format, publishing them on SharePoint for user access to up-to-date documentation.

Proficient in developing, administering, and managing databases such as Consolidated Data Store, Reference Database, and Actuarial Data Mart.

Strong skills in writing triggers, stored procedures, and functions using Transact-SQL (T-SQL), ensuring optimized database performance and data integrity.
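
As an illustration of this kind of T-SQL work, a minimal audit trigger created through pyodbc might look like the following sketch; the table and column names are hypothetical.

```python
import pyodbc

# Connection string values are placeholders.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=example-host;DATABASE=CDS;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# Audit trigger on a hypothetical Policy table: record every update into an audit table.
cursor.execute("""
CREATE TRIGGER trg_Policy_Audit
ON dbo.Policy
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.PolicyAudit (PolicyId, ChangedAt)
    SELECT PolicyId, SYSUTCDATETIME()
    FROM inserted;
END
""")
conn.commit()
```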

Deployed scripts across different environments based on Configuration Management and Playbook requirements, ensuring consistent and reliable deployments.

Managed file groups, table/index associations, and conducted query optimization and performance tuning to enhance database performance.

Proven track record of optimizing data workflows, ensuring data quality, and driving data-driven insights through efficient utilization of C# and other data engineering tools, contributing to improved data reliability and overall business outcomes.

Tracked and resolved defects using Quality Center for effective issue management.

Maintained users, roles, and permissions within SQL Server.

Environment: SQL Server 2008/2012 Enterprise Edition, C#, SSRS, SSIS, T-SQL, Windows Server 2003, PerformancePoint Server 2007, Oracle 10g, Visual Studio 2010.

Role: Data Warehouse Developer Jan 2014 – Feb 2016

Client: Mayo Clinic, Rochester, MN

Responsibilities:

Demonstrated experience in Agile Scrum methodology, actively participating in daily stand-up meetings and collaborating with cross-functional teams for successful project delivery.

Proficient in utilizing Visual SourceSafe for version control and code management in Visual Studio 2010, coupled with adept usage of project tracking tools like Trello for monitoring and facilitating project progress.

Expertise in designing and optimizing data warehouse solutions, leveraging relational databases (Oracle, SQL Server) to efficiently store and process large-scale datasets for analytics and reporting.

Created interactive reports in Power BI with drill-through and drill-down capabilities, drop-down menus, sorting, and subtotals, enabling comprehensive data exploration.

Leveraged the data warehouse to develop a Data Mart, feeding downstream reports and designing a user access tool for ad-hoc reporting and query analysis, empowering self-service data insights.

Deployed SSIS packages and orchestrated efficient job scheduling for streamlined data integration processes.

Experienced in architecting and constructing Cubes and Dimensions with diverse data sources and architectures, utilizing MDX scripting to enhance Business Intelligence capabilities.

Developed SSIS jobs for report automation and cube refresh packages.

Proficient in deploying SSIS packages to production and utilizing package configurations for environment independence.

Experienced with SQL Server Reporting Services (SSRS) for authoring, managing, and delivering paper-based and interactive web-based reports.

Developed stored procedures and triggers to ensure consistent data entry into the database.

Utilized Snowflake to securely share data without the need for data transfer or custom pipelines.

Environment: MS SQL Server 2016, Visual Studio 2017/2019, SSIS, SharePoint, MS Access, Team Foundation Server, Git.


