
Azure Data Business Intelligence

Location:
Remote, OR

Tejasri

Email – adz5od@r.postjobfree.com

LinkedIn: http://www.linkedin.com/in/tejasrireddyc7

Phone: 469-***-****

Professional Summary

* ***** ** ** ********** in Analysis, Design, Development, Testing, and Implementation of business application systems.

Experienced in Design, Development, Maintenance, Enhancements and Production Support of Business Intelligence Enterprise Applications.

Well versed with Big Data on AWS cloud services, i.e., EC2, S3, Glue, Athena, DynamoDB, and Redshift.

Experience in developing ETL applications on large volumes of data using different tools: MapReduce, Spark-Scala, PySpark, Spark-SQL, Pig, and Hive.

Data Engineer with working experience in AWS, Spark, the Hadoop ecosystem, Python for data science, data pipelines, and Tableau.

Strong experience in AWS S3, EMR, EC2, Glue, Lambda, IAM, Kinesis, RDS, Route 53, VPC, CodeBuild, CodePipeline, CloudWatch, and CloudFormation.

Have adequate experience in developing, designing, and architecting solutions in ETL and Data-Warehouse, Business Analytics, Business Intelligence, Big Data Analytics, Cloud Infrastructure and Security.

Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.

Excellent knowledge in building, automating, and fine-tuning both batch and real-time data engineering pipelines.

Responsible for creating complex data lakes in both on-prem and cloud infrastructure.

Excellent knowledge of the architecture of distributed storage and parallel computing frameworks for handling large datasets in the data engineering space.

Experience in designing and developing production ready data processing applications in Spark using Python.

Strong experience creating Spark applications for performing various kinds of data transformations such as data cleansing, de-normalization, various kinds of joins, and data aggregations.

Good experience fine-tuning Spark applications using techniques such as broadcasting, increasing shuffle parallelism, caching/persisting data frames, and sizing executors appropriately to use the available cluster resources effectively (see the sketch below).
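A minimal PySpark sketch of the tuning techniques listed above (broadcast join, shuffle parallelism, caching, executor sizing); the paths, table names, and resource numbers are illustrative assumptions, not settings from any specific project.

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

# Executor sizing and shuffle parallelism shown here are placeholder values.
spark = (SparkSession.builder
         .appName("tuning-sketch")
         .config("spark.executor.instances", "10")
         .config("spark.executor.cores", "4")
         .config("spark.executor.memory", "8g")
         .config("spark.sql.shuffle.partitions", "400")
         .getOrCreate())

orders = spark.read.parquet("s3://example-bucket/orders/")        # hypothetical path
customers = spark.read.parquet("s3://example-bucket/customers/")  # hypothetical path

# Broadcast the small dimension table so the large fact table is not shuffled for the join.
joined = orders.join(broadcast(customers), "customer_id")

# Cache a frame that several downstream aggregations reuse.
joined.cache()
daily_counts = joined.groupBy("order_date").count()
daily_counts.write.mode("overwrite").parquet("s3://example-bucket/output/daily_counts/")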

Experienced in working with Amazon Web Services (AWS), including S3, EMR, Redshift, Athena, and the Glue catalog.

Good exposure to the Agile software development process.

Worked on ETL migration services by creating and deploying AWS Lambda functions to provide a serverless data pipeline whose output is written to the Glue Catalog and queried from Athena.

Experience in Analytics & cloud migration from on-premises to AWS Cloud with AWS EMR, S3, & DynamoDB.

Experience working on Azure services such as Data Lake, Data Lake Analytics, SQL Database, Synapse, Databricks, Data Factory, Logic Apps, and SQL Data Warehouse, and GCP services such as BigQuery, Dataproc, and Pub/Sub.

Experience in converting the Stored Procedures logic into ETL requirements.

Well versed in writing complex queries using JOINS, Subqueries, Correlated Subqueries, PL/SQL procedures, functions, triggers and cursors.

Worked extensively on clusters hosted on Amazon Web Services (AWS) (S3, EC2, RDS, Redshift, and Elastic Map Reduce) to implement big data analytics on cloud computing platforms.

Experience in writing UNIX shell scripts.

Experienced in scheduling ETL jobs on a daily, weekly, monthly, and yearly basis, managing packages in the SSISDB catalog with environments, and automating execution with SQL Agent jobs.

Technical Skills:

Big Data Technologies: HDFS, Map Reduce, HBase, Hive, Sqoop, Spark.

Languages: Java, SQL, Python.

Cloud Computing: AWS, Azure

Databases: Oracle, MySQL, DB2

ETL Tools: Talend Big Data (6.2, 6.4), Talend 7.0, Talend AWS.

Scripting Languages: Shell Scripting.

PROFESSIONAL EXPERIENCE

AIG Insurance, Houston, TX June 2021 – June 2023

Sr. Data Engineer/ETL Developer

Responsibilities

Involved in building the ETL architecture and source-to-target mappings to load, transform, and cleanse data into the Snowflake data warehouse.

Designed and customized data models for the data warehouse, supporting data from multiple sources in real time.

Designed ETL processes using Talend to load data from sources to targets through data transformations.

Implemented Spark jobs using PySpark, utilizing DataFrames and the Spark SQL API for faster data processing.

Involved in ingestion, transformation, manipulation, and computation of data using Kinesis, SQL, AWS Glue, and Spark.

Support the business development lifecycle (Business Development, Capture, Solution Architect and Pricing).

Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.

Worked on Spark data sources, DataFrames, Spark SQL, and streaming using PySpark (see the sketch below).
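A minimal illustration of the DataFrame and Spark SQL usage mentioned above; the file location, columns, and filter values are assumptions made for the example.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("dataframe-sql-sketch").getOrCreate()

# Read a data source into a DataFrame (path and schema are hypothetical).
claims = spark.read.option("header", "true").csv("s3://example-bucket/claims/")

# DataFrame API transformation.
open_claims = (claims
               .filter(col("status") == "OPEN")
               .select("claim_id", col("amount").cast("double").alias("amount"), "state"))

# Expose the same data to the Spark SQL API through a temporary view.
open_claims.createOrReplaceTempView("open_claims")
by_state = spark.sql("""
    SELECT state, COUNT(*) AS claim_count, SUM(amount) AS total_amount
    FROM open_claims
    GROUP BY state
""")
by_state.show()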

Designed and implemented SQL queries for reporting and complex solution development.

Suggested performance-improvement solutions that led to significant gains in performance.

Developed Tableau dashboards for financial analysis.

Worked on building a centralized data lake on AWS Cloud utilizing primary services such as S3, EMR, Redshift, Athena, and Glue.

Hands-on experience in migrating datasets and ETL workloads with Python from on-prem to AWS Cloud services.

Worked extensively on building and automating data ingestion pipelines and moving terabytes of data from existing data warehouses to cloud.

Worked extensively on fine-tuning Spark applications and providing production support to various pipelines running in production.

Worked closely with business teams and data science teams and ensured all requirements were translated accurately into our data pipelines.

Developed PySpark-based pipelines using Spark DataFrame operations to load data to the EDL, using EMR for job execution and AWS S3 as the storage layer.

Worked on full spectrum of data engineering pipelines: data ingestion, data transformations and data analysis/consumption.

Developed AWS Lambdas using Python and Step Functions to orchestrate data pipelines (see the sketch below).
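A hedged sketch of the Lambda-plus-Step-Functions orchestration pattern described above; the state machine ARN, S3 trigger, and payload shape are hypothetical, not taken from the project.

import json
import boto3

sfn = boto3.client("stepfunctions")

# Hypothetical state machine that runs the downstream pipeline stages.
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:example-pipeline"

def handler(event, context):
    # Triggered, for example, by an S3 put event; starts one pipeline execution per object.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        sfn.start_execution(
            stateMachineArn=STATE_MACHINE_ARN,
            input=json.dumps({"bucket": bucket, "key": key}),
        )
    return {"statusCode": 200}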

Worked on automating infrastructure setup, including launching and terminating EMR clusters.

Created Hive external tables on top of datasets loaded into S3 buckets and wrote various Hive scripts to produce a series of aggregated datasets for downstream analysis (see the sketch below).
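A small sketch of the external-table-over-S3 pattern described above, issued through spark.sql so the example stays in Python; the table, column, and bucket names are assumptions.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-external-sketch")
         .enableHiveSupport()
         .getOrCreate())

# External table over data already landed in S3 (location and schema are hypothetical).
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
        event_id STRING,
        event_type STRING,
        event_ts TIMESTAMP
    )
    STORED AS PARQUET
    LOCATION 's3://example-bucket/raw/events/'
""")

# Aggregated dataset for downstream analysis.
spark.sql("""
    CREATE TABLE IF NOT EXISTS agg_events_daily
    STORED AS PARQUET AS
    SELECT event_type, to_date(event_ts) AS event_date, COUNT(*) AS event_count
    FROM raw_events
    GROUP BY event_type, to_date(event_ts)
""")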

Used Python for SQL/CRUD operations in DB, file extraction/transformation/generation.

Monitored the SQL scripts & modified them for improved performance using PySpark SQL.

Environment: AWS EMR, S3, RDS, Redshift, Lambda, Boto3, DynamoDB, Apache Spark, HBase, Apache Kafka, HIVE, SQOOP, Map Reduce, Snowflake, Apache Pig, Python.

Freddie Mac, McLean, VA Feb 2019 – June 2021

Sr. Data Engineer /ETL Developer

Responsibilities

Analyzed requirements to determine the type of feed, the source file format, and the business and transformation rules required for the process.

Developed jobs with big data components such as HDFS and Hive to capture raw data into the Hadoop system.

Designed partial restartability of workflows and the logging mechanism.

Created Hive tables, partitions and implemented incremental imports to perform ad-hoc queries on structured data.

Developed jobs to move inbound files to HDFS locations based on monthly, weekly, daily, and hourly partitioning, and developed jobs for moving outbound files to different locations.

Worked in AWS environment for development and deployment of Custom Hadoop Applications.

Developed data pipeline using Map Reduce, Flume, Sqoop and Pig to ingest customer behavioral data into HDFS for analysis.

Creating pipelines, data flows and complex data transformations and manipulations using ADF and PySpark with Databricks.

Managed and supported enterprise data warehouse operations and big data advanced predictive application development using Cloudera and Hortonworks HDP.

Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift (see the sketch below).
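A condensed sketch of the kind of Glue-to-Redshift job described above; the S3 path, Glue connection name, column mappings, and Redshift table are hypothetical placeholders.

import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read campaign files landed in S3 (Parquet shown; the path is a placeholder).
campaigns = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-bucket/campaigns/"]},
    format="parquet",
)

# Map source fields to the Redshift target schema (illustrative columns).
mapped = ApplyMapping.apply(
    frame=campaigns,
    mappings=[("campaign_id", "string", "campaign_id", "string"),
              ("spend", "double", "spend", "double")],
)

# Write to Redshift through a pre-defined Glue connection (name is hypothetical).
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=mapped,
    catalog_connection="redshift-connection",
    connection_options={"dbtable": "public.campaigns", "database": "analytics"},
    redshift_tmp_dir="s3://example-bucket/tmp/",
)
job.commit()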

Prepared scripts to automate the ingestion process using Python and Scala as needed from various sources such as APIs, AWS S3, Teradata, and Snowflake.

Participated in designing database schemas to ensure that relationships between data are enforced by tightly bound key constraints.

Wrote PL/SQL stored procedures, functions, packages, triggers, and views to implement business rules at the application level.

Extensive experience in Data Definition, Data Manipulation, Data Query, and Transaction Control Language.

Understood requirements by interacting with business users and mapped them to design and implementation following the Agile development methodology.

Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and a write-back tool, and in the reverse direction.

Experience in designing and creating tables, views, user-defined data types, indexes, stored procedures, cursors, triggers, and transactions.

Excellent in learning and adapting to new technologies.

Developed warm-up programs to load recently logged-in users' profile information into the MSSQL DB.

Performed manual testing and used logging tools such as Splunk and PuTTY to read application logs for Elasticsearch.

Experience working with key range partitioning in IICS, handling file loads with the file list option, creating fixed-width file formats, using file listeners, and more.

Experience integrating data using IICS for reporting needs.

Performed operations related to sizing, query optimization, backup and recovery, security, and upgrades, including administration functions such as profiling, under supervision.

Developed automated workflows for monitoring the landing zone for files and ingesting them into HDFS using the Bedrock tool and Talend.

Used Hive tables for storing the Metadata of files and maintaining the file patterns.

Environment: Snowflake, Pig, Spark, Scala, Hive, Kafka, Python, Airflow, JSON, Parquet, CSV, Code cloud, AWS.

Verizon, Irving, TX May 2018 – Feb 2019

Data engineer/ETL Developer

Responsibilities

Parsed JSON data using Snowpipe to load it into a VARIANT table in Snowflake.

Created Streams and Tasks to populate JSON data into flat tables in Snowflake (see the sketch below).
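A rough sketch of the Snowpipe/Stream/Task pattern behind the two bullets above, issued from Python via the Snowflake connector; the connection parameters, stage, warehouse, and table names are placeholders.

import snowflake.connector

# Connection parameters are placeholders for illustration only.
conn = snowflake.connector.connect(user="USER", password="PASSWORD",
                                   account="ACCOUNT", warehouse="ETL_WH",
                                   database="RAW", schema="PUBLIC")
cur = conn.cursor()

# VARIANT landing table fed continuously by Snowpipe from an external stage.
cur.execute("CREATE TABLE IF NOT EXISTS raw_json (v VARIANT)")
cur.execute("""
    CREATE PIPE IF NOT EXISTS raw_json_pipe AUTO_INGEST = TRUE AS
    COPY INTO raw_json FROM @raw_stage FILE_FORMAT = (TYPE = 'JSON')
""")

# A stream captures new rows; a scheduled task flattens them into a reporting table.
cur.execute("CREATE TABLE IF NOT EXISTS flat_events (event_id STRING, event_type STRING)")
cur.execute("CREATE STREAM IF NOT EXISTS raw_json_stream ON TABLE raw_json")
cur.execute("""
    CREATE TASK IF NOT EXISTS flatten_json_task
      WAREHOUSE = ETL_WH
      SCHEDULE = '5 MINUTE'
    AS
      INSERT INTO flat_events (event_id, event_type)
      SELECT v:event_id::STRING, v:event_type::STRING
      FROM raw_json_stream
""")
cur.execute("ALTER TASK flatten_json_task RESUME")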

Migrated the existing stored procedures into Java-based Snowflake stored procedures.

Performed data analysis and documentation; participated in PRD reviews, business-impact assessments, feedback, and final approvals.

Collaboration and Communication in an Agile Scrum development environment.

Performed PL/SQL and SQL tuning and optimization of existing applications.

Collaborate with the business analysts to discuss issues in interpreting the requirements.

Perform Integration and System testing, and coordinate with UAT testing.

Prepared and executed unit test cases and documented the execution results.

Participate in Testing Effort estimates sessions and provide timely feedback on progress of the testing activity.

Worked extensively on Azure Data Factory for pipelines, data flows, scheduling, Synapse Analytics, Azure Data Warehouse, Azure SQL Server, and Azure datasets.

Worked on data design, JSON data modeling, schema development, and test data implementation.

Diagnose ETL and database related issues, perform root cause analysis, and recommend corrective actions to management.

Work with a project team to support the design, development, implementation, monitoring, and maintenance of new ETL programs.

Validated tests by cross-checking data in the backend on SQL Server using SQL queries, which involved verification of data mapping, data integration, and data validation across databases, tables, indexes, keys, and triggers, as well as data consistency checks.

Executed system quality assurance activities on complex n-tier systems to ensure quality control of test plan design, test execution, defect tracking, and implementation plans.

Executed data reports for an overview of Alliance.org medical transactions using ETL tools such as SSIS, SQL Server, and Informatica.

Environment: AWS EMR, S3, RDS, Redshift, Lambda, Boto3, DynamoDB, Apache Spark, Hive, HDFS, SQL Navigator, Toad, PuTTY, WinSCP.

Polaris, Hyderabad, India July 2016 – Feb 2018

Data Engineer

Responsibilities

Migrated the on-premises RDBMS to Amazon RDS and modified the Jaspersoft packages, reports, and other DB objects accordingly.

Migrated the on-Premises Big Data to AWS.

Support the business development lifecycle (Business Development, Capture, Solution Architect and Pricing).

Strong AWS knowledge of S3, EMR, ECS, EC2, Glue, Lambda, IAM, Kafka, RDS, Route 53, VPC, CodeBuild, CodePipeline, CloudWatch, and CloudFormation.

Adequate knowledge of the Hadoop ecosystem and its major components, as well as PySpark, Presto, Airflow, Hive, Ranger, and Zeppelin.

Designed, built, and maintained data integration programs in a Hadoop and RDBMS environment, working with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores for data access and analysis. Extensive experience with Python, including the creation of a custom ingest framework.

Used the Spark API to analyze Hive data in conjunction with Hadoop YARN on the EMR cluster.

Designed AWS CloudFormation templates to create VPCs, subnets, and NAT to ensure the successful deployment of web applications and database templates.

Created S3 buckets and managed S3 bucket policies, and used S3 and Glacier for storage and backup on AWS (see the sketch below).
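An illustrative boto3 sketch of the bucket, policy, and Glacier-archival work mentioned above; the bucket name, account ID, role, and lifecycle values are placeholders.

import json
import boto3

s3 = boto3.client("s3")
BUCKET = "example-backup-bucket"  # placeholder name

s3.create_bucket(Bucket=BUCKET)

# Restrict access to a single (placeholder) IAM role via a bucket policy.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::123456789012:role/etl-role"},
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": f"arn:aws:s3:::{BUCKET}/*",
    }],
}
s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))

# Transition objects under the backup/ prefix to Glacier after 30 days.
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={"Rules": [{
        "ID": "archive-backups",
        "Status": "Enabled",
        "Filter": {"Prefix": "backup/"},
        "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
    }]},
)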

Designed and created ETL code installations, aiding in transitions from one data warehouse to another.

Implemented system enhancements to propose, design and develop solutions to fulfill requests and address problems.

Migrated Hive and MapReduce jobs from on-premises MapR to the AWS cloud using EMR and Qubole.

Experience in installing, configuring, supporting, and managing Hadoop clusters using HDP and other distributions.

Expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached & Redis).

Designed and implemented T-SQL queries for reporting and complex solution development.

Interpreted data models for conversion into ETL diagrams and code.

Handled SQL Server, SSIS and AWS training for the development team.

Environment: Spark, S3, Map Reduce, SQL, Airflow, SSIS, Ranger, Python, AWS Glue.

EDUCATIONAL QUALIFICATION

Bachelor’s in CSE from JNTUH 2016.


