Sravan Kumar Bajjuri
Senior Data Engineer
Phone: 937-***-****
E-mail: *********************@*****.***
PROFESSIONAL SUMMARY:
Seasoned Data Engineer with over 5 years of expertise in architecting and implementing complex data solutions on Azure and AWS Clouds, specializing in designing robust data pipelines using Azure services.
Extensive experience as a Cloud Data Engineer, proficient in both Microsoft Azure and AWS technologies, including Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), Azure Synapse Analytics (SQL Data Warehouse), Azure SQL Database, Azure Blob Storage, Azure Key Vault, Azure DevOps, Azure Analysis Services, PolyBase, Azure Cosmos DB (NoSQL), Azure HDInsight, and AWS services such as EMR, Redshift, Glue, and Lambda.
Experience working with Azure Logic Apps, Azure Functions (comparable to AWS Lambda), and Azure Key Vault.
Hands-on experience in Azure Analytics Services – Azure Data Lake Store (ADLS), Azure Data Lake Analytics (ADLA), Azure SQL DW, Azure Data Factory (ADF), Azure Databricks (ADB) etc.
Experience in building both ETL and ELT data pipelines using Azure Databricks and ADF.
Experience in building orchestration on Azure Data Factory and Azure Databricks for scheduling, using ADF Triggers and Databricks Workflows.
Experience in working with Azure Data Factory Pipelines, Data Flows, Linked Services and Datasets.
Orchestrated data integration pipelines in ADF using activities such as Get Metadata, Lookup, ForEach, Wait, Execute Pipeline, Set Variable, Filter, and Until.
Skilled in Data Integration and ETL processes, utilizing tools like Informatica, AWS Glue, and Azure Data Factory (ADF) to streamline data movement and transformation.
Well-versed in version control and CI/CD practices, utilizing Git, GitHub, Bitbucket, Jenkins, and AWS Code services.
Experience in working with DBFS and Databricks utilities - dbutils.fs, notebooks, widgets, mounts, and secret scopes.
Experienced in working with different data formats and sources: CSV, JSON, Parquet, Delta, and JDBC.
Experience in working with Delta tables and the Delta Lake format using Azure Databricks (a minimal PySpark sketch follows this summary).
Hands-on scripting skills in Python and Linux/UNIX shell.
Strong knowledge of the Spark ecosystem, including Spark Core, Spark SQL, and Spark Streaming.
Developed distributed data processing jobs using the Spark RDD and DataFrame APIs.
Big Data - Hadoop (MapReduce & Hive), Spark (SQL, Streaming), Azure Cosmos DB, SQL Data Warehouse, Azure Data Factory.
Highly experienced in importing and exporting data between HDFS and Relational Systems like MySQL and Teradata using Sqoop.
Experience analyzing, visualizing, and transforming analytics data per business requirements.
Expertise in creating and modifying database objects like Tables, Indexes, Views, Triggers, Synonyms, Sequences and Materialized views using SQL.
Experience in data modeling (Dimensional & Relational) concepts like Star-Schema Modeling, and Fact and Dimension tables.
Solid experience in data warehouse, data lake, and lakehouse best practices, working with metadata and repositories within a disciplined lifecycle methodology.
Experience in using various Python packages such as pandas, NumPy, csv, json, pyodbc, os, and xlrd.
Happy to work with teams that are midway through Big Data challenges, both on-premises and in the cloud.
Work with relative ease across different delivery methodologies such as Agile, Waterfall, and Scrum.
Experience in Agile Methodologies and extensively used Jira for Sprints and issue tracking.
Determined, committed and hardworking individual with strong communication, interpersonal and organizational skills.
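A minimal PySpark sketch of the Delta table work described above, assuming a Databricks environment; the paths and table names are hypothetical placeholders, not from a specific project.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("delta-ingest-sketch").getOrCreate()

    # Read raw CSV files landed in the data lake (illustrative path)
    raw_df = (spark.read
              .option("header", "true")
              .option("inferSchema", "true")
              .csv("/mnt/raw/sales/*.csv"))

    # Light cleanup with the DataFrame API, then persist as a Delta table
    clean_df = raw_df.dropDuplicates().withColumn("load_date", F.current_date())
    (clean_df.write
     .format("delta")
     .mode("overwrite")
     .saveAsTable("analytics.sales_clean"))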
TECHNICAL SKILLS:
Programming Languages
Python, PySpark, T-SQL, LINUX and UNIX Shell Scripting, Azure PowerShell
Cloud Technologies
Azure, AWS, GCP
Big Data Technologies
Hadoop, HDFS, HDInsight, MapReduce, YARN, Pig, HBase, Spark, Zookeeper, Hive, Oozie, Sqoop, Flume, Kafka, Scala
Schedulers
Airflow, Oozie, TIDAL
IDE and Tools
Azure Data Studio, SQL Developer, Teradata SQL Assistant, Microsoft Visual Studio, Visual Studio Code, Eclipse, IntelliJ, RStudio, Jupyter Notebook, GitHub
Methodologies
Waterfall, Agile/Scrum, SDLC
Version Controls
GitHub, Azure Repos, Bitbucket
Databases
Snowflake, Teradata, Oracle & MySQL, Microsoft SQL Server, RDBMS, MongoDB, Azure SQL, Azure Synapse, MS Excel
Operating Systems
Windows, Linux, Unix, Centos, Ubuntu
Azure Services
Azure Data Factory, Azure Databricks, Snowflake, Logic Apps, Function Apps
AWS
EMR, EC2, EBS, RDS, S3, Athena, Glue, Elasticsearch, Lambda, SQS, DynamoDB, Redshift, PostgreSQL
EDUCATION DETAILS:
Master of Science in Information Technology.
PROFESSIONAL EXPERIENCE:
Access Healthcare, Spring Hills - FL Oct 2022 – Present
Senior Data Engineer
Responsibilities:
Implemented Azure Data Factory (ADF) extensively for ingesting data from different source systems like relational and unstructured data to meet business functional requirements.
Created numerous pipelines in Azure using Azure Data Factory v2 to get data from disparate source systems, using different Azure activities such as Move & Transform, Copy, Filter, ForEach, and Databricks.
Maintain and provide support for optimized pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks (see the PySpark sketch at the end of this section).
Automated jobs using different ADF triggers: event-based, schedule, and tumbling window.
Created and provisioned different Databricks clusters, notebooks, and jobs, and configured autoscaling.
Performed data flow transformation using the data flow activity.
Used PolyBase to load tables in Azure Synapse.
Implemented Azure and self-hosted integration runtimes in ADF.
Improved streaming data processing performance by optimizing cluster runtime and reducing compute time.
Perform ongoing monitoring, automation, and refinement of data engineering solutions.
Scheduled, automated business processes and workflows using Azure Logic Apps.
Designed and developed a new solution to process near-real-time (NRT) data using Azure Stream Analytics, Azure Event Hubs, and Service Bus queues.
Created Linked services to connect the external resources to ADF.
Worked with complex SQL views, Stored Procedures, Triggers, and packages in large databases from various servers.
Used Azure DevOps pipelines to build and deploy different resources (Code and Infrastructure) in Azure.
Ensure the developed solutions are formally documented and signed off by the business.
Worked with team members to resolve any technical issue, Troubleshooting, Project Risk & Issue identification, and management.
Worked on the cost estimation, billing, and implementation of services on the cloud.
Work closely across teams (Support, Solution Architecture) and peers to establish and follow best practices while solving customer problems.
Environment: Azure Data Factory, Azure Data Lake, Azure Synapse Analytics (DW), Azure DevOps, Snowflake, Power BI, SharePoint, Spark, PySpark, Hadoop, Hive, HDFS, PyTest, Kafka, MySQL, Eclipse, Jira, GitHub, Jenkins, PyCharm.
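A hedged sketch of the Databricks-to-Synapse load pattern referenced above, using the Azure Synapse (SQL DW) connector, which stages data through PolyBase/COPY; the JDBC URL, storage container, and table names are placeholders.
    # Run in a Databricks notebook, where `spark` is provided by the runtime
    df = spark.read.format("delta").load("/mnt/curated/claims")

    (df.write
     .format("com.databricks.spark.sqldw")
     .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<dw>")
     .option("forwardSparkAzureStorageCredentials", "true")
     .option("dbTable", "dbo.claims_curated")
     .option("tempDir", "abfss://staging@<account>.dfs.core.windows.net/tmp")
     .mode("append")
     .save())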
American International Group, New York - NY Feb 2021 - Jul 2022
Senior Data Engineer
Responsibilities:
Installing, configuring, and maintaining Data Pipelines.
Transforming business problems into Big Data solutions and defining Big Data strategy and Roadmap.
Designing the business requirement collection approach based on the project scope and SDLC methodology.
Worked on AWS Elastic Beanstalk for fast deployment of various applications developed with Java, Node.js, and Python on familiar servers such as Apache.
Worked on AWS CloudFormation and Terraform to create infrastructure on AWS as code.
Authoring Python (PySpark) scripts with custom UDFs for row/column manipulations, merges, aggregations, stacking, data labeling, and all cleaning and conforming tasks.
Writing Pig Scripts to generate MapReduce jobs and perform ETL procedures on the data in HDFS.
Develop solutions to leverage ETL tools and identify opportunities for process improvements using Informatica and Python.
Working knowledge of Google Cloud Platform (GCP), including BigQuery, Cloud Dataproc, and Composer/Airflow.
Used Terraform with AWS Virtual Private Cloud to automatically set up and modify settings by interfacing with the control layer.
Used the Cloud Shell SDK in GCP to configure the Dataproc, Cloud Storage, and BigQuery services.
Performed data profiling and data wrangling of XML, web feeds, and files using Python, Unix, and SQL.
Loaded data from different sources into a data warehouse and performed data aggregations for business intelligence using Python.
Used Sqoop to move data between HDFS and RDBMS sources.
Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats.
Used SSIS to build automated multi-dimensional cubes.
Used Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS using Python and NoSQL databases such as HBase and Cassandra (see the streaming sketch at the end of this section).
Collected data with Spark Streaming from an AWS S3 bucket in near-real-time, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.
Validated the test data in DB2 tables on Mainframes and Teradata using SQL queries.
Automated and scheduled recurring reporting processes using UNIX shell scripting and Teradata utilities such as MLOAD, BTEQ, and FastLoad.
Worked on dimensional and relational data modeling using Star and Snowflake schemas, OLTP/OLAP systems, and conceptual, logical, and physical data models.
Used Oozie to automate data loading into the Hadoop Distributed File System (HDFS).
Developed automated regression scripts in Python to validate ETL processes across multiple databases, including AWS Redshift, Oracle, MongoDB, and SQL Server (T-SQL).
Environment: AWS Glue, AWS Redshift, AWS Athena, Alteryx, AWS Data Pipeline, SQL, Spark, Azure Cloud, PySpark, Big Data ecosystems, Hadoop, HDFS, Hive, Pig, Cloudera, MapReduce, Python, Power BI, Tableau, Terraform
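A hedged sketch of the Kafka-to-HDFS streaming flow mentioned above, written with Spark Structured Streaming in PySpark; the broker, topic, and HDFS paths are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("kafka-to-hdfs-sketch").getOrCreate()

    # Subscribe to a Kafka topic (placeholder broker/topic)
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "learner-events")
              .load())

    # Cast the binary key/value payloads to strings and persist to HDFS as Parquet
    parsed = events.select(F.col("key").cast("string"),
                           F.col("value").cast("string"),
                           "timestamp")

    query = (parsed.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/learner/events")
             .option("checkpointLocation", "hdfs:///checkpoints/learner_events")
             .outputMode("append")
             .start())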
Kroger, Cincinnati - OH Apr 2019 – Jan 2021
Data Engineer
Responsibilities:
Attended requirement calls and worked with Business Analyst and Solution Architects to understand the requirements.
Created pipelines in Azure using ADF to get the data from different source systems and transform the data by using many activities.
Designed and developed batch processing and real-time processing solutions using ADF, Databricks clusters, and Stream Analytics.
Experience in designing, developing, and implementing ETL pipelines using Azure Databricks.
Ingested a huge volume and variety of data from disparate source systems into Azure Data Lake Storage Gen2 using Azure Data Factory V2.
Created reusable pipelines in Data Factory to extract, transform and load data into Azure SQL DB and SQL Data warehouse.
Implemented both ETL and ELT architectures in Azure using Data Factory, Databricks, SQL DB and SQL Data warehouse.
Proficiency in using Apache Spark and PySpark to process large datasets, including data ingestion, transformation, and aggregation.
Proficiency in using Delta Lake with various data formats, including Parquet, Avro, JSON, and CSV, and experience in reading and writing data from/to Delta tables using Databricks notebooks and Spark SQL.
Experience in using Databricks Delta Lake, a scalable, high-performance storage layer for Delta tables that provides ACID transactions, schema enforcement, and time-travel capabilities (see the Delta Lake sketch at the end of this section).
Created, provisioned multiple Databricks clusters needed for batch and continuous streaming data processing and installed the required libraries for the clusters.
Experienced in developing audit, balance and control framework using SQL DB audit tables to control the ingestion, transformation, and load process in Azure.
Solid experience in data warehousing best practices, working with metadata and repositories within a disciplined lifecycle methodology.
Managed Databricks notebooks and worked with Delta Lake using both Python and Spark SQL.
Developed and executed migration strategies to move workloads from on-premises or other cloud platforms to Azure, leveraging OCI for supporting components.
Used Azure Logic Apps to develop workflows which can send alerts/notifications on different jobs in Azure.
Used Azure DevOps to build and release different versions of code in different environments.
Well-versed with Azure authentication mechanisms such as service principals, managed identities, and Key Vault.
Created External tables in Azure SQL Database for data visualization and reporting purposes.
Worked with complex SQL views, Stored Procedures, Triggers, and packages in large databases from various servers.
Environment: Azure, Snowflake, Hadoop, PySpark, Django, Flask, Kafka, Tableau, SQL, JavaScript, MongoDB.
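A hedged sketch of the Delta Lake capabilities noted above (ACID upsert via MERGE and time travel), run from a Databricks notebook through PySpark; the table names and version number are hypothetical.
    # ACID upsert of staged changes into a curated Delta table
    spark.sql("""
        MERGE INTO curated.orders AS t
        USING staging.orders_updates AS s
        ON t.order_id = s.order_id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)

    # Time travel: read the table as it existed at an earlier version
    previous = spark.read.option("versionAsOf", 12).table("curated.orders")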