PROFESSIONAL EXPERIENCE
Client: BMO Financial Group, Quebec, Canada April 2023 - Present
Role: Azure Data Engineer
Description: BMO Financial Group is a leading diversified financial services provider in North America, offering a wide range of banking, wealth management, and investment solutions. I have contributed to designing and implementing scalable data pipelines, automating workflow processes, and optimizing data analytics infrastructure.
Responsibilities:
Skilled in designing and maintaining robust data pipelines to extract, transform, and load data for trading analytics.
Proficient in leveraging Python data frame libraries like Pandas or Polars for efficient data manipulation and analysis.
Expertise in automating data collection processes from a variety of sources, including websockets, REST APIs, and S3 storage.
Experience in optimizing data workflows for enhanced performance and scalability in a high-stakes environment.
Knowledgeable in cloud-based data services and storage solutions to support data integration and management.
Familiarity with using and integrating data from industry-standard sources like Bloomberg, LSEG/Refinitiv/Reuters, and FactSet.
Ability to collaborate with cross-functional teams, including data scientists and financial analysts, to derive insightful trading strategies from data.
Created pipelines in Azure Data Factory using Linked Services to extract, transform, and load data from sources such as Azure SQL Data Warehouse, including write-back of processed data to the source systems.
Designed and developed distributed data processing pipelines using Apache Spark with Scala, optimizing performance for batch and real-time workloads in a Hadoop ecosystem (HDFS, Hive, Impala).
Implemented complex SQL and Hive queries for data extraction and transformation, supporting analytics and reporting for capital markets use cases.
Built and maintained scalable Spark Streaming jobs to enable near real-time processing of financial transactions, improving decision latency for business-critical operations (a minimal sketch of this pattern follows this role's Environment line).
Collaborated across cross-functional teams, including business analysts and QA, to translate data requirements into actionable technical solutions, demonstrating clear and concise communication.
Performed root cause analysis and resolved production issues in a high-pressure environment, ensuring reliability and consistency in data delivery pipelines.
Worked on end-to-end solutions in a Linux-based big data environment, contributing to continuous improvement and performance tuning of Hadoop-based workflows used by banking clients.
Collaborated with DevOps teams to implement continuous integration and deployment (CI/CD) pipelines for machine learning models, utilizing relational database systems for data storage and management, and documenting all processes and configurations in Confluence to maintain compliance with industry standards.
Instantiated, created, and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications using tools such as Git, Terraform, and Ansible, drawing on a strong foundation in data management practices and policies and in ELT (Extract, Load, Transform) processes.
Delivered accurate data in a timely manner to key business users, efficiently handling BAU requests and ServiceNow Tickets.
Implemented and managed company data management practices, including performance tuning and troubleshooting to ensure high data quality.
Performed remediation of reporting hygiene issues and implemented continuous improvement strategies to enhance data accuracy and reporting effectiveness.
Strong knowledge of writing SQL using RDBMS like Sybase and DB2, with expertise in defect tracking tools such as Jira.
Experienced with version control tools like GIT, and adept at working in Agile teams to deliver high-quality results.
Problem-solving skills with the ability to work under time and resource constraints, finding simple and effective solutions.
Familiar with Power BI for reporting, and exposure to other tools like Tableau and BusinessObjects for comprehensive data visualization.
Proficient in scripting using Python, Unix, and Linux shell, with experience in BOXI report development and universe building.
Good understanding of user security in BusinessObjects (BO), with additional knowledge of CMC being a plus.
Motivated to expand technical and business knowledge, demonstrating a commitment to continuous improvement and learning.
Demonstrated entrepreneurial mindset by designing and implementing big data solutions using Apache Spark, optimizing data processing workflows, and reducing ETL times by 40% through efficient algorithms.
Environment: Spark 3.3, AWS S3, Redshift, Glue, EMR, IAM, EC2, Tableau, Jenkins, Jira, Python, Kafka, Agile.
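The Spark Streaming bullet above is the kind of work a short sketch can make concrete. The following is a minimal, illustrative PySpark Structured Streaming job rather than the production pipeline: the broker address, the transactions topic, the event schema, and the S3 paths are all assumed placeholders.

# Minimal sketch (illustrative only): near real-time transaction processing
# with Spark Structured Streaming; topic name, schema, and paths are assumed.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("txn-stream-sketch").getOrCreate()

# Hypothetical schema for a transaction event
txn_schema = StructType([
    StructField("txn_id", StringType()),
    StructField("symbol", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read the Kafka topic as a stream (broker and topic are placeholders)
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "transactions")
       .load())

# Parse the JSON payload and keep only well-formed records
txns = (raw.selectExpr("CAST(value AS STRING) AS json")
           .select(from_json(col("json"), txn_schema).alias("t"))
           .select("t.*")
           .where(col("txn_id").isNotNull()))

# Write micro-batches to a data lake path for downstream analytics
query = (txns.writeStream
             .format("parquet")
             .option("path", "s3a://example-bucket/curated/transactions/")
             .option("checkpointLocation", "s3a://example-bucket/checkpoints/transactions/")
             .outputMode("append")
             .start())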
Client: FedEx, Hyderabad, India Aug 2021 - Feb 2023
Role: Azure Data Engineer
Description: FedEx is a global leader in logistics, providing courier and delivery services worldwide and ensuring swift and secure transportation of goods. I implemented modern Azure-based solutions and enhanced Java/Scala Spark processing.
Responsibilities:
Reviewed existing Java/Scala Spark processing and maintained and enhanced the jobs.
Analyzed and developed a modern data solution with Azure PaaS services to enable data visualization. Assessed the application's current production state and the impact of new installations on existing business processes.
Worked with query languages such as SQL, programming languages such as Python and C#, and scripting languages such as PowerShell, M (Power Query), and Windows batch commands. Used the Python NLTK toolkit for smart MMI interactions.
Applied the Spark DataFrame API to complete data manipulation within a Spark session.
Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from SQL databases into HDFS using Sqoop. Worked on the data and reporting team, creating insights and visualizations that supported business decisions.
Built scalable data infrastructure on cloud platforms such as AWS and GCP using Kubernetes and Docker. Well versed in the ETL processes used to load and update an Oracle data warehouse. Created pipelines to load data using ADF. Analyzed SQL scripts and redesigned them with Spark SQL for faster performance.
Created job flows using Airflow in Python and automated the jobs; Airflow ran on a separate stack for DAG development and executed jobs on EMR or EC2 clusters (a minimal DAG sketch follows this role's Environment line). Wrote queries in MySQL and native SQL.
Used Azure Data Factory to ingest data from log files and custom business applications, processed data on Databricks per day-to-day requirements, and loaded it into Azure Data Lake. Used Kafka capabilities such as distribution, partitioning, and the replicated commit log for messaging by maintaining feeds, and loaded data from REST endpoints into Kafka.
Deployed models as Python packages, as APIs for backend integration, and as services in a microservices architecture with a Kubernetes orchestration layer for the Docker containers.
Used Python to write data into JSON files for testing Django websites, and created scripts for data modeling, import, and export. Led requirement gathering, business analysis, and technical design for Hadoop and Big Data projects.
Instantiated, created, and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications. Collected and aggregated large volumes of web log data from sources such as web servers, mobile devices, and network devices using Apache Flume and stored the data in HDFS for analysis.
Automated AWS infrastructure to be deployed periodically using Docker images and CloudFormation templates.
Designed and implemented end-to-end enterprise AI solutions by utilizing best practices in programming languages like Python and Scala, optimizing for low-latency machine learning models to enhance system reliability and performance.
Led analysis and development of machine learning algorithms with a focus on programming for scalability and reliability, ensuring seamless integration with enterprise systems while maintaining adherence to industry best practices.
Environment: SQL Server, Informatica Cloud, Talend, Control-M, Denodo, Visual Studio, Bitbucket, Windows
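As referenced in the Airflow bullet above, the following is a minimal, illustrative Airflow DAG in Python; the DAG id, schedule, and load function are hypothetical placeholders, and the real jobs would submit Spark work to EMR or EC2 rather than print.

# Minimal sketch (illustrative only) of an Airflow DAG automating a daily job.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def load_daily_extract(**context):
    # Placeholder body: the real task would submit Spark work to EMR/EC2.
    print("loading extract for", context["ds"])


with DAG(
    dag_id="daily_load_sketch",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load_task = PythonOperator(
        task_id="load_daily_extract",
        python_callable=load_daily_extract,
    )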
Client: ICICI Lombard General Insurance, Hyderabad, India Apr 2020 - Jul 2021
Role: Application Developer/ Data Engineer
Description: ICICI Lombard General Insurance is a private company engaged in the insurance industry. It provides mutual fund services, health insurance, and insurance agent services, offering customized insurance solutions in the general, health, and life domains to corporates and individuals in India.
Responsibilities:
Controlling and granting database access and migrating on premise databases to Azure data lake store using Azure Data Factory.
Enhanced the application by adding Python XML SOAP request/response handlers to add accounts, modify trades, and apply security updates.
Involved in database migration methodologies and integration conversion solutions to convert legacy ETL processes into Azure Synapse compatible architecture. Created clusters to classify control and test groups.
Created complex stored procedures, Slowly Changing Dimension Type 2 logic, triggers, functions, tables, views, and other T-SQL code and SQL joins to ensure efficient data retrieval. Worked with AngularJS to augment browser applications with MVC capability.
Built and deployed code artifacts into the respective environments in the Azure cloud.
Installed, configured, administered, and monitored Azure IaaS and PaaS services, Azure AD, Azure VMs, and networking (VNets, Load Balancers, Application Gateway, Traffic Manager, etc.). Extensively involved in all phases of data acquisition, collection, cleaning, model development, model validation, and visualization to meet the business needs of different teams.
Implemented a reusable plug-and-play Python pattern (Synapse integration, aggregations, change data capture, deduplication, and high-watermark implementation) that accelerated development time and standardization across teams (a minimal high-watermark sketch follows this role's Environment line).
Worked with Spark Core, Spark ML, Spark Streaming, Spark SQL, and Databricks.
Developed analytical components using Scala, Spark, Apache Mesos, and Spark Streaming; installed Hadoop, MapReduce, and HDFS; and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
Implemented Kubernetes namespaces and RBAC (Role-Based Access Control) policies to enforce security and access controls in the data infrastructure. Extracted and analyzed data from various sources, performing data wrangling and cleanup using Python Pandas.
Wrote and executed various MySQL database queries from Python using the MySQL Connector/Python and MySQLdb packages.
Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back to the source systems.
Stored different configs in the NoSQL database MongoDB and manipulated them using PyMongo.
Environment: AWS EMR, Spark, Scala, Python, Hive, Sqoop, Oozie, Kafka, YARN, JIRA, S3, Redshift, Athena, Shell Scripting, GitHub, Maven.
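As referenced in the reusable-pattern bullet above, the following is a minimal sketch of the high-watermark and deduplication step in PySpark; the storage path, key column, timestamp column, and watermark value are assumed for illustration only.

# Minimal sketch (assumed table/column names) of the high-watermark + deduplication step.
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("incremental-load-sketch").getOrCreate()


def incremental_load(source: DataFrame, watermark: str,
                     key_cols: list, ts_col: str = "modified_ts") -> DataFrame:
    """Keep only rows changed since the last watermark, then drop duplicate keys."""
    changed = source.where(col(ts_col) > watermark)
    return changed.dropDuplicates(key_cols)


# Example usage with a hypothetical staged extract
staged = spark.read.parquet("abfss://stage@account.dfs.core.windows.net/policies/")
delta = incremental_load(staged, watermark="2021-06-30 00:00:00",
                         key_cols=["policy_id"], ts_col="modified_ts")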
Client: Natco Pharma, Hyderabad, Telangana, India Jun 2018 - Mar 2020
Role: Data Engineer
Description: Natco Pharma Limited is an India-based vertically integrated, research and development (R&D)-focused pharmaceutical company. It is engaged in the business of pharmaceuticals, which comprises research and development, manufacturing and selling of bulk drugs and finished dosage formulations.
Responsibilities:
Experienced in using stages such as Transformer, Aggregator, Merge, Join, Lookup, Sort, Remove Duplicates, Funnel, Filter, and Pivot for developing jobs. Built Jenkins jobs for CI/CD infrastructure for GitHub repos.
Conducted performance tuning and optimization of Kubernetes and Docker deployments to improve overall system performance. Used Python-based GUI components for front-end functionality such as selection criteria.
Used Python to connect to MySQL using MySQL connectors and extracted various data for customer usage reports.
Performed load testing and optimization to ensure the pipeline's scalability and efficiency in handling large volumes of data.
Developed and maintained Azure Analysis Services models to support business intelligence and data analytics requirements, creating measures, dimensions, and hierarchies for reporting and visualization.
Involved in various phases of Software Development Lifecycle (SDLC) of the application, like gathering requirements, design, development, deployment, and analysis of the application.
Developed Spark applications using Scala and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns and events.
Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries. Processed image data through the Hadoop distributed system using MapReduce and stored it in HDFS. Developed scalable and reusable database processes and integrated them.
Created datasets from S3 using AWS Athena and built visual insights using AWS QuickSight (a minimal Athena query sketch follows this role's Environment line). Monitored data quality and integrity through end-to-end testing and reverse engineering, and documented existing programs and code.
Successfully completed a POC for Azure implementation, with the larger goal of migrating on-premises servers and data to the cloud. Developed custom reports using HTML, Python and MySQL.
Environment: AWS Cloud, Hadoop, Spark, Hive, Teradata, Mload, Oozie, Spring Boot, JUnit, IntelliJ, Maven and Git Hub, Docker, Kubernetes
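As referenced in the Athena/QuickSight bullet above, the following is a minimal, illustrative boto3 snippet for creating a dataset from S3 via Athena; the region, database, table, query, and output bucket are assumed placeholders.

# Minimal sketch (placeholder names) of running an Athena query over S3 data.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Start the query; results land in an S3 output location for downstream use
resp = athena.start_query_execution(
    QueryString="SELECT order_id, status, updated_at FROM shipments WHERE ds = '2020-01-01'",
    QueryExecutionContext={"Database": "logistics_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)

# Poll until the query finishes, then fetch the first page of results
query_id = resp["QueryExecutionId"]
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]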
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, MapReduce, Spark, YARN, Hive, Pig, HBase, Sqoop, Flume, Kafka, Oozie, ZooKeeper, Impala
Hadoop Technologies: Apache Hadoop 2.x/1.x, Cloudera CDH4/CDH5, Hortonworks
Operating Systems: Windows, Linux (Ubuntu, CentOS)
NoSQL Databases: HBase, Cassandra, MongoDB
Relational Databases: MySQL, Teradata, DB2, Oracle
Container/ Cluster Managers: Docker, Kubernetes
BI Tools: Tableau, Power BI
Programming Languages: Java, Python, Scala
Cloud Environments: AWS (Amazon Web Services), Azure
Web Development: HTML, CSS, JavaScript
IDE Tools: Eclipse, Jupyter, Anaconda, PyCharm
Development Methodologies: Agile, Waterfall
EDUCATION
Bachelor's in Electronics and Communications Engineering from Vignan's Lara Institute of Technology, India
Kavya Moram
Data Engineer
Email ID :************@*****.***
Contact NO :+1-438-***-****
OBJECTIVE
To leverage 7+ years of experience in data engineering to contribute to innovative projects that optimize data pipelines, enhance data quality, and drive actionable insights.
PROFILE SUMMARY
7+ years of programming experience, involved in all phases of the Software Development Life Cycle (SDLC). Over 5 years of Big Data experience building highly scalable data analytics applications.
Implemented Big Data solutions using the Hadoop technology stack, including PySpark, Hive, Sqoop, Avro, and Thrift. Expert in using Spark SQL and Spark Streaming and in developing DataFrames using Spark with Snowflake.
Experienced with Windows Azure IaaS - Virtual Networks, Virtual Machines, Cloud Services, Resource Groups, ExpressRoute, Traffic Manager, VPN, Load Balancing, Application Gateways, and Auto-Scaling. Worked on production support, reviewing logs and hot fixes, and used Splunk for log monitoring along with AWS CloudWatch.
Experienced with story grooming, sprint planning, daily standups, and methodologies like Agile and SAFe. Designed and developed ETL processes in Azure Data Factory to migrate data from external sources such as S3 and text files into Azure Synapse. Excellent problem solver with a positive outlook and a blend of technical and managerial skills, aiming to deliver tasks ahead of deadlines.
Practical experience using KTables, Global KTables, and KStreams in Apache Kafka and Confluent environments to work with Kafka streaming.
Expert in designing parallel jobs using stages such as Join, Merge, Lookup, Remove Duplicates, Filter, Dataset, Lookup File Set, Complex Flat File, Modify, Aggregator, and XML.
Good hands-on experience in ETL processing, migration, and data processing using AWS services such as EC2, Athena, Glue, Lambda, S3, Relational Database Service (RDS), and other AWS data services.
Experienced in building Snowpipes and migrating Teradata objects into the Snowflake environment. Experienced with the Azure Marketplace to search for, deploy, and purchase a wide range of applications and services.
Experienced with the Snowflake cloud data warehouse and AWS S3 buckets for integrating data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables (a minimal load sketch follows this summary). Hands-on experience with DevOps tools such as Jenkins, Docker, Kubernetes, GoCD, and the Autosys scheduler.
Experience in performance monitoring, security, troubleshooting, backup, disaster recovery, maintenance, and support of Linux systems.
Expertise in building CI/CD on AWS using AWS CodeCommit, CodeBuild, CodeDeploy, and CodePipeline, with experience using AWS CloudFormation, API Gateway, and AWS Lambda to automate and secure infrastructure on AWS.
Highly skilled in using visualization tools such as Tableau, matplotlib, and ggplot2 for creating dashboards. Strong data analysis skills using business intelligence tools, SQL, and MS Office tools.
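As referenced in the Snowflake bullet above, the following is a minimal, illustrative sketch of loading nested JSON from an S3-backed stage into a Snowflake table with a VARIANT column using the Python connector; the connection parameters, stage, and table names are assumed placeholders.

# Minimal sketch (placeholder credentials, stage, and table) of a Snowflake JSON load.
import snowflake.connector

conn = snowflake.connector.connect(
    user="EXAMPLE_USER",
    password="...",          # placeholder credential
    account="example_account",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="RAW",
)

cur = conn.cursor()
try:
    # Target table holds the raw JSON document in a VARIANT column
    cur.execute("CREATE TABLE IF NOT EXISTS raw_events (payload VARIANT)")
    # @S3_EVENTS_STAGE is an external stage pointing at the S3 bucket
    cur.execute("COPY INTO raw_events FROM @S3_EVENTS_STAGE FILE_FORMAT = (TYPE = 'JSON')")
finally:
    cur.close()
    conn.close()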