MOUNIKA ULLAM
Data Engineer
CONTACT
EDUCATION
TECHNICAL SKILLS
PROFILE SUMMARY
ABOUT ME
*******.*****@*****.***
University of Missouri, Kansas, USA
Master's in Computer Science
Programming & Scripting: Python, Java,
Scala, SQL, Bash/Shell Scripting
Big Data Frameworks & Tools: Apache
Hadoop, Apache Spark, Kafka, Flink
Relational Databases: MySQL, PostgreSQL,
SQL Server.
NoSQL Databases: MongoDB, Cassandra,
DynamoDB.
Data Warehouses: Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse.
Data Lakes: Azure Data Lake, Amazon S3,
Databricks Delta Lake.
AWS: S3, EMR, Redshift, Glue, Lambda, and
Kinesis.
Azure: Data Factory, Azure Synapse, Azure
Blob Storage.
GCP: BigQuery, Dataflow, Pub/Sub, Cloud
Storage.
ETL Tools: Apache Nifi, Talend, Informatica,
Matillion.
CI/CD Tools: Jenkins, CircleCI, GitLab CI/CD.
IaC (Infrastructure as Code): Terraform,
AWS CloudFormation.
Visualization & BI: Tableau, Power BI for presenting data insights.
Custom Visualization: Using Python libraries
like matplotlib, seaborn.
Monitoring: Grafana, Kibana, Prometheus.
Containers: Docker, Kubernetes for data
pipeline scalability.
As an experienced Data Engineer with 4+ years of expertise in designing, building, and optimizing data pipelines and architectures across AWS, Azure, and GCP, I seek to apply my skills in cloud-based data engineering to drive data-driven decision-making.
Dynamic and motivated IT professional with 4+ years of experience as a Big Data Engineer, with expertise in designing data-intensive applications using the Hadoop ecosystem, big data analytics, cloud data engineering, data warehouse/data mart, data visualization, reporting, and data quality solutions.
Experience in using AWS CloudFormation, API Gateway, and AWS Lambda to automate and secure the infrastructure on AWS.
Sound knowledge of database architecture for OLTP and OLAP applications, data analysis, ETL processes, data modeling, and dimensional modeling. Hands-on experience modeling data (Snowflake, Star Schema) and developing pipelines for BI analytics and visualization using Power BI and Tableau.
Experience in implementing big data solutions using Azure Data Lake, Data Factory, HDInsight, Databricks, Delta Lake, and Synapse. Experience in using the Hadoop ecosystem and visualizing the processed data using Tableau.
Used Spark Streaming, HBase, and Kafka for real-time data integration. Architected and implemented medium to large scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Azure Data Factory, Data Lake Analytics, and Stream Analytics).
Developed batch processing solutions using Data Factory and Azure Databricks. Expertise in building CI/CD on AWS using AWS CodeCommit, CodeBuild, CodeDeploy, and CodePipeline.
Strong experience using the Spark RDD API, Spark DataFrame/Dataset API, Spark SQL, and Spark ML frameworks for building end-to-end data pipelines. Experience with Windows Azure services (PaaS, IaaS) and storage such as Blob (Page and Block) and SQL Azure. Well experienced in deployment, configuration management, and virtualization.
Good experience in Agile and Scrum methodologies.
Expertise in creating, debugging, scheduling, and monitoring jobs using Airflow and Oozie. Experienced with the most common Airflow operators: PythonOperator, BashOperator, Google Cloud Storage Download Operator, Google Cloud Storage Object Sensor, and Google Cloud Storage to S3 Operator (a minimal DAG sketch follows this summary). Experience in migrating on-premises workloads to Windows Azure using Azure Site Recovery and Azure backups.
Proficient with container systems like Docker and container orchestration like EC2 Container Service (ECS) and Kubernetes; worked with Terraform.
Developed core modules in large cross-platform applications using Java, JSP, Servlets, Hibernate, RESTful services, JDBC, JavaScript, XML, and HTML.
Proficiency in SQL across several dialects.
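To illustrate the Airflow work noted above, here is a minimal DAG sketch in the style of those pipelines. It assumes Airflow 2.4+ with the Google provider installed; the DAG id, bucket, object path, and callable are placeholders rather than production names.

```python
# Minimal Airflow DAG sketch (Airflow 2.4+); bucket/object/DAG names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor


def transform(**context):
    # Placeholder transformation step; a real job would read, clean, and load data here.
    print("transforming batch for", context["ds"])


with DAG(
    dag_id="daily_ingest",            # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    wait_for_file = GCSObjectExistenceSensor(
        task_id="wait_for_file",
        bucket="landing-bucket",                 # placeholder GCS bucket
        object="exports/{{ ds }}/data.csv",      # placeholder object path
    )
    run_transform = PythonOperator(task_id="run_transform", python_callable=transform)
    notify = BashOperator(task_id="notify", bash_command="echo 'pipeline finished'")

    wait_for_file >> run_transform >> notify
```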
WORK EXPERIENCE
Aug 2024 - Present
AWS Data Engineer - Evernorth Health Services, St. Louis, Missouri, USA
Evernorth Health Services is a pharmacy benefit management and specialty managed care company. Created and managed ETL (Extract, Transform, Load) processes to ensure data is processed and made available for analysis, and developed and managed data warehousing solutions to consolidate data from different sources.
Responsibilities:
Automated and monitored AWS infrastructure with Terraform for high availability and reliability, reducing infrastructure management time by 90% and improving system uptime.
Implemented a one-time data migration of multistate-level data from SQL Server to Snowflake using Python and SnowSQL, and developed ETL pipelines into and out of the data warehouse.
Used AWS Lambda to perform data validation, filtering, sorting, and other transformations for every data change in a database table, and loaded the transformed data into another data store, AWS S3, for raw file storage (see the Lambda sketch at the end of this role).
Integrated Tableau with various data sources, including Hadoop Distributed File System (HDFS), Hive, and relational databases, ensuring seamless connectivity for comprehensive data visualization and reporting.
Achieved 70% faster EMR cluster launch and configuration, optimized Hadoop job processing by 60%, improved system stability, and utilized Boto3 for seamless file writing to S3 buckets.
Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (see the streaming sketch at the end of this role).
Designed and developed Spark jobs to process data coming in different file formats like XML, CSV, and JSON.
Actively Participated in all phases of the Software Development Life Cycle (SDLC) from implementation to deployment.
Utilized C# and ADO.NET to establish connections to databases such as SQL Server, Oracle, and MySQL, enabling efficient data retrieval and manipulation. Involved in the automation process through Jenkins for CI/CD pipelines.
Utilized Amazon DynamoDB to manage real-time NoSQL databases for storing patient transaction data with low-latency access, ensuring fast retrieval of critical healthcare information. Implemented Snowflake on AWS for large-scale data transfers, maintaining data availability and consistency in healthcare systems to ensure timely and accurate patient care.
Designed and implemented ETL workflows using AWS Glue, Apache NiFi, and Talend to automate the extraction, transformation, and loading of healthcare data from multiple sources into cloud-based storage solutions.
Migrated data to the cloud (Snowflake and AWS) from legacy data warehouses and developed the supporting infrastructure. Worked with Terraform templates for AWS to maintain the infrastructure as code.
Integrated Kubernetes with cloud-native services, such as AWS EKS and GCP GKE, to leverage additional scalability and managed services. Developed Kibana dashboards based on Logstash data and integrated different source and target systems into Elasticsearch for near real-time log analysis and monitoring of end-to-end transactions.
Implemented a proof of concept deploying this product in an AWS S3 bucket and Snowflake. Experience in data profiling, data mapping, data cleaning, data integration, metadata management, and MDM (Master Data Management).
Designed and built scalable data pipelines to ingest, translate, and analyze large sets of data. Used AWS to create storage resources and define resource attributes, such as disk type or redundancy type, at the service level.
Environment: AWS, CI/CD, Elasticsearch, EMR, ETL, GCP, HBase, HDFS, Hive, Jenkins, JS, Kafka, Kubernetes, Lake, Lambda, MySQL, Oracle, Python, S3, Snowflake, Spark, SQL, Sqoop, Tableau
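As a concrete illustration of the Lambda validation-and-landing pattern described in this role, the following is a hedged sketch; the bucket name, event shape, and required fields are assumptions, not the actual production schema.

```python
# Sketch of a Lambda handler that validates change records and lands them in S3.
# Bucket name, field names, and event structure are placeholders.
import json

import boto3

s3 = boto3.client("s3")
RAW_BUCKET = "raw-file-storage"  # placeholder bucket


def handler(event, context):
    records = event.get("Records", [])
    valid = []
    for record in records:
        payload = json.loads(record["body"]) if "body" in record else record
        # Basic validation/filtering: keep rows carrying the keys downstream jobs expect.
        if payload.get("id") and payload.get("updated_at"):
            valid.append(payload)

    if valid:
        key = f"changes/{context.aws_request_id}.json"
        s3.put_object(Bucket=RAW_BUCKET, Key=key, Body=json.dumps(valid).encode("utf-8"))
    return {"accepted": len(valid), "dropped": len(records) - len(valid)}
```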
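And a sketch of the Kafka-to-Spark streaming pattern referenced above, using Structured Streaming; the broker, topic, schema, and the Parquet sink standing in for the HBase write are illustrative assumptions, and the job assumes the spark-sql-kafka connector is on the classpath.

```python
# Structured Streaming sketch: consume JSON events from Kafka and process them in micro-batches.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_time", TimestampType()),
    StructField("payload", StringType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder brokers
    .option("subscribe", "events-topic")                # placeholder topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)


def write_batch(batch_df, batch_id):
    # In the real pipeline an HBase/NoSQL sink connector would be invoked here.
    batch_df.write.mode("append").parquet("/tmp/events")   # placeholder sink


query = (
    events.writeStream.foreachBatch(write_batch)
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .start()
)
query.awaitTermination()
```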
Dec 2022 - Jul 2023
GCP Data Engineer - (Accenture) Nokia, Mumbai, India
Nokia Corporation is a Finnish multinational telecommunications, information technology, and consumer electronics corporation. Implemented data security measures to protect sensitive information, adhering to industry standards and regulatory requirements. Ensured the quality, integrity, and consistency of data through robust validation, cleansing, and normalization processes.
Responsibilities:
Involved in transferring large datasets from Teradata to HDFS and vice versa using Sqoop incremental imports.
Monitored the Spark cluster using Log Analytics and the Ambari Web UI. Transitioned log storage from Cassandra to Azure SQL Data Warehouse and improved query performance. Wrote queries in MySQL and native SQL.
Ensured data quality and report accuracy by implementing validation scripts and schema checks in the pipeline feeding Data Studio. Implemented dynamic cluster provisioning in Dataproc with autoscaling policies, reducing idle cluster time and optimizing GCP costs. Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery.
Actively involved in designing and developing data ingestion, aggregation, and integration in the Hadoop environment.
Experienced in GCP features including Google Compute Engine, Google Cloud Storage, VPC, Cloud Load Balancing, and IAM.
Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks. Used Cloud Shell for troubleshooting production data pipelines in GCP.
Developed data pipeline programs with Spark Scala APIs, data aggregations with Hive, and formatted data (JSON) for visualization. Worked with Python ORM classes using SQLAlchemy to retrieve and update data from the database.
Leveraged Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics) for data ingestion into various Azure services, including Azure Data Lake, Azure Storage, Azure SQL, and Azure DW.
Made extensive use of the Cloud Shell SDK in GCP to configure and deploy services such as BigQuery.
Managed and monitored large-scale databases (10TB+), including index tuning, partitioning, and performance optimization in both OLTP and OLAP systems. Worked with GCP services including Cloud Storage, Dataproc, Dataflow, and BigQuery, as well as AWS EMR, S3, Glacier, and EC2 with EMR clusters.
Designed and deployed fault-tolerant streaming pipelines with windowing, watermarking, and side outputs in Dataflow (see the Beam windowing sketch at the end of this role).
Wrote Python DAGs in Airflow to orchestrate end-to-end data pipelines for multiple applications.
Environment: Analytics, API, Azure, Azure Data Lake, BigQuery, Cassandra, Cluster, Data Factory, EC2, EMR, Factory, GCP, HDFS, Hive, JS, Lake, MySQL, PySpark, Python, S3, Scala, SDK, Services, Spark, Spark SQL, SQL, Sqoop, Teradata, VPC
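The Dataflow windowing work above can be sketched with the Apache Beam Python SDK roughly as follows; the project/subscription path and the 60-second window size are placeholders, and a real deployment would use DataflowRunner rather than a local run.

```python
# Beam streaming sketch: read from Pub/Sub, apply fixed windows, and count messages per window.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
from apache_beam.transforms import window

options = PipelineOptions()
options.view_as(StandardOptions).streaming = True  # unbounded source, so run in streaming mode

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/events"  # placeholder subscription
        )
        | "Decode" >> beam.Map(lambda b: b.decode("utf-8"))
        | "Window" >> beam.WindowInto(window.FixedWindows(60))       # 60-second fixed windows
        | "KeyByMessage" >> beam.Map(lambda msg: (msg, 1))
        | "CountPerWindow" >> beam.CombinePerKey(sum)
        | "Print" >> beam.Map(print)
    )
```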
Apr 2020 - Nov 2022
Data Engineer - HID Global, Mumbai, India
HID Global is an American manufacturer of secure identity products. Utilized big data technologies like Hadoop, Spark, and similar frameworks for handling large datasets. Developed and implemented data models to support business intelligence and analytics needs.
Responsibilities:
Developed ETLs using PySpark, using both the DataFrame API and the Spark SQL API (see the ETL sketch at the end of this role). Utilizing AWS Lambda and DynamoDB, created a security framework to provide fine-grained access control for objects in AWS S3.
Explored PySpark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs. Developed Spark applications for the entire batch processing workload using PySpark. Involved in requirement analysis, design, coding, and implementation phases of the project.
Employed Hadoop scripts to manipulate and load data from the Hadoop File System.
Worked on Kafka streaming on the subscriber side, processing messages and inserting them into the database, and used Apache Spark for real-time data processing. Created Databricks job workflows that extract data from SQL Server and upload files to SFTP using PySpark and Python.
Have used T-SQL for MS SQL Server and ANSI SQL extensively on disparate databases.
Developed a fully automated continuous integration (CI/CD) system using Git, Jenkins, MySQL, and custom tools developed in Python and Bash. Optimized batch and streaming migration processes by leveraging Dataflow and Pub/Sub, enabling near real-time synchronization of legacy systems with cloud infrastructure.
Implemented Apache Airflow for workflow automation and task scheduling, and created DAG tasks.
Implemented Kubernetes namespaces and RBAC (Role-Based Access Control) policies to enforce security and access controls in the data infrastructure. Documented and standardized Elasticsearch usage patterns, improving team onboarding speed and reducing query-related incidents by 40%.
Developed metrics based on SAS scripts on the legacy system and migrated them to Snowflake (on Google Cloud). Implemented DAGs for scheduling complex workflows, integrating with AWS services (S3, Redshift, Lambda) and external APIs.
Environment: Airflow, Apache, API, AWS, CI/CD, DynamoDB, Elasticsearch, ETL, Factory, Git, Jenkins, Kafka, Kubernetes, Lake, Lambda, MySQL, PySpark, Python, Redshift, S3, SAS, Spark, Spark SQL, SQL
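Finally, a minimal sketch of the PySpark ETL pattern used in this role, mixing the DataFrame API and Spark SQL; the S3 paths, column names, and aggregation are illustrative assumptions only.

```python
# PySpark ETL sketch combining the DataFrame API and Spark SQL; paths and columns are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: read a raw CSV drop (placeholder path and columns).
orders = spark.read.option("header", True).csv("s3a://raw-bucket/orders/")

# Transform with the DataFrame API: type casting and basic filtering.
clean = (
    orders.withColumn("amount", F.col("amount").cast("double"))
    .filter(F.col("amount") > 0)
)

# The same data can be shaped with the Spark SQL API once registered as a view.
clean.createOrReplaceTempView("orders_clean")
daily_totals = spark.sql(
    "SELECT order_date, SUM(amount) AS total FROM orders_clean GROUP BY order_date"
)

# Load: write the curated output as partitioned Parquet (placeholder location).
daily_totals.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3a://curated-bucket/daily_totals/"
)
```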