LIKITHA DAGGULA
Charlotte, North Carolina ***** 407-***-**** **************@*****.***
Summary
Innovative Big Data Engineer with 5 years of experience, known for high productivity and efficient task
completion. Possess specialized skills in the Hadoop ecosystem, Spark programming, and data modeling that
contribute to solving complex data challenges. Excel in analytical thinking, problem-solving, and
communication, leveraging these soft skills to collaborate effectively with cross-functional teams and deliver
insightful data solutions.
Skills
Application Development, Testing and Deployment
Programming languages: Python, Scala, SQL, Java
Databases: MySQL, PostgreSQL, MongoDB, Cassandra
ETL Tools: Apache NiFi, Microsoft SQL Server Integration Services (SSIS), Apache Airflow, Informatica, Talend, AWS Glue
Data visualization tools: Tableau, Power BI
Data Processing and Analysis: Apache Spark, Apache Kafka, AWS EMR
Big data technologies: Hadoop, HBase, Spark, Hive, Scikit-learn
Continuous Integration and Performance Tuning
Data Warehousing: Amazon Redshift, Google BigQuery, Snowflake
Version Control Systems: Git, GitHub, GitLab, Bitbucket
Experience
Senior Big Data Engineer 02/2022 to Current
Gainwell Technologies Irving, TX
Researched advancements in technology related to Big Data processing, storage, and analytics.
Contributed to the design and implementation of efficient solutions for managing large-scale Big Data
workloads.
Utilized ETL techniques to extract data from multiple sources and populate the target system through
customized jobs.
Created and deployed NoSQL databases including Cassandra, MongoDB, and HBase for efficient
storage of large-scale data.
Developed and implemented Spark applications using Python and Scala.
Developed and maintained data pipelines to ingest, store, process, and analyze large datasets in AWS
S3 buckets.
Conducted comprehensive testing for all components within the Big Data architecture.
Identified potential performance and scalability problems by monitoring production systems.
Automated deployment processes for deploying applications across various cloud environments,
including containerized applications on OpenShift, resulting in streamlined deployment workflows.
Optimized data handling capabilities by creating and deploying high-performance real-time applications
with Kafka Streams and Spark Streaming.
Implemented automated monitoring of data flows using CloudWatch and Lambda functions, integrated
with OpenShift for enhanced operational insights.
Designed and managed data integration workflows using Apache NiFi to ensure seamless data
movement across different platforms.
Implemented data governance and security policies to protect sensitive data and ensure compliance
with industry standards.
Collaborated with cross-functional teams to integrate machine learning models into production data
pipelines, leveraging tools such as AWS SageMaker and TensorFlow.
Conducted root cause analysis and debugging of data pipeline issues, ensuring timely resolution and
minimal impact on data processing operations.
Engaged in continuous learning and professional development to stay current with emerging trends and
best practices in Big Data technologies and tools.
Big Data Developer 06/2020 to 01/2022
Premier Inc Charlotte, North Carolina
Performed analysis of large datasets using complex SQL queries and advanced Python scripting (e.g.,
Pandas, NumPy), identifying data patterns, trends, and anomalies to inform system requirements and
business strategies.
Architected and implemented Azure Storage solutions, including Blob Storage for unstructured data,
Azure Files for shared storage, Azure Queue for message queueing, and Table Storage for NoSQL
storage, ensuring high availability, redundancy, and cost-effectiveness for diverse application
requirements.
Configured Kafka clusters with custom Zookeeper setups and developed custom consumer applications
using Kafka Streams API to facilitate real-time data ingestion and processing from various sources,
ensuring low-latency and fault-tolerant data streaming solutions.
Developed and deployed Big Data applications leveraging Hadoop ecosystem components (Hadoop,
MapReduce, HDFS, Hive, Pig) and Apache Spark, designing and implementing ETL pipelines for
processing and analyzing petabyte-scale datasets.
Optimized SQL queries on relational databases such as Oracle, SQL Server, and MySQL by using
indexing, query rewriting, and partitioning techniques to improve query performance and reduce latency.
Developed and implemented automation scripts for Azure services using PowerShell, Python, and Bash,
automating cloud infrastructure tasks such as provisioning, configuration management, and deployment,
utilizing Azure CLI and Azure DevOps.
Debugged existing Java applications using integrated development environments (IDEs) like IntelliJ and
Eclipse, employing debugging tools, logging frameworks (e.g., Log4j), and performance profiling to
identify and resolve application bugs and performance bottlenecks.
Automated deployment processes for Kafka clusters and custom consumer applications using CI/CD
pipelines (e.g., Jenkins, GitLab CI/CD), leveraging containerization with Docker and orchestration with
Kubernetes to ensure scalable and reliable application deployments across various cloud environments.
Implemented and maintained Azure Storage solutions with performance tuning and cost optimization
strategies, including lifecycle management policies for Blob Storage, automated backup and restore for
Azure Files, and monitoring with Azure Monitor and Azure Storage Analytics.
Big Data Intern 06/2018 to 07/2019
Avon Technologies Hyderabad, Telangana
Leveraged Agile methodologies to efficiently progress the development lifecycle from initial prototyping
through enterprise-quality testing and final implementation.
Designed and executed advanced data pipelines to transfer both structured and unstructured data into
HDFS.
Created and integrated specialized user-defined functions to expand the functionality of HiveQL queries.
Developed and optimized algorithms to analyze and manage substantial data volumes from different file
systems.
Successfully deployed and managed Apache Spark applications on YARN clusters for efficient
execution of distributed computing tasks.
Fine-tuned parameters based on analysis to enhance the performance of MapReduce jobs.
Analyzed large datasets using R and Python, leveraging libraries such as SciPy and NumPy.
Achieved performance improvements in MapReduce jobs by optimizing Apache Hadoop clusters.
Continuously monitored and adjusted system configurations to ensure optimal performance of data
processing tasks.
Education
Master of Science: Computer Science 12/2020
Southeast Missouri State University Cape Girardeau, MO
Bachelor of Technology: Computer Science and Engineering 05/2019
Jawaharlal Nehru Technological University India
Academic Projects
Designed an ETL pipeline using Apache NiFi and Talend to ingest, transform, and load retail data into
the Amazon Redshift data warehouse.
Real-Time Data Processing System: Designed a real-time data processing pipeline with Apache Kafka and
Apache Spark to analyze streaming social media data.
Built a machine learning pipeline using Python and Scikit-learn for predictive analytics of student
outcomes and integrated with Apache Airflow for data processing automation.
www.linkedin.com/in/likithadaggula