LIKITHA DAGGULA
Charlotte, North Carolina ***** 407-***-**** **************@*****.***
Summary
Innovative Big Data Engineer with 5 years of experience, known for high productivity and efficient task
completion. Possess specialized skills in the Hadoop ecosystem, Spark programming, and data modeling that
contribute to solving complex data challenges. Excel in analytical thinking, problem-solving, and
communication, leveraging these soft skills to collaborate effectively with cross-functional teams and deliver
insightful data solutions.
Skills
Application Development, Testing and Deployment
Programming languages: Python, Scala, SQL, Java
Databases: MySQL, PostgreSQL, MongoDB, Cassandra
ETL Tools: Apache NiFi, Microsoft SQL Server Integration Services (SSIS), Apache Airflow, Informatica, Talend, AWS Glue
Data visualization tools: Tableau, Power BI
Data Processing and Analysis: Apache Spark, Apache Kafka, AWS EMR
Big data technologies: Hadoop, HBase, Spark, Hive, Scikit-learn
Continuous Integration and Performance Tuning
Data Warehousing: Amazon Redshift, Google BigQuery, Snowflake
Version Control Systems: Git, GitHub, GitLab, Bitbucket
Experience
Senior Big Data Engineer 02/2022 to Current
Gainwell Technologies Irving, TX
Researched advancements in technology related to Big Data processing, storage, and analytics.
Contributed to the design and implementation of efficient solutions for managing large-scale Big Data
workloads.
Utilized ETL techniques to extract data from multiple sources and populate the target system through
customized jobs.
Created and deployed NoSQL databases including Cassandra, MongoDB, and HBase for efficient
storage of large-scale data.
Developed and implemented Spark applications using Python and Scala.
Developed and maintained data pipelines to ingest, store, process, and analyze large datasets in AWS
S3 buckets.
Conducted comprehensive testing for all components within the Big Data architecture.
Identified potential performance and scalability problems by monitoring production systems.
Automated deployment processes for deploying applications across various cloud environments,
including containerized applications on OpenShift, resulting in streamlined deployment workflows.
Optimized data handling capabilities by creating and deploying high-performance real-time applications
with Kafka Streams and Spark Streaming.
Implemented automated monitoring of data flows using CloudWatch and Lambda functions, integrated
with OpenShift for enhanced operational insights.
Designed and managed data integration workflows using Apache NiFi to ensure seamless data
movement across different platforms.
Implemented data governance and security policies to protect sensitive data and ensure compliance
with industry standards.
Collaborated with cross-functional teams to integrate machine learning models into production data
pipelines, leveraging tools such as AWS SageMaker and TensorFlow.
Conducted root cause analysis and debugging of data pipeline issues, ensuring timely resolution and
minimal impact on data processing operations.
Engaged in continuous learning and professional development to stay current with emerging trends and
best practices in Big Data technologies and tools.
Big Data Developer 06/2020 to 01/2022
Premier Inc Charlotte, North Carolina
Performed analysis of large datasets using complex SQL queries and advanced Python scripting (e.g.,
Pandas, NumPy), identifying data patterns, trends, and anomalies to inform system requirements and
business strategies.
Architected and implemented Azure Storage solutions, including Blob Storage for unstructured data,
Azure Files for shared storage, Azure Queue for message queueing, and Table Storage for NoSQL
storage, ensuring high availability, redundancy, and cost-effectiveness for diverse application
requirements.
Configured Kafka clusters with custom Zookeeper setups and developed custom consumer applications
using Kafka Streams API to facilitate real-time data ingestion and processing from various sources,
ensuring low-latency and fault-tolerant data streaming solutions.
Developed and deployed Big Data applications leveraging Hadoop ecosystem components (Hadoop,
MapReduce, HDFS, Hive, Pig) and Apache Spark, designing and implementing ETL pipelines for
processing and analyzing petabyte-scale datasets.
Optimized SQL queries on relational databases such as Oracle, SQL Server, and MySQL by using
indexing, query rewriting, and partitioning techniques to improve query performance and reduce latency.
Developed and implemented automation scripts for Azure services using PowerShell, Python, and Bash,
automating cloud infrastructure tasks such as provisioning, configuration management, and deployment,
utilizing Azure CLI and Azure DevOps.
Debugged existing Java applications using integrated development environments (IDEs) like IntelliJ and
Eclipse, employing debugging tools, logging frameworks (e.g., Log4j), and performance profiling to
identify and resolve application bugs and performance bottlenecks.
Automated deployment processes for Kafka clusters and custom consumer applications using CI/CD
pipelines (e.g., Jenkins, GitLab CI/CD), leveraging containerization with Docker and orchestration with
Kubernetes to ensure scalable and reliable application deployments across various cloud environments.
Implemented and maintained Azure Storage solutions with performance tuning and cost optimization
strategies, including lifecycle management policies for Blob Storage, automated backup and restore for
Azure Files, and monitoring with Azure Monitor and Azure Storage Analytics.
Big Data Intern 06/2018 to 07/2019
Avon Technologies Hyderabad, Telangana
Leveraged Agile methodologies to efficiently progress the development lifecycle from initial prototyping
through enterprise-quality testing and final implementation.
Designed and executed advanced data pipelines to transfer both structured and unstructured data into
HDFS.
Created and integrated specialized user-defined functions to expand the functionality of HiveQL queries.
Developed and optimized algorithms to analyze and manage substantial data volumes from different file
systems.
Successfully deployed and managed Apache Spark applications on YARN clusters for efficient
execution of distributed computing tasks.
Fine-tuned parameters based on analysis to enhance the performance of MapReduce jobs.
Analyzed large datasets using R and Python, leveraging libraries such as SciPy and NumPy.
Achieved performance improvements in MapReduce jobs by optimizing Apache Hadoop clusters.
Continuously monitored and adjusted system configurations to ensure optimal performance of data
processing tasks.
Education
Master of Science: Computer Science 12/2020
Southeast Missouri State University Cape Girardeau, MO
Bachelor of Technology: Computer Science and Engineering 05/2019
Jawaharlal Nehru Technological University India
Academic Projects
Designed an ETL pipeline using Apache NiFi and Talend to ingest, transform, and load retail data into
the Amazon Redshift data warehouse.
Real-Time Data Processing System: Designed a real-time data processing pipeline with Apache Kafka and
Apache Spark to analyze streaming social media data.
Built a machine learning pipeline using Python and Scikit-learn for predictive analytics of student
outcomes and integrated with Apache Airflow for data processing automation.
www.linkedin.com/in/likithadaggula