Snehitha Bobba
Email: ad3nmr@r.postjobfree.com
Mobile: +1-309-***-****
PROFESSIONAL SUMMARY:
5+ years of IT experience designing, developing, and delivering software using a wide variety of technologies across all phases of the development life cycle. Expertise in Python and Big Data technologies as a developer, with proven project leadership, teamwork, and communication skills.
4+ years of strong experience with AWS, Snowflake, Spark, and Kafka.
Strong functional programming and object-oriented concepts with complete SDLC experience: requirements gathering, conceptual design, analysis, detailed design, development, mentoring, and system and user acceptance testing.
Hands-on development and implementation experience in Big Data Management Platform (BMP).
Strong experience in Micro service architecture and its design patterns.
Strong exposure to Snowflake and Redshift.
Strong understanding of Snowflake cost optimizations.
Designed, deployed, and managed Snowflake data warehouses, ensuring optimal performance, scalability, and reliability.
Implemented ETL pipelines in Databricks, converting unstructured data into structured, analysis-ready datasets.
Created and managed Databricks notebooks in Python, Scala, and SQL for data visualization and analysis.
Scheduled and managed Databricks Jobs to automate repetitive data processing operations and ensure on-time execution.
Implemented real-time data processing with Databricks Structured Streaming, managing continuous data streams for instant insights.
In-depth knowledge of container frameworks such as Kubernetes and Docker.
In-depth knowledge of the Spark Core and Spark SQL APIs.
Developed batch processing jobs using Spark (Java), MapReduce, and Hive.
Good Knowledge and experience in Hadoop Administration.
Experience in scripting for automation, and monitoring using Shell scripts.
Created Kafka producers and consumers to stream data in real time while ensuring fault-tolerant and efficient message delivery.
Implemented and managed schema registry to ensure data consistency and compatibility among producers and consumers.
Developed and maintained Kafka topics and enhanced partitioning techniques for effective parallel processing and data distribution.
In-depth understanding of Snowflake Schema, Database, and Table structures.
Developed transformation logic through Snowpipe.
Developed and utilized Snowflake user-defined functions and stored procedures to encapsulate complex logic, enhancing code modularity and maintainability.
Experience in migrating the data to the cloud data ecosystem.
Team player with strong management, analytical, communication, and interpersonal skills, and excellent business understanding.
Developed a real-time data processing system, reducing the time to process and analyze data by 50%.
Designed and implemented a data archiving strategy that reduced storage costs by 30%.
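The key-based Kafka partitioning technique mentioned above can be sketched as a deterministic hash of the message key modulo the partition count. This is a minimal illustration only: CRC32 stands in for Kafka's actual murmur2 partitioner, and the key names are hypothetical.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Route a keyed message to a partition deterministically.

    Kafka's default partitioner hashes the key (murmur2) modulo the
    partition count; CRC32 is used here purely as a simplification.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# All messages carrying the same key land on the same partition,
# which preserves per-key ordering across parallel consumers.
keys = ["cust-17", "cust-42", "cust-17", "cust-99"]
assignments = [choose_partition(k, 6) for k in keys]
assert assignments[0] == assignments[2]  # same key -> same partition
```

Because the mapping depends only on the key and the partition count, adding consumers never reorders events for a given customer, which is the property the fault-tolerant streaming setup relies on.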
ACADEMIC BACKGROUND
Texas A&M University, Commerce
Master’s Degree,
Information Systems
GPA: 3.6/4
Keshav Memorial Institute of Technology and Science
Bachelor's degree,
Information Technology
GPA: 3.2/4
Creative: Excellent analytical and logical skills to solve problems and resolve them decisively.
Quick Learner: Fast learner, open to new technologies.
Team Player: Good communication and interpersonal skills, with the ability to multi-task and work independently or within a team.
Performance Tuning: Used modern tools and technologies to improve system performance.
TECHNICAL SKILLS:
Big Data Ecosystems: Hadoop, MapReduce, HDFS, Hive, Spark, Kafka, Impala, Snowflake, Databricks
Operating Systems: Windows, Linux, UNIX
Languages: Python, Java, C++, SQL
Shell Scripting: UNIX shell scripts
Frameworks: Flask, Apache Spark
Databases: MySQL, Aurora
IDEs: PyCharm, IntelliJ
PROFESSIONAL EXPERIENCE:
Target (Salt Lake City) Jan 2022 - Present
Snowflake Data Engineer
Designed and implemented ETL pipelines for ingesting and processing large volumes of data from various sources, resulting in a 25% increase in efficiency. Built and maintained data warehousing solutions using Snowflake, allowing for faster data access and improved reporting capabilities. Developed and optimized complex SQL queries and stored procedures to extract insights from large datasets.
Responsibilities:
Experience with Snowflake virtual warehouses and building Snowpipe.
Conducted performance tuning exercises, optimizing Snowflake configurations and query execution plans for faster and more efficient data retrieval.
Implemented query optimization techniques to enhance performance, taking advantage of Snowflake features such as automatic clustering and micro-partition pruning.
Developed and maintained robust data models within Snowflake, optimizing for performance and efficiency in large-scale data warehousing environments.
Developed and optimized Spark jobs within Databricks for efficient distributed data processing.
Parameterized Databricks notebooks for reusability, allowing easy adaptation to different datasets and scenarios.
Developed new widgets within Databricks notebooks to improve user interaction and offer dynamic controls for data analysis.
Utilized Databricks features for data sharing and collaboration, enabling seamless teamwork and knowledge sharing among team members.
Implemented complex SQL transformations within Databricks notebooks, optimizing queries for improved performance.
Good experience in Extracting, Transforming, and Loading (ETL) data from source systems for medium and large enterprise data warehouses.
Hands-on experience with clustering, cloning, data sharing, and metadata management in Snowflake.
Deep knowledge of Snowflake pricing and administration concepts.
Very good understanding of RDBMS topics, with the ability to write complex SQL/PLSQL.
Good understanding of Snowflake caching mechanisms.
Configured and managed Apache Kafka clusters for optimal performance, scalability, and fault tolerance.
Implemented topic compaction to efficiently manage storage and retention of data in Kafka topics.
Created internal and external stages for loading data into Snowflake tables.
Built an outlier logic on top of snowflake tables.
Built an analytical application on top of snowflake tables with different drill-down levels.
Configured and optimized Databricks clusters to balance performance and cost, ensuring scalability based on workload demands.
Implemented Delta Lake for versioning and managing large data lakes within the Databricks environment, ensuring data integrity and reliability.
Developed natural language processing (NLP) solutions using transformers, word embeddings, vector embeddings and sentiment analysis for text classification, named entity identification, and language translation.
Fine-tuned an existing model on company data for sentiment analysis of customer feedback.
Monitored fine-tuned models in production environments and iteratively fine-tuned them based on performance feedback and changing data distributions to maintain optimal performance over time.
Implemented techniques such as Word Embeddings to vectorize text data for natural language processing (NLP) tasks.
Applied tokenization to convert raw text into sequences of tokens, which were used to train the model.
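The tokenization and encoding step described above can be sketched minimally as follows. The whitespace tokenizer and toy vocabulary are illustrative only; real fine-tuning pipelines typically use the subword tokenizer shipped with the pretrained model.

```python
def tokenize(text: str) -> list[str]:
    # Simplified whitespace/punctuation tokenizer; production pipelines
    # would use the model's own subword tokenizer instead.
    return text.lower().replace(",", " ").replace(".", " ").split()


def encode(tokens: list[str], vocab: dict[str, int], unk: int = 0) -> list[int]:
    # Map each token to its vocabulary id; unknown tokens fall back to `unk`.
    return [vocab.get(t, unk) for t in tokens]


# Toy vocabulary (hypothetical) and a sample feedback sentence.
vocab = {"the": 1, "service": 2, "was": 3, "great": 4}
ids = encode(tokenize("The service was great."), vocab)
# ids is now a sequence of integer token ids ready for model training
```

The resulting id sequences are what actually feed the embedding layer during sentiment-model training; any out-of-vocabulary word maps to the reserved unknown id.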
Infor (Hyderabad) Jun ‘18 – Feb ‘21
Big Data Engineer
The goal of this project (CUSTOMER360) is to build a data lake capturing the granular data generated about customers from various sources, in order to conduct more advanced analysis of customer behavior and social trends. Access to this data via Spark also enables the client to generate reports and make business-driven decisions. Currently building a DataMart by pulling data from the data lake into Redshift.
Responsibilities:
Built a framework to load data from files and databases using spark into Data Lake using python. Data lake has three layers Staging, Raw, Golden Record.
Used Spark SQL and accessed external hive meta-store (MySQL Instance) and processed hive tables on daily basis with incremental data and achieved batch updates.
Configured EMR clusters in the VPC with different subnets, routing tables, and ACLs to communicate within the cluster, and closed all unnecessary ports.
Built a model to capture the CDC using the Sqoop for importing data from databases into data lake.
Worked with oozie to submit jobs to the cluster and Job dependency between various stages.
Optimized Spark processing by examining executor memory and core usage in the Spark UI and YARN Resource Manager, cutting processing time in half.
Wrote Lambdas to create transient EMR clusters for processing data, triggered by file-arrival and time-based events via AWS CloudWatch.
Used columnar storage formats such as ORC and Parquet for efficient storage and better processing performance.
Used S3 as the storage layer, so clusters can be terminated after the data is processed.
Implemented s3-dist-cp to achieve greater speed in moving data between S3 and HDFS.
Extensively used AWS RDS instances to host the Hive metastore and Hue database for user logins outside the cluster.
Implemented Data-quality on the data and rejected the records that didn’t satisfy the given condition.
Performed the jobs with Spark core, SparkSQL, Spark Streaming, and Data frames, transformations, actions.
Expert in Spark windowing functions for time-series data; implemented various UDFs.
Designed Spark schema and data selection queries that are involved in data ingestion process.
Extensive experience in troubleshooting and debugging spark applications in testing environment and in production.
Imported data from the Relational Databases and worked on Data Warehousing concepts such as star schema.
Used Bitbucket as the version control tool; extensive experience with branch release management and resolving Git sync and merge issues.
Good experience writing shell scripts for job dependencies and clean-ups.
Good knowledge of debugging jobs: reviewing the corresponding logs to decide whether to rerun or rectify the error before the next run.
Involved in daily scrums.
Designed a framework to achieve inserts, updates, and deletes (upserts) while writing data from Spark.
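In miniature, such an upsert framework amounts to applying keyed change records (insert/update/delete) to an existing dataset. The sketch below uses plain Python dictionaries; the `id` and `op` field names are illustrative assumptions, not the project's actual schema.

```python
from typing import Iterable


def apply_changes(current: dict, changes: Iterable[dict]) -> dict:
    """Apply insert/update/delete change records keyed by `id`.

    Each change row carries an `op` flag ("I", "U", or "D"); the input
    dataset is left untouched and a merged copy is returned.
    """
    result = dict(current)
    for row in changes:
        key, op = row["id"], row["op"]
        if op == "D":
            result.pop(key, None)           # delete if present
        else:                               # insert or update
            result[key] = {k: v for k, v in row.items() if k != "op"}
    return result


base = {1: {"id": 1, "city": "Austin"}}
delta = [{"id": 1, "op": "U", "city": "Dallas"},
         {"id": 2, "op": "I", "city": "Plano"},
         {"id": 1, "op": "D"}]
final = apply_changes(base, delta)  # -> {2: {"id": 2, "city": "Plano"}}
```

At Spark scale the same merge is expressed as a join between the base table and the change set, with the last-writer-wins logic above applied per key before the result is written back.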
AWS Skills:
EC2, EMR, RDS, S3, VPC, Lambda, AWS Step, SES, SNS, Cloud-Watch, AWS CLI, EBS, RedShift, Athena
Used these services across various applications.
Environment: Python, Apache Spark Core, Spark Streaming, Spark SQL, Kafka, Hive, MySQL, IntelliJ IDE, Git, Agile