RamyaSravani Matta
*****************@*****.*** 217-***-**** GitHub LinkedIn
PROFESSIONAL SUMMARY
Data Engineering and Analytics professional with 3+ years of experience, specializing in designing scalable ETL pipelines, automating workflows, and processing large-scale datasets to drive actionable insights. Skilled in Python, SQL, and big data technologies such as PySpark, Hadoop, and Apache Airflow, with extensive expertise in cloud platforms such as AWS (S3, Glue, Lambda, RDS). Seeking a challenging, dynamic environment with diverse teams that offers continued professional growth.

WORK EXPERIENCE:
Midh Technologies Inc
Data Engineer
Jan 2024 – Present
Designed and implemented scalable ETL pipelines, transforming raw datasets into analytics-ready formats for research projects.
Collaborated with senior developers to design and develop robust Python applications, ensuring adherence to best coding practices.
Developed a multi-threaded web scraping script using Python, Selenium, and BeautifulSoup to extract job listings from dynamic web pages, utilizing HTML and CSS selectors for element identification and data extraction.
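The extraction step can be sketched with the standard library's html.parser (the actual scripts used Selenium and BeautifulSoup, which need a browser and third-party installs; the markup and class names below are hypothetical):

```python
from html.parser import HTMLParser  # stdlib stand-in for BeautifulSoup

class JobListingParser(HTMLParser):
    """Collects text from elements matching a hypothetical 'job-title' class."""
    def __init__(self):
        super().__init__()
        self._in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; match the CSS class directly
        if tag == "a" and ("class", "job-title") in attrs:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data.strip())
            self._in_title = False

html = '<div><a class="job-title">Data Engineer</a><a class="job-title">ETL Developer</a></div>'
parser = JobListingParser()
parser.feed(html)
print(parser.titles)  # → ['Data Engineer', 'ETL Developer']
```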
Designed and implemented robust data pipelines to parse and analyze job descriptions, leveraging HTML structure and CSS selectors for accurate content extraction and NLP techniques for resume parsing and requirements matching.
Utilized spaCy and NLTK to tokenize, stem, and extract key skills, requirements, and entities from job descriptions and resumes for improved data matching and recommendation accuracy.
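The tokenize-and-match step can be illustrated in plain Python (spaCy/NLTK provide far richer tokenization and entity extraction; the skill vocabulary and sample texts here are hypothetical):

```python
import re

# Hypothetical skill vocabulary; production pipelines used spaCy/NLTK
SKILLS = {"python", "sql", "spark", "airflow", "aws"}

def tokenize(text):
    # Lowercase word tokenization; a crude stand-in for spaCy's tokenizer
    return re.findall(r"[a-z+#]+", text.lower())

def extract_skills(text):
    # Intersect tokens with the known-skill vocabulary
    return sorted(SKILLS & set(tokenize(text)))

jd = "Seeking engineer with Python, SQL and Spark; Airflow a plus."
resume = "3 years Python and SQL; built Airflow DAGs on AWS."

jd_skills = extract_skills(jd)
resume_skills = extract_skills(resume)
# Simple overlap score as a requirements-matching signal
match = len(set(jd_skills) & set(resume_skills)) / len(jd_skills)
print(jd_skills, resume_skills, round(match, 2))
```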
Developed Python scripts utilizing Selenium and WebDriver libraries to automate the extraction of requirements and listings from dynamic web pages, reducing manual data collection.
Optimized Selenium-based web scraping workflows by resolving StaleElementReferenceException issues using explicit waits and dynamic content adjustments.
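The wait-and-retry pattern behind handling StaleElementReferenceException can be sketched without a browser (the exception class below is a local stand-in for Selenium's, and the flaky action is simulated):

```python
import time

class StaleElementReferenceException(Exception):
    """Local stand-in for selenium.common.exceptions.StaleElementReferenceException."""

def with_stale_retry(action, retries=3, wait=0.1):
    # Retry when the DOM re-renders underneath us, mirroring the
    # explicit-wait-and-re-locate pattern used with Selenium.
    for attempt in range(retries):
        try:
            return action()
        except StaleElementReferenceException:
            if attempt == retries - 1:
                raise
            time.sleep(wait)  # give the page time to settle before re-locating

# Demo: an action that goes stale twice before succeeding
calls = {"n": 0}
def flaky_read():
    calls["n"] += 1
    if calls["n"] < 3:
        raise StaleElementReferenceException()
    return "Data Engineer - Remote"

result = with_stale_retry(flaky_read)
print(result)  # succeeds on the third attempt
```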
Designed and implemented Python scripts to load and manage cleaned data in a MySQL database using MySQL Connector/Python.
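The load step looks roughly like this (sqlite3 shown as an in-memory stand-in for MySQL Connector/Python; table name and rows are hypothetical — the connector's parameterized-query API is analogous, with `%s` placeholders instead of `?`):

```python
import sqlite3

# Hypothetical cleaned rows ready for loading
rows = [("Data Engineer", "Chicago"), ("ETL Developer", "Austin")]

conn = sqlite3.connect(":memory:")  # stand-in for mysql.connector.connect(...)
conn.execute("CREATE TABLE jobs (title TEXT, city TEXT)")
# Parameterized inserts handle escaping and avoid SQL injection
conn.executemany("INSERT INTO jobs VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
print(count)  # → 2
conn.close()
```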
Designed comprehensive logging mechanisms in Python scripts to monitor data processing pipelines. Captured and logged key metrics, such as row counts and error details, while handling CSV data, MySQL database transactions, and MongoDB operations. This approach ensured real-time debugging, improved traceability, and reduced data pipeline downtime.
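A minimal sketch of that logging approach for the CSV path (sample data and logger name are hypothetical; the same pattern extends to the MySQL and MongoDB steps):

```python
import csv, io, logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("pipeline")

raw = "name,age\nAlice,34\nBob,notanumber\nCara,29\n"
good, errors = [], 0
for i, row in enumerate(csv.DictReader(io.StringIO(raw)), start=1):
    try:
        good.append({"name": row["name"], "age": int(row["age"])})
    except ValueError as exc:
        errors += 1
        # Capture error details with the offending row number for traceability
        log.warning("row %d rejected: %s", i, exc)

# Key metrics: total rows seen, rows loaded, rows rejected
log.info("processed=%d loaded=%d rejected=%d", len(good) + errors, len(good), errors)
print(len(good), errors)
```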
Implemented a multi-threaded processing module using Python's threading library to handle multiple tasks concurrently.
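A minimal threading sketch of that pattern (the worker and sample documents are hypothetical; a lock guards the shared result dict):

```python
import threading

results = {}
lock = threading.Lock()

def word_count(name, text):
    n = len(text.split())
    with lock:  # guard the shared dict across threads
        results[name] = n

docs = {"jd1": "data engineer role", "jd2": "senior python developer wanted now"}
threads = [threading.Thread(target=word_count, args=(k, v)) for k, v in docs.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for all workers before reading results
print(results)
```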
Built scalable and modular scripts deployable in CI/CD pipelines for continuous data collection and integration.
Integrated Apache Kafka for efficient messaging and real-time data streaming between various components of the data pipeline.
Developed NLP-driven models to extract insights from unstructured text data, enabling effective clustering and classification of resumes and job descriptions.
Utilized tools like JIRA for task tracking and management in collaborative settings.

SMG INFOTECH PRIVATE LIMITED, Hyderabad, India
Data Analytics Engineer
Jun 2020 – Jul 2022
Designed and implemented end-to-end data pipelines for post-campaign analytics in the email marketing domain.
Utilized Apache Iceberg tables on HDFS alongside Apache Spark and Airflow to streamline and automate the processing of large-scale campaign data.
Applied data modelling techniques to design schemas that optimize storage and retrieval efficiency.
Worked with Parquet and Avro data formats to streamline data handling and enhance performance.
Managed and processed tables containing up to 9 billion records daily, utilizing advanced ETL techniques to ensure efficient data extraction, transformation, and loading processes.
Collaborated closely with data scientists to ensure data quality and optimize data transformations for insightful campaign performance evaluations.
Technical Skills:
Big Data Technologies: PySpark, Spark SQL, Hadoop, MapReduce, HDFS
Programming: Python (Pandas, NumPy), Scala
Databases: SQL, ORACLE, Teradata, MySQL, MongoDB
Data Querying: Hive, Trino (formerly Presto)
Data Workflow Management: Apache Airflow, Apache Oozie, AWS Glue
Data Storage and Management: Apache Iceberg, Snowflake, AWS S3
Data Engineering: ETL Workflows, Data Warehousing, Pipeline Optimization
Cloud Platforms: AWS (S3, Glue, Lambda, RDS, CloudFormation)
Data Visualization: Tableau, Power BI
Front-End Technologies: HTML, CSS, JavaScript
Project Tools: JIRA, Confluence
Version Control & CI/CD: Git, GitLab, Jenkins
Collaboration: Slack

Educational Qualification:
Master's in Data Analytics, University of Illinois Springfield, Illinois.
Bachelor's in Information Technology, Gudlavalleru Engineering College, Andhra Pradesh, India.

Academic projects:
California House Price Prediction
Developed "California House Price Prediction," a machine learning project that predicts median house prices in California. The analysis uses the California housing dataset, which includes features such as geographical coordinates, housing median age, population, and median income. The resulting predictive model provides insights valuable for real estate forecasting.

SMS Spam Detection
Developed a machine learning model to classify SMS messages as spam or ham using Natural Language Processing (NLP) techniques. Preprocessed text data, engineered features, and applied algorithms such as Naïve Bayes and Support Vector Machines to achieve high accuracy. Visualized insights using Matplotlib and Seaborn and deployed the solution for real-time classification.
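The Naïve Bayes approach can be sketched from scratch to show the underlying math (toy training data; a real implementation would typically use scikit-learn's MultinomialNB):

```python
import math
from collections import Counter

# Toy labeled SMS data, illustrative only
train = [
    ("win cash prize now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting at noon today", "ham"),
    ("lunch today at noon", "ham"),
]

counts = {"spam": Counter(), "ham": Counter()}
docs = Counter()
for text, label in train:
    docs[label] += 1
    counts[label].update(text.split())

vocab = {w for c in counts.values() for w in c}

def predict(text):
    scores = {}
    for label, c in counts.items():
        total = sum(c.values())
        # log prior + per-word log likelihoods with Laplace (add-one) smoothing
        score = math.log(docs[label] / sum(docs.values()))
        for w in text.split():
            score += math.log((c[w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("claim free cash"))  # → spam
print(predict("noon meeting"))     # → ham
```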