Computer Science Data Engineer


Srujan Chinta

Arlington, Texas • **************@*****.*** • +1-682-***-**** • linkedin.com/in/srujanchinta

EDUCATION

The University of Texas at Arlington, Arlington, Texas
M.S. Computer Science, GPA: 3.90/4.0, Aug 2023 – Dec 2025
National Institute of Technology Calicut, Kozhikode, Kerala
B.Tech. Computer Science and Engineering, GPA: 7.86/10, Jul 2019 – May 2023

SKILLS

• Languages: Python, SQL, Bash, JavaScript, C++, YAML, Jinja

• Data Engineering: dbt, PySpark, Airflow, Kafka, AWS Glue, Delta Lake, Batch/Streaming Pipelines, Query Optimization, Data Modeling, Analytics Engineering, dbt Testing

• Cloud & Databases: Snowflake, PostgreSQL, MySQL, AWS (S3, EC2, Lambda, Redshift, Glue, Athena, QuickSight, EventBridge, SNS, SSM, CloudFormation), Azure (ADF, Databricks, Synapse), GCP (BigQuery, Composer)

• Visualization: Looker Studio, Power BI, QuickSight

• DevOps & Tools: Docker, Git, Linux, Shell Scripting

• AI & NLP: LangChain, LLaMA 3, Regex, NLP, Tabulate, Streamlit (planned)

• APIs: OpenRouteService API

EXPERIENCE

Research Assistant – AI for Supply Chain Optimization
University of Texas at Arlington, Arlington, Texas, April 2025 – Present

• Built conversational AI app using LangChain and LLaMA 3 to automate facility location and shipping cost optimization (sketched below).

• Developed Python tools with regex to extract variables from natural language for optimization models.

• Integrated OpenRouteService API for geospatial distance and cost calculations.

• Designed ReAct-style agent for multi-turn interaction and error recovery.

• Technologies: Python, LangChain, LLaMA 3, OpenRouteService, Streamlit (planned), Tabulate, Regex, NLP
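
A minimal sketch of the extraction-plus-routing step these bullets describe, assuming the LLaMA 3 model is served locally through Ollama; the prompt, the extract_quantities helper, and the coordinates are illustrative, not the project's actual code.

import os
import re
import openrouteservice                                   # official OpenRouteService client
from langchain_community.chat_models import ChatOllama    # LLaMA 3 assumed to run behind Ollama

llm = ChatOllama(model="llama3")                           # model name is an assumption

def extract_quantities(text: str) -> list[float]:
    # Pull numeric figures (demand, unit costs) out of a natural-language request.
    return [float(x) for x in re.findall(r"\d+(?:\.\d+)?", text)]

def road_distance_km(origin: tuple, destination: tuple) -> float:
    # Query OpenRouteService for driving distance between two (lon, lat) pairs.
    client = openrouteservice.Client(key=os.environ["ORS_API_KEY"])
    route = client.directions([origin, destination], profile="driving-car")
    return route["routes"][0]["summary"]["distance"] / 1000.0

question = "Ship 120 units from the Dallas warehouse; trucking costs 1.75 per km."
print(extract_quantities(question))                        # -> [120.0, 1.75]
print(llm.invoke(f"Restate this as optimization inputs: {question}").content)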

Data Engineer, Jan 2022 – May 2023
Ceyline Shipping Services Pvt. Ltd., Navi Mumbai, India (Remote)

• Automated ingestion of 10K+ daily shipment records by developing an S3-triggered AWS Lambda function in Python that parsed uploaded CSVs and inserted clean data into RDS MySQL, reducing manual effort by over 90%.
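
A condensed sketch of such an S3-triggered loader; the shipments table, its columns, and the pymysql connection details are hypothetical stand-ins for the production setup.

import csv
import io
import os
import boto3
import pymysql   # shipped in the deployment package or a Lambda layer

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Each record in the S3 event corresponds to one uploaded CSV object.
    conn = pymysql.connect(
        host=os.environ["DB_HOST"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
        database=os.environ["DB_NAME"],
    )
    inserted = 0
    with conn, conn.cursor() as cur:
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
            for row in csv.DictReader(io.StringIO(body)):
                cur.execute(
                    "INSERT INTO shipments (shipment_id, origin, destination, weight_kg) "
                    "VALUES (%s, %s, %s, %s)",   # hypothetical table and columns
                    (row["shipment_id"], row["origin"], row["destination"], row["weight_kg"]),
                )
                inserted += 1
        conn.commit()
    return {"inserted": inserted}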

• Converted raw CSV datasets into partitioned Parquet files using Athena CTAS and INSERT INTO queries, reducing S3 storage costs by 65% and improving query performance by 70%.
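
The CTAS pattern referred to above, issued through boto3; the logistics database, table names, and bucket locations are placeholders.

import boto3

athena = boto3.client("athena")

# Rewrite the raw CSV table as Parquet, partitioned by shipment date (partition column goes last in the SELECT).
ctas_sql = """
CREATE TABLE logistics.shipments_parquet
WITH (
    format = 'PARQUET',
    external_location = 's3://example-bucket/curated/shipments/',
    partitioned_by = ARRAY['ship_date']
) AS
SELECT shipment_id, origin, destination, weight_kg, ship_date
FROM logistics.shipments_raw_csv
"""

athena.start_query_execution(
    QueryString=ctas_sql,
    QueryExecutionContext={"Database": "logistics"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)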

• Scheduled Glue Python Shell jobs using EventBridge for daily transformations, ensuring timely updates to dashboards and SLA reports.

• Implemented SNS notifications to alert the team of Lambda failures, reducing missed data loads by 25% and improving pipeline visibility.
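
A sketch of that failure-notification pattern as a wrapper around the ingestion handler; the topic ARN environment variable and message shape are assumptions.

import os
import boto3

sns = boto3.client("sns")

def notify_on_failure(handler):
    # Publish any unhandled exception to an SNS topic, then re-raise so the Lambda still fails visibly.
    def wrapper(event, context):
        try:
            return handler(event, context)
        except Exception as exc:
            sns.publish(
                TopicArn=os.environ["ALERT_TOPIC_ARN"],   # placeholder topic
                Subject="Shipment ingestion Lambda failed",
                Message=f"Error while processing {event.get('Records', [])!r}: {exc}",
            )
            raise
    return wrapper

# Usage: decorate the loader shown earlier with @notify_on_failure.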

• Practiced deploying sample AWS infrastructure with CloudFormation templates to provision S3, Lambda, and SNS, building foundational infrastructure-as-code skills.

Data Engineer Intern, Jun 2021 – Dec 2021
Ceyline Shipping Services Pvt. Ltd., Navi Mumbai, India (Remote)

• Developed an ETL pipeline using Python and Apache Airflow to process 10K+ shipment records per day.
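
A compact sketch of a daily Airflow DAG along these lines; the DAG id, task names, and the extract/transform/load callables are illustrative placeholders.

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(**_):
    ...   # pull the day's shipment files (placeholder)

def transform(**_):
    ...   # clean and standardize records (placeholder)

def load(**_):
    ...   # write curated output for reporting (placeholder)

with DAG(
    dag_id="shipment_etl",
    start_date=datetime(2021, 6, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)
    t_extract >> t_transform >> t_load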

• Optimized SQL queries used in logistics dashboards, reducing average execution time by 25%.

• Automated S3 ingestion workflows using Boto3, ensuring reliable delivery of 50+ daily log files with zero data loss.

• Applied data transformations (type casting, standardization, enrichment) and quality checks (nulls, duplicates, value ranges), exporting clean data to Parquet format for reporting and analytics.
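
A small pandas sketch of the transformation and quality-check pass described in this bullet; column names and the weight range are assumed for illustration.

import pandas as pd

def clean_shipments(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["weight_kg"] = pd.to_numeric(df["weight_kg"], errors="coerce")   # type casting
    df["origin"] = df["origin"].str.strip().str.upper()                 # standardization
    df = df.dropna(subset=["shipment_id", "weight_kg"])                 # null check
    df = df.drop_duplicates(subset=["shipment_id"])                     # duplicate check
    df = df[df["weight_kg"].between(0, 50_000)]                         # value-range check
    return df

clean = clean_shipments(pd.read_csv("shipments.csv"))
clean.to_parquet("shipments_clean.parquet", index=False)                # export for reporting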

PROJECTS

Azure Data Engineering Pipeline – Adventure Sales Data
Technologies: Azure Data Factory, Databricks, Synapse, PySpark, Power BI, Delta Lake

• Designed a scalable ETL pipeline using Azure Data Factory to ingest data from CSV and GitHub into Azure Data Lake Gen2, following the Medallion Architecture (Bronze, Silver, Gold).

• Transformed and optimized data in Azure Databricks (PySpark), leveraging Delta Lake for ACID compliance and schema evolution.
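
A minimal sketch of a Bronze-to-Silver step of this kind; spark is the session Databricks provides, and the storage paths and column names are placeholders.

from pyspark.sql import functions as F

# Read raw Bronze CSVs, apply light cleansing, and append to a Silver Delta table.
bronze = (
    spark.read.option("header", True)
    .csv("abfss://bronze@examplelake.dfs.core.windows.net/sales/")
)

silver = (
    bronze
    .withColumn("OrderDate", F.to_date("OrderDate"))
    .withColumn("TotalAmount", F.col("Quantity") * F.col("UnitPrice"))
    .dropDuplicates(["SalesOrderNumber"])
)

(
    silver.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")   # lets Delta Lake evolve the schema when new columns appear
    .save("abfss://silver@examplelake.dfs.core.windows.net/sales/")
)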

• Built external tables and views in Azure Synapse Analytics, enabling efficient querying and structured analysis.

• Integrated Power BI with Synapse Analytics, creating real-time dashboards for business intelligence.

• Optimized data storage and performance, reducing query execution time by 50% and improving scalability and governance.

Stock Market Real-Time Pipeline

Technologies: Kafka, AWS S3, Glue, Athena, EC2, Python, SQL

• Developed an end-to-end real-time data pipeline to process and analyze stock market data using Apache Kafka and AWS services.

• Implemented Kafka Producers & Consumers to stream real-time stock market data, ensuring low-latency data ingestion.
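
A stripped-down producer/consumer pair in the spirit of this bullet, using kafka-python; the broker address, topic name, and ticker payload are illustrative.

import json
from kafka import KafkaConsumer, KafkaProducer

BROKER = "broker.example.com:9092"   # placeholder for the EC2-hosted broker
TOPIC = "stock_ticks"                # placeholder topic

# Producer: publish one quote per call.
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"symbol": "AAPL", "price": 189.12})
producer.flush()

# Consumer: read quotes and hand them off (an S3 upload in the real pipeline).
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)
for message in consumer:
    print(message.value)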

• Stored processed data in AWS S3 and leveraged AWS Glue Crawlers & Glue Catalog for automated schema inference and metadata management.

• Integrated AWS Athena for efficient querying and analysis of real-time stock data using serverless SQL.

• Deployed and managed Kafka on AWS EC2, ensuring high availability and scalable data streaming.

• Optimized data processing and storage, reducing query execution time by 50% and enabling real-time analytics for financial insights.

Netflix Data Analytics Pipeline

Technologies: dbt, Snowflake, Amazon S3, SQL, Looker Studio, Power BI, Git

• Built an end-to-end ELT pipeline to analyze Netflix movie metadata and user ratings using dbt, Snowflake, and Amazon S3.

• Ingested raw CSV datasets into Amazon S3 and loaded them into Snowflake using COPY INTO for scalable data warehousing.
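
The load step sketched with the Snowflake Python connector; the account details, external stage, and target table are placeholders, and credentials would normally come from a secrets store.

import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",
    user="example_user",
    password="example_password",
    warehouse="COMPUTE_WH",
    database="NETFLIX",
    schema="RAW",
)

copy_sql = """
COPY INTO RAW.MOVIES
FROM @NETFLIX_S3_STAGE/movies/        -- external stage pointing at the S3 bucket
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"')
"""

with conn.cursor() as cur:
    cur.execute(copy_sql)
conn.close()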

• Developed layered dbt models (raw, staging, dim, fact, mart) using Jinja and YAML for modular, reusable transformations.

• Created analytical tables including movie_analysis, genre_rating_distribution, user_engagement_summary, and tag_relevance_analysis for BI and dashboarding.

• Applied 14+ dbt tests to validate data quality, covering null checks, uniqueness, referential integrity, and tag relevance logic.

• Connected final models to Looker Studio and Power BI to create dashboards visualizing top-rated movies, genre trends, and user behavior.

• Used GitHub for version control and dbt CLI for orchestration (dbt run, dbt test, dbt compile) to enable CI-ready, reproducible workflows.
