Sampath Kumar Kolichalam
Salt Lake City, UT | 330-***-**** | *******.******@*****.*** | LinkedIn | GitHub
SUMMARY
Data Engineer with 3+ years of experience designing scalable, production-ready data pipelines using PySpark, Apache Airflow, and Hadoop. Expertise in ETL transformations, CDC implementations, and streaming pipelines that transform raw data into analytics-ready formats. Proven track record of improving data quality and system performance while integrating robust security measures and containerized deployments.
EDUCATION
Kent State University, Kent, Ohio Aug 2023 - Dec 2024
Master of Science, Computer Science (GPA: 3.8/4.0)
Lingaya's Vidyapeeth, Delhi, India Aug 2019 - Jun 2023
Bachelor of Technology, Computer Science & Engineering (GPA: 9.2/10)
WORK EXPERIENCE
JOBSNPROFILES LLC DATA ENGINEER May 2024 - Present
• Engineered a batch resume-parsing pipeline on a 40-node Hadoop cluster to process 30K+ resumes/day, improving throughput by 25%.
• Constructed a Kafka-based real-time resume parser on AKS to handle 20K+ resume events per session.
• Developed REST APIs using Spring Boot Microservices and SQL Server, reducing authentication API latency by 35%.
• Instituted robust security measures, including RBAC, JWT/OAuth authentication, encryption, and input validation, to harden REST APIs.
• Optimized Apache Airflow and PySpark jobs for ETL tasks, reducing task failures by 40% and improving scheduling reliability for data processing workflows.
• Automated resume validation and deduplication via database agents, improving data quality and match accuracy by 30%.
• Delivered actionable insights using Power BI and Azure Monitor, increasing pipeline observability and system uptime by 45%.
NITYA SOFTWARE SOLUTIONS DATA ENGINEER Jan 2022 - Jul 2023
• Built ETL pipelines using Azure Data Factory and SQL Server to automate billing and payroll processes across departments.
• Reduced operational manual effort by 45% through the standardization of data templates and automation of transformation logic.
• Enhanced invoice data quality by 35% using PySpark-based cleansing and validation scripts on Azure Databricks.
• Automated daily ingestion jobs with Azure Functions, boosting data freshness and report accuracy by 40%.
• Optimized batch data pipelines with PySpark and SQL, improving dashboard load speed by 30%.
• Created interactive Power BI dashboards in collaboration with finance teams, reducing manual reporting time by 50%.
• Collaborated with analysts and developers in Agile sprints, accelerating feature delivery by 40% using Azure DevOps.
TECHNICAL SKILLS
• Programming & Scripting: Java, Python, Pandas, SQL, Linux, Bash, Shell Scripting, Scala
• Databases & Storage: MySQL, PostgreSQL, MongoDB, Spark SQL, Apache Solr, Delta Lake
• Big Data Engineering: Hadoop, PySpark, Airflow, Kafka, Databricks, Snowflake, ETL/ELT Workflows, Change Data Capture, Apache Hudi, Apache Griffin
• DevOps & Cloud: Microsoft Azure (Blob Storage, Data Factory, Functions, SQL Server, ADLS Gen2, Synapse, Fabric, AKS), AWS, Docker, Kubernetes, Git, GitHub Actions, CI/CD, AWS Deequ
• Monitoring, Analytics & Visualization: Azure Monitor, Azure Log Analytics, Microsoft Power BI
• Applied AI & LLMs: Prompt Engineering, Embeddings, Vector Databases (Qdrant), Semantic Search, Multi-Agent Systems, LLMs
• Tools & APIs: REST APIs, Spring Boot, Microservices, RBAC, JWT, Pytest, dbt, Agile, SDLC, JIRA, Thunder Client
CERTIFICATIONS
• Microsoft Azure AI Professional Certification: Microsoft Learning (Cloud Computing, Azure Services, Azure AI).
• MySQL Developer Certification: Simplilearn (Relational Databases, Data Modeling, Joins, Query Optimization).
• Spring Boot & Microservices Developer: Udemy (Spring Boot, REST APIs, JWT, JUnit, Thunder Client).
• Docker Foundations Professional Certification: Docker, Inc. (Docker CLI, Docker Compose, Containerization).