HEMANTH TANDRA
Data Engineer
Location: St. Louis, MO Mobile: 512-***-**** Email: ***************@*****.*** LinkedIn
SUMMARY:
Data Engineer with 4 years of experience building and optimizing data pipelines using R, Python, and SQL in Agile and SDLC environments. Skilled in Big Data tools (Hadoop, Spark) and cloud platforms (AWS, Azure) to enable scalable data solutions. Experienced with machine learning models and data visualization in Tableau and Power BI for actionable insights. Proficient in ETL processes (SSIS) and databases (MongoDB, MySQL), with a strong foundation in data transformation and reporting.
SKILLS:
Methodology: SDLC, Agile, Waterfall
Programming Language: R, Python, SQL
IDEs: PyCharm, Jupyter Notebook, Visual Studio Code
Big Data Ecosystem: Hadoop, MapReduce, Hive, Apache Spark, Pig
Machine Learning Theory: Linear Regression, Logistic Regression, Decision Trees, KNN, Clustering, XGBoost
Cloud Technologies / ETL Tools: SSIS, AWS, Azure, GCP
Packages: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, Seaborn, TensorFlow
Reporting Tools: Tableau, Power BI, SSRS
Database: MongoDB, MySQL
Data Science Skills: Data Preprocessing, Data Transformation, Data Modelling, Data Analysis, Data Visualization, NLP
Other Tools: Git, MS Office, Anaconda, Jira, Jenkins
CERTIFICATIONS:
Digital Cloud Leader
Associate Cloud Engineer
AZ-900 Fundamentals
EDUCATION:
Master of Science in Computer Science Jan 2023 – May 2024
Southeast Missouri State University, Saint Louis, MO
Bachelor in Computer Science May 2017 – May 2021
Nalla Malla Reddy Engineering College, Hyderabad, Telangana, India
EXPERIENCE:
ServiceNow, MO Nov 2023 – Current
Data Engineer
Led Agile (Scrum) SDLC projects in Jira, coordinating cross-functional teams and applying AWS Glue to improve data pipeline reliability, which reduced data downtime by 15%.
Built end-to-end ETL workflows with SSIS and AWS Lambda to automate data extraction, transformation, and loading, cutting manual effort by 40%.
Boosted data processing speed by 30% by streamlining Python and SQL pipelines through innovative ETL methods.
Enhanced scalability and reliability by implementing Docker and Kubernetes for pipeline containerization.
Built and maintained a data lake on Amazon S3, paired with Redshift for scalable analytics, enabling cross-platform data sharing and reducing query latency by 30%.
Formulated data warehousing solutions using AWS Athena, boosting analytics query performance by 25%.
Expedited the processing of large datasets using Hadoop, MapReduce, and Apache Spark via AWS EMR, accelerating data analysis cycles by 25%.
Crafted dynamic data visualizations using Power BI and Tableau, delivering actionable insights that enhanced decision-making efficiency by 20%.
Implemented unit tests and participated in code reviews to ensure code quality and reliability.
Utilized CI/CD practices to automate deployment and integration processes, ensuring timely updates and enhancements to data systems.
Infosys, Hyderabad, India March 2021 – Nov 2022
Associate Systems Engineer
Spearheaded the design and management of both batch and streaming data ingestion pipelines using Google Dataflow and Google Pub/Sub, integrating data seamlessly into GCP to fortify data reliability.
Architected and secured data storage solutions in Google Cloud Storage Buckets, enabling efficient retrieval through SQL-based metadata queries and ensuring data integrity.
Optimized GCP BigQuery analytics workflows, reducing costs by 20% and enhancing query performance.
Monitored and fine-tuned system performance with Google Stackdriver, maintaining 99.9% uptime.
Harmonized data pipelines by leveraging metadata from Google Firestore, creating a more adaptable and scalable architecture.
Automated deployment pipelines using CI/CD workflows in GCP, enhancing release reliability with robust rollback strategies.
Collaborated on root cause analysis for production incidents, using Python and SQL to reduce incident recurrence by 15%.
Infinite Infolab, India Dec 2019 – March 2021
Data Engineer
Orchestrated the development and management of complex ETL pipelines using Azure Data Factory, SSIS, and Python, slashing data ingestion times by 30%.
Established a scalable data lake on Azure Data Lake Storage (ADLS), managing over 50TB of data and optimizing storage costs through strategic data management.
Optimized query performance with Azure Synapse and SQL Database, and developed predictive analytics models in Azure Databricks, reducing training time and improving model accuracy.
Investigated and resolved performance issues with Azure Metrics Advisor and SQL-based profiling, improving pipeline efficiency by 15%.
Processed and refined unstructured data using Python NLP libraries, improving data readiness and reducing redundancy by 30%.
Visualized data dynamically using Power BI and Tableau, providing insights that bolstered operational decision-making by 20%.
Administered containerization of pipeline components using Docker and Kubernetes, ensuring robust and scalable deployments across Azure environments.