Thirumala Velpoor
*************@*****.*** 913-***-**** www.linkedin.com/in/thirumala-v
PROFESSIONAL SUMMARY
Experienced Data Engineer with about 5 years of expertise in building and streamlining data pipelines and architectures for large-scale data processing. Proficient in managing complex datasets, enhancing data quality, and applying advanced analytics to deliver actionable insights. Hands-on with modern cloud platforms and big data tools, and committed to delivering efficient, reliable data solutions in Agile environments.
SKILLS
Programming Languages: Python, SQL, Scala
Big Data Technologies: Spark, Databricks, Cloudera, Hive, Impala, HBase, Kafka, Oozie, Airflow
Cloud Platforms: Azure, AWS, GCP
Databases: SQL Server, Oracle, MySQL, Teradata
ETL / Data Engineering: Databricks, Azure Data Factory, Kafka, Sqoop, Flume, Oozie, Pig
Data Visualization / BI Tools: Power BI, Tableau, Data Studio
Data Formats: JSON, Parquet, AVRO, XML, CSV
DevOps / CI-CD: Jenkins, Git, Bitbucket, Terraform, Ansible, Docker, Kubernetes
Web Technologies: JDBC, JSP, Servlets, Struts (Tomcat, JBoss)
Database Design / Data Warehousing: Database Design, Data Warehouse Design
IDE / Notebooks: Eclipse, IntelliJ, PyCharm, Jupyter, Databricks Notebooks
Methodologies: Agile, Scrum, Iterative Development, Waterfall
Operating Systems: Windows, Linux, Unix
Others: Machine Learning & Analytics, Data Modeling, Data Quality & Governance
PROFESSIONAL EXPERIENCE
Client: Walmart, Dallas, Texas, USA May 2024 - Present
Role: Azure Data Engineer
Responsibilities:
• Worked on gathering security (equities, options, derivatives) data from different exchange feeds and storing historical data.
• Designed and executed Jenkins pipelines for automated unit testing, integration testing, and static code analysis.
• Utilized T-SQL for MS SQL Server and ANSI SQL to work across diverse database systems.
• Built responsive user interfaces with React for dynamic web applications, ensuring seamless user experiences.
• Worked with WebSocket and Cloud Native technologies (WCNP) to create real-time data streaming applications.
• Collaborated with cross-functional teams to integrate React components with backend services, enhancing application performance and usability.
• Created data pipelines, ensured data integrity, and generated reports to track data flow through each stage.
• Stored application configurations in a NoSQL database, MongoDB, and manipulated them using PyMongo.
• Used Azure cluster services and Azure Data Factory V2 to ingest a large volume and variety of data from diverse source systems into Azure Data Lake Gen2.
• Used Kafka capabilities such as distribution, partitioning, and the replicated commit log service to maintain messaging feeds. Involved in loading data from REST endpoints to Kafka.
• Used a Continuous Delivery pipeline to deploy microservices, including provisioning Azure environments, and developed modules using Python and shell scripting. Involved in the entire project lifecycle, including design, development, deployment, testing, implementation, and support.
• Led requirement gathering, business analysis, and technical design for Hadoop and Big Data projects.
• Spearheaded HBase setup and utilized Spark and Spark SQL to develop faster data pipelines, resulting in a 60% reduction in processing time and improved data accuracy.
• Built Jenkins jobs for CI/CD infrastructure for GitHub repositories.
• Worked on migration of data from on-premises SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).
• Developed tools using Python, Shell scripting, and XML to automate tasks.
• Used Pig as an ETL tool to perform transformations with joins and pre-aggregations before storing the data in HDFS, and assisted the manager by providing automation strategies, Selenium/Cucumber automation, and JIRA reports.
• Ensured data quality and accuracy with custom SQL and Hive scripts and created data visualizations using Python and Tableau for improved insights and decision-making.
• Set up the base Python project structure with the create-python-app package, SSRS, and PySpark.
• Responsible for loading data from the BDW Oracle database and Teradata into HDFS using Sqoop.
• Implemented AJAX, JSON, and JavaScript to create interactive web screens.
• Used Power BI as a front-end BI tool to design and develop dashboards, workbooks, and complex aggregate calculations. Developed monitoring and notification tools using Python.
• Developed a Front-End GUI as a stand-alone Python application.
• Experienced in tuning the performance of Spark applications through proper batch interval time, parallelism level, and memory tuning. Explored PySpark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and Pair RDDs.
Client: Citi Bank, Hyderabad, India Jan 2022 - Jul 2023
Role: Application Developer / Data Engineer
Responsibilities:
• Analyzed functional requirement documents from the business.
• Conducted performance tuning and optimization of Kubernetes and Docker deployments to improve overall system performance.
• Worked on SQL and PL/SQL for backend data transactions and validations.
• Performed load testing and optimization to ensure the pipeline's scalability and efficiency in handling large volumes of data.
• Analyzed existing systems and proposed process and system improvements, including adopting modern scheduling tools like Airflow and migrating legacy systems into an enterprise data lake built on Azure Cloud.
• Responsible for building and testing applications. Experienced in handling database issues and connections with SQL and NoSQL databases such as MongoDB by installing and configuring various Python packages (Teradata, MySQL connector, PyMongo, and SQLAlchemy).
• Involved in monitoring and scheduling pipelines using triggers in Azure Data Factory.
• Configured Spark Streaming to consume ongoing data from Kafka and store the stream in DBFS.
• Deployed models as a Python package, as an API for backend integration, and as services in a microservices architecture with a Kubernetes orchestration layer for the Docker containers.
• Used Python to write data into JSON files for testing Django websites, and created scripts for data modeling and data import/export.
• Consulted leadership and stakeholders to share design recommendations, identify product and technical requirements, resolve technical problems, and suggest Big Data based analytical solutions.
• Managed relational database services in which Azure SQL handles reliability, scaling, and maintenance. Integrated data storage solutions.
• Instantiated, created, and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications. Worked with automation tools such as Git, Terraform, and Ansible.
Client: Colryut Group, Hyderabad, India May 2020 - Dec 2021
Role: Data Engineer
Responsibilities:
• Developed methods for Create, Read, Update, and Delete (CRUD) operations in Active Record.
• Designed and implemented a fault-tolerant data processing framework leveraging Kubernetes and Docker, reducing downtime and increasing system reliability by 25%.
• Well-versed with various aspects of ETL processes used in loading and updating Oracle data warehouse.
• Presented the project to faculty and industry experts, showcasing the pipeline's effectiveness in providing real-time insights for marketing and brand management. Wrote queries in MySQL and native SQL.
• Built and deployed code artifacts into the respective environments in the Confidential Azure cloud.
• Used Azure Data Factory to ingest data from log files and custom business applications, processed the data on Databricks per day-to-day requirements, and loaded it into Azure Data Lake.
• Developed business logic using Kafka and Spark Streaming and implemented business transformations. Supported continuous storage in ADLS, configured snapshots, and wrote entities in Spark along with named queries to interact with the database. Experienced in creating Kubernetes replication controllers, clusters, and label services to deploy microservices in Docker.
• Involved in various phases of the Software Development Lifecycle (SDLC) of the application, including requirements gathering, design, development, deployment, and analysis.
• Worked on Big Data integration and analytics based on Hadoop, SOLR, PySpark, Kafka, Storm, and webMethods.
• Analyzed existing SQL scripts and redesigned them using Spark SQL for faster performance. Designed Git branching strategies and merging per release-frequency needs by implementing the Gitflow workflow on Bitbucket.
• Collected and aggregated large amounts of web log data from different sources such as web servers, mobile devices, and network devices using Apache Flume, and stored the data in HDFS for analysis.
EDUCATION
Master's in Computer and Information Systems
University of Central Missouri, Warrensburg, Missouri
CERTIFICATIONS
AWS Certified Data Engineer
Google Cloud Data Engineer Professional Certificate