Yashwanth Reddy Billa
Email: *********.****@*****.***
PH: 720-***-****
Sr Data Engineer
Professional Summary:
Experienced STIBO STEP Developer with 9+ years in Data Engineering, Master Data Management (MDM), and Application Development. Strong expertise in STIBO STEP implementation, customization, data modeling, and integration with enterprise systems. Adept at designing, developing, and optimizing MDM solutions while ensuring data quality, accuracy, and governance. Hands-on experience in Azure Data Factory, EventHub, JSON exports, SQL, and performance tuning. Proficient in Java, JavaScript, Python, XML, and SQL for configuring business rules and automating data workflows.
Proficient in Java, Spring Boot, JavaScript, XML, SQL, and integration tools like SFTP/WINSCP. Strong background in MDM concepts, system enhancements, production support, and Agile methodologies. Adept at troubleshooting, error handling, and improving data governance policies within enterprise environments.
Over 5 years of hands-on expertise with the Stibo Master Data Management (MDM) system, including implementation and configuration, and more than 5 years in Master Data Management overall, leading data integration, governance, and quality initiatives. 8+ years of experience with SQL, used extensively for querying and managing relational databases, along with 6+ years of Python for scripting, automation, and data processing, and 6 years of Java.
7 years of experience in data integration and transformation, developing and managing data pipelines for both on-premises and cloud environments; 6 years in data migration, ensuring seamless transitions between systems; and 5 years in data governance and data quality, enforcing policies and frameworks to maintain data integrity across platforms.
Hands-on experience building the infrastructure for efficient data extraction, transformation, and loading from a range of SQL and NoSQL data sources using AWS and big data technologies (DynamoDB, Kinesis, S3, Hive/Spark).
Strong working knowledge of Amazon EC2 as a complete solution for computing, query processing, and storage across a wide range of applications.
Expertise in configuring and coding within the STEP Platform, including business rules in JavaScript and Python.
Spearheaded the implementation and customization of the Stibo Systems STEP Platform, developing and optimizing master data workflows to improve data governance and streamline operations.
Defined and documented MDM governance policies in collaboration with the Data Governance team, ensuring compliance with industry standards and reducing data-related risks.
Experienced in configuring Spark Streaming to receive real-time data from Apache Kafka and persist it to HDFS, with expertise in using Spark SQL against data sources such as JSON, Parquet, and Hive.
Collaborated cross-functionally with Data Governance and Business Intelligence teams to ensure data accuracy, compliance, and governance across the organization.
Led the configuration and coding of business rules within the STEP platform using JavaScript and Python, ensuring seamless integration and operation across data processes.
Extensively used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data and to run required validations on the data.
Expertise in developing production-ready Spark applications utilizing the Spark Core, DataFrame, Spark SQL, Spark ML, and Spark Streaming APIs.
Expert in using Azure Databricks to process large volumes of data and uncover insights that support business goals.
Developed Python scripts to do file validations in Databricks and automated the process using Azure Data Factory.
Experienced in integrating data from diverse sources, including loading nested JSON-formatted data into Snowflake tables, using the AWS S3 bucket and the Snowflake cloud data warehouse.
Configured Snowpipe to pull data from S3 buckets into Snowflake tables and stored incoming data in the Snowflake staging area.
Expertise in developing Python scripts to build ETL pipelines and Directed Acyclic Graph (DAG) workflows using Airflow.
Participated in the evaluation and selection of new MDM tools and platforms, providing key insights that guided technology investment decisions for the organization.
Orchestration experience scheduling Apache Airflow DAGs to run multiple Hive and Spark jobs, which run independently based on time and data availability (see the illustrative DAG sketch at the end of this summary).
Practical understanding of dimensional and relational data modeling concepts such as star-schema modeling, fact tables, and dimension tables.
Strong experience troubleshooting failures in Spark applications and fine-tuning Spark jobs and Hive queries for better performance.
Strong experience writing complex MapReduce jobs, including the development of custom InputFormats and custom RecordReaders.
Good exposure to NoSQL databases, including column-oriented Cassandra and document-based MongoDB.
Experience in analyzing data using Python, R, SQL, Microsoft Excel, Hive, PySpark, and Spark SQL for Data Mining, Data Cleansing, Data Munging, and Machine Learning.
Expertise in developing reports and dashboards using Power BI and Tableau.
Excellent communication, interpersonal, and problem-solving skills; a team player with the ability to quickly adapt to new environments and technologies.
Technical Skills:
Data Engineering: BigQuery, Dataflow, Dataproc, DBT, Airflow (Cloud Composer), Spark, Hadoop, Hive, MapReduce
Programming: Python (PySpark, Pandas), SQL, YAML, Java
Data Tools: Dataprep, Cloud Storage, Data Studio
Orchestration & Automation: Airflow DAGs, Apache Oozie, CI/CD (DevSecOps)
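The following is a minimal, illustrative Airflow sketch of the orchestration pattern referenced in the summary above, scheduling a Hive extract followed by a Spark job; the DAG id, schedule, and script paths are hypothetical placeholders, not project code.

# Illustrative Airflow DAG; DAG id, schedule, and script paths are hypothetical placeholders.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {"owner": "data-engineering", "retries": 2, "retry_delay": timedelta(minutes=10)}

with DAG(
    dag_id="daily_catalog_refresh",        # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 2 * * *",         # run daily at 02:00
    catchup=False,
    default_args=default_args,
) as dag:
    # Stage data with a Hive script, then transform it with a Spark job.
    run_hive_extract = BashOperator(
        task_id="run_hive_extract",
        bash_command="hive -f /opt/etl/extract_products.hql",        # hypothetical path
    )
    run_spark_transform = BashOperator(
        task_id="run_spark_transform",
        bash_command="spark-submit /opt/etl/transform_products.py",  # hypothetical path
    )
    run_hive_extract >> run_spark_transform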
Professional Experience
Sr. STIBO STEP Developer
Lowes, Mooresville, NC April 2023 to Present
Responsibilities:
Configured and customized STIBO STEP workflows to streamline master data processes and improve governance.
Developed business rules and validations within STEP to ensure high data accuracy and consistency.
Integrated STIBO STEP with various ERP and CRM systems to enhance data visibility across the organization.
Provided extensive production support by troubleshooting STIBO STEP-related issues, minimizing downtime and improving business continuity.
Monitored and resolved interface issues related to SFTP/WINSCP processes, ensuring seamless data transfers.
Led cross-functional collaboration to enhance the implementation of MDM best practices within the STEP environment.
Tuned Spark applications (batch interval time, level of parallelism, and memory settings) to improve processing time and efficiency (see the illustrative tuning sketch at the end of this section).
Implemented Azure Data Factory and EventHub integrations to support ETL processes for subscribing systems.
Created data models aligned with Auto Care standards to enhance data visibility and governance.
Developed business rules in JavaScript to automate data validation and transformations.
Improved data quality and accuracy by implementing automated monitoring metrics.
Configured SQL-based exports ensuring correct formatting for Salesforce and DB2 integration.
Used Azure DevOps and VSTS (Visual Studio Team Services) for CI/CD, Active Directory for authentication, and Apache Ranger for authorization.
Used Scala for its strong concurrency support, which plays a key role in parallelizing processing of large data sets.
Created branching strategies while collaborating with peer groups and other teams on shared repositories.
Developed various interactive reports using Power BI based on Client specifications with row-level security features.
Environment: Azure (Data Lake, HDInsight, SQL, MDM, Data Factory), STEP, Databricks, Cosmos DB, Git, Blob Storage, Power BI, Scala, Hadoop, Spark, PySpark, Airflow, Google Cloud Platform, BigQuery, DBT, Dataflow, Dataproc, Python (Pandas), YAML, Tableau.
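Below is a minimal, illustrative PySpark sketch of the kind of Spark tuning described in this section (shuffle parallelism, executor memory, adaptive execution); the application name, configuration values, and storage path are hypothetical assumptions, not actual project settings.

# Illustrative Spark tuning sketch; config values and paths are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("catalog_batch_validation")             # hypothetical application name
    .config("spark.sql.shuffle.partitions", "400")   # raise parallelism for large shuffles
    .config("spark.executor.memory", "8g")           # size executor memory for the workload
    .config("spark.sql.adaptive.enabled", "true")    # let Spark handle skewed partitions
    .getOrCreate()
)

# Repartition on the join key so downstream aggregations are evenly distributed.
df = spark.read.parquet("abfss://raw@storageaccount.dfs.core.windows.net/products/")  # hypothetical path
df = df.repartition(400, "product_id")

# Simple data quality check: count rows missing a required attribute.
missing_ids = df.filter(df["product_id"].isNull()).count()
print(f"rows missing product_id: {missing_ids}")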
Data Engineer-Master Data Management (MDM)
Chewy, Dania Beach, FL September 2021 to March 2023
Responsibilities:
Designed and implemented XML-based transformations for STEP workflows, enhancing automation and data consistency.
Developed complex JavaScript-based business rules, automating data validation and reducing manual efforts.
Led production support efforts, addressing JIRA tickets efficiently to maintain SLA adherence and minimize disruptions.
Implemented Azure Data Factory and EventHub integrations to support ETL processes for subscribing systems.
Created data models aligned with Auto Care standards to enhance data visibility and governance.
Developed business rules in JavaScript to automate data validation and transformations.
Improved data quality and accuracy by implementing automated monitoring metrics.
Configured SQL-based exports ensuring correct formatting for Salesforce and DB2 integration.
Created and maintained comprehensive documentation for incident and problem management, improving resolution tracking and knowledge sharing.
Conducted root cause analysis for recurring issues and implemented long-term solutions to enhance system stability.
Enhanced STEP configuration to support data model changes and improve integration with external systems.
Used JavaScript to automate error handling and logging mechanisms in the MDM process, ensuring that exceptions were captured and addressed promptly, improving overall system reliability.
Scheduled Spark/Scala jobs using Oozie workflows in the Hadoop cluster and generated detailed design documentation for source-to-target transformations.
Designed reports and interactive dashboards in Tableau based on business requirements.
Environment: AWS EMR, S3, EC2, MDM, Lambda, Apache Spark, STEP, Spark-Streaming, Spark SQL, Python (Pandas, PySpark), YAML, Scala, Shell scripting, Snowflake, AWS Glue, Oracle, Git, Tableau, BigQuery.
Big Data Engineer
Early Warning, Scottsdale, AZ February 2019 to August 2021
Responsibilities:
Designed and implemented robust MDM integration strategies within the STIBO STEP ecosystem, improving data synchronization across systems.
Developed complex SQL queries to extract, analyze, and optimize MDM datasets for reporting and analytics.
Implemented Azure Data Factory and EventHub integrations to support ETL processes for subscribing systems.
Created data models aligned with Auto Care standards to enhance data visibility and governance.
Developed business rules in JavaScript to automate data validation and transformations.
Improved data quality and accuracy by implementing automated monitoring metrics.
Configured SQL-based exports ensuring correct formatting for Salesforce and DB2 integration.
Configured and fine-tuned error handling and logging mechanisms in Java-based applications, ensuring proactive issue detection and resolution.
Partnered with business teams to enhance master data governance policies, aligning them with organizational compliance standards.
Scheduled and managed batch jobs using Apache Airflow and Oozie workflows, ensuring seamless execution of data operations.
Led initiatives to automate data quality checks, improving overall data accuracy and reliability.
Developed customized scripts to enhance data import/export capabilities within the STIBO STEP platform.
Developed and maintained a library of custom Airflow DAG templates and operators, which improved consistency and code quality across the team.
Led a team of three data engineers in designing and implementing a complex data ingestion and processing pipeline for a new data source, which reduced time to insights by 50%.
Analyzed the data in HDFS using Apache Hive, data warehouse software that facilitates querying and managing large datasets.
Converted Hive queries into PySpark transformations using PySpark RDDs and the DataFrame API (see the illustrative conversion sketch after this section).
Monitored the data pipeline and applications using Grafana.
Configured Zookeeper to support distributed applications.
Used functional programming concepts and the collection framework of Scala to store and process complex data.
Used GitHub as a version control system for managing code changes.
Developed visualizations and dashboards using Tableau for reporting and business intelligence purposes.
Environment: S3 buckets, Redshift, Apache Flume, MDM, PySpark, Oozie, Tableau, Scala, Spark RDDs, Hive, HiveQL, HDFS, Zookeeper, Grafana, MapReduce, Sqoop, GitHub.
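An illustrative example of the Hive-to-PySpark conversion described above, rewriting a HiveQL aggregation with the DataFrame API; the table and column names are hypothetical, not actual project objects.

# Illustrative Hive-to-PySpark conversion; table and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hive_to_pyspark").enableHiveSupport().getOrCreate()

# Original HiveQL:
#   SELECT account_id, SUM(amount) AS total_amount
#   FROM transactions
#   WHERE txn_date >= '2021-01-01'
#   GROUP BY account_id;

# Equivalent DataFrame API transformation:
transactions = spark.table("transactions")  # hypothetical Hive table
totals = (
    transactions
    .filter(F.col("txn_date") >= "2021-01-01")
    .groupBy("account_id")
    .agg(F.sum("amount").alias("total_amount"))
)
totals.show()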
ETL Developer
Genpact, Hyderabad, India July 2016 to November 2018
Responsibilities:
Developed and optimized ETL processes for MDM data transformations, ensuring accurate and efficient data processing.
Utilized Java and Python scripts to validate, cleanse, and migrate large-scale datasets, reducing processing errors.
Implemented automation solutions using SFTP/WINSCP and Git, streamlining deployment and version control workflows.
Participated in Agile development cycles, collaborating with cross-functional teams to deliver features on time.
Conducted performance tuning for ETL jobs, enhancing execution efficiency and reducing processing time.
Provided support for production issues, ensuring rapid resolution and minimal business impact.
Developed and managed STIBO STEP connectors to enable seamless data exchanges between multiple applications.
Worked with the data modeler in developing star schemas.
Involved in analyzing the availability of the source feed in the existing CSDR database.
Handled a high volume of day-to-day Informatica workflow migrations.
Reviewed Informatica ETL design documents and worked closely with development to ensure correct standards were followed.
Wrote SQL queries against the repository DB to find deviations from the company's ETL standards for user-created objects such as sources, targets, transformations, log files, mappings, sessions, and workflows.
Leveraged existing PL/SQL scripts for daily ETL operations.
Ensured that all support requests were properly approved, documented, and communicated using the QMC tool; documented common issues and resolution procedures.
Extensively involved in enhancing and managing UNIX shell scripts.
Converted business requirements into technical design documents.
Documented the macro logic and worked closely with the Business Analyst to prepare the BRD.
Involved in setting up SFTP with the internal bank management team.
Built UNIX scripts to clean up source files.
Involved in loading all sample source data using SQL*Loader and scripts.
Data Analyst- Python
Karvy Data Management Private Limited, India October 2014 to June 2016
Responsibilities:
Experience working on projects with machine learning, big data, data visualization, R and Python development, UNIX, and SQL.
Performed exploratory data analysis using NumPy, matplotlib, and pandas.
Expertise in quantitative analysis, data mining, and the presentation of data to see beyond the numbers and understand trends and insights.
Experience analyzing data with the help of Python libraries including Pandas, NumPy, SciPy, and Matplotlib.
Created complex SQL queries and scripts to extract and aggregate data and validate its accuracy; gathered business requirements and translated them into clear, concise specifications and queries.
Prepared high-level analysis reports with Excel and Tableau; provided feedback on data quality, including identification of billing patterns and outliers.
Worked on Tableau sorting and filtering features, including basic sorting, basic filters, quick filters, context filters, condition filters, top filters, and filter operations.
Identified and documented data quality limitations that jeopardized the work of internal and external data analysts; wrote standard SQL queries to perform data validation; created Excel summary reports (pivot tables and charts); and gathered analytical data to develop functional requirements using data modeling and ETL tools.
Read data from sources such as CSV files, Excel, HTML pages, and SQL databases, performed data analysis, and wrote results back to CSV, Excel, or database targets.
Experienced in using lambda functions with filter, map, and reduce on pandas DataFrames to perform various operations (see the illustrative sketch at the end of this section).
Used the pandas API for time-series analysis and created a regression test framework for new code.
Developed and handled business logic through backend Python code.
Environment: Python, SQL, UNIX, Linux, Oracle, NoSQL, PostgreSQL, and Python libraries such as PySpark and NumPy.
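A small, illustrative pandas sketch of the lambda-style filtering/mapping and time-series analysis mentioned above; the file name and column names are hypothetical assumptions, not actual project data.

# Illustrative pandas sketch; file name and column names are hypothetical.
import pandas as pd

# Load billing records and parse the date column.
df = pd.read_csv("billing_records.csv", parse_dates=["billed_on"])  # hypothetical file

# Lambda-style filter: keep rows above a billing threshold.
high_value = df[df["amount"].apply(lambda x: x > 1000)]
print(f"high-value rows: {len(high_value)}")

# Lambda-style map: derive a normalized customer code.
df["customer_code"] = df["customer_name"].map(lambda name: name.strip().upper())

# Time-series analysis: monthly billed totals via resampling on the date index.
monthly_totals = df.set_index("billed_on")["amount"].resample("M").sum()
monthly_totals.to_csv("monthly_billing_summary.csv")  # summary output for reporting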
Education:
Bachelor's - Jawaharlal Nehru Institute of Technology, Hyderabad, 2010-2014