Data Engineer

Location:
Frisco, TX, 75034
Posted:
July 16, 2025

Resume:

PRANAY REDDY YELTI

Data Engineer

Phone: +1-469-***-****

E-Mail: **************@*****.***

LinkedIn: www.linkedin.com/in/yeltipranay

PROFILE SUMMARY:

Analytical and process-oriented Data Engineer with 10 years of expertise in Data Integration and Data Analytics in the financial services industry, with a solid understanding of data warehousing technology using Informatica PowerCenter, along with Data Analysis, Data Mining, Data Visualization, Data Pipelines, and Business Objects as a reporting tool.

•Experienced with the entire Software Development Lifecycle (SDLC) of applications: gathering requirements, root cause analysis of issues, conceptual and detail design, documentation, development, verification, and testing.

•Expertise in handling data sources such as Oracle, DB2, MySQL, and SQL Server databases.

•Conducted training sessions for end-users and team members on ServiceNow functionalities, improving user adoption and satisfaction rates.

•Expertise in Data Architecture, Data Modeling, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export using ETL tools such as Informatica PowerCenter.

•Developed and maintained data pipelines using Google Cloud Dataflow and Apache Beam, enabling real-time data processing for analytics and reporting.

•Implemented ITIL best practices for incident, problem, and change management within data engineering projects, enhancing service delivery and minimizing downtime.

•Configured ServiceNow modules to align with organizational needs, including asset management and configuration management, to improve operational efficiency.

•Fostered effective communication between data engineering and IT operations teams, ensuring alignment on service level agreements (SLAs) and improving overall service quality.

•Led change management initiatives for data pipeline updates and infrastructure changes, utilizing ITIL methodologies to assess risk and ensure minimal impact on production environments.

•Designed and optimized data models in Google BigQuery, resulting in improved query performance and reduced costs by X% through effective data partitioning and clustering.

•Designed and implemented change management workflows in ServiceNow, ensuring that changes to data pipelines and infrastructure were thoroughly assessed and documented.

•Designed and deployed private cloud infrastructure using OpenStack, enabling scalable resource allocation for data processing workloads and enhancing operational efficiency.

•Experience designing ELT transformations and data models, with expertise in SQL and PL/SQL.

•Effective in working with business units to create business analyses, system requirements, and project plans; have worked extensively on developing ETL processes to support data extraction.

•Quickly adapts to new environments.

•Managed and optimized compute resources with OpenStack Nova, ensuring high availability and performance of data processing applications.

•Implemented scalable storage solutions using OpenStack Swift and Cinder, resulting in improved data accessibility and faster retrieval times for large datasets.

•Experience in creating PySpark scripts using IntelliJ IDE and executing them.

•Collaborated with IT and business teams to define requirements and ensure successful ServiceNow implementations that meet data engineering needs.

•Experience in developing web applications using Python, C++, XML, CSS, and HTML.

•Experience in software development, analysis, datacenter migration, and Azure Data Factory (ADF) V2; managing databases and Azure Data Platform services such as Azure Data Lake Storage (ADLS) and Data Factory (ADF); expertise in using Azure Data Factory for complex migration work, creating data pipelines and data flows, and using Databricks to perform actions.

•Created and managed a knowledge base for data engineering best practices, documentation, and troubleshooting guides, improving team efficiency and reducing repetitive issues.

•Implemented efficient ETL workflows with Google Cloud Storage, Cloud Functions, and BigQuery, automating data ingestion and transformation for multiple data sources.

•Established data security protocols using Google Cloud IAM and VPC Service Controls, ensuring compliance with industry standards and safeguarding sensitive data.

•Experience in creating real-time data streaming solutions using Apache Spark/Spark Streaming, Spark SQL and DataFrames, and Kafka.

•Event Hubs and Stream Analytics: Developed real-time data processing pipelines using Azure Event Hubs and Azure Stream Analytics, capturing streaming data and enabling low-latency processing for time-sensitive applications.

•Established and monitored key performance indicators (KPIs) for data engineering services in line with ITIL frameworks, ensuring adherence to defined service levels.

•Cost Management and Optimization: Monitored and optimized cloud spending with Azure Cost Management, identifying resource inefficiencies and implementing cost-saving measures for data storage and processing workloads.

•IoT Data Processing: Processed IoT data streams using Azure IoT Hub, integrating data with Azure Data Lake and Azure Synapse for advanced analytics and reporting.

•Provided training and guidance to team members on ITIL processes and principles, fostering a culture of continuous improvement and service excellence.

•Serverless and Scalable Solutions: Leveraged serverless architectures (e.g., Azure Functions) for scalable, cost-effective solutions that dynamically adjust to demand, reducing overhead in data-intensive workflows.

•Leveraged ITIL-aligned tools (e.g., ServiceNow, JIRA) for managing service requests and incidents, streamlining workflows and enhancing visibility into project status.

•Experience in integrating Flume and Kafka in moving the data from sources to sinks in real-time.

•Proficient at using Spark APIs to explore, cleanse, aggregate, transform, and store machine sensor data (a minimal PySpark sketch follows this summary).

•Good knowledge of Data warehouse concepts and principles - Star Schema, Snowflake, SCD, Surrogate keys.

•Experience implementing CI/CD (continuous integration and continuous deployment) and automation for ML model deployment using Jenkins, Git, Docker, and Kubernetes.

•Implemented integration between ServiceNow and various data sources (e.g., databases, APIs) to streamline data ingestion and enhance reporting capabilities.

•Developed dashboards and reports in ServiceNow to provide real-time insights into data engineering metrics, enabling data-driven decision-making for stakeholders.

•Developed Apache Maven project management tool POM files to automate the build process for the entire application, such as managing project libraries, compiling, preparing WAR files, and deploying.

•Working knowledge of issue-tracking software such as JIRA and Buganizer, and version control systems such as Git and Bitbucket.
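
For illustration, a minimal PySpark sketch of the Spark-based sensor-data cleansing and aggregation referenced above; the paths, column names, and thresholds are hypothetical placeholders, not details from any specific project.

# Minimal PySpark sketch: cleanse and aggregate machine sensor readings.
# Paths, column names, and thresholds are illustrative assumptions only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sensor_cleanse_aggregate").getOrCreate()

raw = spark.read.parquet("s3a://example-bucket/sensor-readings/")  # hypothetical source path

cleansed = (
    raw
    .dropDuplicates(["device_id", "event_ts"])          # drop duplicate readings
    .filter(F.col("temperature_c").between(-40, 150))   # discard implausible values
    .withColumn("event_date", F.to_date("event_ts"))
)

daily_stats = (
    cleansed
    .groupBy("device_id", "event_date")
    .agg(
        F.avg("temperature_c").alias("avg_temp_c"),
        F.max("temperature_c").alias("max_temp_c"),
        F.count("*").alias("reading_count"),
    )
)

# Store partitioned by date for efficient downstream queries.
daily_stats.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3a://example-bucket/curated/daily_sensor_stats/"
)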

TECHNICAL SKILLS:

Data Integration: Azure Data Factory, Informatica PowerCenter 10.5/10.1.1 (Designer, Workflow Manager, Workflow Monitor)

Operating Systems: Windows, Linux, UNIX

Languages: PL/SQL, SQL, and Python

Databases: Oracle 11g/12c, SQL Server, MySQL, MS Access, and Postgres

Methodologies: Dimensional Modeling, ER Modeling, Star Schema Modeling, Snowflake Modeling

Cloud: Azure Data Lake, Azure, ADF, Azure Databricks, and Snowflake

Tools & Utilities: Informatica PowerCenter 10.5 (ETL tool), MDM, SQL Developer, DB Viewer

Data Visualization: Tableau, Power BI

Scripting Languages: Unix Shell Scripting, Python Scripting

Version Control: Git, IntelliJ, Eclipse

PROFESSIONAL EXPERIENCE:

Data Engineer.

Vizio, Dallas, TX. Aug 2023 to present

•Experience with the Waterfall model and Agile methodologies such as Scrum and Test-Driven Development.

•Performed unit and integration testing and validated results.

•Gathered functional requirements and determined the project scope by working with the business.

•Developed efficient Spark programs in Python to perform batch processing on large unstructured datasets.

•Implemented ADF pipelines for data-driven workflows in the cloud, orchestrating and automating data movement and data transformation.

•Managed Large-Scale Data Pipelines on Cloud Infrastructure: Designed and deployed large-scale ETL pipelines on cloud IaaS platforms (e.g., AWS EC2, Azure VMs, Google Compute Engine) to handle data from multiple sources with minimal latency.

•Implemented Spark using Scala and Spark SQL for faster testing and processing of data.

•Experience working with Informatica PowerExchange integration with IICS to read data from condense files and load it into an Azure SQL Data Warehouse environment.

•Role-Based Access Control (RBAC): Implemented and managed RBAC policies on Azure, ensuring secure access to sensitive data in compliance with organizational and regulatory standards.

•Provisioned and Optimized Resources: Provisioned and scaled infrastructure resources (CPU, memory, storage) based on load, optimizing cost and performance in dynamic data processing environments.

•Created reusable ADF pipelines.

•Involved in creating Databricks ETL pipelines using notebooks, Spark DataFrames, Spark SQL, and Python scripting.

•Azure Monitor and Log Analytics: Configured monitoring and alerting systems using Azure Monitor and Log Analytics, providing visibility into data workflows and proactively identifying and resolving performance issues.

•Data Governance with Azure Purview: Managed data governance through Azure Purview, ensuring data cataloging, lineage tracking, and compliance with regulatory requirements.

•Collaborated with data science teams to deploy machine learning models on Google AI Platform, integrating predictive analytics into data processing workflows.

•Automated Infrastructure Management with Terraform/CloudFormation: Implemented Infrastructure as Code (IaC) using Terraform and CloudFormation for consistent, version-controlled deployment of data infrastructure.

•Implemented pipelines that adhere to business rules by utilizing ADF activities such as If Condition, ForEach, Lookup, Get Metadata, Set Variable, Filter, and Switch.

•Streamlined Data Workflows with PaaS Solutions: Leveraged PaaS offerings (e.g., AWS Glue, Google BigQuery, Azure Data Factory) to develop, automate, and orchestrate data workflows, reducing development and maintenance time.

•Working knowledge of and experience with the Spark RDD, DataFrame, Dataset, and Data Source APIs, Spark SQL, and Spark Streaming.

•Developed data models and implemented data storage solutions in SQL and NoSQL databases (e.g., PostgreSQL, MongoDB), ensuring data integrity and accessibility.

•Serverless Data Processing: Implemented serverless solutions (AWS Lambda, Azure Functions, Google Cloud Functions) for processing high-volume streaming data, improving efficiency and reducing infrastructure costs.

•Developed and maintained scalable, distributed ETL pipelines using Apache Spark (Core, SQL, Streaming) to process structured and semi-structured data from multiple sources including RDBMS, Kafka, and cloud storage (S3, ADLS).

•Engineered data pipelines to support machine learning model training, validation, and scoring using Spark MLlib and Python-based frameworks (Scikit-learn, XGBoost).

•Built feature engineering workflows using PySpark and Scala to preprocess high-volume transactional and time-series data for ML applications.

•Optimized Spark jobs by implementing partitioning strategies, broadcast joins, and caching, reducing overall pipeline execution time by up to 60% (a brief PySpark tuning sketch follows this role's bullet list).

•Designed and implemented microservices in Golang for data ingestion and transformation, enabling modularity and scalability in data pipelines.

•Leveraged Scala's strong type system to reduce runtime errors and increase reliability in data processing applications.

•Utilized Scala's functional programming features (e.g., map, flatMap, reduce, Option, Try) to write concise, testable, and production-ready data processing logic.

•Built custom data ingestion frameworks in Scala to standardize onboarding of new data sources across structured (CSV, Parquet) and unstructured (JSON, XML) formats.

•Data Modeling in Fabric: Developed and maintained Fabric data models to support robust data analytics, ensuring data quality, governance, and compliance in enterprise-level environments.

•ETL/ELT Pipeline Optimization: Identified and optimized ETL/ELT bottlenecks to improve processing efficiency and reduce cost, leveraging cloud-native analytics for continual performance tuning.

•Loaded DStream data into Spark RDDs and performed in-memory computation to generate output responses.

•Designed data marts following star schema and snowflake schema methodologies, using industry-leading data modeling tools such as Erwin.

•Database Management and Optimization: Worked with managed PaaS databases like Amazon RDS, Azure SQL, and Google Cloud SQL to ensure high availability, scalability, and disaster recovery for data storage solutions.

•Developed monitoring solutions using OpenStack Telemetry (Ceilometer) to track resource usage and performance, proactively addressing issues to ensure uptime.

•Data Integration from SaaS Applications: Integrated and transformed data from popular SaaS applications (e.g., Salesforce, HubSpot, NetSuite) into data warehouses for analytics and business insights.

•Collaborated with DevOps teams to integrate CI/CD pipelines with OpenStack deployments, streamlining the development and deployment of data engineering solutions.

•API-Driven Data Ingestion: Created data ingestion workflows using SaaS provider APIs, standardizing data import processes for scalable and repeatable use cases.

•Wrote Python, Spark and PySpark scripts to build ETL pipelines to automate data ingestion, update data to relevant databases and tables.

•Worked with different file formats like text, Avro, ORC and parquet.

•Data Security and Compliance: Ensured compliance with data governance policies (GDPR, HIPAA) when transferring and processing sensitive data from SaaS platforms, prioritizing data security and privacy.

•Ingested data from RDBMS sources, performed data transformations, and then exported the transformed data to Cassandra per business requirements.

•Implemented security best practices in OpenStack environments, including role-based access control (RBAC) and encryption, to protect sensitive data.

•Real-Time Analytics Pipelines: Developed real-time analytics pipelines using cloud-native tools like Google Dataflow, Azure Stream Analytics, and AWS Kinesis, enabling low-latency insights and reporting.

•Cloud Data Warehousing Solutions: Built and managed data warehouses (AWS Redshift, Google BigQuery, Azure Synapse) to store, organize, and optimize data for analytical querying and business intelligence.

•Worked in Agile teams to deliver data engineering solutions, participating in sprint planning and retrospectives to improve workflows and deliverables.

•Managed data lake data movements involving Hadoop and NoSQL databases such as HBase and Cassandra.

•Worked with highly unstructured and semi-structured data and processed it based on customer requirements.
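
As referenced in the Spark optimization bullet above, here is a minimal PySpark sketch of the broadcast-join, caching, and partitioned-write patterns; the table names, columns, and storage paths are hypothetical.

# Minimal PySpark tuning sketch: broadcast join, caching, and partitioned writes.
# Table names, columns, and paths are illustrative assumptions only.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("pipeline_tuning_sketch").getOrCreate()

transactions = spark.read.parquet("abfss://data@example.dfs.core.windows.net/transactions/")
dim_merchant = spark.read.parquet("abfss://data@example.dfs.core.windows.net/dim_merchant/")

# Broadcast the small dimension table to avoid a shuffle-heavy join.
enriched = transactions.join(broadcast(dim_merchant), on="merchant_id", how="left")

# Cache the reused intermediate result instead of recomputing it per action.
enriched.cache()

daily = (
    enriched
    .withColumn("txn_date", F.to_date("txn_ts"))
    .groupBy("txn_date", "merchant_category")
    .agg(F.sum("amount").alias("total_amount"))
)

# Partition output on the date key so files align with downstream query patterns.
daily.write.mode("overwrite").partitionBy("txn_date").parquet(
    "abfss://data@example.dfs.core.windows.net/curated/daily_merchant_totals/"
)

enriched.unpersist()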

Environment: Spark, Python, Shell, SQL, Azure, ADLS, Azure Data Factory (ADF).

Data Engineer.

Optum, Nashville, TN. Jan 2022 – July 2023

•Responsible for preparing interactive Data Visualization reports using Tableau Software from different sources.

•Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store.

•Analyzed the SQL scripts and redesigned them using PySpark SQL for faster performance.

•Experience developing Spark applications using Spark/PySpark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage, consumption patterns, and behavior.

•Azure Data Factory (ADF): Designed and managed end-to-end ETL/ELT workflows in Azure Data Factory, enabling seamless data integration from multiple sources and automating data pipelines for real-time analytics.

•Automated Scaling and Cost Management: Used cloud cost management tools to monitor and automate the scaling of resources, optimizing operational costs while ensuring performance.

•Experienced and skilled in dimensional modeling (star schema, snowflake schema), forecasting using large-scale datasets, transactional modeling, and SCDs (slowly changing dimensions).

•Developed scripts to transfer data from FTP server to the ingestion layer using Azure CLI commands.

•Created Azure HDInsight clusters using PowerShell scripts to automate the process.

•Involved in loading real-time data into NoSQL databases such as Cassandra.

•Partnered with Data Scientists to operationalize ML models by embedding them in Spark jobs and orchestrating them using Airflow and Databricks.

•Created automated pipelines for model performance monitoring and retraining, improving model lifecycle management and deployment efficiency.

•Data Processing with Azure Databricks: Leveraged Azure Databricks for scalable data processing, using Spark-based frameworks to handle large volumes of data with high performance and efficiency.

•Worked on the Azure suite: Azure SQL Database, Azure Data Lake, Azure Data Factory, Azure SQL Data Warehouse, and Azure Analysis Services.

•Data Lake Storage: Implemented Azure Data Lake Storage Gen2 for secure, scalable data storage solutions, optimizing file structures and partitioning for efficient data retrieval.

•Developed Databricks ETL pipelines using notebooks, Spark DataFrames, Spark SQL, and Python scripting.

•Created Apache Spark tasks in Python in a test environment for quicker data processing and used Spark SQL for querying.

•Built real-time data pipelines using Spark Structured Streaming to ingest and transform data from Kafka for downstream machine learning and analytics use cases (see the streaming sketch at the end of this role's bullet list).

•Integrated Spark with data lake architectures on AWS and Azure, facilitating cost-effective storage and analytics solutions.

•Designed and implemented Scala-based libraries for common ETL operations including validation, error handling, schema evolution, and auditing.

•Integrated Scala-based Spark jobs with workflow orchestration tools like Apache Airflow and Azure Data Factory to automate and schedule pipelines.

•Used case classes and pattern matching in Scala to model complex domain-specific datasets and apply rule-based transformations.

•Leveraged Golang’s goroutines and channels to handle concurrent data processing tasks, significantly reducing execution time for large datasets.

•Monitoring and Logging with Cloud Tools: Configured monitoring and logging tools (AWS CloudWatch, Azure Monitor, Google Cloud Logging) to ensure the reliability, availability, and performance of data workflows.

•Created RESTful APIs using Golang for seamless integration of data services, facilitating data access and manipulation for frontend applications.

•Data Warehousing with Azure Synapse: Built and maintained data warehouses using Azure Synapse Analytics, optimizing storage and performance for complex analytical queries and reports.

•Used Polybase for ETL/ELT process with Azure Data Warehouse to keep data in Blob Storage with almost no limitation on data volume.

•Integrated Golang applications with cloud services (e.g., GCP or AWS) for data storage and processing, enhancing data accessibility and reliability.

•Microsoft Fabric: Utilized Microsoft Fabric for unified data warehousing and lakehouse architectures, enabling seamless integration with Power BI and other Microsoft tools for real-time analytics and reporting.

•Implemented unit and integration tests in Golang to ensure code quality, and deployed applications using containerization technologies like Docker.

•Built Azure Data Warehouse Table Data sets for Power BI Reports.

•Actively participated in code reviews and collaborative development processes, contributing to best practices in Golang development and fostering a culture of quality.

•Worked on version control tools such as Subversion and Git, and used source code management tools such as GitHub.
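
As referenced in the Spark Structured Streaming bullet above, a minimal PySpark sketch of a Kafka-to-data-lake streaming pipeline follows; the broker address, topic, schema, and sink paths are hypothetical, and the sketch assumes the spark-sql-kafka connector is available on the cluster.

# Minimal Structured Streaming sketch: Kafka ingest -> parse JSON -> append to data lake.
# Broker, topic, schema, and paths are illustrative assumptions only.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka_stream_sketch").getOrCreate()

event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", TimestampType()),
])

raw_stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
    .option("subscribe", "events")                       # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers the value as binary; cast it to string and parse the JSON payload.
parsed = (
    raw_stream
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "/mnt/datalake/curated/events/")            # hypothetical sink
    .option("checkpointLocation", "/mnt/datalake/_chk/events/")
    .outputMode("append")
    .start()
)

query.awaitTermination()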

Environment: Spark, Hive, Python, PySpark, Hadoop, Azure Data Factory (ADF), MySQL.

ETL Developer

Intergraph Corporation, Hyderabad, India. Jan 2015 to July 2021

•Interacted with business analysts to gather functional requirements for the new platform paymentnet4.

•Worked on Functional design, High Level and Detailed Level Mapping documents to extract data from SQL server and load in Oracle and Teradata.

•Involved in designing a robust architecture and design for the new systems and getting it reviewed and approved by all stakeholders.

•Worked closely with DataStage admins in the migration of DataStage from version 8.1 to 8.5.

•Worked on creating UNIX directories and access according to best-practice guidelines.

•Worked with Oracle, SQL, and Teradata DBAs to add TNS entries and establish ODBC connections for DataStage to interact with the Oracle database.

•Developed parallel jobs to extract data and performed data profiling, standardization, and investigation, as the sources were SQL Server and the old Payment3 platform with different business rules and hierarchies.

•Involved in building base fact tables, aggregate fact tables, dimension tables, and data mart tables, and in handling change requests and defect issues related to the data warehouse project.

•Used Control-M as a scheduling tool to handle event-based and time-based scheduling.

•Worked on DataStage admin activities such as creating ODBC connections to various data sources, server startup and shutdown, creating environment variables, and creating DataStage projects, message handler files, and the DSParams file.

•Developed DataStage jobs to load data into Teradata base tables using the MultiLoad, TPump, and FastLoad utilities.

•Implemented robust ETL (Extract, Transform, Load) pipelines in Java, utilizing libraries like Apache Camel and Spring Batch to automate data workflows.

•Designed and developed RESTful APIs using Java Spring Boot to facilitate seamless data access and integration with various data sources.

•Leveraged Java’s multithreading capabilities to enhance data processing tasks, allowing for concurrent processing of large datasets and reducing overall execution time.

•Created a shared container and an after-job subroutine to record each job's start and end times in the audit tables (a Python illustration of this pattern appears at the end of this role's bullet list).

•Involved in performance tuning of DataStage jobs with proper DataStage partitioning techniques and node configuration.

•Involved in production on-call support during the migration event.

•Raised EURC and WRM requests to get the required access to various tools.

•Used ClearCase and ClearQuest to check in DataStage code and ensure proper change management.
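
The job-audit bullet above describes a DataStage shared container and after-job subroutine; purely as an illustration of the same start/end audit pattern, a small Python sketch against a hypothetical audit table is shown below (SQLite stands in for the real database).

# Illustration only: job start/end audit logging, analogous to the DataStage
# after-job subroutine described above. Table and column names are hypothetical.
import sqlite3
from contextlib import contextmanager
from datetime import datetime, timezone

@contextmanager
def audited_job(conn, job_name):
    """Record a job's start time, end time, and final status in an audit table."""
    start_ts = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO audit_job_run (job_name, start_ts, status) VALUES (?, ?, 'RUNNING')",
        (job_name, start_ts),
    )
    run_id = cur.lastrowid
    status = "FAILED"
    try:
        yield run_id
        status = "SUCCEEDED"
    finally:
        conn.execute(
            "UPDATE audit_job_run SET end_ts = ?, status = ? WHERE run_id = ?",
            (datetime.now(timezone.utc).isoformat(), status, run_id),
        )
        conn.commit()

# Example usage with an in-memory database standing in for the audit schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_job_run (run_id INTEGER PRIMARY KEY, job_name TEXT, "
    "start_ts TEXT, end_ts TEXT, status TEXT)"
)
with audited_job(conn, "load_payment_fact"):
    pass  # the job's extract / transform / load steps would run here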

Environment: IBM InfoSphere DataStage 8.5, IBM InfoSphere DataStage 8.1.0, Oracle 11i/10g, DB2, AIX 6.1, SQL, PL/SQL, SQL Server 2005, TOAD, SQL Developer, Teradata V2R5, Teradata SQL Assistance, Control-M, UML, Crystal Reports, Altova Map Force, Erwin 4.1, Mainframes, Windows XP.


