
Data Engineer

Location:
Austin, TX
Posted:
September 10, 2025

Contact this candidate

Resume:

VIJAY YARABOLU

United States +1-913-***-**** *****@************.*** LinkedIn GitHub

SUMMARY

Data Engineer with 4+ years of experience designing and deploying large-scale ETL pipelines, cloud-native data platforms, and BI dashboards across financial services and manufacturing domains. Proven track record of optimizing batch and streaming workflows, reducing infrastructure costs, and enhancing decision intelligence through advanced analytics and data modeling. Proficient in Apache Spark, Kafka, Snowflake, Airflow, Python, Tableau, and Power BI. Strong background in regulatory compliance (Basel III, CCAR, SOX) and real-time streaming architectures on cloud (AWS). Adept at collaborating in Agile environments and leading end-to-end data integration, migration, and governance initiatives across globally distributed teams.

EDUCATION

Master of Science in Computer Science, University of Central Missouri, Warrensburg, MO, USA | May 2025

SKILLS

Methodologies: SDLC, Agile, Waterfall
Programming Languages: Python, R, SQL, SAS
IDEs: PyCharm, Jupyter Notebook, Visual Studio Code
Big Data Ecosystem: Hadoop, MapReduce, HBase, Hive, Cassandra, Pig, Apache Spark, Apache Kafka, Apache NiFi, Apache Airflow
ETL Tools: SSIS, Talend, Informatica
Cloud Technologies: AWS, Azure, GCP
AWS Services: Amazon S3, AWS Glue, Amazon EMR, AWS Lambda, Amazon Redshift, Amazon RDS, Amazon DynamoDB, Amazon Athena, Amazon Kinesis, Amazon QuickSight, AWS Step Functions, AWS Lake Formation, AWS Data Pipeline, AWS CloudWatch, AWS CloudTrail, AWS IAM, AWS Secrets Manager, AWS Key Management Service (KMS), AWS CodePipeline, AWS CodeCommit
DevOps Tools: Docker, Kubernetes, Jenkins, CI/CD
Data Analysis & ML Packages: NumPy, Pandas, SciPy, Matplotlib, Seaborn, Scikit-learn, TensorFlow, Keras, PyTorch
Deep Learning: LSTM, MLP, NLP (NLTK, spaCy, Word2Vec)
Visualization Tools: Tableau, Power BI, MS Excel (XLOOKUP, VLOOKUP, VBA, Macros, Pivot Tables)
Databases: MySQL, MS SQL Server, PostgreSQL, MongoDB, Snowflake, DynamoDB
Version Control: Git, GitHub, GitLab
Operating Systems: Windows, Linux, MacOS

EXPERIENCE

State Street, USA Data Engineer Jan 2025 - Present

• Designed and tuned high-volume data ingestion and ETL pipelines processing 500+ GB/day of trading desk, loan, and derivatives data using Apache Spark, Kafka, Hive, Airflow, NiFi, Talend, Informatica, and Amazon S3, improving query performance by 40%.
• Developed credit risk and loan default prediction modeling workflows using Python (Pandas, NumPy, Scikit-learn), PySpark, R, SQL, and SAS, reducing false-positive rates by 15% in fraud and anomaly detection.
• Architected scalable real-time data lake and analytics infrastructure on petabyte-scale datasets using Delta Lake, Databricks, Spark Streaming, and AWS, enabling real-time analytics and decision intelligence for institutional investment portfolios.
• Conducted Basel III and CCAR stress testing, credit risk modeling, and regulatory reporting on mortgage-backed securities and trading activity, aligned with SOX and FINRA compliance standards.
• Migrated legacy systems and distributed databases from Oracle and SQL Server to MySQL, MongoDB, HBase, DynamoDB, Google BigQuery, and AWS RDS, achieving a 25% reduction in operational costs.
• Delivered automated and optimized BI dashboards and executive-level reporting in Tableau, providing insights into post-trade reconciliation, transactions, and credit and risk KPIs.
• Administered metadata management, data governance, and access controls across large-scale financial data platforms on Hadoop and HDFS, enforcing risk controls and reducing pipeline latency by 40%.

HCL Tech, India Data Engineer Jan 2020 - Jul 2023

• Designed and maintained scalable batch and real-time data pipelines using Hadoop, HDFS, Hive, Pig, Apache Spark, Kafka, NiFi, HBase, and Snowflake, processing over 5TB/day of structured and semi-structured data across 50+ enterprise clients.
• Built end-to-end ETL and data ingestion pipelines using PySpark, Apache Airflow, Talend, and AWS Glue, increasing throughput by 30% and minimizing manual intervention.
• Architected cloud-native data platforms and modernized legacy data warehouses on AWS (S3, EC2, Redshift, Glue), reducing data latency by 70% and enabling real-time analytics for supply chain, production, and operational intelligence.
• Developed RESTful APIs and microservices with Flask, integrating cross-platform data from MES, ERP, and IoT telemetry systems and improving cross-domain visibility and behavioral trend tracking.
• Automated deployment and CI/CD frameworks using Git, GitHub, Jenkins, and shell scripting, reducing the production defect rate by 20% and streamlining production releases.
• Implemented data quality assurance, data governance, and security standards with fault tolerance and performance benchmarking, ensuring 100% adherence to regulatory requirements and Agile DevOps standards.
• Led cloud migration initiatives and developed Power BI dashboards tracking efficiency, cost, and client KPIs, reducing operational cost by 25% and improving efficiency by 40%.
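The data-quality and defect-rate work described in the bullets above can be illustrated with a minimal, hypothetical sketch. The schema, field names, thresholds, and sample records below are invented for illustration and are not taken from the resume:

```python
# Hypothetical sketch of a per-batch data-quality check in an ETL pipeline.
# REQUIRED_FIELDS and the record layout are illustrative assumptions.
REQUIRED_FIELDS = ("trade_id", "amount", "currency")

def validate(record: dict) -> bool:
    """A record passes if all required fields are present, non-empty,
    and the amount is numeric."""
    if any(f not in record or record[f] in (None, "") for f in REQUIRED_FIELDS):
        return False
    return isinstance(record["amount"], (int, float))

def defect_rate(records: list) -> float:
    """Fraction of records failing validation, rounded to 4 places."""
    if not records:
        return 0.0
    bad = sum(not validate(r) for r in records)
    return round(bad / len(records), 4)

# Sample batch: two valid records, one with a null amount, one missing a key.
batch = [
    {"trade_id": "T1", "amount": 100.0, "currency": "USD"},
    {"trade_id": "T2", "amount": None, "currency": "USD"},
    {"trade_id": "T3", "amount": 250.5, "currency": "EUR"},
    {"amount": 75.0, "currency": "USD"},
]
rate = defect_rate(batch)  # 2 of 4 records fail -> 0.5
```

In a real pipeline this kind of check would typically run as a step inside an Airflow DAG or Glue job, with the defect rate emitted as a metric rather than computed in-process.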


