Naresh Tokala
Dallas, USA **************@*****.*** +1-682-***-**** www.linkedin.com/in/nareshtokala

Professional Summary
Hands-on Data Engineer with 4+ years of experience designing, building, and optimizing ETL pipelines, data warehouses, and large-scale data solutions across AWS and distributed systems. Skilled in SQL, Python, Spark, and data modeling, with expertise in pipeline automation, query optimization, and BI support. Proven track record of delivering scalable, reliable, analysis-ready datasets that power financial planning, analytics, and reporting. Strong background in data quality, statistical analysis, and operational excellence, with experience leading complex, modular projects in fast-paced environments.

Professional Experience
Data Engineer | State Farm Insurance | Remote, USA | May 2024 – Present
Designed and developed automated ETL pipelines using Python, Spark, SQL, and AWS Glue, ensuring timely ingestion and transformation of financial data.
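As an illustration of the kind of transform step such pipelines perform, here is a minimal pure-Python sketch (the field names and cleaning rules are hypothetical; the production jobs described above used Glue and PySpark):

```python
from decimal import Decimal

def transform_transactions(rows):
    """Normalize raw financial records before loading: trim identifiers,
    parse amounts as exact decimals, default the currency code, and drop
    malformed rows. Illustrative stand-in for a Glue/PySpark transform."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append({
                "account_id": row["account_id"].strip(),
                "amount": Decimal(row["amount"]),
                "currency": row.get("currency", "USD").upper(),
            })
        except (KeyError, ArithmeticError, AttributeError):
            # In practice, malformed records would be routed to a
            # quarantine location rather than silently dropped.
            continue
    return cleaned
```

In a real pipeline this logic would run inside the Spark executors; the sketch only shows the per-record cleaning contract.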
Built and optimized data warehouses on Amazon Redshift, designing star/snowflake schemas to support FP&A analytics and reporting.
Deployed end-to-end workflows using AWS (S3, Glue, Lambda, EMR), improving data processing efficiency by 35%.
Optimized complex SQL queries with indexing and partitioning, reducing query runtimes by 40% for large-scale datasets.
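The effect of indexing on query plans can be demonstrated with an in-memory SQLite database standing in for the production warehouse (table, column, and index names here are illustrative, not from the actual systems):

```python
import sqlite3

# Illustrative demo: an index on the filter column lets the engine
# SEARCH instead of performing a full table SCAN.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (trade_date TEXT, account_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [(f"2024-01-{d:02d}", f"acct{d % 7}", d * 1.5) for d in range(1, 29)],
)
conn.execute("CREATE INDEX idx_trades_date ON trades (trade_date)")

# EXPLAIN QUERY PLAN reports whether the index is used for the predicate.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT SUM(amount) FROM trades WHERE trade_date = '2024-01-05'"
).fetchall()
```

Warehouse engines such as Redshift use sort keys and partition pruning rather than B-tree indexes, but the principle (restricting the rows the engine must touch) is the same.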
Implemented data validation and quality frameworks, ensuring >99% accuracy across production datasets.
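A data-quality gate of the kind described can be sketched as a small completeness-and-range check that reports an accuracy ratio (field names and thresholds are hypothetical):

```python
def run_quality_checks(records, required_fields=("account_id", "amount")):
    """Minimal data-quality gate: count rows failing completeness or
    range checks and report the share of rows that pass."""
    failures = 0
    for rec in records:
        if any(rec.get(f) in (None, "") for f in required_fields):
            failures += 1            # completeness failure
        elif not isinstance(rec["amount"], (int, float)) or rec["amount"] < 0:
            failures += 1            # type/range failure
    total = len(records) or 1        # avoid division by zero on empty input
    return {"total": len(records), "failed": failures,
            "accuracy": 1 - failures / total}
```

A production framework would typically emit these metrics to monitoring and block the load when accuracy drops below a threshold (e.g., the >99% target above).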
Collaborated with Finance and BI teams to deliver curated datasets for Tableau/Power BI dashboards, enabling faster insights.
Actively participated in code reviews, design discussions, and sprint planning, contributing to scalable and maintainable solutions.
Data Analyst | Virtusa | Hyderabad, India | Sep 2020 – Jul 2023
Developed and maintained scalable data pipelines using Python, SQL, and Spark, processing millions of financial records daily.
Designed and implemented data models and warehousing solutions (PostgreSQL, Oracle, Redshift) for enterprise BI applications.
Migrated legacy on-premises systems to AWS (S3, RDS, Redshift), reducing infrastructure costs while improving scalability.
Built batch and streaming pipelines with Spark and Kafka to support both historical and real-time analytics.
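The core of such a streaming job, windowed aggregation over a stream of timestamped events, can be sketched in pure Python (a toy stand-in; the real pipelines used Spark with Kafka as the source, and the field layout here is assumed):

```python
from collections import defaultdict

def aggregate_events(events, window_seconds=60):
    """Bucket (timestamp, amount) events into fixed-size time windows
    and sum amounts per window, mimicking a tumbling-window aggregation."""
    windows = defaultdict(float)
    for ts, amount in events:
        window_start = ts // window_seconds * window_seconds
        windows[window_start] += amount
    return dict(windows)
```

In Spark Structured Streaming the same grouping would be expressed with a window function over the event-time column, with Kafka offsets tracked for exactly-once-style recovery.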
Supported FP&A teams with reporting datasets, enhancing forecasting accuracy and decision-making.
Applied PySpark to validate financial metrics and reporting accuracy.
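One common form of such a validation is a source-to-target reconciliation of summed amounts; a pure-Python sketch (record shape and tolerance are illustrative, not the actual PySpark job):

```python
def reconcile_totals(source_rows, target_rows, tolerance=0.01):
    """Compare summed amounts between a source dataset and the derived
    reporting dataset; return True when they agree within tolerance."""
    src_total = sum(r["amount"] for r in source_rows)
    tgt_total = sum(r["amount"] for r in target_rows)
    return abs(src_total - tgt_total) <= tolerance
```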
Delivered enhancements through Agile practices, ensuring continuous improvement and timely delivery.

Education
Master of Science in Data Science – University of Texas at Arlington

Technical Skills
Programming & Data Processing: Python (Pandas, PySpark), SQL (advanced), Scala (basic), Shell Scripting
Data Engineering & Big Data: Apache Spark, Kafka, Hadoop (HDFS, MapReduce), Apache Airflow, ETL Pipelines, Data Modeling (Star/Snowflake Schemas)
Databases & Warehousing: Amazon Redshift, MySQL, PostgreSQL, Oracle, SQL Server, MongoDB, Redis, Google BigQuery, Azure Synapse
Cloud Platforms: AWS (S3, Glue, Redshift, RDS, EMR, Lambda, IAM, CloudWatch), Azure Data Services, Google Cloud (BigQuery, Dataflow)
Business Intelligence & Analytics: Power BI, Tableau, Excel, Jupyter Notebook, Exploratory Data Analysis (EDA), Statistical Analysis
DevOps & Automation: Jenkins, Git/GitHub, Docker, Kubernetes, Terraform, OpenTofu, CI/CD for Data Pipelines
Data Quality & Testing: PyTest, Unit Testing, Automated Data Validation, Data Quality Frameworks
Core Knowledge: Data Warehousing, BI Development, Query Optimization, Distributed Systems, Probability & Statistics