VIGNESH PUNYALA
Data Engineer
***************@*****.*** +1-513-***-**** https://www.linkedin.com/in/vignesh-punyala
PROFESSIONAL SUMMARY
Data Engineer with 3+ years of experience working with large structured and semi-structured datasets using SQL, Python, PySpark, Snowflake, Databricks, and AWS. Experienced in collecting, cleaning, validating, and analyzing data from multiple sources to support reporting, business decisions, campaign measurement, and operational insights.
Strong background in identifying data patterns, reconciling source discrepancies, defining business metrics, and preparing analytics-ready datasets for analysts and stakeholders. Skilled in documenting data logic, supporting cross-functional teams, and translating business questions into reliable data outputs, reports, and clear findings.
CORE SKILLS
Data Analysis: Data collection, data cleaning, data validation, exploratory data analysis, trend analysis, pattern identification, KPI reporting, descriptive statistics, data integrity, data reconciliation
Tools & Technologies: SQL, Python, PySpark, Snowflake, Databricks, AWS S3, Excel, CSV, JSON, Parquet
Reporting & Visualization: Report preparation, stakeholder reporting, charts and graphs, summary tables, business data presentation, metric definitions, analytics-ready datasets
Documentation & Collaboration: Process documentation, methodology documentation, business requirement gathering, cross-functional collaboration, stakeholder communication, Jira, Confluence
Data Quality: Completeness checks, consistency checks, deduplication, anomaly detection, validation rules, SLA monitoring, quality control frameworks
PROFESSIONAL EXPERIENCE
Data Engineer — Comcast (Partner Data Solutions) Jan 2023 - Present
• Collected, cleaned, transformed, and validated large datasets from multiple internal and partner data sources, including CSV, JSON, and Parquet files, to support campaign measurement, reporting, and business analysis.
• Analyzed large-scale advertising and exposure datasets to identify patterns, data discrepancies, and reporting trends used by analysts and business stakeholders for decision-making.
• Built and maintained SQL and PySpark workflows to standardize raw data, remove duplicates, reconcile mismatched records, and improve overall data accuracy and consistency.
• Prepared analytics-ready datasets and summary outputs for analysts, business users, and partner teams to support recurring reports, campaign performance analysis, and operational reporting.
• Worked closely with analysts, product stakeholders, and partner teams to understand reporting needs, clarify data definitions, and translate business questions into reliable data outputs.
• Maintained documentation for data sources, business rules, transformation logic, validation checks, and workflow methodology to improve transparency and repeatability.
• Implemented data quality checks for completeness, consistency, and integrity, helping improve trust in downstream reporting and reducing rework caused by source data issues.
• Used SQL, Snowflake, and Databricks to perform data exploration, trend review, and metric validation across high-volume datasets.
• Supported report preparation by delivering clean, structured datasets, summary tables, and clearly defined metrics that could be used in presentations and business reviews.
• Improved processing efficiency and reporting turnaround time by optimizing data transformation logic and standardizing repeated analysis workflows.
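The deduplication and reconciliation steps described above can be sketched in plain Python (the actual workflows run in SQL and PySpark; field names such as record_id and updated_at here are hypothetical):

```python
def deduplicate(records, key_field="record_id", ts_field="updated_at"):
    """Keep only the most recent record per key (hypothetical field names)."""
    latest = {}
    for rec in records:
        key = rec[key_field]
        if key not in latest or rec[ts_field] > latest[key][ts_field]:
            latest[key] = rec
    return list(latest.values())

def reconcile(source_a, source_b, key_field="record_id"):
    """Report keys present in one source but missing from the other."""
    keys_a = {r[key_field] for r in source_a}
    keys_b = {r[key_field] for r in source_b}
    return {"missing_in_b": keys_a - keys_b, "missing_in_a": keys_b - keys_a}
```

In the PySpark workflows themselves, the same logic maps onto window functions (row_number over a key partition ordered by timestamp) and anti-joins between sources.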
PROJECTS
NBCU Campaign Exposure Delivery – Dynata, Kantar, InMarket
• Collected and processed campaign exposure data from multiple sources to support third-party measurement and reporting needs.
• Cleaned and standardized audience and household-level datasets by applying privacy filters, deduplication logic, and schema alignment.
• Analyzed output data to ensure completeness and consistency before delivery, helping teams identify gaps and reporting issues early.
• Prepared structured output datasets and supporting documentation used by analysts and partner teams for campaign measurement and reporting.
STB (Set-Top Box) and Smart TV ACR (Automatic Content Recognition) Data Analysis
• Consolidated and standardized multi-source STB and Smart TV ACR datasets to support downstream analysis and consistent reporting.
• Reviewed exposure data for anomalies, duplicates, and metric mismatches, improving data integrity across analytics use cases.
• Defined common metric logic and reporting-ready schemas that reduced discrepancies across teams and improved consistency in business reporting.
Pharma Campaign Exposure Reporting for Veeva Crossix
• Integrated and reconciled internal ad exposure data with third-party viewership and identity mapping datasets to support campaign measurement.
• Performed source-to-source validation and mapping checks to ensure exposure records aligned with reporting requirements.
• Prepared clean, analysis-ready datasets for downstream measurement, helping stakeholders review campaign reach and exposure trends with confidence.
Exposure Data Quality and Vendor Reporting
• Built validation checks to review completeness, consistency, and delivery readiness of outgoing datasets shared with external measurement vendors.
• Investigated failed or inconsistent outputs, identified root causes, and corrected source or transformation issues before release.
• Supported recurring reporting processes by improving data quality controls and reducing downstream errors.
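The completeness and consistency checks described above can be sketched as a minimal rule-based validator (plain Python; the rule set and field names are hypothetical examples, not the actual vendor delivery rules):

```python
def run_quality_checks(rows, required_fields, allowed_values=None):
    """Flag rows that fail completeness or consistency rules before delivery.

    required_fields: fields that must be present and non-empty.
    allowed_values: optional {field: set_of_expected_values} consistency rules.
    Returns a list of (row_index, field, issue) tuples.
    """
    issues = []
    allowed_values = allowed_values or {}
    for i, row in enumerate(rows):
        for field in required_fields:
            if not row.get(field):
                issues.append((i, field, "missing"))
        for field, allowed in allowed_values.items():
            if field in row and row[field] not in allowed:
                issues.append((i, field, "unexpected value"))
    return issues
```

An empty issue list gates the dataset as delivery-ready; a non-empty one routes the batch to root-cause investigation before release.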
EDUCATION
• Master of Science in Computer Science, Wright State University
• Bachelor of Science in Computer Science, SRM University
CERTIFICATIONS
• Databricks Certified Data Engineer – Professional
• AWS Certified Data Engineer
• Academy Accreditation – Generative AI Fundamentals
ACCOMPLISHMENTS
• Reduced end-to-end data pipeline runtime and compute costs by ~30% through Spark and storage optimization and proactive automation.
• Improved data reliability and trust by implementing automated quality checks and SLA monitoring.
• Delivered AI-enabled analytics solutions adopted by internal analytics and engineering teams.
• Delivered self-service data solutions that gave analysts and stakeholders faster access to trusted datasets and insights, reducing turnaround time for data and analytics requests by 40–50%.