
Data Engineer

Location:
Pittsburgh, PA
Posted:
May 19, 2025


Resume:

Lokesh Vinjamury

412-***-**** *****************@*****.***

https://www.linkedin.com/in/lokesh-vinjamury-a79592213/

PROFESSIONAL SUMMARY

Results-driven Data Architect with 5+ years of experience designing, building, and optimizing scalable data solutions across cloud platforms (Azure, GCP, AWS). Proven track record of architecting robust ETL/ELT pipelines and unified data models to support business intelligence, marketing analytics, and digital transformation. Expertise in integrating and harmonizing data from diverse sources—Google Analytics, CRM systems (Salesforce, HubSpot), CDPs (Segment, Amplitude), and ad platforms (Google Ads, Meta, LinkedIn)—into centralized data warehouses (Snowflake, BigQuery, Redshift). Adept at implementing attribution models, UTM-based tracking systems, and marketing mix modeling to drive data-informed decisions and improve ROI. Skilled in data governance, real-time streaming, campaign performance analytics, and CI/CD automation.

SKILLS

Cloud Platforms & Services:

Azure (Data Lake, Synapse, SQL DB, Data Factory, AKS/Kubernetes), AWS (Glue, Redshift, Lambda, Kinesis), GCP (BigQuery, Dataflow, Pub/Sub), Salesforce, Jira

Languages & Data Stores:

Python (Pandas, NumPy, PySpark), Django, Java, Scala, SQL/T-SQL (SQL Server, Redshift, BigQuery), NoSQL, Microservices, REST APIs, DynamoDB, Aerospike, RDS, MongoDB

Big Data Technologies:

Apache Spark, Kafka, Hive, HBase, Flink, Hadoop (HDFS, MapReduce), EMR

ETL Tools:

Azure Data Factory, AWS Glue, Google Cloud Dataflow, SSIS, Apache Airflow, Informatica

Cloud Architecture & CI/CD:

Jenkins, GitHub Actions, Azure DevOps, Terraform, CloudFormation, Datadog, CloudWatch

Data Warehousing & BI:

Azure Synapse Analytics, Amazon Redshift, Google BigQuery, Power BI, Tableau, Looker

Data Processing:

Azure Databricks, Apache Beam, Apache Sqoop, Kinesis, Pub/Sub, Cassandra

Marketing & Analytics:

Marketing attribution models (multi-touch, first/last touch), UTM & pixel tracking, CDP & CRM data

Data Governance:

GDPR, HIPAA, CCPA, schema validation

WORK EXPERIENCE

PNC Financial Services - Data Engineer

05/2024 to Present

Pittsburgh, Pennsylvania, USA

Responsibilities

Developed and executed comprehensive ETL test cases covering raw ingestion, transformation, and warehousing, improving data-integrity coverage across end-to-end workflows by 30%.
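
As a rough illustration of the kind of ETL test case described above, the sketch below checks row counts and null keys between a staging table and a curated table using pytest; the table and column names, and the in-memory SQLite connection, are hypothetical stand-ins rather than details from this role.

import sqlite3
import pytest

@pytest.fixture
def conn():
    # Stand-in warehouse connection; a real test would target the actual warehouse.
    c = sqlite3.connect(":memory:")
    c.execute("CREATE TABLE raw_transactions (transaction_id TEXT)")
    c.execute("CREATE TABLE fct_transactions (transaction_id TEXT)")
    return c

def scalar(conn, query):
    # Run a query that returns a single value.
    return conn.execute(query).fetchone()[0]

def test_row_counts_match(conn):
    raw = scalar(conn, "SELECT COUNT(*) FROM raw_transactions")
    curated = scalar(conn, "SELECT COUNT(*) FROM fct_transactions")
    assert raw == curated, f"row count drift: raw={raw}, curated={curated}"

def test_no_null_keys(conn):
    nulls = scalar(conn, "SELECT COUNT(*) FROM fct_transactions WHERE transaction_id IS NULL")
    assert nulls == 0, f"{nulls} rows missing transaction_id"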

Designed, developed, and optimized relational databases using SQL, T-SQL, and PL/SQL to support scalable, high-performance data storage and retrieval, improving team productivity by up to 35%.

Acted as SME for modern data platforms on GCP, providing technical guidance during architecture design, development, and deployment, and reduced cloud infrastructure cost by 15%.

Collaborated with cross-functional teams (data analysts, architects, business stakeholders) to gather requirements and design scalable data solutions, accelerating project delivery by up to 25%.

Developed and maintained ETL/ELT pipelines using tools like DBT, Informatica IICS, and Airbyte, integrating structured and semi-structured data from diverse sources and improving data freshness by 30%.

Developed CI/CD frameworks for data workflows across dev, QA, and prod environments, ensuring seamless deployments and decreasing release cycles by 40%.

Automated deployment and testing using CI/CD pipelines and version-controlled workflows (Git, Jenkins, Terraform) for infrastructure and code, cutting deployment errors by 70%.

Integrated robust monitoring and alerting systems (e.g., Stackdriver, Cloud Monitoring, custom Slack bots), leading to a 50% drop in production incidents through early detection and auto-recovery logic.
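
A minimal sketch of the custom alerting pattern mentioned above: post to a Slack incoming webhook when a table looks stale. The webhook URL, table name, and threshold are placeholders, not values from this role.

import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder webhook

def alert_if_stale(table_name, minutes_since_load, max_lag_minutes=60):
    # Send a Slack alert when a table has not been refreshed within the allowed lag.
    if minutes_since_load <= max_lag_minutes:
        return
    message = (f":warning: {table_name} is {minutes_since_load:.0f} min stale "
               f"(threshold {max_lag_minutes} min)")
    requests.post(SLACK_WEBHOOK_URL, json={"text": message}, timeout=10)

alert_if_stale("fct_campaign_events", minutes_since_load=95)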

Applied dimensional modeling techniques (star schema, snowflake schema) to design fact and dimension tables that accurately represent business processes, improving data model quality by up to 30%.
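
To make the star-schema idea concrete, here is a toy fact table with two dimensions, created in an in-memory SQLite database; the table and column names are illustrative only.

import sqlite3

STAR_SCHEMA_DDL = """
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,
    customer_name TEXT,
    segment       TEXT
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,  -- e.g. 20240531
    full_date TEXT,
    month     INTEGER,
    year      INTEGER
);
CREATE TABLE fct_orders (
    order_key    INTEGER PRIMARY KEY,
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    order_amount REAL
);
"""

sqlite3.connect(":memory:").executescript(STAR_SCHEMA_DDL)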

Developed reusable data models and subject areas in DBT and BigQuery to support campaign performance, user engagement, and attribution analysis, reducing time to insight by 40%.

Built and orchestrated scalable ELT pipelines using Informatica IICS, Airbyte, and Python, enabling real-time data ingestion and harmonization and improving pipeline efficiency by 45%.

Partnered with architects to assess GCP-native services such as Cloud Dataflow, BigQuery, Pub/Sub, Cloud Composer, and Dataproc on scalability, latency, and cost-efficiency, contributing to roughly 25% of the architecture design.

Developed end-to-end data solutions by integrating GCP (Google Cloud Platform) services, including Dataflow (Apache Beam), BigQuery, and Cloud Pub/Sub, with dbt (Data Build Tool) for real-time and batch processing of large-scale datasets, yielding an estimated 30% efficiency gain.
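
The sketch below shows the general shape of a streaming Pub/Sub-to-BigQuery pipeline in the Apache Beam Python SDK; the project, topic, table, and schema names are hypothetical and not taken from this engagement.

import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run():
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadEvents" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
            | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "my-project:analytics.raw_events",
                schema="event_id:STRING,event_type:STRING,event_ts:TIMESTAMP",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )

if __name__ == "__main__":
    run()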

Developed modular and reusable DBT models in BigQuery to support dimensional modeling and performance-optimized reporting layers.

Created cohort and segmentation analyses to evaluate campaign engagement by audience type, helping improve targeting and driving a 22% uplift in LTV.

Worked closely with BI analysts, data engineers, and data scientists to ensure data models met analytical needs, improving coverage of those needs by 50%.

Implemented customer segmentation pipelines using behavioral and demographic data, improving personalized marketing efforts and increasing campaign engagement by 25%.
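
A simplified pandas version of the behavioral-plus-demographic segmentation described above; the columns, thresholds, and segment labels are made up for illustration.

import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "days_since_last_visit": [3, 45, 120, 10],
    "total_spend": [900.0, 150.0, 20.0, 480.0],
    "age_band": ["25-34", "35-44", "55-64", "25-34"],
})

def assign_segment(row):
    # Combine behavioral signals (recency, spend) into a coarse segment label.
    if row.days_since_last_visit <= 14 and row.total_spend >= 400:
        return "active_high_value"
    if row.days_since_last_visit <= 60:
        return "active_standard"
    return "lapsed"

customers["segment"] = customers.apply(assign_segment, axis=1)
print(customers[["customer_id", "age_band", "segment"]])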

Built scalable data models and automated workflows to support multi-touch attribution models, leading to a 30% increase in budget efficiency across paid media channels.
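
As a toy example of a multi-touch attribution model of the kind referenced above, the sketch below spreads each conversion's value evenly across its touchpoints (linear attribution); the channels and values are invented.

from collections import defaultdict

journeys = [
    {"touchpoints": ["google_ads", "email", "meta"], "conversion_value": 120.0},
    {"touchpoints": ["linkedin", "email"], "conversion_value": 80.0},
]

channel_credit = defaultdict(float)
for journey in journeys:
    share = journey["conversion_value"] / len(journey["touchpoints"])
    for channel in journey["touchpoints"]:
        channel_credit[channel] += share  # equal credit per touchpoint

for channel, credit in sorted(channel_credit.items(), key=lambda kv: -kv[1]):
    print(f"{channel}: {credit:.2f}")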

Designed and promoted MDM strategies to unify customer and product master records, improving data reliability across departments by 45%.

Analyzed paid media performance and user journeys to uncover optimization opportunities, increasing conversion rates by 18% across retargeting campaigns.

Implemented data quality checks, schema validation, and schema design to ensure consistency across pipelines, using Dataform to define and enforce transformations for data integrity, cleaning, and validation and improving data quality by 30% with big data analytics tools.
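
A lightweight sketch of the kind of schema validation described above: check that incoming records carry the expected columns and types before they are loaded. The expected schema is a made-up example.

EXPECTED_SCHEMA = {"event_id": str, "user_id": str, "amount": float}

def validate(records):
    # Return human-readable schema violations; an empty list means the batch is clean.
    errors = []
    for i, record in enumerate(records):
        missing = EXPECTED_SCHEMA.keys() - record.keys()
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
        for column, expected_type in EXPECTED_SCHEMA.items():
            if column in record and not isinstance(record[column], expected_type):
                errors.append(f"row {i}: {column} should be {expected_type.__name__}")
    return errors

print(validate([{"event_id": "e1", "user_id": "u1", "amount": "12.5"}]))
# ['row 0: amount should be float']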

Implemented data pipelines to support multi-touch attribution models, leading to a 28% improvement in channel-level ROI insights and budget reallocation decisions.

Allegheny Health Network - Data Engineer / SQL Developer (Azure, GCP)

03/2023 to 04/2024

Pennsylvania, USA

Responsibilities

Structured BigQuery, Cloud Storage, and Cloud SQL/Azure SQL (T-SQL) for efficient data storage and retrieval, supporting AI model training and analytics and improving performance by roughly 30% in the healthcare/pharmacy domain.

Boosted data product delivery and transformation efficiency by 40% by leveraging GCP services (BigQuery, Dataflow, Cloud Storage) for seamless integration and real-time processing.

Led assessments of existing data infrastructure and advocated modern data management solutions, reducing technical debt by 35% and improving data pipeline efficiency.

Developed connector-based integrations with APIs from Google Analytics, Salesforce, and HubSpot, enabling real-time data syncing and reducing lag in reporting by 45%.
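
The sketch below shows the general shape of such a connector: page through a REST API and yield records for downstream loading. The endpoint, auth header, and paging parameters are generic placeholders, not any specific vendor's API.

import requests

def fetch_records(base_url, token, page_size=100):
    # Generator that walks a paginated endpoint until an empty page is returned.
    page = 1
    while True:
        response = requests.get(
            base_url,
            headers={"Authorization": f"Bearer {token}"},
            params={"page": page, "limit": page_size},
            timeout=30,
        )
        response.raise_for_status()
        batch = response.json().get("results", [])
        if not batch:
            break
        yield from batch
        page += 1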

Identified architectural gaps in legacy data platforms and defined a future-state roadmap, accelerating cloud migration efforts by 40%.

Built real-time data pipelines using Kafka, Kinesis, and DataFlow to stream events from product and marketing systems, improving data freshness and decision-making speed by 55%.
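
A minimal consumer sketch for this kind of event streaming, assuming the kafka-python client; the broker address, topic, and event fields are placeholders.

import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "marketing-events",                    # hypothetical topic
    bootstrap_servers=["localhost:9092"],  # placeholder broker
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value
    # Downstream steps would enrich, validate, and land the event in the warehouse.
    print(event.get("event_type"), event.get("user_id"))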

Championed adoption of data governance best practices, leading to a 30% improvement in compliance with data quality and privacy standards (GDPR, HIPAA).

Recommended and implemented data governance frameworks covering MDM, security, and privacy, improving data consistency and compliance posture by 40%.

Performed ETL testing across large datasets, tracking data consistency from source to data warehouse using SQL, ADF, and Databricks and improving work productivity by 35%.

Optimized data retrieval mechanisms for BI tools (Power BI, Looker, Tableau), improving retrieval performance by roughly 35%.

Integrated NoSQL databases like MongoDB, DynamoDB, and CosmosDB into hybrid data solutions, enhancing performance for high-velocity use cases by 35%.

Implemented best practices for continuous integration, continuous deployment, and continuous training (CI/CD/CT) of ML models, applying version control, testing, and automation to R&D ML workflows and improving workflow efficiency by 25%.

Modernized legacy ETL systems by migrating to Snowflake and Databricks, reducing query run time by 60% and significantly improving data accessibility for analysts.

Provided well-structured data for analytics, reporting, and machine learning applications, improving downstream impact by up to 35%.

Interpreted business needs to design strategic data integration roadmaps, reducing cross-system data silos and improving time-to-insight by 45%.

Designed event-driven architectures using Kafka and serverless services, enabling real-time alerting and analytics and reducing latency by 45%.

Established, developed, and maintained ETL/ELT workflows using Talend and AWS Glue, improving data extraction and purging efficiency by 30%.

Implemented data validation, anomaly detection, and reconciliation processes, improving data quality by up to 40%.
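
One simple form of the anomaly detection mentioned above: flag a daily load whose row count deviates sharply from recent history. The counts are made-up data.

from statistics import mean, stdev

recent_counts = [10_250, 10_410, 9_980, 10_120, 10_300]  # prior daily load volumes
today_count = 6_200

mu, sigma = mean(recent_counts), stdev(recent_counts)
z_score = (today_count - mu) / sigma if sigma else 0.0

if abs(z_score) > 3:
    print(f"ALERT: today's load of {today_count} rows is {z_score:.1f} sigma from recent history")
else:
    print("load volume within expected range")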

Conducted performance testing and troubleshooting of ETL pipelines, measuring the execution-time impact of code changes in production and making data transformations roughly 30% more efficient.

Piramal Enterprises Limited - Junior Data Engineer

01/2020 to 12/2022

Bangalore, India

Responsibilities

Built ETL workflows with Cloud Composer alongside Apache Kafka, Spark, and Azure Data Factory, increasing data filtering efficiency by 30%.
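
A skeleton of a Cloud Composer (Airflow) workflow along these lines: a daily DAG with an extract step feeding a transform step. The DAG id, schedule, and task bodies are placeholders.

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull raw files from source systems")

def transform():
    print("clean and load curated tables")

with DAG(
    dag_id="daily_marketing_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task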

Monitored and troubleshot cluster issues using AWS CloudWatch, CloudTrail, and EMR logs, improving cluster management efficiency by up to 30%.

Collaborated with platform engineering, data science, and business teams to define end-to-end data pipeline architectures, improving scalability by up to 40%.

Defined and implemented data governance frameworks, improving standardization and consistency across data assets by 40%.

Leveraged Airbyte, DBT, and Python to design high-performance, scalable, and resilient data pipelines, improving pipeline effectiveness by up to 40%.

Enhanced data pipelines by implementing complex SQL queries and PL/SQL stored procedures, improving ETL process efficiency by 25%.

Evaluated and optimized data processing techniques, leveraging distributed computing (Spark, BigQuery, Snowflake) to improve processing performance by 25%.

Used Terraform and infrastructure-as-code (IaC) practices for cloud resource provisioning, automating up to 45% of deployments.

Worked with business and compliance teams to enforce data access policies, classifications, and retention strategies, extending policy coverage by up to 30%.

Ensured compliance with GDPR, CCPA, HIPAA, and other regulatory standards, improving security compliance by 25%.

Conducted training sessions, workshops, and knowledge-sharing initiatives, enhancing data literacy across the organization by up to 25%.

Utilized Agile project management tools such as JIRA, Trello, and Azure DevOps, improving sprint progress tracking by 40%.

Provided regular updates to stakeholders, ensuring transparency in project status, blockers, and risks and reducing errors by 35%.

Promoted an Agile environment and Scrum best practices, ensuring iterative development and improving feedback-loop speed by 20%.

EDUCATION

Robert Morris University Moon Township, Pennsylvania Jan 2023 – Dec 2024

Master’s in Data Analytics (GPA: 3.56/4.00)

Courses taken

Data Mining.

Introduction to Data Analysis.

Data Integration for Analytics.

Python Language and Programming.

Decision Support System Analysis/Design.

Database Management System.

Impact of Emerging Technology.

Defending and Securing Networks.


