Harshitha Vejendla
Data Engineer
Address: *** ******** **** *** ** Marietta Georgia 30067
LinkedIn: www.linkedin.com/in/harshithavejendla Phone: +1-770-***-**** Email: *******************@*****.***
PROFESSIONAL SUMMARY
• Data Engineer with 6 years of experience architecting, developing, and optimizing data systems across the logistics, aviation, healthcare, and financial domains
• Expert in end-to-end data pipeline design, cloud data architecture, and distributed data processing with Python, PySpark, Kafka, SQL, Databricks, and Airflow
• Hands-on experience implementing high-volume ETL/ELT frameworks, data lakes, Snowflake data pipelines, and CI/CD for real-time analytics
• Skilled in AWS and Azure ecosystems, data governance, and integrating ML workflows for predictive insights that drive business growth
TECHNICAL SKILLS
• Programming Languages: Python, SQL, C/C++
• Data Management: ETL, ELT, Data Warehousing, Data Lakes, Star/Snowflake Schema, Fact-Dimension Modeling
• Big-Data Technologies: Apache Spark, Hadoop, Hive, Databricks, Kafka, Airflow
• Cloud Platforms: AWS (S3, Glue, Redshift, EMR, Lambda, SageMaker, CloudWatch, CodePipeline), Azure (Data Factory, Synapse, Data Lake, Event Hubs, AKS, VMs), Dataiku
• Streaming & Real-Time Processing: Spark Streaming, Apache Flink, Kafka Streams, AWS Kinesis, Azure Stream Analytics
• Databases: Oracle, MySQL, PostgreSQL, Teradata, MongoDB
• DevOps & CI/CD: Docker, Kubernetes, Jenkins, Terraform, GitHub Actions, Azure DevOps, Git, GitHub, GitLab CI/CD
• Data Visualization: Power BI, Tableau, Looker, Grafana, Matplotlib
• Machine Learning Algorithms & Libraries: Regression, Decision Trees, Random Forest, XGBoost, LightGBM, Scikit-learn
• Data Governance: Data Masking, Encryption, RBAC, GDPR, HIPAA, PCI DSS Compliance, AWS Secrets Manager
PROFESSIONAL EXPERIENCE
Data Engineer, FedEx, Atlanta, GA, USA July 2025 – Present
• Designed robust ETL/ELT pipelines using PySpark, Kafka, SQL, Databricks, and AWS Glue to manage large-scale logistics data, improving delivery analytics by 35%
• Built real-time data streaming with Kafka, Spark Streaming, Apache Flink, and AWS Kinesis for shipment status monitoring and route optimization (streaming sketch below)
• Developed data warehouses (star and snowflake schemas) and data lakes on AWS (S3, Redshift, Glue, EMR, EC2, Lambda, SageMaker) with Databricks, enabling real-time operational analytics across delivery hubs
• Consolidated multi-source data (SQL/NoSQL databases, APIs) into unified data lakes, enabling enterprise-wide access to operational and shipment data
• Leveraged Python and Scikit-learn to build, train, and deploy models with 80% predictive accuracy, enabling proactive decision-making in operations, customer service, and revenue optimization
• Deployed Airflow workflows and CI/CD pipelines using Jenkins, Docker, Git and Terraform to automate releases across environments
• Collaborated with Data Scientists and ML Engineers to prepare ML-ready datasets and deploy predictive models (Regression, Random Forest, XGBoost, Decision Trees) on Databricks and SageMaker
Environment: Python, ETL/ELT, PySpark, SQL, NoSQL, Databricks, Kafka, Spark Streaming, Apache Flink, AWS Kinesis, Scikit-learn, Terraform, Docker, Git, Airflow DAGs, AWS (S3, Glue, SageMaker, Lambda, EMR, CloudWatch), Data Governance, Real-time Data Processing, Data Warehousing, Model Deployment, Predictive Analytics, Data Quality, Data Security, Deequ, Apache NiFi, dbt
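For illustration, a minimal PySpark Structured Streaming sketch of the shipment-status monitoring pattern described in this role; the broker address, topic name, event schema, and S3 paths are hypothetical placeholders, not FedEx systems.

```python
# Minimal PySpark Structured Streaming sketch: windowed shipment-status counts.
# Broker, topic, schema fields, and S3 paths below are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("shipment-status-stream").getOrCreate()

# Hypothetical shipment event schema.
event_schema = StructType([
    StructField("shipment_id", StringType()),
    StructField("status", StringType()),
    StructField("hub", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
       .option("subscribe", "shipment-events")            # hypothetical topic
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", event_schema).alias("e"))
          .select("e.*"))

# Five-minute tumbling windows of status counts per hub feed delivery analytics.
counts = (events
          .withWatermark("event_time", "10 minutes")
          .groupBy(F.window("event_time", "5 minutes"), "hub", "status")
          .count())

(counts.writeStream
 .outputMode("append")
 .format("parquet")
 .option("path", "s3://example-bucket/shipment-metrics/")           # placeholder sink
 .option("checkpointLocation", "s3://example-bucket/checkpoints/")  # placeholder
 .start()
 .awaitTermination())
```

Data Engineer, American Airlines, Dallas, Texas Aug 2024 – June 2025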
• Designed and optimized scalable ETL/ELT pipelines using PySpark, Airflow, Databricks, and Azure Data Factory to process terabytes of structured and unstructured data from cloud and on-prem sources (orchestration sketched below)
• Built cloud-native data architectures on Azure (Data Factory, Blob Storage, Synapse Analytics, Data Lake, VMs, AKS) and Dataiku, improving system efficiency and reducing costs by 30%
• Implemented real-time data streaming systems using Apache Kafka, Synapse Analytics and Flink for live event processing in airline use cases
• Developed end-to-end data lakehouse solutions integrating Snowflake, Databricks, and Power BI, enabling business users to run analytics 40% faster
• Automated CI/CD pipelines using Jenkins, Terraform, GitHub Actions and Azure DevOps for ETL workflows, ensuring zero-downtime deployments and reproducible builds
• Engineered real-time streaming with Spark Streaming, Kafka Streams, and Azure Stream Analytics to capture and process airline and provider updates, improving operational response time by 40%
Environment: Python, ETL/ELT, SQL, PySpark, Airflow, Databricks, Apache Spark, Kafka, Data Pipelines, Data Lake, Data Warehouse, Data Ingestion, Data Transformation, Data Cleansing, Fact-Dimension Modeling, Star Schema, Snowflake Schema, Data Marts, Metadata Management, Data Lineage, Data Integration, Real-time Data Processing, Batch Processing, Terraform, Fivetran, Stitch Data
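For illustration, a minimal Airflow 2.x DAG sketching the extract-transform-load orchestration described above; the dag_id, schedule, and stub callables are illustrative assumptions (real tasks would trigger ADF or Databricks jobs).

```python
# Minimal Airflow 2.x DAG sketching hourly ELT orchestration.
# The dag_id, schedule, and stub callables are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    """Pull raw flight events from source systems (stub)."""


def transform():
    """Cleanse and conform records for the lakehouse layer (stub)."""


def load():
    """Publish curated tables for downstream analytics (stub)."""


with DAG(
    dag_id="flight_events_elt",      # hypothetical name
    start_date=datetime(2024, 8, 1),
    schedule="@hourly",              # Airflow 2.4+ parameter
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```

Data Engineer, Hexaware Technologies, Remote Jan 2022 – Aug 2023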
• Designed, developed, and maintained ETL pipelines for large-scale Electronic Health Records (EHR), claims, and clinical trial data using Python, PySpark, and SQL
• Built and optimized data ingestion frameworks from multiple sources such as HL7, FHIR, CCD, EMR/EHR systems, and medical devices, ensuring data integrity and HIPAA compliance (masking sketched below)
• Created and orchestrated data workflows using Apache Airflow, AWS Glue, AWS Lambda, and Amazon Redshift to automate ingestion, transformation, and loading processes
• Implemented data lake and data warehouse architectures on AWS S3, Redshift, Snowflake, and AWS Kinesis, improving query performance and scalability for healthcare analytics
• Designed data models (star, snowflake) for claims processing, pharmacy transactions, and patient demographics using Dimensional Modeling and Kimball methodology
• Monitored and optimized data pipeline performance using Datadog, Prometheus, and Grafana, reducing processing latency and improving reliability
• Supported data governance initiatives by implementing data lineage tracking (Apache Atlas, Collibra) and metadata management frameworks
• Partnered with BI teams to deliver Tableau / Power BI dashboards for healthcare KPIs like claims cost, patient outcomes, and clinical operations efficiency
Environment: Python, SQL, PySpark, Kafka, Spark Streaming, Docker, Kubernetes, Oracle, MySQL, ETL/ELT Pipelines, Data Lakehouse, Data Warehousing, Data Quality, Data Security, GDPR, PCI DSS, HIPAA Compliance, Hive, AWS (S3, SageMaker, Glue, DynamoDB, Lambda), Power BI, Snowflake, Teradata, MongoDB, Data Masking, Data Encryption, Parallel Processing, Datadog, Tableau, Grafana
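For illustration, a minimal PySpark sketch of the HIPAA-oriented masking step this role describes; the bucket paths and column names (patient_id, ssn) are hypothetical.

```python
# Minimal PySpark sketch: pseudonymize PHI before landing claims in the lake.
# Bucket paths and column names (patient_id, ssn) are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ehr-claims-ingest").getOrCreate()

claims = spark.read.json("s3://example-bucket/raw/claims/")  # placeholder source

masked = (
    claims
    # One-way hash preserves joinability without exposing the raw identifier.
    .withColumn("patient_id", F.sha2(F.col("patient_id").cast("string"), 256))
    # Direct identifiers that analytics never needs are dropped outright.
    .drop("ssn")
)

masked.write.mode("overwrite").parquet("s3://example-bucket/curated/claims/")  # placeholder sink
```

Data Engineer, Keyu Tech, Remote Mar 2019 – Dec 2021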
• Designed, developed, and automated ETL pipelines for transactional, risk, and regulatory reporting data using Python, SQL, and PySpark in AWS cloud environments
• Engineered data ingestion frameworks to process large-scale banking, trading, and credit-risk datasets from Oracle, Teradata, Kafka, and API sources with high reliability and low latency
• Developed a data lakehouse architecture using AWS S3, Glue, Redshift Spectrum, and Snowflake to unify structured and unstructured financial data for analytics and audit purposes
• Implemented data transformation logic for Basel III, CCAR, AML, and KYC compliance reporting, ensuring 100% alignment with Fed and FINRA regulatory standards
• Partnered with risk analytics teams to deliver credit risk, fraud detection, and portfolio performance models through curated feature datasets and data marts
• Created data models and star schemas for trading, loan origination, and treasury systems, enabling faster query response and improved financial insights
• Integrated real-time data streaming using Apache Kafka and Kinesis for transaction monitoring and fraud-alert systems (consumer sketch below)
• Deployed data lineage and metadata management solutions (Apache Atlas, Collibra) for audit traceability and data governance
• Supported real-time dashboards and BI solutions in Tableau and Power BI for executive decision-making on risk, revenue, and operational metrics
• Collaborated with data scientists and financial analysts to deliver predictive models for loan default, market volatility, and customer churn
Environment: Python, SQL, PySpark, AWS (SageMaker, S3, Glue, Redshift, Lambda), Databricks, Snowflake, Kafka, Kinesis, Airflow, Jenkins, Docker, CI/CD, ETL Pipelines, Data Lake, Data Warehouse, Data Modeling, Regulatory Reporting, Basel III, CCAR, AML, KYC, Risk Analytics, Data Governance, Metadata Management, Power BI, Tableau, Great Expectations, JSON/XML
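For illustration, a minimal Python sketch of the Kafka-based transaction-monitoring pattern described above, using the kafka-python client; the topic, broker, message fields, and threshold are hypothetical.

```python
# Minimal kafka-python consumer sketching threshold-based transaction alerts.
# Topic, broker, message fields, and threshold are hypothetical.
import json

from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "transactions",                   # hypothetical topic
    bootstrap_servers="broker:9092",  # placeholder broker
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

ALERT_THRESHOLD = 10_000  # illustrative amount; real rules are model-driven

for message in consumer:
    txn = message.value
    if txn.get("amount", 0) > ALERT_THRESHOLD:
        # A real pipeline would publish to a fraud-alert topic or queue;
        # printing stands in for that sink here.
        print(f"ALERT: transaction {txn.get('txn_id')} exceeds {ALERT_THRESHOLD}")
```

EDUCATION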
Master of Information Technology, Kennesaw State University, Kennesaw, GA, USA