SINDHU VELURU
St. Louis, MO +1-314-***-**** ****************@*****.*** LinkedIn
OBJECTIVE
Data Analyst/Engineer with 3+ years of experience building scalable data pipelines, real-time processing systems, and cloud-based analytics solutions using AWS, Spark, Kafka, Airflow, and dbt. Skilled in Python, SQL, and R, with hands-on work across Snowflake, Redshift, and PostgreSQL. Proven ability to deliver predictive insights, ensure data integrity, and support compliance in cross-industry environments.
WORK EXPERIENCE
PTC Boston, MA
Data Engineer Mar 2025 – Present
Orchestrated telemetry ingestion using PySpark and AWS Glue, funneling IoT data from industrial assets into S3, which enabled predictive analytics and reduced equipment failure by 30%.
Built real-time data flows with Kafka and Spark Streaming, linking live machine signals to ThingWorx dashboards and improving operational responsiveness by 40%.
Joined structured CAD data from Creo with usage metrics via SQL, supplying ML-ready features that accelerated product iteration cycles for engineering teams.
Governed access policies in AWS Lake Formation, supporting regulatory compliance across cross-regional datasets aligned with GDPR and ISO standards.
Novatore Solutions Dover, DE
Data Engineer Jan 2024 – Nov 2024
Configured batch and incremental pipelines with Apache Airflow, Python, and AWS Glue, ensuring reliable ingestion from multiple vendor APIs into a centralized Redshift environment.
Refactored financial and marketing datasets using dbt and SQL, which brought down data model refresh lag by 60% and improved dashboard responsiveness.
Tuned real-time processing streams via Kafka and S3, enabling a dynamic forecasting engine for an e-commerce platform handling 1M+ daily events.
Implemented rigorous data integrity checks using PySpark and Great Expectations, supporting secure clinical data workflows in accordance with HIPAA mandates.
Embedded SageMaker and MLflow within existing data flows to auto-trigger retraining of predictive models, improving fraud pattern recognition accuracy by 28%.
Assembled multi-source analytics in Power BI, backed by PostgreSQL and Snowflake, enhancing sales intelligence and product replenishment strategies.
CyberNest Bengaluru, IN
Data Analyst Jan 2020 – Dec 2022
Led a multi-source data harmonization project for a healthcare client using Python (Pandas), SQL, and AWS S3, delivering a HIPAA-compliant patient data lake that reduced reporting latency by 60%.
Consolidated retail transaction data from 4 regional warehouses into Snowflake using dbt and SQL transformations, improving inventory visibility and driving a 25% reduction in stockouts.
Designed Power BI dashboards for a telecom provider, enabling real-time churn tracking and contributing to a 15% retention increase by revealing usage pattern anomalies.
Structured a product recommendation framework using R and collaborative filtering techniques for an e-commerce client, boosting average order value by 18% within 3 months.
Delivered a predictive maintenance model for a manufacturing client using Python (Scikit-learn) and time series forecasting, reducing machine downtime by 22% across 12 facilities.
Streamlined financial reporting for a fintech partner by developing automated Excel macros and SQL-based ETL scripts, reducing monthly report generation time from 7 hours to under 45 minutes.
Interfaced with cross-functional teams and clients to translate business questions into data analysis workflows, supporting decisions with Tableau-driven insights backed by statistically validated KPIs.
PROJECT
Healthcare Analytics – Co-occurring Disorders Analysis
Leveraged SQL and Python to consolidate and explore synthetic patient records generated via Synthea, focusing on mental health and substance use disorder trends.
Illustrated demographic and cost distribution insights through data visualizations, supporting data-driven decisions in California's public health policy and funding strategies.
SKILLS
Programming Languages & Methodologies: Python, R, Scala, SQL, Agile/Scrum, Waterfall, SDLC
Big Data Technologies: Hadoop, HDFS, YARN, Sqoop, Oozie, Hive, HBase, Spark, Kafka, NiFi, Cassandra, Apache Airflow, Databricks
Databases: MySQL, SQL Server, Snowflake, PostgreSQL, MongoDB, Cassandra
Cloud Computing: AWS (S3, CloudWatch, Redshift, EMR, EC2, DynamoDB), Azure (Data Factory, Blob Storage, Databricks)
Data Analytics: Data Cleaning, Data Masking, Data Manipulation, Data Visualization
Reporting Platforms: SSRS, Tableau, Microsoft Power BI, Sigma Computing, Metabase
ETL/ELT Tools & File Formats: SSIS, SSAS, Informatica, Matillion, Azure Data Factory, dbt, Parquet, Avro, ORC, JSON
Operating Systems & Version Control: Windows, Linux, Unix, macOS, Git, GitLab, GitHub
EDUCATION
Saint Louis University St. Louis, MO
Master of Science in Analytics Dec 2024
St. Mary's College Hyderabad, IN
Bachelor of Business Administration Jul 2021