
Data Engineer with AWS, Python, SQL Experience

Location:
Ashburn, VA
Salary:
80000-100000
Posted:
November 12, 2025


Resume:

PRANEETHA KOUNDINYA KALLURI

Email: ********************@*****.*** | Contact: 945-***-**** | LinkedIn

SUMMARY

Results-driven Data Engineer with hands-on experience building and optimizing ETL/ELT pipelines, data models, and data warehousing solutions using AWS, PySpark, SQL, and Python. Skilled in managing large-scale financial data architectures while ensuring quality, scalability, and regulatory compliance. Experienced in pipeline orchestration and Redshift integration, delivering efficient, business-aligned data systems.

EXPERIENCE

Data Engineer, Project: Enterprise Data Infrastructure    June 2025 – Present
CompSciPrep LLC, Client: Fannie Mae, Reston, VA

● Gathered financial data from multiple internal systems for capital and risk-weight calculations under FHFA guidelines. Crafted complex SQL scripts to extract, validate, and organize mortgage and asset datasets, aligning each transformation with STTM standards and business inputs to keep data feeds stable across sources and accurate for downstream financial components.
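
A minimal sketch of this kind of SQL-based feed validation; the table, column, and path names are hypothetical stand-ins, since the actual STTM rules are project-specific:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mortgage-feed-validation").getOrCreate()

# Hypothetical landing path and schema.
loans = spark.read.parquet("s3://example-bucket/raw/mortgage_loans/")
loans.createOrReplaceTempView("mortgage_loans")

# Flag rows that violate basic expectations before they reach the
# downstream capital calculations: non-null keys, positive balances,
# and no future-dated originations.
invalid = spark.sql("""
    SELECT loan_id, unpaid_balance, origination_date
    FROM mortgage_loans
    WHERE loan_id IS NULL
       OR unpaid_balance <= 0
       OR origination_date > current_date()
""")

bad_rows = invalid.count()
if bad_rows > 0:
    raise ValueError(f"{bad_rows} rows failed validation; halting the feed")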

● Applied data modeling practices, establishing fact and dimension layers for capital, exposure, and risk datasets. Built schema references and maintained data dictionaries to provide transparency and traceability across financial workflows, and supported data governance efforts by defining naming standards.

● Set up recurring data pipelines using AWS Lambda, SNS, and SQS to handle daily and monthly batch processes across capital components, and monitored CloudWatch logs to identify and fix pipeline issues quickly, ensuring smooth data flow and minimal downtime. Performed SCA, SAST, and DAST scans as part of the CI/CD process to detect and address security vulnerabilities before deployment.
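
For illustration, a minimal handler in this Lambda/SNS/SQS batch pattern; the bucket and payload field names are assumptions, not the actual project configuration:

import json
import logging

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

s3 = boto3.client("s3")

def handler(event, context):
    """Entry point for an SQS-triggered Lambda; each record wraps an
    SNS notification naming the batch file to process."""
    for record in event["Records"]:
        body = json.loads(record["body"])      # SQS message body
        message = json.loads(body["Message"])  # inner SNS envelope
        bucket, key = message["bucket"], message["key"]
        logger.info("Processing batch file s3://%s/%s", bucket, key)
        obj = s3.get_object(Bucket=bucket, Key=key)
        # Transformation and load steps for the capital component would run here.
    return {"status": "ok"}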

● Processed high-volume capital data using AWS EMR, running distributed Spark jobs to speed up transformations and improve overall ETL performance.

● Managed data migration between AWS RDS and Redshift, triggering AWS Glue jobs for transformation and loading. Implemented Apache Spark within Glue for distributed data transformations, improving runtime and scalability, and automated quality checks and file operations through Python scripts, improving efficiency and reliability.
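
A skeleton of such a Glue job; database, connection, and table names are placeholders:

import sys

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the source table registered in the Glue Data Catalog.
source = glue_context.create_dynamic_frame.from_catalog(
    database="example_rds_db", table_name="capital_positions"
)

# Apply a Spark transformation before loading into Redshift.
df = source.toDF().filter("as_of_date IS NOT NULL")
out = DynamicFrame.fromDF(df, glue_context, "out")

glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=out,
    catalog_connection="example-redshift-connection",
    connection_options={"dbtable": "public.capital_positions", "database": "dw"},
    redshift_tmp_dir="s3://example-bucket/glue-tmp/",
)
job.commit()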

● Supported failover and failback mechanisms across AWS regions, ensuring data synchronization and system continuity during regional outages. Contributed to the business reconciliation framework, comparing system outputs with operational data to ensure accuracy and compliance.

● Prepared ad-hoc reports and validations supporting capital adequacy assessments for mortgage portfolios, and documented operational runbooks and test scenarios for monthly production cycles, aiding smooth transitions and audits.

Data Engineer    Feb 2024 – May 2025

Mars Software Solutions, USA

● Orchestrated ingestion of real-estate property data with PySpark + SQL, landing raw files in AWS S3 and triggering updates via AWS Lambda. Standardized file layouts, partitioning, and naming to keep pipelines predictable and easy to operate.
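
A minimal sketch of the standardized, partitioned landing layout described above; the bucket and column names are illustrative:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("property-ingest").getOrCreate()

raw = spark.read.option("header", True).csv("s3://example-bucket/landing/properties/")

# Derive partition columns so downstream jobs can prune by date.
curated = (
    raw.withColumn("ingest_date", F.current_date())
       .withColumn("year", F.year("ingest_date"))
       .withColumn("month", F.month("ingest_date"))
)

(curated.write
    .mode("append")
    .partitionBy("year", "month")
    .parquet("s3://example-bucket/raw/properties/"))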

● Bridged business requirements and data engineering workflows, converting stakeholder inputs into SQL logic and transformation rules that accurately reflect business operations, improving data accuracy by 20% across business reports.

● Applied data-warehousing principles to build Star and Snowflake schemas, defining fact tables for transactions and dimension tables for master data. Incorporated Slowly Changing Dimension (SCD) Type-2 logic, along with appropriate granularity, indexing, and partitioning, to ensure accurate history tracking, faster queries, and consistent reporting.
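
One illustrative way to express SCD Type-2 logic, assuming a Delta Lake dimension table and placeholder column names; this is a sketch of the technique, not the project's actual implementation:

from delta.tables import DeltaTable
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("scd2-sketch").getOrCreate()

dim = DeltaTable.forName(spark, "dim_property")
updates = spark.table("stg_property_updates")

# Close out current rows whose tracked attribute changed.
(dim.alias("d")
    .merge(updates.alias("u"), "d.property_id = u.property_id AND d.is_current = true")
    .whenMatchedUpdate(
        condition="d.owner_name <> u.owner_name",
        set={"is_current": "false", "end_date": "current_date()"},
    )
    .execute())

# Append the new versions as current rows. A production job would first
# restrict `updates` to new or changed keys.
new_rows = (updates
    .withColumn("start_date", F.current_date())
    .withColumn("end_date", F.lit(None).cast("date"))
    .withColumn("is_current", F.lit(True)))
new_rows.write.format("delta").mode("append").saveAsTable("dim_property")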

● Curated cleansed/validated layers with PySpark (type casting, dedupes, late-arriving logic), and indexed lookups in DynamoDB for fast, operational reads. Added pipeline health checks, row-count/variance checks, and reconciliation logs to keep quality visible and auditable.
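
A sketch of this cleanse-and-validate step with illustrative column names: type casting, dedup on the business key with late-arriving records winning, and a reconciliation check of the kind described:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("curate-leases").getOrCreate()

raw = spark.read.parquet("s3://example-bucket/raw/leases/")

# Cast types and keep only the latest record per lease.
w = Window.partitionBy("lease_id").orderBy(F.col("updated_at").desc())
curated = (
    raw.withColumn("rent_amount", F.col("rent_amount").cast("decimal(12,2)"))
       .withColumn("rn", F.row_number().over(w))
       .filter(F.col("rn") == 1)
       .drop("rn")
)

# Illustrative reconciliation check: one curated row per source key.
expected = raw.select("lease_id").distinct().count()
actual = curated.count()
assert actual == expected, f"row-count check failed: {actual} != {expected}"

curated.write.mode("overwrite").parquet("s3://example-bucket/curated/leases/")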

● Composed SQL-based KPIs for rent, expenses, occupancy, and aging; surfaced trends and exceptions, and published AWS QuickSight dashboards for property managers with time-series views of rental trends, occupancy rates, maintenance costs, and lease renewals.
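
One illustrative KPI query of the kind behind those dashboards, computing monthly occupancy rate per property; the schema is hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kpi-occupancy").getOrCreate()
spark.read.parquet("s3://example-bucket/curated/units/").createOrReplaceTempView("units")

occupancy = spark.sql("""
    SELECT property_id,
           date_trunc('month', snapshot_date) AS month,
           AVG(CASE WHEN status = 'OCCUPIED' THEN 1.0 ELSE 0.0 END) AS occupancy_rate
    FROM units
    GROUP BY property_id, date_trunc('month', snapshot_date)
""")
occupancy.show()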

● Ran statistical validation (trend checks, ANOVA, simple regression) using IBM SPSS to confirm data consistency before reporting. Utilized simple AI and ML workflows, including rule-based anomaly detection and LLM-based text summarization, to streamline internal reporting.
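
A minimal sketch of the rule-based anomaly flagging mentioned above, on synthetic data: values more than two standard deviations from the column mean are held for review before reporting.

import pandas as pd

# Synthetic rent values; the 9800 entry is an obvious outlier.
df = pd.DataFrame({"monthly_rent": [1200, 1250, 1190, 1230, 9800, 1210]})

mean, std = df["monthly_rent"].mean(), df["monthly_rent"].std()
df["is_anomaly"] = (df["monthly_rent"] - mean).abs() > 2 * std

print(df[df["is_anomaly"]])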

Data Analytics Engineer    July 2022 – December 2022

Advanced Robotic Research Organization, Hyderabad, India

● Engineered end-to-end data pipelines in PySpark and SQL to manage IoT DJI Phantom drone flight data captured as sensor logs and JSON-like streams, including altitude, speed, and GPS coordinates.
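
A sketch of parsing such JSON-style flight logs into a typed DataFrame; the schema and path are simplified stand-ins for the real sensor feed:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import (DoubleType, StringType, StructField,
                               StructType, TimestampType)

# Simplified stand-in for the real sensor log schema.
schema = StructType([
    StructField("flight_id", StringType()),
    StructField("ts", TimestampType()),
    StructField("altitude_m", DoubleType()),
    StructField("speed_mps", DoubleType()),
    StructField("lat", DoubleType()),
    StructField("lon", DoubleType()),
])

spark = SparkSession.builder.appName("drone-flight-logs").getOrCreate()
flights = spark.read.schema(schema).json("data/drone_logs/*.json")  # hypothetical path

# Per-flight summary of the kind surfaced in the Tableau dashboards.
flights.groupBy("flight_id").agg(
    F.max("altitude_m").alias("max_altitude_m"),
    F.avg("speed_mps").alias("avg_speed_mps"),
).show()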

● Processed and refined datasets in a central data repository using Python and OpenRefine to standardize sensor readings and ensure data quality for downstream reporting. Integrated results into Tableau, creating interactive dashboards to highlight flight trends, time-based patterns, and performance insights.

Automated Data Engineer    November 2021 – June 2022

Inventive Core Integrations, Hyderabad, India

● Worked on designing and building a sensor-driven pipeline-crawler robotic system for pipeline inspections, improving accuracy and efficiency by 25%. Developed Python scripts to automate data collection and control the crawler's locomotion, reducing manual effort and ensuring smooth navigation.

● Built ETL pipelines using Python and Embedded C to process and analyze real-time data from robotic sensors, and used OpenCV image processing to enhance pipeline inspection accuracy, enabling better detection of structural defects and anomalies. Used AWS S3, Glue, and Redshift for data storage, transformation, and integration, improving system reliability by 30%. Applied a Random Forest model for anomaly detection to support predictive maintenance and prevent failures.
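
An illustrative sketch, on synthetic data, of training a Random Forest classifier to separate normal from anomalous sensor readings, as in the predictive-maintenance step above:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic features: vibration amplitude and wall-thickness reading.
normal = rng.normal(loc=[1.0, 8.0], scale=0.2, size=(500, 2))
faulty = rng.normal(loc=[2.5, 5.0], scale=0.4, size=(50, 2))
X = np.vstack([normal, faulty])
y = np.array([0] * 500 + [1] * 50)  # 1 = anomaly

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(f"holdout accuracy: {clf.score(X_test, y_test):.2f}")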

SKILLS

● Data Engineering: Data Preprocessing, Data Transformations, Data Pipeline Optimization, Workflow Automation, Data Warehousing, ETL/ELT workflows, Debugging.

● AWS & Big Data: S3, Lambda, Glue, Redshift, EMR, Amazon RDS, EC2, SNS/SQS, CloudWatch & CloudFormation for deployments, data movement across AWS services, Apache Spark, Apache Hadoop

● Programming: SQL, Python, PySpark, R, NumPy, Pandas, Embedded C

● Analytics & Reporting: Power BI, Tableau, Cross-functional Collaboration, KPI Reporting, AWS QuickSight

● Data Modeling: Star, Snowflake, Fact & Dimension, ER Modeling

● Generative AI, ML & Statistical Methods: Prompt Engineering, LLM API Usage, RAG, Regression/Classification Models, Scikit-learn, Random Forest, K-Means Clustering, Exploratory Data Analysis (EDA), ANOVA, Chi-Squared Test, Multivariate Analysis, A/B Testing, Hypothesis Testing

● Tools & Collaboration: Git, JIRA, Agile Methodologies, VS Code, OpenRefine, IBM SPSS Statistics, OpenCV, MS Excel

CERTIFICATIONS

• Oracle Cloud Infrastructure 2024 Generative AI Certified Professional

• Build Your Generative AI Productivity Skills with Microsoft and LinkedIn

• AWS Certified Cloud Practitioner

EDUCATION

• Master of Science (Data Science) – University of North Texas, USA Jan 2023 – May 2024

• Bachelor of Technology (Electronics and Communication Engineering) – Sreyas Institute of Engineering & Technology July 2018 – June 2022


