Vishnu Vardhan Mandula
Data Engineer
********************@*****.*** 551-***-**** LinkedIn Portfolio

PROFESSIONAL SUMMARY
Data Engineer with 5+ years of experience designing, developing, and optimizing large-scale data acquisition systems, ETL pipelines, and database solutions. Proven track record in building real-time ingestion workflows, dynamic web scrapers, and content enrichment pipelines using Python, Spark, and AWS. Skilled in database design, SQL/NoSQL optimization, and delivering high-quality datasets to stakeholders under tight deadlines. Passionate about leveraging cutting-edge technologies to improve data speed, accuracy, and accessibility in fast-paced, high-volume environments.

TECHNICAL SKILLS
Programming languages: Python 3 (Pandas, NumPy, Matplotlib), Java, HTML, CSS, SQL, Spark, Scala, Bash
Big Data: Hadoop, Hive, Elasticsearch, Redis, PostgreSQL, MongoDB, MySQL
Distributed systems: Apache Spark, Databricks, Kubernetes, Kafka
AWS/Azure cloud: EC2, S3, EMR, Airflow, Lambda, Athena, Glue, IAM, Redshift, DynamoDB, CloudWatch, SageMaker, Kinesis, Azure SQL Database, Azure Load Balancer, DevOps tool integrations
Other: Docker, Git, Kibana, Flask, PyTorch, Salesforce, Tableau, Power BI, MetaTrader 4, Jira, Terraform, Grafana

PROFESSIONAL EXPERIENCE
Senior Data Engineer, Freddie Mac June 2023 – Present Jersey City, NJ
• Built a data lake in Amazon S3 using Python, R, and Spark for the client; imported and integrated data from multiple sources, including Snowflake tables, a CRM Postgres DB, Salesforce, MySQL Server, and Amazon RDS.
• Implemented automated regression scripts for ETL processes across AWS Redshift, MongoDB, and SQL Server, contributing to a 15% increase in data reliability for over 100 business users relying on accurate analytics.
• Built and maintained real-time data acquisition and ingestion pipelines in AWS + Databricks, processing 100GB+ of multi-source data daily from Salesforce, Snowflake, and SQL Server.
• Transformed vast sets of financial temporal data into actionable insights by leveraging AWS S3, EMR, Athena, HDFS, Databricks, and Apache Spark, resulting in a 30% increase in data processing efficiency.
• Increased storage capacity by 50% and improved latency 2.5x with Amazon Aurora databases and DynamoDB, implementing real-time data processing solutions with AWS Kinesis and AWS Lambda.
• Optimized the performance of AWS-hosted applications using CloudWatch monitoring, reducing error rates by 10%; migrated the company's entire workload to AWS using EC2 and S3 for efficient scaling, improving efficiency by 40%.
• Evaluated performance of business requirements, performed data segmentation, integrated customer data into emails, enforced compliance approvals, and analyzed customer engagement using Databricks, Snowflake, PySpark, and AWS S3, enhancing customer insights and compliance adherence.
• Improved data pipeline efficiency by 30% by managing a cloud-based data manipulation pipeline using SQL, Python, and DBT. Optimized data transformation and integration across banking applications, leveraging CloudFormation and Jenkins for streamlined CI/CD deployment.
• Created and optimized SQL queries for robust data reporting and visualization in Looker, enhancing the accuracy and accessibility of financial reports for stakeholders.
• Coordinated OLTP and OLAP processes by collaborating with cross-functional teams, achieving a 20% increase in data consistency, enhancing strategic planning, and accelerating decision-making across the organization by 25%.
• Streamlined data access protocols within MongoDB and HBase structures for better performance and scalability, ensuring efficient handling of large datasets and high query throughput.
• Exported the analyzed data to relational databases using Amazon EMR for visualization and generated reports using Amazon QuickSight; utilized Apache Airflow pipelines to automate DAGs, their dependencies, and log files.
Data Analyst Engineer, Thryve Digital Health LLP Jun 2020 – Aug 2022 Hyderabad, India
• Developed predictive models using regression in collaboration with the healthcare analytics team, using Python, AWS SageMaker, EC2, and S3, which improved analysis of total charges and length of stay for patients with COVID-19 and mental illness.
• Organized medical records in the filing system, helping decrease outpatient wait time by 13.2%, while adhering to Agile principles and delivering projects on time.
• Maintained data pipeline uptime of 99.8% while ingesting streaming and transactional data across 8 primary data sources using Spark, Redshift, S3, and Python.
• Managed on-premises data infrastructure including Apache Kafka, Apache Spark, Elasticsearch, Redis, and MongoDB, using Docker containers, which enhanced system reliability and performance.
• Constructed MapReduce jobs to validate, clean, and access data and worked with Sqoop jobs with incremental load to populate and load into Hive External tables, improving data accuracy and accessibility.
• Executed in-depth data analysis and transformation processes, achieving compliance with industry regulations while improving the accuracy of predictive models by 25%, and enhancing stakeholder decision-making capabilities.
• Utilized data warehousing techniques such as Star Schema, Snowflake Schema, normalization, denormalization, transformations, and aggregation to streamline data processes, improving data accessibility and analysis.
• Collaborated with ten cross-functional teams to identify key data management requirements, resulting in the deployment of a system supporting over 100 data entry points and improving operational scalability, using Oracle Database for transactional data management.
• Engineered scalable NoSQL databases using MongoDB and HBase to manage large volumes of unstructured data, enhancing data retrieval speeds and storage efficiency.
• Built user-friendly dashboards analyzing over 1M data points using ggplot, Matplotlib, Power BI, and Tableau to surface important features and model performance.
• Created a centralized repository using Git and GitHub for medical project data management, ensuring better collaboration and code management within the team.
Software Engineer- Data Crawling & Analytics, Cliff.AI Dec 2019 – May 2020 Hyderabad, India
• Led a project to streamline data extraction by organizing web scraping scripts using Python libraries, Java, R, and SQL, ensuring ethical data usage; reduced manual data collection time by 25 hours monthly, increasing team productivity.
• Designed and developed dynamic web scraping jobs using Python (Scrapy, Beautiful Soup), XPath, and asynchronous requests to acquire structured and unstructured business data from diverse web sources and APIs.
• Implemented content acquisition, enrichment, and classification pipelines, improving data freshness and coverage while reducing processing time by 40%.
• Transformed raw data into usable formats using Pandas and NumPy, improving dataset quality for further analysis; automated extraction, transformation, and loading operations with Java-based ETL processes using J2EE and NoSQL technologies, reducing manual workload by 29% monthly.
• Ingested data from disparate data sources using a combination of SQL, the Google Analytics API, and the Salesforce API in Python to create data views in BI tools like Tableau.
• Designed and maintained robust data pipelines for annuity computations and risk analysis using SQL, JSON, and XML formats.
• Automated data extraction and processing workflows using VBA, ensuring seamless integration of diverse data sources into predictive modeling frameworks.
• Led the implementation of RESTful API integrations for seamless data exchange between domain systems, leveraging JSON and XML formats for data transfer.
• Contributed to developing BI tooling solutions, integrating Power BI dashboards with existing data infrastructure and using Tableau to create and maintain data visualizations.
• Collaborated with cross-functional teams, including data scientists and application developers, to guide the development and implementation of cloud applications, systems, and processes using DevOps methodologies.

EDUCATION
Master’s in Computer Science, Stevens Institute of Technology Aug 2022 – Dec 2023 Hoboken, NJ
Teaching Assistant - Instructed students in the use of R, Python, SQL, Excel, and Tableau; guided students in the analysis and modeling of their Confidential project datasets.
Bachelor’s in Computer Science, JNTUH Hyderabad, India