Facebook Data Pipeline using ChatGPT (for Knok’d)
AI-Driven Social Media Data Pipeline for Real Estate Listings (Knok’d)
PROBLEM
STATEMENT
The client was building a platform to list rental and selling properties and needed help with his data needs. They wanted to extract data from different social media channels, like FB groups, and ensure that all the data coming to their platform is cleaned and transformed according to industry standards.
SOLUTION
I started by analyzing the sources to finalize the scope of data extraction. Once they were confirmed, I implemented a Python script to monitor and fetch the required information from all the FB groups. The script was designed to store textual data in PostgreSQL while the images were being saved in GCP.
The raw data was then cleaned before being passed to ChatGPT (via OpenAI API). A set of specified prompts was then used to transform the data according to the required structure. The data returned by GPT was then pushed to the frontend of the client’s platform.
Input
A list of FB groups and other sources to extract data
Output
The data (posts, moderators, subreddit metadata) for all the subreddits was stored in BigQuery tables
Tools &
Technologies
Python (Scrapy)
ChatGPT (OpenAI API)
GCP