Facebook Data Pipeline using ChatGPT (for Knok’d)

AI-Driven Social Media Data Pipeline for Real Estate Listings (Knok’d)

PROBLEM

STATEMENT

The client was building a platform to list rental and selling properties and needed help with his data needs. They wanted to extract data from different social media channels, like FB groups, and ensure that all the data coming to their platform is cleaned and transformed according to industry standards.

SOLUTION

I started by analyzing the sources to finalize the scope of data extraction. Once they were confirmed, I implemented a Python script to monitor and fetch the required information from all the FB groups. The script was designed to store textual data in PostgreSQL while the images were being saved in GCP. 

The raw data was then cleaned before being passed to ChatGPT (via OpenAI API). A set of specified prompts was then used to transform the data according to the required structure. The data returned by GPT was then pushed to the frontend of the client’s platform.

Input

A list of FB groups and other sources to extract data

Output

The data (posts, moderators, subreddit metadata) for all the subreddits was stored in BigQuery tables

Tools &
Technologies

Facebook Data Pipeline using ChatGPT (for Knok’d)

Python (Scrapy)

Facebook Data Pipeline using ChatGPT (for Knok’d)

ChatGPT (OpenAI API)

Facebook Data Pipeline using ChatGPT (for Knok’d)

GCP

cronjob.png

Cronjob

Wikipedia Scraping

Scroll to Top

01. Home

02. Portfolio

03. Services

04. About

05. Blog

Office

Contact

Follow us