Instagram Flyer Finder
AI-Powered Instagram Scraper for Automated Event Detection & De-duplication
PROBLEM STATEMENT
The client needed an Instagram scraper that scans images from the posts and stories of selected Instagram handles and identifies which of them are event posters or flyers. Each image had to be analyzed by an AI tool, such as the Google Vision API or ChatGPT (OpenAI API), to extract the flyer's details (event name, event date, etc.).
Based on these details, the system needed to identify upcoming events and save them to a database. The client also wanted the scraper to skip past events and to handle duplicates, since different users can post about the same event.
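The two filtering requirements above (skip past events, collapse duplicates posted by different handles) can be illustrated with a minimal sketch. The `Event` fields, the `dedup_key` normalization rule, and the choice to treat "same normalized name on the same date" as a duplicate are all assumptions made for illustration; the actual matching rule used in the project is not described here.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    # Hypothetical record shape; the field names are assumptions.
    name: str
    event_date: date
    source_handle: str


def dedup_key(event: Event) -> tuple[str, date]:
    """Build a comparison key that ignores case, punctuation, and spacing."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", event.name.lower()).strip()
    return (cleaned, event.event_date)


def upcoming_unique(events: list[Event], today: date) -> list[Event]:
    """Keep only events on or after `today`, first occurrence per key."""
    seen: set[tuple[str, date]] = set()
    kept: list[Event] = []
    for event in events:
        if event.event_date < today:
            continue  # past event, skip it
        key = dedup_key(event)
        if key in seen:
            continue  # same event already seen from another handle
        seen.add(key)
        kept.append(event)
    return kept
```

A stricter or fuzzier key (for example, including the venue) may be needed in practice, since flyers for the same event can word the title differently.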
SOLUTION
I started by identifying an API (Rocket API) that could provide all the required Instagram data. Then, I implemented a Python script to retrieve images based on the criteria specified by the client. The scraper was optimized to store only the images that meet those criteria.
To achieve this, each image was passed to the OpenAI API (along with a specific prompt) to confirm that it met the criteria before it was uploaded to the S3 bucket. The details of each stored image were then saved in a MySQL database. Images that failed to meet the criteria were discarded on the spot (instead of being saved first and deleted later). This optimization reduced both storage costs and API credit usage.
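The classify-before-store flow described above could be sketched roughly as follows. This is an illustrative outline, not the project's actual code: `process_images` and its three callables (`classify`, `upload`, `save_record`) are hypothetical names, standing in for the OpenAI API call, the S3 upload, and the MySQL insert respectively.

```python
from typing import Callable, Iterable, Optional


def process_images(
    images: Iterable[tuple[str, bytes]],
    classify: Callable[[bytes], Optional[dict]],
    upload: Callable[[str, bytes], str],
    save_record: Callable[[dict], None],
) -> int:
    """Store only images the classifier accepts; return how many were stored.

    classify    -> returns extracted event details, or None to reject
    upload      -> stores the image bytes and returns its location
    save_record -> persists the details row
    (All three are injected placeholders, assumed for this sketch.)
    """
    stored = 0
    for image_id, data in images:
        details = classify(data)
        if details is None:
            # Rejected images are dropped here and never uploaded.
            continue
        location = upload(image_id, data)
        save_record({**details, "image_id": image_id, "location": location})
        stored += 1
    return stored
```

Passing the external services in as callables keeps the filtering logic testable without network access; in a real run they would wrap the corresponding API clients.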
Input
List of Instagram handles
Output
The details of all upcoming events posted by the given Instagram handles, with the flyer images saved in S3 and the event details saved in a MySQL database.
Tools & Technologies

Python

Rocket API

OpenAI API (ChatGPT)

AWS S3

MySQL
