Reddit Collector

Scrape all posts made under a specific subreddit.

Reddit to MongoDB data flow via JSON

PROBLEM STATEMENT

Our client wanted to download all data (including post text and media) on demand for any given subreddit, for use in data analysis and machine learning. The app had to be fast and reliable so that large volumes of data could be collected without gaps.

SOLUTION

The app crawled all posts made under a specific subreddit within a given time period. For each post scraped, the extracted data includes the title, timestamp, post text, permalink, category, votes, media type (audio/video) and the media itself. The textual data was saved in a MongoDB database, while the media files were saved on the client’s server.
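As an illustration of the extraction step described above, here is a minimal sketch of how one post from Reddit's JSON listing might be mapped to the record stored in the database. It is not the project's actual code. The function names `post_to_document` and `in_period` are hypothetical, the input keys (`title`, `created_utc`, `selftext`, `permalink`, `link_flair_text`, `score`, `is_video`, `media`) are taken from Reddit's public JSON output, and treating the post flair as the "category" is an assumption. Audio handling is left out.

```python
from datetime import datetime, timezone


def post_to_document(post: dict) -> dict:
    """Map one Reddit post (the `data` object of a listing child) to a flat record."""
    # `media` is null for text-only posts, so fall back to an empty dict.
    media = post.get("media") or {}
    video_url = media.get("reddit_video", {}).get("fallback_url")
    is_video = bool(post.get("is_video")) and video_url is not None
    return {
        "title": post.get("title", ""),
        # Reddit reports creation time as seconds since the Unix epoch (UTC).
        "timestamp": datetime.fromtimestamp(post["created_utc"], tz=timezone.utc),
        "post_text": post.get("selftext", ""),
        "permalink": "https://www.reddit.com" + post.get("permalink", ""),
        # Assumption: the post flair stands in for the "category" field.
        "category": post.get("link_flair_text"),
        "votes": post.get("score", 0),
        "media_type": "video" if is_video else None,
        "media_url": video_url if is_video else None,
    }


def in_period(doc: dict, start: datetime, end: datetime) -> bool:
    """Return True if the record's timestamp falls in the half-open range [start, end)."""
    return start <= doc["timestamp"] < end
```

Each resulting dictionary could then be handed to a MongoDB driver for storage, with the file behind `media_url` downloaded separately to the server's disk.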

Input

No of inputs:

Output

The extracted data was saved as follows:

Tools & Technologies

Python (Scrapy)

MongoDB

Cronjob
