Reddit Collector

Scrape all posts made under a specific subreddit.

PROBLEM STATEMENT

Our client wanted to download all data (including post text and media) on demand for any given subreddit, for use in data analysis and machine learning. The app needed to be fast and reliable so that large amounts of data could be collected without gaps.

SOLUTION

The app crawled all posts made under a specific subreddit within a given time period. The extracted data included the title, timestamp, post text, permalink, category, votes, media type (audio/video), and media of each scraped post. The textual data was saved in MongoDB, while the media files were saved on the client's server.
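As a rough illustration of the extraction step, the sketch below maps raw post dictionaries to the fields listed above and keeps only those inside a time window. It is not the project's actual code. The function names `to_record` and `posts_in_period` are hypothetical. The input keys (`created_utc`, `selftext`, `link_flair_text`, `score`, `is_video`, `url`) are assumed to follow Reddit's public listing JSON, and using the flair as the "category" is a guess.

```python
from datetime import datetime, timezone


def to_record(post):
    """Map one raw post dict to the fields named in the write-up.

    The input key names are assumptions based on Reddit's listing JSON.
    """
    is_video = bool(post.get("is_video"))
    return {
        "title": post.get("title", ""),
        "timestamp": datetime.fromtimestamp(post["created_utc"], tz=timezone.utc),
        "text": post.get("selftext", ""),
        "permalink": post.get("permalink", ""),
        "category": post.get("link_flair_text"),  # assumption: flair as category
        "votes": post.get("score", 0),
        "media_type": "video" if is_video else None,
        "media_url": post.get("url") if is_video else None,
    }


def posts_in_period(posts, start, end):
    """Return records for posts created in the half-open window [start, end)."""
    records = (to_record(p) for p in posts)
    return [r for r in records if start <= r["timestamp"] < end]


if __name__ == "__main__":
    sample = [
        {"title": "A", "created_utc": 1700000000, "selftext": "hello",
         "permalink": "/r/example/comments/1/a/", "score": 12},
        {"title": "B", "created_utc": 1600000000, "selftext": "old post",
         "permalink": "/r/example/comments/2/b/", "score": 3},
    ]
    start = datetime(2023, 11, 1, tzinfo=timezone.utc)
    end = datetime(2023, 12, 1, tzinfo=timezone.utc)
    for record in posts_in_period(sample, start, end):
        print(record["title"], record["timestamp"].isoformat())
```

In a setup like the one described, each resulting dictionary could then be written to a MongoDB collection (for example with PyMongo's `insert_many`), while the media URL would be downloaded separately to file storage.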

Input

No of inputs:

Output

The extracted data was saved as follows:

Tools & Technologies

Python (Scrapy)

MongoDB

Cronjob
