Scraping Reddit for Healthcare Information

Sadrach Pierre, Ph.D.
3 min read · Nov 21, 2019
Photo by Gustavo Fring on Pexels

In this post, we will use a Python package called PRAW to scrape Reddit for healthcare information. A basic knowledge of Python syntax is required.

First, install PRAW:

pip install praw

Next, you need to register an application of the appropriate type here:

Once you are redirected to the registration page, sign up by creating a username and password.

Next, you must create a new app:

Upon creating your app, you should get your client_id and secret key.
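Because the client_id and secret key grant access to your Reddit account, it can be worth keeping them out of the script itself. Here is a minimal sketch, assuming the credentials are stored in environment variables. The variable names and the `load_reddit_credentials` helper are hypothetical, not part of PRAW.

```python
import os

# Hypothetical environment variable names; any naming scheme works.
REQUIRED_VARS = {
    "client_id": "REDDIT_CLIENT_ID",
    "client_secret": "REDDIT_CLIENT_SECRET",
    "username": "REDDIT_USERNAME",
    "password": "REDDIT_PASSWORD",
}


def load_reddit_credentials(env=None):
    """Return a dict of credentials read from environment variables.

    Raises KeyError naming every variable that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in REQUIRED_VARS.items()}
```

The returned dict uses the same keyword names that `praw.Reddit` accepts, so it can be unpacked into the call with `**load_reddit_credentials()` alongside a `user_agent` argument.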

Next, let’s import the necessary packages:

import praw
import pandas as pd

Then we create an authenticated Reddit instance:

r = praw.Reddit(
    client_id='AbCd1234!',       # placeholder; use your own client_id
    client_secret='4321dCbA!',   # placeholder; use your own secret key
    username='username101',
    password='password101!',
    user_agent='someagentinfo',
)
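The user_agent string identifies your script to Reddit. Reddit's API guidelines suggest a descriptive value of the form `<platform>:<app ID>:<version string> (by /u/<reddit username>)`. The sketch below builds such a string. The helper name, app ID, and version are made up for illustration.

```python
def build_user_agent(platform, app_id, version, reddit_username):
    """Format a user agent as '<platform>:<app ID>:<version> (by /u/<username>)'."""
    return f"{platform}:{app_id}:{version} (by /u/{reddit_username})"


# Hypothetical app ID and version; the username matches the placeholder above.
user_agent = build_user_agent("python", "healthcare-scraper", "v0.1", "username101")
print(user_agent)  # python:healthcare-scraper:v0.1 (by /u/username101)
```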

Now let’s try pulling some posts related to health insurance:

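subreddit = r.subreddit('healthinsurance')
# Assumption: iterate over the 10 hottest posts; .new() or .top() also work here.
for submission in subreddit.hot(limit=10):
    op_text = submission.selftext.lower()
    print(op_text)

This prints the lowercased body text of each post.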
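Printing is fine for a quick look, but to keep the posts for analysis it helps to collect them into records that pandas can load. This is a minimal sketch, not a definitive implementation. It relies only on the `title`, `selftext`, `score`, and `created_utc` attributes that PRAW submissions expose. The `submissions_to_records` helper is hypothetical, and the `fake_posts` objects are invented stand-ins so the example runs without a Reddit connection.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def submissions_to_records(submissions):
    """Convert an iterable of submission-like objects into a list of dicts."""
    records = []
    for submission in submissions:
        records.append({
            "title": submission.title,
            "text": submission.selftext.lower(),
            "score": submission.score,
            # created_utc is a Unix timestamp; convert it to an aware datetime.
            "created": datetime.fromtimestamp(submission.created_utc, tz=timezone.utc),
        })
    return records


# Invented stand-in objects, not real Reddit data.
fake_posts = [
    SimpleNamespace(title="Question", selftext="Is my PLAN any good?", score=3, created_utc=0),
]
records = submissions_to_records(fake_posts)
print(records[0]["text"])  # is my plan any good?
```

With a live connection, the same helper can be fed a listing directly, for example `pd.DataFrame(submissions_to_records(subreddit.hot(limit=10)))`.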

We can also pull posts about a specific health insurer. For Aetna, we have:

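subreddit = r.subreddit('aetna')
# Assumption: iterate over the 10 hottest posts; .new() or .top() also work here.
for submission in subreddit.hot(limit=10):
    op_text = submission.selftext.lower()
    print(op_text)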
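Lowercasing the post text makes case-insensitive keyword matching straightforward. The sketch below shows one way to keep only posts that mention certain terms. The `mentions_any` helper and the keyword list are hypothetical, and the sample strings are invented for illustration rather than taken from Reddit.

```python
def mentions_any(text, keywords):
    """Return True if the text contains at least one keyword, ignoring case."""
    text = text.lower()
    return any(keyword.lower() in text for keyword in keywords)


# Hypothetical keywords of interest.
KEYWORDS = ["claim", "deductible", "prior authorization"]

# Invented sample strings standing in for submission.selftext values.
sample_posts = [
    "My CLAIM was denied last week.",
    "Looking for a new primary care doctor.",
]
matches = [post for post in sample_posts if mentions_any(post, KEYWORDS)]
print(matches)  # ['My CLAIM was denied last week.']
```

This is plain substring matching, so "claim" would also match "disclaimer". A regular expression with word boundaries would be stricter.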

Writer for Built In & Towards Data Science. Cornell University Ph.D. in Chemical Physics.