Scraping Reddit for Healthcare Information
In this post, we will use a Python package called PRAW (the Python Reddit API Wrapper) to scrape Reddit for healthcare information. A basic knowledge of Python syntax is required.
First, install PRAW:
pip install praw
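Next, you need to register an application on Reddit's app preferences page. Choose the "script" type, since the code in this post authenticates with a username and password.
If you are redirected to the registration page first, sign up by creating a username and password.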
Next, you must create a new app:
Upon creating your app, you should get your client_id and secret key.
Next, let’s import the necessary packages:
import praw
import pandas as pd
Then we create an authenticated Reddit instance:
# Replace these placeholder values with your own credentials
r = praw.Reddit(client_id='AbCd1234!',
                client_secret='4321dCbA!',
                username='username101',
                password='password101!',
                user_agent='someagentinfo')
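Hard-coding credentials makes it easy to leak them when sharing a script. As a minimal sketch, the same keyword arguments could be read from environment variables instead. The variable names (REDDIT_CLIENT_ID and so on) and both helper functions are hypothetical, chosen here only for illustration:

```python
import os

# Hypothetical variable names; any scheme works as long as it matches your shell setup.
ENV_KEYS = {
    "client_id": "REDDIT_CLIENT_ID",
    "client_secret": "REDDIT_CLIENT_SECRET",
    "username": "REDDIT_USERNAME",
    "password": "REDDIT_PASSWORD",
    "user_agent": "REDDIT_USER_AGENT",
}


def load_reddit_credentials(env=os.environ):
    """Build the keyword arguments for praw.Reddit from environment variables."""
    missing = [name for name in ENV_KEYS.values() if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {arg: env[name] for arg, name in ENV_KEYS.items()}


def make_reddit(env=os.environ):
    """Create the authenticated instance (praw is imported lazily, only when needed)."""
    import praw
    return praw.Reddit(**load_reddit_credentials(env))
```

With the variables exported in your shell, r = make_reddit() would stand in for the hard-coded call above.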
Now let’s try pulling some posts related to health insurance:
# Print the body text of the 10 hottest posts in r/healthinsurance
subreddit = r.subreddit('healthinsurance')
for submission in subreddit.hot(limit=10):
    op_text = submission.selftext.lower()
    print(op_text)
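This prints the lowercased body text of each of the ten posts. Link-only posts have an empty selftext, so some lines may be blank.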
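Printing is fine for a quick look, but for analysis it helps to keep each post as a structured record. The sketch below is one possible approach: the helper name submissions_to_records and the choice of fields are assumptions, and the stand-in object with made-up values exists only so the example runs without a Reddit connection.

```python
from types import SimpleNamespace


def submissions_to_records(submissions):
    """Turn PRAW submissions into plain dicts, one per post."""
    records = []
    for submission in submissions:
        records.append({
            "id": submission.id,
            "title": submission.title,
            "text": submission.selftext.lower(),
            "score": submission.score,
            "num_comments": submission.num_comments,
        })
    return records


# Stand-in for a real submission, with made-up values for illustration only.
example = SimpleNamespace(id="abc123", title="Example title",
                          selftext="Example BODY", score=5, num_comments=2)
records = submissions_to_records([example])
```

With a real instance, you would pass r.subreddit('healthinsurance').hot(limit=10) instead of the stand-in list, and pd.DataFrame(records) would then give a table with one row per post.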
We can also pull posts about a specific insurer. For Aetna, we have:
# Print the body text of the 10 hottest posts in r/aetna
subreddit = r.subreddit('aetna')
for submission in subreddit.hot(limit=10):
    op_text = submission.selftext.lower()
    print(op_text)
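Since the two loops above differ only in the subreddit name, they could be folded into one helper. This is a rough sketch rather than a definitive implementation: hot_post_texts is a hypothetical name, and skipping posts with an empty body is a design choice made here, not something the original code does.

```python
def hot_post_texts(reddit, subreddit_names, limit=10):
    """Map each subreddit name to the lowercased bodies of its hot text posts."""
    texts = {}
    for name in subreddit_names:
        bodies = []
        for submission in reddit.subreddit(name).hot(limit=limit):
            body = submission.selftext.strip().lower()
            if body:  # link-only posts have an empty selftext, so skip them
                bodies.append(body)
        texts[name] = bodies
    return texts
```

Calling hot_post_texts(r, ['healthinsurance', 'aetna']) would return a dictionary keyed by subreddit name, which makes it easy to add more insurers later.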