Chris Malec

Data Scientist

The Stardew Valley Grange Display - Basic Web Scraping

Lately, I’ve been pretty obsessed with Stardew Valley. It’s a lovely little farming game, with a lot more to do than grow crops. For the next couple of posts, I’d like to focus on an event that happens every fall, the Stardew Valley Fair. Specifically, you have the option of creating a Grange Display, and winning gets you 1000 star points. The star points aren’t the most useful thing in the world, but like many things in the game, that’s hardly the point. I’d just like to figure out how to get a perfect score with the least amount of work.

Step one, which I’ll describe in this post, is some super basic web scraping of the Stardew Valley Wiki. My goal is to grab the tables on the page about the fair to find the point values for the different items you can put in the Grange Display. I’ll use the requests package to get the web page and parse the HTML using Beautiful Soup. The end result should be a pandas DataFrame.

#import libraries
import requests
from bs4 import BeautifulSoup
from IPython.display import display, HTML
import pandas as pd
#get html from stardew wiki
r = requests.get('https://stardewvalleywiki.com/Stardew_Valley_Fair')
r.raise_for_status() #stop early if the request failed
html_text = r.text
soup = BeautifulSoup(html_text, 'html.parser')

Beautiful Soup turns the HTML page into an object that can be accessed in a much more Pythonic way. For example, I can find every table on the page that has the ‘wikitable’ class. Getting the text out of each cell requires a slightly different procedure depending on whether the cell holds a single string, multiple strings, or an image. Cells that have both an image and text are read as text only, since the image is usually a ‘gold’ or ‘star point’ icon. It would be possible to reproduce these cells, but it would be more work and doesn’t really help me.
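
To make those three cases concrete, here is a minimal sketch on a tiny hand-written table. The HTML is invented for illustration and is not copied from the wiki; it only shows what .string, .stripped_strings, and .img give back for each kind of cell.

```python
from bs4 import BeautifulSoup

# Toy HTML invented for illustration; it is not copied from the wiki.
html = """
<table class="wikitable">
  <tr><th>Item</th><th>Price</th><th>Image</th></tr>
  <tr>
    <td>Honey</td>
    <td><img alt="Gold.png" src="/gold.png"> 100g</td>
    <td><img alt="Fedora.png" src="/fedora.png"></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find(class_="wikitable")
cells = table.find_all("td")

# Case 1: a single string child, so .string returns it directly.
single = cells[0].string.strip()

# Case 2: an image plus text. .string is None because the tag has more
# than one child, so the text is collected from .stripped_strings instead.
mixed_string = cells[1].string
mixed = "".join(cells[1].stripped_strings)

# Case 3: only an image. There is no text at all, so the <img> tag
# itself is the useful part of the cell.
image_only_text = "".join(cells[2].stripped_strings)
image_alt = cells[2].img["alt"]

print(single, mixed_string, mixed, repr(image_only_text), image_alt)
```

The full parser for the real wiki tables follows the same three branches.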

#find all the tags that are part of the wikitable css class
tables = soup.find_all(class_="wikitable")
#a little helper function for when there are several strings within the same tag
def join_strings(tag):
    return ''.join(tag.stripped_strings)
#function to parse the table into a pandas dataframe
#optionally searches for the heading to add a 'category' column to the dataframe
def parse_table(table, get_heading = True):
    if get_heading:
        table_title = [join_strings(table.find_previous(class_='mw-collapsible').th)]
    else:
        table_title = []
    row = table.tr #the first row has several 'th' tags, this is pretty general for tables on this site
    headers = [join_strings(elem) for elem in row.find_all("th")] #join_strings also copes with headers that contain nested tags
    #print(headers)
    records = []
    row = row.next_sibling
    while row:
        if row.string:
            row = row.next_sibling #prevents reading in rows that are just a newline without columns
            continue
        else:
            column = row.td #find first column
        record = []
        while column:
            if column.string:
                cell_text = column.string.strip() #get string within the td tag
            else:
                cell_text = join_strings(column) #get string when there are multiple strings in a tag
            try:
                image = column.img
                #Don't want the star token and gold images
                if image.attrs['alt'] not in ('Token.png', 'Gold.png'):
                    if not image.attrs['src'].startswith('http'):
                        image.attrs['src'] = 'https://stardewvalleywiki.com'+image.attrs['src'] #get full location
                    image = str(image) #Needs to be a string to be resolved in DataFrame
                else:
                    image = None
            except (AttributeError, KeyError): #no img tag in this cell, or the tag lacks alt/src
                image = None
            
            if cell_text != '':
                record.append(cell_text) #adds column to row record if it isn't empty
            elif image:
                record.append(image) #adds column to row record if it's an image (and just an image)
            column = column.next_sibling #goes to the next column
        records.append(tuple(record+table_title)) #adds row to the list of records
        #print(records)
        row = row.next_sibling #goes to the next row
    if get_heading:
        columns = headers+['category'] #adds category column if we are finding the table heading
    else:
        columns = headers #just uses the column names found in th tags
    table_df = pd.DataFrame.from_records(data=records,columns = columns).dropna() #creates a pandas dataframe from a list of records
    return table_df
def display_table(table_df):
    return display(HTML(table_df.to_html(escape=False,index=False))) #helper function to make it display like I want
dfs = []
for table in tables[3:11]:
    dfs.append(parse_table(table))#Get all the tables related to the grange display
grange_df = pd.concat(dfs,axis=0,ignore_index=True)#Concatenate the tables into one flat table
grange_df = grange_df.melt(id_vars = ["Item","category","Price"],
                             value_vars = ["Base","Silver","Gold","Iridium"],
                             var_name = "Quality",
                             value_name = "Points")#Melt the table so that the points are all in one column
display_table(grange_df)
| Item | category | Price | Quality | Points |
|---|---|---|---|---|
| Honey (Wild)* | Artisan Goods | 100g | Base | 3 |
| Jelly (Ancient Fruit) | Artisan Goods | 1,150g | Base | 6 |
| Jelly (Apple) | Artisan Goods | 250g | Base | 4 |
| Jelly (Apricot) | Artisan Goods | 150g | Base | 3 |
| Jelly (Banana) | Artisan Goods | 350g | Base | 5 |
| Jelly (Blackberry) | Artisan Goods | 90g | Base | 3 |
| Jelly (Blueberry) | Artisan Goods | 150g | Base | 3 |
| Jelly (Cactus Fruit) | Artisan Goods | 200g | Base | 4 |
| Jelly (Cherry) | Artisan Goods | 210g | Base | 4 |
| Jelly (Coconut) | Artisan Goods | 250g | Base | 4 |
| Jelly (Cranberries) | Artisan Goods | 200g | Base | 4 |
| Jelly (Crystal Fruit) | Artisan Goods | 350g | Base | 5 |
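
The melt step is the least obvious part of that code, so here is a small sketch of what it does on a toy table. The rows are shaped like the wide wiki tables, but the Silver values are invented for illustration.

```python
import pandas as pd

# Toy wide table: one row per item, one column per quality level.
# The Silver values are invented for illustration.
wide = pd.DataFrame({
    "Item": ["Jelly (Apple)", "Honey (Wild)"],
    "category": ["Artisan Goods", "Artisan Goods"],
    "Price": ["250g", "100g"],
    "Base": [4, 3],
    "Silver": [5, 4],
})

# id_vars are repeated on every output row. Each value_vars column turns
# into one row per item, with the column name stored in "Quality" and the
# cell value stored in "Points".
long = wide.melt(
    id_vars=["Item", "category", "Price"],
    value_vars=["Base", "Silver"],
    var_name="Quality",
    value_name="Points",
)

print(long)
```

Two items with two quality columns become four rows, which is why the real table above starts with a long run of Base rows before any other quality shows up.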

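And here’s an example of one of the tables that has images in it. For my goal of finding an easy perfect-score Grange Display, I won’t really need it, but it’s a nice thing to know how to do. The category column isn’t meaningful for this table: parse_table just picks up the nearest preceding collapsible heading, which belongs to one of the grange tables. Passing get_heading=False would leave it out.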

display_table(parse_table(tables[12]))
| Image | Name | Description | Price | category |
|---|---|---|---|---|
| Dried Sunflowers.png | Dried Sunflowers | Can be placed inside your house. | 100 | Vegetables |
| Fedora.png | Fedora | A city-slicker's standard. | 500 | Vegetables |
| Rarecrow 1.png | Rarecrow | Collect them all! (1 of 8) | 800 | Vegetables |
| Stardrop.png | Stardrop | A mysterious fruit that empowers those who eat it. The flavor is like a dream... a powerful personal experience, yet difficult to describe to others. | 2,000 | Vegetables |
| Light Green Rug.png | Light Green Rug | Can be placed inside your house. | 500 | Vegetables |
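
The images only show up because the parser turns relative src paths into full URLs. The parser does that by prepending the wiki host; as an alternative sketch, the standard library’s urljoin covers the same case plus a couple of others. The absolute_src helper and the example paths are hypothetical, made up for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "https://stardewvalleywiki.com"  # wiki host used in this post


def absolute_src(src, base=BASE_URL):
    """Return an absolute URL for an <img> src (hypothetical helper)."""
    # urljoin leaves absolute URLs untouched and resolves relative
    # ones against the base URL.
    return urljoin(base, src)


# The paths below are made up for illustration.
relative = absolute_src("/mediawiki/images/example.png")
absolute = absolute_src("https://example.com/pic.png")
protocol_relative = absolute_src("//example.com/pic.png")

print(relative)
print(absolute)
print(protocol_relative)
```

The string-prefix check in parse_table is fine for this one page; urljoin is just a bit more forgiving if the markup ever changes.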

And that’s it! In the next post, I’ll analyze my pandas dataframes.