22 Oct 2025
Another Numpy Neural Net
There are a number of resources out there on creating a neural net “from scratch.” I put “from scratch” in quotes because numpy is a lot of machinery to start with, but I think it’s a fair label because numpy code looks about as close to the abstract math that powers these things as you can get. One goal here is to demonstrate development fundamentals I’ve picked up as a scientist-turned-dev; another was to see just how much Copilot could do for me on a task that I know is well represented in the training set. The short answer is: quite a bit, but you really need to know what you’re doing or it’ll get you into trouble fast.
The Bedrock of Deep Learning: Differentiable Functions
A well-known but under-appreciated fact about most machine learning models is that the all-important loss and activation functions have to be differentiable. I could have made an Activation class and a Loss class first, but for pedagogical reasons, I started with a DifferentiableFunction class. The first subclass is of course our friend softmax. I just typed in the class names and tabbed my way to success on these, including type hints and signatures for the callables (highly recommended).
import numpy as np
from typing import Callable

class DifferentiableFunction:
    function: Callable[[np.ndarray], np.ndarray]
    derivative: Callable[[np.ndarray], np.ndarray]

    def __init__(self, function: Callable[[np.ndarray], np.ndarray], derivative: Callable[[np.ndarray], np.ndarray]):
        self.function = function
        self.derivative = derivative

class SoftMax(DifferentiableFunction):
    def __init__(self):
        def softmax(x: np.ndarray) -> np.ndarray:
            e_x = np.exp(x - np.max(x, axis=-1, keepdims=True))#subtract the max for numerical stability
            return e_x / e_x.sum(axis=-1, keepdims=True)

        def softmax_derivative(x: np.ndarray) -> np.ndarray:
            #elementwise s*(1-s) is the diagonal of the softmax Jacobian,
            #the usual shortcut when softmax is paired with cross-entropy
            s = softmax(x)
            return s * (1 - s)

        super().__init__(softmax, softmax_derivative)
The super() that appears in SoftMax calls the method from the parent class, so after defining the softmax function and its derivative, the class initializes an instance of DifferentiableFunction with those two functions.
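To see how the pattern extends, here’s a sketch of another subclass you could add the same way (ReLU isn’t part of this post’s code; its derivative is just a step function):

```python
class ReLU(DifferentiableFunction):
    def __init__(self):
        def relu(x: np.ndarray) -> np.ndarray:
            return np.maximum(x, 0)

        def relu_derivative(x: np.ndarray) -> np.ndarray:
            #step function: 1 where the input is positive, 0 elsewhere
            return (x > 0).astype(x.dtype)

        super().__init__(relu, relu_derivative)
```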
I also tried to get Copilot to write me some tests, which for this task were pretty all right. Testing numerical functions is very straightforward in that you can give them some inputs (though I should include 0) and check that they output what they should. Also, though I see plenty of devs with more years under their belt than me do this, I feel like it could have made a fixture to house the test data, since it’s going to be used over and over again. However, there is something to be said for having the data right there in the test.
import numpy as np
from DifferentiableFunction import DifferentiableFunction, SoftMax

def test_softmax_function():
    sm = SoftMax()
    x = np.array([1.0, 2.0, 3.0])
    result = sm.function(x)
    expected = np.exp(x - np.max(x)) / np.sum(np.exp(x - np.max(x)))
    np.testing.assert_allclose(result, expected, rtol=1e-6)

def test_softmax_derivative_shape():
    sm = SoftMax()
    x = np.array([1.0, 2.0, 3.0])
    deriv = sm.derivative(x)
    assert deriv.shape == x.shape
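For reference, here’s a sketch of the fixture version I had in mind, with the shared test input (now including 0) defined once:

```python
import numpy as np
import pytest
from DifferentiableFunction import SoftMax

@pytest.fixture
def sample_input():
    #shared test data, defined once instead of repeated in every test
    return np.array([0.0, 1.0, 2.0, 3.0])

def test_softmax_sums_to_one(sample_input):
    sm = SoftMax()
    np.testing.assert_allclose(sm.function(sample_input).sum(), 1.0, rtol=1e-6)
```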
Another note on the tests: this is an area where you need to pay for the better Copilot plan to get more than a tiny number of tests written for you, and I just might. Even though, as we will see later, the tests become less impressive as our tasks become more abstract, the fact that the computer did the gruntwork of writing some test functions (and later a test class), including throwing some sample data in there, is really nice. It needs to be done, it’s not hard to do, and I hate doing it, so fine Microsoft, take my money.
In the next section, we will make a Layer class!
20 Oct 2021
The IEX Order Book - Reading the Data
I recently gave a talk to the Python Frederick Meetup about Pandas and wanted to show some financial data, since that’s what I’ve been working with lately. Specifically, I wanted something that had real-time quotes and trades, but at best most sources give you the high, low, and median price aggregated over something like an hour or a day. There are a lot of sources of market data, and they have most definitely put a price on that data: a high price.
NYSE TAQ is about \$1,000/month for the first 12 months and \$500/month to go farther back in time. NASDAQ TotalView and similar products have a pretty complex fee structure, so I’m not entirely sure how much they cost. You can at least see the real-time data in your RobinHood (Gold account) or Charles Schwab account, but historical data’s going to cost you.
Which brings us to the Investor’s Exchange, or IEX. Featured in the Michael Lewis book Flash Boys, they do their best to offer access to the market with protections in place that favor long-term over short-term strategies, keeping in mind ‘long term’ might still mean only holding a stock for a millisecond. They’re pretty small compared to NYSE or NASDAQ, but they allow you to download pretty detailed historical data for free…as a dump of network packets.
What is an order book?
Why on earth do I want to see the order book in the first place? What is it? Every time a broker wants to make a trade, they put in an order (I want to buy 200 shares of stock A for \$100 each). Until the order is canceled, or a seller is found who is willing to meet the buyer at their price, the order ‘rests on the book.’ Most heavily traded stocks do not have orders that rest long, unless the number of shares is exceptionally large or the price point is completely unreasonable. Here’s an extremely simplified view of what an order book might look like.
| time | side | quantity | price |
|------|------|----------|-------|
| 9:30 | B | 100 | 45.00 |
| 9:31 | B | 200 | 45.01 |
| 9:31 | S | 100 | 45.05 |
| 9:32 | B | 50 | 45.00 |
What do you get from the IEX feed?
In the DEEP data product, you get a number of message types, notably trade reports and price level updates. This isn’t quite the order book, in that many orders may be combined to create one price level update, but you do see how many shares have a bid (I will buy your stock) or offer (I want to sell you my stock) out at each dollar amount, which is pretty darn good for free. Important to note again: IEX is only a few percent of total market volume, so you’re not seeing the whole market, but you get a neat view that you would typically have to pay a lot to see. Here, you only pay with parsing pain.
!xxd -l 112 -s 20000 ~/Downloads/data_feeds_20210122_20210122_IEXTP1_DEEP1.0.pcap
00004e20: 7cb9 0500 ed64 2f58 e805 0000 e805 0000 |....d/X........
00004e30: 0100 5e57 1504 b859 9ff9 2d53 0800 4500 ..^W...Y..-S..E.
00004e40: 05da e340 4000 4011 9f90 17e2 9b84 e9d7 ...@@.@.........
00004e50: 1504 288a 288a 05c6 0402 0100 0480 0100 ..(.(...........
00004e60: 0000 0000 d948 9605 4200 fc42 0000 0000 .....H..B..B....
00004e70: 0000 1903 0000 0000 0000 f76d ff74 b88d ...........m.t..
00004e80: 5c16 1600 4854 c51f ff74 b88d 5c16 4149 \...HT...t..\.AI
And that’s what you get. A lot of hex digits and a specifications document. Fortunately, there is a super cool person on the internet who made a parser that can get all this data into a json format based on the specifications document.
#A lovely command line conversion of the binary to a json file
!pip install iex_parser
!iex_to_json -i iex_file.pcap.gz -o iex_file.json.gz -t 'GME' -s
import json
with open('iex_deep_quotes_and_trades.json') as file:
    line = file.readline()
print(line[:500])
[{"type":"trade_report","event":null,"timestamp":"2021-01-22T13:02:31.215300+00:00","status":null,"symbol":"GME","detail":null,"halt_status":null,"reason":null,"flags":96.0,"size":39.0,"price":45.19,"trade_id":2067095.0,"side":null,"security_event":null},{"type":"trade_report","event":null,"timestamp":"2021-01-22T13:08:14.700160+00:00","status":null,"symbol":"GME","detail":null,"halt_status":null,"reason":null,"flags":96.0,"size":50.0,"price":44.87,"trade_id":2639914.0,"side":null,"security_even
It’s still hard to understand without reading the documentation and a little knowledge of how order books work, but it’s a heck of a lot more readable, and easy to turn into a dataframe.
#Read in the json file; many json files that have a flat, consistent schema
#without a lot of nesting and variation can be read safely as records
import pandas as pd

json_df = pd.read_json('iex_deep_quotes_and_trades.json',orient='records')
display(json_df)
| | type | event | timestamp | ... | trade_id | side | security_event |
|---|---|---|---|---|---|---|---|
| 0 | trade_report | NaN | 2021-01-22 13:02:31.215300+00:00 | ... | 2.067095e+06 | None | NaN |
| 1 | trade_report | NaN | 2021-01-22 13:08:14.700160+00:00 | ... | 2.639914e+06 | None | NaN |
| 2 | trade_report | NaN | 2021-01-22 13:11:52.294756+00:00 | ... | 3.063945e+06 | None | NaN |
| 3 | trade_report | NaN | 2021-01-22 13:18:22.383301+00:00 | ... | 3.669247e+06 | None | NaN |
| 4 | trade_report | NaN | 2021-01-22 13:19:09.002873+00:00 | ... | 3.739332e+06 | None | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 490763 | price_level_update | NaN | 2021-01-28 21:59:06.793899+00:00 | ... | NaN | S | NaN |
| 490764 | price_level_update | NaN | 2021-01-28 21:59:06.797557+00:00 | ... | NaN | S | NaN |
| 490765 | trade_report | NaN | 2021-01-28 21:59:08.241677+00:00 | ... | 3.027362e+09 | None | NaN |
| 490766 | price_level_update | NaN | 2021-01-28 21:59:08.241677+00:00 | ... | NaN | S | NaN |
| 490767 | price_level_update | NaN | 2021-01-28 22:00:00.018339+00:00 | ... | NaN | S | NaN |

490768 rows × 14 columns
#see what the na situation is
json_df.isna().sum()
type 0
event 490768
timestamp 0
status 490768
symbol 0
...
size 0
price 0
trade_id 356056
side 134712
security_event 490768
Length: 14, dtype: int64
#get rid of columns that are entirely null
json_df = json_df.dropna(axis = 1,how='all')
display(json_df)
| | type | timestamp | symbol | ... | price | trade_id | side |
|---|---|---|---|---|---|---|---|
| 0 | trade_report | 2021-01-22 13:02:31.215300+00:00 | GME | ... | 45.19 | 2.067095e+06 | None |
| 1 | trade_report | 2021-01-22 13:08:14.700160+00:00 | GME | ... | 44.87 | 2.639914e+06 | None |
| 2 | trade_report | 2021-01-22 13:11:52.294756+00:00 | GME | ... | 44.58 | 3.063945e+06 | None |
| 3 | trade_report | 2021-01-22 13:18:22.383301+00:00 | GME | ... | 44.04 | 3.669247e+06 | None |
| 4 | trade_report | 2021-01-22 13:19:09.002873+00:00 | GME | ... | 43.78 | 3.739332e+06 | None |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 490763 | price_level_update | 2021-01-28 21:59:06.793899+00:00 | GME | ... | 262.00 | NaN | S |
| 490764 | price_level_update | 2021-01-28 21:59:06.797557+00:00 | GME | ... | 262.00 | NaN | S |
| 490765 | trade_report | 2021-01-28 21:59:08.241677+00:00 | GME | ... | 262.00 | 3.027362e+09 | None |
| 490766 | price_level_update | 2021-01-28 21:59:08.241677+00:00 | GME | ... | 262.00 | NaN | S |
| 490767 | price_level_update | 2021-01-28 22:00:00.018339+00:00 | GME | ... | 988.00 | NaN | S |

490768 rows × 8 columns
#What are we left with?
json_df.isna().sum()
type 0
timestamp 0
symbol 0
flags 0
size 0
price 0
trade_id 356056
side 134712
dtype: int64
#What data types are we working with?
json_df.dtypes
type object
timestamp datetime64[ns, UTC]
symbol object
flags int64
size int64
price float64
trade_id float64
side object
dtype: object
#The objects really should be strings
json_df = json_df.astype({'type':'string','symbol':'string','side':'string'})
json_df.dtypes
type string
timestamp datetime64[ns, UTC]
symbol string
flags int64
size int64
price float64
trade_id float64
side string
dtype: object
#Fill in nulls on the side column, since they may cause trouble plotting trades,
#create a date column for filtering purposes, and change from UTC to EST
json_df = json_df.fillna({'side':'X'})#replace nulls
json_df['date'] = json_df['timestamp'].apply(lambda x: x.date())#date column for filtering
json_df['timestamp'] = (json_df['timestamp']
                        .apply(lambda x: x.astimezone(tz='EST')
                                          .replace(tzinfo=None)))#change to local time
display(json_df)
| | type | timestamp | symbol | ... | trade_id | side | date |
|---|---|---|---|---|---|---|---|
| 0 | trade_report | 2021-01-22 08:02:31.215300 | GME | ... | 2.067095e+06 | X | 2021-01-22 |
| 1 | trade_report | 2021-01-22 08:08:14.700160 | GME | ... | 2.639914e+06 | X | 2021-01-22 |
| 2 | trade_report | 2021-01-22 08:11:52.294756 | GME | ... | 3.063945e+06 | X | 2021-01-22 |
| 3 | trade_report | 2021-01-22 08:18:22.383301 | GME | ... | 3.669247e+06 | X | 2021-01-22 |
| 4 | trade_report | 2021-01-22 08:19:09.002873 | GME | ... | 3.739332e+06 | X | 2021-01-22 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 490763 | price_level_update | 2021-01-28 16:59:06.793899 | GME | ... | NaN | S | 2021-01-28 |
| 490764 | price_level_update | 2021-01-28 16:59:06.797557 | GME | ... | NaN | S | 2021-01-28 |
| 490765 | trade_report | 2021-01-28 16:59:08.241677 | GME | ... | 3.027362e+09 | X | 2021-01-28 |
| 490766 | price_level_update | 2021-01-28 16:59:08.241677 | GME | ... | NaN | S | 2021-01-28 |
| 490767 | price_level_update | 2021-01-28 17:00:00.018339 | GME | ... | NaN | S | 2021-01-28 |

490768 rows × 9 columns
#Create subset that is just the order book updates
mask = json_df['type']=='price_level_update'
select_cols = ['timestamp','size','price','side']
sort_cols = ['timestamp','price']
order_df = json_df.loc[mask,select_cols].sort_values(sort_cols)
I’m not sure if this is the most effective way to accomplish the task, but I felt like making a class to do the data manipulation necessary to carry information forward from one row to the next. The goal is for each row to end up with the correct number of shares at each price level at that point in time; this basically accumulates all the different price levels forward in time. If there were only a few price levels, breaking them out into their own columns and forward filling the last reported size in each (the feed reports absolute sizes at each level, so there’s nothing to cumulatively sum) would be the most effective; a sketch of that wide format follows the price-level count below. However:
print('There are ',len(order_df.price.unique()),' price levels.')
There are 11848 price levels.
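For the record, here’s a sketch of that wide format idea (untested): pivot each (side, price) pair into its own column and forward fill, so every row shows the last reported size at every level. With 11,848 price levels that’s a lot of mostly-empty columns, hence the class below instead.

```python
#A sketch of the wide-format alternative (untested): one column per (side, price),
#forward filled so each row carries the last reported size at every level
wide_book = (order_df
             .pivot_table(index='timestamp',
                          columns=['side','price'],
                          values='size',
                          aggfunc='last')
             .ffill())
```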
class PriceLevels(dict):
    #dict of price -> resting size for one side of the book
    def add_or_discard(self,size,price,side,quote_side):
        #only updates for the side this book tracks are applied
        if side == quote_side:
            if size > 0:
                self[price] = size#set/overwrite the size resting at this price
            else:
                self.pop(price,None)#a size of 0 clears the price level
        return self

    def get_bbo(self,side):
        #best bid is the highest buy price; best offer is the lowest sell price
        if len(self) == 0:
            return None
        if side == 'B':
            return max(self.keys())
        if side == 'S':
            return min(self.keys())
        return None

    def get_vwap(self):
        if len(self) == 0:
            return None
        return sum(k*v for k,v in self.items())/sum(self.values())

    def update_prices(self,size,price,side,quote_side):
        self.add_or_discard(size,price,side,quote_side)
        return PriceLevels(self.copy())#return a snapshot of the book after the update
#Some examples of how the object works
a = PriceLevels({45.45:100,50:100,55:50})
print('set price levels: ',a)
a.update_prices(100,46.05,'B','B')
print('update price level same side: ',a)
a.update_prices(100,46.10,'S','B')
print('update price level opposite side: ',a)
a.update_prices(0,50,'B','B')
print('remove price same side: ',a)
print('get the best bid: ',a.get_bbo('B'))
print('get vwap: ',a.get_vwap())
set price levels: {45.45: 100, 50: 100, 55: 50}
update price level same side: {45.45: 100, 50: 100, 55: 50, 46.05: 100}
update price level opposite side: {45.45: 100, 50: 100, 55: 50, 46.05: 100}
remove price same side: {45.45: 100, 55: 50, 46.05: 100}
get the best bid: 55
get vwap: 47.6
#For each timestamp, find the current size at each available price, the best bid and offer,
#as well as the VWAP (volume weighted average price) of the buy orders, sell orders, and all orders
bid = PriceLevels()
ofr = PriceLevels()
quotes = dict()
#Use iterrows()
for _, row in order_df.iterrows():
    timestamp, size, price, side = row
    quotes[timestamp] = {'bid':bid.update_prices(size,price,side,'B'),
                         'ofr':ofr.update_prices(size,price,side,'S'),
                         'best_bid':bid.get_bbo('B'),
                         'bid_vwap':bid.get_vwap(),
                         'best_ofr':ofr.get_bbo('S'),
                         'ofr_vwap':ofr.get_vwap(),
                         'avg_vwap':PriceLevels({**bid,**ofr}).get_vwap(),
                         'date':timestamp.date()}
list(quotes.items())[500:504]
[(Timestamp('2021-01-22 09:31:48.205629'),
{'bid': {24.98: 100},
'ofr': {43.61: 300},
'best_bid': 24.98,
'bid_vwap': 24.98,
'best_ofr': 43.61,
'ofr_vwap': 43.61,
'avg_vwap': 38.9525,
'date': datetime.date(2021, 1, 22)}),
(Timestamp('2021-01-22 09:31:48.320642'),
{'bid': {24.98: 100},
'ofr': {43.61: 300, 43.59: 300},
'best_bid': 24.98,
'bid_vwap': 24.98,
'best_ofr': 43.59,
'ofr_vwap': 43.6,
'avg_vwap': 40.94,
'date': datetime.date(2021, 1, 22)}),
(Timestamp('2021-01-22 09:31:48.320661'),
{'bid': {24.98: 100},
'ofr': {43.59: 300},
'best_bid': 24.98,
'bid_vwap': 24.98,
'best_ofr': 43.59,
'ofr_vwap': 43.59,
'avg_vwap': 38.93750000000001,
'date': datetime.date(2021, 1, 22)}),
(Timestamp('2021-01-22 09:31:48.325218'),
{'bid': {24.98: 100},
'ofr': {43.6: 300},
'best_bid': 24.98,
'bid_vwap': 24.98,
'best_ofr': 43.6,
'ofr_vwap': 43.6,
'avg_vwap': 38.945,
'date': datetime.date(2021, 1, 22)})]
#Create a dataframe of the current state of the order book at each timestamp
quote_df = (pd.DataFrame
            .from_dict(quotes,orient='index')
            .reset_index()
            .rename(columns={'index':'timestamp'})
            .dropna(subset=['best_bid','best_ofr'],how='all'))
display(quote_df)
| | timestamp | bid | ofr | ... | ofr_vwap | avg_vwap | date |
|---|---|---|---|---|---|---|---|
| 0 | 2021-01-22 08:25:48.218283 | {44.45: 1000} | {} | ... | NaN | 44.45000 | 2021-01-22 |
| 2 | 2021-01-22 08:44:23.578790 | {44.04: 1000} | {} | ... | NaN | 44.04000 | 2021-01-22 |
| 4 | 2021-01-22 08:49:02.526710 | {} | {43.66: 100} | ... | 43.66000 | 43.66000 | 2021-01-22 |
| 6 | 2021-01-22 08:49:28.189795 | {43.34: 100} | {} | ... | NaN | 43.34000 | 2021-01-22 |
| 8 | 2021-01-22 08:51:11.786607 | {43.41: 100} | {} | ... | NaN | 43.41000 | 2021-01-22 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 342613 | 2021-01-28 16:56:26.486624 | {} | {988.0: 112} | ... | 988.00000 | 988.00000 | 2021-01-28 |
| 342614 | 2021-01-28 16:59:06.409716 | {} | {988.0: 112, 262.0: 100} | ... | 645.54717 | 645.54717 | 2021-01-28 |
| 342615 | 2021-01-28 16:59:06.793899 | {} | {988.0: 112} | ... | 988.00000 | 988.00000 | 2021-01-28 |
| 342616 | 2021-01-28 16:59:06.797557 | {} | {988.0: 112, 262.0: 100} | ... | 645.54717 | 645.54717 | 2021-01-28 |
| 342617 | 2021-01-28 16:59:08.241677 | {} | {988.0: 112} | ... | 988.00000 | 988.00000 | 2021-01-28 |

342023 rows × 9 columns
#Get just the trades from the original dataframe
trade_df = json_df.loc[json_df['type']=='trade_report',:]
trade_df
| | type | timestamp | symbol | ... | trade_id | side | date |
|---|---|---|---|---|---|---|---|
| 0 | trade_report | 2021-01-22 08:02:31.215300 | GME | ... | 2.067095e+06 | X | 2021-01-22 |
| 1 | trade_report | 2021-01-22 08:08:14.700160 | GME | ... | 2.639914e+06 | X | 2021-01-22 |
| 2 | trade_report | 2021-01-22 08:11:52.294756 | GME | ... | 3.063945e+06 | X | 2021-01-22 |
| 3 | trade_report | 2021-01-22 08:18:22.383301 | GME | ... | 3.669247e+06 | X | 2021-01-22 |
| 4 | trade_report | 2021-01-22 08:19:09.002873 | GME | ... | 3.739332e+06 | X | 2021-01-22 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 490754 | trade_report | 2021-01-28 16:54:05.085758 | GME | ... | 3.026635e+09 | X | 2021-01-28 |
| 490755 | trade_report | 2021-01-28 16:54:05.592947 | GME | ... | 3.026636e+09 | X | 2021-01-28 |
| 490759 | trade_report | 2021-01-28 16:56:26.486624 | GME | ... | 3.026962e+09 | X | 2021-01-28 |
| 490762 | trade_report | 2021-01-28 16:59:06.793899 | GME | ... | 3.027353e+09 | X | 2021-01-28 |
| 490765 | trade_report | 2021-01-28 16:59:08.241677 | GME | ... | 3.027362e+09 | X | 2021-01-28 |

134712 rows × 9 columns
#Combine the two dataframes so that each trade report is associated
#with a set of 'prevailing quotes.'
plot_df = (pd.concat([quote_df,trade_df],ignore_index=False)
           .fillna({'type':'quote','symbol':'GME'})
           .sort_values(['timestamp','type'])
           .drop('side',axis=1)
           .ffill()
           .dropna(subset=['best_bid','best_ofr'])
           .astype({'date':'string'}))
plot_df = plot_df.loc[plot_df['type']=='trade_report',:]
plot_df['midpt'] = (plot_df['best_bid']+plot_df['best_ofr'])/2
display(plot_df.loc[:,['timestamp','price','best_bid','best_ofr','midpt']])
| | timestamp | price | best_bid | best_ofr | midpt |
|---|---|---|---|---|---|
| 27 | 2021-01-22 08:53:12.048708 | 43.85 | 43.55 | 43.66 | 43.605 |
| 28 | 2021-01-22 08:54:24.385311 | 44.09 | 43.55 | 43.66 | 43.605 |
| 31 | 2021-01-22 08:57:36.794024 | 44.22 | 43.55 | 44.30 | 43.925 |
| 32 | 2021-01-22 08:57:39.852032 | 44.20 | 43.55 | 44.30 | 43.925 |
| 33 | 2021-01-22 08:57:41.975903 | 44.28 | 43.55 | 44.30 | 43.925 |
| ... | ... | ... | ... | ... | ... |
| 490754 | 2021-01-28 16:54:05.085758 | 257.58 | 257.58 | 988.00 | 622.790 |
| 490755 | 2021-01-28 16:54:05.592947 | 257.58 | 257.58 | 988.00 | 622.790 |
| 490759 | 2021-01-28 16:56:26.486624 | 264.00 | 257.58 | 988.00 | 622.790 |
| 490762 | 2021-01-28 16:59:06.793899 | 262.00 | 257.58 | 988.00 | 622.790 |
| 490765 | 2021-01-28 16:59:08.241677 | 262.00 | 257.58 | 988.00 | 622.790 |

134697 rows × 5 columns
I’d like to make a special note that concatenating two dataframes, sorting them, and using forward fill to carry a prevailing value forward is a nice trick. The logically obvious approach, if you only wanted the trades, is some kind of time-range join, which would take waaaaaay longer. Concatenating is fast, sorting is fast, forward filling is fast, and throwing away a ton of rows is fast. Joins are slow.
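For what it’s worth, pandas also ships a purpose-built tool for this ‘most recent value as of each timestamp’ pattern: pd.merge_asof does a backward-looking match on sorted keys. I haven’t benchmarked it against the concat trick, but a sketch would look something like:

```python
#A sketch using pd.merge_asof: for each trade, grab the most recent quote
#snapshot at or before the trade's timestamp (both frames must be sorted)
trades_with_quotes = pd.merge_asof(
    trade_df.sort_values('timestamp'),
    quote_df.loc[:,['timestamp','best_bid','best_ofr']].sort_values('timestamp'),
    on='timestamp',
    direction='backward')
```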
18 Oct 2021
Market Manipulation
I’m moving on from my time at FINRA, and thought it was a good time to reflect on some of the interesting things I’ve learned about how the stock market works.
Super Basic Market Structure: The Order Book
Every time a broker wants to make a trade, they put in an order (I want to buy 200 shares of stock A for \$100 each). Until the order is canceled, or a seller is found who is willing to meet the buyer at their price, the order ‘rests on the book.’ Most heavily traded stocks do not have orders that rest long, unless the number of shares is exceptionally large or the price point is completely unreasonable.
| time | side | quantity | price |
|------|------|----------|-------|
| 9:30 | B | 100 | 45.00 |
| 9:31 | B | 200 | 45.01 |
| 9:31 | S | 100 | 45.05 |
| 9:32 | B | 50 | 45.00 |
Pretty Basic Market Structure: Reg NMS
Reg NMS is an SEC rule that created the ‘National Market System.’ Stock exchanges became publicly traded companies, and firms routing orders to the markets gained the obligation of ‘best execution.’ What exactly this means is fairly complicated, because it’s not just about price. But a simple explanation is that if NYSE’s order book has a stock selling at 45.05 and NASDAQ’s order book has the same stock selling at 45.04, a buy order placed through your Charles Schwab account needs to take the shares at NASDAQ at the better price. Complications set in when the shares in your order are not all available at the same price.
The regulation is intended to keep brokerages from giving customers a bad deal. A couple more important terms are ‘best bid’ and ‘best offer,’ which refer to the highest price anyone is willing to pay and the lowest price anyone is willing to accept, respectively.
What is market manipulation?
This is relatively subjective, but the crux of many types of manipulation is ‘are you engaging with the market in good faith?’ In other words, are you placing orders on the books because you would like to buy a stock at the price you stated (or at least hope someone will sell to you at that price), or are you sending false signals to manipulate the prices at which people are willing to buy or sell?
Here is an example of a sequence of orders that includes a manipulator. The manipulator enters many sell orders to entice the other sellers to lower their prices. Keep in mind, most participants can’t see that right-hand column.
| time | side | quantity | price | firm |
|------|------|----------|-------|------|
| 9:30 | B | 100 | 45.00 | A |
| 9:31 | B | 200 | 45.01 | A |
| 9:31 | S | 100 | 45.05 | B |
| 9:32 | B | 50 | 45.00 | A |
| 9:33 | S | 100 | 45.04 | Manip |
| 9:33 | S | 200 | 45.04 | Manip |
| 9:33 | S | 500 | 45.03 | Manip |
| 9:33 | S | 100 | 45.05 | Manip |
| 9:33 | S | 400 | 45.03 | Manip |
| 9:33 | S | 100 | 45.03 | Manip |
| 9:34 | S | 600 | 45.02 | IllBite |
| 9:34 | B | 600 | 45.02 | Manip |
I made up a more complicated example as a .csv file, let’s see how it looks when plotted.
import pandas as pd
manip_df = pd.read_csv('example_manip.csv',
sep=',',
header=1)
display(manip_df)
| | Time | Firm | Side | Price | Quantity | Order ID | Action | NBO | NBB |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10/13/2021 9:31:00.0 AM | Firm S | S | 50.05 | 1000 | 1 | Order | 50.05 | 49.95 |
| 1 | 10/13/2021 9:31:00.0 AM | Firm B | B | 49.95 | 1000 | 2 | Order | 50.05 | 49.95 |
| 2 | 10/13/2021 9:31:00.2 AM | Firm M | S | 50.04 | 200 | 3 | Order | 50.04 | 49.95 |
| 3 | 10/13/2021 9:31:01.5 AM | Firm M | S | 50.03 | 100 | 4 | Order | 50.03 | 49.95 |
| 4 | 10/13/2021 9:31:03.0 AM | Firm M | S | 50.03 | 100 | 4 | Cancel | 50.04 | 49.95 |
| 5 | 10/13/2021 9:31:03.2 AM | Firm M | S | 50.01 | 150 | 5 | Order | 50.01 | 49.95 |
| 6 | 10/13/2021 9:31:03.3 AM | Firm M | S | 50.00 | 300 | 6 | Order | 50.00 | 49.95 |
| 7 | 10/13/2021 9:31:04.5 AM | Firm M | S | 50.03 | 500 | 7 | Order | 50.00 | 49.95 |
| 8 | 10/13/2021 9:31:04.6 AM | Firm M | S | 50.04 | 200 | 8 | Order | 50.00 | 49.95 |
| 9 | 10/13/2021 9:31:05.7 AM | Firm M | S | 49.96 | 200 | 9 | Order | 49.96 | 49.95 |
| 10 | 10/13/2021 9:31:05.7 AM | Firm M | S | 50.04 | 100 | 8 | Cancel | 49.96 | 49.95 |
| 11 | 10/13/2021 9:31:05.5 AM | Firm M | S | 49.97 | 600 | 10 | Order | 49.96 | 49.95 |
| 12 | 10/13/2021 9:31:05.6 AM | Firm M | S | 49.96 | 300 | 11 | Order | 49.96 | 49.95 |
| 13 | 10/13/2021 9:31:06.8 AM | Firm S | S | 49.96 | 1000 | 12 | Order | 49.96 | 49.95 |
| 14 | 10/13/2021 9:31:06.9 AM | Firm M | B | 49.96 | 1000 | 13 | Order | 49.96 | 49.96 |
| 15 | 10/13/2021 9:31:06.9 AM | Firm M | X | 49.96 | 1000 | 12_13 | Trade | 49.96 | 49.95 |
| 16 | 10/13/2021 9:31:07.0 AM | Firm M | S | 49.96 | 300 | 11 | Cancel | 49.97 | 49.95 |
| 17 | 10/13/2021 9:31:07.0 AM | Firm M | S | 49.97 | 600 | 10 | Cancel | 50.00 | 49.95 |
def string_to_timestamp(string):
    timestamp_components = string.split(' ')
    timestamp_string = 'T'.join(timestamp_components[:2])#drops the AM/PM marker (every time here is AM)
    return pd.to_datetime(timestamp_string)

manip_df['timestamp'] = manip_df['Time'].apply(string_to_timestamp)
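As an aside, pd.to_datetime can probably handle this layout directly given an explicit format string; something like this sketch, assuming every row matches the same pattern:

```python
#A possible one-liner alternative (assumes all Time values match this layout)
manip_df['timestamp'] = pd.to_datetime(manip_df['Time'],
                                       format='%m/%d/%Y %I:%M:%S.%f %p')
```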
import plotly.express as px
import plotly.graph_objects as go

fig = px.scatter(manip_df,
                 x='timestamp',
                 y='Price',
                 symbol='Action',
                 color='Firm',
                 size='Quantity',
                 color_discrete_sequence=['goldenrod','aqua','lightgreen'],
                 opacity=.5)
fig.add_trace(go.Scatter(x=manip_df['timestamp'],
                         y=manip_df['NBO'],
                         mode='lines',
                         line_color='magenta',
                         line_shape='hv',
                         name='Best Offer'))#NBO = National Best Offer
fig.add_trace(go.Scatter(x=manip_df['timestamp'],
                         y=manip_df['NBB'],
                         mode='lines',
                         line_color='cyan',
                         line_shape='hv',
                         name='Best Bid'))#NBB = National Best Bid
fig.update_layout(template='plotly_dark')
fig.write_html("manip_example.html")
fig.show()
In this example, Firm S is selling the stock and Firm B is buying. The ‘spread’, or difference between the best bid and best offer, is 10 cents. The midpoint price, which is often what’s quoted as ‘the price’, is \$50, where buyers and sellers would meet in the middle.
A Firm M would like to buy this stock, but they would like to buy closer to the bid than the ask, or even closer to the bid than the midpoint. What Firm M does is enter a bunch of sell orders to make it look like a lot of people suddenly want to sell this stock. Keep in mind that Firm S and Firm B cannot see who is putting in the orders, just that there are suddenly new orders at several different price points.
The orders are placed so that the best offer becomes progressively lower. Firm S, noting the sell pressure, panics that they may get stuck with an even lower price if they wait around at their original offer of \$50.05, and so they put in an offer of \$49.96, matching the current best offer. Now the spread is just a penny, but Firm M won’t go any lower, because their sell order would trade with Firm B, and remember, they don’t actually want to sell the stock.
When Firm M sees a sell order at a favorable price appear, they immediately put in a buy order and trade with Firm S, and then cancel all their other orders. They may even place some cancels beforehand to make the whole thing look more believable. Ultimately, Firm M will pull out quickly since they do not actually want to trade with anyone buying stock at their artificially low prices. The more believable they make their ‘non-bona fide’ quotes, the more real risk they take on. At some point, if their quotes could not be distinguished from normal activity, they would probably be trading against buyers looking for good prices and end up acting like a real trader in spite of themselves.
In this example Firm M got a 9 cent improvement on their buy order of 1000 shares, netting \$90, which doesn’t sound like much, especially considering they put nearly \$50,000 on the line to do it. However, for one thing, they can turn around and reverse the whole scheme, netting \$90 the other way, and keep going back and forth. For another, Firm M accomplished this in 7 seconds. Though this is fabricated data, 7 seconds is actually a pretty long time in the stock market, so it is a realistic time scale for a manipulation like this. The wage for this scheme works out to \$90 × 3600/7 ≈ \$46,285.71 per hour.
The less realistic aspect of the data is the large spread of 10 cents, though it’s not unreasonable for a lightly traded stock. A lightly traded stock would also have a smaller number of firms trading, allowing Firm M to dominate the market more easily. Firm M would probably have to jump around different symbols since traders generally notice when their fills aren’t going so hot and may stop treating Firm M’s orders as believable, even though they wouldn’t be able to prove that all the orders were from the same party.
In short, if Firm M really did try to repeat this back and forth constantly all day, someone would notice, and might report them to FINRA. However, you can see that someone can make very good money by cheating. I hear traders who don’t cheat also make very good money, so I’d recommend the honest strategy.
As a final plug for honesty, just imagine if absolutely everyone was trading purely with a ‘zero-sum’ game mentality, where the point is to take as much money as possible from everyone else’s pile. If there is ‘a point’ to the capital markets, it is that companies that have a higher chance of succeeding, or that would benefit the most from a capital infusion, get money to grow their business. If no one is thinking about the value of owning a stock (collecting dividends, voting with shareholders, collecting premiums from those who would like to borrow the stock in short and options trades), then any rationality that may exist in the market is gone. Maybe I’m holding on to too much belief in markets as price discovery engines, but without fair markets where traders can basically trust the intentions of the other participants, our 401(k)s are just funding a money game that produces little to no value.
18 Sep 2021
Stardew Valley Grange Display - Application
Now that I’ve got an ‘ideal’ display made up, and I made lofty claims that I had a lot of it on hand, let’s see the reality. I want to look at my save file to see which 8 point items I have lying around and which I need to actually acquire. In my current game, I just got the letter from the mayor saying the fair is in one week, so if I bother to figure this out now, I’ve got 125 points in the bag.
I’ll be using lxml’s etree (element tree) to navigate the xml of the save file. I’ve put variables related to my save file’s location into a notebook. I’m not too concerned about you all knowing my Farm’s name and id, but it’s a nice trick for other times when I might want to avoid showing various paths/names/etc from my computer.
You now have access to the variables farm_name, and farm_path
#import necessary packages
from IPython.display import display, HTML
import pandas as pd
from lxml import etree as ET
#open and parse file
with open(farm_path) as file:
    tree = ET.parse(file)
root = tree.getroot()
#This is an example of the xml containing information about an item
print(ET.tostring(root.find('player').find('items').findall('Item')[12],pretty_print=True).decode())
<Item xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:type="Object">
<isLostItem>false</isLostItem>
<category>-16</category>
<hasBeenInInventory>true</hasBeenInInventory>
<name>Clay</name>
<parentSheetIndex>330</parentSheetIndex>
<specialItem>false</specialItem>
<SpecialVariable>0</SpecialVariable>
<DisplayName>Clay</DisplayName>
<Name>Clay</Name>
<Stack>1</Stack>
<tileLocation>
<X>0</X>
<Y>0</Y>
</tileLocation>
<owner>0</owner>
<type>Basic</type>
<canBeSetDown>true</canBeSetDown>
<canBeGrabbed>true</canBeGrabbed>
<isHoedirt>false</isHoedirt>
<isSpawnedObject>false</isSpawnedObject>
<questItem>false</questItem>
<questId>0</questId>
<isOn>true</isOn>
<fragility>0</fragility>
<price>20</price>
<edibility>-300</edibility>
<stack>1</stack>
<quality>0</quality>
<bigCraftable>false</bigCraftable>
<setOutdoors>false</setOutdoors>
<setIndoors>false</setIndoors>
<readyForHarvest>false</readyForHarvest>
<showNextIndex>false</showNextIndex>
<flipped>true</flipped>
<hasBeenPickedUpByFarmer>true</hasBeenPickedUpByFarmer>
<isRecipe>false</isRecipe>
<isLamp>false</isLamp>
<minutesUntilReady>0</minutesUntilReady>
<boundingBox>
<X>0</X>
<Y>0</Y>
<Width>64</Width>
<Height>64</Height>
<Location>
<X>0</X>
<Y>0</Y>
</Location>
<Size>
<X>64</X>
<Y>64</Y>
</Size>
</boundingBox>
<scale>
<X>0</X>
<Y>0</Y>
</scale>
<uses>0</uses>
<preservedParentSheetIndex>0</preservedParentSheetIndex>
<destroyOvernight>false</destroyOvernight>
</Item>
#helper function to flatten lists of xml elements
from functools import reduce

def unnest_lists(list_of_lists):
    return reduce(lambda x,y: x+y, list_of_lists, [])

#recursively finds all tags in the tag list in the order they appear, returns a list of elements
def find_tags(parents, tag_list):
    if not isinstance(parents, list):
        parents = [parents]
    #base case: no tags left, so return each parent's children
    if len(tag_list) == 0:
        return [[child for child in parent] for parent in parents]
    #recursive case: descend one level into the tree
    else:
        tag = tag_list.pop(0)
        elements = unnest_lists([element.findall(tag) for element in parents])
        return find_tags(elements, tag_list)

#reads the text within the tags listed in tags, padding with None when a tag is missing
def read_tags(element_list, tags):
    record = []
    for element in element_list:
        for tag in tags:
            if element.tag == tag:
                record.append(element.text)
    if len(record) < len(tags):
        diff = len(tags) - len(record)
        record += diff*[None]
    return record

#displays the tables the way I'd like
def display_table(table_df):
    return display(HTML(table_df.to_html(escape=False,index=False)))
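To show what these helpers do, here’s a quick check on a toy XML snippet (the structure here is made up, just mimicking the save file’s pattern):

```python
#A toy example of find_tags/read_tags (the XML here is fabricated)
toy = ET.fromstring('<root><items><Item><Name>Clay</Name><price>20</price>'
                    '<stack>1</stack><quality>0</quality></Item></items></root>')
element_lists = find_tags(toy, ['items','Item'])
print([read_tags(elements, ['Name','price','stack','quality'])
       for elements in element_lists])
#[['Clay', '20', '1', '0']]
```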
By general exploration, I found three relevant locations for the items I’m looking for: my active inventory, chests and mini-fridges, and my main fridge. The large fridge that comes with the house is apparently separate from the other chests. I also found that I could not use the object attribute ‘chest’ to identify relevant chests since this is an attribute for the shipping crate and mini-shipping crate.
#All the objects in all the game locations (Farm, Bus Stop, Ginger Island, etc)
Objects = find_tags(root,['locations','GameLocation','objects','item','value','Object'])
#Names of all the objects
Object_Names = unnest_lists([[element.text for element in Object if element.tag == 'DisplayName'] for Object in Objects])
#Narrow in on container objects
containers = ['Chest','Mini-Fridge','Junimo Chest','Stone Chest']
isContainerObject = [container in containers for container in Object_Names]
Container_Objects = unnest_lists([[Object for Object in Objects[i] if Object.tag=='items'] for i in range(len(Objects)) if isContainerObject[i]])
element_lists = find_tags(Container_Objects,['Item'])
#All the items in all your chests, stone chests, mini-fridges, and junimo chests
items_in_chests = [read_tags(element_list,['Name','price','stack','quality']) for element_list in element_lists]
#These are simpler because items falling under the <player><items>...</items></player>
#pattern don't include a bunch of items like 'tables' and 'tv'. The same is true for the fridge items
#All the items you're holding right now
active_inventory = find_tags(root,['player','items'])
items_in_inventory = [[read_tags(item,['Name','price','stack','quality']) for item in items] for items in active_inventory]
#Turns out your fridge is its own thing and did not make the cut in chest
fridge = find_tags(root,['locations','GameLocation','fridge','items'])
items_in_fridge = [[read_tags(item,['Name','price','stack','quality']) for item in items] for items in fridge]
#Create three dataframes from our list of items
in_chests = pd.DataFrame.from_records(items_in_chests,columns = ['Name','Price','Stack','Quality'])
on_hand = pd.DataFrame.from_records(items_in_inventory[0],columns = ['Name','Price','Stack','Quality'])
in_fridge = pd.DataFrame.from_records(items_in_fridge[0],columns = ['Name','Price','Stack','Quality'])
#Concatenate the dataframes along the row axis, drop empty rows, and set the data types for each column
#Additionally, I'd like the 'stack' column to add together the total amount of each item I have in all chests
all_items = (pd.concat([in_chests,on_hand,in_fridge],axis = 0)
             .dropna()#drop rows with missing columns
             .astype({'Name':str,'Price':int,'Stack':int,'Quality':str})#force correct types on the columns
             .groupby(['Name','Price','Quality'])#groupby non-stack columns
             .sum()#sum the stacks
             .reset_index())#get the groupby columns back out of the index
display_table(all_items.loc[1:200:20,:])#Let's look at a sampling of the dataframe
| Name | Price | Quality | Stack |
|------|-------|---------|-------|
| Algae Soup | 100 | 0 | 2 |
| Blue Jazz | 50 | 1 | 2 |
| Clay | 20 | 0 | 525 |
| Crayfish | 75 | 0 | 4 |
| Fall Seeds | 45 | 0 | 30 |
| Green Algae | 15 | 0 | 157 |
| Lucky Lunch | 250 | 0 | 4 |
| Oak Resin | 150 | 0 | 47 |
| Pumpkin Pie | 385 | 0 | 1 |
| Rice Pudding | 260 | 0 | 1 |
#I now need the dataframe I created earlier
%run Grange_Display.ipynb
display_table(grange_df.head())
| Item | category | Price | Quality | Points |
|------|----------|-------|---------|--------|
| Duck Egg | Animal Products | 95g | Base | 3 |
| Duck Feather | Animal Products | 250g | Base | 4 |
| Egg | Animal Products | 50g | Base | 2 |
| Brown Egg | Animal Products | 50g | Base | 2 |
| Goat Milk | Animal Products | 225g | Base | 4 |
So now I have a dataframe with point values for various items, and a dataframe of items I actually have. I would like to join the dataframes to find a list of 8 point items that I actually have. Since the quality columns are formatted differently in the two dataframes, I’ll have to fix that.
#Note there are some differences in how the columns are labeled,
#most importantly quality is numeric in my save file but a string on the wiki
quality_dict = {'0':'Base','1':'Silver','2':'Gold','4':'Iridium'}#map between the two formats
all_items['Quality_str'] = all_items['Quality'].apply(lambda x: quality_dict[x])#relabel my save file quality
#Create a join between the two dataframes, drop columns from join that would be duplicated
my_grange_items = grange_df.merge(all_items.drop(['Price','Quality'],axis=1),
left_on=['Item','Quality'],
right_on=['Name','Quality_str'],
how='inner')#Do an inner join
#display 8 point items I have on hand
display_table(my_grange_items.loc[my_grange_items.Points == 8,['Item','category','Price','Stack']])
| Item | category | Price | Stack |
|------|----------|-------|-------|
| Large Milk | Animal Products | 190g | 6 |
| Rabbit's Foot | Animal Products | 565g | 3 |
| Cheese | Artisan Goods | 230g | 1 |
| Flounder | Fish | 100g | 1 |
| Walleye | Fish | 105g | 1 |
| Beet | Vegetables | 100g | 1 |
I can see that I’ve got 4/6 categories with 8 point items, and sufficient iridium quality Large Milk to pad out the display to 9 items. I’m missing forage and fruit. I earlier identified coconuts and certain mushrooms as an easy way to get these.
And sure enough, I threw in an iridium quality chanterelle and coconut and got 125 points on my grange display!

07 Sep 2021
The Stardew Valley Grange Display - Analysis
According to the Stardew Valley wiki, the grange display’s score is calculated according to the formula:
14 + (# of items) + 5*(# of categories, capped at 6) + (sum of individual item scores)
So if you had 9 items in the display representing 9 categories, and 3 items had a score of 5, 4 items had a score of 7, and 2 had a score of 8, the final score would be:
14 + 9 + 5*6 + (5+5+5+7+7+7+7+8+8) = 112
To get a perfect score, you need 9 items representing at least 6 categories that are all 8 point items:
14 + 9 + 5*6 + (8+8+8+8+8+8+8+8+8) = 125
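The formula is simple enough to sanity-check in code; here’s a minimal sketch (the function name is mine):

```python
def grange_score(item_points, n_categories):
    #14 base points + 1 per item + 5 per category (capped at 6) + item scores
    return 14 + len(item_points) + 5*min(n_categories, 6) + sum(item_points)

print(grange_score([5,5,5,7,7,7,7,8,8], 9))  #112
print(grange_score([8]*9, 6))                #125
```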
So we really need to find items that are worth 8 points. Some items are ineligible and get 0 points; they are marked in the wiki table with a dash.
The table below shows the max score available in each category. Notice that not every category has an 8 point item! Since there are still 6 categories with 8 point items, we just won’t use ‘Cooking’ or ‘Minerals’.
#Find categories with no 8 point entries
grange_df['Points'] = grange_df['Points'].apply(lambda x: 0 if x == '—' else int(x))
max_in_category = grange_df.groupby('category').Points.max()
display(max_in_category)
category
Animal Products 8
Artisan Goods 8
Cooking 6
Fish 8
Foraging, Flowers, and Tree Saps 8
Fruits 8
Minerals 6
Vegetables 8
Name: Points, dtype: int64
It’s also important to note that not every item in the game is actually in the wiki tables; for items outside the tables, there is a lookup table that uses sell price and quality to determine points. Here, I’m using the ‘tables’ object from the previous post.
import re
def strip_nonnumeric(string):
    return re.sub("[^0-9]", "", string)

price_df = parse_table(tables[11],get_heading=False)
price_df['Low Price'] = price_df['Sell Price'].apply(strip_nonnumeric)
price_df['High Price'] = price_df['Low Price'].shift(-1)
price_df = price_df.drop('Sell Price',axis = 1)
price_df.columns = ['Base','Silver','Gold','Iridium','Low Price','High Price']
display_table(price_df.loc[:,['Low Price','High Price','Base','Silver','Gold','Iridium']].fillna('Inf'))
| Low Price | High Price | Base | Silver | Gold | Iridium |
|-----------|------------|------|--------|------|---------|
| 0 | 20 | 1 | 2 | 3 | 5 |
| 20 | 90 | 2 | 3 | 4 | 6 |
| 90 | 200 | 3 | 4 | 5 | 7 |
| 200 | 300 | 4 | 5 | 6 | 8 |
| 300 | 400 | 5 | 6 | 6 | 8 |
| 400 | Inf | 6 | 6 | 6 | 8 |
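For reference, here’s that lookup written out as a function (a helper of my own, mirroring the table above):

```python
#The wiki's price/quality lookup as a function (bounds taken from the table above)
def points_for(price, quality):
    brackets = [(0,   {'Base':1,'Silver':2,'Gold':3,'Iridium':5}),
                (20,  {'Base':2,'Silver':3,'Gold':4,'Iridium':6}),
                (90,  {'Base':3,'Silver':4,'Gold':5,'Iridium':7}),
                (200, {'Base':4,'Silver':5,'Gold':6,'Iridium':8}),
                (300, {'Base':5,'Silver':6,'Gold':6,'Iridium':8}),
                (400, {'Base':6,'Silver':6,'Gold':6,'Iridium':8})]
    points = None
    for low, row in brackets:
        if price >= low:
            points = row[quality]
    return points

print(points_for(80, 'Iridium'))   #6
print(points_for(120, 'Iridium'))  #7
```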
Professions
There are several profession bonuses available in the game, and most grange display calculators take these into account, but I’m narrowing my focus to items that lead to maximum scores. So I bet I can toss a bunch of these out by finding profession bonuses that will never result in an 8 point item. To do this, let’s check each bonus against the points lookup table above.
Spring Onion Mastery
Spring Onions have a sell price of 8g, iridium quality doubles this to be 16g, and with spring onion mastery will be 16g x 5 = 80g. That’s still only 6 points, so this won’t help us push anything up to 8.
Bear Knowledge
I love the Bear, but bear knowledge is similarly pumping up pretty low priced items. Salmonberries are even cheaper than spring onions (5g), and Bear Knowledge only gives a x3, so those will never be an 8. Blackberries are 20g, Iridium quality is 40g, and so with bear knowledge, they sell for 120g, which is still only 7 points.
Blacksmith, Gemologist, and Tapper
These are bonuses on items that can never be iridium quality (bars, minerals, and gems), so these can never be 8 points (see table).
Tiller, Artisan, Rancher, Fisher, and Angler
These all require some actual calculation. We’re looking for 8 point items, and so we can narrow our focus to iridium quality.
- Tiller will apply to vegetables, non-foraged fruits, and flowers.
- Artisan will apply to artisan goods
- Rancher will apply to animal products
- Fisher and Angler will apply to Fish
Though some items within a category will be unaffected by these bonuses (e.g. flowers are the only items in the Foraging, Flowers, and Tree Saps category affected by Tiller), there are no cross-category bonuses, and only one category gets two bonuses (Fish).
Inspecting the items that are in the table, none of the profession bonuses actually boost an item enough to make an item an 8 that was not already an 8.
import re
def strip_nonnumeric(string):
    return re.sub("[^0-9]", "", string)

def assign_points(price):
    if price >= 200:
        return 8
    elif price >= 90:
        return 7
    elif price >= 20:
        return 6
    elif price >= 0:
        return 5
    else:
        return '—'

bonus_dict = {'Animal Products':1.2,
              'Artisan Goods':1.4,
              'Cooking':1,
              'Fish (Fishing)':1.25,
              'Fish (Angler)':1.5,
              'Foraging, Flowers, and Tree Saps':1.1,
              'Fruits':1.1,
              'Minerals':1,
              'Vegetables':1.1}

df = grange_df.loc[(grange_df['Quality']=='Iridium')&(grange_df['Points']!='—'),:]
fishing_df = df.loc[df['category']=='Fish',:].copy()
fishing_df['category'] = 'Fish (Fishing)'
angler_df = df.loc[df['category']=='Fish',:].copy()
angler_df['category'] = 'Fish (Angler)'
df = pd.concat([df.loc[df['category']!='Fish',:].copy(),fishing_df,angler_df], axis = 0)
#df['Points'] = df['Points'].apply(lambda x: 0 if x == '—' else int(x))
df['Price'] = df['Price'].apply(strip_nonnumeric).astype(int)
df['Price w/ Bonus'] = (df['Price']*df['category'].apply(lambda x: bonus_dict[x])).astype(int)
df['Points w/ Bonus'] = df['Price w/ Bonus'].apply(assign_points)
boost_filter = (df['Points'].astype(int) < 8)&(df['Points w/ Bonus'] == 8)
print("Items that were less than 8 points and were boosted to 8 by profession bonuses:")
display_table(df.loc[boost_filter,:])
Items that were less than 8 points and were boosted to 8 by profession bonuses:
| Item | category | Price | Quality | Points | Price w/ Bonus | Points w/ Bonus |
|------|----------|-------|---------|--------|----------------|-----------------|
In other words, none. Perhaps this is no longer true if I include items outside the wiki tables, but that’s for the next post. And if you want to calculate point values for an arbitrary display, the bonuses should obviously be included.
Now I’d like to look at the cheapest, easiest way to get 8 point items in each category. For now, I’ll try looking at the items with the lowest sell price in each category.
high_score_df = grange_df.loc[(grange_df['Quality']=='Iridium')&(grange_df['Points']=='8'),:]
high_score_df['Price'] = high_score_df.Price.apply(strip_nonnumeric).astype(int)
summary_df = (high_score_df
              .groupby('category')
              .agg({'Item':'count','Price':min})
              .rename(columns={'Item':'Total Items','Price':'Minimum Sell Price'})
              .reset_index())
min_priced_items_df = (high_score_df
                       .merge(summary_df.loc[:,['category','Minimum Sell Price']], on = ['category']))
display_table(min_priced_items_df.loc[min_priced_items_df['Price'] == min_priced_items_df['Minimum Sell Price'],:])
<ipython-input-89-f0dd172bd0dc>:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
high_score_df['Price'] = high_score_df.Price.apply(strip_nonnumeric).astype(int)
| Item | category | Price | Quality | Points | Minimum Sell Price |
|------|----------|-------|---------|--------|--------------------|
| Milk | Animal Products | 125 | Iridium | 8 | 125 |
| Wine (Hot Pepper) | Artisan Goods | 120 | Iridium | 8 | 120 |
| Dorado | Fish | 100 | Iridium | 8 | 100 |
| Flounder | Fish | 100 | Iridium | 8 | 100 |
| Largemouth Bass | Fish | 100 | Iridium | 8 | 100 |
| Lionfish | Fish | 100 | Iridium | 8 | 100 |
| Midnight Squid | Fish | 100 | Iridium | 8 | 100 |
| Pike | Fish | 100 | Iridium | 8 | 100 |
| Slimejack | Fish | 100 | Iridium | 8 | 100 |
| Tuna | Fish | 100 | Iridium | 8 | 100 |
| Snow Yam | Foraging, Flowers, and Tree Saps | 100 | Iridium | 8 | 100 |
| Apple | Fruits | 100 | Iridium | 8 | 100 |
| Coconut | Fruits | 100 | Iridium | 8 | 100 |
| Orange | Fruits | 100 | Iridium | 8 | 100 |
| Beet | Vegetables | 100 | Iridium | 8 | 100 |
| Taro Root | Vegetables | 100 | Iridium | 8 | 100 |
- Animal Products: Milk
- Artisan Goods: Hot Pepper Wine
- Fish: Dorado, Flounder, Largemouth Bass, Lionfish, Midnight Squid, Pike, Slimejack, Tuna
- Foraging, Flowers, and Tree Saps: Snow Yam
- Fruits: Apple, Coconut, Orange
- Vegetable: Beet, Taro Root
A lot of these are actually pretty easy to get. Iridium quality milk you can get by the pallet even in Year 1 with a cow at full hearts. Dorado, Flounder, and Lionfish are pretty easy fish to get a perfect catch, though you need Ginger Island unlocked to get the Lionfish. Iridium Coconuts can just be picked up off the ground in the Calico Desert provided you have the Botanist profession (and you really should).
A few of these, though they have a low sell price, do not meet my requirement of ‘easy’. Hot pepper wine requires a fully upgraded house and a lot of time (or fairy dust) to get to iridium quality. Snow Yams do not benefit from Botanist since you dig them from artifact spots, so I believe the only way to get an iridium snow yam is to use deluxe fertilizer and winter seeds (and that’s two RNGs I don’t need). Beets and Taro Root would both require deluxe fertilizer; taro root has the advantage of not requiring watering, though by the time you have deluxe fertilizer and Ginger Island unlocked, you should have iridium sprinklers coming out your ears. Beets grow faster, so I can roll the dice more times on less deluxe fertilizer in the same amount of time. Since there is no other way to get an iridium vegetable, I’ll stick with the beets; the other two will need replacing. Let’s look at the fuller list in these two categories.
category_filter = high_score_df['category'].isin(['Foraging, Flowers, and Tree Saps','Artisan Goods'])
category_filter &= high_score_df['Item'].str[:4] != 'Wine'
display_table(high_score_df.loc[category_filter,:].sort_values(['category','Price']))
| Item | category | Price | Quality | Points |
|------|----------|-------|---------|--------|
| Mayonnaise | Artisan Goods | 190 | Iridium | 8 |
| Beer | Artisan Goods | 200 | Iridium | 8 |
| Cheese | Artisan Goods | 230 | Iridium | 8 |
| Pale Ale | Artisan Goods | 300 | Iridium | 8 |
| Goat Cheese | Artisan Goods | 400 | Iridium | 8 |
| Snow Yam | Foraging, Flowers, and Tree Saps | 100 | Iridium | 8 |
| Poppy | Foraging, Flowers, and Tree Saps | 140 | Iridium | 8 |
| Morel | Foraging, Flowers, and Tree Saps | 150 | Iridium | 8 |
| Chanterelle | Foraging, Flowers, and Tree Saps | 160 | Iridium | 8 |
| Purple Mushroom | Foraging, Flowers, and Tree Saps | 250 | Iridium | 8 |
| Fairy Rose | Foraging, Flowers, and Tree Saps | 290 | Iridium | 8 |
| Magma Cap | Foraging, Flowers, and Tree Saps | 400 | Iridium | 8 |
Beer and Pale Ale have the same issue as wine. You can get gold quality cheese pretty easily and age it to iridium in about a week, so that is probably the way to go. If you have Ginger Island unlocked and really enjoy doing Professor Snail’s bidding, iridium quality ostrich eggs are a way to get iridium quality mayo without the fully upgraded house.
Poppies and Fairy Roses would also require deluxe fertilizer, and I’d like to minimize that, as Qi coins don’t grow on trees. It’s hardly the scarcest thing in the game though, and I just got one as a drop in the dangerous mines. The mushrooms benefit from Botanist, and Morels/Chanterelles are available in the Secret Woods. Magma Caps can be picked up in the volcano, and Purple Mushrooms are available in Professor Snail’s mushroom cave.
So that leaves me with a fairly low-stress 125 point grange display:
- 4 Large Milk
- 1 Lionfish
- 1 Coconut
- 1 Cheese
- 1 Morel Mushroom
- 1 Beet
All Iridium quality. 14 pts for being me + 30 pts for 6 categories + 9 pts for 9 items + 9x8 pts for items. In my current playthrough, I have not unlocked the walnut room, so I will be stuck at 124 until I get that iridium beet. The rest of these items I have lying around in chests/fridges in Winter 2. My dream of a perfect Grange should be fulfilled in Year 3. I think someone who really made it a priority could get to this point by Year 2, but Year 1 is probably out of the question. Some people are very good at this game though, so I don’t want to say impossible.