Vivek Viswanathan is the Head of Research at Rayliant Global, a quantitative asset manager focused on generating alpha from investing in China and other inefficient emerging markets.
Our conversation circles around three primary topics. The first is the features that make China a particularly attractive market for quantitative investing and some of the challenges that accompany it. The second is Vish’s transition from a factor-based perspective to an unconstrained, characteristic-driven one. Finally, the critical role that machine learning plays in managing a characteristic-driven portfolio.
And at the end of the conversation we are left with a full picture of what it takes to be a successful, quantitative investor in China.
I hope you enjoy my conversation with Vivek Viswanathan.
Transcript
Corey Hoffstein 00:00
All right, three, two, one, let's go. Hello and welcome, everyone. I'm Corey Hoffstein, and this is Flirting with Models, the podcast that pulls back the curtain to discover the human factor behind the quantitative strategy.
Narrator 00:19
Corey Hoffstein is the co-founder and Chief Investment Officer of Newfound Research. Due to industry regulations, he will not discuss any of Newfound Research's funds on this podcast. All opinions expressed by podcast participants are solely their own opinion and do not reflect the opinion of Newfound Research. This podcast is for informational purposes only and should not be relied upon as a basis for investment decisions. Clients of Newfound Research may maintain positions in securities discussed in this podcast. For more information, visit thinknewfound.com.
Corey Hoffstein 00:50
This season is sponsored by Simplify ETFs. Simplify seeks to help you modernize your portfolio with its innovative set of options-based strategies. Full disclosure: prior to Simplify sponsoring the season, we had incorporated some of Simplify's ETFs into our ETF model mandates here at Newfound. If you're interested in reading a brief case study about why and how, visit simplify.us/flirting with models, and stick around after the episode for an ongoing conversation about markets and convexity with the convexity maven himself, Simplify's own Harley Bassman. Vivek Viswanathan is the Head of Research at Rayliant Global, a quantitative asset manager focused on generating alpha from investing in China and other inefficient emerging markets. Our conversation circles around three primary topics. The first is the features that make China a particularly attractive market for quantitative investing and some of the challenges that accompany it. The second is Vish's transition from a factor-based perspective to an unconstrained, characteristic-driven one. Finally, the critical role that machine learning plays in managing a characteristic-driven portfolio. And at the end of the conversation, we're left with a full picture of what it takes to be a successful quantitative investor in China. I hope you enjoy my conversation with Vivek Viswanathan. Vish, welcome to the show. Really excited to have you here. Thank you for joining me. Thank you. I got you right as you're about to take a sip of water there. That's what I like to do to my guests. Well, let's begin, for the guests who maybe haven't heard of you before, with a little bit of an intro and some background.
Vivek Viswanathan 02:36
Absolutely. So I went to the University of Chicago for undergrad, then did an MFE at UCLA and a finance PhD at UCI. Over the years, I worked very briefly at Morningstar in 2005, and was quickly fired for telling them they should change or get rid of their equity mutual fund rating system, because funds that achieved five-star status still underperformed index funds. My wording wasn't quite as kind as that when I said it, so that may have contributed to the firing. After that I worked at Research Affiliates on equity factor research and cross-sectional research in other asset classes. Then, in 2016, Jason Hsu and I, along with a few others, spun off the Asia arm of Research Affiliates to start Rayliant Global Advisors. And really our focus has been on China, primarily Chinese equities, but also quite a bit on Chinese commodity futures and bonds. We've recently launched a China A-shares ETF in the US, RAYC. And really, we employ more sophisticated quantitative models using alternative data, expected returns built from machine learning models, and optimization.
Corey Hoffstein 03:39
So first question for you: why so much focus on China specifically? What makes China particularly attractive to attack from a quantitative investing perspective?
Vivek Viswanathan 03:54
Yeah, that's an excellent question. First, I want to draw some lines when I say that China is a great market for quantitative investing. I'm specifically talking about China A shares, which are listed onshore in mainland China, in Shanghai or Shenzhen. There are mainland Chinese stocks listed in Hong Kong, which are H shares, red chips, and P chips. And there are China ADRs listed in the US. We generally find that alpha comes from investor mistakes, so Chinese stocks listed in Hong Kong or the US are going to be reasonably efficient. It's A shares listed in Shanghai and Shenzhen that we believe have a lot of alpha. And just to emphasize this point, from a return correlation perspective, A shares are as different from H shares and ADRs as the US is from emerging markets. That's how different it is. So do keep in mind we're talking about onshore China, that's China A shares. So China A shares is a huge market, not as large as the US, obviously, but it's punching in a similar weight class. The US has a $47 trillion total market capitalization, while China A shares has a $13 trillion total market cap. In 2020, the US had a monthly average trading volume of $6.5 trillion, while China A shares had $2.4 trillion in monthly average trading. The reason why you want to find alpha in a huge market is that you want it to be liquid enough to trade and deep enough that the alpha won't disappear after you deploy a modicum of capital toward the strategies. You might be able to find a lot of alpha in a small market, but you can't actually invest significant capital towards it. China A shares, luckily, is huge. The second thing you want to look for is whether other institutional investors are outperforming. Of course, if you look at large cap US equities, 90% of active managers underperform the market over any given ten-year period. In China, the average equity mutual fund outperforms the market after costs despite having a 1.5% management fee on average. The average international investor in China A shares outperforms; in other words, stocks with high international holdings outperform those with low international holdings. So in the US, you need to be in the 90th percentile among fund managers just to perform as well as the benchmark; in China, being in the 90th percentile means delivering solid performance. But that's active management in general. What about quant investing in particular? First, I'll talk about this in the context of factors, though you can go beyond the factor framework, but we'll talk about that later. So if you invested in a standard basket of long-short factors, think McLean and Pontiff 2016 factors, integrated into a long-only portfolio, in the US you'd earn a 0.36 information ratio from 2010 to 2020. In EM, you would earn a 0.91 information ratio. In China, you would earn a 1.06 information ratio. Just so this is clear: these are factors that were discovered in the United States, and we're blindly implementing them in China and earning a 1.06 information ratio. That's pretty incredible. If instead of looking at standard factors you look at China-specific factors, using alternative data specific to China, those earned an information ratio of 1.43 since 2010. That's yet another source of alpha in China A shares, and it makes up about a third of our excess returns. Now, for what it's worth, all of those information ratios are pre transaction costs; you probably need to shave about 0.2 off of those to get to post transaction cost numbers.
So just to quickly sum up that long-winded answer: we think China A shares is a good place to apply quant investing because it's very liquid, so it can absorb a lot of capital flows; because other institutional investors tend to outperform, so we have a sense there's genuine alpha to be found; and because standard quant factors do well, and China-specific quant factors do even better.
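For concreteness, the information-ratio arithmetic referenced throughout can be sketched as follows; the flat 0.2 transaction-cost haircut is applied exactly as described above, and the function name is mine:

```python
import numpy as np

def information_ratio(portfolio_returns, benchmark_returns, periods_per_year=12):
    """Annualized information ratio from periodic (e.g. monthly) returns."""
    active = np.asarray(portfolio_returns) - np.asarray(benchmark_returns)
    return active.mean() * periods_per_year / (active.std(ddof=1) * np.sqrt(periods_per_year))

# Example: the 1.06 gross IR quoted for US-discovered factors applied in China,
# less the rough 0.2 transaction-cost haircut mentioned above.
gross_ir = 1.06
net_ir = gross_ir - 0.2
print(net_ir)   # roughly 0.86
```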
Corey Hoffstein 07:52
Can you expand a little bit on that notion of China-specific quant factors? Maybe some examples of what those signals are?
Vivek Viswanathan 07:59
Absolutely. So the first one I want to discuss, which we're publishing a paper on in June, is the AH premium. I mentioned that A shares are listed in mainland China, in Shanghai and Shenzhen, and H shares are listed in Hong Kong. Some companies actually dual-list in both mainland China and Hong Kong. That means we have two identical shares, with the same voting rights and the same dividend rights, listed on two different exchanges. Since the two shares can't be converted into each other, this isn't a pure arbitrage play or anything, but you would expect the prices to be relatively similar, since it's the same share on different exchanges. But the thing is, the prices aren't even close. On an equally weighted basis, dual-listed A shares are on average 75% more expensive than their H-share counterparts. On a cap-weighted basis, they're on average 33% more expensive. And there are some stocks, like the Chinese fashion company La Chapelle (sounds French, but it isn't, it's Chinese), whose A share is 4.5 times more expensive than its H share, while other more staid, maybe boring stocks like Ping An have an A-share price that is effectively identical to their H-share price. Now, it turns out that because Hong Kong tends to have more sophisticated investors, a high AH premium does not bode well for the A share, because it means Hong Kong investors think that stock is overpriced, and a low AH premium means the A share will perform well, because Hong Kong investors think the stock is fairly priced or underpriced. If you overweight low-AH-premium stocks and underweight high-AH-premium stocks, you can earn a 0.89 information ratio from that alone. So that's pretty good for one signal. Next, let's talk about pledges. In many countries, including the United States and China, you can use shares you hold in a publicly listed company as collateral for a loan. If you can't pay back the loan, the bank sells the shares. This is what you do if you want to keep your shares but you want money for a house or a boat or whatever. In China, directors and managers might pledge $50 million, $100 million, or much more for a loan. Now, obviously, they're not buying a boat with that money. So what are they doing? They're lending the money back to the firm. In China, if you're not a state-owned enterprise, it's surprisingly hard to get a loan, and issuing bonds or equity requires a fairly lengthy regulatory process. But if shares are pledged as collateral, it's much easier to get a loan from a bank. So managers will pledge their own shares as collateral, get a loan, and lend that money back to the firm, effectively doubling down on the firm. That is a vote of confidence; they would only do that, practically, if they believed in the firm. Now, it also suggests that they are not profitable enough, or don't have enough cash, to get money from traditional sources, so this signal only positively predicts if you control for profitability and other variables. And if a firm has large total pledged shares after a long period of time, that suggests the firm can't pay back the loan, and that's a negative sign. So there are two signals to be had here. One is new pledged shares, which is a positive signal after controlling for other variables, and the other is total pledged shares, which is a negative signal whether or not you control for other variables.
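A rough sketch of how one might compute the AH-premium signal he describes; the column names, currency alignment, and rank-based scoring are hypothetical choices for illustration, not Rayliant's implementation:

```python
import pandas as pd

def ah_premium_signal(prices: pd.DataFrame) -> pd.Series:
    """Cross-sectional AH-premium score for dual-listed names.

    Assumes `prices` has columns ['a_share_price', 'h_share_price'] already
    converted to a common currency, indexed by company. A high premium
    (A expensive versus H) is scored negatively; a low premium positively.
    """
    premium = prices["a_share_price"] / prices["h_share_price"] - 1.0
    return -premium.rank(pct=True)   # overweight low-premium names, underweight high-premium ones

# Hypothetical single-date usage:
snapshot = pd.DataFrame(
    {"a_share_price": [45.0, 12.0], "h_share_price": [10.0, 11.8]},
    index=["high_premium_name", "low_premium_name"],
)
print(ah_premium_signal(snapshot))
```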
So just to go over one more: I mentioned foreign investors outperforming. The foreign holdings in various stocks are disclosed every day; just look at Northbound Connect holdings, which is a particular type of foreign holding. If every month you overweight high-Northbound-Connect-holding stocks and underweight low-Northbound-Connect-holding stocks, then from July 2016, when the data start, to now, you would have earned an information ratio of 1.92. That sounds incredible, and obviously you shouldn't assume that you would earn exactly that information ratio in the future, but it suggests you can get a tailwind from following the behavior of other smart money investors. By the way, one last thing: I am not the driver behind these China-specific signals. That is our head of China research, Donald Ho, who works out of Hangzhou, China. I don't want to take credit here; we have an entire team in China driving this. In any case, those signals give some color about what's so nice about China A shares: there is so much data. There are so many mandatory releases for firms, and many of those releases contain information that you can use to predict returns. Much of the data is either on Wind, which is a data provider in China similar to Bloomberg, or in a well-organized format online that can be easily scraped. And the signals I discussed just begin to scratch the surface; there are so many more signals that one can build from the sheer volume of data that comes from these Chinese firms.
Corey Hoffstein 12:29
What I find somewhat contradictory here is you're talking about disclosures, and very frequent disclosures. It sounds like that is leading to an increased amount of information in the marketplace, and I think from an academic perspective we would expect that to lead to a more efficient market. And you're saying it actually doesn't change the efficiency; in fact, you can find more inefficiencies with all this data. How do you reconcile that?
Vivek Viswanathan 12:57
So about 80% of the volume in China A shares is driven by retail trading. So practically, these folks aren't reading these releases or crunching this data. They certainly aren't building natural language processing algorithms or processing data faster than any human could read it. China has the perfect combination of things to generate alpha: a ton of data, and a large segment of investors who are not fully or optimally utilizing that data. Of course, that will change over time, and as that happens, we need to find more data and build models that capture that data. Eventually, the market will become efficient and 90% of active managers will underperform, just like in the US. Then we'll pack our bags and tell everyone to buy a cap-weighted index fund. But we're probably a few decades away from that.
Corey Hoffstein 13:43
From an outsider's perspective, it seems like the regulatory regime in China can be quite fluid. What sort of challenges, or even potential opportunities, does this present for you as an investor?
Vivek Viswanathan 13:55
The regulations are constantly changing. The Chinese stock market is fairly young; the Shanghai Stock Exchange was founded in 1990, and Shenzhen right after that, so regulators are still triangulating on their preferred regulatory regime. So significant regulatory changes still occur with some regularity. Back in 2015, there was a stock market bubble and crash localized to China A shares. The Chinese regulators blamed malicious short sellers, which obviously is not why the market crashed, but whatever, let's not get into that. And they banned most investors' ability to lend shares, which effectively banned short selling. Then in June of 2020, they allowed securities lending again for most investors. And if you look at the graph of short selling, it looks like it's at the very beginning of the S-curve, so it's in its exponential growth phase. Now, the cross-sectional dispersion of short selling can presumably be turned into an alpha signal. As you know, historically, on average, shares that are sold short more heavily have lower expected returns than those with lower short interest. But we haven't done that research yet. So that's a new potential signal that came from a regulatory change. Another recent regulatory change was the removal of the IPO P/E restriction. Let me explain what that is. Back in 2014, the government created a cap on the price-to-earnings ratio of IPO firms at 23. This wasn't even a formal rule, by the way; it was just informal guidance that was followed perfectly. Now, this also wasn't a secret; you can read about it in the news. It's just another interesting feature of Chinese markets: some of the rules are not technically codified, but neither are they a secret. The point behind this rule that capped IPO P/E at 23 was to make sure that retail investors, who love signing up for these IPOs, can get a great deal and make a ton of money. As you can probably guess, because these IPOs are discounted, they are heavily oversubscribed, so you have to enter a lottery to get in. But it's fantastic if you do get in. Before this rule was initiated, IPOs were underpriced by 10 or 15%. After this rule was implemented, IPOs were underpriced by 200 to 400%. I got those numbers from Deng, Sinclair, and Hsu's recent working paper called Redistribute the Wealth: Chinese IPO Market Reforms; I want to make sure I credit them. Anyway, last year they removed the 23 P/E limit for a small subset of the market, the GEM and STAR markets, which are small-cap, tech-heavy boards, and they'll probably eventually remove it for the market as a whole. So as long as this rule is in place, you probably want to subscribe to IPOs, though in the GEM and STAR markets it's not nearly so compelling anymore. For what it's worth, if you're a foreign investor using Hong Kong Connect to invest in China A shares, which is the usual way to invest, you can't even subscribe to IPOs, so this rule change doesn't matter much to us. But if you are a QFII investor or have a fund in China, you can subscribe to IPOs, and you probably should try. These are just two of the recent changes. Some of the others: the daily price limits in the GEM and STAR markets were recently increased from 10% to 20%, and regulators are also getting more aggressive about delisting firms that engage in accounting malfeasance. But the key here, overall, is you really have to keep your ear to the ground when it comes to Chinese regulatory changes. One way to do so is to have a person on the ground, which is genuinely very helpful.
But when it comes to regulations, at least you can generally just read the news and you should be okay.
Corey Hoffstein 17:15
So you spent a good part of your career, I think it was nine years, at Research Affiliates, which was a firm that was a real innovator in the smart beta space. I know since joining Rayliant you've sort of shed that factor-oriented view of the world, and I'm curious as to what led to that change.
Vivek Viswanathan 17:34
This one's going to be a long and technical one, so forgive the soliloquy I'm about to give here. Let me clear out some potential semantic misunderstanding first. When I say I don't have a factor view of the world, I mean anomalies as linear factors. I believe that the equity market factor is an extremely useful tool, and I believe that industries and countries drive the covariance matrix. So not having a factor view of the world specifically means not viewing anomalies as risk factors or as linear, univariate mappings to return. And I hope to make the claim that when you think about anomalies, you need not think about risk factors or about long-short portfolios built on linear characteristic sorts. Those are useful for publishing papers and for heuristic understanding, but they are not useful for portfolio construction or for mapping how characteristics relate to expected returns. There are a few high-level ideas I want to touch on. The first is where the idea of factors comes from. Factors are an idea that stems from arbitrage pricing theory, or APT: that the expected return of an asset is a linear combination of priced risks. And you might argue that even if you don't believe in APT with respect to all these anomaly factors, you can still believe that factors are a useful tool. I'll address that later. But for now, I want to talk about why the characteristics that we see predict return should not be considered priced risk factors. There are a few things we would expect to see if we believed things like value, growth, profitability, cash flow accruals, post-earnings-announcement drift, news sentiment, and momentum were priced risks. First, and this is one that few people talk about, we would expect to see products that sell high-valuation, low-profitability, negative-earnings-surprise, and every other negative factor tilt you can imagine, because those would be great hedges for whatever latent risk investors are trying to avoid. If these anomalies are risks, it stands to reason that investors would want to hedge those risks and would happily pay their counterparties to hedge them. But those products largely don't exist. There's no risk-hedging factor product that earns a negative expected return in exchange for hedging. No one is selling these supposed insurance products to investors. Moreover, no one has figured out what risks this multitude of factors is meant to price. What risk is profitability a proxy for? What risk does conservative accounting approximate? What about earnings growth? What about momentum? No one really seems to know. Now, certainly, the case of value and size embodies some aspect of risk. Imagine you have two firms with exactly the same expected cash flows, but one is riskier than the other. Then the riskier one will have a higher discount rate, and will be smaller cap and deeper value. So if there are latent risks, they will probably show up, at least in part, in value and size. But it's really difficult to make the case for the multitude of other signals. And of course, if factors were priced risks, we would expect to see covariances driving returns, not characteristics. But they don't. That was found by Daniel and Titman in 1996 with respect to value, but it's also been shown for most other anomalies as well.
In fact, my thesis shows this with regard to the 97 McLean and Pontiff factors, at least the 86 we could compute globally: it's characteristics that drive return. It's true that there have been attempts to recover this idea that, if you squint hard enough, covariances are driving returns. In 2018, Gu, Kelly, and Xiu used a variant of an autoencoder to account for the predictability of returns, and said you can compress the information in characteristics and outperform other linear asset pricing models. And of course you can, because, as we'll discuss, the relationship between characteristics and returns is nonlinear. So if you put a nonlinear model up against a few linear ones, you're probably going to win out. But that doesn't mean you found priced risk; it just means that autoencoders are good at compressing information and nonlinear models outperform linear ones. The real test would be putting it up against a neural network that didn't have a bottleneck layer in it. In other words, does the act of compressing your factor space improve or hurt your predictability? From all of our analysis, compressing the factor space hurts. You really want those 100-plus signals helping your prediction. Yet another piece of evidence against the risk-factor approach to understanding anomalies is how we construct covariance matrices. When we build covariance matrices, you might account for market, industries, countries, principal components of return, and, admittedly, size and value. But you wouldn't account for post-earnings-announcement drift or operating profitability factors; they are not meaningful parts of the covariance structure. Now, why does this matter? Who cares whether anyone thinks factors are priced risk? It matters because if you believed that factors were priced risks, you would sit on one side of the trade forever, expecting to get paid for taking that risk; you would expect that someone would happily pay you a premium to take on this factor risk. But if it's a behavioral anomaly, then you're keenly aware that the market might become efficient to that signal. But before I talk about why factors are not even particularly useful as tools, I want to talk about the number of factors. There's this idea of a factor zoo, brought up by John Cochrane, that there are too many factors, and there's a mystery here. And indeed, he's right, there is a mystery. But the way much of the literature has tried to explain it is not correct. Way back when, I read Harvey, Liu, and Zhu's "... and the Cross-Section of Expected Returns," and later Hou, Xue, and Zhang's "Replicating Anomalies." And if you read those papers, you might think that most cross-sectional anomalies are data-snooped and the markets are more efficient than the anomaly literature suggests, or that most anomalies can be collapsed into other anomalies, so that there are only a handful of anomalies out there. And by the way, I briefly believed all those things. Very briefly, though, like maybe a few months; some pretty quick empirical tests can refute them. As it turns out, you really want those 100-plus signals in your expected return model. The first bit of evidence for this is Jacobs and Müller's 2019 paper, "Anomalies Across the Globe." They found that if you create cap-weighted long-short portfolios out of the 241 anomalies that have been found in the literature and equally weight them, that is, equally weight the factors, then you will earn significant excess returns in 38 out of 39 markets.
And that's significant at a 0.01 level; the odd one out happens to be Turkey. Now, let's think about this for a second. These anomalies were discovered in the US. If they were the result of data snooping, they would not produce excess returns in other markets, but they deliver excess returns, on average, in 38 out of 39 markets. Now, you might argue that the returns might be driven by only a handful of these anomalies, maybe five out of the 241. That's where my thesis comes in. I didn't look at 241 anomalies, but I did look at the 86 out of the 97 McLean and Pontiff anomalies that can be calculated globally from 1995 to 2018, and 44 out of 86 of those earn a significant Fama-French three-factor alpha, and that's at a 0.05 level. Two of those 86 are size and value themselves, so they can't produce a Fama-French three-factor alpha. So 44 out of 84 anomalies deliver a significant Fama-French three-factor alpha globally. That is over half of the anomalies found in the literature, and that's too many significant anomalies for the result to be driven purely by luck. Now, you might ask, why the Fama-French three-factor alpha? Why not just look at whether the returns are significant in their own right? The answer is that almost every signal is a quality signal, and quality signals as a whole tend to be negatively correlated with size and value. AQR showed this with respect to size in their "Size Matters" paper, and the idea holds for a variety of quality factors, even ones that come from alternative data sources. So that is why the FF three-factor alpha is so important, and why you will see so many anomalies with insignificant returns but significant FF three-factor alphas. Now, I want to briefly talk about Hou, Xue, and Zhang's paper "Replicating Anomalies," because there's a particular issue with it that some of these other papers that have tried to collapse anomalies have as well: they first look at factors that earn significant returns in their own right, and then they try to explain only those factors. The insignificant factors are never tested against a factor model. But here is the issue with that approach: if you regress insignificant factors on a set of other factors, those insignificant factors might have a positively significant alpha, and that is now a new anomaly. You don't get to ignore those. If your asset pricing model generates more significant alphas than there were significant factor returns in the first place, that is not a successful model, because there are now more significant factors than before; a linear combination of factors is still a factor. So I've hit on two things already. One is that factors are behavioral anomalies. The second is that there are many anomalies. Now, we have these 100 or more potential anomalies, and we don't believe they're risk factors. So what are they? They are information about the long-horizon expected return of the firm. And this is information that can interact and be nonlinear, and it can be incorporated into price in one period or another. And this is exactly how we would view the world if we had never learned about factors. We would look at a company and say it has this gross profitability, this operating profitability, these accruals; I read this negative piece of news and this positive thing from their annual report; they're being sued by a competitor for anti-competitive practices. And you take all this information and you get a value for the firm.
And you compare that to the price, and you say this stock is overvalued or undervalued. You want your model to effectively do the same thing. Practically, that means you need a machine learning model. At the very least, you need linear ridge, so you can prevent yourself from overfitting to those 100-plus signals. But if you want interactions and nonlinearities, you need gradient boosting and random forests, if not feed-forward neural networks. You don't have to predict the value of the firm, though; practically, you're predicting return. But you want to include all of those signals and throw in price and valuation as additional features. Now, once you have your model, how do you improve your product? Instead of trying to build new cross-sectional factors, you're trying to provide information to your model about the value of the firm, so it can learn more about expected return. So you're still engineering features, you're just not building factors. In summary, these anomalies aren't risk factors. They're characteristics that predict return, and they can do so through interactions and nonlinearities. There are many, many characteristics, and you want to search for this data everywhere you can and use the latest tools to try to encode information about stocks' expected returns.
Corey Hoffstein 28:12
So I think for a lot of quants who got their education in the last 15 years, it can be very difficult to mentally unchain themselves from the traditional factor framework, and often it requires seeing a non-factor framework in practice to break that mental box. So I'm curious whether you could go into what that non-factor framework looks like in practice versus the more traditional linear models.
Vivek Viswanathan 28:47
Absolutely. First, you need an expected return model, and assuming you're predicting cross-sectional equity returns, that model should utilize some subset of things that fall under machine learning. If you aren't trading individual stocks, then what you do depends on the amount of data that you have. In cross-sectional equities, you generally have a lot, so your models can be far more sophisticated. If you're looking at commodity futures on a monthly horizon, I might use linear ridge, but the cross-section is probably not big enough for anything more complicated. If you're looking at asset allocation, on the other hand, it's even fine to use static weights, because it's hard to build conditional expected returns in that space that are better than unconditional expected returns. When you're predicting cross-sectional equity returns with 100 signals, you do not want to use OLS, because you will overfit and get no excess returns in your out-of-sample backtest. I can't emphasize enough how different your experience will be using OLS versus using a model with some form of regularization built in that prevents your parameters from overfitting. If you're predicting on an annual horizon, you can rely on linear ridge and maybe random forests. If you're predicting on a quarterly or monthly horizon, you want linear ridge, random forests, and gradient boosting. You might be able to use neural networks on monthly returns, but we don't. If you're predicting on a fine horizon, like daily or intraday, then you definitely want to use neural networks, probably only neural networks. Now, the next and most important step is that you predict returns at each time step, let's say every month, using models that could only be fit in prior months. If you're predicting the return in July 2010, you can only use models fit on data up until June 2010. In other words, you can only use models that were fit to prior-period data to predict the next period's return. That is how real life works, and you want your model to behave similarly. These are what are called pseudo out-of-sample or quasi out-of-sample backtests; we tend to just call them out-of-sample backtests, because the word backtest already implies that they can't literally be out of sample. Now, let's go back to our predicting-returns-in-2010 example. Let's say your data set starts in 1995. In 2010, you have 15 years of data to fit your returns; in 2020, you have 25 years of data to fit your returns. So your backtest assumes you have less information than you do now. That does mean your backtest will somewhat understate your ability to earn returns, but that's probably more than offset by the fact that markets get more efficient over time. Once you have your expected returns, you need a covariance matrix. There are many different ways to do this. You probably want to account for structural sources of covariance, like industry, country, and size, and assume the remaining variance is residual. But there are other ways; for example, using some number of principal components and then assuming the remaining variance is residual works fine too. You also need to account for various decay horizons, and you need to shrink the loadings on sources of covariance. I'm going to skip over that, because it's probably its own conversation, and it's just not a good podcast conversation; you need a whiteboard or something. Now, you have your machine learning expected returns and you have this covariance matrix.
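A minimal sketch of the walk-forward, pseudo out-of-sample loop he describes, using ridge as the regularized model; the panel layout, column names, warm-up length, and penalty strength are assumptions for illustration, not the firm's pipeline:

```python
import pandas as pd
from sklearn.linear_model import Ridge

def walk_forward_predictions(panel: pd.DataFrame, feature_cols, alpha=10.0):
    """Expanding-window, walk-forward return predictions.

    Assumed row convention: each row holds features observed as of month-end
    'date' and 'fwd_return', the stock's return over the following month.
    Predicting for rows dated d therefore uses only rows dated before d,
    whose forward returns are fully realized by the time the d-dated
    portfolio would be formed (no look-ahead). Features are assumed clean.
    """
    preds = []
    dates = sorted(panel["date"].unique())
    for d in dates[60:]:                      # require roughly five years of history first
        train = panel[panel["date"] < d]      # information available strictly before d
        test = panel[panel["date"] == d]
        model = Ridge(alpha=alpha)            # regularization guards against overfitting 100+ signals
        model.fit(train[feature_cols], train["fwd_return"])
        out = test[["date"]].copy()
        out["prediction"] = model.predict(test[feature_cols])
        preds.append(out)
    return pd.concat(preds, ignore_index=True)
```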
You want to do mean-tracking-error optimization with respect to a benchmark if you're a relative-return investor; if you're trying to maximize Sharpe ratio, do mean-variance optimization. You also want to have constraints on industry weights, country weights, and individual stock weights. It's difficult to communicate to your model that you're uncertain about the covariance matrix, and constraints are a clumsy but effective way to say, look, I did my best on the covariance, but take it with a grain of salt; let's make sure not to get too crazy here with the offsetting bets. I mentioned earlier that an equally weighted basket of China factors has something like a 1.5 information ratio from 2010 until now. Using all these methods on global and China-specific signals, in a walk-forward, out-of-sample way, an optimized portfolio in China large cap will produce an information ratio of about 2.8 after transaction costs since 2010. A walk-forward optimized portfolio in China small cap produces an information ratio of about 3.5 after transaction costs. It does that partially by building good expected returns, but also by drastically reducing tracking error. At that information ratio, your tracking error will be between 4 and 6%, so your backtested expected return will be 12 to 24% excess returns. Given such a low tracking error, you might get classified as enhanced indexing despite your expected return. That's something to be wary of. To deal with this, you can relax your constraints and reduce your risk aversion, but your information ratio will fall; you cannot maintain very high information ratios while taking high tracking error, because of the zero lower bound on weights. If you have a long-short portfolio, you can kind of go to town here. Now, there are some things worth realizing about expected return models. If the model doesn't think you can perform well in a market, it's not going to bullshit you. So if you run the same walk-forward optimized model in US large cap, you would earn an excess return of 2% and an information ratio of 0.6 after transaction costs since 2010. In Hong Kong, you would get an information ratio of zero after transaction costs; you wouldn't earn any excess returns over the market. Running EM large cap, you get about 1.6 after transaction costs; run it in China large cap, you get 2.8. If you're trying to do walk-forward prediction in an efficient market, you're going to have mediocre results, as you would in real life. If 90% of active managers in the US underperform over any 10-year period, it would be strange if we found a model that did incredibly well there. Now, another useful insight is that if you build a solid covariance matrix, your optimized model may show no significant factor tilts using traditional factor attribution. We actually had a discussion about this at our firm just last week. We fed one of our newly launched products into some factor attribution software, and lo and behold, we had virtually no factor tilts relative to our benchmark. The one factor tilt we did have, momentum, did not perform well over the past three months or so, and yet our active return over this benchmark, against which we had no significant factor tilts, was a positive 5%. And this is because factor tilts don't really matter. They don't tell you the expected return of the portfolio.
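A stripped-down sketch of the benchmark-relative optimization just described, using cvxpy; the risk-aversion level, bound sizes, and the 0/1 industry matrix are placeholders, and the covariance matrix is assumed to be symmetric positive semidefinite:

```python
import cvxpy as cp
import numpy as np

def optimize_relative_weights(mu, sigma, bench_w, industry_map,
                              risk_aversion=10.0,
                              stock_bound=0.02, industry_bound=0.03):
    """Mean / tracking-error optimization against a benchmark.

    mu:           expected returns, shape (n,)
    sigma:        covariance matrix, shape (n, n), assumed symmetric PSD
    bench_w:      benchmark weights, shape (n,)
    industry_map: 0/1 matrix of shape (k, n) mapping stocks to industries
    """
    n = len(mu)
    w = cp.Variable(n)
    active = w - bench_w
    # Trade expected return off against active (tracking-error) variance.
    objective = cp.Maximize(mu @ w - risk_aversion * cp.quad_form(active, sigma))
    constraints = [
        cp.sum(w) == 1,
        w >= 0,                                           # long-only: the zero lower bound on weights
        cp.abs(active) <= stock_bound,                    # per-stock active weight limit
        cp.abs(industry_map @ active) <= industry_bound,  # industry-level active exposure limits
    ]
    cp.Problem(objective, constraints).solve()
    return w.value
```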
They don't even do a good job classifying the risk of your portfolio. A profitable retail stock in China A shares is not going to be any more correlated with a profitable tech stock in China A shares than an unprofitable tech stock in China A shares. In the end, even though you're using value signals and profitability signals and many other signals, those very well may not show up as linear factor tilts, since your model is nonlinear and accounts for covariance risk. That naturally leads to a question: if you can't use factor attribution, how do you know if you're taking the same risk across two portfolios, across two managers? You would take the correlation of the active returns of the two portfolios. Now, if you're worried that this is too backward looking, you can build an expected covariance matrix of the underlying securities and calculate the expected correlation of the two portfolios. That's hard, but at least it's accurate. If you're using factor loadings to compare portfolios, you might as well use astrology for all the good it's going to do you.
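The expected correlation of two portfolios' active returns under a covariance matrix, his suggested replacement for factor-loading comparisons, is just a ratio of quadratic forms; a minimal sketch with hypothetical variable names:

```python
import numpy as np

def expected_active_correlation(w1, w2, bench_w, sigma):
    """Expected correlation of two portfolios' active returns given covariance sigma."""
    a1 = np.asarray(w1) - np.asarray(bench_w)   # active weights of portfolio 1
    a2 = np.asarray(w2) - np.asarray(bench_w)   # active weights of portfolio 2
    cov = a1 @ sigma @ a2
    return cov / np.sqrt((a1 @ sigma @ a1) * (a2 @ sigma @ a2))
```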
Corey Hoffstein 35:50
So one of the fundamental struggles that almost every quant deals with is the non-stationarity of data, and it seems to me like that problem potentially compounds when you start talking about interaction and nonlinear effects. But before we even get there, when we talk about a market like China, where we mentioned there are big regulatory regime shifts, how do you think about model construction when you just may not have the depth of data necessary for analyzing a signal because a meaningful regime shift occurs?
Vivek Viswanathan 36:23
So if the regime changes rapidly and unexpectedly, then you'll just be caught off guard; there's nothing you can really do about that. But let me give an example in China that may be helpful in understanding what one might do about a regime shift that you foresee. In China, the government is trying to reform SOEs, state-owned enterprises, and make them more efficient. We have SOE classification as a variable in our expected return model. But if we believe the reforms will be successful, we should remove that variable; if we don't believe the reforms will be successful, we should keep that variable. And for what it's worth, we haven't decided what to do yet. But the key here is we need to act before the regime has shifted, or soon after; otherwise the model will learn about it without our help. Now, one thing you can do to ensure that the model can handle changes in the relationship between features and expected return is to have a decay on the data, such that it learns more from recent data and less from data from the distant past. That's a good idea in general, but it will handle slow changes in the market environment, not fast changes. For what it's worth, changes in the relationship between features and expected return tend to be slow moving. With regard to adjusting models by hand, in general I would strongly caution against it. If you have a strategy earning an information ratio of three, or even an information ratio of two, what forward-looking thing can you add to improve the model? It's much more likely to hurt the model than improve it. And I'm embarrassed to say I know this from personal experience. We used to adjust our model weights when they did things that we deemed unintuitive. It turns out that's a pretty bad idea. The model is learning by mapping features to expected returns using 25 years of data and tens of thousands of stocks, and optimally trading off between expected return and tracking error given a particular risk aversion; our intuitions are probably not going to help that model. Now, if your model is missing something, it's generally a good idea to add it directly to the model. So, for example, let's say you find that your biggest negative contributors to portfolio performance come from failing to react to news, and the model currently has no way of reading the news. Instead of you reading the news and manually adjusting portfolio weights, you can use an NLP model, maybe Google's BERT or something, to encode the news and use that to predict return. You wouldn't use a full encoding of a document, that vector would be too large; instead, you might extract sentiment or some other component of information. That way the model can learn optimally without human intervention.
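One simple way to implement the data decay he describes is an exponential half-life on observation weights passed to the model's fitting routine; the column names, the five-year half-life, and the penalty strength below are illustrative assumptions, not values from the conversation:

```python
import pandas as pd
from sklearn.linear_model import Ridge

def fit_with_time_decay(train: pd.DataFrame, feature_cols, half_life_months=60):
    """Fit a ridge model in which older observations receive exponentially less weight."""
    age_months = (train["date"].max() - train["date"]).dt.days / 30.44   # approximate age in months
    weights = 0.5 ** (age_months / half_life_months)                     # halve the weight every half-life
    model = Ridge(alpha=10.0)
    model.fit(train[feature_cols], train["fwd_return"], sample_weight=weights)
    return model
```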
Corey Hoffstein 38:57
When you look at the large basket of characteristics that you've compiled over time, how many of those signals or characteristics do you think would fall under traditional factor classifications? For example, gross profitability would fall under a profitability or quality factor.
Vivek Viswanathan 39:16
The brief answer is something like 50% fall under typical factor classifications. Those signals fall under familiar rubrics like value, size, accounting conservatism, investment conservatism, profitability, and earnings and revenue growth. But I want to use your question to talk about signal classification systems in general, because I think it can inform signal research, and forgive me if I'm hijacking the question a little bit. Critically, I want to point out that I'm talking to quants here. This is not how you want to present your signals to your clients; it's just how you want to think about the philosophy of signal taxonomy. So let me talk about how my research and portfolio management team think about signal classification, because it's in some ways different but in some ways extremely familiar. First, I want to do something similar to what AQR's Quality Minus Junk paper did and take it in a slightly different direction. As you might remember, they take the Gordon growth model, divide through by book value, and say everything on the right side of that equation can be considered quality. I'm going to do something similar, but with the residual income valuation model. The residual income valuation model says that the value of a stock is equal to the book value plus the discounted sum of residual income. It's very similar to the dividend discount model, and in fact it's derived from it, but it uses residual earnings instead of dividends, and it's added to book value. So let's divide through by book value, and what you have is that the price-to-book of a firm is equal to one plus the discounted future residual return on equity of the firm. This is pretty damn similar to what the AQR paper did, but again, I want to take it in a slightly different direction. This equation shows that there are three signals that predict long-horizon return: market capitalization, which is on the left side of the equation before we divide through by book value; price-to-book; and current and future profitability. So you have three signal categories: size, value, and current and future profitability. Size and value are more or less one signal each. Obviously, there are 20 different ways to define value, but the ones that perform better do so because they're leaning on current and future profitability. The current and future profitability category encompasses almost every signal in existence besides size and value. It encompasses accounting conservatism: firms with low cash flow accruals tend to have higher subsequent earnings, and firms with low net operating assets tend to have higher subsequent earnings. Current and future profitability includes investment conservatism: firms that invest less tend to be more profitable in the future, presumably because they're not moving into their lower-ROI projects. Current and future profitability includes profitability signals, obviously: return on equity, return on net operating assets, return on assets, profit margin. It includes productivity signals like asset turnover and change in asset turnover, which also predict future earnings. It includes default risk measures like the Altman Z-score, or Campbell, Hilscher, and Szilagyi's default probability, both of which negatively predict earnings. Even low volatility positively predicts earnings, presumably because firms that have growing earnings are less likely to exhibit market volatility, as they are further away from default; kind of a Merton model, the stock as a call option on the firm, there.
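Written out, the residual income identity he divides through by book value looks like this in standard notation, with P the price, B book value, E earnings, r the discount rate, and ROE_t = E_t / B_{t-1}:

```latex
P_0 = B_0 + \sum_{t=1}^{\infty} \frac{E_t - r\,B_{t-1}}{(1+r)^t}
\qquad\Longrightarrow\qquad
\frac{P_0}{B_0} = 1 + \sum_{t=1}^{\infty} \frac{\left(\mathrm{ROE}_t - r\right) B_{t-1}}{(1+r)^t \, B_0}
```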
Even momentum predicts earnings, though momentum, of course, partially reverses after the first year. And indeed, other market signals also predict earnings as well, which is somewhat surprising. And all those signals I mentioned before in China: low-AH-premium stocks have higher subsequent earnings, high-foreign-holding stocks have higher subsequent earnings, and firms with new pledged shares tend to have higher subsequent earnings, while firms with high long-term pledged shares tend to have lower subsequent earnings, exactly the same direction as the return prediction. And I'm not the first person to realize this. McLean, Pontiff, and Engelberg wrote a paper in 2018 called "Anomalies and News," and they found that stock return anomalies were six times higher on earnings announcement days, which is consistent with biased expectations regarding earnings. We might call this class of signals that predict earnings quality signals, but I think it's important that folks don't get confused by the nomenclature. Under this framework, there is no such thing as the quality factor or the quality signal. Quality represents dozens, if not hundreds, of signals, often weakly or completely uncorrelated with each other. They're independent bits of information about future earnings. So your expected return model should be something like 80 or 90% quality signals, and that's only if you have different specifications of value. If you have only one specification of value, you might be 99% quality signals. But again, they're just independent bits of information about current and future earnings. Because people can mean many different things when they say quality, I prefer the verbose "current and future earnings" or "current and future profitability," because it makes it abundantly clear what's going on. When someone proposes a signal in our research group, we ask one question: does that signal predict future earnings? That predictability of earnings is also why all quality signals rely on people underreacting to information. If people were overreacting to information, that would flow right into value. If investors underreact to a piece of information, we need that information in the model. If investors overreact to a piece of information, that signal will be captured in value, and that information itself is not strictly necessary in the model. Now, there's an entire other class of signals, which includes everything from high-frequency trading signals to short-term reversal and co-skewness. Those are signals that predict return but then reverse. In general, those signals are extremely high turnover. If you're predicting on a daily horizon or at a higher frequency, those signals may make up the vast majority of your predictors, but for a monthly or less frequent model, those signals likely serve little purpose. Unfortunately, I don't have a good term for those, so I just call them high decay. And I believe that is a complete taxonomy of signals for expected stock returns: value, size, current and future profitability, and high-decay signals. But critically, this is a taxonomy for quants, and it aids in organizing our research; it's not the best taxonomy for talking to clients, because it requires that entire lengthy discussion I just gave. By the way, if someone can convince me of a signal that doesn't fall into that rubric, please reach out to me, I would love to discuss it.
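The one question his research group asks, does a proposed signal predict future earnings, can be prototyped as a per-date rank correlation against subsequent profitability; the column names and data layout below are hypothetical:

```python
import pandas as pd

def signal_vs_future_earnings(panel: pd.DataFrame, signal_col: str) -> pd.Series:
    """Per-date Spearman rank correlation between a candidate signal and next-year ROE.

    Assumes `panel` has columns ['date', signal_col, 'fwd_roe'], where 'fwd_roe'
    is the realized return on equity over the following year. A consistently
    positive series is rough evidence the signal carries earnings information.
    """
    return panel.groupby("date").apply(
        lambda g: g[signal_col].corr(g["fwd_roe"], method="spearman")
    )
```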
Corey Hoffstein 45:37
So machine learning remains, surprisingly, still a very hot-button topic among quants. There are those who don't think there's a lot of value, there are some who just view it as a branch of statistics, and there are others who have wholesale rebuilt their firms from the bottom up to incorporate a machine learning mindset. I'm curious as to how you gained the confidence to make it a wholesale part of your process, and why.
Vivek Viswanathan 46:06
Absolutely. The shortest answer as to why we use it is that it predicts returns better than not using machine learning. But honestly, let me describe the process that we went through to get to machine learning, because we were trying to solve a problem, and it wasn't clear at the time that machine learning was even the answer. We had been using the long-only factor approach in all of our strategies. That's where you just weight stocks upwards or downwards based on characteristic scores. And there's a question of what factor strengths to use. There's a question of over what dimensions a particular signal predicts returns: is it within industry, within country, or across industries? There's a question of correlations between factors and how you adjust other factor strengths once you introduce a new factor. There's a question of overfitting, because instead of living in a world where you have to predict in a walk-forward, out-of-sample manner, you have to select your factors and strengths in-sample. So you know your backtest always looks far better than your live performance. And I knew we had to find a better way. Now, for what it's worth, I do not come from a machine learning background. All I knew was that I needed to map the signals to expected return out of sample. So we tried various methods. We tried ordinary least squares; that overfit, and remember, we were predicting out of sample, so we got pretty bad results. We tried lasso, partial least squares, and principal component regression, and they performed poorly. Let's talk about why those performed poorly: they all try to collapse the feature space, and as a general rule, trying to collapse the feature space underperforms methods that use the full range of features. Linear ridge, on the other hand, will share loadings across collinear signals. If you have collinear signals, you don't want to collapse them into one, which PCR and PLS effectively do, or push one out, which lasso effectively does. Linear ridge shares the loading between collinear features, and that helps when you're predicting noisy variables, and of course, stock returns are noisy. So we tried gradient boosting, random forests, and neural networks as well. All of those work fine; neural networks are computationally expensive and require careful initialization, so we ended up just going with linear ridge, gradient boosting, and random forests. We found that, taken together, mostly regardless of the region in which you predict, they predict returns well in our walk-forward, out-of-sample models. The models earn much lower information ratios in Hong Kong and US large cap over the past decade, but those aren't our primary targets; we want to predict well in emerging markets, and China in particular, and there it worked exceptionally well. Once we added an optimization, it became a no-brainer. From an empirical standpoint, the information ratios nearly doubled from the factor approach, and the machine learning approach was walk-forward out of sample, while the factor approach had unavoidable in-sample bias. But we didn't immediately switch, which was probably the biggest mistake we've made as an investment organization. As we compared our machine learning paper portfolio results side by side with our factor-based live portfolio results, our machine learning strategies beat the living hell out of our old factor strategies.
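On the point above about ridge sharing loadings across collinear signals while lasso pushes one out and OLS splits them noisily, here is a toy illustration on simulated data; the data, penalty levels, and the outcomes described in the closing comment are typical patterns rather than guaranteed values:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 2000
signal = rng.normal(size=n)
x1 = signal + 0.05 * rng.normal(size=n)   # two noisy, nearly collinear
x2 = signal + 0.05 * rng.normal(size=n)   # measurements of the same signal
X = np.column_stack([x1, x2])
y = 0.5 * signal + rng.normal(size=n)     # noisy "return" driven by the signal

for name, model in [("OLS", LinearRegression()),
                    ("Lasso", Lasso(alpha=0.05)),
                    ("Ridge", Ridge(alpha=50.0))]:
    model.fit(X, y)
    print(name, np.round(model.coef_, 3))
# Typical pattern: OLS splits the weight noisily between the two features,
# lasso pushes one of them toward zero, and ridge assigns both a similar,
# moderate loading, which is the behavior described above.
```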
After seven months of seeing these ML strategies trounce the old strategies, we migrated the model over, and I still occasionally look at what our returns would have looked like if we had stuck with those old strategies. Let's just say I'm glad we switched. Now, let me be clear, we still have some smart beta products, and I should say there is still an advantage to smart beta, which is that it's easy and transparent. You can be transparent with a machine learning model, but it just takes longer. If you want to give someone a full specification of your strategy without discussing gradient-boosted trees, then the factor approach is a clean way to do so. We have to be clear about the trade-offs here. When you choose the factor approach over the optimized machine learning approach, you're basically giving up higher risk-adjusted returns for ease of explanation, which, strangely, is sometimes a mutually beneficial trade. If you say, I use a mixed integer programming optimizer with a linear ensemble of random forests, gradient boosting, and linear ridge for expected returns, and a covariance matrix built from shrunk principal components, some people will just say, look, that might be brilliant or it might be idiotic, but I'm in no place to judge, so just give me something I can understand. For those folks, you want smart beta. Now, I want to briefly talk about why machine learning is a hot-button issue for quants. The first reason is that they might work in a space where machine learning isn't appropriate. If you're doing asset allocation and you rebalance monthly, there might be a way to use machine learning to help your model, but I honestly can't think of it. That's obviously completely reasonable; those folks shouldn't use machine learning. The second is that some folks think it doesn't really add a lot of value. Despite working in an investment universe with a lot of data, they might have tested it and said, this actually doesn't help very much. I would encourage those folks to check which models they're using, check their signals, and check their hyperparameters. I've already discussed which models to use on which horizons, but I haven't talked too much about the risk of using too few signals. If you have a multifactor strategy with four signals, and you put only those four signals into your expected return model, you probably will not see big gains from using machine learning; there's insufficient data for the model to work with. Now, regarding hyperparameters, some hyperparameters, like learning rate, really matter. If you set your learning rate too high, your model is going to overfit, and your out-of-sample backtest will generate dismal results. If I'm being perfectly honest, I can't know everyone's analysis, so it could be the case that someone has figured out a proprietary non-machine-learning implementation that beats the optimal machine learning implementation; to not admit that possibility would be completely deluded. But obviously, I haven't seen that implementation, or that's what I'd be talking about right now. The next issue that folks might have is that they have some factor dynamism that they find valuable. For example, you might find that factors have momentum, or that high value dispersion means that value will perform well. Expected return models actually already handle this. Factors that have momentum have momentum because their underlying characteristics are persistent, not because of anything about factors in particular.
In particular, if the underlying characteristics are not persistent, you won’t see the same positive autocorrelation in factor returns. And high dispersion in characteristics predicts higher factor returns because the stock returns are related to the underlying characteristic, not the percentile sorts. It matters that the highest-ROE stock has 150% ROE in one period and 100% ROE in another; those are different numbers, and that difference has an effect on the expected returns. A feature should drive expected returns more if that feature is more dispersed. That is as true in OLS as it is in linear ridge, gradient boosting, random forests, or neural networks. It is because we’re stuck in factor world that we find short-term factor momentum or characteristic dispersion at all relevant or noteworthy. I like to compare factors to a geocentric view of the universe. The geocentric model found all sorts of interesting behaviors for planets. In the geocentric model of the universe, most planets move east to west in the sky, but occasionally they move west to east in these retrograde epicycles. Well, that’s just an illusion, because planets aren’t revolving around the Earth; they’re revolving around the Sun in elliptical orbits, without the need to reverse motion. In a similar way, all these potential factor timing models only arise because factors are a poor description of the relationship between characteristics and expected returns. The last type of factor timing that I’ve seen is regime-switching models. I personally have not seen these work in the context of timing; they can build amazing backtests but seem to turn into random number generators ex post. This may be due to my limited exposure, but most regime-switching models I see are 90% confident they’re in one regime or the other, and then you infer conditional expected returns based on those regime states. They’re way too overconfident, and thus they can result in bad decision making. Now, the final issue is that it’s harder to explain underperformance in a machine learning context. If you’re building a value portfolio and value underperforms, you can always say, well, value underperformed, so the portfolio underperformed; after all, the client was the one who bought the value portfolio, so it’s not your fault. Even in a multifactor context, you can say, well, on average these factors underperformed, so the portfolio underperformed. Factor investing sort of gives you an easier way to explain underperformance. It’s even better when you underperform because junk stocks, glamour stocks, or aggressive-accounting stocks outperformed; then you can say expensive junk outperformed this period, and that sounds like you’re not wrong, the market is wrong. But if you use a method that’s supposed to account for all this information, as much information as it can, and earn excess returns based on that information, then when you underperform, you can’t really hide behind attribution, you can’t really hide behind anything. You just have to say, look, I messed up. I like having that specific type of accountability, but not everyone does, for obvious reasons.
Corey Hoffstein 54:37
I’m hesitant to even ask this question, because I’m pretty sure I’m giving you a volleyball that you’re just going to spike down in my face. But the typical arguments against machine learning are: it’s a black box; it’s going to lead to overfit models when you combine 140 characteristics; it’s not appropriate for non-stationary data, which is what we would assume financial data is. I’m curious how you address these arguments.
Vivek Viswanathan 55:03
So linear ridge, gradient boosting, and random forests aren’t black boxes; they’re just not OLS. Linear ridge literally gives coefficients that you can look at, and the gradient boosting and random forest trees can be looked at and interrogated using impulse response functions. But we do have our own attribution system, and there’s nothing really proprietary about it, so I’m happy to explain it. The idea is to build expected returns incrementally with traditional signal groups: let’s say value, then value plus profitability, then value plus profitability plus accounting conservatism, then all of those signals plus low default risk. You then build optimized portfolios with these incremental expected returns, and you can attribute weights to each of the signal groups. Once you have the weight decomposition, you can produce backtests of returns from this and do a return decomposition. This is a really helpful and intuitive way to attribute weights, and it does so in the same nonlinear way that the model does, so it captures everything. But no, you’re not going to get very far attributing these portfolios with linear factors. But you really don’t want to. I’ll draw an analogy. Let’s say you saw a picture of a giraffe and said, that’s a giraffe. And I said, how do you know? Well, you’d probably say, it has a long neck, and these brown patches against a light coat, and these horn-like things on top of its head. And if I asked, yeah, but what lines make it a giraffe, point out the specific lines on the picture that make this thing a giraffe, that doesn’t actually make any sense. You can point out the aggregate features in the picture, but you can’t point out specific lines on the picture that make this thing a giraffe. In a similar way, this decomposition can point out the weight effects of groups of features like value or profitability holistically, but it’s not going to give you linear relationships. Now, regarding overfitting, I never really understood that critique in the context of machine learning, because the biggest thing that machine learning emphasizes is tuning hyperparameters on cross-validation sets and predicting out of sample on a test set. If you only allow yourself to predict on a walk-forward, out-of-sample basis, what are you overfitting to? Are you overfitting hyperparameters? Well, I’ll happily admit I overfit some hyperparameters, and I’m treating model choice here as a hyperparameter. I only know that lasso, PCR, and PLS do not work well now; I did not know that in 1995, when our data starts. I only know that trying to clean up the signal space from 150 signals down to fewer signals is not a good idea ex post. So if I had to start in 1995, I would have happily been mixing in good methods with bad methods and only figured out that they were bad a decade later. But that’s nothing compared to the overfitting that occurs when you don’t use out-of-sample testing. The normal way to test signals is in sample, so you know a given signal performs well. If you test a signal in sample and get a t-stat of six, then how do you add that to your factor model without overfitting the backtest? We have added signals to our strategy that meet our economic bar for inclusion, and sometimes our backtested return declines. Do we now remove that signal? No. That’s our new backtest, even if it shows five pips less return than before. Too bad for the backtest.
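A rough sketch of the incremental attribution idea just described: fit expected returns on cumulative signal groups, optimize a portfolio for each cumulative set, and attribute to each group the change in weights it causes. The helper names `fit_expected_returns` and `optimize_portfolio` are stand-ins for the firm's unspecified model and optimizer, not real APIs.

```python
# Incremental weight attribution by signal group (illustrative only).
def incremental_weight_attribution(panel, signal_groups, benchmark_weights,
                                   fit_expected_returns, optimize_portfolio):
    """signal_groups: ordered mapping of group name -> list of feature columns."""
    attribution = {}
    prev_weights = benchmark_weights                    # start from the benchmark
    features_so_far = []
    for group_name, cols in signal_groups.items():
        features_so_far = features_so_far + cols        # value, then value+profitability, ...
        exp_ret = fit_expected_returns(panel, features_so_far)   # e.g. the ML ensemble
        weights = optimize_portfolio(exp_ret)                    # optimized with this signal set
        attribution[group_name] = weights - prev_weights         # weight change from adding the group
        prev_weights = weights
    return attribution
```

The same weight decompositions can then be carried through a backtest to produce the return decomposition mentioned above.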
In the live portfolio, you want the model to have the information contained in that signal, if indeed it did meet the economic bar for inclusion, which is the ability to predict future earnings. So in short, machine learning models are less prone to overfitting because they’re evaluated on test sets. That is not to say that machine learning doesn’t suffer from overfitting. You might hear the machine learning community complaining about model architectures being overfit to particular commonly used datasets like ImageNet. But that’s not the level of sophistication that these critics of machine learning are at. Those folks are testing signals in sample, maybe doing some t-stat correction, and then complaining that machine learning overfits. That’s utter nonsense. Now, I think you also brought up non-stationary data. First, it’s optimal to have some decay: you want to overweight more recent observations, but not by a lot. You want something like a 15-year half-life on the weight of your observations. But incredibly, even if you have an infinite half-life, which means no decay whatsoever, you do totally fine. And that was one of the bigger surprises we had; we expected the optimal model decay to be high and a super important parameter, but empirically it’s not. So it turns out the relationship between features and expected returns is fairly slow moving. If you make your decay high, you’re going to lose valuable information from those distant observations.
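A small sketch of the observation-weighting scheme mentioned above: an exponential decay on sample weights with a configurable half-life (roughly 15 years in the discussion; an infinite half-life means no decay at all). The function name and interface are illustrative.

```python
# Exponential-decay sample weights by observation age.
import numpy as np

def sample_weights(observation_dates, as_of_date, half_life_years=15.0):
    age_years = np.array([(as_of_date - d).days / 365.25 for d in observation_dates])
    if np.isinf(half_life_years):
        return np.ones_like(age_years)            # infinite half-life: no decay
    return 0.5 ** (age_years / half_life_years)   # weight halves every `half_life_years`
```

These weights can be passed to most scikit-learn regressors through `fit(X, y, sample_weight=weights)`, so the same ensemble can be trained with or without decay.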
Corey Hoffstein 59:33
I’m curious, when you look at the portfolio over time, you mentioned already that you actually don’t load on traditional factors. But do you find that the machine learning method is simply creating certain structural, permanent, nonlinear characteristic tilts? Or are the characteristics that it’s leaning into dynamically changing over time, given different market conditions?
Vivek Viswanathan 59:58
If you look at the nonlinear loadings, according to our attribution, they are relatively stable over time, but they will change based on things like the underlying characteristic dispersion. Our linear factor loadings definitely change over time, so if you did a traditional factor attribution on the portfolio, you would see that the loadings don’t explain the portfolio very well. And insofar as the model does have linear factor loadings, they are dynamic over time. This might be seen as some sort of cool feature of machine learning models, with the model dynamically timing factors, but it says more about the limitations of linear factor attribution than it does about machine learning models.
Corey Hoffstein 1:00:36
So despite having moved on from a more traditional factor-based approach to investing, you still hold some pretty strong views. Specifically, there’s this ongoing debate about multifactor portfolio construction: whether you should take an integrated approach or a mixed approach. We were chatting before we started recording, and you mentioned that you actually don’t think either approach is correct; you prefer something called a stacking approach, which I had never heard of. And I was hoping you could take a moment to explain it to the listeners.
Vivek Viswanathan 1:01:07
Absolutely. By the way, I’m not entirely out of the smart beta game. For quant portfolios, I think we can and should jettison the factor approach and use machine learning models and characteristics to generate expected returns and optimization to build portfolios. But if you’re making index products, you need to use things that are straightforward, like factor scores, because you have to explain them fully to an index provider and to clients. Now, for what it’s worth, when someone asks for a smart beta product, we can now use much more sophisticated methods: we can build expected returns for the factors, we can optimize the factor weights, we can use alternative data. Anyway, with that disclaimer out of the way, let’s get to mixing versus integrating versus stacking. Even though I’m sure everyone is familiar with this debate by now, let me quickly go over the advantages and disadvantages of mixing and integration. The mixing approach is averaging single-factor smart beta portfolios, while the integrated approach is averaging factor scores and then creating a single portfolio from that average of scores. The advantage of the mixing approach versus the integrated approach is you get dynamic active weights based on how confident you are about various stocks. So if your factors completely disagree with each other, you will get very small active weights, which you want, because your signals disagree with each other. And by the way, bottom-up expected return models behave like this too: if your expected return model thinks that the market is efficient and there are no abnormal returns to be earned, it will not give you any active weights. That is a good thing. Another advantage of mixing is you can mix signals that have completely different rebalance frequencies. In theory, you could mix a high-frequency trading signal that trades every second with a value signal trading every year, because they’re traded entirely separately. Now, of course, you’d want to trade your value signal every month because of rebalance timing luck, which you have written extensively on, but I’m just speaking hypothetically. Integrating can’t mix strategies of different rebalance frequencies, generally. Now, the disadvantage of mixing is that you’re always losing active weight with each additional factor. That is, your active weights mechanically reduce, since if two signals agree on a stock’s active weight, its active weight will stay the same, but if they disagree, the active weight will decline. Moreover, if two signals want to more than zero out a stock, while another signal wants to give a small underweight to the stock, the stock will get a small positive weight in the portfolio, even though two of the signals would ideally want to negate the stock’s weight, while the other signal doesn’t even like the stock, it just doesn’t hate it. That doesn’t seem right either. So we came up with this method of stacking in 2016, before we even heard of this debate, and I’m sure many, many others have independently discovered it. It’s not like the mixing and integrating approaches never entered my consciousness, but we quickly decided stacking is superior. So what is stacking? You create long-short portfolios from the various signals you want to use in your strategy, and you multiply these long-short portfolios by a strength, let’s say 12.5%, which would mean the portfolio is 25% long and 25% short, for a 25% active weight for each factor. Then you sum up all the active weights across all the factors.
Note that stock weights within a factor sum to zero, so aggregating factors will still have a total weight of 0%. You add these summed weights to your benchmark weights, and since the factor weights inherently sum to zero, your portfolio will have weights that sum to one. However, some of those resulting weights might be negative; you zero those out and renormalize the positive weights. This method is what we call stacking. Now, this method has all the advantages of the other methods. If your factors disagree, your active weights decrease; if they agree, your active weights increase. This dynamic agreement and disagreement contributes to your return and information ratio. You can test this empirically by forcing your active weights to be constant; you will find that the naturally dynamic active weights earn a higher information ratio than static active weights. Moreover, you can mix different rebalance frequencies together by starting with the low-rebalance-frequency signals and stacking the higher-rebalance-frequency signals on top of the price-drifted weights of the low-rebalance-frequency portfolio. This might sound difficult, but it takes just an hour or two to code. Also, with stacking, you don’t mechanically lose active weight; you can either gain or lose active weight based on agreement or disagreement. If two factors agree, you will add active weight; if they disagree strongly, you’ll lose active weight. Finally, if a stock has a 2% weight in the portfolio, and two factors want to give it a negative 3% active weight each while one wants to give it a 0% active weight, it will completely zero out the stock; the third factor that is indifferent about the stock won’t give the stock back its positive weight. Now, for what it’s worth, if you allow your portfolio to short stocks, or if your active weights are extremely small, then the mixing and stacking approaches differ only by a multiplier on active weights. But if you’re in a standard long-only portfolio, these approaches have very real differences. Now, if you’re a quant, I wouldn’t worry about any of that, right? Just focus on the machine learning stuff and expected returns and optimization. But if you’re a smart beta manager, hopefully that was helpful.
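A minimal sketch of the stacking construction just described, assuming each signal is supplied as a long-short active-weight vector that sums to zero and has already been scaled by its strength. The helper name and data layout are illustrative.

```python
# Stacking: sum long-short active weights onto benchmark weights, zero out
# negatives, and renormalize so the final weights sum to one.
import pandas as pd

def stack_portfolio(benchmark_weights: pd.Series, long_short_portfolios: dict, strengths: dict):
    """benchmark_weights: indexed by ticker, sums to 1.
    long_short_portfolios: name -> active-weight Series summing to 0."""
    active = sum(strengths[name] * ls.reindex(benchmark_weights.index).fillna(0.0)
                 for name, ls in long_short_portfolios.items())
    raw = benchmark_weights + active          # summed active weights are ~0, so raw sums to ~1
    clipped = raw.clip(lower=0.0)             # zero out any negative weights
    return clipped / clipped.sum()            # renormalize the remaining positive weights
```

Because the clipping step only fires when factors collectively want to more-than-zero-out a stock, agreement and disagreement among signals naturally expands or shrinks the active weights, which is the behavior described above.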
Corey Hoffstein 1:06:28
Well, very last question for you here. We’re starting to see accelerated vaccine rollouts, and it seems like, hopefully, fingers crossed, the COVID pandemic is behind us and we’ll all be reopening again, on the road, and meeting at conferences. Curious, what are you most looking forward to?
Vivek Viswanathan 1:06:44
This is maybe unrelated to going back outside, but I’ve just started playing around on Kaggle, and Kaggle is a website that allows you to compete in machine learning competitions. As somebody who has kind of come into machine learning without having studied it at university, it’s really fascinating to learn and compete on these problems that are completely different from predicting stock returns. So you might identify the glomeruli in a kidney, or you might say whether a catheter was placed correctly or not. I’ve just started engaging with it, and it’s incredibly difficult, and interesting, and fascinating. And I love trying something new and just getting my ass kicked on it and slowly working my way up. So I’m definitely looking forward to engaging more with that.
Corey Hoffstein 1:07:34
Wonderful. Well, thank you so much for joining me today. I really appreciate it.
Vivek Viswanathan 1:07:38
Wonderful, thank you.
Corey Hoffstein 1:07:44
If you’re enjoying the season, please consider heading over to your favorite podcast platform and leaving us a rating or review and sharing us with friends or on social media. It helps new people find us and helps us grow. Finally, if you’d like to learn more about Newfound Research, our investment mandates, mutual funds, or associated ETFs, please visit thinknewfound.com. And now, welcome back to my ongoing conversation with Harley Bassman. In my experience, a lot more people focus on hedging the left tail, but very few discuss managing the right. Why do you find upside convexity so appealing?
Harley Bassman 1:08:23
It’s the case that markets tend to rise slowly, the escalator up, and they fall quickly, the elevator down. And if you look at 1% moves or 2% moves, there are many more down 2% moves than up 2% moves. And that makes sense, because if you think about it, everybody’s long. Everyone owns something. Who’s short? Well, corporations are short because they sold the stock; they don’t hedge. Corporations are short because they issue the bonds; they don’t hedge. Homeowners issue the bonds, the mortgage securities, to us buyers; they don’t go and sell their bathroom, so they don’t hedge. So everyone’s long. The market is long financial instruments, and therefore it’s a challenge to adjust the risk profile, because to reduce exposure they’ve got to sell it to someone else and have that person get a little longer risk. That’s tough to do. And thus you see skews in the market, where the put vol is higher than the call vol, aside from the general supply and demand of people buying puts and selling calls, which drives skew. It’s just the mere nature of risk: people are not risk neutral, losing a dollar hurts more than making a dollar, and so therefore it’s very challenging to go and move risk around. And therefore you tend to see these out-of-the-money calls, which in theory nobody wants because they’re already long the market, why would you want to pay to get longer, and that option tends to trade very inexpensively, sometimes crazy cheap. And when you go and take that asset, that instrument, that risk, that path-dependent convexity, and add it to your portfolio, you can then go and get rid of some other linear risk and end up with a superior profile. Most times, you will see out-of-the-money calls in the equity market trading well below realized volatility. And so a product like our SPY Up is almost brilliant in the way that you’re buying a core index and then you’re buying an out-of-the-money call; those two together give you a very enhanced upward package. And so what in theory you could do is, instead of buying 50 of an ordinary index, you can buy 48 of ours. Then in a downtrend you’re only long 48, and in an uptrend you get long more of it, maybe long 52, and it doesn’t cost that much to do that because the option is so inexpensive. And this is the case in most, not all, but most asset classes, where you have that kind of skew show up, creating a very cheap way to buy optionality.
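A toy illustration, not the actual product mechanics, of the "48 of ours versus 50 of an ordinary index" point: pairing a core index position with a cheap out-of-the-money call gives smaller exposure in a downdraft and larger exposure in a rally. The strike, premium, and unit counts are made-up numbers for the arithmetic only.

```python
# Compare terminal values of 50 units of a plain index versus 48 units of a
# hypothetical index-plus-OTM-call package at a few ending prices.
spot = 100.0
strike = 110.0            # assumed out-of-the-money call strike
call_premium = 0.5        # assumed cheap OTM call cost per unit

def plain_index(units, price):
    return units * price

def index_plus_otm_call(units, price):
    # Each unit holds one share of the index plus one OTM call (payoff at expiry).
    return units * (price + max(price - strike, 0.0) - call_premium)

for price in [80.0, 100.0, 120.0]:
    print(price, plain_index(50, price), index_plus_otm_call(48, price))
```

At the 80 ending price the convex package is lighter (48 units of exposure instead of 50), while at 120 the call kicks in and the package ends up worth more than the plain 50-unit position, which is the asymmetry being described.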