Stock Portfolio Structure of Individual Investors Infers Future Trading Behavior

Although understanding individual trading behavior and the motivation behind it is an important puzzle in finance, little is known about the connection between an investor's portfolio structure and her trading behavior in practice. In this paper, we investigate the relation between what stocks investors hold and what stocks they buy, and show that investors with similar portfolio structures to a great extent trade in a similar way. With data from the central register of shareholdings in Sweden, we model the market as a similarity network, by considering investors as nodes, connected with links representing portfolio similarity. From the network, we find investor groups that not only identify different investment strategies, but also represent individual investors trading in a similar way. These findings suggest that the stock portfolios of investors hold meaningful information, which could be used to gain a better understanding of stock market dynamics.


Introduction
Stock market trading provides opportunities at the cost of risk. For investors, the ultimate trading goal is to make as much money as possible by acting in such a way that the highest possible profit is realized at minimal risk. Yet how to trade optimally is far from obvious, and many factors influence trading behavior. The traditional approach to represent market trading has been a model perspective, assuming that investors act as rational and identical agents [1,2]. However, this assumption has been challenged by empirical evidence, which suggests that other elements are also present in financial markets [3]. For example, researchers have suggested models in which the economic decisions of investors also consider the effects of social, cognitive, and emotional factors. These factors and their influence on trading are often studied with data external to the actual trading process and include, for example, proximity [4], social media interactions [5] and web engine search queries [6,7]. In this work, we instead focus on data more directly relevant to the trading by relating trading patterns and behavioral biases to stock portfolio structure. With real financial data on individual investors from the Swedish stock market, we study the connection between what stocks investors hold and what stocks they buy.
Trading by individual investors and related behavioral biases have been studied in terms of, for example, local bias [8], overconfidence [9], sensation seeking [10] and the disposition effect [11]. Studies show that investors base trading not only on rational information, but are also affected by factors connected to both personal characteristics and associated external conditions. Various individual biases give rise to a trading heterogeneity among investors, or among groups of investors. The presence of such investor groups with similar trading behavior can be explained by homophily, i.e., the tendency of individuals to behave and bond with others who are similar. Investors potentially trade more similarly if they share certain properties, including, for example, age [12], gender [13] and familiarity [14]. However, classifying individual investors into such distinct categories is not straightforward. The classification is in some cases motivated by theoretical considerations, and in other cases by observed patterns in the data. Categories include, for example, fundamentalists and chartists [15], and informed and uninformed traders [16]. Other examples of investor categorizations in financial data are derived from trading correlations [17], direct stock trading data [18], and network approaches [19,20].
Although the understanding of and motivation behind individual trading behavior is an important puzzle in finance, little is known about the connection between investors' portfolio structure and their trading behavior in practice. Studies have found that many individual investors tend to hold poorly diversified portfolios and instead concentrate investments in only a few stocks [21]. The difficulty of searching through all available stocks also makes it more likely for individual investors to invest in stocks that attract their attention [22], and these attention-grabbing stocks are typically the ones that investors already hold in their portfolios. This bias indicates that portfolio structure and trading decisions are naturally connected.
In this paper, we explore the connection between portfolio structure and trading behavior in individual investors. With detailed data on stock portfolios from the central register of shareholdings in Sweden, we study the relation between stock portfolio similarity and trading similarity. Unlike most previous research, we do not analyze the direct trading, but instead focus on long-term trading behavior by looking at changes in share portfolios over time. We aim to examine two main questions: (1) How do investors in the stock market structure their portfolios? And (2) Can we learn about trading behavior by looking at the investors' portfolio structure?
To answer the questions of individual trading behavior we take three steps: First, we investigate how individual investors hold stocks, and how they trade. Second, we divide investors into groups based on portfolio similarity. This division is done with a network approach, where individual investors are considered to be nodes, and links between investors are constructed according to stock portfolio similarities. To group similar investors, we analyze the network with the community detection algorithm Infomap [23]. Third, we analyze the derived groups to investigate the relationship between portfolio structure and trading behavior. This analysis is done by comparing investor trading within groups to investor trading outside the group. In the following section we present the methods and the results, and, in short, we find that the portfolio structure of individual investors holds information on trading behavior, and that investors with similar portfolios, to a great extent, trade in a similar way.

Data from the central register of Swedish shareholdings
We examined more than 100,000 individual investors who were actively trading in the Swedish stock market from 2009 to 2011. The investors and their stock portfolios were extracted from a dataset with around two million investors. The dataset stems from the central register of shareholdings in Sweden, and covers basically all investors and their holdings in every publicly traded company in Sweden. The dataset was provided by Euroclear Sweden AB, and permission to use the data was given under a special agreement. Data are presented in half-year share register reports between June 30, 2009, and December 30, 2011, with detailed ownership information of investors in each registered company. The reports also included the companies' total share amount and their corresponding stock ISIN (International Securities Identification Number). Additional data, obtained from the Swedish Central Statistics Office (SCB), provided share prices for companies listed on the Stockholm stock exchange. Those data specify share prices at stock exchange closing time, i.e., the price of the latest sold share on the last trading day. If price data are lacking, bid price and then ask price were used instead. In total, the data contain share prices for around 500 listed stocks. The full dataset makes it possible to extract the detailed portfolio of an investor in the Swedish stock market. We have made anonymized and reduced data available online as detailed in Ref. [24]. Below, we explain the dataset in more detail, and the restrictions we set on the data in the analysis.
Investors are reported either as legal persons, e.g., corporations and funds, or natural persons, i.e., individual investors. Since we are interested in the trading behavior of individual investors, we considered investors that actually changed their portfolio, and focused on the holdings that investors can manage themselves, namely, direct holdings. Direct holdings are registered in the investor's name, as opposed to nominee holdings, which are registered and managed by an equity manager on behalf of the investor. The direct holdings of all investors are presented in each half-year report, with detailed information about registration type, share amount, and the equity ISIN in which the shares are held. This information makes it possible to find share changes in the portfolios between reports, provided that investors have a traceable identification number. Investors who lack a Swedish identification cannot be reliably tracked in the data over time, and we therefore excluded these investors from the analysis.
To reduce the effects of noise in the data, some conditions for the included stocks were established. First, we only considered stocks of companies that existed for the entire time period. We therefore excluded stocks that were introduced to or removed from the market during the time period for any reason. This exclusion was done to enable comparisons between two share reports without changes in the company domain. Furthermore, we also required that the total share amount of a stock must not have changed by more than five percent during the time period. This condition was set because larger changes make it hard to distinguish actual active trades of investors from more passive changes in the portfolios directly related to a share amount change, as, for example, in the case of stock splitting. As a consequence of the share amount change criterion, we excluded, for example, the companies H&M and Swedish Match from the analysis. Finally, only listed stocks were considered in the analysis, since these stocks are publicly traded and it is possible to find an explicit price for them. It is also worth noting the distinction between stocks designated A and B on the Swedish market. A company can be associated with more than one stock, because A and B stocks, and other potential stock classes in a company, must have different ISIN codes. We considered these different stock classes as separate stocks, because classes with less voting power are usually more liquid, and therefore give rise to different trading than the ones with superior voting rights.
In summary, we examined investors on the Swedish stock market who are natural persons, traceable over time, active in trading and primarily registered as shareholders. This selection means that we, for example, excluded investors registered as legal persons and secondary ownership through funds. We required company stocks to be listed and stable, in the sense that they must exist for the entire time period and have a share amount that does not change too drastically over time. With this noise reduction and data cleaning, we were left with 100,161 investors holding capital in 209 different stocks.

Portfolio vectors and trading vectors
We represent investor holdings in normalized portfolio vectors and consider the stock portfolio of an investor as a vector $p$, where $p_i$ represents the investor's proportion of capital invested in stock $i$. As an example, we can look at an investor with portfolio vector $p$ in a market with four stocks. If the investor holds shares of total value 20 in stock 1, and shares of total value 80 in stock 3, the portfolio vector can be expressed as $p = (0.2, 0, 0.8, 0)$. Note that the value at a specific index represents the relative amount invested in the corresponding stock, rounded to the nearest hundredth in the analysis for simplicity.
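The four-stock example can be reproduced with a minimal sketch. The function name `portfolio_vector` and the dictionary input format are illustrative assumptions, not part of the original analysis; stocks are indexed from 0 in the code, so stock 1 and stock 3 of the text become indices 0 and 2.

```python
def portfolio_vector(holdings, n_stocks):
    """Normalized portfolio vector, rounded to the nearest hundredth.

    holdings: dict mapping a 0-based stock index to the value held
    (hypothetical input format chosen for this sketch).
    """
    total = sum(holdings.values())
    if total <= 0:
        raise ValueError("portfolio has no invested capital")
    return [round(holdings.get(i, 0.0) / total, 2) for i in range(n_stocks)]


# Value 20 in stock 1 and value 80 in stock 3, in a market with four stocks.
p = portfolio_vector({0: 20, 2: 80}, 4)
print(p)  # [0.2, 0.0, 0.8, 0.0]
```

Because only the proportions are kept, an investor holding values 2 and 8 in the same two stocks gets the same vector.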
The vector representation is also used to unify investor trading. The shareholding reports do not provide direct trading information, but they do specify detailed half-year snapshots of the investors' portfolios, and these snapshots make it possible to track changes in the portfolios over time. Analogously to portfolio vectors, we therefore construct trading vectors for investors based on the sum of all changes between two dates. In these trading vectors, we considered stocks in which investors bought shares during the time period. We only examined purchases, because correlations between portfolio structure and sold stocks follow trivially, since investors can buy any stock but only sell stocks they already hold. To compute the trading vector, $p_T$, we therefore extracted the positive changes in the portfolio, $p_T = p(t_2) - p(t_1)$, between reports from June 30, 2009, and December 30, 2011, with stock prices from the first date. In this way, all elements in the trading vectors are non-negative. To examine the connection between portfolios and trading, we computed a similarity value between investors' portfolio and trading vectors, based on cosine similarity [25]. Accordingly, the similarity of vectors $x$ and $y$ is given by the normalized dot product $\mathrm{sim}(x,y) = \frac{\langle x, y \rangle}{\|x\| \, \|y\|}$. We use this similarity measure for the portfolio and trading vectors because it is simple and well-suited for analyzing investment structure. The similarity value is bounded between 0 and 1, since all portfolio and trading vector elements are non-negative.
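The trading vector and the similarity measure can be sketched as follows. The text does not state whether the difference is taken on share counts or on values, so this sketch assumes that the change in share counts is valued at the prices of the first date, that only positive changes are kept, and that the result is normalized to sum to one. The function names and input lists are illustrative.

```python
import math


def trading_vector(shares_t1, shares_t2, prices_t1):
    """Normalized value of net purchases between two reports.

    Assumption of this sketch: share changes are valued at first-date
    prices, and sales (negative changes) are set to zero.
    """
    bought = [max(s2 - s1, 0) * price
              for s1, s2, price in zip(shares_t1, shares_t2, prices_t1)]
    total = sum(bought)
    if total == 0:
        return [0.0] * len(bought)  # investor bought nothing
    return [b / total for b in bought]


def cosine_similarity(x, y):
    """Normalized dot product; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


# An investor who only bought shares in the second of three stocks.
p_T = trading_vector([10, 0, 5], [10, 4, 2], [2.0, 5.0, 1.0])
print(p_T)  # [0.0, 1.0, 0.0]
```

Two investors who bought entirely different stocks get similarity 0, and two investors with proportional purchases get similarity 1, regardless of how much capital each invested.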

Identifying groups of investors with similar portfolio structure
Since single investors hold sparse portfolios and trade to a small extent, we need to categorize similar investors in groups, and examine the overall trading behavior of each group. However, dividing investors into groups with similar portfolios is not a straightforward task, since the number of investors is large and it is difficult to distinguish groups without making assumptions and subjective divisions. One possibility would be to simply group investors with the most similar portfolios, but that approach raises the problem of where to separate the groups, and we run the risk of losing important structural information. So we require a method of dividing investors into groups that accounts for both portfolio similarity and the structural information of the system. These premises can be fulfilled with clustering tools from network theory, and we therefore take a network approach to analyze the data. Network theory has received increasing interest in finance research in recent years, thanks to its ability to model the organization and structure of large complex systems [26]. Network approaches aimed at finding structures in finance data include, for example, bank-liability networks [27], stock correlation networks [28] and trading networks [20]. In our network approach, we model the data as a network with investors as nodes connected by links according to portfolio similarity.
Figure 1. Reduced network at portfolio level, with the three portfolio structures as nodes. Self-links of portfolio nodes represent links between investors with equal portfolio structure. The weight $w_i$ for the self-link of portfolio $i$, with $n_i$ investors, is calculated according to $w_i = n_i (n_i - 1)/2$. The link weight $w_{km}$ between portfolios $k$ and $m$, with $n_k$ and $n_m$ investors, respectively, is calculated according to $w_{km} = w_{mk} = n_k n_m s_{km}$, where $s_{km}$ is the link weight between two investors in each portfolio structure. doi:10.1371/journal.pone.0103006.g001
Optimally, we create links between nodes according to causal connections between investors, but such relationships are difficult to obtain, and it is not even clear what they would be. Instead, we use portfolio similarity as a representation of relationships, and connect investor nodes with weighted and undirected links, according to the similarity value of the investors' portfolio vectors. This representation creates a network of investors with links based on portfolio similarity, and we refer to this network as a similarity network. To account for the similarity values with the most information and also make the analysis more efficient, we only consider links with values greater than or equal to 0.9 in the similarity network. In the similarity network, investors with at least one common holding will have a similarity value and accordingly be connected by a link. This means that the total number of links in the network can be large for a single investor. To reduce this complexity and make the analysis procedure feasible and more effective, we capitalize on the fact that many investors have identical portfolio vectors and use this to create a reduced network. Investors with equal portfolio structures will share the same links to other investors, which results in a large amount of redundant information. To remove this redundancy and reduce network size, we therefore represent every portfolio structure as a node, instead of having one node for each single investor. This approach reduces both the number of nodes and the number of links in the network. The resulting reduced network becomes an aggregated version of the original network, where links between investors with identical portfolio structure are represented by a self-link. An example of the reduction procedure can be seen in Figure 1.
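The reduction step can be illustrated with a small sketch that collapses investors with identical portfolio vectors into one node and applies the link weights given in the Figure 1 caption: the number of investor pairs for self-links, and the product of the two investor counts and the pairwise similarity for links between different portfolio structures. The function `reduced_network`, its return format, and the brute-force comparison of all portfolio pairs are assumptions made for illustration; the 0.9 threshold is the one stated in the text.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm > 0 else 0.0


def reduced_network(portfolios, threshold=0.9):
    """Aggregate investors with identical portfolio vectors into nodes.

    Returns (nodes, counts, links); links maps a pair of node indices
    to a weight. Hypothetical helper, not the authors' code.
    """
    counts = Counter(tuple(p) for p in portfolios)
    nodes = list(counts)  # one node per distinct portfolio structure
    links = {}
    for i, p in enumerate(nodes):
        n = counts[p]
        if n > 1:
            # identical portfolios have similarity 1, so the self-link
            # weight equals the number of investor pairs
            links[(i, i)] = n * (n - 1) / 2
    for (k, pk), (m, pm) in combinations(enumerate(nodes), 2):
        s = cosine_similarity(pk, pm)
        if s >= threshold:
            links[(k, m)] = counts[pk] * counts[pm] * s
    return nodes, counts, links


portfolios = [(1.0, 0.0)] * 3 + [(0.9, 0.1)] * 2 + [(0.0, 1.0)]
nodes, counts, links = reduced_network(portfolios)
print(len(nodes), sorted(links))  # 3 [(0, 0), (0, 1), (1, 1)]
```

Six investors collapse to three nodes here. Comparing all pairs of distinct portfolios is quadratic in the number of portfolio structures, so a dataset of the size described above would likely need a sparse representation or an index over shared holdings.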
The goal of constructing the similarity network is to group investors with similar portfolio structure, albeit not necessarily exactly the same portfolio. To identify candidates for such groups, we could perform a random walk between investor nodes in the network, and in each step visit a neighboring node with probability proportional to the link weights. In this approach, a group would be a number of investors where the random walker stays for a relatively long time before moving to other investors. However, we cannot identify unambiguous groups simply by performing such dynamics on the network, and therefore we need an extended method. Fortunately, exactly those dynamics are implemented in an existing community-detection method, namely the map equation [23,29]. For network analysis, the corresponding algorithm is referred to as Infomap. This algorithm has proven to be one of the most efficient community-detection methods in comparative studies [30,31]. In the case of similarity networks, the algorithm identifies groups of investors with strong similarities in portfolio structure, which is precisely what we are looking for.
To find a group representation that accounts for both portfolio similarity and market structure, we use the Infomap algorithm with the hierarchical clustering option. This option provides a division of nodes, i.e., portfolio structures, into top-level clusters, consecutively divided into smaller subclusters. We are, however, interested in the division of individual investors, and this division can be obtained simply by mapping each portfolio to its corresponding investors. In this way, we get a categorization of investors into groups that represent similar, but not necessarily identical, portfolio structures. The categorization provides an overview of the stock market, and describes how investors in the Swedish stock market structure their portfolios.

Trading similarity of investors with similar portfolios
We want to examine if investors with similar portfolio structures tend to trade more similarly than other investors. To determine if two investors are similar, we consider the categorization of investors into groups and study if investors from the same group, i.e., investors with similar portfolio structures, trade in a more similar way than investors outside the group. Trading comparisons for investors of a specific group are made in a bootstrap procedure in the following way: First, we randomly choose a set of investors from the group and compute their aggregated trading vector. Next, we randomly choose two other sets of investors, one set from the same group and one set created from investors that do not belong to the group. We compute the aggregated trading vectors of the two sets and calculate their trading similarity in relation to the first trading vector. In this way, the similarity values make it possible to compare trades within the group to trades outside the group. In detail, for each group $G$ we repeat the following $N$ times: randomly choose a set of investors $i_1$ with set size $n$ from $G$, then draw the two comparison sets and compute the two trading similarities as described above. In each iteration, we examine whether trades within a group are more similar than trades outside the group, and we repeat this procedure 1000 times. For each group we search for the investor set size that makes the within-group trades significantly more similar than outside trades in the comparison. If portfolio structure and trading were totally dependent, we would have set size 1 for all groups, since this would imply that we can learn about the trading of all investors in a group by looking at the trading of a single investor in that group. However, the trading data are not very extensive for single investors, and therefore we need to compare the aggregated trading of a set of investors to obtain useful information.
To measure the trading similarity of a group, we search for the set size that is needed for significance, i.e., the number of investors that are needed so that 95% of trading comparisons are larger within the group than outside.
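The search for the significant set size can be sketched as below. Several details are assumptions of this sketch rather than statements from the text: the two within-group sets are drawn without overlap, aggregated vectors are element-wise means, a comparison counts as a success only if the within-group similarity is strictly larger, and set sizes are tried in increasing order. All function names are illustrative.

```python
import math
import random


def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm > 0 else 0.0


def aggregate(vectors):
    """Element-wise mean of a list of trading vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def within_fraction(group, outside, n, repeats, rng):
    """Fraction of repeats where within-group similarity beats outside."""
    wins = 0
    for _ in range(repeats):
        sample = rng.sample(group, 2 * n)  # two disjoint within-group sets
        reference = aggregate(sample[:n])
        inside = aggregate(sample[n:])
        other = aggregate(rng.sample(outside, n))
        if cosine_similarity(reference, inside) > cosine_similarity(reference, other):
            wins += 1
    return wins / repeats


def significant_set_size(group, outside, max_n, level=0.95, repeats=1000, seed=0):
    """Smallest set size n reaching the given level, or None if none does."""
    rng = random.Random(seed)
    for n in range(1, max_n + 1):
        if 2 * n > len(group) or n > len(outside):
            break  # not enough investors to draw the sets
        if within_fraction(group, outside, n, repeats, rng) >= level:
            return n
    return None


# Toy data: group members all bought stock 0, outsiders all bought stock 1.
group = [[1.0, 0.0]] * 10
outside = [[0.0, 1.0]] * 10
print(significant_set_size(group, outside, max_n=5))  # 1
```

In this toy case a single investor is enough, which corresponds to the fully dependent case with set size 1. With real data the fraction generally rises with the set size, since aggregation reduces the number of zero similarities.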

Stock portfolio similarity and trading similarity
Stock portfolios of investors differ by orders of magnitude, both when considering the number of shareholdings and the total value. To unify the portfolio structure, we therefore represent investor holdings in normalized portfolio vectors. This representation considers investment distribution and not the magnitude of investments, which means that two portfolio vectors can be similar even if the total value of the portfolios differs. Analogously, the vector representation is also used to unify the investor trading in trading vectors. When we construct portfolio vectors for the 100,161 investors in the data, we find 52,115 different vectors. Interestingly, only 2,652 portfolio vectors are needed to cover 50% of all investors, which shows that a large proportion of investors distribute their capital in a similar way. Many investors have capital invested in only one stock, and consequently hold portfolios that are not diversified at all. When we construct the trading vectors, we find that 32,970 vectors are needed to cover all existing trading strategies during the period. The number of trading vectors is smaller than for portfolio vectors because many investors only invest in one or a few stocks.
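The coverage figure, i.e., how many distinct portfolio vectors are needed to cover a given share of investors, can be computed by counting identical vectors and accumulating the counts from the most common vector downward. The sketch below uses toy data; the function name `vectors_needed_for_coverage` is an illustrative assumption.

```python
from collections import Counter


def vectors_needed_for_coverage(portfolios, coverage=0.5):
    """Number of distinct vectors, most common first, covering the share."""
    counts = Counter(tuple(p) for p in portfolios)
    target = coverage * len(portfolios)
    covered = 0
    for k, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered >= target:
            return k
    return len(counts)


# Ten toy investors with three distinct portfolio structures.
portfolios = [(1.0, 0.0)] * 5 + [(0.0, 1.0)] * 3 + [(0.5, 0.5)] * 2
print(vectors_needed_for_coverage(portfolios, 0.5))  # 1
print(vectors_needed_for_coverage(portfolios, 0.8))  # 2
```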
Individual investors are more likely to invest in stocks that attract their attention, due to the difficulty of searching among all available stocks. Attention therefore greatly influences individual investor trading decisions [22], and the attention-grabbing stocks are naturally the stocks that investors already hold. The combination of the attention bias and the tendency of people to act similarly to their peers, as in, for example, local bias [4,8], gives rise to an interesting question. If individual investors hold portfolios concentrated in only a few stocks, have a preference for investing in stocks they already hold, and also tend to act in accordance with similar investors, does this imply that there is a connection between portfolio structure and trading similarity? The portfolio and trading vectors make it possible to evaluate the question and compare investors, and in Figure 2, we show the relationship between portfolio vector similarity and trading vector similarity. The variation in the data is large, but a trend can be seen in the case when all investments are considered; the more similar the portfolio structure, the more similar the trading. The relationship is evident for portfolio similarity values greater than 0.9, which suggests that these similarity values hold important information. The observed relationship between portfolio structure and trading could be explained with homophily, i.e., the tendency of individuals to engage in similar activities to their peers. This tendency can sometimes make it hard to determine from observational data whether a similarity in behavior exists because two individuals are similar, or because one individual's behavior has influenced the other. Because of the nature of the shareholding data, it is difficult to determine causal reasons for the observed similarities, but since we are primarily interested in the connection between portfolios and trading similarity, this is not an issue.
Comparisons of single investors result in a large proportion of similarity values that become zero, both in the comparisons of portfolio and trading vectors. This means that many investors neither hold nor trade similar stocks, and therefore the evaluation of single investor comparisons becomes problematic. To overcome this problem and be able to compare investors, as a first approach, we created groups of randomly chosen investors, and compared the group's aggregated portfolio and trading vectors to other groups of equal size. The aggregated portfolio and trading vectors are constructed as the mean investment distributions of all investors in the group. The distributions for portfolio and trading similarities, with group sizes 1, 10 and 100, are shown in Figure 3. First of all, the figure illustrates why we want to group investors,  since larger groups decrease the number of similarity comparisons that become zero. The figure also shows why we do not want to form these groups randomly, as larger groups cause similarity values to end up in a narrower interval. This shift is a result of the random group formation and demonstrates that the information in such groups is limited, since portfolio dependencies disappear when investors are chosen randomly. Consequently, both group size and how we aggregate groups are important factors when we examine the relationship between portfolio structure and trading.

Groups of similar investors from similarity network analysis
To find groups and analyze the aggregated trading behavior of investors, we model the shareholding data as a network with investors as nodes connected by links according to portfolio similarity. The network approach creates a similarity network, and we analyze this network with the community-detection algorithm Infomap [23] to identify groups of similar investors. The groups describe how investors in the Swedish stock market structure their portfolios, and the basic properties for the ten largest top-level groups can be seen in Table 1. It is worth noticing that more than two-thirds of all investors are included in the ten largest groups. Despite the almost endless number of ways for individual investors to structure their portfolios, the analysis shows that a few related investment strategies are favored.
The investor groups represent related portfolio structures, and in each group we find some stocks that a large proportion of the investors hold. These top stocks constitute the main connectors between investors in the group. The Ericsson B-stock, which is the stock held by most investors, represents the top stock in the first and largest group. Almost three quarters of the around 25,000 investors in the first group hold shares in Ericsson B. General recommendations on how to invest in the stock market state that diversified portfolios are preferred, but investors still seem to make the choice to hold underdiversified portfolios [32]. As a result of this bias, the mean number of stocks held by individual investors is relatively small, which, in turn, makes it possible to identify groups of investors with some specific stock structures in common. When considering portfolio diversity it is worth noting the possibility that investors also can have capital invested in, for example, diverse funds, but such secondary ownership is not included in the analysis. Individual investors tend to hold only a few different stocks, and this limitation can actually be beneficial, since gathering information on stocks requires resources [33]. Individual investors seldom have resources to gather information on more than a few stocks, and informed investors therefore tend to concentrate their portfolios in the stocks in which they hold an informational advantage [34].
Table 1. Investors shows the number of investors in the group. Mean stocks reports the mean number of portfolio holdings for the investors in the group. Significant set size states the set size that is needed for trading significance, i.e., the number of investors that are needed so that 95% of the trading comparisons are larger within the group than outside. Top stocks shows the stocks held by most investors in the group and the corresponding proportion of investors in the group that hold the stock. doi:10.1371/journal.pone.0103006.t001

Similar portfolio structure infers similar trading
The investor groups make it possible to compare the trading of investors with similar portfolio structures. However, the relationship between portfolio structure and trading behavior is dependent on what stocks investors hold, and therefore the relational effect varies between groups. To investigate this relationship, we use a bootstrap procedure in which we compare within-group trades to outside-group trades and search for the investor set size that makes trades significantly more similar within the group. The significant set size specifies the number of investors from the group that is needed for significance in trading similarity, i.e., the number of investors that are needed so that 95% of the comparisons between aggregated trading vectors are larger within the group than outside. The results are presented in Table 1, and we can see that only one investor is required for significance in group 4, while 43 investors are needed in group 1. The number varies between groups, which means that investors with certain portfolio structures tend to trade more similarly than others. The group differences are illustrated in Figure 4, where the mean trading similarity is shown in relation to mean portfolio similarity, for all investors within the groups. Notably, group 1 has relatively low scores for both portfolio and trading similarity, which can be explained by the fact that the group is large and therefore diverse when it comes to both portfolio structure and trading. Group 4, on the other hand, has a relatively high similarity score for both portfolio and trading similarity. This suggests that the investors within group 4 are more homogeneous than investors in other groups when considering both portfolio structure and trading behavior. Unique to this group is the Saab B-stock, which is held by all investors in the group.
It is also interesting to compare the trading behavior of groups 2 and 8, since group 2 has a lower portfolio similarity score, but still a higher trading similarity score than group 8. An explanation for these differences could possibly be found by looking at the top stocks of each group (see Table 1). The three top stocks of group 2 are all in the car industry sector, while the three top stocks in group 8 are from three different sectors: mining, telecommunication and technology. Group 2 therefore seems to represent a more homogeneous ownership, and consequently the investors in the group trade more similarly than the investors in the more diverse group 8.
The group trading similarity may arise because investors of the same group base trading decisions on similar information, for example, because they use the same information sources, such as web sites or television [35]. These information sources are more likely to be similar if investors share common interests, which for group 2 potentially could be cars or the automotive industry. There is also evidence of communication among stock market investors, which suggests that investors exchange information about trading in discussions with their peers [36,37]. Accordingly, social interaction is an influential factor when it comes to stock market trading [38]. Therefore, our empirical findings on stock portfolio structure could in principle be used to refine multi-agent based order book models with different types of agents [39]. However, more research is needed to bridge the gap in time scales between long-term investments and short-term trades.
To put the results in the context of previous work, we consider some studies that have examined the joint behavior of individual investors, although not from a network perspective. Ref. [40] analyzed household trading and found that trading was highly correlated and persistent. The study observed that individual investors tend to react to the same kind of behavioral biases at, or around, the same time. Such behavioral biases could lead to associated trading for related investors, and an explanation for trading similarity could therefore be that similar investors seek and receive similar information over time, and trade accordingly. Another explanation for the trading similarity among groups relates to investment herding [41], where some investors change their portfolio in the same way as a leading group of investors whom they trust. In the end, it is interesting to consider the fact that the stock portfolio of an investor actually reflects the aggregated result of all past trades done by the investor, and therefore a relationship between portfolio structure and trading already exists.

Conclusion
We show that there is a relationship between stock portfolio structure and trading, namely, that individual investors with similar portfolio structures tend to trade in a similar way. To analyze this relationship, we use real stock market data and a procedure that is threefold. First, we find that comparisons of portfolio and trading similarity for single investors show a large variation, and therefore the data must be analyzed on an aggregated level. Second, we find that the stock market displays a structure among its investors, with groups that represent investors with similar portfolio structures. Third, we find that investors with similar portfolios, to a greater extent, trade in a similar way.
Figure 4. Relation between mean trading similarity and mean portfolio similarity for the investors of the ten largest groups obtained in the analysis of the similarity network. The large and diverse first cluster scores high on neither portfolio similarity nor trading similarity, while group four seems to have the most homogeneous investors, both when considering portfolio similarity and trading similarity. The portfolio similarity values are all greater than 0.5, since the groups are created with portfolio similarity as a condition. doi:10.1371/journal.pone.0103006.g004
The results show that the stock portfolios of individual investors hold meaningful information, which could be beneficial in the analysis of individual trading behavior. The use of new data sources in economics could improve our understanding of dynamics in financial systems and make it possible to develop models for inferring market reactions. However, even though new and previously unused data can provide important information that relates to market dynamics, the problem of evaluating whether the featured relations are causal or not still persists.
Therefore, while future work on the relationship between portfolio and trading includes examining the results from an economic perspective and connecting them to actual market dynamics, the general goal of future work in finance will be to further explore causality when connecting data to dynamics.