ARIMA and FBMAP approach for forecasting daily stock price in Colombo Stock Exchange

Abstract: This study examines whether the stock prices of companies listed on the Colombo Stock Exchange (CSE) follow the Random Walk Hypothesis (RWH) and presents a mathematical model of stock prices based on a Fractional Brownian Motion Process with Adaptive Parameters (FBMAP), compared with the Auto-Regressive Integrated Moving Average (ARIMA) time series model. The period covered by the research was January 2015 to June 2019. The main objective of the study was to investigate whether stock prices follow the RWH and to compare the two forecasting methods. To check the RWH, the Chi-square Test, the Runs Test, and the Auto-correlation Test were used. The Augmented Dickey-Fuller Test (ADF Test) was used to verify the stationarity of the data set. In the first phase, the best-fitting ARIMA model was found using the Akaike Information Criterion (AIC), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). In the second phase, the proposed FBMAP was used to predict future stock prices. The results demonstrated the potential of both the ARIMA model and the FBMAP model to predict the stock price indices on a short-term basis. The simulation results showed that the FBMAP model is more suitable for forecasting daily closing prices than the ARIMA model.


INTRODUCTION
Stock price forecasting is an important topic in financial and academic studies because of the number of factors that can influence price behaviour over time. High volatility is one of the main characteristics of such time series, and it may arise from market factors, the economic environment, political scenarios, health catastrophes, and so on. In recent years, many researchers in the field have focused on the study of financial time series. The reasons are easily understood in light of the global scenario of the past several years, with massive economic crises in big markets such as the US and Europe. In this framework, the in-depth study of financial time series, especially modeling and statistical analysis applied to economics, becomes paramount. In the stock market, stock prices rise and fall, and mathematically this phenomenon is described by the one-step binomial model. When buying a stock, the investor anticipates that the price will go up, but there is no guarantee that this will happen (Abeysekara, 2001). Investors want to take advantage of this volatility to make money in the stock market. In the literature, researchers in finance have shown considerable interest in modeling stock price behavior and testing existing models. As a result, the Random Walk Hypothesis (RWH) was developed by Kendall (1953) and Fama (1965), who hypothesized that stock price movements are irregular and that past prices are of no use in predicting future movements (Fama, 1970). Most studies of the RWH have focused on developed economies, where the majority of stock prices follow a random walk (Armstrong and Sorescu, 2014). On the other hand, research on the Shanghai and Shenzhen stock markets and on stock markets in Less Developed Countries (LDCs) rejects the RWH, while providing mixed evidence for Argentina, Brazil, Chile, and Mexico (Fama, 1965).
The three versions of the efficient market hypothesis are varying degrees of the same basic theory (Fama, 1965). The weak form suggests that today's stock prices reflect all the information in past prices. The semi-strong form holds that investors cannot use either technical or fundamental analysis to gain higher returns in the market. The strong form states that all information is completely accounted for in current stock prices and that no information can give an investor an advantage in the market (Abeyratne and Power, 1995). In this study, we first checked whether the empirical data follow the RWH. If the data do not follow the RWH, future stock prices can be forecast (Arulvel et al., 2011). Since understanding and predicting future conditions is crucial, traditional forecasting techniques and a novel technique have been used to predict stock market prices. For data with less volatility, studies usually consider autoregressive integrated moving average (ARIMA) models, introduced by Box and Jenkins as a systematic class of models for time-correlated modeling and forecasting (Meyler, Kenny and Quinn, 1998). In several fields, this method is good enough to describe phenomena and to make good predictions (Engle, 2001). Another process, a fractional Brownian motion process with adaptive parameters (FBMAP), is proposed in this study because it exhibits long-range dependence (Mandelbrot and Van Ness, 1968). The proposed FBMAP differs from the Brownian motion process in that Brownian motion has independent increments, while FBMAP does not. After studying the behavior of each time series, the better model is chosen based on the results of each model, and predictions are made.

MATERIAL AND METHODS
The study was based on secondary data obtained from the CSE and the annual report of the Central Bank of Sri Lanka. The sample period spans January 2015 to December 2018. This period was selected because the stock market behaved regularly during it; the results would differ if the market were facing instability due to a crisis or any other event. The selected period is therefore expected to give representative results. The first 50 trading days in the period from January 2019 to June 2019 were used to validate the forecasting models, since short-term forecasts are more accurate than long-term forecasts in financial time series. Based on the Standard and Poor's rating, a sample of 20 companies was selected for this study, because the S&P SL 20 is the index based on market capitalization that follows the performance of 20 leading publicly traded companies listed on the Colombo Stock Exchange. The main question to be answered in this study was: which of the forecasting methods produces better results in the financial time series context? To answer it, stock price behavior was first tested against the RWH, and traditional ARIMA models were then used to forecast future stock prices. To identify the rejection or acceptance of the RWH, the Chi-Square Test, the Runs Test, and Autocorrelation Functions were used. The Augmented Dickey-Fuller Test (ADF Test) was used to confirm the stationarity of the data set. The Box-Jenkins methodology, the Akaike Information Criterion (AIC), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE) were used to identify the most accurate and suitable ARIMA model for each firm. Among the range of nonlinear models for capturing the volatility of stock prices, the most stable is the Auto-Regressive Conditional Heteroscedasticity (ARCH) family (Baldauf and Santoni, 1991).
ARCH models have been very successful in capturing volatility clustering; in this study, they were used to capture the dynamic behavior of the conditional variance using lagged disturbances. Furthermore, the BDS test was used to check the linearity of the data set. After identifying the most suitable ARIMA model, future daily closing prices were forecast for the first 50 trading days of the validation period. In the second stage, the proposed FBMAP was used to predict future stock prices.

Autoregressive Integrated Moving Average Process
The ARIMA modeling procedure begins by identifying the rejection or acceptance of the RWH. To examine the hypothesis, the normality of the data was first checked using the Chi-Square test, and independence was checked using the Autocorrelation test and the Runs test. If the data do not follow the RWH, future stock prices can be forecast. A generalized ARIMA (p, d, q) model can be written as:

$$(1 - \phi_1 L - \phi_2 L^2 - \dots - \phi_p L^p)(1 - L)^d X_t = (1 - \theta_1 L - \theta_2 L^2 - \dots - \theta_q L^q)\, e_t \tag{1}$$

where $L$ is the lag operator ($L X_t = X_{t-1}$), $\phi_1, \phi_2, \dots, \phi_p$ and $\theta_1, \theta_2, \dots, \theta_q$ are the autoregressive and moving average parameters, respectively, and the $e$'s are white noise. The autoregressive order p of AR(p) and the moving average order q of MA(q) are determined from the analysis of the autocorrelation and partial autocorrelation functions. The number d indicates the number of differences applied to the time series to remove the trend. The autoregressive parameters $\phi$ and moving average parameters $\theta$ are estimated from the model based on p, d, and q.

Building steps:
1) Identification: use graphs, statistics, Autocorrelation function (ACF) and Partial Autocorrelation function (PACF) plots, transformations, etc. to achieve stationarity and tentatively identify patterns and model components.
2) Estimation: estimate the coefficients in software by least squares and maximum likelihood methods.
3) Diagnostics: use graphs, statistics, and the ACF and PACF of the residuals to verify whether the model is valid. If the model is valid, use the selected model; otherwise repeat the Identification, Estimation, and Diagnostics steps.
4) Forecast: use graphs, simple statistics, and confidence intervals to determine the validity of the forecast and track model performance to detect out-of-control situations.
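The identification, estimation, and forecast steps can be illustrated with a deliberately small sketch. It is not the procedure used in the study: it assumes an ARIMA(1,1,0) model on log prices with no intercept, estimates the single AR coefficient by conditional least squares, and produces a one-step forecast. The function names are hypothetical.

```python
import math


def difference(series):
    """Lag-1 difference, i.e. the d = 1 step used to remove a trend."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(y):
    """Conditional least-squares estimate of phi in y_t = phi * y_{t-1} + e_t.

    Assumption: no intercept term and a single AR lag.
    """
    num = sum(prev * cur for prev, cur in zip(y[:-1], y[1:]))
    den = sum(prev * prev for prev in y[:-1])
    return num / den


def forecast_next_price(prices):
    """One-step-ahead forecast from an ARIMA(1,1,0) model on log prices."""
    logs = [math.log(p) for p in prices]
    dy = difference(logs)           # differenced log prices
    phi = fit_ar1(dy)               # estimation step
    next_dy = phi * dy[-1]          # forecast of the next differenced value
    return math.exp(logs[-1] + next_dy), phi
```

Calling `forecast_next_price(closing_prices)` on a list of at least three positive prices returns the forecast and the estimated coefficient; a full analysis would also compare candidate orders and check the residuals.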

Runs test
A Runs Test is commonly used to check for independence in a stochastic process. The Runs Test hypothesizes mutual independence of successive price changes; acceptance or rejection of the hypothesis indicates whether the series is random. The mean ($\mu_r$) and the standard deviation ($\sigma_r$) of the number of runs $r$ are calculated using the following formulas:

$$\mu_r = \frac{2 n_1 n_2}{n_1 + n_2} + 1$$

$$\sigma_r = \sqrt{\frac{2 n_1 n_2 \,(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 \,(n_1 + n_2 - 1)}}$$

where $n_1$ and $n_2$ denote the number of positive and negative values in the series, respectively.
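A minimal sketch of the runs-test calculation follows. It assumes the standard Wald-Wolfowitz expressions for the mean and standard deviation of the number of runs and a normal approximation for the test statistic; zero price changes are dropped, which is one common convention and not necessarily the one used in the study.

```python
import math


def runs_test(changes):
    """Return (runs, mean, std, z) for the signs of successive price changes."""
    signs = [c > 0 for c in changes if c != 0]   # True = rise, False = fall
    n1 = sum(signs)                # number of positive changes
    n2 = len(signs) - n1           # number of negative changes
    if n1 == 0 or n2 == 0:
        raise ValueError("need at least one positive and one negative change")
    # A new run starts whenever the sign differs from the previous one.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    std = math.sqrt(var)
    z = (runs - mean) / std        # compare |z| with 1.96 at the 5% level
    return runs, mean, std, z
```

A value of |z| above 1.96 suggests that the observed number of runs differs from what independence would produce at the 5% level.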

Auto-Correlation test
Autocorrelation measures the correlation between a series of returns and lagged returns of the same series. A significant positive autocorrelation indicates that returns are positively related to their own past values, that is, a trend in the tested series.

$$\rho(k) = \frac{\mathrm{Cov}(X_t, X_{t-k})}{\mathrm{Var}(X_t)}$$

where $\rho(k)$ is the serial correlation coefficient of the time series at lag $k$, $X_t$ denotes the log return of the index at time $t$, and $k$ is the lag. $\mathrm{Cov}(X_t, X_{t-k})$ is the covariance between the return of the index over the period $(t-1, t)$ and the return lagged $k$ periods earlier, and $\mathrm{Var}(X_t)$ is the variance of the return over the period $(t-1, t)$.
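The serial correlation coefficient can be estimated from a sample as in the sketch below. It assumes the usual sample estimator (covariance at lag k divided by the lag-0 sum of squares) and the approximate 95% bound of 1.96 divided by the square root of the sample size for judging significance; the function names are hypothetical.

```python
import math


def autocorrelation(x, k):
    """Sample autocorrelation of the series x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    numer = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return numer / denom


def significance_bound(n):
    """Approximate 95% bound for the autocorrelation of white noise."""
    return 1.96 / math.sqrt(n)
```

A lag whose absolute autocorrelation exceeds `significance_bound(len(x))` would be flagged as significant under this approximation.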

ARCH model
The ARCH (Auto-Regressive Conditional Heteroscedasticity) family of models emerged in the context of high volatility. These models are normally used for financial time series because of characteristics such as high variance and volatility. As the name suggests, the heteroscedasticity, or unequal variance, may have an autoregressive structure over different periods, while the series itself may be uncorrelated. Before applying an ARCH model, a test must be performed to ensure that the time series has the features required by heteroscedastic models. If the time series has zero mean, the ARCH model can be written as:

$$X_t = \sigma_t \varepsilon_t$$

where

$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{m} \alpha_i X_{t-i}^2$$

and $\varepsilon_t$ is white noise with zero mean and unit variance. For the ARCH(1) model, the variance of $X_t$ conditional on $X_{t-1}$ is:

$$\mathrm{Var}(X_t \mid X_{t-1}) = \sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2$$

The ACF and the PACF must be analyzed to determine the order of the model, after which the model can be tested. ARCH models have had several variations since they were first developed. GARCH (Generalized Autoregressive Conditional Heteroscedasticity) models were introduced as variations of ARCH; hence, in the framework of this study, those models are not used.
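To make the ARCH(1) idea concrete, the sketch below simulates a zero-mean series whose conditional variance depends on the previous squared value. It assumes Gaussian innovations, and `a0` and `a1` are hypothetical names for the intercept and the lag coefficient of the conditional variance. For `a1` below one, the long-run variance of such a series is `a0 / (1 - a1)`, which the simulated sample variance should approach.

```python
import math
import random


def simulate_arch1(n, a0, a1, seed=0):
    """Simulate n values of an ARCH(1) process with Gaussian innovations."""
    rng = random.Random(seed)
    x_prev = 0.0
    series = []
    for _ in range(n):
        sigma2 = a0 + a1 * x_prev ** 2              # conditional variance
        x_prev = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        series.append(x_prev)
    return series


def sample_variance(x):
    """Unbiased sample variance."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / (len(x) - 1)
```

For example, `sample_variance(simulate_arch1(20000, 0.5, 0.3))` can be compared with `0.5 / 0.7`.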

BDS test
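The main concept behind the BDS test is the correlation integral, which is a measure of the frequency with which temporal patterns are repeated in the data. Furthermore, the BDS test is recognized as one of the most powerful tests for nonlinearity. Consider a time series $\{X_t\}_{t=1}^{T}$ and define its n-history as:

$$X_t^n = (X_t, X_{t+1}, \dots, X_{t+n-1})$$

The null hypothesis of the BDS test is that the variable of interest is independently and identically distributed (iid). Under the null hypothesis, the BDS statistic is obtained by:

$$W(T, n, \varepsilon) = \sqrt{T}\; \frac{C(T, n, \varepsilon) - C(T, 1, \varepsilon)^n}{\hat{\sigma}(T, n, \varepsilon)}$$

where $C(T, n, \varepsilon)$ is the correlation integral of the n-histories at distance $\varepsilon$, $\hat{\sigma}(T, n, \varepsilon)$ is the sample standard deviation of $C(T, n, \varepsilon) - C(T, 1, \varepsilon)^n$, $n$ is the embedding dimension, and $T$ is the sample size.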
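The correlation integral at the heart of the BDS test can be sketched directly. The version below assumes the maximum norm for the distance between two n-histories and counts the fraction of pairs closer than `eps`; it covers only the correlation integral, not the standard deviation needed for the full statistic. Under the iid null, the value at dimension n should be close to the n-th power of the value at dimension 1.

```python
def correlation_integral(x, n, eps):
    """Fraction of pairs of n-histories of x lying within eps (max norm)."""
    m = len(x) - n + 1                       # number of n-histories
    histories = [x[i:i + n] for i in range(m)]
    close = 0
    for i in range(m):
        for j in range(i + 1, m):
            dist = max(abs(a - b) for a, b in zip(histories[i], histories[j]))
            if dist < eps:
                close += 1
    return 2 * close / (m * (m - 1))
```

The double loop costs on the order of the squared sample size, which is acceptable for a few thousand daily observations but slow for much longer series.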

Stochastic model: fractional Brownian motion with adaptive parameters
In the generally accepted model, the randomness of the stock price is modeled by a Brownian motion process. A stock price process $(X_t, t \ge 0)$ is represented by the stochastic differential equation (SDE):

$$dX_t = \mu X_t\,dt + \sigma X_t\,dW_t \tag{10}$$

The parameters $\mu$ and $\sigma$ are the rate of return and the volatility, respectively, and the process $(W_t, t \ge 0)$ that drives (10) is a standard Brownian motion process. In the real world, $\mu$ and $\sigma$ in (10) are not constant over time; hence, in this research they are adaptive parameters that depend on time. In practice, however, the dynamics of the stock price have long memory (long-range dependence), so the Brownian motion model with adaptive parameters in (10) is not suitable to describe them. The significant difference between the two processes is that the increments of Brownian motion are independent, whereas the increments of fractional Brownian motion are not. Therefore, the driving process of model (10) is replaced by a fractional Brownian motion process to predict future stock prices. The fractional Brownian motion process $(B_t^H, t \ge 0)$ with Hurst index $H$ is:

$$B_t^H = \frac{1}{\Gamma(1 + \alpha)} \left( Z_t + B_t \right) \tag{11}$$

where $\Gamma$ is the gamma function, $Z_t = \int_{-\infty}^{0} \left[ (t - s)^{\alpha} - (-s)^{\alpha} \right] dW_s$, and $B_t = \int_{0}^{t} (t - s)^{\alpha}\, dW_s$.
The parameter $\alpha = H - 0.5$, where $H \in (0, 1)$. Replacing the driving process of model (10) by a fractional Brownian motion process gives:

$$dX_t = \mu X_t\,dt + \sigma X_t\,dB_t^H \tag{12}$$

where $(B_t^H, t \ge 0)$ is a fractional Brownian motion. Mandelbrot et al. proposed using the process $(B_t, t \ge 0)$ instead of $(B_t^H, t \ge 0)$, since $(Z_t, t \ge 0)$ has absolutely continuous trajectories, so the long-range dependence is carried by the process $(B_t, t \ge 0)$. Hence, model (12) can be considered as:

$$dX_t = \mu_t X_t\,dt + \sigma_t X_t\,dB_t \tag{13}$$

The rate of return $\mu_t$ and the volatility $\sigma_t$ are adaptive parameters that depend on time. Hence, model (13) can be defined as a fractional Brownian motion process with adaptive parameters (FBMAP). The rate of return and volatility of prices are estimated using the following equations.
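
$$\mu = \frac{1}{n} \sum_{i=1}^{n} R_i, \qquad \sigma = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} \left( R_i - \bar{R} \right)^2}$$

where $R_i$ is the return of the stock price, computed as $R_i = (X_{i+1} - X_i)/X_i$, $\bar{R}$ is the average of the returns $R_i$, and $n$ is the number of returns.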
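A rough simulation sketch of a model of this kind is given below. Every discretisation choice is an assumption rather than the authors' implementation: the time step is one trading day, the rate of return and volatility are taken as the sample mean and sample standard deviation of past simple returns and are re-estimated after every simulated step, the long-memory driving process is approximated by a left-endpoint sum of the kernel (t - s)^alpha against Gaussian increments, and the Hurst index is supplied by the caller. The function names are hypothetical.

```python
import math
import random


def estimate_parameters(prices):
    """Sample mean and standard deviation of simple returns (needs >= 3 prices)."""
    returns = [(b - a) / a for a, b in zip(prices, prices[1:])]
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean, math.sqrt(var)


def simulate_fbmap(prices, steps, hurst, seed=0):
    """Simulate `steps` future prices with re-estimated (adaptive) parameters."""
    rng = random.Random(seed)
    alpha = hurst - 0.5
    history = list(prices)
    dw = []            # Gaussian increments of the underlying Brownian motion
    b_prev = 0.0       # value of the driving process at the previous step
    path = []
    for k in range(1, steps + 1):
        dw.append(rng.gauss(0.0, 1.0))
        # Left-endpoint approximation of the integral of (t_k - s)^alpha dW_s.
        b_now = sum((k - j) ** alpha * dw[j] for j in range(k))
        mu, sigma = estimate_parameters(history)    # adaptive re-estimation
        nxt = history[-1] * (1 + mu + sigma * (b_now - b_prev))
        history.append(nxt)
        path.append(nxt)
        b_prev = b_now
    return path
```

In practice one would average many simulated paths (different seeds) to obtain a forecast, and the Hurst index would itself have to be estimated from the data.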

ARIMA model result
Since there were 20 companies, only the first company, Access Engineering PLC, is used to explain the data analysis process. The daily closing price is a time series process {X t }, which was analyzed to build the ARIMA model. Figure 2(a) reveals that the process {X t } was not stationary. From Figure 2(b), it is clear that the Autocorrelation Function (ACF) dies down very slowly. The ACF shows that the series of the selected firm has significant correlation coefficients, which confirms that CSE stock prices were not independent. This means that technical analysis can be used by investors. For a stationary series, autocorrelations are large at low orders but die out rapidly as the lag increases. If the series is trended, autocorrelations at low lags are very high and decline slowly as the lag increases. We therefore concluded that this data set is not stationary and contains a trend. To make the process stationary, we transformed the {X t } series to the {Y t } = {log X t } series and took the lag-one difference. Figure 3 is the plot of the lag-one difference of the log-transformed stock price. From this plot, the data look stationary and random. Stationarity was confirmed by the ADF test with a p-value of 0.01, where the alternative hypothesis was that the time series is stationary. Next, the autoregressive and moving average orders p and q were determined from the PACF and ACF plots in Figure 3.

By inspecting the transformed series, it was seen that the data fluctuated around zero; thus, they were considered stationary. This indicated that the lag-1 difference of the log-transformed data is sufficient and that further data treatment is unnecessary. Table 1(a) shows the ADF Test results for the original data. The p-value is not significant; that is, the original data are not stationary. To validate that the lag-1 difference of the log-transformed data is sufficient to make the data stationary, the ADF Test was performed on the transformed data. Table 1(b) shows that the ADF Test gives a p-value of 0.01, which indicates that the lag-1 differenced, log-transformed data are stationary at the 5% level of significance. ARIMA model identification was done by considering the ACF and PACF of the stationary time series.
The fitted ARIMA models were compared using their Akaike Information Criterion (AIC) values. Models with lower AIC were considered the most suitable candidates, and among these the model with the smallest RMSE and MAPE was taken as the best model. The "auto.arima" function, run in RStudio, gave the best ARIMA model for the given data set. From Table 2, the ARIMA (1, 1, 1) model has the lowest values and therefore fits the Access Engineering PLC daily closing price data best.
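The three selection criteria named above have simple standard definitions, sketched here for reference. The functions assume equal-length sequences of actual and predicted values with no zero actual values, and `num_params` is a hypothetical name for the number of estimated model parameters.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)


def aic(log_likelihood, num_params):
    """Akaike Information Criterion: lower values indicate a better trade-off."""
    return 2 * num_params - 2 * log_likelihood
```

Among candidate models fitted to the same data, the one with the lowest values of these measures would be preferred.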
The residual analysis is a convenient graphical technique for model validation in that it tests the assumptions for the residuals on a single graph.
The residuals appeared to follow a normal distribution with constant location and scale, and most were in the range (-1, 1). The histogram and normal probability plot indicated that the normal distribution provides an adequate fit for this model. In addition, the autocorrelation and partial autocorrelation plots of the residuals from the ARIMA (1,1,1) model were generated. These plots showed that, for the first 31 lags, all sample autocorrelations except the one at lag 30 fall inside the 95% confidence bounds, indicating that the residuals appear to be random. Next, the Box-Ljung test was applied to the residuals from the ARIMA (1,1,1) model to confirm whether the residuals are random. The Box-Ljung test showed that the autocorrelations among the residuals are zero (p-value = 0.01503), indicating that the residuals are random and that the model provides an adequate fit to the data. It is important to check for heteroscedasticity in the data series before confirming the selected model, and this can be done with the ARCH effect test.
The null hypothesis of this test is that there is no ARCH effect. In this case, the null hypothesis cannot be rejected (since the p-value is above 5%), so there was no ARCH effect. The BDS test was conducted to check the data set for non-linearity. For all 20 selected companies, the p-values of the BDS statistic were not significant, so the BDS test confirmed that the data were linear in nature. Thus, we concluded that the residuals are not distinguishable from a white noise series. Overall, we concluded that ARIMA (1,1,1) is the best model to forecast future closing prices for Access Engineering PLC. The same procedure was followed for the other 19 companies. The first 50 trading days in the period from January 2019 to June 2019 were predicted by the selected ARIMA models and compared with the actual average prices; the results are shown in Table 4, with the prediction error calculated by formula (16).
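For reference, the Box-Ljung statistic applied to residuals can be computed as in the sketch below. It assumes the standard form of the statistic, Q = n(n + 2) times the sum over lags of the squared sample autocorrelation divided by (n - k); the p-value, which requires the chi-square distribution, is left out.

```python
def sample_acf(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    numer = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return numer / denom


def ljung_box_q(residuals, max_lag):
    """Box-Ljung Q statistic over lags 1..max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(
        sample_acf(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )
```

Q is compared with a chi-square critical value; when the residuals come from a fitted ARMA(p, q) model, the degrees of freedom are usually taken as `max_lag - p - q`.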

Stochastic model FBMAP result
The model proposed in equation (13) was used to predict the future daily closing price for each company for the first 50 trading days in the period from January 2019 to June 2019. Predicted average values, actual average values, and individual errors are shown in Table 5; the errors were calculated by the same formula (16).

Comparison
In this section, the combined output from the above models is discussed. Table 6 shows the empirical results obtained from the models, and Figures 5(a), 5(b), and 5(c) present the results graphically.

CONCLUSION
The study attempts to develop a prediction model for forecasting stock market trends based on technical analysis of historical stock market time series. The results of the relevant tests indicate that the RWH does not hold in the context of the CSE and that successive price changes are not random. Therefore, past daily closing prices can be used to predict future daily closing prices. The experimental results demonstrated the potential of the ARIMA model and the FBMAP model to predict the stock price indices on a short-term basis. The best-fitting ARIMA model was found using the AIC, RMSE, and MAPE criteria. Furthermore, the ARCH test was performed to identify the dynamic behavior of the conditional variance in the data set. However, the data do not show the features required by ARCH models, and we conclude that the ARCH effect is not present in the time series studied. Moreover, the BDS test confirms that the data are linear in nature. The simulated prices from both models were compared with the empirical prices. Comparing the two models by their prediction errors shows that the FBMAP model performs better than the ARIMA model in short-term prediction and can compete reasonably well with emerging forecasting techniques. We therefore conclude that the FBMAP is more suitable for predicting daily closing prices in the short term. Furthermore, the graphical outputs illustrate that the ARIMA model is not suitable for forecasting the daily closing price, especially for companies with a low daily closing price. In summary, the research reached its goal of understanding which method performs better in the CSE scenario for the time frame studied. This could guide investors in the stock market in making profitable decisions on whether to buy, sell, or hold shares.

ACKNOWLEDGMENT
Sri Lanka Technological Campus, Padukka, Sri Lanka is gratefully acknowledged for providing facilities.

STATEMENT OF CONFLICT OF INTEREST
The authors declare no conflict of interest.