Abstract
To foster a modern economic system and facilitate high-quality economic development, it is crucial to establish a conducive business environment, and the evaluation of the business environment for enterprises is accordingly a prominent area of research. However, it is difficult to ensure the authenticity and security of the raw data provided by participating enterprises, which compromises the accuracy of the evaluation. To tackle this issue, this paper proposes an enterprise composite blockchain construction method for the business environment, which stores the raw data of enterprises through a hybrid of on-chain and off-chain storage. First, an enhanced SHA256 hash function is introduced to encrypt the raw data of enterprises. The encrypted data is then stored in an off-chain Level DB database based on non-volatile memory, which alleviates the communication and storage burden on the blockchain. Second, a composite on-chain storage strategy is adopted: the Key values from Level DB are stored in the DAG-based Conflux public blockchain, while the enterprise state data is stored in the consortium blockchain, so as to provide trusted evidence for business environment evaluation data. Finally, extensive experimental comparisons demonstrate that the proposed method achieves better read and write performance, higher storage efficiency, and lower storage overhead, and outperforms both the unimproved Level DB database and existing blockchain storage models.
Citation: Li S, Wang J, Song B (2024) An enterprise composite blockchain construction method for business environment. PLoS ONE 19(3): e0299162. https://doi.org/10.1371/journal.pone.0299162
Editor: Rashmi Sahay, ICFAI Foundation for Higher Education Faculty of Science and Technology, INDIA
Received: July 3, 2023; Accepted: February 6, 2024; Published: March 1, 2024
Copyright: © 2024 Li et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The datasets are available from https://doi.org/10.6084/m9.figshare.25108220.v2.
Funding: This study was supported by the Applied Basic Research Program of Liaoning Province (No.2022JH2/101300250); the Ministry of Education University-Industry Collaborative Education Program (No. 230701160261310); the National Key R&D Program of China (No. 2023YFC3304900); the General Program of University Basic Scientific Research of Education Department of Liaoning Province (Science and Engineering) (Initiating Flagship Service for Local Projects) (No. JYTMS20230761). The funders had a role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: NO authors have competing interests.
1 Introduction
The business environment covers the sum of the time and cost of complying with all aspects of policies and regulations from the start-up and operation to the end of the enterprise. It includes Starting an Enterprise, Dealing with Construction Permits, Getting Electricity and Credit, Registering Property, Protecting Minority Investors, Enforcing Contracts, Trading across Borders, Paying Taxes, and Resolving Insolvency, as well as the social, economic, political and legal factors that impact enterprise activities. In summary, it is a systemic project that involves policy reforms in multiple areas, including but not limited to opening to the outside world and enhancing communication horizontally and vertically [1]. For a city, a favorable business environment accelerates economic development and helps the city grow stronger [2]. It is therefore imperative to store the data generated throughout the whole process of enterprise operation as proof for the business environment evaluation. Blockchain, with its decentralized, tamper-proof and traceable features [3–5], has substantial advantages in guaranteeing data security and integrity, which matches the urgent need to enhance data authenticity and security for business environment evaluation.
Unlike traditional databases, blockchain uses cryptographic methods to make it almost impossible to delete or manipulate recorded information [6], and it has the features of decentralization, traceability, trustworthiness, open sharing, independence, stability, accuracy, and efficiency [7], which fuel productivity gains, time and money savings, data security assurance, and rapid access to information. As a result, blockchain technology has been widely adopted in fields such as finance, public services, the Internet of Things, cyber security, and supply chains [8–12]. Nevertheless, most traditional blockchains ignore the diverse range of data types, the large data quantity, and the strong correlations between data during storage. The existing blockchain storage structures are therefore deficient in throughput, transaction confirmation speed and scalability, and fail in scenarios with short bursts of highly concurrent data. At the same time, existing efforts are concentrated in the fields of finance, education, the Internet of Things and logistics, while research on "blockchain + enterprise business environment evaluation" is extremely rare. Moreover, traditional methods do not take into account the security and integrity of the raw data of enterprises when evaluating the enterprise business environment.
To better evaluate and analyze the enterprise business environment, this paper studies how the raw enterprise data of the business environment is stored and proposes an enterprise composite blockchain construction method, which introduces Non-Volatile Memory (NVM)-based Level DB [13] and Directed Acyclic Graph (DAG) [14]-based Conflux [15]. The raw data of the enterprise is stored in chronological order in a classified and efficient manner, and is guaranteed to be tamper-proof, traceable, and mutually recognized by multiple parties. The method proposed in this paper can also be applied in other areas, including supply chain management, financial services, identity management, and the Internet of Things (IoT). In supply chain management, the method enables businesses to integrate multiple systems onto one platform, improving coordination and efficiency. In financial services, it eliminates intermediaries, reduces costs, and ensures secure transactions. For identity management, it provides a trusted system for secure verification. In IoT, it enables secure communication and ensures data integrity and privacy.
The main contributions of this paper are as follows:
- To the best of our knowledge, this paper is the first to apply blockchain technology to the raw data storage process for assessing the enterprise business environment. Focusing on the security and efficiency of data storage, an enterprise composite blockchain construction method for the business environment is devised to ensure that data is tamper-proof and traceable.
- To meet the requirement that enterprise raw data be tamper-proof, this paper puts forward an improved SHA256 hash algorithm. In essence, it increases the number of hashing operations per data block and the complexity of the algorithm logic and compression function, so as to enhance the collision resistance of the operation and the avalanche effect of the Hash results, thus further enhancing the security of the raw data.
- In view of the characteristics of enterprise business environment data, namely a diverse range of data types, large data quantity, and strong correlations, this paper presents a storage mode that combines on-chain and off-chain storage to further reduce the communication and storage burden on the blockchain system. For off-chain storage, an NVM-based off-chain Level DB storage model is proposed. In Level DB, the Hash value produced by the improved SHA256 algorithm serves as the Key, and the Value is the enterprise raw data.
- On-chain, vanilla blockchains disregard the association relationships between data, which results in slow transaction confirmation and low throughput. In response to these limitations, this paper proposes a composite blockchain construction method, in which the public blockchain adopts the DAG-based Conflux to store the Key values of the off-chain Level DB database, and the consortium blockchain stores the state data of enterprises, i.e., the correlation relationships between enterprises. This storage mode further improves storage efficiency and provides support for the evaluation of the enterprise business environment.
The remaining sections of this paper are organized as follows. Section 2 reviews related work. Sections 3 and 4 discuss in detail the enterprise composite blockchain construction approach for the business environment proposed in this paper. Section 5 presents experiments and analyses that demonstrate the superior performance of the approach. Section 6 concludes the paper and recommends further research.
2 Related work
Researchers have carried out extensive and thorough research on Level DB optimization and blockchain construction, and have achieved significant advancements.
In regard to Level DB optimization, Dai et al. [16] proposed the Bourbon method, which uses greedy piecewise linear regression to learn the key distribution and achieve fast lookup with minimal computation, but the Bourbon method complicated garbage collection. FlatStore [17], a PM-based key-value storage engine, decoupled the roles of KV storage into persistent log structures and volatile indexes, and integrated the two technologies to provide low-latency and high-throughput performance. However, this storage engine required excessive CPU resources due to the compressed log format. Zhang et al. [18] proposed ChameleonDB, which employed an LSM-tree structure to efficiently admit writes with low write amplification, while using an in-DRAM hash table to bypass multiple levels of the LSM-tree for fast reads. However, ChameleonDB was not suitable for distributed environments. Yang et al. [19] proposed a design method based on non-volatile memory and machine learning. It utilized persistent variable memory tables on the NVM, which reduced access latency to some extent but led to more severe write stalls.
In the realm of blockchain construction, Zamyatin et al. [20] devised the Inclusive blockchain, which extends the Nakamoto consensus and GHOST rules to a DAG and provides a framework to include off-chain transactions. Sompolinsky et al. [21] proposed the PHANTOM platform, in which participating nodes first find an approximate k-cluster solution for their local block DAG to prune potentially malicious blocks. The remaining blocks are then topologically sorted to obtain the final total block order. Nevertheless, both the Inclusive blockchain and PHANTOM are susceptible to attacks when the block generation rate is high. Das et al. [22] proposed the Bitcoin-NG blockchain, which improves throughput by periodically electing a leader and allowing it to specify the full order of transactions over a period of time, but it failed to improve the confirmation time of transactions. Gazsi et al. [23] proposed the Vault blockchain, which employs sharding to construct the blockchain and thereby decreases the storage cost while balancing the network bandwidth cost. Although the combined throughput of all the shards was high, the throughput of inter-shard transactions was still limited.
2.1 Comparison
Because of deficiencies such as slow data merging on disk and the need to reorder data written to disk, previous Level DB optimization methods such as Bourbon, FlatStore and ChameleonDB suffer from write stalls, write amplification, and poor read performance. In addition, traditional blockchain storage structures such as Inclusive, PHANTOM and Bitcoin-NG fail to take into account the storage burden caused by diverse data types and large data quantity, resulting in low throughput, slow confirmation speed and poor scalability. This paper therefore considers Level DB optimization together with blockchain storage efficiency and overhead, and proposes an enterprise composite blockchain construction method for the business environment, which addresses the aforementioned shortcomings, significantly improves read and write performance and storage efficiency, and reduces storage overhead.
3 Enterprise raw data off-chain storage
As shown in Fig 1, the overall architecture of the enterprise composite blockchain comprises the Level DB database, the public blockchain and the consortium blockchain. The Level DB database stores the raw data of the enterprise, where the Value is the raw data and the Key is the Hash value produced by the improved SHA256 algorithm. The public blockchain, which is responsible for transactions, stores the Key value corresponding to the raw data in the enterprise Level DB database, while the consortium blockchain stores the state data of the enterprise. This architecture realizes data collaboration between on-chain and off-chain storage: the on-chain side extends its computing and storage capacity through the off-chain side, and the connection between the two enables the sharing of heterogeneous information.
The vast data generated throughout the enterprise process, from start-up and operation to closure, must be stored, managed, and searched off-chain in a persistent and trustworthy manner. Vanilla SQL databases may not be sufficient for this purpose because of their limitations in performance and scalability. By contrast, the Level DB database exhibits excellent concurrent read and write performance, and can expand data storage horizontally and vertically to meet availability and partition tolerance while ensuring eventual consistency. The Level DB database, with its high reliability and scalability, can therefore meet these requirements. For on-chain data, the enterprise composite blockchain storage architecture is built, and the DAG-based public blockchain is used to store the Key values corresponding to the enterprise raw data in the off-chain Level DB database, achieving high throughput and fast confirmation. The consortium blockchain stores enterprise state data, enabling quick retrieval of the corresponding enterprise state information for business environment evaluation.
3.1 Enterprise raw data encryption calculation
Prior to storing the raw data in the off-chain Level DB database, the SHA256 hash algorithm is employed to encrypt the raw data of the enterprise. When the data is queried, this makes it possible to verify whether it has been tampered with. SHA256 converts an arbitrarily long message into a shorter, fixed-length message digest, which is used as the Key in the key-value pair, while the Value is the raw data. The key-value pair is stored in the Level DB database, and the Key is also stored in the Conflux public blockchain of the enterprise composite blockchain. To meet the requirement that enterprise raw data be tamper-proof, this paper puts forward an improved SHA256 hash algorithm for encrypting raw data. In essence, it increases the number of hashing operations per data block and the complexity of the algorithm logic and compression function, so as to enhance the collision resistance of the operation and the avalanche effect of the Hash results, thus further enhancing security.
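The key-as-digest arrangement described above can be illustrated with a minimal sketch. It assumes the standard SHA-256 from Python's `hashlib` rather than the improved variant proposed in this paper, and it uses an in-memory dictionary as a stand-in for Level DB; the class name `TamperEvidentStore` and its methods are hypothetical names introduced only for illustration.

```python
import hashlib


class TamperEvidentStore:
    """Dict-backed stand-in for the off-chain key-value store (hypothetical)."""

    def __init__(self):
        self._db = {}

    def put(self, raw: bytes) -> str:
        # Assumption: standard SHA-256, not the paper's improved algorithm.
        key = hashlib.sha256(raw).hexdigest()
        self._db[key] = raw
        return key  # this digest is the value that would be recorded on-chain

    def get_verified(self, key: str) -> bytes:
        raw = self._db[key]
        # Recompute the digest; a mismatch means the stored value was altered.
        if hashlib.sha256(raw).hexdigest() != key:
            raise ValueError("stored value does not match its key")
        return raw


store = TamperEvidentStore()
record = b"enterprise record: registration approved"
key = store.put(record)
assert store.get_verified(key) == record
```

Because the Key is derived from the Value, any later change to the stored bytes is detected when the digest is recomputed at query time.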
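The SHA256 algorithm uses six logical functions and a sequence of constants $K_t$. The input is a 512-bit message block $x_i$, which is divided into sixteen 32-bit words $M_0, M_1, \ldots, M_{15}$, and the output is a 256-bit message digest [24]. All additions are performed modulo $2^{32}$. The SHA256 algorithm proceeds as follows.
- Initialization
$$H_0^{(0)}=\texttt{6a09e667},\ H_1^{(0)}=\texttt{bb67ae85},\ H_2^{(0)}=\texttt{3c6ef372},\ H_3^{(0)}=\texttt{a54ff53a},\ H_4^{(0)}=\texttt{510e527f},\ H_5^{(0)}=\texttt{9b05688c},\ H_6^{(0)}=\texttt{1f83d9ab},\ H_7^{(0)}=\texttt{5be0cd19} \tag{1}$$
- Prepare message list $W_t$
$$W_t = M_t, \quad 0 \le t \le 15 \tag{2}$$
$$W_t = \sigma_1(W_{t-2}) + W_{t-7} + \sigma_0(W_{t-15}) + W_{t-16}, \quad 16 \le t \le 63 \tag{3}$$
where the logical functions are calculated as follows
$$\sigma_0(x) = \mathrm{ROTR}^{7}(x) \oplus \mathrm{ROTR}^{18}(x) \oplus \mathrm{SHR}^{3}(x) \tag{4}$$
$$\sigma_1(x) = \mathrm{ROTR}^{17}(x) \oplus \mathrm{ROTR}^{19}(x) \oplus \mathrm{SHR}^{10}(x) \tag{5}$$
- Initialize the eight working variables A, B, C, D, E, F, G, H according to the previous intermediate Hash value $H^{(i-1)}$, and then assign values to them using the intermediate results of each round of Hash values.
- When 0≤t≤63, carry out the compression function
$$T_1 = H + \Sigma_1(E) + \mathrm{Ch}(E,F,G) + K_t + W_t \tag{6}$$
$$T_2 = \Sigma_0(A) + \mathrm{Maj}(A,B,C) \tag{7}$$
$$H = G,\ G = F,\ F = E,\ E = D + T_1,\ D = C,\ C = B,\ B = A,\ A = T_1 + T_2 \tag{8}$$
where the logical functions are calculated as follows
$$\mathrm{Ch}(x,y,z) = (x \wedge y) \oplus (\neg x \wedge z) \tag{9}$$
$$\mathrm{Maj}(x,y,z) = (x \wedge y) \oplus (x \wedge z) \oplus (y \wedge z) \tag{10}$$
$$\Sigma_0(x) = \mathrm{ROTR}^{2}(x) \oplus \mathrm{ROTR}^{13}(x) \oplus \mathrm{ROTR}^{22}(x) \tag{11}$$
$$\Sigma_1(x) = \mathrm{ROTR}^{6}(x) \oplus \mathrm{ROTR}^{11}(x) \oplus \mathrm{ROTR}^{25}(x) \tag{12}$$
- Calculate the intermediate Hash value for each group
$$H_0^{(i)} = A + H_0^{(i-1)},\ H_1^{(i)} = B + H_1^{(i-1)},\ \ldots,\ H_7^{(i)} = H + H_7^{(i-1)} \tag{13}$$
where i is the index of the message group; after all groups are processed, the 256-bit Hash value is output.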
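The steps above can be traced in a short from-scratch sketch of the standard, unimproved SHA-256. It is not the improved algorithm of this paper. The round constants and initial values are derived here from cube and square roots of the first primes using exact integer arithmetic, which is how the standard defines them; the helper names `_primes`, `_icbrt`, `rotr` and `sha256` are illustrative choices.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF  # all arithmetic is on 32-bit words


def _primes(n):
    found, c = [], 2
    while len(found) < n:
        if all(c % p for p in found):
            found.append(c)
        c += 1
    return found


def _icbrt(n):
    # Exact integer cube root (floor) by binary search.
    lo, hi = 0, 1 << 40
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


# First 32 fractional bits of the cube roots / square roots of the first primes.
K = [_icbrt(p << 96) & MASK for p in _primes(64)]
H0 = [math.isqrt(p << 64) & MASK for p in _primes(8)]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def sha256(message: bytes) -> bytes:
    # Padding: 0x80, zeros up to 56 mod 64, then the 64-bit message length in bits.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", len(message) * 8)

    h = list(H0)
    for off in range(0, len(padded), 64):
        # Message list W_t.
        w = list(struct.unpack(">16I", padded[off:off + 64]))
        for t in range(16, 64):
            s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
            s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
            w.append((s1 + w[t - 7] + s0 + w[t - 16]) & MASK)
        # Working variables start from the previous intermediate hash.
        a, b, c, d, e, f, g, hh = h
        for t in range(64):
            big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
            ch = (e & f) ^ (~e & MASK & g)
            t1 = (hh + big_s1 + ch + K[t] + w[t]) & MASK
            big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
            maj = (a & b) ^ (a & c) ^ (b & c)
            t2 = (big_s0 + maj) & MASK
            a, b, c, d, e, f, g, hh = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
        # Intermediate hash for this block.
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e, f, g, hh))]
    return struct.pack(">8I", *h)


assert sha256(b"abc") == hashlib.sha256(b"abc").digest()
```

Comparing against `hashlib` is a convenient way to check that the message list, the compression loop and the intermediate hash update have been transcribed correctly.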
In this paper, the improved SHA256 algorithm first adds 16 operations to the hashing of each 512-bit data block to increase nonlinear diffusion, ensuring that each bit of the message can affect more bits. Moreover, the complexity of the SHA256 logical functions and compression function is increased to accelerate the differential diffusion of messages and make the recursive process more "random", which eliminates the dependency conditions for local collisions. The pipeline of the improved SHA256 algorithm is as follows.
- (1) For message block M(i), i = 1,2,…,N, perform the following loop.
- 1) Prepare message list Wt:
(14)
(15)
- 2) Working variable initialization:
(16)
where A~H denote eight 32-bit working variables and $H_0^{(i-1)},\ldots,H_7^{(i-1)}$ denote the Hash value from the previous round of calculation. When i = 1, $H_0^{(0)},\ldots,H_7^{(0)}$ are constant initial values, which can be chosen randomly and, with reference to the unimproved SHA256 algorithm, are set as
(17)
- (2) When 0≤t≤79, carry out the following compression function
(18)
(19)
(20)
where T1 and T2 are intermediate variables, and the compression function uses a sequence of eighty 32-bit constant words Kt. The first 64 words are given in Ref. [25], and the latter 16 words are defined as follows
(21)
The six logical functions employed by the improved SHA256 algorithm all operate on 32-bit words (e.g. x, y, z), and the result of each logical function is a new 32-bit word. The specific definitions are as follows
(22)
The loop operation of each group is shown in Fig 2.
The aforementioned improvement measures enhance unidirectionality, nonlinearity, pseudo-randomness, collision resistance and avalanche effect of SHA256. The improved SHA256 algorithm is employed to encrypt the raw data of the enterprise and generate the corresponding Hash value as the Key value in the off-chain Level DB database for subsequent storage operations.
3.2 NVM-based off-chain Level DB storage model construction
After the encryption calculation with the improved SHA256 hash algorithm, the Hash value of the raw enterprise data is used as the Key and the raw data is used as the Value, and the pair is deposited into the off-chain key-value database Level DB. Level DB is a non-relational key-value storage system based on the Log-structured Merge Tree (LSM-tree) architecture, which is widely used in many fields because of its efficient writes and small space usage. However, the LSM-tree exhibits limitations such as write stalls, write amplification and poor read performance. Therefore, this paper proposes an NVM-based LSM-tree storage model, and the architecture of the off-chain Level DB storage model is shown in Fig 3.
As shown in Fig 3, layer L0 of the LSM-tree architecture is placed on NVM, and layers L1 to Ln are stored on Solid State Disk (SSD) to reduce write latency when data are merged. In addition, a learned index is introduced for the files in L0 to keep the files in layer L0 ordered, decrease read latency, and improve data search speed.
Fig 4 shows how the data in L0 is organized on NVM. An NVMTable is divided into three types of blocks based on storage content, namely Super Block, Meta Block, and Entry Block.
As shown in Fig 4, Super Block is the first Block of each NVMTable, which is responsible for storing the meta information of the file and contains the attribute fields as shown in Table 1; Entry Block is responsible for storing key-value pairs or pointers; Meta Block is responsible for storing the bitmap information of 1024 key-value pairs, which marks whether a key or value is saved in the Entry Block as the real data or a pointer.
When a Minor Compaction occurs, that is, when an Immutable MemTable is written to layer L0 and converted to an SSTable, the data needs to be written to an NVMTable, and the NVMTable and Container files are created. The write proceeds as follows. First, it is determined whether the length of the Key or Value data is shorter than 32 B. If so, the data is inline data and is written directly to the Entry Block; otherwise, the data is written to the Container file and the corresponding data pointer is stored in the Entry Block. Second, while data is being written to the Entry Block, the Meta Block is built in memory until no more data is written to the NVMTable. The Meta Block is then appended to the end of the file. Finally, the Super Block at the head of the file is updated to complete the write operation.
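The write path described above can be sketched in memory as follows. This is only an illustration under stated assumptions: the class `NVMTableSketch`, its field names, the `(offset, length)` pointer format and the `entry_count` attribute of the Super Block are hypothetical, since the actual on-NVM layout is defined by Fig 4 and Table 1.

```python
from dataclasses import dataclass, field

INLINE_LIMIT = 32  # bytes; a key or value shorter than this is stored inline


@dataclass
class NVMTableSketch:
    entries: list = field(default_factory=list)      # Entry Block: inline bytes or pointers
    bitmap: list = field(default_factory=list)       # Meta Block: (key_is_pointer, value_is_pointer)
    container: bytearray = field(default_factory=bytearray)  # Container file for long fields
    super_block: dict = field(default_factory=dict)  # Super Block: file-level metadata

    def _store(self, data: bytes):
        if len(data) < INLINE_LIMIT:
            return data, False                        # inline data
        pointer = (len(self.container), len(data))    # hypothetical (offset, length) pointer
        self.container += data
        return pointer, True

    def put(self, key: bytes, value: bytes) -> None:
        k, k_is_ptr = self._store(key)
        v, v_is_ptr = self._store(value)
        self.entries.append((k, v))
        self.bitmap.append((k_is_ptr, v_is_ptr))

    def finish(self) -> None:
        # In the paper the Meta Block is appended to the file and the Super Block
        # is updated last; here only a hypothetical entry count is recorded.
        self.super_block = {"entry_count": len(self.entries)}

    def _load(self, slot, is_pointer) -> bytes:
        if not is_pointer:
            return slot
        offset, length = slot
        return bytes(self.container[offset:offset + length])

    def get(self, index: int):
        (k, v), (k_is_ptr, v_is_ptr) = self.entries[index], self.bitmap[index]
        return self._load(k, k_is_ptr), self._load(v, v_is_ptr)


table = NVMTableSketch()
table.put(b"short-key", b"small value")
table.put(b"k" * 32, b"x" * 100)   # a 32-byte digest key is not shorter than 32 B
table.finish()
assert table.bitmap == [(False, False), (True, True)]
```

Note that under the "shorter than 32 B" rule a 32-byte SHA256 digest used as a Key would be stored through a pointer rather than inline.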
A learned index is used on the NVM layer to predict data positions, thereby speeding up data lookup. With the index, the location of a key in L0 can be determined accurately. First, each NVMTable is trained with its own index model. Second, an approximate mapping from the key to the data location is constructed. Then, starting from the predicted position, the key found there is compared with the target key, and the correct position is determined by a linear search in the forward or backward direction. The Piecewise Geometric Model (PGM) index is a superior choice for this purpose. As shown in Fig 5, the PGM index is a multi-level structure in which each level is a piecewise linear regression with its own error bound. The piecewise linear regression divides the data into n+1 segments delimited by a set of points p0,p1,…,pn, and is represented as a piecewise function, as shown in Eq (23).
As shown in Fig 5, in the first layer the data is divided into three segments, each represented by a simple linear model (f1,f2,f3). Each linear model predicts the positions of the keys within its own segment to within a predefined error. Another error-bounded piecewise linear regression is then computed by treating the segment boundaries of the first layer as its own sorted data set. This process is repeated until the top level of the PGM reaches an acceptable size. In other words, the PGM index is constructed from the bottom up: an error bound is picked first, and the data is then covered with a minimal number of linear segments that satisfy the bound. This step is repeated until the piecewise model is smaller than the threshold. Each regression uses a fixed error bound as an approximate index and is applied recursively. A piecewise regression model is first constructed from the underlying data, and another with a fixed error is then constructed from the splitting points of the first regression. A Key lookup searches each regression model in turn, from the top level down, until the underlying data is found.
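The idea of an error-bounded piecewise linear index can be illustrated with a single-level sketch. It is not the PGM construction itself: for brevity it uses a simpler greedy "shrinking cone" segmentation, which also guarantees the error bound but does not guarantee a minimal number of segments, and it replaces the upper PGM levels with a binary search over segment start keys. Keys are assumed to be strictly increasing numbers (a digest can be read as an integer), and the function names and the `eps` parameter are illustrative.

```python
import bisect
import math


def build_segments(keys, eps):
    """Greedy segmentation: every key's predicted position is within eps of its true one."""
    segments = []                                  # (first_key, first_pos, slope)
    k0, p0, lo, hi = keys[0], 0, 0.0, math.inf     # allowed slope range for current segment
    for pos in range(1, len(keys)):
        dk = keys[pos] - k0
        new_lo = max(lo, (pos - eps - p0) / dk)
        new_hi = min(hi, (pos + eps - p0) / dk)
        if new_lo > new_hi:                        # no slope fits: close segment, start a new one
            segments.append((k0, p0, (lo + hi) / 2))
            k0, p0, lo, hi = keys[pos], pos, 0.0, math.inf
        else:
            lo, hi = new_lo, new_hi
    segments.append((k0, p0, lo if math.isinf(hi) else (lo + hi) / 2))
    return segments


def lookup(keys, segments, eps, key):
    """Return the position of key in keys, or None if it is absent."""
    starts = [s[0] for s in segments]
    i = bisect.bisect_right(starts, key) - 1       # segment responsible for key
    if i < 0:
        return None
    k0, p0, slope = segments[i]
    guess = int(round(p0 + slope * (key - k0)))    # approximate position
    lo = max(0, guess - eps - 1)                   # search only inside the error window
    hi = min(len(keys), guess + eps + 2)
    j = bisect.bisect_left(keys, key, lo, hi)
    return j if j < hi and keys[j] == key else None


keys = [i * i for i in range(1, 2000)]
segments = build_segments(keys, eps=8)
assert lookup(keys, segments, 8, keys[1234]) == 1234
```

The final search touches at most about 2·eps positions regardless of the table size, which is the property the error bound is meant to provide.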
4 On-chain enterprise composite blockchain construction
4.1 Conflux public blockchain construction
Traditional Bitcoin-style public blockchains have deficiencies in scalability, throughput, and transaction confirmation delay. This paper therefore adopts Conflux, a DAG consensus protocol that maintains a backbone chain within the DAG structure, to construct the Conflux public blockchain in the enterprise composite blockchain. The architecture is shown in Fig 6.
In Conflux, the transactions consist of the Key values corresponding to the enterprise raw data in the off-chain Level DB database, as described in Section 3.2. Each transaction is assigned a unique ID generated by a cryptographic digest function to ensure integrity. A block consists of a list of transactions (four transactions in block B) and reference links to previous blocks (B links to the two previous blocks). Conflux starts with a predefined Genesis block that determines the initial state of the blockchain. All blocks and edges form a DAG. Because of network latency, the contents of the DAG may differ from node to node for a short period of time, and the ultimate goal of Conflux is to maintain consistency among nodes. In addition, all participating nodes are connected through a Gossip network: whenever a node initiates a transaction, it broadcasts the transaction to all other nodes (Tx9), and when a node generates a new block, it likewise broadcasts the block to all other nodes (Block B).
In Conflux, each node maintains a pool of pending transactions, which contains the transactions that the node has received via broadcast but that have not yet been packed into any block. Upon receiving a new transaction, the node adds it to its transaction pool (e.g., adding Tx9 to the transaction pool in Fig 6). Once a node generates a block or receives a block from another node, it removes all transactions in the new block from its pending transaction pool. For example, in Fig 6, the node removes Tx1, Tx2, Tx3, and Tx5 after generating block B. Because of network latency and the existence of malicious nodes, concurrent blocks may pack duplicate or conflicting transactions, which are resolved by the consensus protocol. Each node runs a block generator that generates valid new blocks to pack pending transactions. Furthermore, each node maintains a local state containing all blocks it has received via broadcast. When a node discovers a new block, it updates its local DAG state accordingly.
The underlying block data structure of the Conflux public blockchain is shown in Fig 7. It adopts a tree-graph form that speeds up block generation through parallel processing without compromising the security of the system through forking, thereby enabling each enterprise to upload transactions (the Hash values corresponding to the raw data of the enterprise) in parallel. Ultimately, it allows the whole system to process blocks and transactions efficiently.
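The pool behaviour described above amounts to two event handlers, sketched below. The class `PendingPool` and its method names are hypothetical, and the transaction payloads are placeholders; the transaction identifiers follow the example in Fig 6.

```python
class PendingPool:
    """Pending transactions a node has seen but that are not yet in any block."""

    def __init__(self):
        self._pending = {}  # tx_id -> payload, kept in arrival order

    def on_transaction(self, tx_id, payload):
        # A transaction received via broadcast joins the pool once.
        self._pending.setdefault(tx_id, payload)

    def on_block(self, block_tx_ids):
        # Generating or receiving a block removes its transactions from the pool.
        for tx_id in block_tx_ids:
            self._pending.pop(tx_id, None)

    def pending_ids(self):
        return list(self._pending)


pool = PendingPool()
for tx in ("Tx1", "Tx2", "Tx3", "Tx5"):
    pool.on_transaction(tx, payload=b"key digest")
pool.on_transaction("Tx9", payload=b"key digest")
pool.on_block(["Tx1", "Tx2", "Tx3", "Tx5"])   # block B is generated
assert pool.pending_ids() == ["Tx9"]
```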
The Conflux public blockchain uses a consensus mechanism that can handle concurrent blocks without discarding any of them as forks. In Conflux, every block except the Genesis block has one outgoing parent edge (solid arrow in Fig 7), which represents the voting relationship between blocks. Each block can also have multiple outgoing reference edges (dashed arrows in Fig 7), which indicate the order of block generation. All parent edges together form a parental tree with the Genesis block as the root. Conflux selects a chain from the Genesis block to one of the leaf blocks as the master chain, following the GHOST protocol. The selection begins at the Genesis block and calculates the subtree size of each child; at each step, the child with the largest subtree becomes the next block of the master chain. For example, in Fig 7, since the subtree of A is larger than that of B, Conflux selects Genesis, A, C, E, and H as the master chain.
When a node generates a new block, it first computes the master chain in its local DAG state and sets the last block of the master chain as the parent of the new block. It then identifies all end blocks, i.e., blocks without incoming edges, and creates reference edges from the new block to each of them. In Fig 7, when a new block N is generated, the node selects H as the parent block of N and creates a reference edge from N to K. During consensus, the sorting algorithm uses the master chain to divide all blocks into epochs, with each block on the master chain corresponding to one epoch. An epoch contains the blocks that can be reached from the corresponding master-chain block through a combination of parent and reference edges and that are not included in previous epochs. In Fig 7, J belongs to the epoch of H, because J can be reached from H but not from any of the previous blocks of the master chain.
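The master chain selection can be sketched as a walk down the parental tree that always follows the child with the largest subtree. The parent map below is a small hypothetical tree, not the one in Fig 7, and breaking ties by the smaller block identifier is an assumption made here for determinism.

```python
from collections import defaultdict


def master_chain(parent, genesis):
    """parent maps each non-genesis block to its parent block."""
    children = defaultdict(list)
    for block, par in parent.items():
        children[par].append(block)

    size = {}

    def subtree_size(block):
        size[block] = 1 + sum(subtree_size(c) for c in children[block])
        return size[block]

    subtree_size(genesis)

    chain = [genesis]
    while children[chain[-1]]:
        # Largest subtree wins; ties are broken by the smaller block id (assumption).
        chain.append(min(children[chain[-1]], key=lambda b: (-size[b], b)))
    return chain


# Hypothetical parental tree: G is the Genesis block.
PARENT = {"A": "G", "B": "G", "C": "A", "D": "A", "E": "C", "F": "B"}
assert master_chain(PARENT, "G") == ["G", "A", "C", "E"]
```

Here the subtree of A contains four blocks (A, C, D, E) and that of B contains two (B, F), so the chain proceeds through A even though A and B have the same depth below them.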
When performing consensus to determine the block total order, Conflux initially orders the blocks according to their corresponding epochs, followed by their topological order within each epoch. For the local DAG in Fig 7, Conflux obtains the following total order: Genesis, A, B, C, D, F, E, G, J, I, H, K. In addition, when ordering transactions, the order is determined by the block total order. If two transactions are in the same block, they are ordered by the appearance order. During the ordering process, if two transactions conflict with each other, Conflux will discard the second transaction. If a transaction appears in more than one block, Conflux will keep only the first transaction and discard the redundant transactions. By following the aforementioned process, data updating is performed on the Key value corresponding to the raw data uploaded by the enterprise.
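The epoch-based ordering can be sketched as follows. The DAG, the master chain and the transaction lists are hypothetical and are not those of Fig 7. Within an epoch, blocks with no ordering constraint between them are ordered by block identifier, which is an assumption of this sketch, and conflict detection between different transactions is omitted; only duplicate transactions are dropped.

```python
import heapq


def block_total_order(parent, refs, chain):
    """Order blocks epoch by epoch, topologically within each epoch."""

    def preds(block):
        out = list(refs.get(block, []))
        if block in parent:
            out.append(parent[block])
        return out

    ordered, placed = [], set()
    for pivot in chain:
        # Epoch: blocks reachable from the pivot that no earlier epoch contains.
        epoch, stack = set(), [pivot]
        while stack:
            b = stack.pop()
            if b in placed or b in epoch:
                continue
            epoch.add(b)
            stack.extend(preds(b))
        # Kahn's algorithm restricted to edges inside the epoch.
        indeg = {b: sum(1 for p in preds(b) if p in epoch) for b in epoch}
        succ = {b: [] for b in epoch}
        for b in epoch:
            for p in preds(b):
                if p in epoch:
                    succ[p].append(b)
        ready = [b for b in epoch if indeg[b] == 0]
        heapq.heapify(ready)                      # smallest id first (assumption)
        while ready:
            b = heapq.heappop(ready)
            ordered.append(b)
            for s in succ[b]:
                indeg[s] -= 1
                if indeg[s] == 0:
                    heapq.heappush(ready, s)
        placed |= epoch
    return ordered


def order_transactions(block_order, block_txs):
    """Keep the first occurrence of each transaction in block total order."""
    seen, result = set(), []
    for block in block_order:
        for tx in block_txs.get(block, []):
            if tx not in seen:
                seen.add(tx)
                result.append(tx)
    return result


PARENT = {"A": "G", "B": "G", "C": "A", "D": "A", "E": "C", "F": "B"}
REFS = {"C": ["B"], "D": ["B"], "E": ["D", "F"]}
ORDER = block_total_order(PARENT, REFS, chain=["G", "A", "C", "E"])
assert ORDER == ["G", "A", "B", "C", "D", "F", "E"]
```

In this example B is placed in the epoch of C because C is the first master-chain block from which B is reachable, and the master-chain block always comes last within its own epoch.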
4.2 Fabric consortium blockchain construction
The consortium blockchain of the enterprise composite blockchain is responsible for storing the state data of enterprises from start-up and operation to the end, facilitating data analysis for evaluating the enterprise business environment. Because of the large number of companies, this paper selects a certain number of pre-determined enterprise nodes to establish the consortium blockchain. These nodes carry out the consensus algorithm, public verification, and secure data storage, thus reducing network resource consumption. The consortium blockchain module is implemented with Hyperledger Fabric, a well-known representative consortium blockchain framework, and the Fabric architecture diagram is shown in Fig 8.
As illustrated in Fig 8, the Hyperledger Fabric architecture contains four primary components, namely the member management service, the blockchain service, the smart contract and the event flow [26], which are responsible for enterprise identity access identification, data storage sharing, business logic processing and asynchronous communication, respectively. Among them, the blockchain service is the core of the Fabric platform. It provides P2P communication between enterprise nodes, the distributed ledger between nodes, ledger storage, and consensus mechanism management.
The consortium blockchain is designed to store enterprise state data, and the symbols used in the storage procedure and their meanings are shown in Table 2. When the contract execution conditions are met, the smart contract is automatically triggered to access, share and store data, and it performs data storage according to the pre-defined constraints. The main workflow is as follows:
- System initialization
First, each enterprise node undergoes identity authentication by the regulator node. Once the authentication is successful, the node becomes a valid participant in the consortium blockchain network. It then obtains the public-private key pairs used for data encryption and for its certificate, respectively. Finally, it performs the initialization of the system, in which the enterprise node downloads the current block data storage location index table of the consortium blockchain from the record pool of the neighboring enterprise headquarter nodes.
- State data uploading
Initially, enterprise node Ni initiates an upload request to the local enterprise headquarter node BSj; the request includes the identity information and the digital signature Sig_1 currently used by Ni, to ensure the reliability and authenticity of the data source. Secondly, BSj receives and verifies Ni's request and identity information to confirm legitimacy, and responds accordingly. Finally, Ni encrypts the state data Data using the public key in the current public-private key pair and sends this state data together with the digital signature to BSj, and BSj encrypts the record data sent by Ni with the public key to obtain the final uploaded data Record. The formal description of the above process is as follows.
[Eq (24): formal definition of the uploaded data Record; the formula is not preserved in this text]
- Uploading data collection
BSj validates the uploaded Record. If the data is secure and valid, it is stored in the local record pool; otherwise, it is ignored.
- Proof of workload for local enterprise headquarter nodes
After every period T (ten minutes in this paper), BSj merges all valid data collected within T into a data set Data_set with a data signature to ensure the legitimacy and verifiability of the source. The BSj that first finds a valid workload proof gets the right to record this data block and is rewarded accordingly. To produce a valid workload proof, BSj calculates the Hash of the current block from a random number x and the Hash, Timestamp and Merkle tree root of the previous block P_data, where x must satisfy the condition set by Difficulty, the difficulty value used to adjust how quickly BSj can find a correct x. The BSj that has obtained the bookkeeping right broadcasts the current Data_set and the calculated x to the other enterprise headquarter nodes for verification. If verification passes, BSj merges the Data_set into a new data block, stores it in the consortium blockchain, and receives the corresponding system reward.
- Block consensus among enterprise headquarter nodes
BSj becomes the master node (Leader) in the current consensus process, while the other enterprise headquarter nodes act as slave nodes (Slave). The consortium blockchain adopts the Practical Byzantine Fault Tolerance (PBFT) consensus mechanism, and the consensus process is as follows.
- 1) Firstly, Leader collects the Data_set from Slave and consolidates it into a new data block with Leader’s digital signature and the Hash of the new data block appended for verification. Then, Leader broadcasts the new data block to each Slave to wait for verification by other nodes. The specific consensus process is shown in Fig 9, and the formal description of the above process is as follows.
[Eq (25): formal definition of the new data block broadcast by Leader; the formula is not preserved in this text]
- 2) Upon receiving the data block, each Slave verifies the block Hash, the digital signature and the other information sent to confirm correctness and legality, and broadcasts its verification result Result, with its own digital signature, to the other Slaves, so that the Slaves can cross-check one another.
- 3) A slave node BSl receives and aggregates the verification results of the other Slaves and compares them with its own verification result. After this comparison, it sends a Reply to Leader, which includes the Slave's own verification result (my_result), all received verification results (Rece_results), the final conclusion of the comparison (Comparision), and the corresponding digital signature. The formal description of the aforesaid process is as follows.
[Eq (26): formal definition of the Reply sent by BSl to Leader; the formula is not preserved in this text]
- 4) Leader collects and aggregates the validation responses from all Slaves. If all Slaves confirm the correctness and legitimacy of the current block, Leader integrates the data block, the set of certificates of the Slaves involved in the verification ({CertBS}) and the corresponding digital signatures, then sends them to all Slaves. Thereafter, the data block is stored in the consortium blockchain in chronological order, and Leader receives the system's reward. The formal description of the above process is as follows.
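[Eq (27): formal definition of the data block, certificate set and signatures sent by Leader to all Slaves; the formula is not preserved in this text]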
- 5) If some enterprise headquarter nodes do not pass the current verification, Leader analyzes and checks the verification results and, if deemed necessary, resends the data block to these nodes for a second validation. If some nodes still do not pass, the majority rule applies: only if more than a certain percentage of enterprise headquarter nodes have validated the data block is it loaded into the consortium blockchain through the process described in 4). Concurrently, Leader further scrutinizes the failed verification results to determine whether these nodes have behaved maliciously; malicious nodes are handled promptly to ensure the safe and stable operation of the system.
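The proof of workload that the local enterprise headquarter nodes perform before the consensus round can be illustrated with a short sketch. It is written in Python for brevity and is not the authors' implementation: the header layout, the helper names (`block_hash`, `meets_difficulty`, `find_nonce`, `verify_nonce`) and the reading of the condition on x as "the hash must fall below a target derived from Difficulty" are all assumptions, and standard SHA-256 is used as the hash.

```python
import hashlib


def block_hash(prev_hash: str, timestamp: int, merkle_root: str, x: int) -> str:
    """Hash the previous block's Hash, Timestamp and Merkle root together with nonce x."""
    header = f"{prev_hash}|{timestamp}|{merkle_root}|{x}".encode()
    return hashlib.sha256(header).hexdigest()


def meets_difficulty(digest_hex: str, difficulty_bits: int) -> bool:
    """Assumed condition: the digest needs `difficulty_bits` leading zero bits."""
    return int(digest_hex, 16) < (1 << (256 - difficulty_bits))


def find_nonce(prev_hash, timestamp, merkle_root, difficulty_bits, max_tries=1_000_000):
    """Search for a nonce x that yields a valid workload proof, or return None."""
    for x in range(max_tries):
        if meets_difficulty(block_hash(prev_hash, timestamp, merkle_root, x), difficulty_bits):
            return x
    return None


def verify_nonce(prev_hash, timestamp, merkle_root, x, difficulty_bits) -> bool:
    """What the other headquarter nodes would re-check after the broadcast."""
    return meets_difficulty(block_hash(prev_hash, timestamp, merkle_root, x), difficulty_bits)
```

Raising `difficulty_bits` by one halves the fraction of acceptable hashes, which is how a Difficulty value of this kind slows down the search for x while leaving verification a single hash computation.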
The aforementioned consensus process ensures that the enterprise state data are stored in the consortium blockchain Fabric, providing data traceability and analysis support for the evaluation of the enterprise business environment.
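The verification and reply steps of the consensus process can be sketched as follows. This is a simplified illustration in Python under stated assumptions, not the authors' code: digital signatures and certificates are omitted, the names `block_digest`, `slave_verify`, `build_replies` and `leader_decide` are hypothetical, the field `comparison` stands in for the paper's Comparision, and the two-thirds threshold is an assumed value for the "certain percentage" that the text leaves unspecified.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Reply:
    node: str
    my_result: bool      # this Slave's own verification result
    rece_results: dict   # results received from the other Slaves
    comparison: bool     # True if at least half of the others agree with my_result


def block_digest(data_set: list) -> str:
    """Hash that Leader attaches to the new data block."""
    return hashlib.sha256("\n".join(data_set).encode()).hexdigest()


def slave_verify(data_set: list, claimed_hash: str) -> bool:
    """Step 2: a Slave recomputes the block hash and compares it."""
    return block_digest(data_set) == claimed_hash


def build_replies(results: dict) -> list:
    """Step 3: each Slave aggregates the others' results and compares them with its own."""
    replies = []
    for node, mine in results.items():
        others = {n: r for n, r in results.items() if n != node}
        agree = sum(1 for r in others.values() if r == mine)
        replies.append(Reply(node, mine, others, agree * 2 >= len(others)))
    return replies


def leader_decide(replies: list, threshold: float = 2 / 3) -> bool:
    """Steps 4 and 5: commit if more than `threshold` of the Slaves approved."""
    approvals = sum(1 for r in replies if r.my_result)
    return approvals > threshold * len(replies)
```

With four Slaves of which one received a tampered block, three approvals exceed the assumed two-thirds threshold and the block is committed; with two failing Slaves it is not.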
5 Experiments and analysis
5.1 Experimental setup
The experiments are performed on data sets from XBlock, a blockchain data intelligence platform developed by InplusLab, a blockchain and smart finance research center. All blockchain data sets are standardized, cleaned, categorized and unified into a standard format; their specific contents are shown in Table 3.
The experiments first validate the read and write performance of the modified off-chain Level DB. The modified Level DB is deployed in a standalone environment, which is described in Table 4.
The storage efficiency and storage overhead of the on-chain enterprise composite blockchain storage were then compared experimentally. This experiment was carried out on a cluster of 20 nodes, each equipped with a 32-core 2.10 GHz Intel Xeon CPU, 128 GB RAM and 10 TB of storage space, and running the CentOS 7 operating system. Docker virtualization technology is used to deploy the blockchain, while Kubernetes is employed to manage the Docker clusters. The servers communicate with each other over a gigabit network, and containers communicate via flannel. All code is written in C++, and Level DB is used to store off-chain data. The source code is available at https://github.com/123lsiop/Enterprise-composite-blockchain-construction-method.
5.2 Comparison of read and write performance of Level DB data off the chain
Prior to modifying the design of L0, the reading operation latency mainly consists of reading memory and reading disk. The former includes the reading of MemTable and Immutable MemTable, while the latter is reading each layer of SSTable file from low to high. More specifically, reading disk comprises the following steps:
- Reading the index in SSTable file to determine the location of data.
- Copying the data block to the memory and deserializing it to memory format.
- Finding the specific data in the data block.
The write operation latency has three components. The first is incurred when the data and its checksum are written to the log on disk before the data is written to the memory table. The second is the latency of writing the data to MemTable, and the third is the tail latency caused by the two kinds of merge (Minor Compaction and Major Compaction).
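The read and write paths described above can be made concrete with a toy log-structured store. The sketch below is in Python and is only an illustration of the unmodified Level DB flow under simplifying assumptions: the class `TinyLSM` and its fields are hypothetical, the log is an in-memory list rather than a disk file, only L0 is modeled, and Major Compaction is omitted.

```python
import zlib


class TinyLSM:
    """Toy LSM store mirroring the lookup order and write order described above."""

    def __init__(self, memtable_limit=2):
        self.log = []          # stand-in for the on-disk log (data plus checksum)
        self.memtable = {}     # MemTable
        self.immutable = None  # Immutable MemTable (None when nothing is being flushed)
        self.l0 = []           # L0 tables, oldest first; deeper levels omitted
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        payload = repr((key, value)).encode()
        self.log.append((payload, zlib.crc32(payload)))  # 1) log the data and its checksum
        self.memtable[key] = value                       # 2) write to MemTable
        if len(self.memtable) >= self.memtable_limit:
            self._minor_compaction()                     # 3) merge cost (tail latency)

    def _minor_compaction(self):
        """Freeze the MemTable and persist it as a sorted L0 table."""
        self.immutable, self.memtable = self.memtable, {}
        self.l0.append(dict(sorted(self.immutable.items())))
        self.immutable = None

    def get(self, key):
        """Return (value, places searched): memory first, then L0 from newest to oldest."""
        searched = []
        for name, table in (("memtable", self.memtable),
                            ("immutable", self.immutable or {})):
            searched.append(name)
            if key in table:
                return table[key], searched
        for table in reversed(self.l0):
            searched.append("L0")
            if key in table:
                return table[key], searched
        return None, searched
```

The list returned by `get` shows why a key that has already been flushed costs more to read than a recently written one: the lookup has to fall through both memory tables before it touches the persisted tables.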
In the Level DB read and write performance test comparison, the upper limit number of L0 files is set to 10, and the latency is compared at three different sizes of 4kB, 8kB and 16kB Value. For each Value, 100,000 uniformly distributed test data are generated respectively, and then 100,000 queries uniformly distributed over the range of test data are generated. As shown in Figs 10 and 11, the read and write latency of Level DB before and after the modification are illustrated respectively.
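A workload of this shape can be generated and timed with a few lines of Python. The sketch below is a hypothetical harness, not the one used in the paper: `make_workload` and `measure` are illustrative names, a plain dictionary stands in for the key-value store, and the key range (ten times the record count) is an arbitrary choice.

```python
import random
import time


def make_workload(n_records, value_size, seed=0):
    """Generate n uniformly distributed records and n uniformly distributed queries."""
    rng = random.Random(seed)
    keys = rng.sample(range(10 * n_records), n_records)
    records = [(k, rng.getrandbits(8 * value_size).to_bytes(value_size, "little"))
               for k in keys]
    queries = [rng.choice(keys) for _ in range(n_records)]
    return records, queries


def measure(store, records, queries):
    """Return the mean write and read latency in seconds per operation."""
    t0 = time.perf_counter()
    for key, value in records:
        store[key] = value
    t1 = time.perf_counter()
    for key in queries:
        store[key]
    t2 = time.perf_counter()
    return (t1 - t0) / len(records), (t2 - t1) / len(queries)
```

Calling `make_workload(100_000, 4096)`, and likewise with 8192 and 16384, would reproduce the three Value sizes of the test; any object that supports item assignment and lookup can be passed to `measure` in place of the dictionary.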
As shown in Figs 10(A) and 11(A), the modified L0 architecture reduces the overall read latency by about 50%. On the one hand, this can be attributed to placing the L0 layer on the NVM, which shortens the query path for the L0 layer; in addition, the CPU can access data on the NVM directly, so the data copy and deserialization steps shown in Fig 10(A) are omitted. On the other hand, a learned index is introduced for lookups, which is more efficient than the binary search of the traditional SSTable.
As shown in Figs 10(B) and 11(B), the overall write latency is reduced by about 30% after the L0 layer design is modified. The paths for writing the memory table and writing the log are basically unchanged, but memory table persistence changes from DRAM-to-SSD to DRAM-to-NVM, and the majority of the latency reduction is attributed to the merge process.
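To illustrate why a learned index can beat a plain binary search, the sketch below fits one linear model from key to position and searches only within the model's maximum error. It is a minimal Python example under assumptions: the paper does not specify the model at this point, so the single least-squares line, the class name `LinearLearnedIndex` and its methods are hypothetical.

```python
import bisect


class LinearLearnedIndex:
    """Predict a key's position with a fitted line, then search a bounded window."""

    def __init__(self, sorted_keys):
        self.keys = sorted_keys
        n = len(sorted_keys)
        mean_key = sum(sorted_keys) / n
        mean_pos = (n - 1) / 2
        var = sum((k - mean_key) ** 2 for k in sorted_keys)
        cov = sum((k - mean_key) * (i - mean_pos) for i, k in enumerate(sorted_keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_pos - self.slope * mean_key
        # Largest prediction error over the stored keys bounds the search window.
        self.max_err = max(abs(i - self._predict(k)) for i, k in enumerate(sorted_keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of key, or None if it is not stored."""
        pos = self._predict(key)
        lo = max(0, pos - self.max_err)
        hi = min(len(self.keys), pos + self.max_err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else None
```

When the keys are close to evenly spaced, `max_err` is small and the final search touches only a handful of entries regardless of table size, whereas a binary search over the whole table needs a number of steps that grows with the logarithm of the table size.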
5.3 On-chain enterprise composite blockchain storage efficiency comparison
The experiment evaluated the storage efficiency of the enterprise composite blockchain against the public blockchain Ethereum and the enterprise blockchains Hyperledger Fabric and Quorum. In Fig 12, the horizontal axis indicates the stored enterprise entity data set and the vertical axis indicates the time required for storage.
As can be seen in Fig 12, the enterprise composite blockchain storage structure proposed in this paper stores data faster than the others. This is because the public blockchain in the composite blockchain adopts the DAG-based block structure and Conflux consensus. First, the DAG-based block structure takes the form of a tree graph, which speeds up block generation through parallel processing without reducing security because of forking. Second, Conflux consensus adopts the GHOST protocol to select the main chain, choosing the block with the largest subtree as the next node of the main chain, so that every enterprise can upload transactions in parallel; this accelerates block construction and consensus and ultimately enhances storage efficiency.
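The largest-subtree rule used to select the main chain can be shown on a small block tree. The Python sketch below is illustrative only: it models parent edges alone (the reference edges of the full DAG are left out), the function names are hypothetical, and breaking ties by the smaller block identifier is an assumption made so that the result is deterministic.

```python
def subtree_size(children, block):
    """Number of blocks in the subtree rooted at `block`, including itself."""
    return 1 + sum(subtree_size(children, c) for c in children.get(block, []))


def main_chain(children, genesis):
    """Walk from genesis, always stepping to the child with the largest subtree."""
    chain = [genesis]
    while children.get(chain[-1]):
        candidates = children[chain[-1]]
        # Largest subtree first; ties broken by smaller block id (assumption).
        chain.append(min(candidates, key=lambda c: (-subtree_size(children, c), c)))
    return chain


# Example tree: branch A is longer, branch B carries more blocks in total.
example = {
    "G": ["A", "B"],
    "A": ["C"],
    "C": ["H"],
    "B": ["D", "E", "F", "I"],
}
```

In `example`, a longest-chain rule would follow G, A, C, H, while the largest-subtree rule follows B because its subtree holds five blocks against three. Blocks produced in parallel therefore still add weight to the branch they extend instead of being wasted as forks.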
5.4 On-chain enterprise composite blockchain storage overhead comparison
The experiment simulated the overhead of blockchain storage structure using enterprise composite blockchain compared with that using the public blockchain Ethereum and the enterprise blockchains Hyperledger Fabric and Quorum. The horizontal coordinate indicates the storage enterprise entity data set and the vertical coordinate indicates the size of space required for storage, and the experimental results on overhead are presented in Fig 13.
As can be seen from Fig 13, the storage overhead of the proposed enterprise composite blockchain storage structure is lower than that of Ethereum, Fabric and Quorum, and the reduction becomes more pronounced as the volume of data increases. This is because the enterprise composite blockchain employs hybrid on-chain and off-chain storage, while the consortium blockchain serves as auxiliary storage for enterprise state data and can provide indexing support for the public blockchain, which further reduces the communication and storage burden of the blockchain system. Consequently, this approach effectively alleviates the pressure on on-chain storage.
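The effect of hybrid storage on on-chain overhead can be seen in a small model. The Python sketch below rests on assumptions rather than on the authors' code: a dictionary stands in for the off-chain Level DB, a list stands in for the on-chain ledger, standard SHA-256 replaces the paper's improved hash function, and the class `HybridStore` is hypothetical.

```python
import hashlib


class HybridStore:
    """Raw data stays off-chain; only its fixed-size hash key goes on-chain."""

    def __init__(self):
        self.off_chain = {}  # stand-in for Level DB: key -> raw data
        self.on_chain = []   # stand-in for the ledger: keys only

    def put(self, raw: bytes) -> bytes:
        key = hashlib.sha256(raw).digest()  # 32-byte key derived from the data
        self.off_chain[key] = raw
        self.on_chain.append(key)
        return key

    def verify(self, key: bytes) -> bool:
        """Check that the off-chain data still matches the key recorded on-chain."""
        raw = self.off_chain.get(key)
        return (raw is not None
                and key in self.on_chain
                and hashlib.sha256(raw).digest() == key)

    def on_chain_bytes(self) -> int:
        return sum(len(k) for k in self.on_chain)

    def raw_bytes(self) -> int:
        return sum(len(v) for v in self.off_chain.values())
```

In this model the on-chain cost is 32 bytes per record whatever the record size, so the gap between `raw_bytes()` and `on_chain_bytes()` widens as records grow or accumulate, which matches the trend reported for Fig 13. `verify` also shows how the on-chain key serves as evidence: any change to the off-chain data makes the recomputed hash differ from the recorded key.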
6 Conclusion
The business environment plays a critical role in the business condition of enterprises and in local economic development. However, the evaluation of the business environment is often hindered by the low quality of enterprise raw data. To address this issue, an enterprise composite blockchain construction method for the business environment is proposed, which leverages a hybrid on-chain and off-chain storage mode. Specifically, the improved SHA256 hash function is introduced to encrypt the enterprise raw data, which are stored in the off-chain Level DB database based on non-volatile memory. The data are then stored on chain: the Key values in Level DB are stored in the DAG-based Conflux public blockchain, and the enterprise state data are stored in the consortium blockchain Hyperledger Fabric, to provide credible evidence data for business environment evaluation. Finally, experimental comparison and analysis verified the effectiveness of the proposed method.
Indeed, the work presented in this paper represents a significant step towards improving the performance and reliability of blockchain systems for data storage in business environment evaluation. However, there are several unresolved research questions that require further investigation. For example, future research could investigate the potential of blockchain-based decentralized identity systems and their impact on privacy, security, and user control over private enterprise data. Additionally, the impact of emerging consensus algorithms and governance models on the scalability, efficiency and decentralization of blockchain networks could also be analyzed.
Acknowledgments
The authors thank the reviewers for their constructive comments in improving the quality of this paper.
References
- 1. Zhao X, Yi C, Zhan Y, et al. Business environment distance and innovation performance of EMNEs: The mediating effect of R&D internationalization[J]. Journal of Innovation & Knowledge, 2022, 7(4): 10185–10197. https://doi.org/10.1016/j.jik.2022.100241
- 2. Haschka R E, Herwartz H, Struthmann P, et al. The joint effects of financial development and the business environment on firm growth: Evidence from Vietnam[J]. Journal of Comparative Economics, 2022, 50(2): 486–506. https://doi.org/10.1016/j.jce.2021.11.002
- 3. Zhang W, Wang Z, Li X. Blockchain-based decentralized federated transfer learning methodology for collaborative machinery fault diagnosis[J]. Reliability Engineering & System Safety, 2023, 229: 108885–108897. https://doi.org/10.1016/j.ress.2022.108885
- 4. Ramesh D, Mishra R, Atrey P K, et al. Blockchain based efficient tamper-proof EHR storage for decentralized cloud-assisted storage[J]. Alexandria Engineering Journal, 2023, 68: 205–226. https://doi.org/10.1016/j.aej.2023.01.012
- 5. Ahmed W A H, MacCarthy B L. Blockchain-enabled supply chain traceability–How wide? How deep?[J]. International Journal of Production Economics, 2023, 263: 108963–108980. https://doi.org/10.1016/j.ijpe.2023.108963
- 6. Barj S, Ouaddah A, Mezrioui A. A Review of Privacy-Preserving Cryptographic Techniques Used in Blockchain Platforms[C]//International Conference on Digital Technologies and Applications. Cham: Springer Nature Switzerland, 2023: 230–240. https://doi.org/10.1007/978-3-031-29857-8_23
- 7. Wen Z, Liao H, Emrouznejad A. Information representation of blockchain technology: Risk evaluation of investment by personalized quantifier with cubic spline interpolation[J]. Information Processing & Management, 2021, 58(4): 102571–102584. https://doi.org/10.1016/j.ipm.2021.102571
- 8. Qureshi J N, Farooq M S, Abid A, et al. Blockchain Applications for the Internet of Things: Systematic Review and Challenges[J]. Microprocessors and Microsystems, 2022: 104632. https://doi.org/10.1016/j.micpro.2022.104632
- 9. Berdik D, Otoum S, Schmidt N, et al. A survey on blockchain for information systems management and security[J]. Information Processing & Management, 2021, 58(1): 102397. https://doi.org/10.1016/j.ipm.2020.102397
- 10. Lu Q, Binh Tran A, Weber I, et al. Integrated model‐driven engineering of blockchain applications for business processes and asset management[J]. Software: Practice and Experience, 2021, 51(5): 1059–1079. https://doi.org/10.1002/spe.2931
- 11. Issa W, Moustafa N, Turnbull B, et al. Blockchain-based federated learning for securing internet of things: A comprehensive survey[J]. ACM Computing Surveys, 2023, 55(9): 1–43. https://doi.org/10.1145/3560816
- 12. Wu H, Jiang S, Cao J. High-efficiency blockchain-based supply chain traceability[J]. IEEE Transactions on Intelligent Transportation Systems, 2023, 24(4): 3748–3758. https://doi.org/10.1109/TITS.2022.3205445
- 13. Cai M, Jiang X, Shen J, et al. SplitDB: Closing the Performance Gap for LSM-tree-based Key-Value Stores[J]. IEEE Transactions on Computers, 2023, 73(1): 206–220. https://doi.org/10.1109/TC.2023.3326982
- 14. He B, Chen W, Li F, et al. Directed acyclic graphs-based diagnosis approach using small data sets for sustainability[J]. Computers & Industrial Engineering, 2023, 176: 108944–108958. https://doi.org/10.1016/j.cie.2022.108944
- 15. Li C, Li P, Zhou D, et al. A decentralized blockchain with high throughput and fast confirmation[C]//2020 USENIX Annual Technical Conference (USENIX ATC 20). 2020: 515–528.
- 16. Dai Y, Xu Y, Ganesan A, et al. From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees[C]//14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). 2020: 155–171.
- 17. Chen Y, Lu Y, Yang F, et al. Flatstore: An efficient log-structured key-value storage engine for persistent memory[C]//Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems. 2020: 1077–1091. https://doi.org/10.1145/3373376.3378515
- 18. Zhang W, Zhao X, Jiang S, et al. ChameleonDB: a key-value store for optane persistent memory[C]//Proceedings of the Sixteenth European Conference on Computer Systems. 2021: 194–209. https://doi.org/10.1145/3447786.3456237
- 19. Yang Y U, Huiqi H U, Xuan Z. Optimization of LSM-tree storage systems based on non-volatile memory[J]. Journal of East China Normal University (Natural Science), 2021, 2021(5): 37–47.
- 20. Zamyatin A, Stifter N, Judmayer A, et al. A Wild Velvet Fork Appears! Inclusive Blockchain Protocol Changes in Practice[C]//Financial Cryptography and Data Security: FC 2018. Springer Berlin Heidelberg, 2019: 31–42. https://doi.org/10.1007/978-3-662-58820-8_3
- 21. Sompolinsky Y, Wyborski S, Zohar A. PHANTOM GHOSTDAG: a scalable generalization of Nakamoto consensus[C]//Proceedings of the 3rd ACM Conference on Advances in Financial Technologies. 2021: 57–70. https://doi.org/10.1145/3479722.3480990
- 22. Das D. Toward next generation of blockchain using improvized Bitcoin-NG[J]. IEEE Transactions on Computational Social Systems, 2021, 8(2): 512–521. https://doi.org/10.1109/TCSS.2021.3049477
- 23. Gazsi J S, Zafreen S, Dagher G G, et al. Vault: A scalable blockchain-based protocol for secure data access and collaboration[C]//2021 IEEE International Conference on Blockchain (Blockchain). IEEE, 2021: 376–381. https://doi.org/10.1109/Blockchain53845.2021.00059
- 24. Fotohi R, Aliee F S. Securing communication between things using blockchain technology based on authentication and SHA-256 to improving scalability in large-scale IoT[J]. Computer Networks, 2021, 197: 108331–108344. https://doi.org/10.1016/j.comnet.2021.108331
- 25. National Institute of Standards and Technology (NIST)[Z]. Secure hash standard. FIPS 180-2. Aug. 2002.
- 26. Dong J, Jiang R, Xiao R, et al. Event stream learning using spatio-temporal event surface[J]. Neural Networks, 2022, 154: 543–559. https://doi.org/10.1016/j.neunet.2022.07.010