How will a distributed peer system control itself?
This is a complex problem. A peer-to-peer system consisting of many self-governing nodes is hard to control, and when it misbehaves, it can easily end up launching a distributed denial-of-service (DDoS) attack.
Here are some of my thoughts. They are not mature and some of them might be misleading, but I will write them down here.
The approach GridDistribute takes will be an independent component architecture. Under this design, the system will:
- Schedule independently on every node.
- The Host Browser will assist scheduling by moving some jobs from one node to another. This should drastically reduce unsystematic behavior and avoid wasting bandwidth on probing the outside world.
- Every request is processed when appropriate, not immediately upon arrival. This will improve cache utilization.
- Nodes cooperate with each other but do not necessarily trust one another. Scheduling is based on digital signatures, so it will be easy to block malicious nodes.
- The scheduling policy must be simple and efficient; when we cannot have both, the trade-off should favor efficiency.
- The policy should be easy to prove correct.
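To make the signature-based blocking idea concrete, here is a minimal sketch of how a node might refuse requests from peers whose signatures fail to verify. Everything in it is an assumption for illustration: the `RequestGate` class, the `sign` helper, and the `max_failures` threshold are hypothetical names, and HMAC-SHA256 with a shared key stands in for a real public-key digital signature, which a real deployment would need.

```python
import hashlib
import hmac


def sign(key, payload):
    """Stand-in for a digital signature: HMAC-SHA256 over the payload."""
    return hmac.new(key, payload, hashlib.sha256).digest()


class RequestGate:
    """Accepts requests only from known peers whose signature verifies.

    Hypothetical sketch: peers that fail verification too often are blocked.
    """

    def __init__(self, peer_keys, max_failures=3):
        self.peer_keys = dict(peer_keys)  # peer_id -> key (assumed known in advance)
        self.max_failures = max_failures
        self.failures = {}                # peer_id -> count of bad signatures
        self.blocked = set()

    def accept(self, peer_id, payload, signature):
        # Unknown or already blocked peers are rejected outright.
        if peer_id in self.blocked or peer_id not in self.peer_keys:
            return False
        expected = sign(self.peer_keys[peer_id], payload)
        if hmac.compare_digest(expected, signature):
            return True
        # Count the failure and block the peer once the threshold is reached.
        self.failures[peer_id] = self.failures.get(peer_id, 0) + 1
        if self.failures[peer_id] >= self.max_failures:
            self.blocked.add(peer_id)
        return False
```

A node would call `accept` before placing a request into its local schedule. Once a peer is blocked, even its correctly signed requests are refused.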
The scheduling plan will also be applied to the directory service. With this in mind, we will have to store more information about the nodes we have contacted, so we must measure connectivity, connection speed, and data integrity.
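As a rough illustration of the per-node information mentioned above, the following sketch keeps one record per contacted peer and derives the three measurements from simple counters. The `PeerRecord` class, its field names, and the way each metric is computed are assumptions made for this example, not a settled design.

```python
from dataclasses import dataclass


@dataclass
class PeerRecord:
    """Hypothetical per-peer statistics kept by the directory service."""
    attempts: int = 0          # connection attempts made
    successes: int = 0         # attempts that connected
    bytes_received: int = 0    # total payload bytes received
    seconds_spent: float = 0.0 # total transfer time
    chunks_checked: int = 0    # chunks whose checksum was verified
    chunks_valid: int = 0      # chunks that passed verification

    def record_attempt(self, connected):
        self.attempts += 1
        if connected:
            self.successes += 1

    def record_chunk(self, nbytes, seconds, valid):
        self.bytes_received += nbytes
        self.seconds_spent += seconds
        self.chunks_checked += 1
        if valid:
            self.chunks_valid += 1

    @property
    def connectivity(self):
        """Fraction of connection attempts that succeeded."""
        return self.successes / self.attempts if self.attempts else 0.0

    @property
    def speed(self):
        """Average throughput in bytes per second."""
        return self.bytes_received / self.seconds_spent if self.seconds_spent else 0.0

    @property
    def integrity(self):
        """Fraction of verified chunks that were intact."""
        return self.chunks_valid / self.chunks_checked if self.chunks_checked else 0.0
```

A scheduler could then rank candidate peers by any combination of these three values. How to weight them is left open here.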
A distributed voting mechanism should also be applied if time permits. An artificial neural network (ANN) should run over the P2P overlay network to determine whether to divide or merge networks. This is on our 2.0 TODO list, but I am not sure whether we should implement it in 1.0-RELEASE. The ANN voting mechanism will help the system make the right decisions in a timely manner.
At a high level, the system is well scheduled and it *will* work in a close to “bandwidth on demand” manner in most scenarios. The least efficient case should be of the same order as an FTP service.
On the local side, the scheduler simply schedules requests in a set of priority queues. The open problem is how to organize them, because the data structure will be somewhat complex in several respects. I have a plan for the structure, and am looking into whether a better solution exists.
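For reference, one simple way to arrange a set of priority queues is sketched below. It is only a baseline under my own assumptions, not the structure planned above: the `LocalScheduler` class, the queue names, and the rule "lower number means more urgent, ties broken by submission order" are all hypothetical.

```python
import heapq
import itertools


class LocalScheduler:
    """Hypothetical local scheduler backed by several named priority queues."""

    def __init__(self, queue_names):
        self._queues = {name: [] for name in queue_names}
        # A global counter keeps ordering stable for equal priorities
        # and means request objects themselves are never compared.
        self._counter = itertools.count()

    def submit(self, queue_name, priority, request):
        entry = (priority, next(self._counter), request)
        heapq.heappush(self._queues[queue_name], entry)

    def next_request(self):
        """Pop the most urgent request across all queues, or None if idle."""
        heads = [(q[0], name) for name, q in self._queues.items() if q]
        if not heads:
            return None
        _, name = min(heads)
        _, _, request = heapq.heappop(self._queues[name])
        return name, request
```

A short usage example:

```python
scheduler = LocalScheduler(["directory", "transfer"])
scheduler.submit("transfer", 5, "get chunk 7")
scheduler.submit("directory", 1, "refresh peer list")
scheduler.submit("transfer", 1, "get chunk 1")
print(scheduler.next_request())  # ('directory', 'refresh peer list')
```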