PARASITE as the word suggests is an entity that resides on another entity exploiting the resources of the latter. The term PARASITIC COMPUTING refers to the technique of using the resources of one computer by another computer without the knowledge of the former. Distributed computing networks turn home users' computers into part of a virtual supercomputer that can perform time-intensive operations.
This seminars provides an insight into the details of how parasitic computing uses the computation power of the computers connected to the internet in solving complex mathematical problems. This technique was developed by the scientist at the Notre Dame University, Indiana (USA). According to the scientists, the transmission control protocol (TCP), could be used to solve a piece of a mathematical problem whose answer could then be relayed back to the original user.
The implementation is discussed with the NP-Complete problem as example. Unlike hackers who exploit flaws to gain direct access to machines, the Notre Dame computer scientists created a virtual computer by using the fundamental components of distributed computing.
2 Details of Parasitic Computing
2.1 How it is differ from others
3 N-P Completeness Problems
5 Implementation using TCP
6 Application:2-SAT and 3-SAT Problems
7 Alternative Applications
8 Issues in Parasitic Computing
The net is a fertile place where new ideas/products surface quite often. We have already come across many innovative ideas such as Peer-to-Peer file sharing, distributed computing etc. Parasitic computing is a new in this category. Reliable communication on the Internet is guaranteed by a standard set of protocols, used by all computers. The Notre Dame computer scientist showed that these protocols could be exploited to compute with the communication infrastructure, transforming the Internet into a distributed computer in which servers unwittingly perform computation on behalf of a remote node.
In this model, known as parasitic computing, one machine forces target computers to solve a piece of a complex computational problem merely by engaging them in standard communication. Consequently, the target computers are unaware that they have performed computation for the benefit of a commanding node. As experimental evidence of the principle of parasitic computing, the scientists harnessed the power of several web servers across the globe, which unknown to themâ€œwork together to solve an NP complete problem.
Sending a message through the Internet is a sophisticated process regulated by layers of complex protocols. For example, when a user selects a URL (uniform resource locator), requesting a web page, the browser opens a transmission control protocol (TCP) connection to a web server. It then issues a hyper-text transmission protocol (HTTP) request over the TCP connection. The TCP message is carried via the Internet protocol (IP), which might break the message into several packages, which navigate independently through numerous routers between source and destination. When an HTTP request reaches its target web server, a response is returned via the same TCP connection to the user's browser. The original message is reconstructed through a series of consecutive steps, involving IP and TCP; it is finally interpreted at the HTTP level, eliciting the appropriate response (such as sending the requested web page). Thus, even a seemingly simple request for a web page involves a significant amount of computation in the network and at the computers at the end points.
In essence, a `parasitic computer' is a realization of an abstract machine for a distributed computer that is built upon standard Internet communication protocols. We use a parasitic computer to solve the well known NP-complete satisfiability problem, by engaging various web servers physically located in North America, Europe, and Asia, each of which unknowingly participated in the experiment. Like the [email protected]
project, parasitic computing decomposes a complex problem into computations that can be evaluated independently and solved by computers connected to the Internet; unlike the SETI project, however, it does so without the knowledge of the participating servers. Unlike `cracking' (breaking into a computer) or computer viruses, however, parasitic computing does not compromise the security of the targeted servers, and accesses only those parts of the servers that have been made explicitly available for Internet communication.
2. DETAILS OF PARASITIC COMPUTING
Parasitic computing is a concept by which one can use the resources of machines that are connected on the Internet. This technology exploits open Internet protocols to use the resources of remote machines. As the name suggests, the machine that requires the services of others does not need to be authorized by the latter. Any machine, which is connected to the Internet, has to carry out minimum processing of any packet they receive without any authorization. This concept is exploited by parasitic computing in order to make use of computing powers of remote machines and web servers all around the globe. So one cannot really stop their machines from being utilized in this manner.
2.1 How it differs from others
Â¢ Cluster computing
Parasitic computing differs from cluster computing. Cluster computing is an idea in which many computers pool in their resources willingly to create a cumulative power equivalent to that of a supercomputer capable of solving complex computational problems. In contrast, parasitic computing does not require the willingness of any target machine to participate in the problem solving. This way one can make use of many distributed resources across the Internet, which is otherwise remaining idle. Also if one
divides the task in such a manner that no computer is overloaded remote machine performance wonâ„¢t deteriorate much and our task will be also be accomplished.
Parasitic computing though works without authorization it is entirely different from the concept of cracking. In cracking data is sent to the remote computer with malicious intentions and in order to corrupt some resource of the remote machine whereas in this case we are utilizing those resources for a constructive purpose and also accessing only those parts of remote machines, which are made open on the Internet and that is done without any malicious intentions but it can cause delay of services in the remote machines as resources are being utilized without the knowledge of the owner.
3.THE NP-COMPLETE PROBLEM
A problem is assigned to the NP (nondeterministic polynomial time) class if it is verifiable in polynomial time by a Nondeterministic Turing Machine (A nondeterministic Turing Machine is a "parallel" Turing Machine which can take many computational paths simultaneously, with the restriction that the parallel Turing machines cannot communicate.). A problem is NP-hard if an algorithm for solving it can be translated into one for solving any other NP-problem. NP-hard therefore means "at least as hard as any NP-problem", although it might, in fact, be harder. A problem which is both NP and NP-hard is said to be an NP-Complete problem. Examples of NP-Complete problems are the travelling salesman problem and the satisfiability problem.
The `satisfiability' (or SAT) problem involves finding a solution to a Boolean equation that satisfies a number of logical clauses. For example, (x1 XOR x2) AND (x2 AND x3) in principle has 23 potential solutions, but it is satisfied only by the solution x1 = 1, x2 = 0, and x3 = 1. This is called a 2-SAT problem because each clause, shown in parentheses, involves two variables. The more difficult 3-SAT problem is known to be NP complete, which in practice means that there is no known polynomial-time algorithm which solves it. Indeed, the best known algorithm for an n-variable SAT problem scales exponentially with n . Here we follow a brute-force approach by instructing target computers to evaluate, in a distributed fashion, each of the 2n potential solutions.
Travelling salesmen problem involves working out the shortest route that a fictional salesman would have to take to visit all possible locations on a hypothetical map. The more locations on the hypothetical map means more potential routes, and the longer it would take any single computer to crank through all possible combinations. But by sharing the job of working out which route is shortest, the total time it takes to solve any particular travelling salesman problem can be vastly reduced.
To solve many NP complete problems, such as the traveling salesman or the satisfiability problem, a common technique is to generate a large number of candidate solutions and then test the candidates for their adequacy. Because the candidate solutions can be tested in parallel, an effective computer architecture for these problems is one that supports simultaneous evaluation of many tests. Four Notre Dame professors recently discovered a new Internet vulnerability that is commonly known as "parasitic computing." The researchers found a way to "trick" Web servers around the world into solving logic math problems without the server's permission. The researchers found that they could tag a logic problem onto the check sum (the bit amount that is sent when a Web page is requested) and the Web server would process the request. When a Web page was requested without the correct check sum, the server would not respond to the request. Each of the math problems that were tagged on to the request by the researchers was broken down into smaller pieces that were evaluated by servers in North America, Europe and Asia. The results from each were used to build a solution. Using a remote server, the team divided the problem into packages, each associated with a potential answer. The bits were then hidden inside components of the standard transmission control protocol of the Internet, and sent on their merry way. The major discovery in this experiment is that other computers are answering logical questions without knowledge of doing so. The work is performed without consent, creating an ethical dilemma. The technique does not violate the security of the unknowing server; it only uses areas that are open for public access. They find it useful because they found a way to use a computer elsewhere to solve a problem.
Here, the computer consists of a collection of target nodes connected to a network, where each of the target nodes contains an arithmetic and logic unit (ALU) that is capable of performing the desired test and a network interface (NIF) that allows the node to send and receive messages across the network. A single home parasite node initiates the computation, sends messages to the targets directing them to perform the tests, and tabulates the results.
Owing to the many layers of computation involved in receiving and interpreting a message, there are several Internet protocols that, in principle, could be exploited to perform a parasitic computation. For example, an IP-level interpretation could force routers to solve the problem, but such an implementation creates unwanted local bottlenecks. To truly delegate the computation to a remote target computer, we need to implement it at the TCP or higher levels. Potential candidate protocols include TCP, HITP, or encryption/ decryption with secure socket layer (SSL).
5.HOW TO TRICK OTHER PEOPLEâ„¢S COMPUTERS TO SOLVE A MATH PROBLEM
Fig1: Schematic diagram of our prototype parasitic computer.
A single parasite node coordinates the computations occurring remotely in the Internet protocols. It sends specially constructed messages to some number of targeted nodes , which are web servers consisting of an arithmetic and logic unit (ALU) and a network interface (NIF).ie. a single home parasite node initiates the computation, sends messages to the targets directing them to perform the tests, and tabulates the results.The communication system that brings you the Web page of your choice can be exploited to perform computations. In effect, one computer can co-opt other Internet computers to solve pieces of a complex computational problem. The enslaved computers simply handle what appear to be routine Web page requests and related messages, but the disguised messages ingeniously encode possible solutions to a mathematical problem. If the solution is correct, a message returns to the original sender. The target computers are unaware that they have performed computation for the benefit of a commanding node
The seemingly simple request for a Web page involves a significant amount of frenetic behind-the-scenes computation aimed at finding, delivering, and displaying the desired page. One key component, governed by the so-called transmission control protocol (TCP), involves a calculation to determine whether a chunk of data was delivered without error-checksum computation. Information sent across the Internet is typically split into small chunks, or packets, that travelâ€often independently of each otherâ€to their common destination. Each packet bears a header providing data about its source and destination and carrying a numerical value related to the packet's contents. When a computer receives a packet of information, it checks for errors by performing a calculation and comparing the result with the numerical value in the packet's header (see "How TCP error detection works," below). Such a calculation would detect, for example, the change of one bit from 0 to 1 or 1 to 0. Packets found to be corrupted are discarded.
6.IMPLEMENTATION USING TCP
Sending a message over an internet is a very sophisticated process as the message is processed across many layers from HTTP then to TCP then to IP layer, going through data link layer finally to the physical layer and in the same manner the message is constructed back at the destination. To implement this concept of parasitic computing we can choose to exploit processing theoretically any of these layers but below TCP layer it is not very beneficial. Till now there has been only one implementation, which has exploited this concept of parasitic computing. Idea is to use some feature of the protocol in such a manner that remote machines respond to the request unknowingly that they are involved in solving a complex problem and they believe that they are responding to a simple application request over TCP connection.
The main target problems for such distributed environments are NP-complete problems i.e. non deterministic polynomial problems. These problems are such that their steps cannot be expressed in terms of polynomial time and therefore to know the right solutions one has to evaluate many possible alternatives. The property, which can be exploited here, is that all the alternative solutions can be evaluated in parallel and therefore different machines across the Internet can be used simultaneously for evaluation thousands of possible candidate solutions for any such problem. Like in this case the protocol, which is being used for this purpose, is TCP. To understand the implementation
one first needs to have a brief idea of TCP checksum.
Fig 2:Levels of communication between the parasitic
Node and target node
The checksum field is the 16 bit one's complement of the one's complement sum of all 16-bit words in the header and text. If a segment contains an odd number of header and text octets to be checksummed, the last octet is padded on the right with zeros to form a 16-bit word for checksum purposes. The pad is not transmitted as part of the segment. While computing the checksum, the checksum field itself is replaced with zeros. This information is carried in the Internet Protocol and is transferred across the TCP/Network interface in the arguments or results of calls by the TCP on the IP.
7.SOLVING 2-SAT AND 3-SAT PROBLEMS
The problem that we are addressing is called the satisfiablity problem, which means, deciding whether a given Boolean formula in conjunctive normal form has an
assignment that makes the formula "true." In 1971, Cook showed that the problem is NPcomplete. Before we define what the problem is, we must know what a conjunctive normal form means and what is NP-completeness.
Conjunctive Normal Form
A statement is in conjunctive normal form if it is a conjunction (sequence of ANDs) consisting of one or more conjuncts, each of which is a disjunction (OR) of one or more literals. Every statement in logic consisting of a combination of multiple ,^, , and! s can be written in conjunctive normal form. A disjunctive normal form (DNF) is a similar expression, which joins the clauses together with ORs. Satisfiability arises whenever we seek a configuration or object that must be consistent with (i.e. satisfy) a given set of constraints. The input to the problem could be a set of clauses in conjunctive normal form. The problem is to determine whether there is a truth assignment to the Boolean variables such that every clauses is simultaneously satisfied. Satisfiability is the original NP-complete problem. Despite its applications to constraint satisfaction, logic, and automatic theorem proving, it is perhaps most important theoretically as the root problem from which all other NP-completeness proofs originate.Now let us understand how such problems were tackled by parasitic computing. Note that all packets are inserted at the IP level, bypassing the TCP layer of the host, to avoid the client side TCP( which builds it s own checksum).
Consider the following expression in 2-SAT
(x1 x2)^(x3 x4)^(x5 x6)^(x7 x8)^(x9 ^x10)^(x11 x12)^(x13^x14)^(x15 x16)
The expression contains 16 variables, which are in CNF. They are connected by XORs (exclusive ORs) or by conjunction. A XOR clause is true if one of the literals is true and a conjunctive clause is true iff both the literals are true. A message is created in which each variable is appended with a zero .
Message: 0x1 0x2 0x3 0x4 0x5 0x6 0x7 0x8 0x9 0x10 0x11 0x12 0x13 0x14 0x15 0x16
For evaluating this expression using the TCP checksum method, we create a special TCP segment that contains the message in the data part and the checksum obtained by calculation. Suppose a candidate
solution is given by 0100010100010101 (S1) 0000010001010100 (S2), the transmitted packet would look like this:
p= (x1 x2)^(x3 x4)^(x5 x6)^(x7 x8)^(x9 ^x10)^(x11 x12)^(x13^x14)^(x15 x16)
X Y X Y
0 0 0 0 00
0 1 1 0 01
1 0 1 0 01
1 1 0 1 10
Where Tc is the complement of the checksum sent to verify the solution. The TCP checksum function as performed by the target computer will do the following:
Â¢ Suppose the data sent is in the form of S1, S2... Sk (each 16-bit). The target machine calculates the sum of each of these 16-bit data parts and adds it to the checksum sent
Â¢ Since a number and its compliment add up to all ones (111111.... 1), the right solution will result in a sum containing all ones, thereby
enabling TCP to send it to the higher layer (HTTP). All solutions that reach the HTTP layer are considered as right ones. Therefore the construction of the message ensures that the TCP checksum fails for all messages containing the invalid solution to the posed SAT
A 3-SAT problem can be similarly addressed, the reason being that 3 Boolean literals add up within two bits without overflow, i.e., in the worst case 1 + 1 + 1 = 11 in binary.
8. ALTERNATIVE IMPLEMENTATIONS
Due to many layers of communication involved in receiving and interpreting a message, there are, in principle many layers that can be used in parasitic computation. Parasitic computing can be implemented on all underlying communication layers that a packet has to go though before reaching the target program (which is the web server in this case) for processing it. Therefore it can be implemented in the IP layer too, but it will cause unwanted local bottlenecks since it will force the routers to solve the problem and therefore degrading the performance. From what was suggested by the inventor of this concept it is more efficient to implement it using TCP of higher layers, potential candidates being TCP, HTTP, or encryption/decryption with SSL.
9. ISSUES IN PARASTIC COMPUTING
Basic protocols are exploited by parasitic computing to use resources of remote computer without any authentication as messages are exchanged based on a trust relationship. As it uses basic Internet protocols one cannot stop anyone from launching it. Disrupting functions used by parasitic computing will eliminate remote computers ability to communicate with rest of the Internet!
It can cause delay in services of the remote computer (denial of service attacks). It also causes problems in Internet service. It can clog the network and effectively bring the Internet down. It never compromises the security of the computer as it sends standard packets and no malicious packets and also one has many other efficient ways of hacking than using standard protocols functions. It probably never breaks any law, but it still leads to certain ethical issues.
For the parasite, it may not be the best way to solve as it takes large number of computational cycles to process the possible solution but it introduces the way in which common protocols, like TCP, can be exploited. Also it cannot guarantee the correctness of the result due to the possibility of false negatives and false positives. So at present parasitic computing may be a slow technique to solve, but it could be used to load heavy requests on a server and also get the solution for its problem. So, It does raise the question for future by exploiting TCP layer.
Parasitic computing moves computation onto what is logically the communication infrastructure of the Internet, blurring the distinction between computing and communication. The Notre Dame scientists have shown that the current Internet infrastructure permits one computer to instruct other computers to perform computational tasks that are beyond the target's immediate scope. Enabling all computers to swap information and services they are needed could lead to unparalleled emergent behaviour, drastically altering the current use of the Internet.
The implementation offered above represents only a proof of concept of parasitic computing. As such, the solution merely serves to illustrate the idea behind parasitic computing, and it is not efficient for practical purposes in its current form. Indeed, the TCP checksum provides a series of additions and a comparison at the cost of hundreds of machine cycles to send and receive messages, which makes it computationally inefficient. To make the model viable, the computation-to-communication ratio must increase until the computation exported by the parasitic node is larger than the amount of cycles required by the node to solve the problem itself instead of sending it to the target. However, these are drawbacks of the presented implementation and do not represent fundamental obstacles for parasitic computing. It remains to be seen, however, whether a high-level implementation of a parasitic computer, perhaps exploiting HTTP or encryption/ decryption could execute in an efficient manner.