This program implements an algorithm designed by Segre et al. It follows a protocol that combines two datasets keyed by confidential identifiers, removing all duplicate records without revealing the identifiers. For example, two hospitals may want to combine their medical records to perform clinical studies on diseases. To identify duplicates, patients must be matched on confidential information, usually a Social Security Number (SSN), but by law hospitals are not allowed to reveal SSNs, so a method is needed to hide them. The protocol involves two parties (by convention named Alice and Bob, and referred to as A and B) and nine steps. The parties exchange encrypted files, typically over a secure direct line or through a trusted third party (a trusted courier). Each party has its own encryption and decryption keys, and the two agree on a commutative encryption function. Executing the protocol produces the union of the datasets, in which all records with duplicate identifiers have been removed and none of the identifiers has been revealed. To implement this protocol, we chose the Massey-Omura encryption algorithm and decided to use trusted couriers to transport the encrypted files. Since the protocol involves multiple steps and data must be transported at the end of each step, completion will likely require multiple runs over several days. The implementation makes it easy to identify and resend any lost files, starts up automatically, and uses standards such as XML and Java that allow it to be run and extended easily. It does not access any network, so it can be run on highly secured machines. Analysis of the protocol revealed that the standard block-chaining algorithms did not work in this application, so alternative mechanisms were developed and implemented. This discovery and others suggest areas for future research.
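The commutative property that the protocol relies on can be illustrated with a minimal Java sketch of Massey-Omura-style encryption. This is not the program described above, and it does not reproduce the nine protocol steps. It only shows that an identifier encrypted by both parties yields the same value regardless of the order of encryption, so duplicates can be matched without either party seeing the plaintext. The class name, method names, the toy modulus 2^127 - 1, and the sample identifier are all assumptions made for illustration. A real deployment would need a much larger prime and a safe encoding of identifiers.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Hypothetical illustration only; names and parameters are assumptions.
class MasseyOmuraSketch {
    // Toy modulus: the Mersenne prime 2^127 - 1 (far too small for real use).
    static final BigInteger P = BigInteger.ONE.shiftLeft(127).subtract(BigInteger.ONE);
    static final BigInteger P_MINUS_1 = P.subtract(BigInteger.ONE);

    // Pick a random exponent e > 1 with gcd(e, p - 1) = 1 so that it is invertible.
    static BigInteger newEncryptionKey(SecureRandom rng) {
        BigInteger e;
        do {
            e = new BigInteger(126, rng);
        } while (e.compareTo(BigInteger.ONE) <= 0
                || !e.gcd(P_MINUS_1).equals(BigInteger.ONE));
        return e;
    }

    // The matching decryption exponent d satisfies e * d = 1 (mod p - 1).
    static BigInteger decryptionKey(BigInteger e) {
        return e.modInverse(P_MINUS_1);
    }

    // Encryption and decryption are the same operation: m^key mod p.
    static BigInteger apply(BigInteger m, BigInteger key) {
        return m.modPow(key, P);
    }

    public static void main(String[] args) {
        SecureRandom rng = new SecureRandom();
        BigInteger aliceKey = newEncryptionKey(rng);
        BigInteger bobKey = newEncryptionKey(rng);

        BigInteger id = new BigInteger("123456789"); // made-up identifier

        // Encrypting in either order gives the same doubly encrypted value.
        BigInteger aliceThenBob = apply(apply(id, aliceKey), bobKey);
        BigInteger bobThenAlice = apply(apply(id, bobKey), aliceKey);
        System.out.println("commutes: " + aliceThenBob.equals(bobThenAlice));

        // Removing both layers, in any order, recovers the identifier.
        BigInteger recovered = apply(
                apply(aliceThenBob, decryptionKey(aliceKey)), decryptionKey(bobKey));
        System.out.println("round trip: " + recovered.equals(id));
    }
}
```

Because the two layers commute, each party can add its own layer to the other's already-encrypted identifiers, and equal doubly encrypted values then indicate duplicate records.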
References: A. Segre, A. Wildenberg, V. Viekand, and Y. Zhang, “Privacy-Preserving Data Set Union”
Dimo Dimitrov, ’07 Bulgaria
Majors: Computer Science, Mathematics
Sponsor: Andrew Wildenberg