Title : Data Organization

Code : 2

Responsible : BD

Activities : Social evaluation and Valorization of Queries

Start Date : 2014-06-01

End Date : 2017-11-30

Objectives : In SocioPlug, we aim at a complementary alternative to existing systems, one that opens up the ability to valorize users' data to many more people while respecting users' privacy. This task focuses on users' queries. Queries have been shown to be a valuable resource, so democratizing access to them should support the flourishing of new creative activities.

We propose to explore a solution in which query results are handled much like files in a peer-to-peer torrent file-sharing system. As with files, users may ask the system which queries it currently handles. Users interested in similar queries can thus cooperate within a community to compute them and to store and disseminate their results. A user running a query can also take advantage of results already obtained by others. Cascading query results from one user to another is what we call a Query Torrent, or QTor for short. We will first focus on such a torrent over continuous queries and streams.
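
As an illustration only, the cascading idea can be sketched in Python as follows. The sketch assumes that a single in-memory registry stands in for the distributed system and that queries are identified by their text. It does not model continuous queries or streams. The names QTorRegistry, QueryTorrent, known_queries, fetch and peers are hypothetical and are not part of SocioPlug.

```python
from dataclasses import dataclass, field


@dataclass
class QueryTorrent:
    """State kept for one shared query (hypothetical structure)."""
    query: str
    peers: set = field(default_factory=set)      # users interested in the query
    results: list = field(default_factory=list)  # results already obtained
    computed: bool = False                       # True once some peer has run it


class QTorRegistry:
    """In-memory stand-in for the distributed system (an assumption)."""

    def __init__(self):
        self._torrents = {}

    def known_queries(self):
        """Let a user discover which queries the system currently handles."""
        return sorted(self._torrents)

    def peers(self, query):
        """Return the community of users gathered around `query`."""
        return set(self._torrents[query].peers)

    def fetch(self, user, query, compute):
        """Join the community around `query` and return its results.

        `compute` is called only if no peer has produced results yet;
        later users reuse the stored results.
        """
        torrent = self._torrents.setdefault(query, QueryTorrent(query))
        torrent.peers.add(user)
        if not torrent.computed:
            torrent.results = list(compute(query))
            torrent.computed = True
        return list(torrent.results)


registry = QTorRegistry()
calls = []


def run_query(query):
    calls.append(query)  # record each actual computation
    return [3, 1, 4]     # placeholder results, not real data


first = registry.fetch("alice", "avg(temperature)", run_query)
second = registry.fetch("bob", "avg(temperature)", run_query)
```

In this toy run the second user obtains the results without triggering a new computation, which is the behaviour the Query Torrent is meant to generalize to a decentralized setting.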

Expected advantages are numerous :

  1. Access to users' queries is symmetric.
  2. The query language is not fixed or constrained by a central authority: users are autonomous in this respect.
  3. Many users may join forces to handle costly computations that would be out of reach of an isolated user.
  4. Users may benefit from results obtained by others.
  5. Each user keeps control over her resources: like data, resources can be used only for purposes that have been explicitly authorized. We are convinced that this last point is an important feature for ensuring users' adoption of the system.
  6. Resource availability naturally fits the needs: the more popular a query is, the more resources are devoted to it, the less expensive its computation is for an interested user, and the more its results are replicated and available.
  7. Optimization and scalability are improved: each query is computed only once and its results are shared among interested users.

Without restricting the expressiveness of the query language, we find it particularly interesting to pay attention to privacy-preserving aggregation queries. Here our objective is to make private data mining possible through aggregation queries while avoiding privacy issues. In this task, our two main objectives are:

  1. to enable a community of users to share and capitalize on their queries and the results they provide, and
  2. to permit the use of privacy-preserving aggregation queries, even over private data.
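
As a purely illustrative sketch, and not the protocol the project will adopt, the following Python code computes an average without any peer exchanging its raw value. It assumes honest peers, pairwise random masks that cancel out in the overall sum, and a simple pairwise gossip averaging step. All function names are hypothetical, and the sketch is not a security analysis.

```python
import random


def add_zero_sum_masks(values, rng):
    """Return masked copies of `values`.

    Each pair of peers agrees on a random mask that one adds and the
    other subtracts, so individual values are hidden but the total
    (and hence the average) is unchanged.
    """
    masked = list(values)
    n = len(masked)
    for i in range(n):
        for j in range(i + 1, n):
            mask = rng.uniform(-100.0, 100.0)
            masked[i] += mask
            masked[j] -= mask
    return masked


def gossip_average(values, rounds, rng):
    """Pairwise gossip averaging.

    In each round two randomly chosen peers replace their estimates
    with the mean of the two. The sum is preserved, so all estimates
    drift towards the global average.
    """
    estimates = list(values)
    for _ in range(rounds):
        i, j = rng.sample(range(len(estimates)), 2)
        mean = (estimates[i] + estimates[j]) / 2.0
        estimates[i] = mean
        estimates[j] = mean
    return estimates


rng = random.Random(0)                       # fixed seed for repeatability
private_values = [10.0, 20.0, 30.0, 40.0]    # made-up private inputs
masked = add_zero_sum_masks(private_values, rng)
estimates = gossip_average(masked, rounds=500, rng=rng)
```

Only masked values are exchanged during gossip, yet every peer's estimate approaches the true average of the private inputs because the masks sum to zero.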

Scientific Challenges : Three main challenges have been identified:

  • Queries based organizations. The identification of communities and their relationships relies on users' needs as expressed by queries. Theoretical issues about queries therefore have to be addressed, as well as possible links between queries and the organization of participants. We are convinced that the use of a declarative language, which is undoubtedly behind the success of databases, may also bring many benefits here and make possible a thorough analysis of users' queries in order to infer the communities and their relationships. For each possible solution, a theoretical study is necessary to evaluate the balance between complexity (cf. query answering using views) and benefits.
  • Epidemic Protocols to preserve privacy. Recent research has demonstrated the effectiveness of epidemic protocols in a variety of settings. These include not only traditional peer-to-peer applications such as data dissemination or overlay maintenance, but also user-oriented tasks such as identifying communities with similar interests or recommending interesting items. In the SocioPlug context, we aim to investigate the use of epidemic protocols for computing aggregate functions in a privacy-preserving manner.
  • Correctness of algorithms and protocols. Distributing the Query Torrent across participants requires protocols and algorithms that ensure the correctness of the obtained results, much like BitTorrent guarantees that files have not been corrupted. This question lies at the crossroads of declarative languages, databases, stream querying, peer-to-peer, volunteer computing, and distributed systems.
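
To make the BitTorrent analogy in the last challenge concrete: BitTorrent checks each downloaded piece against a hash published in the torrent metadata. A minimal sketch of the same idea applied to chunks of query results could look as follows. It assumes results are serialized as byte chunks and uses SHA-256 as an arbitrary choice of hash. The function names are hypothetical. Such a check only detects chunks that differ from the published digests. It does not establish that the query itself was computed correctly, which is the harder part of the challenge.

```python
import hashlib


def chunk_digests(chunks):
    """Digests that the publisher of a result would distribute (assumption)."""
    return [hashlib.sha256(chunk).hexdigest() for chunk in chunks]


def verify_chunk(index, chunk, digests):
    """Check a chunk received from another peer against the published digest."""
    return hashlib.sha256(chunk).hexdigest() == digests[index]


# Made-up result chunks, serialized as bytes for the purpose of the example.
result_chunks = [b"row-1;row-2", b"row-3;row-4"]
published = chunk_digests(result_chunks)

intact = verify_chunk(0, b"row-1;row-2", published)
tampered = verify_chunk(1, b"row-3;row-X", published)
```

Here `intact` is true and `tampered` is false, since the second chunk no longer matches its published digest.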

Deliverables :

      Description                                                  Dec. 2013 + months
D21   Queries based organizations : Notions and Definitions        12
D22   Queries based organizations : Solutions proposal             36
D23   Queries based organizations : Distributed considerations     36
D24   Queries based organizations : Experimental evaluation        48

Sub-tasks :

Task21 QBO : Queries based organizations : Notions and Definitions
Task22 QBO : Queries based organizations : Solutions proposal
Task23 QBO : Queries based organizations : Distributed considerations
Task24 QBO : Queries based organizations : Experimental evaluation

Participants :