1. Private and Secure Secret Shared MapReduce
Shlomi Dolev1, Yin Li2, and Shantanu Sharma1
1 Ben-Gurion University of the Negev, Israel
2 Xinyang Normal University, China
30th Annual IFIP WG 11.3 Conference on Data and Applications Security
and Privacy (DBSec 2016), Trento, Italy
2. Outline
• Introduction
• System Settings
• Overview of the Approach
• Count Operation
• Search and Fetch Operation
• Other Operations
• Conclusion
3. Introduction
• Why is it required to ensure privacy?
– Users send their data to the clouds
– Curious mappers and reducers can
• Store useful data
• Learn the given job
• Where is it required?
– Banking, financial, retail, and healthcare
4. Introduction
• What do others do?
– Work on encrypted data
– Authenticate, compress, and encrypt data
– Encrypt data-at-rest
– Store data securely in HDFS
– Require authentication before using a Hadoop cluster
• These approaches make data ‘computationally secure’.
• But for how long does it stay secure?
• Make the data information-theoretically secure
6. System Settings
[Figure labels: Data owner; Database; User-side; three identical blocks, each containing “Secret-shares of the database”, “STEP 3: Master Process”, three mappers (M), and two reducers (R); “STEP 4: Interpolation to obtain the final results”.]
Notations:
M: Mapper
R: Reducer
10. Overview of the Approach
• Accumulating Automata*
– Make shares of data (or input split)
– Send these shares to mappers
• Mappers learn neither the computation nor the data
– Mappers run a predefined accumulating automaton
– Example: Search for the pattern “LO” in the string “LOXLO”
*S. Dolev, N. Gilboa, X. Li, “Accumulating Automata and Cascaded Equations Automata for Communicationless Information Theoretically Secure Multi-Party Computation: Extended Abstract,” SCC@ASIACCS, pages 21-29, 2015.
11. Overview of the Approach
Example: Search for the pattern “LO” in the string “LOXLO”

Mapper    L         O          X        L         O          N1  N2  N3
Mapper 1  {v3,v4}   {v5,v10}   {v1,v1}  {v4,v5}   {v15,v20}  2   1   1
Mapper 2  {v5,v6}   {v10,v19}  {v2,v2}  {v6,v7}   {v20,v29}  3   4   8
Mapper 3  {v7,v9}   {v15,v28}  {v3,v3}  {v7,v9}   {v15,v28}  4   9   27
Mapper 4  {v9,v12}  {v20,v37}  {v4,v4}  {v9,v12}  {v20,v37}  5   16  64

Node updates at a mapper M:
$N_{1,M}^{(k+1)} = v_0$
$N_{2,M}^{(k+1)} = N_{1,M}^{(k)} \cdot v_1$
$N_{3,M}^{(k+1)} = N_{3,M}^{(k)} + N_{2,M}^{(k)} \cdot v_2$

Reducer: v140, v698, v1964, v4226
Result: LO, 2
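The node updates above can be traced on plain (unshared) data. The following is a minimal sketch, not the authors' implementation: it assumes N1 is held at the constant 1, that v_i is 1 when the current letter equals the i-th pattern letter and 0 otherwise, and the function name `count_pattern` is hypothetical. In the actual scheme the same additions and multiplications would be applied to secret-shares.

```python
def count_pattern(text, pattern):
    """Count occurrences of `pattern` in `text` with an accumulating automaton."""
    x = len(pattern)
    nodes = [1] + [0] * x  # N1 = 1, all other nodes start at 0
    for ch in text:
        # v[i] is 1 if the current letter matches pattern letter i, else 0
        v = [1 if ch == p else 0 for p in pattern]
        new = nodes[:]
        # The last node accumulates: N_{x+1} <- N_{x+1} + N_x * v_x
        new[x] = nodes[x] + nodes[x - 1] * v[x - 1]
        # Intermediate nodes are overwritten: N_{i+1} <- N_i * v_i
        for i in range(1, x):
            new[i] = nodes[i - 1] * v[i - 1]
        nodes = new
    return nodes[x]

print(count_pattern("LOXLO", "LO"))  # 2
```

Only the last node accumulates; the intermediate nodes are overwritten at every step, so a partial match that is not continued is forgotten.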
12. Creating Secret-Shares
• Consider only English words
– Represent each letter as a unary vector of 26 bits:
• ‘A’ is represented as (1_1, 0_2, 0_3, ..., 0_26)
– Make secret-shares of every bit by selecting different polynomials of an identical degree
– Since a different polynomial is used for each bit, multiple occurrences of a word in a database have different secret-shares
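A minimal sketch of this share-creation step, assuming Shamir-style polynomial sharing over a prime field. The modulus `PRIME`, the evaluation points x = 1, 2, ..., and all function names are illustrative assumptions, not taken from the slides.

```python
import random

PRIME = 2_147_483_647  # assumed prime modulus (2^31 - 1)

def unary(letter):
    """26-bit unary vector: a single 1 at the letter's alphabet position."""
    return [1 if i == ord(letter.upper()) - ord("A") else 0 for i in range(26)]

def share_bit(bit, degree, num_shares, rng):
    """Share one bit with a fresh random polynomial whose constant term is the bit."""
    coeffs = [bit] + [rng.randrange(1, PRIME) for _ in range(degree)]
    return [sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
            for x in range(1, num_shares + 1)]

def share_letter(letter, degree, num_shares, rng):
    """Return one 26-element share vector per cloud."""
    per_bit = [share_bit(b, degree, num_shares, rng) for b in unary(letter)]
    return [list(column) for column in zip(*per_bit)]

def reconstruct(points):
    """Lagrange interpolation at x = 0 over the prime field."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total

rng = random.Random(0)
first = share_letter("A", degree=1, num_shares=3, rng=rng)
second = share_letter("A", degree=1, num_shares=3, rng=rng)
```

Because every bit gets its own polynomial, `first` and `second` differ even though both encode ‘A’, which is the property the last bullet relies on.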
14. Count Operation
• String-matching based
– Matches a value of a relation against a pattern, where both the value and the pattern are in secret-shared form
• Two phases
– Phase 1: Privacy-preserving counting in the clouds
– Phase 2: Result reconstruction at the user side
15. Count Operation
• Working in the cloud: A mapper
– Creates an automaton of x+1 nodes, where x is the length of the pattern p
– Initializes the values of these nodes: the first node is assigned the value one (N1 = 1) and all other nodes are assigned the value zero (Ni = 0 for i > 1)
16. Count Operation
• Working in the cloud: Count ‘John’
Relation (attribute Name): Adam, John, John
Automaton: nodes N1 to N5 with transition values v1, v2, v3, v4
Node and transition values, in slide order:
N1 = 1, N2 = 0, N3 = 0, N4 = 0, N5 = 0
v1 = J * A, v2 = o * d, v3 = a * h, v4 = m * n
N1 = 1, N2 = 0, N3 = 0, N4 = 0, N5 = 1
v1 = J * J, v2 = o * o, v3 = h * h, v4 = n * n
N1 = 1, N2 = 0, N3 = 0, N4 = 0, N5 = 2
17. Count Operation
• Working at the user side
– Result reconstruction: a simple interpolation operation
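As a worked instance of this interpolation step, the sketch below takes the four reducer values listed in the “LO” example (v140, v698, v1964, v4226) as the numbers 140, 698, 1964, 4226. It assumes they are evaluations at x = 1, 2, 3, 4 for mappers 1 to 4, and uses exact rational arithmetic rather than a finite field; the function name is hypothetical.

```python
from fractions import Fraction

def interpolate_at_zero(points):
    """Evaluate at x = 0 the unique polynomial passing through `points`."""
    total = Fraction(0)
    for xi, yi in points:
        weight = Fraction(1)
        for xj, _ in points:
            if xj != xi:
                weight *= Fraction(-xj, xi - xj)  # Lagrange basis term at x = 0
        total += yi * weight
    return total

reducer_outputs = [(1, 140), (2, 698), (3, 1964), (4, 4226)]
result = interpolate_at_zero(reducer_outputs)
print(result)  # 2
```

The constant term 2 agrees with the result “LO, 2” shown for that example.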
19. Search and Fetch Operation
• Working in the cloud: A mapper
– PHASE 1: Finding addresses of tuples containing p
– PHASE 2: Fetching all the tuples containing p
20. Search and Fetch Operation
Unary Occurrence
• Working in the cloud: A mapper
– No need to know the address
– Multiply
• Results will be 0 or 1 in secret-shared form
• Multiply the result with the tuple
– Add the values of each attribute
Example: pattern Adam

Name  Department
Adam  CS
John  EC
John  CS

Name  Department
1     1
0     0
0     0
1     1
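The multiply-and-add step can be illustrated on plain values. This sketch assumes each attribute value is encoded as an integer (here via its ASCII bytes, an assumption made only for illustration) and that the 0/1 match results are already available; in the actual scheme both operands would be secret-shares. The function names are hypothetical.

```python
def encode(text):
    """Encode a string as an integer (illustrative encoding only)."""
    return int.from_bytes(text.encode("ascii"), "big")

def decode(number):
    """Invert `encode`."""
    return number.to_bytes((number.bit_length() + 7) // 8, "big").decode("ascii")

def fetch_single(relation, match_bits):
    """Multiply each tuple by its match bit and add the values of each attribute."""
    width = len(relation[0])
    sums = [0] * width
    for bit, row in zip(match_bits, relation):
        for j, cell in enumerate(row):
            sums[j] += bit * encode(cell)
    return tuple(decode(s) for s in sums)

relation = [("Adam", "CS"), ("John", "EC"), ("John", "CS")]
match_bits = [1, 0, 0]  # result of matching "Adam" against the Name column
print(fetch_single(relation, match_bits))  # ('Adam', 'CS')
```

Because exactly one match bit is 1, the sums equal the single matching tuple, so no address is needed.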
21. Search and Fetch Operation
Unary Occurrence
• Working at the user side
– A simple interpolation
22. Search and Fetch Operation
Multiple Occurrences
• Tradeoff
– Number of rounds vs. computational load at the user side
– Naïve algorithm and a database partitioning algorithm
23. Search and Fetch Operation
Multiple Occurrences
• The first way: Naïve Algorithm
– Requires only 2 rounds, but a lot of computation at the user side
– Now the user can learn the addresses
Example: pattern John

Name  Department  Multiply (Name)
Adam  CS          0
John  EC          1
John  CS          1
24. Search and Fetch Operation
Multiple Occurrences
• The first way: Naïve Algorithm (but how to fetch?)
– Say there are L occurrences
– Create a matrix M of size L × n
Example: pattern John, match results (Name): 0, 1, 1

M:
0 1 0
0 0 1

M * relation:
Name  Department
Adam  CS
John  EC
John  CS

Result:
Name  Department
John  EC
John  CS
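The matrix-based fetch can be sketched on plain values as follows. The sketch assumes n is the number of tuples, that the user builds M from the addresses learned in the first round, and that attribute values are encoded as integers (ASCII bytes, for illustration only). In the actual scheme M and the relation would be secret-shared. All names are hypothetical.

```python
def encode(text):
    """Encode a string as an integer (illustrative encoding only)."""
    return int.from_bytes(text.encode("ascii"), "big")

def decode(number):
    """Invert `encode`."""
    return number.to_bytes((number.bit_length() + 7) // 8, "big").decode("ascii")

def selection_matrix(addresses, n):
    """L x n matrix with a single 1 per row, at the (1-based) address to fetch."""
    return [[1 if col == a else 0 for col in range(1, n + 1)] for a in addresses]

def fetch(matrix, relation):
    """Multiply M by the relation: each row of M selects exactly one tuple."""
    encoded = [[encode(cell) for cell in row] for row in relation]
    width = len(relation[0])
    fetched = []
    for m_row in matrix:
        sums = [sum(m * row[j] for m, row in zip(m_row, encoded))
                for j in range(width)]
        fetched.append(tuple(decode(s) for s in sums))
    return fetched

relation = [("Adam", "CS"), ("John", "EC"), ("John", "CS")]
M = selection_matrix([2, 3], n=3)
print(M)                   # [[0, 1, 0], [0, 0, 1]]
print(fetch(M, relation))  # [('John', 'EC'), ('John', 'CS')]
```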
25. Search and Fetch Operation
Multiple Occurrences
• The second way: database partitioning algorithm
– Requires less computation at the user side, but more than 2 rounds
– Partitions the database to learn the addresses
– Then fetches the tuples using the solution suggested in the naïve algorithm
28. Other Operations
• Equijoin
– Use two layers of clouds: the first layer performs the fetch operation and the second layer performs the equijoin operation
• Range query
– Uses 2’s complement
– Count the occurrences of numbers that lie in the range and then fetch those tuples
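One way the 2’s complement idea might be realized is to test the sign bit of a subtraction. The sketch below is an assumption about the mechanism, not a description of the authors' protocol: it works on plain bits, uses only additions and multiplications on those bits (so the same arithmetic could in principle be applied to shares), and assumes values fit in 7 bits so that differences fit in the 8-bit width. All names are hypothetical.

```python
WIDTH = 8  # assumed bit width; values are expected to lie in 0..127

def to_bits(value):
    """Two's-complement bits of `value`, least significant bit first."""
    return [(value >> i) & 1 for i in range(WIDTH)]

def sign_of_difference(a_bits, b_bits):
    """Sign bit of a - b, computed as a + (NOT b) + 1 with a ripple-carry adder."""
    carry = 1  # the "+ 1" of two's-complement negation
    sum_bit = 0
    for a, b in zip(a_bits, b_bits):
        nb = 1 - b                                # NOT b
        axb = a + nb - 2 * a * nb                 # a XOR (NOT b)
        sum_bit = axb + carry - 2 * axb * carry   # sum bit of the full adder
        carry = a * nb + carry * axb              # carry out (terms never both 1)
    return sum_bit  # the last sum bit is the sign: 1 means a < b

def in_range(x, low, high):
    """1 if low <= x <= high, else 0, from two sign bits."""
    below = sign_of_difference(to_bits(x), to_bits(low))   # 1 if x < low
    above = sign_of_difference(to_bits(high), to_bits(x))  # 1 if high < x
    return (1 - below) * (1 - above)

def count_in_range(values, low, high):
    return sum(in_range(v, low, high) for v in values)

print(count_in_range([5, 12, 20, 33, 7], 7, 20))  # 3
```

The 0/1 indicators returned by `in_range` play the same role as the match results in the fetch operation, so counting and then fetching follow the earlier pattern.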
31. Shlomi Dolev1, Yin Li2, and Shantanu Sharma1
1 Department of Computer Science, Ben-Gurion University of the
Negev, Israel
{dolev,sharmas}@cs.bgu.ac.il
2 Department of Computer Science, Xinyang Normal University, China
yunfeiyangli@gmail.com
Presentation is available at
http://www.cs.bgu.ac.il/~sharmas/publication.html