Anonymisation

How to implement anonymisation across animal records

Work Package 1
Tutorial
Author

Saba Noor, Miel Hostens, et al.

Published

June 21, 2023

Anonymization

To protect the privacy and confidentiality of sensitive information, this tutorial shows how to anonymize identifiers in both R and Python. Both examples use the SHA-256 algorithm, a one-way hash function: a hashed value cannot be reversed to recover the original, so the sensitive information cannot be directly linked back to a specific entity or individual.
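
To make the idea concrete, here is a minimal Python sketch that uses only the standard library. The helper name `anonymise_id` and the file numbers are made up for illustration and are not part of the original tutorial. It shows that a SHA-256 digest has a fixed length and that the same identifier always produces the same digest.

```python
import hashlib


def anonymise_id(value):
    """Return the SHA-256 hex digest of an identifier (hypothetical helper)."""
    # Convert to text first so numeric and string identifiers hash alike.
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


digest = anonymise_id("12345")  # "12345" is a made-up file number

print(len(digest))                      # prints 64 (hex characters)
print(digest == anonymise_id(12345))    # prints True: same identifier, same digest
print(digest == anonymise_id("12346"))  # prints False: different identifier
```

Because the mapping is deterministic, records that shared an identifier before hashing still share one afterwards, so they can be joined without exposing the original value.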

Function to apply SHA-256 hashing in R

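library(openssl)
library(dplyr)

# Function to apply SHA-256 hashing (works on a whole character vector)
sha256_hash <- function(data) {
  openssl::sha256(data)
}

# `df` is a placeholder name for the data frame holding the identifier columns
df <- dplyr::mutate(
    df,
    Filenumber = sha256_hash(as.character(Filenumber)),
    Samplenumber = sha256_hash(as.character(Samplenumber))
)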

Function to apply SHA-256 hashing in Python

! pip install openpyxl
import hashlib
from rdflib import Literal
from rdflib.namespace import XSD

# FileNumber, SampleNumber, g, Sample and onto are assumed to be defined earlier in the notebook

# Generate anonymized values for file number and sample number
FileNumber = hashlib.sha256(str(FileNumber).encode()).hexdigest()
SampleNumber = hashlib.sha256(str(SampleNumber).encode()).hexdigest()

# Add anonymized values to the RDF graph
g.add((Sample, onto.hasFileNumber, Literal(FileNumber, datatype=XSD.string)))
g.add((Sample, onto.hasSampleNumber, Literal(SampleNumber, datatype=XSD.string)))
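
The snippet above hashes a single file number and sample number. As a rough sketch of how the same step might be applied to a whole set of records, the example below hashes both identifier fields in a list of dictionaries. The function names, field names and example records are assumptions made for illustration and do not come from the original notebook.

```python
import hashlib


def sha256_hex(value):
    """Return the SHA-256 hex digest of a value converted to text."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def anonymise_records(records, id_fields=("FileNumber", "SampleNumber")):
    """Return copies of the records with the identifier fields hashed."""
    anonymised = []
    for record in records:
        clean = dict(record)  # copy so the input records stay unchanged
        for field in id_fields:
            if field in clean:
                clean[field] = sha256_hex(clean[field])
        anonymised.append(clean)
    return anonymised


# Made-up example records; real data would come from the source files.
records = [
    {"FileNumber": 101, "SampleNumber": "S-1", "Result": 4.2},
    {"FileNumber": 101, "SampleNumber": "S-2", "Result": 3.9},
]

anonymised = anonymise_records(records)

# Both samples come from file 101, so their hashed FileNumber values match.
print(anonymised[0]["FileNumber"] == anonymised[1]["FileNumber"])
```

Non-identifier fields such as `Result` pass through untouched, and the original `records` list is not modified.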