DUSR (Distributed Ultrafast Shape Recognition) performs shape-based ligand screening based on pharmacokinetic properties and a comparison of molecular shape moments. DUSR runs the USR algorithm on commodity hardware using the Hadoop framework, which speeds up screening.
Before setting up the Hadoop cluster, configure all nodes by following the steps at: http://blog.puneethabm.in/hadoop-cloudera-cluster-set-up/
Set up the Hadoop multi-node cluster by following: http://tecadmin.net/set-up-hadoop-multi-node-cluster-on-centos-redhat/
Install the CDK (Chemistry Development Kit) on the master node.
Download an active reference molecule in SDF format (file 1), against which similarly shaped compounds will be searched.
Download a chemical database of compounds in SDF format (file 2), from which similarly shaped compounds will be extracted.
Create the directory ‘/data/cache/lib’ in the Hadoop filesystem and copy the following JAR files into it:
cdk-core.jar
cdk-data.jar
cdk-fingerprint.jar
cdk-interfaces.jar
cdk-io.jar
cdk-ioformats.jar
cdk-isomorphism.jar
cdk-nonotify.jar
cdk-qsar.jar
cdk-qsarmolecular.jar
cdk-standard.jar
cdk-valencycheck.jar
jgraph-0.5.3.jar
vectmath-1.3.1.jar
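The staging step above can be sketched as a small script. This is only an illustration: LOCAL_LIB (the local folder holding the downloaded jars) and the DRY_RUN switch are assumptions, not part of DUSR. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/usr/bin/env bash
# Sketch: stage the CDK jars listed above in HDFS.
set -eu

LOCAL_LIB="${LOCAL_LIB:-./lib}"   # assumption: local folder containing the jars
HDFS_LIB="/data/cache/lib"        # HDFS directory named in this README
DRY_RUN="${DRY_RUN:-1}"           # 1 = print the commands instead of running them

JARS="cdk-core.jar cdk-data.jar cdk-fingerprint.jar cdk-interfaces.jar
cdk-io.jar cdk-ioformats.jar cdk-isomorphism.jar cdk-nonotify.jar
cdk-qsar.jar cdk-qsarmolecular.jar cdk-standard.jar cdk-valencycheck.jar
jgraph-0.5.3.jar vectmath-1.3.1.jar"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

stage_jars() {
  run hadoop fs -mkdir -p "$HDFS_LIB"
  for jar in $JARS; do
    run hadoop fs -put "$LOCAL_LIB/$jar" "$HDFS_LIB/"
  done
}

stage_jars
```

Running it with DRY_RUN=0 on the master node should leave all fourteen jars under /data/cache/lib; `hadoop fs -ls /data/cache/lib` can be used to confirm.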
Copy file 1 and file 2 to the Hadoop filesystem.
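A minimal sketch of the upload step follows. The local file names (reference.sdf, database.sdf) and the HDFS directory /data/input are placeholders chosen for illustration; substitute your own. As written, the script prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Sketch: upload the two SDF inputs to HDFS. All paths are placeholders.
set -eu

FILE1="${FILE1:-reference.sdf}"     # assumed local name of file 1 (reference molecule)
FILE2="${FILE2:-database.sdf}"      # assumed local name of file 2 (compound database)
HDFS_IN="${HDFS_IN:-/data/input}"   # assumed HDFS target directory
DRY_RUN="${DRY_RUN:-1}"             # 1 = print the commands instead of running them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

upload_inputs() {
  run hadoop fs -mkdir -p "$HDFS_IN"
  run hadoop fs -put "$FILE1" "$FILE2" "$HDFS_IN/"
  run hadoop fs -ls "$HDFS_IN"      # list the directory to confirm both files arrived
}

upload_inputs
```

The HDFS paths of the uploaded files are the ones to pass as /path/to/file1 and /path/to/file2 in the job commands.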
Download the file ‘USR_n.jar’ and run the following command for the DistMapVector (DMV) approach:
hadoop jar /path/to/USR_n.jar USR.AggregateJob /path/to/file1 /path/to/file2 /path/to/output_file
The results are written to output_file.
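To inspect the results locally, something like the sketch below may help. It assumes the job writes its output in the usual Hadoop way, as a directory of part files at the output path; that has not been verified for this jar. The local file name dusr_results.txt is a placeholder. The script prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: list the job output in HDFS and merge it into one local file.
set -eu

OUT="${OUT:-/path/to/output_file}"      # HDFS output path given to the job
LOCAL="${LOCAL:-dusr_results.txt}"      # assumed local file name for the merged results
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print the commands instead of running them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

fetch_results() {
  run hadoop fs -ls "$OUT"                  # show what the job wrote
  run hadoop fs -getmerge "$OUT" "$LOCAL"   # concatenate the output files locally
}

fetch_results
```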
To run the DistMapVectorLibScreen (DMVLS) approach:
i) Download ‘lib_noreduce.jar’ and run the following command to build the library:
hadoop jar /path/to/USR_library.jar USR_library.AggregateJob /path/to/file2 /path/to/output_file1
ii) Download ‘screening.jar’ and run the following command for the final screening of compounds:
hadoop jar /path/to/screening.jar lead_compounds.AggregateJob /path/to/file1 /path/to/output_file1 /path/to/final_output_file
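The two DMVLS steps can be chained in one script, sketched below. The jar paths and HDFS paths are the placeholders used in the commands above; the variable names and the DRY_RUN switch are introduced here for illustration only. The library output of step i) is passed as the second input of step ii).

```shell
#!/usr/bin/env bash
# Sketch: run the DMVLS library build, then the screening job.
set -eu

LIB_JAR="${LIB_JAR:-/path/to/USR_library.jar}"         # jar used in step i)
SCREEN_JAR="${SCREEN_JAR:-/path/to/screening.jar}"     # jar used in step ii)
FILE1="${FILE1:-/path/to/file1}"                       # reference molecule in HDFS
FILE2="${FILE2:-/path/to/file2}"                       # compound database in HDFS
LIBRARY_OUT="${LIBRARY_OUT:-/path/to/output_file1}"    # library written by step i)
FINAL_OUT="${FINAL_OUT:-/path/to/final_output_file}"   # screening result of step ii)
DRY_RUN="${DRY_RUN:-1}"                                # 1 = print the commands only

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

run_dmvls() {
  # Step i): build the descriptor library from the database.
  run hadoop jar "$LIB_JAR" USR_library.AggregateJob "$FILE2" "$LIBRARY_OUT"
  # Step ii): screen the library against the reference molecule.
  run hadoop jar "$SCREEN_JAR" lead_compounds.AggregateJob "$FILE1" "$LIBRARY_OUT" "$FINAL_OUT"
}

run_dmvls
```

With DRY_RUN=0, `set -e` stops the script if the library build fails, so screening never runs on a missing library. MapReduce jobs typically refuse to write to an output path that already exists, so choose fresh output paths or remove old ones before rerunning.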