Machine-Learning-in-Molecular-Sciences

2017 Summer School on Machine Learning in the Molecular Sciences. This project aims to help you understand some basic machine learning models, including neural networks, optimization schemes, random forests, parameter learning, incremental learning paradigms, clustering and decision trees, building on kernel regression, dimensionality reduction, feature selection and clustering techniques.

View the Project on GitHub nickcafferry/Machine-Learning-in-Molecular-Sciences

Documentation Status MIT License Mathematica Python version Wolfram Cloud Huawei Cloud

Aims
Panel Topics
Course Schedule
Internal Links
External Links

Welcome to Machine Learning in the Molecular Sciences



Aims

The NYU-ECNU Center for Computational Chemistry at New York University Shanghai (a.k.a. NYU Shanghai) announced a summer school dedicated to machine learning and its applications in the molecular sciences, to be held in June 2017 at the NYU Shanghai Pudong Campus. Using a combination of technical lectures and hands-on exercises, the school aimed both to instruct attendees in the fundamentals of modern machine learning techniques and to demonstrate how these approaches can be applied to solve complex computational problems in chemistry, biology, and materials science. To promote free and open code, this project is built to help you understand some of the basic machine learning models mentioned below.

Panel Topics

Fundamental topics to be covered include basic machine learning models such as kernel methods and neural networks, optimization schemes, parameter learning and delta learning paradigms, clustering, and decision trees. Application areas will feature machine learning models for representing and predicting properties of individual molecules and condensed phases, learning algorithms for bypassing explicit quantum chemical and statistical mechanical calculations, and techniques applicable to biomolecular structure prediction, bioinformatics, protein-ligand binding, materials and molecular design, and various others.
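To make two of these topics concrete, here is a minimal sketch of delta learning with a kernel method, written with NumPy only. It is an illustration under stated assumptions, not material from the summer school: `cheap_model` and `expensive_model` are hypothetical toy functions standing in for a low-level and a high-level calculation, and the Gaussian kernel width and regularization strength are arbitrary choices. The idea is to fit kernel ridge regression to the difference between the two levels, then add the learned correction to the cheap result.

```python
import numpy as np

def gaussian_kernel(a, b, length_scale=0.5):
    """Gaussian (RBF) kernel matrix between 1-D sample arrays a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def krr_fit(x_train, y_train, reg=1e-6, length_scale=0.5):
    """Solve (K + reg * I) alpha = y for the kernel ridge regression weights."""
    k = gaussian_kernel(x_train, x_train, length_scale)
    return np.linalg.solve(k + reg * np.eye(len(x_train)), y_train)

def krr_predict(x_new, x_train, alpha, length_scale=0.5):
    """Evaluate the fitted kernel expansion at new points."""
    return gaussian_kernel(x_new, x_train, length_scale) @ alpha

# Hypothetical toy stand-ins (assumptions, not real chemistry):
def cheap_model(x):
    """Plays the role of an inexpensive, approximate calculation."""
    return np.sin(x)

def expensive_model(x):
    """Plays the role of an accurate but costly calculation."""
    return np.sin(x) + 0.3 * np.cos(2.0 * x)

x_train = np.linspace(0.0, 6.0, 25)
x_test = np.linspace(0.1, 5.9, 50)

# Delta learning: learn only the correction (expensive minus cheap).
delta_train = expensive_model(x_train) - cheap_model(x_train)
alpha = krr_fit(x_train, delta_train)

# Prediction = cheap result + learned correction.
y_pred = cheap_model(x_test) + krr_predict(x_test, x_train, alpha)

max_err = np.max(np.abs(y_pred - expensive_model(x_test)))
baseline_err = np.max(np.abs(cheap_model(x_test) - expensive_model(x_test)))
print(f"cheap model alone: max error {baseline_err:.3f}")
print(f"cheap + learned delta: max error {max_err:.2e}")
```

In this toy setting the correction is smoother and smaller than the full target, which is the usual motivation for learning a delta rather than the property itself.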

Course Schedule

Codes

One of the exciting aspects of machine-learning (ML) techniques is their potential to democratize molecular and materials modelling, thanks to relatively inexpensive computations and a low barrier to entry for newcomers. (Pople’s Gaussian software similarly made quantum chemistry calculations approachable.)

The success of machine-learning technology relies on three contributing factors: open data, open software and open education.

Open data:

Publicly accessible structure and property databases for molecules and solid materials.

Computed structures and properties:

AFLOWLIB (Structure and property repository from high-throughput ab initio calculations of inorganic materials)

Computational Materials Repository (Infrastructure to enable collection, storage, retrieval and analysis of data from electronic-structure codes)

GDB (Databases of hypothetical small organic molecules)

Harvard Clean Energy Project (Computed properties of candidate organic solar absorber materials)

Materials Project (Computed properties of known and hypothetical materials carried out using a standard calculation scheme)

NOMAD (Input and output files from calculations using a wide variety of electronic-structure codes)

Open Quantum Materials Database (Computed properties of mostly hypothetical structures carried out using a standard calculation scheme)

NREL Materials Database (Computed properties of materials for renewable-energy applications)

TEDesignLab (Experimental and computed properties to aid the design of new thermoelectric materials)

ZINC (Commercially available organic molecules in 2D and 3D formats)

Experimental structures and properties:

ChEMBL (Bioactive molecules with drug-like properties)

ChemSpider (Royal Society of Chemistry’s structure database, featuring calculated and experimental properties from a range of sources)

Citrination (Computed and experimental properties of materials)

Crystallography Open Database (Structures of organic, inorganic, metal–organic compounds and minerals)

CSD (Repository for small-molecule organic and metal–organic crystal structures)

ICSD (Inorganic Crystal Structure Database)

MatNavi (Multiple databases targeting properties such as superconductivity and thermal conductance)

MatWeb (Datasheets for various engineering materials, including thermoplastics, semiconductors and fibres)

NIST Chemistry WebBook (High-accuracy gas-phase thermochemistry and spectroscopic data)

NIST Materials Data Repository (Repository to upload materials data associated with specific publications)

PubChem (Biological activities of small molecules)

Open Software:

General-purpose machine-learning frameworks:

Caret (Package for machine learning in R)

Deeplearning4j (Distributed deep learning for Java)

H2O.ai (Machine-learning platform written in Java that can be imported as a Python or R library)

Keras (High-level neural-network API written in Python)

Mlpack (Scalable machine-learning library written in C++)

Scikit-learn (Machine-learning and data-mining member of the scikit family of toolboxes built around the SciPy Python library)

Weka (Collection of machine-learning algorithms and tasks written in Java)
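As a rough illustration of how a general-purpose framework from this list is used, the sketch below runs two of the basic models named in the panel topics, clustering and a decision tree, with scikit-learn. The data are synthetic: the two groups of 2-D feature vectors are hypothetical stand-ins for molecular descriptors, and the hyperparameters (`n_clusters=2`, `max_depth=2`) are assumptions chosen to suit this toy data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Two well-separated synthetic groups of 2-D feature vectors
# (hypothetical descriptors, not real molecular data).
group_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(40, 2))
group_b = rng.normal(loc=[3.0, 3.0], scale=0.3, size=(40, 2))
features = np.vstack([group_a, group_b])
labels = np.array([0] * 40 + [1] * 40)

# Unsupervised: k-means groups the samples without seeing the labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
cluster_ids = kmeans.labels_

# Supervised: a shallow decision tree learns the labels from the features.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(features, labels)
train_accuracy = tree.score(features, labels)

print("cluster sizes:", np.bincount(cluster_ids))
print("decision tree training accuracy:", train_accuracy)
```

The cluster indices that k-means assigns are arbitrary, so compare groupings rather than the raw index values when checking them against known labels.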

Machine-learning tools for molecules and materials:

Amp (Package to facilitate machine learning for atomistic calculations)

ANI (Neural-network potentials for organic molecules with Python interface)

COMBO (Python library with emphasis on scalability and efficiency)

DeepChem (Python library for deep learning of chemical systems)

GAP (Gaussian approximation potentials)

MatMiner (Python library for assisting machine learning in materials science)

NOMAD (Collection of tools to explore correlations in materials datasets)

PROPhet (Code to integrate machine-learning techniques with quantum-chemistry approaches)

TensorMol (Neural-network chemistry package)

Open education:

About
Committee
Speakers
Schedule
Location
Sponsor