Using molfeat-hype
Usage¶
Since molfeat-hype
is a molfeat
plugin, it follows the same integration principle as with any other molfeat
plugin.
The following shows examples of how to use the molfeat-hype
plugin package automatically when installed.
- Using this package directly:
from molfeat_hype.trans.llm_embeddings import LLMTransformer
mol_transf = LLMTransformer(kind="sentence-transformers/all-mpnet-base-v2")
- Enabling autodiscovery as a plugin in
molfeat
, and addition of all embedding classes as an importable attribute to the entry point groupmolfeat.trans.pretrained
:
# Put this somewhere in your code (e.g., in the root __init__ file).
# Plugins should include any subword of 'molfeat_hype'.
from molfeat.plugins import load_registered_plugins
load_registered_plugins(add_submodules=True, plugins=["hype"])
# This is now possible everywhere.
from molfeat.trans.pretrained import LLMTransformer
mol_transf = LLMTransformer(kind="sentence-transformers/all-mpnet-base-v2")
Once you have defined your molecule transformer, use it like any molfeat
MoleculeTransformer
:
import datamol as dm
smiles = dm.freesolv()["smiles"].values[:5]
mol_transf(smiles)
Caching¶
We strongly encourage you to set a cache for computing your molecular representation. By default, a temporary in-memory
cache will be set and will be purged when the Python runtime is stopped.
OpenAI¶
If you are using the OpenAI embeddings, you need to provide either an open_ai_key
argument or define one as an environment variable OPEN_AI_KEY
.
Please refer to OpenAI's API keys documentation to set up your API keys. You will need to activate billing.
llama.cpp¶
Llama integration requires the llama-cpp-python, which is a Python binding for the llama-cpp package. You will also need the Llama model weights.
Do not share any IPFS, magnet links, or any other links to model downloads anywhere in this repository, including in issues, discussions, or pull requests. They will be immediately deleted.
Warning¶
This repository contains an experimental investigation of LLM embeddings for molecules. Please note that the consistency and usefulness of the returned molecular embeddings are not guaranteed. This project is meant for fun and exploratory purposes only, and should not be used as a demonstration of LLM capabilities for molecular embeddings. Any statements made in this repository are the opinions of the authors and do not necessarily reflect the views of any affiliated organizations or individuals. Use at your own risk.