Document loaders
DocumentLoaders load data into the standard LangChain Document format.
Each DocumentLoader has its own specific parameters, but they can all be invoked in the same way with the .load method. An example use case is as follows:
from langchain_community.document_loaders.csv_loader import CSVLoader
loader = CSVLoader(
... # <-- Integration specific parameters here
)
data = loader.load()
API Reference: CSVLoader
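To make the shared interface concrete, here is a minimal, standard-library-only sketch of the pattern: a loader takes its own parameters at construction time and returns a list of documents from `.load()`. The `Document` dataclass and `MiniCSVLoader` below are hypothetical stand-ins written for illustration, not LangChain's classes; they only mirror the idea that each document carries `page_content` text and a `metadata` dictionary.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Document:
    """Simplified stand-in for a LangChain Document: text plus metadata."""

    page_content: str
    metadata: dict = field(default_factory=dict)


class MiniCSVLoader:
    """Hypothetical loader for illustration only (not LangChain's CSVLoader)."""

    def __init__(self, text: str, source: str = "in-memory"):
        # Loader-specific parameters are supplied when the loader is built.
        self.text = text
        self.source = source

    def load(self) -> list[Document]:
        # Turn each CSV row into one Document with "column: value" lines.
        reader = csv.DictReader(io.StringIO(self.text))
        docs = []
        for row_number, row in enumerate(reader):
            content = "\n".join(f"{key}: {value}" for key, value in row.items())
            docs.append(Document(content, {"source": self.source, "row": row_number}))
        return docs


loader = MiniCSVLoader("name,team\nAda,Blue\nGrace,Red\n")
data = loader.load()
print(len(data))
print(data[0].page_content)
print(data[0].metadata)
```

Whatever the data source, downstream code can rely on the same two fields, which is why loaders are interchangeable from the caller's point of view.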
Webpages
The below document loaders allow you to load webpages.
See this guide for a starting point: How to: load web pages.
Document Loader | Description | Package/API |
---|---|---|
Web | Uses urllib and BeautifulSoup to load and parse HTML web pages | Package |
Unstructured | Uses Unstructured to load and parse web pages | Package |
RecursiveURL | Recursively scrapes all child links from a root URL | Package |
Sitemap | Scrapes all pages on a given sitemap | Package |
Firecrawl | API service that can be deployed locally; the hosted version has free credits | API |
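As a rough illustration of what "load and parse HTML" involves, the sketch below converts an HTML string into a document-like dictionary with `page_content` and `metadata`. It is an assumption-laden stand-in: it uses Python's built-in `html.parser` instead of BeautifulSoup, parses an in-memory string rather than fetching a URL, and the `TextExtractor` and `html_to_document` names and the example URL are hypothetical. It is not how any of the loaders above are implemented.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and the <title>, skipping script and style content."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title = text
        else:
            self.chunks.append(text)


def html_to_document(html: str, source: str) -> dict:
    """Return a dict shaped like a document: text content plus metadata."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {
        "page_content": "\n".join(parser.chunks),
        "metadata": {"source": source, "title": parser.title},
    }


sample_html = """<html><head><title>Demo</title><style>p {color: red}</style></head>
<body><h1>Hello</h1><p>First paragraph.</p><script>var x = 1;</script></body></html>"""

# The URL is a placeholder label for the metadata; nothing is downloaded.
doc = html_to_document(sample_html, source="https://example.com/demo")
print(doc["page_content"])
print(doc["metadata"])
```

The loaders in the table handle fetching, encoding, and messy real-world markup, so a hand-rolled parser like this is only useful for understanding the shape of the result.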
PDFs
The below document loaders allow you to load PDF documents.
See this guide for a starting point: How to: load PDF files.
Document Loader | Description | Package/API |
---|---|---|
PyPDF | Uses `pypdf` to load and parse PDFs | Package |
Unstructured | Uses Unstructured's open source library to load PDFs | Package |
Amazon Textract | Uses AWS API to load PDFs | API |
MathPix | Uses MathPix to load PDFs | Package |
PDFPlumber | Load PDF files using PDFPlumber | Package |
PyPDFDirectory | Load a directory with PDF files | Package |
PyPDFium2 | Load PDF files using PyPDFium2 | Package |
PyMuPDF | Load PDF files using PyMuPDF | Package |
PDFMiner | Load PDF files using PDFMiner | Package |
Cloud Providers
The below document loaders allow you to load documents from your favorite cloud providers.
Document Loader | Description | Partner Package | API reference |
---|---|---|---|
AWS S3 Directory | Load documents from an AWS S3 directory | ❌ | S3DirectoryLoader |
AWS S3 File | Load documents from an AWS S3 file | ❌ | S3FileLoader |
Azure AI Data | Load documents from Azure AI services | ❌ | AzureAIDataLoader |
Azure Blob Storage Container | Load documents from an Azure Blob Storage container | ❌ | AzureBlobStorageContainerLoader |
Azure Blob Storage File | Load documents from an Azure Blob Storage file | ❌ | AzureBlobStorageFileLoader |
Dropbox | Load documents from Dropbox | ❌ | DropboxLoader |
Google Cloud Storage Directory | Load documents from a GCS bucket | ❌ | GCSDirectoryLoader |
Google Cloud Storage File | Load documents from a GCS file object | ❌ | GCSFileLoader |
Google Drive | Load documents from Google Drive (Google Docs only) | ❌ | GoogleDriveLoader |
Huawei OBS Directory | Load documents from a Huawei Object Storage Service directory | ❌ | OBSDirectoryLoader |
Huawei OBS File | Load documents from a Huawei Object Storage Service file | ❌ | OBSFileLoader |
Microsoft OneDrive | Load documents from Microsoft OneDrive | ❌ | OneDriveLoader |
Microsoft SharePoint | Load documents from Microsoft SharePoint | ❌ | SharePointLoader |
Tencent COS Directory | Load documents from a Tencent Cloud Object Storage directory | ❌ | TencentCOSDirectoryLoader |
Tencent COS File | Load documents from a Tencent Cloud Object Storage file | ❌ | TencentCOSFileLoader |
Social Platforms
The following document loaders allow you to load documents from different social media platforms.
Document Loader | API reference |
---|---|
TwitterTweetLoader | |
RedditPostsLoader | |