Quét Thư Mục Tệp

Đây là bài giới thiệu về cách LangChain tải tất cả các tài liệu trong một thư mục.

Ở mức thấp, mặc định là sử dụng UnstructuredLoader.

from langchain_community.document_loaders import DirectoryLoader

Chúng ta có thể sử dụng tham số glob để kiểm soát việc tải tệp. Lưu ý rằng ở đây chúng ta không tải các tệp .rst hoặc .html.

loader = DirectoryLoader('../', glob="**/*.md")
docs = loader.load()
len(docs)
1

Hiển Thị Thanh Tiến Độ

Mặc định, thanh tiến độ không được hiển thị. Để hiển thị thanh tiến độ, cài đặt thư viện tqdm (ví dụ, pip install tqdm) và đặt tham số show_progress thành True.

loader = DirectoryLoader('../', glob="**/*.md", show_progress=True)
docs = loader.load()
Requirement already satisfied: tqdm in /Users/jon/.pyenv/versions/3.9.16/envs/microbiome-app/lib/python3.9/site-packages (4.65.0)

0it [00:00, ?it/s]

Sử Dụng Đa Luồng

Mặc định, việc tải xảy ra trong một luồng. Để sử dụng nhiều luồng, đặt cờ use_multithreading thành true.

loader = DirectoryLoader('../', glob="**/*.md", use_multithreading=True)
docs = loader.load()

Thay Đổi Lớp Tải

Mặc định, điều này sử dụng lớp UnstructuredLoader. Tuy nhiên, bạn có thể dễ dàng thay đổi loại tải.

from langchain_community.document_loaders import TextLoader
loader = DirectoryLoader('../', glob="**/*.md", loader_cls=TextLoader)
docs = loader.load()
len(docs)
1

Nếu bạn cần tải các tệp mã nguồn Python, hãy sử dụng PythonLoader.

from langchain_community.document_loaders import PythonLoader
loader = DirectoryLoader('../../../../../', glob="**/*.py", loader_cls=PythonLoader)
docs = loader.load()
len(docs)
691

Tự Động Phát Hiện Mã Hóa Tệp Sử Dụng TextLoader

Trong ví dụ này, chúng ta sẽ thấy các chiến lược có thể được sử dụng khi sử dụng lớp TextLoader để tải bất kỳ số lượng tệp nào từ một thư mục.

Đầu tiên, để minh họa điểm này, hãy cố gắng tải nhiều văn bản với các mã hóa tùy ý.

path = '../../../../../tests/integration_tests/examples'
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader)

Hành động mặc định

loader.load()
<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace"><span style="color: #800000; text-decoration-color: #800000">╭─────────────────────────────── </span><span style="color: #800000; text-decoration-color: #800000; font-weight: bold">Traceback </span><span style="color: #bf7f7f; text-decoration-color: #bf7f7f; font-weight: bold">(most recent call last)</span><span style="color: #800000; text-decoration-color: #800000"> ────────────────────────────────╮</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #bfbf7f; text-decoration-color: #bfbf7f">/data/source/langchain/langchain/document_loaders/</span><span style="color: #808000; text-decoration-color: #808000; font-weight: bold">text.py</span>:<span style="color: #0000ff; text-decoration-color: #0000ff">29</span> in <span style="color: #00ff00; text-decoration-color: #00ff00">load</span>                             <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>                                                                                                  <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">26 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   </span>text = <span style="color: #808000; text-decoration-color: #808000">""</span>                                                                           <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">27 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">with</span> <span style="color: #00ffff; text-decoration-color: #00ffff">open</span>(<span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.file_path, encoding=<span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.encoding) <span style="color: #0000ff; text-decoration-color: #0000ff">as</span> f:                             <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">28 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">try</span>:                                                                            <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">❱ </span>29 <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   │   </span>text = f.read()                                                             <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">30 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">except</span> <span style="color: #00ffff; text-decoration-color: #00ffff">UnicodeDecodeError</span> <span style="color: #0000ff; text-decoration-color: #0000ff">as</span> e:                                                 <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">31 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">if</span> <span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.autodetect_encoding:                                                <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">32 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   │   │   </span>detected_encodings = <span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.detect_file_encodings()                       <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>                                                                                                  <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #bfbf7f; text-decoration-color: #bfbf7f">/home/spike/.pyenv/versions/3.9.11/lib/python3.9/</span><span style="color: #808000; text-decoration-color: #808000; font-weight: bold">codecs.py</span>:<span style="color: #0000ff; text-decoration-color: #0000ff">322</span> in <span style="color: #00ff00; text-decoration-color: #00ff00">decode</span>                         <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>                                                                                                  <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 319 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   </span><span style="color: #0000ff; text-decoration-color: #0000ff">def</span> <span style="color: #00ff00; text-decoration-color: #00ff00">decode</span>(<span style="color: #00ffff; text-decoration-color: #00ffff">self</span>, <span style="color: #00ffff; text-decoration-color: #00ffff">input</span>, final=<span style="color: #0000ff; text-decoration-color: #0000ff">False</span>):                                                 <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 320 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f"># decode input (taking the buffer into account)</span>                                   <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 321 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   </span>data = <span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.buffer + <span style="color: #00ffff; text-decoration-color: #00ffff">input</span>                                                        <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">❱ </span> 322 <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   </span>(result, consumed) = <span style="color: #00ffff; text-decoration-color: #00ffff">self</span>._buffer_decode(data, <span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.errors, final)                <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 323 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f"># keep undecoded input until the next call</span>                                        <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 324 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   </span><span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.buffer = data[consumed:]                                                     <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 325 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">return</span> result                                                                     <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">╰──────────────────────────────────────────────────────────────────────────────────────────────────╯</span>
<span style="color: #ff0000; text-decoration-color: #ff0000; font-weight: bold">UnicodeDecodeError: </span><span style="color: #008000; text-decoration-color: #008000">'utf-8'</span> codec can't decode byte <span style="color: #008080; text-decoration-color: #008080; font-weight: bold">0xca</span> in position <span style="color: #008080; text-decoration-color: #008080; font-weight: bold">0</span>: invalid continuation byte

<span style="font-style: italic">The above exception was the direct cause of the following exception:</span>

<span style="color: #800000; text-decoration-color: #800000">╭─────────────────────────────── </span><span style="color: #800000; text-decoration-color: #800000; font-weight: bold">Traceback </span><span style="color: #bf7f7f; text-decoration-color: #bf7f7f; font-weight: bold">(most recent call last)</span><span style="color: #800000; text-decoration-color: #800000"> ────────────────────────────────╮</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> in <span style="color: #00ff00; text-decoration-color: #00ff00"><module></span>:<span style="color: #0000ff; text-decoration-color: #0000ff">1</span>                                                                                    <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>                                                                                                  <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">❱ </span>1 loader.load()                                                                                <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">2 </span>                                                                                             <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>                                                                                                  <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #bfbf7f; text-decoration-color: #bfbf7f">/data/source/langchain/langchain/document_loaders/</span><span style="color: #808000; text-decoration-color: #808000; font-weight: bold">directory.py</span>:<span style="color: #0000ff; text-decoration-color: #0000ff">84</span> in <span style="color: #00ff00; text-decoration-color: #00ff00">load</span>                        <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>                                                                                                  <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">81 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   │   │   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">if</span> <span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.silent_errors:                                              <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">82 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   │   │   │   │   </span>logger.warning(e)                                               <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">83 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   │   │   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">else</span>:                                                               <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">❱ </span>84 <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   │   │   │   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">raise</span> e                                                         <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">85 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   │   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">finally</span>:                                                                <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">86 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   │   │   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">if</span> pbar:                                                            <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">87 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   │   │   │   │   </span>pbar.update(<span style="color: #0000ff; text-decoration-color: #0000ff">1</span>)                                                  <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>                                                                                                  <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #bfbf7f; text-decoration-color: #bfbf7f">/data/source/langchain/langchain/document_loaders/</span><span style="color: #808000; text-decoration-color: #808000; font-weight: bold">directory.py</span>:<span style="color: #0000ff; text-decoration-color: #0000ff">78</span> in <span style="color: #00ff00; text-decoration-color: #00ff00">load</span>                        <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>                                                                                                  <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">75 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">if</span> i.is_file():                                                                 <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">76 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">if</span> _is_visible(i.relative_to(p)) <span style="color: #ff00ff; text-decoration-color: #ff00ff">or</span> <span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.load_hidden:                       <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">77 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   │   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">try</span>:                                                                    <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">❱ </span>78 <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   │   │   │   </span>sub_docs = <span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.loader_cls(<span style="color: #00ffff; text-decoration-color: #00ffff">str</span>(i), **<span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.loader_kwargs).load()     <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">79 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   │   │   │   </span>docs.extend(sub_docs)                                               <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">80 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   │   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">except</span> <span style="color: #00ffff; text-decoration-color: #00ffff">Exception</span> <span style="color: #0000ff; text-decoration-color: #0000ff">as</span> e:                                                  <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">81 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   │   │   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">if</span> <span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.silent_errors:                                              <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>                                                                                                  <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #bfbf7f; text-decoration-color: #bfbf7f">/data/source/langchain/langchain/document_loaders/</span><span style="color: #808000; text-decoration-color: #808000; font-weight: bold">text.py</span>:<span style="color: #0000ff; text-decoration-color: #0000ff">44</span> in <span style="color: #00ff00; text-decoration-color: #00ff00">load</span>                             <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>                                                                                                  <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">41 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   │   │   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">except</span> <span style="color: #00ffff; text-decoration-color: #00ffff">UnicodeDecodeError</span>:                                          <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">42 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   │   │   │   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">continue</span>                                                        <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">43 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">else</span>:                                                                       <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">❱ </span>44 <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   │   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">raise</span> <span style="color: #00ffff; text-decoration-color: #00ffff">RuntimeError</span>(<span style="color: #808000; text-decoration-color: #808000">f"Error loading {</span><span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.file_path<span style="color: #808000; text-decoration-color: #808000">}"</span>) <span style="color: #0000ff; text-decoration-color: #0000ff">from</span> <span style="color: #00ffff; text-decoration-color: #00ffff; text-decoration: underline">e</span>            <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">45 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">except</span> <span style="color: #00ffff; text-decoration-color: #00ffff">Exception</span> <span style="color: #0000ff; text-decoration-color: #0000ff">as</span> e:                                                          <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">46 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│   │   │   │   </span><span style="color: #0000ff; text-decoration-color: #0000ff">raise</span> <span style="color: #00ffff; text-decoration-color: #00ffff">RuntimeError</span>(<span style="color: #808000; text-decoration-color: #808000">f"Error loading {</span><span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.file_path<span style="color: #808000; text-decoration-color: #808000">}"</span>) <span style="color: #0000ff; text-decoration-color: #0000ff">from</span> <span style="color: #00ffff; text-decoration-color: #00ffff; text-decoration: underline">e</span>                <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span>   <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">47 </span>                                                                                            <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">╰──────────────────────────────────────────────────────────────────────────────────────────────────╯</span>
<span style="color: #ff0000; text-decoration-color: #ff0000; font-weight: bold">RuntimeError: </span>Error loading ..<span style="color: #800080; text-decoration-color: #800080">/../../../../tests/integration_tests/examples/</span><span style="color: #ff00ff; text-decoration-color: #ff00ff">example-non-utf8.txt</span>
</pre>

Tệp example-non-utf8.txt đang sử dụng một bộ mã khác, vì vậy hàm load() sẽ thất bại và hiển thị một thông báo lỗi hữu ích chỉ ra tệp nào không thể giải mã.

Về hành vi mặc định của TextLoader, bất kỳ sự cố nào trong việc tải tài liệu sẽ dẫn đến toàn bộ quá trình tải thất bại, và không có tài liệu nào sẽ được tải.

B. Bỏ qua việc tải các tệp thất bại

Bạn có thể truyền tham số silent_errors vào DirectoryLoader để bỏ qua việc tải các tệp thất bại và tiếp tục quá trình tải.

loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader, silent_errors=True)
docs = loader.load()
Lỗi khi tải ../../../../../tests/integration_tests/examples/example-non-utf8.txt
doc_sources = [doc.metadata['source']  for doc in docs]
doc_sources
['../../../../../tests/integration_tests/examples/whatsapp_chat.txt',
     '../../../../../tests/integration_tests/examples/example-utf8.txt']

C. Tự động phát hiện mã hóa

Chúng ta cũng có thể yêu cầu TextLoader tự động phát hiện mã encoding trước khi thất bại bằng cách truyền autodetect_encoding vào lớp loader.

text_loader_kwargs={'autodetect_encoding': True}
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
docs = loader.load()
doc_sources = [doc.metadata['source']  for doc in docs]
doc_sources
['../../../../../tests/integration_tests/examples/example-non-utf8.txt',
     '../../../../../tests/integration_tests/examples/whatsapp_chat.txt',
     '../../../../../tests/integration_tests/examples/example-utf8.txt']