اسکن کردن دایرکتوری فایل
این یک مقدمه است که نشان می دهد چگونه LangChain تمام اسناد موجود در یک دایرکتوری را بارگذاری می کند.
در سطح پایین، استفاده پیش فرض از UnstructuredLoader است.
from langchain_community.document_loaders import DirectoryLoader
می توانیم از پارامتر glob
برای کنترل فایل هایی که باید بارگیری شوند استفاده کنیم. لطفا توجه داشته باشید که ما اینجا فایل های .rst
یا .html
را بارگیری نمی کنیم.
loader = DirectoryLoader('../', glob="**/*.md")
docs = loader.load()
len(docs)
1
نمایش یک نوار پیشرفت
به طور پیش فرض، نوار پیشرفت نمایش داده نمی شود. برای نمایش نوار پیشرفت، کتابخانه tqdm
را نصب کنید (به عنوان مثال pip install tqdm
) و پارامتر show_progress
را به True
تنظیم کنید.
loader = DirectoryLoader('../', glob="**/*.md", show_progress=True)
docs = loader.load()
Requirement already satisfied: tqdm in /Users/jon/.pyenv/versions/3.9.16/envs/microbiome-app/lib/python3.9/site-packages (4.65.0)
0it [00:00, ?it/s]
استفاده از چند موضوعی
به طور پیش فرض، بارگیری در یک رشته اتفاق می افتد. برای استفاده از چند موضوعی، پرچم use_multithreading
را به True
تنظیم کنید.
loader = DirectoryLoader('../', glob="**/*.md", use_multithreading=True)
docs = loader.load()
تغییر کلاس بارگذار
به طور پیش فرض، از کلاس UnstructuredLoader
استفاده می شود. با این حال، شما به راحتی می توانید نوع بارگیر را تغییر دهید.
from langchain_community.document_loaders import TextLoader
loader = DirectoryLoader('../', glob="**/*.md", loader_cls=TextLoader)
docs = loader.load()
len(docs)
1
اگر نیاز به بارگیری فایل های منبع کد Python دارید، از PythonLoader
استفاده کنید.
from langchain_community.document_loaders import PythonLoader
loader = DirectoryLoader('../../../../../', glob="**/*.py", loader_cls=PythonLoader)
docs = loader.load()
len(docs)
691
تشخیص خودکار رمزگذاری فایل با استفاده از TextLoader
در این مثال، به دیدن راهبردهایی که می تواند استفاده شود هنگام استفاده از کلاس TextLoader
برای بارگیری هر تعداد فایل از یک دایرکتوری می پردازیم.
ابتدا، برای توضیح نقطه، بیایید سعی کنیم متن های مختلف با رمزگذاری های دلخواه بارگیری کنیم.
path = '../../../../../tests/integration_tests/examples'
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader)
رفتار پیشفرض
loader.load()
<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace"><span style="color: #800000; text-decoration-color: #800000">╭─────────────────────────────── </span><span style="color: #800000; text-decoration-color: #800000; font-weight: bold">Traceback </span><span style="color: #bf7f7f; text-decoration-color: #bf7f7f; font-weight: bold">(most recent call last)</span><span style="color: #800000; text-decoration-color: #800000"> ────────────────────────────────╮</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #bfbf7f; text-decoration-color: #bfbf7f">/data/source/langchain/langchain/document_loaders/</span><span style="color: #808000; text-decoration-color: #808000; font-weight: bold">text.py</span>:<span style="color: #0000ff; text-decoration-color: #0000ff">29</span> in <span style="color: #00ff00; text-decoration-color: #00ff00">load</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">26 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ </span>text = <span style="color: #808000; text-decoration-color: #808000">""</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">27 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ </span><span style="color: #0000ff; text-decoration-color: #0000ff">with</span> <span style="color: #00ffff; text-decoration-color: #00ffff">open</span>(<span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.file_path, encoding=<span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.encoding) <span style="color: #0000ff; text-decoration-color: #0000ff">as</span> f: <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">28 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ </span><span style="color: #0000ff; text-decoration-color: #0000ff">try</span>: <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">❱ </span>29 <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ │ </span>text = f.read() <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">30 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ </span><span style="color: #0000ff; text-decoration-color: #0000ff">except</span> <span style="color: #00ffff; text-decoration-color: #00ffff">UnicodeDecodeError</span> <span style="color: #0000ff; text-decoration-color: #0000ff">as</span> e: <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">31 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ │ </span><span style="color: #0000ff; text-decoration-color: #0000ff">if</span> <span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.autodetect_encoding: <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">32 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ │ │ </span>detected_encodings = <span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.detect_file_encodings() <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #bfbf7f; text-decoration-color: #bfbf7f">/home/spike/.pyenv/versions/3.9.11/lib/python3.9/</span><span style="color: #808000; text-decoration-color: #808000; font-weight: bold">codecs.py</span>:<span style="color: #0000ff; text-decoration-color: #0000ff">322</span> in <span style="color: #00ff00; text-decoration-color: #00ff00">decode</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 319 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ </span><span style="color: #0000ff; text-decoration-color: #0000ff">def</span> <span style="color: #00ff00; text-decoration-color: #00ff00">decode</span>(<span style="color: #00ffff; text-decoration-color: #00ffff">self</span>, <span style="color: #00ffff; text-decoration-color: #00ffff">input</span>, final=<span style="color: #0000ff; text-decoration-color: #0000ff">False</span>): <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 320 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f"># decode input (taking the buffer into account)</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 321 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ </span>data = <span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.buffer + <span style="color: #00ffff; text-decoration-color: #00ffff">input</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">❱ </span> 322 <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ </span>(result, consumed) = <span style="color: #00ffff; text-decoration-color: #00ffff">self</span>._buffer_decode(data, <span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.errors, final) <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 323 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f"># keep undecoded input until the next call</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 324 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ </span><span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.buffer = data[consumed:] <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 325 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ </span><span style="color: #0000ff; text-decoration-color: #0000ff">return</span> result <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">╰──────────────────────────────────────────────────────────────────────────────────────────────────╯</span>
<span style="color: #ff0000; text-decoration-color: #ff0000; font-weight: bold">UnicodeDecodeError: </span><span style="color: #008000; text-decoration-color: #008000">'utf-8'</span> codec can't decode byte <span style="color: #008080; text-decoration-color: #008080; font-weight: bold">0xca</span> in position <span style="color: #008080; text-decoration-color: #008080; font-weight: bold">0</span>: invalid continuation byte
<span style="font-style: italic">The above exception was the direct cause of the following exception:</span>
<span style="color: #800000; text-decoration-color: #800000">╭─────────────────────────────── </span><span style="color: #800000; text-decoration-color: #800000; font-weight: bold">Traceback </span><span style="color: #bf7f7f; text-decoration-color: #bf7f7f; font-weight: bold">(most recent call last)</span><span style="color: #800000; text-decoration-color: #800000"> ────────────────────────────────╮</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> in <span style="color: #00ff00; text-decoration-color: #00ff00"><module></span>:<span style="color: #0000ff; text-decoration-color: #0000ff">1</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">❱ </span>1 loader.load() <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">2 </span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #bfbf7f; text-decoration-color: #bfbf7f">/data/source/langchain/langchain/document_loaders/</span><span style="color: #808000; text-decoration-color: #808000; font-weight: bold">directory.py</span>:<span style="color: #0000ff; text-decoration-color: #0000ff">84</span> in <span style="color: #00ff00; text-decoration-color: #00ff00">load</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">81 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ │ │ │ </span><span style="color: #0000ff; text-decoration-color: #0000ff">if</span> <span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.silent_errors: <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">82 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ │ │ │ │ </span>logger.warning(e) <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">83 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ │ │ │ </span><span style="color: #0000ff; text-decoration-color: #0000ff">else</span>: <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">❱ </span>84 <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ │ │ │ │ </span><span style="color: #0000ff; text-decoration-color: #0000ff">raise</span> e <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">85 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ │ │ </span><span style="color: #0000ff; text-decoration-color: #0000ff">finally</span>: <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">86 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ │ │ │ </span><span style="color: #0000ff; text-decoration-color: #0000ff">if</span> pbar: <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">87 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ │ │ │ │ </span>pbar.update(<span style="color: #0000ff; text-decoration-color: #0000ff">1</span>) <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #bfbf7f; text-decoration-color: #bfbf7f">/data/source/langchain/langchain/document_loaders/</span><span style="color: #808000; text-decoration-color: #808000; font-weight: bold">directory.py</span>:<span style="color: #0000ff; text-decoration-color: #0000ff">78</span> in <span style="color: #00ff00; text-decoration-color: #00ff00">load</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">75 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ </span><span style="color: #0000ff; text-decoration-color: #0000ff">if</span> i.is_file(): <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">76 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ │ </span><span style="color: #0000ff; text-decoration-color: #0000ff">if</span> _is_visible(i.relative_to(p)) <span style="color: #ff00ff; text-decoration-color: #ff00ff">or</span> <span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.load_hidden: <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">77 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ │ │ </span><span style="color: #0000ff; text-decoration-color: #0000ff">try</span>: <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">❱ </span>78 <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ │ │ │ </span>sub_docs = <span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.loader_cls(<span style="color: #00ffff; text-decoration-color: #00ffff">str</span>(i), **<span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.loader_kwargs).load() <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">79 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ │ │ │ </span>docs.extend(sub_docs) <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">80 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ │ │ </span><span style="color: #0000ff; text-decoration-color: #0000ff">except</span> <span style="color: #00ffff; text-decoration-color: #00ffff">Exception</span> <span style="color: #0000ff; text-decoration-color: #0000ff">as</span> e: <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">81 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ │ │ │ </span><span style="color: #0000ff; text-decoration-color: #0000ff">if</span> <span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.silent_errors: <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #bfbf7f; text-decoration-color: #bfbf7f">/data/source/langchain/langchain/document_loaders/</span><span style="color: #808000; text-decoration-color: #808000; font-weight: bold">text.py</span>:<span style="color: #0000ff; text-decoration-color: #0000ff">44</span> in <span style="color: #00ff00; text-decoration-color: #00ff00">load</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">41 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ │ │ │ </span><span style="color: #0000ff; text-decoration-color: #0000ff">except</span> <span style="color: #00ffff; text-decoration-color: #00ffff">UnicodeDecodeError</span>: <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">42 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ │ │ │ │ </span><span style="color: #0000ff; text-decoration-color: #0000ff">continue</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">43 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ │ </span><span style="color: #0000ff; text-decoration-color: #0000ff">else</span>: <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">❱ </span>44 <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ │ │ </span><span style="color: #0000ff; text-decoration-color: #0000ff">raise</span> <span style="color: #00ffff; text-decoration-color: #00ffff">RuntimeError</span>(<span style="color: #808000; text-decoration-color: #808000">f"Error loading {</span><span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.file_path<span style="color: #808000; text-decoration-color: #808000">}"</span>) <span style="color: #0000ff; text-decoration-color: #0000ff">from</span> <span style="color: #00ffff; text-decoration-color: #00ffff; text-decoration: underline">e</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">45 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ </span><span style="color: #0000ff; text-decoration-color: #0000ff">except</span> <span style="color: #00ffff; text-decoration-color: #00ffff">Exception</span> <span style="color: #0000ff; text-decoration-color: #0000ff">as</span> e: <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">46 </span><span style="color: #7f7f7f; text-decoration-color: #7f7f7f">│ │ │ │ </span><span style="color: #0000ff; text-decoration-color: #0000ff">raise</span> <span style="color: #00ffff; text-decoration-color: #00ffff">RuntimeError</span>(<span style="color: #808000; text-decoration-color: #808000">f"Error loading {</span><span style="color: #00ffff; text-decoration-color: #00ffff">self</span>.file_path<span style="color: #808000; text-decoration-color: #808000">}"</span>) <span style="color: #0000ff; text-decoration-color: #0000ff">from</span> <span style="color: #00ffff; text-decoration-color: #00ffff; text-decoration: underline">e</span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">47 </span> <span style="color: #800000; text-decoration-color: #800000">│</span>
<span style="color: #800000; text-decoration-color: #800000">╰──────────────────────────────────────────────────────────────────────────────────────────────────╯</span>
<span style="color: #ff0000; text-decoration-color: #ff0000; font-weight: bold">RuntimeError: </span>Error loading ..<span style="color: #800080; text-decoration-color: #800080">/../../../../tests/integration_tests/examples/</span><span style="color: #ff00ff; text-decoration-color: #ff00ff">example-non-utf8.txt</span>
</pre>
فایل example-non-utf8.txt
از رمزگذاری متفاوتی استفاده میکند، بنابراین تابع load()
شکست خواهد خورد و یک پیام خطا کارآمد نمایش داده میشود که نشان میدهد کدام فایل قادر به رمزگشایی نمیباشد.
در مورد رفتار پیشفرض TextLoader
، هر گونه شکست در بارگیری یک سند منجر به شکست کامل فرآیند بارگیری میشود و هیچ سندی بارگیری نمیشود.
B. ردیابی بارگیری فایلهای ناموفق
میتوانید پارامتر silent_errors
را به DirectoryLoader
منتقل کنید تا از بارگیری فایلهای ناموفق گذشته و فرآیند بارگیری ادامه یابد.
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader, silent_errors=True)
docs = loader.load()
خطا در بارگیری ../../../../../tests/integration_tests/examples/example-non-utf8.txt
منابع_سند = [سند.metadata['source'] for سند in docs]
منابع_سند
['../../../../../tests/integration_tests/examples/whatsapp_chat.txt',
'../../../../../tests/integration_tests/examples/example-utf8.txt']
C. تشخیص خودکار رمزگذاری
همچنین میتوانیم از TextLoader
بخواهیم که قبل از شکست، رمزگذاری فایل را به صورت خودکار شناسایی کند با منتقل کردن autodetect_encoding
به کلاس بارگیر.
text_loader_kwargs={'autodetect_encoding': True}
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
docs = loader.load()
منابع_سند = [سند.metadata['source'] for سند in docs]
منابع_سند
['../../../../../tests/integration_tests/examples/example-non-utf8.txt',
'../../../../../tests/integration_tests/examples/whatsapp_chat.txt',
'../../../../../tests/integration_tests/examples/example-utf8.txt']