Modules
Top-level package for Document Tools.
LayoutLMv2Encoder
Bases: BaseEncoder
LayoutLMv2Encoder is the encoder for datasets using LayoutLMv2.
Source code in document_tools/encoders/encoders.py
__call__(batch)
Apply the LayoutLMv2Encoder to a batch.
Source code in document_tools/encoders/encoders.py
__init__(**kwargs)
Initialize the LayoutLMv2Encoder.
Parameters
**kwargs : Dict[str, Any]
See the LayoutLMv2Processor documentation for the available parameters: https://huggingface.co/docs/transformers/model_doc/layoutlmv2#transformers.LayoutLMv2Processor
Source code in document_tools/encoders/encoders.py
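All three encoders subclass BaseEncoder and expose the same two methods: `__init__(**kwargs)` takes processor options and `__call__(batch)` encodes a batch. The following is a minimal sketch of that pattern, assuming the encoder stores the keyword arguments and forwards them to the processor when called. `SketchEncoder`, `fake_processor`, and the `"image"` batch key are hypothetical stand-ins so the example runs without `transformers`; this is not the document_tools implementation.

```python
from typing import Any, Callable, Dict, List


class SketchEncoder:
    """Hypothetical encoder: stores processor kwargs and applies them per batch."""

    def __init__(self, processor: Callable[..., Dict[str, Any]], **kwargs: Any) -> None:
        # Keep the processor and its configuration (e.g. padding options).
        self.processor = processor
        self.processor_kwargs = kwargs

    def __call__(self, batch: Dict[str, List[Any]]) -> Dict[str, Any]:
        # Assumption: the batch is column-oriented and holds an "image" column.
        return self.processor(batch["image"], **self.processor_kwargs)


def fake_processor(images: List[Any], padding: str = "do_not_pad", **_: Any) -> Dict[str, Any]:
    # Stand-in for a Hugging Face processor: reports what it received.
    return {"num_images": len(images), "padding": padding}


encoder = SketchEncoder(fake_processor, padding="max_length")
print(encoder({"image": ["page-1.png", "page-2.png"]}))
```

Storing the options at construction time means the same encoder object can be passed as a batch-mapping function and applied to every batch with an identical processor configuration.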
LayoutLMv3Encoder
Bases: BaseEncoder
LayoutLMv3Encoder is the encoder for datasets using LayoutLMv3.
Source code in document_tools/encoders/encoders.py
__call__(batch)
Apply the LayoutLMv3Encoder to a batch.
Source code in document_tools/encoders/encoders.py
__init__(**kwargs)
Initialize the LayoutLMv3Encoder.
Parameters
**kwargs : Dict[str, Any]
See the LayoutLMv3Processor documentation for the available parameters: https://huggingface.co/docs/transformers/model_doc/layoutlmv3#transformers.LayoutLMv3Processor
Source code in document_tools/encoders/encoders.py
LayoutXLMEncoder
Bases: BaseEncoder
LayoutXLMEncoder is the encoder for datasets using LayoutXLM.
Source code in document_tools/encoders/encoders.py
__call__(batch)
Apply the LayoutXLMEncoder to a batch.
Source code in document_tools/encoders/encoders.py
__init__(**kwargs)
Initialize the LayoutXLMEncoder.
Parameters
**kwargs : Dict[str, Any]
See the LayoutXLMProcessor documentation for the available parameters: https://huggingface.co/docs/transformers/model_doc/layoutxlm#transformers.LayoutXLMProcessor
Source code in document_tools/encoders/encoders.py
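With three encoders behind one interface, choosing one from a model name reduces to a lookup, and an unknown name surfaces as the `KeyError` that `tokenize_dataset` documents. The following is a minimal sketch, assuming a plain dict registry whose values are unpacked into the encoder's `**kwargs`. The registry keys, `get_encoder`, and the `Stub*` classes are hypothetical; the strings document_tools actually accepts for `target_model` may differ.

```python
from typing import Any, Dict, Optional, Type


class StubEncoder:
    """Stand-in for BaseEncoder: only records the processor configuration."""

    def __init__(self, **kwargs: Any) -> None:
        self.processor_kwargs = kwargs


# Stand-ins for LayoutLMv2Encoder, LayoutLMv3Encoder and LayoutXLMEncoder.
class StubLayoutLMv2Encoder(StubEncoder): ...
class StubLayoutLMv3Encoder(StubEncoder): ...
class StubLayoutXLMEncoder(StubEncoder): ...


# Hypothetical registry keys; the real accepted names may differ.
ENCODERS: Dict[str, Type[StubEncoder]] = {
    "layoutlmv2": StubLayoutLMv2Encoder,
    "layoutlmv3": StubLayoutLMv3Encoder,
    "layoutxlm": StubLayoutXLMEncoder,
}


def get_encoder(target_model: str, processor_config: Optional[Dict[str, Any]] = None) -> StubEncoder:
    """Look up the encoder class for target_model and build it from processor_config."""
    if target_model not in ENCODERS:
        # Mirrors the documented KeyError for unsupported target models.
        raise KeyError(f"Unsupported target model: {target_model!r}")
    # processor_config (if any) becomes the encoder's **kwargs.
    return ENCODERS[target_model](**(processor_config or {}))


encoder = get_encoder("layoutxlm", {"padding": "max_length"})
```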
tokenize_dataset(dataset, target_model=None, image_column='image', label_column='label', batched=True, batch_size=2, cache_file_names=None, keep_in_memory=False, num_proc=None, processor_config=None, save_to_disk=False, save_path=None)
Tokenize a dataset using a target model and return a new dataset with the encoded features and labels.
Parameters
dataset : Dataset or DatasetDict, required
Dataset to be tokenized.
target_model : str, optional (default=None)
Target model to use for tokenization.
image_column : str (default="image")
Name of the column containing the images.
label_column : str (default="label")
Name of the column containing the labels.
batched : bool (default=True)
Whether to use batched encoding.
batch_size : int, optional (default=2)
Batch size for batched encoding.
cache_file_names : Dict[str, Optional[str]], optional (default=None)
Dictionary containing the cache file names for each target model.
keep_in_memory : bool (default=False)
Whether to keep the dataset in memory.
num_proc : int, optional (default=None)
Number of processes to use for batched encoding.
processor_config : Dict[str, Any], optional (default=None)
Configuration for the processor of the target model.
save_to_disk : bool (default=False)
Whether to save the dataset to disk.
save_path : str, optional (default=None)
Path to save the dataset to when save_to_disk is True.
Returns
DatasetDict
Dataset with the encoded features and labels.
Raises
ValueError
If there is no target model for the dataset, or if saving to disk is requested but no save path is provided.
KeyError
If the target model is not supported.
TypeError
If the dataset is not a Dataset or DatasetDict.
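The exceptions listed above can be read as up-front argument checks. The following sketch shows one plausible ordering of those checks; it validates arguments only and does not tokenize anything. `StubDataset` and `StubDatasetDict` stand in for the Hugging Face `datasets` types so the example runs standalone, and `validate_arguments` and `SUPPORTED_MODELS` are hypothetical names, not part of document_tools.

```python
from typing import Any, Optional


class StubDataset:  # stand-in for datasets.Dataset
    pass


class StubDatasetDict(dict):  # stand-in for datasets.DatasetDict
    pass


# Hypothetical model keys; the names tokenize_dataset accepts may differ.
SUPPORTED_MODELS = {"layoutlmv2", "layoutlmv3", "layoutxlm"}


def validate_arguments(
    dataset: Any,
    target_model: Optional[str],
    save_to_disk: bool = False,
    save_path: Optional[str] = None,
) -> None:
    """Raise the documented exception for each invalid argument combination."""
    if not isinstance(dataset, (StubDataset, StubDatasetDict)):
        raise TypeError("dataset must be a Dataset or DatasetDict")
    if target_model is None:
        raise ValueError("a target model is required")
    if target_model not in SUPPORTED_MODELS:
        raise KeyError(f"unsupported target model: {target_model!r}")
    if save_to_disk and save_path is None:
        raise ValueError("save_path is required when save_to_disk is True")
```

Failing before any encoding work starts means a typo in `target_model` or a missing `save_path` is reported immediately instead of after a potentially long tokenization pass.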
Source code in document_tools/tokenize.py
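With `batched=True`, the encoder receives several rows per call rather than one, and `batch_size` (2 by default here) sets how many. The following sketch shows that chunking on a column-oriented dict, in the spirit of `datasets.Dataset.map(batched=True, batch_size=...)`. `map_batched` is a hypothetical helper for illustration only, not the `datasets` implementation; caching, `num_proc`, and `keep_in_memory` are not modelled.

```python
from typing import Any, Callable, Dict, List

Columns = Dict[str, List[Any]]


def map_batched(columns: Columns, fn: Callable[[Columns], Columns], batch_size: int = 2) -> Columns:
    """Apply fn to consecutive column-oriented batches and concatenate the results."""
    num_rows = len(next(iter(columns.values()), []))
    output: Columns = {}
    for start in range(0, num_rows, batch_size):
        # Slice every column to build one batch of at most batch_size rows.
        batch = {name: values[start:start + batch_size] for name, values in columns.items()}
        for name, values in fn(batch).items():
            output.setdefault(name, []).extend(values)
    return output


calls: List[int] = []


def encode(batch: Columns) -> Columns:
    # Toy "encoder": records the batch size and returns one feature per image.
    calls.append(len(batch["image"]))
    return {"pixel_count": [len(img) for img in batch["image"]], "labels": batch["label"]}


data = {"image": ["aa", "bbb", "c", "dddd", "ee"], "label": [0, 1, 0, 1, 1]}
result = map_batched(data, encode, batch_size=2)
```

Five rows with `batch_size=2` produce three calls of sizes 2, 2, and 1; the last batch is simply shorter.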