Indeks Lucene: Dokumen hilang

Kami memiliki pengaturan Lucene yang cukup mendasar. Kami baru-baru ini memperhatikan bahwa beberapa dokumen tidak ditulis ke indeks.

Inilah cara kami membuat dokumen:

private void addToDirectory(SpecialDomainObject specialDomainObject) throws IOException     {
    Document document = new Document();
    document.add(new TextField("id", String.valueOf(specialDomainObject.getId()), Field.Store.YES));
    document.add(new TextField("name", specialDomainObject.getName(), Field.Store.YES));
    document.add(new TextField("tags", joinTags(specialDomainObject.getTags()), Field.Store.YES));
    document.add(new TextField("contents", getContents(specialDomainObject), Field.Store.YES));

    for (Language language : getAllAssociatedLanguages(specialDomainObject)) {
        document.add(new IntField("languageId", language.getId(), Field.Store.YES));
    }
    specialDomainObjectIndexWriter.updateDocument(new Term("id", document.getField("id").stringValue()), document);
    specialDomainObjectIndexWriter.commit();
}

Inilah cara kami membuat penganalisis dan penulis indeks:

<bean id="luceneVersion" class="org.apache.lucene.util.Version" factory-method="valueOf">
    <constructor-arg value="LUCENE_46"/>
</bean>

<bean id="analyzer" class="org.apache.lucene.analysis.standard.StandardAnalyzer">
    <constructor-arg ref="luceneVersion"/>
</bean>

<bean id="specialDomainObjectIndexWriter" class="org.apache.lucene.index.IndexWriter">
    <constructor-arg ref="specialDomainObjectDirectory" />
    <constructor-arg>
        <bean class="org.apache.lucene.index.IndexWriterConfig">
            <constructor-arg ref="luceneVersion"/>
            <constructor-arg ref="analyzer" />
            <property name="openMode" value="CREATE_OR_APPEND"/>
        </bean>
    </constructor-arg>
</bean>

Pengindeksan dilakukan dengan tugas terjadwal:

@Component
public class ScheduledSpecialDomainObjectIndexCreationTask implements ScheduledIndexCreationTask {

    private static final Logger logger = LoggerFactory.getLogger(ScheduledSpecialDomainObjectIndexCreationTask.class);

    @Autowired
    private IndexOperator specialDomainObjectIndexOperator;

    @Scheduled(fixedDelay = 3600 * 1000)
    @Override
    public void createIndex() {
        Date indexCreationStartDate = new Date();
        try {
            logger.info("Updating complete special domain object index...");
            specialDomainObjectIndexOperator.createIndex();
            if (logger.isDebugEnabled()) {
                Date indexCreationEndDate = new Date();
                logger.debug("Index creation duration: {} ms", indexCreationEndDate.getTime() - indexCreationStartDate.getTime());
            }
        } catch (IOException e) {
            logger.error("Could update complete special domain object index.", e);
        }
    }
}

createIndex() diimplementasikan sebagai berikut:

@Override
public void createIndex() throws IOException {
    logger.trace("Preparing for index generation...");
    IndexWriter indexWriter = getIndexWriter();

    Date start = new Date();

    logger.trace("Deleting all documents from index...");
    indexWriter.deleteAll();

    logger.trace("Starting index generation...");
    long numberOfProcessedObjects = fillIndex();

    logger.debug("Index written in " + (new Date().getTime() - start.getTime()) + " milliseconds.");
    logger.debug("Number of processed objects: {}", numberOfProcessedObjects);
    logger.debug("Number of documents in index: {}", indexWriter.numDocs());

    indexWriter.commit();
    indexWriter.forceMerge(1);
}

@Override
protected long fillIndex() throws IOException {
    Page<SpecialDomainObject> specialDomainObjectsPage = specialDomainObjectRepository.findAll(new PageRequest(0, MAXIMUM_PAGE_ELEMENTS));
    while (true) {
        addToDirectory(specialDomainObjectsPage);
        if (specialDomainObjectsPage.hasNextPage()) {
            specialDomainObjectsPage =
                specialDomainObjectRepository.findAll(new PageRequest(specialDomainObjectsPage.getNumber() + 1, specialDomainObjectsPage.getSize()));
        } else {
            break;
        }
    }
    return specialDomainObjectsPage.getTotalElements();
}

Ada sekitar 2000 instance specialDomainObject dan sekitar 80 tidak ditulis ke indeks (kami memeriksanya dengan Luke).

Apakah ada hal yang menyebabkan dokumen hilang?


person Alexander Müller    schedule 31.01.2014    source sumber
comment
Saya harus bertanya. Apakah Anda menutup IndexWriter dengan anggun, bukan?   -  person Martín Schonaker    schedule 02.02.2014
comment
Ya, benar. Kami mengidentifikasi masalahnya.   -  person Alexander Müller    schedule 03.04.2014


Jawaban (1)


Kami menemukan masalahnya: Pengodean default sistem operasi tidak disetel ke UTF-8.

person Alexander Müller    schedule 03.04.2014