Posts in the 'IT-Consultant' category (page 127)

IT-Consultant

The search process with a query and a searcher

    Hits(Searcher s, Query q, Filter f) throws IOException {
      weight = q.weight(s);
      searcher = s;
      filter = f;
      getMoreDocs(50); // retrieve 100 initially
    }

The line weight = q.weight(s); is where TF and IDF are computed. getMoreDocs(50); then fetches the matching Documents. Let's look at weight more closely.

    public Weight weight(Searcher searcher) throws IOException {
      Query query = searcher.rewrite(this);
      Weight weight = query.createWeight(searcher);
      flo..
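The comment `// retrieve 100 initially` next to `getMoreDocs(50)` suggests that the requested minimum is doubled before the search runs. Below is a minimal sketch of that idea, not Lucene's code: `LazyHits`, `fetchTop`, `lastRequested`, and the plain integer document IDs are hypothetical names introduced for illustration, and the doubling rule itself is an assumption drawn from that comment.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a lazily filled result cache; not Lucene's Hits class. */
class LazyHits {
    private final int totalMatches;                 // pretend the index holds this many matches
    private final List<Integer> cached = new ArrayList<>();
    int lastRequested = 0;                          // size of the most recent fetch request

    LazyHits(int totalMatches) {
        this.totalMatches = totalMatches;
        getMoreDocs(50);                            // asks for 100 results up front
    }

    /** Assumed strategy: request twice max(min, number already cached). */
    void getMoreDocs(int min) {
        if (cached.size() > min) {
            min = cached.size();
        }
        int n = min * 2;
        lastRequested = n;
        List<Integer> top = fetchTop(n);
        // Keep what is already cached and append only the new tail.
        for (int i = cached.size(); i < top.size(); i++) {
            cached.add(top.get(i));
        }
    }

    /** Hypothetical search call: returns the first n matching document IDs. */
    private List<Integer> fetchTop(int n) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < Math.min(n, totalMatches); i++) {
            ids.add(i);
        }
        return ids;
    }

    /** Returns the document ID at rank i, fetching more results when needed. */
    int doc(int i) {
        if (i < 0 || i >= totalMatches) {
            throw new IndexOutOfBoundsException("no hit at rank " + i);
        }
        while (i >= cached.size()) {
            getMoreDocs(i);
        }
        return cached.get(i);
    }

    int cachedCount() {
        return cached.size();
    }
}
```

With 250 matches, constructing `LazyHits` requests 100 results, and asking for rank 150 triggers a second request for 300, so deep paging costs a repeated, larger search rather than one huge initial fetch.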

Finding the TermInfo for a given Term

    /** Returns the TermInfo for a Term in the set, or null. */
    TermInfo get(Term term) throws IOException {
      if (size == 0) return null;
      ensureIndexIsRead();
      // optimize sequential access: first try scanning cached enum w/o seeking
      SegmentTermEnum enumerator = getEnum();
      if (enumerator.term() != null // term is at or past current
          && ((enumerator.prev() != null && term.compareTo(enumerator.prev())> 0..
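The call to `ensureIndexIsRead()` hints that the lookup relies on an in-memory index over the sorted term list. A minimal sketch of that general technique follows, assuming a sparse index that keeps every interval-th term, a binary search over it, and a short forward scan. `SparseTermIndex` and its members are hypothetical names, and the cached-enumerator shortcut from the excerpt is not modeled.

```java
/** Hypothetical sparse-index lookup over a sorted term list; not Lucene's reader class. */
class SparseTermIndex {
    private final String[] terms;        // all terms, sorted (stands in for the on-disk list)
    private final String[] indexTerms;   // every interval-th term (stands in for the in-memory index)
    private final int interval;

    SparseTermIndex(String[] sortedTerms, int interval) {
        this.terms = sortedTerms.clone();
        this.interval = interval;
        int n = (terms.length + interval - 1) / interval;
        this.indexTerms = new String[n];
        for (int i = 0; i < n; i++) {
            indexTerms[i] = terms[i * interval];
        }
    }

    /** Binary search: position of the last index entry <= term, or -1 if there is none. */
    int indexOffset(String term) {
        int lo = 0;
        int hi = indexTerms.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = term.compareTo(indexTerms[mid]);
            if (cmp < 0) {
                hi = mid - 1;
            } else if (cmp > 0) {
                lo = mid + 1;
            } else {
                return mid;
            }
        }
        return hi;
    }

    /** Returns the term's ordinal in the full list, or -1 if it is absent. */
    int get(String term) {
        if (terms.length == 0) {
            return -1;
        }
        int block = indexOffset(term);
        if (block < 0) {
            return -1;                            // sorts before the first term
        }
        int start = block * interval;
        int end = Math.min(start + interval, terms.length);
        for (int i = start; i < end; i++) {       // short sequential scan within one block
            int cmp = terms[i].compareTo(term);
            if (cmp == 0) {
                return i;
            }
            if (cmp > 0) {
                break;                            // passed the slot where it would be
            }
        }
        return -1;
    }
}
```

The trade-off is memory against scan length: a larger interval shrinks the in-memory index but lengthens the sequential scan after the binary search.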

The Lucene source that computes TF and IDF

    package org.apache.lucene.search;
    /**
     * Licensed to the Apache Software Foundation (ASF) under one or more
     * contributor license agreements. See the NOTICE file distributed with
     * this work for additional information regarding copyright ownership.
     * The ASF licenses this file to You under the Apache License, Version 2.0
     * (the "License"); you may not use this file except in compliance with
     * the..
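For orientation, here is a minimal sketch of how TF and IDF are commonly computed, assuming the square-root TF and logarithmic IDF associated with Lucene's classic default similarity. `TfIdfSketch` and `tfIdf` are hypothetical names, and the truncated source above may differ in detail.

```java
/** Hypothetical helper; the formulas are assumed, not copied from the truncated source. */
class TfIdfSketch {
    /** Term frequency factor: grows with the square root of the raw count. */
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    /** Inverse document frequency: rarer terms (low docFreq) get a larger value. */
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    /**
     * Plain product of the two factors. A real scorer applies further
     * normalization and boost factors that are left out here.
     */
    static float tfIdf(float freq, int docFreq, int numDocs) {
        return tf(freq) * idf(docFreq, numDocs);
    }
}
```

The `+ 1` in the denominator avoids division by zero for a term that appears in no document, and the trailing `+ 1.0` keeps the factor positive even for terms that appear in nearly every document.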

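How is the data "test" stored across the six files when it is indexed in the Title field?

The three files fnm, fdx, and fdt involve no special algorithm. When fields are stored, the number of fields comes first, followed by each field's length and then its data, so these files probably need no separate study. The tis file stores the term text along with pointers to the frequency and position information. The frq file stores only frequency information. The prx file stores only position information. None of this is difficult. For example, when the data "test test" is indexed in the Title field, information about the Title field goes into the fnm, fdx, and fdt files, and information about the term goes into the tis file. By default tis also holds frequency information, but when it is 16 or more, offse..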
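The "test test" example can be sketched in memory to show which piece of information belongs to which file. This is an illustration only, assuming simple whitespace tokenization with no analyzer; `PostingsSketch` and `positions` are hypothetical names, and nothing here reproduces the on-disk byte layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory postings for one field value of one document. */
class PostingsSketch {
    /** Maps each term to the positions where it occurs, in sorted term order. */
    static Map<String, List<Integer>> positions(String fieldValue) {
        Map<String, List<Integer>> result = new TreeMap<>();
        String trimmed = fieldValue.trim();
        if (trimmed.isEmpty()) {
            return result;
        }
        String[] tokens = trimmed.split("\\s+");   // assumed tokenizer: split on whitespace
        for (int pos = 0; pos < tokens.length; pos++) {
            result.computeIfAbsent(tokens[pos], k -> new ArrayList<>()).add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> postings = positions("test test");
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            // term text -> term dictionary (tis); count -> frequency data (frq);
            // position list -> position data (prx)
            System.out.println(e.getKey() + " freq=" + e.getValue().size()
                    + " positions=" + e.getValue());
        }
    }
}
```

Running `main` prints `test freq=2 positions=[0, 1]`: one dictionary entry for the term, a frequency of 2, and two positions.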

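Types of Lucene index files

The original documentation is at http://lucene.apache.org/java/docs/fileformats.html. There are many kinds of files, but let's look only at the ones related to data storage.

    .fnm : stores field information.
    .fdx : stores pointers to the field data.
    .fdt : stores the actual field data.
    .tis : stores the Term Dictionary.
    .frq : stores the Term Frequency.
    .prx : stores position information.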
