This page was translated by the Cloud Translation API.

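Deploy the connector

This page of the Cloud Search tutorial shows how to set up a data source and a content connector for indexing data. To start from the beginning of this tutorial, see the Cloud Search getting started tutorial.

Build the connector

Change your working directory to the cloud-search-samples/end-to-end/connector directory and run the following command:

```
mvn package -DskipTests
```

This command downloads the dependencies needed to build the content connector and compiles the code.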

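Create service account credentials

The connector needs service account credentials to call the Cloud Search APIs. To create the credentials:

1. Return to the Google Cloud console.
2. In the left navigation, click [Credentials]. The "Credentials" page appears.
3. Click the [+ CREATE CREDENTIALS] drop-down list and select [Service account]. The "Create service account" page appears.
4. In the [Service account name] field, enter "tutorial".
5. Note the service account ID value shown right after the service account name. You use this value later.
6. Click [Create]. The "Service account permissions (optional)" dialog appears.
7. Click [Continue]. The "Grant users access to this service account (optional)" dialog appears.
8. Click [Done]. The "Credentials" screen appears.
9. Under [Service Accounts], click the service account's email address. The "Service account details" page appears.
10. Under [Keys], click the [Add key] drop-down list and select [Create new key]. The "Create private key" dialog appears.
11. Click [Create].
12. (Optional) If the "Do you want to allow downloads on console.cloud.google.com?" dialog appears, click [Allow].
13. A private key file is saved to your computer. Note the location of the downloaded file. You use this file to configure the content connector so that it can authenticate itself when it calls the Google Cloud Search APIs.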
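You need the service account's email address again when you register it with the data source. A JSON service account key stores that address in its client_email field, so you can read it from the downloaded file instead of looking it up in the console. The following is a minimal sketch using only the JDK (Java 11 or later); the class name KeyFileInfo is hypothetical, and the regular expression assumes the standard key-file layout rather than parsing the JSON properly.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that reads the service account email from a JSON key file. */
public class KeyFileInfo {
  // Matches "client_email": "<value>", allowing whitespace around the colon.
  private static final Pattern CLIENT_EMAIL =
      Pattern.compile("\"client_email\"\\s*:\\s*\"([^\"]+)\"");

  /** Returns the client_email value found in the given JSON text, if present. */
  public static Optional<String> clientEmail(String json) {
    Matcher matcher = CLIENT_EMAIL.matcher(json);
    return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
  }

  /** Reads the key file at the given path and extracts client_email. */
  public static Optional<String> clientEmail(Path keyFile) throws IOException {
    return clientEmail(Files.readString(keyFile));
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("Usage: java KeyFileInfo /path/to/key.json");
      return;
    }
    System.out.println(clientEmail(Path.of(args[0])).orElse("client_email not found"));
  }
}
```

Because the same file also contains the private key, keep it out of version control.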

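Initialize third-party support

Before you can call any other Cloud Search APIs, you must initialize third-party support for Google Cloud Search.

To initialize third-party support for Cloud Search:

1. Your Cloud Search platform project contains service account credentials. However, to initialize third-party support, you must create web application credentials. For instructions on creating web application credentials, see Create credentials. When you create them, add https://developers.google.com/oauthplayground as an authorized redirect URI so that the OAuth 2 Playground can use them. At the end of this step, you have a client ID and a client secret file.
2. Use Google's OAuth 2 Playground to obtain an access token:
   1. Click the settings (gear) icon and check [Use your own OAuth credentials].
   2. Enter the client ID and client secret from step 1.
   3. Click [Close].
   4. In the scopes field, enter "https://www.googleapis.com/auth/cloud_search.settings" and click [Authorize APIs]. The OAuth 2 Playground returns an authorization code.
   5. Click [Exchange authorization code for tokens]. A token is returned.
3. To initialize third-party support for Cloud Search, use the following curl command. Replace [YOUR_ACCESS_TOKEN] with the token you obtained in step 2.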
```
curl --request POST \
'https://cloudsearch.googleapis.com/v1:initializeCustomer' \
  --header 'Authorization: Bearer [YOUR_ACCESS_TOKEN]' \
  --header 'Accept: application/json' \
  --header 'Content-Type: application/json' \
  --data '{}' \
  --compressed
```
If the request succeeds, the response body contains an instance of operation. For example:
```
{
  "name": "operations/customers/01b3fqdm/lro/AOIL6eBv7fEfiZ_hUSpm8KQDt1Mnd6dj5Ru3MXf-jri4xK6Pyb2-Lwfn8vQKg74pgxlxjrY"
}
```
If the request fails, contact Cloud Search support.

4. Use operations.get to verify that third-party support is initialized. Replace the operation name in the URL with the name value returned in the previous step, and replace [YOUR_API_KEY] and [YOUR_ACCESS_TOKEN] with your own values:

```
curl \
'https://cloudsearch.googleapis.com/v1/operations/customers/01b3fqdm/lro/AOIL6eBv7fEfiZ_hUSpm8KQDt1Mnd6dj5Ru3MXf-jri4xK6Pyb2-Lwfn8vQKg74pgxlxjrY?key=[YOUR_API_KEY]' \
  --header 'Authorization: Bearer [YOUR_ACCESS_TOKEN]' \
  --header 'Accept: application/json' \
  --compressed
```

When third-party initialization is complete, the operation's done field is set to true. For example:

```
{
  "name": "operations/customers/01b3fqdm/lro/AOIL6eBv7fEfiZ_hUSpm8KQDt1Mnd6dj5Ru3MXf-jri4xK6Pyb2-Lwfn8vQKg74pgxlxjrY",
  "done": true
}
```
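The same two REST calls can be made from Java. The sketch below, which assumes Java 11 or later and uses only java.net.http, builds the initializeCustomer and operations.get requests from the curl commands and polls until the operation reports "done": true. The class and method names are hypothetical, and the done check is a simple regular expression rather than a JSON parser.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;
import java.util.function.Supplier;
import java.util.regex.Pattern;

/** Hypothetical sketch of the initializeCustomer call followed by operations.get polling. */
public class ThirdPartyInit {
  static final String BASE = "https://cloudsearch.googleapis.com/v1";
  // The example responses omit "done" until it becomes true, so look for "done": true.
  private static final Pattern DONE_TRUE = Pattern.compile("\"done\"\\s*:\\s*true");

  /** Same request as the initializeCustomer curl command: POST with an empty JSON body. */
  public static HttpRequest initializeRequest(String accessToken) {
    return HttpRequest.newBuilder(URI.create(BASE + ":initializeCustomer"))
        .header("Authorization", "Bearer " + accessToken)
        .header("Accept", "application/json")
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString("{}"))
        .build();
  }

  /** Same request as the operations.get curl command, for the returned operation name. */
  public static HttpRequest operationRequest(String operationName, String apiKey,
      String accessToken) {
    return HttpRequest.newBuilder(URI.create(BASE + "/" + operationName + "?key=" + apiKey))
        .header("Authorization", "Bearer " + accessToken)
        .header("Accept", "application/json")
        .GET()
        .build();
  }

  /** Returns true if the operation JSON contains "done": true (naive text check). */
  public static boolean isDone(String operationJson) {
    return DONE_TRUE.matcher(operationJson).find();
  }

  /** Calls fetch up to maxAttempts times, sleeping between attempts; true once done. */
  public static boolean waitUntilDone(Supplier<String> fetch, int maxAttempts, Duration delay) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (isDone(fetch.get())) {
        return true;
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(delay.toMillis());
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve the interrupt and stop polling
          return false;
        }
      }
    }
    return false;
  }
}
```

To send a request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and read body(); a lambda that does this for operationRequest(...) can serve as the fetch argument to waitUntilDone. The sketch leaves out curl's --compressed option, so responses arrive uncompressed.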

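Create a data source

Next, create a data source in the Admin console. The data source provides a namespace for indexing content with the connector.

1. Open the Google Admin console.
2. Click the Apps icon. The "Apps administration" page appears.
3. Click [Google Workspace]. The "Apps Google Workspace administration" page appears.
4. Scroll down and click [Cloud Search]. The "Settings for Google Workspace" page appears.
5. Click [Third-party data sources]. The "Data sources" page appears.
6. Click the round yellow [+]. The "Add new data source" dialog appears.
7. In the [Display name] field, enter "tutorial".
8. In the [Service account email addresses] field, enter the email address of the service account you created in the "Create service account credentials" section. If you don't know the service account's email address, look up the value on the service accounts page.
9. Click [Add]. The "Data source successfully created" dialog appears.
10. Click [OK]. Note the source ID of the newly created data source. You use the source ID to configure the content connector.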

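Generate a personal access token for the GitHub API

The connector needs authenticated access to the GitHub API to have sufficient quota. For simplicity, the connector uses a personal access token instead of OAuth. Like OAuth, a personal access token lets you authenticate as a user with a limited set of permissions.

1. Sign in to GitHub.
2. In the upper-right corner, click your profile picture. A drop-down menu appears.
3. Click [Settings].
4. Click [Developer settings].
5. Click [Personal access tokens].
6. Click [Generate personal access token].
7. In the [Note] field, enter "Cloud Search tutorial".
8. Check the public_repo scope.
9. Click [Generate token].
10. Note the generated token. The connector uses it to call the GitHub APIs, and it provides the API quota needed to perform the indexing.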
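To confirm that the token works before you configure the connector, you can call the GitHub REST API with it directly. This minimal sketch (Java 11 or later, JDK only) builds an authenticated request for a repository using the token scheme in the Authorization header, which GitHub accepts for personal access tokens. The class name is hypothetical, and googleworkspace/cloud-search-samples is assumed to be the sample repository.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical check that a GitHub personal access token is accepted by the REST API. */
public class GithubTokenCheck {
  /** Builds an authenticated GET for https://api.github.com/repos/{owner}/{repo}. */
  public static HttpRequest repoRequest(String owner, String repo, String token) {
    return HttpRequest.newBuilder(URI.create("https://api.github.com/repos/" + owner + "/" + repo))
        .header("Authorization", "token " + token)
        .header("Accept", "application/vnd.github+json")
        .GET()
        .build();
  }

  public static void main(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println("Usage: java GithubTokenCheck YOUR_TOKEN");
      return;
    }
    // Assumed sample repository; any repository the token can read works.
    HttpRequest request = repoRequest("googleworkspace", "cloud-search-samples", args[0]);
    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    // GitHub reports the caller's request quota in the X-RateLimit-Limit header.
    System.out.println("HTTP " + response.statusCode() + ", X-RateLimit-Limit: "
        + response.headers().firstValue("X-RateLimit-Limit").orElse("(absent)"));
  }
}
```

A 200 response whose X-RateLimit-Limit is higher than what an unauthenticated request reports indicates that the token is being applied.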

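Configure the connector

After you create the credentials and the data source, update the connector configuration to include the following values:

1. On the command line, change to the cloud-search-samples/end-to-end/connector/ directory.
2. Open the sample-config.properties file in a text editor.
3. Set the api.serviceAccountPrivateKeyFile parameter to the path of the service account credentials file you downloaded earlier.
4. Set the api.sourceId parameter to the ID of the data source you created earlier.
5. Set the github.user parameter to your GitHub username.
6. Set the github.token parameter to the access token you created earlier.
7. Save the file.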
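A quick check that all four values are set can save a failed start. This sketch uses java.util.Properties to report which of the parameters listed above are missing or blank. It is a hypothetical helper, not part of the sample, and it does not verify that the values themselves are correct.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical pre-flight check for sample-config.properties. */
public class ConfigCheck {
  // The four parameters this tutorial asks you to set.
  static final List<String> REQUIRED = List.of(
      "api.serviceAccountPrivateKeyFile", "api.sourceId", "github.user", "github.token");

  /** Returns the required keys that are absent or blank, in the order listed above. */
  public static List<String> missingKeys(Properties props) {
    List<String> missing = new ArrayList<>();
    for (String key : REQUIRED) {
      String value = props.getProperty(key);
      if (value == null || value.isBlank()) {
        missing.add(key);
      }
    }
    return missing;
  }

  public static void main(String[] args) throws IOException {
    Path configFile = Path.of(args.length > 0 ? args[0] : "sample-config.properties");
    Properties props = new Properties();
    try (Reader reader = Files.newBufferedReader(configFile)) {
      props.load(reader);
    }
    List<String> missing = missingKeys(props);
    System.out.println(missing.isEmpty() ? "All required keys are set." : "Missing: " + missing);
  }
}
```

When a file is read with java.util.Properties, as in this sketch, a backslash is treated as an escape character, so on Windows write the key file path with forward slashes or doubled backslashes.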

Update the schema

The connector indexes both structured and unstructured content. Before indexing data, you must update the data source's schema. Run the following command to update the schema:

```
mvn exec:java -Dexec.mainClass=com.google.cloudsearch.tutorial.SchemaTool \
    -Dexec.args="-Dconfig=sample-config.properties"
```

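Run the connector

To run the connector and start indexing, run the following command:

```
mvn exec:java -Dexec.mainClass=com.google.cloudsearch.tutorial.GithubConnector \
    -Dexec.args="-Dconfig=sample-config.properties"
```

With its default configuration, the connector indexes a single repository in the googleworkspace organization. Indexing the repository takes about 1 minute. After the initial indexing, the connector keeps polling the repository for changes that need to be reflected in the Cloud Search index.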

Review the code

The remaining sections examine how the connector is built.

Start the application

The entry point to the connector is the GithubConnector class. Its main method instantiates the SDK's IndexingApplication and starts it.

GithubConnector.java

```
/**
 * Main entry point for the connector. Creates and starts an indexing
 * application using the {@code ListingConnector} template and the sample's
 * custom {@code Repository} implementation.
 *
 * @param args program command line arguments
 * @throws InterruptedException thrown if an abort is issued during initialization
 */
public static void main(String[] args) throws InterruptedException {
  Repository repository = new GithubRepository();
  IndexingConnector connector = new ListingConnector(repository);
  IndexingApplication application = new IndexingApplication.Builder(connector, args)
      .build();
  application.start();
}
```

The ListingConnector provided by the SDK implements a traversal strategy that uses Cloud Search queues to track the state of items in the index. To access content from GitHub, it delegates to GithubRepository, which the sample connector implements.

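Traverse the GitHub repositories

During a full traversal, the getIds() method is called to push items that may need to be indexed into the queue.

The connector can index multiple repositories or organizations. To minimize the impact of a failure, it traverses one GitHub repository at a time. Each call returns its traversal results together with a checkpoint that holds the list of repositories still to be indexed by subsequent getIds() calls. If an error occurs, indexing resumes at the current repository instead of starting over from the beginning.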
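The checkpoint logic in getIds() can be modeled in isolation. In this sketch, the hypothetical RepoCheckpoint class stands in for the sample's FullTraversalCheckpoint, whose serialization format is not shown here; it simply stores the remaining repository names as newline-separated UTF-8 text. The advance() method follows the same control flow as getIds(): start from all repositories when there is no checkpoint, drop the repository being traversed, and return null once nothing is left.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for FullTraversalCheckpoint: the repositories left to traverse. */
public class RepoCheckpoint {
  private final List<String> remaining;

  public RepoCheckpoint(List<String> remaining) {
    this.remaining = List.copyOf(remaining);
  }

  public List<String> getRemainingRepositories() {
    return remaining;
  }

  /** Assumed format: one repository name per line, UTF-8 encoded. */
  public byte[] toBytes() {
    return String.join("\n", remaining).getBytes(StandardCharsets.UTF_8);
  }

  public static RepoCheckpoint fromBytes(byte[] bytes) {
    String text = new String(bytes, StandardCharsets.UTF_8);
    return new RepoCheckpoint(text.isEmpty() ? List.of() : Arrays.asList(text.split("\n")));
  }

  /**
   * Mirrors getIds(): returns the checkpoint to store after traversing the first
   * remaining repository, or null when nothing is left so the next full traversal
   * starts from the beginning.
   */
  public static byte[] advance(byte[] checkpoint, List<String> allRepositories) {
    List<String> repos = checkpoint == null
        ? allRepositories
        : fromBytes(checkpoint).getRemainingRepositories();
    if (repos.isEmpty()) {
      return null;
    }
    // A real connector would traverse repos.get(0) here before saving the rest.
    return new RepoCheckpoint(repos.subList(1, repos.size())).toBytes();
  }
}
```

Each call handles one repository, so after a failure only the repository that was in progress has to be traversed again.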

GithubRepository.java

```
/**
 * Gets all of the existing item IDs from the data repository. While
 * multiple repositories are supported, only one repository is traversed
 * per call. The remaining repositories are saved in the checkpoint
 * and are traversed on subsequent calls. This minimizes the amount of
 * data that needs to be reindexed in the event of an error.
 *
 * <p>This method is called by {@link ListingConnector#traverse()} during
 * <em>full traversals</em>. Every document ID and metadata hash value in
 * the <em>repository</em> is pushed to the Cloud Search queue. Each pushed
 * document is later polled and processed in the {@link #getDoc(Item)} method.
 * <p>
 * The metadata hash values are pushed to aid document change detection. The
 * queue sets the document status depending on the hash comparison. If the
 * pushed ID doesn't yet exist in Cloud Search, the document's status is
 * set to <em>new</em>. If the ID exists but has a mismatched hash value,
 * its status is set to <em>modified</em>. If the ID exists and matches
 * the hash value, its status is unchanged.
 *
 * <p>In every case, the pushed content hash value is only used for
 * comparison. The hash value is only set in the queue during an
 * update (see {@link #getDoc(Item)}).
 *
 * @param checkpoint value defined and maintained by this connector
 * @return this is typically a {@link PushItems} instance
 */
@Override
public CheckpointCloseableIterable<ApiOperation> getIds(byte[] checkpoint)
    throws RepositoryException {
  List<String> repositories;
  // Decode the checkpoint if present to get the list of remaining
  // repositories to index.
  if (checkpoint != null) {
    try {
      FullTraversalCheckpoint decodedCheckpoint = FullTraversalCheckpoint
          .fromBytes(checkpoint);
      repositories = decodedCheckpoint.getRemainingRepositories();
    } catch (IOException e) {
      throw new RepositoryException.Builder()
          .setErrorMessage("Unable to deserialize checkpoint")
          .setCause(e)
          .build();
    }
  } else {
    // No previous checkpoint, scan for repositories to index
    // based on the connector configuration.
    try {
      repositories = scanRepositories();
    } catch (IOException e) {
      throw toRepositoryError(e, Optional.of("Unable to scan repositories"));
    }
  }

  if (repositories.isEmpty()) {
    // Nothing left to index. Reset the checkpoint to null so the
    // next full traversal starts from the beginning
    Collection<ApiOperation> empty = Collections.emptyList();
    return new CheckpointCloseableIterableImpl.Builder<>(empty)
        .setCheckpoint((byte[]) null)
        .setHasMore(false)
        .build();
  }

  // Still have more repositories to index. Pop the next repository to
  // index off the list. The remaining repositories make up the next
  // checkpoint.
  String repositoryToIndex = repositories.get(0);
  repositories = repositories.subList(1, repositories.size());

  try {
    log.info(() -> String.format("Traversing repository %s", repositoryToIndex));
    Collection<ApiOperation> items = collectRepositoryItems(repositoryToIndex);
    FullTraversalCheckpoint newCheckpoint = new FullTraversalCheckpoint(repositories);
    return new CheckpointCloseableIterableImpl.Builder<>(items)
        .setHasMore(true)
        .setCheckpoint(newCheckpoint.toBytes())
        .build();
  } catch (IOException e) {
    String errorMessage = String.format("Unable to traverse repo: %s",
        repositoryToIndex);
    throw toRepositoryError(e, Optional.of(errorMessage));
  }
}
```

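The collectRepositoryItems() method handles the traversal of a single GitHub repository. It returns a collection of ApiOperations that represent the items to push into the queue. Each item is pushed as a resource name plus a hash value that represents the item's current state.

The hash value is used in subsequent traversals of the GitHub repositories. It provides a lightweight way to check whether content has changed without uploading the content again. The connector blindly queues every item. If an item is new or its hash value has changed, the queue makes it available for polling; otherwise, the item is considered unmodified.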
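The queue behavior described for getIds() (new, modified, or unchanged depending on the hash comparison) can be illustrated with a toy in-memory model. This is not the Cloud Search queue API; QueueModel and its methods are hypothetical, and the sketch only mirrors the documented rule that a pushed hash is used for comparison, while the stored hash changes only when an item is indexed.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of how pushed items are classified; not the real Cloud Search queue. */
public class QueueModel {
  /** UNCHANGED means the item's existing queue status is left as it was. */
  public enum Status { NEW, MODIFIED, UNCHANGED }

  // Hash recorded for each item when it was last indexed.
  private final Map<String, String> indexedHashes = new HashMap<>();

  /** Records the hash set during an update (the getDoc()/indexItem() step). */
  public void markIndexed(String resourceName, String hash) {
    indexedHashes.put(resourceName, hash);
  }

  /** Classifies a pushed item; the pushed hash is only compared, never stored. */
  public Status push(String resourceName, String metadataHash) {
    String stored = indexedHashes.get(resourceName);
    if (stored == null) {
      return Status.NEW;
    }
    return stored.equals(metadataHash) ? Status.UNCHANGED : Status.MODIFIED;
  }
}
```

For a repository item, the sample uses the repository's last-updated timestamp as the hash, so any update to the repository shows up as a modified item on the next traversal.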

GithubRepository.java

```
/**
 * Fetch IDs to push into the queue for all items in the repository.
 * Currently captures issues & content in the master branch.
 *
 * @param name Name of repository to index
 * @return Items to push into the queue for later indexing
 * @throws IOException if error reading issues
 */
private Collection<ApiOperation> collectRepositoryItems(String name)
    throws IOException {
  List<ApiOperation> operations = new ArrayList<>();
  GHRepository repo = github.getRepository(name);

  // Add the repository as an item to be indexed
  String metadataHash = repo.getUpdatedAt().toString();
  String resourceName = repo.getHtmlUrl().getPath();
  PushItem repositoryPushItem = new PushItem()
      .setMetadataHash(metadataHash);
  PushItems items = new PushItems.Builder()
      .addPushItem(resourceName, repositoryPushItem)
      .build();

  operations.add(items);
  // Add issues/pull requests & files
  operations.add(collectIssues(repo));
  operations.add(collectContent(repo));
  return operations;
}
```

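Process the queue

After the full traversal completes, the connector starts polling the queue for items that need to be indexed. The getDoc() method is called for each item pulled from the queue. It reads the item from GitHub and converts it into the appropriate representation for indexing.

Because the connector runs against live data that can change at any time, getDoc() also verifies that each queued item is still valid and deletes items that no longer exist from the index.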
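The decision getDoc() makes for each polled item can be summarized as a small pure function. In this hypothetical sketch, an empty currentHash stands for an object that no longer exists in GitHub (the sample detects this through FileNotFoundException), and the "not modified" branch assumes that canSkipIndexing(), whose body is not shown here, compares the previously indexed hash with the current one.

```java
import java.util.Optional;

/** Hypothetical, simplified model of the per-item decision in getDoc(). */
public class PollDecision {
  public enum Action { DELETE, NOT_MODIFIED, INDEX }

  /**
   * @param currentHash hash of the object in GitHub, empty if it no longer exists
   * @param indexedHash hash stored when the item was last indexed, empty if never indexed
   */
  public static Action decide(Optional<String> currentHash, Optional<String> indexedHash) {
    if (currentHash.isEmpty()) {
      return Action.DELETE;        // gone from GitHub: remove it from the index
    }
    if (currentHash.equals(indexedHash)) {
      return Action.NOT_MODIFIED;  // previously indexed and unchanged: just requeue
    }
    return Action.INDEX;           // new or changed: build and upload the item
  }
}
```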

GithubRepository.java

```
/**
 * Gets a single data repository item and indexes it if required.
 *
 * <p>This method is called by the {@link ListingConnector} during a poll
 * of the Cloud Search queue. Each queued item is processed
 * individually depending on its state in the data repository.
 *
 * @param item the data repository item to retrieve
 * @return the item's state determines which type of
 * {@link ApiOperation} is returned:
 * {@link RepositoryDoc}, {@link DeleteItem}, or {@link PushItem}
 */
@Override
public ApiOperation getDoc(Item item) throws RepositoryException {
  log.info(() -> String.format("Processing item: %s ", item.getName()));
  Object githubObject;
  try {
    // Retrieve the item from GitHub
    githubObject = getGithubObject(item.getName());
    if (githubObject instanceof GHRepository) {
      return indexItem((GHRepository) githubObject, item);
    } else if (githubObject instanceof GHPullRequest) {
      return indexItem((GHPullRequest) githubObject, item);
    } else if (githubObject instanceof GHIssue) {
      return indexItem((GHIssue) githubObject, item);
    } else if (githubObject instanceof GHContent) {
      return indexItem((GHContent) githubObject, item);
    } else {
      String errorMessage = String.format("Unexpected item received: %s",
          item.getName());
      throw new RepositoryException.Builder()
          .setErrorMessage(errorMessage)
          .setErrorType(RepositoryException.ErrorType.UNKNOWN)
          .build();
    }
  } catch (FileNotFoundException e) {
    log.info(() -> String.format("Deleting item: %s ", item.getName()));
    return ApiOperations.deleteItem(item.getName());
  } catch (IOException e) {
    String errorMessage = String.format("Unable to retrieve item: %s",
        item.getName());
    throw toRepositoryError(e, Optional.of(errorMessage));
  }
}
```

For each type of GitHub object the connector indexes, a corresponding indexItem() method builds the item representation for Cloud Search. For example, the representation for a content item (a file) is built as follows:

GithubRepository.java

```
/**
 * Build the ApiOperation to index a content item (file).
 *
 * @param content      Content item to index
 * @param previousItem Previous item state in the index
 * @return ApiOperation (RepositoryDoc if indexing, PushItem if not modified)
 * @throws IOException if unable to create operation
 */
private ApiOperation indexItem(GHContent content, Item previousItem)
    throws IOException {
  String metadataHash = content.getSha();

  // If previously indexed and unchanged, just requeue as unmodified
  if (canSkipIndexing(previousItem, metadataHash)) {
    return notModified(previousItem.getName());
  }

  String resourceName = new URL(content.getHtmlUrl()).getPath();
  FieldOrValue<String> title = FieldOrValue.withValue(content.getName());
  FieldOrValue<String> url = FieldOrValue.withValue(content.getHtmlUrl());

  String containerName = content.getOwner().getHtmlUrl().getPath();
  String programmingLanguage = FileExtensions.getLanguageForFile(content.getName());

  // Structured data based on the schema
  Multimap<String, Object> structuredData = ArrayListMultimap.create();
  structuredData.put("organization", content.getOwner().getOwnerName());
  structuredData.put("repository", content.getOwner().getName());
  structuredData.put("path", content.getPath());
  structuredData.put("language", programmingLanguage);

  Item item = IndexingItemBuilder.fromConfiguration(resourceName)
      .setTitle(title)
      .setContainerName(containerName)
      .setSourceRepositoryUrl(url)
      .setItemType(IndexingItemBuilder.ItemType.CONTAINER_ITEM)
      .setObjectType("file")
      .setValues(structuredData)
      .setVersion(Longs.toByteArray(System.currentTimeMillis()))
      .setHash(content.getSha())
      .build();

  // Index the file content too
  String mimeType = FileTypeMap.getDefaultFileTypeMap()
      .getContentType(content.getName());
  AbstractInputStreamContent fileContent = new InputStreamContent(
      mimeType, content.read())
      .setLength(content.getSize())
      .setCloseInputStream(true);
  return new RepositoryDoc.Builder()
      .setItem(item)
      .setContent(fileContent, IndexingService.ContentFormat.RAW)
      .setRequestMode(IndexingService.RequestMode.SYNCHRONOUS)
      .build();
}
```
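indexItem() relies on helpers that are not shown here: FileExtensions.getLanguageForFile() for the language value, the javax.activation FileTypeMap for the MIME type, and Guava's Longs.toByteArray() for the item version. The sketch below shows JDK-only stand-ins for all three. The extension table is purely illustrative and may not match the sample's FileExtensions class, and URLConnection.guessContentTypeFromName() is a different lookup from FileTypeMap that can return different types. The version encoding matches Guava's, which is documented as equivalent to ByteBuffer.allocate(8).putLong(value).array().

```java
import java.net.URLConnection;
import java.nio.ByteBuffer;
import java.util.Locale;
import java.util.Map;

/** Hypothetical JDK-only stand-ins for three values indexItem() computes for a file. */
public class FileItemHelpers {
  // Illustrative mapping only; the sample's FileExtensions class may differ.
  private static final Map<String, String> LANGUAGES = Map.of(
      "java", "Java",
      "js", "JavaScript",
      "py", "Python",
      "go", "Go",
      "md", "Markdown");

  /** Returns a language name based on the file extension, or "Unknown". */
  public static String languageForFile(String fileName) {
    int dot = fileName.lastIndexOf('.');
    if (dot < 0 || dot == fileName.length() - 1) {
      return "Unknown";
    }
    String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    return LANGUAGES.getOrDefault(extension, "Unknown");
  }

  /** Guesses a MIME type from the file name, falling back to a generic binary type. */
  public static String mimeTypeFor(String fileName) {
    String guess = URLConnection.guessContentTypeFromName(fileName);
    return guess != null ? guess : "application/octet-stream";
  }

  /** 8-byte big-endian encoding, the same bytes Guava's Longs.toByteArray returns. */
  public static byte[] versionBytes(long timestampMillis) {
    return ByteBuffer.allocate(8).putLong(timestampMillis).array();
  }
}
```

Big-endian encoding keeps newer non-negative timestamps lexically greater than older ones. As we understand the Cloud Search API reference, the index compares item versions as byte strings and skips queued updates whose version is not greater than the indexed one, which is why a current timestamp works as a version here.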

Next, deploy the search interface.
