将 Ruby 与 Google Data API 搭配使用

Jochen Hartmann,Google Data API 团队
2008 年 4 月

简介

Ruby 是一种动态脚本语言,近年来因热门的 Rails Web 开发框架而备受关注。本文将介绍如何使用 Ruby 与 Google Data API 服务进行交互。我们不会重点介绍 Rails,而是更侧重于说明 Feed 的底层 HTTP 命令和结构。您可以使用 Ruby 的交互式 shell irb 从命令行中执行此处提供的所有示例。

您可能还记得,在 c网址 文章中,我们提到 Google Data API 使用 Atom 发布协议 来表示、创建和更新 Web 资源。此协议的优点在于,它使用标准 HTTP 动词来制定请求,并使用标准 HTTP 状态代码来回答请求。

本文中将使用的动词包括:用于检索内容的 GET、用于上传新内容的 POST 和用于更新现有内容的 PUT。在使用 Google Data API 时,您可能会遇到一些标准代码,例如 200 表示成功检索到 Feed 或条目,201 表示成功创建或更新资源。如果出现问题(例如发送的请求格式不正确),系统会返回 400 代码(表示“请求错误”)。响应正文中会提供更详细的消息,说明具体出了什么问题。

Ruby 在“Net”模块中提供了一个不错的调试选项。为了使这些代码示例尽可能简短,我在此处未启用它。

获取和安装 Ruby

如果您使用的是 Linux,则可以使用大多数软件包管理系统来安装 Ruby。如需了解其他操作系统并获取完整源代码,请访问 http://www.ruby-lang.org/en/downloads/Irb(我们将在此类示例中使用的交互式 shell)应默认安装。如需按照此处列出的代码示例进行操作,您还需要安装 XmlSimple,这是一个用于将 XML 解析为 Ruby 数据结构的小型库。如需获取/安装 XmlSimple,请访问 http://xml-simple.rubyforge.org/

在您的机器上运行 Ruby 的副本后,您可以使用 Net:HTTP 软件包向 Google 的数据服务发出基本请求。以下代码段展示了如何从 Ruby 的交互式 shell 进行必要的导入。我们要做的是要求使用“net/http”软件包,解析 YouTube 上热门视频 Feed 的网址,然后执行 HTTP GET 请求。

irb(main):001:0> require 'net/http'
=> true
irb(main):002:0> youtube_top_rated_videos_feed_uri = \
'http://gdata.youtube.com/feeds/api/standardfeeds/top_rated'
=> "http://gdata.youtube.com/feeds/api/standardfeeds/top_rated"
irb(main):003:0> uri = \
URI.parse(youtube_top_rated_videos_feed_uri)
=> #<URI::HTTP:0xfbf826e4 URL:http://gdata.youtube.com/feeds/api/standardfeeds/top_rated>

irb(main):004:0> uri.host
=> "gdata.youtube.com"
irb(main):005:0> Net::HTTP.start(uri.host, uri.port) do |http|
irb(main):006:1* puts http.get(uri.path)
irb(main):007:1> end
#<Net::HTTPOK:0xf7ef22cc>

该请求应该在命令行中回显了大量 XML。您可能已经注意到,所有商品都包含在 <feed> 元素中,并称为 <entry> 元素。我们暂时不必担心 XML 格式,我只是想说明如何使用 HTTP 发出基本的 Google Data API 请求。我们现在将切换 API 并专注于 Google 表格,因为我们可以发送和检索的信息更适合通过命令行操作。

身份验证 | 使用 Google Spreadsheets API

我们将再次从检索条目元素的 Feed 开始。不过,这次我们想使用自己的电子表格。为此,我们必须先通过 Google 账号服务进行身份验证。

您可能还记得 GData 身份验证文档中介绍了两种向 Google 服务进行身份验证的方式。AuthSub 适用于基于 Web 的应用,简而言之,它涉及令牌交换流程。AuthSub 的真正好处在于,您的应用无需存储用户凭据。ClientLogin 适用于“已安装”的应用。在 ClientLogin 流程中,用户名和密码会通过 HTTPS 发送到 Google 的服务,同时还会发送一个用于标识您要使用的服务的字符串。Google Spreadsheets API 服务由字符串 wise 标识。

切换回交互式 shell,然后通过 Google 进行身份验证。请注意,我们使用 https 发送身份验证请求和凭据:

irb(main):008:0> require 'net/https'
=> true
irb(main):009:0> http = Net::HTTP.new('www.google.com', 443)
=> #<Net::HTTP www.google.com:443 open=false>
irb(main):010:0> http.use_ssl = true
=> true
irb(main):011:0> path = '/accounts/ClientLogin'
=> "/accounts/ClientLogin"

# Now we are passing in our actual authentication data. 
# Please visit OAuth For Installed Apps for more information 
# about the accountType parameter
irb(main):014:0> data = \
irb(main):015:0* 'accountType=HOSTED_OR_GOOGLE&Email=your email' \
irb(main):016:0* '&Passwd=your password' \
irb(main):017:0* '&service=wise'

=> accountType=HOSTED_OR_GOOGLE&Email=your email&Passwd=your password&service=wise"

# Set up a hash for the headers
irb(main):018:0> headers = \
irb(main):019:0* { 'Content-Type' => 'application/x-www-form-urlencoded'}
=> {"Content-Type"=>"application/x-www-form-urlencoded"}

# Post the request and print out the response to retrieve our authentication token
irb(main):020:0> resp, data = http.post(path, data, headers)
warning: peer certificate won't be verified in this SSL session
=> [#<Net::HTTPOK 200 OK readbody=true>, "SID=DQAAAIIAAADgV7j4F-QVQjnxdDRjpslHKC3M ... [ snipping out the rest of the authentication strings ]

# Strip out our actual token (Auth) and store it
irb(main):021:0> cl_string = data[/Auth=(.*)/, 1]
=> "DQAAAIUAAADzL... [ snip ]

# Build our headers hash and add the authorization token
irb(main):022:0> headers["Authorization"] = "GoogleLogin auth=#{cl_string}"
=> "GoogleLogin auth=DQAAAIUAAADzL... [ snip ]

好的。现在我们已经完成身份验证,接下来尝试使用以下请求来检索自己的电子表格:

http://spreadsheets.google.com/feeds/spreadsheets/private/full

由于这是一个经过身份验证的请求,我们还希望传入标头。实际上,由于我们将针对各种 Feed 发出多个请求,因此不妨将此功能封装到一个名为 get_feed 的简单函数中。

# Store the URI to the feed since we may want to use it again
irb(main):023:0> spreadsheets_uri = \
irb(main):024:0* 'http://spreadsheets.google.com/feeds/spreadsheets/private/full'

# Create a simple method to obtain a feed
irb(main):025:0> def get_feed(uri, headers=nil)
irb(main):026:1> uri = URI.parse(uri)
irb(main):027:1> Net::HTTP.start(uri.host, uri.port) do |http|
irb(main):028:2* return http.get(uri.path, headers)
irb(main):029:2> end
irb(main):030:1> end
=> nil

# Lets make a request and store the response in 'my_spreadsheets'
irb(main):031:0> my_spreadsheets = get_feed(spreadsheets_uri, headers)
=> #<Net::HTTPOK 200 OK readbody=true>

irb(main):032:0> my_spreadsheets
=> #<Net::HTTPOK 200 OK readbody=true>

# Examine our XML (showing only an excerpt here...)
irb(main):033:0> my_spreadsheets.body
=> "<?xml version='1.0' encoding='UTF-8'?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'>
<id>http://spreadsheets.google.com/feeds/spreadsheets/private/full</id><updated>2008-03-20T20:49:39.211Z</updated>
<category scheme='http://schemas.google.com/spreadsheets/2006' term='http://schemas.google.com/spreadsheets/2006#spreadsheet'/>
<title type='text'>Available Spreadsheets - test.api.jhartmann@gmail.com</title><link rel='alternate' type='text/html' href='http://docs.google.com'/>
<link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://spreadsheets.google.com/feeds/spreadsheets/private/full'/><link rel='self' type='application/atom+xml' href='http://spreadsheets.google.com/feeds/spreadsheets/private/full?tfe='/>
<openSearch:totalResults>6</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><entry>
<id>http://spreadsheets.google.com/feeds/spreadsheets/private/full/o04927555739056712307.4365563854844943790</id><updated>2008-03-19T20:44:41.055Z</updated><category scheme='http://schemas.google.com/spreadsheets/2006' term='http://schemas.google.com/spreadsheets/2006#spreadsheet'/><title type='text'>test02</title><content type='text'>test02</content><link rel='http://schemas.google.com/spreadsheets/2006#worksheetsfeed' type='application/atom+xml' href='http://spreadsheets.google.com/feeds/worksheets/o04927555739056712307.4365563854844943790/private/full'/><link rel='alternate' type='text/html' href='http://spreadsheets.google.com/ccc?key=o04927555739056712307.4365563854844943790'/><link rel='self' type='application/atom+xml' href='http://spreadsheets.google.com/feeds/spreadsheets/private/full/o04927555739056712307.4365563854844943790'/><author><name>test.api.jhartmann</name><email>test.api.jhartmann@gmail.com</email></author></entry><entry> ...

我们再次看到很多 XML,我在上面已经淡化了这些 XML,因为您无需担心从命令行解读它们。为了提高用户友好度,我们不妨使用 XmlSimple 将其解析为数据结构:

# Perform imports
irb(main):034:0> require 'rubygems'
=> true
irb(main):035:0> require 'xmlsimple'
=> true
irb(main):036:0> doc = \
irb(main):037:0* XmlSimple.xml_in(my_spreadsheets.body, 'KeyAttr' => 'name')

# Import the 'pp' module for 'pretty printing'
irb(main):038:0> require 'pp'
=> true

# 'Pretty-print' our XML document
irb(main):039:0> pp doc
{"totalResults"=>["6"],
 "category"=>
  [{"term"=>"http://schemas.google.com/spreadsheets/2006#spreadsheet",
    "scheme"=>"http://schemas.google.com/spreadsheets/2006"}],
 "title"=>
  [{"type"=>"text",
    "content"=>"Available Spreadsheets - Test-account"}],
 "startIndex"=>["1"],
 "id"=>["http://spreadsheets.google.com/feeds/spreadsheets/private/full"],
 "entry"=>
  [{"category"=>
     [{"term"=>"http://schemas.google.com/spreadsheets/2006#spreadsheet",
       "scheme"=>"http://schemas.google.com/spreadsheets/2006"}],
    "title"=>[{"type"=>"text", "content"=>"blank"}],
    "author"=>
     [{"name"=>["Test-account"],
       "email"=>["my email"]}],
    "id"=>
     ["http://spreadsheets.google.com/feeds/spreadsheets/private/full/o04927555739056712307.3387874275736238738"],
    "content"=>{"type"=>"text", "content"=>"blank"},
    "link"=>
    [ snipping out the rest of the XML ]

获取工作表

因此,如上方的输出所示,我的 Feed 包含 6 个电子表格。为简短起见,我已截断上述 XML 输出的其余部分(以及大多数其他列表中的输出)。如需深入了解此电子表格,我们需要执行以下几个步骤:

  1. 获取电子表格键
  2. 使用电子表格密钥获取我们的工作表 Feed
  3. 获取要使用的工作表的 ID
  4. 请求 cellsFeed 或 listFeed 以访问工作表的实际内容

这可能听起来工作量很大,但我会向您展示,如果我们编写一些简单的方法,这一切都会变得非常简单。cellsFeed 和 listFeed 是工作表的实际单元格内容的两种不同表示形式。listFeed 表示一整行信息,建议用于 POST 新数据。cellFeed 表示单个单元格,用于单个单元格更新或多个单个单元格的批量更新(均使用 PUT)。如需了解详情,请参阅 Google Spreadsheets API 文档

首先,我们需要提取电子表格键(在上面的 XML 输出中突出显示),然后获取工作表 Feed:

# Extract the spreadsheet key from our datastructure
irb(main):040:0> spreadsheet_key = \ 
irb(main):041:0* doc["entry"][0]["id"][0][/full\/(.*)/, 1]
=> "o04927555739056712307.3387874275736238738"

# Using our get_feed method, let's obtain the worksheet feed
irb(main):042:0> worksheet_feed_uri = \ 
irb(main):043:0* "http://spreadsheets.google.com/feeds/worksheets/#{spreadsheet_key}/private/full"
=> "http://spreadsheets.google.com/feeds/worksheets/o04927555739056712307.3387874275736238738/private/full"

irb(main):044:0> worksheet_response = get_feed(worksheet_feed_uri, headers)
=> #<Net::HTTPOK 200 OK readbody=true>

# Parse the XML into a datastructure
irb(main):045:0> worksheet_data = \ 
irb(main):046:0* XmlSimple.xml_in(worksheet_response.body, 'KeyAttr' => 'name')
=> {"totalResults"=>["1"], "category"=>[{"term ... [ snip ]

# And pretty-print it
irb(main):047:0> pp worksheet_data
{"totalResults"=>["1"],
 "category"=>
  [{"term"=>"http://schemas.google.com/spreadsheets/2006#worksheet",
    "scheme"=>"http://schemas.google.com/spreadsheets/2006"}],
 "title"=>[{"type"=>"text", "content"=>"blank"}],
 "author"=>
  [{"name"=>["test.api.jhartmann"],
    "email"=>["test.api.jhartmann@gmail.com"]}],
 "startIndex"=>["1"],
 "id"=>
  ["http://spreadsheets.google.com/feeds/worksheets/o04927555739056712307.3387874275736238738/private/full"],
 "entry"=>
  [{"category"=>
     [{"term"=>"http://schemas.google.com/spreadsheets/2006#worksheet",
       "scheme"=>"http://schemas.google.com/spreadsheets/2006"}],
    "title"=>[{"type"=>"text", "content"=>"Sheet 1"}],
    "rowCount"=>["100"],
    "colCount"=>["20"],
    "id"=>
     ["http://spreadsheets.google.com/feeds/worksheets/o04927555739056712307.3387874275736238738/private/full/od6"],
    "content"=>{"type"=>"text", "content"=>"Sheet 1"},
    "link"=>
     [{"href"=>
        "http://spreadsheets.google.com/feeds/list/o04927555739056712307.3387874275736238738/od6/private/full",
       "rel"=>"http://schemas.google.com/spreadsheets/2006#listfeed",
       "type"=>"application/atom+xml"},
      {"href"=>
        "http://spreadsheets.google.com/feeds/cells/o04927555739056712307.3387874275736238738/od6/private/full",
       "rel"=>"http://schemas.google.com/spreadsheets/2006#cellsfeed",
       "type"=>"application/atom+xml"},
    [ snip: cutting off the rest of the XML ]

如上图所示,我们现在可以找到用于访问 listFeed 和 cellsFeed 的链接 (highlighted above)。在深入了解 listFeed 之前,我先快速说明一下示例电子表格中目前存在哪些数据,以便您了解我们要查找的内容:

我们的电子表格非常简单,如下所示:

language网站
Javahttp://java.com
phphttp://php.net

以下是相应数据在 listFeed 中的显示效果:

irb(main):048:0> listfeed_uri = \
irb(main):049:0* worksheet_data["entry"][0]["link"][0]["href"]
=> "http://spreadsheets.google.com/feeds/list/o04927555739056712307.3387874275736238738/od6/private/full"

irb(main):050:0> response = get_feed(listfeed_uri, headers)
=> #<Net::HTTPOK 200 OK readbody=true>
irb(main):051:0> listfeed_doc = \ 
irb(main):052:0* XmlSimple.xml_in(response.body, 'KeyAttr' => 'name')
=> {"totalResults"=>["2"], "category"=>[{"term" ... [ snip ]

# Again we parse the XML and then pretty print it
irb(main):053:0> pp listfeed_doc
{"totalResults"=>["2"],
 "category"=>
  [{"term"=>"http://schemas.google.com/spreadsheets/2006#list",
    "scheme"=>"http://schemas.google.com/spreadsheets/2006"}],
 "title"=>[{"type"=>"text", "content"=>"Programming language links"}],
 "author"=>
  [{"name"=>["test.api.jhartmann"],
    "email"=>["test.api.jhartmann@gmail.com"]}],
 "startIndex"=>["1"],
 "id"=>
  ["http://spreadsheets.google.com/feeds/list/o04927555739056712307.3387874275736238738/od6/private/full"],
 "entry"=>
  [{"category"=>
     [{"term"=>"http://schemas.google.com/spreadsheets/2006#list",
       "scheme"=>"http://schemas.google.com/spreadsheets/2006"}],
    "language"=>["java"],
    "title"=>[{"type"=>"text", "content"=>"ruby"}],
    "website"=>["http://java.com"],
    "id"=>
     ["http://spreadsheets.google.com/feeds/list/o04927555739056712307.3387874275736238738/od6/private/full/cn6ca"],
    "content"=>
     {"type"=>"text", "content"=>"website: http://java.com"},
    "link"=>
     [{"href"=>
        "http://spreadsheets.google.com/feeds/list/o04927555739056712307.3387874275736238738/od6/private/full/cn6ca",
       "rel"=>"self",
       "type"=>"application/atom+xml"},
      {"href"=>
        "http://spreadsheets.google.com/feeds/list/o04927555739056712307.3387874275736238738/od6/private/full/cn6ca/1j81anl6096",
       "rel"=>"edit",
       "type"=>"application/atom+xml"}],
    "updated"=>["2008-03-20T22:19:51.739Z"]},
   {"category"=>
     [{"term"=>"http://schemas.google.com/spreadsheets/2006#list",
       "scheme"=>"http://schemas.google.com/spreadsheets/2006"}],
    "language"=>["php"],
    "title"=>[{"type"=>"text", "content"=>"php"}],
    "website"=>["http://php.net"],
    "id"=>
     ["http://spreadsheets.google.com/feeds/list/o04927555739056712307.3387874275736238738/od6/private/full/cokwr"],
    "content"=>{"type"=>"text", "content"=>"website: http://php.net"},
    [ snip ]

如您所见,listFeed 通过为每一行创建一个条目来返回工作表的内容。它会假定电子表格的第一行包含单元格标题,然后根据该行中的数据动态生成 XML 标题。查看实际的 XML 将有助于进一步说明这一点:

<?xml version='1.0' encoding='UTF-8'?><feed [ snip namespaces ]>
<id>http://spreadsheets.google.com/feeds/list/o04927555739056712307.3387874275736238738/od6/private/full</id>
<updated>2008-03-20T22:19:51.739Z</updated>
<category scheme='http://schemas.google.com/spreadsheets/2006' term='http://schemas.google.com/spreadsheets/2006#list'/>

<title type='text'>Programming language links</title>
[ snip: cutting out links and author information ]
<entry>
    <id>http://spreadsheets.google.com/feeds/list/o04927555739056712307.3387874275736238738/od6/private/full/cn6ca</id>
    [ snip: updated and category ]
    <title type='text'>java</title>
    <content type='text'>website: http://java.com</content>
    <link rel='self' type='application/atom+xml' href='http://spreadsheets.google.com/feeds/list/o04927555739056712307.3387874275736238738/od6/private/full/cn6ca'/>
    <link rel='edit' type='application/atom+xml' href='http://spreadsheets.google.com/feeds/list/o04927555739056712307.3387874275736238738/od6/private/full/cn6ca/1j81anl6096'/>
    <gsx:language>java</gsx:language>
    <gsx:website>http://java.com</gsx:website>
</entry>
<entry>
    <id>http://spreadsheets.google.com/feeds/list/o04927555739056712307.3387874275736238738/od6/private/full/cokwr</id>
    [ snip: updated and category ]
    <title type='text'>php</title>
    <content type='text'>website: http://php.net</content>
    <link rel='self' type='application/atom+xml' href='http://spreadsheets.google.com/feeds/list/o04927555739056712307.3387874275736238738/od6/private/full/cokwr'/>
    <link rel='edit' type='application/atom+xml' href='http://spreadsheets.google.com/feeds/list/o04927555739056712307.3387874275736238738/od6/private/full/cokwr/41677fi0nc'/>
    <gsx:language>php</gsx:language>
    <gsx:website>http://php.net</gsx:website>
</entry>
</feed>

为了便于快速比较,我们来看看 cellsFeed 中如何表示相同的信息:

# Extract the cellfeed link
irb(main):054:0> cellfeed_uri = \
irb(main):055:0* worksheet_data["entry"][0]["link"][1]["href"]
=> "http://spreadsheets.google.com/feeds/cells/o04927555739056712307.3387874275736238738/od6/private/full"
irb(main):056:0> response = \ 
irb(main):057:0* get_feed(cellfeed_uri, headers)
=> #<Net::HTTPOK 200 OK readbody=true>

# Parse into datastructure and print
irb(main):058:0> cellfeed_doc = \ 
irb(main):059:0* XmlSimple.xml_in(response.body, 'KeyAttr' => 'name')
=> {"totalResults"=>["6"], [ snip ]

irb(main):060:0> pp cellfeed_doc
{"totalResults"=>["6"],
 "category"=>
  [{"term"=>"http://schemas.google.com/spreadsheets/2006#cell",
    "scheme"=>"http://schemas.google.com/spreadsheets/2006"}],
 "title"=>[{"type"=>"text", "content"=>"Programming language links"}],
 "rowCount"=>["101"],
 "colCount"=>["20"],
 "author"=>
  [{"name"=>["test.api.jhartmann"],
    "email"=>["test.api.jhartmann@gmail.com"]}],
 "startIndex"=>["1"],
 "id"=>
  ["http://spreadsheets.google.com/feeds/cells/o04927555739056712307.3387874275736238738/od6/private/full"],
 "entry"=>
  [{"category"=>
     [{"term"=>"http://schemas.google.com/spreadsheets/2006#cell",
       "scheme"=>"http://schemas.google.com/spreadsheets/2006"}],
    "cell"=>
     [{"col"=>"1",
       "row"=>"1",
       "content"=>"language",
       "inputValue"=>"language"}],
    "title"=>[{"type"=>"text", "content"=>"A1"}],
    "id"=>
     ["http://spreadsheets.google.com/feeds/cells/o04927555739056712307.3387874275736238738/od6/private/full/R1C1"],
    "content"=>{"type"=>"text", "content"=>"language"},
    "link"=>
     [{"href"=>
        "http://spreadsheets.google.com/feeds/cells/o04927555739056712307.3387874275736238738/od6/private/full/R1C1",
       "rel"=>"self",
       "type"=>"application/atom+xml"},
      {"href"=>
        "http://spreadsheets.google.com/feeds/cells/o04927555739056712307.3387874275736238738/od6/private/full/R1C1/8srvbs",
       "rel"=>"edit",
       "type"=>"application/atom+xml"}],
    "updated"=>["2008-03-20T22:19:51.739Z"]},
    [ snip ]

如您所见,系统返回了 6 个条目,每个单元格对应一个条目。我已剪掉除包含“language”一词的单元格 A1 的值之外的所有其他输出。另请注意上方显示的修改链接。此链接末尾包含一个版本字符串 (8srvbs)。在更新单元格数据时,版本字符串非常重要,我们将在本文末尾进行此操作。这样可确保更新不会被覆盖。每当您发出 PUT 请求来更新单元格数据时,都必须在请求中包含相应单元格的最新版本字符串。每次更新后,系统都会返回新的版本字符串。

将内容发布到 listFeed

要发布内容,我们首先需要 listFeed 的 POST 链接。请求列表 Feed 时,系统会返回此链接。它将包含网址 http://schemas.google.com/g/2005#post 作为 rel 属性的值。您需要解析此链接元素并提取其 href 属性。首先,我们将创建一个小方法来简化发布操作:

irb(main):061:0> def post(uri, data, headers)
irb(main):062:1> uri = URI.parse(uri)
irb(main):063:1> http = Net::HTTP.new(uri.host, uri.port)
irb(main):064:1> return http.post(uri.path, data, headers)
irb(main):065:1> end
=> nil
# Set up our POST url
irb(main):066:0> post_url = \ 
irb(main):067:0* "http://spreadsheets.google.com/feeds/list/o04927555739056712307.3387874275736238738/od6/private/full"
=> "http://spreadsheets.google.com/feeds/list/o04927555739056712307.3387874275736238738/od6/private/full"

# We must use 'application/atom+xml' as MIME type so let's change our headers 
# which were still set to 'application/x-www-form-urlencoded' when we sent our 
# ClientLogin information over https
irb(main):068:0> headers["Content-Type"] = "application/atom+xml"
=> "application/atom+xml"

# Setting up our data to post, using proper namespaces
irb(main):069:0> new_row = \ 
irb(main):070:0* '<atom:entry xmlns:atom="http://www.w3.org/2005/Atom">' << 
irb(main):071:0* '<gsx:language xmlns:gsx="http://schemas.google.com/spreadsheets/2006/extended">' <<
irb(main):072:0* 'ruby</gsx:language>' << 
irb(main):073:0* '<gsx:website xmlns:gsx="http://schemas.google.com/spreadsheets/2006/extended">' <<
irb(main):074:0* 'http://ruby-lang.org</gsx:website>' << 
irb(main):075:0* '</atom:entry>'
=> "<atom:entry xmlns:atom=\"http://www.w3.org/2005/Atom\"><gsx:language ... [ snip ] 

# Performing the post
irb(main):076:0> post_response = post(post_url, new_row, headers) 
=> #<Net::HTTPCreated 201 Created readbody=true>

状态 201 表示我们的发布操作已成功。

使用 cellsFeed 更新内容

文档中可以看出,单元格 Feed 倾向于对现有内容使用 PUT 请求。但是,由于我们之前从 cellsFeed 中检索到的信息只是实际电子表格中已有的数据,我们如何添加新信息呢?我们只需针对要输入数据的每个空单元格发出请求即可。以下代码段展示了如何检索空单元格 R5C1(第 5 行第 1 列),我们希望在其中插入一些有关 Python 编程语言的信息。

我们的原始变量 cellfeed_uri 仅包含 cellfeed 本身的 URI。现在,我们想要附加要编辑的单元格,并获取该单元格的版本字符串以进行编辑:

# Set our query URI
irb(main):077:0> cellfeed_query = cellfeed_uri + '/R5C1'
=> "http://spreadsheets.google.com/feeds/cells/o04927555739056712307.3387874275736238738/od6/private/full/R5C1"

# Request the information to extract the edit link
irb(main):078:0> cellfeed_data = get_feed(cellfeed_query, headers)
=> #<Net::HTTPOK 200 OK readbody=true>
irb(main):079:0> cellfeed_data.body
=> "<?xml version='1.0' encoding='UTF-8'?>
<entry xmlns='http://www.w3.org/2005/Atom' xmlns:gs='http://schemas.google.com/spreadsheets/2006' xmlns:batch='http://schemas.google.com/gdata/batch'>
<id>http://spreadsheets.google.com/feeds/cells/o04927555739056712307.3387874275736238738/od6/private/full/R5C1</id>
<updated>2008-03-24T21:55:36.462Z</updated>
<category scheme='http://schemas.google.com/spreadsheets/2006' term='http://schemas.google.com/spreadsheets/2006#cell'/>
<title type='text'>A5</title>
<content type='text'>
</content>
<link rel='self' type='application/atom+xml' href='http://spreadsheets.google.com/feeds/cells/o04927555739056712307.3387874275736238738/od6/private/full/R5C1'/>
<link rel='edit' type='application/atom+xml' href='http://spreadsheets.google.com/feeds/cells/o04927555739056712307.3387874275736238738/od6/private/full/R5C1/47pc'/>
<gs:cell row='5' col='1' inputValue=''>
</gs:cell>
</entry>"

如上面的代码清单所示,版本字符串为 47pc。(您可能需要一直向右滚动。)为了简化操作,我们来创建一个便捷方法,用于获取我们感兴趣的任何单元格的版本字符串:

irb(main):080:0> def get_version_string(uri, headers=nil)
irb(main):081:1> response = get_feed(uri, headers)
irb(main):082:1> require 'rexml/document'
irb(main):083:1> xml = REXML::Document.new response.body
irb(main):084:1> edit_link = REXML::XPath.first(xml, '//[@rel="edit"]')
irb(main):085:1> edit_link_href = edit_link.attribute('href').to_s
irb(main):086:1> return edit_link_href.split(/\//)[10]
irb(main):087:1> end
=> nil

# A quick test
irb(main):088:0> puts get_version_string(cellfeed_query, headers)
47pc
=> nil

既然如此,我们不妨再编写一个方法来执行 PUT 请求,或者更好的是,编写一个方法来执行整个批量更新。我们的函数将采用包含以下变量的哈希数组:

  • :batch_id - 批处理请求中每个部分的唯一标识符。
  • :cell_id - 要更新的单元格的 ID,采用 R#C# 格式,其中单元格 A1 表示为 R1C1。
  • :data - 要插入的数据。

irb(main):088:0> def batch_update(batch_data, cellfeed_uri, headers)
irb(main):089:1> batch_uri = cellfeed_uri + '/batch'
irb(main):090:1> batch_request = <<FEED
irb(main):091:1" <?xml version="1.0" encoding="utf-8"?> \
irb(main):092:1" <feed xmlns="http://www.w3.org/2005/Atom" \
irb(main):093:1" xmlns:batch="http://schemas.google.com/gdata/batch" \
irb(main):094:1" xmlns:gs="http://schemas.google.com/spreadsheets/2006" \
irb(main):095:1" xmlns:gd="http://schemas.google.com/g/2005">
irb(main):096:1" <id>#{cellfeed_uri}</id>
irb(main):097:1" FEED
irb(main):098:1> batch_data.each do |batch_request_data|
irb(main):099:2* version_string = get_version_string(cellfeed_uri + '/' + batch_request_data[:cell_id], headers)
irb(main):100:2> data = batch_request_data[:data]
irb(main):101:2> batch_id = batch_request_data[:batch_id]
irb(main):102:2> cell_id = batch_request_data[:cell_id]
irb(main):103:2> row = batch_request_data[:cell_id][1,1]
irb(main):104:2> column = batch_request_data[:cell_id][3,1]
irb(main):105:2> edit_link = cellfeed_uri + '/' + cell_id + '/' + version_string
irb(main):106:2> batch_request<< <<ENTRY
irb(main):107:2" <entry>
irb(main):108:2" <gs:cell col="#{column}" inputValue="#{data}" row="#{row}"/>
irb(main):109:2" <batch:id>#{batch_id}</batch:id>
irb(main):110:2" <batch:operation type="update" />
irb(main):111:2" <id>#{cellfeed_uri}/#{cell_id}</id>
irb(main):112:2" <link href="#{edit_link}" rel="edit" type="application/atom+xml" />
irb(main):113:2" </entry>
irb(main):114:2" ENTRY
irb(main):115:2> end
irb(main):116:1> batch_request << '</feed>'
irb(main):117:1> return post(batch_uri, batch_request, headers)
irb(main):118:1> end
=> nil

# Our sample batch data to insert information about the Python programming language into our worksheet
irb(main):119:0> batch_data = [ \
irb(main):120:0* {:batch_id => 'A', :cell_id => 'R5C1', :data => 'Python'}, \ 
irb(main):121:0* {:batch_id => 'B', :cell_id => 'R5C2', :data => 'http://python.org' } ]
=> [{:cell_id=>"R5C1", :data=>"Python", :batch_id=>"A"}=>{:cell_id=>"R5C2", :data=>"http://python.org", :batch_id=>"B"}]

# Perform the update
irb(main):122:0> response = batch_update(batch_data, cellfeed_uri, headers)
=> #<Net::HTTPOK 200 OK readbody=true>

# Parse the response.body XML and print it
irb(main):123:0> response_xml = XmlSimple.xml_in(response.body, 'KeyAttr' => 'name')
=> [ snip ]

irb(main):124:0> pp response_xml
{"title"=>[{"type"=>"text", "content"=>"Batch Feed"}],
 "xmlns:atom"=>"http://www.w3.org/2005/Atom",
 "id"=>
  ["http://spreadsheets.google.com/feeds/cells/o04927555739056712307.3387874275736238738/od6/private/full"],
 "entry"=>
  [{"status"=>[{"code"=>"200", "reason"=>"Success"}],
    "category"=>
     [{"term"=>"http://schemas.google.com/spreadsheets/2006#cell",
       "scheme"=>"http://schemas.google.com/spreadsheets/2006"}],
    "cell"=>
     [{"col"=>"1", "row"=>"5", "content"=>"Python", "inputValue"=>"Python"}],
    "title"=>[{"type"=>"text", "content"=>"A5"}],
    "id"=>
     ["http://spreadsheets.google.com/feeds/cells/o04927555739056712307.3387874275736238738/od6/private/full/R5C1",
      "A"],
    "operation"=>[{"type"=>"update"}],
    "content"=>{"type"=>"text", "content"=>"Python"},
    "link"=>
     [{"href"=>
        "http://spreadsheets.google.com/feeds/cells/o04927555739056712307.3387874275736238738/od6/private/full/R5C1",
       "rel"=>"self",
       "type"=>"application/atom+xml"},
      {"href"=>
        "http://spreadsheets.google.com/feeds/cells/o04927555739056712307.3387874275736238738/od6/private/full/R5C1/49kwzg",
       "rel"=>"edit",
       "type"=>"application/atom+xml"}],
    "updated"=>["2008-03-27T15:48:48.470Z"]},
    [ snip ]

如您所见,我们收到了 200 OK 响应代码,因此批处理请求成功。解析响应 XML 后,我们可以看到,系统会针对我们在 response_data 数组中设置的每个单独的 :batch_id 返回一条单独的消息。如需详细了解批量处理,请参阅 GData 中的批量处理文档。

总结

如您所见,使用 Ruby 的交互式 shell 试用 Google Data API 非常简单。我们能够使用 listFeed 和 cellsFeed 访问电子表格和工作表。此外,我们还使用 POST 请求插入了一些新数据,然后编写了方法来执行批量更新,代码只有大约 120 行。从这一点来看,将这些简单的方法封装到类中并构建可重复使用的框架应该不会太难。

如果您在将这些工具与您喜爱的 Google Data API 搭配使用时遇到任何问题,请加入我们的论坛

您可以在 http://code.google.com/p/google-data-samples-ruby 中找到包含上述代码示例的类文件

讨论本文!