Latest news: as of 2021-09-17 this site has moved from crifan.com to crifan.org

[Record] Using Python to fetch the categories of the WordPress site crifan.org and export them as JSON

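Python crifan 507 views 0 comments
Background:
After
[Record] Continuing with D3 visualization: visualizing the categories of my own WordPress site crifan.org
I wanted to do the following:
Using Python,
ideally: call an API of the WordPress site, the way WLW (Windows Live Writer) does, to get my own categories
at worst: crawl the category pages of my own crifan.org
then export the categories of the WordPress site that I need
and generate the corresponding JSON
-> which D3 can then use to visualize the WordPress site's categories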
wordpress api categories
Categories API Reference | WP REST API v2 Documentation
Developer Resources | Create cool applications that integrate with WordPress.com
https://developer.wordpress.com/docs/api/1.1/get/sites/$site/categories/slug:$category/
Getting Started with the API | Developer Resources
Developer Resources | Create cool applications that integrate with WordPress.com
How to retrieve a list of categories/ tag in WordPress REST API – Stack Overflow
I gave these a try:
http://blog.crifan.org/?json=get_category_index
https://www.crifan.org/?json=get_category_index
Neither of them would open...
Developer Resources | Create cool applications that integrate with WordPress.com
https://developer.wordpress.com/docs/api/1.1/get/sites/$site/categories/
https://public-api.wordpress.com/rest/v1.1/sites/www.crifan.org/categories
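This one does return data:
->
After formatting:
But the data looks incomplete.
For example:
the category whose link is:
https://www.crifan.org/category/know_how/
exists on the site,
yet it is missing from the returned categories.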
get categories – Get subcategories with JSON API plugin – WordPress Development Stack Exchange
php – How to get list of categories of a wordpress blog using wordpress REST api – Stack Overflow
->
In other words:
there is a WordPress plugin called JSON API.
Let me look for it:
JSON API — WordPress Plugins
Categories API Reference | WP REST API v2 Documentation
https://www.crifan.org/wp-json/wp/v2/categories
This returned an error:
{"code":"rest_no_route","message":"\u672a\u627e\u5230\u5339\u914dURL\u548c\u8bf7\u6c42\u65b9\u5f0f\u7684\u8def\u7531\u3002","data":{"status":404}}
However:
https://www.crifan.org/wp-json/
returned:
{“name”:”\u5728\u8def\u4e0a”,”description”:”on the way – \u8d70\u522b\u4eba\u6ca1\u8d70\u8fc7\u7684\u8def\uff0c\u8ba9\u522b\u4eba\u6709\u8def\u53ef\u8d70″,”url”:”https:\/\/www.crifan.org”,”home”:”https:\/\/www.crifan.org”,”namespaces”:[“oembed\/1.0″],”authentication”:[],”routes”:{“\/”:{“namespace”:””,”methods”:[“GET”],”endpoints”:[{“methods”:[“GET”],”args”:{“context”:{“required”:false,”default”:”view”}}}],”_links”:{“self”:”https:\/\/www.crifan.org\/wp-json\/”}},”\/oembed\/1.0″:{“namespace”:”oembed\/1.0″,”methods”:[“GET”],”endpoints”:[{“methods”:[“GET”],”args”:{“namespace”:{“required”:false,”default”:”oembed\/1.0″},”context”:{“required”:false,”default”:”view”}}}],”_links”:{“self”:”https:\/\/www.crifan.org\/wp-json\/oembed\/1.0″}},”\/oembed\/1.0\/embed”:{“namespace”:”oembed\/1.0″,”methods”:[“GET”],”endpoints”:[{“methods”:[“GET”],”args”:{“url”:{“required”:true},”format”:{“required”:false,”default”:”json”},”maxwidth”:{“required”:false,”default”:600}}}],”_links”:{“self”:”https:\/\/www.crifan.org\/wp-json\/oembed\/1.0\/embed”}}},”_links”:{“help”:[{“href”:”http:\/\/v2.wp-api.org\/”}]}}
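The index output above also explains the rest_no_route error: its "namespaces" list contains only oembed/1.0, so no wp/v2 routes were registered on the site at that point. A minimal sketch of checking this in Python before calling an endpoint; the helper name has_namespace and the trimmed sample_index string are made up for illustration:

```python
import json


def has_namespace(index_json: str, namespace: str = "wp/v2") -> bool:
    """Return True if the /wp-json/ index lists the given namespace."""
    index = json.loads(index_json)
    return namespace in index.get("namespaces", [])


# Trimmed stand-in for the index shown above (hypothetical sample data):
# only the oembed/1.0 namespace is registered.
sample_index = '{"name": "demo", "namespaces": ["oembed/1.0"], "routes": {}}'

print(has_namespace(sample_index))                # wp/v2 is not available
print(has_namespace(sample_index, "oembed/1.0"))  # oembed/1.0 is available
```

If wp/v2 is missing from the list, /wp-json/wp/v2/categories cannot work until something registers those routes.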
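Later I tried the same endpoint on the demo site (the links in the output below all point to demo.wp-api.org):
https://demo.wp-api.org/wp-json/wp/v2/categories
and this time there was output: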
[{“id”:7,”count”:3,”description”:””,”link”:”https:\/\/demo.wp-api.org\/category\/liveblogs\/apple-event\/”,”name”:”Apple Event”,”slug”:”apple-event”,”taxonomy”:”category”,”parent”:6,”_links”:{“self”:[{“href”:”https:\/\/demo.wp-api.org\/wp-json\/wp\/v2\/categories\/7″}],”collection”:[{“href”:”https:\/\/demo.wp-api.org\/wp-json\/wp\/v2\/categories”}],”about”:[{“href”:”https:\/\/demo.wp-api.org\/wp-json\/wp\/v2\/taxonomies\/category”}],”up”:[{“embeddable”:true,”href”:”https:\/\/demo.wp-api.org\/wp-json\/wp\/v2\/categories\/6″}],”wp:post_type”:[{“href”:”https:\/\/demo.wp-api.org\/wp-json\/wp\/v2\/posts?categories=7″}],”curies”:[{“name”:”wp”,”href”:”https:\/\/api.w.org\/{rel}”,”templated”:true}]}},{“id”:4,”count”:19,”description”:””,”link”:”https:\/\/demo.wp-api.org\/category\/even\/”,”name”:”Even”,”slug”:”even”,”taxonomy”:”category”,”parent”:0,”_links”:{“self”:[{“href”:”https:\/\/demo.wp-api.org\/wp-json\/wp\/v2\/categories\/4″}],”collection”:[{“href”:”https:\/\/demo.wp-api.org\/wp-json\/wp\/v2\/categories”}],”about”:[{“href”:”https:\/\/demo.wp-api.org\/wp-json\/wp\/v2\/taxonomies\/category”}],”wp:post_type”:[{“href”:”https:\/\/demo.wp-api.org\/wp-json\/wp\/v2\/posts?categories=4″}],”curies”:[{“name”:”wp”,”href”:”https:\/\/api.w.org\/{rel}”,”templated”:true}]}},{“id”:6,”count”:0,”description”:””,”link”:”https:\/\/demo.wp-api.org\/category\/liveblogs\/”,”name”:”Liveblogs”,”slug”:”liveblogs”,”taxonomy”:”category”,”parent”:0,”_links”:{“self”:[{“href”:”https:\/\/demo.wp-api.org\/wp-json\/wp\/v2\/categories\/6″}],”collection”:[{“href”:”https:\/\/demo.wp-api.org\/wp-json\/wp\/v2\/categories”}],”about”:[{“href”:”https:\/\/demo.wp-api.org\/wp-json\/wp\/v2\/taxonomies\/category”}],”wp:post_type”:[{“href”:”https:\/\/demo.wp-api.org\/wp-json\/wp\/v2\/posts?categories=6″}],”curies”:[{“name”:”wp”,”href”:”https:\/\/api.w.org\/{rel}”,”templated”:true}]}},{“id”:5,”count”:35,”description”:””,”link”:”https:\/\/demo.wp-api.org\/category\/odd\/”,”name”:”Odd”,”slug”:”odd”,”taxonomy”:”catego
ry”,”parent”:0,”_links”:{“self”:[{“href”:”https:\/\/demo.wp-api.org\/wp-json\/wp\/v2\/categories\/5″}],”collection”:[{“href”:”https:\/\/demo.wp-api.org\/wp-json\/wp\/v2\/categories”}],”about”:[{“href”:”https:\/\/demo.wp-api.org\/wp-json\/wp\/v2\/taxonomies\/category”}],”wp:post_type”:[{“href”:”https:\/\/demo.wp-api.org\/wp-json\/wp\/v2\/posts?categories=5″}],”curies”:[{“name”:”wp”,”href”:”https:\/\/api.w.org\/{rel}”,”templated”:true}]}},{“id”:1,”count”:64,”description”:””,”link”:”https:\/\/demo.wp-api.org\/category\/uncategorized\/”,”name”:”Uncategorized”,”slug”:”uncategorized”,”taxonomy”:”category”,”parent”:0,”_links”:{“self”:[{“href”:”https:\/\/demo.wp-api.org\/wp-json\/wp\/v2\/categories\/1″}],”collection”:[{“href”:”https:\/\/demo.wp-api.org\/wp-json\/wp\/v2\/categories”}],”about”:[{“href”:”https:\/\/demo.wp-api.org\/wp-json\/wp\/v2\/taxonomies\/category”}],”wp:post_type”:[{“href”:”https:\/\/demo.wp-api.org\/wp-json\/wp\/v2\/posts?categories=1″}],”curies”:[{“name”:”wp”,”href”:”https:\/\/api.w.org\/{rel}”,”templated”:true}]}}]
REST API Resources | Developer Resources
Taxonomy
View and manage a site’s tags and categories.
Resource    Description
GET /sites/$site/post-types/$post_type/taxonomies    Get a list of taxonomies associated with a post type.
GET /sites/$site/categories/slug:$category    Get information about a single category.
POST /sites/$site/categories/slug:$category    Edit a category.
GET /sites/$site/categories    Get a list of a site’s categories.
GET /sites/$site/tags    Get a list of a site’s tags.
GET /sites/$site/tags/slug:$tag    Get information about a single tag.
POST /sites/$site/tags/slug:$tag    Edit a tag.
POST /sites/$site/categories/new    Create a new category.
POST /sites/$site/tags/new    Create a new tag.
POST /sites/$site/categories/slug:$category/delete    Delete a category.
POST /sites/$site/tags/slug:$tag/delete    Delete a tag.
GET /sites/$site/taxonomies/$taxonomy/terms    Get a list of a site’s terms by taxonomy.
GET /sites/$site/taxonomies/$taxonomy/terms/slug:$slug    Get information about a single term.
POST /sites/$site/taxonomies/$taxonomy/terms/slug:$slug    Edit a term.
POST /sites/$site/taxonomies/$taxonomy/terms/new    Create a new term.
POST /sites/$site/taxonomies/$taxonomy/terms/slug:$slug/delete    Delete a term.
Developer Resources | Create cool applications that integrate with WordPress.com
-》
https://public-api.wordpress.com/rest/v1.1/sites/www.crifan.org/post-types/post/taxonomies
Returned:
{“found”:3,”taxonomies”:[{“name”:”category”,”label”:”\u5206\u7c7b\u76ee\u5f55″,”labels”:{“name”:”\u5206\u7c7b\u76ee\u5f55″,”singular_name”:”\u5206\u7c7b\u76ee\u5f55″,”search_items”:”\u641c\u7d22\u5206\u7c7b\u76ee\u5f55″,”popular_items”:null,”all_items”:”\u6240\u6709\u5206\u7c7b\u76ee\u5f55″,”parent_item”:”\u7236\u7ea7\u5206\u7c7b\u76ee\u5f55″,”parent_item_colon”:”\u7236\u7ea7\u5206\u7c7b\u76ee\u5f55\uff1a”,”edit_item”:”\u7f16\u8f91\u5206\u7c7b\u76ee\u5f55″,”view_item”:”\u67e5\u770b\u5206\u7c7b\u76ee\u5f55″,”update_item”:”\u66f4\u65b0\u5206\u7c7b\u76ee\u5f55″,”add_new_item”:”\u6dfb\u52a0\u65b0\u5206\u7c7b\u76ee\u5f55″,”new_item_name”:”\u65b0\u5206\u7c7b\u76ee\u5f55\u540d”,”separate_items_with_commas”:null,”add_or_remove_items”:null,”choose_from_most_used”:null,”not_found”:”\u672a\u627e\u5230\u5206\u7c7b\u3002″,”no_terms”:”\u6ca1\u6709\u5206\u7c7b\u76ee\u5f55″,”items_list_navigation”:”\u5206\u7c7b\u5217\u8868\u5bfc\u822a”,”items_list”:”\u5206\u7c7b\u5217\u8868″,”menu_name”:”\u5206\u7c7b\u76ee\u5f55″,”name_admin_bar”:”category”},”description”:””,”hierarchical”:true,”public”:true,”capabilities”:{“manage_terms”:”manage_categories”,”edit_terms”:”manage_categories”,”delete_terms”:”manage_categories”,”assign_terms”:”edit_posts”}},{“name”:”post_tag”,”label”:”\u6807\u7b7e”,”labels”:{“name”:”\u6807\u7b7e”,”singular_name”:”\u6807\u7b7e”,”search_items”:”\u641c\u7d22\u6807\u7b7e”,”popular_items”:”\u70ed\u95e8\u6807\u7b7e”,”all_items”:”\u6240\u6709\u6807\u7b7e”,”parent_item”:null,”parent_item_colon”:null,”edit_item”:”\u7f16\u8f91\u6807\u7b7e”,”view_item”:”\u67e5\u770b\u6807\u7b7e”,”update_item”:”\u66f4\u65b0\u6807\u7b7e”,”add_new_item”:”\u6dfb\u52a0\u65b0\u6807\u7b7e”,”new_item_name”:”\u65b0\u6807\u7b7e\u540d”,”separate_items_with_commas”:”\u591a\u4e2a\u6807\u7b7e\u8bf7\u7528\u82f1\u6587\u9017\u53f7\uff08,\uff09\u5206\u5f00″,”add_or_remove_items”:”\u6dfb\u52a0\u6216\u5220\u9664\u6807\u7b7e”,”choose_from_most_used”:”\u4ece\u5e38\u7528\u6807\u7b7e\u4e2d\u9009\u62e9″,”not_found”:”\u6
72a\u627e\u5230\u6807\u7b7e\u3002″,”no_terms”:”\u6ca1\u6709\u6807\u7b7e”,”items_list_navigation”:”\u6807\u7b7e\u5217\u8868\u5bfc\u822a”,”items_list”:”\u6807\u7b7e\u5217\u8868″,”menu_name”:”\u6807\u7b7e”,”name_admin_bar”:”post_tag”},”description”:””,”hierarchical”:false,”public”:true,”capabilities”:{“manage_terms”:”manage_categories”,”edit_terms”:”manage_categories”,”delete_terms”:”manage_categories”,”assign_terms”:”edit_posts”}},{“name”:”post_format”,”label”:”\u5f62\u5f0f”,”labels”:{“name”:”\u5f62\u5f0f”,”singular_name”:”\u5f62\u5f0f”,”search_items”:”\u641c\u7d22\u6807\u7b7e”,”popular_items”:”\u70ed\u95e8\u6807\u7b7e”,”all_items”:”\u5f62\u5f0f”,”parent_item”:null,”parent_item_colon”:null,”edit_item”:”\u7f16\u8f91\u6807\u7b7e”,”view_item”:”\u67e5\u770b\u6807\u7b7e”,”update_item”:”\u66f4\u65b0\u6807\u7b7e”,”add_new_item”:”\u6dfb\u52a0\u65b0\u6807\u7b7e”,”new_item_name”:”\u65b0\u6807\u7b7e\u540d”,”separate_items_with_commas”:”\u591a\u4e2a\u6807\u7b7e\u8bf7\u7528\u82f1\u6587\u9017\u53f7\uff08,\uff09\u5206\u5f00″,”add_or_remove_items”:”\u6dfb\u52a0\u6216\u5220\u9664\u6807\u7b7e”,”choose_from_most_used”:”\u4ece\u5e38\u7528\u6807\u7b7e\u4e2d\u9009\u62e9″,”not_found”:”\u672a\u627e\u5230\u6807\u7b7e\u3002″,”no_terms”:”\u6ca1\u6709\u6807\u7b7e”,”items_list_navigation”:”\u6807\u7b7e\u5217\u8868\u5bfc\u822a”,”items_list”:”\u6807\u7b7e\u5217\u8868″,”menu_name”:”\u5f62\u5f0f”,”name_admin_bar”:”\u5f62\u5f0f”,”archives”:”\u5f62\u5f0f”},”description”:””,”hierarchical”:false,”public”:true,”capabilities”:{“manage_terms”:”manage_categories”,”edit_terms”:”manage_categories”,”delete_terms”:”manage_categories”,”assign_terms”:”edit_posts”}}]}
https://public-api.wordpress.com/rest/v1.1/sites/www.crifan.org/categories
->
This returned quite a few categories,
but just as before
they do not look complete.
And there are no Chinese categories either...
https://public-api.wordpress.com/rest/v2/sites/www.crifan.org/categories
->
No response for a long time.
JSON API — WordPress Plugins
Method: get_category_index
Returns an array of active categories.
Optional argument
parent – returns categories that are direct children of the parent ID
Response
{
  "status": "ok",
  "count": 3,
  "categories": [
    { … },
    { … },
    { … }
  ]
}
4.2. Category response object
id – Integer
slug – String
title – String
description – String
parent – Integer
post_count – Integer
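As a rough sketch of how a response with this documented shape could be parsed in Python: the Category dataclass and parse_category_index are names I made up, and the sample string reuses two entries from my own site's output further down.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Category:
    id: int
    slug: str
    title: str
    description: str
    parent: int
    post_count: int


def parse_category_index(text: str) -> List[Category]:
    """Parse a get_category_index response and sanity-check status and count."""
    payload = json.loads(text)
    if payload.get("status") != "ok":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    categories = [
        Category(
            id=item["id"],
            slug=item["slug"],
            title=item["title"],  # json.loads already decoded any \uXXXX escapes
            description=item["description"],
            parent=item["parent"],
            post_count=item["post_count"],
        )
        for item in payload["categories"]
    ]
    if payload.get("count") != len(categories):
        raise ValueError("count does not match the number of categories")
    return categories


# Raw string so the \uXXXX escapes reach the JSON parser untouched.
sample = r'''{"status": "ok", "count": 2, "categories": [
  {"id": 345, "slug": "recommend_music", "title": "\u97f3\u4e50\u5929\u5802",
   "description": "\u63a8\u8350\u597d\u6b4c\uff0c\u5206\u4eab\u597d\u6b4c",
   "parent": 0, "post_count": 41},
  {"id": 313, "slug": "musical_knowledge", "title": "\u97f3\u4e50\u77e5\u8bc6",
   "description": "", "parent": 345, "post_count": 9}
]}'''

cats = parse_category_index(sample)
print(cats[0].title, cats[0].post_count)
```

Checking count against the list length is a cheap way to notice a truncated response.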
So it seems the categories can be fetched via:
get_category_index
Is that right?
http://blog.example.com/?json=get_category_index
-》
https://www.crifan.org/?json=get_category_index
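-> which just opened the home page...
So I went to the site admin and searched for:
JSON API
Found it:
and installed it:
Installing plugin: JSON API 1.1.1
Downloading the install package from https://downloads.wordpress.org/plugin/json-api.1.1.3.zip…
Unpacking the package…
Installing the plugin…
Successfully installed the plugin JSON API 1.1.1.
Then I visited:
https://www.crifan.org/?json=get_category_index
Still no luck...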
WP REST API v2 Documentation
Searched the plugins for: WP REST API
Found:
WP REST API (WP API)
WordPress REST API (Version 2) — WordPress Plugins
Based on the address above, I then searched for:
rest-api
Found it:
and installed it:
WordPress is moving towards becoming a fully-fledged application framework, and we need new APIs. This project was born to create an easy-to-use, easy-to-understand and well-tested framework for creating these APIs, plus creating APIs for core.
This plugin provides an easy to use REST API, available via HTTP. Grab your site’s data in simple JSON format, including users, posts, taxonomies and more. Retrieving or updating data is as simple as sending a HTTP request.
Want to get your site’s posts? Simply send a GET request to /wp-json/wp/v2/posts. Update user with ID 4? Send a PUT request to /wp-json/wp/v2/users/4. Get all posts with the search term “awesome”? GET /wp-json/wp/v2/posts?filter[s]=awesome. It’s that easy.
WP API exposes a simple yet easy interface to WP Query, the posts API, post meta API, users API, revisions API and many more. Chances are, if you can do it with WordPress, WP API will let you do it.
WP API also includes an easy-to-use Javascript API based on Backbone models, allowing plugin and theme developers to get up and running without needing to know anything about the details of getting connected.
Check out our documentation for information on what’s available in the API and how to use it. We’ve also got documentation on extending the API with extra data for plugin and theme developers!
All tickets for the project are being tracked on GitHub. You can also take a look at the recent updates for the project.
->
Simply visit a URL such as:
/wp-json/wp/v2/posts
and that is all it takes.
Documentation:
WP REST API v2 Documentation
-》
Categories API Reference | WP REST API v2 Documentation
Installing plugin: WordPress REST API (Version 2) 2.0-beta13.1
Downloading the install package from https://downloads.wordpress.org/plugin/rest-api.2.0-beta13.1.zip…
Unpacking the package…
Installing the plugin…
Successfully installed the plugin WordPress REST API (Version 2) 2.0-beta13.1.
https://www.crifan.org/wp-json/wp/v2/categories
Returned:
[{“id”:4637,”count”:2,”description”:””,”link”:”https:\/\/www.crifan.org\/category\/life\/computer_digit_soft\/soft_360\/”,”name”:”360″,”slug”:”soft_360″,”taxonomy”:”category”,”parent”:4618,”_links”:{“self”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories\/4637″}],”collection”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories”}],”about”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/taxonomies\/category”}],”up”:[{“embeddable”:true,”href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories\/4618″}],”wp:post_type”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/posts?categories=4637″}],”curies”:[{“name”:”wp”,”href”:”https:\/\/api.w.org\/{rel}”,”templated”:true}]}},{“id”:4498,”count”:29,”description”:””,”link”:”https:\/\/www.crifan.org\/category\/work_and_job\/web\/crawl_emulatelogin\/amazon\/”,”name”:”Amazon”,”slug”:”amazon”,”taxonomy”:”category”,”parent”:3390,”_links”:{“self”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories\/4498″}],”collection”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories”}],”about”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/taxonomies\/category”}],”up”:[{“embeddable”:true,”href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories\/3390″}],”wp:post_type”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/posts?categories=4498″}],”curies”:[{“name”:”wp”,”href”:”https:\/\/api.w.org\/{rel}”,”templated”:true}]}},{“id”:3543,”count”:249,”description”:””,”link”:”https:\/\/www.crifan.org\/category\/work_and_job\/operating_system_and_platform\/mobile_platform\/os_android\/”,”name”:”Android”,”slug”:”os_android”,”taxonomy”:”category”,”parent”:4627,”_links”:{“self”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories\/3543″}],”collection”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories”}],”about”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/taxonomies\/category”}],”up”:[{“embeddable”:true,”href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\
/categories\/4627″}],”wp:post_type”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/posts?categories=3543″}],”curies”:[{“name”:”wp”,”href”:”https:\/\/api.w.org\/{rel}”,”templated”:true}]}},{“id”:4613,”count”:3,”description”:”regular expression for Android: almost same with java, use regex, but small different”,”link”:”https:\/\/www.crifan.org\/category\/work_and_job\/regular_expression\/android_regex\/”,”name”:”Android regex”,”slug”:”android_regex”,”taxonomy”:”category”,”parent”:702,”_links”:{“self”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories\/4613″}],”collection”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories”}],”about”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/taxonomies\/category”}],”up”:[{“embeddable”:true,”href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories\/702″}],”wp:post_type”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/posts?categories=4613″}],”curies”:[{“name”:”wp”,”href”:”https:\/\/api.w.org\/{rel}”,”templated”:true}]}},{“id”:7836,”count”:15,”description”:””,”link”:”https:\/\/www.crifan.org\/category\/work_and_job\/operating_system_and_platform\/mobile_platform\/os_android\/android-studio\/”,”name”:”Android 
Studio”,”slug”:”android-studio”,”taxonomy”:”category”,”parent”:3543,”_links”:{“self”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories\/7836″}],”collection”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories”}],”about”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/taxonomies\/category”}],”up”:[{“embeddable”:true,”href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories\/3543″}],”wp:post_type”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/posts?categories=7836″}],”curies”:[{“name”:”wp”,”href”:”https:\/\/api.w.org\/{rel}”,”templated”:true}]}},{“id”:3902,”count”:92,”description”:””,”link”:”https:\/\/www.crifan.org\/category\/work_and_job\/compilerlinkerparser\/antlr\/”,”name”:”ANTLR”,”slug”:”antlr”,”taxonomy”:”category”,”parent”:3901,”_links”:{“self”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories\/3902″}],”collection”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories”}],”about”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/taxonomies\/category”}],”up”:[{“embeddable”:true,”href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories\/3901″}],”wp:post_type”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/posts?categories=3902″}],”curies”:[{“name”:”wp”,”href”:”https:\/\/api.w.org\/{rel}”,”templated”:true}]}},{“id”:7905,”count”:4,”description”:””,”link”:”https:\/\/www.crifan.org\/category\/work_and_job\/web\/webserver\/apache\/”,”name”:”apache”,”slug”:”apache”,”taxonomy”:”category”,”parent”:7904,”_links”:{“self”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories\/7905″}],”collection”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories”}],”about”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/taxonomies\/category”}],”up”:[{“embeddable”:true,”href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories\/7904″}],”wp:post_type”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/posts?categories=7905″}],”curies”:[{“name”:”wp”,”href”:”https:\/\/api.w.org\/{rel
}”,”templated”:true}]}},{“id”:3825,”count”:2,”description”:””,”link”:”https:\/\/www.crifan.org\/category\/work_and_job\/develop_ide_editors\/apatana-studio-3\/”,”name”:”Apatana Studio 3″,”slug”:”apatana-studio-3″,”taxonomy”:”category”,”parent”:2130,”_links”:{“self”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories\/3825″}],”collection”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories”}],”about”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/taxonomies\/category”}],”up”:[{“embeddable”:true,”href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories\/2130″}],”wp:post_type”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/posts?categories=3825″}],”curies”:[{“name”:”wp”,”href”:”https:\/\/api.w.org\/{rel}”,”templated”:true}]}},{“id”:6185,”count”:0,”description”:”Arduino”,”link”:”https:\/\/www.crifan.org\/category\/work_and_job\/hardware-work_and_job\/opensource_hardware\/arduino\/”,”name”:”Arduino”,”slug”:”arduino”,”taxonomy”:”category”,”parent”:6183,”_links”:{“self”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories\/6185″}],”collection”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories”}],”about”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/taxonomies\/category”}],”up”:[{“embeddable”:true,”href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories\/6183″}],”wp:post_type”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/posts?categories=6185″}],”curies”:[{“name”:”wp”,”href”:”https:\/\/api.w.org\/{rel}”,”templated”:true}]}},{“id”:6104,”count”:4,”description”:””,”link”:”https:\/\/www.crifan.org\/category\/work_and_job\/embedded\/mcu_soc\/arm-embedded\/”,”name”:”ARM”,”slug”:”arm-embedded”,”taxonomy”:”category”,”parent”:4812,”_links”:{“self”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories\/6104″}],”collection”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories”}],”about”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/taxonomies\/category”}],”up”:[{“embeddable
”:true,”href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/categories\/4812″}],”wp:post_type”:[{“href”:”https:\/\/www.crifan.org\/wp-json\/wp\/v2\/posts?categories=6104″}],”curies”:[{“name”:”wp”,”href”:”https:\/\/api.w.org\/{rel}”,”templated”:true}]}}]
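->
Much the same as before:
far too few categories were returned...
only a handful of them
->
while my site has several hundred categories in total...
Could it be because
I had recently configured
the category list widget
to be shown on the home page only?
So I removed that visibility rule
to make sure it is shown everywhere:
Result:
the problem remained
->
only a limited set of categories came back, not all of them.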
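A likely explanation, which I did not verify at the time: collection endpoints in WP REST API v2 are paginated and return 10 items per page by default, which matches the 10 categories in the output above. The per_page (up to 100) and page query parameters select further pages. Below is a minimal sketch of collecting every page. fetch_all_pages and fetch_page_http are hypothetical helper names, and the HTTP variant is not called here because it needs network access.

```python
import json
import urllib.request
from typing import Callable, Dict, List


def fetch_all_pages(fetch_page: Callable[[int, int], List[Dict]],
                    per_page: int = 100) -> List[Dict]:
    """Call fetch_page(page, per_page) until a short (or empty) page comes back."""
    items: List[Dict] = []
    page = 1
    while True:
        batch = fetch_page(page, per_page)
        items.extend(batch)
        if len(batch) < per_page:  # last page reached
            break
        page += 1
    return items


def fetch_page_http(page: int, per_page: int) -> List[Dict]:
    """Hypothetical real fetcher for the categories endpoint (needs network)."""
    url = ("https://www.crifan.org/wp-json/wp/v2/categories"
           f"?per_page={per_page}&page={page}")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def fake_fetch(page: int, per_page: int) -> List[Dict]:
    """Offline stand-in that serves 332 fake categories page by page."""
    data = [{"id": i} for i in range(1, 333)]
    start = (page - 1) * per_page
    return data[start:start + per_page]


all_categories = fetch_all_pages(fake_fetch)
print(len(all_categories))
```

With a real site one would call fetch_all_pages(fetch_page_http). The X-WP-Total and X-WP-TotalPages response headers can also be read to know the totals up front.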
Reference:
How to retrieve a list of categories/ tag in WordPress REST API – Stack Overflow
-》
https://www.crifan.org/api/get_category_index
Returned:
{“status”:”ok”,”count”:332,”categories”:[{“id”:4637,”slug”:”soft_360″,”title”:”360″,”description”:””,”parent”:4618,”post_count”:2},{“id”:4498,”slug”:”amazon”,”title”:”Amazon”,”description”:””,”parent”:3390,”post_count”:29},{“id”:3543,”slug”:”os_android”,”title”:”Android”,”description”:””,”parent”:4627,”post_count”:249},{“id”:4613,”slug”:”android_regex”,”title”:”Android regex”,”description”:”regular expression for Android: almost same with java, use regex, but small different”,”parent”:702,”post_count”:3},{“id”:7836,”slug”:”android-studio”,”title”:”Android Studio”,”description”:””,”parent”:3543,”post_count”:15},{“id”:3902,”slug”:”antlr”,”title”:”ANTLR”,”description”:””,”parent”:3901,”post_count”:92},{“id”:7905,”slug”:”apache”,”title”:”apache”,”description”:””,”parent”:7904,”post_count”:4},。。。。。。。。
。。。。。。。
。。。。。。。
{“id”:2956,”slug”:”%e9%9f%b3%e4%b9%90%e4%b8%8b%e8%bd%bd”,”title”:”\u97f3\u4e50\u4e0b\u8f7d”,”description”:””,”parent”:345,”post_count”:1},{“id”:345,”slug”:”recommend_music”,”title”:”\u97f3\u4e50\u5929\u5802″,”description”:”\u63a8\u8350\u597d\u6b4c\uff0c\u5206\u4eab\u597d\u6b4c”,”parent”:0,”post_count”:41},{“id”:313,”slug”:”musical_knowledge”,”title”:”\u97f3\u4e50\u77e5\u8bc6″,”description”:””,”parent”:345,”post_count”:9},{“id”:17,”slug”:”default_category”,”title”:”\u9ed8\u8ba4\u5206\u7c7b”,”description”:””,”parent”:0,”post_count”:209},{“id”:860,”slug”:”default_classification”,”title”:”\u9ed8\u8ba4\u5206\u7c7b”,”description”:””,”parent”:17,”post_count”:3}]}
After formatting, the categories look complete, or at least there are a lot of them:
Counting them gives 332 in total:
This looks like what I want:
[Summary]
(After installing the JSON API WordPress plugin)
a single endpoint:
https://www.crifan.org/api/get_category_index
returns the information for all categories:
{
    "status": "ok",
    "count": 332,
    "categories": [{
        "id": 4637,
        "slug": "soft_360",
        "title": "360",
        "description": "",
        "parent": 4618,
        "post_count": 2
    }, {
        "id": 4498,
        "slug": "amazon",
        "title": "Amazon",
        "description": "",
        "parent": 3390,
        "post_count": 29
    }, {
...
    }, {
        "id": 860,
        "slug": "default_classification",
        "title": "\u9ed8\u8ba4\u5206\u7c7b",
        "description": "",
        "parent": 17,
        "post_count": 3
    }]
}
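(332 entries in total)
And each category,
for example:
{
        "id": 345,
        "slug": "recommend_music",
        "title": "\u97f3\u4e50\u5929\u5802",
        "description": "\u63a8\u8350\u597d\u6b4c\uff0c\u5206\u4eab\u597d\u6b4c",
        "parent": 0,
        "post_count": 41
    },
contains:
its own name (the title field, although it has to be unescaped to get the Chinese text)
its own id and its parent's id (the id and parent fields)
its path (which can be assembled from the slug values)
the number of posts in the category (the post_count field)
There is also one extra attribute: description
although it is empty for most categories (because I did not add a description when creating them)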
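The fields listed above can be combined roughly as follows. This is a sketch, not tested against the full 332-entry response: category_path is a hypothetical helper, and the /category/parent/child/ layout is assumed from the category links seen earlier (for example /category/life/computer_digit_soft/soft_360/).

```python
import json
from typing import Dict


def category_path(cat_id: int, by_id: Dict[int, dict]) -> str:
    """Walk the parent chain and join the slugs into a /category/.../ path."""
    parts = []
    seen = set()  # guards against accidental parent cycles
    while cat_id and cat_id in by_id and cat_id not in seen:
        seen.add(cat_id)
        cat = by_id[cat_id]
        parts.append(cat["slug"])
        cat_id = cat["parent"]  # parent == 0 means a top-level category
    return "/category/" + "/".join(reversed(parts)) + "/"


# Two entries taken from the response above; json.loads turns the
# \uXXXX escapes in "title" back into Chinese text.
raw = r'''[
  {"id": 345, "slug": "recommend_music", "title": "\u97f3\u4e50\u5929\u5802",
   "description": "", "parent": 0, "post_count": 41},
  {"id": 313, "slug": "musical_knowledge", "title": "\u97f3\u4e50\u77e5\u8bc6",
   "description": "", "parent": 345, "post_count": 9}
]'''
by_id = {cat["id"]: cat for cat in json.loads(raw)}

print(by_id[313]["title"])
print(category_path(313, by_id))
```

Prefixing the result with https://www.crifan.org should give the category link, provided the permalink settings match that assumption.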
When I have time, the next steps are:
take the JSON containing all the categories,
parse it with Python, and then generate the JSON format that D3 needs.
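A minimal sketch of that conversion, assuming the D3 side wants the usual nested name/children layout consumed by d3.hierarchy. build_d3_tree, the root label and the "value" key (holding post_count) are my own choices here, not something the JSON API plugin or D3 dictates.

```python
import json
from typing import Dict, List


def build_d3_tree(categories: List[Dict], root_name: str = "categories") -> Dict:
    """Turn the flat id/parent list into a nested name/children tree."""
    nodes = {
        cat["id"]: {"name": cat["title"], "value": cat["post_count"], "children": []}
        for cat in categories
    }
    root = {"name": root_name, "children": []}
    for cat in categories:
        # parent == 0 (or an unknown parent id) hangs directly under the root
        parent = nodes.get(cat["parent"])
        target = parent["children"] if parent is not None else root["children"]
        target.append(nodes[cat["id"]])
    for node in nodes.values():
        if not node["children"]:
            del node["children"]  # leaves carry no empty children list
    return root


# Three entries from the response above, already decoded into Python dicts.
sample_categories = [
    {"id": 345, "title": "\u97f3\u4e50\u5929\u5802", "parent": 0, "post_count": 41},
    {"id": 313, "title": "\u97f3\u4e50\u77e5\u8bc6", "parent": 345, "post_count": 9},
    {"id": 17, "title": "\u9ed8\u8ba4\u5206\u7c7b", "parent": 0, "post_count": 209},
]

tree = build_d3_tree(sample_categories)
print(json.dumps(tree, ensure_ascii=False, indent=2))
```

Passing ensure_ascii=False keeps the Chinese titles readable in the exported file instead of writing them back as \uXXXX escapes.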
-》
For the other endpoints, see:
JSON API — WordPress Plugins

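Please credit the source when reposting: 在路上 (On the Way) » [Record] Using Python to fetch the categories of the WordPress site crifan.org and export them as JSON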
