Hacker News
The hackeRnews package was created to simplify
the process of getting data from Hacker News. Hacker News is
a user-generated content website that focuses on stories related to
computer science. The website is composed of user-submitted stories,
each of which links to the original source. Users can upvote a story
they find interesting, and each story has a comment section where users
can discuss the subject. Besides news stories, Hacker News
contains the following sections:
- ‘Ask’ section where users can ask questions to the Hacker News community
- ‘Show’ section where users can share something that they have created
- ‘Jobs’ section where users can browse job offers
Hacker News API
The official Hacker News API documentation is published in the HackerNews/API repository on GitHub. The API serves data
in JSON format. The hackeRnews package lets you retrieve
this data in the form of convenient R objects. Each object (story, comment,
…) has a unique id and can be retrieved using this id. The API also
provides a way to fetch up to 500 top and new stories, as well as the latest best
stories, ask stories, show stories and job stories.
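As a rough sketch of what sits underneath, the official API exposes each object at a URL built from its id, under the base https://hacker-news.firebaseio.com/v0. The helper names hn_item_url and hn_list_url below are hypothetical, introduced only for illustration; they are not part of hackeRnews, and the snippet only builds URLs without making any network request.

```r
# Base URL of the official Hacker News API (version 0).
hn_base_url <- "https://hacker-news.firebaseio.com/v0"

# Hypothetical helper: URL of a single item (story, comment, job, ...).
hn_item_url <- function(id) {
  sprintf("%s/item/%d.json", hn_base_url, as.integer(id))
}

# Hypothetical helper: URL of a list endpoint that returns item ids.
hn_list_url <- function(kind = c("top", "new", "best", "ask", "show", "job")) {
  kind <- match.arg(kind)
  sprintf("%s/%sstories.json", hn_base_url, kind)
}

hn_item_url(43595269)
hn_list_url("best")
```

The package wraps requests like these and converts the JSON responses to R objects, so these URLs normally never need to be built by hand.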
Examples of using the hackeRnews package to retrieve
data from the official Hacker News API are presented below:
hackeRnews usage
news stories
To fetch best/new/top stories, use the matching
get_*_stories function. Each function takes one optional
argument, max_items, that limits the number of returned
stories.
For example, to fetch the first 5 best stories:
best_stories <- get_best_stories(max_items = 5)
best_stories[[1]]
#> List of 9
#> $ by : chr "smnrg"
#> $ descendants: int 1353
#> $ id : int 43595269
#> $ kids : int [1:267] 43596485 43600039 43595394 43596326 43596158 43595524 43599675 43596393 43601623 43595404 ...
#> $ score : int 1895
#> $ time : POSIXct[1:1], format: "2025-04-05 17:57:46"
#> $ title : chr "What if we made advertising illegal?"
#> $ type : chr "story"
#> $ url : chr "https://simone.org/advertising/"
#> - attr(*, "class")= chr "hn_item"
There is also a function that fetches just the raw ids of best/new/top
stories, get_*_stories_ids():
best_stories_ids <- get_best_stories_ids()
best_stories_ids[1:5] # output truncated for legibility
#> [1] 43595269 43561253 43558671 43615912 43573156
ask / job / show stories
These work like news stories. The get_latest_*_stories functions
return the latest * stories, and the get_latest_*_stories_ids functions
return only their ids.
For example, to fetch the 3 latest ask stories:
ask_stories <- get_latest_ask_stories(max_items = 3)
ask_stories[[1]]
#> List of 9
#> $ by : chr "davidkuennen"
#> $ descendants: int 225
#> $ id : int 43619768
#> $ kids : int [1:138] 43624991 43622046 43625709 43625280 43625989 43624877 43626090 43625372 43626179 43626072 ...
#> $ score : int 80
#> $ text : chr "Today, I noticed that my behavior has shifted over the past few months. Right now, I exclusively use ChatGPT fo"| __truncated__
#> $ time : POSIXct[1:1], format: "2025-04-08 09:23:18"
#> $ title : chr "Ask HN: Do you still use search engines?"
#> $ type : chr "story"
#> - attr(*, "class")= chr "hn_item"
comments
The discussion in a story thread is represented as a tree of comments.
Each story has the ids of its top-level comments stored under the kids
property. Each comment can have its own set of comment ids under its
kids property (sub-comments), and so on. To
retrieve all of the comments of a specific story, use the
get_comments function.
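To picture the nesting, here is a minimal offline sketch that flattens a small mock comment tree into one row per comment. The flatten_comments helper and the ids are made up for illustration and are not part of hackeRnews. As a simplifying assumption, the mock nests comment objects directly under kids, whereas the API stores only ids there, so each comment has to be fetched separately.

```r
# Mock comment tree: each node has an id and a list of child comments
# under `kids` (made-up ids, not real Hacker News data).
mock_story <- list(
  id = 1L,
  kids = list(
    list(id = 2L, kids = list(
      list(id = 4L, kids = list())
    )),
    list(id = 3L, kids = list())
  )
)

# Depth-first walk that returns one row per comment with its parent id.
# Returns NULL for a node without kids, which rbind() silently drops.
flatten_comments <- function(node) {
  rows <- lapply(node$kids, function(kid) {
    rbind(
      data.frame(id = kid$id, parent = node$id),
      flatten_comments(kid)
    )
  })
  do.call(rbind, rows)
}

flatten_comments(mock_story)
```

With the package, get_comments retrieves the comments of a real story: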
top_story <- get_top_stories(max_items = 1)[[1]]
get_comments(top_story)
#> # A tibble: 91 × 7
#> id deleted by time text dead parent
#> <int> <lgl> <chr> <dttm> <chr> <lgl> <int>
#> 1 43625519 FALSE neomantra 2025-04-08 19:19:22 "I'll tos… FALSE 4.36e7
#> 2 43625428 FALSE evaneykelen 2025-04-08 19:09:15 "After many tr… FALSE 4.36e7
#> 3 43625094 FALSE rorylaitila 2025-04-08 18:36:15 "ECharts is in… FALSE 4.36e7
#> 4 43624471 FALSE FredPret 2025-04-08 17:44:22 "The line race… FALSE 4.36e7
#> 5 43625691 FALSE miiiiiike 2025-04-08 19:40:50 "I’d keep it.<… FALSE 4.36e7
#> 6 43624701 FALSE simlevesque 2025-04-08 18:03:45 "If you'r… FALSE 4.36e7
#> 7 43625213 FALSE smjburton 2025-04-08 18:47:47 "I was just lo… FALSE 4.36e7
#> 8 43624496 FALSE JacobiX 2025-04-08 17:46:23 "In a quick we… FALSE 4.36e7
#> 9 43624773 FALSE paulirish 2025-04-08 18:10:08 "Impressive wo… FALSE 4.36e7
#> 10 43624954 FALSE lxe 2025-04-08 18:27:02 "What a comple… FALSE 4.36e7
#> # ℹ 81 more rows
user
To fetch data about the user ‘jl’, use the
get_user_by_username function:
user <- get_user_by_username("jl")
user
#> List of 5
#> $ about : chr "This is a test"
#> $ created : POSIXct[1:1], format: "2007-03-15 01:50:46"
#> $ id : chr "jl"
#> $ karma : int 4307
#> $ submitted: int [1:850] 35686379 35675818 25172559 25172553 19464269 18498213 16659709 16659632 16659556 14237416 ...
#> - attr(*, "class")= chr "hn_user"
all items / latest items
You can iterate over the latest items by fetching the id of the
newest item with the get_max_item_id function and then
walking backwards through the ids. Continuing this way, it’s
possible to fetch all items on Hacker News.
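Walking backwards comes down to simple id arithmetic. Here is a minimal sketch using a hypothetical helper, latest_item_ids, which is not part of hackeRnews: the n newest ids run from max_id down to max_id - n + 1, so lowering the bound by one more returns one item too many.

```r
# Hypothetical helper: the `n` most recent item ids, newest first.
latest_item_ids <- function(max_id, n) {
  stopifnot(n >= 1, max_id >= n)
  seq(max_id, max_id - n + 1)
}

# Example with a made-up maximum id (no network request involved).
ids <- latest_item_ids(1000, 10)
length(ids)
range(ids)
```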
For example, to fetch the 10 latest items:
max_item_id <- get_max_item_id()
# 10 ids: max_item_id down to max_item_id - 9
latest_items <- get_items_by_ids(seq(max_item_id, max_item_id - 9))
updates
The latest item and profile changes can be retrieved using
get_updates:
updates <- get_updates()
updates$profiles[1:5] # output truncated for legibility
#> [1] "PaulHoule" "pointlessone" "wavemode" "theamk"
#> [5] "tomtomistaken"
updates$items[1:5] # output truncated for legibility
#> [1] 43626104 43623968 43623219 43625818 43624991