Asynchronous API calls with curl

May 13, 2019 @konradino Konrad Semsch

@konradino wrote:

Hi all!

We're building up our ETL jobs on RStudio Connect, and we have some long-running jobs that I would like to speed up by making the API calls in parallel rather than sequentially, which is what happens when I iterate through a list with purrr or use a simple for loop.

In my search for a good solution I came across this curl vignette, and specifically the 'async requests' part of it. I thought: "awesome, somebody built it for me, I just need to adjust it to my needs"... I guess I couldn't have been more wrong :stuck_out_tongue:

I started searching around and found this implementation link, but the problem is that it's not directly applicable to my needs, and on top of that I need to do a POST. Below I'll briefly list what I have tried to implement, unsuccessfully so far:

A single POST handle: in theory the body will need to change with each request, but for now let's not go that far. Adjusting the handle to do a POST was not an issue; the question for me is rather how to feed that handle correctly into the rest of the pipeline.

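library(curl)      # new_handle(), handle_setheaders(), curl_fetch_multi()
library(jsonlite)  # toJSON()
library(magrittr)  # %>%

# `body` is the request payload (an R list), defined elsewhere
h <- new_handle(
  copypostfields = toJSON(body, pretty = TRUE, auto_unbox = TRUE)
  ) %>%
  handle_setheaders(
    `Authorization` = "XXX",
    `Content-Type` = "application/json"
    )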

Let's say I would like to make 3 asynchronous calls using that same handle (or rather versions of that handle, but let's get there in a minute) against the same server; hence, I repeat the API host 3 times.

pool <- new_pool()

# Results are only available through the callback function
cb <- function(req) {
  cat("done:", req$url, ": HTTP:", req$status_code, "\n",
      "content:", rawToChar(req$content), "\n")
}

# Example vector of uris to loop through
uris <- c(
  "https://api-link.com",
  "https://api-link.com",
  "https://api-link.com"
)
sapply(uris, curl_fetch_multi, done = cb, pool = pool)
out <- multi_run(pool = pool)

After those lines the execution should take place, but instead of a great result I get one of the two errors below:

  1. Either just a 404, because the handle I defined above is not tied to any of those calls (each one is just a generic curl GET request)

  2. Or, if I change the last step in order to tie the handle in:

sapply(uris, curl_fetch_multi, done = cb, pool = pool, handle = h)
out <- multi_run(pool = pool)
Error in multi_add(handle = handle, done = done, fail = fail, data = data,  : 
  Handle is locked. Probably in use in a connection or async request.

So the documentation even says that a handle can't be used more than once, but then I'm having trouble understanding how to organise this pipeline of asynchronous calls the right way. Has anyone come across a similar issue and found a viable solution?

Thanks!
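
One possible way around the "Handle is locked" error, sketched here under assumptions rather than taken from the thread: since a handle is locked once it is added to a pool, build a fresh handle for each request instead of sharing `h`. The helper names `make_post_handle` and `post_async`, the `token` argument, and the idea of storing each response in a list are all hypothetical choices made for illustration; only the curl and jsonlite calls themselves come from those packages.

```r
library(curl)
library(jsonlite)

# Hypothetical helper: returns a NEW handle on every call, so no handle
# is ever shared between two concurrent requests.
make_post_handle <- function(body, token = "XXX") {
  h <- new_handle()
  # Setting copypostfields turns the request into a POST with this payload
  handle_setopt(h, copypostfields = as.character(toJSON(body, auto_unbox = TRUE)))
  handle_setheaders(h,
    "Authorization" = token,
    "Content-Type" = "application/json"
  )
  h
}

# Hypothetical helper: queues one POST per uri/body pair, runs them
# concurrently, and returns the responses (or error messages) in input order.
post_async <- function(uris, bodies) {
  pool <- new_pool()
  results <- vector("list", length(uris))
  for (i in seq_along(uris)) {
    local({
      idx <- i  # capture the current index for the callbacks
      curl_fetch_multi(
        uris[[idx]],
        done = function(res) results[[idx]] <<- res,
        fail = function(msg) results[[idx]] <<- list(error = msg),
        pool = pool,
        handle = make_post_handle(bodies[[idx]])
      )
    })
  }
  multi_run(pool = pool)
  results
}

# Example call (not run here; "https://api-link.com" is the placeholder host
# from the post):
# res <- post_async(rep("https://api-link.com", 3),
#                   list(list(id = 1), list(id = 2), list(id = 3)))
```

Because each request gets its own handle, the body can also differ per request, which is the "handle versions" case mentioned earlier in the post.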

