The problem isn't the size of the table; it's the extremely gnarly nodes in the first two rows.
So, just edit out the problem nodes.
xml2 supports a much wider array of libxml2 operations now:
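library(xml2)      # xml_remove() lives in xml2; rvest >= 1.0.0 no longer attaches it
library(rvest)
library(tidyverse)

pg <- read_html("http://www.svs.cl/institucional/mercados/consulta.php?mercado=V&Estado=VI&entidad=RVEMI")

# Drop the first row, then drop the new first row (originally the second one)
xml_remove(html_nodes(pg, xpath=".//table/tr[1]"))
xml_remove(html_nodes(pg, xpath=".//table/tr[1]"))

# Parse what is left of the table
html_nodes(pg, xpath=".//table") %>%
  html_table() %>%
  .[[1]] %>%
  as_tibble()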
## # A tibble: 368 × 3
## X1 X2 X3
## <chr> <chr> <chr>
## 1 76675290-K AD RETAIL S.A. VI
## 2 98000000-1 ADMINISTRADORA DE FONDOS DE PENSIONES CAPITAL S.A. VI
## 3 98000100-8 ADMINISTRADORA DE FONDOS DE PENSIONES HABITAT S.A. VI
## 4 76240079-0 ADMINISTRADORA DE FONDOS DE PENSIONES CUPRUM S.A. VI
## 5 76762250-3 ADMINISTRADORA DE FONDOS DE PENSIONES MODELO S.A. VI
## 6 98001200-K ADMINISTRADORA DE FONDOS DE PENSIONES PLANVITAL S.A. VI
## 7 76265736-8 ADMINISTRADORA DE FONDOS DE PENSIONES PROVIDA S.A. VI
## 8 94272000-9 AES GENER S.A. VI
## 9 96566940-K AGENCIAS UNIVERSALES S.A. VI
## 10 91253000-0 AGRICOLA NACIONAL S.A.C. E I. VI
## # ... with 358 more rows
Note that you can do:
xml_remove(html_nodes(pg, xpath=".//table/tr[position() >= 1 and position() <=2]"))
instead of the two remove operations, but it's almost as verbose and there's no real performance gain here.
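To see that the two approaches leave the same table behind without hitting the live site, here is a minimal sketch on a small inline HTML string. The markup below is a made-up stand-in for the real page (two junk rows followed by two data rows), so treat the row contents and the variable names as assumptions for illustration only.

```r
library(xml2)
library(rvest)

# Hypothetical stand-in for the real page: two messy leading rows, then data.
html <- '<table>
  <tr><td colspan="3">Gnarly banner row</td></tr>
  <tr><td>RUT</td><td>Entidad</td><td>Estado</td></tr>
  <tr><td>76675290-K</td><td>AD RETAIL S.A.</td><td>VI</td></tr>
  <tr><td>94272000-9</td><td>AES GENER S.A.</td><td>VI</td></tr>
</table>'

# Approach 1: remove the first row twice. After the first call, the
# original second row has become tr[1], so the same XPath hits it.
pg1 <- read_html(html)
xml_remove(html_nodes(pg1, xpath = ".//table/tr[1]"))
xml_remove(html_nodes(pg1, xpath = ".//table/tr[1]"))

# Approach 2: select both leading rows in one XPath and remove them together.
pg2 <- read_html(html)
xml_remove(html_nodes(pg2, xpath = ".//table/tr[position() >= 1 and position() <=2]"))

# Parse what remains in each document.
tbl1 <- html_table(html_nodes(pg1, xpath = ".//table"))[[1]]
tbl2 <- html_table(html_nodes(pg2, xpath = ".//table"))[[1]]

identical(tbl1, tbl2)
nrow(tbl1)
```

Both tables should come back with just the two data rows. Since position() >= 1 is true for every node, the single-call XPath could also be shortened to tr[position() <= 2].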