Rocket/core/lib/src/form/parser.rs


UTF-8 routes. Forms revamp. Temp files. Capped.

So. Many. Changes.

This is an insane commit: simultaneously one of the best (because of all the wonderful improvements!) and one of the worst (because it is just massive) in the project's history.

Routing:
  * All UTF-8 characters are accepted everywhere in route paths. (#998)
  * `path` is now `uri` in `route` attribute: `#[route(GET, path = "..")]` becomes `#[route(GET, uri = "..")]`.

Forms Revamp
  * All form related types now reside in a new `form` module.
  * Multipart forms are supported. (resolves #106)
  * Collections are supported in forms and queries. (resolves #205)
  * Nested structures in forms and queries are supported. (resolves #313)
  * Form fields can be ad-hoc validated with `#[field(validate = expr)]`.
  * `FromFormValue` is now `FromFormField`, blanket implements `FromForm`.
  * Form field values are always percent-decoded apriori.

Temporary Files
  * A new `TempFile` data and form guard allows streaming data directly to a file which can then be persisted.
  * A new `temp_dir` config parameter specifies where to store `TempFile`.
  * The limits `file` and `file/$ext`, where `$ext` is the file extension, determines the data limit for a `TempFile`.

Capped
  * A new `Capped` type is used to indicate when data has been truncated due to incoming data limits. It allows checking whether data is complete or truncated.
  * `DataStream` methods return `Capped` types.
  * `DataStream` API has been revamped to account for `Capped` types.
  * Several `Capped<T>` types implement `FromData`, `FromForm`.
  * HTTP 413 (Payload Too Large) errors are now returned when data limits are exceeded. (resolves #972)

Hierarchical Limits
  * Data limits are now hierarchical, delimited with `/`. A limit of `a/b/c` falls back to `a/b` then `a`.

Core
  * `&RawStr` no longer implements `FromParam`.
  * `&str` implements `FromParam`, `FromData`, `FromForm`.
  * `FromTransformedData` was removed.
  * `FromData` gained a lifetime for use with request-local data.
  * The default error HTML is more compact.
  * `&Config` is a request guard.
  * The `DataStream` interface was entirely revamped.
  * `State` is only exported via `rocket::State`.
  * A `request::local_cache!()` macro was added for storing values in request-local cache without consideration for type uniqueness by using a locally generated anonymous type.
  * `Request::get_param()` is now `Request::param()`.
  * `Request::get_segments()` is now `Request::segments()`, takes a range.
  * `Request::get_query_value()` is now `Request::query_value()`, can parse any `FromForm` including sequences.
  * `std::io::Error` implements `Responder` like `Debug<std::io::Error>`.
  * `(Status, R)` where `R: Responder` implements `Responder` by overriding the `Status` of `R`.
  * The name of a route is printed first during route matching.
  * `FlashMessage` now only has one lifetime generic.

HTTP
  * `RawStr` implements `serde::{Serialize, Deserialize}`.
  * `RawStr` implements _many_ more methods, in particular, those related to the `Pattern` API.
  * `RawStr::from_str()` is now `RawStr::new()`.
  * `RawStr::url_decode()` and `RawStr::url_decode_lossy()` only allocate as necessary, return `Cow`.
  * `Status` implements `Default` with `Status::Ok`.
  * `Status` implements `PartialEq`, `Eq`, `Hash`, `PartialOrd`, `Ord`.
  * Authority and origin part of `Absolute` can be modified with new `Absolute::{with,set}_authority()`, `Absolute::{with,set}_origin()` methods.
  * `Origin::segments()` was removed in favor of methods split into query and path parts and into raw and decoded versions.
  * The `Segments` iterator is smarter, returns decoded `&str` items.
  * `Segments::into_path_buf()` is now `Segments::to_path_buf()`.
  * A new `QuerySegments` is the analogous query segment iterator.
  * Once set, `expires` on private cookies is not overwritten. (resolves #1506)
  * `Origin::path()` and `Origin::query()` return `&RawStr`, not `&str`.

Codegen
  * Preserve more spans in `uri!` macro.
  * Preserve spans `FromForm` field types.
  * All dynamic parameters in a query string must typecheck as `FromForm`.
  * `FromFormValue` derive removed; `FromFormField` added.
  * The `form` `FromForm` and `FromFormField` field attribute is now named `field`. `#[form(field = ..)]` is now `#[field(name = ..)]`.

Contrib
  * `Json` implements `FromForm`.
  * `MsgPack` implements `FromForm`.
  * The `json!` macro is exported as `rocket_contrib::json::json!`.
  * Added clarifying docs to `StaticFiles`.

Examples
  * `form_validation` and `form_kitchen_sink` removed in favor of `forms`.
  * The `hello_world` example uses unicode in paths.
  * The `json` example only allocates as necessary.

Internal
  * Codegen uses new `exports` module with the following conventions:
    - Locals starts with `__` and are lowercased.
    - Rocket modules start with `_` and are lowercased.
    - `std` types start with `_` and are titlecased.
    - Rocket types are titlecased.
  * A `header` module was added to `http`, contains header types.
  * `SAFETY` is used as doc-string keyword for `unsafe` related comments.
  * The `Uri` parser no longer recognizes Rocket route URIs.
2020-10-30 03:50:06 +00:00
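
The commit message above says that a limit named `a/b/c` falls back to `a/b` and then `a`. The following is a minimal sketch of that lookup rule using only the standard library. `LimitTable` and its methods are hypothetical names chosen for illustration; this is not Rocket's `Limits` type, and the real implementation may differ.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a table of named data limits (in bytes).
struct LimitTable {
    limits: HashMap<String, u64>,
}

impl LimitTable {
    /// Builds a table from `(name, limit)` pairs.
    fn new(entries: &[(&str, u64)]) -> Self {
        let limits = entries.iter().map(|(k, v)| (k.to_string(), *v)).collect();
        LimitTable { limits }
    }

    /// Looks up `name`, then each shorter `/`-delimited prefix of `name`.
    fn get(&self, name: &str) -> Option<u64> {
        let mut key = name;
        loop {
            if let Some(limit) = self.limits.get(key) {
                return Some(*limit);
            }

            // Drop the last `/segment`; give up once no `/` remains.
            key = &key[..key.rfind('/')?];
        }
    }
}
```

With entries for `file` and `file/jpg`, a lookup of `file/png` in this sketch resolves to the `file` entry, while `file/jpg` resolves to its own entry.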
use std::cell::UnsafeCell;

use multer::Multipart;
use parking_lot::{RawMutex, lock_api::RawMutex as _};
use either::Either;

use crate::request::{Request, local_cache};
use crate::data::{Data, Limits, Outcome};
use crate::form::prelude::*;
use crate::http::RawStr;

type Result<'r, T> = std::result::Result<T, Error<'r>>;

type Field<'r, 'i> = Either<ValueField<'r>, DataField<'r, 'i>>;

pub struct Buffer {
    strings: UnsafeCell<Vec<String>>,
    mutex: RawMutex,
}

pub struct MultipartParser<'r, 'i> {
    request: &'r Request<'i>,
    buffer: &'r Buffer,
    source: Multipart,
    done: bool,
}

pub struct RawStrParser<'r> {
    buffer: &'r Buffer,
    source: &'r RawStr,
}

pub enum Parser<'r, 'i> {
    Multipart(MultipartParser<'r, 'i>),
    RawStr(RawStrParser<'r>),
}

impl<'r, 'i> Parser<'r, 'i> {
    pub async fn new(req: &'r Request<'i>, data: Data) -> Outcome<Parser<'r, 'i>, Errors<'r>> {
        let parser = match req.content_type() {
            Some(c) if c.is_form() => Self::from_form(req, data).await,
            Some(c) if c.is_form_data() => Self::from_multipart(req, data).await,
            _ => return Outcome::Forward(data),
        };

        match parser {
            Ok(storage) => Outcome::Success(storage),
            Err(e) => Outcome::Failure((e.status(), e.into()))
        }
    }

    async fn from_form(req: &'r Request<'i>, data: Data) -> Result<'r, Parser<'r, 'i>> {
        let limit = req.limits().get("form").unwrap_or(Limits::FORM);
        let string = data.open(limit).into_string().await?;
        if !string.is_complete() {
            Err((None, Some(limit.as_u64())))?
        }

        Ok(Parser::RawStr(RawStrParser {
            buffer: local_cache!(req, Buffer::new()),
            source: RawStr::new(local_cache!(req, string.into_inner())),
        }))
    }

    async fn from_multipart(req: &'r Request<'i>, data: Data) -> Result<'r, Parser<'r, 'i>> {
        let boundary = req.content_type()
            .ok_or(multer::Error::NoMultipart)?
            .param("boundary")
            .ok_or(multer::Error::NoBoundary)?;

        let form_limit = req.limits()
            .get("data-form")
            .unwrap_or(Limits::DATA_FORM);

        Ok(Parser::Multipart(MultipartParser {
            request: req,
            buffer: local_cache!(req, Buffer::new()),
            source: Multipart::with_reader(data.open(form_limit), boundary),
            done: false,
        }))
    }

    pub async fn next(&mut self) -> Option<Result<'r, Field<'r, 'i>>> {
        match self {
            Parser::Multipart(ref mut p) => p.next().await,
            Parser::RawStr(ref mut p) => p.next().map(|f| Ok(Either::Left(f)))
        }
    }
}
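
`from_form` above reads the body up to a limit and rejects it when the capped read is not complete. The sketch below illustrates that idea with synchronous standard-library I/O only. `CappedString` and `read_capped` are hypothetical names, not Rocket's `Capped` or `DataStream` API, and the probe-one-more-byte strategy is an assumption about one way to detect truncation.

```rust
use std::io::{self, Read};

/// Hypothetical result type: the text read plus whether the source was
/// fully consumed within the limit.
struct CappedString {
    value: String,
    complete: bool,
}

/// Reads at most `limit` bytes from `reader` and records whether anything
/// was left unread.
fn read_capped<R: Read>(mut reader: R, limit: u64) -> io::Result<CappedString> {
    let mut bytes = Vec::new();

    // Read no more than `limit` bytes from the source.
    reader.by_ref().take(limit).read_to_end(&mut bytes)?;

    // Probe for one more byte: if one exists, the data was truncated.
    let mut probe = [0u8; 1];
    let complete = reader.read(&mut probe)? == 0;

    // Note: truncating in the middle of a multi-byte character fails here.
    let value = String::from_utf8(bytes)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;

    Ok(CappedString { value, complete })
}
```

A caller in this sketch would treat `complete == false` the way `from_form` treats `!string.is_complete()`: as an error that reports the limit which was exceeded.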

impl<'r> RawStrParser<'r> {
    pub fn new(buffer: &'r Buffer, source: &'r RawStr) -> Self {
        RawStrParser { buffer, source }
    }
}

impl<'r> Iterator for RawStrParser<'r> {
    type Item = ValueField<'r>;

    fn next(&mut self) -> Option<Self::Item> {
        use std::borrow::Cow::*;

        let (name, value) = loop {
            if self.source.is_empty() {
                return None;
            }

            let (field_str, rest) = self.source.split_at_byte(b'&');
            self.source = rest;

            if !field_str.is_empty() {
                break field_str.split_at_byte(b'=');
            }
        };

        let name_val = match (name.url_decode_lossy(), value.url_decode_lossy()) {
            (Borrowed(name), Borrowed(val)) => (name, val),
            (Borrowed(name), Owned(v)) => (name, self.buffer.push_one(v)),
            (Owned(name), Borrowed(val)) => (self.buffer.push_one(name), val),
            (Owned(mut name), Owned(val)) => {
                let len = name.len();
                name.push_str(&val);
                self.buffer.push_split(name, len)
            }
        };

        Some(ValueField::from(name_val))
    }
}
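
The loop in `RawStrParser::next` splits the source on `&`, skips empty fields, and then splits each field on the first `=`. The sketch below reproduces only that splitting step on plain `&str`, with no decoding. `split_at_char` is a hypothetical stand-in for `RawStr::split_at_byte`, which is assumed here to return the whole input and an empty string when the separator is absent.

```rust
/// Splits `s` at the first `sep`, excluding `sep` itself. If `sep` is
/// absent, returns `(s, "")`. Assumed to mirror `RawStr::split_at_byte`.
fn split_at_char(s: &str, sep: char) -> (&str, &str) {
    match s.find(sep) {
        Some(i) => (&s[..i], &s[i + sep.len_utf8()..]),
        None => (s, ""),
    }
}

/// Yields raw, undecoded `(name, value)` pairs, skipping empty fields.
fn raw_fields<'a>(mut source: &'a str) -> impl Iterator<Item = (&'a str, &'a str)> {
    std::iter::from_fn(move || {
        loop {
            if source.is_empty() {
                return None;
            }

            // Take everything up to the next `&`; keep the rest for later.
            let (field, rest) = split_at_char(source, '&');
            source = rest;

            // Runs of `&` produce empty fields, which are skipped.
            if !field.is_empty() {
                return Some(split_at_char(field, '='));
            }
        }
    })
}
```

For the input `a&b=c&&&c`, this sketch yields `("a", "")`, `("b", "c")`, and `("c", "")`, matching the fields expected by `test_skips_empty`.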

#[cfg(test)]
mod raw_str_parse_tests {
    use crate::form::ValueField as Field;

    #[test]
    fn test_skips_empty() {
        let buffer = super::Buffer::new();
        let fields: Vec<_> = super::RawStrParser::new(&buffer, "a&b=c&&&c".into()).collect();
        assert_eq!(fields, &[Field::parse("a"), Field::parse("b=c"), Field::parse("c")]);
    }

    #[test]
    fn test_decodes() {
        let buffer = super::Buffer::new();
        let fields: Vec<_> = super::RawStrParser::new(&buffer, "a+b=c%20d&%26".into()).collect();
        assert_eq!(fields, &[Field::parse("a b=c d"), Field::parse("&")]);
    }
}
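
The parser matches on `Cow::Borrowed` and `Cow::Owned` because decoding only allocates when the input actually contains something to decode. The sketch below shows one way such a decoder could look. `decode_lossy` and `hex` are hypothetical helpers, not `RawStr::url_decode_lossy`; treating `+` as a space follows what `test_decodes` expects, and leaving malformed escapes untouched is an assumption made for this example.

```rust
use std::borrow::Cow;

/// Returns the value of an ASCII hex digit, if `b` is one.
fn hex(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Decodes `+` as a space and `%XX` as a byte, replacing invalid UTF-8
/// lossily. Borrows the input when there is nothing to decode.
fn decode_lossy(input: &str) -> Cow<'_, str> {
    if !input.bytes().any(|b| b == b'+' || b == b'%') {
        return Cow::Borrowed(input);
    }

    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        let byte = bytes[i];

        // A valid escape is `%` followed by exactly two hex digits.
        let escape = match (byte, bytes.get(i + 1), bytes.get(i + 2)) {
            (b'%', Some(&h), Some(&l)) => hex(h).zip(hex(l)),
            _ => None,
        };

        match escape {
            Some((h, l)) => {
                out.push((h << 4) | l);
                i += 3;
            }
            None => {
                // Malformed escapes are copied through unchanged.
                out.push(if byte == b'+' { b' ' } else { byte });
                i += 1;
            }
        }
    }

    Cow::Owned(String::from_utf8_lossy(&out).into_owned())
}
```

Only the `Owned` results need somewhere to live for the lifetime `'r`, which is why the parser hands them to `Buffer` while passing `Borrowed` results through directly.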

impl<'r, 'i> MultipartParser<'r, 'i> {
    async fn next(&mut self) -> Option<Result<'r, Field<'r, 'i>>> {
        if self.done {
            return None;
        }

        let field = match self.source.next_field().await {
            Ok(Some(field)) => field,
            Ok(None) => return None,
            Err(e) => {
                self.done = true;
                return Some(Err(e.into()));
            }
        };

        // A field with a content-type is data; one without is "value".
        trace_!("multipart field: {:?}", field.name());
        let content_type = field.content_type().and_then(|m| m.as_ref().parse().ok());
        let field = if let Some(content_type) = content_type {
            let (name, file_name) = match (field.name(), field.file_name()) {
                (None, None) => ("", None),
                (None, Some(file_name)) => ("", Some(self.buffer.push_one(file_name))),
                (Some(name), None) => (self.buffer.push_one(name), None),
                (Some(a), Some(b)) => {
                    let (field_name, file_name) = self.buffer.push_two(a, b);
                    (field_name, Some(file_name))
                }
            };

            Either::Right(DataField {
                content_type,
                request: self.request,
                name: NameView::new(name),
                file_name: file_name.and_then(sanitize),
                data: Data::from(field),
            })
        } else {
            let (mut buf, len) = match field.name() {
                Some(s) => (s.to_string(), s.len()),
                None => (String::new(), 0)
            };

            match field.text().await {
                Ok(text) => buf.push_str(&text),
                Err(e) => return Some(Err(e.into())),
            };

            let name_val = self.buffer.push_split(buf, len);
            Either::Left(ValueField::from(name_val))
        };

        Some(Ok(field))
    }
}

fn sanitize(file_name: &str) -> Option<&str> {
    let file_name = std::path::Path::new(file_name)
        .file_name()
        .and_then(|n| n.to_str())
        .map(|n| n.find('.').map(|i| n.split_at(i).0).unwrap_or(n))?;

    if file_name.is_empty()
        || file_name.starts_with(|c| c == '.' || c == '*')
        || file_name.ends_with(|c| c == ':' || c == '>' || c == '<')
        || file_name.contains(|c| c == '/' || c == '\\')
    {
        return None
    }

    Some(file_name)
}
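
To make the effect of `sanitize` concrete, the block below repeats the function from the listing, with comments added, so that it compiles on its own. The behavior described afterwards assumes a Unix-like platform, since `std::path::Path` treats separators differently on Windows.

```rust
/// Copy of `sanitize` from the listing, repeated so the snippet stands alone.
fn sanitize(file_name: &str) -> Option<&str> {
    // Keep only the final path component, then cut at the first `.`.
    let file_name = std::path::Path::new(file_name)
        .file_name()
        .and_then(|n| n.to_str())
        .map(|n| n.find('.').map(|i| n.split_at(i).0).unwrap_or(n))?;

    // Reject names that are empty or carry suspicious characters.
    if file_name.is_empty()
        || file_name.starts_with(|c| c == '.' || c == '*')
        || file_name.ends_with(|c| c == ':' || c == '>' || c == '<')
        || file_name.contains(|c| c == '/' || c == '\\')
    {
        return None
    }

    Some(file_name)
}
```

Under that assumption, `foo.txt` and `photo.tar.gz` reduce to `foo` and `photo`, `../../etc/passwd` reduces to `passwd`, and `.bashrc`, `..`, and `*.txt` are all rejected with `None`. The extension is always dropped, so a caller that needs it has to take it from elsewhere, such as the content type.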

impl Buffer {
    pub fn new() -> Self {
        Buffer {
            strings: UnsafeCell::new(vec![]),
            mutex: RawMutex::INIT,
        }
    }

    pub fn push_one<'a, S: Into<String>>(&'a self, string: S) -> &'a str {
        // SAFETY:
        //   * Aliasing: We retrieve a mutable reference to the last slot (via
        //     `push()`) and then return said reference as immutable; these
        //     occur in serial, so they don't alias. This method accesses a
        //     unique slot each call: the last slot, subsequently replaced by
        //     `push()` each next call. No other method accesses the internal
        //     buffer directly. Thus, the outstanding reference to the last slot
        //     is never accessed again mutably, preserving aliasing guarantees.
        //   * Liveness: The returned reference is to a `String`; we must ensure
        //     that the `String` is never dropped while `self` lives. This is
        //     guaranteed by returning a reference with the same lifetime as
        //     `self`, so `self` can't be dropped while the string is live, and
        //     by never removing elements from the internal `Vec` thus not
        //     dropping `String` itself: `push()` is the only mutating operation
        //     called on `Vec`, which preserves all previous elements; the
        //     stability of `String` itself means that the returned address
        //     remains valid even after internal realloc of `Vec`.
        //   * Thread-Safety: Parallel calls to `push_one` without exclusion
        //     would result in a race to `vec.push()`; `RawMutex` ensures that
        //     this doesn't occur.
        unsafe {
            self.mutex.lock();
            let vec: &mut Vec<String> = &mut *self.strings.get();
            vec.push(string.into());
            let last = vec.last().expect("push() => non-empty");
            self.mutex.unlock();
            last
        }
    }

    pub fn push_split(&self, string: String, len: usize) -> (&str, &str) {
        let buffered = self.push_one(string);
        let a = &buffered[..len];
        let b = &buffered[len..];
        (a, b)
    }

    pub fn push_two<'a>(&'a self, a: &str, b: &str) -> (&'a str, &'a str) {
        let mut buffer = String::new();
        buffer.push_str(a);
        buffer.push_str(b);
        self.push_split(buffer, a.len())
    }
}

unsafe impl Sync for Buffer {}
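
The liveness argument in the `SAFETY` comment rests on one property: a `String`'s heap bytes do not move when the `Vec` holding the `String` reallocates, because only the `String`'s pointer, length, and capacity fields are relocated. The sketch below checks that property using safe code and raw pointer comparison only. `first_string_stays_put` is a hypothetical helper written for this illustration and is not part of the module.

```rust
/// Pushes `count` extra strings into a `Vec`, forcing it to grow, and
/// reports whether the first string's heap bytes kept the same address.
fn first_string_stays_put(count: usize) -> bool {
    // Start with capacity for a single element so later pushes reallocate.
    let mut strings: Vec<String> = Vec::with_capacity(1);
    strings.push(String::from("first"));

    // Address of the string's heap bytes, not of the `String` header.
    let before = strings[0].as_ptr();

    for i in 0..count {
        strings.push(format!("string {}", i));
    }

    // The `String` header may have moved; its heap bytes have not.
    strings[0].as_ptr() == before
}
```

This only demonstrates address stability. It does not by itself justify handing out `&str` references across pushes; in `Buffer` that additionally depends on the append-only discipline and the mutex described in the comment.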