TableSet¶
The TableSet class collects a set of related tables in a single data
structure. The most common way of creating a TableSet is using the
Table.group_by() method, which is similar to SQL’s GROUP BY keyword.
The resulting tables will all have an identical column structure.
TableSet functions as a dictionary. Individual tables in the set can
be accessed by using their name as a key. If the table set was created using
Table.group_by() then the names of the tables will be the grouping
factors found in the original data.
TableSet replicates the majority of the features of Table.
When methods such as TableSet.select(), TableSet.where() or
TableSet.order_by() are used, the operation is applied to each table
in the set and the result is a new TableSet instance made up of
entirely new Table instances.
TableSet instances can also contain other TableSets. This means you
can chain calls to Table.group_by() and TableSet.group_by()
and end up with data grouped across multiple dimensions.
TableSet.aggregate() on nested TableSets will then group across multiple
dimensions.
TableSet – A group of named tables with identical column definitions.
Properties¶
Creating¶
Saving¶
to_csv – Write each table in this set to a separate CSV in a given directory.
to_json – Write TableSet to either a set of JSON files for each table or a single nested JSON file.
Processing¶
Previewing¶
print_structure – Print the keys and row counts of each table in the tableset.
Charting¶
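bar_chart – Render a lattice/grid of bar charts using leather.Lattice.
column_chart – Render a lattice/grid of column charts using leather.Lattice.
line_chart – Render a lattice/grid of line charts using leather.Lattice.
scatterplot – Render a lattice/grid of scatterplots using leather.Lattice.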
Table Proxy Methods¶
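bins – Calls Table.bins() on each table in the TableSet.
compute – Calls Table.compute() on each table in the TableSet.
denormalize – Calls Table.denormalize() on each table in the TableSet.
distinct – Calls Table.distinct() on each table in the TableSet.
exclude – Calls Table.exclude() on each table in the TableSet.
find – Calls Table.find() on each table in the TableSet.
group_by – Calls Table.group_by() on each table in the TableSet.
homogenize – Calls Table.homogenize() on each table in the TableSet.
join – Calls Table.join() on each table in the TableSet.
limit – Calls Table.limit() on each table in the TableSet.
normalize – Calls Table.normalize() on each table in the TableSet.
order_by – Calls Table.order_by() on each table in the TableSet.
pivot – Calls Table.pivot() on each table in the TableSet.
select – Calls Table.select() on each table in the TableSet.
where – Calls Table.where() on each table in the TableSet.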
Detailed list¶
- class agate.TableSet(tables, keys, key_name='group', key_type=None, _is_fork=False)¶
Bases: MappedSequence
A group of named tables with identical column definitions. Supports (almost) all the same operations as Table. When executed on a TableSet, any operation that would have returned a new Table instead returns a new TableSet. Any operation that would have returned a single value instead returns a dictionary of values.
TableSet is implemented as a subclass of MappedSequence.
- Parameters:
tables – A sequence of Table instances.
keys – A sequence of keys corresponding to the tables. These may be any type except int.
key_name – A name that describes the grouping properties. Used as the column header when the groups are aggregated. Defaults to the column name that was grouped on.
key_type – An instance of some subclass of DataType. If not provided, it defaults to Text.
_is_fork – Used internally to skip certain validation steps when data is propagated from an existing TableSet.
- property key_name¶
Get the name of the key this TableSet is grouped by. (If created using Table.group_by() then this is the original column name.)
- property key_type¶
Get the DataType this TableSet is grouped by. (If created using Table.group_by() then this is the original column type.)
- property column_names¶
Get an ordered list of this TableSet’s column names.
- Returns:
A tuple of strings.
- aggregate(aggregations)¶
Aggregate data from the tables in this set by performing some set of column operations on the groups and coalescing the results into a new Table.
aggregations must be a sequence of tuples, where each has two parts: a new_column_name and an Aggregation instance.
The resulting table will have the keys from this TableSet (and any nested TableSets) set as its row_names. See Table.__init__() for more details.
- Parameters:
aggregations – A list of tuples in the format (new_column_name, aggregation), where each aggregation is an instance of Aggregation.
- Returns:
A new Table.
- bar_chart(label=0, value=1, path=None, width=None, height=None)¶
Render a lattice/grid of bar charts using leather.Lattice.
- Parameters:
label – The name or index of a column to plot as the labels of the chart. Defaults to the first column in the table.
value – The name or index of a column to plot as the values of the chart. Defaults to the second column in the table.
path – If specified, the resulting SVG will be saved to this location. If None and running in IPython, then the SVG will be rendered inline. Otherwise, the SVG data will be returned as a string.
width – The width of the output SVG.
height – The height of the output SVG.
- bins(*args, **kwargs)¶
Calls Table.bins() on each table in the TableSet.
- column_chart(label=0, value=1, path=None, width=None, height=None)¶
Render a lattice/grid of column charts using leather.Lattice.
- Parameters:
label – The name or index of a column to plot as the labels of the chart. Defaults to the first column in the table.
value – The name or index of a column to plot as the values of the chart. Defaults to the second column in the table.
path – If specified, the resulting SVG will be saved to this location. If None and running in IPython, then the SVG will be rendered inline. Otherwise, the SVG data will be returned as a string.
width – The width of the output SVG.
height – The height of the output SVG.
- compute(*args, **kwargs)¶
Calls Table.compute() on each table in the TableSet.
- count(value) integer -- return number of occurrences of value¶
- denormalize(*args, **kwargs)¶
Calls Table.denormalize() on each table in the TableSet.
- dict()¶
Retrieve the contents of this sequence as a collections.OrderedDict.
- distinct(*args, **kwargs)¶
Calls Table.distinct() on each table in the TableSet.
- exclude(*args, **kwargs)¶
Calls Table.exclude() on each table in the TableSet.
- find(*args, **kwargs)¶
Calls Table.find() on each table in the TableSet.
- classmethod from_csv(dir_path, column_names=None, column_types=None, row_names=None, header=True, **kwargs)¶
Create a new TableSet from a directory of CSVs.
See Table.from_csv() for additional details.
- Parameters:
dir_path – Path to a directory full of CSV files. All CSV files in this directory will be loaded.
column_names – See Table.__init__().
column_types – See Table.__init__().
row_names – See Table.__init__().
header – See Table.from_csv().
- classmethod from_json(path, column_names=None, column_types=None, keys=None, **kwargs)¶
Create a new TableSet from a directory of JSON files or from a single JSON object containing one key-value pair (Table key and list of row objects) for each Table.
See Table.from_json() for additional details.
- Parameters:
path – Path to a directory containing JSON files, or a file path or file-like object for a nested JSON file.
keys – A list of keys of the top-level dictionaries for each file. If specified, its length must equal the number of JSON files in path.
column_types – See Table.__init__().
- get(key, default=None)¶
Equivalent to
collections.OrderedDict.get().
- group_by(*args, **kwargs)¶
Calls Table.group_by() on each table in the TableSet.
- having(aggregations, test)¶
Create a new TableSet with only those tables that pass a test.
This works by applying a sequence of Aggregation instances to each table. The resulting dictionary of properties is then passed to the test function.
This method does not modify the underlying tables in any way.
- Parameters:
aggregations – A list of tuples in the format (name, aggregation), where each aggregation is an instance of Aggregation.
test (function) – A function that takes a dictionary of aggregated properties and returns True if the table should be included in the new TableSet.
- Returns:
A new
TableSet.
- homogenize(*args, **kwargs)¶
Calls Table.homogenize() on each table in the TableSet.
- index(value[, start[, stop]]) integer -- return first index of value.¶
Raises ValueError if the value is not present.
Supporting start and stop arguments is optional, but recommended.
- items()¶
Equivalent to
collections.OrderedDict.items().
- join(*args, **kwargs)¶
Calls Table.join() on each table in the TableSet.
- keys()¶
Equivalent to
collections.OrderedDict.keys().
- limit(*args, **kwargs)¶
Calls Table.limit() on each table in the TableSet.
- line_chart(x=0, y=1, path=None, width=None, height=None)¶
Render a lattice/grid of line charts using leather.Lattice.
- Parameters:
x – The name or index of a column to plot as the x axis of the chart. Defaults to the first column in the table.
y – The name or index of a column to plot as the y axis of the chart. Defaults to the second column in the table.
path – If specified, the resulting SVG will be saved to this location. If None and running in IPython, then the SVG will be rendered inline. Otherwise, the SVG data will be returned as a string.
width – The width of the output SVG.
height – The height of the output SVG.
- merge(groups=None, group_name=None, group_type=None)¶
Convert this TableSet into a single table. This is the inverse of Table.group_by().
Any row_names set on the merged tables will be lost in this process.
- Parameters:
groups – A list of grouping factors to add to merged rows in a new column. If specified, it should have exactly one element per Table in the TableSet. If not specified or None, the grouping factor will be the name of the Row’s original Table.
group_name – This will be the column name of the grouping factors. If None, defaults to the TableSet.key_name.
group_type – This will be the column type of the grouping factors. If None, defaults to the TableSet.key_type.
- Returns:
A new
Table.
- normalize(*args, **kwargs)¶
Calls Table.normalize() on each table in the TableSet.
- order_by(*args, **kwargs)¶
Calls Table.order_by() on each table in the TableSet.
- pivot(*args, **kwargs)¶
Calls Table.pivot() on each table in the TableSet.
- print_structure(max_rows=20, output=sys.stdout)¶
Print the keys and row counts of each table in the tableset.
- Parameters:
max_rows – The maximum number of rows to display before truncating the data. Defaults to 20.
output – The output used to print the structure of the
Table.
- Returns:
None
- scatterplot(x=0, y=1, path=None, width=None, height=None)¶
Render a lattice/grid of scatterplots using leather.Lattice.
- Parameters:
x – The name or index of a column to plot as the x axis of the chart. Defaults to the first column in the table.
y – The name or index of a column to plot as the y axis of the chart. Defaults to the second column in the table.
path – If specified, the resulting SVG will be saved to this location. If None and running in IPython, then the SVG will be rendered inline. Otherwise, the SVG data will be returned as a string.
width – The width of the output SVG.
height – The height of the output SVG.
- select(*args, **kwargs)¶
Calls Table.select() on each table in the TableSet.
- to_csv(dir_path, **kwargs)¶
Write each table in this set to a separate CSV in a given directory.
See Table.to_csv() for additional details.
- Parameters:
dir_path – Path to the directory to write the CSV files to.
- to_json(path, nested=False, indent=None, **kwargs)¶
Write TableSet to either a set of JSON files for each table or a single nested JSON file.
See Table.to_json() for additional details.
- Parameters:
path – Path to the directory to write the JSON file(s) to. If nested is True, this should be a file path or file-like object to write to.
nested – If True, the output will be a single nested JSON file with each Table’s key paired with a list of row objects. Otherwise, the output will be a set of files for each table. Defaults to False.
indent – See Table.to_json().
- values()¶
Equivalent to
collections.OrderedDict.values().
- where(*args, **kwargs)¶
Calls Table.where() on each table in the TableSet.