singleview_6_27_wordcloud.vg.json 9.83 KB
Newer Older
Danyel Fisher committed
1 2 3
{
    "$schema": "https://vega.github.io/schema/vega/v3.0.json",
    "name": "wordcloud",
4 5
    "width": 900,
    "height": 500,
Danyel Fisher committed
6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93
    "padding": 0,
    "data": [
      {
        "name": "table",
        "__comment": "this is a cleaned-up copy of the text from Chapter 1. Manually removed (s) from visualization, question, others",
        "values": [
  "Visualization is a vital tool to understand and share insights around data. The right visualization can help express a core idea, or open a space to examination; it can get the world talking about a dataset, or sharing an insight.",
  "visualization can take many forms, from views that support exploratory analysis (top left), to those that provide quick overviews in a dashboard (bottom), to an infographic about popular topics (top right).",
  "visualization provide a direct and tangible representation of data. They allow us to confirm hypotheses and gain insights. When incorporated into the data analysis process early and often, visualization can fundamentally alter the question that someone is asking.",
  "Creating effective visualization is hard. Not because a data set requires an exotic and bespoke visual representation -- for many problems standard statistical charts will suffice. And not because creating a visualization requires coding expertise in an unfamiliar programming language -- off-the-shelf tools like Excel, Tableau, and R are ample. ",
  "Rather, visualization are challenging to create because the problems that are best addressed by visualization are complex and ill-formed. The task of figuring out _what attributes_ of a data set are important is often conflated with figuring out _what type of visualization_ to use. Comparatively speaking, though, picking a chart type to represent specific attributes in a data set is easy. In contrast, deciding on which data attributes will help answer a question is a complex, poorly-defined, and user-driven process; it can require rounds of visualization and exploration to resolve. In this book we focus on the process of going from high-level question to well-defined data analysis tasks, and how to incorporate visualization along the way to clarify understanding and gain insights.",
  "Who is this book for?",
  "This book is for people with access to data and, perhaps, a suite of computational tools, but who are less than sure how to turn that data into visual insight. We have sometimes found that many data science books that you can figure out what how to visualize the data once collected; and that visualization books can often assume that you already have a well-defined question, ready to be visualized. If, like us, you would like to address these assumptions, then this book is for you.",
  "This book does not teach the reader how to clean and manage data in detail, or how to write visualization code: there are already great books on these topics (and, when relevant, we point to some of them). Rather, we will talk about why those processes are important. Similarly, this book will not teach you how to choose a beautiful colormap or select a typeface. Instead, we lay out a framework for how to think about data given the possibilities, and constraints, of visual exploration. Our goal is to show how to effectively use visualization to make sense of data.",
  "Who are we?",
  "The authors of this book have a combined three decades of experience in make sense of data through designing and using visualization. We have worked with data from a broad range of fields: biology and urban transportation, business intelligence and scientific visualization, debugging code and building maps. We have worked with analysts from a variety of organizations, from small, academic science labs to teams of data analysts embedded in large companies. Some of the projects we have worked on result in sophisticated, bespoke visualization systems designed collaboratively with domain specialists, and other times we have pointed people to off-the-shelf visualization tools after a few conversations. We have taught university classes in visualization, and given lectures and tutorials. All in all, we have visualized thousands of data sets.",
  "We have found that our knowledge about visualization techniques, solutions, and systems shapes the way that we think and reason about data. Visualization is fundamentally about presenting data in a way that elicits human reasoning,  that makes room for individual interpretations, and supports exploration. We help our collaborators to make their question and data reflect these values. The process we lay out in this book describes our method.",
  "Overview of chapters",
  "illustrates the process of make sense with visualization in a quick example, exposing the role that a visual representation can play in data discovery.",
  "starts to get into details. It discusses a mechanism to help narrow a question from a broad task into something that can be addressed with an iterative visualization process. For example, the broad question _“Who are the best movie directors?”_ does not necessarily suggest a specific visualization – but _“Find movie directors who directed top-grossing movies using an IMDB data set.”_ can lead more directly to an answer by way of a visualization or two. This process creates an operationalized question, one that consists of particular tasks that can be addressed with data.",
  "This process of narrowing a question down to actionable tasks requires input from multiple stakeholders. <<DataCounseling>> lays out an iterative set of steps for getting to the operationalization, which we call _data counseling_. These steps include finding the right people to talk to, asking effective question, and rapidly exploring the data through increasingly sophisticated prototypes.",
  "The numerical nitty-gritty of the book follows.  discusses types and relations of data, and defines and uses terms like dimensions and measures, or categorical and quantitative.  then organizes common visualization types by the tasks that they fulfill and the data that they use. Last, <<multiview>> explores visualization techniques that use multiple views and interaction.",
  "These chapters are meant to provide an overview of some of the most effective and commonly used ideas for supporting sense-make with visualization, and are are framed using the operationalization and data counseling process to help guide decision-make about which visualization to choose.",
  "With this understanding of getting to insight--from question to data to visualization-—the remainder of the book illustrates two examples of carrying out these steps. A case study in <<casestudies_lync>> describes the creation of a business intelligence dashboard, in collaboration with a team of developers and analysts at Microsoft. <<casestudies_fruitfly>> draws from science, instead, presenting an example with a team of scientists who work with biological data.  These problems illustrate the flexibility of the process, as well as the diverse types of outcomes that are possible.",
  "There are many things that will not be covered in this book. Space does not allow for the perceptual aspects of visualization, human factors components of interfaces, or instructions for using toolkits. We think, though, that the aspect that we have chosen to present here--this discussion of how to conceptualize a problem for visualization--will help make other books and resources more useful."
        ],
        "transform": [
          {
            "type": "countpattern",
            "field": "data",
            "case": "upper",
            "pattern": "[\\w']{3,}",
            "stopwords": "(i|me|my|myself|we|us|our|ours|ourselves|you|your|yours|yourself|yourselves|he|him|his|himself|she|her|hers|herself|it|its|itself|they|them|their|theirs|themselves|what|which|who|whom|whose|this|that|these|those|am|is|are|was|were|be|been|being|have|has|had|having|do|does|did|doing|will|would|should|can|could|ought|i'm|you're|he's|she's|it's|we're|they're|i've|you've|we've|they've|i'd|you'd|he'd|she'd|we'd|they'd|i'll|you'll|he'll|she'll|we'll|they'll|isn't|aren't|wasn't|weren't|hasn't|haven't|hadn't|doesn't|don't|didn't|won't|wouldn't|shan't|shouldn't|can't|cannot|couldn't|mustn't|let's|that's|who's|what's|here's|there's|when's|where's|why's|how's|a|an|the|and|but|if|or|because|as|until|while|of|at|by|for|with|about|against|between|into|through|during|before|after|above|below|to|from|up|upon|down|in|out|on|off|over|under|again|further|then|once|here|there|when|where|why|how|all|any|both|each|few|more|most|other|some|such|no|nor|not|only|own|same|so|than|too|very|say|says|said|shall)"
          },
          {
            "type": "formula", "as": "angle",
            "expr": "[-45, -30, 0, 30, 45][~~(random() * 5)]"
          },
          {
            "type": "formula", "as": "weight",
            "expr": "200"
          }
        ]
      }
    ],
  
    "scales": [
      {
        "name": "colors",
        "type": "ordinal",
        "range": ["#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e", "#e6ab02", "#a6761d", "#666666"]
      }
    ],
  
    "marks": [
      {
        "type": "text",
        "from": {"data": "table"},
        "encode": {
          "enter": {
            "text": {"field": "text"},
            "align": {"value": "center"},
            "baseline": {"value": "alphabetic"},
            "fill": {"scale": "colors", "field": "text"}
          },
          "update": {
            "fillOpacity": {"value": 1}
          },
          "hover": {
            "fillOpacity": {"value": 0.5}
          }
        },
        "transform": [
          {
            "type": "wordcloud",
            "size": [800, 400],
            "text": {"field": "text"},
            "rotate": {"field": "datum.angle"},
            "font": "Helvetica Neue, Arial",
            "fontSize": {"field": "datum.count"},
            "fontWeight": {"field": "datum.weight"},
            "fontSizeRange": [2, 56],
            "padding": 2
          }
        ]
      }
    ]
  }