September 2016

# Making the Most out of Your Marketo A/B Test:  The Statistics you Need to Know

Posted by Jessica Kao Sep 26, 2016

We’ve all used Marketo or other automation tools to A/B test emails and landing pages. We do it because we want to optimize engagement through constant iterations, and we can use the results to give our content its best shot at provoking responses from our prospects.

But have you ever had the nagging feeling that your high school statistics teacher wouldn’t approve of your testing technique? You remember terms like sample size, variables, and p-value that were important parts of your hypothesis testing, but they all seem to be missing in Marketo’s tool today.

It turns out those principles we learned are still integral to executing a successful A/B test and preventing incorrect conclusions. Luckily you don’t need a stats degree to implement these principles and enhance the tests that your organization performs. So let’s dive into how to design and interpret a more meaningful Marketo A/B test.

Your test design is the most important factor in determining whether you will get insightful information from your results. Over and over, we see the same common experimental design fallacies in tests run by marketers. Let’s take a look at what they are and how to overcome them.

Sample Size is Too Small

How large does my sample size really need to be? We get this question a lot and wish there were a definitive answer. But we would be lying to you if we said there was, because it depends on how big a difference you want to detect.
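To make "it depends on the difference" concrete, here is a rough sketch using the standard normal-approximation formula for comparing two proportions. The 6% baseline, the two lifts, the 0.05 cutoff, and the 80% power setting are all illustrative assumptions, not figures Marketo provides, and the function name is made up for this example.

```python
import math
from statistics import NormalDist


def sample_size_per_group(rate_a, rate_b, alpha=0.05, power=0.80):
    """Approximate recipients needed per group to tell rate_a from rate_b."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed significance cutoff
    z_power = NormalDist().inv_cdf(power)          # chance of catching a real difference
    pooled = (rate_a + rate_b) / 2
    spread_null = math.sqrt(2 * pooled * (1 - pooled))
    spread_alt = math.sqrt(rate_a * (1 - rate_a) + rate_b * (1 - rate_b))
    n = ((z_alpha * spread_null + z_power * spread_alt) / (rate_a - rate_b)) ** 2
    return math.ceil(n)


# Illustrative open rates (assumptions): a 6% baseline against two possible lifts.
small_lift = sample_size_per_group(0.06, 0.074)
large_lift = sample_size_per_group(0.06, 0.09)
print(small_lift, large_lift)
```

Under these assumptions the smaller lift needs roughly four times as many recipients per group as the larger one, which is why there is no single answer to the sample size question.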

Say you want to do a simple subject line A/B test and you send to 1000 recipients.

Subject Line A: [Webinar] How to make the most of your A/B tests

Subject Line B: [Webinar] Register Now: How to make the most of your A/B tests

Half get Subject A and half get Subject B. If 6% open A and 7.4% open B, can you conclude that adding the CTA “Register Now” performed better? Is the difference between A and B significant enough to declare that B is “better”? We can’t really answer that until we look at the p-value; how to get it is covered later. For now, smaller p-values are better, and in this case the p-value is 0.376, which is not good. You might think, “Subject Line B still got a higher number of opens, so why don’t we just go with that?” What the result is also saying is that a gap this size shows up fairly often by chance alone, so there is a real possibility of getting the opposite result if you ran the test again.

If we run the test with 10,000 recipients total with the same percentages opening A and B respectively, the p-value is significantly smaller at 0.0051 which is excellent. (Scientific publication guidelines accept <0.05 and this is just marketing.) With the results from the second scenario you can confidently conclude that adding a CTA makes a difference. The combination of your target size and the difference between your two test groups determines what conclusions you can draw from your results.
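If you want to check the two scenarios above yourself, a two-proportion z-test is one common way to get a two-tailed p-value. This is a minimal sketch, assuming an even 50/50 split (so 30 vs 37 opens out of 500, and 300 vs 370 out of 5,000); the function name is hypothetical and this is not necessarily the method any particular calculator uses.

```python
import math


def two_tailed_p_value(successes_a, total_a, successes_b, total_b):
    """Two-proportion z-test with a pooled rate; returns the two-tailed p-value."""
    rate_a = successes_a / total_a
    rate_b = successes_b / total_b
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_b - rate_a) / std_err
    # erfc(|z| / sqrt(2)) is the probability in both tails of a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


# 1,000 recipients: 500 per subject line, 6% vs 7.4% opens.
p_small = two_tailed_p_value(30, 500, 37, 500)
# 10,000 recipients: 5,000 per subject line, same open rates.
p_large = two_tailed_p_value(300, 5000, 370, 5000)
print(f"{p_small:.3f} {p_large:.4f}")
```

The two results should land close to the 0.376 and 0.0051 quoted above: same open rates, ten times the recipients, very different conclusion.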

Changing too many variables at once

As marketers we get excited about testing different variables. Sometimes we go overboard and test too many variables at once which leads to the failure to conclude anything. Let’s demonstrate with a landing page test.

Landing Page A: Blue button with CTA = Submit

In this case we have a question: Which button performs better? If Landing Page A has a significantly higher conversion rate than Landing Page B, what is my actionable intelligence moving forward? Unfortunately, we do not know if it is the color or the words on the button or both that was the contributing factor. (If you want to geek out this is called a confounded experiment.)

The proper way to carry this out is to break the testing out into two rounds.

Test #1

Landing Page A: Blue button with CTA = Submit

Landing Page B: Green button with CTA = Submit

Result: Landing Page A performed significantly better.

Test #2

Result: Landing Page A performed significantly better.

Conclusion: LP with a blue button and an active CTA should be implemented.

If you vary multiple factors at once in the two test groups, you will not be able to conclude which of the variables that you changed contributed to the performance of one group over the other. Setting a series of tests to vary one variable at a time allows you to truly understand the contribution of each.

Testing without a clear question or hypothesis

Have you ever carried out an A/B test and then asked yourself “What do I do with the result? How can I apply this to future campaigns?” This confusion often occurs because you designed your test without a clear hypothesis.

Here’s an example of a subject line test with 6 groups.

A: Learn from CMOs: Engagement Strategies

B: How to effectively market to your prospects

C: Top strategies for engaging your prospects

D: Top strategies for reaching your prospects

E: Web Personalization: Reach and engage your prospects

F: Drive greater engagement this holiday season

If subject line C was declared the winner with the greatest number of clicks (albeit by a slim margin), what have we learned to apply for the next time? Also, with this many variables you will need a very large sample size to declare this result to be significant.

A better strategy would be to break out into a series of tests where we can test a single variable at a time with a clearly defined question or hypothesis.

Question #1: Does having CMO in the subject line drive more opens?

Test #1:

Subj A: Learn from CMOs: Engagement Strategies for your Marketing

Subj B: Learn Engagement Strategies for your Marketing

Question #2: Does the word “reaching” or “engaging” drive more opens?

Test #2 (Assuming CMO won test #1):

Subj A: Learn from CMOs: Top strategies for reaching your prospects

Subj B: Learn from CMOs: Top strategies for engaging your prospects

Question #3: Does mentioning “holiday season” result in a greater open rate?

Test #3 (Assuming reaching won test #2):

Subj A: Learn from CMOs: Top strategies for reaching your prospects

Subj B: Learn from CMOs: Top strategies for reaching your prospects this holiday season

Remember that it’s called an A/B test, not an A/B/C/D/E/F test. Break down your question into specific parts that can be tested in a series of A/B tests, rather than trying to get an immediate answer by testing all at once. The next time you are deciding what individual elements of a subject line will maximize engagement, you can look back at the results of these tests.

Using the Email Program A/B test results to declare a “Winner”

In Marketo, it is really easy to set up an A/B test using the Email Program and see the results. Let’s go back to our simple subject line test for registering for a webinar.

Subject Line A: [Webinar] How to make the most of your A/B tests

Subject Line B: [Webinar] Register Now: How to make the most of your A/B tests

Say you have 50,000 leads in your target list and you choose to test 20% of your list and send the remainder the winner. That means 5,000 will get subject line A and 5,000 will get subject line B. The subject line that is declared the winner will be sent to the remaining 40,000. That sounds pretty straightforward. But (and you knew there was a but...) how is a winner determined, and which criterion should you choose?

Marketo lets you set the winning criteria and automatically send the winner a minimum of 4 hours later. You can choose from the following:

Opens

Clicks

Clicks to Open %

Engagement Score

Custom Conversion

In this case, if we choose opens, we are judging the subject line only by whether someone opened the email or not. Is this the behavior that matters most? In some cases it might be, but for a webinar we probably want to look at clicks instead. For example, we once saw the email with the larger open rate also get fewer registrations and a 10 times higher unsubscribe rate. This led us to conclude that our message was not resonating with the target audience.

Setting the winning criteria to Clicks to Open % could also be problematic. If email A had 1000 opens and 40 clicks (4%) but email B had 200 opens and 20 clicks (10%), email B would be declared the winner even though the absolute number of people who clicked is lower.
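To see how the choice of criterion flips the outcome, here is a small sketch that scores the two emails from the example above under three of the criteria. It only mimics the idea of "highest score wins"; it is not how Marketo computes a winner internally, and the dictionary layout is an assumption for illustration.

```python
# Counts from the example above: opens and clicks for each email.
results = {
    "A": {"opens": 1000, "clicks": 40},
    "B": {"opens": 200, "clicks": 20},
}

# Each criterion turns one email's counts into a single score.
criteria = {
    "Opens": lambda r: r["opens"],
    "Clicks": lambda r: r["clicks"],
    "Clicks to Open %": lambda r: 100 * r["clicks"] / r["opens"],
}

winners = {}
for name, score in criteria.items():
    # The email with the highest score under this criterion is the "winner".
    winners[name] = max(results, key=lambda email: score(results[email]))
    print(f"{name}: winner is email {winners[name]}")
```

Email A wins on opens and on clicks, while email B wins on Clicks to Open %, even though fewer people clicked it.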

What about setting the winning criteria to clicks? If Email A had 1000 clicks and Email B had 100 clicks, Email A would be declared the winner. But if the desired behavior is registering for the webinar and Email A had 10 people register for the webinar vs 25 for Email B, was email A really the “winner”?

So… which one should you pick?

Unfortunately, you won’t know until you look at the data after the results come in. There is no way to predict. We can think of a potential situation where any of the choices above would work or not work; it just depends on what the data says. So if you are going to declare a winner in a Marketo A/B test, we prefer to do it manually.

“When I test, I typically test on 100% of my target list. If I have an A/B test with 2 groups, I set the slider bar to 100%. That way, 50% get A and 50% get B. I do this because you won’t know if you have a large enough sample size until after the test. If you run 10 different tests on 1000 people and the difference is small, your results will all be inconclusive. I would rather run 1 test on 10,000 targets and get a really solid conclusion.”

When you are designing a test, ask yourself, “What am I going to do with this information? What am I going to change?” Don’t test for the sake of testing. Whatever you decide to test, ensure that the question you are asking is going to be actionable. Now that you know how to design robust A/B tests, how do you interpret those results?

II.  Testing and Interpretation of Results

Setting up the test correctly is half the story; making sure that we are drawing the correct conclusions is the other half, and it is just as important.

Unfortunately, we cannot “declare a winner” by simply picking the test group with the most opens or clicks. When we run a test, we are saying that this small population of 1000 people is a representation of the whole universe. It is not possible to test everyone in the whole world, so we are extrapolating that how this sample population behaves is going to predict how the rest of the world would behave. But we know that if we ran the test on 10 different sets of 1000 people, we would get slightly different results. So there is a chance, albeit small, that we picked a sample population that is an outlier, so different from the rest of the world that our results could lead us astray. This variation is what we need to account for by calculating a p-value.

Let’s go back to our subject line test.

If you sent a total of 1000 emails and 30 people opened email A and 31 people opened email B, could you say email B leads to more opens? The answer is no (based on the calculation of the p-value). Just because the opens of email B are greater than the opens of email A doesn’t mean that if you hypothetically ran the test again you would get the same result. In this case it’s about as good as flipping a coin. You could get either result.
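One way to get a feel for this is to simulate it. The sketch below assumes the 1000 emails were split 500/500 and that both emails truly share the same open rate (61 opens out of 1000 overall), then counts how often pure chance produces a gap of at least one open. The trial count, the seed, and the function name are arbitrary choices made for this illustration.

```python
import random


def simulate_no_difference(n_per_group, true_rate, observed_gap, trials=2000, seed=42):
    """Share of simulated tests whose open-count gap is at least observed_gap
    when both emails truly have the same open rate."""
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(trials):
        # Each recipient independently opens with probability true_rate.
        opens_a = sum(rng.random() < true_rate for _ in range(n_per_group))
        opens_b = sum(rng.random() < true_rate for _ in range(n_per_group))
        if abs(opens_b - opens_a) >= observed_gap:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials


# 500 recipients per email, shared open rate of 6.1%, observed gap of 1 open.
share = simulate_no_difference(n_per_group=500, true_rate=61 / 1000, observed_gap=1)
print(f"{share:.2f}")
```

The large majority of simulated tests show a gap at least this big even though nothing differs between the emails, which is why 31 vs 30 opens tells you nothing about which subject line is better.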

The real question in A/B testing is: “Is the difference between A and B SIGNIFICANT enough for you to CONFIDENTLY draw the conclusion that B is greater than A when you run the test again and again?” You want to be able to confidently say, “Based on the results of the test, I believe B will most likely yield more than A if I were to run the same test in the future. Therefore, we should move forward with B.” That’s the goal.

To determine whether the difference is significant or not we look at the p-value of our test.  We are not going to go into how this value is calculated, but we will examine:

1. How to use a very simple tool to obtain the p-value
2. How to interpret the p-value
3. What it means in plain English

You can use this calculator (http://www.measuringu.com/ab-calc.php) to input the results of your A/B test and generate a p-value. (This calculator was posted by Phillip Wild in A/B Testing and Statistical Significance. Great suggestion.)

Let’s take a look at another example.

You run an Email A/B test separated into groups with two different button colors, green and blue for the call to action. Your question is which button color is associated with more clicks.

Green: 93 clicks on 4,000 emails delivered

Blue: 68 clicks on 4,000 emails delivered

You take the number of clicks for each group and plug them into the A/B test calculator (http://www.measuringu.com/ab-calc.php) under the successes for each group. You enter 4000 into the total for each group.

The resulting two-tail p-value = .047.
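If you would rather not depend on a website, the same kind of number can be approximated with a chi-square test on the 2x2 table of clicked vs. did not click. The post does not say which formula the calculator uses, so the "N-1" adjustment below is an assumption; with samples this large it barely changes the result. The function name is hypothetical.

```python
import math


def chi_square_p_value(successes_a, total_a, successes_b, total_b, n_minus_1=True):
    """Chi-square test (1 degree of freedom) on a 2x2 success/failure table."""
    a, b = successes_a, total_a - successes_a   # group A: successes, failures
    c, d = successes_b, total_b - successes_b   # group B: successes, failures
    n = total_a + total_b
    scale = (n - 1) if n_minus_1 else n
    chi2 = scale * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


# Green: 93 clicks on 4,000 delivered. Blue: 68 clicks on 4,000 delivered.
p = chi_square_p_value(93, 4000, 68, 4000)
print(f"{p:.3f}")
```

The result should come out close to the .047 reported above.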

It is generally accepted that a p-value of <=0.05 is considered a significant result. The smaller the p-value, the more confident you can be in your results. We can conclude that there is a significantly higher number of clicks using Green vs Blue. I am reasonably confident that if I were to run this experiment again, Green would come out ahead again. Therefore, I would make the recommendation to change the CTA button color to green.

What does this p-value number mean in plain English?

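A p-value of 0.047 is saying that, if button color truly made no difference, there would be only a 4.7% chance of seeing a gap in clicks at least this large from random chance alone. Because that chance is small, it is reasonable to believe the difference is real rather than a fluke of this particular sample.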

What is so special about a p-value cut off of 0.05?

It is in fact an arbitrary cutoff, but it is the conventional standard used in the scientific and medical community, including in the most highly respected peer-reviewed publications.

If your p-value is slightly more than .05, say .052, don’t automatically write off the result as inconclusive. If you have the ability, test the same hypothesis again with a different or larger sample size.

Note:  When using this tool, plug in your number of successes (opens, clicks etc.) and total (number of delivered emails) for each group. Note that when using click to open ratio, you will be using number of clicks as the success and number of opens as the total, NOT the number of emails sent.

This calculator gives us the p-value of the test, and we want to look at the two-tail value specifically. The two-tail p-value is the probability of seeing a difference at least as large as the one we measured, in either direction, if there were actually no true difference between the two groups. If the p-value is smaller than .05, a difference this large would show up by chance less than 5% of the time, so we treat the difference between the two statistics (open rate, clicks) as real and act upon it in our decisions for future marketing communications. If the p-value is above .05, then the results of the test are inconclusive. Using the same value and interpretation every time allows us to stay consistent from test to test.

A key here is to not consider the test a failure if the results are inconclusive (p-value is greater than .05). Knowing that changes to certain email content or timing likely won’t have a noticeable effect on your audience is just as useful for future communication strategies. If you still feel strongly that the first experiment wasn’t enough to capture the difference in your groups’ responses, then replicate the experiment to add to the strength of your results.

Organizing your results for future use

“As a lab scientist, I was taught to keep meticulous records of every experiment that I did. My professor once said to me, if you got hit by a bus or abducted by aliens I need to be able to reproduce and interpret what you did. As a marketer you probably don’t need to be that detailed, but nonetheless it’s nice to have a record of what you have done so you can refer back to it and, more importantly, share it with your colleagues. For testing marketing campaigns, I kept a google doc, excel sheet, or a collection of paper napkins (true story).”

Keep a record of what the test was, the results, and the conclusions. And don’t be afraid to share your results in a presentation once a quarter. You immediately increase the value of your hard work by sharing your findings with your organization.

Here’s an example of a test result entry:

Aug 4, 2015

Test day of the week

Target Audience: All leads with job title = Manager, Director, VP

Email A - Send on Wednesday 10 AM

# Sent = 5,000

# Opens = 624

# Clicks = 65

# of Unsubscribes = 68

Email B - Send on Sunday 10 AM

# Sent = 5,000

# Opens = 580

# Clicks = 94

# of Unsubscribes = 74

P-value (Opens) = 0.176

P-value (Clicks) = .020

P-value (# Unsubscribes) = .612

Conclusion: Emails sent on Sunday resulted in significantly more clicks, but there was not a significant difference in opens or unsubscribes.
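If your log lives in a spreadsheet or a script rather than on napkins, an entry like this one can carry its own significance check. The sketch below stores the two arms from the entry above and recomputes the p-values with a two-proportion z-test. The `Arm` class and function name are invented for this example, and it uses the number sent as the total because the entry does not list delivered counts.

```python
import math
from dataclasses import dataclass


@dataclass
class Arm:
    label: str
    sent: int
    opens: int
    clicks: int
    unsubscribes: int


def two_tailed_p_value(successes_a, total_a, successes_b, total_b):
    """Two-proportion z-test with a pooled rate; returns the two-tailed p-value."""
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (successes_b / total_b - successes_a / total_a) / std_err
    return math.erfc(abs(z) / math.sqrt(2))


email_a = Arm("Wednesday 10 AM", sent=5000, opens=624, clicks=65, unsubscribes=68)
email_b = Arm("Sunday 10 AM", sent=5000, opens=580, clicks=94, unsubscribes=74)

# One p-value per metric, using the number sent as the total for each arm.
p_values = {
    metric: two_tailed_p_value(
        getattr(email_a, metric), email_a.sent, getattr(email_b, metric), email_b.sent
    )
    for metric in ("opens", "clicks", "unsubscribes")
}
for metric, p in p_values.items():
    verdict = "significant" if p <= 0.05 else "inconclusive"
    print(f"{metric}: p = {p:.3f} ({verdict})")
```

Only clicks clears the 0.05 cutoff, which matches the conclusion recorded in the entry.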

If you clearly document and organize your test results, you’ll soon have a customer engagement reference guide that’s unique to your organization.  And if you’ve designed your experiments as advised above, you’ll know that the conclusions drawn are based on sound statistical analyses of your data. Put those “fire and forget” Marketo A/B tests to rest and you’ll make your way towards optimal customer engagement.

What is your experience with Marketo’s A/B testing? Have you found any results that are interesting or unexpected?  Feel free to share your experiences with testing.

I'd like to thank Nate Hall for co-authoring and editing this blog post.

# #KREWECHATS Episode 6: Account Based Marketing

Posted by Geoff Krajeski Sep 26, 2016

See the video here: #KreweChats Episode 6: The One About ABM - YouTube

Learn each of our favorite songs!! (Playlist available here: KreweChats Favorite Songs - YouTube)

Watch our panel of Ande Kempf, Dory Viscogliosi, Julz James, Sydney Mulligan (aka SMUGS), Jenn DiMaria, Joe Reitz, and guest chatter Brent Evans discuss the ever popular topic of ABM, Account Based Marketing.

Account Based Marketing, or ABM, is the latest and greatest buzzword in the marketing world. In this episode we discuss what it means to us, as Marketo has just announced the release of their new ABM module that can be added on to our subscriptions.

Many of us have had or will soon be getting our sneak peek at this exciting development, but we wanted to bring you our take on how this will be the path forward toward unifying Marketing and Sales teams and helping companies achieve further returns.

# Looking for topics for upcoming KreweChats

Posted by Geoff Krajeski Sep 12, 2016

To the Community at large,

The members behind KreweChats are curious to know what you want to hear us talk about next!

What topics are you stumped on?

Where could you use help?

What do you yearn to know about Marketo and how others use it?

We are extremely open to topics as submitted by the community and welcome any and all feedback!

# #KREWECHATS Episode 5: Workspaces and Lead Partitions

Posted by Jenn DiMaria Sep 12, 2016

So it seems like all the cool kids are using (or at least know about) workspaces and partitions, but there’s not a whole lot of information to help out those of us who are in dire need of seeing them in practice. And as we all know with Marketo, there are about a billion different ways to go about doing something and still get it done correctly. Today, we’re hoping to get into at least a little bit more use-case detail about both workspaces and partitions, when it’s best to use one or the other (or both), and hopefully get into more detailed Q&A.

1. What are workspaces and how do they differ from lead partitions?
2. What’s a use case (ie: why would you want a lead partition)?
3. Can anything be shared across partitions?
4. So let’s say a lead exists in two partitions and proceeds to fill out a form. Does their information get updated in both partitions?
5. Can partitions sync to multiple SFDC instances?
6. How do you deal with shared data?
7. How do companies standardize on naming, program statuses, etc considering the other division might be around the world operating as different businesses?

# Marketo Analytics Tips: How to restate historical data when FT and MT program reports are telling lies

Posted by Jessica Kao Sep 8, 2016

Are your Marketo First Touch and Multi Touch reports lying to you? The answer depends on what you did, or didn’t do, weeks or months ago when you set up and ran your marketing programs. Getting Marketo First Touch and Multi-Touch attribution right depends on getting the right values in these Marketo native fields:

• Acquisition Program
• Acquisition Date
• Success (status within the program)
• Success Date

I see many Marketo users discover that sins of the past – setting up and running programs incorrectly – come back to bite them when it’s reporting time. The truth is, even the most diligent Marketo user will now and then miss setting up one or more of these fields correctly to get the right value. So, it’s important that you know how to restate the data when you need to get your reports dialed.

First Touch attribution helps you answer the question of which programs brought new names into the database that directly impacted pipeline and revenue. Multi-touch attribution addresses which programs influenced and played a role in generating pipeline and revenue.

Let's look at how these fields impact attribution and how to restate the data.

FT Attribution - Acquisition Program and Acquisition Date

FT credit is given based on the acquisition program.  As a marketer I want to get all the credit I rightfully deserve.  In order to get FT attribution all records should have an acquisition program. (Note:  For the people that are created from sales, set their acquisition program to a specific sales generated program and make that program operational.)  This will make it easier to identify any gaps.

Setting the acquisition program isn't enough.  The date of acquisition matters and will affect whether you get FT pipeline credit. Therefore, in some cases you will also need to restate the acquisition date.

Use Case #1

The person was given an acquisition program upon entry into the database.  However, it was not the correct acquisition program.

Fix:

Change the acquisition program.  The acquisition date does not have to be changed because the date is not linked to the specific program.

Use Case #2

The person never had an acquisition program and entered your database sometime in the past.

Fix:

Change the acquisition date and set the acquisition program. If you set the person’s acquisition program today, the acquisition date will also be set to today. For an accurate picture of program influence on FT pipeline for historic data, you will also have to restate the acquisition date as best you can. With a little bit of detective work, and depending on your record keeping in the past, you will be able to. If you are in a smart list and using a single flow action, change the acquisition date first, then change the acquisition program. The reason is that, presumably, one of the criteria in your smart list is that acquisition program is empty. Once you assign people to an acquisition program, it is no longer empty and they drop out of the list. I know this sounds obvious, but there have been many times where I have said “oh S@!t.” Now I have to go and find them to change the date.

MT Attribution - Success and Success Date

After you have assigned people the correct acquisition program, you should go back and check to make sure that they are in the correct progression status. Whether the progression status is a success step determines whether this program will get MT credit for pipeline and revenue. The same goes for the success date: when that person reached success in relation to the opportunity created date determines whether that program gets credit for MT pipeline.

If placing the person in the acquisition program automatically puts them at a success step, then the success date is also set for the day that success step was reached (probably today). If you are backfilling historical data, you will need to also change the success date.  Unlike the acquisition date, changing the success date can only happen in a smart campaign since it has to be associated with a specific program.  You can not change the success date using a single flow action in a smart list.  Here is a quick chart showing what fields can be changed via which method.

Use Case #1

You need to change someone’s progression status to a success status and change their success date, and they are not already in the program. You will encounter this scenario if you are backfilling programs (e.g., tradeshows or events) that happened in the past.

Fix:

You will need to set up two smart campaigns. The first is a batch campaign that sets the acquisition program, acquisition date, and program status. If this is a success step, then you will need a second smart campaign that is requested to set the success date. Because setting the success date can only happen after a person is a member of the program, use a request campaign flow step. You cannot do this in a single smart campaign with multiple flow steps. I tried this even with adding wait steps and it didn’t work.

Use Case #2

You know a group of people were acquired or reached success by filling out a specific form, but someone forgot to put them in a program for the past year. So what acquisition date and/or success date should you use for this group of people? And how do you accomplish this in the most efficient way possible?

Fix:

First you need to decide how granular you want the data to be, and that will change depending on how far back this specific activity happened. Meaning, do I care that Joe Smith filled out a form in 2012, or April 2012, or April 7 2012? Most likely, if an opportunity was created from Joe and it happened in 2013, MT credit will be assigned as long as the success date falls before the opportunity was created, so any date in 2012 works. Plus, if it’s not 100% accurate, I’m ok with that for activity so far in the past. So for any filled out form activity that happened in 2012, I am ok with assigning the success date to be Dec 31, 2012. At least I can compare year to year.

As you get to more recent activity, you might want a more granular view. So for activity in the past 12-18 months, I might want to state successes in the month in which they happened: for any activity deemed a success for a particular program, I will set the success date to the last day of that month. For example, I want to restate people who filled out a form to download content X. For people who performed this activity between 1/1/2016 and 1/31/2016, I will set the success date to 1/31/2016. In a single flow step, choose the change success flow action and use add choice. You can go by created date if you know for certain that the date will correlate with the success activity. Or you can create a smart list for each of the specific actions happening in the time frame (i.e., Filled out form Jan, Filled out form Feb, etc.) and use add choice in the flow step so that if the person is a member of smart list Filled out form Jan, the success date is set to 1/31/2016.
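If you are working out the restated dates in a spreadsheet export or a script before building the smart lists, the bucketing rule described above can be sketched like this. It is plain Python, not a Marketo API; the function name and the 18-month cutoff are assumptions based on the "past 12-18 months" rule of thumb, and the resulting dates would still need to be applied in Marketo through the flow steps described.

```python
import calendar
from datetime import date


def restated_success_date(activity_date, today, recent_months=18):
    """Pick the success date to assign when backfilling an old activity."""
    months_ago = (today.year - activity_date.year) * 12 + (today.month - activity_date.month)
    if months_ago <= recent_months:
        # Recent activity: use the last day of the month it happened in.
        last_day = calendar.monthrange(activity_date.year, activity_date.month)[1]
        return date(activity_date.year, activity_date.month, last_day)
    # Older activity: year-level granularity is enough, so use Dec 31 of that year.
    return date(activity_date.year, 12, 31)


today = date(2016, 9, 8)
print(restated_success_date(date(2016, 1, 15), today))  # 2016-01-31
print(restated_success_date(date(2012, 4, 7), today))   # 2012-12-31
```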

Summary:

When restating data, make sure you have accounted for what goes into each of these four fields:

Acquisition Program

Acquisition Date

Success (status within the program)

Success Date

. . . and your FT and MT attribution reports will make you look like a hero.

Since you have spent the time restating this data, you probably never want to go through that again. So how do you set up programs to ensure that this data is being captured the right way in the first place? Well, stay tuned for part II. In the meantime, if you have any questions, just shoot me a line.

# Everything you ever wanted to know about Marketo Landing Pages

Posted by Pierce Ujjainwalla Sep 8, 2016

One of the things that impressed me the most when I saw my first demo of Marketo 6 years ago was the ability to create my own landing pages without needing my web development team. The ability to become self-sufficient was really exciting to me since I came from IBM where making a landing page took literally 2 weeks... if I was lucky.

This post is dedicated to everything you need to know about Marketo landing pages.

Free Form vs. Guided
The first thing you need to start with is whether you want to make a free-form or a guided landing page. IMO guided is the only way to go, because Free-form pages are tough to make responsive. They are nice in the sense that you can literally drag and drop anything you want, but with that comes the tricky bit of making everything line up which can also be a bit of a pain. Anyone who has tried to make a pixel-perfect landing page with the free-form editor knows what I'm talking about. It's a lot of nudging and going into the details to set the alignment. At one point this was all we had and we made them responsive using some fancy javascript hacks, but it was not super intuitive and was easy to break.

The Guided editor is much more flexible. The variable functionality provides Marketers with a lot of flexibility in adding or removing sections, changing colours on a mass scale and making things look really good (assuming you start with a good template). It still requires some code to make the base template, but once you have that you're off to the races. IMO this is the only way to make pages now.

Advantages of using native Marketo Landing Pages

Does Marketo have the most advanced or sophisticated landing page editor out there? No. There are some amazing landing page editors by some really cool vendors. However, what Marketo does have with their editor that no one else has is the following:

1. Native Marketo Forms
I can't stress enough about the importance of using native Marketo forms. You have automatic form pre-fill for all your cookied users, which may be your single biggest conversion rate optimizer right there. You get the inferred data, you have easy access to trigger or filter based on the form submissions, and you don't need to setup any complicated API calls or new subdomains.
2. Dynamic Content
Using native Marketo landing page you can take advantage of one of Marketo's most powerful features, which is dynamic content. This allows you to segment your known universe by any data you have on them and show them images, messages or content that is specifically tailored to them. The only way to truly take advantage of this is by using native Marketo landing pages.
3. Tokens
This is a major lifesaver for marketers who can seriously improve the efficiency of their campaign builds. Using program tokens or lead tokens on your landing pages is something that is only available if your pages are... you guessed it, in Marketo!
4. Reporting
Having your landing pages live in Marketo makes it easier for you to get an overall picture of how your campaign is performing. In your program you can see how it's performing overall by looking at the program statuses, and then dive deeper by looking at the total views and conversion rate of your landing page. If your page lives in another system, you are not going to get a full 360 degree view without leaving Marketo and going somewhere else.

Custom Fonts
Yes it is entirely possible to use custom fonts on Marketo landing pages. It takes a bit of coding but you can apply them just like you would on any other page.

Lightbox Forms

It is also possible to do lightbox forms in Marketo landing pages using a bit of javascript code. You can still use native Marketo forms and don't need any third party software to accomplish this.

Video Backgrounds

It is also possible to do video backgrounds on Marketo landing pages. We recently published a landing page template where Marketers can grab video backgrounds off of www.coverr.co and paste them in variables that take each version of the file necessary to do a video background.

A/B Testing
Marketo has some seriously awesome A/B testing capabilities built right into their platform. You can test many pages against each other and Marketo will take care of pushing an equal amount of traffic to each page.

The Verdict
Marketo landing pages are an essential part of the overall Marketing Platform. There are many advantages of having your landing pages in Marketo, and although Marketo's landing page editor may not be the most sophisticated, with the right template the landing page editor provides all the flexibility that a Marketer would need to customize their page accordingly. The loss of functionality and reporting from having your pages in another system is a major gap that Marketers need to be aware of.

What do you guys think? Do you keep your pages in Marketo or put them in another system?
